Use SQLTransforms to compute SQL Source schemas on database #246
b41e673 to 257c657
I actually don't think this will currently work if there are multiple enums, since I'm not sure what DISTINCT will do there.
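To illustrate the concern: `DISTINCT` over several columns at once returns unique row *combinations*, not the unique values of each column, so per-column enums have to be queried one column at a time. A minimal sketch with the stdlib `sqlite3` module (table and column names are purely illustrative, not from this PR):

```python
import sqlite3

# Hypothetical table with two enum-like columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (color TEXT, size TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("red", "S"), ("red", "M"), ("blue", "S")],
)

# DISTINCT across multiple columns yields unique combinations...
combos = con.execute("SELECT DISTINCT color, size FROM t").fetchall()
# ...three (color, size) pairs here, even though there are only
# two distinct colors and two distinct sizes.

# So per-column enums need one DISTINCT query per column.
colors = [r[0] for r in con.execute("SELECT DISTINCT color FROM t")]
sizes = [r[0] for r in con.execute("SELECT DISTINCT size FROM t")]
```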
A couple of clarification questions; otherwise, all good from me, with mind to the points that @TvieiraB brought up. Appreciate the quick turnaround on this one.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #246      +/-   ##
==========================================
+ Coverage   63.90%   64.11%   +0.20%
==========================================
  Files          56       56
  Lines        5054     5113      +59
==========================================
+ Hits         3230     3278      +48
- Misses       1824     1835      +11
```
Continue to review full report at Codecov.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Currently all sources load the data into memory to compute the schema. For many sources this is okay because they can at least leverage dask to lazily load the data and compute the unique categories or the range of a numerical or date value. Unfortunately, for most SQL sources this can be extremely inefficient. So for SQL sources we leverage the `SQLTransform`s recently added to Lumen to let the database itself compute the ranges and unique categories for each column.

The approach can be described as follows:

- Load a single row (`LIMIT 1`) to determine the columns and their types.
- Compute the ranges and unique categories with `MIN`/`MAX` and `DISTINCT` SQL statements respectively.
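The steps above can be sketched as a small helper. This is an illustrative standalone function, not Lumen's actual `SQLTransform` implementation; the function name, the JSON-Schema-style output keys, and the explicit numeric/enum column lists are all assumptions for the example:

```python
import sqlite3

def sql_schema(con, table, numeric_cols, enum_cols):
    """Sketch: push schema computation into the database (hypothetical helper)."""
    # Step 1: fetch a single row so the database only materializes one record,
    # just enough to discover the column names.
    cur = con.execute(f"SELECT * FROM {table} LIMIT 1")
    schema = {d[0]: {} for d in cur.description}

    # Step 2a: numeric/date ranges via MIN/MAX, computed by the database.
    for col in numeric_cols:
        lo, hi = con.execute(
            f"SELECT MIN({col}), MAX({col}) FROM {table}"
        ).fetchone()
        schema[col] = {"type": "number", "inclusiveMinimum": lo, "inclusiveMaximum": hi}

    # Step 2b: categorical columns via DISTINCT, one query per column.
    for col in enum_cols:
        enums = [r[0] for r in con.execute(f"SELECT DISTINCT {col} FROM {table}")]
        schema[col] = {"type": "string", "enum": sorted(enums)}
    return schema
```

Only single rows and aggregates ever cross the wire, so the full table is never loaded into memory on the client.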