-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: [OmniSciDB] Add method parameter to load_data #2165
FEAT: [OmniSciDB] Add method parameter to load_data #2165
Conversation
|
this PR is done for review. thanks! |
60d7344
to
b71be44
Compare
b71be44
to
35f2a7e
Compare
0ae2554
to
6e52e16
Compare
|
this PR is ready again for a new review. |
|
a friendly reminder about this PR :) |
|
a friendly reminder about this PR :) thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @xmnlab
|
Release is conflicting. |
6e52e16
to
ba54417
Compare
|
thanks @datapythonista ! rebased! |
ba54417
to
f34c65a
Compare
|
branch rebased. thanks. |
|
@xmnlab can you address this comment please? #2165 (comment) |
|
@datapythonista sure. sorry I forgot that one. I am working on that. thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aace172
to
0753dd8
Compare
|
@jreback this is ready for another review. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xmnlab for this. Looks good. You've got some conflicts, and I added few suggestions that I think will make the code better.
ibis/omniscidb/client.py
Outdated
| obj: Union[pd.DataFrame, pyarrow.Table], | ||
| database: Optional[str] = None, | ||
| method: str = 'rows', | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is removing **kwargs intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the initial idea for kwarg here was to allow a similar signature for all load_data functions across all the backends, and as it seems it is not necessary and it is not used inside the function, I am just removing that for now.
ibis/omniscidb/tests/test_client.py
Outdated
| if not isinstance(data, pd.DataFrame): | ||
| data = data.to_pandas() # pyarrow.Table | ||
|
|
||
| pd.testing.assert_frame_equal(data, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pd.testing.assert_frame_equal(data, result) | |
| pd.testing.assert_frame_equal(result, data) |
I think it's clearer and more standard to use result == expected than the other way round.
ibis/omniscidb/tests/test_client.py
Outdated
| result = con.table(temp_table).execute() | ||
|
|
||
| if not isinstance(data, pd.DataFrame): | ||
| data = data.to_pandas() # pyarrow.Table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this, if you always want the pandas, we not simply use assert_frame_equal(result, df_salary) below instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, as we are using the same data for both, it works. thanks
ibis/omniscidb/tests/test_client.py
Outdated
| ) | ||
| def test_load_data(con, temp_table, method, data): | ||
| con.create_table(temp_table, schema=sch_salary) | ||
| con.load_data(temp_table, data, method=method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if you define df_salary, arrow_salary... inside the function, and you use the strings pandas and arrow for the data parameter (that can be renamed to format). Then you can do:
| con.load_data(temp_table, data, method=method) | |
| con.load_data(temp_table, df_salary if format == 'pandas' else arrow_salary, method=method) |
This will save memory in the testing job and will keep things more compact.
538afc4
to
9f6b33d
Compare
|
@datapythonista I applied the suggestion and CI is green. let me know if I missed anything here pls. Thanks! |
| 'infer', | ||
| 'arrow', | ||
| 'arrow', | ||
| 'arrow', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to make a lot of sense, not sure if I'm missing something. method,format means that there should be two elements, so I'm wondering if pytest is iterating over the strings to get two values, and method is r and format is o for rows.
Also, if doesn't make sense that we have pandas, arrow... repeated several times, it will run the same exact test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep ... it doesn't make any sense ... sorry ... no idea why it is in this format :X ... I will pay more attention next time ... also .. No idea neither why the tests passed :X
but the way, the "matrix" of possibilities should be the same of the initial commit:
('rows', df_salary),
('columnar', df_salary),
('infer', df_salary),
('infer', arrow_salary),
('arrow', arrow_salary)
so it should be something like
('rows', 'pandas'),
('columnar', 'pandas'),
('infer', 'pandas'),
('infer', 'arrow'),
('arrow', 'arrow')
I will fix this issue .. and sorry again for the noise ... I really don't know what happened
setup.cfg
Outdated
| @@ -16,7 +16,7 @@ inherit = false | |||
| convention = numpy | |||
|
|
|||
| [isort] | |||
| known_third_party = asv,click,clickhouse_driver,dateutil,google,graphviz,impala,kudu,mock,multipledispatch,numpy,pandas,pkg_resources,plumbum,psycopg2,pyarrow,pydata_google_auth,pygit2,pymapd,pymysql,pyspark,pytest,pytz,regex,requests,setuptools,sphinx_rtd_theme,sqlalchemy,thrift,toolz | |||
| known_third_party = click,clickhouse_driver,dateutil,google,graphviz,impala,kudu,mock,multipledispatch,numpy,pandas,pkg_resources,plumbum,psycopg2,pyarrow,pydata_google_auth,pygit2,pymapd,pymysql,pyspark,pytest,pytz,regex,requests,setuptools,sphinx_rtd_theme,sqlalchemy,thrift,toolz | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we fixed this, and isort/the precommit hook should start changing it. Can you revert please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok ... I will rebase the code again and try :) thanks!
9f6b33d
to
856dbe5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @xmnlab
|
@datapythonista , is there anything else I should address here? or can it be already merged? thanks |
lgtm as I mentioned in the past comment. I'll let @jreback have a look before merging. |
856dbe5
to
583632f
Compare
|
thanks @datapythonista ! branch rebased! |
|
Thanks @xmnlab |
Resolve #2164