FEAT: [OmniSciDB] Add method parameter to load_data #2165

xmnlab
Contributor

@xmnlab xmnlab commented Apr 1, 2020

Resolve #2164

@xmnlab xmnlab requested a review from jreback April 1, 2020 19:49
@xmnlab
Contributor Author

xmnlab commented Apr 1, 2020

this PR is done for review. thanks!

@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from 60d7344 to b71be44 Compare April 5, 2020 01:41
ibis/omniscidb/tests/test_client.py Outdated
@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from b71be44 to 35f2a7e Compare April 9, 2020 16:40
@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch 2 times, most recently from 0ae2554 to 6e52e16 Compare May 28, 2020 15:30
@xmnlab
Contributor Author

xmnlab commented May 28, 2020

this PR is ready again for a new review.

@xmnlab
Contributor Author

xmnlab commented Jun 18, 2020

a friendly reminder about this PR :)

@xmnlab
Contributor Author

xmnlab commented Jul 1, 2020

a friendly reminder about this PR :) thanks

Contributor

@datapythonista datapythonista left a comment

lgtm, thanks @xmnlab

@datapythonista
Contributor

Release is conflicting.

@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from 6e52e16 to ba54417 Compare July 15, 2020 19:49
@xmnlab
Contributor Author

xmnlab commented Jul 15, 2020

thanks @datapythonista ! rebased!

@xmnlab
Contributor Author

xmnlab commented Jul 26, 2020

branch rebased. thanks.

@datapythonista
Contributor

@xmnlab can you address this comment please? #2165 (comment)

@xmnlab
Contributor Author

xmnlab commented Jul 27, 2020

@datapythonista sure. sorry I forgot that one. I am working on that. thanks

Contributor

@datapythonista datapythonista left a comment

Thanks @xmnlab

@jreback I think this should be ready now.

@xmnlab xmnlab requested a review from jreback August 13, 2020 15:46
@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from aace172 to 0753dd8 Compare August 13, 2020 15:47
setup.py Outdated
@kcpevey

kcpevey commented Sep 3, 2020

@jreback this is ready for another review. Thanks!

Contributor

@datapythonista datapythonista left a comment

Thanks @xmnlab for this. Looks good. You've got some conflicts, and I added a few suggestions that I think will make the code better.

    obj: Union[pd.DataFrame, pyarrow.Table],
    database: Optional[str] = None,
    method: str = 'rows',
):
Contributor

Is removing **kwargs intentional?

Contributor Author

Yes. The initial idea for **kwargs here was to allow a similar signature for all load_data functions across the backends. Since that does not seem necessary, and it is not used inside the function, I am removing it for now.
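For readers following the thread, here is a minimal sketch of what a load_data with an explicit method parameter and no **kwargs might look like. It is not the code from this PR: RecordingConnection is a hypothetical stand-in for the driver connection, the four method names are taken from the test matrix discussed further down, and database handling is left out on purpose.

```python
from typing import Any, List, Optional, Tuple

# Method names taken from the test matrix in this thread.
VALID_METHODS = ("rows", "columnar", "infer", "arrow")


class RecordingConnection:
    """Hypothetical stand-in for the driver connection; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def load_table(self, table_name: str, data: Any, method: str = "infer") -> None:
        self.calls.append((table_name, method))


def load_data(
    con: RecordingConnection,
    table_name: str,
    obj: Any,
    database: Optional[str] = None,  # accepted but ignored in this sketch
    method: str = "rows",
) -> None:
    """Validate `method` and forward it to the connection."""
    if method not in VALID_METHODS:
        raise ValueError(
            f"method must be one of {VALID_METHODS}, got {method!r}"
        )
    con.load_table(table_name, obj, method=method)
```

With an explicit signature like this, a misspelled keyword argument raises a TypeError at the call site instead of being swallowed silently by **kwargs.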

    if not isinstance(data, pd.DataFrame):
        data = data.to_pandas()  # pyarrow.Table

    pd.testing.assert_frame_equal(data, result)
Contributor

Suggested change
- pd.testing.assert_frame_equal(data, result)
+ pd.testing.assert_frame_equal(result, data)

I think it's clearer and more standard to use result == expected than the other way round.

    result = con.table(temp_table).execute()

    if not isinstance(data, pd.DataFrame):
        data = data.to_pandas()  # pyarrow.Table
Contributor

I don't understand this. If you always want the pandas DataFrame, why not simply use assert_frame_equal(result, df_salary) below instead?

Contributor Author

yes, as we are using the same data for both, it works. thanks

)
def test_load_data(con, temp_table, method, data):
    con.create_table(temp_table, schema=sch_salary)
    con.load_data(temp_table, data, method=method)
Contributor

I would prefer if you define df_salary, arrow_salary... inside the function, and you use the strings pandas and arrow for the data parameter (that can be renamed to format). Then you can do:

Suggested change
- con.load_data(temp_table, data, method=method)
+ con.load_data(temp_table, df_salary if format == 'pandas' else arrow_salary, method=method)

This will save memory in the testing job and will keep things more compact.
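A rough illustration of the idea in this suggestion: parametrize on a short format name and build the data only when a test asks for it. This sketch uses only the standard library. The list of tuples and the dict of lists are assumed stand-ins for the pandas DataFrame and the pyarrow Table, and make_rows, make_columns, DATA_FACTORIES and build_data are hypothetical names, not part of the PR.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]


def make_rows() -> List[Row]:
    """Row-oriented sample; stands in for the pandas DataFrame."""
    return [("alice", 100), ("bob", 200)]


def make_columns() -> Dict[str, list]:
    """Column-oriented sample; stands in for the pyarrow Table."""
    rows = make_rows()
    return {"name": [r[0] for r in rows], "salary": [r[1] for r in rows]}


# Only the factories live at module level; no data is built at import time.
DATA_FACTORIES: Dict[str, Callable[[], object]] = {
    "pandas": make_rows,
    "arrow": make_columns,
}


def build_data(fmt: str) -> object:
    """Build the sample data for the given format name on demand."""
    try:
        factory = DATA_FACTORIES[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return factory()
```

A test body would then call build_data(fmt) itself, so the test IDs stay short strings and each data object lives only for the duration of the test that needs it. The parameter is called fmt here rather than format only to avoid shadowing the Python builtin.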

@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from 538afc4 to 9f6b33d Compare October 15, 2020 15:21
@xmnlab
Copy link
Contributor Author

xmnlab commented Oct 19, 2020

@datapythonista I applied the suggestion and CI is green. let me know if I missed anything here pls. Thanks!

'infer',
'arrow',
'arrow',
'arrow',
Contributor

This doesn't seem to make a lot of sense, not sure if I'm missing something. method,format means that there should be two elements, so I'm wondering if pytest is iterating over the strings to get two values, and method is r and format is o for rows.

Also, it doesn't make sense to have pandas, arrow... repeated several times; that will run the exact same test.

Contributor Author

Yep, it doesn't make any sense, sorry. I have no idea why it is in this format, and no idea either why the tests passed. I will pay more attention next time.

By the way, the "matrix" of possibilities should be the same as in the initial commit:

    ('rows', df_salary),
    ('columnar', df_salary),
    ('infer', df_salary),
    ('infer', arrow_salary),
    ('arrow', arrow_salary)

so it should be something like

    ('rows', 'pandas'),
    ('columnar', 'pandas'),
    ('infer', 'pandas'),
    ('infer', 'arrow'),
    ('arrow', 'arrow')

I will fix this issue .. and sorry again for the noise ... I really don't know what happened
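As a sketch of how a malformed matrix like the one flagged above could be caught early, the hypothetical helper below checks that every case supplies exactly one value per argument name and that no case is repeated. check_cases is not part of pytest or of this PR; the CASES list simply restates the corrected matrix from this comment.

```python
from typing import List, Sequence, Tuple

ARGNAMES = ("method", "format")

# The corrected combinations listed in the comment above.
CASES: List[Tuple[str, str]] = [
    ("rows", "pandas"),
    ("columnar", "pandas"),
    ("infer", "pandas"),
    ("infer", "arrow"),
    ("arrow", "arrow"),
]


def check_cases(argnames: Sequence[str], cases: Sequence[object]) -> None:
    """Raise ValueError for bare strings, wrong arity, or duplicate cases."""
    seen = set()
    for case in cases:
        # A bare string such as 'rows' is one value, not a (method, format) pair.
        if isinstance(case, str) or len(case) != len(argnames):
            raise ValueError(
                f"expected {len(argnames)} values per case, got {case!r}"
            )
        key = tuple(case)
        if key in seen:
            raise ValueError(f"duplicate case: {case!r}")
        seen.add(key)
```

Calling check_cases(ARGNAMES, CASES) passes silently, while a flat list of strings or a repeated pair raises a ValueError. The same CASES list could then be handed to pytest.mark.parametrize("method,format", CASES).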

setup.cfg Outdated
@@ -16,7 +16,7 @@ inherit = false
convention = numpy

[isort]
-known_third_party = asv,click,clickhouse_driver,dateutil,google,graphviz,impala,kudu,mock,multipledispatch,numpy,pandas,pkg_resources,plumbum,psycopg2,pyarrow,pydata_google_auth,pygit2,pymapd,pymysql,pyspark,pytest,pytz,regex,requests,setuptools,sphinx_rtd_theme,sqlalchemy,thrift,toolz
+known_third_party = click,clickhouse_driver,dateutil,google,graphviz,impala,kudu,mock,multipledispatch,numpy,pandas,pkg_resources,plumbum,psycopg2,pyarrow,pydata_google_auth,pygit2,pymapd,pymysql,pyspark,pytest,pytz,regex,requests,setuptools,sphinx_rtd_theme,sqlalchemy,thrift,toolz
Contributor

I think we fixed this, and isort/the pre-commit hook should stop changing it. Can you revert this please?

Contributor Author

ok ... I will rebase the code again and try :) thanks!

@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from 9f6b33d to 856dbe5 Compare October 19, 2020 16:25
Contributor

@datapythonista datapythonista left a comment

lgtm, thanks @xmnlab

@xmnlab
Contributor Author

xmnlab commented Oct 21, 2020

@datapythonista , is there anything else I should address here? or can it be already merged? thanks

@datapythonista
Contributor

> @datapythonista , is there anything else I should address here? or can it be already merged? thanks

lgtm as I mentioned in the past comment. I'll let @jreback have a look before merging.

@datapythonista
Contributor

Sorry @xmnlab this conflicted once more. Can you merge master and fix the conflicts please? I'll merge as soon as this is green.

@jreback I think your review comments are all addressed. But if you want to have a look...

@xmnlab xmnlab force-pushed the omniscidb-add-method-parameter-to-load-data branch from 856dbe5 to 583632f Compare October 26, 2020 17:52
@xmnlab
Contributor Author

xmnlab commented Oct 26, 2020

thanks @datapythonista ! branch rebased!

@datapythonista datapythonista merged commit e10e386 into ibis-project:master Oct 26, 2020
@datapythonista
Contributor

Thanks @xmnlab

@xmnlab xmnlab deleted the omniscidb-add-method-parameter-to-load-data branch October 27, 2020 00:00
Successfully merging this pull request may close these issues.

FEAT: OmniScidb - Add "method" parameter to OmniSciDB load_table
4 participants