Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: Implement read_csv for omniscidb backend #2062

Merged
merged 32 commits into from
Mar 14, 2020

Conversation

anmyachev
Copy link
Contributor

continuation of #1977

The advantage of this approach is the avoidance of additional memory costs.

Problem:

  • for test purpose I should have dataset inside omniscidb container. What the best way for this?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to add a test on CI for this?

ibis/omniscidb/client.py Outdated Show resolved Hide resolved
ibis/omniscidb/client.py Outdated Show resolved Hide resolved
@anmyachev
Copy link
Contributor Author

is there a way to add a test on CI for this?

@jreback I can create new omnisci container with needed test datasets inside. After that, it remains only to write a test

@xmnlab
Copy link
Contributor

xmnlab commented Feb 3, 2020

@anmyachev thanks for work on that!

maybe you can add a test that creates a dataframe, use that to create a csv file (to_csv), load this csv using the backend read_csv and check if the result is the same.

@xmnlab xmnlab changed the title [FEAT] read_csv for omniscidb backend FEAT: Implement read_csv for omniscidb backend Feb 5, 2020
@xmnlab
Copy link
Contributor

xmnlab commented Feb 5, 2020

hey @anmyachev it seems there are some errors on CI related to black format. to avoid that you can install git pre-commit hooks using make develop

@anmyachev
Copy link
Contributor Author

clickhouse-sqlalchemy 0.1.2 is not compatible with clickhouse-driver 0.1.1

@anmyachev
Copy link
Contributor Author

@jreback @xmnlab PR is ready for review

Copy link
Contributor

@xmnlab xmnlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anmyachev in general this PR looks very good.

I liked your approach to create the csv file inside the OmniSciDB container!

I just have some questions/comments :

  • related to the clickhouse-driver, why was it pinned to >=0.1.3? doesn't work >=0.1.2?
  • maybe you could change from quoted to quotechar .. so it could be more generic.
  • not sure, maybe @jreback could give us a direction, but maybe you can also add read_csv to ir.TableExpr just raising NotImplementedError, so another backend could implement the same method with the same parameters. Also you move your tests from omniscidb/tests to tests/all ... so it would be easier when another backend implements this function. @jreback what is your thoughts about this?

@jreback
Copy link
Contributor

jreback commented Feb 24, 2020

IIRC this is just efficiently puts a file into the db from a csv, so a data-loading operation? I know postgresql supports this as well, not sure if its implemented.

This is an alternative way of constructing a table (e.g. we normally introspect an existing one), or use the Client to create ones.

Have to think about if we actually should add any api for this. For now I would not, but certainly raise an issue.

ibis/omniscidb/client.py Outdated Show resolved Hide resolved
ibis/omniscidb/client.py Outdated Show resolved Hide resolved
@anmyachev
Copy link
Contributor Author

@xmnlab PR is ready for review. While there is no exact decision whether all backends need to have this function, I suggest leaving it only for Omnisci and adding it already in release 1.3. This way we can get feedback from users.

@xmnlab
Copy link
Contributor

xmnlab commented Feb 28, 2020

@jreback any feedback about this ?

@anmyachev
Copy link
Contributor Author

Green CI depends on #2104

@xmnlab
Copy link
Contributor

xmnlab commented Mar 2, 2020

@anmyachev could you check CI for py36 y py37
it seems tests are failing just for read_csv:

FAILED ibis/omniscidb/tests/test_client.py::test_read_csv[/omnisci/test_read_csv.csv]
FAILED ibis/omniscidb/tests/test_client.py::test_read_csv[filename1] - Except...

py38 is green because currently it skips tests for omniscidb

@anmyachev
Copy link
Contributor Author

@anmyachev could you check CI for py36 y py37
it seems tests are failing just for read_csv:

FAILED ibis/omniscidb/tests/test_client.py::test_read_csv[/omnisci/test_read_csv.csv]
FAILED ibis/omniscidb/tests/test_client.py::test_read_csv[filename1] - Except...

py38 is green because currently it skips tests for omniscidb

These two tests were fixed in last commit.

@anmyachev
Copy link
Contributor Author

@jreback PR is ready for review.

ibis/omniscidb/client.py Show resolved Hide resolved
ibis/omniscidb/ddl.py Outdated Show resolved Hide resolved
ibis/omniscidb/ddl.py Outdated Show resolved Hide resolved
('month_', 'int32'),
]
)
con.create_table(t_name, schema=schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a fixture to handle resource creation & destruction rather than using a try/finally (you can do that in the fixture itself)

@jreback jreback added this to the Next Feature Release milestone Mar 14, 2020
@jreback jreback merged commit cc0ebb6 into ibis-project:master Mar 14, 2020
@jreback
Copy link
Contributor

jreback commented Mar 14, 2020

thanks @anmyachev

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants