Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: rework registration / file loading #5005

Merged
merged 1 commit into from
Jan 6, 2023

Conversation

gforsyth
Copy link
Member

@gforsyth gforsyth commented Dec 12, 2022

Introduces explicit read_ methods for various types of file /
in-memory object loading.

Removes the RegexDispatcher from the DuckDB, polars, and datafusion backends

Should be backwards compatible with existing register functionality,
plus users can now pass in iterables of CSV or parquet files (duckdb only), and also
make use of the explicit read_ methods if they choose (say if filenames
are non-standard)

Somehow, Datafusion can't handle a count so I've added a gross
conditional to the tests for the moment.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 12, 2022

Test Results

       43 files         43 suites   1h 48m 27s ⏱️
13 285 tests 10 018 ✔️   3 267 💤 0
45 994 runs  34 314 ✔️ 11 680 💤 0

Results for commit 2ef837d.

♻️ This comment has been updated with latest results.

@codecov
Copy link

codecov bot commented Dec 12, 2022

Codecov Report

Merging #5005 (a3b2256) into master (4faef5a) will decrease coverage by 12.14%.
The diff coverage is 69.04%.

❗ Current head a3b2256 differs from pull request most recent head d1a025d. Consider uploading reports for the commit d1a025d to get more accurate results

@@             Coverage Diff             @@
##           master    #5005       +/-   ##
===========================================
- Coverage   95.08%   82.94%   -12.15%     
===========================================
  Files         401      210      -191     
  Lines       44763    23259    -21504     
  Branches     4391     3245     -1146     
===========================================
- Hits        42563    19292    -23271     
- Misses       1693     3579     +1886     
+ Partials      507      388      -119     
Impacted Files Coverage Δ
ibis/backends/duckdb/datatypes.py 100.00% <ø> (ø)
ibis/backends/duckdb/__init__.py 78.57% <66.23%> (-4.43%) ⬇️
ibis/expr/api.py 96.78% <100.00%> (+0.07%) ⬆️
ibis/backends/bigquery/version.py 0.00% <0.00%> (-100.00%) ⬇️
ibis/backends/bigquery/rewrites.py 0.00% <0.00%> (-100.00%) ⬇️
ibis/backends/bigquery/operations.py 0.00% <0.00%> (-100.00%) ⬇️
ibis/backends/bigquery/udf/rewrite.py 0.00% <0.00%> (-100.00%) ⬇️
ibis/backends/datafusion/datatypes.py 0.00% <0.00%> (-100.00%) ⬇️
ibis/backends/bigquery/registry.py 0.00% <0.00%> (-97.65%) ⬇️
ibis/backends/bigquery/compiler.py 0.00% <0.00%> (-97.02%) ⬇️
... and 374 more

@jayceslesar
Copy link
Contributor

This will close #4906

This is definitely the best way to go about it, regex dispatcher was an issue when I was looking at this

@gforsyth gforsyth force-pushed the gil/read_explicit branch 3 times, most recently from ec8117c to 455b677 Compare January 4, 2023 22:25
@gforsyth gforsyth changed the title refactor(duckdb): rework registration / file loading refactor: rework registration / file loading Jan 4, 2023
@gforsyth gforsyth marked this pull request as ready for review January 4, 2023 22:26
@gforsyth gforsyth force-pushed the gil/read_explicit branch 3 times, most recently from 404ffde to 34f327e Compare January 4, 2023 22:38
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! A few questions about coverage/testing.

ibis/backends/duckdb/__init__.py Outdated Show resolved Hide resolved
ibis/backends/duckdb/__init__.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/__init__.py Outdated Show resolved Hide resolved
ibis/backends/duckdb/__init__.py Outdated Show resolved Hide resolved
ibis/backends/duckdb/__init__.py Show resolved Hide resolved
ibis/backends/tests/test_register.py Show resolved Hide resolved
@gforsyth gforsyth force-pushed the gil/read_explicit branch 2 times, most recently from 48fd0e8 to d1a025d Compare January 5, 2023 14:33
@gforsyth gforsyth force-pushed the gil/read_explicit branch 10 times, most recently from 87ecd8a to 4ac4045 Compare January 5, 2023 18:41
Introduces explicit `read_` methods for various types of file /
in-memory object loading.

Removes the `RegexDispatcher` from the DuckDB, polars, and datafusion backends

Should be backwards compatible with existing register functionality,
plus users can now pass in iterables of CSV or parquet files (duckdb only), and also
make use of the explicit `read_` methods if they choose (say if filenames
are non-standard)

Somehow, Datafusion can't handle a `count` so I've added a gross
conditional to the tests for the moment.

This also removes the last usage of RegexDispatcher, so I've removed
that from `dispatch.py` and also removed the odo license file, which is
now no longer required.
@cpcloud cpcloud added this to the 4.0.0 milestone Jan 5, 2023
@cpcloud cpcloud added the duckdb The DuckDB backend label Jan 6, 2023
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@cpcloud cpcloud merged commit c60e30d into ibis-project:master Jan 6, 2023
@gforsyth gforsyth deleted the gil/read_explicit branch January 6, 2023 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duckdb The DuckDB backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants