chore(examples): add nycflights13 dataset to examples #8746

Merged
merged 3 commits into from
Mar 23, 2024

gforsyth
Member

Resolves #8718

As an example:

[ins] In [1]: from ibis.interactive import *

[ins] In [2]: ibis.examples.nycflights13_airports.fetch()
Out[2]: 
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━┓
┃ faa    ┃ name                       ┃ lat       ┃ lon         ┃ alt   ┃ tz    ┃ … ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━┩
│ string │ string                     │ float64   │ float64     │ int64 │ int64 │ … │
├────────┼────────────────────────────┼───────────┼─────────────┼───────┼───────┼───┤
│ 04G    │ Lansdowne Airport          │ 41.130472 │  -80.619583 │  1044 │    -5 │ … │
│ 06A    │ Moton Field Municipal Air… │ 32.460572 │  -85.680028 │   264 │    -6 │ … │
│ 06C    │ Schaumburg Regional        │ 41.989341 │  -88.101243 │   801 │    -6 │ … │
│ 06N    │ Randall Airport            │ 41.431912 │  -74.391561 │   523 │    -5 │ … │
│ 09J    │ Jekyll Island Airport      │ 31.074472 │  -81.427778 │    11 │    -5 │ … │
│ 0A9    │ Elizabethton Municipal Ai… │ 36.371222 │  -82.173417 │  1593 │    -5 │ … │
│ 0G6    │ Williams County Airport    │ 41.467306 │  -84.506778 │   730 │    -5 │ … │
│ 0G7    │ Finger Lakes Regional Air… │ 42.883565 │  -76.781232 │   492 │    -5 │ … │
│ 0P2    │ Shoestring Aviation Airfi… │ 39.794824 │  -76.647191 │  1000 │    -5 │ … │
│ 0S9    │ Jefferson County Intl      │ 48.053809 │ -122.810644 │   108 │    -8 │ … │
│ …      │ …                          │         … │           … │     … │     … │ … │
└────────┴────────────────────────────┴───────────┴─────────────┴───────┴───────┴───┘

To do this I've added an entry to gen_registry.py that pulls the flight data and converts it to zstd-compressed parquet.

I've also added a pixi.toml and pixi.lock and a short README for the examples folder so other maintainers can start with a little more context.

I should also note that this is far from perfect. Currently, running gen_registry.py DOES add the nycflights data, but it also removes several entries from the metadata.json file. I've manually reverted all of those removals.

The short version, though, is that gen_registry.py currently regenerates only a subset of the examples we offer.
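
One way to catch dropped entries before committing is to compare the keys of metadata.json before and after a gen_registry.py run. This is only a minimal sketch: it assumes metadata.json is a JSON object keyed by example name, and the function name and sample entries below are hypothetical, not part of this PR.

```python
import json


def dropped_entries(before_text: str, after_text: str) -> list[str]:
    """Return the keys present in the old metadata but missing from the new one."""
    before = json.loads(before_text)
    after = json.loads(after_text)
    # A set difference keeps only the names that disappeared.
    return sorted(set(before) - set(after))


# Hypothetical metadata contents, for illustration only.
old = json.dumps({"penguins": {}, "starwars": {}})
new = json.dumps({"penguins": {}, "nycflights13_airports": {}})

print(dropped_entries(old, new))  # prints ['starwars']
```

In practice the two strings would come from the committed copy of the file and the regenerated one; a non-empty result means the run removed existing examples.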

Download, unzip, convert to zstd parquet
@gforsyth gforsyth added the examples Issues/PRs related to `ibis.examples` label Mar 22, 2024
In `ibis/examples`, run `pixi shell` to drop into a shell with the
required dependencies to run both `gen_registry.py` and `gen_examples.R`

After running `pixi shell`, move up to the root of the repo before
running

`python ibis/examples/gen_registry.py -b ibis-examples`
@gforsyth
Member Author

Also, the lock file is enormous and if we'd rather it didn't get checked in, I'm happy to remove it and to pin down the dependencies in the pixi.toml file a little more closely.

@cpcloud cpcloud added the developer-tools Tools related to ibis development label Mar 23, 2024
@cpcloud cpcloud added this to the 9.0 milestone Mar 23, 2024
@cpcloud cpcloud merged commit 6181114 into ibis-project:main Mar 23, 2024
94 checks passed
@gforsyth gforsyth deleted the nycflights_data branch March 25, 2024 14:09
Linked issue: feat(examples): add NYC flights data to R registry