Update flight #175

Merged
kmax12 merged 11 commits into master from update-flight on Jun 26, 2018

3 participants
@Seth-Rothschild
Contributor

Seth-Rothschild commented Jun 25, 2018

Rework load_flight to use the newly uploaded 2017 CSVs. New functionality:

  1. Accepts a month_filter to restrict which months are loaded (e.g. [3, 5] to use March and May).
  2. Accepts a categorical_filter to restrict which categorical values are loaded. For example, {'dest_city': ['Boston, MA'], 'origin_city': ['Washington, DC']} will take all flights into Boston and all flights out of Washington, DC.
  3. Accepts an optional verbose argument, which shows a progress bar while the data loads.
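
To make the categorical_filter semantics concrete, here is a minimal sketch in plain pandas. It assumes the filter keeps a row when any listed column matches one of its allowed values, which is how the example above reads; the merged implementation may differ. The helper name apply_categorical_filter and the toy flights frame are hypothetical and are not part of featuretools.

```python
import pandas as pd


def apply_categorical_filter(df, categorical_filter):
    """Hypothetical helper: keep rows where ANY filtered column matches.

    The union (OR) behaviour is an assumption drawn from the PR description.
    """
    mask = pd.Series(False, index=df.index)
    for column, allowed_values in categorical_filter.items():
        # isin builds a boolean mask for one column; OR it into the total
        mask |= df[column].isin(allowed_values)
    return df[mask]


# Toy data standing in for the real flight CSVs
flights = pd.DataFrame({
    'origin_city': ['Washington, DC', 'Chicago, IL', 'Denver, CO'],
    'dest_city': ['Chicago, IL', 'Boston, MA', 'Seattle, WA'],
})

selected = apply_categorical_filter(
    flights,
    {'dest_city': ['Boston, MA'], 'origin_city': ['Washington, DC']},
)
```

With this toy frame, the flight leaving Washington and the flight arriving in Boston are kept, and the Denver to Seattle flight is dropped.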

Seth-Rothschild added some commits Jun 19, 2018

if demo:
filename = 'flight_dataset_sample.csv.zip'
key = 'bots_flight_data_2017/data_2017_jan_feb.csv.zip'
rows = 860457.0

@kmax12

kmax12 Jun 25, 2018

Member

does this have to be a float?

@Seth-Rothschild

Seth-Rothschild Jun 25, 2018

Contributor

We do rows / 99 earlier. Not sure whether the cleanest solution is this, dividing by 99., or calling float(rows) when we divide.

@kmax12

kmax12 Jun 25, 2018

Member

not sure it matters because you take the ceil of it, but move any casting as close as possible to where it is required
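
A minimal sketch of that advice: keep rows as an integer and cast only at the division. The helper name chunk_size is hypothetical, and the thread does not say what the division by 99 is for, so treating it as a chunk-size calculation is an assumption. One relevant detail: on Python 2 without `from __future__ import division`, dividing two integers truncates before the ceil is applied, so the cast can change the result there.

```python
import math


def chunk_size(rows, parts=99):
    """Hypothetical helper: rows per part, rounded up.

    rows stays an int everywhere else; the float cast happens only here,
    right where true division is required.
    """
    return int(math.ceil(float(rows) / parts))


rows = 860457  # stored as an int, not 860457.0
size = chunk_size(rows)
```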

if month_filter is not None:
    tmp = False
    for month in month_filter:
        tmp = tmp | (clean_data['scheduled_dep_time'].apply(lambda x: x.month) == month)

@kmax12

kmax12 Jun 25, 2018

Member

can this be clean_data['scheduled_dep_time'].dt.month == month? if that works, it's probably much faster than apply

@kmax12

kmax12 Jun 25, 2018

Member

also, you should try to use an isin here rather than iterate over each month
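
The two suggestions combine into a single vectorized mask. The sketch below compares the loop from the diff with a .dt.month.isin(...) version on a small made-up frame; the column name comes from the snippet under review, while the sample dates are invented for illustration. It assumes the column already has a datetime dtype, which the .dt accessor requires.

```python
import pandas as pd

# Toy stand-in for clean_data; the real frame comes from the flight CSVs
clean_data = pd.DataFrame({
    'scheduled_dep_time': pd.to_datetime(
        ['2017-01-15', '2017-03-02', '2017-05-20', '2017-07-04']),
})
month_filter = [3, 5]

# Approach in the diff: one apply per month, OR-ed together
loop_mask = pd.Series(False, index=clean_data.index)
for month in month_filter:
    loop_mask |= clean_data['scheduled_dep_time'].apply(lambda x: x.month) == month

# Reviewer's suggestion: vectorized month extraction plus isin
isin_mask = clean_data['scheduled_dep_time'].dt.month.isin(month_filter)

filtered = clean_data[isin_mask]
```

Both masks select the same rows here; the isin version avoids the Python-level lambda call per row and the loop over months.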

@codecov-io

codecov-io commented Jun 25, 2018

Codecov Report

Merging #175 into master will increase coverage by 0.28%.
The diff coverage is 89.01%.

@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
+ Coverage   92.86%   93.15%   +0.28%     
==========================================
  Files          69       70       +1     
  Lines        7623     7551      -72     
==========================================
- Hits         7079     7034      -45     
+ Misses        544      517      -27

| Impacted Files | Coverage Δ |
|---|---|
| featuretools/tests/demo_tests/test_demo_data.py | 100% <100%> (ø) ⬆️ |
| featuretools/demo/flight.py | 88.5% <87.65%> (+65.92%) ⬆️ |
| featuretools/entityset/entity.py | 85.81% <0%> (-1.44%) ⬇️ |
| featuretools/utils/wrangle.py | 66.98% <0%> (-0.91%) ⬇️ |
| featuretools/variable_types/variable.py | 91.66% <0%> (-0.65%) ⬇️ |
| featuretools/tests/dfs_tests/test_dfs_method.py | 98.07% <0%> (-0.34%) ⬇️ |
| featuretools/entityset/entityset.py | 93.55% <0%> (-0.22%) ⬇️ |
| featuretools/tests/entityset_tests/test_entity.py | 100% <0%> (ø) ⬆️ |
| ...ols/tests/feature_function_tests/test_agg_feats.py | 98.52% <0%> (ø) ⬆️ |
| featuretools/synthesis/dfs.py | 100% <0%> (ø) ⬆️ |
| ... and 11 more | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b2aead5...2301f68.

Seth-Rothschild force-pushed the update-flight branch from 4989609 to 2301f68 on Jun 25, 2018

@kmax12

Member

kmax12 commented Jun 25, 2018

Looks good to me

kmax12 merged commit f224e22 into master on Jun 26, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
license/cla Contributor License Agreement is signed.

rwedge referenced this pull request on Jul 2, 2018

Merged

v0.2.1 #180

Seth-Rothschild deleted the update-flight branch on Aug 10, 2018
