Load retail demo data using compressed file #271

Merged
merged 9 commits into from Oct 2, 2018

3 participants
@WillKoehrsen
Contributor

WillKoehrsen commented Oct 1, 2018

Changed load_retail to load a gz compressed version of the retail data from S3. This reduces the size of the downloaded file from 43 MB to 7.3 MB.
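A rough way to see why the compressed file is so much smaller: CSV data with repeated values compresses well under gzip. The sketch below is only an illustration using the standard library. The column names and rows are hypothetical, not the real retail data, and the ratio it produces will differ from the 43 MB to 7.3 MB figure above.

```python
import gzip


def gzip_size_report(csv_text):
    """Return (raw_bytes, gzipped_bytes) for CSV text held in memory."""
    raw = csv_text.encode("utf-8")
    packed = gzip.compress(raw)
    return len(raw), len(packed)


# Hypothetical sample: a header plus 1000 near-identical rows.
sample = "order_id,product,quantity\n" + "".join(
    f"{i},WHITE HANGING HEART,6\n" for i in range(1000)
)
raw_size, gz_size = gzip_size_report(sample)
print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes")
```

The exact saving depends on how repetitive the data is, so this only shows the direction of the effect.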

WillKoehrsen added some commits Oct 1, 2018

@codecov-io


codecov-io commented Oct 1, 2018

Codecov Report

Merging #271 into master will decrease coverage by 0.02%.
The diff coverage is 60%.


@@            Coverage Diff             @@
##           master     #271      +/-   ##
==========================================
- Coverage   94.47%   94.45%   -0.03%     
==========================================
  Files          71       71              
  Lines        7696     7700       +4     
==========================================
+ Hits         7271     7273       +2     
- Misses        425      427       +2

Impacted Files                 Coverage Δ
featuretools/demo/retail.py    85% <60%> (-8.75%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8498bb6...8903c05.

WillKoehrsen added some commits Oct 1, 2018

Update docs spark (#272)
* Update to reference Spark
* Added reference to Spark scaling

Included references to the Spark notebook and article for using Featuretools on a cluster.

Added fallback to csv
Tries to load the `gz` compressed file and falls back to the uncompressed `csv` on error.

Updated documentation
Docstring now refers to both the `gz` compressed and uncompressed versions of the csv data file.

@WillKoehrsen


Contributor

WillKoehrsen commented Oct 2, 2018

Added a fallback to the uncompressed version. This might be necessary if `pandas.read_csv()` cannot read the gz compressed version.
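The fallback described here could look roughly like the sketch below. It is not the code from this PR: it swaps `pandas.read_csv()` for the standard library `gzip` and `csv` modules so that it runs without extra dependencies, and the names `read_rows`, `load_with_fallback`, and `demo`, the file names, and the set of caught exceptions are all assumptions made for illustration.

```python
import csv
import gzip
import os
import tempfile


def read_rows(path):
    """Read a CSV file into a list of dicts, decompressing if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_with_fallback(compressed_path, uncompressed_path, reader=read_rows):
    """Try the compressed file first; fall back to the uncompressed one on error."""
    try:
        return reader(compressed_path)
    except (OSError, EOFError, ValueError):
        # Assumed error set: missing file, invalid or truncated gzip stream,
        # or a value that cannot be parsed.
        return reader(uncompressed_path)


def demo():
    """Exercise the fallback path with throwaway files (hypothetical data)."""
    with tempfile.TemporaryDirectory() as folder:
        plain = os.path.join(folder, "retail.csv")
        with open(plain, "w", newline="", encoding="utf-8") as handle:
            handle.write("order_id,quantity\n1,6\n")
        # Deliberately invalid gzip content to force the fallback.
        broken = os.path.join(folder, "retail.csv.gz")
        with open(broken, "wb") as handle:
            handle.write(b"not a gzip stream")
        return load_with_fallback(broken, plain)
```

Because the reader is passed in as an argument, the same wrapper could be called with `reader=pandas.read_csv` where pandas is installed, although the exceptions pandas raises for an unreadable file may differ from the ones caught here.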

WillKoehrsen added some commits Oct 2, 2018

@kmax12


Member

kmax12 commented Oct 2, 2018

Looks good. Merging

@kmax12 changed the title from "Load retail gz compressed" to "Load retail demo data as compressed file" on Oct 2, 2018

@kmax12 changed the title from "Load retail demo data as compressed file" to "Load retail demo data using compressed file" on Oct 2, 2018

@kmax12 merged commit da09b20 into master on Oct 2, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
license/cla Contributor License Agreement is signed.

@WillKoehrsen WillKoehrsen deleted the load-retail-zip branch Oct 3, 2018

@rwedge rwedge referenced this pull request Oct 31, 2018

Merged

v0.4.0 #304
