
Doc: Adds example of exploding lists into columns instead of storing in dataframe cells #19215

Closed
wants to merge 2 commits into from



@pdpark pdpark commented Jan 12, 2018


codecov bot commented Jan 13, 2018

Codecov Report

Merging #19215 into master will increase coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #19215      +/-   ##
==========================================
+ Coverage   91.53%   91.55%   +0.02%     
==========================================
  Files         147      147              
  Lines       48797    48797              
==========================================
+ Hits        44664    44676      +12     
+ Misses       4133     4121      -12
Flag         Coverage Δ
#multiple    89.92% <ø> (+0.02%) ⬆️
#single      41.6% <ø> (ø) ⬆️

Impacted Files                   Coverage Δ
pandas/plotting/_converter.py    66.95% <0%> (+1.73%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8347ff8...11ff8a7.

@jreback jreback added the Docs label Jan 13, 2018


Alternative to storing lists in DataFrame Cells
------------------------------------------------------
Contributor

needs to be the same length as the title


nearest_neighbors = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
nearest_neighbors

Contributor

make things into separate ipython:: python blocks, rather than using comments (you can simply write text and not use the #)
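An aside on the quoted snippet, not part of the PR: `nearest_neighbors` is built with `[[...]] * 3`, which repeats a reference to one inner list rather than copying it. That is harmless for a read-only example, but if any row's list is later mutated in place, every "row" changes. A minimal sketch of the difference, using names from the snippet:

```python
# ``[inner] * 3`` repeats a *reference* to one inner list; it does not copy it.
aliased = [['Zach LaVine', 'Jeremy Lin']] * 3
aliased[0].append('Nate Robinson')
print(aliased[2])  # ['Zach LaVine', 'Jeremy Lin', 'Nate Robinson']

# A list comprehension builds three independent inner lists instead.
independent = [['Zach LaVine', 'Jeremy Lin'] for _ in range(3)]
independent[0].append('Nate Robinson')
print(independent[2])  # ['Zach LaVine', 'Jeremy Lin']
```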

nearest_neighbors

#. Create an index with the "parent" columns to be included in the final DataFrame
df2 = pd.concat([df[['name','opponent']], pd.DataFrame(nearest_neighbors)], axis=1)
Contributor

you don't need to keep naming the dataframes, just use

df = ..... or whatever
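A minimal runnable sketch of the quoted concat step, reassigning to `df` instead of introducing `df2`. The diff does not show how `df` was constructed, so the `name` and `opponent` values below are hypothetical placeholders; only the neighbor names come from the quoted snippet.

```python
import pandas as pd

# Hypothetical parent frame: placeholder values, not taken from the PR.
df = pd.DataFrame({'name': ['Player A', 'Player B', 'Player C'],
                   'opponent': ['76ers', 'blazers', 'bobcats']})

# One list per row, as in the quoted snippet ('Isaia' kept as quoted).
nearest_neighbors = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3

# pd.DataFrame turns each inner list into one row with integer columns 0..3;
# concatenating along axis=1 aligns both frames on their shared RangeIndex.
df = pd.concat([df[['name', 'opponent']], pd.DataFrame(nearest_neighbors)], axis=1)
print(df)
```

Every list element now sits in its own column, so the cells hold scalars rather than Python lists.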

------------------------------------------------------
Storing nested lists/arrays inside a pandas object should be avoided for performance and memory use reasons. Instead they should be "exploded" into a flat ``DataFrame`` structure.
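To make the performance and memory point concrete, here is a small sketch with hypothetical data: a column of lists is stored with the generic `object` dtype, where each cell points to a separate Python list, while the same numbers exploded into columns get a native integer dtype that vectorized operations can use.

```python
import pandas as pd

# Hypothetical frame holding a Python list in every cell of one column.
nested = pd.DataFrame({'name': ['Player A', 'Player B'],
                       'scores': [[10, 12, 9], [7, 15, 11]]})

# Each cell is a reference to a separate Python list object.
print(nested['scores'].dtype)  # object

# Exploded into one column per list position, indexed by the parent column.
flat = pd.DataFrame(nested['scores'].tolist(), index=nested['name'])
print(flat.dtypes)
```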

Example of exploding nested lists into a DataFrame:
Contributor

since you have 2 examples you can use another level of sub-section
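A second flat layout puts one list element per row instead of per column. The sketch below uses `DataFrame.explode`, which pandas added in version 0.25.0, after this PR was closed, so it is not the method the PR used. The `name` values are hypothetical placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C'],  # hypothetical placeholders
    'opponent': ['76ers', 'blazers', 'bobcats'],
    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3,
})

# explode emits one row per list element and repeats the parent columns;
# the original index labels are duplicated, so reset_index renumbers them.
long_df = df.explode('nearest_neighbors').reset_index(drop=True)
print(long_df)
```

The long layout suits grouping and filtering on the neighbor value; the wide layout from the concat step suits fixed-length lists.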

@jreback
Contributor

jreback commented Feb 24, 2018

can you update

@jreback
Contributor

jreback commented Aug 2, 2018

can you rebase and update

@pdpark
Author

pdpark commented Aug 31, 2018

Will do. I have been absent due to starting a new job, but plan to spend some time on this.

@datapythonista
Member

Closing as discontinued. Superseded by #23041

Successfully merging this pull request may close these issues.

DOC: section on caveats of storing lists inside DataFrame/Series
3 participants