DOC: Add cookbook entry using callable method for DataFrame.corr #22761

Merged
merged 1 commit into from
Oct 7, 2018

dsaxton
Member

@dsaxton dsaxton commented Sep 19, 2018

Provides a cookbook entry using the callable method option for DataFrame.corr (PR #22684) to calculate a distance correlation matrix. (Related: issue #22402)
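For readers who have not used the callable option, the following is a minimal sketch of the idea, not the code merged in this PR. It assumes pandas 0.24 or later (where `DataFrame.corr` accepts a callable taking two 1-D arrays and returning a float) and uses a vectorized sample distance correlation; the function name `distance_correlation` and the example columns are hypothetical.

```python
import numpy as np
import pandas as pd


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute differences, each of shape (n, n).
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract column and row means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        # At least one input is constant; treat the result as zero.
        return 0.0
    # Guard against tiny negative values caused by floating-point error.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Illustrative data: x_squared depends on x, but not linearly.
x = np.linspace(-1.0, 1.0, 101)
df = pd.DataFrame({"x": x, "x_squared": x ** 2, "two_x": 2 * x})

pearson = df.corr()                          # default Pearson correlation
dcor = df.corr(method=distance_correlation)  # callable method
```

In this sketch, `pearson.loc["x", "x_squared"]` is essentially zero because the relationship is symmetric and non-linear, while the corresponding entry of `dcor` is clearly positive. pandas fills the diagonal with 1 and mirrors the result, so the callable is only evaluated for distinct column pairs.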

Member

@datapythonista datapythonista left a comment

Is the distance correlation you provided different from: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.correlation.html ?

While the example looks interesting, it doesn't seem closely enough related to pandas to be worth adding to our cookbook.

If the function in scipy is the same as the one you implemented, it would be nice to have this example (using the scipy function) in the DataFrame.corr docstring.

What do you think?

b = np.zeros(shape=(n, n))

for i in range(n):
    for j in range(i+1, n):

missing spaces around +

b[i, j] = abs(y[i] - y[j])

a = a + a.T
b = b + b.T

I think a += a.T is more Pythonic
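One point worth checking with this suggestion is that `a.T` is a view of `a`, so the in-place form reads from the same buffer it writes to. To my knowledge, NumPy 1.13 and later detect this kind of overlap and produce the same result as the out-of-place form. The small sketch below, with a hypothetical 4x4 upper-triangular matrix standing in for the distance matrix, compares the two spellings.

```python
import numpy as np

n = 4
# Strictly upper-triangular matrix, like the one filled by the nested loops.
upper = np.triu(np.arange(1.0, n * n + 1).reshape(n, n), k=1)

# Out-of-place: builds a new array and rebinds the name.
out_of_place = upper + upper.T

# In-place: writes into the existing buffer while reading its transpose view.
in_place = upper.copy()
in_place += in_place.T

# The transpose shares memory with the array it came from.
overlaps = np.shares_memory(in_place, in_place.T)
```

Both spellings yield the same symmetric matrix here. The in-place version avoids rebinding the name, at the cost of relying on NumPy's overlap handling.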

@dsaxton
Member Author

dsaxton commented Sep 19, 2018

The function from scipy.spatial.distance is taking one minus the ordinary Pearson correlation, so this is a different calculation. The cookbook entry was requested by @TomAugspurger in #22402 and I think the goal was just to show the flexibility of allowing callable methods in DataFrame.corr while also being somewhat interesting in itself.
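To make the distinction concrete, here is a small sketch of "one minus the ordinary Pearson correlation" written with NumPy only. The helper name `correlation_distance` is hypothetical and merely mirrors the definition described in this comment; it does not call scipy.

```python
import numpy as np


def correlation_distance(u, v):
    """One minus the Pearson correlation of two 1-D arrays."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.corrcoef(u, v)[0, 1]


x = np.linspace(-1.0, 1.0, 101)

# Perfect positive linear relationship: Pearson is 1.
same_direction = correlation_distance(x, 2 * x + 1)
# Perfect negative linear relationship: Pearson is -1.
opposite_direction = correlation_distance(x, -x)
# Symmetric non-linear relationship: Pearson is (numerically) 0.
quadratic = correlation_distance(x, x ** 2)
```

This quantity is a dissimilarity that ranges from 0 to 2 and depends only on linear association, so `x` and `x ** 2` land in the middle of the range even though one fully determines the other. A distance correlation, by contrast, lies between 0 and 1 and is meant to pick up non-linear dependence as well.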

@codecov

codecov bot commented Sep 19, 2018

Codecov Report

Merging #22761 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22761      +/-   ##
==========================================
- Coverage   92.18%   92.18%   -0.01%     
==========================================
  Files         169      169              
  Lines       50830    50820      -10     
==========================================
- Hits        46860    46850      -10     
  Misses       3970     3970
Flag        Coverage Δ
#multiple   90.6% <ø> (-0.01%) ⬇️
#single     42.38% <ø> (ø) ⬆️

Impacted Files                      Coverage Δ
pandas/core/groupby/base.py         91.11% <0%> (-0.73%) ⬇️
pandas/plotting/_compat.py          90.9% <0%> (-0.4%) ⬇️
pandas/core/window.py               96.28% <0%> (-0.12%) ⬇️
pandas/core/internals/managers.py   96.55% <0%> (-0.11%) ⬇️
pandas/io/parsers.py                95.54% <0%> (-0.06%) ⬇️
pandas/core/algorithms.py           94.69% <0%> (-0.03%) ⬇️
pandas/core/nanops.py               95.12% <0%> (-0.02%) ⬇️
pandas/core/strings.py              98.63% <0%> (ø) ⬆️
pandas/core/indexes/multi.py        95.45% <0%> (ø) ⬆️
pandas/core/frame.py                97.2% <0%> (ø) ⬆️
... and 5 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4fae35...76584e4. Read the comment docs.

@TomAugspurger
Contributor

I think this will be good to have.

We'll need to wait for #22684 first.

@datapythonista
Member

Sorry @dsaxton, I didn't see the related issue (next time, it may be a good idea to keep the template we have in the PR description; it makes it easier to see that there is a related issue).

@gfyoung gfyoung added the Docs and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Sep 23, 2018
@gfyoung
Member

gfyoung commented Sep 23, 2018

@dsaxton: Can you rebase onto master to fix the build issues?

@datapythonista
Member

@dsaxton seems like something went wrong with git, and your PR contains many unrelated changes. Can you take a look?

In general, that can be fixed by running the following from the PR branch:

git fetch upstream
git merge upstream/master
git reset --soft upstream/master
git commit -m "PR description"
git push -f

Thanks!

@dsaxton
Member Author

dsaxton commented Sep 30, 2018

@datapythonista Sorry about that, could this have been caused by my rebase? I'll try the commands you're suggesting.

@datapythonista
Member

@dsaxton it happens often to many people, but I'm not quite sure what the cause is.

@jreback jreback added this to the 0.24.0 milestone Oct 7, 2018
@jreback jreback merged commit 904af03 into pandas-dev:master Oct 7, 2018
@jreback
Contributor

jreback commented Oct 7, 2018

thanks @dsaxton

@dsaxton
Member Author

dsaxton commented Oct 7, 2018

@jreback NP, happy to help

Labels
Docs, Numeric Operations (Arithmetic, Comparison, and Logical operations)

5 participants