Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Letter Value Plots #661

Merged
merged 21 commits into from Nov 15, 2015
Merged

Letter Value Plots #661

merged 21 commits into from Nov 15, 2015

Conversation

drewokane
Copy link
Contributor

This pull request adds the _LVPlotter class, lettervalueplot function, documentation, and usage examples.

TODO


  • PEP8 errors
  • Rename plotting function lvplot
  • Median lines should stay inside boxes
  • Change default scale to "exponential"
  • Rename box_widths to scale
  • Put default values into function signature
  • Remove deprecation warnings
  • Rename p to outlier_prop
  • Change outlier_prop to take float, [0, 1].
  • Fix documentation
  • Change outlier plotting symbol to d for internal consistency
  • Make width functions class level functions
  • Write tests

Added letter value plot class and plotting function. In addition, added documentation and examples.
@mwaskom
Copy link
Owner

mwaskom commented Jul 30, 2015

If you want to work out the pep8 errors locally you can install pip install pep8==1.5 which will be the version on travis. The pep8 command got a little stricter and some things in seaborn fail, so just running the latest version won't be that informative about the new code.

@mwaskom
Copy link
Owner

mwaskom commented Jul 30, 2015

I think the function could be called lvplot like it is in R. A little confusing with lmplot, but lettervalueplot is such a mouthful.

orient, color, palette, saturation,
width, k_depth, linewidth, box_widths):

if width is None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These default values should be specified in the function signature for lvplot.

@mwaskom
Copy link
Owner

mwaskom commented Jul 30, 2015

This looks good on a very brief first pass, but it will be very important to get tests on the numerical internals.

ax.plot([y, y], [x - w / 2, x + w / 2], c='k', alpha=.45, **kws)

ax.scatter(outliers, np.repeat(x, len(outliers)),
marker=r"$\ast$", c=color)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just a normal "o"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really have a preference as far as marker for outliers. Matplotlib uses '+' markers, so I could try to be consistent with Matplotlib or go with 'o'. Thoughts?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the circles you had in the example plots I just think it will be faster to draw those with "o" than with a custom latex marker, right?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could just use "d" to be consistent with seaborn boxplots i guess...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to 'd' for consistency.

@mwaskom
Copy link
Owner

mwaskom commented Jul 31, 2015

Also before I forget this should get added as a plot kind in factorplot too eventually.

@mwaskom
Copy link
Owner

mwaskom commented Jul 31, 2015

The median lines extend past the width of the boxes, so they overlap on nested plots. Is this intentional? I don't love the effect.

dcouraphf7aaaaaelftksuqmcc

@mwaskom
Copy link
Owner

mwaskom commented Jul 31, 2015

I find this kind of weird:

df = pd.DataFrame({"normal": np.random.normal(3, 1, 10000),
                   "gamma": np.random.gamma(3, .5, 10000)})

sns.lettervalueplot(data=df)
sns.violinplot(data=df, scale="width")

az5eiuygm1 2aaaaaelftksuqmcc

I think I still don't quite understand how these things get drawn, but I would have expected the letter-value plots to look more like a discretized violinplot. Is that wrong?

@drewokane
Copy link
Contributor Author

Your intuition wouldn't be off in this case. The widths of the boxes are determined by which width function is chosen in _lvplot. My initial stab at this I liked the look of the "linear" widths, but if you use the "exponential" widths it actually looks like a discrete violin plot (as you mentioned). Perhaps changing the default to "exponential" will be better for consistency sake. (Also, I realized there is a bug, in that I am not passing the keyword argument to the plotter (embarassed))

df = pd.DataFrame({"normal": np.random.normal(3, 1, 10000),
                   "gamma": np.random.gamma(3, .5, 10000)})

sns.violinplot(data=df, scale="width")
sns.lettervalueplot(data=df, box_widths='exponential')

image

@mwaskom
Copy link
Owner

mwaskom commented Jul 31, 2015

Ah, OK. I had played around with that keyword argument, but couldn't seem to get the effect I was looking for. I guess that explains why :)

@mwaskom
Copy link
Owner

mwaskom commented Jul 31, 2015

Speaking of, I think the box_widths parameter might make more sense as scale, for better consistency with the violinplot paramter. The values aren't the same, but they have a similar effect. Also for some reason I kept expecting box_widths to take a numeric size value.

@mwaskom
Copy link
Owner

mwaskom commented Aug 28, 2015

Any update on tests?

@drewokane
Copy link
Contributor Author

I've added some basic tests so far and plan on adding tests for the numerical internals, e.g. testing the letter value calculations, etc. I'll push the latest edits I have today.

@mwaskom
Copy link
Owner

mwaskom commented Aug 28, 2015

Cool. By the way, I would generally advise writing the tests in the other direction. I've found that it's best to start directly testing the core numeric parts, then to pull back and test some of the plotting methods, and finally pull back further and add some high-level smoke tests for the function itself (what I assume you mean by "basic tests"). The reason is that this way lets the code coverage report be a useful guide to how complete the tests are. If you start with tests that run the function and just ask "are there some plots on the axes", it will look like the internals are "covered", but the plots very well may be wrong. It might be helpful to rename the basic tests so nose doesn't run them as you develop and use the coverage report to make sure you're really finding all the corners in the numeric code. Anyway, just some thoughts.

@mwaskom
Copy link
Owner

mwaskom commented Aug 28, 2015

Also can you add a TODO list in the original issue just so we can track the items that have come up in the comments?

@drewokane
Copy link
Contributor Author

I've completed writing the tests (making them Python 3 forward compatible) and the other tasks on the checklist. I've checked the tests (locally with nosetests) and the only failures I can see are the ones connected with other modules, not TestLVPlotter.

@mwaskom
Copy link
Owner

mwaskom commented Sep 12, 2015

The Python 3 errors are in TestLVPlotter. It's hard to tell with the Python 2 build as it didn't make it past the PEP8 check.

@drewokane
Copy link
Contributor Author

Yeah I checked Travis. I didn't realize filter and map were iterators in Python 3. Hopefully the modifications pass muster.

@drewokane
Copy link
Contributor Author

Unfortunately I can't duplicate the PEP8 error Travis is getting (and I made sure to install pep8==1.5).

def test_draw_missing_boxes(self):

ax = cat.lvplot("g", "y", data=self.df,
order=["a", "b", "c", "d"])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is over-indented

@drewokane
Copy link
Contributor Author

Build is passing on all versions, PEP8 is all done.

@drewokane
Copy link
Contributor Author

Any more comments, concerns, suggestions?

@drewokane
Copy link
Contributor Author

Not trying to be bothersome but I wanted to check in a see if you had any further comments or suggestions for the PR? I completely understand if you're swamped with other stuff.

@mwaskom
Copy link
Owner

mwaskom commented Nov 15, 2015

Yeah, sorry about letting this drop. This is a substantial enough addition that I've really wanted to make time to go through it in detail and make sure I understand how everything works. Unfortunately, I've been very busy the past few months (and will be for the foreseeable future) and just have not been able to devote a large enough block of time to get my head around everything here. But since you put so much effort into this, I really do feel bad about letting this hang in limbo. I'll merge it now so that it can start to get some real use, hopefully any residual issues will get identified and ironed out before the next release. If you could try to stay on top of anything that pops up, that would be very helpful.

mwaskom added a commit that referenced this pull request Nov 15, 2015
@mwaskom mwaskom merged commit 1078424 into mwaskom:master Nov 15, 2015
@stonebig
Copy link
Contributor

Hi,

May the documentation take Matplotlib 1.5 into account now ?

ax = sns.lvplot(x="size", y="tip", data=tips.sort("size"))
# or, if you are on Matplotlib 1.5+
ax = sns.lvplot(x="size", y="tip", data=tips.sort_values(by="size"))

@drewokane
Copy link
Contributor Author

@mwaskom Sure, I can stay on top of any issues that pop up. Thanks!

@drewokane drewokane deleted the lv_plot branch November 15, 2015 19:10
@mwaskom
Copy link
Owner

mwaskom commented Nov 15, 2015

May the documentation take Matplotlib 1.5 into account now ?

This is a change in pandas, not matplotlib, but yes that example should be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants