geom_violin fails when the same values is often in the data #317

Closed
brechtmann opened this issue Sep 19, 2019 · 2 comments

brechtmann commented Sep 19, 2019

plotnine gives an unclear error and shows no violin plot when a single value appears too often in the variable being plotted.

See the following example code:

import plotnine as p9
import pandas as pd
import numpy as np

x = pd.DataFrame(data = np.array([np.ones(10), np.repeat(['A', 'B'], 5)]).T, columns=['ones', 'cat'])

p9.ggplot(x, p9.aes('factor(cat)', 'ones')) + p9.geom_violin()

Executing this example gives the following error:
plotnine.exceptions.PlotnineError: 'Width not defined. Set with position_dodge(width = ?)'

Running the same example in R ggplot2 produces a plot:

library(ggplot2)
library(data.table)

x = data.table(ones = rep(1, 10), cat = rep(c('A', 'B')))
ggplot(x, aes(cat, ones)) + geom_violin()

The same happens when only a small fraction of the data takes other values.

Thanks!

brechtmann changed the title from "geom_violin fails when many" to "geom_violin fails when many the same values is often in the data" on Sep 19, 2019
TyberiusPrime (Contributor) commented Sep 19, 2019

The error message with master-2 (4bba2e4) seems to be slightly different:

/anysnake/code_venv/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py:487: RuntimeWarning: invalid value encountered in true_divide
  binned = fast_linbin(X, a, b, gridsize) / (delta * nobs)
/anysnake/code_venv/lib/python3.7/site-packages/statsmodels/nonparametric/kdetools.py:34: RuntimeWarning: invalid value encountered in double_scalars
  FAC1 = 2*(np.pi*bw/RANGE)**2
/anysnake/code_venv/lib/python3.7/site-packages/statsmodels/sandbox/nonparametric/kernels.py:204: RuntimeWarning: divide by zero encountered in double_scalars
  w = 1. / (h * n) * np.sum(self((xs-x)/h), axis=0)
/anysnake/code_venv/lib/python3.7/site-packages/statsmodels/sandbox/nonparametric/kernels.py:204: RuntimeWarning: invalid value encountered in true_divide
  w = 1. / (h * n) * np.sum(self((xs-x)/h), axis=0)
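
These warnings are consistent with a kernel bandwidth `h` of zero. A minimal sketch of that effect, assuming a Gaussian kernel: the helper `kde_at` is hypothetical and only mirrors the shape of the expression in the last warning; it is not the statsmodels implementation.

```python
import numpy as np

def kde_at(x, xs, h):
    """Gaussian kernel density estimate at point x (illustrative helper only)."""
    h = np.float64(h)  # NumPy scalar, so h == 0 yields inf/nan instead of raising
    n = len(xs)
    u = (xs - x) / h   # 0 / 0 -> nan when h == 0 and every value equals x
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return 1.0 / (h * n) * np.sum(kernel)  # 1 / 0 -> inf when h == 0

xs = np.ones(10)

# Silence the RuntimeWarnings here; they are the same kind as those quoted above.
with np.errstate(divide="ignore", invalid="ignore"):
    good = kde_at(1.0, xs, h=0.5)  # finite density
    bad = kde_at(1.0, xs, h=0.0)   # nan: the estimate is undefined

print(good, bad)
```

With a positive bandwidth the estimate is finite. With a zero bandwidth it is `nan`, which would leave nothing usable to draw.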

brechtmann (Author) commented

It seems to be a problem in the density estimation. A similar error occurs for:

p9.ggplot(x, p9.aes('factor(cat)', 'ones')) + p9.geom_density()
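
One plausible cause, offered as an assumption rather than a confirmed diagnosis: a rule-of-thumb bandwidth proportional to min(standard deviation, IQR / 1.349) collapses to zero when all values are identical. It also collapses when most values are identical, because the interquartile range is then zero. The helper `rule_of_thumb_bandwidth` below is a hypothetical NumPy-only illustration and is not taken from plotnine or statsmodels.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """Silverman-style bandwidth: 0.9 * min(std, IQR / 1.349) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    std = np.std(x, ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(std, (q75 - q25) / 1.349)
    return 0.9 * spread * n ** (-0.2)

constant = np.ones(10)                         # every value identical
mostly_constant = np.array([1.0] * 9 + [5.0])  # std > 0, but IQR == 0
varied = np.arange(10, dtype=float)            # ordinary spread

bw_constant = rule_of_thumb_bandwidth(constant)
bw_mostly = rule_of_thumb_bandwidth(mostly_constant)
bw_varied = rule_of_thumb_bandwidth(varied)

print(bw_constant, bw_mostly, bw_varied)
```

Under this assumption, both the constant and the mostly constant inputs yield a bandwidth of exactly zero. That would match the report that the failure also occurs when only a small fraction of the values differ.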

brechtmann changed the title from "geom_violin fails when many the same values is often in the data" to "geom_violin fails when the same values is often in the data" on Sep 23, 2019
has2k1 added the bug label Sep 24, 2019
has2k1 closed this as completed in 035083f Sep 25, 2019