ValueError when trying to compute Quantile #14357
Comments
Rubyj
changed the title from
ValueError when trying to compute Quartile to ValueError when trying to compute Quantile
Oct 5, 2016
Can you please make this a fully reproducible example with dummy data?
Rubyj
commented
Oct 5, 2016
I have tracked this error down to there being NaN values in some, but not all, of the columns for a row (2 out of 10 in this case). Computing the quantile of that DataFrame then raised the ValueError. My workaround is to fill the NaN values with 0.
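A minimal sketch of the fill-with-zero workaround described above, on made-up data (the column names `a` and `b` and all values are assumptions, not the reporter's data). It also computes a per-column quantile that drops NaN instead, since replacing missing values with 0 changes the result.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data: column "a" is complete, column "b" has two missing values.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, np.nan, 30.0, np.nan, 50.0],
})

# Workaround from the comment: replace NaN with 0 before taking the quantile.
filled_q = df.fillna(0).quantile(0.25)

# For comparison: drop NaN within each column, then take that column's quantile.
skipna_q = df.apply(lambda col: col.dropna().quantile(0.25))

print(filled_q)
print(skipna_q)
```

The two results differ for column `b`, so the workaround is probably only appropriate when 0 is a meaningful stand-in for the missing values.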
Edited in a reproducible example. Hard to say for sure, but maybe related to 4de83d2. It's definitely related to a (float) block having some cols with missing values:
In [11]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)))
In [13]: df.iloc[1, 1] = np.nan
In [14]: df.quantile(.5)
Out[14]:
0    4.5
1    7.0
Name: 0.5, dtype: float64
and
In [15]: df = pd.DataFrame(np.random.randn(10, 2))
In [17]: df.iloc[0, :] = np.nan
In [18]: df.quantile(.5)
Out[18]:
0    0.347815
1    0.072105
Name: 0.5, dtype: float64
both work
TomAugspurger
added Missing-data Regression Numeric
labels
Oct 5, 2016
TomAugspurger
added this to the
0.19.1
milestone
Oct 5, 2016
TomAugspurger
added Effort Medium Difficulty Intermediate Effort Low and removed Effort Medium
labels
Oct 5, 2016
@Rubyj you'll have to show a complete end-to-end reproducible example. This was a bug in 0.18.1 but is correct in 0.19.0.
jreback
removed this from the
0.19.1
milestone
Oct 6, 2016
jreback
removed Difficulty Intermediate Effort Low Regression
labels
Oct 6, 2016
@jreback the problem seems to be a DataFrame with a FloatBlock that has at least 1 col with no missing values and at least 1 col with some missing values (see my edit at the top of the OP)
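The condition described above can be sketched as follows. This is an illustrative setup with made-up values, assuming that two float columns created from one array share a single float block; whether this exact frame triggers the error on 0.19.0 has not been verified here. On versions where the bug is fixed, the call simply returns the per-column medians.

```python
import numpy as np
import pandas as pd

# Two float columns built from one array; values are made up for illustration.
df = pd.DataFrame(np.arange(20, dtype="float64").reshape(10, 2))
df.iloc[1, 1] = np.nan  # column 1 now has one missing value, column 0 has none

try:
    result = df.quantile(0.5)
except ValueError as exc:
    # The issue reports: "total size of new array must be unchanged"
    result = None
    print("quantile failed:", exc)
else:
    print(result)
```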
Rubyj
commented
Oct 6, 2016
@TomAugspurger provided a reproducible example for me in my original post and added the labels that you removed. Not sure if you saw that. Thank you, Tom.
jorisvandenbossche
added the
Regression
label
Oct 6, 2016
jorisvandenbossche
added this to the
0.19.1
milestone
Oct 6, 2016
@TomAugspurger your example works; I see that you changed the top of the post. Thanks.
So in this case, the individual dims need to be iterated (corresponding to the columns), with the per-column quantiles then combined, rather than doing this all at once. numpy doesn't handle the NaNs in the quantiling.
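The per-column approach described above can be sketched roughly like this. `columnwise_quantile` is a hypothetical helper written for illustration, not the actual pandas fix: it drops the NaNs from each column separately, takes that column's quantile with `np.percentile`, and combines the results into one Series.

```python
import numpy as np
import pandas as pd

def columnwise_quantile(df, q):
    """Quantile per column, dropping NaN from each column separately (illustrative)."""
    out = {}
    for name in df.columns:
        values = df[name].to_numpy(dtype="float64")
        values = values[~np.isnan(values)]  # remove this column's missing values
        # np.percentile expects a percentage in [0, 100], so scale q.
        out[name] = np.percentile(values, q * 100) if values.size else np.nan
    return pd.Series(out, name=q)

# Made-up data: "x" is complete, "y" has one missing value.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, np.nan, 30.0, 50.0]})
result = columnwise_quantile(df, 0.5)
print(result)
```

Because each column is reduced on its own, columns with different numbers of valid values never have to be reshaped into one common array.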
Rubyj commented Oct 5, 2016 (edited)
Original post follows:
I have a simple dataframe that I created as follows:
df[df['Week of'] == week]
where week is a week name I'm filtering by.
I have been taking the quartile values of this dataframe as follows:
df[df['Week of'] == week].quantile(.25)
However, since the update to Pandas 0.19 I am receiving the error (this code worked fine before):
values = values.reshape(result_shape)
ValueError: total size of new array must be unchanged