Include missing data count in pd.Dataframe.describe method #21689

Open
77QingLiu opened this issue Jun 30, 2018 · 7 comments
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@77QingLiu

Code Sample

import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.describe()

Problem description

  • Output
    [image: screenshot of the df.describe() output]

By default, the describe method includes only 8 summary statistics for numeric columns (count, mean, std, min, 25%, 50%, 75%, max). It has no missing count, which is very important in real-world data analysis.

To include the missing count, I have to use the following code:

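d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
des1 = df.describe()
# Count missing values per column and shape them as a one-row frame labeled 'missing'
des2 = df.isnull().sum().to_frame(name='missing').T
# Append the 'missing' row below the describe() output
pd.concat([des1, des2])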

And the output:
[image: screenshot of the combined output with a 'missing' row]
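The workaround above could be wrapped in a small helper. This is only a sketch: `describe_with_missing` is a hypothetical name, not part of pandas, and it assumes the frame has unique column names.

```python
import numpy as np
import pandas as pd


def describe_with_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return df.describe() plus a 'missing' row (hypothetical helper, not a pandas API)."""
    summary = df.describe()
    # Count missing values only for the columns that describe() actually reported.
    missing = df[summary.columns].isna().sum().to_frame(name="missing").T
    # Stack the one-row 'missing' frame underneath the summary statistics.
    return pd.concat([summary, missing])


df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})
result = describe_with_missing(df)
print(result)
```

Restricting the missing count to `summary.columns` keeps the extra row aligned with whatever columns `describe()` chose to summarize, so non-numeric columns do not add all-NaN columns to the result.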

Expected Output

Expect include missing count in describe method.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jul 1, 2018
@gfyoung
Member

gfyoung commented Jul 1, 2018

@77QingLiu : Are you proposing that we mix together some of the output of DataFrame.info() (this gives you non-null info) and DataFrame.describe()?
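For context, a minimal sketch of what each method reports today, using the frame from the code sample. The text of `info()` is captured into a buffer here purely so it can be inspected; its exact layout varies between pandas versions.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})

# info() writes a text report that includes per-column non-null counts.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe() returns a DataFrame whose 'count' row holds the same non-null counts.
summary = df.describe()
print(summary.loc["count"])
```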

@77QingLiu
Author

@gfyoung, yes, exactly.

@gfyoung
Member

gfyoung commented Jul 1, 2018

cc @jreback @jorisvandenbossche

@xuhuizhang

Including a missing-data count in pd.DataFrame.describe() is definitely necessary.

@jreback
Contributor

jreback commented Nov 18, 2018

count is the non-missing length,
so I guess you could add length (or size, which is what we call it)
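A quick illustration of that relationship, assuming "length" here means the number of rows in the frame: the missing count per column is the row count minus `count`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})

count = df.count()       # non-missing values per column
size = len(df)           # total number of rows
missing = size - count   # derived missing count per column

print(missing)
```

So if `describe()` reported the length alongside `count`, the missing count could be derived without a separate row.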

@petesherick

Agreed, this is the default behavior of R's summary(df) function for obvious reasons. More useful than sd anyway.

@drkarthi

drkarthi commented Nov 8, 2022

@jorisvandenbossche is there still interest in the maintainer community to add the length of the dataframe in describe()? Happy to make a contribution picking up from @alexander-ponomaroff's work
