Pandas DataFrame groupby().Size() giving 'Value Error : Length of passed values is 65, index implies 0' #23050

dpalamuri · 2018-10-08T20:54:42Z

Code Sample, a copy-pastable example if possible

from os import path
import pandas as pd
import numpy as np

input_file = path.join(r'C:\DUMP', 'Process Log 2 Week_2.txt')
tdf = pd.read_csv(input_file, low_memory=False)

# ValueError is raised by this statement:
tdf_gsdf = tdf.groupby(tdf.columns.tolist()).size()

Problem description

The above code raises 'ValueError: Length of passed values is 65, index implies 0'.
I'm trying to identify unique/duplicate rows by grouping by all of the columns in the DataFrame.

(Attached the text file here).
Process Log 2 Week_2.txt

I'm new to Python, pandas, and this community as well. I'm just trying to automate a few tasks in my project.
I think this might be related to issue #21624, but I'm not sure how to link it.

Expected Output

Output should give distinct rows and corresponding count from DataFrame.
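For illustration, here is a minimal sketch of the expected output on a small made-up frame. It is not the attached file. It assumes a pandas version that supports the `dropna` argument of `groupby` (1.1 or later, as far as I know), which is newer than the 0.23.4 reported in this issue. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two distinct rows, each appearing twice.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [1.0, 1.0, np.nan, np.nan],
})

# dropna=False keeps NaN as a group key instead of dropping those rows.
counts = (
    df.groupby(df.columns.tolist(), dropna=False)
      .size()
      .reset_index(name="count")
)
print(counts)
```

Each distinct row then appears once, with a `count` column giving how many times it occurs in the original frame.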

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.4.3
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


math-and-data · 2018-10-08T22:44:16Z

The problem is the NA entries in your dataset. Every row in your dataset has at least one NA somewhere. When you group by columns that contain NA, groupby does not know how to group the NA values, so it drops those rows, leaving an empty result (length 0).

See http://pandas.pydata.org/pandas-docs/stable/missing_data.html#na-values-in-groupby and http://pandas.pydata.org/pandas-docs/stable/groupby.html#na-and-nat-group-handling
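A small sketch of the behaviour described above, using made-up data (the column names `key` and `value` are hypothetical). By default, rows whose group key is NA are left out of the groups, so the group sizes add up to fewer rows than the frame has. The last check is one way to see whether every row of a frame contains at least one NA.

```python
import pandas as pd

# Hypothetical data: one of the four rows has a missing key.
df = pd.DataFrame({"key": ["a", "a", None, "b"], "value": [1, 2, 3, 4]})

# Default groupby silently excludes the row whose key is NA.
default_sizes = df.groupby("key").size()
rows_counted = int(default_sizes.sum())

# Count rows with at least one NA, and check whether that is every row.
rows_with_na = int(df.isna().any(axis=1).sum())
every_row_has_na = bool(df.isna().any(axis=1).all())

print(default_sizes)
print(rows_counted, len(df), rows_with_na, every_row_has_na)
```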

WillAyd · 2018-10-08T23:12:25Z

Something does appear off here. If you can make your example self-contained (i.e. one that replicates the issue without an external file), it would be much easier for someone to take a look.


adamhooper · 2019-04-16T20:28:08Z

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['x', 'y'], 'B': [np.nan, np.nan]})
>>> df.groupby(['A', 'B']).size()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1227, in size
    result = self.grouper.size()
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 233, in size
    dtype='int64')
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/pandas/core/series.py", line 249, in __init__
    .format(val=len(data), ind=len(index)))
ValueError: Length of passed values is 2, index implies 0

(The "correct" result, according to pandas docs, is an empty DataFrame.)
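As a rough check, the same reproduction can be run on a pandas version that includes the fix (this thread puts it in the 0.25.0 milestone). The sketch below assumes such a version is installed. On the versions I am aware of, `size()` returns an empty Series here rather than raising.

```python
import numpy as np
import pandas as pd

# Same frame as in the traceback above: every row has NaN in a group key.
df = pd.DataFrame({"A": ["x", "y"], "B": [np.nan, np.nan]})

# Assumes a pandas release with the fix; older releases raise ValueError here.
result = df.groupby(["A", "B"]).size()
print(len(result))
```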

adamhooper · 2019-04-16T21:56:16Z

A workaround, in the meantime, which I think is equivalent:

df.groupby(['A', 'B']).agg(len)

H-peace · 2019-08-18T10:26:43Z

Look here, I get this error. Please help me.

WillAyd added the Groupby and Needs Info (clarification about behavior needed to assess issue) labels Oct 8, 2018

adamhooper mentioned this issue Apr 16, 2019: BUG: GroupBy.size with all-null data raises ValueError #26112 (merged)

jreback added this to the 0.25.0 milestone Apr 16, 2019

jreback closed this as completed in #26112 Apr 17, 2019
