PERF: DataFrame constructor from list dataclasses #44306

ezerkar · 2021-11-03T18:47:25Z

I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the master branch of pandas.

Reproducible Example

import random
from dataclasses import dataclass

import pandas as pd

@dataclass
class Example:
    first: int
    second: int

class_list = [Example(random.randint(0, 1000), random.randint(0, 1000)) for _ in range(1000)]

%timeit pd.DataFrame(class_list)
6.1 ms ± 902 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is probably because the constructor calls dataclasses.asdict, which is quite slow. I think we could make the constructor work without asdict, something along these lines:

%timeit pd.DataFrame([(x.first, x.second) for x in class_list], columns=['first', 'second'])
653 µs ± 58.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
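The suggestion above hard-codes `first` and `second`. Below is a minimal, stdlib-only sketch of the same idea that reads the field names from `dataclasses.fields` instead. `dataclasses_to_rows` is a hypothetical helper, not a pandas API, and it assumes every element of the list is the same dataclass type. It is also shallow: nested dataclasses stay as objects instead of being converted the way `asdict` converts them.

```python
import random
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Example:
    first: int
    second: int


def dataclasses_to_rows(data):
    """Hypothetical helper: shallow conversion of a homogeneous list of
    dataclass instances into (rows, columns) without dataclasses.asdict.

    Nested dataclasses are NOT expanded; they stay as objects in the row.
    """
    if not data:
        return [], []
    first = data[0]
    # is_dataclass() is also True for the class itself, so reject classes.
    if not is_dataclass(first) or isinstance(first, type):
        raise TypeError("expected a list of dataclass instances")
    # Assumption: column names come from the first instance only, so every
    # element is expected to have the same fields (a homogeneous list).
    names = [f.name for f in fields(first)]
    rows = [tuple(getattr(obj, name) for name in names) for obj in data]
    return rows, names


class_list = [Example(random.randint(0, 1000), random.randint(0, 1000)) for _ in range(1000)]
rows, columns = dataclasses_to_rows(class_list)
# With pandas installed, the frame would then be built as:
# pd.DataFrame(rows, columns=columns)
```

For a flat dataclass like `Example` this should produce the same columns and values as `pd.DataFrame(class_list)`; no timings are claimed for it here.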

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-38-generic
Version : #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_IL
LOCALE : en_IL.UTF-8

pandas : 1.3.4
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : 0.14.1
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

Prior Performance

No response


phofl · 2021-12-22T23:06:14Z

It's not that simple, unfortunately. asdict resolves the attributes recursively. Also, your example would not cover different types of dataclasses.
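Both objections can be illustrated with the standard library alone. Per the Python documentation, `asdict` recurses into dataclasses, dicts, lists and tuples, and copies other objects with `copy.deepcopy()`, which is one plausible source of its cost. In this sketch `Inner`, `Outer`, `Other` and `shallow_asdict` are hypothetical names used only for illustration: `asdict` turns the nested `Inner` into a plain dict, a one-level conversion leaves it as an `Inner` object, and a list mixing `Outer` and `Other` has no single field list to use as columns.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Inner:
    x: int


@dataclass
class Outer:
    inner: Inner
    y: int


@dataclass
class Other:
    z: str


def shallow_asdict(obj):
    """Hypothetical helper: converts one level only, unlike dataclasses.asdict."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


item = Outer(Inner(1), 2)

# asdict recurses: the nested Inner instance becomes a plain dict.
deep = asdict(item)
# The shallow version keeps the Inner instance as-is.
shallow = shallow_asdict(item)

# A list mixing dataclass types has no single set of fields, so a column
# list taken from the first element would not fit the other elements.
mixed = [Outer(Inner(1), 2), Other("a")]
mixed_rows = [shallow_asdict(obj) for obj in mixed]
```

A list of per-row dicts like `mixed_rows` can still be passed to `pd.DataFrame`, which takes the union of the dict keys as columns and fills the gaps with missing values; whether that path is meaningfully faster than `asdict` is untested here.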

ezerkar · 2021-12-23T06:48:51Z

Yes, you are right; I hadn't realised that when I first posted the suggestion.
That said, I'm not sure losing the recursion would be entirely bad, since right now this constructor behaves more like a JSON normalizer than a plain constructor.
For instance, say one of the fields in the dataclass is itself a dataclass: the current asdict-based constructor will open it up into columns, while the user might want a single column holding the dataclass.
But this is a much wider discussion.

phofl · 2021-12-23T12:27:50Z

Yep, you are correct; this would be an API change.

Also, I personally don't like DataFrames with nested data, so I would prefer that my dataclasses get resolved.

ezerkar · 2021-12-24T07:17:38Z

OK, thanks.
I see your point, makes sense.
Closing.

ezerkar added Needs Triage Issue that has not been reviewed by a pandas team member Performance Memory or execution speed performance labels Nov 3, 2021

mroeschke added Constructors Series/DataFrame/Index/pd.array Constructors and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 6, 2021

ezerkar closed this as completed Dec 24, 2021
