Define expected datatypes for datasets #304

bendnorman · 2023-12-15T01:21:38Z

I've noticed that most of the columns in the interconnection data are object dtype, which causes problems when I try to write the data to parquet files because the columns contain mixed types. To resolve this I do a lot of type conversions.

It would be nice to have gridstatus' methods return dataframes with columns that do not have mixed datatypes and use data types that support pd.NA.

I'm only familiar with the interconnection queue data so I'm not sure if this issue exists in the other gridstatus datasets. Something like Pandera or pydantic could be used to organize and correct dtypes.
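A rough sketch of the kind of conversion described above, using plain pandas rather than Pandera or pydantic. The column names, the `EXPECTED_KINDS` mapping, and the `enforce_dtypes` helper are all hypothetical and chosen for illustration; they are not part of gridstatus.

```python
import pandas as pd

# Hypothetical columns and target kinds, for illustration only.
EXPECTED_KINDS = {
    "Queue ID": "string",        # nullable string dtype (supports pd.NA)
    "Capacity (MW)": "Float64",  # nullable float dtype (supports pd.NA)
    "Queue Date": "datetime",    # datetime64, missing values become NaT
}


def enforce_dtypes(df: pd.DataFrame, kinds: dict) -> pd.DataFrame:
    """Return a copy of df with each listed column coerced to one dtype."""
    out = df.copy()
    for col, kind in kinds.items():
        if kind == "datetime":
            # Unparseable values become NaT instead of raising.
            out[col] = pd.to_datetime(out[col], errors="coerce")
        elif kind == "Float64":
            # Parse numeric strings first, then switch to the nullable dtype.
            out[col] = pd.to_numeric(out[col], errors="coerce").astype("Float64")
        else:
            out[col] = out[col].astype(kind)
    return out


# A small frame with the mixed-type object columns described above.
raw = pd.DataFrame(
    {
        "Queue ID": ["A-1", 102, None],
        "Capacity (MW)": [150, "75.5", None],
        "Queue Date": ["2021-03-01", None, "2022-07-15"],
    }
)

clean = enforce_dtypes(raw, EXPECTED_KINDS)
print(clean.dtypes)
```

After this step every column holds a single dtype, which is the property a parquet writer needs. Note that `errors="coerce"` silently turns unparseable values into missing ones, so a schema library such as Pandera could be a better fit if bad values should be reported instead.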

kmax12 · 2023-12-15T03:36:22Z

I don't think many other datasets have mixed datatypes; at least, I haven't noticed it being a problem.

That being said, I would very much welcome a contribution doing it for at least one of the interconnection datasets, to give me a better sense of whether it would be more broadly helpful.

I'm wondering if it could replace or improve a good chunk of our testing code.
