Define expected datatypes for datasets #304

Open
bendnorman opened this issue Dec 15, 2023 · 1 comment

Comments

@bendnorman
Contributor

I've noticed that most of the columns in the interconnection data are object types, which causes problems when I try to write the data to parquet files because the columns contain mixed types. To work around this, I do a lot of type conversions.

It would be nice to have gridstatus' methods return dataframes whose columns do not have mixed datatypes and that use data types that support pd.NA.
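As a rough illustration of the problem and of the kind of conversion involved, here is a minimal sketch. The column names and values are made up for the example and are not actual gridstatus output; the helper `python_types` is hypothetical.

```python
import pandas as pd

# Hypothetical sample resembling an interconnection queue.
# Column names and values are assumptions, not real gridstatus output.
raw = pd.DataFrame(
    {
        "Queue ID": [101, "J-202", None],
        "Capacity (MW)": ["150", 75.5, None],
        "Queue Date": ["2021-03-01", None, "2022-11-15"],
    }
)


def python_types(column: pd.Series) -> set:
    """Return the names of the Python types found among the non-null values."""
    return {type(value).__name__ for value in column.dropna()}


# Columns holding more than one Python type are the ones that tend to
# cause trouble when writing to parquet.
mixed_before = {name: python_types(raw[name]) for name in raw.columns}

# Coerce each column to a single dtype that can represent missing values.
clean = pd.DataFrame(
    {
        "Queue ID": raw["Queue ID"].astype("string"),
        "Capacity (MW)": pd.to_numeric(
            raw["Capacity (MW)"], errors="coerce"
        ).astype("Float64"),
        "Queue Date": pd.to_datetime(raw["Queue Date"], errors="coerce"),
    }
)
```

After the conversion each column holds a single type, and missing values show up as `pd.NA` (or `NaT` for dates) instead of a mix of `None` and other objects.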

I'm only familiar with the interconnection queue data so I'm not sure if this issue exists in the other gridstatus datasets. Something like Pandera or pydantic could be used to organize and correct dtypes.
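To make the idea of declared dtypes concrete, here is a sketch that uses plain pandas as a stand-in for a dedicated tool like Pandera or pydantic. The schema contents, the `EXPECTED_DTYPES` name, and the `find_dtype_mismatches` helper are all hypothetical and only meant to show the shape such a check could take.

```python
import pandas as pd

# Hypothetical schema: column names and target dtypes are assumptions.
EXPECTED_DTYPES = {
    "Queue ID": "string",
    "Capacity (MW)": "Float64",
    "Status": "string",
}


def find_dtype_mismatches(df: pd.DataFrame, expected: dict) -> dict:
    """Map each offending column to its actual dtype, or to 'missing'."""
    mismatches = {}
    for column, dtype in expected.items():
        if column not in df.columns:
            mismatches[column] = "missing"
        elif str(df[column].dtype) != dtype:
            mismatches[column] = str(df[column].dtype)
    return mismatches


# Made-up input with a mixed-type column and an absent column.
sample = pd.DataFrame(
    {"Queue ID": [101, "J-202"], "Capacity (MW)": [150.0, None]}
)
report = find_dtype_mismatches(sample, EXPECTED_DTYPES)

# Add the absent column as all-missing, then cast everything to the schema.
fixed = sample.assign(Status=pd.NA).astype(EXPECTED_DTYPES)
report_after = find_dtype_mismatches(fixed, EXPECTED_DTYPES)
```

A dedicated schema library would add value checks and clearer error messages on top of this, but the core idea is the same: one declared dtype per column, checked before the dataframe is returned.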

@kmax12
Collaborator

kmax12 commented Dec 15, 2023

I don't think many other datasets have mixed datatypes; at least, I haven't noticed it being a problem.

That being said, I would very much welcome a contribution doing it for at least one of the interconnection datasets, to give me a better sense of whether it would be more broadly helpful.

I'm wondering if it could replace or improve a good chunk of our testing code.
