Issue with parsing PeriodDtype columns in read_csv() #26934

Closed
mc4229 opened this issue Jun 19, 2019 · 4 comments · Fixed by #27380
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. good first issue IO CSV read_csv, to_csv

mc4229 commented Jun 19, 2019

Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame({'Int': [1, 2, 3], 'Period': pd.period_range(start="2019-01", end="2019-03", freq="M")})
df.to_csv("PeriodDtype.csv")
pd.read_csv("PeriodDtype.csv", dtype={"Int": np.int64, "Period": pd.PeriodDtype("M")})  # raises NotImplementedError on 0.24.2

Problem description

Using pandas 0.24.2, I wrote a simple DataFrame with the following dtypes to a CSV file:

Int           int64
Period    period[M]
dtype: object

When I tried to read it back in, I found that read_csv() could not parse PeriodDtype("M"). I got the following error message:

NotImplementedError: Extension Array: <class 'pandas.core.arrays.period.PeriodArray'> 
must implement _from_sequence_of_strings in order to be used in parser methods

I saw a similar issue #24542 raised for Datetime dtype. It seems that _from_sequence_of_strings() is also not defined for PeriodArray, which prevents parsing columns with PeriodDtype.

I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.
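
Until such a method exists, one possible workaround is to read the Period column as plain strings and convert it afterwards. The following is only a sketch under assumptions that are not part of the report above: it uses an in-memory buffer instead of a file, passes index=False so the CSV holds only the two data columns, and converts with pd.PeriodIndex.

```python
import io

import pandas as pd

# Build the same frame as in the report and write it to an in-memory CSV.
# index=False is an assumption of this sketch; the report wrote the index too.
df = pd.DataFrame({
    "Int": [1, 2, 3],
    "Period": pd.period_range(start="2019-01", end="2019-03", freq="M"),
})
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# Read the Period column as strings, then convert it explicitly
# instead of asking the parser to build a PeriodArray.
out = pd.read_csv(buf, dtype={"Int": "int64", "Period": str})
out["Period"] = pd.PeriodIndex(out["Period"], freq="M")
```

The conversion step costs one extra line, but it avoids the parser's extension-array code path entirely.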

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.16.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

> I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.

Yep. You may be able to reuse _from_sequence.

@TomAugspurger TomAugspurger added ExtensionArray Extending pandas with custom dtypes or arrays. IO CSV read_csv, to_csv labels Jun 19, 2019
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Jun 19, 2019
@mc4229
Author

mc4229 commented Jun 19, 2019

@TomAugspurger Thanks! I will take a look at this.

@PaulCherian

> > I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.
>
> Yep. You may be able to reuse _from_sequence.

@mc4229 I tried calling _from_sequence directly from _from_sequence_of_strings() and it works.
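
For readers following along, the delegation idea can be sketched roughly as below. This is an illustration rather than the code from any PR: from_sequence_of_strings is a hypothetical free function standing in for the missing classmethod, and it relies on the private PeriodArray._from_sequence constructor, whose behavior may differ between pandas versions.

```python
import pandas as pd
from pandas.arrays import PeriodArray


def from_sequence_of_strings(strings, dtype):
    """Hypothetical stand-in for PeriodArray._from_sequence_of_strings."""
    # Delegate to the existing (private) constructor, which is assumed here
    # to accept strings such as "2019-01" and parse them using the dtype's freq.
    return PeriodArray._from_sequence(strings, dtype=dtype)


parsed = from_sequence_of_strings(
    ["2019-01", "2019-02", "2019-03"], pd.PeriodDtype("M")
)
```

If the delegation holds, parsed is a PeriodArray with dtype period[M], which is what the parser needs back from the hook.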

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 13, 2019
Fixes: pandas-dev#26934

Signed-off-by: Antonio Gutierrez <chibby0ne@gmail.com>
@chibby0ne
Contributor

Hi all, I created a PR for this issue using the suggested approach.

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 14, 2019
Fixes: pandas-dev#26934

Signed-off-by: Antonio Gutierrez <chibby0ne@gmail.com>
@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 17, 2019
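
Going by the milestone, the fix should be available from pandas 0.25.0 onward; that is an inference from the milestone change, not something confirmed in this thread. A minimal check on such a version might look like the sketch below, which uses an inline CSV string (an assumption of this example) instead of the file from the original report.

```python
import io

import pandas as pd

# Inline CSV mirroring the two data columns from the original report.
csv_text = "Int,Period\n1,2019-01\n2,2019-02\n3,2019-03\n"

# On a pandas version that includes the fix, the parser should be able to
# build the Period column directly from the strings in the file.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Int": "int64", "Period": pd.PeriodDtype("M")},
)
print(df.dtypes)
```

On pandas 0.24.2 the same call is expected to raise the NotImplementedError quoted in the report.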