Issue with parsing PeriodDtype columns in read_csv() #26934

Closed
mc4229 opened this issue Jun 19, 2019 · 4 comments

Comments

@mc4229
commented Jun 19, 2019

Code Sample

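import numpy as np
import pandas as pd
df = pd.DataFrame({'Int': [1, 2, 3], 'Period': pd.period_range(start="2019-01", end="2019-03", freq="M")})
df.to_csv("PeriodDtype.csv")
# On pandas 0.24.2 the next line raises the NotImplementedError described in this issue.
pd.read_csv("PeriodDtype.csv", dtype={"Int": np.int64, "Period": pd.PeriodDtype("M")})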

Problem description

Using pandas 0.24.2, I wrote a simple DataFrame with the following dtypes to a CSV file:

Int           int64
Period    period[M]
dtype: object

When I tried to read it back in, I found that read_csv() could not parse PeriodDtype("M"). I got the following error message:

NotImplementedError: Extension Array: <class 'pandas.core.arrays.period.PeriodArray'> 
must implement _from_sequence_of_strings in order to be used in parser methods

I saw that a similar issue, #24542, was raised for the Datetime dtype. It seems that _from_sequence_of_strings() is also not defined for PeriodArray, which prevents read_csv() from parsing columns with PeriodDtype.

I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.
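
Until such a method exists, one possible workaround is to read the column as plain strings and convert it afterwards. The snippet below is only a sketch under assumptions: it uses an in-memory CSV (the `csv_text` variable is made up for illustration and omits the index column that `to_csv()` writes by default) and converts with `pd.PeriodIndex`, which accepts strings together with an explicit `freq`.

```python
import io

import pandas as pd

# Illustrative CSV content; stands in for the file written by to_csv().
csv_text = "Int,Period\n1,2019-01\n2,2019-02\n3,2019-03\n"

# Step 1: read the period column as plain strings so the parser never has to
# build a PeriodArray itself.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Int": "int64", "Period": str})

# Step 2: convert the strings to monthly periods after parsing.
df["Period"] = pd.PeriodIndex(df["Period"], freq="M")
```

This keeps the parsing step independent of extension-array support in the CSV reader, at the cost of a second pass over the column.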

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.16.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger

Contributor

commented Jun 19, 2019

I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.

Yep. You may be able to reuse _from_sequence.

@mc4229

Author

commented Jun 19, 2019

@TomAugspurger Thanks! I will take a look at this.

@PaulCherian

commented Jun 19, 2019

I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.

Yep. You may be able to reuse _from_sequence.

@mc4229 I tried calling _from_sequence() directly from _from_sequence_of_strings(), and it works.
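
The delegation idea discussed in this thread can be illustrated with a toy class. This is not pandas code: `ToyMonthArray` and its parsing logic are hypothetical stand-ins, and only the two method names mirror the ones mentioned in the issue. The point is simply that when `_from_sequence` already knows how to handle strings, `_from_sequence_of_strings` can forward to it.

```python
from typing import List, Sequence, Tuple


class ToyMonthArray:
    """Hypothetical stand-in for an extension array of monthly periods."""

    def __init__(self, months: List[Tuple[int, int]]) -> None:
        self.months = months  # list of (year, month) pairs

    @classmethod
    def _from_sequence(cls, scalars: Sequence, dtype=None, copy: bool = False):
        # Accept either "YYYY-MM" strings or (year, month) tuples.
        parsed = []
        for item in scalars:
            if isinstance(item, str):
                year, month = item.split("-")[:2]
                parsed.append((int(year), int(month)))
            else:
                parsed.append((int(item[0]), int(item[1])))
        return cls(parsed)

    @classmethod
    def _from_sequence_of_strings(cls, strings: Sequence[str], dtype=None, copy: bool = False):
        # Strings are already handled by _from_sequence, so just delegate.
        return cls._from_sequence(strings, dtype=dtype, copy=copy)


arr = ToyMonthArray._from_sequence_of_strings(["2019-01", "2019-02", "2019-03"])
```

Whether the real PeriodArray can delegate this simply depends on how its `_from_sequence` treats string input, which is what the comment above reports testing.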

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 13, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv
Fixes: pandas-dev#26934

Signed-off-by: Antonio Gutierrez <chibby0ne@gmail.com>

@chibby0ne

Contributor

commented Jul 13, 2019

Hi all, I created a PR for this issue using the suggested approach.

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 14, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv
Fixes: pandas-dev#26934

Signed-off-by: Antonio Gutierrez <chibby0ne@gmail.com>

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 17, 2019
