statsmodels.api.datasets issue #4775

Closed
atodniAr opened this issue Jul 4, 2018 · 14 comments

Comments

@atodniAr commented Jul 4, 2018

Issue

import pandas as pd
import statsmodels.api as sm

data = sm.datasets.co2.load_pandas()

raises:

/usr/local/lib/python3.6/site-packages/statsmodels/datasets/co2/data.py in load_pandas()
     64     index = pd.DatetimeIndex(start=data.data['date'][0].decode('utf-8'),
     65                              periods=len(data.data), format='%Y%m%d',
---> 66                              freq='W-SAT')
     67     dataset = pd.DataFrame(data.data['co2'], index=index, columns=['co2'])
     68     #NOTE: this is how I got the missing values in co2.csv

TypeError: __new__() got an unexpected keyword argument 'format'

Fix

It seems pd.DatetimeIndex() has no 'format' argument. My code worked after I deleted the format argument.
The strange thing is that this code worked a month ago, when I believe I had pandas 0.22.0, but I can't find this argument in the pandas documentation for that release either.
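For newer pandas releases, here is a minimal sketch of the same weekly index built without format: the first date is parsed explicitly with pd.to_datetime, and pd.date_range replaces the DatetimeIndex(start=..., periods=..., freq=...) form, which pandas 0.24 deprecated in favor of pd.date_range. The helper name weekly_co2_index is hypothetical, and this is not necessarily the patch statsmodels applied.

```python
import pandas as pd


def weekly_co2_index(first_date: bytes, n_periods: int) -> pd.DatetimeIndex:
    """Hypothetical helper: Saturday-anchored weekly index, no `format` kwarg."""
    # Parse the 'YYYYMMDD' byte string explicitly instead of passing
    # format= to the DatetimeIndex constructor (which rejects it).
    start = pd.to_datetime(first_date.decode("utf-8"), format="%Y%m%d")
    # date_range replaces the DatetimeIndex(start=..., periods=..., freq=...) form.
    return pd.date_range(start=start, periods=n_periods, freq="W-SAT")


idx = weekly_co2_index(b"19580329", 3)
```

Since 1958-03-29 is itself a Saturday, the range starts on that date and steps forward one week at a time.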

@harjain99 commented Jul 14, 2018

Hi,
I've been getting the same error. I already tried the fix you mentioned, but it doesn't seem to work.
Please let me know if you have any solutions.

    def load_pandas():
        data = load()
        # pandas <= 0.12.0 fails in the to_datetime regex on Python 3
        index = pd.DatetimeIndex(start=data.data['date'][0].decode('utf-8'),
                                 periods=len(data.data), freq='W-SAT')
        dataset = pd.DataFrame(data.data['co2'], index=index, columns=['co2'])
        #NOTE: this is how I got the missing values in co2.csv
        #new_index = pd.DatetimeIndex(start='1958-3-29', end=index[-1],
        #                             freq='W-SAT')
        #data.data = dataset.reindex(new_index)
        data.data = dataset
        return data

Any kind of help would be appreciated. Thanks in advance.

@atodniAr (Author) commented Jul 24, 2018

Hi @harjain99,

Sorry for the late reply. I'm not sure which error you got. Can you include the error message Python returned?

@harjain99 commented Jul 28, 2018

Hi @atodniAr,
It's the exact same error as you got :(

@bashtage (Contributor) commented Aug 2, 2018

This is old statsmodels. Please update to 0.9.0.

@acbecker commented Aug 2, 2018

I can reproduce this with:

conda 4.5.9 py36_0
python 3.6.5 hc3d631a_2
statsmodels 0.9.0 py36h035aef0_0
pandas 0.23.3 py36h04863e7_0

A quick workaround is to replace those lines in data.py with:

    index = pd.DatetimeIndex(start=data.data['date'][0].decode('utf-8'),
                             periods=len(data.data),
                             freq='W-SAT').strftime('%Y%m%d')
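One side effect of this workaround: the trailing .strftime('%Y%m%d') turns the index into plain strings, so the co2 DataFrame is no longer indexed by dates. A small sketch of the difference, using pd.date_range as a stand-in for the DatetimeIndex constructor (an assumption made so it runs on current pandas):

```python
import pandas as pd

# Stand-in for the index built in data.py; date_range avoids the deprecated
# DatetimeIndex(start=..., periods=...) form.
dt_index = pd.date_range(start="1958-03-29", periods=3, freq="W-SAT")

# What the workaround's .strftime('%Y%m%d') produces: string labels, not dates.
str_index = dt_index.strftime("%Y%m%d")

# Converting back restores a DatetimeIndex if date-based operations are needed.
restored = pd.to_datetime(str_index, format="%Y%m%d")
```

Dropping only the format argument, without the .strftime call, would keep a DatetimeIndex in the first place.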

@bashtage (Contributor) commented Aug 3, 2018

You are right: it is fixed in master but not in a released version.

@Interesting6 commented Jan 22, 2019

> This is old statsmodels. Please update to 0.9.0.

My statsmodels version is 0.9.0, but it still doesn't work and raises the same error.

I solved it using acbecker's workaround.

Besides, may I complain about statsmodels' "api" modules? 😂
When I first used this package, I saw "api" almost everywhere, but when I read the module documentation I could hardly find it. So I was very confused about the difference between "import statsmodels.api as sm; sm.tsa.ARMA", "import statsmodels.tsa.api as smt; smt.ARMA" and "import statsmodels as sm; sm.tsa.arima_model.ARMA". 🙈
After reading the source in site-packages, I realized they are all the same. But using "api" still leaves me a little puzzled, so I prefer "from statsmodels.tsa import arima_model as arimapy; arimapy.ARMA".

@josef-pkt (Member) commented Jan 22, 2019

@Interesting6 see http://www.statsmodels.org/devel/importpaths.html
It's to make two different ways of working with statsmodels easier.
(I almost never import the api because the import time is too long for repeated restarts of the Python process during testing, in notebooks or in Spyder. But many users want to have everything imported and available at once, which works well during longer interactive sessions.)

@josef-pkt (Member) commented Jan 22, 2019

One advantage of the api, especially when new to statsmodels, is that tab completion already shows what's available. Without the api, you need to know the module where a function is officially (*) located.
(*) The actual module might be private, or in the sandbox.

@Interesting6 commented Jan 31, 2019

OK. Thank you.

@AVI18794 commented Feb 11, 2019

I got the same error even though my statsmodels version is 0.9.0. Here is a screenshot of my code:
[screenshot not reproduced]

@FavorMylikes commented Feb 27, 2019

Please add

statsmodels
pandas==0.22.0

to requirements.txt in your project directory and install them (PyCharm will pop up a prompt).

@bhishanpdl commented Mar 26, 2019

With the following configuration I got the same error, and also fixed it indirectly:

pandas      0.24.2
statsmodels 0.9.0
numpy       1.16.

sm.datasets.co2.load_pandas().data

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-d6eff66314c5> in <module>
----> 1 sm.datasets.co2.load_pandas().data

~/miniconda3/envs/tsa/lib/python3.7/site-packages/statsmodels/datasets/co2/data.py in load_pandas()
     64     index = pd.DatetimeIndex(start=data.data['date'][0].decode('utf-8'),
     65                              periods=len(data.data), format='%Y%m%d',
---> 66                              freq='W-SAT')
     67     dataset = pd.DataFrame(data.data['co2'], index=index, columns=['co2'])
     68     #NOTE: this is how I got the missing values in co2.csv

TypeError: __new__() got an unexpected keyword argument 'format'

Solution
Use the numpy record array returned by load() instead of load_pandas():

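import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame.from_records(sm.datasets.co2.load().data)
# dates are stored as bytes such as b'19580329'; decode before parsing
df['date'] = df.date.apply(lambda x: x.decode('utf-8'))
df['date'] = pd.to_datetime(df.date, format='%Y%m%d')
df = df.set_index('date')  # set_index returns a new DataFrame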
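To try the same conversion without loading the dataset, here is a self-contained sketch on a toy structured array. The layout (a bytes 'date' field in YYYYMMDD form plus a float 'co2' field) is assumed from the traceback above, and the co2 numbers are placeholders, not real measurements.

```python
import numpy as np
import pandas as pd

# Toy stand-in for sm.datasets.co2.load().data (assumed layout; placeholder values).
records = np.array(
    [(b"19580329", 1.0), (b"19580405", 2.0), (b"19580412", np.nan)],
    dtype=[("date", "S8"), ("co2", "f8")],
)

df = pd.DataFrame.from_records(records)
# Series.str.decode is a vectorized alternative to apply(lambda x: x.decode(...)).
df["date"] = pd.to_datetime(df["date"].str.decode("utf-8"), format="%Y%m%d")
df = df.set_index("date")
```

Passing format='%Y%m%d' to pd.to_datetime is where the date format belongs; the DatetimeIndex constructor never needed it.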

@bashtage (Contributor) commented Jul 19, 2019

Fixed in master.

@bashtage closed this Jul 19, 2019
@bashtage added this to the 0.11 milestone Dec 18, 2019
9 participants