Var Models #537

Closed
legant opened this issue Oct 20, 2012 · 7 comments

@legant

legant commented Oct 20, 2012

I have a file fr1.txt. A sample of data:

date, price1,price2,price3
2009-11-09 11:05:21pm,3.2,31.4,22.1
2009-11-09 11:07:23pm,40.5,35.8,38.2
2009-11-09 11:43:10pm,11.9,32.1,58.8
2009-11-10 12:22:07am,22.7,65.9,31.8
2009-11-10 1:43:11am,98.32,54.3,21.5
2009-11-10 2:28:59am,95.2,33.4,21.9
2009-11-11 12:33:39am,51.7,32.1,16.9
2009-11-11 12:34:26am,21.5,15.8,10.2
2009-11-11 12:40:16am,31.4,2.4,21.6
2009-11-11 12:45:22am,5.6,2.2,41.9

The code (following the example at http://statsmodels.sourceforge.net/devel/vector_ar.html#var):

import numpy as np
import pandas as pd
import datetime as datetime
from statsmodels.tsa.api import VAR, SVAR
import matplotlib.pyplot as plt
import statsmodels.api as sm

parse = lambda x: datetime.datetime.strptime(x, '%Y-%m-%d')
dframe = pd.read_table("fr1.txt", delimiter=",", index_col=0, parse_dates=True, date_parser=parse)
ts = dframe.resample("D")
mdata = ts[['price1','price2','price3']]
names = mdata.dtype.names
data = mdata.view((float,3))
data = np.diff(np.log(data), axis=0)
model = sm.tsa.VAR(data, names=names)
results = model.fit(2)
results.summary()
model.plot()
plt.show()

The error:
AttributeError: 'DataFrame' object has no attribute 'dtype'

@josef-pkt
Member

You are mixing pandas objects with recarrays. The doc example uses recarrays throughout.

I cannot replicate your example because of an exception in date parsing. Also, it is better if you include the full traceback, since I don't know where your exception occurs.
I get an exception with names = mdata.dtype.names
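
The date parsing exception most likely comes from the format string: '%Y-%m-%d' does not cover the 12-hour time and the am/pm suffix in the sample rows. A minimal sketch using only the standard library, assuming every timestamp looks like the rows in the sample (the format string and the names FMT and parse are assumptions for illustration, not taken from the original post):

```python
from datetime import datetime

# Assumed format for rows such as "2009-11-09 11:05:21pm":
# %I is the hour on a 12-hour clock, %p is the am/pm marker.
FMT = "%Y-%m-%d %I:%M:%S%p"


def parse(text):
    """Parse one timestamp written like the rows in the sample file."""
    return datetime.strptime(text, FMT)


stamp = parse("2009-11-09 11:05:21pm")
print(stamp)

# The date-only format from the original post rejects the same string,
# because the time part is left unconverted.
try:
    datetime.strptime("2009-11-09 11:05:21pm", "%Y-%m-%d")
    date_only_ok = True
except ValueError:
    date_only_ok = False
print(date_only_ok)
```

If the real file contains timestamps in another layout, the format string would need to change accordingly.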

If I remove the date parsing, then it works once I convert the pandas object to a recarray:

mdata = ts.to_records()[['price1','price2','price3']]

But I think it would be better to work (or make it work) with pandas all the way.
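
Staying in pandas for the preprocessing could look roughly like the sketch below. It assumes the sample rows from the first post, a timestamp format inferred from those rows, and a recent pandas in which resample needs an explicit aggregation such as .mean() (older versions averaged by default). The names RAW, daily and log_returns are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the first post (note the space after the first comma).
RAW = """date, price1,price2,price3
2009-11-09 11:05:21pm,3.2,31.4,22.1
2009-11-09 11:07:23pm,40.5,35.8,38.2
2009-11-09 11:43:10pm,11.9,32.1,58.8
2009-11-10 12:22:07am,22.7,65.9,31.8
2009-11-10 1:43:11am,98.32,54.3,21.5
2009-11-10 2:28:59am,95.2,33.4,21.9
2009-11-11 12:33:39am,51.7,32.1,16.9
2009-11-11 12:34:26am,21.5,15.8,10.2
2009-11-11 12:40:16am,31.4,2.4,21.6
2009-11-11 12:45:22am,5.6,2.2,41.9
"""

# Read the timestamps as plain strings first, then convert them explicitly.
df = pd.read_csv(io.StringIO(RAW), index_col=0, skipinitialspace=True)
df.index = pd.to_datetime(df.index, format="%Y-%m-%d %I:%M:%S%p")

# Daily averages; the result keeps a DatetimeIndex and the column names.
daily = df.resample("D").mean()

# Log differences computed on the DataFrame itself, so no names are lost.
log_returns = np.log(daily).diff().dropna()
print(log_returns)
```

With only ten observations the daily frame has three rows, which is far too short to fit a VAR, so this only illustrates the data handling.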

Aside: the VAR documentation hasn't been updated in a while, and doesn't take advantage of the pandas integration, or provide examples for it, I guess.

@legant
Author

legant commented Oct 20, 2012

Thank you. Traceback:

names = mdata.dtype.names
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1771, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'dtype'

Try this for data set:
date,p1,p2,p3
2009-12-12 11:15:39pm,0.375,0.375,0.5
2009-12-12 11:22:10pm,0.2575,0.375,0.0
2009-12-12 11:43:10pm,0.2,0.1,0.3
2009-12-12 11:44:07pm,0,0,0
2009-12-13 12:23:11am,0,1,0
2009-12-13 12:28:59am,0.0,0.0,0.0
2009-12-13 12:33:39am,1,0,1

Can I use pandas for a VAR model?

@josef-pkt
Member

I think so, but it needs a datetime index. I get an exception with an integer index. (I don't have a timeseries for pandas handy right now.)

Otherwise you can always use:

>>> model = sm.tsa.VAR(np.asarray(data), names=data.columns)
E:\Josef\testing\tox\py27b\lib\site-packages\statsmodels-0.5.0-py2.7-win32.egg\statsmodels\tsa\vector_ar\var_model.py:340: FutureWarning: The names argument is deprecated and will be removed in the next release.
  "removed in the next release.", FutureWarning)

I don't know what the substitute for the names argument is.

@josef-pkt
Member

This needs checking for 0.5.

What's the status? Do we have pandas support in VAR?

@jseabold
Member

jseabold commented Feb 6, 2013

I don't see why not. If you give a pandas DataFrame to the time series models, they expect it to have a DatetimeIndex:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/base/tsa_model.py#L25

The names are done automatically now like the other models. It uses the fields from a recarray or the column names from a DataFrame.

@josef-pkt
Member

If it works and we have at least a smoke test, then we can close this issue.

(I'm just trying to figure out what the status with some of the issues is.)

@jseabold
Member

jseabold commented Feb 6, 2013

I think this should be covered. Looks like the problem came from trying to use a DataFrame as a structured array by following the docs, not from something internal. Maybe leave it open as a reminder to update the docs?

Thanks for the maintenance work. Am hoping to be able to pitch in in March after I meet a few deadlines and get a release out the door. We were bad about bug-fixes / features in this release cycle (again) and have some stuff that needs to be released (and, of course, some stuff that's not and might not be ready).

jseabold added a commit that referenced this issue Apr 9, 2013
DOC: Update VAR docs. Closes #537.
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014