Johansen method doesn't give correct index values #1763

Closed
mapsa opened this Issue Jun 16, 2014 · 9 comments

@mapsa
mapsa commented Jun 16, 2014

The coint_johansen method (in statsmodels/tsa/johansen.py) overwrites the correct index values in the last for loop. I've commented out line 213 and the index values are now correct. Please check whether this is correct.
Thanks

@jseabold
Member

Can you provide a link to the file and line you're talking about?

@josef-pkt
Member

It's likely that this is wrong given a brief look, but I need to check to be sure

The unit tests look only at cases where the index is sorted (which won't help to check this)
https://github.com/statsmodels/statsmodels/pull/453/files#diff-b8606f9b5bf6ab1cb94a6641dc1a2335R20

@mapsa
mapsa commented Jun 16, 2014

I checked the function with the data used in the book (monthly mining employment for il, in, ky, mi, oh, pa, tn, wv) and result.ind is correct if you comment out that line.
The matlab code would have the same issue.

@josef-pkt
Member

@mapsa

The line is identical to the matlab code, but the sorting code above it is different because I wrote it in a different way. Now I'm not sure whether the unit tests ever check the correct order of the results arrays.

Do you know if all the other results arrays have the order that corresponds to the example in your book?
Which book are you using?

The way it's currently written, line 213 doesn't make much sense since it's always just range(m)

What is the index ind in your book?

After looking at this (including the matlab comments), I think the index ind or aind is redundant and is always range(m).
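
To illustrate why the index ends up as range(m), here is a minimal sketch in plain Python. It is not the statsmodels or matlab code; the eigenvalues and the labels v0..v3 are made up for the illustration. Once the eigenvalues have been sorted in decreasing order, recomputing the sort index on the sorted values can only give the identity.

```python
# Made-up eigenvalues, in the arbitrary order an eigensolver might return them.
eigenvalues = [0.12, 0.47, 0.03, 0.31]
# Placeholder labels standing in for the matching eigenvectors (hypothetical).
eigenvectors = ["v0", "v1", "v2", "v3"]
m = len(eigenvalues)

# Permutation that puts the eigenvalues in decreasing order.
order = sorted(range(m), key=lambda i: eigenvalues[i], reverse=True)

# Apply the same permutation to eigenvalues and eigenvectors.
sorted_eigenvalues = [eigenvalues[i] for i in order]
sorted_eigenvectors = [eigenvectors[i] for i in order]

# Recompute the sort index on the already sorted eigenvalues.
index_after_sort = sorted(
    range(m), key=lambda i: sorted_eigenvalues[i], reverse=True
)

print(order)             # positions in the eigensolver output
print(index_after_sort)  # identity permutation
```

Note that `order` refers to positions in the eigensolver output, not to columns of the original data, so even the unsorted permutation says nothing about which series is involved.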

There is no relationship to the original series/columns; the results are sorted by decreasing likelihood of being a cointegrating vector.
We sort the eigenvalues and eigenvectors by decreasing eigenvalue.
Then we calculate the test statistics, which are also sorted.
Then we can find the cut-off for the hypothesis that the first r eigenvectors are cointegrating vectors.
(Or something close to this, if I interpret the code correctly. It has been a while since I looked at this.)

which would mean that we can just delete ind from the results.
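
The sequence described above (sorted eigenvalues, sorted test statistics, then a cut-off for r) can be sketched as follows. This is a rough illustration using the standard trace statistic, -T * sum of log(1 - eigenvalue) over the eigenvalues after the first r, and it is not taken from the statsmodels code. The eigenvalues, the sample size, and above all the critical values are made-up placeholders; real critical values have to come from the appropriate tables. The helper names `trace_statistics` and `select_rank` are hypothetical.

```python
import math


def trace_statistics(sorted_eigenvalues, n_obs):
    """Trace statistic for each null hypothesis 'rank <= r', r = 0..m-1.

    Assumes the eigenvalues lie in (0, 1) and are sorted in decreasing order.
    """
    m = len(sorted_eigenvalues)
    log_terms = [math.log(1.0 - lam) for lam in sorted_eigenvalues]
    # For each r, sum the log terms of the eigenvalues after the first r.
    return [-n_obs * sum(log_terms[r:]) for r in range(m)]


def select_rank(stats, critical_values):
    """Sequential testing: the first r whose statistic is not above its
    critical value; m if every null hypothesis is rejected."""
    for r, (stat, crit) in enumerate(zip(stats, critical_values)):
        if stat <= crit:
            return r
    return len(stats)


# Made-up inputs, for illustration only.
eigenvalues = [0.47, 0.31, 0.12, 0.03]
n_obs = 100
# Placeholder numbers, NOT real Johansen critical values.
placeholder_critical_values = [40.0, 24.0, 12.0, 4.0]

stats = trace_statistics(eigenvalues, n_obs)
rank = select_rank(stats, placeholder_critical_values)
print(stats)
print(rank)
```

Because every term -log(1 - eigenvalue) is positive, the trace statistics decrease as r grows, so the results are ordered by position alone and no separate index is needed.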

@mapsa can you provide more details for where this differs from the example in your book?

@mapsa
mapsa commented Jun 16, 2014

@josef-pkt
The book I'm following is http://www.spatial-econometrics.com/html/wbook.pdf (page 120). The "test.dat" file can be downloaded from http://www.spatial-econometrics.com/html/datasets.zip. If you run

nlag = 9
result = johansen(data,0, nlag)

you will see that result.ind is [0, 1, 2, 3, 4, 5, 6, 7], but the correct output is:
result.ind = [0, 1, 2, 4, 5, 6, 7, 3], which is obtained if line 213 is commented out (since that line doesn't make sense).

Thanks

@josef-pkt
Member

I only find johansen on page 226, and don't see the index mentioned.
I don't see what the ordering of states in the tables should mean. An ordering of states doesn't follow from the cointegration test (except very vaguely, and that is not in the calculation).

Note that the version of the wbook is dated 1998, while the matlab code has a comment about bugfixes and changes made on 4/10/2000.
There are also comments in the matlab code saying that the ordering has changed and that "To preserve existing programming, aind is reset back to 1, 2, 3, ....", which most likely means that in the previous version the eigenvectors were not sorted by descending eigenvalues.

The longer I think about this, the more convinced I become that ind/aind is redundant. There is no index that would make sense.
And the only relationship that we could get with states i.e. columns of the original array is if we look for cases where only a few states are involved in a cointegrating relationship, which would mean checking to see whether cointegrating vectors are dominated by a few variables/states. (which I guess will usually not be the case.)
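
A rough sketch of what "dominated by a few variables" could look like in code. This is only a heuristic for eyeballing a vector, not a statistical test, and coefficient sizes depend on the scale of each series. The state labels are the ones mentioned earlier in this thread; the vector values, the 0.25 threshold, and the helper name `weight_shares` are all made up for the illustration.

```python
def weight_shares(vector):
    """Share of the total absolute weight carried by each coefficient."""
    total = sum(abs(x) for x in vector)
    return [abs(x) / total for x in vector]


# State labels from the example data discussed in this thread.
states = ["il", "in", "ky", "mi", "oh", "pa", "tn", "wv"]

# Hypothetical cointegrating vector, NOT an actual result for the data.
vector = [0.90, -0.85, 0.02, 0.01, -0.03, 0.02, 0.01, -0.02]

shares = weight_shares(vector)

# Arbitrary threshold, chosen only for this illustration.
threshold = 0.25
dominant = [name for name, share in zip(states, shares) if share > threshold]

for name, share in zip(states, shares):
    print(f"{name}: {share:.3f}")
print("dominant:", dominant)
```

Even when a vector looks dominated like this, that only describes one candidate relationship; it does not by itself establish that the remaining states are not cointegrated.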

@mapsa
mapsa commented Jun 16, 2014

Thanks for your answer. Now I understand the code comments.
I agree that ind is redundant since the eigenvectors were previously sorted, but I wanted to know which variables were not cointegrated with the rest, so I needed that ind value. Thanks again.

@mapsa mapsa closed this Jun 16, 2014
@josef-pkt
Member

> but I wanted to know which variables were not cointegrated with the rest, so I needed that ind value

I don't think we can tell, but there might be additional calculations or tests to check whether a variable is not cointegrated with the others.

If you find any recommendations that would help, then you could open an issue about it.
