Ensure that tuples are considered as data, not as data containers #820

toobaz · 2013-05-08T13:55:42Z

Column names can end up being tuples (not necessarily through craziness, but because of the default names given by pandas.core.groupby.DataFrameGroupBy.agg() ). If one tries to print a regression summary() in such cases,

if the column is used as a dependent variable, there will be an error,
if the column is used as an independent variable, "%s" will be printed instead of its name.

My change ensures that tuples are considered as objects for interpolation, not as containers of objects. It should have no effect at all on any data type which is not a tuple.
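
The underlying Python behaviour can be seen in isolation. In the sketch below, `fmt` and the two helper functions are made up for illustration (they are not SimpleTable code); it only shows how the `%` operator treats a tuple on its right-hand side.

```python
fmt = "%s"  # a single-placeholder format, like the per-cell formats discussed here


def format_unwrapped(data):
    # Without wrapping: a tuple on the right of % is unpacked into arguments.
    return fmt % data


def format_wrapped(data):
    # With wrapping: the 1-tuple makes `data` a single argument, whatever it is.
    return fmt % (data,)


name = ("x", "mean")  # a tuple used as a column name (hypothetical example)

print(format_wrapped(name))  # the tuple is formatted as one object
print(format_wrapped("x"))   # non-tuples are unaffected
print(format_wrapped(3.5))

try:
    format_unwrapped(name)
except TypeError as err:
    # Two arguments were supplied for one "%s" placeholder.
    print("unwrapped tuple fails:", err)
```

Note that a one-element tuple such as `("x",)` does not raise in the unwrapped form; it is silently unpacked, so only the wrapped form shows it as a tuple.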

BTW: tested only on d5ed746; unfortunately, later commits give me an error ("ImportError: cannot import name kalman_loglike", at line 32 of statsmodels/statsmodels/tsa/kalmanf/kalmanfilter.py ). But I did check that the file was not changed by any subsequent commit.

josef-pkt · 2013-05-08T14:06:16Z

statsmodels/iolib/table.py

@@ -679,7 +679,7 @@ def format(self, width, output_format='txt', **fmt_dict):
        data_aligns = fmt.get('data_aligns','c')
        if isinstance(datatype, int):
            datatype = datatype % len(data_fmts) #constrain to indexes
-            content = data_fmts[datatype] % data
+            content = data_fmts[datatype] % (data,)


this might not work if data is already an iterable

You mean there could be format strings with more than one "%s"? There aren't, and I don't see why there should be, but maybe I'm missing something. My point is precisely that this is the only form that will work if data is already an iterable.

I don't know this code and never looked at the details; I sent an email to Alan, the author of SimpleTable.

It's likely that you are right, since this is in a single cell (after looking a bit closer). In that case, the change really would have no effect on anything other than tuples.

That's precisely my point.

I'll try to produce a test, but 1) I have never written a test in my life, so I'll do that when I have some time, and 2) the test would certainly have formats with only one "%s" anyway, since I can't think of anything different (and one would have to change the code to do something different).

You don't need to test or come up with something where we have more than a single "%s".

What would be good to have is your original case, a simplified version, where names of columns are tuples.

(writing unit tests is a good skill to practise :)

josef-pkt · 2013-05-08T14:13:25Z

@toobaz Thank you. Can you provide a simple test case that this change makes work?

A case with a pandas dataframe in a model.summary would be best.
It could be that we should "sanitize" the names already in the model data handling, where we know more about the meaning of the names/tuples.

About the import error, the current master requires compiling. Building the extensions is not optional anymore.

josef-pkt · 2013-05-08T14:15:03Z

One problem in this is that our "real" test coverage for the summaries and SimpleTable is not very high. So we might be introducing a mistake without getting a test failure.

jseabold · 2013-05-08T14:16:38Z

If you could file an issue about the build error that would be helpful, so we can fail more gracefully.

josef-pkt · 2013-05-08T19:39:24Z

Alan made the same change

https://code.google.com/p/econpy/source/diff?spec=svn197&r=197&format=side&path=/trunk/utilities/table.py

So this should be good to merge.
(our version differs a bit)

toobaz · 2013-05-11T15:06:32Z

I added a test ( ae1a38b ). It's the first time I've written one, so feel free to point out improvements.

Notice that I test only the minimum related to my changes in the code. I couldn't test the whole regression summary anyway, since it contains the date.

josef-pkt · 2013-05-11T16:19:40Z

statsmodels/iolib/tests/test_table.py

@@ -3,6 +3,8 @@
 from statsmodels.iolib.table import SimpleTable, default_txt_fmt
 from statsmodels.iolib.table import default_latex_fmt
 from statsmodels.iolib.table import default_html_fmt
+import pandas
+from statsmodels.api import OLS


from statsmodels.regression.linear_model import OLS

It doesn't make much difference in this case, but it's better to import from the actual module (inside statsmodels)

josef-pkt · 2013-05-11T16:26:36Z

Thank you Pietro
The test looks good and matches the style of the SimpleTable tests, which is different from our usual nose-based tests.

Would you please change the import? Then we can just merge through GitHub. This branch is only one commit behind master.

toobaz · 2013-05-11T18:31:25Z

Done. Notice that in that file there are several tests which are disabled (indented).

josef-pkt · 2013-05-11T18:48:22Z

I didn't know about the nested functions (or never paid enough attention).

It looks like they are indented intentionally, since they use objects created in the outer functions.

I never checked whether nose runs nested functions. They might be candidates for some test refactoring, but that's not relevant for this pull request.
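
To illustrate the point about nesting: a function defined inside another function is a local of the outer function, not a module-level name, so a collector that scans module-level names would not find it. The sketch below uses made-up names (`test_outer`, `test_nested`, `collected`) and a hand-written name scan; it is an assumption about how name-based collection might work and does not reproduce what nose actually does.

```python
def test_outer():
    table = ["a", "b"]  # object created in the outer function

    def test_nested():
        # Uses `table` from the enclosing scope, so it cannot simply be dedented.
        assert len(table) == 2

    # test_nested is defined here but never called.


# A naive scan of module-level callables whose names start with "test_"
# (hypothetical stand-in for a test collector, not nose's implementation).
collected = sorted(
    name for name, obj in list(globals().items())
    if name.startswith("test_") and callable(obj)
)
print(collected)
```

Only `test_outer` is visible to the scan; the nested function would run only if the outer function called it explicitly.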

josef-pkt · 2013-05-11T18:52:55Z

I restarted the Travis CI build, because it had an unrelated build/network error

SimpleTable: Ensure that tuples are considered as data, not as data containers

josef-pkt · 2013-05-11T19:26:08Z

merging
Thanks Pietro

krrishsgk · 2014-10-29T14:04:55Z

I had the same error. "ImportError: cannot import name kalman_loglike". I've installed Cython and installed statsmodels with 'build-ext'. I still get the error. How do I get past it? This is the beginning of the code I'm running where it throws the error:

import numpy as np
import pandas as pd

import statsmodels.api as sm

df = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/longley.csv', index_col=0)
df.head()

jseabold · 2014-10-29T14:14:29Z

If you do python setup.py build_ext --inplace, you have to import from the source tree unless you then install it with something like python setup.py install --user. Can you ask on the mailing list instead of on this issue for help?
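
One way to check which copy of a package Python would import, sketched with the standard library only. The helper name `package_location` is made up for illustration, and `json` stands in for `statsmodels` so that the snippet runs anywhere.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current sys.path
    return spec.origin


# Replace "json" with "statsmodels" to see whether the source tree or an
# installed copy would be picked up from the current working directory.
print(package_location("json"))
print(package_location("no_such_package_hopefully_xyz"))
```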

krrishsgk · 2014-10-30T05:50:18Z

Yeah ok. Didn't know that. Sorry.

Ensure that tuples are not considered as data, not as data containers

380f73f

josef-pkt reviewed May 8, 2013

toobaz mentioned this pull request May 8, 2013

Fail gracefully when extensions are not built #821

Closed

Test with tuple variable names

ae1a38b

josef-pkt reviewed May 11, 2013

Import OLS directly from module

6e3b03a

josef-pkt added a commit that referenced this pull request May 11, 2013

Merge pull request #820 from toobaz/fix_tuples_printout

9bd0cbd

SimpleTable: Ensure that tuples are considered as data, not as data containers

josef-pkt merged commit 9bd0cbd into statsmodels:master May 11, 2013

toobaz deleted the fix_tuples_printout branch February 28, 2014 12:53

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#820 from toobaz/fix_tuples_printout

31667ad

SimpleTable: Ensure that tuples are considered as data, not as data containers
