New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that tuples are not considered as data, not as data containers #820
Conversation
@@ -679,7 +679,7 @@ def format(self, width, output_format='txt', **fmt_dict): | |||
data_aligns = fmt.get('data_aligns','c') | |||
if isinstance(datatype, int): | |||
datatype = datatype % len(data_fmts) #constrain to indexes | |||
content = data_fmts[datatype] % data | |||
content = data_fmts[datatype] % (data,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might not work if data is already an iterable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean there could be formats strings with more than one "%s"? There aren't, and I don't see why there should be, but maybe I'm missing something. What I say is precisely that only this will work if data is already an iterable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this code, and never looked at the details, I sent an email to Alan, the author of SimpleTable.
It's likely that you are right, since this is in a single cell (after looking a bit closer). Then, the change would really be without effect in other cases than tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's precisely my point.
I'll try to produce a test, but 1) I never produced a test in my life, so I'll do that when I have some time, 2) anyway, the test would certainly have formats with one "%s" only, since I can't think of anything different (and anyway, one must change the code to do something different).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to test or come up with something where we have more than a single "%s".
What would be good to have is your original case, a simplified version, where names of columns are tuples.
(writing unit tests is a good skill to practise :)
@toobaz Thank you, can you provide a simple test case, that is made to work with this change. A case with a pandas dataframe in a model.summary would be best. About the import error, the current master requires compiling. Building the extensions is not optional anymore. |
One problem in this is that our "real" test coverage for the summaries and SimpleTable is not very high. So we might be introducing a mistake without getting a test failure. |
If you could file an issue about the build error that would be helpful, so we can fail more gracefully. |
Alan made the same change So this should be good to merge. |
I added a test ( ae1a38b ). It's the first time I write one, so feel free to point out at enhancements. Notice I test only the minimum related to my changes in the code. I couldn't test anyway the whole regression summary, since it contains the date. |
@@ -3,6 +3,8 @@ | |||
from statsmodels.iolib.table import SimpleTable, default_txt_fmt | |||
from statsmodels.iolib.table import default_latex_fmt | |||
from statsmodels.iolib.table import default_html_fmt | |||
import pandas | |||
from statsmodels.api import OLS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from statsmodels.regression.linear_model import OLS
It doesn't make much difference in this case, but it's better to import from the actual module (inside statsmodels)
Thank you Pitro Would you please change the import, then we can just merge through github. This branch is only one commit behind master. |
Done. Notice that in that file there are several tests which are disabled (indented). |
I didn't know about the nested functions or never paid enough attention.) It looks like they are indented intentionally, they use objects created in the outer functions. I never checked whether nose runs nested functions. They might be candidates for some test refactoring, but that's not relevant for this pull request. |
I restarted the Travis CI build, because it had an unrelated build/network error |
SimpleTable: Ensure that tuples are considered as data, not as data containers
merging |
SimpleTable: Ensure that tuples are considered as data, not as data containers
I had the same error. "ImportError: cannot import name kalman_loglike". I've installed Cython and installed statsmodels with 'build-ext'. I still get the error. How do I get past it? This is the beginning of the code I'm running where it throws the error: import numpy as np import statsmodels.api as sm df = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/longley.csv', index_col=0) |
If you do |
Yeah ok. Didn't know that. Sorry. |
It can happen (not necessarily by craziness, but because of default names given by pandas.core.groupby.DataFrameGroupBy.agg() ) that names of columns are tuples. If one tries to print a regression summary() in such cases,
My change ensures that tuples are considered as objects for interpolation, not as containers of objects. It should have no effect at all on any data type which is not a tuple.
BTW: tested only on d5ed746 , unfortunately later commits give me an error ("ImportError: cannot import name kalman_loglike", at line 32 of statsmodels/statsmodels/tsa/kalmanf/kalmanfilter.py ). But I did check that the file was not changed by any subsequent commit.