Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEST: Docstring edits and variable name changes for clarity #4686

Merged
merged 6 commits into from
May 30, 2018

Conversation

kshedden
Copy link
Contributor

@jbrockmendel if you have other suggestions for naming, documentation, and testing of the two SMW routines let me know, we can add them here. So far, I don't see any indication that there is anything materially wrong with this code.

@@ -47,8 +49,7 @@ def __init__(self, meth, irfs, ds_ix):
self.loglike = getattr(lme_r_results, "loglike" + bname)

if hasattr(lme_r_results, "ranef_mean" + bname):
self.ranef_postmean = getattr(lme_r_results, "ranef_mean"
+ bname)
self.ranef_postmean = getattr(lme_r_results, "ranef_mean" + bname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to cause flake8 complaints about lines longer than 80 characters? Not a big deal, but if we're being picky...

assert_allclose(
result.fittedvalues.iloc[0:4],
np.r_[-0.101549, 0.028613, -0.224621, -0.126295],
rtol=1e-3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The older formatting here is a pattern I kinda like. Any particular reason for disliking it?

@@ -417,8 +440,8 @@ def test_dietox(self):
assert_allclose(result.cov_re, 39.82097, rtol=1e-5)

# logLik(rm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logLik is almost certainly out of date, doesn't serve as a very useful comment. Not sure what its replacement would be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are the R code required to extract the relevant values for comparison.

df = pd.DataFrame(
exog[:, 1:], columns=[
"x1",
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking the columns list into three lines is pretty ugly. Standard thing to do here would be to keep the original but remove the trailing , inside the columns list.

@@ -724,23 +767,27 @@ def test_summary(self):
# Test that the summary correctly includes all variables.
summ = self.res.summary()
desired = ["const", "x1", "x2", "Group Var"]
actual = summ.tables[1].index.values # Second table is summary of params
actual = summ.tables[
1].index.values # Second table is summary of params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for adding the extra space before the #, but does the line need to be split like this? It's not a pattern most readers are used to.

@@ -808,18 +854,17 @@ def do1(reml, irf, ds_ix):
fnames = os.listdir(rdir)
fnames = [x for x in fnames if x.startswith("lme") and x.endswith(".csv")]

import itertools
import nose.tools
import pytest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for imports at the top of the file (and I think removing a duplicated import)


# Copied from bashtage's #3847
@nose.tools.nottest
@pytest.mark.parametrize('fname,reml,irf',
itertools.product(fnames, [False, True], [False, True]))
itertools.product(fnames, [False, True],
[False, True]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this is being futzed with, the idiomatic way to do this would be

@pytest.mark.parametrize('fname', fnames)
@pytest.mark.parametrize('reml', [False, True])
@pytest.mark.parametrize('irf', [False, True])
def test_r(fname, reml, irf):

sp2 = np.array([ 3.48416861, 0.55287862, 1.38537901])
mod2 = MixedLM.from_formula('X ~ Y', d, groups=d ['IDS'])
sp2 = np.array([3.48416861, 0.55287862, 1.38537901])
mod2 = MixedLM.from_formula('X ~ Y', d, groups=d['IDS'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for whitespace fixups here

for q in 4, 8:
for r in 2, 3:
for s in 0, 0.5:
tester(p, q, r, s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot. This is a perfect fit for pytest.parametrize instead of a for-loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and brackets around the for-loop args pls

for q in 4, 8:
for r in 2, 3:
for s in 0, 0.5:
tester(p, q, r, s)


if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if deleting this is an option, it's ready to go

@jbrockmendel
Copy link
Contributor

A few comments, but generally I like this a lot.

What do you think of putting the smw functions in tools.linalg (and moving corresponding tests)? I advocate it based on a) if a reader were to look for such a function, that'd be the natural place, and b) collecting easy-to-test-in-isolation code in tools can help us move toward cordoning off well-tested from not-well-tested code. (Totally OK for the idea to be out-of-scope for this PR)

@coveralls
Copy link

coveralls commented May 26, 2018

Coverage Status

Coverage increased (+0.001%) to 83.076% when pulling a4da0c7 on kshedden:smw_dims into 6611e87 on statsmodels:master.

@codecov-io
Copy link

codecov-io commented May 27, 2018

Codecov Report

Merging #4686 into master will increase coverage by <.01%.
The diff coverage is 94.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4686      +/-   ##
==========================================
+ Coverage   80.48%   80.48%   +<.01%     
==========================================
  Files         567      567              
  Lines       86707    86714       +7     
  Branches     9772     9780       +8     
==========================================
+ Hits        69787    69794       +7     
+ Misses      14675    14674       -1     
- Partials     2245     2246       +1
Impacted Files Coverage Δ
statsmodels/regression/mixed_linear_model.py 81.81% <ø> (ø) ⬆️
statsmodels/regression/tests/test_lme.py 89.09% <94.35%> (+0.09%) ⬆️
statsmodels/stats/descriptivestats.py 24.13% <0%> (ø) ⬆️
statsmodels/imputation/bayes_mi.py 93.28% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6611e87...d90ca89. Read the comment docs.


B = np.zeros((q, q))
c = np.random.normal(size=(r, r))
B[0:r, 0:r] = np.dot(c.T, c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other tester func sets B[0:r, 0:r] = c (effectively). Is the difference intentional, and if not, can these share some code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The construction of B needs to be different. In the log determinant calculation, B needs to be positive definite, but for the solver B only needs to be non-singular.

@kshedden
Copy link
Contributor Author

@jbrockmendel, thanks for the review. I updated some of the formatting based on your suggestions.

If no further comments I will merge.

Moving the smw functions to the tools module sounds like a good idea. Let's do that as a separate PR.

@jbrockmendel
Copy link
Contributor

Sounds good, thanks for explaining. BTW the randomized tests are fixed by enforcing q < p.

@kshedden
Copy link
Contributor Author

Merging (the appveyor failure is due to a timeout on their end, and it returned green before the last commit, which only changed a comment string)

@kshedden kshedden merged commit 83c0327 into statsmodels:master May 30, 2018
@josef-pkt josef-pkt added this to the 0.10 milestone Sep 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants