
Increase docstring coverage and add doctests #232

Merged: 23 commits into pymc-labs:main on Sep 15, 2023

Conversation

@jpreszler (Contributor):

This addresses issue #129:

  1. All methods in the API documentation have docstrings with working examples. Documentation was checked locally.
  2. interrogate reports 97% coverage - the only things missing are file-level docstrings on some test files.
  3. The interrogate failure threshold was raised to 85%.

Potential future work:

  • Some docstrings are still minimal and could be expanded - for example, the data simulation functions.
  • Various plot methods don't have output examples in the API reference; links to static images could be added.

Main items for review:

  • wording of documentation
  • consistency
  • docstring style

@@ -70,6 +175,8 @@ def __init__(
         self._input_validation(data, treatment_time)

         self.treatment_time = treatment_time
+        # set experiment type - usually done in subclasses
+        self.expt_type = "Pre-Post Fit"
jpreszler (Contributor, Author) commented on the diff:
This is the only code change. The experiment type had to be added here in order for the summary method to be run on an instance of the PrePostFit class (rather than the SyntheticControl or InterruptedTimeSeries classes).
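
For context, a minimal sketch (hypothetical, not the actual CausalPy source) of why summary() needs the attribute:

    class PrePostFit:
        def __init__(self, data, treatment_time):
            self.treatment_time = treatment_time
            # Previously only subclasses such as SyntheticControl or
            # InterruptedTimeSeries set this attribute.
            self.expt_type = "Pre-Post Fit"

        def summary(self):
            # Reads self.expt_type; without the assignment in __init__
            # this would raise AttributeError on a plain PrePostFit
            # instance.
            print(f"{self.expt_type:=^60}")
            print(f"Treatment time: {self.treatment_time}")

    PrePostFit(data=None, treatment_time=10).summary()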

@drbenvincent (Collaborator) left a comment:

Firstly, thanks so much for this @jpreszler. This is amazing work. Timing-wise, I wanted to do a relatively quick pass to keep the ball rolling. But I've not done a thorough read of all the parameter explanations in the docstrings, for example, so I will almost certainly have a few minor points about those before we merge.

One thought I had about the docstring examples was about the maintainability. I've not used this yet, but my feeling is that we will probably avoid major headaches in the future if we test these docstring examples with doctest (e.g. https://realpython.com/python-doctest/; https://docs.python.org/3/library/doctest.html). That may also require changes to the GitHub actions so that these tests are run locally and remotely. If we run into trouble there, then @lucianopaz may be able to give you pointers.
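
For illustration, a minimal sketch of a doctest-checked docstring (the function is hypothetical, not CausalPy's API):

    def double(x):
        """Return twice the input.

        Example
        -------
        >>> double(21)
        42
        """
        return x * 2

    if __name__ == "__main__":
        import doctest

        # Runs every >>> example in this module's docstrings and
        # reports mismatches between expected and actual output.
        doctest.testmod()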

There's a comment in there about when/if we include text output in the docstring examples. I've noted down my first thought, but I'd definitely appreciate input from @juanitorduz and @lucianopaz on this. As I say, I've not used doctest, so it might be that we do need to include (some) text output.

PS: We're likely to merge #213 very soon, so there will be a couple of new classes added there.

I've rendered the docs locally with your changes and they look really great. This is an excellent contribution.

Resolved review threads: causalpy/skl_experiments.py; causalpy/pymc_models.py (2)
codecov bot commented Aug 24, 2023:

Codecov Report

Merging #232 (c80d78e) into main (d7a12cb) will increase coverage by 0.02%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##             main     #232      +/-   ##
==========================================
+ Coverage   73.86%   73.89%   +0.02%     
==========================================
  Files          19       19              
  Lines        1148     1149       +1     
==========================================
+ Hits          848      849       +1     
  Misses        300      300              
Files Changed                                        Coverage    Δ
causalpy/data/datasets.py                             92.30%    <ø> (ø)
causalpy/data/simulate_data.py                         0.00%    <ø> (ø)
causalpy/plot_utils.py                                60.00%    <ø> (ø)
causalpy/pymc_models.py                              100.00%    <ø> (ø)
causalpy/skl_experiments.py                           66.86%    <ø> (ø)
causalpy/skl_models.py                               100.00%    <ø> (ø)
causalpy/tests/conftest.py                           100.00%    <ø> (ø)
causalpy/tests/test_data_loading.py                  100.00%    <ø> (ø)
causalpy/tests/test_input_validation.py              100.00%    <ø> (ø)
causalpy/tests/test_integration_pymc_examples.py     100.00%    <ø> (ø)
... and 7 more

@twiecki (Contributor) commented on Aug 24, 2023:

Wow, this is amazing -- much appreciated @jpreszler!

@drbenvincent added the documentation label (Improvements or additions to documentation) on Aug 24, 2023
  percentiles = self.causal_impact.quantile([0.03, 1 - 0.03]).values
- ci = r"$CI_{94\%}$" + f"[{percentiles[0]:.2f}, {percentiles[1]:.2f}]"
+ ci = "$CI_{94%}$" + f"[{percentiles[0]:.2f}, {percentiles[1]:.2f}]"
jpreszler (Contributor, Author) commented on the diff:
I had to remove this \ for linting. With it, the docstring output fails flake8 and doctest issues a deprecation warning for an invalid escape sequence. Is this needed for another reason - such as being able to put these strings into LaTeX documents?
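
To illustrate (a standalone snippet, not project code) - string literals, including docstrings, treat an unrecognized escape like \% as deprecated unless the string is raw:

    # Without the r prefix, "\%" is an invalid escape sequence and
    # recent Python versions emit a DeprecationWarning (SyntaxWarning
    # in 3.12+):
    #     ci = "$CI_{94\%}$"
    # A raw string keeps the backslash without warning, which LaTeX
    # needs for a literal percent sign:
    ci_latex = r"$CI_{94\%}$"
    # Dropping the backslash avoids the warning but changes the literal:
    ci_plain = "$CI_{94%}$"
    print(ci_latex)  # $CI_{94\%}$
    print(ci_plain)  # $CI_{94%}$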

  percentiles = self.causal_impact.quantile([0.03, 1 - 0.03]).values
- ci = r"$CI_{94\%}$" + f"[{percentiles[0]:.2f}, {percentiles[1]:.2f}]"
+ ci = r"$CI_{94%}$" + f"[{percentiles[0]:.2f}, {percentiles[1]:.2f}]"
jpreszler (Contributor, Author) commented on the diff:
Same as in the DiD experiment summary above.

@jpreszler (Contributor, Author):

I've cleaned up the example output, added doctest to the CI workflow, and all examples pass locally. I've also added examples for the new IV model and experiment added in #213.

This is ready for a new review @drbenvincent when you have a chance.

@drbenvincent (Collaborator):

Hi @jpreszler. Thanks for the updates. I'll try to carve out some time (I now have an 11-day-old son!) to review properly. But in the meantime I triggered the remote checks and it looks like we've got a failure. The test output looks like it could be just a missing import of statsmodels.

@jpreszler (Contributor, Author):

@drbenvincent, it looks like statsmodels wasn't being installed into the remote environment, so I added it to the dependencies. That should fix the test failure.

Congratulations on the baby boy!

@jpreszler (Contributor, Author):

Looks like there's a little instability in the summary output that I'm looking into fixing.

@jpreszler (Contributor, Author):

I should have all the instability addressed; the doctests passed in the remote environment on my fork as well as locally.

@drbenvincent (Collaborator) left a comment:

Great stuff. I did a quick pass and left a few more change requests. It's really coming along.

Resolved review threads: causalpy/pymc_models.py (7); causalpy/tests/test_integration_pymc_examples.py
@jpreszler (Contributor, Author):

@drbenvincent Thanks for the helpful comments. I've made the improvements so this is ready for another look when you have time.

@drbenvincent (Collaborator) left a comment:

This is really getting there! Not far to go I think.

Sorry for the iterative nature of this, but having had time to stare at this for a while, I've had some thoughts.

I think the docstring examples are great. But perhaps we've overdone it. I feel that one example for each class makes a lot of sense, but examples for accessing properties (like result.idata) and some/all methods (e.g. result.plot()) are overkill. My feeling is that these are relatively self-explanatory, and we can perhaps leave it to the example notebooks to show users more complete worked examples in their full context.

I'm open to counter-arguments, but if you agree then let's just keep one core example per class.

Resolved review threads: causalpy/data/simulate_data.py; causalpy/skl_experiments.py; causalpy/pymc_models.py (3)
@jpreszler (Contributor, Author):

I think there's definitely a bit of redundancy in the examples, and with doctest that adds a lot of time to running tests.

I've reduced the redundancy, moved all meaningful examples (like summary() calls) into a single example for each class, and removed the plot examples and a few other low-value ones.

@drbenvincent (Collaborator) left a comment:

Great stuff. I just noticed 2 minor issues in pymc_models.py:

  1. Under the LinearRegression class, there seems to be something going wrong with the note admonition where it says "Generally, the .fit() method should be used rather than calling .build_model() directly." This could just be an issue with my local build of the docs, but if it's an issue for you also, let's see if we can fix it.
  2. Under the InstrumentalVariableRegression class, the priors = {"mus": ... code snippet has a line break, so it's separated from the :param priors: block.

I'm sure there are a bunch of other small errors that we might find or slight improvements, but I'm happy to merge after these minor updates :)

@jpreszler (Contributor, Author):

Those issues were not just local to you. I've fixed them and looked for other problems, but didn't spot much besides the same issue in WeightedSumFitter as in LinearRegression.

The tests have also passed on my fork after some small adjustments.

@drbenvincent (Collaborator):

Quick question as I've never used doctest before...

As far as I can tell, doctests are run with pytest --doctest-modules causalpy/. At the moment I can only see these being triggered with the remote tests. Do you think it makes sense to add brief instructions to CONTRIBUTING.md, either under "Pull request checklist" or "Building the documentation locally", to tell people that they should run the doctests locally (and how to do that) before opening a PR?

@jpreszler (Contributor, Author):

Good call. The same command can run all doctests locally, but I also added a make doctest command to the makefile. I've also added some details to the contributing doc about running all doctests or individual ones. This might be too much for the PR checklist.

This is my first venture into doctests, but the system is pretty straightforward. A good thing to note is that the +NUMBER option used by a number of the tests is a pytest extension of doctest; if you use doctest directly (not through pytest), these tests will fail.
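
For example, a hypothetical docstring using the directive:

    def mean_of(values):
        """Arithmetic mean.

        Under pytest --doctest-modules, +NUMBER matches floats only to
        the precision written in the expected output; plain doctest
        rejects the directive as an unknown option.

        >>> mean_of([1.0, 2.0, 2.5])  # doctest: +NUMBER
        1.83
        """
        return sum(values) / len(values)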

@drbenvincent (Collaborator):

This is great. When running the doctests I just noticed that it produces new files, ancova_data.csv and regression_discontinuity.csv. We should ideally either not produce those files, or have some clean-up.

@jpreszler (Contributor, Author):

That's from a few of the doctests for the simulated data. I've skipped the lines that write out the CSVs while leaving the example of how to do so.
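
For instance (a hypothetical snippet in the style of the simulated-data docstrings), a +SKIP directive keeps the CSV-writing line visible in the rendered example without executing it during tests:

    import pandas as pd

    def simulate_ancova_data():
        """Generate a toy ANCOVA-style dataset.

        >>> df = simulate_ancova_data()
        >>> df.shape
        (2, 3)
        >>> df.to_csv("ancova_data.csv")  # doctest: +SKIP
        """
        return pd.DataFrame(
            {"group": [0, 1], "pre": [1.0, 1.5], "post": [1.2, 3.4]}
        )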

@drbenvincent (Collaborator) left a comment:

Approving. Thanks for the contribution! The first of many perhaps :)

@drbenvincent changed the title from "Issue 129: increase docstring coverage" to "Increase docstring coverage and add doctests" on Sep 15, 2023
@drbenvincent merged commit 234a0cd into pymc-labs:main on Sep 15, 2023
10 checks passed
@drbenvincent mentioned this pull request on Oct 18, 2023