DOC: cleanup examples folder and webpage #1292

Merged 5 commits into statsmodels:master from vincentarelbundock:master on Jan 6, 2014.



4 participants


I couldn't make the revision history readable, so I just squashed it. I think it's easier to see what I did by looking at the repo rather than the diffs.

Basically, I reorganized the examples into 3 subfolders:

  1. examples/notebooks: This is where real work happens for "website-ready" examples
  2. examples/python: I replaced the old python scripts with new ones that were generated by issuing:
    • ipython nbconvert --to python *
    • This is done at commit time (not when the website is built)
  3. examples/sandbox: Example scripts that don't have an accompanying notebook, and that are mostly useful to developers and adventurous users.
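Item 2 relies on ipython nbconvert. As a rough illustration of what that conversion does (a sketch, not nbconvert's actual code), the following reads notebook JSON with the standard library, keeps code cells as code, and turns markdown and other cells into comments. Assumptions: nbformat v4 notebooks keep cells in a top-level "cells" list with code in "source", while the v3 notebooks of this period nest them under "worksheets" with code in "input"; the `prompts` flag imitates the "# In[ ]:" markers that nbconvert writes. The helper names are hypothetical.

```python
import json


def _text(value):
    """Notebook text fields may be one string or a list of line strings."""
    return "".join(value) if isinstance(value, list) else value


def _comment(text):
    """Prefix every line with '# ' so prose survives as Python comments."""
    return "\n".join("# " + line for line in text.splitlines())


def notebook_to_script(nb_json, prompts=True):
    """Convert notebook JSON to Python source (hypothetical helper, not nbconvert)."""
    nb = json.loads(nb_json)
    # nbformat v4 stores cells at the top level; v3 nests them in worksheets.
    cells = nb.get("cells")
    if cells is None:
        cells = nb["worksheets"][0]["cells"]
    chunks = []
    for cell in cells:
        if cell["cell_type"] == "code":
            # v4 code cells use "source"; v3 code cells use "input".
            code = _text(cell.get("source", cell.get("input", "")))
            chunks.append("# In[ ]:\n\n" + code if prompts else code)
        else:
            # Markdown, raw and (v3) heading cells become comment blocks.
            chunks.append(_comment(_text(cell.get("source", ""))))
    return "\n\n\n".join(chunks) + "\n"
```

Running this over every file in examples/notebooks would mimic the batch command in item 2; in practice the PR uses nbconvert itself.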


  • Putting half-baked examples inside a "sandbox" folder marks them as such, thereby reducing potential for confusion.
  • Using IPython to convert notebooks into scripts allows us to:
    • Keep both scripts and notebooks in the repo
    • Keep notebooks and scripts in sync at essentially zero maintenance cost
  • Looks cleaner
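Whether the two folders really stay in sync can be checked mechanically. A minimal sketch, assuming one generated script per notebook with the same file stem, in the examples/notebooks and examples/python folders listed above; the function names are hypothetical.

```python
from pathlib import Path


def unpaired(notebook_files, script_files):
    """Compare by file stem: return (notebooks lacking a script, scripts lacking a notebook)."""
    notebooks = {Path(name).stem for name in notebook_files}
    scripts = {Path(name).stem for name in script_files}
    return sorted(notebooks - scripts), sorted(scripts - notebooks)


def check_examples(root="examples"):
    """Apply unpaired() to the notebooks/ and python/ subfolders of root."""
    root = Path(root)
    return unpaired((root / "notebooks").glob("*.ipynb"),
                    (root / "python").glob("*.py"))
```

Two empty lists from check_examples() would mean every notebook has a generated script and vice versa.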

Note that I kept ALL the old code. This is just re-org, no deletion.



Coverage remained the same when pulling 6dfb6c5 on vincentarelbundock:master into f46421a on statsmodels:master.


Did you move the files before overwriting them with nbconvert? Some py files have a change history, but most show up as new files.

I don't much like losing the hand-formatted py files. However, this makes maintenance much easier.
I do use the py example files pretty often to base scripts on them when I need a quick example.

Browsing them a bit, the only issue I have seen so far is that we should delete the In[] line markers:

+# In[ ]:

Maybe we can remove them with a global replace.
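A minimal sketch of such a global replace, assuming the markers always appear as whole lines like the "+# In[ ]:" diff line above (without the diff's leading "+"), optionally followed by one blank line. The function name is hypothetical; matching the exact line form keeps ordinary comments untouched.

```python
import re

# A whole prompt-marker line such as "# In[ ]:" or "# Out[3]:",
# plus at most one blank line directly after it.
_PROMPT = re.compile(r"^# (?:In|Out)\[[^\]]*\]:[ \t]*\n(?:[ \t]*\n)?", re.MULTILINE)


def strip_prompt_markers(source):
    """Remove IPython prompt comments from an exported script."""
    return _PROMPT.sub("", source)
```

To apply it in place, read each examples/python/*.py file with pathlib.Path.read_text, pass the text through strip_prompt_markers, and write it back with write_text.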

A question on nbconvert:
Does ipython nbconvert --to python * create the same Python files as downloading the notebook as a Python file directly from a notebook session?
If we maintain them as autoconverted files, it would be good to avoid noisy, irrelevant changes in the file formatting.

Calling the folder sandbox is a bit misleading, since my first association was that these are examples for code that is in the sandbox. Maybe we should call the folder dirty in the hope that someone cleans them up. :)

I think we need to review the examples again and add new ones for topics that are not yet covered.
There are some new examples in the statsmodels examples folders, and some topics like GEE are still missing example scripts.

  • I removed the files before writing the new ones. Since the whole point is to make maintenance easy by never touching the .py, it made sense to start fresh.
  • I think the loss of continuity in history is a small price to pay for maintainability and convenience. We can still access the example history; we just need to know to look at a different file for anything before today.
  • How about "incomplete" instead of "dirty"? Not quite accurate, but sounds more professional.
  • The proper way to remove In[]/Out[] lines would be to customize the Jinja2 template that IPython uses to export from notebook to .py.
    • Those are included by design, to distinguish executed input cells from their output, but they are not really useful in our case, where we only store "un-executed" examples.
    • I think a custom Jinja2 template would be easy to do, but it can probably go on my to-do list for future improvements.
  • I agree on the need to review. This PR actually fixes a few minor things, like df -> data keywords in various formula calls. More examples are needed, and so is more review.

The proper way to remove In[]/Out[] lines would be to customize the Jinja2 template that IPython uses to export from notebook to .py.

Would it be possible to add this in a generic way, or would every developer have to change their Jinja2 templates to regenerate the py files?
It looks complicated to me to make this reproducible across developers.

Can we clean some of it up before we commit the .py files to master?


Done in vincentarelbundock/statsmodels@0ac8144

Not complicated at all. Just call:

ipython nbconvert --to python *.ipynb --template notebook2python

Using this template that I just put together (named notebook2python.tpl):

{%- extends 'null.tpl' -%} 

{% block input %}
{{ cell.input | ipython2python }}
{% endblock input %}

{# Those two are for error displaying;
even if the first one seems to do nothing,
it introduces a new line #}
{% block pyerr %}
{{ super() }}
{% endblock pyerr %}

{% block traceback_line %}
{{ line | indent | strip_ansi }}
{% endblock traceback_line %}
{# .... #}

{% block pyout %}
{{ output.text | indent | comment_lines }}
{% endblock pyout %}

{% block stream %}
{{ output.text | indent | comment_lines }}
{% endblock stream %}

{% block display_data scoped %}
# image file:
{% endblock display_data %}

{% block markdowncell scoped %}
{{ cell.source | comment_lines }}
{% endblock markdowncell %}

{% block headingcell scoped %}
{{ '#' * cell.level }}{{ cell.source | replace('\n', ' ') | comment_lines }}
{% endblock headingcell %}

{% block rawcell scoped %}
{{ cell.source | comment_lines }}
{% endblock rawcell %}

{% block unknowncell scoped %}
unknown type  {{ cell.type }}
{% endblock unknowncell %}

Can you also commit notebook2python.tpl? I didn't see it.

There might still be a few extra blank lines in the py files, but I think they look good overall. They no longer look obviously autogenerated.
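If the leftover blank lines matter, a small post-processing pass could cap them. This is a sketch, not part of the PR; the limit of two blank lines is an assumption chosen to match the PEP 8 convention for spacing between top-level definitions, and the function name is hypothetical.

```python
import re


def collapse_blank_lines(source, max_blank=2):
    """Replace runs of more than max_blank empty lines with exactly max_blank."""
    # k blank lines between two text lines are k + 1 consecutive newlines;
    # lines containing only spaces or tabs also count as blank.
    pattern = r"\n(?:[ \t]*\n){%d,}" % (max_blank + 1)
    return re.sub(pattern, "\n" * (max_blank + 1), source)
```

Runs of up to max_blank blank lines are left exactly as they are, so separators that are already PEP 8 compliant do not change.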

Thanks, Vincent.

vincentarelbundock merged commit 3396b98 into statsmodels:master on Jan 6, 2014

The merge looks fine, but you didn't branch off master.
According to the network graph, you started from the silverman branch, but it looks like that didn't confuse git.



I did branch off master, but I think rebasing and squashing (interactively, with -i) tacked my commits onto the silverman branch. I need to get a real understanding of how rebasing works.

I'll be more careful next time.

jseabold commented Jan 8, 2014

It looks like this broke the automatic generation of these into the docs.

jseabold commented Jan 9, 2014

I hope the fix is simple, but I don't have time to go through all of this right now. Should I revert this merge until the docs build is fixed, so that we have examples available online?


I'll take a look to see if it's an easy fix.


What I tested:

  1. clone master
  2. edit line 9 of tools/ to hard-code the location of my local statsmodels clone (instead of Skipper's)
  3. cd statsmodels/docs
  4. make html

The result includes a proper TOC with all the rendered html examples. Where should I look for the autobuild code?

I uploaded a live version of the site here:

And you can download a zipped version here if it's more convenient:

jseabold commented Jan 9, 2014

What do you have to change line 9 to?

SOURCE_DIR = ("/home/skipper/statsmodels/statsmodels-skipper/examples/"


SOURCE_DIR = ("/Users/vincent/Downloads/statsmodels/examples"
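Both SOURCE_DIR values above are absolute, machine-specific paths, which is why step 2 of the test required editing the file. One way around that, sketched here and not proposed in the thread, is to let an environment variable override the hard-coded default. The variable name STATSMODELS_EXAMPLES_DIR and the helper resolve_source_dir are hypothetical.

```python
import os


def resolve_source_dir(default, env_var="STATSMODELS_EXAMPLES_DIR"):
    """Return the examples directory, preferring an environment override.

    env_var is a hypothetical name; default is whatever path the build
    script currently hard-codes.
    """
    path = os.environ.get(env_var) or default
    # Expand "~" and drop trailing slashes (the two SOURCE_DIR values above
    # differ in whether they end with "/").
    return os.path.normpath(os.path.expanduser(path))


# Hypothetical replacement for the hard-coded line 9:
# SOURCE_DIR = resolve_source_dir("/home/skipper/statsmodels/statsmodels-skipper/examples/")
```

Each developer could then export the variable once instead of editing the build script before every docs build.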
jseabold commented Jan 9, 2014

So they're still in examples/notebooks? Must be a local build problem then. Sorry for the noise, I didn't look at the changes yet.


Yep, the notebooks themselves have not moved at all. Only the python scripts have been moved. And I commented out the EXAMPLEBUILD calls for those example scripts in the Makefile: 3396b98

jseabold commented Jan 9, 2014

Ok, I'll fix the build box and then have a look.
