DOC: cleanup examples folder and webpage #1292

Merged
merged 5 commits into from Jan 6, 2014


@vincentarelbundock
Member

I couldn't make the revision history readable, so I just squashed it. I think it's easier to see what I did by looking at the repo rather than the diffs.

https://github.com/vincentarelbundock/statsmodels/tree/master/examples

Basically, I re-organized the examples into 3 subfolders:

  1. examples/notebooks: This is where real work happens for "website-ready" examples
  2. examples/python: I replaced the old python scripts with new ones that were generated by issuing:
    • ipython nbconvert --to python *
    • This is done at commit time (not when the website is built)
  3. examples/sandbox: Example scripts that don't have an accompanying notebook, and that are mostly useful to developers and adventurous users.

Advantages:

  • Putting half-baked examples inside a "sandbox" folder marks them as such, thereby reducing potential for confusion.
  • Using IPython to convert notebooks into scripts allows us to:
    • Keep both scripts and notebooks in the repo
    • Keep notebooks and scripts in sync at essentially zero maintenance cost
  • Looks cleaner
@vincentarelbundock
Member

Note that I kept ALL the old code. This is just re-org, no deletion.

@coveralls

Coverage Status

Coverage remained the same when pulling 6dfb6c5 on vincentarelbundock:master into f46421a on statsmodels:master.

@josef-pkt
Member

did you move the files before overwriting them with nbconvert? Some py files have a change history, most show up as new files.

I don't much like losing the hand-formatted py files. However, this makes maintenance much easier.
I do use the py example files pretty often to base scripts on them when I need a quick example.

Browsing them a bit, the only issue I've seen so far is that we should delete the In line markers

+
+# In[ ]:
+

maybe we can remove them with a global replace.
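
The global replace suggested here could be sketched roughly as follows. This is only an illustration, not something from the PR: the regular expression, the `strip_cell_markers` name, and the assumption that markers may also carry a number (as in `# In[12]:`) are all hypothetical, based solely on the `# In[ ]:` line shown above.

```python
import re

# Hypothetical pattern: a marker such as "# In[ ]:" or "# In[12]:" alone on a
# line, plus one following blank line if there is one.
CELL_MARKER = re.compile(r"^# In\[[ \d]*\]:[ \t]*\n(?:[ \t]*\n)?", re.MULTILINE)


def strip_cell_markers(source):
    """Return source with the "# In[ ]:" marker lines removed."""
    return CELL_MARKER.sub("", source)


# Small in-memory example; a real run would read and rewrite each .py file.
example = "import numpy as np\n\n# In[ ]:\n\nx = np.arange(3)\n"
cleaned = strip_cell_markers(example)
```

Applied to every file under examples/python, this would have to be re-run after each conversion, which is one argument for fixing the export itself instead.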

A question on nbconvert:
Does ipython nbconvert --to python * create the same python files as using "download as python file" directly in a notebook session?
If we maintain them as autoconverted, then it would be good if there are no noisy, irrelevant changes in the file formatting.

calling the folder sandbox is a bit misleading, since my first association was that these are examples for code that is in the sandbox. Maybe we should call the folder dirty in the hope that someone cleans them up. :)

Related:
I think we need a review of the examples again, and add new ones for topics that are not yet covered.
There are some new examples in the statsmodels examples folders, and some topics like GEE are still missing example scripts.

@vincentarelbundock
Member
  • I removed the files before writing the new ones. Since the whole point is to make maintenance easy by never touching the .py, it made sense to start fresh.
  • I think the loss of continuity in history is a small price to pay for maintainability and convenience. We can still access the example history, we just need to know to look at a different file before today.
  • How about "incomplete" instead of "dirty"? Not quite accurate, but sounds more professional.
  • The proper way to remove In[]/Out[] lines would be to customize the Jinja2 template that IPython uses to export from notebook to .py.
    • Those were included by design, to allow us to distinguish between cells that have executed and those that represent output code, but they are not really useful in our case, where we only store "un-executed" examples.
    • I think a custom Jinja2 template would be easy to do, but it can probably go on my todo list for future improvements
  • I agree on the need to review. This PR actually fixes a few minor things, like df -> data keywords in various formula calls. More examples are needed and review too.
@josef-pkt
Member

The proper way to remove In[]/Out[] lines would be to customize the Jinja2 template that IPython uses to export from notebook to .py.

Would it be possible to add this in a generic way, or would all developers have to change their own Jinja2 templates to regenerate the py files?
It looks complicated to me to make this reproducible across developers.

Can we clean some of it up before we commit the .py files to master?

@vincentarelbundock
Member

Done in vincentarelbundock/statsmodels@0ac8144

Not complicated at all. Just call:

ipython nbconvert --to python *.ipynb --template notebook2python

Using this template that I just put together (named notebook2python.tpl):

{%- extends 'null.tpl' -%} 

{% block input %}
{{ cell.input | ipython2python }}
{% endblock input %}

{# Those Two are for error displaying
even if the first one seem to do nothing, 
it introduces a new line
#}
{% block pyerr %}
{{ super() }}
{% endblock pyerr %}

{% block traceback_line %}
{{ line | indent | strip_ansi }}
{% endblock traceback_line %}
{# .... #}

{% block pyout %}
{{ output.text | indent | comment_lines }}
{% endblock pyout %}

{% block stream %}
{{ output.text | indent | comment_lines }}
{% endblock stream %}

{% block display_data scoped %}
# image file:
{% endblock display_data %}

{% block markdowncell scoped %}
{{ cell.source | comment_lines }}
{% endblock markdowncell %}

{% block headingcell scoped %}
{{ '#' * cell.level }}{{ cell.source | replace('\n', ' ') | comment_lines }}
{% endblock headingcell %}

{% block rawcell scoped %}
{{ cell.source | comment_lines }}
{% endblock rawcell %}

{% block unknowncell scoped %}
unknown type  {{ cell.type }}
{% endblock unknowncell %}
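
To keep the conversion reproducible across developers, the command above could be wrapped in a small script. The following is a minimal sketch and not part of the PR: the function names are hypothetical, and it assumes that `ipython` is on the PATH and that nbconvert can find notebook2python.tpl from the notebook directory.

```python
import glob
import os
import subprocess


def build_nbconvert_command(notebook, template="notebook2python"):
    """Return the nbconvert command quoted above as an argument list."""
    return ["ipython", "nbconvert", "--to", "python", notebook,
            "--template", template]


def convert_all(notebook_dir):
    """Convert every .ipynb file in notebook_dir, one at a time.

    Assumes ipython is installed and the template is discoverable from
    notebook_dir; neither is checked here.
    """
    pattern = os.path.join(notebook_dir, "*.ipynb")
    for notebook in sorted(glob.glob(pattern)):
        command = build_nbconvert_command(os.path.basename(notebook))
        subprocess.check_call(command, cwd=notebook_dir)
```

A developer would then call something like convert_all("examples/notebooks") before committing, so that everyone regenerates the scripts with the same template.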
@josef-pkt
Member

can you also commit the notebook2python.tpl? I didn't see it.

There might still be a few extra blank lines in the py files. But I think they look good overall. They no longer look autogenerated.

Thanks Vincent

@vincentarelbundock vincentarelbundock merged commit 3396b98 into statsmodels:master Jan 6, 2014
@josef-pkt
Member

merge looks fine, but you didn't branch off master
according to the network you started from the silverman branch, but it looks like it didn't confuse git.

@vincentarelbundock
Member

Huh?

I did branch off master, but I think it tacked me onto the silverman branch when I rebased and squashed (interactively using -i). I need to get a real understanding of how rebasing works.

I'll be more careful next time.

@jseabold
Member
jseabold commented Jan 8, 2014

It looks like this broke the automatic generation of these into the docs.

http://statsmodels.sourceforge.net/devel/examples/index.html

@jseabold
Member
jseabold commented Jan 9, 2014

I hope the fix is simple, but I don't have time to go through all of this right now. Should I revert this merge until the docs build is fixed, so that we have examples available online?

@josef-pkt
Member

I'll take a look to see if it's an easy fix.

@vincentarelbundock
Member

What I tested:

  1. clone master
  2. edit line 9 of tools/nbgenerate.py to hard-code the location of my local statsmodels clone (instead of Skipper's)
  3. cd statsmodels/docs
  4. make html

The result includes a proper TOC with all the rendered html examples. Where should I look for the autobuild code?

I uploaded a live version of the site here: http://umich.edu/~varel/sm

And you can download a zipped version here if it's more convenient:

http://umich.edu/~varel/sm.zip

@jseabold
Member
jseabold commented Jan 9, 2014

What do you have to change line 9 to?

@vincentarelbundock
Member
SOURCE_DIR = ("/home/skipper/statsmodels/statsmodels-skipper/examples/"

becomes

SOURCE_DIR = ("/Users/vincent/Downloads/statsmodels/examples"
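
One way to avoid editing this line on every machine would be to derive the path from the script's own location. This is only a sketch under assumptions: `find_examples_dir` is a hypothetical helper, it assumes the script sits in the repository's tools/ folder next to examples/, and the rest of the original SOURCE_DIR value is not shown here, so any remaining path components would still need to be appended.

```python
import os


def find_examples_dir(script_path):
    """Return <repo>/examples, assuming script_path is <repo>/tools/<script>.py."""
    tools_dir = os.path.dirname(os.path.abspath(script_path))
    repo_root = os.path.dirname(tools_dir)
    return os.path.join(repo_root, "examples")


# Inside tools/nbgenerate.py the call would be find_examples_dir(__file__);
# a literal path is used here so the sketch runs on its own.
example = find_examples_dir(os.path.join("statsmodels", "tools", "nbgenerate.py"))
```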
@jseabold
Member
jseabold commented Jan 9, 2014

So they're still in examples/notebooks? Must be a local build problem then. Sorry for the noise, I didn't look at the changes yet.

@vincentarelbundock
Member

Yep, the notebooks themselves have not moved at all. Only the python scripts have been moved. And I commented out the EXAMPLEBUILD calls for those example scripts in the Makefile: 3396b98

@jseabold
Member
jseabold commented Jan 9, 2014

Ok, I'll fix the build box and then have a look.
