New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: Allow sorting subsection files #281

Merged
merged 8 commits into from Aug 6, 2017

Conversation

Projects
None yet
4 participants
@larsoner
Contributor

larsoner commented Aug 2, 2017

Closes #275.
Closes #278. (Take over of this PR.)

This adds sort keys for sorting within subsections. I don't need explicit reordering currently (filename-based is good enough for now) so I didn't bother implementing it for now. But it at least sets up the framework for it later.

WIP for now because I'm not updating the doc until #279 is merged and I can rebase to avoid conflicts.

@larsoner larsoner changed the title from Sortkeyex to WIP: Allow sorting subsection files Aug 2, 2017

@larsoner larsoner force-pushed the larsoner:sortkeyex branch from 4d9bdc9 to d7e00c2 Aug 2, 2017

@larsoner larsoner referenced this pull request Aug 2, 2017

Merged

MRG: Improve doc #205

2 of 2 tasks complete

@larsoner larsoner force-pushed the larsoner:sortkeyex branch from d7e00c2 to c51c93e Aug 4, 2017

@larsoner larsoner changed the title from WIP: Allow sorting subsection files to MRG: Allow sorting subsection files Aug 4, 2017

@larsoner

This comment has been minimized.

Contributor

larsoner commented Aug 4, 2017

Rebased, docs added, and CIs are happy. Ready for review/merge from my end.

@larsoner larsoner referenced this pull request Aug 4, 2017

Merged

MRG+4: Epochs metadata #4414

2 of 2 tasks complete
@choldgraf

In general I think this looks great - I have a few little comments and one less-little one about assuming where the example titles are in the .py files...other than this it LGTM

by the amount of code, i.e. :class:`sphinx_gallery.sorting.AmountOfCodeSortKey`
as::
from sphinx_gallery.sorting import AmountOfCodeSortKey

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

Why not NumberOfLines sortkey? I know that's kind of a bikeshed but I think it's more explicit

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

It's not just number of lines but more number of code characters IIUC

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

Sorry, number of code lines rather (which you note below)

Sorting gallery examples
========================
Within a given gallery (sub)section, the example files by default are ordered

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

Do we have any short description in the docs that just explains that ordering in general is handled by "sorting keys" which are a way of defining a custom sorting behavior with the sorted function? Maybe that's too much info but I was confused the first time I saw sort keys being used.

``within_subsection_order``:
- :class:`sphinx_gallery.sorting.FileSizeSortKey` to sort by file size.
- :class:`sphinx_gallery.sorting.FileNameSortKey` to sort by file name.

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

just a thought on ExplicitFilenameSortKey (which I know you aren't implementing here): I think it would get really confusing if we implemented this, since right now doing explicit sorting for folders requres that all folders are listed. We couldn't do the same thing for explicit filenames because there would be too many IMO

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

Yeah we'll have to think harder about that one, but I don't think that the current API will make the problem harder necessarily

listdir = [fname for fname in os.listdir(src_dir)
if fname.endswith('.py')]
sorted_listdir = sorted(
listdir, key=gallery_conf['within_subsection_order'](src_dir))

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

I think we should have a check somewhere that sees if one of the allowed keys are given, and if not (e.g., if somebody use some other wonky object) we'd throw an error that says "this must be one of XXX sortkeys"

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

But people in principle can use their own

@@ -385,29 +387,25 @@ def generate_dir_rst(src_dir, target_dir, gallery_conf, seen_backrefs):
'Generating gallery for %s ' % build_target_dir,
length=len(sorted_listdir))
for fname in iterator:
intro, amount_of_code, time_elapsed = generate_file_rst(
intro, time_elapsed = generate_file_rst(

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

I like that this is being factored out

def __call__(self, filename):
src_file = os.path.normpath(os.path.join(self.src_dir, filename))
file_conf, script_blocks = split_code_and_text_blocks(src_file)

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

ah I see that it isn't strictly NumberOfLines per-se, since you're only counting the code lines. I think it's fine to keep it AmountOfCode in that case, but maybe clarify in the docs that this will count only code lines and not all the lines in the .py files

The source directory.
"""
def __call__(self, filename):

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

I think this will break for many people's sphinx-gallery setups, as there are a few projects (ahem, MNE-python being one of them) that put sphinx tags just above their titles. I feel like I opened an issue about this a while back but the dev team wasn't keen on supporting that use-case...

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

Ah yes I should probably parse the title however it's found and parsed elsewhere (and refactor to DRY)

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

Actually this already uses split_code_and_text_blocks which calls get_docstring_and_rest, and script_blocks[0] is the docstring, which is already cleaned. I even added this to the code:

        assert script_blocks[0][1].strip().split('\n')[0][0] == '='

And it passed, so we should be okay here.

_check_order(config_app, 'filesize')
@pytest.mark.conf_file(content="""

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

These tests feel like magic to me

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

The conf stuff? Agreed but I copy pasted (then mostly simplified) from earlier code so I assumed thus was best practice for the repo

@larsoner

This comment has been minimized.

Contributor

larsoner commented Aug 5, 2017

Okay @choldgraf I changed the name to NumberOfCodeLinesSortKey, added a brief bit to get people comfortable with sorted / key, and added some junk comment lines at the top of one the plot_1.py test file to ensure the title-ordering doesn't care about code comments. Updated rendered doc is here:

https://170-25860190-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/rtd_html/advanced_configuration.html#sorting-gallery-examples

@@ -1,7 +1,9 @@
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

no no - it's not encoding lines, it's sphinx references. People put stuff like

.. _my_ref:

My title
--------

which you're technically not needed to do since SG auto generates references, but people do it anyway :-)

This comment has been minimized.

@larsoner

larsoner Aug 5, 2017

Contributor

Ahh okay, I'll probably have to do something smarter then, I'll look into it more.

}
However, other options exist that can be instantiated for use with
However, multiple convenience classes are provided for use with

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

If you're going to make edits can you make this "In addition" instead of "However"?

If there aren't any more commits needed then I can edit and then merge (but see my other comment)

@larsoner larsoner force-pushed the larsoner:sortkeyex branch from 4231956 to 933d51c Aug 5, 2017

@@ -153,12 +153,15 @@ def codestr2rst(codestr, lang='python', lineno=None):
return code_directive + indented_block
def extract_intro(filename, docstring):
def extract_intro_title(filename, docstring):

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

this naming makes me thing that it extracts just the title of the introduction. could we make it extract_intro_and_title?

""" Extract the first paragraph of module-level docstring. max:95 char"""
# lstrip is just in case docstring has a '\n\n' at the beginning
paragraphs = docstring.lstrip().split('\n\n')
paragraphs = [p for p in paragraphs if not p.startswith('..')]

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

I think this is going to fix some other bugs in the process 🎉

if len(paragraphs) > 1:
title = paragraphs[0].split('\n')
title = ' '.join(t for t in title if t[0] not in ['=', '-', '^', '~'])

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

Maybe this could be a list constant somewhere? RST_TITLE_CHARACTERS?

@@ -172,6 +172,7 @@ def _check_order(config_app, key):
regex = '.*:%s=(.):.*' % key
with codecs.open(index_fname, 'r', 'utf-8') as fid:
for line in fid:
print(line)

This comment has been minimized.

@choldgraf

choldgraf Aug 5, 2017

Contributor

just making sure this is intentional and not for debugging

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented Aug 5, 2017

In general this looks great - a few small comments from me then LGTM

@larsoner

This comment has been minimized.

Contributor

larsoner commented Aug 5, 2017

@choldgraf done, I made the check for RST title chars better based on their official description and renamed the function.

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented Aug 5, 2017

nice - so it sounds like this will work if you either put === just after, or both before and after the title, yeah?

LGTM...I'll give it 24 hrs and merge if nobody else brings anything up. That OK w/ you?

@larsoner

This comment has been minimized.

Contributor

larsoner commented Aug 5, 2017

nice - so it sounds like this will work if you either put === just after, or both before and after the title, yeah?

Yeah and RST supports both so we should be good there.

Ready for merge from my end.

@choldgraf choldgraf merged commit 7263a18 into sphinx-gallery:master Aug 6, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@choldgraf

This comment has been minimized.

Contributor

choldgraf commented Aug 6, 2017

🎉 🎉 🎉

@lesteve

This comment has been minimized.

Contributor

lesteve commented Aug 7, 2017

Sorry I completely missed this one. I think this is great stuff. Quickly looking at the diff, I have the following comments though:

I think we should have an instance (rather than a class) as the value associated with the within_subsection_order to be consistent with subsection_order. Another strong reason is that some people may want to control the order explicity (and I am hoping ExplicitOrder may be used for this).

Another consistency problem, it would be good to have a consistent naming for sorting keys (i.e. finishing with Order or SortKey).

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented Aug 7, 2017

I think we should have an instance (rather than a class) as the value

I think this is a good point. No harm in having people give NumberOfLinesSortKey() and I agree it keeps things more standard. Would we need to change the code at all or would the sorting behavior stay the same regardless of whether its an instance or a class?

it would be good to have a consistent naming for sorting keys

+1, either is fine with me but maybe I have a small preference for Order. So we do ExplicitOrder, NumberOfLinesOrder, AmountOfCodeOrder etc. @Eric89GXL do you have thoughts?

@larsoner

This comment has been minimized.

Contributor

larsoner commented Aug 7, 2017

I think we should have an instance (rather than a class) as the value

What eventually gets called is e.g. instance = FileNameSortKey(src_dir); instance(filename). We could pass an instance, but then __call__ would need to take src_dir as an arg, and return something (a function?) that would then operate on fname. Is this preferred?

This is also why the subsection ordering and within-subsection ordering need to be kept separate -- the former operates on effectively just ExplictOrder(order).__call__(fname) and the other on what would need to be ExplicitExampleOrder(order)(src_dir)(fname). So I'm not sure ExplicitOrder can be used as a key using the existing interface.

I used the name SortKey to make it clear that it will be used as a key argument to the sorted function. Until just now, I actually didn't know if ExplictOrder needed to return ints (i.e., the index in the order), or if it also used as a sort key (i.e., just needed to provide anything that could be ordered by sort) when I looked deeper into the code. I'm fine with *Order, too, but we should document in the code that it's really just used as a sort key under the hood so the next dev knows immediately upon looking at the class.

@lesteve

This comment has been minimized.

Contributor

lesteve commented Aug 8, 2017

This is also why the subsection ordering and within-subsection ordering need to be kept separate -- the former operates on effectively just ExplictOrder(order).__call__(fname) and the other on what would need to be ExplicitExampleOrder(order)(src_dir)(fname). So I'm not sure ExplicitOrder can be used as a key using the existing interface.

The hope when we were discussing the design was that they would be exactly the same. Is there a good reason the within_subsection_order sort key not be called on a list of the fully qualified filenames inside each src_dir?

Personally I am fine with SortKey because I know what a sort key is. The argument was that a significant chunk of our users may not have heard of a sort key. In the end I think that's why Order was used.

@larsoner

This comment has been minimized.

Contributor

larsoner commented Aug 8, 2017

The hope when we were discussing the design was that they would be exactly the same. Is there a good reason the within_subsection_order sort key not be called on a list of the fully qualified filenames inside each src_dir?

It is eventually called that way -- any class for within_subsection_order will effectively operate as Class(my_order)(src_dir)(fname). Any class for subsection_order, by contrast, will effectively need to operate as Class(my_order)(fname). How should these be unified?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment