Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEA Add examples recommender system #1125

Merged
merged 97 commits into from Nov 17, 2023

Conversation

ArturoAmorQ
Copy link
Contributor

@ArturoAmorQ ArturoAmorQ commented Apr 5, 2023

Fixes #1081

For a proof of concept using the scikit-learn library navigate through:
https://output.circle-artifacts.com/output/job/22589aa6-cfe8-4daf-a9c0-b5abf94098b6/artifacts/0/doc/auto_examples/index.html

This PR is ready for review.

@GaelVaroquaux
Copy link
Contributor

This is exciting. It works: https://output.circle-artifacts.com/output/job/24f2114f-3815-45f3-8e2e-aec20d01b25a/artifacts/0/rtd_html/auto_examples/plot_0_sin.html#sphx-glr-auto-examples-plot-0-sin-py .

If at some point you can test a documentation with more examples such as matplotlib or scikit-learn's, it would be very useful to get a feeling of how well it works.

@larsoner
Copy link
Contributor

larsoner commented Apr 5, 2023

If at some point you can test a documentation with more examples such as matplotlib or scikit-learn's, it would be very useful to get a feeling of how well it works.

I'll take the bait :) Tested for MNE-Python (192 examples) in mne-tools/mne-python#11619. Spot checking I see that:

So it seems to work well!

FYI I do get some warnings like:

auto_examples/time_frequency/time_frequency_simulated.recommendations:3: CRITICAL: Title level inconsistent:

More related examples
^^^^^^^^^^^^^^^^^^^^^

@GaelVaroquaux
Copy link
Contributor

Thanks a lot, @larsoner ! I don't know MNE-Python that well. At a quick glance it seems that the recommendations are imperfect but definitely useful!

@ArturoAmorQ a couple of quick comments:

  • I would use as a section title "Related examples", instead of "More related examples".

  • You might consider using the rubric directive (https://docutils.sourceforge.io/docs/ref/rst/directives.html#rubric ) instead of a title. The reason is related to the warning reported by Eric above: it's impossible to get the title level consistent as the meaning of the type of underline depends on the order in which the underlines appear.

  • I think that controlling the number of related examples display might be interesting

  • Also, I was wondering: I notice that as you use scipy.sparse matrices, this feature will require the have scipy. That's a bit of a heavy dependency. Do you think that using plain numpy arrays is feasible? It will increase the memory cost, of course, but I wonder if this will lead to actually large memory costs

  • If the recommendation code comes with additional dependencies, we need to make sure that if the feature is not enabled in the configuration file, the corresponding code is not imported (to avoid triggering those additional dependencies)

Copy link
Contributor

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments regarding some clean up

sphinx_gallery/gen_gallery.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_gallery.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_gallery.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_gallery.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_gallery.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_gallery.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_rst.py Outdated Show resolved Hide resolved
sphinx_gallery/gen_rst.py Outdated Show resolved Hide resolved
sphinx_gallery/recommender.py Outdated Show resolved Hide resolved
sphinx_gallery/recommender.py Show resolved Hide resolved
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@lucyleeow
Copy link
Contributor

@ArturoAmorQ let me know when this is ready to go in!

@ArturoAmorQ
Copy link
Contributor Author

@ArturoAmorQ let me know when this is ready to go in!

@lucyleeow On my side it's good to go :D

Copy link
Contributor

@lucyleeow lucyleeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your work! Sorry, last nitpicks I think.

sphinx_gallery/tests/test_recommender.py Outdated Show resolved Hide resolved
sphinx_gallery/recommender.py Outdated Show resolved Hide resolved
sphinx_gallery/recommender.py Outdated Show resolved Hide resolved
sphinx_gallery/recommender.py Outdated Show resolved Hide resolved
@@ -1466,6 +1470,11 @@ def save_rst_example(

example_rst += CODE_DOWNLOAD.format(example_file.name, language)

if gallery_conf["recommender"]["enable"]:
# extract the filename without the extension
recommend_fname = os.path.splitext(os.path.split(example_fname)[1])[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be nicer if we use Path instead?

return app


def test_recommend_n_examples(sphinx_app):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sorry, I should have been clearer, this test can stay in test_full.py. Since it uses sphinx_app it may as well stay in test_full.py, I was meaning more the other 2 tests, which don't need spinx_app.

Also thank you for separating test_example_recommender_methods to it's own unit test.

@ArturoAmorQ
Copy link
Contributor Author

I hope that all the comments have been addressed. Please let me know otherwise.

Copy link
Contributor

@lucyleeow lucyleeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sorry, just thought of 2 more things. Thanks for your patience. This is the last thing!

the very rare/very common words. This may improve the recommendations quality,
but more importantly, it spares some computation resources that would be wasted
on non-informative tokens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is already implied above but could we explicitly say that only the examples within a single gallery (and it's sub galleries) are used for computing closest examples.

Also this is probably obvious but can we add that only recommendations for .py files will be generated.

n_examples = sphinx_app.config.sphinx_gallery_conf["recommender"]["n_examples"]

assert '<p class="rubric">Related examples</p>' in html
assert count == n_examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding a check that the same 3 related examples are found?

@lucyleeow
Copy link
Contributor

Note CI failure due to: scipy/docs.scipy.org#80 (we use 'http://docs.scipy.org/doc/scipy/wrong_url' in our tinybuild)

Will merge once green, just in case.

@ArturoAmorQ I've made the last small changes.

@lucyleeow lucyleeow enabled auto-merge (squash) November 16, 2023 02:14
@ArturoAmorQ
Copy link
Contributor Author

I've made the last small changes.

Thanks @lucyleeow :)

@lucyleeow lucyleeow merged commit 13b9bef into sphinx-gallery:master Nov 17, 2023
19 checks passed
@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented Nov 17, 2023 via email

@larsoner
Copy link
Contributor

Would be good to get this out there -- @lucyleeow do you have time to cut a release next week? If not, I can do it

@lucyleeow
Copy link
Contributor

Tentative yes. I'll ping you if not.

clrpackages pushed a commit to clearlinux-pkgs/pypi-sphinx_gallery that referenced this pull request Nov 28, 2023
… to version 0.15.0

v0.15.0
-------

Support for Python 3.7 dropped in this release. Requirement is now Python >=3.8.
Pillow added as a dependency.

**Implemented enhancements:**

-  ENH: Improve logging visibility of errors and filenames `#1225 <https://github.com/sphinx-gallery/sphinx-gallery/pull/1225>`__ (`larsoner <https://github.com/larsoner>`__)
-  ENH: Improve API usage graph `#1203 <https://github.com/sphinx-gallery/sphinx-gallery/pull/1203>`__ (`larsoner <https://github.com/larsoner>`__)
-  ENH: Always write sg_execution_times and make DataTable `#1198 <https://github.com/sphinx-gallery/sphinx-gallery/pull/1198>`__ (`larsoner <https://github.com/larsoner>`__)
-  ENH: Write all computation times `#1197 <https://github.com/sphinx-gallery/sphinx-gallery/pull/1197>`__ (`larsoner <https://github.com/larsoner>`__)
-  ENH: Support source files in any language `#1192 <https://github.com/sphinx-gallery/sphinx-gallery/pull/1192>`__ (`speth <https://github.com/speth>`__)
-  FEA Add examples recommender system `#1125 <https://github.com/sphinx-gallery/sphinx-gallery/pull/1125>`__ (`ArturoAmorQ <https://github.com/ArturoAmorQ>`__)

(NEWS truncated at 15 lines)
@story645
Copy link
Contributor

Sorry to post about it here, but I don't have folks emails. For GSOC this year, one of the proposed projects is an image based search. I think it could work as an extension of this feature/built on top if it, so I was wondering if any of the folks here had time/interest in being subject area mentors? Thanks!

@larsoner
Copy link
Contributor

Hmm... I'm not totally convinced it would be useful to other projects that use SG like scikit-learn, MNE, etc. Would it?

If it's mostly (or only) useful for matplotlib, it might be better to figure out how to interface with SG properly to insert some image-based search stuff. I could help with that at least. (Sphinx might already have the appropriate hooks to modify the reST files generated by SG for example, in which case nothing would even be needed at the SG end!)

@story645
Copy link
Contributor

story645 commented Apr 1, 2024

it might be better to figure out how to interface with SG properly to insert some image-based search

The reason I was thinking custom solution is primarily b/c my end goal is a detexify type interface & also possibly a component based search. Granted like a detexify interface could hook into an image-based search - and also yeah that might be generally useful b/c I don't think my motivation of "folks may not know what a chart is called" is necessarily specific to matplotlib.

@larsoner
Copy link
Contributor

larsoner commented Apr 1, 2024

@lucyleeow would you be interested in mentoring this potentially? You might be a better fit

@lucyleeow
Copy link
Contributor

I was thinking the scikit-learn folks would be more knowledgable about machine learning algorithms but @story645 email me, we could be able to work something out.

@lucyleeow
Copy link
Contributor

Actually do you have my email? I'll make it visible on my github account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add content-based recommendation-system for the example gallery
7 participants