Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exported modules can scrub_magics #1250

Merged
merged 5 commits into from Feb 17, 2023
Merged

Conversation

yegeniy
Copy link
Contributor

@yegeniy yegeniy commented Dec 21, 2022

For #1249.

This scrubs the cell magic lines out in the generated code, if a configuration is set. This behavior is turned off by default, but can be enable by the scrub_magics configuration in settings.ini.

Example:

For example, a cell like below could export the line "hello nbdev" into the bar module. Without including the %%spark magic line.

%%spark
#|export bar
"hello nbdev"

Use Case:

The specific use-case here is to simplify doing exploratory PySpark programming With this change, we can now use nbdev notebooks to generate valid pyspark code modules. At the same time, the Python-kernel notebook can continue to use the %%spark magic. This allows a local notebook to interact with a remote cluster running Livy.

This scrubs the cell magic lines out in the generated code. In some
sense, it is a follow-up to fastai#905.

Would y'all be interested in a PR that adds a new option to the
settings.ini config? It would scrub any `%%magics` from the exported
modules. Fairly similar to how `black_formatting` works today.

---

The specific use-case is to simplify doing exploratory programming in
PySpark. We can now use nbdev notebooks to generate valid [pyspark][0]
code modules. At the same time, the Python-kernel notebook can continue
to use the [`%%spark`][1] magic. We can use a local notebook to interact
with a remote cluster running Livy.

[0]: https://github.com/apache/spark/tree/v3.1.1-rc3/examples/src/main/python
[1]: https://github.com/jupyter-incubator/sparkmagic

---

Please note that the implementation is just copying
`processors.clean_magics`. Not sure how much of an issue that is for
this repository. Should we add a code cell like below above the
scrub_magics method?

```
\#|hide
\# FIXME: reusing processors.clean_magics runs into a circular import
\# from .processors import clean_magics
```
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -134,7 +134,26 @@
"outputs": [],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that the implementation is just copying processors.clean_magics. Not sure how much of an issue that is for this repository. Should we add a code cell like below above this export.scrub_magics method?

\#|hide

\# FIXME: reusing processors.clean_magics runs into a circular import

\# from .processors import clean_magics


Reply via ReviewNB

@yegeniy yegeniy reopened this Feb 10, 2023
@jph00 jph00 added the enhancement New feature or request label Feb 15, 2023
Copy link
Member

@jph00 jph00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for your PR! Just a couple of requests below. Also, could you add some brief docs for your new directive?

nbdev/export.py Outdated
except: pass

# %% ../nbs/api/04_export.ipynb 10
def nb_export(nbname, lib_path=None, procs=[black_format,scrub_magics], debug=False, mod_maker=ModuleMaker, name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use mutables for params: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/

Also, I don't think we want scrub_magics as a default here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think we want scrub_magics as a default here.

Agreed. scrub_magics is a no-op by default. Would be nice to avoid passing and executing the (no-op) method in altogether, by default.

Don't use mutables for params: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/

Thanks. I'll take some time to rethink how to pass scrub_magics in.
My first thought is to expose the list of procs in the nbdev_export CLI command, instead of relying on a config.

nbdev/config.py Outdated
@@ -59,6 +59,7 @@ def _apply_defaults(
language='English', # Language PyPI classifier
recursive:bool_arg=True, # Include subfolders in notebook globs?
black_formatting:bool_arg=False, # Format libraries with black?
scrub_magics:bool_arg=False, # Remove magics from exported modules?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need to add this to _nbdev_cfg_sections IIRC.

Documentation on the `scrub_magics` processor is included just below its
associated code. None of the existing tutorials or explanations
notebooks seemed appropriate for this small contribution.

The scrub_magics configuration was removed because the processor is no
longer always executed. However, the existing behavior of always
executing the black_format processor is retained, for compatibility
purposes.

    usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs PROCS]

    Export notebooks in `path` to Python modules

    optional arguments:
    ...
      --procs PROCS                    whitespace delimited tokens of processors to use. (default: black_format)

Thank you for the feedback on 9366c7f @jph00.

Quick and hopefully unimportant question: You mentioned that this is
a "new directive". Should we refer to these as directives instead of as
processors? I didn't want to muddy up the diff with the refactoring.
The terms do seem similar, but I do worry that calling black_format and
scrub_magics export processors might confuse people who can't find them
under the `10_processors.ipynb` notebook. It certainly confused me at
first.
Copy link
Contributor Author

@yegeniy yegeniy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for the feedback on 9366c7f @jph00. I left some inline-comments below.

Documentation on the scrub_magics processor is included just below its
associated code. None of the existing tutorials or explanations
notebooks seemed appropriate for this small contribution.

The scrub_magics configuration was removed because the processor is no
longer always executed. However, the existing behavior of always
executing the black_format processor is retained, for compatibility
purposes.

usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs PROCS]
Export notebooks in `path` to Python modules
optional arguments:
...
  --procs PROCS                    whitespace delimited tokens of processors to use. (default: black_format)

Quick and hopefully unimportant question: You mentioned that this is
a "new directive". Should we refer to these as directives instead of as
processors? I didn't want to muddy up the diff with the refactoring.
The terms do seem similar, but I do worry that calling black_format and
scrub_magics export processors might confuse people who can't find them
under the 10_processors.ipynb notebook. I think I recall it confused me at
first.

**kwargs):
"Export notebooks in `path` to Python modules"
if os.environ.get('IN_TEST',0): return
if procs: procs = [getattr(nbdev.export, p) for p in procs.split(" ") if p in optional_procs()]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling getattr like this requires import nbdev.export (or an alternate solution). I placed the import just above this method. But I suspect we want all imports at the top, please just let me know.

Because it's being imported for a relatively non-standard reason, it seemed convenient to have it near the place it's called. I wanted to avoid calling the import within the function, though that may make its intention even clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the import within the function sounds like a good solution to me. There's no particular downsides to doing that - Python automatically caches the result so there's no significant overhead there, if that's what you were worried about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! My only concerns was w.r.t. aesthetics. I'll make that change and resolve the conversation thread.

@call_parse
@delegates(nbglob_cli)
def nbdev_export(
path:str=None, # Path or filename
procs:str="black_format", # whitespace delimited tokens of processors to use.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find comma-separated-values to be more convenient when passing lists on the command-line. But the only examples I noticed in nbdev repo were whitespace delimited. So it seemed like the proper convention to follow.
Using a CSV makes it a little easier on the command line, because we don't need to worry about wrapping the CLI arguments in a string: --procs 'a b c' vs --procs a,b,c.

Maybe there's a nice way to pass in a glob that I'm unaware of though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right way to do it IMO is to use a tuple. That's the normal approach used in most Python APIs.

Copy link
Contributor Author

@yegeniy yegeniy Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good call. Thank you. I'll look into passing in a tuple via commandline with call_parse. Could probably leverage "choices" so only implemented parsers are accepted. I need to see how to set that up, so it might not make it into this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't quite figure out how to get it to work with a tuple.

But using Param(.., nargs="*", choices=optional_procs())="black_format" seems to get something more user-friendly for the command line.

Here are some example using `raise Exception(L(procs))` in the method to debug it.
> nbdev_export -h
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
...
  --procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]  tokens naming the export processors to use. (default: black_format)

# default behavior is to pass in black_format
> nbdev_export 
Exception: ['black_format']

# but this default can be omitted
> nbdev_export --procs 
> Exception: []

> nbdev_export --procs scrub_magics black_format
Exception: ['scrub_magics', 'black_format']

> nbdev_export --procs black_format
Exception: ['black_format']

> nbdev_export --procs scrub_magics
Exception: ['scrub_magics']

> nbdev_export --procs black_forma
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics')

> nbdev_export --procs black_forma scrub_magics
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics')

> nbdev_export --procs scrub_magics barst
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'barst' (choose from 'black_format', 'scrub_magics')

if p not in ["nb_export", "ExportModuleProc", "optional_procs"]])

# %% ../nbs/api/04_export.ipynb 16
def nb_export(nbname, lib_path=None, procs=None, debug=False, mod_maker=ModuleMaker, name=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I changed procs=black_format to procs=None. Refactoring that out of here seemed prudent. The code didn't seem to be used anywhere else, black_format is still passed in by default in nbdev_export, and tests passed locally. So I don't think it's a breaking change. Highlighting it, just in case though.

Comment on lines +44 to +45
# includes the newline, because calling .strip() would affect all cells.
_magics_pattern = re.compile(r'^\s*(%%|%).*\n?', re.MULTILINE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic pattern in clean_magics (in the processors notebook) is a little different. It doesn't include the newline and relies on calling .strip(). That's a much easier to follow, but seems to impact the whole cell instead of just the spark line.

When I used r'^\s*(%%|%).*' to test nbdev_export --procs scrub_magics, an unrelated notebook (with no magics!) was stripped of their trailing whitespace. Probably not a big deal, but I wanted to minimize the disruption here.

@yegeniy yegeniy changed the title exported modules can scrub_magics via settings.ini exported modules can scrub_magics Feb 16, 2023
@yegeniy yegeniy requested a review from jph00 February 16, 2023 13:59
@jph00
Copy link
Member

jph00 commented Feb 17, 2023

Thanks for your thorough response - please let me know when you feel this is ready to merge.

@yegeniy
Copy link
Contributor Author

yegeniy commented Feb 17, 2023

Thank you for the kind words and multiple reviews with excellent actionable feedback. I'll reply back after completing the changes I mentioned in the threads above. Let's merge after:

Thank you for the feedback here @jph00.

I couldn't quite figure out how to get --procs to work with a tuple.

But using Param(.., nargs="*", choices=optional_procs())="black_format" seems to get something more user-friendly for the command line.

Here are some examples using `raise Exception(L(procs))` in the method to debug it.

```
> nbdev_export -h
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
...
  --procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]  tokens naming the export processors to use. (default: black_format)

> nbdev_export
Exception: ['black_format']

> nbdev_export --procs
> Exception: []

> nbdev_export --procs scrub_magics black_format
Exception: ['scrub_magics', 'black_format']

> nbdev_export --procs black_format
Exception: ['black_format']

> nbdev_export --procs scrub_magics
Exception: ['scrub_magics']

> nbdev_export --procs black_forma
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics')

> nbdev_export --procs black_forma scrub_magics
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics')

> nbdev_export --procs scrub_magics barst
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'barst' (choose from 'black_format', 'scrub_magics')
```
@yegeniy
Copy link
Contributor Author

yegeniy commented Feb 17, 2023

@jph00 - if the latest changes in 0f4d622 look good to you , I think this is ready to go.

@jph00
Copy link
Member

jph00 commented Feb 17, 2023

Good job!

@jph00 jph00 merged commit 923f2b0 into fastai:master Feb 17, 2023
@yegeniy yegeniy deleted the scrub-magics-squash branch February 17, 2023 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants