New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exported modules can scrub_magics #1250
Conversation
This scrubs the cell magic lines out in the generated code. In some sense, it is a follow-up to fastai#905. Would y'all be interested in a PR that adds a new option to the settings.ini config? It would scrub any `%%magics` from the exported modules. Fairly similar to how `black_formatting` works today. --- The specific use-case is to simplify doing exploratory programming in PySpark. We can now use nbdev notebooks to generate valid [pyspark][0] code modules. At the same time, the Python-kernel notebook can continue to use the [`%%spark`][1] magic. We can use a local notebook to interact with a remote cluster running Livy. [0]: https://github.com/apache/spark/tree/v3.1.1-rc3/examples/src/main/python [1]: https://github.com/jupyter-incubator/sparkmagic --- Please note that the implementation is just copying `processors.clean_magics`. Not sure how much of an issue that is for this repository. Should we add a code cell like below above the scrub_magics method? ``` \#|hide \# FIXME: reusing processors.clean_magics runs into a circular import \# from .processors import clean_magics ```
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
nbs/api/export.ipynb
Outdated
@@ -134,7 +134,26 @@ | |||
"outputs": [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that the implementation is just copying processors.clean_magics
. Not sure how much of an issue that is for this repository. Should we add a code cell like below above this export.scrub_magics
method?
\#|hide
\# FIXME: reusing processors.clean_magics runs into a circular import
\# from .processors import clean_magics
Reply via ReviewNB
9366c7f
to
3acb976
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for your PR! Just a couple of requests below. Also, could you add some brief docs for your new directive?
nbdev/export.py
Outdated
except: pass | ||
|
||
# %% ../nbs/api/04_export.ipynb 10 | ||
def nb_export(nbname, lib_path=None, procs=[black_format,scrub_magics], debug=False, mod_maker=ModuleMaker, name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use mutables for params: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/
Also, I don't think we want scrub_magics
as a default here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think we want scrub_magics as a default here.
Agreed. scrub_magics
is a no-op by default. Would be nice to avoid passing and executing the (no-op) method in altogether, by default.
Don't use mutables for params: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/
Thanks. I'll take some time to rethink how to pass scrub_magics
in.
My first thought is to expose the list of procs in the nbdev_export
CLI command, instead of relying on a config.
nbdev/config.py
Outdated
@@ -59,6 +59,7 @@ def _apply_defaults( | |||
language='English', # Language PyPI classifier | |||
recursive:bool_arg=True, # Include subfolders in notebook globs? | |||
black_formatting:bool_arg=False, # Format libraries with black? | |||
scrub_magics:bool_arg=False, # Remove magics from exported modules? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also need to add this to _nbdev_cfg_sections
IIRC.
Documentation on the `scrub_magics` processor is included just below its associated code. None of the existing tutorials or explanations notebooks seemed appropriate for this small contribution. The scrub_magics configuration was removed because the processor is no longer always executed. However, the existing behavior of always executing the black_format processor is retained, for compatibility purposes. usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs PROCS] Export notebooks in `path` to Python modules optional arguments: ... --procs PROCS whitespace delimited tokens of processors to use. (default: black_format) Thank you for the feedback on 9366c7f @jph00. Quick and hopefully unimportant question: You mentioned that this is a "new directive". Should we refer to these as directives instead of as processors? I didn't want to muddy up the diff with the refactoring. The terms do seem similar, but I do worry that calling black_format and scrub_magics export processors might confuse people who can't find them under the `10_processors.ipynb` notebook. It certainly confused me at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for the feedback on 9366c7f @jph00. I left some inline-comments below.
Documentation on the scrub_magics
processor is included just below its
associated code. None of the existing tutorials or explanations
notebooks seemed appropriate for this small contribution.
The scrub_magics configuration was removed because the processor is no
longer always executed. However, the existing behavior of always
executing the black_format processor is retained, for compatibility
purposes.
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs PROCS]
Export notebooks in `path` to Python modules
optional arguments:
...
--procs PROCS whitespace delimited tokens of processors to use. (default: black_format)
Quick and hopefully unimportant question: You mentioned that this is
a "new directive". Should we refer to these as directives instead of as
processors? I didn't want to muddy up the diff with the refactoring.
The terms do seem similar, but I do worry that calling black_format and
scrub_magics export processors might confuse people who can't find them
under the 10_processors.ipynb
notebook. I think I recall it confused me at
first.
nbdev/doclinks.py
Outdated
**kwargs): | ||
"Export notebooks in `path` to Python modules" | ||
if os.environ.get('IN_TEST',0): return | ||
if procs: procs = [getattr(nbdev.export, p) for p in procs.split(" ") if p in optional_procs()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling getattr
like this requires import nbdev.export
(or an alternate solution). I placed the import just above this method. But I suspect we want all imports at the top, please just let me know.
Because it's being imported for a relatively non-standard reason, it seemed convenient to have it near the place it's called. I wanted to avoid calling the import within the function, though that may make its intention even clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling the import within the function sounds like a good solution to me. There's no particular downsides to doing that - Python automatically caches the result so there's no significant overhead there, if that's what you were worried about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! My only concerns was w.r.t. aesthetics. I'll make that change and resolve the conversation thread.
nbdev/doclinks.py
Outdated
@call_parse | ||
@delegates(nbglob_cli) | ||
def nbdev_export( | ||
path:str=None, # Path or filename | ||
procs:str="black_format", # whitespace delimited tokens of processors to use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I find comma-separated-values to be more convenient when passing lists on the command-line. But the only examples I noticed in nbdev repo were whitespace delimited. So it seemed like the proper convention to follow.
Using a CSV makes it a little easier on the command line, because we don't need to worry about wrapping the CLI arguments in a string: --procs 'a b c'
vs --procs a,b,c
.
Maybe there's a nice way to pass in a glob that I'm unaware of though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right way to do it IMO is to use a tuple. That's the normal approach used in most Python APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 good call. Thank you. I'll look into passing in a tuple via commandline with call_parse. Could probably leverage "choices" so only implemented parsers are accepted. I need to see how to set that up, so it might not make it into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't quite figure out how to get it to work with a tuple.
But using Param(.., nargs="*", choices=optional_procs())="black_format"
seems to get something more user-friendly for the command line.
Here are some example using `raise Exception(L(procs))` in the method to debug it.
> nbdev_export -h
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
...
--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]] tokens naming the export processors to use. (default: black_format)
# default behavior is to pass in black_format
> nbdev_export
Exception: ['black_format']
# but this default can be omitted
> nbdev_export --procs
> Exception: []
> nbdev_export --procs scrub_magics black_format
Exception: ['scrub_magics', 'black_format']
> nbdev_export --procs black_format
Exception: ['black_format']
> nbdev_export --procs scrub_magics
Exception: ['scrub_magics']
> nbdev_export --procs black_forma
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics')
> nbdev_export --procs black_forma scrub_magics
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics')
> nbdev_export --procs scrub_magics barst
usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]]
nbdev_export: error: argument --procs: invalid choice: 'barst' (choose from 'black_format', 'scrub_magics')
if p not in ["nb_export", "ExportModuleProc", "optional_procs"]]) | ||
|
||
# %% ../nbs/api/04_export.ipynb 16 | ||
def nb_export(nbname, lib_path=None, procs=None, debug=False, mod_maker=ModuleMaker, name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I changed procs=black_format
to procs=None
. Refactoring that out of here seemed prudent. The code didn't seem to be used anywhere else, black_format
is still passed in by default in nbdev_export
, and tests passed locally. So I don't think it's a breaking change. Highlighting it, just in case though.
# includes the newline, because calling .strip() would affect all cells. | ||
_magics_pattern = re.compile(r'^\s*(%%|%).*\n?', re.MULTILINE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic pattern in clean_magics
(in the processors
notebook) is a little different. It doesn't include the newline and relies on calling .strip()
. That's a much easier to follow, but seems to impact the whole cell instead of just the spark line.
When I used r'^\s*(%%|%).*'
to test nbdev_export --procs scrub_magics
, an unrelated notebook (with no magics!) was stripped of their trailing whitespace. Probably not a big deal, but I wanted to minimize the disruption here.
Thanks for your thorough response - please let me know when you feel this is ready to merge. |
Thank you for the kind words and multiple reviews with excellent actionable feedback. I'll reply back after completing the changes I mentioned in the threads above. Let's merge after: |
Thank you for the feedback here @jph00. I couldn't quite figure out how to get --procs to work with a tuple. But using Param(.., nargs="*", choices=optional_procs())="black_format" seems to get something more user-friendly for the command line. Here are some examples using `raise Exception(L(procs))` in the method to debug it. ``` > nbdev_export -h usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]] ... --procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]] tokens naming the export processors to use. (default: black_format) > nbdev_export Exception: ['black_format'] > nbdev_export --procs > Exception: [] > nbdev_export --procs scrub_magics black_format Exception: ['scrub_magics', 'black_format'] > nbdev_export --procs black_format Exception: ['black_format'] > nbdev_export --procs scrub_magics Exception: ['scrub_magics'] > nbdev_export --procs black_forma usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]] nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics') > nbdev_export --procs black_forma scrub_magics usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]] nbdev_export: error: argument --procs: invalid choice: 'black_forma' (choose from 'black_format', 'scrub_magics') > nbdev_export --procs scrub_magics barst usage: nbdev_export [-h] [--path PATH] [--symlinks] [--file_glob FILE_GLOB] [--file_re FILE_RE] [--folder_re FOLDER_RE] [--skip_file_glob SKIP_FILE_GLOB] [--skip_file_re SKIP_FILE_RE] [--skip_folder_re SKIP_FOLDER_RE] [--procs [{black_format,scrub_magics} [{black_format,scrub_magics} ...]]] nbdev_export: error: argument --procs: invalid choice: 'barst' (choose from 'black_format', 'scrub_magics') ```
Good job! |
For #1249.
This scrubs the cell magic lines out in the generated code, if a configuration is set. This behavior is turned off by default, but can be enable by the
scrub_magics
configuration insettings.ini
.Example:
For example, a cell like below could export the line
"hello nbdev"
into thebar
module. Without including the%%spark
magic line.Use Case:
The specific use-case here is to simplify doing exploratory PySpark programming With this change, we can now use nbdev notebooks to generate valid pyspark code modules. At the same time, the Python-kernel notebook can continue to use the
%%spark
magic. This allows a local notebook to interact with a remote cluster running Livy.