<img src="https://fsdl.me/logo-720-dark-horizontal">

# Lab 05: Troubleshooting & Testing

### What You Will Learn

- Practices and tools for testing and linting Python code in general: `black`, `flake8`, `pre-commit`, `pytest`, and `doctest`s
- How to implement memorization tests for ML training systems in particular
- What a PyTorch training step looks like under the hood and how to troubleshoot performance bottlenecks

# Setup

If you're running this notebook on Google Colab,
the cell below will run full environment setup.

It should take about three minutes to run.

In [None]:
lab_idx = 5

if "bootstrap" not in locals() or bootstrap.run:
    # path management for Python
    pythonpath, = !echo $PYTHONPATH
    if "." not in pythonpath.split(":"):
        pythonpath = ".:" + pythonpath
        %env PYTHONPATH={pythonpath}
        !echo $PYTHONPATH

    # get both Colab and local notebooks into the same state
    !wget --quiet https://fsdl.me/gist-bootstrap -O bootstrap.py
    import bootstrap

    # change into the lab directory
    bootstrap.change_to_lab_dir(lab_idx=lab_idx)

    # allow "hot-reloading" of modules
    %load_ext autoreload
    %autoreload 2
    # needed for inline plots in some contexts
    %matplotlib inline

    bootstrap.run = False  # change to True to re-run setup
    
!pwd
%ls

In [None]:
from IPython.display import display, HTML, IFrame

full_width = True
frame_height = 720  # adjust for your screen

if full_width:  # if we want the notebook to take up the whole width
    # add styling to the notebook's HTML directly
    display(HTML("<style>.container { width:100% !important; }</style>"))
    display(HTML("<style>.output_result { max-width:100% !important; }</style>"))

# Linting

We want to keep our code clean and uniform across developers.

Applying the cleanliness checks and style rules should be
as painless and automatic as possible.

We recommend bundling tools together with [`pre-commit`](https://pre-commit.com/).

`pre-commit` separates the model development environment from the environments
needed for the linting tools, preventing conflicts.

If you're working locally, you might find it easier to use
`make pip-tools-lint`.

Let's run `pre-commit` on all of our files:

In [None]:
!pre-commit run --all-files

These hooks handle simple hygiene: accidental private key leaks, leftover debugger statements, merge conflicts, and formatting.

They're configured via a YAML file:

In [None]:
!cat .pre-commit-config.yaml

We apply a number of smaller cleanliness checks using hooks built by `pre-commit`:

In [None]:
!cat .pre-commit-config.yaml | grep repos -A 15

Let's take a look at the section of the file that applies most of our style enforcement with `flake8`:

In [None]:
!cat .pre-commit-config.yaml | grep "flake8 python" -A 10

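Our `flake8` configuration lives in its own `.flake8` file. It's important to link these two files to each other, in both directions, so that a reader of one can find the other.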

In [None]:
!cat .flake8

`select`: report error codes that match these prefixes

`extend-ignore`: but don't report error codes that match these prefixes

Together, these two settings define our style out of all the things that could possibly be checked.

`per-file-ignores`: ignore specific warnings in specific files, as an escape valve.

For details on selecting and ignoring, see the [`flake8` docs](https://flake8.pycqa.org/en/latest/user/violations.html).

For definitions of the core error codes, see the [list in the docs](https://flake8.pycqa.org/en/latest/user/error-codes.html).
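To make the interaction between `select` and `extend-ignore` concrete, here is a simplified model of the decision in Python. It is not `flake8`'s code: it assumes that the longest matching prefix wins (so ignoring `E501` beats selecting `E`), and the `select`/`extend-ignore` values below are hypothetical rather than copied from our `.flake8`.

```python
def is_reported(code: str, select: list[str], ignore: list[str]) -> bool:
    """Simplified model: report `code` if its longest `select` match beats its longest `ignore` match.

    Assumption for illustration: more specific (longer) prefixes win, and codes that
    match no `select` prefix are not reported.
    """
    best_select = max((len(p) for p in select if code.startswith(p)), default=0)
    best_ignore = max((len(p) for p in ignore if code.startswith(p)), default=0)
    return best_select > best_ignore


select = ["E", "F", "W", "D"]  # hypothetical
extend_ignore = ["E203", "E501", "D1"]  # hypothetical

for code in ["E501", "E302", "D103", "D205", "C901"]:
    print(code, "reported" if is_reported(code, select, extend_ignore) else "ignored")
# E501 ignored, E302 reported, D103 ignored, D205 reported, C901 ignored
```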

Most of these conventions come from [Python Enhancement Proposal 8](https://peps.python.org/pep-0008/),
which exhorts you to "know when to be inconsistent".

The remainder are configurations for the other `flake8` plugins that we use to define and enforce the rest of our style, with links to documentation:
- [`flake8-import-order`](https://github.com/PyCQA/flake8-import-order) for checking imports
- [`flake8-docstrings`](https://github.com/pycqa/flake8-docstrings) for docstring style
- [`darglint`](https://github.com/terrencepreilly/darglint) for docstring completeness
- [`flake8-annotations`](https://github.com/sco1/flake8-annotations) for type annotations

### Linting via a script and using `shellcheck`

To avoid needing to think about `pre-commit` while developing locally,
we might put our linters into a shell script:

In [None]:
!cat tasks/lint.sh

We can run `shellcheck` on this file via `pre-commit`:

In [None]:
!pre-commit run shellcheck --files tasks/lint.sh

Try running `shellcheck` on a script from your own machine or from a favorite repo.
You'd be surprised at the classes of subtle bugs possible in bash!

### "Unofficial bash strict mode" for louder failures in scripts

Another way to reduce bugs is to use the "unofficial bash strict mode" settings, from the top:

In [None]:
!head -n 3 tasks/lint.sh

The core idea is to fail more loudly.
`-u` means fail if a variable's value is `u`nset,
i.e. not defined.
Yes, bash allows you to reference undefined variables!
By default, they just expand to an empty string.

`-o pipefail` means failures inside a pipe of commands propagate,
rather than the pipe's exit code being that of the last command alone.
Without it, errors upstream are easy to miss: Unix tools are happy to work on nonsense input,
like sorting error messages or empty strings.

Strict mode usually also includes `-e`, which exits the script as soon as any command fails.
Here, `+e` turns that off, so non-zero exit codes do not stop the script.
We want to run all of our linters, not just fail on the first one, so we do explicit error handling.

Read more about these choices [here](http://redsymbol.net/articles/unofficial-bash-strict-mode/),
including considerations for working with other non-conforming scripts
and for resource-handling.
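To see these settings in action without writing a script to disk, here's a small sketch that runs one-line bash snippets from Python with `subprocess` and reports their exit codes. It assumes `bash` is on your `PATH`; `SURELY_UNDEFINED_VAR` is just a placeholder name that should not exist in your environment.

```python
import subprocess


def bash_exit_code(script: str) -> int:
    """Run `script` with `bash -c` and return its exit code."""
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.returncode


# pipefail: without it, a pipeline's status is that of its last command (`true`)
print(bash_exit_code("false | true"))  # 0
print(bash_exit_code("set -o pipefail; false | true"))  # 1

# -u: an undefined variable silently expands to "" unless -u is set
print(bash_exit_code('echo "$SURELY_UNDEFINED_VAR"'))  # 0
print(bash_exit_code('set -u; echo "$SURELY_UNDEFINED_VAR"'))  # non-zero

# -e vs +e: with -e the script stops at `false`; with +e it carries on to `echo`
print(bash_exit_code("set -e; false; echo reached"))  # 1
print(bash_exit_code("set +e; false; echo reached"))  # 0
```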

# Testing ML Codebases

## Testing Python code with `pytest`


ML codebases are Python first and foremost, so first let's get some Python tests going.

Let's run `pytest` and look at the output.

[Docs](https://docs.pytest.org/en/7.1.x/)

In [None]:
!pytest .

The end section there is a coverage report,
which can be tracked over time with services like [`codecov`](https://about.codecov.io/).

By default, `pytest` looks for files named `test_*.py` or `*_test.py`.

In [None]:
!ls text_recognizer/tests

It's [good practice](https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html#test-discovery)
to keep these in a folder or folders named `tests`, separate from the rest of your code, rather than scattering them around the repo.
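For reference, here is a minimal sketch of a file `pytest` would discover. The file name `tests/test_strings.py` and the helper `normalize_whitespace` are hypothetical; in a real repo the helper would be imported from the package rather than defined next to its tests. `pytest` collects functions whose names start with `test` and reports a failing bare `assert` along with the values involved.

```python
# tests/test_strings.py (hypothetical file name)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces (hypothetical helper under test)."""
    return " ".join(text.split())


def test_normalize_whitespace_collapses_runs():
    assert normalize_whitespace("a  b\t\nc") == "a b c"


def test_normalize_whitespace_strips_ends():
    assert normalize_whitespace("  padded  ") == "padded"
```

Running `pytest tests/test_strings.py -v` would then report two passing tests.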

Let's take a look at a specific example:
the tests for some of our utilities around
custom PyTorch Lightning `Callback`s.

In [None]:
from text_recognizer.tests import test_callback_utils


test_callback_utils.__doc__

Notice that we can import this as a module! Keeping tests as simple as possible makes this easier.

This code is designed to prevent crashes:
it checks for a particular type of error and turns it into a warning.

Error-handling code is a common cause of bugs,
so we test it.

In [None]:
test_callback_utils.test_check_and_warn_simple??

This is a basic test that doesn't incorporate any external libraries.

It checks the core functionality, so it should be very non-flaky. That matters when diagnosing a bug: which tests passed is as informative as which tests failed.

This reasoning is explained in the docstrings, which are close to the code.
Your test suite should be as welcoming as the rest of your codebase! The people reading it are likely upset,
and we want to keep our time-to-resolve errors as short as possible.

There's a specific error that triggered the addition of this code.

So we test that it's handled as expected.

In [None]:
test_callback_utils.test_check_and_warn_tblogger??

That test can fail if the libraries around us change, e.g. if `TensorBoardLogger` gets a `log_table` method.

But that change would _also_ change the behavior of our code, and just because the method has the same name doesn't mean it does the same thing.

Adding error handling can also accidentally kill the happy path by raising an error incorrectly.

So we explicitly test this:

In [None]:
test_callback_utils.test_check_and_warn_wandblogger??

There are more tests we could build, e.g. manipulating classes and testing the behavior,
testing more classes that might be targeted by `check_and_warn`,
asserting that warnings are raised to the command line.

But these three basic tests are likely to catch most changes that would break our code.

If this utility starts to get more usage and become a critical path for lots of features, we can always add more.
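The pattern behind these tests, one for the error-handling path and one for the happy path, can be sketched without any logging library. The `check_and_warn` below is a hypothetical stand-in with an assumed signature, not the repo's implementation, and the two logger classes are toy fakes.

```python
import warnings


def check_and_warn(obj, attr: str, feature: str) -> bool:
    """Warn and return True if `obj` lacks `attr`; otherwise return False.

    Hypothetical stand-in: the real `check_and_warn` may have a different signature.
    """
    if not hasattr(obj, attr):
        warnings.warn(f"{type(obj).__name__} has no {attr}, skipping {feature}")
        return True
    return False


class LoggerWithTables:  # toy fake that supports the feature
    def log_table(self, *args, **kwargs):
        pass


class LoggerWithoutTables:  # toy fake that does not
    pass


def test_check_and_warn_missing_attr():
    # error-handling path: a missing method triggers a warning, not a crash
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert check_and_warn(LoggerWithoutTables(), "log_table", "table logging")
    assert len(caught) == 1


def test_check_and_warn_happy_path():
    # happy path: the error handling must not fire when the method exists
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert not check_and_warn(LoggerWithTables(), "log_table", "table logging")
    assert len(caught) == 0
```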

## Interleaving testing and documentation with `doctests`

One function of tests is to build user/reader confidence in code.

The function of documentation is to build user/reader knowledge in code.

These are related, so let's put them together: code in docstrings that gets tested.

In [None]:
from text_recognizer.lit_models.util import first_appearance


first_appearance??

This function can be used to e.g. quickly look for stop tokens,
giving the length of each sequence.

In [None]:
import torch


first_appearance(torch.tensor([[1, 2, 3], [2, 3, 3], [1, 1, 1], [3, 1, 1]]), 3)

We can run the test by passing a command line argument to `pytest`:

In [None]:
!pytest --doctest-modules text_recognizer/lit_models/util.py

With the
[right configuration](https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2022/blob/627dc9dabc9070cb14bfe5bfcb1d6131eb7dc7a8/pyproject.toml#L12-L17),
running `doctests` happens automatically.
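Here's a self-contained sketch of the same idea in pure Python. `first_index_of` is a hypothetical, list-based cousin of `first_appearance`, not its implementation: the examples in its docstring double as tests.

```python
def first_index_of(seq: list, value) -> int:
    """Return the index of the first occurrence of `value` in `seq`, or `len(seq)` if absent.

    Returning the length for a missing value means the result reads as
    "how many items come before the first stop token".

    >>> first_index_of([1, 2, 3], 3)
    2
    >>> first_index_of([3, 1, 1], 3)
    0
    >>> first_index_of([1, 1, 1], 3)
    3
    """
    for idx, item in enumerate(seq):
        if item == value:
            return idx
    return len(seq)
```

Saved in a module, it could be checked with `pytest --doctest-modules path/to/module.py` or `python -m doctest -v path/to/module.py`.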

## Testing data

For testing our data, we mostly just use bare `assert`s:

In [None]:
!grep "assert" -r text_recognizer/data

We can also organize some of these checks into a module, for clarity and so that they're included in our coverage report.

In [None]:
from text_recognizer.tests.test_iam import test_iam_data_splits


test_iam_data_splits??

And because this is in a module, we can easily run it.

In [None]:
test_iam_data_splits()
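As a sketch of the bare-`assert` style, here are a few checks one might run on dataset splits. The function name, the ID inputs, and the 5% threshold are all hypothetical. One caveat of bare `assert`s: Python skips them entirely under `python -O`, so checks that must always run belong in tests or in explicit `if ...: raise` statements.

```python
def check_splits(train_ids, val_ids, test_ids, min_val_frac: float = 0.05) -> None:
    """Bare-assert checks on dataset splits (hypothetical IDs and threshold)."""
    train, val, test = set(train_ids), set(val_ids), set(test_ids)
    # no example should appear in more than one split
    assert not train & val, "train/val overlap"
    assert not train & test, "train/test overlap"
    assert not val & test, "val/test overlap"
    # the validation split should not be vanishingly small
    total = len(train) + len(val) + len(test)
    assert len(val) / total >= min_val_frac, "validation split too small"


check_splits(range(90), range(90, 95), range(95, 100))  # passes silently
```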

But we're checking something pretty simple here.

What if we wanted to test more complex properties?
We'll end up writing more complex code that might itself have subtle bugs,
requiring tests for our tests and suffering from
tester's regress
([by analogy with experimenter's regress](https://en.wikipedia.org/wiki/Experimenter%27s_regress)):
the validity of our tests is itself up for dispute requiring testing,
which has disputable validity.

The way out is to use a library or framework that is itself well-tested.

One future-proof choice, with rich features, is `great_expectations`.

[Docs](https://docs.greatexpectations.io/docs/)

Especially with data, some tests are particularly "heavy" -- they take a long time.

For example, testing whether the download of a dataset succeeds and gives the right checksum.

`pytest` resolves this with `mark`s, which "tag" tests with names.

In [None]:
!pytest --markers | head -n 10

We can choose to run tests with a given mark with `-m`:

In [None]:
!pytest -m "data"

Or to skip tests with a given mark, among other basic logical operations around combining and filtering tags:

In [None]:
!pytest -m "not data and not slow"
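A sketch of how marks are attached and registered, with hypothetical test names. Custom marks are registered in `conftest.py` (via the `pytest_configure` hook shown) or in the `markers` setting of your pytest config, and registered marks are what `pytest --markers` lists. The checksum here is just the SHA-256 of the bytes `b"hello"`, standing in for a real download.

```python
import hashlib

import pytest


# belongs in conftest.py: register custom marks so `pytest --markers` lists them
def pytest_configure(config):
    config.addinivalue_line("markers", "data: tests that touch real datasets")
    config.addinivalue_line("markers", "slow: tests that take a long time")


@pytest.mark.data
@pytest.mark.slow
def test_checksum_matches():
    # stand-in for a real download: hash some bytes and compare to a known digest
    payload = b"hello"
    expected = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
    assert hashlib.sha256(payload).hexdigest() == expected


def test_fast_unit():
    assert 1 + 1 == 2
```

With these tests, `pytest -m "data"` would run only `test_checksum_matches`, while `pytest -m "not data and not slow"` would run only `test_fast_unit`.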

## Testing training with memorization tests

We should also test training,
the process by which data becomes models.

Training brings together data and models,
so it depends on both.

So we decouple checking whether the script has a critical bug
from whether the data or model code is broken
by testing on some basic "fake data",
based on a utility from `torchvision`.

In [None]:
!ls text_recognizer/data/

In [None]:
from text_recognizer.data import FakeImageData


FakeImageData??

We then test on the actual data with a smaller version of the real model.

We use a smaller version so that these tests can run in just a few minutes on a CPU without acceleration.
That means we can run them in CI, with tools like GitHub Actions.

Here's the script:

In [None]:
!cat training/tests/test_run_experiment.sh

In [None]:
! ./training/tests/test_run_experiment.sh

The script below runs a memorization test,
checking whether our model can "memorize"
the content of a single batch.

It takes up to two arguments:
a `MAX`imum number of `EPOCHS` to run for and
a `CRITERION` value of the loss to test against.

The test passes if the loss is lower than the `CRITERION` value
after the `MAX`imum number of `EPOCHS` has passed.
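The logic of a memorization test can be sketched in a few lines of pure Python. This is a toy analog, not our shell script: the "model" is a 3-weight linear regression, the single batch is fake data generated from hypothetical true weights (so that it can be memorized), and training is plain gradient descent on mean squared error. The pass/fail rule mirrors the one above: pass if the loss drops below `criterion` within `max_epochs`.

```python
import random


def memorization_test(max_epochs: int, criterion: float, lr: float = 0.1, seed: int = 0) -> bool:
    """Return True if a toy model's loss on one fixed batch drops below `criterion`."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.5]  # hypothetical weights used to generate learnable fake data
    xs = [[rng.uniform(-1, 1) for _ in true_w] for _ in range(8)]  # one batch of 8 examples
    ys = [sum(w * x for w, x in zip(true_w, row)) for row in xs]

    w = [0.0] * len(true_w)  # the "model" starts from scratch
    for epoch in range(max_epochs):
        preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / len(errors)  # mean squared error
        if loss < criterion:
            print(f"passed: loss {loss:.4f} < {criterion} after {epoch} epochs")
            return True
        # gradient of the MSE with respect to each weight, then one descent step
        grads = [2 * sum(e * row[j] for e, row in zip(errors, xs)) / len(xs) for j in range(len(w))]
        w = [wi - lr * g for wi, g in zip(w, grads)]
    print(f"failed: loss did not drop below {criterion} within {max_epochs} epochs")
    return False


memorization_test(max_epochs=1000, criterion=0.1)
```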

The important line for this is the one that invokes our training script.

This test has been tuned for maximum speed.
Check each argument and understand why it's chosen.

In [None]:
!cat training/tests/test_memorize_iam.sh

Try it out by setting `running_memorization = True` below, then look at the results on W&B.

In [None]:
running_memorization = False

if running_memorization:
    max_epochs = 1000
    loss_criterion = 0.1
    !./training/tests/test_memorize_iam.sh {max_epochs} {loss_criterion}

# Automation with GitHub Actions

Here's the YAML file for automating `pre-commit`.

We'll take a look at this locally,
but note that there's a rich interface for looking at
workflows, configurations, and executions on GitHub.

In [None]:
pre_commit_action_url = "https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2022/actions/workflows/pre-commit.yml"

print(pre_commit_action_url)

!cat .github/workflows/pre-commit.yml

Workflow syntax docs [here](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions)

#### `name`:
What this workflow is called in the GitHub UI.

#### `on`:

Which events trigger this workflow.

`pull_request` and `push` are self-explanatory. Can also filter by branches.

`workflow_dispatch` allows manual triggering. Super important for debugging!

#### `jobs`:
Each workflow can include multiple jobs, which may depend on each other, and so form a DAG.

There's just one job here, `pre-commit`.
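Since jobs form a DAG through their `needs` keys, one way to picture a workflow is as a plain Python dict, the shape a YAML parser would produce, and to compute an order in which its jobs could run. The workflow and job names below are hypothetical, and steps are simplified to strings. GitHub runs jobs without `needs` in parallel by default, so this is just one valid order.

```python
from graphlib import TopologicalSorter

# hypothetical workflow, shaped like the dict a YAML parser would return (steps simplified)
workflow = {
    "name": "Test",
    "on": ["push", "pull_request", "workflow_dispatch"],
    "jobs": {
        "lint": {"steps": ["checkout", "pre-commit"]},
        "unit-tests": {"needs": "lint", "steps": ["checkout", "pytest"]},
        "integration-tests": {"needs": ["unit-tests"], "steps": ["checkout", "memorize"]},
        "docs": {"steps": ["checkout", "build-docs"]},
    },
}


def job_order(workflow: dict) -> list:
    """Return one order in which the jobs could run without violating `needs`."""
    graph = {}
    for name, job in workflow["jobs"].items():
        needs = job.get("needs", [])
        if isinstance(needs, str):  # `needs` may be a single job name or a list
            needs = [needs]
        graph[name] = set(needs)
    # raises graphlib.CycleError if the jobs don't form a DAG
    return list(TopologicalSorter(graph).static_order())


print(job_order(workflow))
```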

#### `steps`:

Inside a job, there are "steps".

Each step is a unit of work, and steps are executed in linear order.

`uses` means we are applying an existing action as a step.

We're just using existing actions here. We don't need to write any commands ourselves.

You can run this for yourself if you fork the lab repo!

Let's look at a more complex workflow:

In [None]:
# show the test.yml file, but just the lines before the integration-tests start
!wget -nv -qO- https://raw.githubusercontent.com/full-stack-deep-learning/fsdl-text-recognizer-2022/main/.github/workflows/test.yml  | grep "integration-tests" -B 50

Simplest addition: environment variables, like `PYTHONPATH`.

We also `run` our own commands, `pip` installing an environment and executing a test.

Another level up: secret variables, which inject e.g. API keys without revealing their values. They're configured via GitHub.

The harder bits are for performance: multiple `jobs`, which run independently, and `actions/cache`, which caches the environment so we don't need to rebuild the Python environment every time.

Pro tip: use a prefix like `v1` in the cache key, so you can manually trigger a cache miss if things are wonky. This is critical for debugging.

See this [blogpost from AI2](https://blog.allenai.org/python-caching-in-github-actions-e9452698e98d)
for more tips.
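To see why the prefix helps, here is a sketch of how such a cache key might be assembled. The key format and function are hypothetical; in an actual workflow the hash usually comes from the `hashFiles` expression over your requirements files. Because the cache is restored only on a key match, bumping `v1` to `v2` guarantees a miss even when nothing else changed.

```python
import hashlib


def cache_key(prefix: str, runner_os: str, python_version: str, requirements: str) -> str:
    """Hypothetical key format: manual prefix + platform + hash of the requirements."""
    digest = hashlib.sha256(requirements.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{runner_os}-py{python_version}-{digest}"


reqs = "numpy\nrequests\n"  # hypothetical requirements file contents
print(cache_key("v1", "Linux", "3.10", reqs))
print(cache_key("v2", "Linux", "3.10", reqs))  # bumped prefix: a new key, so a guaranteed miss
```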

# Troubleshooting model speed with the PyTorch Profiler

Troubleshooting deep neural networks for speed is challenging.

- Follow advice from others (Karpathy tweet, NVIDIA talk) and use existing implementations
- Do empirical work, with good observations, to troubleshoot for speed
- Truly understand distributed, accelerated tensor computations so you can write them correctly from scratch the first time

For the full stack deep learning engineer,
the last is typically out of reach,
unless you're specializing in model performance.

So we recommend reaching the second level.

We've added a new feature to `training/run_experiment.py`:

In [None]:
!python training/run_experiment.py --help | grep -A 1 -e "^\s*--profile\s"

Again, this relies mostly on features of PyTorch Lightning,
with just a few lines of customization:

In [None]:
!cat training/run_experiment.py | grep args.profile -A 5

For more on this, see the
[Lightning tutorial](https://pytorch-lightning.readthedocs.io/en/1.6.1/advanced/profiler.html)
on profiling.

Heads up! Tools are a lot fiddlier here :/

Also, the details depend on the precise machine being used -- GPU and CPU and RAM.

If you don't observe the described phenomenon, check out the links to public pages with this information on W&B.

In [None]:
import glob

import torch
import wandb

from text_recognizer.data.base_data_module import DEFAULT_NUM_WORKERS


# make it easier to separate these from training runs
%env WANDB_JOB_TYPE=profile

batch_size = 16
num_workers = DEFAULT_NUM_WORKERS
gpus = 1  # must be run with an accelerator

%run training/run_experiment.py --wandb --profile \
  --max_epochs=1 \
  --num_sanity_val_steps=0 --limit_val_batches=0 --limit_test_batches=0 \
  --model_class=ResnetTransformer --data_class=IAMParagraphs --loss=transformer \
  --batch_size={batch_size} --num_workers={num_workers} --precision=16 --gpus={gpus}

latest_expt = wandb.run

try:  # add execution trace to logged and versioned binaries
    folder = wandb.run.dir
    trace_matcher = folder + "/*.pt.trace.json"
    trace_file = glob.glob(trace_matcher)[0]
    trace_at = wandb.Artifact(name=f"trace-{wandb.run.id}", type="trace")
    trace_at.add_file(trace_file, name="training_step.pt.trace.json")
    wandb.log_artifact(trace_at)
except IndexError:
    print("trace not found")

wandb.finish()

We get out a table of statistics in the terminal,
courtesy of Lightning.

With practice, some useful information can be read out from this table,
but it's better to start with both a less detailed and a more detailed view.

## High-Level Statistics from the PyTorch Profiler

Let's look at this info in a high-level TensorBoard dashboard, conveniently hosted for us on W&B.

In [None]:
your_tensorboard_url = latest_expt.url + "/tensorboard"

print(your_tensorboard_url)

In [None]:
public_tensorboard_url = "https://wandb.ai/cfrye59/fsdl-text-recognizer-2022/runs/z9ja0ngm/tensorboard"
print(public_tensorboard_url)

### Overview Tab

- Compute Capability: NVIDIA's version number for a GPU architecture. Effectively, it tells you which hardware features are available.

- GPU Utilization: fraction of time a kernel is running, anywhere on the GPU. This is a gross metric and the first target: don't spend more on GPUs until you've got this to 80-90%+!
- Est. SM Efficiency: estimated efficiency of the streaming multiprocessors (SMs). Per kernel, it's roughly the number of thread blocks divided by the number of SMs, capped at 100%; these are weighted by each kernel's duration and divided by the total step time.
- Est. Occupancy: ratio of active warps on an SM to the maximum number of active warps the SM supports. It's hard to get above 60%, and getting above 80% requires very specialized techniques.

- You might see Tensor Cores mentioned. They were introduced with the Volta generation (Compute Capability 7.0), run matrix math much faster, and are used when training with `precision=16`. To see this, try cutting the batch size in half and running with `precision=32`: you should get a performance recommendation to use half precision for Tensor Cores.

- Execution Summary: breaks down where each step's time goes, e.g. `Kernel` (GPU kernels running), `Memcpy`, `Runtime` (host-side CUDA calls like `cudaLaunchKernel`), `DataLoader`, and `CPU Exec`.
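The trace we logged above is a Chrome-trace-format JSON file, so a rough version of the GPU Utilization number can be computed by hand. This sketch assumes complete events (`"ph": "X"`) with `ts`/`dur` in microseconds and GPU kernels tagged with a `cat` of `kernel` (case-insensitive); that matches the traces we've looked at but may vary across PyTorch versions. It is an approximation, not how the profiler computes its own metric.

```python
import json


def gpu_utilization(trace: dict) -> float:
    """Approximate fraction of the traced window during which any GPU kernel is running."""
    events = [e for e in trace.get("traceEvents", []) if e.get("ph") == "X" and "dur" in e]
    kernels = sorted(
        (e["ts"], e["ts"] + e["dur"])
        for e in events
        if str(e.get("cat", "")).lower() == "kernel"
    )
    if not kernels:
        return 0.0
    # the window spans all complete events, CPU and GPU alike
    window = max(e["ts"] + e["dur"] for e in events) - min(e["ts"] for e in events)
    # merge overlapping kernel intervals (e.g. on different streams) to avoid double-counting
    busy = 0.0
    cur_start, cur_end = kernels[0]
    for start, end in kernels[1:]:
        if start > cur_end:
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    busy += cur_end - cur_start
    return busy / window if window > 0 else 0.0


def gpu_utilization_from_file(path: str) -> float:
    """Load a trace file, e.g. a downloaded `training_step.pt.trace.json`, and estimate utilization."""
    with open(path) as f:
        return gpu_utilization(json.load(f))
```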

### GPU Kernel Tab

- names of kernels being used.
- hard to read, poorly documented, somewhat proprietary
- but some useful bits, like `gemm` for matmuls, `winograd` and `conv` for convolutions
- might look at to get a sense for where your model is spending time, but the better method is the trace

## Going deeper with the Chrome Trace Viewer

Web developers have amazing tools.

We steal.

In [None]:
trace_files_url = latest_expt.url.split("/runs/")[0] + f"/artifacts/trace/trace-{latest_expt.id}/latest/files/"
trace_url = trace_files_url + "training_step.pt.trace.json"

Embedding the trace viewer can be flaky. If it doesn't load, open the page directly in W&B.

If you're having trouble finding the features referred to below,
it could be due to hardware differences.
In that case, read the below while looking at
[this example trace](https://wandb.ai/cfrye59/fsdl-text-recognizer-2022-training/artifacts/trace/trace-67j1qxws/latest/files/training_step.pt.trace.json)
and then return to examine your trace once you understand the trace viewer better.

In [None]:
print(trace_url)
IFrame(src=trace_url, height=frame_height * 1.5, width="100%")

This gives microsecond-level detail on what's going on inside `training_step`.

It's laid out as a "call stack": methods at the top call the methods beneath them.

`training_step` is towards the top.

Let's orient ourselves with some gross features.

### The forwards pass

Type `resnet` into the search bar in the top-right.

This will highlight the first part of our forwards pass.

It should say `thread XYZ (python)` next to it.

Then type in `transformer` to highlight the second part.
It should be at the same height, in the same section.

Zoom in (arrows in the floating toolbar) to view in detail.
Clear the search bar so that the trace is in color.

Notice that we start at a very abstract level in Python (`Sequential`, `Conv2d`)
and end up with very precise `cudnn` and `cuda` operations
(`aten::cudnn_convolution`).

`aten` ([no relation to the Pharaoh](https://twitter.com/charles_irl/status/1422232585724432392?s=20&t=Jr4j5ZXhV20xGwUVD1rY0Q))
is the PyTorch tensor math library
that abstracts over specific backends like `cudnn`.

### GPU kernel execution

Towards the bottom, you should see a section labeled "GPU".

Within it, you'll see one or more "`stream`s".
These are the units of work on a GPU,
akin to threads on the CPU.

When there are colored bars in this area,
the GPU is doing work of some kind.
The fraction of this bar that is filled in with color
is the same as the "GPU Utilization %" we've seen previously.

In CUDA, work is queued up to be placed into streams and completed
in a distributed and asynchronous manner.

The selection of work is happening on the CPU,
as we saw above.
The CPU and the GPU work together to coordinate this process.

Type `cuda` into the search bar and you'll see these coordination operations happening:
`cudaLaunchKernel`, for example, is the CPU telling the GPU what to do.

Running the same PyTorch model in different versions of PyTorch,
on different GPUs, and even on tensors of different sizes will result
in different choices of concrete kernel operation,
e.g. different matrix multiplication algorithms.

Type `sync` into the search bar and you'll see places where either work on the GPU
or work on the CPU needs to await synchronization,
e.g. copying data from the CPU to the GPU or deciding what to do next
on the basis of the contents of a tensor.

If you see a "sync" block above an area where the stream on the GPU is empty,
you've got a performance bottleneck.
That's a good place to review your code to understand why the synchronization is happening,
and to remove it if it's not necessary.

### The backwards pass

Type `backward` into the search bar.

This will highlight mostly components of our backwards pass.

Generally, this happens in a separate thread from the forwards pass,
also on the CPU.

It similarly launches kernels on the GPU from the CPU.

Generally, there's no need to optimize the backwards pass --
removing bottlenecks in the forwards pass results in a fast backwards pass.

One reason is that these two passes are the transpose of one another,
so they share a lot of properties,
and bottlenecks in one become bottlenecks in the other.
But the forwards pass is under our control,
so it's easier to reason about.

Another reason is that the forwards pass is harder to optimize, because PyTorch doesn't know what's coming next.
The backwards pass, on the other hand, runs once over a compute graph that has already been fully recorded,
so more optimizations are possible.

### The optimizer step

Type `Adam.step` into the search bar to highlight the actions of the optimizer.

Lightning implementation detail:
`optimizer_step` actually wraps the forwards and backwards pass,
because some optimizers require multiple calls to forward/backward.
So reading the stats makes it look like the `optimizer` takes up all the time,
even though the actual calculations and updates by the optimizer are not taking that much time.

One immediately obvious bit:
our GPU utilization is not great.

We're looping over parameters,
in Python,
and applying the Adam update rules to each,
resulting in the launch of a number of kernels
proportional to the number of layers in the model.

As of writing in August 2022,
more efficient optimizers are not a stable part of PyTorch,
which is at v1.12, but
[there is an unstable API](https://github.com/pytorch/pytorch/issues/68041)
and stable implementations outside of PyTorch, e.g.
[in NVIDIA's `apex` library](https://nvidia.github.io/apex/optimizers.html),
not to be confused with the
[Apex Optimizers Project](https://www.apexoptimizers.com/),
which is a collection of fitness-themed cheetah NFTs.
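To see why that loop is slow, here is the Adam update rule written as a per-parameter Python loop over scalar floats. It's a sketch of the math only (no weight decay or other options), not PyTorch's implementation. In PyTorch, each line of the loop body would instead be applied to a whole parameter tensor and would roughly correspond to one or more kernel launches, repeated for every parameter tensor.

```python
import math
from typing import Dict, List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    state: Dict,
    lr: float = 1e-3,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
) -> List[float]:
    """One Adam update, looping over parameters one at a time in Python."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    m = state.setdefault("m", [0.0] * len(params))
    v = state.setdefault("v", [0.0] * len(params))
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g  # running mean of gradients
        v[i] = beta2 * v[i] + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m[i] / (1 - beta1**t)  # bias corrections for the zero initialization
        v_hat = v[i] / (1 - beta2**t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


state = {}
params = adam_step([1.0, -1.0], grads=[2.0, -0.5], state=state, lr=0.1)
print(params)  # roughly [0.9, -0.9]: Adam's first step moves each parameter by about lr
```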

### Take-aways for PyTorch performance bottleneck troubleshooting

Our goal here was to learn some basic principles and tools for troubleshooting
the most common issues and the lowest-hanging fruit in PyTorch code.

Towards that goal, we viewed the trace to get an understanding of
what's going on inside a PyTorch training step,
in terms of a "host", generally the CPU, and a "device", here the GPU.

- host moves through compute graph, not caring about content unless it has to. it's just teeing up.
- the host has metadata, like type and shape, which it uses to select operations. Convolutions with very large filter sizes, for example, might use fast Fourier transform-based convolution algorithms, while the smaller filter sizes typical of contemporary CNNs are usually better served by other algorithms, like the `winograd` kernels we saw in the GPU Kernel tab.
- device executes actual operations.

And how to optimize it:
- goal: even though Python is slow, it chews its way through looking up the right CUDA kernel and telling the GPU what it needs before the previous kernel finishes.
- Ideally, we're actually getting far ahead of execution. The CPU makes it all the way through the backwards pass before the GPU is done.
- It's more like a navigator who is steering a ship. They move their finger over the route on the map, delivering steering directions, at the same time as the ship moves over the route in the real world. The navigator needs to know where they are going before they can give steering directions, and so long as they can just use the map and never have to "go to visual" and look at the environment, the ship will never be sitting idle waiting on the navigator.
- operationalization: 100% GPU utilization, meaning a kernel is running at all times. this is the aggregate metric reported in the systems tab on W&B or in the output of `!nvidia-smi`
- hard mode: high occupancy, i.e. a high ratio of active warps on each streaming multiprocessor to the maximum it supports. Getting to 80%+ is challenging.
- sharp edge: some operations require knowledge not available until the value is computed, see the `type_as` operation, which causes a synchronization between host and device
- sharp edge: Python is very slow, so if you throw in a really slow Python operation, like dynamically creating classes or iterating over a bunch of bytes, esp from disk, that can easily dwarf the "actually hard" part running in a fast language (C++) on the GPU
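The race between host and device can be captured in a toy model: the host spends a fixed time issuing each kernel and never waits on the device, while the device runs kernels in order on one stream. Everything here (one stream, constant costs, microsecond units) is a simplifying assumption, but it shows how utilization collapses once issuing a kernel takes longer than running it.

```python
def simulate_utilization(n_kernels: int, launch_us: float, kernel_us: float) -> float:
    """Fraction of the timeline (t=0 to the last kernel finishing) that the device is busy."""
    device_free_at = 0.0
    for i in range(n_kernels):
        launched_at = (i + 1) * launch_us  # host finishes issuing kernel i
        start = max(launched_at, device_free_at)  # device idles if the host is behind
        device_free_at = start + kernel_us
    busy = n_kernels * kernel_us
    return busy / device_free_at if device_free_at > 0 else 0.0


print(f"{simulate_utilization(100, launch_us=1, kernel_us=5):.3f}")  # 0.998: host stays ahead
print(f"{simulate_utilization(100, launch_us=10, kernel_us=5):.3f}")  # 0.498: device is starved
```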

Common issue: the `DataLoader`s are the bottleneck.

For fixes, see suggestions from Ross Wightman, author of `timm`, the FFCV library, and Hugging Face's Rust-accelerated "fast" tokenizers.

Faster CPUs can be important, and good network connections!
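A quick first check for a `DataLoader` bottleneck is to time how long the loop waits on the data iterator versus how long it spends in the step. This sketch works with any iterable; `slow_batches` is a hypothetical stand-in for a loader. On a GPU, step timings taken on the host are only meaningful if the step synchronizes (e.g. with `torch.cuda.synchronize()`), since kernel launches return before the work finishes.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def time_data_vs_step(
    batches: Iterable[Any], step_fn: Callable[[Any], Any], max_steps: int = 10
) -> Tuple[float, float]:
    """Return (seconds waiting on the data iterator, seconds inside step_fn)."""
    data_s, step_s = 0.0, 0.0
    iterator = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
    return data_s, step_s


def slow_batches(n: int = 5, delay_s: float = 0.02):
    """Hypothetical loader that takes `delay_s` seconds to produce each batch."""
    for i in range(n):
        time.sleep(delay_s)  # stands in for decoding/augmenting on the CPU
        yield [i] * 4


data_s, step_s = time_data_vs_step(slow_batches(), step_fn=lambda batch: sum(batch))
print(f"waiting on data: {data_s:.3f}s, in step: {step_s:.3f}s")
```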

#### Next steps

But high utilization does not mean high efficiency. Just keeping the GPUs spinning is enough to get high utilization.

For example, using double-precision floats results in convolutions taking longer than anything else, since 64-bit floats don't have fast convolution implementations.

Synchronization events between GPUs are another thing to look at.

We always need to also consider examples per second, and the gold star is _decrease in loss per second_.
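Turning logs into these rates is simple arithmetic. The sketch below takes hypothetical `(seconds, examples_seen, loss)` records from the start and end of a run; the numbers in the example are made up to show that higher throughput does not guarantee faster progress on the loss.

```python
from typing import Dict, List, Tuple


def training_efficiency(log: List[Tuple[float, int, float]]) -> Dict[str, float]:
    """Throughput and loss decrease per second between the first and last records.

    Each record is (seconds_since_start, examples_seen, loss); the format is hypothetical.
    """
    (t0, n0, loss0), (t1, n1, loss1) = log[0], log[-1]
    elapsed = t1 - t0
    return {
        "examples_per_second": (n1 - n0) / elapsed,
        "loss_decrease_per_second": (loss0 - loss1) / elapsed,
    }


# made-up numbers: run A pushes more examples through, run B makes more progress
run_a = [(0.0, 0, 2.0), (100.0, 64_000, 1.5)]
run_b = [(0.0, 0, 2.0), (100.0, 32_000, 1.2)]
print(training_efficiency(run_a))  # 640 examples/s, 0.005 loss/s
print(training_efficiency(run_b))  # 320 examples/s, about 0.008 loss/s
```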

For PyTorch internals abstractly, see Ed Yang's blogpost.

For performance considerations in PyTorch, see Horace He's blogpost.

# Exercises

### 🌟 Compare `num_workers=0` and `num_workers=1`

For a comparison between `0` and `DEFAULT_NUM_WORKERS`, see results [here](https://wandb.ai/cfrye59/fsdl-text-recognizer-2022-training/artifacts/trace/trace-2eddoiz7/v0/files/training_step.pt.trace.json#f388e363f107e21852d5$trace-67j1qxws).

Our dataset here fits in RAM, so large numbers of workers don't tend to cause issues.

### 🌟🌟 Resolve issues with a file by fixing flake8 lints, then write a test.

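The file below has style problems: some that `black` can fix automatically and some that `flake8` will flag.

Fix the lints, looking up the core error codes in the [`flake8` docs](https://flake8.pycqa.org/en/latest/user/error-codes.html) as needed, then write a test for `foo`.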

In [None]:
%%writefile training/fixme.py
import torch
from training import run_experiment
from numpy import *
import random
from pathlib import Path
def foo(model):
  return 1

In [None]:
!pre-commit run black --files training/fixme.py

In [None]:
!cat training/fixme.py

In [None]:
!pre-commit run --files training/fixme.py