# Exercises

## 1 Your own repo
To practice creating your own repo, create a directory outside of ML22:
### Create a folder
```bash
cd ~/code
mkdir les5
cd les5
poetry init
```

### Optional: add the cookiecutter
Optionally, you can use the data science cookiecutter:
```bash
poetry add cookiecutter
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
```
Go through the questions.
As python interpreter, pick `python`.
I advise cleaning out `make_dataset.py` and the `Makefile`.
While the cookiecutter is nice (it sets up a lot of things for you), I wanted something more custom, so I created
https://github.com/raoulg/DsTemplate, a template that drops some things I never use.

The downside is that you need to install Julia to use it, so maybe stick with the cookiecutter for now.

### Add linters
```bash
poetry add black flake8 isort mypy pytest pep8-naming flake8-annotations hypothesis jupyter --dev
```

If you want, you can copy the contents of our Makefile.
If you used a name other than `src` for your source code folder, change the folder to lint in the Makefile accordingly.

You should be able to run
```bash
make format
make lint
```
on your code from now on (well, if you added a `hypertune.py` file, of course).
You can also run the commands manually.

### Initialize a git folder
```bash
git init
git add .
git commit -m 'init'
```

## 2 Hypertuning
### Objective
In hypertune.py, I have set up an example for hypertuning.
Implement your own hypertuner for another model / dataset from scratch. 

- make sure your environment is reproducible (don't blindly reuse the environment from the lessons; use a minimal environment)
- make a function to pull your data into data/raw
- make a model
- build a hypertuner.py script
- make notebooks that show other people how to use your project
- lint and format your code with the Makefile until all your errors are gone
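
As a starting point for the "pull your data into data/raw" step, here is a minimal sketch using only the standard library. The name `fetch_raw` and its parameters are hypothetical, and it assumes your dataset is available as a single file behind a URL; if you use a torchvision or torchtext dataset, its own download mechanism is probably the better choice.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_raw(url: str, filename: str, raw_dir: Path = Path("data/raw")) -> Path:
    """Download `url` to `raw_dir/filename`; skip the download if the file exists.

    Hypothetical helper for illustration, not part of the lesson code.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)  # create data/raw when it is missing
    target = raw_dir / filename
    if not target.exists():
        # urlretrieve also accepts file:// URLs, which is handy for testing
        urlretrieve(url, target)
    return target
```

Skipping the download when the file already exists keeps reruns of your script fast and avoids hitting the server every time.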


### Data
You could pick the FashionMNIST set if you want to work on that for your assignment,
but you could also pick the flowers dataset, or any other set you like from [torchvision](https://pytorch.org/vision/0.8/datasets.html) or [torchtext](https://pytorch.org/text/stable/datasets.html#imdb).

### Use pydantic
Create your own pydantic config for your hypertuner.
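
A minimal sketch of what such a config could look like. The class name `TuneSettings` and all of its fields are assumptions for illustration; pick the fields that match your own model and search space.

```python
from pathlib import Path

from pydantic import BaseModel, Field


class TuneSettings(BaseModel):
    """Hypothetical settings for a tuning run; adapt the fields to your project."""

    data_dir: Path = Path("data/raw")  # where the raw data lives
    epochs: int = Field(default=10, gt=0)  # must be a positive integer
    num_samples: int = Field(default=20, gt=0)  # number of trials to run
    hidden_min: int = Field(default=16, gt=0)  # lower bound of the search space
    hidden_max: int = Field(default=128, gt=0)  # upper bound of the search space


# pydantic validates the values and converts the string into a Path
settings = TuneSettings(epochs=3, data_dir="../data/raw")
print(settings.data_dir, settings.epochs)
```

The benefit over a plain dictionary is that a typo or an invalid value (for example `epochs=0`) raises a `ValidationError` when the settings are created, instead of failing somewhere in the middle of a long tuning run.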

# Common gotchas

- You can't blindly copy paths. If you want to add your own `src` folder for imports, you need to figure out whether it is located at `".."` or `"../.."` relative to your notebook, and use that location in your `sys.path.insert(0, "..")` command.
- The same goes for data locations: `"../../data/raw"` might have changed into `"../data/raw"`, depending on your setup.
- While developing functions, you can:
    1. Write the function in a .py file, and (re)load it into a notebook if you change it. Note that Ray Tune can be problematic to run from inside a notebook.
    2. Make a `hypertune.py` file and execute it from the command line. You will never have problems with reloading functions, because everything is loaded fresh every time you start the script.
- Build your own poetry toml file with caution. Don't add stuff you don't need (e.g. you probably don't need jax, trax, plotly or scikit-learn). It's better to add something later when you miss it than to add everything and end up with a bulky environment.
- PRACTICE linting and formatting with the Makefile. Black is simple; mypy takes more effort, but using it will make you a better programmer, and it will catch possible errors that don't show up during a first run (but might show up later, with different input).
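
One way to sidestep the relative-path gotchas above is to look up the project root instead of hardcoding `".."` or `"../.."`. The sketch below is only a suggestion: `find_project_root` is a hypothetical helper, and it assumes the root is the folder that contains your `pyproject.toml`.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk up from `start` until a folder containing `marker` is found."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"no {marker} found in or above {start}")


# Possible use in a notebook, instead of sys.path.insert(0, ".."):
#   import sys
#   root = find_project_root(Path.cwd())
#   sys.path.insert(0, str(root))
#   raw_dir = root / "data" / "raw"
```

With this, the same cell works regardless of how deeply the notebook is nested inside the project.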


