# Python Packages - for Training Code
### IN ACTIVE DEVELOPMENT - not complete

At the simplest, all the training code may be in a single `filename.py` file that is a module. There are a couple of layers of depth that are commonly added to this:

**Python Modules**

Modules are files: `filename.py`

**Python Project**

Projects are collections of **Python Modules** in folders and possibly subfolders.  Here is an example project named `trainer`.
```bash
trainer/
├── __init__.py
├── train.py
├── module_1.py
└── helpers/
    ├── __init__.py
    ├── module_a.py
    └── module_b.py
```
Here `train.py` might have `import module_1` and `import helpers.module_a as module_a`. Note the `__init__.py` file in each folder: this is usually an empty file that lets Python know the folder can be imported as a package.
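
As a rough illustration of how these imports resolve, the sketch below recreates part of the example layout in a temporary folder and imports the modules the way `train.py` would. The `NAME` variables and the manual `sys.path` change are assumptions made for the demo, not part of the original project.

```python
import importlib
import os
import sys
import tempfile

# Recreate part of the example layout in a throwaway folder.
base = tempfile.mkdtemp()
trainer_dir = os.path.join(base, 'trainer')
helpers_dir = os.path.join(trainer_dir, 'helpers')
os.makedirs(helpers_dir)

# Empty __init__.py files mark each folder as importable.
for folder in (trainer_dir, helpers_dir):
    with open(os.path.join(folder, '__init__.py'), 'w'):
        pass

# Hypothetical module contents, only here so the imports have something to load.
with open(os.path.join(trainer_dir, 'module_1.py'), 'w') as file:
    file.write("NAME = 'module_1'\n")
with open(os.path.join(helpers_dir, 'module_a.py'), 'w') as file:
    file.write("NAME = 'module_a'\n")

# Running `python trainer/train.py` puts the script's folder first on sys.path;
# inserting it by hand mimics that here.
sys.path.insert(0, trainer_dir)
importlib.invalidate_caches()

import module_1
import helpers.module_a as module_a

print(module_1.NAME, module_a.NAME)
```

Once the project is installed as a package, the same modules would typically be imported through the package name instead, for example `from trainer import module_1`.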

**Python Packages**

Packages are created by adding the files needed to build a distribution package to a **Python Project**.
```bash
├── training_package/
│   ├── pyproject.toml
│   ├── src/
│   │   ├── trainer/
│   │   │   ├── __init__.py
│   │   │   ├── train.py
│   │   │   ├── module_1.py
│   │   │   ├── helpers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── module_a.py
│   │   │   │   ├── module_b.py
```

Example `pyproject.toml` file that sets `setuptools` as the build system:
```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = 'trainer'
version = '0.1'
dependencies = ['tensorflow_io', 'google-cloud-aiplatform>=1.17.0']
description = 'Training Package'
authors = [{name = 'statmike'}]
```
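
One way to sanity-check a `pyproject.toml` before building is to parse it and read the fields back. The sketch below assumes Python 3.11+ for the standard library `tomllib` module (with the `tomli` backport as a fallback on older versions) and uses an inline copy of the example file rather than reading it from disk.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# Inline copy of the example file, so the sketch does not depend on a path.
PYPROJECT = """
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = 'trainer'
version = '0.1'
dependencies = ['tensorflow_io', 'google-cloud-aiplatform>=1.17.0']
description = 'Training Package'
authors = [{name = 'statmike'}]
"""

config = tomllib.loads(PYPROJECT)

# The build backend comes from [build-system]; the metadata from [project].
print(config['build-system']['build-backend'])
print(config['project']['name'], config['project']['version'])
print(config['project']['dependencies'])
```

A parse error here usually points to a syntax problem in the file, which is cheaper to catch than a failed build.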

**Python Distribution Archive**

Prepare a **Python Package** for distribution by building it into a single file, called an archive, distribution, or distribution archive. There are two formats for these:
- `file.tar.gz` is a source distribution
    - created with `python setup.py sdist` or `python -m build` run in the package-level folder
    - a tarball, `file.tar`, is a collection of files wrapped into a single file
    - compressed, `file.tar.gz`, using [gzip](https://www.gzip.org/)
    - contains metadata and source files to be installed by pip
- `.whl` is a built distribution
    - created with `python setup.py bdist_wheel` or `python -m build` run in the package-level folder
    - a wheel, `file.whl`, is a ZIP-format archive in a built, ready-to-install form that is portable
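
Both formats can be opened with the standard library, which is a handy way to see what a distribution actually contains. The sketch below is a minimal illustration: `list_archive` is a hypothetical helper, and the two archives it reads are stand-ins built on the spot (they hold one file each and are not installable distributions).

```python
import os
import tarfile
import tempfile
import zipfile

def list_archive(path):
    """Return the file names stored inside a .tar.gz or .whl archive."""
    if path.endswith('.tar.gz'):
        with tarfile.open(path, mode='r:gz') as archive:
            return archive.getnames()
    if path.endswith('.whl'):
        # a wheel is a ZIP archive with a .whl extension
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    raise ValueError(f'not a distribution archive: {path}')

# Build two stand-in archives so the helper has something to read.
work = tempfile.mkdtemp()
source = os.path.join(work, 'train.py')
with open(source, 'w') as file:
    file.write("print('training')\n")

sdist = os.path.join(work, 'demo-0.1.tar.gz')
with tarfile.open(sdist, mode='w:gz') as archive:
    archive.add(source, arcname='demo-0.1/train.py')

wheel = os.path.join(work, 'demo-0.1-py3-none-any.whl')
with zipfile.ZipFile(wheel, mode='w') as archive:
    archive.write(source, arcname='demo/train.py')

print(list_archive(sdist))
print(list_archive(wheel))
```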

Notes on distribution tools:
- setuptools can be used directly (this requires a `setup.py` file) with
    - `python setup.py sdist`, which creates `file.tar.gz` by default
    - `python setup.py bdist_wheel`, which creates `file.whl`
- `python -m build` uses a `pyproject.toml` file, which can specify setuptools as the build backend, and automatically creates both the `.whl` and `.tar.gz` versions

Still to write up once the section above is cleaned up: `pyproject.toml` and setuptools will by default look for packages in common folder layouts, or you can specify the packages. Package data, if present, is included by default. `pyproject.toml` can also link to a `README.md` and a license.



**Installing Packages**

When you `pip install ...` what is happening?  This causes pip to look for the package and install it.  The default location to look is [PyPI](https://pypi.org/).  This can be overridden:
- local install `pip install path/to/file.tar.gz` or `pip install path/to/file.whl`
- install from a custom repository on Artifact Registry with `pip install --index-url https://{REGION}-python.pkg.dev/{PROJECT}/{REPOSITORY}/simple/ sampleproject`
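
When installs are scripted rather than typed, the same choices can be expressed as an argument list. The sketch below uses a hypothetical helper, `pip_install_command`, that only assembles the command; `--index-url` is the pip option shown above, and everything else about the helper is an assumption for illustration.

```python
import sys

def pip_install_command(target, index_url=None):
    """Build the argument list for installing `target` with pip.

    `target` can be a package name or a path to a .tar.gz or .whl file.
    `index_url` replaces the default index (PyPI) when given.
    """
    command = [sys.executable, '-m', 'pip', 'install']
    if index_url is not None:
        command += ['--index-url', index_url]
    command.append(target)
    return command

# Local install from a built archive.
local = pip_install_command('path/to/file.whl')
print(' '.join(local[1:]))
```

The returned list can be passed to `subprocess.run(command, check=True)`. Using `sys.executable -m pip` keeps the install in the same environment as the running interpreter.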


Resources:
- [pip install](https://pip.pypa.io/en/stable/cli/pip_install/)
- [Packaging Python Projects Tutorial](https://packaging.python.org/en/latest/tutorials/packaging-projects/)
- [setuptools](https://docs.python.org/3/distutils/sourcedist.html)
- [setuptools quickstart](https://setuptools.pypa.io/en/latest/userguide/quickstart.html)

---
## Package Installs (if needed)

The cell below checks whether the required Python libraries are installed. If any are missing, it prints a message with the pip command to use for the install. These installs must be completed before continuing with this notebook.

In [109]:
try:
    import build
except ImportError:
    print('You need to pip install build')
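
If more libraries need checking, the same idea can be written as a loop. This is a minimal sketch: `missing_packages` is a hypothetical helper, and it uses `importlib.util.find_spec` to look for a module without importing it.

```python
import importlib.util

def missing_packages(requirements):
    """Return the pip names whose import names cannot be found.

    `requirements` maps an import name to the name used with pip install.
    """
    missing = []
    for import_name, pip_name in requirements.items():
        try:
            found = importlib.util.find_spec(import_name) is not None
        except ImportError:
            # raised when the parent of a dotted name is not installed
            found = False
        if not found:
            missing.append(pip_name)
    return missing

for name in missing_packages({'build': 'build'}):
    print(f'You need to pip install {name}')
```

The mapping is needed because the import name and the pip name can differ, as with `google.cloud.aiplatform` and `google-cloud-aiplatform`.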

---
## Setup

inputs:

In [4]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [32]:
REGION = 'us-central1'
EXPERIMENT = 'packages'
SERIES = 'tips'

packages:

In [97]:
import os, shutil

from google.cloud import aiplatform

clients:

In [98]:
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [19]:
DIR = f'temp/{EXPERIMENT}'

environment:

In [85]:
# remove directory named DIR if exists
shutil.rmtree(DIR, ignore_errors = True)

# create directory DIR
os.makedirs(DIR)

# check for existence of DIR
print('DIR exists? ', os.path.exists(DIR))

# list contents of directory one level higher than DIR
os.listdir(DIR + '/../')

True


['job-parms', 'tips_build', '.ipynb_checkpoints', 'multi', 'gcs']
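
The remove-then-create pattern above can be wrapped in a small function so it is easy to rerun. The sketch below is illustrative only: `reset_dir` is a hypothetical name, and the demo runs in a throwaway temporary location rather than the notebook's `DIR`.

```python
import os
import shutil
import tempfile

def reset_dir(path):
    """Delete `path` if it exists, then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path

# Demo in a throwaway location.
demo_dir = os.path.join(tempfile.mkdtemp(), 'temp', 'packages')
reset_dir(demo_dir)

# Simulate a file left over from a previous run, then reset again.
with open(os.path.join(demo_dir, 'stale.txt'), 'w') as file:
    file.write('left over from a previous run')
reset_dir(demo_dir)

print(os.listdir(demo_dir))
```

Note that `ignore_errors = True` also hides failures such as permission problems, in which case `os.makedirs` raises because the folder still exists.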

### Construct Python Package

Use the temp directory created at DIR:

In [81]:
DIR

'temp/tips_build'

In [80]:
os.listdir(f'{DIR}/../')

['job-parms', 'tips_build', '.ipynb_checkpoints', 'multi', 'gcs']

create the folder structure:

In [86]:
os.makedirs(DIR+'/trainer/src/trainer')

In [91]:
for r, d, f in os.walk(DIR):
    for s in d:
        print(os.path.join(r, s))

temp/tips_build/trainer
temp/tips_build/trainer/src
temp/tips_build/trainer/src/trainer


add files to directory:

In [102]:
shutil.copyfile('../05 - TensorFlow/05_train.py', f'{DIR}/trainer/src/trainer/train.py')
with open(f'{DIR}/trainer/src/trainer/__init__.py', 'w') as file: pass
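
The folder creation and the `pyproject.toml` that a package needs can also be produced by one helper. The sketch below is an assumption-laden illustration, not the notebook's method: `write_package` is a hypothetical function, it writes only a subset of the `[project]` fields, and it relies on `json.dumps` producing double-quoted strings that are also valid TOML for simple values like these.

```python
import json
import os
import tempfile

def write_package(root, name, version, dependencies):
    """Create root/pyproject.toml and root/src/<name>/__init__.py."""
    package_dir = os.path.join(root, 'src', name)
    os.makedirs(package_dir, exist_ok=True)
    with open(os.path.join(package_dir, '__init__.py'), 'w'):
        pass

    lines = [
        '[build-system]',
        'requires = ["setuptools"]',
        'build-backend = "setuptools.build_meta"',
        '',
        '[project]',
        f'name = {json.dumps(name)}',
        f'version = {json.dumps(version)}',
        f'dependencies = {json.dumps(dependencies)}',
        '',
    ]
    toml_path = os.path.join(root, 'pyproject.toml')
    with open(toml_path, 'w') as file:
        file.write('\n'.join(lines))
    return toml_path

# Demo in a throwaway location.
demo_root = os.path.join(tempfile.mkdtemp(), 'trainer')
toml_path = write_package(demo_root, 'trainer', '0.1', ['tensorflow_io'])
with open(toml_path) as file:
    print(file.read())
```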

In [119]:
with open(f'{DIR}/trainer/pyproject.toml', 'w') as file:
    file.write(f"""[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = 'trainer'
version = '0.1'
dependencies = ['tensorflow_io', 'google-cloud-aiplatform>={aiplatform.__version__}']
description = 'Training Package'
authors = [{{name = 'statmike'}}]
""")

list directory:

In [120]:
for root, dirs, files in os.walk(DIR):
    for d in dirs:
        print(os.path.join(root, d))
    for f in files:
        # files live directly in root
        print(os.path.join(root, f))

temp/tips_build/trainer
temp/tips_build/trainer/.ipynb_checkpoints
temp/tips_build/trainer/src
temp/tips_build/trainer/pyproject.toml
temp/tips_build/trainer/src/trainer
temp/tips_build/trainer/src/trainer/__init__.py
temp/tips_build/trainer/src/trainer/train.py
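
An alternative way to list a folder tree is `pathlib`, which returns full paths directly and avoids joining path pieces by hand. The sketch below is a minimal illustration: `list_tree` is a hypothetical helper, and the demo folder is a stand-in created in a temporary location.

```python
import tempfile
from pathlib import Path

def list_tree(root):
    """Return every folder and file under `root` as sorted relative paths."""
    root = Path(root)
    return sorted(path.relative_to(root).as_posix() for path in root.rglob('*'))

# Stand-in layout that mirrors the package structure.
demo = Path(tempfile.mkdtemp())
(demo / 'trainer' / 'src' / 'trainer').mkdir(parents=True)
(demo / 'trainer' / 'pyproject.toml').touch()
(demo / 'trainer' / 'src' / 'trainer' / 'train.py').touch()

for entry in list_tree(demo):
    print(entry)
```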


build Python distribution archive:

In [121]:
!cd ./{DIR}/trainer && python -m build

* Creating virtualenv isolated environment...
* Installing packages in isolated environment... (setuptools)
* Getting dependencies for sdist...
running egg_info
creating src/trainer.egg-info
writing src/trainer.egg-info/PKG-INFO
writing dependency_links to src/trainer.egg-info/dependency_links.txt
writing requirements to src/trainer.egg-info/requires.txt
writing top-level names to src/trainer.egg-info/top_level.txt
writing manifest file 'src/trainer.egg-info/SOURCES.txt'
reading manifest file 'src/trainer.egg-info/SOURCES.txt'
writing manifest file 'src/trainer.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing src/trainer.egg-info/PKG-INFO
writing dependency_links to src/trainer.egg-info/dependency_links.txt
writing requirements to src/trainer.egg-info/requires.txt
writing top-level names to src/trainer.egg-info/top_level.txt
reading manifest file 'src/trainer.egg-info/SOURCES.txt'
writing manifest file 'src/trainer.egg-inf
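
Outside a notebook, the same build can be started from Python with `subprocess`. The sketch below is a hedged illustration: `build_command` and `run_build` are hypothetical helpers, and the `--sdist`, `--wheel`, and `--outdir` options are those of the `build` command line tool. Only the command assembly is executed here; `run_build` is defined but not called.

```python
import subprocess
import sys

def build_command(source_dir, outdir=None, sdist=False, wheel=False):
    """Assemble the `python -m build` command for the package in `source_dir`.

    With neither `sdist` nor `wheel` set, build keeps its default behavior
    of producing both archives.
    """
    command = [sys.executable, '-m', 'build']
    if sdist:
        command.append('--sdist')
    if wheel:
        command.append('--wheel')
    if outdir is not None:
        command += ['--outdir', outdir]
    command.append(source_dir)
    return command

def run_build(source_dir, **options):
    """Run the build; raises CalledProcessError if the build fails."""
    return subprocess.run(build_command(source_dir, **options), check=True)

print(' '.join(build_command('temp/tips_build/trainer')[1:]))
```

Passing the folder as an argument avoids the `cd` step, and `check=True` turns a failed build into an exception instead of silent output.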

list directory:

In [122]:
for root, dirs, files in os.walk(DIR):
    for d in dirs:
        print(os.path.join(root, d))
    for f in files:
        # files live directly in root
        print(os.path.join(root, f))

temp/tips_build/trainer
temp/tips_build/trainer/.ipynb_checkpoints
temp/tips_build/trainer/src
temp/tips_build/trainer/dist
temp/tips_build/trainer/pyproject.toml
temp/tips_build/trainer/src/trainer.egg-info
temp/tips_build/trainer/src/trainer
temp/tips_build/trainer/src/trainer.egg-info/top_level.txt
temp/tips_build/trainer/src/trainer.egg-info/SOURCES.txt
temp/tips_build/trainer/src/trainer.egg-info/requires.txt
temp/tips_build/trainer/src/trainer.egg-info/dependency_links.txt
temp/tips_build/trainer/src/trainer.egg-info/PKG-INFO
temp/tips_build/trainer/src/trainer/__init__.py
temp/tips_build/trainer/src/trainer/train.py
temp/tips_build/trainer/dist/trainer-0.1-py3-none-any.whl
temp/tips_build/trainer/dist/trainer-0.1.tar.gz


This directory now has three key items:
- a single training file: `{DIR}/trainer/src/trainer/train.py`
- a folder of training code: `{DIR}/trainer/src/trainer/`
    - with a starting point of `train.py`
- a source distribution: `{DIR}/trainer/dist/trainer-0.1.tar.gz`
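
To pick up the built archives in later steps without hard-coding the version in the file name, the `dist` folder can be searched by extension. The sketch below is illustrative: `find_distributions` is a hypothetical helper, and the demo uses empty placeholder files in a temporary folder instead of real build output.

```python
import glob
import os
import tempfile

def find_distributions(dist_dir):
    """Return the source distributions and wheels found in `dist_dir`."""
    return {
        'sdist': sorted(glob.glob(os.path.join(dist_dir, '*.tar.gz'))),
        'wheel': sorted(glob.glob(os.path.join(dist_dir, '*.whl'))),
    }

# Stand-in dist folder with empty placeholder files.
dist_dir = os.path.join(tempfile.mkdtemp(), 'dist')
os.makedirs(dist_dir)
for filename in ('trainer-0.1.tar.gz', 'trainer-0.1-py3-none-any.whl'):
    with open(os.path.join(dist_dir, filename), 'w'):
        pass

found = find_distributions(dist_dir)
print([os.path.basename(path) for path in found['sdist']])
print([os.path.basename(path) for path in found['wheel']])
```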