# Python Development with GitLab CI and Tox

In this post I want to give a brief overview of how to structure a Python package and set up continuous integration with GitLab and `tox-conda` for data science projects, a combination that I have found quite nice to work with.

Most data scientists coming from R will probably ask themselves:

> What is *the* way to structure a Python package?

While in R there is only one (admittedly sometimes limiting) way to structure a package, you have quite a lot of options in the Python world.

Since data science workloads rely heavily on the `conda` packaging ecosystem (especially on Windows machines), I will focus on how to make everything play nicely with `conda`.

## Set up a conda environment

We begin by creating a new conda environment with the Python interpreter we want to use, and then activate it:
```
conda create --name <environment_name> python=3.8 -y
conda activate <environment_name>
```

Now we need to set up our package structure. In R there is a function called `package.skeleton()` (or alternatively RStudio's GUI) that creates a basic package with all necessary files. For Python there are different packages that provide similar functionality (such as `cookiecutter`), but I will quickly outline a minimal setup here:

```
src/
    package_name/
        __init__.py
        python_file1.py
        ...
        python_fileN.py
tests/
notebooks/
outputs/
setup.py
.gitlab-ci.yml
tox.ini
config.yml
CHANGELOG.md
README.md
.gitignore
.env
```
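
If you would rather not create these files by hand, the layout above can be generated with a few lines of Python. This is only a sketch: the function name `create_skeleton` is made up for illustration, all files are created empty, and the CI file is written as `.gitlab-ci.yml` because that is the name GitLab looks for by default.

```python
from pathlib import Path


def create_skeleton(root: Path, package_name: str) -> list:
    """Create the minimal project layout under `root`.

    Existing files are left untouched. Returns the list of files
    that were newly created.
    """
    directories = [f"src/{package_name}", "tests", "notebooks", "outputs"]
    files = [
        f"src/{package_name}/__init__.py",
        "setup.py",
        ".gitlab-ci.yml",  # GitLab's default CI configuration file name
        "tox.ini",
        "config.yml",
        "CHANGELOG.md",
        "README.md",
        ".gitignore",
        ".env",
    ]

    for directory in directories:
        (root / directory).mkdir(parents=True, exist_ok=True)

    created = []
    for name in files:
        path = root / name
        if not path.exists():
            path.touch()  # create an empty placeholder file
            created.append(path)
    return created
```

Calling `create_skeleton(Path("."), "package_name")` in an empty directory produces the tree shown above; running it a second time changes nothing.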


You can download a zip file containing a sample project from [here].

Let's take a quick look at the content of:

- `setup.py`
- `.gitlab-ci.yml`
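
As a starting point, a minimal `setup.py` for the `src/` layout could look like the sketch below. The name, version, and empty dependency list are placeholders chosen for illustration and are not taken from the sample project.

```python
# setup.py: minimal sketch for a package that lives under src/
from setuptools import find_packages, setup

setup(
    name="package_name",  # placeholder: use your own package name
    version="0.1.0",  # placeholder version
    package_dir={"": "src"},  # packages are located under src/
    packages=find_packages(where="src"),  # discover src/package_name
    python_requires=">=3.8",  # matches the interpreter of the conda environment
    install_requires=[],  # add your runtime dependencies here
)
```

The `package_dir` and `find_packages(where="src")` arguments tell setuptools to look for packages inside `src/` rather than in the project root.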

Don't forget to exclude your `.env` file from version control by adding it to `.gitignore`, in case your environment file contains API tokens or similar secrets.

For your pipelines, you can define the same values as CI/CD variables in the GitLab project settings and mask them so that they do not show up in job logs.
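
Either way, the package itself only has to read the value from the environment, so the same code runs locally and in a pipeline. A minimal sketch, assuming a hypothetical helper `require_env` and a hypothetical variable name `API_TOKEN`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises a RuntimeError with a descriptive message if the variable
    is unset or empty, instead of failing later with an obscure error.
    """
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Define it in your local .env file or as a GitLab CI/CD variable."
        )
    return value


# Example usage (API_TOKEN is a hypothetical variable name):
# token = require_env("API_TOKEN")
```

Failing early with a clear message makes a missing variable much easier to spot in a pipeline log than an authentication error further down the line.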

For development, I install the package with `pip install -e .` and include the following code in my Jupyter notebook to reload the package after an update:

```
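# Load the autoreload extension
# %autoreload 1: reload only modules imported with %aimport before each execution
# %autoreload 2: reload all modules (except those excluded by %aimport) before each execution
%load_ext autoreload
%autoreload 2
# Render matplotlib figures inline in the notebook
%matplotlib inline
# Load the dotenv extension (provided by the python-dotenv package)
%load_ext dotenv

# Load variables from ../.env, overriding any that are already set (-o)
%dotenv -o "../.env"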
```

- Explain the reason for the `src/package_name` structure
- Set up a Jupyter kernel
- Talk about `setup.py` SSH install
- `tox-conda`

## Create draft template to download to automate setup process