# Code organisation 

As I'm responsible for the output from more and more programmers, I thought it was about time to write down some notes on good habits for organising code.

Perhaps this notebook will evolve from a remember-list - a kind of template for a todo-list - into a verification script. I'm not sure, but the prospect is there.


To keep the realism high I base the notes on the [tablite](https://github.com/root-11/tablite) project, instead of some theoretical `myproject` as that would feel artificial.


## Laying the foundations.

[1] Get the __right__ python version installed!

[2] Create a virtual environment with:

```
c:\....\Python310\python.exe -m venv d:\venvs\tablite310
```

I prefer calling the venv something meaningful so there's an association between project and Python version - hence the 310 suffix.

[3] Activate the virtual environment: `d:\venvs\tablite310\Scripts\activate.bat`. You can see that the venv is "active" as the command line has the bracketed venv prefix:

```
(tablite310) d:\
```

[4] Install pytest: `(tablite310) d:\github\tablite> pip install pytest`


[5] Run pytest on the empty project: `(tablite310) d:\github\tablite> pytest`. There should be no errors, and the Python version printed in pytest's header should confirm that you're in the right virtual environment.

[6] Create the project folder `d:\github\tablite` with the following contents. We will add content later, but if you're burning to see details, go to the [tablite source repo](https://github.com/root-11/tablite) and look at the files:

```
.github\workflows
    python-package.yml
.vscode\
    launch.json
    settings.json
dist\
tablite\
    __init__.py
    core.py  # sometimes called src.py or main.py
    utils.py
tests\
    data\
    __init__.py
    test_basics.py
.gitignore
LICENSE
README.md
setup.py
tutorial.ipynb
```

Notice that only [README.md](https://github.com/root-11/tablite/blob/master/README.md) and [LICENSE](https://github.com/root-11/tablite/blob/master/LICENSE) are in uppercase. PEP 8 (the Python style guide) recommends lowercase names with underscores (snake_case, also called pot_hole_case) for modules and functions, and CamelCase (CapWords) only for classes, so let's stick to that.
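Since this notebook may grow into verification scripts, here is a minimal sketch of one for the file-naming rule. The function name `misnamed_modules` and the regular expression are my own illustration, not part of tablite, and the check only looks at `.py` file names, not at the functions or classes inside them.

```python
import re
from pathlib import Path

# Assumption: we only check file names of .py modules.
# PEP 8 asks for short, all-lowercase module names; underscores are allowed.
MODULE_NAME = re.compile(r"^[a-z_][a-z0-9_]*\.py$")


def misnamed_modules(package_dir):
    """Return the .py files below package_dir whose names are not lowercase snake_case."""
    root = Path(package_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.py")
        if not MODULE_NAME.match(path.name)
    )


# Usage from the project root; an empty list means every module name follows PEP 8:
# print(misnamed_modules("tablite"))
```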

You may think that a `LICENSE` is not important in your project. Good faith or naivety, however, is not legal cover in every country. And as your code is on the **world wide** web, you could face charges if you land in a country where someone wants to blame you for negligence. So just add the [MIT license](https://opensource.org/licenses/MIT) and move on.

The folder `dist\` is deliberately empty. It will contain the packages that you'll upload to pypi when you create them (later), so just leave it for now.

We will need to commit all of this to version control. For this to be a breeze we would like a `.gitignore` file with this content:

```
# IDE configs.
.idea/   
*.code-workspace
.vscode/

# Compiled files
*.pyc

# Notebook checkpoints
.ipynb_checkpoints/

# Packages
*.egg-info
dist/
build/
```

The `.gitignore` file means that we have a clean view of what is tracked by version control.

Using `#` comments to explain what the different patterns mean is a great help. Use it!


The `tests\` folder is deliberately kept separate from `tablite\`, so that the tests import the modules from `tablite` just as if they were installed using pip.


## Make the architecture work as the first thing.

There is nothing worse than spending time getting a project to run its own test suite.
For VScode this is simplified by the contents of the `.vscode` folder:

`launch.json` contains this:
```
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File (Integrated Terminal)",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "internalConsole",
            "redirectOutput": true
        }
    ]
}
```
which lets VScode run (or debug) whichever file is currently active in the editor.

You'll also need to pass pytest the `-sv` arguments: `-s` switches off pytest's output capturing, so you can see any printouts (pytest otherwise hides them), and `-v` lists every test by name. This is set in `settings.json`:

```
{
    "python.defaultInterpreterPath": "C:/Data/venv/tablite310/Scripts/python.exe",
    "python.testing.pytestArgs": [
        "tests",
        "-sv"
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true

    
}
```


# Let's commit all of this to git.

Next, install `git` and set the name and email that will appear on your commits:
```
git config --global user.name "[firstname lastname]"
git config --global user.email "[valid-email]"
```

Now you're ready to use git. Read the [cheat sheet from github](https://education.github.com/git-cheat-sheet-education.pdf) and commit to version control by:
```
d:\github\tablite> git init
d:\github\tablite> git add .
d:\github\tablite> git commit -m "initial commit"
```



## Imports

Imagine we have a module called `x.py`:

```
# x.py
def dumps(text):
    return text.encode('utf-8')
```

and you choose to type

```
from x import *
from json import dumps
```
What exactly is going to happen in your namespace?

1. First `dumps` is imported from `x`.
2. Then `dumps` is imported from `json`, which silently rebinds the name: the `dumps` from `x` is no longer reachable, and nothing warns you.

**Conclusion**

Never use `from x import *`: you will have no idea what exists in your namespace.
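The two steps above can be replayed in a few lines. This is a minimal sketch that builds the hypothetical `x.py` in memory with `types.ModuleType` (so no file is needed) and then runs both imports into a plain dictionary that plays the role of your namespace.

```python
import json
import sys
import types


def star_import_demo():
    """Recreate the x.py example in memory and show which dumps survives the imports."""
    # Hypothetical stand-in for x.py, registered so that "import x" finds it.
    x = types.ModuleType("x")
    exec("def dumps(text):\n    return text.encode('utf-8')\n", x.__dict__)
    sys.modules["x"] = x

    namespace = {}
    exec("from x import *", namespace)
    first = namespace["dumps"]   # the function from x
    exec("from json import dumps", namespace)
    second = namespace["dumps"]  # the name now points at json.dumps; x's version is gone
    return first, second


first, second = star_import_demo()
print(first.__module__, "->", second.__module__)  # x -> json

# The explicit alternative keeps both functions reachable and obvious:
import x
print(x.dumps("a"), json.dumps("a"))  # b'a' "a"
```

With `import x` and `import json`, the reader of your code (and your linter) can always tell which `dumps` is being called.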






# Logging

Logging is supposed to be simple. Let's keep it that way.

In your module, start by importing Python's `logging` module:

```
import logging

logger = logging.getLogger(__name__)  # a logger named after the module, e.g. "tablite.core"

try:
    logger.debug("this works?")
    # your code here
    logger.debug("it worked! ")
except Exception:
    logger.debug("it didn't!" )
    raise

```

In your test, you can set the log level to figure out what might be wrong:

```
import logging
logging.getLogger().setLevel(logging.DEBUG)  # lower the root logger's threshold so debug messages get through

# your code goes here.

```
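pytest captures the log messages and shows them under "Captured log call" when a test fails. To watch them live as your test progresses, also pass `--log-cli-level=DEBUG` to pytest (for example next to `-sv` in `settings.json`).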
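This works because of the logger hierarchy: a module logger without a level of its own defers to the root logger, so a single line in the test decides what gets through. A minimal sketch showing that, where `divide` and the logger name `tablite.core` are hypothetical stand-ins, and the handler writing into a `StringIO` only exists to make the output checkable.

```python
import io
import logging

# Hypothetical module logger; inside a real module you would write logging.getLogger(__name__).
logger = logging.getLogger("tablite.core")


def divide(a, b):
    logger.debug("dividing %s by %s", a, b)
    return a / b


def captured_log(level):
    """Run divide() with the root logger set to `level` and return what was logged."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root = logging.getLogger()
    previous_level = root.level
    root.addHandler(handler)
    root.setLevel(level)
    try:
        divide(10, 4)
    finally:
        # Restore the root logger so the demo leaves no trace behind.
        root.removeHandler(handler)
        root.setLevel(previous_level)
    return buffer.getvalue()


print(repr(captured_log(logging.WARNING)))  # '' because debug records are filtered out
print(captured_log(logging.DEBUG), end="")  # DEBUG tablite.core: dividing 10 by 4
```

The module code never changes; only the level set from outside decides whether the debug lines appear.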





# Building the package

To build the pypi package we'll follow the instructions from [packaging.python.org](https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/)

There are 3 parts to that:

1. set the installer up using the file `setup.py`
2. create the package
3. upload the package.


### set the installer up

My [`setup.py`](https://github.com/root-11/tablite/blob/master/setup.py) depends on the file `requirements.txt` to know which packages pip must install alongside `tablite`. The [list is quite short](https://github.com/root-11/tablite/blob/master/requirements.txt):

```
tqdm>=4.63.0
graph-theory>=2022.3.9.54615
numpy>=1.22.3
h5py>=3.6.0
psutil>=5.9.0
chardet==5.0.0
pyexcel==0.7.0
pyexcel-odsr==0.6.0
pyperclip==1.8.2
pyexcel-xlsx==0.6.0
pyexcel-xls==0.7.0
pyuca>=1.2
mplite==1.1.0
```

In setup.py I read this file using:
```
with open('requirements.txt', 'r') as fi:
    requirements = [v.rstrip('\n') for v in fi.readlines()]
```

and add the requirements to the `setup` function like this:

```
setup(
    name="tablite",
    version=__version__,  <------- VERSION HERE!
    url="https://github.com/root-11/tablite",
    license="MIT",
    author="Bjorn Madsen",
    author_email="dr.bjorn.madsen@gmail.com",
    description="A library for cleaning tabular data.",
    long_description=long_description,
    long_description_content_type='text/markdown',
    keywords=keywords,
    packages=["tablite"],
    include_package_data=True,
    data_files=[(".", ["LICENSE", "README.md", "requirements.txt"])],
    platforms="any",
    install_requires=requirements,   # <---------- REQUIREMENTS HERE!
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Science/Research",
        "Natural Language :: English",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
    ],
)
```

You probably also noticed the `VERSION HERE!` hint. As pip and `pypi` order releases by their version number (see PEP 440), and `pypi` refuses to accept a file it has already seen, it is a really good idea to have a "single source" for your version number. There is a brief introduction [here](https://packaging.python.org/en/latest/guides/single-sourcing-package-version/), but I can summarize:

I have a file `tablite\version.py` that contains only this:
```
major, minor, patch = 2022, 7, "dev5"
__version_info__ = (major, minor, patch)
__version__ = '.'.join(str(i) for i in __version_info__)
```

When `setup.py` runs, I can't simply import tablite: the build tools run it in a new, isolated virtual environment where tablite's dependencies are not installed, so I do this instead:
```
from pathlib import Path

version_file = Path(__file__).parent / "tablite" / "version.py"
exec(version_file.read_text())  # defines __version_info__ and __version__ in setup.py's globals
```

`exec(version_file.read_text())` runs `version.py` as a script, so the variables it defines (including `__version__`) become globals of `setup.py`. Thereby the call `setup(...)` has `__version__` available and can use it in the build instructions.
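A variation on the same idea, sketched here under the assumption that you'd rather not spill every variable of `version.py` into the globals of `setup.py`: execute the file in a private dictionary and pick out `__version__`. `read_version` is a hypothetical helper name, and the demo writes a throwaway copy of `version.py` to a temporary folder.

```python
import tempfile
from pathlib import Path


def read_version(version_file):
    """Execute version.py in a private namespace and return its __version__ string."""
    namespace = {}
    exec(Path(version_file).read_text(), namespace)
    return namespace["__version__"]


# Demo with a throwaway copy of tablite\version.py.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "version.py"
    demo_file.write_text(
        'major, minor, patch = 2022, 7, "dev5"\n'
        "__version_info__ = (major, minor, patch)\n"
        "__version__ = '.'.join(str(i) for i in __version_info__)\n"
    )
    version = read_version(demo_file)

print(version)  # 2022.7.dev5
```

In `setup.py` this would read `version=read_version(version_file)`; the result is the same string that appears in the wheel name in the build log.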


All the other attributes are less critical and you can probably deduce from their content what each of them does. If not, please visit the [packaging guidelines](https://packaging.python.org/en/latest/tutorials/packaging-projects/).


### create the package

Now we're ready to create the pypi package. This requires the `build` package (`pip install build`), and then only this command:

```
(tablite310) D:\github\tablite>python -m build --wheel

```

When running it you should see something like the following:

```
* Creating venv isolated environment...
* Installing packages in isolated environment... (setuptools >= 40.8.0, wheel)
* Getting dependencies for wheel...
running egg_info
writing tablite.egg-info\PKG-INFO
writing dependency_links to tablite.egg-info\dependency_links.txt
writing requirements to tablite.egg-info\requires.txt
writing top-level names to tablite.egg-info\top_level.txt
reading manifest file 'tablite.egg-info\SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'tablite.egg-info\SOURCES.txt'
* Installing packages in isolated environment... (wheel)
* Building wheel...
running bdist_wheel
running build
running build_py
running egg_info
writing tablite.egg-info\PKG-INFO
writing dependency_links to tablite.egg-info\dependency_links.txt
writing requirements to tablite.egg-info\requires.txt
writing top-level names to tablite.egg-info\top_level.txt
reading manifest file 'tablite.egg-info\SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'tablite.egg-info\SOURCES.txt'
installing to build\bdist.win-amd64\wheel
running install
running install_lib
creating build\bdist.win-amd64\wheel
creating build\bdist.win-amd64\wheel\tablite
copying build\lib\tablite\config.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\core.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\datatypes.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\file_reader_utils.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\groupby_utils.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\memory_manager.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\sortation.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\utils.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\version.py -> build\bdist.win-amd64\wheel\.\tablite
copying build\lib\tablite\__init__.py -> build\bdist.win-amd64\wheel\.\tablite
running install_data
creating build\bdist.win-amd64\wheel\tablite-2022.7.dev5.data
creating build\bdist.win-amd64\wheel\tablite-2022.7.dev5.data\data
copying LICENSE -> build\bdist.win-amd64\wheel\tablite-2022.7.dev5.data\data\.
copying README.md -> build\bdist.win-amd64\wheel\tablite-2022.7.dev5.data\data\.
copying requirements.txt -> build\bdist.win-amd64\wheel\tablite-2022.7.dev5.data\data\.
running install_egg_info
Copying tablite.egg-info to build\bdist.win-amd64\wheel\.\tablite-2022.7.dev5-py3.10.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build\bdist.win-amd64\wheel\tablite-2022.7.dev5.dist-info\WHEEL
creating 'D:\github\tablite\dist\tmptmy21zqo\tablite-2022.7.dev5-py3-none-any.whl' and adding 'build\bdist.win-amd64\wheel' to it
adding 'tablite/__init__.py'
adding 'tablite/config.py'
adding 'tablite/core.py'
adding 'tablite/datatypes.py'
adding 'tablite/file_reader_utils.py'
adding 'tablite/groupby_utils.py'
adding 'tablite/memory_manager.py'
adding 'tablite/sortation.py'
adding 'tablite-2022.7.dev5.dist-info/top_level.txt'
adding 'tablite-2022.7.dev5.dist-info/RECORD'
removing build\bdist.win-amd64\wheel
Successfully built tablite-2022.7.dev5-py3-none-any.whl

```

This long log output may be daunting, but it gives you the opportunity to check that all the right files were added.

The file we will upload in a second is the last entry in the log:
```
tablite-2022.7.dev5-py3-none-any.whl
```
which you'll find in the folder `dist\`.

You can also inspect the package using `7-zip` as it's just a zip with the required files.
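The same inspection can be scripted with Python's standard `zipfile` module. A minimal sketch: the helper names and the list of expected files are illustrative assumptions, and the demo builds a tiny stand-in archive instead of the real wheel so that it runs anywhere.

```python
import tempfile
import zipfile
from pathlib import Path


def wheel_contents(wheel_path):
    """Return the file names stored in a wheel; a .whl file is an ordinary zip archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return wheel.namelist()


def missing_files(wheel_path, expected):
    """Return the expected archive paths that did not make it into the wheel."""
    return sorted(set(expected) - set(wheel_contents(wheel_path)))


# Demo on a hand-made archive standing in for the real wheel.
with tempfile.TemporaryDirectory() as tmp:
    fake_wheel = Path(tmp) / "demo-0.1-py3-none-any.whl"
    with zipfile.ZipFile(fake_wheel, "w") as archive:
        archive.writestr("tablite/__init__.py", "")
        archive.writestr("tablite/core.py", "")
    gaps = missing_files(fake_wheel, ["tablite/__init__.py", "tablite/core.py", "tablite/version.py"])

print(gaps)  # ['tablite/version.py']

# Against the real build output:
# print(wheel_contents(r"dist\tablite-2022.7.dev5-py3-none-any.whl"))
```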

### upload the package

To upload you'll first need to create an account with `pypi`. Do that now.

[https://pypi.org/manage/projects/](https://pypi.org/manage/projects/)


Once you have an account, you can check (1) and publish (2) your package with `twine` (install it with `pip install twine`).

First we check using `twine check dist\*`:

```
(tablite310) D:\github\tablite>twine check dist\*
Checking dist\tablite-2022.7.dev5-py3-none-any.whl: PASSED
Checking dist\tablite-2022.7.dev0.tar.gz: PASSED
Checking dist\tablite-2022.7.dev4.tar.gz: PASSED
```

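Then we upload only the new wheel using `twine upload dist\tablite-2022.7.dev5-py3-none-any.whl` (uploading `dist\*` would also try to re-upload the older files, which pypi rejects). Note that pypi nowadays only accepts API tokens for uploads: enter `__token__` as the username and the token as the password.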
```
(tablite310) D:\github\tablite>twine upload dist\tablite-2022.7.dev5-py3-none-any.whl
Uploading distributions to https://upload.pypi.org/legacy/
Enter your username: bjorn.madsen
Enter your password:
Uploading tablite-2022.7.dev5-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB • 00:00 • 3.8 MB/s

View at:
https://pypi.org/project/tablite/2022.7.dev5/

(tablite310) D:\github\tablite>
```

Congratulations. You're now an official pypi contributor!


# Checking packages

So you've built a nice package and uploaded it to pypi. Does it work? Let's check that other users can install and use it.

First, let's create an empty, fresh virtual environment using the original (system) Python, not the one from the project venv:

```
C:\Users\madsenbj\AppData\Local\Programs\Python\Python310\python.exe -m venv c:\Data\venv\tmp
```

Next we activate that environment

```
c:\Data\venv\tmp\Scripts\activate.bat
```

Then we do a test-install straight from pypi

```
(tmp) D:\> pip install tablite
```

If you get an error like this, don't panic. The hint is in the line marked `HINT`:

```
PS D:\gitlab\mfdesigner> & C:/Data/venv/mfdesigner310/Scripts/Activate.ps1
(mfdesigner310) PS D:\gitlab\mfdesigner> pip install tablite==2022.7.dev5
Collecting tablite==2022.7.dev5
  Using cached tablite-2022.7.dev5.tar.gz (51 kB)  <---------------- HINT: cached!
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.

<cut for brevity>

× Encountered error while generating package metadata.
╰─> See above for output.
```

You've probably installed (or tried to install) this version before, so `pip install` reuses the copy from its local cache, NOT the package that you've just uploaded.

To overcome this problem, make it a habit to use `pip install tablite --no-cache-dir` (pip also accepts the abbreviation `--no-cache` used below), and now it will work:


```
(mfdesigner310) PS D:\gitlab\mfdesigner> pip install tablite==2022.7.dev5 --no-cache
Collecting tablite==2022.7.dev5
  Downloading tablite-2022.7.dev5-py3-none-any.whl (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.6/54.6 kB 1.4 MB/s eta 0:00:00

<cut for brevity>

Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Using legacy 'setup.py install' for mplite, since package 'wheel' is not installed.
Using legacy 'setup.py install' for pyperclip, since package 'wheel' is not installed.
Building wheels for collected packages: graph-theory
  Building wheel for graph-theory (pyproject.toml) ... done
  Created wheel for graph-theory: filename=graph_theory-2022.3.9.54615-py3-none-any.whl size=53795 sha256=d675a72c18d0ebf704e737efbd135f80eaac4c12eb65d5f0752838418863bdde
  Stored in directory: C:\Users\madsenbj\AppData\Local\Temp\pip-ephem-wheel-cache-k2ntov1y\wheels\5c\01\5b\8f28bd95cf08edb938b43af28997be35b9e51dc7586036705a
Successfully built graph-theory
Installing collected packages: xlwt, texttable, pyuca, pyperclip, lml, graph-theory, xlrd, pyexcel-io, psutil, numpy, lxml, et-xmlfile, colorama, tqdm, pyexcel-xls, pyexcel-odsr, pyexcel, openpyxl, h5py, pyexcel-xlsx, mplite, tablite
  Running setup.py install for pyperclip ... done
  Running setup.py install for mplite ... done
Successfully installed colorama-0.4.5 et-xmlfile-1.1.0 graph-theory-2022.3.9.54615 h5py-3.7.0 lml-0.1.0 lxml-4.9.1 mplite-1.1.0 numpy-1.23.1 openpyxl-3.0.10 psutil-5.9.1 pyexcel-0.7.0 pyexcel-io-0.6.6 pyexcel-odsr-0.6.0 pyexcel-xls-0.7.0 pyexcel-xlsx-0.6.0 pyperclip-1.8.2 pyuca-1.2 tablite-2022.7.dev5 texttable-1.6.4 tqdm-4.64.0 xlrd-2.0.1 xlwt-1.3.0
(mfdesigner310) PS D:\gitlab\mfdesigner> 

```

Now we're good. We __know__ it works.
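Strictly speaking, the log only shows that pip could install the package. A minimal smoke test, to be run with the `tmp` venv still active, goes one step further: it imports the package and asks the standard library's `importlib.metadata` which version pip actually installed. `smoke_test` is a hypothetical helper, and it checks importability, not tablite's behaviour.

```python
import importlib
from importlib import metadata


def smoke_test(distribution, module_name):
    """Import a freshly installed package and return (module name, installed version)."""
    module = importlib.import_module(module_name)       # raises ImportError if the install is broken
    installed_version = metadata.version(distribution)  # version pip recorded for the distribution
    return module.__name__, installed_version


# Inside the fresh tmp venv; for the upload above this should report 2022.7.dev5:
# print(smoke_test("tablite", "tablite"))
```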








