# Packages and modules

All code in python ultimately lives in a module, and a module may or may not live in a package.  A plain script can also  
be considered a module, one that sits at the top level (it is not nested inside a package).  In this lesson, we will learn:

- What is a module good for?
- Modularity and code organization
    - What a module is
    - What a package is
        - What `__init__.py` is for
    - What is `__main__`?
    - How to best lay out a project's code
- How modules are searched for
    - `sys.path` and site-packages
    - What the PYTHONPATH is
- How modules get loaded
    - relative imports
    - absolute imports
    - importlib
    - sys.modules
    - shadowing builtin modules
    - reloading a module with new changes

## Code organization

Like most programming languages, python has a way to group related and reusable code together.  The grouping might take  
many forms, such as by service, model type, business unit, or any other logical division you can think of.  At a  
basic level, you can think of the hierarchy of code organization as:

- package
    - packages (subpackages)
    - modules
        - classes
        - functions
        - variables (global)
        - type aliases
        - type variables for generics (e.g. `T = TypeVar("T")`; since 3.12 they can be declared inline instead)
        - executable code (executed when imported)

So as you can see, a module can be a container for many things, and a package can contain subpackages.  A single module  
can hold any number of classes and functions; unlike Java, there is no rule of one public class per file.
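To make this concrete, here is a small sketch that writes a hypothetical module, `shapes_demo`, into a temporary directory and imports it. The module name and its contents are made up for illustration; the point is only that one file can hold a type variable, a type alias, a global, a generic class, a function, and top-level code at the same time.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source for a hypothetical module holding several kinds of top-level objects
SOURCE = textwrap.dedent('''
    """A hypothetical module mixing several kinds of top-level objects."""
    from typing import Generic, TypeVar, Union

    T = TypeVar("T")              # type variable used to build a generic class
    Number = Union[int, float]    # type alias
    DEFAULT_SIDES = 4             # module-level (global) variable

    class Box(Generic[T]):        # generic class
        def __init__(self, item: T) -> None:
            self.item = item

    def area(width: Number, height: Number) -> Number:  # function
        return width * height

    print("shapes_demo: top-level code ran")  # executable code, runs on first import
''')

# Write the module into a temporary directory and make that directory importable
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "shapes_demo.py").write_text(SOURCE)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # recommended after creating module files at runtime

import shapes_demo  # prints "shapes_demo: top-level code ran"

print(shapes_demo.DEFAULT_SIDES)      # 4
print(shapes_demo.area(2, 3))         # 6
print(shapes_demo.Box("hello").item)  # hello
```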

### Package vs Module vs Script

Unfortunately, talking about packages, modules, and scripts in python can be confusing, because the lines between them   
are fuzzy and the terms are often used interchangeably.  For example, the [Python Package Index](https://pypi.org/)  
(PyPI) talks about python _packages_, yet what it hosts might be a single module.  There are, however, a few general  
distinctions:  

- Packages are basically containers of modules and are defined as a directory
    - An `__init__.py` file at the root of the directory declares it as a (regular) _package_
        - As an exception, you can omit `__init__.py` (since Python 3.3), but the file has other uses (e.g. `__all__` or executable code)
        - Packages without `__init__.py` are _namespace packages_, and leaving it out is what allows one package to be spread across multiple directories
            - This is useful, for example, to define _plugins_
        - When a module in a _package_ is imported, any code in the package's `__init__.py` runs first
            - It runs only once per interpreter session, even if several modules from the package are imported
            - Similarly, if you `import` a module twice, its executable code does not run again (unless you call `importlib.reload()`)
- Modules and packages can both contain executable code
    - Top-level code in a module _should_ be limited to initialization, unless the module is designed to be executed
    - Packages can also execute code from the `__init__.py` file
        - The code in `__init__.py` runs even if you import only a single submodule
- Scripts can be imported
    - A script is just a python file, so the `import` statement can load it like any other module (running its top-level code)
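The sketch below checks these rules with a throwaway package built in a temporary directory. The names `log_demo`, `pkg_demo`, and `plugins_demo` are hypothetical. Each file appends to a shared list when its code runs, so you can see that `__init__.py` runs once, that a repeated `import` is a no-op, and that `importlib.reload()` re-executes a module. The last part assumes two directories that each contain a `plugins_demo` folder without `__init__.py`, which python merges into one namespace package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout written to a temporary directory:
#   log_demo.py            shared list recording which files have run
#   pkg_demo/__init__.py   package initialization code
#   pkg_demo/a.py, b.py    two submodules
root = Path(tempfile.mkdtemp())
(root / "log_demo.py").write_text("events = []\n")
pkg = root / "pkg_demo"
pkg.mkdir()
for name in ("__init__", "a", "b"):
    label = "pkg_demo" if name == "__init__" else f"pkg_demo.{name}"
    (pkg / f"{name}.py").write_text(f"import log_demo\nlog_demo.events.append({label!r})\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import log_demo
import pkg_demo.a  # runs pkg_demo/__init__.py first, then a.py
import pkg_demo.b  # __init__.py does NOT run again; only b.py runs
import pkg_demo.a  # already cached in sys.modules: nothing runs
print(log_demo.events)  # ['pkg_demo', 'pkg_demo.a', 'pkg_demo.b']

importlib.reload(pkg_demo.a)  # explicitly re-executes a.py
print(log_demo.events)  # ['pkg_demo', 'pkg_demo.a', 'pkg_demo.b', 'pkg_demo.a']

# Namespace package: two separate directories each hold a "plugins_demo" folder
# with NO __init__.py, and python merges them into a single package
ns_dirs = [Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())]
for base, plugin in zip(ns_dirs, ("alpha", "beta")):
    (base / "plugins_demo").mkdir()
    (base / "plugins_demo" / f"{plugin}.py").write_text(f"NAME = {plugin!r}\n")
    sys.path.insert(0, str(base))
importlib.invalidate_caches()

import plugins_demo.alpha
import plugins_demo.beta
print(plugins_demo.alpha.NAME, plugins_demo.beta.NAME)  # alpha beta
print(len(list(plugins_demo.__path__)))  # 2, one portion per directory
```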

## What uses modules?

Before we talk about how to import a module to load the code, we need to ask ourselves, "who does the importing?".

- Other modules
- A script (the main entry point)

### `__main__`

As mentioned above, the line between modules and scripts can be blurry.  Typically, though, modules are meant as  
libraries to be used by other code, and any executable code in them is just for initialization.  On the other hand,  
there is a convention for including code that runs only when the module is executed as a program, that is, when the  
module's name is `__main__`.

> If you have created virtual environments with `python -m venv dir`, then you executed the module `venv` (because `venv`  
is a package, python actually runs its `venv/__main__.py` submodule).  Normally the name of a module is its file name  
without the `.py` extension, stored in the module attribute `__name__`.  However, when you run a file directly  
(`python file.py`) or pass a dotted `package.module` path to the `-m` option, the module's name is set to `__main__`.


In [None]:
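# In a notebook or REPL this prints "__main__"; inside an imported module it prints the module's name
print(__name__)

if __name__ == "__main__":
    # only runs when executed as a program, not when imported
    print("you should see this print")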
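A quick way to see `__name__` change is to run the same file three ways in fresh interpreters. This sketch assumes a hypothetical module, `whoami_demo.py`, written to a temporary directory, and sets the `PYTHONPATH` environment variable so the child interpreters can find it.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# A hypothetical module that reports its own __name__
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "whoami_demo.py"
module_file.write_text(
    "print(__name__)\n"
    "if __name__ == '__main__':\n"
    "    print('running as a program')\n"
)

# Make the module findable by the child interpreters below
env = {**os.environ, "PYTHONPATH": str(tmp_dir)}


def run(*args):
    """Run a fresh python interpreter with the given arguments and return its output lines."""
    result = subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


as_module = run("-m", "whoami_demo")         # executed with -m
as_script = run(str(module_file))            # executed by file path
as_import = run("-c", "import whoami_demo")  # imported by other code

print(as_module)  # ['__main__', 'running as a program']
print(as_script)  # ['__main__', 'running as a program']
print(as_import)  # ['whoami_demo']
```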

## How to organize your code

Because packages are really just a directory structure, it's a good idea to structure your code in an intuitive way.

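First, the name of your project (what's uploaded to PyPI) should usually also be the name of your top-level package.  This is
a strong convention rather than a hard rule (for example, the `Pillow` project is imported as `PIL`), but tools like poetry
assume it by default.  From there, you can create other directories (subpackages) to group related modules together.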

Other common top-level directories include:

| Folder/file    | .gitignore? | Purpose                                                                 |
|----------------|-------------|-------------------------------------------------------------------------|
| dist           | yes         | Holds the wheel (`.whl`) and sdist archives built by poetry or setuptools |
| docker         | no          | Dockerfiles and scripts used in Dockerfiles |
| notebooks      | no          | Jupyter notebooks |
| tests          | no          | Test code |
| .venv          | yes         | The virtualenv folder used to isolate site-packages |
| .gitignore     | no          | Your .gitignore |
| poetry.lock    | maybe       | The exact dependency versions resolved for your project |
| pyproject.toml | no          | Standard project configuration file (PEP 518/621) read by pip, poetry, autopep8, and other tools |
| README.md      | no          | Project description |
| {package}      | no          | Your actual package, usually named the same as the top-level folder |

Inside the actual package, it is sometimes useful to distinguish between executable scripts and regular libraries.  
So it is also common to see this layout:

- {package}
    - scripts: for executable "apps"
    - config: for configuration files
    - resources: things like default data files, images, etc


> Note that it is relatively common to have a subfolder with the same name as the top-level folder.  For example,  
when you clone the excursor project, the project folder is named `excursor`, but inside that folder is another folder  
named `excursor`.  The reason why will be explained in a bit.
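One reason this nested layout works, sketched here under some assumptions: when you start python from the project root (a REPL, a notebook, or `python -m ...`), that directory is typically on `sys.path`, so `import excursor` resolves to the inner `excursor/` package folder. The sketch below simulates this with a hypothetical project named `myproj`, putting the outer folder on `sys.path` by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical project "myproj": the clone is myproj/, the package is myproj/myproj/
project_root = Path(tempfile.mkdtemp()) / "myproj"
package_dir = project_root / "myproj"
package_dir.mkdir(parents=True)
(package_dir / "__init__.py").write_text("VERSION = '0.1.0'\n")
(project_root / "README.md").write_text("# myproj\n")  # project files live beside the package

# Starting python from the project root would put it on sys.path; simulate that here
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

import myproj  # resolves to the inner myproj/myproj/ package

print(Path(myproj.__file__).relative_to(project_root).as_posix())  # myproj/__init__.py
print(myproj.VERSION)  # 0.1.0
```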

## Importing a module

Loading a module is called importing, and there are several forms of the `import` statement.  The next cell shows the most common ones:

In [None]:
"""This is a module docstring, which is optional.  If used, it must be the very first statement.  A
`from __future__ import foo` statement must appear before any other code, but the docstring (plus comments and blank
lines) may precede it, so you can have both."""

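# standard-library modules
# import a whole module
import os
# import several names (here, classes) from one module
from datetime import datetime, timezone, timedelta


# 3rd party packages (must be installed first, e.g. with pip or poetry)
# rename an import with `as`
import duckdb as dd

# Internal packages (locally in your directory tree)
# Absolute import using the full dotted package path
# (works when excursor is installed or the project root is on sys.path)
from excursor.core.process import Run
# Relative import: resolved against the current package, so it only works inside a module that
# is itself part of a package (e.g. excursor/core/process.py), never in a script or notebook.
# PEP 8 recommends absolute imports, so prefer the form above.
#from ..func import Functor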
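To see the two styles side by side, here is a sketch that builds a hypothetical package, `shop_demo`, in a temporary directory. Its `core/cart.py` imports the same constant with a relative and an absolute import. Importing it as part of the package works, while running the file directly as a script fails, because a script has no parent package for `..` to refer to.

```python
import importlib
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical layout:
#   shop_demo/__init__.py
#   shop_demo/pricing.py       defines TAX_RATE
#   shop_demo/core/__init__.py
#   shop_demo/core/cart.py     imports TAX_RATE both ways
root = Path(tempfile.mkdtemp())
core = root / "shop_demo" / "core"
core.mkdir(parents=True)
(root / "shop_demo" / "__init__.py").write_text("")
(core / "__init__.py").write_text("")
(root / "shop_demo" / "pricing.py").write_text("TAX_RATE = 0.25\n")
(core / "cart.py").write_text(
    "from ..pricing import TAX_RATE as REL_RATE  # relative: '..' is the parent package\n"
    "from shop_demo.pricing import TAX_RATE as ABS_RATE  # absolute: full dotted path\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from shop_demo.core import cart  # imported as part of the package: both styles work
print(cart.REL_RATE, cart.ABS_RATE)  # 0.25 0.25

env = {**os.environ, "PYTHONPATH": str(root)}
# Run as a plain script: there is no parent package, so the relative import fails
as_script = subprocess.run([sys.executable, str(core / "cart.py")],
                           env=env, capture_output=True, text=True)
# Run with -m and the dotted path: the parent package is known, so it succeeds
as_module = subprocess.run([sys.executable, "-m", "shop_demo.core.cart"],
                           env=env, capture_output=True, text=True)
print(as_script.returncode != 0, as_module.returncode)  # True 0
```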

## Defining the path to modules

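Generally speaking, when we are coding in python, we think of modules as coming from one of three places:

1. A standard-library module like `datetime` or `os`
2. Our own project's modules
3. 3rd party dependencies that were installed by `pip` or `conda`

Fundamentally, however, all three are found the same way: python searches the directories listed in `sys.path`, in order.
The exceptions are a handful of core modules that ship inside the interpreter itself (such as `sys`, which appears in
`sys.builtin_module_names`).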

In [None]:
import sys
from pprint import pprint

pprint(sys.path)
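Because `sys.path` is searched in order and the first match wins, a module in an earlier directory hides any module of the same name further down. That is why a local file named like a standard-library module (say, `random.py`) can shadow the real one. Rather than touching the standard library, the sketch below uses a hypothetical module name, `config_demo`, placed in two temporary directories.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical directories that both contain a module named "config_demo"
first, second = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
(first / "config_demo.py").write_text("SOURCE = 'first'\n")
(second / "config_demo.py").write_text("SOURCE = 'second'\n")

# python scans sys.path from front to back and stops at the first match
sys.path.insert(0, str(second))
sys.path.insert(0, str(first))  # now first comes before second
importlib.invalidate_caches()

import config_demo
print(config_demo.SOURCE)  # first

# Modules compiled into the interpreter are found before sys.path is searched,
# so a local file cannot shadow them
print("sys" in sys.builtin_module_names)  # True
```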

In [None]:
# sys.modules is the import cache: a dict mapping each already-loaded module's name to its module object
pprint(sys.modules, indent=2)
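`importlib.import_module()` is the function form of `import`: it takes the module name as a string, which is handy when the name is only known at runtime (for example, read from a config file). This sketch uses the standard-library `json` package purely as an example to show that `import_module()` and the plain `import` statement both hand back the cached object from `sys.modules`.

```python
import importlib
import sys

# Dotted names work, and so do relative names when an anchor package is given
json_mod = importlib.import_module("json")
decoder = importlib.import_module("json.decoder")
decoder_rel = importlib.import_module(".decoder", package="json")

# Every call returns the cached module object stored in sys.modules
print(json_mod is sys.modules["json"])         # True
print(decoder is sys.modules["json.decoder"])  # True
print(decoder_rel is decoder)                  # True

# A plain import statement consults the same cache, so it yields the same object
import json
print(json is json_mod)  # True
```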

## Package Management

There are 2 big guns in the python world when it comes to packaging: pip and [conda](https://www.anaconda.com/).  I  
haven't used conda, mostly because Anaconda's default package channels come with commercial licensing terms, even though  
conda has become extremely popular in the data science world.  I'm only going to cover PyPI, the repository of python  
packages that pip downloads from.

I'm actually going to go over poetry rather than plain pip, because poetry resolves your dependencies and records the  
exact versions it chose in a lock file.  If you need to stay with pip for your service, you can also look at  
[pipenv](https://pipenv.pypa.io/en/latest/), which builds on pip and keeps a lock file of its own.

Package management and building python wheels (`.whl`) and source distributions (`sdist` tarballs) used to be an  
extremely complicated process involving some rather arcane knowledge about setuptools, distutils, and other disparate  
topics.  To keep things short, I am only going to cover the basics of poetry:

- Creating a virtualenv
- Installing poetry
- Creating a new poetry project
- Adding a dependency
- The poetry.lock file
- Development dependencies
- Private repos