# Module 1, Lecture 2: Python Ecosystem

Today's focus will be on how Python modules work and how they are structured. We'll also be taking a look at tools for using third party packages.

## Modules

When you write Python, code lives in .py files. 

These files are called modules. Modules are the basic unit of organization in Python.  Modules can be imported into other modules or be run as a script.

Most non-trivial programs are made up of multiple files/modules. Splitting code this way has two main benefits:

* Code reuse: common functions can be shared between projects (e.g. `import math` or `import pathlib`).
* Namespace partitioning: Sometimes you have two functions with the same name, but they do different things. Keeping them in different modules lets you import and use both (e.g. `json` and `yaml` both have a `load` function, but they do different things).
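To make the namespace point concrete, here is a minimal sketch. It uses two standard library modules, `json` and `pickle`, which both expose a function named `loads`; they stand in for the `json`/`yaml` pair so that the example needs no third party install.

```python
import json
import pickle

# Both modules define a function called `loads`, but the module prefix
# keeps the two apart, so there is no name clash.
from_json = json.loads('{"answer": 42}')                   # parses JSON text
from_pickle = pickle.loads(pickle.dumps({"answer": 42}))   # decodes pickled bytes

print(from_json == from_pickle)
```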

## Packages

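Multiple related modules form a package. A package is created by making a directory with an `__init__.py` file in it. The `__init__.py` file can be empty; its presence marks the directory as a regular package. (Since Python 3.3, a directory without `__init__.py` can still be imported as a "namespace package", but including the file is the conventional and least surprising choice.)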

You typically group modules based on shared purpose. A hypothetical application might have the following:

* `ui` package: contains modules related to the user interface
* `models` package: contains modules related to the data model
* `utils` package: contains modules with common utility functions (often a sort of catch-all)

How you divide things up is entirely up to you. The only requirement is a top-level package directory containing an `__init__.py` file.
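As a rough illustration of the layout described above, the following sketch builds a throwaway package in a temporary directory and imports a module from it. The names `demo_pkg`, `utils`, `text`, and `shout` are made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/ with one subpackage, demo_pkg/utils/ (hypothetical names)."""
    utils = root / "demo_pkg" / "utils"
    utils.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")   # empty file is enough
    (utils / "__init__.py").write_text("")
    (utils / "text.py").write_text("def shout(s):\n    return s.upper() + '!'\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)            # make the temporary directory importable
    importlib.invalidate_caches()
    try:
        text = importlib.import_module("demo_pkg.utils.text")
        result = text.shout("hello")
    finally:
        sys.path.remove(tmp)

print(result)
```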

### Example Application

##### Application Structure 

To help us get started thinking about the differences between packages and modules, here's the structure of a simple arithmetic interpreter (located in ``m1/interpreter-repo``): 

```
interpreter-repo
├── LICENSE.md
├── README.md
└── interpreter
    ├── __init__.py
    ├── __main__.py
    ├── app.py
    ├── collections
    │   └── stack.py
    ├── evaluators
    │   └── rpn.py
    └── ui
        ├── basic.py
        ├── enhanced.py
        └── ui_controller.py
```

Notice that the root directory (``interpreter-repo``) contains a single folder called ``interpreter``. This directory (i.e., package) is known as the *top-level* package, and the packages underneath it are known as *subpackages*: ``collections``, ``evaluators``, and ``ui``. The top-level package also contains a special file, ``__main__.py``, which is run when we execute the top-level package directly via ``python3 -m interpreter``. We'll talk about what goes in ``__main__.py`` later.

### `__main__`

The `__main__.py` file is a special file that is run when you execute a package directly.  For example, if you have a package called `foo`, you can run it by executing `python -m foo`.  The `__main__.py` file is the entry point for the package.
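A small sketch of this behavior, under the assumption that a hypothetical package `foo` lives in a temporary directory: the script writes a one-line `__main__.py`, then runs `python -m foo` in a subprocess and captures what it prints.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "foo"            # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "__main__.py").write_text("print('running foo as a script')\n")

    # `-m foo` looks for the package relative to the working directory,
    # so the subprocess is started inside the temporary directory.
    completed = subprocess.run(
        [sys.executable, "-m", "foo"],
        cwd=tmp,
        capture_output=True,
        text=True,
        check=True,
    )
    output = completed.stdout.strip()

print(output)
```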

### `import` variations review

Now's a good time to review the different ways you can import modules and packages.

1) `import module_name` 

Imports the module and makes it available in the current namespace.  You can access the module's functions by prefixing them with the module name.  For example, for the module `math` with a function called `sin`, you can access it by calling `math.sin()`.

2) `from module_name import module_attr`

Imports a specific attribute from a module and makes it available in the current namespace.  For example, `from math import sin` will import the `sin` function from the `math` module and make it available in the current namespace.  You can then call it directly by calling `sin()`.

3) `import module_name as alias` or `from module_name import module_attr as alias`

Imports a module or attribute and gives it an alias.  For example, `import pandas as pd` will import the `pandas` module and make it available as `pd`.  You can then access the `DataFrame` class as `pd.DataFrame`. This is commonly used in data science libraries (`import numpy as np`, `import pandas as pd`, etc) but overuse can make code harder to read. It's best to use aliases sparingly.

4) `from module import *`

Makes every public name in the module available in the current namespace. This is considered bad practice and should generally be avoided: it can lead to namespace collisions and makes it hard to tell where a function came from.
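The first three forms can be seen side by side in the short sketch below. The last two lines illustrate the collision risk in miniature: `math` and `cmath` both define `sqrt`, and whichever name is imported last wins. A star import can trigger the same shadowing for many names at once.

```python
import math                      # form 1: access names through the module
from math import sin             # form 2: pull one name into this namespace
import math as m                 # form 3: module alias
from math import cos as cosine   # form 3: attribute alias

print(math.sin(0.0), sin(0.0), m.pi, cosine(0.0))

# Name collision: the second import silently replaces the first `sqrt`.
from math import sqrt
from cmath import sqrt

root = sqrt(-1)   # this is cmath.sqrt, so a negative input is accepted
print(root)
```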

### Absolute vs Relative Imports

Python has two types of imports: absolute and relative.  Absolute imports spell out the full path starting from the top-level package.  Relative imports are resolved relative to the package that contains the current module and are written with a leading `.`.

In our above application, from outside the package you might import the `ui` package via `import interpreter.ui`.  This is an absolute import.  If you are inside the `ui` package, you might import the `ui_controller` module via `from . import ui_controller`.  This is a relative import.

Relative imports allow you to rename/refactor packages with fewer updates to your imports.
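The sketch below shows both styles resolving to the same module. It builds a small hypothetical package (`demo_app`, loosely modeled on the `ui` subpackage above; the file contents are invented for illustration) and imports one module using a relative import and an absolute import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

CONTROLLER_SOURCE = """\
from . import basic                          # relative: "the package I live in"
from demo_app.ui import basic as basic_abs   # absolute: full path from the top


def same_module():
    return basic is basic_abs
"""

with tempfile.TemporaryDirectory() as tmp:
    ui = Path(tmp) / "demo_app" / "ui"
    ui.mkdir(parents=True)
    (Path(tmp) / "demo_app" / "__init__.py").write_text("")
    (ui / "__init__.py").write_text("")
    (ui / "basic.py").write_text("def render():\n    return 'basic ui'\n")
    (ui / "controller.py").write_text(CONTROLLER_SOURCE)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        controller = importlib.import_module("demo_app.ui.controller")
        same = controller.same_module()
    finally:
        sys.path.remove(tmp)

print(same)
```

If `demo_app` were renamed, only the absolute import line would need to change; the relative one would keep working.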

For more details you may want to read [Absolute vs. Relative Imports](https://realpython.com/absolute-vs-relative-python-imports/).

### Running from the Command Line

You can run a Python module directly from the command line by executing `python module_name.py` or `python -m module_name`. When you do this, all of the code in the module is executed.

Sometimes you want only some of the code in a module to run when it is executed directly.  You can use the `__name__` variable to check whether the module is being run directly.  If it is, `__name__` will be set to the string `"__main__"`; if the module is being imported, `__name__` is the module's own name.  For example:

```python
# import_example.py 
import math     # modules can import other modules, these will be resolved in order as they are encountered


print("near the top of the file")


def f(x):
    return math.sin(x) ** 2 - math.cos(x)


print("near the bottom of the file, __name__=", __name__)


if __name__ == "__main__":
    print("module imported as main")
```

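Running `python3 import_example.py` prints all three messages, ending with `module imported as main`.  Importing the module prints only the first two, because `__name__` is then `import_example` rather than `__main__`, and gives the importing code access to the `f` function.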

```python
# Demo 1: importing the file directly
import import_example
```

```
near the top of the file
near the bottom of the file, __name__= import_example
```

```python
# Demo 2: running from command line (! in `ipython` lets us run things as we would from CLI)
!python3 import_example.py
```

```
near the top of the file
near the bottom of the file, __name__= __main__
module imported as main
```


### Packages

For an application like the one above, it is common to make the top-level package executable. To execute a package you can run `python -m package_name`.  This will execute the `__main__.py` file in the package.

Several standard library modules and packages can be run the same way:

* `python3 -m http.server` - starts a simple web server
* `python3 -m json.tool` - pretty prints JSON
* `python3 -m unittest` - runs unit tests
* `python3 -m venv` - creates a virtual environment
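As one example, the sketch below drives `python -m json.tool` from Python by piping a compact JSON string into it through a subprocess. The input string is arbitrary sample data.

```python
import json
import subprocess
import sys

compact = '{"b": 1, "a": [1, 2]}'

# Equivalent to running:  echo '<compact>' | python3 -m json.tool
completed = subprocess.run(
    [sys.executable, "-m", "json.tool"],
    input=compact,
    capture_output=True,
    text=True,
    check=True,
)
pretty = completed.stdout

print(pretty)
```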

## Command Line Arguments

Whichever way you run a Python module, you can pass command line arguments to it.  These arguments end up in a list called `sys.argv`.  The first element (`sys.argv[0]`) identifies the script being run, the second element is the first command line argument, and so on.

```python
# my_module.py
import sys

if __name__ == "__main__":
    print(sys.argv)
```

```bash
$ python3 my_module.py arg1 arg2
['my_module.py', 'arg1', 'arg2']
```

When the same module is run with `python3 -m my_module arg1 arg2`, `sys.argv[0]` holds the full path to `my_module.py` instead of the bare filename; the remaining elements are unchanged.

If you've used the command line much, you're probably familiar with programs that take all sorts of flags and arguments.
There are several modules that help with turning command line arguments into a more structured format.

* [`argparse`](https://docs.python.org/3/library/argparse.html) - Built in to Python, but a bit verbose. Very flexible.
* [`click`](https://click.palletsprojects.com/) - Very popular and easy to get started with.
* [`typer`](https://typer.tiangolo.com/) - Built on top of `click`; uses type annotations to define arguments and generate help text.
* [`docopt`](http://docopt.org/) - Uses a docstring to define arguments.
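To give a feel for what these libraries do, here is a minimal `argparse` sketch. The program name `greet` and its options are invented for illustration; the argument list is passed in explicitly so the example runs anywhere, whereas a real script would typically call `parse_args()` with no arguments to read from `sys.argv[1:]`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")                      # positional
    parser.add_argument("-n", "--times", type=int, default=1,
                        help="how many times to repeat the greeting")     # option with a value
    parser.add_argument("--shout", action="store_true",
                        help="upper-case the greeting")                   # boolean flag
    return parser


def greet(argv: list[str]) -> list[str]:
    """Turn a raw argument list into the lines that would be printed."""
    args = build_parser().parse_args(argv)
    message = f"Hello, {args.name}!"
    if args.shout:
        message = message.upper()
    return [message] * args.times


lines = greet(["Ada", "--times", "2", "--shout"])
print("\n".join(lines))
```

Compared with indexing into `sys.argv` by hand, the parser handles type conversion, defaults, and a generated `--help` message.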

## Third Party Packages

One of the reasons Python is so popular is its ecosystem of third party packages.  There are packages available for almost any task you can think of, and you can browse them on [PyPI](https://pypi.org/), the Python Package Index.  (It is also possible to install packages from other sources, but PyPI is the default and hosts hundreds of thousands of projects.)

While packages like `math` and `csv` are built in to Python, other packages need to be installed. Some packages you've already used like `pytest` and `pandas` are installed on the CS Linux machines already. You may have found that if you tried to run code with those on your local machine it didn't work.

### Installing Packages

It's possible to install packages from PyPI using a tool called `pip`. `pip` is a package manager that is installed with Python.  You can install packages using `pip install package_name`.  For example, to install `pytest` you would run `pip install pytest`.
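If you are unsure whether a package is installed in the Python you are currently running, one option is to ask the standard library's `importlib.metadata`. The helper name `installed_version` below is made up for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The result depends on the environment this is run in.
for name in ("pytest", "pandas"):
    print(name, installed_version(name))
```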

Run outside a virtual environment, `pip install` puts packages in a location shared by every program that uses that Python installation. This can lead to issues when different programs rely on conflicting versions of packages. For this reason, it's best to install packages in a virtual environment.

## Virtual Environments

A virtual environment is a self-contained Python installation that contains its own set of packages.  This allows you to have different versions of packages installed in different environments.  This is especially useful when you are working on multiple projects that have conflicting requirements.  
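A quick way to tell whether the running interpreter belongs to a virtual environment is to compare `sys.prefix` with `sys.base_prefix`; inside an environment created by `venv` the two differ. The function name `in_virtualenv` is just a label for this sketch.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running from a venv-style environment."""
    return sys.prefix != sys.base_prefix


print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)
print("in a virtual environment:", in_virtualenv())
```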

#### `venv` demo

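* `python -m venv <dirname>` - create an environment in a directory
* `source <dirname>/bin/activate` - activate the environment in the current shell session (on Windows: `<dirname>\Scripts\activate`)
* `pip install -r requirements.txt` - install a list of packages from a file into the active environment
* `deactivate` - leave the environment
* `rm -r <dirname>` - remove the environment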
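The first command in the list has a programmatic equivalent in the standard library's `venv` module. The sketch below creates an environment in a temporary directory (named `demo-env` here purely for illustration) and checks for the `pyvenv.cfg` file that marks it as a virtual environment. `with_pip=False` is an assumption made to keep the example fast; the command line tool installs `pip` by default.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"

    # Roughly what `python -m venv demo-env --without-pip` does.
    venv.create(env_dir, with_pip=False)

    has_cfg = (env_dir / "pyvenv.cfg").exists()
    top_level = sorted(p.name for p in env_dir.iterdir())

print("pyvenv.cfg present:", has_cfg)
print("contents:", top_level)
```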

### poetry, pdm, pipenv

Manually managing virtual environments can get tedious. If you plan to distribute your software at all, you need to keep a list of dependencies and make sure they stay compatible with your code and as up to date as possible (for security fixes, etc.).

In the last few years, a new generation of tools has risen in popularity.

`poetry`, `pdm`, and `pipenv` all do essentially the same thing: they maintain a list of dependencies for you and provide tools to build the virtualenv and run commands within it.

We'll be using `poetry` for this class, but the principles are virtually identical in `pdm` or `pipenv`.

#### `poetry` demo

* `poetry install` - Install poetry packages from `pyproject.toml`
* `poetry add <pkgname>` - Add package to poetry environment (will update `pyproject.toml` and `poetry.lock`)
* `poetry remove <pkgname>` - Remove package from poetry environment.
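* `poetry shell` - Same as activating virtualenv. (In Poetry 2.0 and later this command is provided by a separate plugin, `poetry-plugin-shell`.)
* `poetry run <command>` - Run a command inside of virtualenv without "activating" it.  (Can be used instead of `shell`, matter of preference.)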
* `poetry init` - Create a new poetry project.  (Not needed when a pyproject.toml already exists)