<a href="https://colab.research.google.com/github/fsk-lab/scics/blob/main/06_Modules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modules and Imports in Python

So far, we have written all code that should be executed into a single file (here: a Jupyter notebook). For simple pieces of code, this is perfectly fine. However, if our code becomes more complex – or if we want to use code that other people have written – it makes sense to divide code into multiple Python files. This has a number of practical advantages:
* It makes the code more readable.
* It enables the reuse of code – handy pieces of code can be shared between multiple applications without copy-pasting.
* It simplifies code maintenance.

To make this possible, Python provides a way to define a function or a variable in one file (which we refer to as the **module**), and then use it in another file. We will learn about the idea of modules, and how to use them, in this tutorial.

## Importing from Modules and Packages

Let us start with a simple example. In the same folder where we store this notebook, let us create a file called `test_module.py`.

> 💡  In Google Colab, we can do this by clicking the folder symbol (📁) on the left-hand side and opening the right-click menu.

Copy the following function to the file:
```
def fibonacci(n):
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
```

The file `test_module.py` is now a Python module that defines a function. If we now want to use this module, and the definitions it contains, we need to `import` it.

In [None]:
import test_module

Any definition from within this module can now be accessed using the "dot syntax", i.e. `module.name_within_module`.



In [None]:
a = test_module.fibonacci(10)

print(a)

Alternatively, we can import only selected functions or variables from the module. The syntax for this is `from module import name_within_module`.

In [None]:
from test_module import fibonacci

b = fibonacci(20)

print(b)

Imports are not limited to functions – any variable that is defined within a module can be imported to another file.

As an example, we can add the variable assignment `fibonacci_numbers = [1, 1, 2, 3, 5, 8, 13]` to our module `test_module.py`. Once we do that, we can also import and use this variable in our current code.

In [None]:
from test_module import fibonacci_numbers

print(fibonacci_numbers[-5:-1])

> ❗  Modules are only imported at the first call of `import`! Running `import` again does not "refresh" the module, which is problematic in Jupyter notebooks when we edit the module file afterwards. To pick up such changes, we have to restart the kernel (which deletes all run-time variables) or explicitly reload the module with `importlib.reload`.
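The caching behaviour can be observed in a small experiment. The following is a minimal sketch, assuming a throwaway module with the hypothetical name `reload_demo` that is written to a temporary folder; it is not part of the tutorial files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module in a temporary folder (hypothetical name).
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "reload_demo.py"
module_file.write_text("GREETING = 'hello'\n")

# Make the folder importable and import the module for the first time.
sys.path.insert(0, str(module_dir))
import reload_demo

first_value = reload_demo.GREETING

# Change the file on disk. A second `import` does NOT pick this up,
# because the module is already cached in sys.modules.
module_file.write_text("GREETING = 'goodbye!'\n")
import reload_demo

cached_value = reload_demo.GREETING

# importlib.reload executes the module's code again.
importlib.reload(reload_demo)
reloaded_value = reload_demo.GREETING

print(first_value, cached_value, reloaded_value)
```

Note that names which were imported via `from module import name` keep pointing to the old objects after a reload, so restarting the kernel remains the most robust option.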

In principle, we can also assign a new name to any function or variable that we import. For this purpose, we use the `as` keyword, which we already encountered in Tutorial 5 on exception handling.

In [None]:
from test_module import fibonacci as fib

c = fib(100)

print(c)

### Where are modules stored?

When the Python interpreter finds a statement like `import test_module`, it will search for a module named `test_module` in the following locations:


1.   It will check if Python contains a built-in module named `test_module`. We will learn more about built-in modules later.
2.   It will search all directories which are part of the so-called module search path, which is available in Python as the list `sys.path`. In a notebook or an interactive session, this list contains the current working directory$^*$, any directories listed in the `PYTHONPATH` environment variable, and further installation-specific directories.

$^*$ This is the reason why we could import from `test_module.py`, which is located directly in our current working directory.

Any module that is not located on the module search path will not be found by the interpreter. The respective import will raise a `ModuleNotFoundError` (a subclass of `ImportError`)!
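To see which folders the interpreter actually searches, we can print the list `sys.path` from the standard library. The sketch below also tries to import a module with a made-up name (`module_that_does_not_exist_123`, assumed not to be installed anywhere) to show the resulting error.

```python
import sys

# sys.path is the list of folders that the interpreter searches for modules.
for folder in sys.path[:5]:
    print(repr(folder))

# Importing a module that cannot be found raises ModuleNotFoundError,
# which is a subclass of ImportError.
try:
    import module_that_does_not_exist_123  # hypothetical, deliberately missing
except ImportError as error:
    error_type = type(error).__name__
    print(error_type, "-", error)
```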

### 🧠 Packages

In many cases, we want to create a set of useful code functionality that easily becomes too complex to be stored in a single file. However, all that code still belongs together, and should be distributed under the same name. For this purpose, Python provides us with the concept of a **package**.

In principle, a package is nothing but a folder that contains multiple modules (or sub-folders with modules). For example, consider the following toy package called `filereader`, which contains a number of modules and a sub-package with further modules.

```
filereader/
    __init__.py
    csv.py
    json.py
    office_files/
        __init__.py
        docx.py
        xlsx.py
        pptx.py
    
```

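> ❗  Note that the Python interpreter uses the file `__init__.py` to recognize a folder as a regular Python package. Without this file, the folder is at best treated as a so-called namespace package, which behaves differently and is not covered here. For our purposes, every package folder should therefore contain an `__init__.py` file. Further details on the `__init__.py` file are discussed below.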

In other words: A Python **package** is a folder with an `__init__.py` file in it!

If we have a package, then we can import functions or variables using the same "dot syntax" as shown above.

For example:
```
from filereader.csv import read_csv
from filereader.office_files.docx import read_docx
```

The `__init__.py` file also allows us to make specific functions or variables accessible directly from the package name.

Let us consider the `filereader` example from above, where the `__init__.py` file contains the following code:
```
# filereader/__init__.py

from .csv import read_csv
```

> ❗ Note that within a package, files are imported through *relative* imports (i.e. the path to the source file is given relative to the current file). The directory of the current file is indicated by `.`. In other words, the import statement above means: "In the same directory where this file is located, there is a file called `csv.py`. From this file, import `read_csv`."

> **IMPORTANT:** This "dot-import" only works within packages!

With the `__init__.py` file described above, we can directly import the `read_csv` function from the `filereader` namespace through
```
from filereader import read_csv
```

## External Modules

On many occasions, we don't want to write all code by ourselves – but rather re-use code that others have written already. In fact, Python is one of the programming languages with the widest community support, and a large number of modules are publicly available.

By *installing* them – i.e. downloading the packages and modules, and saving them in a directory that is part of the module search path – we can directly make them usable in our Python code.

### The Standard Library

By default, every Python installation comes with a number of built-in modules that we can use without downloading or installing any further code. Some of the most common built-in libraries are:
* `os` for interacting with the operating system
* `pathlib` for working with file system paths
* `sys` for accessing interpreter settings (like the module search path `sys.path`)
* `math` for advanced mathematical operations
* `time` and `datetime` for dealing with times
* `itertools` for advanced loops
* `argparse` for passing command-line arguments to Python code
* `logging` for recording status and error messages while code is running
* `urllib` for accessing webpages

...and many many more!
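As a first impression, here is a short sketch that uses four of the modules listed above. The folder name `tutorials` and the dates are arbitrary example values.

```python
import itertools
import math
from datetime import date
from pathlib import Path

# math: square root and the constant pi
hypotenuse = math.sqrt(3**2 + 4**2)
print(hypotenuse, round(math.pi, 2))

# pathlib: take a file path apart without touching the disk
notebook = Path("tutorials") / "06_Modules.ipynb"
print(notebook.name, notebook.suffix)

# itertools: all pairs from a list without writing nested loops
pairs = list(itertools.combinations(["a", "b", "c"], 2))
print(pairs)

# datetime: number of days between two dates
days = (date(2024, 3, 1) - date(2024, 2, 1)).days
print(days)
```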

It is obviously not necessary (or even possible) to know all modules from the standard library. In the next tutorials, however, we will get to know some of these modules and their useful functionalities.

A full overview of all modules of the standard library is given at the [following link](https://docs.python.org/3/library/index.html).

### Sharing and Installing External Packages

There is a vast range of functionality that is not covered by the modules in the standard library – and millions of Python users worldwide have already written modules that can be useful for such purposes. In this case, we don't want to reinvent the wheel, but rather re-use the code that others have written already. This becomes possible by making code available through public **package repositories**.

The most common Python package repository is the Python Package Index (PyPI), from which we can install any available package using the package installer `pip`.

Usually, we would use `pip` from the command line (i.e. in a *terminal* application). For example, if we want to install the `numpy` package, which is extremely useful for vector or matrix operations in Python, we would open a terminal, and install the package via
```
pip install numpy
```

> 💡  In a Jupyter notebook, we can install modules through `pip` by starting a code cell with an exclamation mark.

In [None]:
! pip install numpy
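To check from within Python whether a package is installed, and in which version, we can query the package metadata with `importlib.metadata` from the standard library (available since Python 3.8). The helper `installed_version` below is a hypothetical convenience function, not part of any library.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if numpy is installed, otherwise None.
print(installed_version("numpy"))

# A made-up package name that is assumed not to be installed.
print(installed_version("package-that-does-not-exist-123"))
```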

Installing a package can be a tedious procedure, especially in those scenarios where the package uses other packages, which need to be installed, too. These other packages are referred to as *requirements* or *dependencies*. An installer like `pip` takes care of installing not only a package, but also its dependencies (and their dependencies, and so on).

However, this can rapidly lead to conflicts, especially if version requirements are involved. In these scenarios, manual resolutions of these conflicts are often required, which can be tedious and time-consuming.

Therefore, for local installations of Python, it is usually best practice to set up separate **environments** for each coding project. Every environment corresponds to a local installation of Python with only those packages installed that are required for that specific project. If we switch to a new project, we create a new environment with newly installed packages. This minimizes dependency conflicts between projects, and enables easier transfer of Python environments.
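As a small illustration, the following sketch checks whether the current interpreter runs inside a virtual environment. It assumes an environment created with the standard library's `venv` tool; other environment managers (for example conda) may not be detected this way.

```python
import sys

# Inside a venv-style environment, sys.prefix points to the environment folder,
# while sys.base_prefix still points to the underlying Python installation.
in_virtual_env = sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Running inside a virtual environment:", in_virtual_env)
```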

Within this course, you find a separate *best practice* tutorial for locally setting up and managing Python installations.