# Modules and Packages in Python

## Modularization

**Modular programming** is a design technique that separates the functionality of a program into
smaller, independent parts, each responsible for a specific task. Because each module handles a
specific task, it is much easier to navigate the code and locate what you need. 
Different developers can also work on
different parts of the program independently. 

We have already seen one tool that promotes code
modularization: **functions**. 
With functions, we can break down a complex problem into smaller sub-problems. 
Functions also facilitate *code reuse*. 
In this notebook, we introduce two other tools for code modularization: *modules* and *packages*.

## 1. Modules

In Python, a module is a file with the suffix `.py` that contains Python code. The name of the
module is the same as the name of the file, without the suffix. 
For example, the module `circle` is created by writing
relevant code in the `circle.py` file.

The content of the `circle.py` file is as follows (it is available with the Block 10 files):

In [1]:
'''
This module contains functions for circle calculation
'''

pi = 3.14159
def area(radius):
    '''
    return area of a circle given the radius. Radius is assumed to be a non-negative number
    '''
    return pi *(radius**2)

def circumference(radius):
    '''
    return circumference of a circle given the radius. Radius is assumed to be a non-negative number
    '''
    return 2 * pi * radius

Using modules is very similar to using packages, as we have done so far. 
We first import the module by `import module`. 
For example, to import the module `circle` we write:

In [1]:
import circle

Use the function `help()` to display the documentation for the module.

In [2]:
help(circle)

Help on module circle:

NAME
    circle - This module contains functions for circle calculation

FUNCTIONS
    area(radius)
        return area of a circle given the radius. Radius is assumed to be a non-negative number
    
    circumference(radius)
        return circumference of a circle given the radius. Radius is assumed to be a non-negative number

DATA
    pi = 3.14159

FILE
    /Users/moebqr/Documents/GitHub/ST2195-Programming-for-Data-Science/Block10/Block10c/circle.py




The **docstring** of this module tells us that the module aims to do circle-related calculation.
The `help()` function also tells us that the module has two functions for calculating the area and circumference of a circle given the radius, and an object `pi`.

While the `help()` function tells us *how to use* the module and its functions, it does not tell us how the module and its
functions are *implemented*. 
This is *abstraction* - we hide the unnecessary details from the
user.
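
As a minimal sketch of where this documentation comes from: `help()` builds its output from docstrings, and the same text can be read directly. The `area` function below is a stand-in defined only for this example (it is not imported from `circle.py`), and `inspect.getdoc` is the standard-library helper used to read the docstring.

```python
import inspect


def area(radius):
    """Return the area of a circle given a non-negative radius."""
    return 3.14159 * radius ** 2


# help() builds its output from docstrings; the same text is available directly.
doc = inspect.getdoc(area)
print(doc)
```

The caller sees only the description of what the function does, not the body that computes it.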

To get the *attributes* (which include functions and other objects) of the module, we use
`module.attribute`. 
For functions, we call `module.func()`. 
For example, if we want to use
the function `area()` from the module `circle`, we call the function as follows:

In [3]:
# You can try autocomplete here
circle.area(3)

28.27431

In [4]:
circle.circumference(10)

62.8318

Similarly, if you want to retrieve other objects from the module, you need to write `module.obj`. 
For example, if we want to retrieve `pi` from the module `circle`, we need to write:

In [5]:
circle.pi

3.14159

We will get an error if we call the functions or use other objects directly, because functions and
other objects live *within* the module's **scope**.

In [6]:
# This fails: `import circle` binds only the name `circle`, so `pi` is not defined here
pi

NameError: name 'pi' is not defined

If you need to use the functions a lot, calling them by `module.func()`
may not be convenient. 
Instead, you can import functions or other objects from the module by `from module import function`. 
This enables you to use them directly, even if you have not imported the module itself.

In [7]:
from circle import area, pi
print(area(3))

print(pi)


28.27431
3.14159
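
The difference between the two import forms can be checked with a small sketch. It uses the standard-library module `math` rather than `circle` so that it runs anywhere, and it executes each import in its own empty namespace (a plain dictionary passed to `exec`) to show which names get bound. The variable names `plain_ns` and `from_ns` are chosen only for this illustration.

```python
# Run each import form in its own empty namespace and inspect the names it binds.
plain_ns = {}
exec("import math", plain_ns)

from_ns = {}
exec("from math import pi", from_ns)

# `import math` binds only the module name; `pi` must be reached as math.pi.
print("math" in plain_ns, "pi" in plain_ns)   # True False

# `from math import pi` binds `pi` directly, but not the module name.
print("math" in from_ns, "pi" in from_ns)     # False True
```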


## 2. Packages

A package is a collection of modules. 
We have been using packages like **NumPy** and **pandas**
in this course. 
In each package, there is at least one `__init__.py` file. 
The `__init__.py` file tells Python that the directory containing it should be treated as a package. 
The `__init__.py` file can be empty, but it often contains initialization code for the package.
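
To illustrate what initialization code in `__init__.py` can look like, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The package name `demo_pkg`, its contents, and the `initialized` flag are all hypothetical and exist only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway package directory; `demo_pkg` is a hypothetical name.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()

# A module inside the package.
(pkg_dir / "circle.py").write_text(
    "pi = 3.14159\n"
    "def area(radius):\n"
    "    return pi * radius ** 2\n"
)

# Initialization code: it runs once, when the package is first imported.
(pkg_dir / "__init__.py").write_text(
    "from .circle import area  # re-export for convenience\n"
    "initialized = True\n"
)

sys.path.insert(0, str(root))   # make the parent directory importable
importlib.invalidate_caches()

import demo_pkg

print(demo_pkg.initialized)   # True
print(demo_pkg.area(1))       # 3.14159
```

Because `__init__.py` re-exports `area`, the function can be called as `demo_pkg.area()` without naming the `circle` module.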

Now let’s create our own package `my_math` for calculations. 
We add the file `__init__.py` to tell Python this is a package. We also add the files `vector.py` and `circle.py` for the vector
and circle-related calculations. 
The structure of the files is as follows:

    my_math/             Top-level package
        __init__.py      Initialize the my_math package
        vector.py
        circle.py
    
The package `my_math` is found in the Block 10 files as a mtmath.zip file, which you will have to place in the working directory of this notebook and unzip. We can then import it in the same way we have imported third-party packages:

In [8]:
import my_math.circle
my_math.circle.area(3)

28.26

Or we could import only the module `circle`:

In [9]:
from my_math import circle
circle.area(3)

28.26

## Sub-Packages

A package often has a large number of files, so a hierarchical structure is often imposed by
grouping the module files into different folders and including an `__init__.py` file in each
folder. Take a look at the source code of **NumPy** on [GitHub](https://github.com/numpy/numpy/tree/main/numpy) and you can see that modules
are organized via folders. For example, the modules related to random sampling are under the
sub-folder `numpy/random` and the modules related to polynomial calculation are under the
sub-folder `numpy/polynomial`. Each sub-folder has its own `__init__.py` and is therefore a sub-package. For NumPy, `numpy.random` is a sub-package.

Let us examine another package, `my_math_2`, which has the following structure:

In [10]:
import os 
os.listdir('./my_math_2')

['untitled folder',
 'matrix.py',
 'shape',
 'vector.py',
 '.DS_Store',
 'circle.py',
 '__init__.py',
 '__pycache__',
 '__init__ copy.py',
 'triangle.py',
 'linear-algebra',
 'geometry',
 'linear_algebra']

To use a module from a sub-package, we can import it by:

In [11]:
from my_math_2.geometry import circle
circle.area(3)

28.26
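
The sub-package mechanism can be reproduced with a small sketch that does not depend on the course files. It builds a hypothetical layout `demo_math/geometry/square.py` in a temporary directory, with one `__init__.py` per level, and then imports the module through its dotted path. All names here are made up for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: demo_math/geometry/square.py
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_math" / "geometry"
sub_dir.mkdir(parents=True)

# One __init__.py per level: the package and the sub-package.
(root / "demo_math" / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "square.py").write_text(
    "def area(side):\n"
    "    return side ** 2\n"
)

sys.path.insert(0, str(root))   # make the parent directory importable
importlib.invalidate_caches()

from demo_math.geometry import square

print(square.area(3))      # 9
print(square.__name__)     # demo_math.geometry.square
```

The module's `__name__` records the full dotted path, which mirrors the folder hierarchy.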

The built-in function `dir()` is used to find out which names a module defines. It returns a sorted list of strings:

In [12]:
dir(my_math)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'circle']
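
Most of the names in such a listing are special attributes that start with underscores. As a small sketch, they can be filtered out to leave only the public names. The standard-library module `math` is used here so the example runs without the course files, and `public_names` is a variable name chosen for this illustration.

```python
import math

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(math) if not name.startswith("_")]

print(public_names[:5])
print("pi" in public_names)   # True
```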

Without arguments, `dir()` lists the names currently defined in your session:

In [13]:
dir()

['In',
 'Out',
 '_',
 '_10',
 '_11',
 '_12',
 '_3',
 '_4',
 '_5',
 '_8',
 '_9',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__vsc_ipynb_file__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'area',
 'circle',
 'exit',
 'get_ipython',
 'my_math',
 'open',
 'os',
 'pi',
 'quit']

## Python Package Repositories and Package Installer

The [*Python Package Index (PyPI)*](https://pypi.org/) is a repository of software for the Python programming
language. 

Python packages are typically installed from one of two package repositories:
- The Python Package Index (PyPI)
- Conda

**PyPI** is the official third-party software repository for Python, and it is the default source for
packages for `pip`, which is a popular package installer for Python.

The name Conda refers to both a general-purpose package management system and the package repositories it installs from. 
Unlike `pip`, which installs Python packages, Conda can manage software written in *any language*.
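
Before installing a package, it can be useful to check whether it is already available in the current environment. The following is a minimal sketch using the standard-library function `importlib.util.find_spec`. The helper `is_installed` is a hypothetical name introduced for this example, and it is only intended for top-level module or package names.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module or package can be found for import."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))                            # standard-library module
print(is_installed("surely_not_a_real_package_123"))   # hypothetical missing name
```

If a package is missing, it can typically be installed from the command line with `pip install <name>` or `conda install <name>`.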

## Useful Links

- [Official Python tutorial on modules and packages](https://docs.python.org/3/tutorial/modules.html)
- [Python tutorial on packaging Python projects](https://packaging.python.org/en/latest/tutorials/packaging-projects/)
