<div class="pagebreak"></div>

# Modules
With the exception of existing classes and functions that we have used, everything that we have written so far has been put within a single file.  Now, we will look at how to package our code into multiple files.

Just as functions provide an abstraction for a series of steps that perform a task, a module creates an abstraction of a group of related variables (data), functions, and classes. Modules are a key abstraction for reusing code and functionality.


## Using Modules
To use a module, use the following statement:
<code>import <i>moduleName</i></code> where moduleName is the name of an existing Python file (but without the .py extension) or a directory.

In [None]:
import statistics

statistics.stdev([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])

You can also rename a module as you import it. This provides an alternate alias to refer to the module in the code.

In [None]:
import statistics as stat

stat.stdev([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])

Why rename imported modules?
- duplicate names
- more mnemonic / following convention (Example: <code>import pandas as pd</code>)
- minimize typing

You can also limit what you import from a module
<code>from <i>moduleName</i> import <i>name</i></code>

In [None]:
from statistics import mean

mean([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])

To list all modules currently installed (including built-in modules):

In [None]:
help('modules')

To see the help documentation for a specific module, pass it as a string to help

In [None]:
help('numpy')
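
If you only need to know whether a module can be imported, without reading its help text, one option is `importlib.util.find_spec` from the standard library. The sketch below is illustrative only; the helper `is_available` and the name `no_such_module_xyz` are made up for this example.

```python
import importlib.util


def is_available(module_name):
    """Return True if importing module_name would find a top-level module."""
    # find_spec returns None when no module of that name is on the search path
    return importlib.util.find_spec(module_name) is not None


# "no_such_module_xyz" is a hypothetical name that should not exist anywhere
for name in ("statistics", "numpy", "no_such_module_xyz"):
    print(name, "->", is_available(name))
```

The result for `numpy` depends on whether it has been installed in the current environment.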

## Packages 
Python allows modules to be organized by subdirectories into packages. The directory names form a hierarchy of names.

Prior to Python 3.3, a directory had to contain a file named `__init__.py` to be treated as a Python package. The file can be used for any special initialization code for the package. The use of `__init__.py` is a common interview question.  Without the `__init__.py`, the package is considered an [implicit namespace package](https://peps.python.org/pep-0420/). The technical differences between regular packages and implicit namespace packages are irrelevant for most use cases, but issues generally arise when the same package name is used in more than one location in the search path (see "How Import Works" below) - [view more details](https://web.archive.org/web/20220605062021/http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html).  Note: these links are provided for informational purposes only.
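
As a rough illustration of a regular package, the sketch below builds a small package in a temporary directory and imports a module from it. The names `demo_pkg`, `shapes`, `GREETING`, and `area` are invented for this example, and the temporary directory is added to `sys.path` only so the import can find it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Layout: <tmp>/demo_pkg/__init__.py and <tmp>/demo_pkg/shapes.py
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("GREETING = 'hello from demo_pkg'\n")
    (pkg_dir / "shapes.py").write_text("def area(w, h):\n    return w * h\n")

    sys.path.insert(0, tmp)          # make the temporary directory searchable
    importlib.invalidate_caches()    # the files were created after startup
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        shapes = importlib.import_module("demo_pkg.shapes")
        greeting = demo_pkg.GREETING       # set by the code in __init__.py
        area_result = shapes.area(3, 4)    # function from the submodule
    finally:
        # Undo the changes so the demo leaves no trace behind
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg.shapes", None)
        sys.modules.pop("demo_pkg", None)

print(greeting)
print(area_result)
```

The directory name becomes the first part of the dotted name (`demo_pkg.shapes`), which is the hierarchy of names described above.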

Typically, programmers use the terms "modules" and "packages" interchangeably.

### Installing other modules
The de facto way to install additional modules and packages is to use [pip](https://docs.python.org/3/installing/index.html). 

Technically you can use the 'pip' command to install packages: <pre>pip install <i>packageName</i></pre>
However, the recommended approach is to call the Python interpreter with the <code>-m</code> option and pass the module name <code>pip</code> as the command line argument:
<pre>
    python -m pip install <i>packageName</i>
</pre>
By using the `python` executable, we ensure the package is installed into an appropriate environment.
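
To see which interpreter and environment are in use, and therefore where `python -m pip` would install packages, you can inspect a few attributes of `sys`. This is a small sketch; the helper `describe_environment` is a made-up name, and the virtual environment check assumes an environment created with the standard `venv` mechanism.

```python
import sys


def describe_environment():
    """Collect basic facts about the running interpreter."""
    return {
        "executable": sys.executable,   # path of the running Python binary
        "prefix": sys.prefix,           # root directory of the active environment
        # In a venv-style virtual environment the two prefixes differ
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```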

Similarly for Jupyter Notebooks:
<pre>
    import sys
    !{sys.executable} -m pip install <i>packageName</i>
</pre>
For notebooks, you may have seen
<pre>!pip install <i>packageName</i></pre>
However, this runs whichever <code>pip</code> is found first on the shell's path, which may install the package into the environment from which Jupyter was started rather than the environment the current kernel uses.

You should ensure <code>setuptools</code> and <code>wheel</code> are installed when using pip.  <code>setuptools</code> helps to handle the installation of other packages from source code. <code>wheel</code> supports building the pre-built "wheel" package format, which pip installs directly when a compatible wheel is available. The following command upgrades all three to the latest version.

In [None]:
import sys
!{sys.executable} -m pip install --upgrade pip setuptools wheel
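
One way to confirm what is installed afterwards is `importlib.metadata` from the standard library (Python 3.8 and later). The sketch below is illustrative; the helper `installed_version` is a made-up name, and the versions printed depend entirely on your environment.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("pip", "setuptools", "wheel"):
    print(name, installed_version(name))
```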

### Commonly Used Modules / Packages
Packages whose URL contains "python.org" are part of the [Python Standard Library](https://docs.python.org/3/library/), which is distributed with the language, and do not need to be installed.

Package Name | Description | Import | URL
:-----------|:----|:---|:------
datetime | Supplies classes to represent and manipulate dates and times | dt | https://docs.python.org/3/library/datetime.html
json | Exposes APIs to load, parse, and write [JSON Objects](https://datatracker.ietf.org/doc/html/rfc7159.html). | |https://docs.python.org/3/library/json.html
math | Variety of math functions for floats and integers | | https://docs.python.org/3/library/math.html
matplotlib|  Comprehensive visualization library | mpl | https://matplotlib.org
numpy  | Foundational package for scientific computing.  Supports multidimensional arrays and matrices| np | https://numpy.org
pandas | Data analysis and manipulation tool. Core library to perform data science in Python | pd| https://pandas.pydata.org
os | Provides access to common operating system functions. | | https://docs.python.org/3/library/os.html
random | Implements random number generation for various distributions | |https://docs.python.org/3/library/random.html
scipy  | Contains algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many other classes of problems| | https://scipy.org 
seaborn | Visualization library built on top of matplotlib that provides attractive and informative statistical graphics | sns | https://seaborn.pydata.org
statistics | Provides functions to calculate common statistics || https://docs.python.org/3/library/statistics.html
sys | Provides access to variables and functions used by the Python interpreter | |https://docs.python.org/3/library/sys.html
unittest | Unit testing framework that supports test automation || https://docs.python.org/3/library/unittest.html

The "Import" column contains the standard rename alias.
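
As a brief illustration, the sketch below combines a few of the standard library modules from the table, using the `dt` alias for `datetime`. The record contents are arbitrary values chosen for this example.

```python
import datetime as dt
import json
import math
import random

random.seed(42)  # fixed seed so repeated runs give the same roll

record = {
    "created": dt.date(2024, 1, 15).isoformat(),  # date as an ISO 8601 string
    "hypotenuse": math.hypot(3, 4),               # length of the 3-4-5 triangle's long side
    "roll": random.randint(1, 6),                 # simulated die roll
}

text = json.dumps(record)    # serialize the dictionary to a JSON string
restored = json.loads(text)  # parse it back into a dictionary
print(text)
print(restored == record)
```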

## Developing and using our own modules

At the very simplest level, a module is just a text file that contains Python code.

For example, we could create our own statistics module. The code below exists in a file "mystatistics.py"
<div style="border: 3px solid black;padding: 10px; border-radius: 10px;">
<code>
"""mystatistics provides implementations of common descriptive statistical functions
   - min
   - max
   - range
   - mean
   - median
   - variance
   - std_dev
    
   Each function takes a single list.  All elements of that list should be floats or integers.
"""

def min(l):
    """ returns the minimum value in the list.  Raises exception if empty"""
    if l:
        s_list = sorted(l)
        return s_list[0]
    else:
        raise ValueError("list empty")

def max(l):
    """ returns the maximum value in the list.  Raises exception if empty"""
    if l:
        s_list = sorted(l)
        return s_list[-1]
    else:
        raise ValueError("list empty")
        
def range(l):
    """ returns the difference between the minimum and maximum value in the list.  Raises exception if empty"""
    if l:
        s_list = sorted(l)
        return s_list[-1] - s_list[0]
    else:
        raise ValueError("list empty")
        
        
def mean(l):
    """computes the mean of the list"""
    if l:
        return sum(l)/len(l)
    else:
        raise ValueError("list empty")

def median(l):
    """Finds the median value of the list"""
    if l:
        s_list = sorted(l)
        mid = len(s_list)//2
        # odd length: the middle element; even length: the mean of the two middle elements
        return s_list[mid] if len(s_list)%2 == 1 else (s_list[mid-1] + s_list[mid])/2
    else:
        raise ValueError("list empty")
    
def variance(l):
    """Calculates the population variance for the list"""
    m   = mean(l)
    dif = 0
    for x in l:
        dif += (m-x)**2
    return dif/len(l)

def std_dev(l):
    """Calculates the population standard deviation for list"""
    return variance(l)**.5

if \_\_name\_\_ == "\_\_main\_\_":
    test_list = [10,12,14]
    print("Min:", min(test_list))
    print("Max:", max(test_list))
    print("Range:", range(test_list))
    print("Mean:", mean(test_list))
    print("Median:", median(test_list))
    print("Variance:", variance(test_list))
    print("Std Dev:", std_dev(test_list))
</code>
</div>

We can now use this module by importing it and then using the functions defined within it.

In [None]:
import mystatistics

test_list = [10,12,14]
print("Std Dev:",  mystatistics.std_dev(test_list))
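
A reasonable sanity check for a hand-written module is to compare it against the standard library. Because `mystatistics.py` may not be on your search path, the sketch below restates the population variance and standard deviation formulas inline (an assumption made so the example is self-contained) and compares them with `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics


def variance(values):
    """Population variance, using the same formula as the module above."""
    m = sum(values) / len(values)
    return sum((m - x) ** 2 for x in values) / len(values)


def std_dev(values):
    """Population standard deviation."""
    return variance(values) ** 0.5


test_list = [10, 12, 14]
variance_matches = math.isclose(variance(test_list), statistics.pvariance(test_list))
std_dev_matches = math.isclose(std_dev(test_list), statistics.pstdev(test_list))
print(variance_matches, std_dev_matches)
```

Note that `statistics.stdev`, used earlier in this section, computes the sample standard deviation (dividing by n - 1), so it returns a different value than the population version.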

## How Import Works ...
When the Python interpreter executes the <code>import <i>moduleName</i></code> statement, it first checks to see whether or not that module has been previously imported. If it hasn't, then the interpreter will search a list of directories for a file named <i>moduleName</i>.py.  This list is available in the Python variable `sys.path` and is created from the following sources:
- the directory containing the script being run (or the current working directory in an interactive session or notebook)
- the PYTHONPATH environment variable if it has been set
- an installation-dependent list of directories (created at install time or when a virtual environment was created)

In [None]:
import sys
print(sys.path)
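
The lookup described above can be sketched with a throwaway module. The example writes a file named `demo_lookup_mod.py` (a made-up name) into a temporary directory, adds that directory to `sys.path`, and imports it twice. The cache of previously imported modules is assumed here to be `sys.modules`, which is where the interpreter records them.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_lookup_mod.py").write_text("ANSWER = 42\n")

    known_before = "demo_lookup_mod" in sys.modules  # not imported yet
    sys.path.insert(0, tmp)         # add the directory to the search list
    importlib.invalidate_caches()   # the file was created after startup
    try:
        first = importlib.import_module("demo_lookup_mod")   # found via sys.path
        second = importlib.import_module("demo_lookup_mod")  # served from the cache
        cached = sys.modules["demo_lookup_mod"]
        answer = first.ANSWER
    finally:
        # Clean up so the demo leaves the interpreter as it found it
        sys.path.remove(tmp)
        sys.modules.pop("demo_lookup_mod", None)

same_object = first is second is cached
print(known_before, answer, same_object)
```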

If the module has not been imported before, the Python interpreter then executes the code within the <i>moduleName</i>.py file.  This will create any classes or functions defined within the file.  It will also execute any statements not contained within a class or function declaration.  This latter piece is essential to allow the module to perform any necessary initialization steps prior to its use.

Finally, the Python interpreter binds the result to a name in the local scope.  This allows us to reference the module name, alias, or specific imported item within our code.  The following code shows that the local namespace grows by one entry after importing `pi`:

In [None]:
print("Local namespace size:",len(locals()))
from math import pi
print("Local namespace size (after import of pi):",len(locals()))
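
To see that a module's top-level statements run when it is first imported, and not again on later imports, the sketch below imports a throwaway module twice while capturing anything it prints. The module name `demo_init_mod` and its contents are invented for this example.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# The print call is a top-level statement, so it runs as part of the import
source = 'print("initializing demo_init_mod")\nREADY = True\n'

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_init_mod.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    first_out, second_out = io.StringIO(), io.StringIO()
    try:
        with contextlib.redirect_stdout(first_out):
            importlib.import_module("demo_init_mod")   # first import: code executes
        with contextlib.redirect_stdout(second_out):
            importlib.import_module("demo_init_mod")   # second import: cached module
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_init_mod", None)

print("first import printed: ", repr(first_out.getvalue()))
print("second import printed:", repr(second_out.getvalue()))
```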

The following code in a Python file
<pre>
if __name__ == "__main__":
    <i>statements</i>
</pre>
checks to see whether or not the file is being run as the main program (through a command such as <code>python <i>moduleName</i>.py</code>). If it is, then the statements in that block will execute.  Otherwise, they are skipped.  This allows us to run the module as a main program, while skipping those steps when the module is imported by other programs. <code>if \_\_name\_\_ == "\_\_main\_\_":</code> is very common boilerplate code that you will see in a large number of Python files.
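
The difference between the two cases can be sketched without leaving the interpreter. Below, a throwaway file `demo_main_mod.py` (a made-up name) records whether `__name__` equals `"__main__"`. It is first run through `runpy.run_path` with `run_name="__main__"`, which approximates running it from the command line, and then imported normally.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# The module stores the result of the usual check in a variable we can inspect
source = "RAN_AS_MAIN = (__name__ == '__main__')\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "demo_main_mod.py")
    path.write_text(source)

    # Case 1: executed as the main program
    script_globals = runpy.run_path(str(path), run_name="__main__")
    as_script = script_globals["RAN_AS_MAIN"]

    # Case 2: imported as a module
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("demo_main_mod")
        as_import = module.RAN_AS_MAIN
        imported_name = module.__name__
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_main_mod", None)

print("run as script:", as_script)
print("imported:", as_import, "with __name__ =", imported_name)
```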

## Module Docstrings
To help other developers properly use your modules, you should use a docstring at the top of the file.  The docstring should state the purpose of the module and then list the classes, functions, exceptions, and any other items exported by the module with a short summary of each. See the [docstring conventions](https://peps.python.org/pep-0257/#multi-line-docstrings) in PEP 257.
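
A module docstring is available at run time through the module's `__doc__` attribute, which is also what `help` displays. As a small sketch, the code below reads the docstring of the standard `statistics` module with `inspect.getdoc` and extracts its first line. It assumes Python is not running with docstrings stripped (the `-OO` option).

```python
import inspect
import statistics

# getdoc returns the docstring with leading blank lines and indentation cleaned up
doc = inspect.getdoc(statistics) or ""
summary = doc.splitlines()[0] if doc else ""

print(summary)
print("docstring length:", len(doc), "characters")
```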


## Best Practices
Although some modules are designed to limit which names they export when you use <code>from <i>module</i> import *</code>, using this statement is still considered bad practice. It imports all of the module's exported objects into your local namespace, and it becomes difficult to determine what's what.  While having to type <code><i>module.</i></code> is a bit more tedious, it makes your code extremely clear as to where an object originated.
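
One concrete risk is that a later star import silently replaces a name brought in by an earlier one. The sketch below demonstrates this with `math` and `cmath`, which both define `sqrt`. The star imports are run with `exec` inside a separate dictionary purely so the demonstration does not pollute the notebook's own namespace.

```python
import cmath
import math

namespace = {}  # stands in for a module's global namespace

exec("from math import *", namespace)
sqrt_after_math = namespace["sqrt"]    # currently the real-valued math.sqrt

exec("from cmath import *", namespace)
sqrt_after_cmath = namespace["sqrt"]   # silently replaced by cmath.sqrt

print(sqrt_after_math is math.sqrt)
print(sqrt_after_cmath is cmath.sqrt)
print(sqrt_after_cmath(-1))            # a complex result; math.sqrt(-1) would raise ValueError
```

With explicit `math.sqrt` and `cmath.sqrt`, there is no question about which function is being called.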

As you create modules, you should only group things together that logically belong together.  Simply because you wrote two functions does not necessarily mean that they should be within the same module.  Quite often, "utility" packages will violate this principle.

While you can distribute your own modules and packages by simply providing the source code to others, you should "package" them. See the [Overview of Packaging for Python](https://packaging.python.org/en/latest/overview/) and the [Tutorial](https://packaging.python.org/en/latest/tutorials/packaging-projects/).

The Python interpreter will only ever load one copy of a module into your program - even if it is imported in multiple locations.  Thus, any change made to that module is visible everywhere the module is used. As with anything else in Python (or really any programming language), such functionality can be useful or a curse. Python trusts developers not to behave maliciously, as in this code:

In [None]:
import statistics

def bad_programmer(l):
    import random
    return random.random()

statistics.stdev = bad_programmer

my_list = [10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7]
print("Std dev:",statistics.stdev(my_list))
print("Std dev:",statistics.stdev(my_list))
# good luck tracking down that one!
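
The sketch below illustrates the single shared copy more gently: the replacement is visible through a second name for the same module, and the original function is restored afterwards. The function `fake_stdev` is made up for this example, and the code assumes `statistics.stdev` has not already been replaced (restart the kernel first if you ran the cell above).

```python
import statistics
import statistics as stat_alias  # a second name bound to the same module object

original_stdev = statistics.stdev  # keep a reference so it can be restored


def fake_stdev(values):
    """Stand-in that ignores its input."""
    return -1.0


try:
    statistics.stdev = fake_stdev
    # The change is visible through every name that refers to the module
    seen_through_alias = stat_alias.stdev([1, 2, 3])
finally:
    statistics.stdev = original_stdev  # always put the real function back

restored_value = stat_alias.stdev([1, 2, 3])
print(stat_alias is statistics, seen_through_alias, restored_value)
```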

## Exercise
