<div class="pagebreak"></div>

# Modules
Apart from the existing classes and functions we have used, each program in these notebooks so far could easily fit in a single file. However, most software projects are much larger than these examples and exercises. Fortunately, Python provides capabilities to organize code into multiple files.

Just as functions provide abstractions for a series of steps that perform a task, a module creates an abstraction of a group of related variables (data), functions, and classes. Modules are a crucial component for code reuse because they encapsulate related functionality in one place.

## Using Modules
To use a module, write the following statement:
<code>import <i>moduleName</i></code> where <i>moduleName</i> is the name of an existing Python file (without the .py extension) or a directory.

In [None]:
import statistics

statistics.stdev([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])

You can also rename a module as you import it. The new name is an alias you use to refer to the module in your code.

In [None]:
import statistics as stat

stat.stdev([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])

Why rename imported modules?
- avoid duplicate (conflicting) names
- provide a mnemonic
- follow convention (Example: `import pandas as pd`)
- minimize typing

You can also import specific items from a module: 
<code>
    from <i>moduleName</i> import <i>name</i>
</code>

In [None]:
from statistics import mean

mean([10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7])
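The `from` form can bind several names at once, and `as` can rename a single imported name. The sketch below uses only standard-library modules; renaming `sqrt` from `math` and `cmath` shows how an alias avoids a name clash between two modules that export the same name.

```python
from statistics import mean, median, mode  # several names in one statement
from math import sqrt as real_sqrt          # rename a single imported name
from cmath import sqrt as complex_sqrt      # avoids clashing with math.sqrt

data = [10, 5, 12, 3, 6, 4, 11, 4, 8, 5, 6, 7, 8, 6, 5, 6, 7]
print(round(mean(data), 3))      # 6.647
print(median(data), mode(data))  # 6 6
print(real_sqrt(16))             # 4.0
print(complex_sqrt(-16))         # 4j
```

Only the listed names (or their aliases) are bound; the `from` form does not bind the module name itself.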

To list all modules currently installed (including built-in modules):

In [None]:
help('modules')

To see the help documentation for a specific module, pass its name as a string to `help()`:

In [None]:
help('numpy')

## Packages 
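Python organizes modules into _packages_ using directories and subdirectories. The directory names form a dotted hierarchy of names; for example, in `xml.etree.ElementTree`, `xml` and `etree` are packages and `ElementTree` is a module.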

Before Python 3.3, developers had to create a file named `__init__.py` in a directory for the interpreter to treat the directory as a Python package. `__init__.py` is typically empty but can contain any initialization code for the package. Since Python 3.3, a directory without an `__init__.py` file can still be imported, but it is treated as an [implicit namespace package](https://peps.python.org/pep-0420/). The technical differences between regular packages and implicit namespace packages are irrelevant for most use cases. However, issues generally arise when the same package name appears in more than one location in the search path (see the section below, "How Import Works").

[View more details](https://web.archive.org/web/20220605062021/http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html).  

Note: the last two links are for informational purposes only.

The purpose of `__init__.py` is a common interview question.

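In casual conversation, programmers often use the terms "module" and "package" interchangeably, although strictly speaking a package is a module that can contain other modules (submodules).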
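To make the directory-to-name mapping concrete, the sketch below builds a tiny package on disk and imports a module from it. The package name `demo_shapes` and its `circle` module are hypothetical; the files go into a temporary directory that is added to `sys.path` (the module search path described in "How Import Works") only for the duration of the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout:
    #   demo_shapes/
    #       __init__.py
    #       circle.py
    package_dir = Path(root, "demo_shapes")
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text('"""A tiny example package."""\n')
    (package_dir / "circle.py").write_text(
        "import math\n\ndef area(radius):\n    return math.pi * radius ** 2\n"
    )

    sys.path.insert(0, root)       # make the package's parent directory searchable
    importlib.invalidate_caches()  # make sure the import system notices the new files
    try:
        from demo_shapes import circle  # package "demo_shapes", module "circle"
        print(circle.__name__)              # demo_shapes.circle
        print(round(circle.area(1.0), 5))   # 3.14159
    finally:
        sys.path.remove(root)
        for name in ("demo_shapes.circle", "demo_shapes"):
            sys.modules.pop(name, None)
```

The dotted name `demo_shapes.circle` mirrors the path `demo_shapes/circle.py`.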

### Installing Other Modules
The de facto way to install additional modules and packages is to use [pip](https://docs.python.org/3/installing/index.html). 

Technically, you can run the `pip` command directly to install packages: <pre>pip install <i>packageName</i></pre>
However, the recommended approach is to run the Python interpreter with the `-m` option, passing `pip` as the name of the module to run:
<pre>
    python -m pip install <i>packageName</i>
</pre>
Running pip through a specific `python` executable ensures the package installs into that interpreter's environment.

Similarly, for Jupyter Notebooks:
<pre>
    import sys
    !{sys.executable} -m pip install <i>packageName</i>
</pre>
For notebooks, you may have seen
<pre>!pip install <i>packageName</i></pre>
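However, this runs whichever `pip` comes first on the shell's `PATH`, which may belong to a different environment (such as the one from which Jupyter was started) rather than the environment of the notebook's current kernel.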
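The same idea works in plain Python code. The sketch below reports which interpreter is running and builds a pip command tied to it; `pip_command` is a hypothetical helper, and `packageName` is a placeholder, so the command is only printed, not executed.

```python
import shlex
import sys

def pip_command(*args):
    """Hypothetical helper: build a pip command bound to the running interpreter."""
    return [sys.executable, "-m", "pip", *args]

# Which interpreter (and therefore which environment) is this code running in?
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", sys.prefix != sys.base_prefix)

cmd = pip_command("install", "--upgrade", "packageName")  # placeholder package name
print(shlex.join(cmd))  # shown with POSIX-style quoting

# To actually run the command:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Comparing `sys.prefix` with `sys.base_prefix` is a common check for an active `venv`; the two differ only inside a virtual environment.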

When using pip, make sure the current Python environment also has the `setuptools` and `wheel` packages installed. <code>wheel</code> adds support for building wheels, Python's standard pre-built package format, and <code>setuptools</code> helps build and install packages from source code. The following code block upgrades pip, <code>setuptools</code>, and <code>wheel</code> to their most recent versions in the current environment.

In [None]:
import sys
!{sys.executable} -m pip install --upgrade pip setuptools wheel

<div style="border: 3px solid black;padding: 10px; border-radius: 10px;">
<b>Security Note</b>
    
The software supply chain has become one of the easier avenues for compromising software security. Developers often include dependencies in their code without first validating those dependencies.
    
Possible ways to mitigate this attack vector:
<ul>
    <li> Use trusted, well known components
    <li> Scan dependencies for known vulnerabilities
    <li> Practice defense in depth. While you may not be able to prevent every software issue, can you minimize the damage?
    <li> Use a trusted source for components.    
</ul>
<a href='https://web.archive.org/web/20220606152124/https://blog.gitguardian.com/supply-chain-attack-6-steps-to-harden-your-supply-chain/'>Supply Chain Attacks</a>    
       
Regarding trusted sources, Google announced in May 2022 that it would provide a new Google Cloud service, "Assured Open Source Software", that distributes components curated by the company.  <a href='https://cloud.google.com/blog/products/identity-security/introducing-assured-open-source-software-service'>https://cloud.google.com/blog/products/identity-security/introducing-assured-open-source-software-service</a>     

</div>
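Before you can scan dependencies, you need to know what is installed. The sketch below inventories the distributions visible to the current interpreter using the standard-library `importlib.metadata` module (Python 3.8+); `installed_packages` is a hypothetical helper name. It only lists names and versions and does not check for vulnerabilities; dedicated tools such as `pip-audit` do that.

```python
from importlib import metadata

def installed_packages():
    """Hypothetical helper: sorted (name, version) pairs for installed distributions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            found[name.lower()] = dist.version
    return sorted(found.items())

# Show the first few entries in requirements-file style
for name, version in installed_packages()[:10]:
    print(f"{name}=={version}")
```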

### Commonly Used Modules / Packages
The following table contains a list of commonly used modules and a brief description.  
Modules whose URL contains "python.org" belong to Python’s standard library, which is installed along with Python itself.  [Python Standard Library](https://docs.python.org/3/library/)

Package Name | Description | Import<br>Alias | URL
:-----------|:----|:---|:------
datetime | Supplies classes to represent and manipulate dates and times | dt | https://docs.python.org/3/library/datetime.html
json | Exposes APIs to load, parse, and write [JSON](https://datatracker.ietf.org/doc/html/rfc7159.html) data | | https://docs.python.org/3/library/json.html
math | Variety of math functions for floats and integers | | https://docs.python.org/3/library/math.html
matplotlib | Comprehensive visualization library | mpl<br>(plt for matplotlib.pyplot) | https://matplotlib.org
numpy | Foundational package for scientific computing; supports multidimensional arrays and matrices | np | https://numpy.org
os | Provides access to common operating system functions | | https://docs.python.org/3/library/os.html
pandas | Data analysis and manipulation tool; core library for performing data science in Python | pd | https://pandas.pydata.org
random | Implements random number generation for various distributions | | https://docs.python.org/3/library/random.html
scipy | Contains algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems | | https://scipy.org
seaborn | Visualization library built on top of matplotlib that provides attractive and informative statistical graphics | sns | https://seaborn.pydata.org
statistics | Provides functions to calculate common statistics | | https://docs.python.org/3/library/statistics.html
sys | Provides access to variables and functions used by the Python interpreter | | https://docs.python.org/3/library/sys.html
unittest | Automated testing framework | | https://docs.python.org/3/library/unittest.html

The "Import Alias" column contains the conventional alias used for this package/module during an import.
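A short sketch using a few standard-library entries from the table, including the conventional `dt` alias. The third-party imports are left as comments because those packages must be installed first; the date and values are arbitrary illustration data.

```python
import datetime as dt  # conventional alias from the table
import json
import math

# Third-party packages follow the same alias conventions once installed:
# import numpy as np
# import pandas as pd
# import matplotlib.pyplot as plt

record = {
    "date": dt.date(2024, 1, 31).isoformat(),
    "sqrt_two": round(math.sqrt(2), 3),
}
print(json.dumps(record))  # {"date": "2024-01-31", "sqrt_two": 1.414}
```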

## Developing and Using Modules
At the very simplest level, a module is just a text file that contains Python code.

For example, we could create a statistics module. Suppose the code below is saved in a file named "mystatistics.py".
<div style="border: 3px solid black;padding: 10px; border-radius: 10px;">
<code>
"""mystatistics provides implementations of common descriptive statistical functions
   - min
   - max
   - range
   - mean
   - median
   - variance
   - std_dev
    
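   Each function takes a single list, and every element of that list should be a float or an integer.
   Note: min, max, and range shadow Python's built-in functions of the same names within this module.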
"""

def min(l):
    """ returns the minimum value in the list.  Raises a ValueError if empty"""
    if l:
        s_list = sorted(l)
        return s_list[0]
    else:
        raise ValueError("list empty")

def max(l):
    """ returns the maximum value in the list.  Raises a ValueError if empty"""
    if l:
        s_list = sorted(l)
        return s_list[-1]
    else:
        raise ValueError("list empty")
        
def range(l):
    """ returns the difference between the minimum and maximum value in the list.  Raises a ValueError if empty"""
    if l:
        s_list = sorted(l)
        return s_list[-1] - s_list[0]
    else:
        raise ValueError("list empty")
        
        
def mean(l):
    """computes the mean of the list"""
    if l:
        return sum(l)/len(l)
    else:
        raise ValueError("list empty")

def median(l):
    """Finds the median value of the list"""
    if l:
        s_list = sorted(l)
        return s_list[len(s_list)//2] if len(s_list)%2 == 1 else (s_list[len(s_list)//2 - 1] +s_list[len(s_list)//2])/2
    else:
        raise ValueError("list empty")
    
def variance(l):
    """Calculates the population variance for the list"""
    m   = mean(l)
    dif = 0
    for x in l:
        dif += (m-x)**2
    return dif/len(l)

def std_dev(l):
    """Calculates the population standard deviation for the list"""
    return variance(l)**.5

if \_\_name\_\_ == "\_\_main\_\_":
    test_list = [10,12,14]
    print("Min:", min(test_list))
    print("Max:", max(test_list))
    print("Range:", range(test_list))
    print("Mean:", mean(test_list))
    print("Median:", median(test_list))
    print("Variance:", variance(test_list))
    print("Std Dev:", std_dev(test_list))
</code>
</div>

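We can now use this module by importing it and calling the functions defined within it. This works as long as "mystatistics.py" is in a directory on the module search path, such as the notebook's current working directory.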

In [None]:
import mystatistics

test_list = [10,12,14]
print("Std Dev:",  mystatistics.std_dev(test_list))
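`mystatistics.std_dev` computes the *population* standard deviation (it divides by n), so for `[10, 12, 14]` it returns the square root of 8/3, about 1.633. The standard library's equivalent is `statistics.pstdev`; `statistics.stdev`, used at the start of this notebook, computes the *sample* standard deviation (dividing by n - 1). The sketch below compares the two.

```python
import math
import statistics

values = [10, 12, 14]

# Population standard deviation: divides the sum of squared deviations by n
population = statistics.pstdev(values)
# Sample standard deviation: divides by n - 1
sample = statistics.stdev(values)

print(math.isclose(population, (8 / 3) ** 0.5))  # True
print(sample)                                    # 2.0
```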

## How Import Works
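When the Python interpreter executes the <code>import <i>moduleName</i></code> statement, it first checks the module cache, `sys.modules`, to see whether it has previously imported that module. If not, the interpreter checks for a built-in module with that name and then searches a list of directories for a file named <i>moduleName</i>.py or a directory with that name. This search list is available in the variable `sys.path` and is composed of the following sources:
- the directory containing the script being run (or the current working directory in an interactive session such as a notebook)
- the directories listed in the PYTHONPATH environment variable
- an installation-dependent list of directories (created at install time or when creating a virtual environment), including the `site-packages` directory where pip installs third-party packages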

In [None]:
import sys
print(sys.path)
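Each step of that search can be observed directly. The sketch below checks the module cache, the list of built-in module names, and uses `importlib.util.find_spec`, which locates a module (returning the cached spec if it is already imported) without executing it. The missing module name is hypothetical.

```python
import importlib.util
import sys

import json  # make sure json has been imported at least once

# 1. The cache of already-imported modules, keyed by name
print("json" in sys.modules)  # True

# 2. Modules compiled into the interpreter are found before sys.path is searched
print("sys" in sys.builtin_module_names)  # True

# 3. Locate a module's file without running it
spec = importlib.util.find_spec("json")
print(spec.name, spec.origin)  # json and the path to json/__init__.py

# A name found nowhere on the search path yields None instead of raising
print(importlib.util.find_spec("hypothetical_missing_module_xyz"))  # None
```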

Once the module is found (and, on the first import, loaded), the Python interpreter binds it to a name in the current local scope. This binding allows us to reference the module name, alias, or specific imported item within our code. The following code shows that the local namespace grows by one entry when importing the `pi` variable (assuming `pi` was not already defined):

In [None]:
print("Local namespace size:", len(locals()))
from math import pi
print("Local namespace size (after import of pi):", len(locals()))
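Each form of the import statement binds a different name. The sketch below runs each statement in a fresh, empty dictionary via `exec` (used here only to get a clean namespace for the demonstration) and checks which names appear.

```python
# Run each import form in its own empty namespace and see which names it binds
plain_ns, from_ns, alias_ns = {}, {}, {}
exec("import math", plain_ns)
exec("from math import pi", from_ns)
exec("import math as m", alias_ns)

print("math" in plain_ns)                   # True
print("pi" in from_ns, "math" in from_ns)   # True False
print("m" in alias_ns, "math" in alias_ns)  # True False
```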

Loading a module for the first time means executing the code within the <i>moduleName</i>.py file; this happens before the name is bound, and later imports of the same module skip it. The execution creates any classes or functions defined within the file and runs any statements not contained within a class or function definition. The latter is essential because it lets the module perform any necessary initialization steps before use.
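To see that a module's top-level code runs only on the first import, the sketch below writes a hypothetical module `init_demo` containing a `print` call to a temporary directory, imports it twice, and counts how often the message appears. `importlib.import_module` is the function form of the `import` statement.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "init_demo"  # hypothetical module created only for this example

with tempfile.TemporaryDirectory() as tmp:
    # A top-level statement with a visible side effect
    Path(tmp, MODULE_NAME + ".py").write_text('print("initializing")\nVALUE = 42\n')
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            first = importlib.import_module(MODULE_NAME)   # executes the file
            second = importlib.import_module(MODULE_NAME)  # reuses the sys.modules entry
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print(captured.getvalue().count("initializing"))  # 1
print(first is second, first.VALUE)               # True 42
```

If you do need to re-run a module's code (for example, after editing it), `importlib.reload(module)` executes it again.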

The following code checks whether the file is being run as the main program (for example, through a command such as <code>python <i>moduleName</i>.py</code>) rather than being imported:
<pre>
if __name__ == "__main__":
    <i>statements</i>
</pre>
If so, the statements in that block execute. This check lets the module run as a main program while skipping that block whenever another program imports it. Many Python files contain this common boilerplate code: <code>if \_\_name\_\_ == "\_\_main\_\_":</code>. In some ways, it plays the same role as the `main` function in C and C++ or the `main` method in Java.
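The standard-library `runpy` module can execute a file under a chosen `__name__`, which makes the effect of the guard easy to observe. The sketch below writes a hypothetical `greeter.py` to a temporary directory and runs it once as `"__main__"` (as `python greeter.py` would) and once under its module name (as an import would).

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical module: a function plus a __main__ guard
SOURCE = '''
def greet():
    return "hello"

RAN_AS_MAIN = False
if __name__ == "__main__":
    RAN_AS_MAIN = True
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "greeter.py")
    path.write_text(SOURCE)
    as_script = runpy.run_path(str(path), run_name="__main__")  # like: python greeter.py
    as_import = runpy.run_path(str(path), run_name="greeter")   # like: import greeter

print(as_script["RAN_AS_MAIN"])  # True
print(as_import["RAN_AS_MAIN"])  # False
print(as_import["greet"]())      # hello
```

`run_path` returns the module's global variables as a dictionary, which is why the results are read with `[...]`.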

## Module Docstrings
To help other developers use your modules properly, place a docstring at the top of the file. The docstring should state the purpose of the module and then list the classes, functions, exceptions, and any other items the module exports, with a one-line summary of each. [Docstring conventions](https://peps.python.org/pep-0257/#multi-line-docstrings)
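The sketch below shows a module docstring laid out along those lines: a one-line summary, a blank line, then the exported items. The module source is hypothetical and held in a string; `ast.get_docstring` extracts the docstring without running the code, while `help()` and the module's `__doc__` attribute expose the same text once the module is imported.

```python
import ast

# Hypothetical module source following the PEP 257 multi-line layout
MODULE_SOURCE = '''"""Simple descriptive statistics for lists of numbers.

Functions:
    mean(values)   -- arithmetic mean; raises ValueError if empty
    median(values) -- middle value; raises ValueError if empty
"""

def mean(values):
    """Return the arithmetic mean of values."""
    ...
'''

docstring = ast.get_docstring(ast.parse(MODULE_SOURCE))
print(docstring.splitlines()[0])  # Simple descriptive statistics for lists of numbers.
```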


## Best Practices
Although developers can configure modules (by defining `__all__`) to [only export specific](https://docs.python.org/3/tutorial/modules.html#importing-from-a-package) items when another programmer uses <code>from <i>module</i> import *</code>, using that form of import is still considered bad practice. The statement imports the module’s public names into your local namespace, making it difficult to tell where each name came from and possibly overwriting names you already defined. While typing <code><i>module.</i></code> is a bit more tedious, it makes clear where each object originated.
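The sketch below shows `__all__` limiting what `from module import *` brings in. To avoid writing files, it builds a hypothetical in-memory module with `types.ModuleType`, registers it in `sys.modules`, and runs the star import in an empty namespace.

```python
import sys
import types

# Hypothetical in-memory module that exports only public_func
demo = types.ModuleType("demo_star_module")
exec(
    "__all__ = ['public_func']\n"
    "def public_func():\n    return 'public'\n"
    "def helper():\n    return 'helper'\n",
    demo.__dict__,
)
sys.modules["demo_star_module"] = demo

namespace = {}
exec("from demo_star_module import *", namespace)
print("public_func" in namespace)  # True: listed in __all__
print("helper" in namespace)       # False: not listed, so not imported

del sys.modules["demo_star_module"]  # clean up the registration
```

Without `__all__`, the star import would bring in every name that does not start with an underscore, including `helper`.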

As you create modules, you should only group things that logically belong together.  Simply because you wrote two functions does not necessarily mean they should be within the same module.  Quite often, "utility" packages violate this principle.

While you can distribute modules and packages by simply providing the source code to others, you should 'package' them instead: [Overview of Packaging for Python](https://packaging.python.org/en/latest/overview/) [Tutorial](https://packaging.python.org/en/latest/tutorials/packaging-projects/)

The Python interpreter loads a module only once into your program, even if the code imports the module in multiple locations. As a result, every importer shares the same module object, so any change made to that module at runtime is visible to all other code that uses it. As with anything else in Python (or any programming language), such functionality can be a blessing or a curse. Like most languages, Python trusts developers not to behave maliciously, unlike the following code:

In [None]:
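import statistics

def bad_programmer(l):
    import random
    return random.random()

original_stdev = statistics.stdev  # keep a reference so the damage can be undone
statistics.stdev = bad_programmer  # every user of statistics.stdev now gets this

my_list = [10,5,12,3,6,4,11,4,8,5,6,7,8,6,5,6,7]
print("Std dev:",statistics.stdev(my_list))
print("Std dev:",statistics.stdev(my_list))
# good luck tracking down that one!

statistics.stdev = original_stdev  # restore the real function for later cells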
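A replacement like this is visible everywhere because every `import statistics` hands back the same module object. The sketch below confirms that, then shows a more controlled way to swap an attribute temporarily with `unittest.mock.patch.object`, which restores the original when the `with` block exits. This technique is common in tests; the lambda stand-in is hypothetical.

```python
import statistics
import statistics as stats_alias  # a second name for the same module object
from unittest import mock

data = [10, 5, 12, 3, 6, 4, 11, 4, 8, 5, 6, 7, 8, 6, 5, 6, 7]
print(statistics is stats_alias)  # True

# Temporarily replace stdev; the previous attribute is restored when the block exits
with mock.patch.object(statistics, "stdev", lambda values: -1.0):
    patched_result = stats_alias.stdev(data)  # the change is visible through every name

restored_result = stats_alias.stdev(data)
print(patched_result)       # -1.0
print(restored_result > 0)  # True
```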

## Exercises
TODO
