<a href="https://colab.research.google.com/github/edoardochiarotti/class_datascience/blob/main/2024/00_Python-Basics/00_Python-Basics_6_Library-Packages-Modules.ipynb" target="_blank" rel="noopener"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python-Basics: Library, Packages and Modules

<img src='https://www.agent-x.com.au/wp-content/uploads/2011/06/Perfect-Programmer-dfe194b-e8d3b11-b960bd5.jpg' width="400">

Source: [Agent-X Comics - Perfect Programming](https://www.agent-x.com.au/comic/perfect-programming/)

## Content

Until now, we have operated on our data using built-in functions, operators, and objects' methods. We already performed quite neat operations thanks to the Python Standard Library, which has lots of built-in modules that contain useful functions and data types for doing specific tasks. But we can also use modules from outside the standard library, and we can even write our own modules!

- [Library, Packages and Modules](#Library,-Packages-and-Modules)
   - [Import packages](#Import-packages)
   - [Style Guide for Python Code](#Style-Guide-for-Python-Code)

## Library, Packages and Modules <a name="Library,-Packages-and-Modules"></a>

A **module** is contained in a file that ends with `.py`. This file can have **classes**, functions, and other objects. We will not discuss classes for now; just remember that a class is like an object constructor, or a "blueprint" for creating objects (see the [Documentation](https://docs.python.org/3/tutorial/classes.html) to learn more, or this introduction from [GeeksforGeeks](https://www.geeksforgeeks.org/python-classes-and-objects/)).

A **package** contains several related modules that are all grouped together under one name. For instance, [Pandas](http://pandas.pydata.org) (derived from "panel data") is the go-to package for data analysis and manipulation. Another fundamental package for scientific computing is [NumPy](http://www.numpy.org) (Numerical Python).

A **library** is an umbrella term referring to a reusable chunk of code, which usually contains a collection of related modules and packages. For instance, [Matplotlib](https://matplotlib.org/) is a comprehensive library for creating static, animated, and interactive visualizations. In practice, library and package are often used interchangeably.
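One way to see the module/package distinction in a running interpreter is to check for a `__path__` attribute, which packages have and plain modules do not. The sketch below uses two standard-library examples, `json` (a package) and `math` (a single module); the helper name `is_package` is made up for this illustration.

```python
import json  # standard-library package (a directory of modules)
import math  # standard-library module


def is_package(module):
    """Return True if the imported object is a package (it has a __path__)."""
    return hasattr(module, "__path__")


print(is_package(json))  # json is a package
print(is_package(math))  # math is a single module
```
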

Standard Python installations come with the standard library. Outside of the standard library, there are several packages available such as pandas and NumPy. There are currently more than 300,000 packages available through the [Python Package Index](https://pypi.python.org/pypi), PyPI! Usually, a quick web search for what you are trying to do will turn up a third-party package that helps you do it. The most useful (for scientific computing) and thoroughly tested packages and modules are available using `conda`. Others can be installed using `pip`.
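Before installing anything, it can help to check whether a package is already available in the current environment. Here is a minimal sketch using the standard-library function `importlib.util.find_spec`; the helper name `is_installed` and the deliberately nonexistent package name are assumptions made for the example.

```python
import importlib.util


def is_installed(name):
    """Return True if a top-level module or package `name` can be found."""
    return importlib.util.find_spec(name) is not None


print(is_installed("math"))                       # part of the standard library
print(is_installed("surely_not_a_real_package"))  # made-up name, not installed
```

If a package turns out to be missing, it can then be installed from a terminal with `conda install` or `pip install` followed by the package name.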

We will discover several other packages along our journey, but for now let's discover how to access and use packages and modules.  

### Import packages <a name="Import-packages"></a>

To access a package, we have to `import` it. For instance, let's import the `numpy` package:

In [1]:
import numpy

That's it! Now we can start using the numerous functionalities offered by `numpy` such as means, medians, standard deviations, and lots and lots and lots of other numerical operations. Let's explore what is available in `numpy`. Remember, in Python everything is an object, so if we want to access the functions and attributes available in `numpy`, we use dot syntax. In Colab, we can type `numpy.` and then hit tab to discover what the package offers. Note that this technique works for all objects, so do not hesitate to use it when, for instance, you do not remember the name of a method:

In [None]:
numpy.
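Tab completion only works in interactive environments. A programmatic counterpart is the built-in `dir()` function, which lists the names an object exposes. The sketch below uses the standard-library `math` module so that it runs without any third-party install; the same call works on `numpy`. The variable name `public_names` is chosen for this example.

```python
import math

# Keep only the public names (those not starting with an underscore).
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))  # how many public names the module exposes
print(public_names[:10])  # a first look at what is available
```
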

That's a lot of options! Let's try the `numpy.mean()` function:

In [3]:
my_lis = [1,2,3,4,5,6]

numpy.mean(my_lis)

3.5

Nice! Let's try the `numpy.median()` function:

In [4]:
numpy.median(my_lis)

3.5

This is cool. It gives the median even when the sequence has an even number of elements, in which case it automatically interpolates by averaging the two middle values (here, 3 and 4). It is really important to know that it does this interpolation, because the result may be a number that does not appear in your data, which can be surprising if you are not expecting it. So, here is an important piece of advice:

<div style="color: dodgerblue; text-align: center; font-weight: bold;">

Always check the doc strings of functions.    

</div>
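Docstrings are stored on the objects themselves, so they can be read in any Python environment. Here is a minimal sketch using `inspect.getdoc` on the standard-library function `statistics.median` (chosen only so the example runs without extra installs); calling `help(statistics.median)` prints the same text interactively.

```python
import inspect
import statistics

# getdoc returns the cleaned-up docstring as a plain string.
doc = inspect.getdoc(statistics.median)

# The first line of a docstring is conventionally a one-sentence summary.
print(doc.splitlines()[0])
```
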

We can access the doc string of the `numpy.median()` function by typing `numpy.median?`:

In [5]:
numpy.median?

Signature: numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
Docstring:
Compute the median along the specified axis.

Returns the median of the array elements.

Parameters
----------
a : array_like
    Input array or object that can be converted to an array.
axis : {int, sequence of int, None}, optional
    Axis or axes along which the medians are computed. The default
    is to compute the median along a flattened version of the array.
    A sequence of axes is supported since version 1.9.0.
out : ndarray, optional
    Alternative output array in which to place the result. It must
    have the same shape and buffer length as the expected output,
    but the type (of the output) will be cast if n

Further down in the output, the Notes section reads:

    Notes
    -----
    Given a vector ``V`` of length ``N``, the median of ``V`` is the
    middle value of a sorted copy of ``V``, ``V_sorted`` - i.e.,
    ``V_sorted[(N-1)/2]``, when ``N`` is odd, and the average of the
    two middle values of ``V_sorted`` when ``N`` is even.

This is where the documentation tells you that the median will be reported as the average of the two middle values when the number of elements is even. Note that you could also read the [median documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html), which is a bit easier to read. And you can check the [NumPy Documentation](https://numpy.org/doc/stable/) to discover the full extent of what `numpy` can do!
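The rule quoted from the Notes section can be sketched in a few lines of plain Python. This is only an illustration of the rule for a one-dimensional list, not NumPy's actual implementation; the function name `median_sketch` and the choice to raise an error on empty input are assumptions made here.

```python
def median_sketch(values):
    """Median of a 1-D sequence, following the rule from the NumPy notes."""
    v_sorted = sorted(values)  # work on a sorted copy
    n = len(v_sorted)
    if n == 0:
        # Assumption for this sketch: refuse empty input.
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: (N-1)/2 equals n // 2, the single middle element.
        return v_sorted[mid]
    # Even length: average of the two middle values.
    return (v_sorted[mid - 1] + v_sorted[mid]) / 2


print(median_sketch([1, 2, 3, 4, 5, 6]))  # 3.5, average of 3 and 4
print(median_sketch([5, 1, 3]))           # 3, the middle of [1, 3, 5]
```
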

As you can see, `numpy` and other modules are super useful and we will use them all the time. There is a drawback: we always have to use the dot syntax with the full name of the module to access what it contains, and typing `numpy` over and over again can get annoying... Wait a minute, you do not actually have to do that! We can use the `as` keyword to import a module under an **alias**. NumPy's alias is traditionally `np`, so you should always use this alias:

In [6]:
import numpy as np

np.mean([1.1, 8.4, 5.3, 6.7, 9.2])

6.14
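An alias does not create a copy: it is simply a second name bound to the same module object. Here is a minimal sketch with the standard-library `math` module; the alias `m` is arbitrary and used only for this illustration, not a community convention like `np`.

```python
import math
import math as m  # bind the same module to a shorter name

print(m is math)   # True: both names refer to one module object
print(m.sqrt(16))  # 4.0, same function as math.sqrt
```
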

Finally, you do not have to import the full package/module if you want to use only a specific element. For example, suppose we need the value of pi, which can be accessed via the `math` module ([Documentation](https://docs.python.org/3/library/math.html)). We could do as before, importing the full module:

In [7]:
import math

math.pi

3.141592653589793

Alternatively, we can import only `pi`:

In [8]:
from math import pi

pi

3.141592653589793

Amazing: with a `from ... import ...` statement, we do not need the dot syntax! In this example, we did not import the full module; we imported only the name `pi`, which is now available directly as a variable.
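The same statement can import several names at once, and it can be combined with `as` to rename what is imported. Here is a short sketch using `math`; the function `circle_area` and the alias `PI` are made up for the example.

```python
from math import pi, sqrt  # import two names in one statement
from math import pi as PI  # import a name under an alias


def circle_area(radius):
    """Area of a circle, using the directly imported name pi."""
    return pi * radius ** 2


print(circle_area(2))
print(sqrt(16))   # 4.0
print(PI == pi)   # True: both names hold the same value
```

One caveat: names imported this way live alongside your own variables, so assigning to `pi` later in the notebook would silently replace the imported value.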

Packages and modules are super convenient. If you want to do something that seems really common, a good programmer (or a team of them) has probably already written something to do that. So always check online what is available before jumping into writing complex pieces of code!

### Style Guide for Python Code <a name="Style-Guide-for-Python-Code"></a>

There are some good practices when writing Python code. A great Python style guide is [PEP 8](https://www.python.org/dev/peps/pep-0008/). Here is the recommendation about importing libraries, packages, and modules:  

>Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
>
>Imports should be grouped in the following order:
>
>1. standard library imports
>2. related third party imports
>3. local application/library specific imports
>
>You should put a blank line between each group of imports.

Try to follow this guide!
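As an illustration, the top of a file following this recommendation might look like the sketch below. It assumes `numpy` is installed (as it is in Colab); the local import `my_project` is hypothetical and therefore left commented out.

```python
"""Example module showing PEP 8 import grouping."""

# 1. Standard library imports
import math
import statistics

# 2. Related third-party imports
import numpy as np

# 3. Local application imports (hypothetical, uncomment in a real project)
# from my_project import helpers

values = [1, 2, 3, 4]
print(math.sqrt(np.mean(values)))  # square root of the mean
print(statistics.median(values))   # 2.5
```
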