# Lab 0: Introduction to Python

In the first videos, we explained:
- The main files included with a standard Python installation
- How to access the Python interpreter from the console
- How to call functions, create variables and use the Python console as a calculator

Then, we ran the Python interpreter on a `.py` script to cover:
- Organized *Header Blocks*
- Conditionals
- The main data structures
- Loops
- Functions
- Installing Python packages with `pip` and importing them

See the file `dami_dsv/introduction/intro.py` for a summary of the code.

## Jupyter Notebooks

In this section, we continue explaining conditionals, loops and functions using Jupyter notebooks.

- IPython is the kernel that runs Python code in Jupyter notebooks; it makes working interactively convenient and efficient.
- Notebooks are a popular choice for scientific computing and data mining.
- They allow you to combine code with text written in `Markdown` and $\LaTeX$.

For example, some repositories with Jupyter notebooks are:

- https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks
- https://nbviewer.jupyter.org/



# Activity: Design of Custom Modules from Jupyter

The `grading` of all the homeworks will require you to **design your own functions and place them in specific files**. As practice, we explain here how to create a function that calculates the mean of a two-dimensional `numpy` array. This functionality already exists in the numpy package, but it serves as a simple example of how to design a function in a Jupyter notebook and then move it to an external module.

The lab comprises these steps:
- Construct a 2D array in `numpy`
- Create a function to calculate the `mean` of the array per axis
- Test the function in the Jupyter notebook
- Copy the function to an external module
- Execute the function from the external module

In [None]:
import numpy as np

# Import custom package for the course
import dami_dsv.introduction.my_functions

## Create 2D array

Create a 2D numpy array with shape `(10, 5)`. The array is filled with pseudo-random integers between 0 and 99.
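As a minimal sketch of what "pseudo-random" means in practice, the example below resets the seed to the same value and draws twice. The variable names (`first_draw`, `second_draw`, `third_draw`) are illustrative only and are not part of the lab code.

```python
import numpy as np

# Seeding the generator with the same value restarts the same sequence
np.random.seed(12345)
first_draw = np.random.randint(0, 100, size=(10, 5))

np.random.seed(12345)
second_draw = np.random.randint(0, 100, size=(10, 5))

# Without reseeding, the generator simply continues its sequence
third_draw = np.random.randint(0, 100, size=(10, 5))

print(np.array_equal(first_draw, second_draw))  # True
print(first_draw.shape)                         # (10, 5)
```

Note that the upper bound of `np.random.randint` is exclusive, so the largest possible value here is 99.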

In [None]:
# Guarantees that the pseudo-random process is always going to show the same results. 
# Provides repeatability of experiments
np.random.seed(12345)

In [None]:
array_in_2D = np.random.randint(0, 100, size=(10,5))
array_in_2D

In [None]:
# Example of how to access a specific value
# Access the row at index 2 (i.e. [91,80,73,11,77]), and then the element at index 3 of that row (i.e. 11)
array_in_2D[2,3] 

In [None]:
array_in_2D.shape # Attribute that holds the dimensions of the numpy array

In [None]:
array_in_2D.max() # Method to calculate the maximum value in the array

In [None]:
array_in_2D.mean() # Calculate the mean over all the elements

## Create a function to calculate the `mean` of the array per axis

The function needs a parameter `axis` that can take the values `None`, `0` or `1`. If `None`, it averages over all elements and returns a single value. If `0`, it averages vertically downwards across rows, producing one mean per column (in our example, an array of length 5). If `1`, it averages horizontally across columns, producing one mean per row (in our example, an array of length 10).
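To make the three cases concrete, here is a small sketch using numpy's built-in `.mean()` on a hand-written array. The array `small` is an illustrative assumption, chosen so the results are easy to check by hand.

```python
import numpy as np

# Small illustrative array: 2 rows, 3 columns
small = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(small.mean())        # 3.5            (all six elements)
print(small.mean(axis=0))  # [2.5 3.5 4.5]  (one mean per column)
print(small.mean(axis=1))  # [2. 5.]        (one mean per row)
```

The length of the result equals the size of the dimension that was *not* averaged away: 3 columns for `axis=0`, 2 rows for `axis=1`.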

**NOTE: The concept of `axis` in numpy arrays is one of the most important concepts for data analysis, but at the same time one of the hardest to comprehend. Please refer to the [official explanation](https://docs.scipy.org/doc/numpy-1.10.0/glossary.html) to familiarize yourself with this parameter.** 

In [None]:
def calculate_mean_2D_array(input_array, axis=None):
    """
    This function uses numpy methods to calculate the 
    mean of a 2D numpy array according to the specified axis.

    Input:
        input_array: 2D numpy array
        axis: Defines how to perform the calculation of the mean
            axis=None (default) - Average all the values in the array
            axis=0 - running vertically downwards across rows (one mean per column)
            axis=1 - running horizontally across columns (one mean per row)
    Output:
        A single value (if axis=None) or an array containing
        the mean of the elements along the specified axis
    """

    # Local Variables
    result = None   # This variable will contain the final result

    # Help variables for the function
    nrows, ncols = input_array.shape # Extracts the size of the array
    N = nrows * ncols

    # Average over all the elements
    if axis is None:
        # Help variable to store cumulative sum
        cumsum = 0
        # Access each row from the array
        for i in range(nrows):
            # Access each value from the row
            for j in range(ncols):
                # Add to the cumulative sum
                cumsum += input_array[i,j]
        # Calculate total average
        result = cumsum / N

    # Average vertically downwards across rows
    elif axis == 0:
        # The result is a list, in which we will append respective values 
        result = []
        # Access each column from the array
        for j in range(ncols):
            # Sum ALL the elements from the column
            #   and divide by number of elements
            average = input_array[:,j].sum() / nrows
            # Append the value to the result
            result.append(average)
        # Convert from list to numpy array
        result = np.array(result)

    # Average horizontally across columns
    elif axis == 1:
        # The result is a list, in which we will append respective values 
        result = []
        # Access each row from the array
        for i in range(nrows):
            # Sum ALL the elements from the row
            #   and divide by number of elements
            average = input_array[i,:].sum() / ncols
            # Append the value to the result
            result.append(average)
        # Convert from list to numpy array
        result = np.array(result)
    
    # Return the variable to the object that called this function
    return result

## Test the function from the Jupyter notebook

In [None]:
# Result of the custom function with no axis specified
calculate_mean_2D_array(array_in_2D)

In [None]:
calculate_mean_2D_array(array_in_2D, axis=0)

In [None]:
calculate_mean_2D_array(array_in_2D, axis=1)

### Compare the result with built-in function

The function that we have created already exists in numpy (see [here](https://numpy.org/doc/stable/reference/generated/numpy.mean.html)).

We can therefore use the built-in implementation to validate our results: we only need to call the method `.mean()` with the parameter `axis`.

In [None]:
# Result of mean among all elements
array_in_2D.mean()

In [None]:
# Result of mean in axis 0
array_in_2D.mean(axis=0)

In [None]:
# Result of mean in axis 1
array_in_2D.mean(axis=1)

As we can see, our custom function `calculate_mean_2D_array(array_in_2D, axis=0)` returns the same values as the built-in method `array_in_2D.mean(axis=0)`.
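Instead of comparing the printed values by eye, the check can be automated with `np.allclose`. The sketch below is self-contained, so it defines `mean_with_loops`, a hypothetical compact stand-in for the notebook's function (it is not the course code), and compares it with the built-in method for every value of `axis`.

```python
import numpy as np

def mean_with_loops(values, axis=None):
    """Hypothetical stand-in for the notebook's loop-based mean function."""
    nrows, ncols = values.shape
    if axis is None:
        total = sum(values[i, j] for i in range(nrows) for j in range(ncols))
        return total / (nrows * ncols)
    if axis == 0:
        return np.array([values[:, j].sum() / nrows for j in range(ncols)])
    if axis == 1:
        return np.array([values[i, :].sum() / ncols for i in range(nrows)])
    raise ValueError("axis must be None, 0 or 1")

np.random.seed(12345)
data = np.random.randint(0, 100, size=(10, 5))

# Compare the loop-based result with numpy's built-in mean for each axis
for axis in (None, 0, 1):
    matches = np.allclose(mean_with_loops(data, axis=axis), data.mean(axis=axis))
    print(axis, matches)
```

`np.allclose` compares within a small tolerance, which is safer than `==` for floating-point results.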

## Copy the function to an external module

Now, we copy the function created above to an external module, the file `dami_dsv/introduction/my_functions.py`. This keeps the main code shorter and easier to read.

## Execute the function from the external module

To work with the function, it is important to import the correct file. In this case, we import the external module using `import dami_dsv.introduction.my_functions`.

This allows us to access the function `calculate_mean_2D_array()` from the file `my_functions.py` rather than from the current notebook.
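The general mechanism can be sketched without the course repository. In the example below, the module name `demo_functions`, its function `double` and the temporary folder are all assumptions made for illustration; in the lab, the file is `my_functions.py` inside the `dami_dsv` package and you edit it by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code that would normally be pasted into the module file by hand
module_source = '''
def double(value):
    """Return twice the input (placeholder for a real course function)."""
    return 2 * value
'''

# Write the module to a temporary folder (hypothetical file: demo_functions.py)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_functions.py").write_text(module_source)

# Make the folder importable, then import the module by its file name
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
demo_functions = importlib.import_module("demo_functions")

# The function is now reached through the module, not the current script
print(demo_functions.double(21))  # 42
```

When the full dotted name becomes tedious to type, standard Python also allows an alias, for example `import dami_dsv.introduction.my_functions as my_functions`.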

In [None]:
dami_dsv.introduction.my_functions.calculate_mean_2D_array(array_in_2D)

In [None]:
dami_dsv.introduction.my_functions.calculate_mean_2D_array(array_in_2D, axis=0)

In [None]:
dami_dsv.introduction.my_functions.calculate_mean_2D_array(array_in_2D, axis=1)

# Final Remarks:
- If you call `calculate_mean_2D_array()`, the function that is executed is the one defined directly in the Jupyter notebook, in the cell `[8]`.
- If you call `dami_dsv.introduction.my_functions.calculate_mean_2D_array()`, the function that is executed is the one stored in the external module. Keep this in mind because **MOST OF THE HOMEWORKS WILL ASK YOU TO MOVE YOUR FUNCTIONS TO A SEPARATE FILE TO FACILITATE THE GRADING.**

## `NOTE:` Remember to **Restart the Kernel** from time to time to guarantee that the notebook works properly when you press **Run All**

# End of Lab 0