# <u>Tut_2.2</u>

## Learning outcomes

* Function arguments (positional, keyword, default) and `return`
* Variable scope
* Importing modules
* NumPy
* CRISP-DM
* ML Pipeline


---

## Function arguments and return
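* In a function call with **positional** arguments, the position of each argument **matters**
* In a function call with **keyword** arguments, the position **does not matter**
* **Default** argument values are provided in the function **definition**
* A function can take **no arguments**
* A function with **no return** statement returns `None`
* A function can return **multiple** values, which Python packs into a tuple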

In [12]:
def simple_math(a, b):
    print(a + b)
    print(a - b)

In [13]:
simple_math(5, 6)

11
-1
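
The bullet points above can be illustrated with a short sketch. The function names (`describe_pet`, `greet`, `sum_and_difference`) and their arguments are made up for this illustration and are not part of the original notebook.

```python
def describe_pet(name, animal="dog", age=1):
    """Return a short description; `animal` and `age` have default values."""
    return f"{name} is a {animal} aged {age}"


def greet():
    """Take no arguments and return nothing (Python returns None implicitly)."""
    print("Hello!")


def sum_and_difference(a, b):
    """Return two values at once; Python packs them into a tuple."""
    return a + b, a - b


# Positional arguments: the order matters
print(describe_pet("Rex", "cat", 3))

# Keyword arguments: the order does not matter
print(describe_pet(age=3, animal="cat", name="Rex"))

# Default arguments: `animal` and `age` fall back to "dog" and 1
print(describe_pet("Rex"))

# No argument, no return value
result = greet()
print(result)  # None

# Multiple return values, unpacked into two variables
total, diff = sum_and_difference(5, 6)
print(total, diff)  # 11 -1
```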


---

## Variable scope

* A variable created **within** the function is **local**. It is only accessible from within the function

In [16]:
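a = "other test"  # global variable

def local_global():
    a = "test string"  # local variable: exists only inside the function
    return a

print(a)  # prints the global `a`, which the function does not change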

other test


* A variable created **outside** the function is **global** and can be read anywhere in the module, including inside functions
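
As a minimal sketch of both scope rules, the example below reads a global variable inside one function and creates a local variable with the same name inside another. The names `counter`, `read_global` and `shadow_global` are hypothetical and chosen only for this illustration.

```python
counter = 0  # global variable, created outside any function


def read_global():
    # A global variable can be read inside a function without any keyword
    return counter + 1


def shadow_global():
    # Assigning inside the function creates a new local `counter`;
    # the global `counter` is left untouched
    counter = 100
    return counter


r = read_global()
s = shadow_global()

print(r)        # 1
print(s)        # 100
print(counter)  # 0, the global value has not changed
```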

---

## Importing modules

In [None]:
from random import randint
from math import sqrt
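
A short usage sketch for the two imports above, together with the alternative `import ... as ...` form. The variable names and the alias `m` are assumptions made for this illustration.

```python
from random import randint  # import one function from a module
from math import sqrt
import math as m            # import the whole module under a short alias

roll = randint(1, 6)  # random integer between 1 and 6, both ends included
root = sqrt(16)

print(roll)
print(root)       # 4.0
print(m.sqrt(9))  # 3.0, same function reached through the module alias
```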

---

## ML libraries - [NumPy](https://numpy.org/)
<img src="https://raw.githubusercontent.com/numpy/numpy/181f273a59744d58f90f45d953a3285484c72cba/branding/logo/primary/numpylogo.svg" width="25%" height="25%" />

* **NumPy** (*Numerical Python*) is a general-purpose array-processing package. It adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions.
* It is used **to process arrays that store values**, aka data. Handling data with a NumPy array is generally faster than with Python lists, especially for big data sets.
* NumPy arrays are also used to store and process **images**. **TensorFlow**, an ML library, accepts NumPy arrays as input for training models.
* A 2-D NumPy array can be thought of as a **list** of lists, and it can be created from one.
* Python lists can hold elements of different data types, whereas a NumPy array holds elements of a single data type.
* NumPy also has a big collection of mathematical and linear algebra functions, which form the basis of machine learning.

#### Importing a package
*NumPy needs to be installed first*:<br>
`%pip install numpy` # command in Jupyter cell<br>
or<br>
`$ pip install numpy` # terminal

In [None]:
import numpy as np

In [20]:
# Convert list into an array
my_list = [11, 12.3, 77]

print(np.array(my_list))

np.array(my_list)

[11.  12.3 77. ]


array([11. , 12.3, 77. ])

In [None]:
print(type(my_list))
print(type(np.array(my_list)))
print(my_list[0]) # list indexing !!

<class 'list'>
<class 'numpy.ndarray'>
11
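
In the conversion above, the integer `11` became `11.` because a NumPy array stores elements of a single data type. The sketch below makes this visible through the `dtype` attribute and also shows element-wise arithmetic, which a plain list does not support directly. The variable names are chosen only for this illustration.

```python
import numpy as np

mixed_list = [1, 2.5, 3]    # a list can mix int and float
arr = np.array(mixed_list)  # all elements are converted to one data type

print(arr.dtype)  # float64

# Element-wise arithmetic on the whole array, no loop needed
doubled = arr * 2
print(doubled)  # [2. 5. 6.]

# With a list, `* 2` repeats the list instead of doubling the values
print(mixed_list * 2)  # [1, 2.5, 3, 1, 2.5, 3]

# Doubling the values of a list needs an explicit loop or comprehension
doubled_list = [x * 2 for x in mixed_list]
print(doubled_list)  # [2, 5.0, 6]
```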


In [21]:
my_list = [[33,22], [22,55]]

In [22]:
np.array(my_list)

array([[33, 22],
       [22, 55]])

<u>N.b. In Jupyter, the value of the last line of a code cell is displayed without a `print()` call. It just looks different from printed output!</u>

* Please note the **2 brackets** before the first item, which indicate that it is a **2-D array**.
* A two-dimensional array is said to have two axes (rows and columns).
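
The number of axes and the size along each axis can be checked with the `ndim` and `shape` attributes. The sketch below uses a small 2 x 3 array made up for this illustration, so that the two axes give visibly different results.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # 2, the number of axes
print(matrix.shape)  # (2, 3), i.e. 2 rows and 3 columns

# Indexing uses one index per axis: row 0, column 1
print(matrix[0, 1])  # 2

# axis=0 runs down the rows, giving one sum per column
print(matrix.sum(axis=0))  # [5 7 9]

# axis=1 runs across the columns, giving one sum per row
print(matrix.sum(axis=1))  # [ 6 15]
```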

---

## CRISP-DM Methodology

* The **Cr**oss-**i**ndustry **s**tandard **p**rocess for **d**ata **m**ining, known as **CRISP-DM**, is an open standard process model that describes common approaches used by data mining experts.
* It is often cited as the most widely used analytics process model.

#### 1. Business understanding
#### 2. Data understanding
#### 3. Data preparation
#### 4. Modelling
#### 5. Evaluation
#### 6. Deployment

---

## Machine Learning (ML) Pipeline

### Introduction
Imagine you want to bake a cake. You don’t just throw ingredients together—you follow a step-by-step process:<br>
1. Gather ingredients (flour, eggs, sugar, etc.).
2. Mix them properly in the right order.
3. Bake the cake at the right temperature.
4. Check if it’s done and make adjustments if needed.
5. Decorate and serve!<br>

Machine Learning (ML) follows a similar step-by-step process to make predictions from data. This process is called an ML pipeline.

**Deploying a <u>pipeline</u> is like telling a story**
#### <u>N.b. In your coursework you will **have to** deploy a pipeline, not just your model.</u>



### What is an ML Pipeline?
* An ML pipeline is a structured way to go from raw data to a working machine learning model. It ensures that all the necessary steps are followed to train and evaluate the model.
* Think of it as an assembly line where each step processes the data before passing it to the next stage.

### Main steps
#### 1. Data collection
* Just like gathering ingredients for a recipe, the first step is to collect data.
* Data can come from:
	* CSV files
	* Databases
	* APIs
	* Sensors
* Example: Collecting customer data from a shopping website.
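
A minimal sketch of the data collection step, assuming the data arrives as CSV. The customer records below are invented for this illustration, and an in-memory string stands in for a real file so that the example runs on its own.

```python
import csv
import io

# Hypothetical customer data; in practice this would be a CSV file on disk
raw_csv = """customer_id,age,total_spent
1,34,120.50
2,27,80.00
3,45,310.25
"""

# DictReader uses the header row as keys for each record
reader = csv.DictReader(io.StringIO(raw_csv))
rows = list(reader)

print(len(rows))  # 3
print(rows[0])    # {'customer_id': '1', 'age': '34', 'total_spent': '120.50'}
```

Note that every value is read as a string, which is one reason why a preprocessing step is needed afterwards.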

#### 2. Data Preprocessing (Cleaning & Formatting)
* Raw data is often messy (has missing values, duplicates, errors).
* This step cleans the data so the model can learn properly.
* Example Steps in Data Preprocessing:
	* Remove missing values or replace them
	* Convert text into numbers (if necessary)
	* Normalize data (scaling numbers to a standard range)
* Analogy: Like washing and cutting vegetables before cooking!
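
The three example steps can be sketched with NumPy as follows. The data values are invented, and the chosen techniques (filling gaps with the mean, integer codes for categories, min-max scaling) are only one possible option for each step.

```python
import numpy as np

# 1. Replace missing values (NaN) with the mean of the known values
ages = np.array([34.0, np.nan, 45.0, 27.0])
mean_age = np.nanmean(ages)  # mean that ignores NaN entries
ages_filled = np.where(np.isnan(ages), mean_age, ages)

# 2. Convert text into numbers: give each category an integer code
colours = ["red", "green", "red", "blue"]
categories = sorted(set(colours))  # ['blue', 'green', 'red']
codes = [categories.index(c) for c in colours]
print(codes)  # [2, 1, 2, 0]

# 3. Normalize: min-max scaling maps the values into the range 0 to 1
low, high = ages_filled.min(), ages_filled.max()
scaled = (ages_filled - low) / (high - low)
print(scaled.min(), scaled.max())  # 0.0 1.0
```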

#### 3. Model Training
* Here, we choose a Machine Learning algorithm and train it on the data.
* The model learns patterns from the training data.
* Example Algorithms:
	* Decision Trees
	* Linear Regression
	* Neural Networks
* Analogy: Like a chef learning a new recipe—practicing until they get it right!
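
As a rough sketch of training, the example below fits a straight line (linear regression with one input) to a handful of invented points using `np.polyfit`. Real coursework would more likely use a dedicated ML library. This is only meant to show what "learning patterns from data" means in the simplest case.

```python
import numpy as np

# Invented training data that roughly follows y = 2x + 1
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# "Training": find the slope and intercept that best fit the data
slope, intercept = np.polyfit(x_train, y_train, deg=1)
print(slope, intercept)  # approximately 1.95 and 1.15

# The trained model can now predict y for an unseen x
prediction = slope * 6.0 + intercept
print(prediction)  # approximately 12.85
```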

#### 4. Model Evaluation
* We test the model to see how well it performs.
* This step ensures the model can make accurate predictions.
* Example Metrics:
	* Accuracy
	* Precision & Recall
	* Mean Squared Error (MSE)
* Analogy: Like tasting the cake to see if it's good or needs improvement!
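
The example metrics can be computed by hand in a few lines. The labels and predictions below are invented, and the sketch assumes a binary classification task (labels 0 and 1) for accuracy, precision and recall, and a regression task for MSE.

```python
# Classification: true labels vs. predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(y_true)  # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that were found

print(precision, recall)  # 0.75 0.75

# Regression: Mean Squared Error between true and predicted values
prices_true = [200.0, 150.0, 300.0]
prices_pred = [210.0, 140.0, 290.0]
mse = sum((t - p) ** 2 for t, p in zip(prices_true, prices_pred)) / len(prices_true)

print(mse)  # 100.0
```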

#### 5. Deployment & Predictions
* If the model works well, we deploy it to make real-world predictions.
* Users or applications can now use the trained model.
* Example:
	* A model that predicts house prices based on location, size, and number of rooms.
	* A model that recommends movies based on what a user has watched before.
* Analogy: Like serving the cake at a party!


[Sample of ML pipeline and CRISP-DM procedures - how to write them down in README.md](https://github.com/DrSYakovlev/corals_health.git)

---

## Homework

* Revise the `global` and `nonlocal` keywords: what they do and where to use them.
* **Exercise**: Using nested `for` loops, create a list of lists (with dimensions entered by the user) and convert it into a NumPy array.
* NumPy library:
	* Revise built-in functions (`np.zeros()`, `np.ones()`, `np.linspace()`, `np.arange()`, `np.random.seed(seed=1)`)
	* Revise array methods and attributes (`shape`, `min()`, `max()`, `reshape()`)
	