# Coaching DS3 (25 Oct 2025)

## Agenda

- Brief Revision on Git (Optional)
- Review Conda and Jupyter Notebook
- NumPy vs Pandas and Applications
- Matrix Computation in ML (Optional)
- Learning and Managing Stress
- Class Activity (Video Game Sales)

## Conda

## Jupyter Notebook

Ways to Open Jupyter Notebook:

- VSCode via extension
- Google Colab
- GitHub Codespaces

Others:
- Anaconda
- Jupyter Notebook/Lab Server


Using a notebook (demo with this notebook):
- Markdown
- Code in Markdown
- Cell types (Markdown, Python, SQL, Perl, etc.)

## NumPy vs Pandas

- Pandas is built on top of NumPy
- We use Pandas most of the time

In [None]:
import numpy as np
import pandas as pd

The import may take a while the first time it runs.

In [None]:
!pwd

In [None]:
pwd
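
Both cells above report the notebook's working directory, which is the folder a relative path such as `'Mall_Customers.csv'` is resolved against. A minimal pure-Python sketch of the same check, which also works outside Jupyter (the variable names `cwd` and `csv_path` are illustrative):

```python
from pathlib import Path

# Current working directory: relative file paths are resolved against it
cwd = Path.cwd()
print(cwd)

# Check whether the CSV used later in this notebook is visible from here
csv_path = cwd / "Mall_Customers.csv"
print(csv_path.exists())
```

If the second line prints `False`, `pd.read_csv('Mall_Customers.csv')` will most likely fail with a file-not-found error.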

In [11]:
df = pd.read_csv('Mall_Customers.csv')
df

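|     | CustomerID | Gender | Age | Annual Income (k$) | Spending Score (1-100) |
|-----|------------|--------|-----|--------------------|------------------------|
| 0   | 1          | Male   | 19  | 15                 | 39                     |
| 1   | 2          | Male   | 21  | 15                 | 81                     |
| 2   | 3          | Female | 20  | 16                 | 6                      |
| 3   | 4          | Female | 23  | 16                 | 77                     |
| 4   | 5          | Female | 31  | 17                 | 40                     |
| ... | ...        | ...    | ... | ...                | ...                    |
| 195 | 196        | Female | 35  | 120                | 79                     |
| 196 | 197        | Female | 45  | 126                | 28                     |
| 197 | 198        | Male   | 32  | 126                | 74                     |
| 198 | 199        | Male   | 32  | 137                | 18                     |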


In [10]:
df.shape

(200, 5)

In [12]:
data = df.to_numpy()
data

array([[1, 'Male', 19, 15, 39],
       [2, 'Male', 21, 15, 81],
       [3, 'Female', 20, 16, 6],
       [4, 'Female', 23, 16, 77],
       [5, 'Female', 31, 17, 40],
       [6, 'Female', 22, 17, 76],
       [7, 'Female', 35, 18, 6],
       [8, 'Female', 23, 18, 94],
       [9, 'Male', 64, 19, 3],
       [10, 'Female', 30, 19, 72],
       [11, 'Male', 67, 19, 14],
       [12, 'Female', 35, 19, 99],
       [13, 'Female', 58, 20, 15],
       [14, 'Female', 24, 20, 77],
       [15, 'Male', 37, 20, 13],
       [16, 'Male', 22, 20, 79],
       [17, 'Female', 35, 21, 35],
       [18, 'Male', 20, 21, 66],
       [19, 'Male', 52, 23, 29],
       [20, 'Female', 35, 23, 98],
       [21, 'Male', 35, 24, 35],
       [22, 'Male', 25, 24, 73],
       [23, 'Female', 46, 25, 5],
       [24, 'Male', 31, 25, 73],
       [25, 'Female', 54, 28, 14],
       [26, 'Male', 29, 28, 82],
       [27, 'Female', 45, 28, 32],
       [28, 'Male', 35, 28, 61],
       [29, 'Female', 40, 29, 31],
       [30, 'Female', 23

In [None]:
data.shape


### Columns 

- Column manipulation is more common than partial slicing

**List column 2**


In [None]:
data[:,1:2]

In [None]:
df['Gender']
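
A detail worth noting in the NumPy cell above: the slice `1:2` keeps the result two-dimensional, whereas a plain index `1` returns a one-dimensional array. A small sketch on a made-up array (`demo` is illustrative, not the mall data):

```python
import numpy as np

# 4 rows x 3 columns of made-up values
demo = np.arange(12).reshape(4, 3)

col_2d = demo[:, 1:2]  # slice: stays 2-D, one column wide
col_1d = demo[:, 1]    # plain index: drops to 1-D

print(col_2d.shape)  # (4, 1)
print(col_1d.shape)  # (4,)
```

Both hold the same values; only the shape differs, which matters when a function expects a column vector.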

**Change Male to 1 and Female to 0**

In [None]:
data[:,1:2]

In [None]:
# Try np.where without actual assignment. Check if conversion is ok.
np.where(data[:,1:2] == 'Male', 1, 0)

In [None]:
# Perform actual conversion
data[:,1:2] = np.where(data[:,1:2] == 'Male', 1, 0)

In [None]:
data
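
As a self-contained check of the same pattern, the sketch below applies `np.where` to a made-up three-row array (`demo` and `gender_codes` are illustrative names). Because the array mixes text and numbers, its dtype is `object`, and it stays `object` after the replacement; `astype` gives a numeric array once every value is a number.

```python
import numpy as np

# Mixed text and numbers, so NumPy stores generic Python objects
demo = np.array([[1, 'Male', 19],
                 [2, 'Female', 21],
                 [3, 'Female', 20]], dtype=object)

# Dry run first: 1 where the value is 'Male', otherwise 0
gender_codes = np.where(demo[:, 1:2] == 'Male', 1, 0)
print(gender_codes.shape)  # (3, 1)

# Actual conversion, written back into the same column
demo[:, 1:2] = gender_codes
print(demo.dtype)  # object

# All values are numbers now, so a numeric cast is possible
numeric = demo.astype(float)
print(numeric.dtype)  # float64
```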

Pandas

In [None]:
df['Gender']

In [None]:
# Try df.replace without actual assignment. Check if conversion is ok.
df['Gender'].replace({'Male': 1, 'Female': 0})

In [None]:
# Perform actual conversion
df['Gender'] = df['Gender'].replace({'Male': 1, 'Female': 0})

In [None]:
df
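
An alternative to `replace` is `Series.map`, sketched below on a made-up column (`gender`, `mapping`, and `encoded` are illustrative names). Some recent pandas releases may warn about downcasting when `replace` turns a text column into numbers; `map` is one way to avoid that, with the caveat that any value missing from the mapping becomes `NaN`.

```python
import pandas as pd

# Made-up column, not the mall data
gender = pd.Series(['Male', 'Female', 'Female'], name='Gender')

# Every expected value needs an entry; anything else maps to NaN
mapping = {'Male': 1, 'Female': 0}
encoded = gender.map(mapping)

print(encoded.tolist())  # [1, 0, 0]
```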

#### NumPy Functions and Axis

- Also applies to Pandas
- Focus on the axis argument, as we usually manipulate rows and columns

In [None]:
data

In [None]:
# Official definition is average across rows for each column
data.mean(axis=0)

In [None]:
# Official definition is average across columns for each row
data.mean(axis=1)

In [None]:
# Average of all values
data.mean()

In [None]:
# Pandas equivalent 
df.mean(axis=0)
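
The axis behaviour is easier to verify on an array small enough to compute by hand. A minimal sketch with made-up values (`scores` is illustrative):

```python
import numpy as np

# 2 rows x 3 columns
scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_means = scores.mean(axis=0)  # collapse the rows: one value per column
row_means = scores.mean(axis=1)  # collapse the columns: one value per row
overall = scores.mean()          # collapse everything: a single number

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
print(overall)    # 3.5
```

One way to remember it: the axis you pass is the one that disappears from the shape, so `(2, 3)` becomes `(3,)` with `axis=0` and `(2,)` with `axis=1`.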

#### Why Shape Matters

In [None]:
df = pd.read_csv('Mall_Customers.csv')
df

In [None]:
df.shape

In [None]:
from PIL import Image

In [None]:
img0 = Image.open('four.png')

In [None]:
img0.show()

In [None]:
img0_array = np.array(img0)
img0_array.shape

- In ML, we often troubleshoot by checking shapes, especially in deep learning
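
The shape of `four.png` depends on the file itself, so the sketch below uses two blank synthetic images instead (the sizes and variable names are assumptions for illustration). It shows how image mode changes the array shape: grayscale gives two dimensions, RGB adds a third for the colour channels.

```python
import numpy as np
from PIL import Image

# Image.new takes the size as (width, height)
gray = Image.new('L', (28, 20))    # 8-bit grayscale
rgb = Image.new('RGB', (28, 20))   # 3 colour channels

gray_array = np.array(gray)
rgb_array = np.array(rgb)

# NumPy reports (height, width) or (height, width, channels)
print(gray_array.shape)  # (20, 28)
print(rgb_array.shape)   # (20, 28, 3)
```

Note that Pillow sizes are width first while NumPy shapes are height first, which is a common source of confusion.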

#### Identify Shapes

In [None]:
np.ndim(img0_array)

In [None]:
np.ndim(data)

In [None]:
v1 = np.array([1,2,3,4,5])
v1.shape

In [None]:
np.ndim(v1)

In [None]:
v2 = np.array([[1,2,3,4,5]])
v2.shape

In [None]:
v3 = np.array([[1],[2],[3],[4],[5]])
v3.shape

In [None]:
v4 = np.zeros((5,1))
v4.shape

In [None]:
v4

In [None]:
v5 = np.array([1,2,3,4,5])
v5 = v5.reshape((5,1))
v5.shape    
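
The difference between `(5,)` and `(5, 1)` is not cosmetic. A minimal sketch of a common pitfall, using made-up label and prediction arrays (`y_true` and `y_pred` are illustrative names): subtracting the two shapes broadcasts to a `(5, 5)` matrix instead of raising an error.

```python
import numpy as np

y_true = np.array([1, 2, 3, 4, 5])                   # shape (5,)
y_pred = np.array([1, 2, 3, 4, 5]).reshape((5, 1))   # shape (5, 1)

# Broadcasting silently pairs every element with every other element
diff_bad = y_true - y_pred
print(diff_bad.shape)  # (5, 5)

# Flatten the column vector first to get element-wise subtraction
diff_ok = y_true - y_pred.ravel()
print(diff_ok.shape)   # (5,)
```

Printing `.shape` before and after an operation is usually the quickest way to catch this.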

#### Differentiate Between Pandas and Numpy

In [None]:
type(df)

In [None]:
type(data)

- In ML, some models take in Pandas and return Pandas
- Other models take in Pandas and return NumPy
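
When a step hands back a plain NumPy array, the column names and index are lost and have to be re-attached. A minimal sketch of that, where `scale_features` is a hypothetical stand-in for such a step (not a real library function) and the data is made up:

```python
import numpy as np
import pandas as pd

def scale_features(frame):
    """Hypothetical stand-in for a model step that returns a NumPy array."""
    values = frame.to_numpy(dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)

features = pd.DataFrame({'Age': [19, 21, 20], 'Income': [15, 15, 16]})

result = scale_features(features)
print(type(result))  # NumPy array: labels are gone

# Wrap the array again, reusing the original labels
result_df = pd.DataFrame(result, columns=features.columns, index=features.index)
print(type(result_df))
```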

#### Converting Between Pandas and NumPy

Convert Pandas to Numpy

In [None]:
numpy_df = df.to_numpy()

In [None]:
numpy_df

Convert Numpy to Pandas

In [None]:
new_df = pd.DataFrame(data, columns=['CustomerID', 'Gender', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)'])

In [None]:
new_df
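
One thing to watch during conversion is the dtype. A minimal sketch on a made-up frame (`demo_df` and the other names are illustrative): a frame that mixes text and numbers converts to an `object` array, while selecting only numeric columns gives a numeric array. Going back, `infer_objects` is one way to recover numeric column types.

```python
import numpy as np
import pandas as pd

demo_df = pd.DataFrame({'Gender': ['Male', 'Female'],
                        'Age': [19, 21],
                        'Score': [39, 81]})

# Mixed columns: NumPy falls back to generic Python objects
mixed = demo_df.to_numpy()
print(mixed.dtype)  # object

# Numeric columns only: a proper numeric array
numeric = demo_df[['Age', 'Score']].to_numpy()
print(numeric.dtype)

# Back to Pandas, then let pandas re-infer better column types
restored = pd.DataFrame(mixed, columns=demo_df.columns).infer_objects()
print(restored.dtypes)
```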

#### Summary

- Pandas is built on NumPy, and most functions work the same way
- Shapes matter
- NumPy and Pandas are used interchangeably in ML
- Come back to NumPy if you want to learn in depth how ML models perform their computations

## Matrix Computation

- https://github.com/mlnotes2718/Introduction-Machine-Learning/blob/main/ML%20Master%20Notes%203%20-%20Simple%20Linear%20Regression%20(One%20Feature%20No%20Intercept).ipynb

- https://github.com/mlnotes2718/Introduction-Machine-Learning/blob/main/ML%20Master%20Notes%208%20-%20Multiple%20Regression%20and%20Vectorization.ipynb
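
The linked notes are not reproduced here. As a rough sketch of the kind of vectorized computation their titles point to, the example below assumes ordinary least squares on made-up data; the notes themselves may use a different method, such as gradient descent. All variable names are illustrative.

```python
import numpy as np

# One feature, no intercept: the least-squares weight is (x . y) / (x . x)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x                      # made-up target with true weight 3
w = (x @ y) / (x @ x)
print(w)

# Multiple features: stack them as columns of X and solve in one call
X = np.column_stack([x, np.array([0.0, 1.0, 0.0, 1.0])])
y_multi = X @ np.array([3.0, -2.0])   # made-up target with weights 3 and -2
weights, residuals, rank, singular_values = np.linalg.lstsq(X, y_multi, rcond=None)
print(weights)
```

The shapes are the thing to follow here: `X` is `(4, 2)`, the weights are `(2,)`, and `X @ weights` returns `(4,)`, one prediction per row.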



## Learning and Managing Stress

- Focus on skill acquisition
- Accept that one problem can have multiple solutions
- Take in the important information first and come back later for depth
- Use Google, YouTube, and LLMs to assist