**Reminder**: if you are using Google Colab, you may be interested in getting the Pro plan of Google Gemini and Colab for free: https://colab.research.google.com/signup

# Hands-On Practice and Applications

In Week 1, we focused on two main concepts:
- Regression, specifically *linear regression*
- Perceptron, including Multi-Layer Perceptron (MLP)

As with all supervised machine learning algorithms, the key components are:
- Dataset with inputs and labelled outputs
- Model with adjustable parameters to fit to training data
- Loss function (aka cost function or objective function)
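
As a minimal sketch of how these three components fit together, the snippet below pairs a tiny made-up dataset with a one-parameter-pair linear model and a mean-squared-error loss. The names `toy_inputs`, `toy_labels`, `linear_model`, and `mean_squared_error` are illustrative assumptions, not part of the course material, and MSE is only one possible choice of loss.

```python
# Dataset: inputs paired with labelled outputs (made-up values for illustration only)
toy_inputs = [0.0, 1.0, 2.0, 3.0]
toy_labels = [1.0, 3.0, 5.0, 7.0]


def linear_model(x, weight, bias):
    """Model: adjustable parameters `weight` and `bias` map an input to a prediction."""
    return weight * x + bias


def mean_squared_error(inputs, labels, weight, bias):
    """Loss function: average squared difference between predictions and labels."""
    residuals = [linear_model(x, weight, bias) - y for x, y in zip(inputs, labels)]
    return sum(r * r for r in residuals) / len(residuals)


# A poorly chosen parameter pair gives a larger loss than a well-chosen one.
loss_poor_fit = mean_squared_error(toy_inputs, toy_labels, weight=1.0, bias=0.0)
loss_good_fit = mean_squared_error(toy_inputs, toy_labels, weight=2.0, bias=1.0)
print(loss_poor_fit, loss_good_fit)
```

Training then amounts to searching for the parameter values that make the loss as small as possible on the training data.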

# Linear Regression





The following dataset gives measurements of stellar masses and stellar radii for a certain class of stars.
The data are in the format $(\log_{10}(M/M_\odot), \log_{10}(R/R_\odot))$.

In [None]:
# columns are log10(stellar_mass/solar_mass), log10(stellar_radius/solar_radius)
stellar_data = [
    (-0.226564, -0.332539),
    (1.732429, 1.189290),
    (0.988779, 0.539741),
    (0.535439, 0.167849),
    (-0.969537, -0.611534),
    (-0.969619, -0.780734),
    (-1.302516, -0.987461),
    (1.444999, 0.811574),
    (0.543791, 0.260564),
    (0.907447, 0.583963),
    (-1.430013, -1.199721),
    (1.797693, 1.241714),
    (1.330305, 0.812655),
    (-0.778047, -0.651483),
    (-0.881795, -0.756052),
    (-0.876425, -0.506845),
    (-0.465576, -0.402090),
    (0.284172, 0.025135),
    (-0.031387, -0.010602),
    (-0.509821, -0.554198),
]

It's always a good idea to plot the data first, before you pursue any learning algorithms.
And it's always helpful to label the axes appropriately.
What do you notice about the data?

In [4]:
# Put your answer here

Use a linear regression model to fit these data and extract the parameters.

In [None]:
# Put your answer here
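
One possible approach, sketched here under the assumption that an ordinary least-squares straight line is what is wanted, is the closed-form solution for slope and intercept. The helper `fit_line` and the toy arrays are hypothetical names used for illustration; the toy values are made up and are not the stellar data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 0.5 * x, so the fit should recover that line.
toy_xs = [-1.0, 0.0, 1.0, 2.0]
toy_ys = [-0.5, 0.0, 0.5, 1.0]
slope, intercept = fit_line(toy_xs, toy_ys)
print(slope, intercept)
```

To apply the same helper to the exercise, the two columns of `stellar_data` can be unpacked with `[m for m, r in stellar_data]` and `[r for m, r in stellar_data]`. If NumPy is available, `numpy.polyfit(x, y, 1)` performs an equivalent degree-1 fit.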


Based on this learned regression, what stellar mass does the model predict for a star whose radius is 100 times that of the Sun?
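
If the fit has the form $\log_{10}(R/R_\odot) = a \log_{10}(M/M_\odot) + b$, answering this question means inverting the line. The sketch below shows that inversion; `predict_log_mass` is a hypothetical helper, and the slope and intercept are placeholders chosen for easy arithmetic, not fitted values.

```python
import math


def predict_log_mass(log_radius, slope, intercept):
    """Invert log R = slope * log M + intercept to solve for log M."""
    return (log_radius - intercept) / slope


# A radius of 100 solar radii corresponds to log10(R / R_sun) = 2.
target_log_radius = math.log10(100.0)

# Placeholder parameters (assumptions, NOT the result of fitting stellar_data).
placeholder_slope = 0.5
placeholder_intercept = 0.0

log_mass = predict_log_mass(target_log_radius, placeholder_slope, placeholder_intercept)
mass_in_solar_units = 10.0 ** log_mass
print(log_mass, mass_in_solar_units)
```

Note that the largest radius in the dataset has $\log_{10}(R/R_\odot) \approx 1.24$, so a prediction at $\log_{10}(R/R_\odot) = 2$ is an extrapolation beyond the training data and should be treated with caution.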

# Single-Layer Perceptron

## Introduction to the problem

The NAND ("NOT AND") gate is a universal logic gate: any Boolean function can be implemented using NAND gates alone.
If we hope to use machine learning to learn arbitrary computational functions, it would be nice to show that an ML model can learn the logic of the NAND gate.

| Input A | Input B | Output |
|---------|---------|--------|
| 0       | 0       | 1      |
| 0       | 1       | 1      |
| 1       | 0       | 1      |
| 1       | 1       | 0      |

Try plotting this logic table in the same way we plotted the logic table for XOR.
Put inputs A and B on the axes, and use a color or a marker style to distinguish output 0 (FALSE) from output 1 (TRUE).

In [2]:
# Put your code here

Do you think you could train a linear algorithm, like the perceptron with a linear activation function, to learn this logic?
Why or why not?

In [None]:
# Put your answer here
# You could also try to use a linear perceptron to see what would happen

## Training the non-linear perceptron

A perceptron with a non-linear activation function can learn the NAND gate.
- Implement a simple perceptron $w_1 x_1 + w_2 x_2 + b$. (Why is this the right dimensionality?)
- Then pick a suitable activation function and try to train the perceptron on the NAND logic table.

What weights and biases accomplish the task? Can you check the results by multiplying the matrices by hand?
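
As one possible starting point, the sketch below trains a two-input perceptron on the NAND table. It assumes a Heaviside step activation and the classic perceptron learning rule, which are only one choice among several (a sigmoid trained by gradient descent is another). The names `nand_table`, `step`, `predict`, and `train_perceptron` are illustrative, as are the zero initialization and unit learning rate.

```python
# NAND truth table as ((input_a, input_b), target_output) pairs
nand_table = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def step(z):
    """Heaviside step activation: fires only for strictly positive input."""
    return 1 if z > 0 else 0


def predict(weights, bias, inputs):
    """Compute step(w1 * x1 + w2 * x2 + b) for one input pair."""
    z = weights[0] * inputs[0] + weights[1] * inputs[1] + bias
    return step(z)


def train_perceptron(table, learning_rate=1, max_epochs=100):
    """Perceptron learning rule: nudge the parameters after each misclassified row."""
    weights, bias = [0, 0], 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in table:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights[0] += learning_rate * error * inputs[0]
                weights[1] += learning_rate * error * inputs[1]
                bias += learning_rate * error
        if mistakes == 0:  # every row classified correctly, so stop early
            break
    return weights, bias


weights, bias = train_perceptron(nand_table)
predictions = [predict(weights, bias, inputs) for inputs, _ in nand_table]
print(weights, bias, predictions)
```

Because the learned weights and bias are small integers here, the four rows of the truth table are easy to verify by hand.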

*Bonus challenge*: would you get the same results if you scaled up all of the weights by some arbitrary factor $a$?

# Multi-Layer Perceptron

In some cases, the dataset may be so complicated that a multi-layer perceptron is needed.

The dataset below is a set of $(x,y)$ points that have been labeled with a classification: 0 or 1.


In [None]:
# elements are x, y, class
data_points = [
    (1.391964, 0.591951, 0),
    (0.606690, 1.248435, 0),
    (-1.174547, 0.148539, 0),
    (-0.408330, -1.252625, 0),
    (-1.247132, -0.801829, 0),
    (-0.263580, 1.711056, 0),
    (-1.801957, 0.085407, 0),
    (1.105398, -0.359085, 0),
    (-1.447320, 0.975134, 0),
    (-1.129734, 0.120408, 0),
    (0.280262, -1.890873, 0),
    (-0.667936, -1.371906, 0),
    (-1.314393, 0.256176, 0),
    (1.241144, 1.374043, 0),
    (-1.134106, -1.048900, 0),
    (1.377637, 0.825345, 0),
    (1.564710, 0.367663, 0),
    (1.625803, -0.037226, 0),
    (0.136397, -0.434754, 0),
    (-1.113864, 0.051424, 0),
    (-0.537702, 1.533307, 0),
    (-1.632389, -0.437353, 0),
    (-0.495534, -1.667353, 0),
    (0.112744, 1.218114, 0),
    (-0.016509, -1.194730, 0),
    (0.657386, 0.016074, 0),
    (1.032813, 1.569718, 0),
    (-0.793301, 0.793835, 0),
    (0.801178, 0.214648, 0),
    (0.072558, -1.641025, 0),
    (1.145626, -0.679588, 0),
    (-1.452350, 0.959904, 0),
    (1.903333, 0.331205, 0),
    (0.177903, -1.548886, 0),
    (-0.965821, -0.904743, 0),
    (0.292403, 1.151371, 0),
    (0.911768, -1.654384, 0),
    (-1.047602, 1.478948, 0),
    (0.945677, 0.409852, 0),
    (1.768868, 0.015823, 0),
    (0.461423, -0.044894, 0),
    (-1.233307, -0.689414, 0),
    (1.182094, 1.214905, 0),
    (-0.843053, -1.238449, 0),
    (0.068054, 0.009098, 0),
    (-0.206412, 0.255868, 0),
    (-1.280752, 0.545000, 0),
    (0.243811, 0.824579, 0),
    (0.097802, -1.062261, 0),
    (0.295360, 1.287576, 0),
    (2.729192, 1.427382, 1),
    (-2.913639, -2.303564, 1),
    (-2.723984, -2.755627, 1),
    (2.132764, 1.221947, 1),
    (-0.154957, -2.412995, 1),
    (-1.960789, -0.396890, 1),
    (0.810562, -2.728176, 1),
    (0.018818, 2.138939, 1),
    (0.952162, -2.022393, 1),
    (-2.576588, 0.854516, 1),
    (-2.840932, 0.514653, 1),
    (2.641381, 0.452845, 1),
    (2.648789, -0.683384, 1),
    (2.767143, 2.432104, 1),
    (-1.825253, -2.583832, 1),
    (-2.395332, -2.890669, 1),
    (-2.433342, 1.098041, 1),
    (-2.572868, -1.086146, 1),
    (2.069252, -2.860368, 1),
    (1.886811, -1.308871, 1),
    (-2.291011, 1.180423, 1),
    (0.773657, 2.264832, 1),
    (1.410426, 1.820886, 1),
    (-1.307793, -1.935363, 1),
    (1.503689, 1.841008, 1),
    (2.943031, -0.524294, 1),
    (-0.955179, 2.584544, 1),
    (2.150477, -0.426036, 1),
    (1.505226, 1.527257, 1),
    (-2.381257, 2.415317, 1),
    (-1.079702, 2.373139, 1),
    (-0.664790, -2.934974, 1),
    (2.432292, -2.452280, 1),
    (-1.084118, 2.700372, 1),
    (2.703643, 0.440627, 1),
    (1.749474, 1.737709, 1),
    (-2.452763, -0.033478, 1),
    (-2.654647, 0.297173, 1),
    (-0.350817, 2.326225, 1),
    (-0.894510, -2.297598, 1),
    (-2.142050, 1.569064, 1),
    (0.709308, -2.393264, 1),
    (-2.495359, 1.205815, 1),
    (-2.563422, 1.931160, 1),
    (1.237453, -2.511907, 1),
    (-2.490974, 2.919837, 1),
    (1.876797, 2.683491, 1),
    (2.916006, 1.520269, 1),
    (-0.742442, -2.498996, 1),
    (-0.454668, 2.438126, 1),
]

Plot the dataset, using colors to represent the classes in the binary classification (0 or 1).

In [5]:
# Put your answer here

Implement a multi-layer perceptron with one or more hidden layers and a non-linear activation function to learn this dataset classification.
You may need to experiment with the number of nodes in the hidden layer to get the best results.
Pay attention to the learning rate: increase it if training converges too slowly, and decrease it if the loss oscillates or diverges.

In [6]:
# Put your answer here
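
For reference, here is a minimal pure-Python sketch of a one-hidden-layer perceptron trained by full-batch gradient descent with backpropagation. Everything in it is an assumption made for illustration: the synthetic two-circle `toy_points` (not `data_points`), the hidden width `N_HIDDEN`, the tanh and sigmoid activations, the binary cross-entropy loss, and the learning rate and epoch count. In practice a library such as NumPy, PyTorch, or scikit-learn would usually be more convenient.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic toy data (NOT data_points): class 0 on an inner circle, class 1 on an outer one.
toy_points = []
for k in range(12):
    angle = 2.0 * math.pi * k / 12
    toy_points.append((math.cos(angle), math.sin(angle), 0))
    toy_points.append((2.5 * math.cos(angle), 2.5 * math.sin(angle), 1))

N_HIDDEN = 8  # hypothetical hidden-layer width; worth experimenting with

params = {
    "W1": [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(N_HIDDEN)],
    "b1": [0.0] * N_HIDDEN,
    "W2": [random.uniform(-1, 1) for _ in range(N_HIDDEN)],
    "b2": 0.0,
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x, y):
    """Return the tanh hidden activations and the predicted probability of class 1."""
    hidden = [math.tanh(w[0] * x + w[1] * y + b) for w, b in zip(p["W1"], p["b1"])]
    z_out = sum(w * h for w, h in zip(p["W2"], hidden)) + p["b2"]
    return hidden, sigmoid(z_out)


def mean_loss(p, points):
    """Binary cross-entropy averaged over all points."""
    total = 0.0
    for x, y, label in points:
        _, prob = forward(p, x, y)
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
    return total / len(points)


def train_epoch(p, points, learning_rate):
    """One full-batch gradient-descent step using backpropagation."""
    grad_W1 = [[0.0, 0.0] for _ in range(N_HIDDEN)]
    grad_b1 = [0.0] * N_HIDDEN
    grad_W2 = [0.0] * N_HIDDEN
    grad_b2 = 0.0
    for x, y, label in points:
        hidden, prob = forward(p, x, y)
        delta_out = prob - label  # loss gradient w.r.t. the output pre-activation
        grad_b2 += delta_out
        for j in range(N_HIDDEN):
            grad_W2[j] += delta_out * hidden[j]
            # Chain rule through the output weight and the tanh derivative.
            delta_hidden = delta_out * p["W2"][j] * (1.0 - hidden[j] ** 2)
            grad_W1[j][0] += delta_hidden * x
            grad_W1[j][1] += delta_hidden * y
            grad_b1[j] += delta_hidden
    scale = learning_rate / len(points)
    for j in range(N_HIDDEN):
        p["W1"][j][0] -= scale * grad_W1[j][0]
        p["W1"][j][1] -= scale * grad_W1[j][1]
        p["b1"][j] -= scale * grad_b1[j]
        p["W2"][j] -= scale * grad_W2[j]
    p["b2"] -= scale * grad_b2


initial_loss = mean_loss(params, toy_points)
for _ in range(1000):
    train_epoch(params, toy_points, learning_rate=0.05)
final_loss = mean_loss(params, toy_points)
print(initial_loss, final_loss)
```

Comparing `initial_loss` with `final_loss` is a quick check that training is making progress; a new point can then be classified by thresholding the probability returned by `forward` at 0.5.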

Based on this learned classification, how would you classify the following points?
- (1.0, -1.0)
- (2.5, -2.5)
- (0.0, 0.0)