## **DeepLearning.AI - Machine Learning Specialisation**

### **Linear regression:**

```
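import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# X_train: array of shape (m, n); y_train: array of shape (m,)
# Z-score normalise each feature (zero mean, unit variance)
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)

# Fit linear regression by stochastic gradient descent
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)

# Learned parameters (in the normalised feature space)
b_norm = sgdr.intercept_
w_norm = sgdr.coef_

# Two equivalent ways to predict
y_pred_sgd = sgdr.predict(X_norm)          # using the fitted model
y_pred = np.dot(X_norm, w_norm) + b_norm   # using w and b directly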
```
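
As a rough illustration of what the two library calls above do, the sketch below z-score normalises a small synthetic dataset with plain NumPy and fits `w` and `b` by batch gradient descent. The data, the learning rate `alpha`, and the helper names `zscore` and `fit_linear_gd` are assumptions made for the example; `SGDRegressor` itself uses stochastic updates and regularisation defaults that are not reproduced here.

```python
import numpy as np

def zscore(X):
    """Column-wise z-score, the same formula StandardScaler applies (population std)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def fit_linear_gd(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent on mean squared error (hypothetical helper)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iters):
        err = X @ w + b - y            # prediction error, shape (m,)
        w -= alpha * (X.T @ err) / m   # gradient step for the weights
        b -= alpha * err.mean()        # gradient step for the bias
    return w, b

# Synthetic, noise-free data: y = 3*x0 - 2*x1 + 5
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(50, 2))
y_train = 3 * X_train[:, 0] - 2 * X_train[:, 1] + 5

X_norm, mu, sigma = zscore(X_train)
w_norm, b_norm = fit_linear_gd(X_norm, y_train)
y_pred = X_norm @ w_norm + b_norm      # same formula as np.dot(X_norm, w_norm) + b_norm
```

Because the weights are learned on normalised features, `w_norm / sigma` gives the weights in the original feature units.
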

### **Neural networks:**

- Neuron (unit) = a single logistic regression model (or other model); the number it outputs is called its activation
- Layer: a group of neurons that take the same or similar features as input and together output a few numbers
- Each layer takes in a vector, applies logistic regression (or another activation model) once per neuron, and outputs a vector of activations
- Main Python packages: TensorFlow or PyTorch
- Fitting the model is faster if you normalise the data first
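- At the output layer, pick the activation that matches the target: sigmoid for binary classification, linear for regression where y can be negative, ReLU for regression where y is never negative
- On the hidden layers, default to ReLU (rectified linear unit), g(z) = max(0, z) (the graph basically looks like a call option payoff)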
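
To make the "vector in, vector out" idea concrete, here is a minimal NumPy sketch of two layers applied by hand. The weights, biases, and the helper name `dense` are made-up example values, not taken from the course; the weight matrix uses one column per neuron, which matches the shape Keras reports for a `Dense` layer.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(a_in, W, b, g):
    """One layer: a_out = g(a_in @ W + b), with one column of W per neuron."""
    return g(a_in @ W + b)

# Example values only: 2 input features, 3 hidden neurons, 1 output neuron
x = np.array([1.0, 2.0])
W1 = np.array([[1.0, -1.0, 0.5],
               [0.5,  1.0, 1.0]])
b1 = np.array([-3.0, 0.0, 0.5])
W2 = np.array([[1.0], [1.0], [-1.0]])
b2 = np.array([0.0])

a1 = dense(x, W1, b1, relu)       # hidden layer output, shape (3,)
a2 = dense(a1, W2, b2, sigmoid)   # output layer: a probability, shape (1,)
```

Chaining calls to `dense` like this is forward propagation; a Keras `Sequential` model does the same computation with learned weights.
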
```
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import Sequential
from tensorflow.keras.losses import BinaryCrossentropy   # Loss function for classification model
from tensorflow.keras.losses import MeanSquaredError     # Loss function for regression model
from tensorflow.keras.activations import linear, relu, sigmoid
from tensorflow.keras.optimizers import Adam

# Normalise data:
X = np.array([[x1, x2, x3]])                             # Each row of X is one example (x1, x2, x3 are its features)
norm_l = tf.keras.layers.Normalization(axis=-1)
norm_l.adapt(X)
Xn = norm_l(X)

# Define the model
tf.random.set_seed(1234)
model = Sequential([
    Dense(units=3, activation='relu', name='layer1'),      # Hidden layer (3 units is an example value)
    Dense(units=1, activation='sigmoid', name='layer2')    # Note: final layer needs units = 1
])

# Fit the model
model.compile(loss=BinaryCrossentropy(), optimizer=Adam(learning_rate=0.01))
model.fit(Xn, y, epochs=10)

# View results
model.summary()
W1, b1 = model.get_layer('layer1').get_weights()           # Layers are looked up by the name given above
print(W1, b1)

# Forward propagation
X_testn = norm_l(X_test)
predictions = model.predict(X_testn)
yhat = (predictions >= 0.5).astype(int)
```
Parameters:
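- units: number of neurons in the layer
- activation: 'sigmoid', 'linear', 'relu'
- loss: loss function to minimise (`BinaryCrossentropy` for classification, `MeanSquaredError` for regression)
- optimizer: Adam (gradient descent that adapts the learning rate automatically during training)
- epochs: number of full passes over the training set (each pass runs one gradient descent step per mini-batch)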
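
To make the `epochs` parameter concrete: Keras updates the weights once per mini-batch, and `model.fit` uses `batch_size=32` unless told otherwise. The helper `gradient_steps` below is a hypothetical name, and the dataset size of 1000 is an example value.

```python
import math

def gradient_steps(n_examples, epochs, batch_size=32):
    """Number of parameter updates: one per mini-batch, per epoch."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    return epochs * batches_per_epoch

# 1000 examples -> 32 mini-batches per epoch (the last one is smaller)
steps = gradient_steps(n_examples=1000, epochs=10)
```
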

**Multiclass classification:**
- Softmax regression: a generalisation of logistic regression
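- In the final layer, use `activation='linear'` so the model outputs raw scores (logits) rather than probabilities
- Use the loss function `SparseCategoricalCrossentropy(from_logits=True)`, which applies softmax inside the loss for better numerical accuracy
- The model's outputs then need converting to probabilities: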
```
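import numpy as np
import tensorflow as tf

preds = model.predict(X_train)            # Raw logits, shape (m, n_classes)
probs = tf.nn.softmax(preds).numpy()      # Probabilities; each row sums to 1

# Predicted class = index of the largest probability in each row
classification = np.argmax(probs, axis=1)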
```
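
As a sketch of what the softmax step computes, the NumPy version below turns each row of logits into probabilities. Subtracting the row maximum before exponentiating is a common trick to avoid overflow; it is shown here as an illustration and is not claimed to be TensorFlow's exact implementation. The function name `softmax` and the logit values are example choices.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1001.0, 1002.0]])   # second row would overflow a naive exp
probs = softmax(logits)
classes = np.argmax(probs, axis=1)              # same result as np.argmax(logits, axis=1)
```

Softmax preserves the ordering within each row, so the predicted class is the same whether `argmax` is taken over the logits or the probabilities.
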
Two options for the loss function:
- `SparseCategoricalCrossentropy`: expects the target to be an integer corresponding to the class index. For example, if there are 10 potential target values, y would be between 0 and 9.
- `CategoricalCrossentropy`: expects the target value of an example to be one-hot encoded, where the value at the target index is 1 and the other N-1 entries are 0. For example, with 10 potential target values and a target of 2, y would be [0,0,1,0,0,0,0,0,0,0].
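
The two label formats carry the same information, which the NumPy sketch below illustrates by computing the cross-entropy by hand from each. The labels and the uniform predicted probabilities are made-up example values, and the arithmetic is a plain re-derivation rather than a call into Keras.

```python
import numpy as np

n_classes = 10
y_sparse = np.array([2, 0, 9])            # integer labels (sparse format)
y_onehot = np.eye(n_classes)[y_sparse]    # one-hot rows (categorical format)

# Hypothetical predicted probabilities: uniform over the 10 classes
probs = np.full((3, n_classes), 0.1)

# Cross-entropy per example, computed from each label format
loss_sparse = -np.log(probs[np.arange(len(y_sparse)), y_sparse])
loss_onehot = -(y_onehot * np.log(probs)).sum(axis=1)
```

Both arrays hold the same per-example loss, so the choice between the two Keras losses depends only on how the targets are stored.
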

**Convolutional layers:**
- Alternative to Dense layers
- In a Dense layer, each neuron uses all activation values from the previous layer
- In a convolutional layer, each neuron looks at only a subset (a window) of the previous layer's activation values
- This speeds up computation, needs less training data, and is less prone to overfitting
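
A minimal 1D sketch of the windowing idea, in plain NumPy. The helper `conv1d_layer`, the window size of 3, and the input values are assumptions for illustration. The sketch also reuses one set of weights for every window, as standard convolutional layers do, which goes slightly beyond what the notes above state.

```python
import numpy as np

def conv1d_layer(x, w, b, stride=1):
    """Each output unit sees only a window of len(w) consecutive inputs."""
    k = len(w)
    starts = range(0, len(x) - k + 1, stride)
    # ReLU activation applied to each window's weighted sum
    return np.array([np.maximum(0, x[s:s + k] @ w + b) for s in starts])

x = np.arange(10, dtype=float)     # 10 previous-layer activations (example values)
w = np.array([1.0, 0.0, -1.0])     # one shared window of 3 weights
out = conv1d_layer(x, w, b=3.0)    # 8 units, each using only 3 inputs

dense_params = 10 * 8 + 8          # a Dense layer with 8 units on 10 inputs
conv_params = len(w) + 1           # shared weights plus one bias
```

The parameter counts (88 versus 4 here) give a feel for why such layers are cheaper to compute and less prone to overfitting.
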

### **Decision trees:**

```

```