## Deep Learning 

#### Logistic Regression

**Goal:** Classification - Find cats in images. $y = \begin{cases} 1 & \text{if the image contains a cat} \\ 0 & \text{otherwise}\end{cases}$

1. Before you can classify an image, you first need to convert it into an input vector. Suppose you have a 64x64 pixel image. Each pixel has three values (red, green and blue), so the image becomes a single vector of 64x64x3 = 12,288 numbers.

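2. Next you pass it through the logistic operation $\hat{y} = \sigma(xW + b)$, where $x$ is the flattened image. Remember that the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$ maps any real number into the range between 0 and 1, so $\hat{y}$ can be read as the probability that the image contains a cat. Also keep in mind that these notations describe the same model, so you don't confuse terms: $\hat{y} = \sigma(\theta^T x) = \sigma(Wx + b)$. The $xW$ form treats $x$ as a row vector and the $Wx$ form treats it as a column vector, while $\theta^T x$ folds the bias into $\theta$ by appending a constant 1 to $x$.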

3. Training:
   - (i) Initialize $w$ (weights) and $b$ (bias)
   - (ii) Optimize $w$ and $b$ (e.g., with gradient descent on a loss function)
   - (iii) Use $\hat{y} = \sigma(xW + b)$ to predict
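The three training steps, together with the flattening from step 1, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the course's implementation: it assumes the image is stored as nested lists indexed `[row][column][channel]`, it uses batch gradient descent on the binary cross-entropy loss as the optimizer for step (ii), and the helper names (`flatten_image`, `predict_proba`, `train`, `predict`) and the tiny two-feature dataset are hypothetical stand-ins for real 12,288-value image vectors.

```python
import math

def flatten_image(image):
    """Flatten an image stored as image[row][col] = [r, g, b] into one flat list."""
    return [value for row in image for pixel in row for value in pixel]

def sigmoid(z):
    """Map any real number into (0, 1); the two branches avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """y_hat = sigmoid(x . w + b) for one flattened example x."""
    return sigmoid(sum(xj * wj for xj, wj in zip(x, w)) + b)

def train(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the average binary cross-entropy loss (an assumed choice)."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0                        # (i) initialize w and b
    for _ in range(epochs):                      # (ii) optimize w and b
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi   # dLoss/dz for sigmoid + cross-entropy
            for j in range(n):
                grad_w[j] += err * xi[j] / m
            grad_b += err / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """(iii) Predict 1 ("cat") when y_hat >= 0.5, otherwise 0."""
    return 1 if predict_proba(x, w, b) >= 0.5 else 0

# A 64x64 RGB image flattens to 64 * 64 * 3 = 12288 inputs.
image = [[[0.5, 0.5, 0.5] for _ in range(64)] for _ in range(64)]
print(len(flatten_image(image)))  # 12288

# Tiny made-up 2-feature dataset standing in for flattened images.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, 0, 0]
w, b = train(X, y)
print([predict(x, w, b) for x in X])  # [1, 1, 0, 0]
```

Starting from all-zero weights is fine here because logistic regression has a single neuron; the 0.5 threshold in `predict` is a common default rather than something the notes prescribe.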

Equations:
EQ1: Neuron = linear + activation. Logistic regression is a single neuron: $Wx + b$ is the linear part and the sigmoid is the activation.

EQ2: Model = architecture + parameters. Here the architecture is the one-neuron network (logistic regression), and the parameters are $w$ and $b$.
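The two equations map naturally onto code: the architecture is a function, and the parameters are data passed into it. A minimal sketch, assuming plain Python lists and a dictionary for the parameters; the names `linear`, `neuron` and `params` are hypothetical.

```python
import math

def linear(x, w, b):
    """Linear part of a neuron: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def sigmoid(z):
    """Activation: squash z into (0, 1) without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(x, params):
    """EQ1: neuron = linear + activation."""
    return sigmoid(linear(x, params["w"], params["b"]))

# EQ2: model = architecture + parameters.
# The architecture is the code above; the parameters are plain data that training would change.
params = {"w": [1.0, -1.0], "b": 0.0}
print(neuron([3.0, 3.0], params))  # 0.5
```

Keeping the parameters separate from the function means the same architecture can be reused with freshly initialized or fully trained values.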

**Goal 2:** Find cats, lions, and iguanas in images.

1. Input Layer: 64 x 64 x 3 image


2. Flattening the Input: The 3D 64x64x3 image is flattened into a 1D vector of 12,288 values, exactly as in the first model.


3. Layers and Neurons: The architecture connects the input layer to three neurons, labeled $q_1^{(1)}, q_2^{(1)}, q_3^{(1)}$, where the superscript $(1)$ marks the first layer. In this network the first layer is also the output layer; layers that sit between the input and the output are called hidden layers.
Each neuron processes the input data using a weighted sum plus a bias, followed by an activation function $\sigma$ (e.g., sigmoid or ReLU).
For example: 
- $ \hat{y}_1 = a_1 = \sigma(w_1 x + b_1) $: Neuron $ q_1^{(1)} $ 
- $ \hat{y}_2 = a_2 = \sigma(w_2 x + b_2) $: Neuron $ q_2^{(1)} $ 
- $ \hat{y}_3 = a_3 = \sigma(w_3 x + b_3) $: Neuron $ q_3^{(1)} $ 


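4. Weights and Biases: Each neuron has its own weights ($w_1, w_2, w_3$) and bias ($b_1, b_2, b_3$), learned during training to adjust how much influence each input has. That is six parameter groups in total: three weight vectors of 12,288 entries each and three scalar biases, or 3 × 12,288 + 3 = 36,867 learnable numbers.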


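5. Output: The outputs $\hat{y}_1, \hat{y}_2, \hat{y}_3$ from the neurons can be read as separate predictions, e.g., the probability that the image contains a cat, a lion, or an iguana. **Label Vector:** As opposed to the binary label of 1 or 0 in the first model, each image is now labeled with a vector $y = [y_1, y_2, y_3]$, with one entry per class and $[0, 0, 0]$ when none of the animals is present. Because the three sigmoid outputs are independent, several entries can be 1 at once (a cat and a lion together would be $[1, 1, 0]$). For a single animal: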
- Cat = $[1, 0, 0]$
- Lion = $[0, 1, 0]$
- Iguana = $[0, 0, 1]$
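Building these label vectors is a simple lookup. A minimal sketch, assuming the class order cat, lion, iguana used above; `CLASSES` and `label_vector` are hypothetical names, and the multi-animal case follows from the three outputs being independent sigmoids.

```python
CLASSES = ["cat", "lion", "iguana"]  # assumed order of y_1, y_2, y_3

def label_vector(animals_in_image):
    """Hypothetical helper: 1 for every class present in the image, 0 otherwise."""
    return [1 if name in animals_in_image else 0 for name in CLASSES]

print(label_vector({"cat"}))           # [1, 0, 0]
print(label_vector({"iguana"}))        # [0, 0, 1]
print(label_vector({"cat", "lion"}))   # [1, 1, 0]
print(label_vector(set()))             # [0, 0, 0]
```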

**How It Works Together:**
The input image is flattened and passed through the neurons.
Each neuron computes a weighted sum of inputs plus a bias, applies the activation function, and produces an output.
These outputs can be used for further layers (if more are present) or as the final prediction.
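The flow described above for the three-neuron layer can be sketched in plain Python. This is a minimal sketch, assuming sigmoid activations and a dummy image in which every value is 0.5; the bias values are arbitrary and `layer_forward` is a hypothetical name. It also counts the learnable parameters.

```python
import math

def sigmoid(z):
    """Squash z into (0, 1) without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def layer_forward(x, weights, biases):
    """Each neuron k computes sigmoid(w_k . x + b_k) on the same input x."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

n_x = 64 * 64 * 3                                   # flattened image size
weights = [[0.0] * n_x for _ in range(3)]           # w_1, w_2, w_3 (untrained)
biases = [0.0, 1.0, -1.0]                           # b_1, b_2, b_3 (arbitrary)
x = [0.5] * n_x                                     # dummy flattened image

y_hat = layer_forward(x, weights, biases)
n_params = sum(len(w) for w in weights) + len(biases)
print(n_params)  # 36867
```

Note that nothing forces `y_hat` to sum to 1 here; each neuron answers its own yes/no question.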

The loss function has to change too: instead of one output there are now three, each a probability between 0 and 1, so the binary logistic loss is summed over the three outputs:

 $$  \mathcal{L}_{3W} = -\sum_{k=1}^{3} [y_k \log \hat{y}_k + (1 - y_k) \log (1 - \hat{y}_k)]   $$
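The summed loss translates directly into code. A minimal sketch; `multi_label_loss` is a hypothetical name, and clipping the probabilities to `[eps, 1 - eps]` is an added assumption that keeps `math.log` away from exactly 0.

```python
import math

def multi_label_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy summed over the three sigmoid outputs.

    Probabilities are clipped to [eps, 1 - eps] so log never sees exactly 0.
    """
    total = 0.0
    for yk, pk in zip(y, y_hat):
        pk = min(max(pk, eps), 1.0 - eps)
        total += yk * math.log(pk) + (1 - yk) * math.log(1.0 - pk)
    return -total

print(multi_label_loss([1, 0, 0], [0.5, 0.5, 0.5]))  # 3 * ln(2), about 2.0794
```

Confident correct predictions drive the loss toward 0, while confident wrong ones make it grow quickly.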

**Goal 3:** Same as Goal 2, plus the constraint that each image contains exactly one animal.

When every image contains exactly one of the animals, the three outputs should form a probability distribution that sums to 1, so knowing two of them is enough to know the third. Coupling the outputs this way can also help train an efficient model when one of the classes has relatively little data. This is where the softmax function becomes the activation:

$$ \sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} $$

Softmax is used when you have multiple output classes (e.g., the $K = 3$ classes here, with outputs $\hat{y}_1, \hat{y}_2, \hat{y}_3$) and want to interpret the outputs as probabilities that sum to 1.

This is called a softmax multiclass network. 
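A direct translation of the softmax formula, as a minimal sketch. Subtracting $\max(\mathbf{z})$ before exponentiating is a standard numerical-stability trick that does not change the result, because the shift cancels between numerator and denominator.

```python
import math

def softmax(z):
    """Numerically stable softmax: shifting by max(z) leaves the result unchanged."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

Unlike three independent sigmoids, raising one score here automatically lowers the other two probabilities.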

This is its loss function (cross-entropy):

 $ \mathcal{L}_{CE} = -\sum_{k=1}^{3} y_k \log \hat{y}_k $


 ----------
So where do these layers come into play? Adding layers lets the model represent more complex patterns in the data. For example, in a model that detects cat vs. no cat, the first layer might respond to simple lines or edges in the picture, the next (hidden) layer might combine those into facial features such as eyes or ears, and the output layer would combine those into a decision about the whole cat.

 ### Propagation Equations 

First Layer:
- Linear: $z^{[1]} = w^{[1]}x + b^{[1]}$
- Activation: $a^{[1]} = \sigma(z^{[1]})$

Second Layer:
- Linear: $z^{[2]} = w^{[2]}a^{[1]} + b^{[2]}$
- Activation: $a^{[2]} = \sigma(z^{[2]})$

Output Layer:
- Linear: $z^{[3]} = w^{[3]}a^{[2]} + b^{[3]}$
- Activation: $a^{[3]} = \sigma(z^{[3]})$
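The three pairs of equations chain together: each layer's activation $a^{[l]}$ becomes the next layer's input. A minimal pure-Python sketch, assuming sigmoid in every layer as written above (a softmax output layer would replace the last activation for Goal 3); the layer sizes, the constant weights of 0.1 and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    """Squash z into (0, 1) without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def layer_forward(W, b, a_prev):
    """One layer: z = W a_prev + b (linear), then a = sigmoid(z) (activation)."""
    z = [sum(wij * aj for wij, aj in zip(row, a_prev)) + bi for row, bi in zip(W, b)]
    return [sigmoid(zi) for zi in z]

def forward(x, params):
    """Run layers 1, 2, 3 in order; each layer's output feeds the next."""
    a = x
    for W, b in params:
        a = layer_forward(W, b, a)
    return a

# Hypothetical layer sizes 4 -> 3 -> 2 -> 1 with every weight 0.1 and every bias 0.
sizes = [4, 3, 2, 1]
params = [([[0.1] * n_in for _ in range(n_out)], [0.0] * n_out)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
a3 = forward([1.0, 2.0, 3.0, 4.0], params)
print(a3)  # a list holding the single output a^[3]
```

Each weight matrix has one row per neuron in its layer and one column per input from the previous layer, which is what makes the chain line up.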

 ### Processing an Input Batch of $ m $ Examples

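Input Batch: $X$ represents a batch of $m$ input examples, where each example is a vector. Notation: $X = [x^{(1)}, x^{(2)}, \dots, x^{(m)}]$, where $x^{(i)}$ is the $i$-th example (round brackets index examples, square brackets index layers). Each $x^{(i)}$ is a column vector, so $X$ is a matrix with one row per input feature and $m$ columns; for the 64x64x3 images that is $12288 \times m$.

The first layer then processes the whole batch at once: $Z^{[1]} = w^{[1]}X + b^{[1]}$, where $b^{[1]}$ is added to every column. Each column of $Z^{[1]}$ (i.e., $z^{[1](1)}, z^{[1](2)}, \dots, z^{[1](m)}$) is passed through an activation function (e.g., $\sigma$ or softmax), and the same pattern repeats layer by layer until the output layer produces the final outputs for each example.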
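Stacking examples as columns lets one matrix product handle the whole batch, and column $j$ of the result is exactly what a single example would give. A minimal sketch with made-up sizes ($n_x = 2$ features, $m = 3$ examples, 2 neurons); `batch_linear` and `single_linear` are hypothetical names, and the bias is added to every column by hand, which is what array libraries do automatically through broadcasting.

```python
def batch_linear(W, X, b):
    """Z = W X + b: X holds one example per column, and b[i] is added to every column."""
    n_out, n_in, m = len(W), len(X), len(X[0])
    return [[sum(W[i][k] * X[k][j] for k in range(n_in)) + b[i] for j in range(m)]
            for i in range(n_out)]

def single_linear(W, x, b):
    """z = W x + b for one example x (a plain list)."""
    return [sum(wik * xk for wik, xk in zip(row, x)) + bi for row, bi in zip(W, b)]

X = [[1.0, 0.0, 2.0],   # feature 1 of examples 1..m
     [0.0, 1.0, 3.0]]   # feature 2 of examples 1..m
W = [[1.0, 1.0],        # neuron 1
     [2.0, -1.0]]       # neuron 2
b = [0.5, 0.0]
Z = batch_linear(W, X, b)
print(Z)  # [[1.5, 1.5, 5.5], [2.0, -1.0, 1.0]]

# Column j of Z equals the single-example result for x^(j).
for j in range(len(X[0])):
    x_j = [X[k][j] for k in range(len(X))]
    assert [Z[i][j] for i in range(len(Z))] == single_linear(W, x_j, b)
```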




