In [None]:
import numpy as np

# Linear Regression

## Now use vector `w` (each `x_i` is a vector)

* X: (n, m)
* y: (n,)
* w: (m,)
* b: scalar (broadcast across all n predictions)

In [None]:
X = np.array([[1,2,3], [2,3,4]])
w = np.array([1,2,1])
b = .1
print(X.shape)
print(len(X))
print(w.shape)
print()

print(np.sum(X * w, axis=1) + b)

In [None]:
print(X @ w)

In [None]:
l = LinearRegression(3)
yP = l.predict(X)
y = np.array([0,1])
print(yP, y)
print(l.loss(X,y))

In [None]:
print(X.T.shape)
print(y.shape)

In [None]:
print(X.T)
print(yP - y)

In [None]:
print(X.T.dot(yP-y) / 2)

In [None]:
print((yP-y) @ X / 2)

In [None]:
print(y-yP)
print((y-yP).dot(y-yP))

In [None]:
print((y-yP))

In [None]:
print(X.T * (y-yP))

In [None]:
print(np.sum(X.T * (y-yP), axis=1)/2)

In [None]:
X.T.dot(y-yP)/2
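
The cells above compute the same product three different ways. A small sketch that checks they agree, using a made-up `residual` vector as a stand-in for `(yP - y)`:

```python
import numpy as np

X = np.array([[1, 2, 3], [2, 3, 4]])   # (n, m) = (2, 3)
residual = np.array([0.5, -1.0])       # hypothetical stand-in for (yP - y), shape (n,)

a = X.T.dot(residual)                  # (m, n) @ (n,) -> (m,)
b = residual @ X                       # (n,) @ (n, m) -> (m,)
c = np.sum(X.T * residual, axis=1)     # scale each sample's column, then sum over samples -> (m,)

print(a, b, c)
print(np.allclose(a, b) and np.allclose(a, c))
```

All three give one number per feature. Dividing by `n` afterwards turns the sum into the mean gradient.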

In [None]:
import numpy as np
class LinearRegression:
    def __init__(self, m):
        self.w = np.random.randn(m)
        self.b = 0

    def predict(self, X):
        return X @ self.w + self.b

    def loss(self, X, y):
        n = len(X)
        yP = self.predict(X)
        return (y-yP).dot(y-yP) / (2 * n)

    def backprop(self, X, y, learning_rate):
        n = len(X)
        yP = self.predict(X)
        dldw = X.T.dot(yP - y) / n
        dldb = np.sum(yP - y) / n
        
        self.w -= learning_rate * dldw
        self.b -= learning_rate * dldb

    def fit(self, X, y, steps, learning_rate = 0.01):
        for i in range(steps):
            self.backprop(X, y, learning_rate)
            if (i % 50 == 0):
                print("Steps:", i)
                print(f"w: {self.w}, b: {self.b}, yP: {self.predict(X)}, loss: {self.loss(X,y)}")

In [None]:
X = np.random.randn(2,3)
y = np.random.randn(2)
l = LinearRegression(3)
print(f"X={X}, y={y}")
l.fit(X, y, 501, 0.01)

## Review

💡 Suggestions for Improvement

  1. Simpler prediction (optional but cleaner)
```
  def predict(self, X):
      # Current: np.sum(X * self.w, axis=1) + self.b
      # Simpler: use @ for matrix multiplication
      return X @ self.w + self.b  # ✅ More readable
```
  2. Consider adding feature normalization
```
  # In __init__ or fit()
  self.mean = None
  self.std = None

  def fit(self, X, y, steps, learning_rate=0.01, normalize=True):
      if normalize:
          self.mean = X.mean(axis=0)
          self.std = X.std(axis=0)
          X = (X - self.mean) / self.std  # assumes no constant column (std > 0)
      # ... rest of training; predict() must apply the same scaling to new inputs
```
  3. Return loss history (useful for debugging)
```
  def fit(self, X, y, steps, learning_rate=0.01):
      losses = []
      for i in range(steps):
          self.backprop(X, y, learning_rate)
          if i % 50 == 0:
              loss = self.loss(X, y)
              losses.append(loss)
              print(f"Step {i}: loss = {loss:.4f}")
      return losses  # Can plot this later
```
  4. Add convergence check (early stopping)
```
  def fit(self, X, y, steps, learning_rate=0.01, tol=1e-6):
      prev_loss = float('inf')
      for i in range(steps):
          self.backprop(X, y, learning_rate)
          curr_loss = self.loss(X, y)

          if abs(prev_loss - curr_loss) < tol:
              print(f"Converged at step {i}")
              break
          prev_loss = curr_loss
```
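
Suggestions 2 to 4 can be combined into a single training loop. The following is a minimal sketch, assuming a standalone function rather than a method on the class. The names `fit_gd` and `predict`, the synthetic data, and the defaults for `learning_rate` and `tol` are all illustrative choices, not part of the original code.

```python
import numpy as np

def fit_gd(X, y, steps=1000, learning_rate=0.1, tol=1e-8):
    """Gradient descent for linear regression on standardized features.

    Returns (w, b, mean, std, losses). All names are illustrative.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # avoid dividing by zero for constant columns
    Xs = (X - mean) / std

    n, m = Xs.shape
    w = np.zeros(m)
    b = 0.0
    losses = []
    for _ in range(steps):
        residual = Xs @ w + b - y
        losses.append(residual.dot(residual) / (2 * n))
        # Stop once the loss changes by less than tol between steps
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < tol:
            break
        w -= learning_rate * (Xs.T @ residual) / n
        b -= learning_rate * residual.sum() / n
    return w, b, mean, std, losses

def predict(X, w, b, mean, std):
    # New inputs must be scaled with the training mean and std
    return ((X - mean) / std) @ w + b

# Synthetic, noise-free data with very different feature scales
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))
y = X @ np.array([2.0, -0.5, 0.03]) + 1.0

w, b, mean, std, losses = fit_gd(X, y)
print(len(losses), losses[0], losses[-1])
print(np.max(np.abs(predict(X, w, b, mean, std) - y)))
```

Because the features are standardized, a larger learning rate is usable and the loop stops well before `steps` is exhausted. The returned `losses` list can be plotted to inspect convergence.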
  📊 How to Test It Works
```
  # Generate simple test data
  from sklearn.datasets import make_regression
  X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)

  # Your implementation
  model = LinearRegression(m=3)
  model.fit(X, y, steps=1000, learning_rate=0.01)

  # Compare with sklearn
  from sklearn.linear_model import LinearRegression as SklearnLR
  sklearn_model = SklearnLR()
  sklearn_model.fit(X, y)

  print("\nYour weights:", model.w)
  print("Sklearn weights:", sklearn_model.coef_)
  print("\nYour bias:", model.b)
  print("Sklearn bias:", sklearn_model.intercept_)

  # Should be similar, not identical: sklearn solves least squares directly, while this model takes a fixed number of gradient steps
```
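
If scikit-learn is not installed, the same check can be done with NumPy alone. This is a sketch under assumptions: the data is synthetic, `true_w` and `true_b` are made-up values, and the loop repeats the update rule from `backprop` inline so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 100, 3
X = rng.normal(size=(n, m))
true_w, true_b = np.array([3.0, -2.0, 0.5]), 4.0        # made-up ground truth
y = X @ true_w + true_b + rng.normal(scale=0.1, size=n)

# Gradient descent with the same gradients as LinearRegression.backprop
w, b = np.zeros(m), 0.0
for _ in range(2000):
    residual = X @ w + b - y
    w -= 0.05 * (X.T @ residual) / n
    b -= 0.05 * residual.sum() / n

# Closed-form least squares: a column of ones makes the last coefficient the bias
X1 = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("gradient descent:", w, b)
print("least squares:   ", coef[:m], coef[m])
```

With enough steps the two answers should agree to several decimal places. A large gap usually points to a gradient bug or a learning rate that is too small.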
  🎯 Assessment Results

  What this tells me about your current level:

  Strengths:

  - ✅ Understand gradient descent conceptually
  - ✅ Know vectorization with numpy
  - ✅ Can structure code cleanly
  - ✅ Remember MSE loss formula
  - ✅ Understand backpropagation concept

  Areas to Practice:

  - ⚠️ Small syntax errors (typos: b vs self.b, x vs X)
  - ⚠️ Took >1hr (goal: 20-30min for linear regression)
  - 💡 Could simplify with @ operator instead of np.sum(... axis=1)

In [None]:
# Generate simple test data
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)

# Your implementation
model = LinearRegression(m=3)
model.fit(X, y, steps=1000, learning_rate=0.01)

# Compare with sklearn
from sklearn.linear_model import LinearRegression as SklearnLR
sklearn_model = SklearnLR()
sklearn_model.fit(X, y)

print("\nYour weights:", model.w)
print("Sklearn weights:", sklearn_model.coef_)
print("\nYour bias:", model.b)
print("Sklearn bias:", sklearn_model.intercept_)

# Should be similar, not identical: sklearn solves least squares directly, while this model takes a fixed number of gradient steps


  Today (Rest of Day 1):

  1. Try Logistic Regression (give yourself 1 hour max)
    - Will likely be similar difficulty
    - Focus on sigmoid function and binary cross-entropy
  2. Compare with solution in ML-Coding-Questions.md
  3. Note what took longest - that's your focus area

  Day 2:

  1. Review gradient formulas for common algorithms
  2. Practice matrix operation patterns
  3. Identify 3-5 more algorithms to implement this week

  This Week:

  Implement these from scratch (get each under 30 min):
  1. K-Nearest Neighbors (easier - good warmup)
  2. K-Means (medium)
  3. Decision Tree (harder - optional)

# Logistic Regression

* Sigmoid function: sigmoid(z) = 1 / (1 + e^(-z))
* z = X @ w + b
* **Cross entropy loss** (per sample, with yP = sigmoid(z)): -(y * log(yP) + (1-y) * log(1-yP))
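
A small worked example of the two formulas above. This is a sketch: the `sigmoid` helper here is written with `np.logaddexp` so that very large or very small `z` does not overflow, which is one possible formulation and not the one used in the class below. The sample values are made up.

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + exp(-z)) rewritten as exp(-log(1 + exp(-z))); logaddexp avoids overflow
    return np.exp(-np.logaddexp(0.0, -z))

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
print(sigmoid(z))

# Per-sample cross entropy for two hypothetical predictions
y = np.array([0, 1])
yP = np.array([0.2, 0.9])
per_sample = -(y * np.log(yP) + (1 - y) * np.log(1 - yP))
print(per_sample)
```

The loss is small when the prediction is close to the label and grows without bound as a confident prediction turns out wrong.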

In [None]:
X = np.array([[1,2,3], [8,4,1]])
l = LogisticRegression(3)
y = np.array([0,1])
print(l.predict(X))

yP = l.predict(X)

In [None]:
y = np.array([0,0])
print(yP, y)
print(l.loss(X, y))

In [14]:
import numpy as np

class LogisticRegression:
    def __init__(self, m):
        self.w = np.random.randn(m)
        self.b = 0

    def predict(self, X):
        z = X @ self.w + self.b
        return 1 / (1 + np.exp(-z))

    def loss(self, X, y):
        yP = self.predict(X)
        return -(y.dot(np.log(yP)) + (1-y).dot(np.log(1-yP))) / len(X)

    def back_propagate(self, X, y, learning_rate):
        n = len(X)
        yP = self.predict(X)
        dldw = X.T @ (yP - y) / n
        dldb = np.sum(yP - y) / n
        self.w -= dldw * learning_rate
        self.b -= dldb * learning_rate

    def fit(self, X, y, steps, learning_rate=0.01):
        for i in range(steps):
            self.back_propagate(X, y, learning_rate)
            if i % 50 == 0:
                loss = self.loss(X, y)
                print(f"step: {i}, loss: {loss}")

In [15]:
X = np.random.randn(10, 8)
l = LogisticRegression(8)
y = np.random.randint(0, 2, size=10)
l.fit(X, y, 801)
print(y, l.predict(X))

step: 0, loss: 1.6254716734663455
step: 50, loss: 1.376383049263897
step: 100, loss: 1.154138318236018
step: 150, loss: 0.9645073869374869
step: 200, loss: 0.8126238986235069
step: 250, loss: 0.6981746779318883
step: 300, loss: 0.6147518196331508
step: 350, loss: 0.5538184009938333
step: 400, loss: 0.508107030715272
step: 450, loss: 0.4725406252261659
step: 500, loss: 0.44386114419466205
step: 550, loss: 0.42002873270994057
step: 600, loss: 0.39975412097323904
step: 650, loss: 0.382198348028718
step: 700, loss: 0.36679314010787156
step: 750, loss: 0.35313596045417206
step: 800, loss: 0.34092881492619653
[1 0 1 0 1 0 0 0 1 0] [0.79698729 0.59881974 0.79536481 0.12713059 0.39177987 0.01729298
 0.31109396 0.10481732 0.68050655 0.07809064]
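
The printed probabilities can be turned into class labels by thresholding. A sketch using the values from the output above (rounded to three decimals) and a cutoff of 0.5, which is a common convention and an assumption here:

```python
import numpy as np

# Labels and predicted probabilities copied (rounded) from the output above
y = np.array([1, 0, 1, 0, 1, 0, 0, 0, 1, 0])
yP = np.array([0.797, 0.599, 0.795, 0.127, 0.392, 0.017, 0.311, 0.105, 0.681, 0.078])

labels = (yP >= 0.5).astype(int)   # 0.5 is a conventional cutoff, not dictated by the model
accuracy = np.mean(labels == y)
print(labels, accuracy)
```

Eight of the ten training samples land on the correct side of the cutoff. Since the labels are random and this is training data, the number says nothing about generalization.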


## Review

  🔍 Issues & Improvements

  1. Operand Order in the Gradient Calculation ⚠️
```
  # Your code:
  dldw = (yP - y) @ X / n  # works, but does not mirror the formula

  # Preferred:
  dldw = X.T @ (yP - y) / n  # ✅ matches dL/dw = (1/n) * X^T (ŷ - y)
```
  Why this matters:
  - (yP - y) has shape (n,) - n samples
  - X has shape (n, m) - n samples, m features
  - (yP - y) @ X gives shape (m,) ✅ and the same values as X.T @ (yP - y)
  - The two agree only because (yP - y) is 1-D; with a column vector of shape (n, 1) the first form raises a shape error

  This is not broadcasting: matrix multiplication treats a 1-D left operand as a row vector. In an interview, writing X.T @ (yP - y) is still preferable because it reads exactly like the derived gradient.

  Preferred version:
  ```
  def back_propagate(self, X, y, learning_rate):
      n = len(X)
      yP = self.predict(X)
      dldw = X.T @ (yP - y) / n  # X transpose times error
      dldb = np.sum(yP - y) / n
      self.w -= dldw * learning_rate
      self.b -= dldb * learning_rate
  ```

  2. Loss Function - Missing Average
```
  # Your code:
  def loss(self, X, y):
      yP = self.predict(X)
      return -(y.dot(np.log(yP)) + (1-y).dot(np.log(1-yP)))
      # ❌ This is the TOTAL loss, not AVERAGE

  # Should be:
  def loss(self, X, y):
      n = len(X)
      yP = self.predict(X)
      return -(y.dot(np.log(yP)) + (1-y).dot(np.log(1-yP))) / n
      # ✅ Now it's Mean Binary Cross-Entropy
```
  Why it matters:
  - Without averaging, loss depends on dataset size
  - Can't compare losses across different batch sizes
  - Standard practice is to report average loss

  3. Numerical Stability Issue (Important!)
```
  # Your code:
  return -(y.dot(np.log(yP)) + (1-y).dot(np.log(1-yP)))
  # ⚠️ Problem: log(0) = -inf if yP = 0 or yP = 1

  # Better:
  def loss(self, X, y):
      n = len(X)
      yP = self.predict(X)
      epsilon = 1e-15  # Small value to prevent log(0)
      yP = np.clip(yP, epsilon, 1 - epsilon)  # Clip predictions
      return -(y.dot(np.log(yP)) + (1-y).dot(np.log(1-yP))) / n
```
  Why: When the sigmoid saturates to exactly 0.0 or 1.0 in floating point, log(0) returns -inf and the loss becomes inf or nan

  4. Same Constructor Issue (as Linear Regression)
```
  # Your code:
  def __init__(self, m):  # ⚠️ Manual feature count

  # Better:
  def __init__(self):
      self.w = None
      self.b = None

  def fit(self, X, y, n_iterations=1000, learning_rate=0.01):
      if self.w is None:
          n_features = X.shape[1]
          self.w = np.random.randn(n_features) * 0.01  # Small init
          self.b = 0
      # ... rest of code
```
  5. Weight Initialization (Minor but good practice)
```
  # For logistic regression, use smaller initial weights:
  self.w = np.random.randn(m) * 0.01  # Scale down
  # or
  self.w = np.zeros(m)  # Start from zero

  # Large initial weights → extreme sigmoid values → slow learning
```
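
An alternative to clipping in item 3 is to compute the loss directly from `z = X @ w + b` instead of from `yP`. This is a sketch of that swapped-in technique, not the approach used in the class above. `bce_from_logits` is a hypothetical helper name and the inputs are made up.

```python
import numpy as np

def bce_from_logits(z, y):
    # Mean of log(1 + exp(z)) - y * z, which equals the mean binary cross-entropy
    return np.mean(np.logaddexp(0.0, z) - y * z)

z = np.array([-3.0, 0.5, 2.0])
y = np.array([0.0, 1.0, 1.0])

# Same loss computed the usual way from probabilities
yP = 1 / (1 + np.exp(-z))
naive = -np.mean(y * np.log(yP) + (1 - y) * np.log(1 - yP))
stable = bce_from_logits(z, y)
print(stable, naive)

# Confidently wrong predictions stay finite without any clipping
extreme = bce_from_logits(np.array([-800.0, 800.0]), np.array([1.0, 0.0]))
print(extreme)
```

The identity follows from log(sigmoid(z)) = z - log(1 + exp(z)) and log(1 - sigmoid(z)) = -log(1 + exp(z)). The gradient is unchanged, so only `loss` would need to be touched.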
  ---
  📊 Assessment - Day 1 Complete!

  Time Analysis:

  - Linear Regression: 1 hour
  - Logistic Regression: 1 hour
  - Looked up: Cross-entropy formula ✅
  - Struggled with: Computing derivative ⚠️

  What This Tells Me:

  ✅ Strong Areas:
  - Numpy operations
  - Sigmoid function (got it right!)
  - Code structure (consistent pattern)
  - Loss function (conceptually correct)

  ⚠️ Rusty Areas:
  1. Gradient derivation - This is where you lost time
  2. Matrix dimensions - (yP - y) @ X vs X.T @ (yP - y)
  3. Loss function details - Forgot averaging

  🎯 Key Insight:
  The gradient for logistic regression is beautiful: it simplifies to exactly X.T @ (yP - y) / n, same form as linear regression! This is one of those "aha" moments in ML.

  ---
  🧮 The Math (For Your Reference)

  Binary Cross-Entropy Loss:

  L = -(1/n) * Σ[y*log(ŷ) + (1-y)*log(1-ŷ)]

  Gradient (the tricky part):

  dL/dw = (1/n) * X^T * (ŷ - y)

  Why it's beautiful:
  - Same form as linear regression!
  - The sigmoid derivative ŷ(1-ŷ) cancels the denominators of the log terms
  - This is why logistic regression is so elegant

  Derivation (Quick, per sample):

  dL/dŷ = -(y/ŷ - (1-y)/(1-ŷ))
  dŷ/dz = ŷ(1-ŷ)           [sigmoid derivative]
  dz/dw = X

  Chain rule: dL/dw = dL/dŷ * dŷ/dz * dz/dw
           = -(y/ŷ - (1-y)/(1-ŷ)) * ŷ(1-ŷ) * X
           = -(y(1-ŷ) - (1-y)ŷ) * X
           = (ŷ - y) * X         [since y(1-ŷ) - (1-y)ŷ = y - ŷ]

  Memorize this result - you don't need to re-derive in interviews!
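
When the formula slips from memory, a finite-difference check catches a wrong gradient quickly. A minimal sketch on random data, where `loss`, `analytic_grad`, and the step size `eps` are illustrative choices:

```python
import numpy as np

def loss(w, b, X, y):
    yP = 1 / (1 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(yP) + (1 - y) * np.log(1 - yP))

def analytic_grad(w, b, X, y):
    yP = 1 / (1 + np.exp(-(X @ w + b)))
    return X.T @ (yP - y) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = rng.integers(0, 2, size=20).astype(float)
w, b = rng.normal(size=4), 0.3

# Central differences: nudge one weight at a time and measure the change in loss
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    numeric[j] = (loss(w + step, b, X, y) - loss(w - step, b, X, y)) / (2 * eps)

max_diff = np.max(np.abs(numeric - analytic_grad(w, b, X, y)))
print(max_diff)
```

A difference many orders of magnitude below the gradient values means the analytic formula matches the loss. A difference of similar size to the gradient itself usually means a sign or scaling mistake.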

  ---
  📝 Day 1 Assessment Summary

  Overall Performance:

  Theory: 8/10 ✅
  - Understand concepts
  - Know the algorithms
  - Slightly rusty on derivations

  Implementation: 6.5/10 ⚠️
  - Can implement correctly
  - Takes longer than interview speed (need 20-25 min, you're at 1 hour)
  - Small issues (matrix order, missing average)

  Debugging: 7/10 ✅
  - Good systematic approach
  - Print statements help
  - Need to catch dimension errors faster

  Your Rustiness Profile:

  | Area                 | Status   | Priority |
  |----------------------|----------|----------|
  | Numpy operations     | ✅ Good  | Low      |
  | Algorithm structure  | ✅ Good  | Low      |
  | Loss functions       | 🟡 Rusty | Medium   |
  | Gradient derivations | 🔴 Rusty | HIGH     |
  | Matrix dimensions    | 🟡 Rusty | Medium   |
  | Vectorization        | ✅ Good  | Low      |

  ---
  🎯 Your Focus Areas This Week

  Based on Day 1 assessment:

  High Priority:

  1. Gradient formulas - Memorize common ones
    - Linear Regression: X.T @ (yP - y) / n
    - Logistic Regression: X.T @ (yP - y) / n (same!)
    - The same X.T @ (prediction - target) / n form also appears in softmax regression and other generalized linear models
  2. Matrix dimensions - Build intuition fast
    - X: (n, m) - n samples, m features
    - y: (n,) - n labels
    - w: (m,) - m weights
    - X.T @ (yP - y): (m, n) @ (n,) = (m,) ✅
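
One way to build that intuition is to assert shapes while coding. A short sketch with placeholder arrays; the sizes are arbitrary. It also shows a slip that NumPy does not report: a column-vector `y` of shape (n, 1) broadcasts against `yP` instead of raising an error.

```python
import numpy as np

n, m = 5, 3
X = np.ones((n, m))
y = np.zeros(n)
w = np.zeros(m)
b = 0.0

yP = X @ w + b                 # (n, m) @ (m,) -> (n,)
grad_w = X.T @ (yP - y) / n    # (m, n) @ (n,) -> (m,)
assert yP.shape == (n,) and grad_w.shape == (m,)

# Silent pitfall: (n,) minus (n, 1) broadcasts to (n, n) with no error
bad = yP - y.reshape(-1, 1)
print(bad.shape)
```

Flattening labels with `y.ravel()` before training, or asserting `y.shape == (n,)`, avoids the (n, n) surprise.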

  Medium Priority:

  3. Loss functions - Remember to average
  4. Numerical stability - Clipping, epsilon tricks

  Low Priority (You're Good):

  5. Code structure
  6. Basic numpy operations

  ---
  📚 Recommended Practice (Rest of Week 1)

  Day 2 (Tomorrow):

  1. Morning (1 hour): Re-implement both algorithms from scratch
    - Goal: 30 minutes each
    - Don't look at your code
    - Focus on getting gradients right first try
  2. Afternoon (1 hour): Review gradient formulas
    - Write them down on paper
    - Understand why they simplify
    - Memorize the patterns

  Days 3-7:

  Continue with Week 1 plan:
  - Fast.AI lessons (at 1.5x speed)
  - Implement K-NN (easier, confidence builder)
  - Start image classification project
  - Maybe implement K-Means if time

  ---
  💡 Interview Context

  In a 45-minute ML coding interview:
  - You'd get 20-25 min for Logistic Regression
  - Need to code + test in that time
  - Your 1 hour → need to speed up ~2-3x

  Good news:
  - With 5-10 more algorithm implementations, you'll hit interview speed
  - Your foundation is solid (no conceptual gaps)
  - Just need muscle memory

  ---
  ✅ Day 1 Complete - Well Done!

  What you proved today:
  1. ✅ You can implement ML algorithms from scratch
  2. ✅ Your numpy skills are solid
  3. ✅ You understand the concepts
  4. ⚠️ Need to speed up (but that's just practice!)

  Confidence check:
  - You got both algorithms correct (with minor bugs)
  - You can figure things out when stuck
  - You're asking the right questions (looking up cross-entropy is smart!)

  You're right on track! The assessment did exactly what it should - identified that you need to:
  1. Practice gradient derivations (speed them up)
  2. Build pattern recognition (matrix dimensions)
  3. Implement more algorithms (build muscle memory)

  Tomorrow, try re-implementing these without looking at your code. You'll probably do it in 30-40 minutes each, which will be a confidence boost! 🚀



# Retro

## what was difficult

* Figuring out how to translate matrix operations into numpy
* Figuring out the right dimensions of each operand
* Taking derivatives when matrices are involved, and translating the result into numpy

## what was forgotten

* Whether `b` is a scalar or a vector
* The dimensions of `w` and `X`
* The logistic regression loss function
* How to differentiate the cross-entropy loss, which took quite some time