<a href="https://colab.research.google.com/github/rahiakela/kaggle-competition-projects/blob/master/meta-kaggle/01-linear-regression/01_linear_regression_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Linear Regression from scratch

In [1]:
import numpy as np
import torch

In [2]:
# Create tensors
x = torch.tensor(3.0)
w = torch.tensor(4.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)

We can combine tensors with the usual arithmetic operations.

In [3]:
# Arithmetic operations
y = w * x + b
print(y)

tensor(17., grad_fn=<AddBackward0>)


What makes PyTorch special is that it can automatically compute the derivative of `y` with respect to the tensors that have `requires_grad` set to `True`, i.e. `w` and `b`.

In [5]:
# Compute gradients
y.backward()

In [7]:
# Display gradients
# .item() extracts the Python float from each 0-dimensional gradient tensor
print(f"dy/dw: {w.grad.item()}")
print(f"dy/db: {b.grad.item()}")

dy/dw: 3.0
dy/db: 1.0
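
These values follow directly from `y = w * x + b`: the derivative with respect to `w` is `x` (3.0 here), and the derivative with respect to `b` is 1. As a quick sanity check, the sketch below approximates both derivatives with central finite differences in plain Python, without autograd. The helper names `y_fn` and `numerical_grad` and the step size `eps` are illustrative choices, not part of PyTorch.

```python
def y_fn(w, b, x=3.0):
    """Same expression as in the cell above: y = w * x + b."""
    return w * x + b


def numerical_grad(f, value, eps=1e-4):
    """Central finite-difference approximation of df/dvalue."""
    return (f(value + eps) - f(value - eps)) / (2 * eps)


w0, b0 = 4.0, 5.0

# Vary one parameter at a time while holding the other fixed
dy_dw = numerical_grad(lambda w: y_fn(w, b0), w0)
dy_db = numerical_grad(lambda b: y_fn(w0, b), b0)

print(f"dy/dw ~ {dy_dw:.4f}")
print(f"dy/db ~ {dy_db:.4f}")
```

The approximations should agree with `w.grad` and `b.grad` up to floating-point error, which is a handy way to convince yourself that `backward()` computes what you expect.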


##Problem Statement

We'll create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall and humidity (*input variables or features*) in a region. Here's the training data:

<img src="https://i.imgur.com/lBguUV9.png" width="500" />

In a **linear regression** model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant, known as a bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Visually, this means that the yield of apples is a linear (planar) function of the temperature, rainfall and humidity.

<img src="https://i.imgur.com/mtkR2lB.png" width="540" >
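
To make the weighted-sum formula concrete, here is a minimal sketch in plain Python that evaluates it for a single region. The weights and bias are hypothetical values picked only for illustration; they are not learned from the data, and `predict_yield` is a made-up helper name.

```python
def predict_yield(features, weights, bias):
    """Weighted sum of the input features, offset by a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Sample region: temp, rainfall, humidity
region = [73.0, 67.0, 43.0]

# Hypothetical parameters (w11, w12, w13 and b1), for illustration only
apple_weights = [-0.5, 1.0, 0.25]
apple_bias = 10.0

yield_apple = predict_yield(region, apple_weights, apple_bias)
print(yield_apple)  # -36.5 + 67.0 + 10.75 + 10.0 = 51.25
```

Training amounts to adjusting such weights and biases until the predictions are close to the observed yields.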


**Our objective**: Find a suitable set of *weights* and *biases* using the training data, to make accurate predictions.

##Training Data

The training data can be represented using 2 matrices (inputs and targets), each with one row per observation and one column per variable.

In [8]:
# Input (temp, rainfall, humidity)
inputs = np.array([
    [73, 67, 43],
    [91, 88, 64],
    [87, 134, 58],
    [102, 43, 37],
    [69, 96, 70]
], dtype="float32")

In [9]:
# Targets (apples, oranges)
targets = np.array([
    [56, 70],
    [81, 101],
    [119, 133],
    [22, 37],
    [103, 119]
], dtype="float32")

Before we build a model, we need to convert inputs and targets to PyTorch tensors.

In [10]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


##Linear Regression Model

The *weights* and *biases* can also be represented as matrices, initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e. the yield for apples, and similarly the second row and element for oranges.
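
The shapes involved can be sketched with NumPy, assuming the weights form a 2x3 matrix (one row per target, one column per feature) and the biases a vector of length 2. The random initialization through `np.random.default_rng` and the seed are illustrative assumptions rather than the notebook's own code, and only the first two training rows are used to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# First two training rows: temp, rainfall, humidity
inputs = np.array([[73, 67, 43],
                   [91, 88, 64]], dtype="float32")

# Assumed shapes: 2 targets x 3 features for w, one bias per target for b
w = rng.standard_normal((2, 3)).astype("float32")
b = rng.standard_normal(2).astype("float32")

# (2, 3) @ (3, 2) -> (2, 2); b is broadcast across the rows
preds = inputs @ w.T + b
print(preds.shape)

# Column 0 (apples) depends only on the first row of w and on b[0]
apple_by_hand = inputs @ w[0] + b[0]
print(np.allclose(preds[:, 0], apple_by_hand, rtol=1e-4))
```

Each row of `preds` holds the predicted apple and orange yields for one region, mirroring the layout of the targets matrix.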