In [None]:
from tensor import Tensor, TensorShape, TensorSpec


@register_passable
struct Tensor1d[dtype: DType, size: Int](Stringable):
    var reg: SIMD[dtype, size]

    fn __init__(inout self):
        self.reg = SIMD[dtype, size]()

    fn __init__(inout self, val: SIMD[dtype, 1]):
        self.reg = SIMD[dtype, size](val)

    fn __init__(inout self, *val: SIMD[dtype, 1]):
        """Fill from scalars; values beyond `size` are ignored, unset lanes stay 0."""
        self.reg = SIMD[dtype, size]()
        for i in range(len(val)):
            if i < size:
                self.reg[i] = val[i]

    fn __init__(inout self, reg: SIMD[dtype, size]):
        self.reg = reg

    fn __copyinit__(inout self, rhs: Self):
        self.reg = rhs.reg

    fn __add__(self, rhs: SIMD[dtype, 1]) -> Self:
        """Scalar add across elements."""
        var multiplier = SIMD[dtype, 1](1.0)
        var new = self.reg.fma(multiplier, rhs)
        return Self(new)

    fn __sub__(self, rhs: SIMD[dtype, 1]) -> Self:
        """Scalar subtract across elements."""
        var multiplier = SIMD[dtype, 1](1.0)
        var nrhs = rhs * -1
        var new = self.reg.fma(multiplier, nrhs)
        return Self(new)

    fn __mul__(self, rhs: SIMD[dtype, 1]) -> Self:
        """Scalar multiply across elements."""
        var accum = SIMD[dtype, 1](0.0)
        var new = self.reg.fma(rhs, accum)
        return Self(new)

    fn __str__(self) -> String:
        return str(self.reg)

## Derivative function

If you recall from calculus, the derivative at a point is the slope of the tangent line to the graph at that point.
To find it, start with a secant line: the line connecting two points on the graph. As the distance between the two
points (the delta x, or |x1 - x2|) gets smaller and smaller and approaches 0, the slope of the secant line approaches
the slope of the tangent.
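
As a sketch in symbols, writing $h$ for the distance between the two points (the role `delta` plays in the `derive` function in the next cell), the tangent slope is the limit of the secant slope:

$$
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
$$

`derive` uses the symmetric (central difference) form of the same idea, taking one point on each side of $x$:

$$
f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}
$$

For a smooth $f$, the error of this approximation shrinks in proportion to $h^2$.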

In [None]:
alias F64Tens = Tensor1d[DType.float64, 4]


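fn derive(
    f: fn (F64Tens) -> F64Tens,
    inp: F64Tens,
    delta: Float64
) -> F64Tens:
    """Approximate the derivative of f at inp, element-wise, with a central difference."""
    var f1 = f(inp + delta)  # f(x + delta)
    var f2 = f(inp - delta)  # f(x - delta)
    # Slope of the secant line through the two nearby points
    var derivative = (f1.reg - f2.reg) / (2 * delta)
    return F64Tens(derivative)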

## Calculus derivative rules

### Derivative of a polynomial



In [None]:
fn func(data: F64Tens) -> F64Tens:
    """Equivalent of f(x) = 2x**3 + 3x**2.

    No loops are used: each step is a single SIMD operation across all elements.
    """
    var x_3 = F64Tens(data.reg * data.reg * data.reg)
    var two_x_3 = x_3 * 2
    print("two_x_3", two_x_3)
    var x_2 = F64Tens(data.reg * data.reg)
    var three_x_2 = x_2 * 3
    print("three_x_2", three_x_2)
    return F64Tens(two_x_3.reg + three_x_2.reg)

var res = derive(func, F64Tens(1.0, 2.0, 3.0, 4.0), 0.00001)
print(res)
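
As a check, and assuming `func` implements $f(x) = 2x^3 + 3x^2$ as its docstring says, the power rule gives the exact derivative to compare the printed result against:

$$
f'(x) = 6x^2 + 6x
$$

$$
f'(1) = 12, \quad f'(2) = 36, \quad f'(3) = 72, \quad f'(4) = 120
$$

The values printed by `derive` should agree with these up to a small numerical error.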

## What Machine Learning learns

In the above example, we knew that the function was 2x^3 + 3x^2, so we could have applied the derivative rules to do
the calculation directly. But what if we do not know what the function is, that is, the rule for calculating the
dependent variable from the independent variable?

In traditional programming, we have a machine that contains rules (algorithms) for how to operate on data, and this
gives us our answers.

```text
                  +-----------+
                  |           |
----- data ------>| algorithm |----------> answers
                  |           |
                  +-----------+
```

In machine learning, we don't know what the rules (algorithms) are. Indeed, that is what we are trying to figure out.
In supervised learning, we have a set of inputs with their corresponding outputs (answers). What we need to know is
**how** they map to each other, so that given a new input, we can plug it into the rules to get a guesstimate.

In essence, we don't know how we got from the inputs to the outputs, so we can't just create the algorithm directly in
code. What machine learning is learning is how to create that algorithm or mapping. Initially, the network will have
weights that probably do not map the given inputs to the known outputs very well. So we calculate a cost (also called
a loss) using a function that measures how far the value predicted by the network is from the actual known value.
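
One common choice for such a cost function (an assumption here, since the text does not name one) is the mean squared error over $N$ known examples, where $\hat{y}_i$ is the network's prediction and $y_i$ is the known value:

$$
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
$$

It is 0 when every prediction matches and grows as the predictions move away from the known values.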

Then a process called backpropagation works out how each weight contributed to the loss, and the weights of the layers
are adjusted to reduce it. This happens iteratively so that, as long as we don't get stuck in a local minimum, we
ideally get better and better predictions.

```text
                       +------------ adjust <-----------------+
                       V                                      |
                  +----+-----+                                |    
                  |          |                                |
----- data ------>|    ?     |-----> prediction ---> loss ----^
                  |          |
                  +----------+
```
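
The loop in the diagram can be sketched with a single weight. This is only a minimal illustration, not how a real network is trained: the three training pairs, the one-weight model `y = w * x`, and the names `loss` and `train` are all made up for the example, and the slope of the loss is estimated with the same central difference as `derive` rather than with backpropagation. The syntax assumes the same Mojo version as the cells above.

```mojo
fn loss(w: Float64) -> Float64:
    """Mean squared error of the model y = w * x on three known pairs."""
    # Hypothetical training data: (1, 3), (2, 6), (3, 9).
    # The hidden rule is y = 3 * x, but the model does not know that.
    var e1 = w * 1.0 - 3.0
    var e2 = w * 2.0 - 6.0
    var e3 = w * 3.0 - 9.0
    return (e1 * e1 + e2 * e2 + e3 * e3) / 3.0


fn train(steps: Int, learning_rate: Float64) -> Float64:
    """Repeat predict -> loss -> adjust, returning the final weight."""
    var w: Float64 = 0.0  # initial guess, predicts poorly
    var delta: Float64 = 0.00001
    for _ in range(steps):
        # Slope of the loss with respect to w (central difference)
        var grad = (loss(w + delta) - loss(w - delta)) / (2 * delta)
        # Step against the slope to reduce the loss
        w = w - learning_rate * grad
    return w


print(train(100, 0.05))
```

With these made-up pairs, `w` should end up close to 3, the rule hidden in the data. Too large a `learning_rate` makes the updates overshoot and the loss grow instead of shrink.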

### What is the ?

So far, we have talked about weights and how they get adjusted to minimize the loss (the gap between the predicted
value and the actual value), but what exactly is in that black box in the 2nd diagram?

That box is your network, which is made up of layers. The network is a mathematical model that approximates the
unknown algorithm.
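
To make "mathematical model" concrete, here is about the smallest possible sketch: a hypothetical one-layer "network" with a single weight `w` and bias `b`. The name `model` and the numbers are assumptions for illustration only. The point is that the formula stays fixed while the weights decide which mapping it represents.

```mojo
fn model(x: Float64, w: Float64, b: Float64) -> Float64:
    """A hypothetical one-layer model: a weight and a bias applied to the input."""
    return w * x + b


# Same input, different weights, different predictions.
print(model(2.0, 0.5, 0.0))  # 0.5 * 2.0 + 0.0 = 1.0
print(model(2.0, 3.0, 1.0))  # 3.0 * 2.0 + 1.0 = 7.0
```

Training never rewrites `model` itself; it only searches for values of `w` and `b` that make the predictions match the known answers.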
