$\newcommand{\xv}{\mathbf{x}}
\newcommand{\Xv}{\mathbf{X}}
\newcommand{\yv}{\mathbf{y}}
\newcommand{\zv}{\mathbf{z}}
\newcommand{\av}{\mathbf{a}}
\newcommand{\Wv}{\mathbf{W}}
\newcommand{\wv}{\mathbf{w}}
\newcommand{\tv}{\mathbf{t}}
\newcommand{\Tv}{\mathbf{T}}
\newcommand{\muv}{\boldsymbol{\mu}}
\newcommand{\sigmav}{\boldsymbol{\sigma}}
\newcommand{\phiv}{\boldsymbol{\phi}}
\newcommand{\Phiv}{\boldsymbol{\Phi}}
\newcommand{\Sigmav}{\boldsymbol{\Sigma}}
\newcommand{\Lambdav}{\boldsymbol{\Lambda}}
\newcommand{\half}{\frac{1}{2}}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}$

# Assignment 1: Linear Regression

*Type your name here and DELETE ALL TEXT PROVIDED HERE THAT ARE INSTRUCTIONS TO YOU*

## Overview

Describe the objective of this assignment, and very briefly how you accomplish it.  Say things like "linear model", "samples of inputs and known desired outputs" and "minimize the sum of squared errors". DELETE THIS TEXT AND INSERT YOUR OWN.

## Method

Define in code cells the following functions as discussed in class.  Your functions' arguments and return types must be as shown here.

  * ```model = train(X, T)```
  * ```predict = use(model, X)```
  * ```error = rmse(predict, T)```
  
Let ```X``` be a two-dimensional matrix (```np.array```) with each row containing one data sample, and ```T``` be a two-dimensional matrix of one column containing the target values for each sample in ```X```.  So, ```X.shape[0]``` is equal to ```T.shape[0]```.   

Function ```train``` must standardize the input data in ```X``` and return a dictionary with  keys named ```means```, ```stds```, and ```w```.  

Function ```use``` must also standardize its input data X by using the means and standard deviations in the dictionary returned by ```train```.

Function ```rmse``` returns the square root of the mean of the squared error between ```predict``` and ```T```.

Also implement the function

   * ```model = trainSGD(X, T, learningRate, numberOfIterations)```

which performs the incremental training process described in class as stochastic gradient descent (SGC).  The result of this function is a dictionary with the same keys as the dictionary returned by the above ```train``` function.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [26]:
def train(X, T):
    #Code taken from Professor Anderson's notebook
    means = X.mean(0)
    stds = X.std(0)
    Xs = (X - means) / stds
    Xs1 = np.insert(Xs, 0, 1, 1)               
    w = np.linalg.lstsq( Xs1.T @ Xs1, Xs1.T @ T)[0]
    
    #build return dictionary
    ret = {"means" : means,
           "stds" : stds,
           "w" : w
          }
    return ret

In [75]:
def use(model, X):
    #Code taken from Professor Anderson's notebook
    newX = (X - model["means"]) / model["stds"]
    
    #insert columns into array
    newX = np.insert(newX, 0, 1, 1)
    
    #make prediction
    prediction = newX @ model["w"]
    
    return prediction

In [154]:
def rmse(predict, T):
    #find average of the difference, square it and then square root
    return np.sqrt(np.mean((predict -  T)**2))
    

In [214]:
def trainSGD(X, T, learningRate, numberOfIterations):
    
    X1 = np.insert(X, 0, 1, axis=1)
    w = np.zeros((2,1))

    # Collect the weights after each update in a list for later plotting. 
    # This is not part of the training algorithm
    ws = [w.copy()]
    
    print(X.size)
    
    #newX = np.insert(X, 0, 1, axis=1)
    step = 0
    for iter in range(numberOfIterations):
        for n in range(0,X1.size):
        
            step += 1
        
            predicted = X1[n:n+1,:] @ w  # n:n+1 is used instead of n to preserve the 2-dimensional matrix structure
            # Update w using negative derivative of error for nth sample
            w += learningRate * X1[n:n+1, :].T * (T[n:n+1, :] - predicted)
            ws.append(w.copy())
                        
    return w

## Examples

In [215]:
import numpy as np

X = np.arange(10).reshape((5,2))
T = X[:,0:1] + 2 * X[:,1:2] + np.random.uniform(-1, 1,(5, 1))
print('Inputs')
print(X)
print('Targets')
print(T)

Inputs
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
Targets
[[  2.46961891]
 [  7.44551751]
 [ 14.71266202]
 [ 19.91872468]
 [ 26.03373184]]


In [216]:
model = train(X, T)
model

{'means': array([ 4.,  5.]),
 'stds': array([ 2.82842712,  2.82842712]),
 'w': array([[ 14.11605099],
        [  4.21445775],
        [  4.21445775]])}

In [217]:
predicted = use(model, X)
predicted

array([[  2.19576439],
       [  8.15590769],
       [ 14.11605099],
       [ 20.07619429],
       [ 26.0363376 ]])

In [218]:
rmse(predicted, T)

0.43826902478089091

In [219]:
modelSGD = trainSGD(X, T, 0.01, 100)
modelSGD

10


ValueError: shapes (1,3) and (2,1) not aligned: 3 (dim 1) != 2 (dim 0)

In [8]:
predicted = use(modelSGD, X)
predicted

array([[  1.28111839],
       [  7.6025443 ],
       [ 13.92397021],
       [ 20.24539612],
       [ 26.56682203]])

In [9]:
rmse(predicted, T)

0.44179878702587055

## Data

Download ```energydata_complete.csv``` from the [Appliances energy prediction Data Set ](https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction) at the UCI Machine Learning Repository. Ignore the first column (date and time), use the next two columns as target variables, and use all but the last two columns (named rv1 and rv2) as input variables. 

In this section include a summary of this data, including the number of samples, the number and kinds of input variables, and the number and kinds of target variables.  Also mention who recorded the data and how.  Some of this information can be found in the paper that is linked to at the UCI site for this data set.  Also show some plots of target variables versus some of the input variables to investigate whether or not linear relationships might exist.  Discuss your observations of these plots.

In [220]:
file = np.genfromtxt('energydata_complete.csv', dtype='str',delimiter=',',deletechars='"')
file = np.char.replace(file, '"', '')
file = np.char.replace(file, ' ', '')
file = np.delete(file, 0,1)
file = np.delete(file, -1,1)
file = np.delete(file, -1,1)
names = file[0]
names = names.astype(np.str)
data = file[1:]
data = data.astype(np.float)

Tenergy = np.take(data,[0,1],1)
Xenergy = np.take(data,range(2,26),1)
Tnames = np.take(names,[0,1])
Xnames = np.take(names,range(2,26))

## Results

Apply your functions to the data.  Compare the error you get as a result of both training functions.  Experiment with different learning rates for ```trainSGD``` and discuss the errors.

Make some plots of the predicted energy uses and the actual energy uses versus the sample index.  Also plot predicted energy use versus actual energy use.  Show the above plots for the appliances energy use and repeat them for the lights energy use. Discuss your observations of each graph.

Show the values of the resulting weights and discuss which ones might be least relevant for fitting your linear model.  Remove them, fit the linear model again, plot the results, and discuss what you see.

In [226]:
trained = train(Xenergy,Tenergy)
#trained
used = use(trained, Xenergy)
used
rmsed = rmse(used,Tenergy)
rmsed

67.160009853101315

## Grading

Your notebook will be run and graded automatically.  Test this grading process by first downloading [A1grader.tar](http://www.cs.colostate.edu/~anderson/cs445/notebooks/A1grader.tar) and extract `A1grader.py` from it. Run the code in the following cell to demonstrate an example grading session.  You should see a perfect execution score of 70/70 if your functions are defined correctly. The remaining 30 points will be based on the results you obtain from the energy data and on your discussions.

For the grading script to run correctly, you must first name this notebook as 'Lastname-A1.ipynb' with 'Lastname' being your last name, and then save this notebook.

A different, but similar, grading script will be used to grade your checked-in notebook.  It will include additional tests.  You need not include code to test that the values passed in to your functions are the correct form.  

In [162]:
%run -i "A1grader.py"



Extracting python code from notebook and storing in notebookcode.py
Removing all statements that are not function or class defs or import statements.
Testing rmse(A, B) with
 A =
[[1 2 3]
 [4 5 6]]
 and B =
[[2 3 4]
 [5 6 7]]

--- 10/10 points. Correctly returned 1.0

Testing model = train(X, T) with
 X=
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [18 20 22]
 [24 26 28]]
 and T=
[[  0.2   1.2]
 [  5.    6. ]
 [ 11.6  12.6]
 [ 62.   64. ]
 [ 96.8  98.8]]

--- 5/5 points. Model correctly includes a key named 'means'.

--- 5/5 points. Model correctly includes a key named 'stds'.

--- 5/5 points. Model correctly includes a key named 'w'.

Testing rmse(T, use(model, X))

--- 15/15 points. Error is correctly calculated as 5.2427092232928585.

Testing model = trainSGD(X, T, 0.01, 1000) with
 X=
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [18 20 22]
 [24 26 28]]
 and T=
[[  0.2   1.2]
 [  5.    6. ]
 [ 11.6  12.6]
 [ 62.   64. ]
 [ 96.8  98.8]]

--- 0/15 points. trainSGD raised the exception
 'float' obje

## Check-in

Do not include this section in your notebook.

Name your notebook ```Lastname-A1.ipynb```.  So, for me it would be ```Anderson-A1.ipynb```.  Submit the file using the ```Assignment 1``` link on [Canvas](https://colostate.instructure.com/courses/41327).

Grading will be based on 

  * correct behavior of the required functions listed above,
  * easy to understand plots in your notebook,
  * readability of the notebook,
  * effort in making interesting observations, and in formatting your notebook.

## Extra Credit

Download a second data set and repeat all of the steps of this assignment on that data set.