## Linear Regression with the Boston Housing Dataset

In [None]:
# import required libraries

##### SOME RULES: 

- Provide some documentation to your code in the form of comments. Your code should be understandable and easy to navigate. 
- Answer the questions in the basic data analysis section by typing in the markdown cell. 
- Ensure that your code runs completely without errors and your model trains over the epochs.


### Basic Data Analysis

#### Tasks: 

- Load the dataset as a Pandas DataFrame
- Use Pandas functions to perform the following: 
    - 1) Get info about the columns: the number of entries in each column, the datatype of the entries in each column.   
    - 2) Get basic statistical estimators of each column of the data, i.e. mean, standard deviation, min, 25%, 50% and 75% quantiles, max.
    - 3) Plot histograms of all the features (columns) in data. 
    - 4) Plot scatter plots of each feature vs the target ('medv'). 
    - 5) (Optional) Check correlations of each feature with the target. Use the Pearson correlation coefficient for this. 
    - 6) (Optional) Plot a heatmap of these correlations using the 'heatmap' function of the Seaborn library. 

Note: The above tasks can be performed by simply using certain methods of the Pandas library. You are also free to make your own functions for these. But with Pandas, you can do these tasks very easily.
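As a rough illustration of the Pandas methods the note refers to, the sketch below runs them on a small synthetic frame. The frame, its values, and the subset of column names are assumptions made only so the example is self-contained; in the notebook, `df` would instead come from loading the actual dataset file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: random values under a few
# Boston-style column names. Replace with the DataFrame loaded from file.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rm": rng.normal(6.0, 0.7, size=50),
    "lstat": rng.uniform(2.0, 35.0, size=50),
    "chas": rng.integers(0, 2, size=50),
})
df["medv"] = 5.0 * df["rm"] - 0.5 * df["lstat"] + rng.normal(0.0, 1.0, size=50)

df.info()                        # entries per column and their dtypes
summary = df.describe()          # count, mean, std, min, quartiles, max
null_counts = df.isnull().sum()  # number of missing entries per column

# Pearson correlation of every feature with the target, in descending order
corr_with_target = (
    df.corr(method="pearson")["medv"].drop("medv").sort_values(ascending=False)
)
print(summary)
print(corr_with_target)
```

For the plots, `df.hist()` draws one histogram per column, `df.plot.scatter(x="rm", y="medv")` draws a single feature-vs-target scatter plot, and `seaborn.heatmap(df.corr(), annot=True)` is one way to render the correlation matrix.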

- Answer the following questions in short:
    - 1) How many samples are there? 
    - 2) How many features are there? 
    - 3) Are there any columns with non-numeric entries? 
    - 4) Are there any missing (null) entries in the data?
    - 5) From the basic statistical estimators, answer the following: 
        - Are all the features on the same scale, i.e. do they all have the same range, means and standard deviations?
        - Is the 'chas' feature categorical? 
    - 6) Observe the histograms. Are any of the distributions symmetric? Do any of the distributions show outliers? 
    - 7) Observe the scatter plots. Select four features that, based on your observation, show a strong correlation with the target 'medv'. 
    - 8) (Optional) List the correlation coefficients of all the features with the target 'medv' in descending order. 
    - 9) (Optional) From the heatmap, are any of the features strongly correlated with each other? If yes, name any two pairs of such correlated features. (Here, we do not check the correlation between a feature and the target; we check whether any of the features are strongly interdependent. Strong correlation means that the absolute value of the Pearson correlation coefficient is >= 0.5.)
    - 10) (Optional) Is having interdependent features a good thing or a bad thing for linear regression? 


Note: Correct answers to the optional tasks and questions shall be awarded bonus points. 

In [None]:
# load dataset. Ensure that the path is correct. 

### Preparing Training and Testing Data

#### Tasks:


   - 1) From the basic data analysis, list the four features that you have considered for the linear regression model. 
   - 2) Extract the data of these four features from the dataset. 
   - 3) Create X (design) matrix.
   - 4) Create y (label/target) vector. 
   - 5) Create a train-test split of the X matrix and y vector (80% training, 20% testing). Print the number of training samples and the number of test samples. 
   - 6) Scale these samples using the MinMaxScaler from the scikit-learn library.
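One possible way to carry out the split and scaling steps is sketched below. The random `X` and `y` are placeholders for the four selected feature columns and the 'medv' column, and `random_state=42` is an arbitrary choice. Fitting the scaler on the training split only and reusing it for the test split is a common convention, assumed here rather than required by the task.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder data: 100 samples, 4 features (stand-in for the real columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)

# 80 % training, 20 % testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("training samples:", X_train.shape[0])
print("test samples:", X_test.shape[0])

# Learn the per-feature min and max from the training data only,
# then apply the same transformation to the test data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

With this convention the scaled training features lie exactly in [0, 1], while scaled test features may fall slightly outside that range.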

### Model Setup and Training

#### Tasks: 

   - 1) Create a function that initializes the weights randomly. 
   - 2) Create a function that computes loss. Use MSE loss. 
   - 3) Create a function that returns the prediction vector by taking the weights and X_train matrix as input arguments. 
   - 4) Create functions for mini-batch gradient descent, stochastic gradient descent and batch gradient descent with regularization.
   - 5) Train the model.
        - Use the functions created in previous tasks for training the model. 
        - Save the training losses over all epochs. 
        - Plot the change in the losses over epochs.


Note: Choose the learning rate and regularization parameters suitably. In the loss vs. epochs plot, an overall decrease in the loss over epochs should be observable. 
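The pieces listed above could fit together roughly as in the following sketch. Several choices here are assumptions, not part of the task statement: the regularization is taken to be an L2 penalty `lam * ||w||^2` added to the MSE, the bias is handled by a column of ones appended to `X`, and a single `train` function covers all three variants through its `batch_size` argument (`None` for batch, `1` for stochastic, anything in between for mini-batch). The demo data is synthetic.

```python
import numpy as np

def init_weights(n_features, rng):
    """Draw small random initial weights."""
    return rng.normal(0.0, 0.01, size=n_features)

def predict(X, w):
    """Return the prediction vector X @ w."""
    return X @ w

def mse_loss(X, y, w, lam=0.0):
    """MSE plus an (assumed) L2 penalty lam * ||w||^2."""
    residual = predict(X, w) - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

def gradient(X, y, w, lam=0.0):
    """Gradient of mse_loss with respect to w."""
    residual = predict(X, w) - y
    return 2.0 * X.T @ residual / len(y) + 2.0 * lam * w

def train(X, y, lr=0.1, lam=0.0, epochs=100, batch_size=None, seed=0):
    """Gradient descent; batch_size=None uses the full batch, 1 gives SGD."""
    rng = np.random.default_rng(seed)
    n = len(y)
    batch_size = n if batch_size is None else batch_size
    w = init_weights(X.shape[1], rng)
    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(X[idx], y[idx], w, lam)
        losses.append(mse_loss(X, y, w, lam))   # training loss after the epoch
    return w, losses

# Synthetic demo data: 4 features in [0, 1] plus a bias column of ones
rng = np.random.default_rng(1)
X_demo = np.hstack([rng.uniform(size=(200, 4)), np.ones((200, 1))])
true_w = np.array([2.0, -1.0, 0.5, 3.0, 1.0])
y_demo = X_demo @ true_w + rng.normal(0.0, 0.1, size=200)

w_batch, loss_batch = train(X_demo, y_demo, lr=0.1, epochs=200)
w_mini, loss_mini = train(X_demo, y_demo, lr=0.1, epochs=200, batch_size=32)
w_sgd, loss_sgd = train(X_demo, y_demo, lr=0.01, epochs=200, batch_size=1)
print(loss_batch[-1], loss_mini[-1], loss_sgd[-1])
```

Each returned loss list can be passed to `matplotlib.pyplot.plot` to obtain the loss vs. epochs curve. The learning rates used in the demo are illustrative values that happen to be stable for this synthetic data; they are not a recommendation for the real dataset.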

### Test the Model Performance

#### Tasks: 
    
   - 1) Use the trained weights and the testing data to make predictions. 
   - 2) Calculate the testing loss using these predictions and the test labels. 
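The two steps above amount to one matrix product and one mean. A minimal sketch with made-up numbers follows; `w_trained`, `X_test` and `y_test` are hypothetical values chosen so the result can be checked by hand. The testing loss is computed here as plain MSE without the regularization penalty, which is an assumption about how the loss should be reported.

```python
import numpy as np

def predict(X, w):
    """Return the prediction vector X @ w."""
    return X @ w

def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical trained weights and a tiny test set
w_trained = np.array([2.0, 3.0])
X_test = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
y_test = np.array([2.0, 3.0, 4.0])

y_pred = predict(X_test, w_trained)   # predictions: 2, 3, 5
test_loss = mse(y_test, y_pred)       # squared errors 0, 0, 1 -> mean 1/3
print("testing loss:", test_loss)
```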

### Effect of Variations of Gradient Descent

#### Tasks: 
    
   - 1) Train the model using batch gradient descent. Plot the loss vs. epoch curve. Compute and print the testing loss. 
   - 2) Train the model using stochastic gradient descent. Plot the loss vs. epoch curve. Compute and print the testing loss. 
   - 3) Train the model using mini-batch gradient descent. Plot the loss vs. epoch curve. Compute and print the testing loss.

### Effect of Varying the Regularization Parameter

#### Tasks: 
    
Fix the number of epochs to 1000. Use the following regularization parameter values: [10, 1, 0.1, 0.05, 0.01, 0.005].
   - 1) Train model using batch gradient descent with the regularization parameter values in the above list. 
   - 2) Plot the loss vs. epoch curves for different regularization parameter values in one plot for comparison. 
   - 3) Compute testing losses using the weights obtained after training the model for different regularization parameter values and print them. 
   - 4) Report the learning rate and regularization parameter of the model that performs best. 
   
Note: The best-performing model is the one with the lowest testing loss. 
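A sweep over the listed regularization values might look like the sketch below. It is self-contained, so it redefines a compact batch gradient descent. The L2 form of the penalty, the synthetic data, the manual 160/40 split, and the learning rate of 0.05 are all assumptions made for illustration. The testing loss is computed without the penalty term.

```python
import numpy as np

def batch_gd(X, y, lr, lam, epochs, seed=0):
    """Batch gradient descent on MSE + lam * ||w||^2 (assumed L2 penalty)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, size=X.shape[1])
    losses = []
    for _ in range(epochs):
        residual = X @ w - y
        losses.append(np.mean(residual ** 2) + lam * np.sum(w ** 2))
        w = w - lr * (2.0 * X.T @ residual / len(y) + 2.0 * lam * w)
    return w, losses

# Synthetic stand-in for the scaled training and testing data
rng = np.random.default_rng(2)
X = np.hstack([rng.uniform(size=(200, 4)), np.ones((200, 1))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0, 1.0]) + rng.normal(0.0, 0.1, size=200)
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

lambdas = [10, 1, 0.1, 0.05, 0.01, 0.005]
lr = 0.05        # illustrative value, small enough to stay stable for lam = 10
epochs = 1000

curves, test_losses = {}, {}
for lam in lambdas:
    w, curves[lam] = batch_gd(X_train, y_train, lr, lam, epochs)
    test_losses[lam] = np.mean((X_test @ w - y_test) ** 2)
    print(f"lambda = {lam}: testing loss = {test_losses[lam]:.4f}")

best_lam = min(test_losses, key=test_losses.get)
print("best lambda:", best_lam, "with learning rate", lr)
```

Plotting every entry of `curves` on the same axes, one labelled line per value, gives the comparison plot asked for in task 2.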