# Facets of Mathematics: Regression Modelling

## End of theme assessment

Your completed assessment should consist of:

1. This completed Jupyter notebook, including all requested code.

   * **In-code comments** should be included to explain the steps of your code.

2. A LaTeX document containing the requested explanations. You can use `RM_eot.tex` as a starting point.

   * The final document must be **no longer than three pages**.

   * For each question, you should provide **explicit computations** for solving the problems. Whenever appropriate, you should also provide **theoretical justification** for your results, quoting lemmas, theorems, etc., from the lectures, tutorial worksheets and/or references.

   * The LaTeX document should showcase the **ideas** and **methodology** that go into solving the problems. For instance, some questions require the same computation to be repeated several times. In such cases, it may be desirable to describe a single example in the LaTeX document, rather than spelling out every single calculation.

   * You can get a feel for what is required by considering the markdown text in the tutorial worksheets and solutions.


## Submission

Please upload **both** your final compiled PDF **and** your completed Jupyter notebook, via Learn.

You can download your completed Jupyter notebook from Noteable using `File` $\rightarrow$ `Download as` $\rightarrow$ `Notebook (.ipynb)`.

## Marking

This assessment is marked out of 15. Marks will be based on

**10 marks** Completeness and correctness.
- Questions should be answered fully and correctly.
- The discussion should be complete, relevant, and correct.
- You should correctly describe the methods used and, where appropriate, relate the discussion to the theoretical background.
- All code should be clear, correct, and appropriately commented.

**5 marks** Presentation and use of LaTeX.
- LaTeX should be used correctly and appropriately.
- Text should be in grammatical sentences and free of typographical errors.
- Any formulae should be appropriately typeset.

A detailed mark scheme is available on Learn.

---

It may be useful to know that Python allows you to save figures as either 'png' or 'pdf' files, using one of

* `plt.savefig("figname.png", dpi=250)`
* `plt.savefig("figname.pdf", dpi=250)`

where `dpi=` specifies the resolution in dots per inch. You may want to choose another resolution, but be aware that choosing it too large will result in very large files. Since 'pdf' figures are stored as vector graphics, `dpi` only affects any rasterised elements in them.
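
If you want every figure in both formats, a small helper avoids repeating the call. This is a minimal sketch: `save_both_formats` is a hypothetical name, and `bbox_inches="tight"` (a standard `savefig` option that trims surrounding white space) is an addition not mentioned above. The demo writes into a temporary directory so it does not clutter your working folder.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np


def save_both_formats(fig, basename, dpi=250):
    """Save `fig` as <basename>.png and <basename>.pdf (hypothetical helper); return both paths."""
    paths = []
    for ext in ("png", "pdf"):
        path = f"{basename}.{ext}"
        # bbox_inches="tight" trims excess white space around the axes
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths


# Demo: a throwaway figure saved into a temporary directory
fig, ax = plt.subplots()
t = np.linspace(0, 1, 50)
ax.plot(t, t ** 2)
ax.set_title("Example figure")
outdir = tempfile.mkdtemp()
paths = save_both_formats(fig, os.path.join(outdir, "figname"))
print(paths)
```

In the notebook you would call it as, e.g., `save_both_formats(plt.gcf(), "q1_plots")` right after drawing a figure.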

---

### Data

The data relevant to the assessment are stored in the NumPy array files 'rm_eot_x.npy' (predictors) and 'rm_eot_y.npy' (responses). A description of the data is stored in the file 'README.txt'.

---

### Question 1 - Data Description

Read the data description in the 'README.txt' file. Plot the response against each of the variables `[Existence of a graduate employment programme at the school, Motivation of teaching staff, Grade average]`. Include appropriate titles, labels, and legends. Add the plots to your LaTeX write-up. In your LaTeX write-up describe the data, referencing your figures where appropriate.
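
One way to produce the three panels is sketched below, assuming `x` is a two-dimensional array with one predictor per column and `y` is a one-dimensional response. The column indices of the requested variables are not stated here and must be read from 'README.txt'; the helper name `plot_response_vs_predictors`, the indices `[0, 1, 2]` and the synthetic demo data are all hypothetical placeholders, not the assessment data.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_response_vs_predictors(x, y, columns, names, response_name="Response"):
    """Scatter the response against selected predictor columns, one panel each.

    `columns` are column indices into `x`; `names` are the matching axis labels.
    """
    fig, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 4), sharey=True)
    axes = np.atleast_1d(axes)  # also works when only one column is requested
    for ax, col, name in zip(axes, columns, names):
        ax.scatter(x[:, col], y, alpha=0.6, label="observations")
        ax.set_xlabel(name)
        ax.set_title(f"{response_name} vs {name}")
        ax.legend()
    axes[0].set_ylabel(response_name)
    fig.tight_layout()
    return fig, axes


# Synthetic stand-in data (NOT the assessment data): 100 rows, 3 predictors
rng = np.random.default_rng(0)
x_demo = np.column_stack([
    rng.integers(0, 2, 100),   # a binary indicator
    rng.uniform(0, 10, 100),   # a score
    rng.normal(60, 10, 100),   # an average
])
y_demo = rng.integers(0, 2, 100)
fig, axes = plot_response_vs_predictors(
    x_demo, y_demo, columns=[0, 1, 2],
    names=["Graduate employment programme", "Staff motivation", "Grade average"],
)
```

If the response turns out to be binary, many points will overlap; adding a little horizontal jitter or lowering `alpha` can make the plots easier to read.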

### Question 2 - Regression Analysis
Two regression models are proposed:
* A linear model: $y=\beta_0+\sum_j \beta_jx_j$.
* A logistic model: $y=g^{-1}\left(\beta_0+\sum_j \beta_jx_j\right)$ with $g(p)=\log\left(\frac{p}{1-p}\right)$.
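
For the write-up it can help to state the inverse link explicitly. Writing $\eta=\beta_0+\sum_j\beta_jx_j$ for the linear predictor and solving $g(p)=\eta$ for $p$ gives:

```latex
\log\left(\frac{p}{1-p}\right) = \eta
\iff \frac{p}{1-p} = e^{\eta}
\iff p = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}},
\qquad\text{so}\qquad
g^{-1}(\eta) = \frac{1}{1+e^{-\eta}}.
```

Hence the logistic model's fitted values always lie in $(0,1)$, whereas the linear model's fitted values are unrestricted.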

Fit both models to obtain estimates $\hat\beta_{\mathrm{lin}}$ (for the linear model) and $\hat\beta_{\mathrm{log}}$ (for the logistic model). If necessary, you may use the function IRLS in the file 'IRLS.py'. In your write-up, include the respective estimates, and describe how you calculated them.
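
The sketch below shows one way to compute both estimates, assuming `x` is an $(n,k)$ array without an intercept column and `y` is a 0/1 response. The linear estimate solves $\min_\beta\|y-X\beta\|^2$, i.e. the normal equations $X^\top X\beta=X^\top y$, here via `np.linalg.lstsq` for numerical stability. The logistic estimate uses the IRLS update $\beta\leftarrow\beta+(X^\top WX)^{-1}X^\top(y-p)$ with $W=\mathrm{diag}(p_i(1-p_i))$, which is algebraically the same as the weighted least-squares step with working response $z=\eta+W^{-1}(y-p)$. Because the interface of `IRLS` in 'IRLS.py' is not shown here, this is a self-contained stand-in, not that function; `add_intercept`, `fit_linear` and `fit_logistic_irls` are hypothetical names and the demo data are synthetic.

```python
import numpy as np


def add_intercept(x):
    """Prepend a column of ones so that beta[0] plays the role of beta_0."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(x.shape[0]), x])


def fit_linear(x, y):
    """Ordinary least squares: minimise ||y - X beta||^2 (hypothetical helper)."""
    X = add_intercept(x)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_logistic_irls(x, y, tol=1e-10, max_iter=100):
    """Logistic regression by IRLS (hypothetical stand-in for IRLS.py).

    Written in the Newton / Fisher-scoring form, which coincides with IRLS
    for the canonical logit link.
    """
    X = add_intercept(x)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))                  # inverse logit g^{-1}(eta)
        w = p * (1.0 - p)                               # diagonal of W
        XtW = X.T * w                                   # X^T W without forming W
        step = np.linalg.solve(XtW @ X, X.T @ (y - p))  # IRLS / Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic demo data (NOT the assessment data)
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(500, 2))
eta_true = 0.5 + x_demo @ np.array([1.0, -1.0])
y_demo = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

beta_lin = fit_linear(x_demo, y_demo)
beta_log = fit_logistic_irls(x_demo, y_demo)

# Fitted values for both models
fitted_lin = add_intercept(x_demo) @ beta_lin
fitted_log = 1.0 / (1.0 + np.exp(-(add_intercept(x_demo) @ beta_log)))
```

At convergence the logistic estimate satisfies the score equations $X^\top(y-\hat p)=0$, which is a convenient check to quote in the write-up.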

Create a plot with the data and the fitted values (for both models). Include appropriate titles, labels, and legends. Add the plots to your write-up, adding captions and references where appropriate.
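
With several predictors there is no single natural x-axis, so one option (a judgment call, not prescribed by the task) is to plot against the observation index, ordered by one model's fitted values so the curves are readable. The helper `plot_fits` and the placeholder fitted values below are hypothetical; in the notebook you would pass your own fitted values for the two models.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_fits(y, fits, title="Observed responses and fitted values"):
    """Scatter observations and overlay fitted values from several models.

    `fits` maps a model name to its fitted values. Observations are ordered
    by the fitted values of the first model in `fits`.
    """
    y = np.ravel(y)
    names = list(fits)
    order = np.argsort(np.ravel(fits[names[0]]))
    idx = np.arange(len(y))
    fig, ax = plt.subplots()
    ax.scatter(idx, y[order], s=12, color="grey", alpha=0.6, label="observed $y$")
    for name in names:
        ax.plot(idx, np.ravel(fits[name])[order], label=name)
    ax.set_xlabel("Observation (sorted by first model's fitted value)")
    ax.set_ylabel("Response")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Placeholder values only, to show the call
rng = np.random.default_rng(2)
y_demo = rng.integers(0, 2, 50)
fits_demo = {
    "linear": rng.uniform(-0.2, 1.2, 50),
    "logistic": rng.uniform(0.0, 1.0, 50),
}
fig, ax = plot_fits(y_demo, fits_demo)
```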

Compute the root mean squared error for both models, and include these in your write-up.
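
For $n$ observations, $\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat y_i)^2}$. As a small worked example with $y=(1,0,1,1)$ and $\hat y=(0.8,0.2,0.6,0.9)$, the residuals are $(0.2,-0.2,0.4,0.1)$, their squares sum to $0.25$, the mean is $0.0625$ and the RMSE is $0.25$. The sketch below (the name `rmse` is illustrative) reproduces this and refuses inputs of different lengths, which guards against accidental broadcasting.

```python
import numpy as np


def rmse(obs, fit):
    """Root mean squared error: sqrt((1/n) * sum_i (obs_i - fit_i)^2)."""
    obs = np.ravel(np.asarray(obs, dtype=float))
    fit = np.ravel(np.asarray(fit, dtype=float))
    if obs.shape != fit.shape:
        raise ValueError("obs and fit must contain the same number of values")
    return float(np.sqrt(np.mean((obs - fit) ** 2)))


print(rmse([1, 0, 1, 1], [0.8, 0.2, 0.6, 0.9]))  # approximately 0.25
```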

### Question 3 - Model Comparison
Add a brief discussion to your write-up arguing which model is preferable. Give both a qualitative and a quantitative reason.
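
A compact way to gather one quantitative diagnostic (the RMSE) and one qualitative one (whether fitted values can be read as probabilities) is sketched below. It assumes the response is coded 0/1, as the logistic model's range $(0,1)$ suggests; check this against 'README.txt'. The helper `compare_models` and the four-point toy arrays are hypothetical and say nothing about the actual assessment results.

```python
import numpy as np


def compare_models(y, fitted_lin, fitted_log):
    """Return RMSE and the fraction of fitted values outside [0, 1] for each model."""
    y = np.ravel(y)

    def rmse(fit):
        return float(np.sqrt(np.mean((y - np.ravel(fit)) ** 2)))

    def frac_outside(fit):
        fit = np.ravel(fit)
        # fitted values below 0 or above 1 cannot be interpreted as probabilities
        return float(np.mean((fit < 0) | (fit > 1)))

    return {
        "linear": {"rmse": rmse(fitted_lin), "frac_outside_[0,1]": frac_outside(fitted_lin)},
        "logistic": {"rmse": rmse(fitted_log), "frac_outside_[0,1]": frac_outside(fitted_log)},
    }


# Toy illustration only
res = compare_models([0, 1, 1, 0], [-0.1, 0.9, 1.2, 0.3], [0.1, 0.8, 0.9, 0.2])
print(res)
```

A lower RMSE alone need not settle the question; the qualitative point that $g^{-1}$ keeps every fitted value in $(0,1)$ is often the stronger argument for a binary response.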



In [1]:
# Load the necessary libraries
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Some settings to make the plots look nice and big
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 120
plt.rcParams['lines.markersize'] = 7
plt.rcParams['lines.linewidth'] = 2

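# Define a function to quickly calculate the root mean squared error (RMSE).
# Both inputs are flattened so that an (n, 1) array and an (n,) array are not
# broadcast against each other into an (n, n) array of differences.
def RMSE(obs, fit):
    obs = np.ravel(obs)
    fit = np.ravel(fit)
    return np.sqrt(np.mean((obs - fit)**2))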

# Run the IRLS Python file to gain access to the Iteratively Reweighted Least Squares (IRLS) algorithm
%run IRLS.py

# Load the data, which is stored in NumPy arrays
x = np.load('rm_eot_x.npy')  # predictors
y = np.load('rm_eot_y.npy')  # responses
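
Before fitting anything, it is worth checking what `np.load` returned: the number of rows of `x` and `y` should agree, and the distinct values of `y` reveal whether the response is binary. The helper `summarise_data` is a hypothetical name; the demo saves and reloads small synthetic arrays (not the assessment files) to mirror the loading step above. In the notebook you would simply call `summarise_data(x, y)`.

```python
import os
import tempfile

import numpy as np


def summarise_data(x, y):
    """Basic facts worth checking after loading: shapes and response values."""
    x = np.asarray(x)
    y = np.asarray(y)
    return {
        "n_obs": x.shape[0],
        "n_predictors": x.shape[1] if x.ndim == 2 else 1,
        "y_shape": y.shape,                       # (n,) versus (n, 1) matters for broadcasting
        "rows_match": x.shape[0] == y.shape[0],
        "y_values": np.unique(y)[:10],            # first few distinct response values
    }


# Demo with synthetic arrays written to and read back from .npy files
tmp = tempfile.mkdtemp()
np.save(os.path.join(tmp, "demo_x.npy"), np.arange(15.0).reshape(5, 3))
np.save(os.path.join(tmp, "demo_y.npy"), np.array([0, 1, 1, 0, 1]))
x_demo = np.load(os.path.join(tmp, "demo_x.npy"))
y_demo = np.load(os.path.join(tmp, "demo_y.npy"))
summary = summarise_data(x_demo, y_demo)
print(summary)
```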