# Hand in 1 - 2019 Machine Learning Class
For the first hand in you will implement Logistic Regression and Softmax Regression for classification.
The descriptions below describe what you are meant to do and hand in. 

**Start early, use the Study Cafe and the discussion board, check your shapes, and try to deal with numerical issues.**

## Logistic Regression
### Implementing Logistic Regression
In this exercise you must implement logistic regression and test it on the text classification task.
We have provided starter code in the file **logistic_regression.py**. 
Here you should complete the following methods to implement a Logistic Regression Classifier. 

* logistic 
* predict 
* score
* cost_grad
* fit

where *predict, score, cost_grad, fit* are methods of the classifier class you must complete.
The interface for each function is described in the file. 

All needed equations can be found in the slides.
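
As a point of reference, here is a minimal NumPy sketch of the three simplest pieces. They are written as free functions rather than as methods of the classifier class in **logistic_regression.py**. The sketch assumes labels in $\{0, 1\}$, a data matrix `X` of shape $n \times d$ (with any bias column already included), and a weight vector `w` of length $d$. The starter code's actual signatures may differ, so follow the interface described in the file. The plain formula in `logistic` can raise overflow warnings for very negative inputs; see the remarks on numerical issues.

```python
import numpy as np

def logistic(z):
    """Elementwise logistic (sigmoid) function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w):
    """Predict 0/1 labels: class 1 when the estimated P(y = 1 | x) exceeds 1/2."""
    return (logistic(X @ w) > 0.5).astype(int)

def score(X, y, w):
    """Accuracy: the fraction of samples whose predicted label equals y."""
    return np.mean(predict(X, w) == y)

# Tiny usage example (the first column of X is a constant bias feature).
X = np.array([[1.0, 2.0], [1.0, -3.0]])
w = np.array([0.0, 1.0])
print(predict(X, w))                  # [1 0]
print(score(X, np.array([1, 0]), w))  # 1.0
```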

You can test your implementation by running *python logistic_regression.py*. 
This is a small non-exhaustive test. You should consider writing your own test cases.  
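
A finite-difference gradient check is a common way to write such a test. The sketch below uses a hypothetical helper `check_gradient`, which is not part of the starter code. It accepts any function that returns `(cost, grad)`, so you can wrap your `cost_grad` method in a lambda to use it. The helper handles both weight vectors and weight matrices, so the same check also works for softmax.

```python
import numpy as np

def check_gradient(cost_grad, w, eps=1e-6):
    """Compare the analytic gradient from cost_grad with central finite differences.

    cost_grad(w) must return (cost, grad) with grad of the same shape as w.
    Returns the largest absolute difference between the two gradients.
    """
    _, grad = cost_grad(w)
    num_grad = np.zeros_like(w, dtype=float)
    for i in np.ndindex(w.shape):               # works for vectors and matrices
        step = np.zeros_like(w, dtype=float)
        step[i] = eps
        cost_plus, _ = cost_grad(w + step)
        cost_minus, _ = cost_grad(w - step)
        num_grad[i] = (cost_plus - cost_minus) / (2 * eps)
    return np.max(np.abs(num_grad - grad))

# Toy check on f(w) = sum(w^2), whose gradient is 2w: the result should be close to 0.
quadratic = lambda w: (np.sum(w ** 2), 2 * w)
print(check_gradient(quadratic, np.array([1.0, -2.0, 0.5])))
```

A large result usually points to a wrong sign, a missing $1/n$ factor, or a transposed matrix in the gradient.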

Cost and gradient computations sometimes suffer from numerical issues if we are not careful: exponentiating large numbers can overflow, and taking the log of very small numbers can produce $-\infty$. You can make the algorithm more numerically stable by not computing quantities you do not need. The algorithm should work well enough even if you are not so careful, and making the ultimate numerically stable algorithm is not a requirement for passing the hand in.
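
Below is a minimal sketch of a more stable cost and gradient. It makes three assumptions: labels are in $\{0, 1\}$, the cost is averaged over the $n$ samples (your slides may use the sum instead), and the computation is a free function rather than the `cost_grad` method. The sketch never evaluates $\log \sigma(z)$ directly. Instead, it uses the identities $-\log \sigma(z) = \log(1 + e^{-z})$ and $-\log(1 - \sigma(z)) = \log(1 + e^{z})$, which `np.logaddexp` computes without overflow. The gradient is $-\frac{1}{n} X^\top (y - \sigma(Xw))$.

```python
import numpy as np

def cost_grad(X, y, w):
    """Mean negative log-likelihood of logistic regression and its gradient.

    Assumes labels y in {0, 1}, X of shape (n, d) and w of shape (d,).
    """
    z = X @ w
    # -log(sigma(z)) = log(1 + e^{-z}) and -log(1 - sigma(z)) = log(1 + e^{z});
    # np.logaddexp(0, t) evaluates log(e^0 + e^t) without overflowing for large t.
    cost = np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))
    # sigma(z) = exp(-log(1 + e^{-z})): underflows harmlessly to 0 instead of overflowing.
    sigma = np.exp(-np.logaddexp(0, -z))
    grad = -X.T @ (y - sigma) / X.shape[0]
    return cost, grad

X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
cost, grad = cost_grad(X, y, np.zeros(2))
print(cost)  # log(2) ≈ 0.6931 when all weights are zero
```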

### Applying Logistic Regression 
**Run python logistic_test.py** and check your in-sample and test accuracy on the text classification of industry codes (real data).

The code automatically saves the generated plot so you can include it in your report. With a correct implementation and suitable settings of learning rate, batch_size and epochs, **you should get above 95 percent test accuracy.**
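
The learning rate, batch size and number of epochs are the parameters of mini-batch gradient descent, which is what `fit` performs. The sketch below shows one way to structure it. It assumes a fixed learning rate and a fresh random shuffle every epoch, and it includes a compact copy of a stable logistic cost so that the block runs on its own. The function names, the defaults (`lr=0.1`, `batch_size=16`, `epochs=10`) and the toy data are illustrative assumptions, not the settings used by **logistic_test.py**.

```python
import numpy as np

def cost_grad(X, y, w):
    """Mean logistic-regression NLL and gradient (labels in {0, 1})."""
    z = X @ w
    cost = np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))
    sigma = np.exp(-np.logaddexp(0, -z))       # stable logistic(z)
    return cost, -X.T @ (y - sigma) / X.shape[0]

def fit(X, y, w, lr=0.1, batch_size=16, epochs=10, seed=0):
    """Mini-batch gradient descent; returns final weights and per-epoch cost."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    history = []
    for _ in range(epochs):
        perm = rng.permutation(n)              # fresh random order each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            _, grad = cost_grad(X[idx], y[idx], w)
            w = w - lr * grad                  # step against the mini-batch gradient
        history.append(cost_grad(X, y, w)[0])  # full-data cost after the epoch
    return w, history

# Toy data: label 1 when x1 + x2 > 0; a constant 1 column acts as the bias.
rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 2))
X = np.hstack([np.ones((200, 1)), pts])
y = (pts[:, 0] + pts[:, 1] > 0).astype(float)
w, history = fit(X, y, np.zeros(3))
print(history[0], history[-1])  # the cost typically drops across epochs
```

Recording the full-data cost after every epoch, as `history` does here, gives a cheap learning curve for tuning the learning rate. If the curve oscillates or grows, the learning rate is probably too large.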



### Report
Add a section called "PART I: Logistic Regression" with subsections "Code" and "Theory" to your report. In the code subsection you should have the following subsubsections

* Summary and Results: 
 Include the plot generated by logistic_test and include the in sample and test accuracy you achieve.
 Add at most two lines explaining the plot(s) and comment on anything you believe stands out.
 Explain if anything does not work.
* Actual Code: Include the code for **cost_grad** and **fit** in your hand in (for instance using the verbatim environment in LaTeX).

Furthermore, you must answer the following three theoretical questions.

### Theoretical Questions

1. What is the running time of your mini-batch gradient descent algorithm?
  
  The parameters:
  * **n**: number of training samples
  * **d**: dimensionality of training samples
  * **epochs**: number of epochs run
  * **mini_batch_size**: batch_size for mini_batch_gradient_descent
  
  Give both the time to compute the cost and the time to compute the gradient in log_cost.
  You can assume that multiplying an $a \times b$ matrix with a $b \times c$ matrix takes $O(abc)$ time.


2. Sanity Check:

Assume you are using Logistic Regression to classify images of cats and dogs.
What happens if we randomly permute the pixels in each image (with the same permutation for every image) before we train the classifier? Will we get a classifier that is better than, worse than, or the same as if we used the raw data? Give a short explanation (at most three sentences).
  HINT: The location of pixels relative to each other seems to hold some kind of information. Does a random permutation of all pixel positions affect this locality? Does the model we use exploit pixel locality? For inspiration, see the visualization of the softmax model applied to digits.

3. Linear Separable Data:

If the data is linearly separable, what happens
to the weights when we implement logistic regression with gradient
descent? That is, what do the weights that minimize the negative log-likelihood look like?
You may assume that we have full precision (that is, ignore floating-point errors) and that we can run gradient descent as long as we want (i.e., what happens to the weights in the limit). 

Do they converge to some fixed number (fluctuate around it) or do they
keep increasing in magnitude (absolute value)?

Give a short explanation for your answer. You may include math if it helps (at most 5 lines).

 


## Softmax Regression


### Implementing Softmax
In "*softmax.py*" you must complete the implementation of 

* softmax
* predict
* score
* cost_grad
* fit

The interface for each function is described in the file. All needed equations can be found in the slides and in the softmax notebook.
Note that we have added a helper function **def one_in_k_encoding(vec, k)** that encodes a vector of length $n$ of integer labels in $0,\dots, k-1$ as an $n \times k$ matrix of one-in-k encoded labels.
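
The sketch below shows an encoding equivalent to what the helper returns. The provided implementation may be written differently, and `vec` is assumed to be an integer NumPy array.

```python
import numpy as np

def one_in_k_encoding(vec, k):
    """Map n integer labels in 0..k-1 to an n x k matrix with a single 1 per row."""
    n = vec.shape[0]
    enc = np.zeros((n, k))
    enc[np.arange(n), vec] = 1   # row i gets its 1 in column vec[i]
    return enc

print(one_in_k_encoding(np.array([0, 2, 1]), 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```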


You can test your implementation by running "*python softmax.py*". 
This is a small non-exhaustive test. You should consider writing your own test cases.

As for Logistic Regression, softmax sometimes suffers from numerical issues if we are not careful: exponentiating large numbers can overflow, and taking the log of very small numbers can produce $-\infty$. You can make the implementation quite numerically stable by using the trick provided in the note and by not taking the log of numbers you do not need!
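
The sketch below computes a stable softmax cost and gradient. It assumes the trick in the note is the usual one: subtract each row's maximum before exponentiating, which leaves the softmax unchanged. It also assumes one-in-k labels in an $n \times K$ matrix `Y`, weights `W` of shape $d \times K$, and a cost averaged over the $n$ samples. Computing log-probabilities as $z_j - \log\sum_k e^{z_k}$ avoids taking the log of a probability that has underflowed to zero. These are free functions for illustration, not the methods in **softmax.py**.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax of an (n, K) matrix using the max-subtraction trick."""
    Z = Z - Z.max(axis=1, keepdims=True)   # largest exponent in each row is now 0
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def cost_grad(X, Y, W):
    """Mean softmax negative log-likelihood and its gradient.

    Assumes X: (n, d), one-in-k labels Y: (n, K), weights W: (d, K).
    """
    n = X.shape[0]
    Z = X @ W
    Z = Z - Z.max(axis=1, keepdims=True)
    # log softmax(z)_j = z_j - log(sum_k e^{z_k}); no log of a tiny probability is taken
    log_probs = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    cost = -np.sum(Y * log_probs) / n      # Y keeps only the true-class term per row
    grad = -X.T @ (Y - softmax(Z)) / n     # shape (d, K), same as W
    return cost, grad

def predict(X, W):
    """Predicted class = column with the largest score (softmax is monotone)."""
    return np.argmax(X @ W, axis=1)

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.0]])
Y = np.eye(3)[[0, 2, 1]]                   # one-in-k labels for classes 0, 2, 1
cost, grad = cost_grad(X, Y, np.zeros((2, 3)))
print(cost)  # log(3) ≈ 1.0986 when all weights are zero
```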

### Applying Softmax
- Run **python softmax_test.py -wine** to test your implementation on the wine data set.
The built-in Python implementation we tested in week one got above 90 percent test accuracy, so your implementation should as well.

- Run **python softmax_test.py -show_digits** to show a small subset of the MNIST digits data set, a data set for Optical Character Recognition.
- Run **python softmax_test.py -digits** to run your classifier on MNIST digits; the generated plot is automatically saved.
- Run **python softmax_test.py -visualize** to visualize a classifier trained on MNIST digits; the generated plot is automatically saved.

You can tune the epochs, mini_batch_size, and initial learning rate from the command line i.e.

**python softmax_test.py -show_digits -epochs 100 -lr 0.42 -bs 666**

but the provided values should work well enough.

### Report
Add a section "Part II: Softmax" with subsections "code" and "theory" to your report. 
In the "code" subsection **you should do the same 2 points as you did for logistic regression**.

Include the plots generated by softmax_test and remember to include the in sample and test accuracy achieved.

There is a single theory question specified in the next section. 


### Theoretical Question(s):
Assume that you use your softmax implementation on a problem with $K$ classes, with $n$, $d$, epochs, and batch_size defined as for logistic regression.
* What is the running time of your softmax implementation, i.e., how long does your implementation of cost_grad take to compute the cost and the gradient?


# Uploading to Blackboard
Make a zip archive of the two code files **logistic_regression.py and softmax.py**.

Upload one PDF with the report to Blackboard together with the zip file.

**Make sure you upload the PDF as a separate file!**

**Remember to put your names and student IDs inside the PDF report!**

**The PDF should be at most 5 pages!**