# Model 1 **Logistic regression using Scikit-learn** 

##  Logistic regression

Logistic regression is named for the function at the core of the method, the logistic function. In linear regression, the outcome (dependent variable) is continuous and can take any of an infinite number of values. In logistic regression, the outcome has only a limited number of possible values, so the method is used when the response variable is categorical.
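
As a minimal sketch of the logistic function mentioned above (the helper name `logistic` is ours, not part of the project), the following shows how any real-valued score is squashed into the interval (0, 1), which is what lets the output be read as a class probability:

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into [0, 1]."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

for z in (-4.0, 0.0, 4.0):
    print(z, round(logistic(z), 3))
```

A score of 0 maps to 0.5, and scores far from 0 approach 0 or 1.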

### Logistic regression architecture in Scikit-learn:
```
Tolerance for stopping criteria = 0.0001
Maximum number of iterations = 1000

```
Number of inputs = 1024, corresponding to the flattened 32x32 image.

Number of outputs = 43, one for each class.
```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=1000, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
```
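
The configuration above can be sketched roughly as follows. This is not the project's training code: the data is random stand-in data (an assumption), and because newer scikit-learn releases deprecate the `multi_class` argument, the sketch obtains the one-vs-rest behaviour by wrapping the estimator in `OneVsRestClassifier` instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_classes, n_features = 43, 1024

# Synthetic stand-in data (assumption): two random "flattened images" per class.
y = np.repeat(np.arange(n_classes), 2)
X = rng.random((y.size, n_features))

# Same solver, penalty, tolerance and iteration limit as listed above.
base = LogisticRegression(C=1.0, penalty='l2', solver='liblinear',
                          tol=0.0001, max_iter=1000)
clf = OneVsRestClassifier(base)  # one binary classifier per class
clf.fit(X, y)

proba = clf.predict_proba(X)
print(proba.shape)  # one row per sample, one column per class
```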


---

### Model Training and Evaluation

For **model 1**, run the following command:
```
python app.py train -m model_1 -d images/train
```
The output will be the accuracy of model 1:
```
Model is logistic regression with scikit-learn
Train accuracy of model 1 is: 85.77405857740585
```
The model reaches an accuracy of **85.77%**, which the command reports as train accuracy.

---

### Testing the Model using the Test Set

For **model 1**, run the following command:
```
python app.py test -m model_1 -d images/test
```
The output will be the accuracy of model 1:
```
Model is logistic regression with scikit-learn
Test accuraccy of model 1 is: 89.03654485049833
```
The model reaches an accuracy of **89.04%** on the test set.
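
For reference, the reported percentage is simply the share of correctly predicted labels multiplied by 100. A minimal sketch with made-up labels (the function name and the toy data are assumptions, not project output):

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

# Toy labels for illustration only: three of four predictions are correct.
y_true = [0, 1, 2, 2]
y_pred = [0, 1, 2, 1]
print(accuracy_percent(y_true, y_pred))  # 75.0
```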

---
## Load the trained model

The trained model is stored in **'models/model1/saved'**. The name of the model file is **log_reg_sci(model_1).p**.
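
The `.p` file is a pickle. As a rough sketch of how such a file is written and read back, the example below round-trips a stand-in object through a temporary directory; the dictionary and the file name `example_model.p` are hypothetical and only illustrate the mechanism:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained classifier object.
model = {"name": "stand-in for a trained classifier"}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "example_model.p")  # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)        # save
    with open(model_path, "rb") as f:
        restored = pickle.load(f)    # load

print(restored == model)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.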

In [None]:
# Importing Python libraries
import pickle
import numpy as np
import matplotlib.pyplot as plt
import random
import cv2
import csv

### Variables declaration

In [None]:
# Module-level containers filled by load_test_data()
X_validation = []  # test images used to evaluate the model
y_validation = []  # labels for the test images

### Step 1: Load the test set to evaluate model 1

In [None]:
from sklearn.metrics import accuracy_score

def load_test_data():
    """Read the test CSV and fill X_validation and y_validation."""
    global X_validation, y_validation
    path = '../images/test'
    # Each CSV row holds an image file name and its class label, separated by ';'
    with open(path + '/test_file.csv') as csvfile:
        readCSV = csv.reader(csvfile, delimiter=';')
        for row in readCSV:
            im = cv2.imread(path + '/' + str(row[0]), 0)  # flag 0 loads the image as grayscale
            im = cv2.resize(im, (32, 32))
            X_validation.append(im)
            y_validation.append(int(row[1]))

    X_validation = preprocess(X_validation, "Validation dataset")
    X_validation = np.array(X_validation)  # Convert to numpy array
    y_validation = np.array(y_validation)  # Convert to numpy array
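
The loader above expects each CSV row to hold a file name and an integer label separated by `;`. A small sketch of that parsing step, using an in-memory CSV with hypothetical file names and labels instead of the real `test_file.csv`:

```python
import csv
import io

# Hypothetical rows in the "file name;label" layout read by the loader.
sample_csv = "00000.ppm;16\n00001.ppm;1\n00002.ppm;38\n"

file_names, labels = [], []
for row in csv.reader(io.StringIO(sample_csv), delimiter=';'):
    if len(row) < 2:          # skip blank or malformed rows
        continue
    file_names.append(row[0])
    labels.append(int(row[1]))

print(file_names)
print(labels)
```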


### Step 2: Preprocess the images

In this step, we will apply several preprocessing steps to the input images to achieve the best possible results.

**We will use the following preprocessing techniques:**
1. Grayscaling.
2. Local Histogram Equalization.
3. Normalization.

**1. Grayscaling**: We use `OpenCV` to load the images in grayscale.

**2. Local Histogram Equalization**: Spreads out the most frequent intensity values in an image, which enhances images with low contrast.

**3. Normalization**: Changes the range of pixel intensity values. Here, pixel values are divided by 255 so that they fall in [0, 1].
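
To illustrate what steps 2 and 3 do to pixel values, here is a simplified NumPy sketch. Note the assumption: it applies *global* histogram equalization over the whole image, not the local (neighbourhood-based) variant used in this notebook, and the function names are ours.

```python
import numpy as np

def equalize_global(image):
    """Global histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest pixel present
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to spread out
        return image.copy()
    # Lookup table mapping each old intensity to its equalized intensity.
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[image]

def normalize(image):
    """Scale uint8 pixel values to the [0, 1] range."""
    return image / 255.0

# Low-contrast toy image: all values sit between 100 and 103.
img = np.array([[100, 101], [102, 103]], dtype=np.uint8)
eq = equalize_global(img)
norm = normalize(eq)
print(eq)
print(norm.min(), norm.max())
```

After equalization the four nearly identical intensities are spread across the full 0 to 255 range, and normalization then maps them into [0, 1].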

In [None]:
import skimage.morphology as morp
from skimage.filters import rank

def local_histo_equalize(image):
    """
    Apply local histogram equalization to grayscale images.
        Parameters:
            image: A grayscale image.
    """    
    kernel = morp.disk(30)
    # Passed positionally: newer scikit-image releases renamed `selem` to `footprint`
    img_local = rank.equalize(image, kernel)
    return img_local

def image_normalize(image):
    """
    Normalize images to [0, 1] scale.
        Parameters:
            image: An np.array compatible with plt.imshow.
    """
    image = np.divide(image, 255)
    return image

def preprocess(dataset,text):    
    global sample_idx   # For plotting purposes
    sample_idx = np.random.randint(len(dataset), size=18)  
    # Local Histogram Equalization 
    equalized_images = list(map(local_histo_equalize, dataset))
    # Normalization 
    normalized_images = list(map(image_normalize, equalized_images))
    return normalized_images

### Testing model 1 with images from the test folder

In [None]:
load_test_data()
# Flatten each image: (n_samples, 32, 32) -> (n_samples, 1024)
nsamples, nx, ny = X_validation.shape
X_validation = X_validation.reshape((nsamples, nx*ny))

# Load the model from the folder model1/saved/
with open('../models/model1/saved/log_reg_sci(model_1).p', 'rb') as model_file:
    loaded_model = pickle.load(model_file)
predicted = loaded_model.predict(X_validation)
print("Test accuraccy of model 1 is: "+str(accuracy_score(y_validation, predicted)*100))
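
The flattening step matters because the classifier expects one row of 1024 features per image, in the same pixel order that was used for training. A short sketch with a hypothetical batch of three images shows that `reshape` keeps each image's pixels in row-major order:

```python
import numpy as np

# Hypothetical batch: three 32x32 "images" filled with consecutive numbers.
images = np.arange(3 * 32 * 32, dtype=np.float64).reshape(3, 32, 32)

# -1 lets NumPy infer the feature count (32 * 32 = 1024).
flat = images.reshape(len(images), -1)

print(flat.shape)
# Each flattened row equals the image read row by row.
print(np.array_equal(flat[1], images[1].ravel()))
```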
