# Predicting with Linear SVR

This notebook predicts meter readings with a linear support vector regressor (scikit-learn's `LinearSVR`).

Before running it, change the paths in the second cell to point to where you store the CSV files.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVR

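### Import data and train the model

First, we load the preprocessed training features, the training targets, the preprocessed test features, and the test row identifiers from their CSV files.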

In [2]:
train_preprocessed = pd.read_csv("../CSV/train_preprocessed_knn7.csv")
train_target = pd.read_csv("../CSV/train_target_knn7.csv")

test_preprocessed = pd.read_csv("../CSV/test_preprocessed_knn7_with_buildingId.csv")
test_row = pd.read_csv("../CSV/test_row_knn7_with_buildingId.csv")
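The four paths above share the same directory, so changing it means editing four strings. One option is to derive every path from a single directory variable and fail early if a file is missing. This is a minimal sketch, not the notebook's code; `DATA_DIR`, `CSV_FILES`, `csv_paths`, and `load_inputs` are hypothetical names.

```python
from pathlib import Path

import pandas as pd

# Hypothetical: the single place to edit when the CSV files live elsewhere
DATA_DIR = Path("../CSV")

# Hypothetical mapping from variable name to the file names used in the cell above
CSV_FILES = {
    "train_preprocessed": "train_preprocessed_knn7.csv",
    "train_target": "train_target_knn7.csv",
    "test_preprocessed": "test_preprocessed_knn7_with_buildingId.csv",
    "test_row": "test_row_knn7_with_buildingId.csv",
}


def csv_paths(data_dir: Path = DATA_DIR) -> dict:
    """Build the full path of every input file under data_dir."""
    return {name: data_dir / fname for name, fname in CSV_FILES.items()}


def load_inputs(data_dir: Path = DATA_DIR) -> dict:
    """Read every input CSV, raising a clear error if any file is missing."""
    paths = csv_paths(data_dir)
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")
    return {name: pd.read_csv(path) for name, path in paths.items()}
```

Usage would look like `frames = load_inputs()` followed by `train_preprocessed = frames["train_preprocessed"]`, and so on for the other three frames.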

In [3]:
# Downcast each numeric column to the smallest dtype whose range holds the column's
# min and max, to reduce the DataFrame's memory footprint.
# Caveat: float16 keeps only about 3 significant decimal digits, so float columns
# downcast to float16 lose precision.
def reduce_mem_usage(df, verbose=True):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2  # memory in MB
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                # Integer column: narrowest integer type whose range contains [c_min, c_max]
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)
            else:
                # Float column: narrowest float type whose range contains [c_min, c_max]
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose:
        print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(
            end_mem, 100 * (start_mem - end_mem) / start_mem))
    return df
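`reduce_mem_usage` can downcast float columns all the way to float16, which keeps only about 3 significant decimal digits. For comparison, pandas has a built-in downcast, `pd.to_numeric(series, downcast=...)`, that picks the smallest integer dtype (down to int8) or float dtype (down to float32) for a column. The sketch below is a swapped-in alternative, not the notebook's method; `downcast_numeric` is a hypothetical helper and the demo frame is made up.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast via pd.to_numeric.

    Integer columns go to the smallest signed integer dtype (down to int8);
    float columns go to the smallest float dtype pandas allows here (float32).
    """
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Made-up demo frame: a small-range integer column and a float column
# whose values (multiples of 0.25) are exactly representable in float32
demo = pd.DataFrame({
    "int_col": np.arange(100, dtype=np.int64),
    "float_col": np.arange(100, dtype=np.float64) * 0.25,
})
small = downcast_numeric(demo)
print(small.dtypes)
print(demo.memory_usage().sum(), "->", small.memory_usage().sum(), "bytes")

# float16, by contrast, can only approximate 0.1 to about 3 significant digits
print(float(np.float16(0.1)))
```

The trade-off is memory versus precision: stopping at float32 saves less memory than float16 but keeps roughly 7 significant digits.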

In [4]:
# Downcast all four DataFrames to reduce their memory usage
train_preprocessed = reduce_mem_usage(train_preprocessed)
train_target = reduce_mem_usage(train_target)
test_preprocessed = reduce_mem_usage(test_preprocessed)
test_row = reduce_mem_usage(test_row)

Mem. usage decreased to 578.39 Mb (75.0% reduction)
Mem. usage decreased to 77.12 Mb (50.0% reduction)
Mem. usage decreased to 1192.98 Mb (75.0% reduction)
Mem. usage decreased to 159.06 Mb (50.0% reduction)


Next, we split `train_preprocessed` and `train_target` into a training set and a held-out test set (by default, `train_test_split` holds out 25% of the rows).  
Then, we train a linear SVR and report its R² score on both sets.
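The `score` method of a scikit-learn regressor returns the coefficient of determination, R² = 1 - SS_res / SS_tot. A score of 0 means the model does no better than always predicting the mean of the targets, and a negative score means it does worse. The sketch below computes R² by hand and checks it against `sklearn.metrics.r2_score`; `r2_manual` is a hypothetical helper and the arrays are made-up toy values.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Made-up targets for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])

good = np.array([3.1, 4.9, 7.2, 8.8])            # close to the targets
mean_only = np.full_like(y_true, y_true.mean())  # always predicts the mean
bad = np.array([9.0, 7.0, 5.0, 3.0])             # anti-correlated with the targets

for name, pred in [("good", good), ("mean_only", mean_only), ("bad", bad)]:
    print(name, round(r2_manual(y_true, pred), 4), round(r2_score(y_true, pred), 4))
```

The mean-only predictor scores exactly 0, which is the baseline to compare the scores in the next cell against.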

In [5]:
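from sklearn import svm

X_train, X_test, y_train, y_test = train_test_split(train_preprocessed, train_target, random_state=0)
# train_target has one column; flatten it to 1-D to avoid sklearn's DataConversionWarning
y_train, y_test = y_train.to_numpy().ravel(), y_test.to_numpy().ravel()
# Call the class via the module: the name LinearSVR is rebound to the fitted model (used in In [9])
LinearSVR = svm.LinearSVR(random_state=0, tol=1e-5)
LinearSVR.fit(X_train, y_train)
# score() returns the R² coefficient of determination
print("Score on training set: {:.4f}".format(LinearSVR.score(X_train, y_train)))
print("Score on test set: {:.4f}".format(LinearSVR.score(X_test, y_test)))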


Score on training set: -0.0001
Score on test set: -0.0001
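Both scores are essentially 0, so the fitted model does about as well as always predicting the mean meter reading. One plausible contributor, not verified on this data, is feature scaling: scikit-learn's SVM documentation notes that SVM algorithms are not scale invariant and recommends scaling the inputs. The sketch below wraps `StandardScaler` and `LinearSVR` in a pipeline on synthetic data; the data, the `max_iter` value, and the variable names are assumptions for illustration, and the notebook's own features would have to be substituted.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
n = 2000

# Synthetic data: two informative features on very different scales
X = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 10_000, n)])
y = 3.0 * X[:, 0] + 0.0005 * X[:, 1] + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The scaler is fit on the training split only, then applied to any data passed to the pipeline
model = make_pipeline(
    StandardScaler(),
    LinearSVR(random_state=0, tol=1e-5, max_iter=10_000),
)
model.fit(X_tr, y_tr)
print("R² on held-out split: {:.4f}".format(model.score(X_te, y_te)))
```

Putting the scaler inside the pipeline keeps the held-out rows out of the scaler's mean and variance estimates, so the held-out score is not inflated by leakage.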


In [6]:
# Best score so far: KNN imputation (k = 7) with RF
# Score on training set: 0.9153
# Score on test set: 0.6368

In [7]:
# Best score so far: KNN imputation (k = 4) with RF
# Score on training set: 0.9184
# Score on test set: 0.6292

In [8]:
# Best score so far: mean imputation
# Score on training set: 0.9179
# Score on test set: 0.6543

### Predict and save the predictions to a CSV file

Finally, we use the trained linear SVR to predict meter readings for `test_preprocessed` and attach them to the identifiers in `test_row`.  
The combined predictions are saved as a CSV file.

In [9]:
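# Predict meter readings for the preprocessed test set with the trained model
PredictionLinearSVR = LinearSVR.predict(test_preprocessed)
PredictionLinearSVR = pd.DataFrame(PredictionLinearSVR, columns=["meter_reading"])

# Attach predictions to the test-row identifiers (both frames have a default 0..n-1 index)
PredictionLinearSVRCombined = pd.concat([test_row, PredictionLinearSVR], axis=1)

# Save the combined table and preview its first rows
PredictionLinearSVRCombined.to_csv('../PredictionLinearSVR_knn7_with_buildingID.csv', index=False)
PredictionLinearSVRCombined.head()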

Unnamed: 0,row_id,meter_reading
0,0,17.500604
1,1,10.036637
2,2,8.110651
3,3,35.023355
4,4,145.674119
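`pd.concat(..., axis=1)` joins rows by index label, not by position. That works here because `test_row` and the prediction frame both carry a default 0..n-1 index, but if either index differs (for example after filtering rows), the result silently gains extra rows padded with NaN. A defensive variant resets the index and checks lengths first. This is a sketch under that assumption; `combine_predictions` is a hypothetical helper and the data is made up.

```python
import io

import numpy as np
import pandas as pd


def combine_predictions(ids: pd.DataFrame, preds: np.ndarray) -> pd.DataFrame:
    """Place predictions next to their identifiers by position, not by index label."""
    if len(ids) != len(preds):
        raise ValueError(f"{len(ids)} id rows but {len(preds)} predictions")
    pred_df = pd.DataFrame({"meter_reading": preds})  # gets a fresh 0..n-1 index
    return pd.concat([ids.reset_index(drop=True), pred_df], axis=1)


# Made-up identifiers whose index is NOT 0..n-1 (as if left over from filtering)
ids = pd.DataFrame({"row_id": [10, 11, 12]}, index=[5, 6, 7])
preds = np.array([1.5, 2.0, 3.25])

naive = pd.concat([ids, pd.DataFrame({"meter_reading": preds})], axis=1)
combined = combine_predictions(ids, preds)
print(naive)     # 6 rows padded with NaN, because the indexes did not line up
print(combined)  # 3 aligned rows

# Same to_csv call pattern as the notebook, written to an in-memory buffer
buf = io.StringIO()
combined.to_csv(buf, index=False)
print(buf.getvalue())
```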
