# Neural Network Classification task - Room occupancy

The goal of this task is to predict room occupancy from temperature, humidity, light, and CO2 measurements using a neural network in Keras. Ground-truth occupancy labels were obtained from time-stamped pictures taken every minute.

## Data source
[http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+](http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+)

## Feature description
* **Date** - time stamp in the following format: year-month-day hour:minute:second
* **Temperature** - temperature in degrees Celsius
* **Relative Humidity** - relative humidity in % (column `Humidity`)
* **Light** - light intensity in lux
* **CO2** - CO2 concentration in the air, in ppm
* **Humidity Ratio** - humidity ratio derived from temperature and relative humidity, in kg of water vapor per kg of air (column `HumidityRatio`)
* **Occupancy** - a target binary value, 0 for not occupied, 1 for occupied status
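The humidity ratio can be approximated from temperature and relative humidity with a standard psychrometric formula. The sketch below is an assumption, not the dataset authors' documented method: it uses the Magnus approximation for the saturation vapor pressure and assumes standard sea-level pressure (1013.25 hPa), because the actual barometric pressure is not part of the data. On the first rows of the dataset it comes within about 1% of the published `HumidityRatio` values.

```python
import numpy as np

P_ATM_HPA = 1013.25  # assumed: standard sea-level pressure, not given by the dataset

def humidity_ratio(temp_c, rh_percent, pressure_hpa=P_ATM_HPA):
    """Approximate humidity ratio in kg of water vapor per kg of dry air."""
    # Magnus approximation of saturation vapor pressure over water, in hPa
    e_sat = 6.112 * np.exp(17.62 * temp_c / (243.12 + temp_c))
    # Actual vapor pressure implied by the relative humidity
    e = rh_percent / 100.0 * e_sat
    # 0.622 is the ratio of the molar masses of water and dry air
    return 0.622 * e / (pressure_hpa - e)

# First row of the dataset: Temperature=23.18, Humidity=27.272, HumidityRatio=0.004793
print(humidity_ratio(23.18, 27.272))
```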

In [1]:
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/mlcollege/introduction-to-ml/master/data/occupancy.csv', sep=',')
data.head()

| Unnamed: 0 | Date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|---|
| 0 | 2015-02-04 17:51:00 | 23.18 | 27.272 | 426.0 | 721.25 | 0.004793 | 1 |
| 1 | 2015-02-04 17:51:59 | 23.15 | 27.2675 | 429.5 | 714.0 | 0.004783 | 1 |
| 2 | 2015-02-04 17:53:00 | 23.15 | 27.245 | 426.0 | 713.5 | 0.004779 | 1 |
| 3 | 2015-02-04 17:54:00 | 23.15 | 27.2 | 426.0 | 708.25 | 0.004772 | 1 |
| 4 | 2015-02-04 17:55:00 | 23.1 | 27.2 | 426.0 | 704.5 | 0.004757 | 1 |
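The `Date` column is stored as text and is not used as a model feature in this notebook. To inspect the sampling interval, one option is to parse it with `pd.to_datetime` using the documented year-month-day hour:minute:second layout. The sketch below applies this to the five timestamps shown above, which turn out to be roughly, not exactly, one minute apart; on the full dataset the same call would take `data['Date']` as input.

```python
import pandas as pd

# The five timestamps shown in data.head() above
dates = pd.Series([
    "2015-02-04 17:51:00",
    "2015-02-04 17:51:59",
    "2015-02-04 17:53:00",
    "2015-02-04 17:54:00",
    "2015-02-04 17:55:00",
])

# Parse with the documented year-month-day hour:minute:second format
timestamps = pd.to_datetime(dates, format="%Y-%m-%d %H:%M:%S")

# Seconds elapsed between consecutive readings (first diff is NaT, so drop it)
gaps = timestamps.diff().dt.total_seconds().iloc[1:]
print(gaps.tolist())  # [59.0, 61.0, 60.0, 60.0]
```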


## Neural Network Classifier
Implement a neural network classifier based on all numerical features.

### Data preparation

In [5]:
from sklearn.model_selection import train_test_split

X_all = data[['Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio']]
y_all = data['Occupancy']

X_train, X_test, y_train, y_test = train_test_split(
    X_all, 
    y_all,
    random_state=1,
    test_size=0.1)

print('Train size: {}'.format(len(X_train)))
print('Test size: {}'.format(len(X_test)))

Train size: 18504
Test size: 2056
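`train_test_split` shuffles the rows before splitting. For a fractional `test_size`, current scikit-learn releases round the test-set size up and give the remainder to the training set, which is consistent with the 20,560 rows becoming 2,056 test and 18,504 training samples above. The NumPy sketch below reproduces that size rule and a shuffled index split; `split_sizes` and `shuffle_split` are hypothetical helpers, and because they use a different random generator the selected rows will not match scikit-learn's.

```python
import math
import numpy as np

def split_sizes(n_samples, test_size):
    """Hypothetical helper: round the test set up, give the rest to training."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def shuffle_split(n_samples, test_size, seed):
    """Hypothetical helper: disjoint train/test index arrays from one permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train, n_test = split_sizes(n_samples, test_size)
    return perm[n_test:], perm[:n_test]  # train indices, test indices

n_total = 18504 + 2056
print(split_sizes(n_total, 0.1))  # (18504, 2056)
train_idx, test_idx = shuffle_split(n_total, 0.1, seed=1)
```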


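Standardize the features. The scaler is fitted on the training set only and then applied to both sets, so no statistics from the test set leak into preprocessing.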

In [6]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
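`StandardScaler` subtracts each feature's mean and divides by its standard deviation (the population version, `ddof=0`); a feature with zero variance keeps a scale of 1 so nothing is divided by zero. The NumPy sketch below mirrors that computation on synthetic data; `fit_standardizer` and `standardize` are hypothetical names, not scikit-learn functions. Data transformed with another set's statistics, like the test set here, will only be close to zero mean and unit variance.

```python
import numpy as np

def fit_standardizer(X_train):
    """Learn per-feature mean and standard deviation from training data."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)      # population std (ddof=0)
    scale[scale == 0.0] = 1.0        # leave constant features unscaled
    return mean, scale

def standardize(X, mean, scale):
    """Apply previously learned statistics to any data set."""
    return (X - mean) / scale

rng = np.random.default_rng(0)
# Synthetic stand-ins for three features with very different ranges
X_tr = rng.normal(loc=[23.0, 27.0, 400.0], scale=[1.0, 5.0, 200.0], size=(1000, 3))
X_te = rng.normal(loc=[23.0, 27.0, 400.0], scale=[1.0, 5.0, 200.0], size=(100, 3))

mean, scale = fit_standardizer(X_tr)
Z_tr = standardize(X_tr, mean, scale)
Z_te = standardize(X_te, mean, scale)   # reuse the training statistics
print(Z_tr.mean(axis=0), Z_tr.std(axis=0))
```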

Since the target is binary and the network has a single sigmoid output, the labels can stay as 0/1 values; there is no need to one-hot encode them.

In [7]:
print(y_test.iloc[:5])

16483    0
4625     0
14896    0
213      0
2052     0
Name: Occupancy, dtype: int64
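A single sigmoid output with 0/1 labels is equivalent to a two-unit softmax output with one-hot labels: feeding the logits `[0, z]` to softmax gives the probabilities `[1 - sigmoid(z), sigmoid(z)]`, and the categorical cross-entropy on the one-hot targets then equals the binary cross-entropy on the 0/1 targets. The NumPy sketch below checks this on a few hand-picked logits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([-3.0, -0.5, 0.0, 2.0])   # example logits for four samples
y = np.array([0, 0, 1, 1])             # 0/1 labels, as in Occupancy

# Single-output view: one probability per sample, labels stay 0/1
p = sigmoid(z)
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Two-output view: logits [0, z] through softmax, labels one-hot encoded
probs_2 = softmax(np.stack([np.zeros_like(z), z], axis=1))
y_onehot = np.eye(2)[y]
cce = -np.mean(np.sum(y_onehot * np.log(probs_2), axis=1))

print(np.allclose(probs_2[:, 1], p))  # True
print(np.isclose(bce, cce))           # True
```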


### Training a classifier

Design and train a classification model. Use the [binary crossentropy](https://keras.io/losses/) loss function and a sigmoid activation on the output layer. Experiment with various architectures and [optimizers](https://keras.io/optimizers/).
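For labels `y` in {0, 1} and predicted probabilities `p`, binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, so confident wrong predictions are penalised far more than confident correct ones are rewarded. A minimal NumPy sketch follows; clipping the probabilities to `[eps, 1 - eps]` with `eps=1e-7` is an assumption meant to avoid `log(0)`, similar in spirit to what Keras does internally.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)   # assumed clipping to avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

print(round(binary_crossentropy(y, confident_right), 4))  # 0.1643
print(round(binary_crossentropy(y, confident_wrong), 4))  # 1.956
```

Predicting 0.5 for every sample gives a loss of `log(2)`, about 0.693, which is a useful reference point when reading the training log.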

In [28]:
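from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

model = Sequential()

# Hidden layer: a single tanh unit fed by the 5 standardized features
model.add(Dense(1, input_shape=(5, )))
model.add(Activation('tanh'))
# Output layer: one sigmoid unit giving P(Occupancy = 1)
model.add(Dense(1))
model.add(Activation('sigmoid'))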
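The network above has 8 trainable parameters: 5 weights and 1 bias in the hidden `Dense(1)` layer, plus 1 weight and 1 bias in the output layer. The NumPy sketch below spells out the same forward computation with random placeholder weights (not the trained model's). Because `tanh` is monotonic and the hidden layer has a single unit, thresholding the output at 0.5 amounts to thresholding one linear combination of the features, so this particular architecture can draw at most a single hyperplane as its decision boundary, much like logistic regression; wider hidden layers are one direction for the suggested experiments.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass of Dense(1) -> tanh -> Dense(1) -> sigmoid."""
    h = np.tanh(X @ W1 + b1)              # hidden activation, shape (n, 1)
    z = h @ W2 + b2                       # output logit, shape (n, 1)
    return 1.0 / (1.0 + np.exp(-z))       # P(Occupancy = 1), shape (n, 1)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 1))              # 5 hidden-layer weights
b1 = np.zeros(1)                          # 1 hidden-layer bias
W2 = rng.normal(size=(1, 1))              # 1 output weight
b2 = np.zeros(1)                          # 1 output bias

n_params = sum(a.size for a in (W1, b1, W2, b2))
print(n_params)  # 8

X = rng.normal(size=(4, 5))               # four standardized samples
probs = forward(X, W1, b1, W2, b2)
print(probs.shape)  # (4, 1)
```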

Compile the model with the binary cross-entropy loss and the Adam optimizer, reporting accuracy during training.

In [29]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
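The `'adam'` optimizer keeps running averages of the gradient and of its square, corrects both for their zero initialisation, and scales each parameter's step by their ratio. The sketch below is a plain NumPy version of one Adam step applied to a toy quadratic; the hyperparameter values are assumptions that follow commonly used defaults, and the exact defaults (especially `eps`) differ between Keras versions.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias corrections for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = sum(theta**2), whose gradient is 2*theta
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
start = np.sum(theta ** 2)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(np.sum(theta ** 2) < start)  # True
```

Thanks to the bias correction, the very first step moves each parameter by about `lr` regardless of how large its gradient is, which is one reason Adam is a forgiving default for small models like this one.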

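Train the model for 10 epochs with mini-batches of 128 samples. The test set is passed as `validation_data`, so its loss and accuracy are reported after every epoch.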

In [30]:
model.fit(X_train, y_train,
          batch_size = 128, epochs = 10, verbose=1,
          validation_data=(X_test, y_test))

Train on 18504 samples, validate on 2056 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fb308a8db38>
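With `batch_size=128`, each epoch over the 18,504 training samples consists of 145 parameter updates (144 full batches plus a final batch of 72 samples), so 10 epochs perform 1,450 updates. By default Keras reshuffles the training data at the start of every epoch. A minimal sketch of that batching scheme, using a hypothetical `minibatch_indices` helper:

```python
import math
import numpy as np

def minibatch_indices(n_samples, batch_size, rng):
    """Hypothetical helper: shuffled index batches covering every sample once."""
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield perm[start:start + batch_size]

n_train, batch_size, epochs = 18504, 128, 10
rng = np.random.default_rng(0)

batches = list(minibatch_indices(n_train, batch_size, rng))
print(len(batches), len(batches[-1]))            # 145 72
print(epochs * math.ceil(n_train / batch_size))  # 1450
```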

### Evaluate the model

Predict occupancy probabilities for the test set and convert them to binary class labels by thresholding at 0.5.

In [31]:
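from numpy import int32

# model.predict returns sigmoid probabilities with shape (n_samples, 1)
y_pred = model.predict(X_test)
# Threshold at 0.5 and flatten to a 1-D vector of 0/1 labels
y_pred = (y_pred >= 0.5).astype(int32).ravel()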
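Thresholding a sigmoid probability at 0.5 is the same as checking the sign of the logit that enters the sigmoid, because `sigmoid(0)` is exactly 0.5. The NumPy sketch below uses made-up logits shaped like the `(n, 1)` output of `model.predict`; the 0.5 cut-off itself is a choice, and moving it up or down trades recall for precision.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up logits, shaped (n, 1) like the output of model.predict
z = np.array([[-2.0], [-0.1], [0.0], [0.3], [4.0]])
p = sigmoid(z)

# Threshold probabilities at 0.5 and flatten to a 1-D label vector
labels = (p >= 0.5).astype(np.int32).ravel()
print(labels)  # [0 0 1 1 1]

# Same labels from the sign of the logit: sigmoid(z) >= 0.5 exactly when z >= 0
print(np.array_equal(labels, (z >= 0).astype(np.int32).ravel()))  # True
```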

Print evaluation metrics for the test set

In [32]:
from sklearn import metrics
from sklearn.metrics import accuracy_score

print("Test accuracy: {:.4f}".format(accuracy_score(y_test, y_pred)))
print()
print(metrics.classification_report(y_test, y_pred, digits=4))

Test accuracy: 0.9864

             precision    recall  f1-score   support

          0     0.9968    0.9854    0.9910      1570
          1     0.9544    0.9897    0.9717       486

avg / total     0.9868    0.9864    0.9865      2056
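Each row of the report follows from the confusion matrix: precision divides the correct predictions for a class by everything predicted as that class, recall divides them by the class support, and F1 is their harmonic mean. The `avg / total` row is the support-weighted average (newer scikit-learn versions label it `weighted avg`), and its recall always equals the accuracy. The notebook does not print the confusion matrix; the counts below are reconstructed as the only integer counts consistent with the reported supports and rounded recalls (23 false positives, 5 false negatives), and recomputing the metrics from them reproduces the test report.

```python
import numpy as np

# Test-set counts implied by the rounded report (rows: true class, cols: predicted)
cm = np.array([[1547, 23],
               [5, 481]])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)          # column sums: predictions per class
recall = tp / cm.sum(axis=1)             # row sums: true samples (support) per class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)

accuracy = tp.sum() / cm.sum()
weighted_precision = np.average(precision, weights=support)
weighted_recall = np.average(recall, weights=support)

print(precision.round(4))            # [0.9968 0.9544]
print(recall.round(4))               # [0.9854 0.9897]
print(round(accuracy, 4))            # 0.9864
print(round(weighted_precision, 4))  # 0.9868
```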



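Compute the same metrics on the training set for comparison.

In [33]:
# Predict on the training set and threshold at 0.5, as for the test set
y_pred = model.predict(X_train)
y_pred = (y_pred >= 0.5).astype(int32).ravel()

print("Train accuracy: {:.4f}".format(accuracy_score(y_train, y_pred)))
print()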
print(metrics.classification_report(y_train, y_pred, digits=4))

Train accuracy: 0.9885

             precision    recall  f1-score   support

          0     0.9991    0.9859    0.9925     14240
          1     0.9549    0.9972    0.9756      4264

avg / total     0.9889    0.9885    0.9886     18504
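Two quick sanity checks put these numbers in context. About 76% of the test samples are unoccupied, so always predicting 0 would already reach roughly 0.76 accuracy; the model's 0.9864 should be judged against that baseline rather than against 0.5. The train and test accuracies also differ by only about 0.2 percentage points, which suggests the small network is not overfitting. A short sketch of both calculations, using only the supports and accuracies printed above:

```python
# Class supports taken from the two classification reports
test_support = {0: 1570, 1: 486}
train_support = {0: 14240, 1: 4264}

def majority_baseline(support):
    """Accuracy of always predicting the most frequent class."""
    return max(support.values()) / sum(support.values())

print(round(majority_baseline(test_support), 4))   # 0.7636
print(round(majority_baseline(train_support), 4))  # 0.7696

train_acc, test_acc = 0.9885, 0.9864               # accuracies reported above
print(round(train_acc - test_acc, 4))              # 0.0021
```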

