# Dataset: occupancy.csv

Source: Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Luis M. Candanedo, Véronique Feldheim. Energy and Buildings. Volume 112, 15 January 2016, Pages 28-39.

Description: Experimental data used for binary classification (room occupancy) from temperature, humidity, light, and CO2 measurements. Ground-truth occupancy was obtained from time-stamped pictures taken every minute.

Variables/Columns

- Temperature, in degrees Celsius
- Relative Humidity, in %
- Light, in lux
- CO2, in ppm
- Humidity Ratio, a quantity derived from temperature and relative humidity, in kg water vapor per kg air
- Occupancy, 0 or 1
    - 0 for not occupied
    - 1 for occupied 
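
The Humidity Ratio column is described as derived from temperature and relative humidity. The source does not say which formula was used, so the following is only a sketch under stated assumptions: the Magnus approximation for saturation vapor pressure and standard sea-level pressure (101325 Pa). The function name `humidity_ratio` and its parameters are hypothetical.

```python
import math

def humidity_ratio(temp_c: float, rel_humidity_pct: float,
                   pressure_pa: float = 101_325.0) -> float:
    """Approximate humidity ratio in kg water vapor per kg dry air.

    Assumes the Magnus approximation and sea-level pressure; the dataset
    authors may have used a different formula.
    """
    # Saturation vapor pressure over liquid water, in Pa (Magnus approximation).
    p_sat = 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))
    # Partial pressure of water vapor at the given relative humidity.
    p_vapor = rel_humidity_pct / 100.0 * p_sat
    # 0.622 is approximately the ratio of the molar masses of water and dry air.
    return 0.622 * p_vapor / (pressure_pa - p_vapor)

# Temperature and relative humidity from the first row of the dataset.
w = humidity_ratio(23.18, 27.272)
print(w)
```

Under these assumptions the result lands close to, but not exactly on, the 0.004793 recorded in the first row, which suggests the original derivation used slightly different constants or pressure.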

In [1]:
# Import required dependencies
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC 

In [2]:
# Import data
file_path = "https://static.bc-edx.com/mbc/ai/m4/datasets/occupancy.csv"
df = pd.read_csv(file_path)
df.head()

Unnamed: 0,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,23.18,27.272,426.0,721.25,0.004793,1
1,23.15,27.2675,429.5,714.0,0.004783,1
2,23.15,27.245,426.0,713.5,0.004779,1
3,23.15,27.2,426.0,708.25,0.004772,1
4,23.1,27.2,426.0,704.5,0.004757,1


## Split the data into training and testing sets

In [3]:
# Get the target variable (the "Occupancy" column)
y = df['Occupancy']
y

0       1
1       1
2       1
3       1
4       1
       ..
8138    1
8139    1
8140    1
8141    1
8142    1
Name: Occupancy, Length: 8143, dtype: int64
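
The head and tail of the target shown above contain only 1s, which says nothing about how the two classes are balanced overall. One way to check, sketched here on a small hypothetical stand-in series rather than the real column, is `value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical stand-in for df["Occupancy"]; the values are made up.
y_example = pd.Series([1, 1, 0, 0, 0, 1, 0, 0], name="Occupancy")

# Fraction of rows belonging to each class.
class_share = y_example.value_counts(normalize=True)
print(class_share)
```

Running the same call on `y` would show the share of occupied and unoccupied rows, which is useful context when reading the accuracy scores later on.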

In [4]:
# Get the features (everything except the "Occupancy" column)
X = df.drop(columns="Occupancy")
X

Unnamed: 0,Temperature,Humidity,Light,CO2,HumidityRatio
0,23.18,27.2720,426.0,721.250000,0.004793
1,23.15,27.2675,429.5,714.000000,0.004783
2,23.15,27.2450,426.0,713.500000,0.004779
3,23.15,27.2000,426.0,708.250000,0.004772
4,23.10,27.2000,426.0,704.500000,0.004757
...,...,...,...,...,...
8138,21.05,36.0975,433.0,787.250000,0.005579
8139,21.05,35.9950,433.0,789.500000,0.005563
8140,21.10,36.0950,433.0,798.500000,0.005596
8141,21.10,36.2600,433.0,820.333333,0.005621


In [5]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y)
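
Called without a `random_state`, `train_test_split` shuffles differently on each run, so the accuracy figures will vary slightly between runs. A minimal sketch of a reproducible, class-balanced split follows; the toy data, the variable names, and the seed value are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 200 rows, 20% of them labelled occupied.
X_toy = pd.DataFrame({"Light": np.arange(200, dtype=float)})
y_toy = pd.Series([1] * 40 + [0] * 160, name="Occupancy")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy,
    y_toy,
    random_state=1,   # arbitrary seed; any fixed integer makes the split repeatable
    stratify=y_toy,   # keep the class proportions equal in both subsets
)

# The default test_size of 0.25 gives a 75/25 split.
print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```

The same two arguments could be passed to the call above as `train_test_split(X, y, random_state=..., stratify=y)`.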

## Model and Fit to a Support Vector Machine

In [6]:
# Create the support vector machine classifier model with a 'linear' kernel
# (SVC was already imported with the other dependencies)
svm_classification_model = SVC(kernel='linear')

In [7]:
# Fit the model to the training data
svm_classification_model.fit(X_train, y_train)
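
The features here span very different ranges (light in the hundreds, humidity ratio in the thousandths), and support vector machines are generally sensitive to feature scale. The notebook fits on the raw values; as an optional variation, not the approach used above, the sketch below standardizes the features inside a pipeline. The synthetic data and all variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic data with two features on very different scales.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=300)
light = 400.0 * labels + rng.normal(0.0, 20.0, size=300)
hum_ratio = 0.004 + 0.001 * labels + rng.normal(0.0, 0.0002, size=300)
features = np.column_stack([light, hum_ratio])

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, random_state=0, stratify=labels
)

# The scaler is fitted on the training data only, then applied before the SVM.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svm.fit(f_train, l_train)

scaled_test_accuracy = scaled_svm.score(f_test, l_test)
print(scaled_test_accuracy)
```

Keeping the scaler inside the pipeline means the test data never influences the scaling statistics.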


In [8]:
# Check the model's accuracy on the training and testing sets with model.score
print(f"Train Accuracy: {svm_classification_model.score(X_train, y_train)}")
print(f"Test Accuracy: {svm_classification_model.score(X_test, y_test)}")

Train Accuracy: 0.9865727853283117
Test Accuracy: 0.9847740667976425
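
For scikit-learn classifiers, `score` returns the mean accuracy, which is the same quantity `accuracy_score` computes from the output of `predict`. The small sketch below illustrates that equivalence on hypothetical toy data; the names `X_demo`, `y_demo`, and `demo_model` are made up for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Hypothetical, trivially separable toy data.
X_demo = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

demo_model = SVC(kernel="linear").fit(X_demo, y_demo)

# Accuracy obtained two ways: directly from the model, and from its predictions.
via_score = demo_model.score(X_demo, y_demo)
via_metric = accuracy_score(y_demo, demo_model.predict(X_demo))
print(via_score, via_metric)
```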


## Predict the Testing Labels

In [9]:
# Predict the testing labels with the fitted SVM model and save the predictions
y_prediction = svm_classification_model.predict(X_test)

# Review the predictions
y_prediction

array([1, 0, 0, ..., 1, 0, 1])

## Evaluate the Model

In [10]:
# Display the accuracy score for the testing dataset
# (accuracy_score was already imported with the other dependencies)
accuracy_score(y_test, y_prediction)

0.9847740667976425
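
Accuracy alone does not show which kind of mistake the model makes (predicting occupied for an empty room, or the reverse). A confusion matrix breaks the errors down by class. The sketch below uses short hypothetical label lists rather than the real `y_test` and `y_prediction`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, made up for illustration.
y_true_demo = [1, 0, 0, 1, 1, 0]
y_pred_demo = [1, 0, 1, 1, 0, 0]

# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm.ravel()
demo_accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp} accuracy={demo_accuracy:.3f}")
```

Applied to this notebook, the call would be `confusion_matrix(y_test, y_prediction)`.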