# LightBot Machine Learning Service

## Overview
Using a user's action history, predict that user's action profile for the current day.

## Strategy
Use a machine learning classification model to predict, for a given time interval, whether or not the user will initiate a specific intent. Logistic Regression and Naive Bayes were the original candidates; the code below ends up fitting a `RandomForestClassifier`.
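As a rough illustration of the strategy, the sketch below trains a logistic regression on one-hot encoded time intervals and asks it which intervals of a new day are likely to contain the intent. The data is synthetic, and `num_days` and `active_interval` are hypothetical values chosen for the example, not taken from the real action history.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

num_intervals = 48    # 30-minute intervals in a day
num_days = 10         # hypothetical number of recorded days
active_interval = 37  # hypothetical: the intent always fires in interval 37 (18:30)

# Features: one one-hot row per interval, repeated for every day.
X = np.tile(np.eye(num_intervals, dtype=int), (num_days, 1))

# Labels: 1 where the intent happened in that interval, else 0.
y = np.zeros(num_days * num_intervals, dtype=int)
y[active_interval::num_intervals] = 1

# class_weight="balanced" offsets the heavy imbalance between 0 and 1 labels.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X, y)

# Ask the model about every interval of a new day.
predicted = model.predict(np.eye(num_intervals, dtype=int))
print(np.flatnonzero(predicted))
```

One model like this would be trained per intent, which is the layout the rest of the notebook builds toward.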

## Imports

PyMongo (to connect to the database)

NumPy, pandas, and scikit-learn for machine learning.

In [1]:
from pymongo import MongoClient
from bson.objectid import ObjectId
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

## Connect to MongoDB and Get the User's Action History

For now, we are using the dummy demo account action history.

In [2]:
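import os
# Credentials come from the environment instead of being hard-coded (the variable names are assumptions).
uri = "mongodb://{}:{}@lightbot-shard-00-00-m9lrc.mongodb.net:27017,lightbot-shard-00-01-m9lrc.mongodb.net:27017,lightbot-shard-00-02-m9lrc.mongodb.net:27017/test?ssl=true&replicaSet=LightBot-shard-0&authSource=admin".format(os.environ["LIGHTBOT_MONGO_USER"], os.environ["LIGHTBOT_MONGO_PASSWORD"])
client = MongoClient(uri)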
db = client.test
collection = db.users
doc = collection.find_one({"_id" : ObjectId("59fe6d473664e30004cb25fb")})
actionHistory = doc['actionHistory']

## Create a Pandas DataFrame from the Action History
The action history is a list of dicts, one per recorded action.

Let's take a look at the DataFrame in the output.

In [3]:
df = pd.DataFrame.from_dict(actionHistory)
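To make the shape of the data concrete, here is a small sketch of what two action records might look like and how they become a DataFrame. Only the field names are taken from the code in this notebook; every value below is hypothetical.

```python
import pandas as pd

# Hypothetical records; only the field names come from the notebook.
sample_action_history = [
    {"_id": "a1", "intent": "light_on", "intentColor": "#ffffff",
     "month": 11, "dayOfMonth": 4, "dayOfWeek": 6, "hour": 18, "minutes": 47},
    {"_id": "a2", "intent": "light_off", "intentColor": "#000000",
     "month": 11, "dayOfMonth": 4, "dayOfWeek": 6, "hour": 19, "minutes": 52},
]

# Each dict becomes a row and each key becomes a column.
sample_df = pd.DataFrame(sample_action_history)
print(sample_df[["intent", "hour", "minutes"]])
```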

In [4]:
df = df.drop(columns=['_id', 'intentColor'])

In [5]:
unique_intents = df['intent'].unique()

In [6]:
df['act_hist_index'] = range(len(actionHistory))

# Group by month
monthDataFrames = [monthFrame for _, monthFrame in df.groupby('month')]

# Group by unique day within each month. Grouping the full df here would
# merge the same day-of-month across months and repeat it once per month.
dayDataFrames = []
for monthDataFrame in monthDataFrames:
    for _, dayFrame in monthDataFrame.groupby('dayOfMonth'):
        dayDataFrames.append(dayFrame)

# drop() returns a new frame, so rebuild the list instead of reassigning the loop variable
dayDataFrames = [dayFrame.drop(columns=['dayOfMonth', 'dayOfWeek', 'month']) for dayFrame in dayDataFrames]

In [7]:
minutes_in_day = 60 * 24
interval_minutes = 30
num_intervals = minutes_in_day // interval_minutes

def interval_to_string(minutes):
    # Labels are not zero-padded, e.g. 60 -> "1:0" and 1110 -> "18:30"
    hour = minutes // 60
    minutes = minutes % 60
    return str(hour) + ":" + str(minutes)

minute_intervals = [i * interval_minutes for i in range(num_intervals)]
interval_col_names = [interval_to_string(i) for i in minute_intervals]

In [8]:
def getIntervalFromAction(action):
    hour = action['hour']
    minutes = action['minutes']
    interval = (hour * 60) + minutes - (minutes % interval_minutes)
    return interval_to_string(interval)

def generateBlankXProfile():
    # One row per time interval, one-hot encoded: row i has a 1 only in column i
    return pd.DataFrame(np.eye(len(interval_col_names), dtype=int), columns=interval_col_names)
        
def generateProfileFromActions(dayDataFrame):
    generatedProfile = generateBlankXProfile()
    
    # create action history index map column
    generatedProfile['act_hist_index'] = 0
    
    # create intent columns
    for intent in unique_intents:
        generatedProfile[intent] = 0
    
    # for each action in dayFrame, set intent val to one in generated profile
    for index, row in dayDataFrame.iterrows():
        # iterrows() may upcast values, so force a plain int before indexing the list
        act_hist_index = int(row['act_hist_index'])
        currAction = actionHistory[act_hist_index]
        interval = getIntervalFromAction(currAction)
        interval_index = interval_col_names.index(interval)
        generatedProfile.at[interval_index, currAction['intent']] = 1
        generatedProfile.at[interval_index, 'act_hist_index'] = act_hist_index
    return generatedProfile
    

generatedDayProfiles = []
for day in dayDataFrames:
    generatedDayProfiles.append(generateProfileFromActions(day))
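The bucketing step can be checked by hand. The sketch below uses a hypothetical helper, `bucket_label`, that mirrors the arithmetic of `getIntervalFromAction` and `interval_to_string` so it can run on its own: an action at 18:47 falls in the interval starting at 18:30, and one at 19:05 falls in the interval labelled `19:0`.

```python
interval_minutes = 30  # same interval width as the notebook

def bucket_label(hour, minutes):
    # Hypothetical helper mirroring getIntervalFromAction + interval_to_string.
    # Round the time down to the start of its interval, in minutes since midnight.
    start = hour * 60 + minutes - (minutes % interval_minutes)
    return str(start // 60) + ":" + str(start % 60)

print(bucket_label(18, 47))  # 18:30
print(bucket_label(19, 5))   # 19:0 (labels are not zero-padded)
```

Each such label is a column name in the blank profile, so its position in `interval_col_names` gives the row to mark with a 1.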

In [9]:
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same combined frame
all_day_profiles = pd.concat(generatedDayProfiles, axis=0, ignore_index=True)

## Create a Dict of DataFrames by Intent
Each prediction is made for a single intent, so we need a separate user action DataFrame for each type of intent.

In [10]:
all_day_profiles = pd.concat(generatedDayProfiles, axis=0, ignore_index=True)

In [11]:
# One frame per intent: the one-hot interval columns plus that intent's 0/1 label column
UserActionsByIntent = {intent: all_day_profiles[interval_col_names + [intent]] for intent in unique_intents}

## Inspect our DataFrameByIntent

In [12]:
intentX = {}
intenty = {}
for intent in unique_intents:
    intentX[intent] = UserActionsByIntent[intent][interval_col_names]
    intenty[intent] = UserActionsByIntent[intent][intent]

In [13]:
intentX

{'light_off':      0:0  0:30  1:0  1:30  2:0  2:30  3:0  3:30  4:0  4:30  ...    19:0  \
 0      1     0    0     0    0     0    0     0    0     0  ...       0   
 1      0     1    0     0    0     0    0     0    0     0  ...       0   
 2      0     0    1     0    0     0    0     0    0     0  ...       0   
 3      0     0    0     1    0     0    0     0    0     0  ...       0   
 4      0     0    0     0    1     0    0     0    0     0  ...       0   
 5      0     0    0     0    0     1    0     0    0     0  ...       0   
 6      0     0    0     0    0     0    1     0    0     0  ...       0   
 7      0     0    0     0    0     0    0     1    0     0  ...       0   
 8      0     0    0     0    0     0    0     0    1     0  ...       0   
 9      0     0    0     0    0     0    0     0    0     1  ...       0   
 10     0     0    0     0    0     0    0     0    0     0  ...       0   
 11     0     0    0     0    0     0    0     0    0     0  ...       0   

In [14]:
from sklearn.ensemble import RandomForestClassifier

def getIntentPredictedVals(intentX, intenty, predX):
    Xtrain, Xtest, ytrain, ytest = train_test_split(intentX, intenty, random_state=0)
    model = RandomForestClassifier(class_weight="balanced")
    model.fit(Xtrain, ytrain)
    print(model.score(Xtest, ytest))
    print(ytest)
    predy = model.predict(predX)
    return predy

predictedProfiles = {}
for intent in unique_intents:
    predictedProfiles[intent] = getIntentPredictedVals(intentX[intent], intenty[intent], generateBlankXProfile())

1.0
109    0
71     0
37     1
74     0
108    0
227    0
156    0
220    0
152    0
194    0
76     0
202    0
83     0
157    0
234    0
134    0
184    0
111    0
221    0
8      0
101    0
179    0
89     0
122    0
5      0
22     0
199    0
97     0
12     0
166    0
55     0
44     0
149    0
125    0
144    0
118    0
145    0
170    0
64     0
92     0
154    0
45     0
219    0
18     0
106    0
15     0
104    0
7      0
110    0
239    0
63     0
153    0
233    0
139    0
96     0
33     0
231    0
158    0
116    0
168    0
Name: light_on, dtype: int64
1.0
109    0
71     0
37     0
74     0
108    0
227    0
156    0
220    0
152    0
194    0
76     0
202    0
83     0
157    0
234    0
134    0
184    0
111    0
221    0
8      0
101    0
179    0
89     0
122    0
5      0
22     0
199    0
97     0
12     0
166    0
55     0
44     0
149    0
125    0
144    0
118    0
145    0
170    0
64     0
92     0
154    0
45     0
219    0
18     0
106    0
15     0
104    0


In [15]:
predictedProfiles

{'light_off': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 0]),
 'light_on': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])}
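A note on reading the `1.0` scores above: `model.score` reports accuracy, and almost every label here is 0, so accuracy alone says little about whether the rare 1s are found. The sketch below uses hypothetical labels (not the notebook's data) to show a model that never predicts the intent still scoring above 96% accuracy, while recall and the confusion matrix expose the misses.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

y_true = [0] * 58 + [1] * 2  # hypothetical: 2 active intervals out of 60
y_pred = [0] * 60            # a model that never predicts the intent

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)

print(accuracy)  # fraction of intervals labelled correctly
print(recall)    # fraction of real intent intervals that were found
print(matrix)    # rows: true class, columns: predicted class
```

Printing recall or the confusion matrix next to `model.score` in `getIntentPredictedVals` would be one way to check that the 1s in the test split are actually being recovered.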

In [40]:
def createPredicted(profile):
    profileDF = pd.DataFrame(index=interval_col_names, data=profile)
    return profileDF
predictedDFs = {intent : createPredicted(predictedProfiles[intent]) for intent in predictedProfiles}
predictedDFs['light_off'] = predictedDFs['light_off'].rename(columns={0: 'predicted_light_off'})
predictedDFs['light_on'] = predictedDFs['light_on'].rename(columns={0: 'predicted_light_on'})
pd.concat([predictedDFs['light_off'][30:45], predictedDFs['light_on'][30:45]], axis=1)

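| interval | predicted_light_off | predicted_light_on |
|----------|---------------------|--------------------|
| 15:0     | 0                   | 0                  |
| 15:30    | 0                   | 0                  |
| 16:0     | 0                   | 0                  |
| 16:30    | 0                   | 0                  |
| 17:0     | 0                   | 0                  |
| 17:30    | 0                   | 0                  |
| 18:0     | 0                   | 0                  |
| 18:30    | 0                   | 1                  |
| 19:0     | 0                   | 0                  |
| 19:30    | 1                   | 0                  |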


## Conclusions
### What worked:
- It predicted the time intervals well: both per-intent models scored 1.0 on the held-out test split.
- We overcame the unique-intent obstacle found in Linear Regression. For example, if a user turns the light off (the same intent) twice in a day, this method still works.

### What didn't work:
- When the intervals get too small, we get more than one '1' in the predicted column.
- As the dataset grows over time, the predictions start producing odd values. The method only works if the user keeps a fairly rigid schedule.
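One possible way to handle several '1's in the predicted column is to work with class probabilities instead of hard predictions and keep only the most likely interval. The sketch below is an assumption about how that could look, not something the notebook implements; `keep_most_likely_interval` and `threshold` are hypothetical names, and the probabilities are made up.

```python
import numpy as np

def keep_most_likely_interval(probabilities, threshold=0.5):
    """Return a 0/1 vector with at most one 1, at the most probable interval."""
    probabilities = np.asarray(probabilities, dtype=float)
    result = np.zeros(len(probabilities), dtype=int)
    best = int(np.argmax(probabilities))
    # Only flag the interval if the model is reasonably confident about it.
    if probabilities[best] >= threshold:
        result[best] = 1
    return result

# Hypothetical per-interval probabilities, e.g. model.predict_proba(X)[:, 1]
probs = [0.1, 0.6, 0.8, 0.2]
print(keep_most_likely_interval(probs))
```

The trade-off is that this discards legitimate repeats of the same intent within a day, so keeping the top few intervals instead of a single one may suit some users better.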

### How it could work:
- If the user doesn't mind the 30-minute granularity (which they probably do), since this method is not exact.
- If it is used in conjunction with other machine learning programs, or a filter program.
- If we only train on recent user action data instead of the full history, so as not to convolute it.
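The last idea, training only on recent data, could be sketched as follows. This assumes the per-day profiles are stored in a list ordered from oldest to newest, which the notebook does not guarantee; `recent_training_frame` and `n_days` are hypothetical names, and the toy profiles use 3 intervals per day for brevity.

```python
import pandas as pd

def recent_training_frame(day_profiles, n_days):
    """Concatenate only the last n_days profiles (list assumed oldest to newest)."""
    return pd.concat(day_profiles[-n_days:], axis=0, ignore_index=True)

# Hypothetical day profiles: 5 days, 3 intervals each.
toy_profiles = [
    pd.DataFrame({"0:0": [1, 0, 0], "0:30": [0, 1, 0], "1:0": [0, 0, 1],
                  "light_on": [0, day % 2, 0]})
    for day in range(5)
]

recent = recent_training_frame(toy_profiles, n_days=2)
print(recent.shape)
```

The resulting frame could then replace `all_day_profiles` when building the per-intent training sets.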

### The problem