# Recommendation Engine

### Collaborative Filtering using Matrix Factorization with Implicit Data

This notebook builds a recommendation engine that suggests travel options to a user based on other users' travel history. It uses mock data that is modeled to replicate real airline travel data.

The data we will be working with includes:
- User ID                
- Origin
- Destination
- Origin Destination ID
- Destination Region

This notebook uses implicit data, meaning data gathered from user behavior; in this example, the destinations and regions of a user's previous flights. Explicit data, which is more common in recommendation systems, is based on some type of rating, such as the ratings users give on a streaming service.
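To make the distinction concrete, the sketch below turns a raw flight log into implicit feedback by counting visits. The rows and the `flights` and `visits` names are invented for illustration and are not the mock data set.

```python
import pandas as pd

# Hypothetical flight log: one row per flight taken, with no ratings anywhere
flights = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'dest':    ['LHR', 'LHR', 'CDG', 'LHR', 'MIA'],
})

# Implicit feedback: how often each user flew to each destination
visits = (flights.groupby(['user_id', 'dest'])
                 .size()
                 .reset_index(name='visits'))
print(visits)
```

A repeat trip (user 1 flying to LHR twice) is treated as a stronger signal than a single trip, which is the role a star rating would play in an explicit data set.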

This recommendation engine is still a work in progress so feel free to leave comments and suggestions!

# Import & Manipulate Data

In [1]:
import pandas as pd
import numpy as np
import sys
import random
import string

In [2]:
# Import User Flight Data
# Mock Data Set will be available soon
user_data = pd.read_csv(r"",
                        delimiter = ' *, *', # strips whitespace 
                        engine = 'python')

# Create flight origin-destination 'od' path column
user_data['od'] = user_data.orig + user_data.dest

# Geographical Region DataFrame to join to the recommendations results at the end
region_data = user_data[['od', 'region']].drop_duplicates()

In [3]:
# Create unique ids for users and the od paths 
user_data['od_id'] = user_data.groupby('od').ngroup()
user_data['user_id'] = user_data.groupby('user_id').ngroup()

# The 'counts' DataFrame is aggregated by the user's total trips to a certain geographic region
counts = user_data.groupby(['user_id', 'region'], as_index = False)['dest'].count()
counts = counts.rename(columns = {'dest': 'visits'})
user_data = pd.merge(user_data, counts, on = ['user_id', 'region'], how = 'left')
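The effect of `ngroup()` and the count-and-merge step is easier to see on a few toy rows. The values and the `toy` DataFrame below are invented for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names
toy = pd.DataFrame({
    'user_id': ['u9', 'u9', 'u3', 'u9'],
    'orig':    ['EWR', 'EWR', 'EWR', 'EWR'],
    'dest':    ['LHR', 'CDG', 'LHR', 'MIA'],
    'region':  ['ATL', 'ATL', 'ATL', '49S'],
})
toy['od'] = toy.orig + toy.dest

# ngroup() numbers the groups 0..n-1 in sorted key order
toy['od_id'] = toy.groupby('od').ngroup()
toy['user_id'] = toy.groupby('user_id').ngroup()

# Trips per (user, region), attached back onto every flight row
counts = (toy.groupby(['user_id', 'region'], as_index=False)['dest'].count()
             .rename(columns={'dest': 'visits'}))
toy = pd.merge(toy, counts, on=['user_id', 'region'], how='left')
print(toy[['user_id', 'od', 'od_id', 'region', 'visits']])
```

Note that `visits` is counted per region, not per od path: both of the first user's flights into the `ATL` region carry `visits = 2`, even though each od path was flown once.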

# User Inputs

In [4]:
origin = input('Where are you flying from: ')
destination = input('Where do you want to go: ')
#user_id = int(input('Enter a User ID to get recommendations: '))
user_id = random.choice(user_data.user_id.unique())
user_data = user_data[user_data.orig == origin]

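# Look up the region of the requested od path; None if the path is not in the data
try:
    reg = user_data[user_data.od == origin + destination]['region'].values[0]
except IndexError:
    reg = None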

Where are you flying from: EWR
Where do you want to go: LHR


# Alternating Least Square (ALS) Algorithm

In [5]:
import implicit
import scipy.sparse as sparse

'''
The implicit library expects user_data as an item-user matrix, so we create two matrices:
1.) Fitting the Model (item-user)
2.) Recommendations (user-item)
'''

sparse_item_user = sparse.csr_matrix((user_data['visits'].astype(float), (user_data['od_id'], user_data['user_id'])))
sparse_user_item = sparse.csr_matrix((user_data['visits'].astype(float), (user_data['user_id'], user_data['od_id'])))
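As a small illustration of how `csr_matrix` assembles a matrix from `(value, (row, column))` triples, here is a sketch on made-up numbers. One behavior worth knowing: when the same `(row, column)` pair appears more than once, SciPy sums the values. Because `user_data` holds one row per flight, repeated pairs are possible there, so it may be worth checking whether the summed values are what you intend.

```python
import numpy as np
import scipy.sparse as sparse

# Hypothetical (visits, od_id, user_id) triples; the last two rows repeat a cell
visits   = np.array([2.0, 1.0, 3.0, 3.0])
od_ids   = np.array([0, 1, 2, 2])
user_ids = np.array([0, 0, 1, 1])

item_user = sparse.csr_matrix((visits, (od_ids, user_ids)))
user_item = item_user.T.tocsr()   # same data with rows and columns swapped

print(item_user.shape)            # rows = od paths, columns = users
print(item_user.toarray())        # the repeated cell holds the sum of its values
```

The matrix shape is inferred from the largest id on each axis, so ids that are not contiguous simply leave empty rows or columns.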

In [6]:
# Initialize the ALS model and fit it using the sparse item-user matrix
model = implicit.als.AlternatingLeastSquares(factors = 20, 
                                             regularization = 0.1,
                                             iterations = 10000
                                            )

# Scale the raw visit counts by the alpha value to get confidence values
alpha_value = 25
data_confidence = (sparse_item_user * alpha_value).astype('double')

# Fit the model
model.fit(data_confidence)
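For intuition about what the fit is doing, below is a minimal dense sketch of alternating least squares for implicit feedback. It follows the common textbook formulation (binary preference, confidence `1 + alpha * count`) and is not the `implicit` library's actual implementation, which works on sparse matrices and is far faster. The function name `als_implicit`, the toy matrix `R`, and all parameter values are assumptions made for this example.

```python
import numpy as np

def als_implicit(R, factors=2, regularization=0.1, alpha=25, iterations=10, seed=0):
    """Dense toy ALS for implicit feedback; R is a (users x items) count matrix."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    C = 1.0 + alpha * R               # confidence in each observation
    P = (R > 0).astype(float)         # binary preference: did the user fly there?
    X = rng.normal(scale=0.01, size=(n_users, factors))   # user factors
    Y = rng.normal(scale=0.01, size=(n_items, factors))   # item factors
    reg = regularization * np.eye(factors)

    def solve_side(fixed, conf, pref):
        # One regularized least-squares solve per row, other side held fixed
        out = np.empty((conf.shape[0], factors))
        for r in range(conf.shape[0]):
            Cr = np.diag(conf[r])
            A = fixed.T @ Cr @ fixed + reg
            b = fixed.T @ Cr @ pref[r]
            out[r] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve_side(Y, C, P)        # update users with items fixed
        Y = solve_side(X, C.T, P.T)    # update items with users fixed
    return X, Y

# Hypothetical visit counts: 3 users x 3 od paths
R = np.array([[3, 0, 1],
              [0, 2, 0],
              [1, 0, 4]], dtype=float)
X, Y = als_implicit(R)
print(np.round(X @ Y.T, 2))   # predicted preference for every (user, item) pair
```

Each pass alternates between the two sides: with item factors fixed, every user's vector has a closed-form solution, and vice versa. That alternation is where the algorithm gets its name.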




#### Compute the n Most Similar Items for the User-Defined Origin Destination Pair ID

In [7]:
item_id = user_data[user_data.dest == destination]['od_id'].values[0]
n_similar = 10                                    # Number of similar items to compute
similar = model.similar_items(item_id, n_similar) # Use implicit to get similar items

for item in similar:                              # Print names of the most similar airports
    i, score = item
    print(user_data.dest.loc[user_data.od_id == i].iloc[0])

LHR
BOG
EYW
TUL
LIR
BCN
CDG
FLL
BNA
PEK
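The queried destination (LHR) appears first in the list because an item is always its own nearest neighbor. The sketch below shows the idea using cosine similarity between learned item factors, which is, as far as I know, how `similar_items` ranks results; treat that as an assumption. The factor values and the `most_similar` helper are hypothetical.

```python
import numpy as np

# Hypothetical item factors: one row per od path, one column per latent factor
item_factors = np.array([[0.9, 0.1],
                         [0.8, 0.2],
                         [0.1, 0.9],
                         [0.0, 1.0]])

def most_similar(item_id, factors, n=3):
    """Rank items by cosine similarity to item_id (the item itself ranks first)."""
    norms = np.linalg.norm(factors, axis=1)
    scores = factors @ factors[item_id] / (norms * norms[item_id])
    best = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in best]

print(most_similar(0, item_factors))
```

If the queried item should not appear in its own results, one option is to request `n_similar + 1` items and drop the first entry.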


#### Create Recommendations for a Unique User ID

In [8]:
recommended = model.recommend(user_id, sparse_user_item) # Use the implicit recommender
airports = []                                            # Empty list to append data
scores = []

for item in recommended:                                 # Get airport names from ids
    i, score = item
    try:
        airports.append(user_data.dest.loc[user_data.od_id == i].iloc[0])
        scores.append(score.round(3))
    except IndexError:                                   # Skip od paths not served from this origin
        continue
        
# Create DataFrame with recommended airports and scores
recommendations = pd.DataFrame({'origin': origin,
                                'destination': airports,
                                'score': scores
                               })
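Conceptually, each recommendation score is the dot product of the user's factor vector with an item's factor vector, and od paths the user has already flown are left out of the ranking. The sketch below illustrates that on hypothetical factors; it assumes, rather than reproduces, what `model.recommend` does internally.

```python
import numpy as np

# Hypothetical factors for one user and four od paths
user_vec = np.array([0.5, 1.0])
item_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [0.6, 0.6],
                         [0.2, 0.1]])
already_flown = {1}      # od_ids the user has already taken

# Score every od path, then rank the ones the user has not flown yet
scores = item_factors @ user_vec
ranked = [(int(i), round(float(scores[i]), 3))
          for i in np.argsort(-scores) if int(i) not in already_flown]
print(ranked[:2])
```

The highest-scoring path overall (id 1) is excluded because the user has already flown it, so the list starts with the next best candidate.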

#### Join Tables and Prepare Data to Provide Supplementary Information About the Recommendations

In [9]:
# Load the airport code lookup file, which maps each code to its city and country
cities = pd.read_csv('city_codes.csv', engine = 'python')

# Join DataFrames and clean up columns 
recs = pd.merge(recommendations, cities, on = 'destination', how = 'left')
recs['od'] = recs.origin + recs.destination
recs = pd.merge(recs, region_data, on = 'od', how = 'left')
#recs = recs[recs.region == reg]
recs = recs[['origin', 'destination', 'region', 'city', 'country', 'score']].reset_index(drop = True)

# Print Results
print('Input: {}{}'.format(origin, destination))
recs

Input: EWRLHR


|   | origin | destination | region | city | country | score |
|---|---|---|---|---|---|---|
| 0 | EWR | CUN | MBH | Cancun | Mexico | 0.585 |
| 1 | EWR | AVL | 49S | Fletcher | United States | 0.495 |
| 2 | EWR | RDU | 49S | Raleigh/Durham | United States | 0.441 |
| 3 | EWR | MIA | 49S | Miami | United States | 0.433 |
| 4 | EWR | YUL | 49S | Montreal | Canada | 0.431 |
| 5 | EWR | BOM | ATL | Mumbai | India | 0.384 |
| 6 | EWR | BTV | 49S | Burlington | United States | 0.348 |
| 7 | EWR | TUL | 49S | Tulsa | United States | 0.337 |
| 8 | EWR | AMS | ATL | Haarlemmermeer | The Netherlands | 0.328 |
| 9 | EWR | HKG | PAC | Hong Kong | China | 0.324 |
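Both joins use `how = 'left'`, so every recommendation is kept even when the lookup table has no matching row; the missing fields show up as `NaN`. A minimal sketch, using a made-up airport code `XXX` that is deliberately absent from the lookup:

```python
import pandas as pd

# 'XXX' is a hypothetical code with no entry in the lookup table
recommendations = pd.DataFrame({'origin': 'EWR',
                                'destination': ['CUN', 'XXX'],
                                'score': [0.585, 0.2]})
cities = pd.DataFrame({'destination': ['CUN'],
                       'city': ['Cancun'],
                       'country': ['Mexico']})

# A left join keeps both recommendations; the unmatched one gets NaN fields
recs = pd.merge(recommendations, cities, on='destination', how='left')
print(recs)
```

If rows with missing city or region details are not wanted in the final table, `recs.dropna(subset=['city'])` is one way to filter them out.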
