# Avant Ski - Content Based System

by: Stephanie Ciaccia

# Overview

Skiing holds a prominent place for those seeking winter recreational activities in the United States. With its stunning mountain ranges and diverse terrain, the country boasts numerous ski resorts that cater to all skill levels, from beginners to seasoned professionals. Skiing offers a unique blend of adventure, physical activity, and natural beauty, making it a popular choice for winter enthusiasts seeking both relaxation and excitement.

The ski market in the United States is thriving, contributing significantly to the economy. According to the [National Ski Areas Association (NSAA)](https://nsaa.org/webdocs/Media_Public/IndustryStats/Historical_Skier_Days_1979_2022.pdf), U.S. ski areas recorded approximately 60.7 million skier and snowboarder visits across 473 ski resorts in the 2021-2022 winter season.

# Business Problem 
Skiing is an exhilarating winter activity enjoyed by many, but barriers such as high costs and limited accessibility often hinder people from fully experiencing its joys. Choosing the right ski resort can be overwhelming due to the multitude of options available, and existing websites lack dynamic filtering capabilities based on user preferences.

To address these challenges, I'm developing Avant Ski, a ski resort recommendation app. Avant Ski simplifies the ski resort selection process by leveraging data and user preferences. With dynamic filtering features, users can personalize their search based on budget, location, amenities, and skill level. By bridging the gap between ski enthusiasts and their dream destinations, Avant Ski makes skiing accessible to a wider audience, empowering them to plan unforgettable ski trips with confidence.

# Data Understanding

In [1]:
import pandas as pd
import numpy as np
import math
import datetime
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
%matplotlib inline
import plotly
import plotly.express as px
import plotly.io as pio
from matplotlib.ticker import StrMethodFormatter

from collections import Counter
from nltk.corpus import stopwords

from IPython.display import display

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.neighbors import NearestNeighbors

from surprise import SVDpp, SVD
from surprise import accuracy
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

import glob
import os

In [2]:
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')

## Importing Data
### Data Source #1 - Final Feature Data
Importing the main ski resort feature dataframe, which I scraped from OnTheSnow and cleaned in the cleaning notebook.

In [70]:
content_df = pd.read_csv("data/cleaned_data_exports/scraped_feature_df_2.csv")

### Data Source #2 - Final User/Review Data

Importing final cleaned user review data from the cleaning notebook.

In [71]:
final_user_df = pd.read_csv("data/cleaned_data_exports/user_df_model_2.csv")

### Content Modeling

To begin content modeling, we need a feature matrix that stores all of the feature information for each resort. This matrix lets us calculate the similarities between item vectors and determine which ski resorts are similar.

In [72]:
content_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 330 entries, 0 to 329
Data columns (total 100 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Unnamed: 0            330 non-null    int64  
 1   ski_resort            330 non-null    object 
 2   address               330 non-null    object 
 3   city                  330 non-null    object 
 4   state                 330 non-null    object 
 5   zipcode               330 non-null    object 
 6   summit                330 non-null    int64  
 7   drop                  330 non-null    int64  
 8   base                  330 non-null    int64  
 9   gondolas_and_trams    330 non-null    float64
 10  fast_eight            330 non-null    float64
 11  high_speed_sixes      330 non-null    float64
 12  quad_chairs           330 non-null    float64
 13  triple_chairs         330 non-null    float64
 14  double_chairs         330 non-null    float64
 15  surface_lifts         

In [73]:
#making a copy of the final dataframe
content_matrix = content_df.copy()

In [74]:
drop_list = ['full_address_y', 'full_address_x', 'address', 'zipcode', 'Url', 'projectedOpening', 'projectedClosing', 'Unnamed: 0',
            'airport_1', "airport_2", "airport_3", "distance_2", "distance_3", "distance_1", "longitude_y",
            "latitude_x", "latitude_y", "daysOpenLastYear"]

content_matrix.drop(columns=drop_list, inplace=True)

In [75]:
content_matrix.head()

Unnamed: 0,ski_resort,city,state,summit,drop,base,gondolas_and_trams,fast_eight,high_speed_sixes,quad_chairs,...,mar_min_4_guests,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,longitude_x,total_lifts
0,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,3.0,6.0,4.0,1.0,...,210.0,1320.0,463.542857,178.0,1260.0,331.956522,122.0,825.0,-120.235705,36.0
1,Mammoth Mountain,Mammoth Mountain Lakes,California,11053,3100,7953,3.0,9.0,2.0,1.0,...,156.0,589.0,325.411765,126.0,699.0,144.571429,64.0,246.0,-118.972079,25.0
2,Donner Ski Ranch,Norden,California,8012,750,7031,0.0,0.0,0.0,0.0,...,190.0,739.0,349.257143,165.0,996.0,231.272727,83.0,643.0,-120.354182,8.0
3,Sugar Bowl,Norden,California,8383,1500,6883,1.0,5.0,0.0,3.0,...,190.0,739.0,349.257143,165.0,996.0,245.590909,120.0,643.0,-120.354182,12.0
4,Kirkwood,Kirkwood,California,9800,2000,7800,0.0,2.0,0.0,2.0,...,146.0,1179.0,420.290323,150.0,950.0,309.2,114.0,590.0,-120.072244,13.0


In [76]:
content_matrix['state'].unique()

array(['California', 'Nevada', 'Colorado', 'Washington', 'Utah', 'Oregon',
       'Alaska', 'Idaho', 'North Carolina', 'Wyoming', 'Arizona',
       'Minnesota', 'New Mexico', 'Montana', 'Michigan', 'Vermont',
       'Wisconsin', 'New Hampshire', 'New Jersey', 'Massachusetts',
       'Pennsylvania', 'New York', 'Iowa', 'Maine', 'West Virginia',
       'Illinois', 'Ohio', 'Virginia', 'Missouri', 'Connecticut',
       'Tennessee', 'Indiana', 'South Dakota', 'Maryland', 'Rhode Island'],
      dtype=object)

In [77]:
content_matrix['ski_resort']

0               Palisades Tahoe
1              Mammoth Mountain
2              Donner Ski Ranch
3                    Sugar Bowl
4                      Kirkwood
                 ...           
325                Oak Mountain
326                Mt. Pleasant
327                 Hunt Hollow
328    Powder Ridge Connecticut
329            Shawnee Mountain
Name: ski_resort, Length: 330, dtype: object

### One Hot Encoding Categorical Variables

I will one-hot encode the state column, as it is the only categorical column that stays in the feature matrix. I would like to keep it in the final model because the location of a resort often plays an important role in deciding where to ski.
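
To make the encoding concrete, here is a minimal plain-Python sketch of what one-hot encoding produces for a toy list of states. The helper `one_hot` and the sample values are illustrative assumptions and not part of this notebook; the actual encoding below is done with scikit-learn's `OneHotEncoder`.

```python
def one_hot(values):
    """Return (column_names, rows) for a list of categorical values.

    Hypothetical helper for illustration only: categories are sorted
    alphabetically and each row gets a 1.0 in its own category column.
    """
    categories = sorted(set(values))
    columns = [f"state_{c}" for c in categories]
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return columns, rows


# Toy input, not the real resort data
states = ["California", "Utah", "California", "Vermont"]
columns, rows = one_hot(states)

print(columns)
for state, row in zip(states, rows):
    print(state, row)
```

Each row contains exactly one 1.0, so two resorts in the same state share that column while resorts in different states share none.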

In [78]:
# Instantiating OHE
ohe = OneHotEncoder()

# fit and transforming
ohe_state = pd.DataFrame(ohe.fit_transform(content_matrix[['state']]).toarray())

# renaming based on original names
# (get_feature_names was removed in scikit-learn 1.2; get_feature_names_out replaces it)
ohe_state.columns = ohe.get_feature_names_out(['state'])

In [47]:
ohe_state

Unnamed: 0,state_Alaska,state_Arizona,state_California,state_Colorado,state_Connecticut,state_Idaho,state_Illinois,state_Indiana,state_Iowa,state_Maine,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
325,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
326,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
327,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
328,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [79]:
#setting index
ohe_state = ohe_state.set_index(content_matrix['ski_resort'])

In [80]:
ohe_state

ski_resort,state_Alaska,state_Arizona,state_California,state_Colorado,state_Connecticut,state_Idaho,state_Illinois,state_Indiana,state_Iowa,state_Maine,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
Palisades Tahoe,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Mammoth Mountain,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Donner Ski Ranch,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Sugar Bowl,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Kirkwood,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Oak Mountain,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Mt. Pleasant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hunt Hollow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Powder Ridge Connecticut,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [81]:
#resetting index as ski_resort
content_matrix = content_matrix.set_index("ski_resort")

In [85]:
#dropping state and city columns
final_content_matrix = content_matrix.drop(columns=["state", "city"])

In [86]:
#filling null matrix values with 0
final_content_matrix = final_content_matrix.fillna(0)

In [87]:
final_content_matrix

ski_resort,summit,drop,base,gondolas_and_trams,fast_eight,high_speed_sixes,quad_chairs,triple_chairs,double_chairs,surface_lifts,...,mar_min_4_guests,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,longitude_x,total_lifts
Palisades Tahoe,9050,2850,6200,3.0,6.0,4.0,1.0,12.0,5.0,5.0,...,210.0,1320.0,463.542857,178.0,1260.0,331.956522,122.0,825.0,-120.235705,36.0
Mammoth Mountain,11053,3100,7953,3.0,9.0,2.0,1.0,6.0,4.0,0.0,...,156.0,589.0,325.411765,126.0,699.0,144.571429,64.0,246.0,-118.972079,25.0
Donner Ski Ranch,8012,750,7031,0.0,0.0,0.0,0.0,1.0,5.0,2.0,...,190.0,739.0,349.257143,165.0,996.0,231.272727,83.0,643.0,-120.354182,8.0
Sugar Bowl,8383,1500,6883,1.0,5.0,0.0,3.0,1.0,0.0,2.0,...,190.0,739.0,349.257143,165.0,996.0,245.590909,120.0,643.0,-120.354182,12.0
Kirkwood,9800,2000,7800,0.0,2.0,0.0,2.0,5.0,1.0,3.0,...,146.0,1179.0,420.290323,150.0,950.0,309.200000,114.0,590.0,-120.072244,13.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Oak Mountain,2400,650,1750,0.0,0.0,0.0,1.0,0.0,0.0,3.0,...,100.0,769.0,285.028571,125.0,769.0,219.541667,95.0,460.0,-74.361618,4.0
Mt. Pleasant,1540,340,1200,0.0,0.0,0.0,0.0,1.0,0.0,1.0,...,55.0,350.0,184.942857,55.0,350.0,189.666667,76.0,400.0,-80.097363,2.0
Hunt Hollow,2030,825,1000,0.0,0.0,0.0,0.0,1.0,1.0,1.0,...,125.0,800.0,307.342857,124.0,800.0,242.541667,99.0,444.0,-77.469117,3.0
Powder Ridge Connecticut,720,550,170,0.0,0.0,0.0,0.0,1.0,2.0,3.0,...,67.0,977.0,237.628571,68.0,1199.0,148.208333,73.0,250.0,-72.736408,6.0


### Scaling Data

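I will use StandardScaler to put every column in the matrix on the same scale. Without this step, features measured in large units, such as summit elevation in feet, would dominate the similarity calculation over features with small values, such as lift counts.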
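
As a rough illustration of what the scaler does, the sketch below standardizes a toy matrix by hand with NumPy. The numbers are made up, and the manual z-score is only meant to mirror the calculation `StandardScaler` performs (column mean subtracted, then divided by the population standard deviation).

```python
import numpy as np

# Toy feature matrix (rows = resorts, columns = summit in feet, total lifts).
# The numbers are made up for illustration.
X = np.array([
    [9000.0, 30.0],
    [11000.0, 10.0],
    [1500.0, 4.0],
])

# z-score each column: subtract the column mean and divide by the
# population standard deviation (ddof=0), as StandardScaler does.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.round(3))
print(X_scaled.mean(axis=0).round(6))  # column means after scaling
print(X_scaled.std(axis=0).round(6))   # column standard deviations after scaling
```

After scaling, both columns have mean 0 and standard deviation 1, so neither elevation nor lift count outweighs the other simply because of its units.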

In [88]:
#instantiating StandardScaler
scaler = StandardScaler()

#scaling array
scaled = scaler.fit_transform(final_content_matrix)

#saving as dataframe
scaled_ski_df = pd.DataFrame(scaled, index=final_content_matrix.index, columns=final_content_matrix.columns)

In [89]:
scaled_ski_df

ski_resort,summit,drop,base,gondolas_and_trams,fast_eight,high_speed_sixes,quad_chairs,triple_chairs,double_chairs,surface_lifts,...,mar_min_4_guests,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,longitude_x,total_lifts
Palisades Tahoe,1.186487,1.710642,0.902438,4.759106,2.238096,5.503542,0.006503,6.302092,1.902088,1.080304,...,2.537409,2.180666,2.334353,1.841660,1.669874,2.007235,1.059825,1.368909,-1.322582,4.654503
Mammoth Mountain,1.722887,1.973881,1.465619,4.759106,3.594102,2.599995,0.006503,2.697776,1.325698,-1.290043,...,1.139202,-0.155575,0.683662,0.415601,0.110550,-1.047057,-1.007546,-0.928538,-1.263832,2.801482
Donner Ski Ranch,0.908512,-0.500562,1.169411,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,1.902088,-0.341905,...,2.019554,0.323818,0.968618,1.485145,0.936075,0.366135,-0.330304,0.646741,-1.328090,-0.062278
Sugar Bowl,1.007865,0.289154,1.121864,1.370746,1.786094,-0.303553,1.437096,-0.305821,-0.979864,-0.341905,...,2.019554,0.323818,0.968618,1.485145,0.936075,0.599515,0.988536,0.646741,-1.328090,0.611548
Kirkwood,1.387336,0.815631,1.416465,-0.323434,0.430087,-0.303553,0.721800,2.097056,-0.403473,0.132165,...,0.880274,1.730037,1.817477,1.073782,0.808216,1.636314,0.774670,0.436439,-1.314982,0.780005
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Oak Mountain,-0.594373,-0.605857,-0.527200,-0.323434,-0.473917,-0.303553,0.006503,-0.906540,-0.979864,0.132165,...,-0.310791,0.419696,0.201076,0.388177,0.305118,0.174924,0.097428,-0.079395,0.810231,-0.736104
Mt. Pleasant,-0.824680,-0.932273,-0.703897,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,-0.979864,-0.815974,...,-1.475964,-0.919408,-0.994966,-1.531517,-0.859511,-0.312025,-0.579814,-0.317473,0.543560,-1.073017
Hunt Hollow,-0.693458,-0.421590,-0.768150,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,-0.403473,-0.815974,...,0.336527,0.518771,0.467735,0.360753,0.391284,0.549814,0.240005,-0.142883,0.665755,-0.904560
Powder Ridge Connecticut,-1.044275,-0.711153,-1.034802,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,0.172917,0.132165,...,-1.165251,1.084454,-0.365363,-1.175002,1.500322,-0.987777,-0.686747,-0.912666,0.885791,-0.399191


In [90]:
#merging scaled_ski_df and one hot encoded dataframes
final_content_df = scaled_ski_df.join(ohe_state)

### Cosine Similarity

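I will start the content-based modeling by using cosine similarity to measure how closely related two ski resorts are. Scores range from -1 to 1, and a higher score means the two resorts have more similar feature profiles.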
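
For intuition, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand with NumPy; the helper `cosine_sim` and the three feature vectors are illustrative assumptions, not values from the resort data.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up scaled feature vectors for three hypothetical resorts
big_resort = [1.2, 1.7, 0.9]
similar_resort = [1.0, 1.5, 1.1]
small_resort = [-0.8, -0.9, -0.7]

print(cosine_sim(big_resort, similar_resort))  # close to 1: similar profile
print(cosine_sim(big_resort, small_resort))    # negative: opposite profile
print(cosine_sim(big_resort, big_resort))      # a vector compared with itself
```

Because the measure depends only on direction, a resort always scores 1.0 against itself, which is why the diagonal of the similarity matrix is all ones.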

In [91]:
sim_df = pd.DataFrame(cosine_similarity(final_content_df), index=final_content_df.index, columns=final_content_df.index)

In [24]:
#sim_df.to_csv("data/sim_matrix_2.csv")

In [92]:
sim_df.head()

ski_resort,Palisades Tahoe,Mammoth Mountain,Donner Ski Ranch,Sugar Bowl,Kirkwood,Boreal,Sierra at Tahoe,Mt. Rose Ski Tahoe,Soda Springs,Wolf Creek,...,Elko SnoBowl,Eagle Point,Pine Knob,Whaleback,Little Switzerland,Oak Mountain,Mt. Pleasant,Hunt Hollow,Powder Ridge Connecticut,Shawnee Mountain
Palisades Tahoe,1.0,0.812728,0.524291,0.723115,0.491642,0.35168,0.578468,-0.03613,0.244984,0.111618,...,-0.575488,-0.080861,-0.310361,-0.630973,-0.258119,-0.283465,-0.751206,-0.280424,-0.3109,-0.520782
Mammoth Mountain,0.812728,1.0,0.438776,0.75145,0.196383,0.415334,0.359269,0.145509,0.253692,0.282254,...,-0.492047,-0.041203,-0.416099,-0.615235,-0.413433,-0.284296,-0.541792,-0.356923,-0.391026,-0.369085
Donner Ski Ranch,0.524291,0.438776,1.0,0.703097,0.289897,0.576089,0.510096,-0.307487,0.681608,-0.140293,...,-0.683752,-0.06983,-0.368183,-0.651586,-0.06108,0.26435,-0.71477,-0.080147,-0.240708,-0.418714
Sugar Bowl,0.723115,0.75145,0.703097,1.0,0.394729,0.602689,0.514203,-0.062353,0.499575,0.179256,...,-0.678966,-0.009697,-0.360782,-0.675635,-0.285352,-0.029496,-0.719183,-0.28137,-0.379911,-0.571552
Kirkwood,0.491642,0.196383,0.289897,0.394729,1.0,0.393319,0.621378,-0.085773,0.249534,0.211833,...,-0.29516,0.244995,-0.237003,-0.475502,-0.005876,-0.129791,-0.578336,-0.155009,-0.141679,-0.573446


In [71]:
final_content_df.to_csv("data/cleaned_data_exports/final_content_df.csv")

In [64]:
sim_df.to_csv("data/cleaned_data_exports/similarity_matrix.csv")

### Function Building

In [28]:
# Input for mountain name
mountain_name = str(input("What is your favorite ski resort? "))

# input to ask user how many recommendations they would like
n_recs = int(input('How many recommendations would you like? '))
    
#what month would you like to travel
travel_date = str(input('What month would you like to travel? '))

What is your favorite ski resort? Park City Mountain
How many recommendations would you like? 3
What month would you like to travel? December


In [29]:
# Pulling out an individual mountain

y = final_content_df.loc[[mountain_name]]
y

ski_resort,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,tripleChairs,doubleChairs,surfeLifts,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
Park City Mountain,1.445217,2.103827,1.092152,6.443543,2.679172,8.394596,5.72241,3.897192,1.35265,1.570315,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [30]:
# Using cosine_similarity to return similarity scores
cos_sim = cosine_similarity(final_content_df, y)

# Create a dataframe with similarity scores based on ski resort
cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
cos_sim_df

#sorting in descending order
cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

#saving as new dataframe
cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
cos_sim_df

Unnamed: 0,ski_resort,0
0,Park City Mountain,1.0
1,Heavenly Mountain,0.760417
2,Keystone,0.758374
3,Steamboat,0.692832


In [31]:
#collecting the matching rows from content_matrix
rec_list = []

for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#selecting the columns to display
concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]

concat_df = concat_df.reset_index()

concat_df

Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs
0,Park City Mountain,Park City,Utah,10026,3226,6800,430,153,8,41,28,23
1,Heavenly Mountain,Stateline,California,10067,3500,7170,225,189,7,60,27,5
2,Keystone,Keystone,Colorado,12408,3128,9280,390,131,16,43,41,0
3,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,12,43,40,5


In [32]:
#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

#for loop that changes the user input to the month abbreviation that's in the column names
selected_columns = []
for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]

#resetting index
result = result.reset_index()

result

Unnamed: 0,ski_resort,dec_mean_4_guests,dec_mean_2_guests
0,Park City Mountain,288.151515,156.878788
1,Heavenly Mountain,318.909091,219.636364
2,Keystone,229.142857,172.514286
3,Steamboat,246.771429,189.138889
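
The month lookup in the cell above could also be written as a dictionary lookup. The sketch below is one possible variant, not the notebook's implementation: the helper name `lodging_columns` is hypothetical, and it raises an error for an unrecognized month instead of leaving `selected_columns` empty.

```python
# Map full month names to the abbreviations used in the column names
MONTH_ABBREVIATIONS = {
    "december": "dec",
    "january": "jan",
    "february": "feb",
    "march": "mar",
    "april": "apr",
    "may": "may",
}


def lodging_columns(travel_month):
    """Return the mean-price column names for a travel month.

    Hypothetical helper: raises ValueError for a month that is not
    covered by the data instead of returning an empty list.
    """
    key = travel_month.strip().lower()
    if key not in MONTH_ABBREVIATIONS:
        raise ValueError(f"No lodging prices for month: {travel_month!r}")
    abbreviation = MONTH_ABBREVIATIONS[key]
    return [f"{abbreviation}_mean_4_guests", f"{abbreviation}_mean_2_guests"]


print(lodging_columns("December"))
```

Failing loudly on a typo such as "Decmber" makes the problem visible to the user, whereas an empty column selection would return a table with no price columns.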


Testing first part of function by hard coding example user inputs

In [33]:
#hard-coded resort name used to look up its row in the feature dataframe
mountain_name = "Park City Mountain"
    
# input to ask user how many recommendations they would like
n_recs = 3
    
#what month would you like to travel
travel_date = "March"
    
# Pulling out an individual resort
y = final_content_df.loc[[mountain_name]]

# Using cosine_similarity to return similarity scores
cos_sim = cosine_similarity(final_content_df, y)

# Create a dataframe with similarity scores based on ski resort
cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
cos_sim_df

#sorting in descending order
cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

#saving as new dataframe
cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
cos_sim_df

#collecting the matching rows from content_matrix
rec_list = []

for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#selecting the columns to display
concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]

concat_df = concat_df.reset_index()

#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

#for loop that changes the user input to the month abbreviation that's in the column names
selected_columns = []
for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]

#resetting index
result = result.reset_index()

result

Unnamed: 0,ski_resort,mar_mean_4_guests,mar_mean_2_guests
0,Park City Mountain,395.342857,229.060606
1,Heavenly Mountain,364.363636,262.058824
2,Keystone,335.457143,255.194444
3,Steamboat,393.114286,280.055556


Making final function. I will need to add a line of code to drop the row where ski_resort matches the user's input to ensure this is not part of their recommendations.
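
The exclusion step can be separated from the `input()` prompts so it is easier to test. The sketch below is a minimal version under assumptions: `top_similar` is a hypothetical helper, and the resort labels and similarity scores are made up for illustration.

```python
import numpy as np


def top_similar(names, sim_matrix, query, n_recs):
    """Return the n_recs most similar names to `query`, excluding `query`.

    Hypothetical helper: `names` labels the rows and columns of the
    square similarity matrix `sim_matrix`.
    """
    query_idx = names.index(query)
    scores = np.asarray(sim_matrix, dtype=float)[query_idx]
    # Sort indices from most to least similar, then skip the query itself
    order = np.argsort(-scores)
    ranked = [(names[i], float(scores[i])) for i in order if i != query_idx]
    return ranked[:n_recs]


# Made-up similarity scores for four hypothetical resorts
names = ["Resort A", "Resort B", "Resort C", "Resort D"]
sim_matrix = [
    [1.00, 0.76, 0.10, 0.69],
    [0.76, 1.00, 0.20, 0.50],
    [0.10, 0.20, 1.00, 0.30],
    [0.69, 0.50, 0.30, 1.00],
]

print(top_similar(names, sim_matrix, "Resort A", 2))
```

Removing the query resort before taking the top `n_recs` guarantees the user always gets the number of recommendations they asked for.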

In [72]:
final_content_df

ski_resort,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,tripleChairs,doubleChairs,surfeLifts,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
49 Degrees North,0.305234,0.654541,0.170126,-0.323977,-0.027422,-0.304056,0.004345,-0.907215,1.352650,-0.812682,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Afton Alps,-0.832604,-0.927553,-0.714607,-0.323977,-0.478521,-0.304056,0.004345,0.293887,7.169931,0.617117,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alpental,0.210325,1.106718,-0.084492,-0.323977,-0.027422,-0.304056,-0.710413,-0.907215,0.770922,-0.812682,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Alpine Valley Ohio,-0.840647,-1.054036,-0.688888,-0.323977,-0.478521,-0.304056,0.004345,0.293887,-0.974262,-0.336082,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alpine Valley Wisconsin,-0.963976,-0.887500,-0.830342,-0.323977,0.874776,-0.304056,-0.710413,1.494989,-0.974262,1.570315,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Woods Valley,-0.867458,-0.769449,-0.804623,-0.323977,-0.478521,-0.304056,-0.710413,-0.907215,0.189194,0.617117,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Yawgoo Valley,-1.158352,-1.038226,-1.071457,-0.323977,-0.478521,-0.304056,-0.710413,-0.907215,0.189194,-0.336082,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Badger Pass,0.848415,-0.664046,1.220747,-0.323977,-0.478521,-0.304056,-0.710413,-0.306664,0.770922,-0.812682,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Shawnee Mountain,-0.880863,-0.558644,-0.884995,-0.323977,-0.027422,-0.304056,0.004345,-0.907215,0.770922,0.617117,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [69]:
content_matrix.columns

Index(['state', 'summit', 'drop', 'base', 'gondolas_and_trams', 'fast_eight',
       'high_speed_sixes', 'quad_chairs', 'triple_chairs', 'double_chairs',
       'surface_lifts', 'total_runs', 'longest_run', 'skiable_terrain',
       'snow_making', 'averageSnowfall', 'nov_snow', 'dec_snow', 'jan_snow',
       'feb_snow', 'mar_snow', 'apr_snow', 'childrenWeekdayPrice',
       'childrenWeekendPrice', 'teenagerWeekdayPrice', 'teenagerWeekendPrice',
       'adultWeekdayPrice', 'adultWeekendPrice', 'seniorWeekdayPrice',
       'seniorWeekendPrice', 'childrenPrice_season', 'teenagerPrice_season',
       'adultPrice_season', 'beginner_runs', 'intermediate_runs',
       'advanced_runs', 'expert_runs', 'night_skiing', 'epic',
       'mountain_collective', 'ikon', 'indy', 'dec_mean_2_guests',
       'dec_min_2_guests', 'dec_max_2_guests', 'jan_mean_2_guests',
       'jan_min_2_guests', 'jan_max_2_guests', 'feb_mean_2_guests',
       'feb_min_2_guests', 'feb_max_2_guests', 'mar_mean_2_guests',
   

In [94]:
# Content-based model
def content_model():
    
    #user inputs
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    
    # Pulling out an individual resort
    y = final_content_df.loc[[mountain_name]]

    # Using cosine_similarity to return similarity scores
    cos_sim = cosine_similarity(final_content_df, y)

    # Create a dataframe with similarity scores based on ski resort
    cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)

    # sorting in descending order
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

    #keeping the top matches (plus one, since the input resort matches itself)
    cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)

    #collecting the matching rows from content_matrix
    rec_list = []

    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice", "adultWeekendPrice",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 
    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
    #dropping mountain name from the results
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

    #returning final dataframe
    return final_concat_df.head(n_recs)

In [95]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Breckenridge,Breckenridge,Colorado,12998,3398,9600,149.0,179.0,13,23,36,28.0,266.028571,192.194444
2,Keystone,Keystone,Colorado,12408,3128,9280,195.0,225.0,16,43,41,0.0,229.142857,172.514286
3,Vail,Vail,Colorado,11570,3450,8120,225.0,245.0,23,35,40,2.0,421.485714,260.472222


### Collaborative Model

Importing the final cleaned user/review surprise dataframe from the collaborative model notebook. 

In [96]:
final_user_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2807 entries, 0 to 2806
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   2795 non-null   float64
 1   review_date  2807 non-null   object 
 2   state        2795 non-null   object 
 3   ski_resort   2795 non-null   object 
 4   rating       2795 non-null   float64
 5   review       2795 non-null   object 
 6   user_name    2783 non-null   object 
dtypes: float64(2), object(5)
memory usage: 153.6+ KB


In [97]:
final_user_df.drop(columns="Unnamed: 0", inplace=True)

In [98]:
from surprise import Reader, Dataset
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

#copying final review dataframe
surprise_df = final_user_df.copy()

#dropping unneeded columns
surprise_df = surprise_df[['user_name', 'ski_resort', 'rating']]

# counting the number of reviews for each user
value_counts = surprise_df['user_name'].value_counts()

# selecting only users with at least three reviews (more than two)
selected_users = value_counts[value_counts > 2].index

# selecting only the rows where the user_name is in the selected_users list
surprise_df = surprise_df[surprise_df['user_name'].isin(selected_users)]

#saving for streamlit app
surprise_df.to_csv("data/cleaned_data_exports/surprise_df.csv")
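
The `value_counts > 2` filter above keeps users with three or more reviews. A minimal, dependency-free sketch of the same idea follows; the function name `filter_by_min_reviews` and the toy ratings are hypothetical and are not part of the notebook's data.

```python
from collections import Counter

def filter_by_min_reviews(ratings, min_reviews=3):
    """Keep only rows whose user has at least `min_reviews` ratings.

    `ratings` is a list of (user_name, ski_resort, rating) tuples.
    """
    # Count how many reviews each user has written
    counts = Counter(user for user, _, _ in ratings)
    # Keep the rows belonging to sufficiently active users
    return [row for row in ratings if counts[row[0]] >= min_reviews]

# Toy data (made up for illustration)
ratings = [
    ("user_a", "Resort 1", 4.0),
    ("user_a", "Resort 2", 5.0),
    ("user_a", "Resort 3", 5.0),
    ("user_b", "Resort 1", 3.0),
    ("user_b", "Resort 2", 4.0),
    ("user_c", "Resort 3", 5.0),
]

kept = filter_by_min_reviews(ratings)
print(sorted({user for user, _, _ in kept}))
```

Only `user_a` has three reviews, so only that user's rows are kept in this toy example.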

In [99]:
surprise_df.head()

Unnamed: 0,user_name,ski_resort,rating
0,anon_1,Winter Park,4.0
1,anon_1,Arapahoe Basin,5.0
2,anon_1,Steamboat,5.0
3,anon_1,Copper Mountain,5.0
4,anon_2,Solitude Mountain,5.0


In [100]:
#saving Reader information
reader = Reader(rating_scale=(1, 5))

#loading final dataset
data = Dataset.load_from_df(surprise_df[['user_name', 'ski_resort', 'rating']], reader)

#making trainset
trainset = data.build_full_trainset()

#instantiating model and training
algo = SVDpp(n_factors=140, n_epochs=120, init_mean=.01)
algo.fit(trainset) 

<surprise.prediction_algorithms.matrix_factorization.SVDpp at 0x7fa6513449a0>

In [101]:
#saving new dataframe with only user information
user_df = surprise_df.reset_index()
user_df.set_index('user_name', inplace = True)
user_df.drop(columns = ['rating', 'index'], inplace =True)
user_df.head()

Unnamed: 0_level_0,ski_resort
user_name,Unnamed: 1_level_1
anon_1,Winter Park
anon_1,Arapahoe Basin
anon_1,Steamboat
anon_1,Copper Mountain
anon_2,Solitude Mountain


In [102]:
#looking at number of users
print('Number of users: ', trainset.n_users, '\n')
print('Number of items: ', trainset.n_items)

Number of users:  537 

Number of items:  269
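
For reference, the `SVDpp` model fitted above follows the SVD++ formulation documented for Surprise, in which a predicted rating combines a global mean, user and item biases, and latent factors. The symbols below come from that formulation and are not defined elsewhere in this notebook.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T}\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u} y_j\right)
```

Here $\mu$ is the mean rating, $b_u$ and $b_i$ are the user and item biases, $q_i$ and $p_u$ are the item and user factor vectors (of length `n_factors`), and $y_j$ are the implicit-feedback factors for the items $I_u$ rated by user $u$. The value returned by `algo.predict(user, x).est` is this estimate, clipped to the rating scale by default.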


# Collaborative and Content Based Models

### Final Collaborative Model

In [105]:
#Collaborative model
def collaborative_model():
    
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    
    have_rated = list(user_df.loc[user, 'ski_resort'])
    not_rated = content_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]  # & (not_rated['state'] == state)]
    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    not_rated = not_rated[['ski_resort', 'state', 'city', "adultWeekdayPrice", "adultWeekendPrice", 'summit', 'drop',
                           'base','ikon', 'epic','mountain_collective',
                          'advanced_runs',  'intermediate_runs', 'expert_runs', 'predicted_rating']].copy()

    return not_rated.head(n_recs)

In [106]:
collaborative_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adultWeekdayPrice,adultWeekendPrice,summit,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
45,Telluride,Colorado,Telluride,209.0,219.0,13150,4425,8725,0,0,0,21,30,34.0,4.813599
37,Jackson Hole,Wyoming,Teton Village,215.0,215.0,10450,4139,6311,1,0,1,38,41,17.0,4.494822
12,Alta,Utah,Alta,159.0,,11068,2538,8530,1,0,1,0,0,0.0,4.45404


### Final Content Model

In [107]:
# Content-based model
def content_model():
    
    #user inputs
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    
    # Pulling out an individual resort
    y = final_content_df.loc[[mountain_name]]

    # Using cosine_similarity to return similarity scores
    cos_sim = cosine_similarity(final_content_df, y)

    # Create a dataframe with similarity scores based on ski resort
    cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
    cos_sim_df

    # sorting in descending order
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

    #saving as new dataframe
    cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
    cos_sim_df
    
    #list to collect the recommended rows
    rec_list = []
    
    #grabbing rows from content_matrix 
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice", "adultWeekendPrice", 
                           'ikon', 'epic','mountain_collective',
                        "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 
    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
    #dropping mountain name from the results
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

    #showing final dataframe
    return(final_concat_df.head(n_recs))

In [108]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Breckenridge,Breckenridge,Colorado,12998,3398,9600,149.0,179.0,0,1,0,13,23,36,28.0,266.028571,192.194444
2,Keystone,Keystone,Colorado,12408,3128,9280,195.0,225.0,0,1,0,16,43,41,0.0,229.142857,172.514286
3,Vail,Vail,Colorado,11570,3450,8120,225.0,245.0,0,1,0,23,35,40,2.0,421.485714,260.472222


## Cascade-Hybrid Model

I will be creating a cascade hybrid model for my final recommendation system. 

Collaborative user-based models are commonly used by music and streaming platforms, but they have limitations when applied to ski trip planning, where the opportunity cost of a poor recommendation is much higher.

The hybrid model begins with the collaborative model, which selects the top 50 resorts by predicted rating. The content-based system then refines this shortlist, using the user's favorite resort as the guide for the final recommendations.

In combining the models, a few adjustments needed to be made:
- Since it is not guaranteed that the user's preferred mountain will be selected by the collaborative model, I added the user's mountain as the top recommendation for the collaborative model. I did this by adding its row to the final output dataframe and then setting its predicted rating to 5 so that it appears at the top of the results.
- I adjusted the content-based model to use the final dataframe from the collaborative model, which contains the top 50 results.
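
The two-stage cascade described above can be sketched without any libraries. This is an illustrative outline under stated assumptions, not the notebook's implementation: the names `cosine` and `cascade_recommend`, the two-dimensional feature vectors, and the toy ratings are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cascade_recommend(predicted, features, favorite, top_k, n_recs):
    """Stage 1: shortlist by predicted rating. Stage 2: re-rank by similarity."""
    # Pin the favorite resort to the maximum rating so it survives stage 1
    scores = dict(predicted)
    scores[favorite] = 5.0
    # Stage 1 (collaborative): keep the top_k resorts by predicted rating
    shortlist = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Skip shortlisted resorts that have no feature vector
    shortlist = [r for r in shortlist if r in features]
    # Stage 2 (content based): rank the shortlist by similarity to the favorite
    target = features[favorite]
    ranked = sorted(
        (r for r in shortlist if r != favorite),
        key=lambda r: cosine(features[r], target),
        reverse=True,
    )
    return ranked[:n_recs]

# Toy data (made up for illustration)
predicted = {"A": 4.8, "B": 4.5, "C": 4.1, "D": 3.2, "Fav": 3.9}
features = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.8],
    "C": [0.2, 1.0],
    "D": [1.0, 1.0],
    "Fav": [1.0, 1.0],
}

recs = cascade_recommend(predicted, features, "Fav", top_k=4, n_recs=2)
print(recs)
```

In this toy run, resort `D` has the same features as the favorite but never reaches the second stage because its predicted rating falls outside the shortlist. That is the trade-off of a cascade: the first model decides which candidates the second model is allowed to see.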

In [111]:
#Cascade-hybrid model
def hybrid_model():
    
#inputs
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))

#making a list of resorts the user has rated
    have_rated = list(user_df.loc[user, 'ski_resort'])
    
#dropping rated list to create new dataframe of unrated resorts
    not_rated = content_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]  # & (not_rated['state'] == state)]

# adding mountain_name row to the dataframe to ensure this is included in final results for content based model
    if mountain_name not in not_rated['ski_resort'].values:
        mountain_row = final_user_df.loc[final_user_df['ski_resort'] == mountain_name].copy()
        mountain_row['predicted_rating'] = algo.predict(user, mountain_name).est
        not_rated = pd.concat([not_rated, mountain_row])

    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    not_rated = not_rated[['ski_resort', 'state', 'city', 'summit', 'drop', 'base', 'intermediate_runs',
                                  'advanced_runs', 'expert_runs', 'predicted_rating', 'ikon', 'epic',
                                  'mountain_collective', "adultWeekdayPrice", "adultWeekendPrice",]].copy()

    not_rated.loc[not_rated['ski_resort'] == mountain_name, 'predicted_rating'] = 5

    not_rated = not_rated.sort_values(by="predicted_rating", ascending=False)
    
#updating dataframe for collaborative model
    
    #saving top 50 ski resorts as list
    not_rated_top_list = not_rated['ski_resort'].head(50)

    #list to collect the candidate rows
    collab_list = []
    
    #grabbing rows from final_content_df
    for x in not_rated_top_list:
        try:
            col_df = final_content_df.loc[[x]]
            collab_list.append(col_df)
        except KeyError:
            continue

    collab_df = pd.concat(collab_list)
    
    #pulling out the row for the user's favorite resort
    
    y = collab_df.loc[[mountain_name]]

    # Using cosine_similarity to return similarity scores
    cos_sim = cosine_similarity(collab_df, y)

    # Create a dataframe with similarity scores based on ski resort
    cos_sim_df = pd.DataFrame(data=cos_sim, index=collab_df.index)
    cos_sim_df

    # sorting in descending order
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

    #saving as new dataframe
    cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
    #cos_sim_df = cos_sim_df.rename(columns={0:"sim_score"})
    cos_sim_df
    
    #list to collect the recommended rows
    rec_list = []
    
    #grabbing rows from content_matrix for final output
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice", "adultWeekendPrice",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 

    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]
    
    #showing final dataframe
    return final_concat_df.head(n_recs)

In [112]:
hybrid_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,149.0,229.0,0,0,0,0.0,353.65625,284.09375
2,Mammoth Mountain,Mammoth Mountain Lakes,California,11053,3100,7953,149.0,179.0,15,48,24,13.0,260.466667,191.96875
3,Big Sky,Big Sky,Montana,11166,4350,7500,152.0,194.0,15,25,42,18.0,314.657143,208.142857


## Testing Model Results

I will be testing the model results with users whose mountain preferences have been defined:

- Alexandria Kelly - Prefers mountains where the majority of skiers are there for the sport and that do not feel overly "bougie". Skis expert terrain.
- Raghava Kamalesh - Enjoys après-ski, large mountains, expert terrain, and well-known mountains.
- Joseph Lewis - Skis locally in the NY area. Is looking to explore more mountains.

#### Alexandria's Results

In [49]:
content_model()

How many resort recommendations do you want? 5
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Heavenly Mountain,Stateline,California,10067,3500,7170,225,189,0,1,0,7,60,27,5,318.909091,219.636364
2,Keystone,Keystone,Colorado,12408,3128,9280,390,131,0,1,0,16,43,41,0,229.142857,172.514286
3,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,1,0,0,12,43,40,5,246.771429,189.138889
4,Breckenridge,Breckenridge,Colorado,12998,3398,9600,55,116,0,1,0,13,23,36,28,266.028571,192.194444
5,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,229,149,1,0,1,0,0,0,0,353.65625,284.09375


In [50]:
collaborative_model()

Name: Alexandria Kelly
How many resort recommendations do you want? 5


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
45,Breckenridge,Colorado,Breckenridge,55,116,12998,3398,9600,0,1,0,36,23,28,4.426055
27,Big Sky,Montana,Big Sky,113,106,11166,4350,7500,1,0,1,42,25,18,4.395938
142,Lutsen Mountains,Minnesota,Lutsen,80,90,1688,825,800,0,0,0,24,58,8,4.379112
170,Mt. Baker,Washington,Bellingham,54,54,5000,1500,3500,0,0,0,0,0,0,4.378622
279,Taos Ski Valley,New Mexico,Taos Ski Valley,150,150,12481,3281,9200,1,0,1,30,16,40,4.318287


In [65]:
hybrid_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 5
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,12,43,40,5,246.771429,189.138889
2,Taos Ski Valley,Taos Ski Valley,New Mexico,12481,3281,9200,150,150,14,16,30,40,235.342857,160.542857
3,Big Sky,Big Sky,Montana,11166,4350,7500,113,106,15,25,42,18,314.657143,208.142857
4,Jackson Hole,Teton Village,Wyoming,10450,4139,6311,140,194,4,41,38,17,436.314286,254.5
5,Copper Mountain,Copper Mountain,Colorado,12313,2738,9712,55,124,25,24,34,17,274.714286,218.305556


#### Raghava's Results

In [52]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Telluride
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Aspen Snowmass,Aspen,Colorado,12510,4406,8104,199,189,1,0,1,0,0,0,0,624.257143,316.828571
2,Solitude Mountain,Brighton,Utah,10488,2494,7994,115,115,1,0,0,6,46,30,18,375.794118,322.657143
3,Jackson Hole,Teton Village,Wyoming,10450,4139,6311,140,194,1,0,1,4,41,38,17,436.314286,254.5


In [53]:
collaborative_model()

Name: Raghava Kamalesh
How many resort recommendations do you want? 5


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
302,Whiteface Mountain,New York,Wilngton,90,90,4650,3430,1220,0,0,0,31,46,0,4.770633
231,Ski Brule,Michigan,Iron River,55,55,1860,500,1360,0,0,0,24,35,6,4.736598
247,Snowbasin,Utah,Huntsville,109,89,9350,2900,6450,1,0,1,52,33,6,4.725923
170,Mt. Bohemia,Michigan,Michigan,55,68,1500,900,600,0,0,0,8,2,90,4.684545
248,Snowbird,Utah,Snowbird,94,110,11000,3240,7760,1,0,1,43,25,24,4.658808


In [54]:
hybrid_model()

Name: Raghava Kamalesh
How many resort recommendations do you want? 5
What's your favorite ski resort? Telluride
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Brighton,Brighton,Utah,10500,1745,8755,53,53,0,0,0,0,375.794118,322.714286
2,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,229,149,0,0,0,0,353.65625,284.09375
3,Bromley Mountain,Manchester Center,Vermont,3284,1334,1950,85,81,30,36,32,2,366.0,295.0
4,Big Sky,Big Sky,Montana,11166,4350,7500,113,106,15,25,42,18,314.657143,208.142857
5,Kirkwood,Kirkwood,California,9800,2000,7800,55,46,0,0,0,0,371.53125,304.967742


#### Joseph's Results

In [56]:
collaborative_model()

Name: Joseph Lewis
How many resort recommendations do you want? 5


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
279,Telluride,Colorado,Telluride,179,209,13150,4425,8725,0,0,0,21,30,34,4.651366
128,Killington,Vermont,Killington,101,119,4241,3050,1165,1,0,0,0,0,0,4.584891
189,Nubs Nob,Michigan,Harbor Springs,73,60,1338,427,911,0,0,0,0,0,0,4.582267
168,Mt. Baker,Washington,Bellingham,54,54,5000,1500,3500,0,0,0,0,0,0,4.57911
125,June Mountain,California,June Lake,55,46,10090,2590,7545,1,0,0,28,40,16,4.526083


In [57]:
content_model()

How many resort recommendations do you want? 5
What's your favorite ski resort? Killington
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,1,0,0,12,43,40,5,246.771429,189.138889
2,Mammoth Mountain,Mammoth Lakes,California,11053,3100,7953,179,149,1,0,1,15,48,24,13,260.466667,191.96875
3,Gore Mountain,North Creek,New York,3600,2537,998,85,85,0,0,0,13,45,37,5,245.657143,193.771429
4,Big Sky,Big Sky,Montana,11166,4350,7500,113,106,1,0,1,15,25,42,18,314.657143,208.142857
5,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,229,149,1,0,1,0,0,0,0,353.65625,284.09375


In [59]:
hybrid_model()

Name: Joseph Lewis
How many resort recommendations do you want? 5
What's your favorite ski resort? Killington
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,12,43,40,5,246.771429,189.138889
2,Big Sky,Big Sky,Montana,11166,4350,7500,113,106,15,25,42,18,314.657143,208.142857
3,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,229,149,0,0,0,0,353.65625,284.09375
4,Taos Ski Valley,Taos Ski Valley,New Mexico,12481,3281,9200,150,150,14,16,30,40,235.342857,160.542857
5,Stratton Mountain,Stratton Mountain,Vermont,3875,2003,1872,125,125,0,0,0,0,439.21875,310.823529


# Hybrid Model #2

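In this hybrid model, content-based filtering is the primary filter and the collaborative system is secondary: the content-based model first shortlists the 20 most similar resorts, and the collaborative model's predicted ratings then rank that shortlist. This places a larger emphasis on user inputs.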
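
The ordering of the two stages can be sketched as follows. This is a simplified, library-free outline under stated assumptions: the function name `content_first_recommend`, the precomputed similarity scores, the pass memberships, and the ratings are all hypothetical toy values.

```python
def content_first_recommend(similarity, predicted, passes, favorite,
                            mtn_pass=None, top_k=20, n_recs=3):
    """Stage 1: shortlist by content similarity. Stage 2: rank by predicted rating."""
    # Stage 1 (content based): order by similarity to the favorite, excluding it
    candidates = [
        r for r in sorted(similarity, key=similarity.get, reverse=True)
        if r != favorite
    ]
    # Optional filter: keep only resorts on the user's multi-resort pass
    if mtn_pass is not None:
        candidates = [r for r in candidates if mtn_pass in passes.get(r, set())]
    candidates = candidates[:top_k]
    # Stage 2 (collaborative): rank the shortlist by predicted rating;
    # resorts without a prediction sort to the bottom
    ranked = sorted(
        candidates,
        key=lambda r: predicted.get(r, float("-inf")),
        reverse=True,
    )
    return ranked[:n_recs]

# Toy data (made up for illustration)
similarity = {"Fav": 1.0, "A": 0.95, "B": 0.90, "C": 0.85, "D": 0.40}
predicted = {"A": 3.9, "B": 4.6, "D": 4.9}  # no prediction for "C"
passes = {"A": {"ikon"}, "B": {"ikon"}, "C": {"ikon"}, "D": {"epic"}}

recs = content_first_recommend(similarity, predicted, passes, "Fav",
                               mtn_pass="ikon", top_k=3, n_recs=3)
print(recs)
```

Resort `D` has the highest predicted rating in the toy data but is excluded by the pass filter, which shows how this ordering lets the user's stated preferences take priority over the collaborative score.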

In [113]:
def hybrid_model_content():
    # User inputs
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    mtn_pass = str(input('Are you using a multi-resort pass?  '))
    
    # Content-based model
    y = final_content_df.loc[[mountain_name]]
    cos_sim = cosine_similarity(final_content_df, y)
    cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)
    cos_sim_df = cos_sim_df.reset_index()

    #list to collect the recommended rows
    rec_list = []
    
    #grabbing rows from content_matrix for final output
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "summit", "drop", "base","adultWeekdayPrice", "adultWeekendPrice",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs",
                        "ikon", "epic", "mountain_collective"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()                        
    content_recommendations = pd.merge(concat_df, result, on="ski_resort")
    
    #filtering by multi-resort pass
    if mtn_pass == "Ikon":
        content_recommendations = content_recommendations.loc[content_recommendations['ikon'] == 1]
    elif mtn_pass == "Epic":
        content_recommendations = content_recommendations.loc[content_recommendations['epic'] == 1]
    elif mtn_pass == "Mountain_collective":
        content_recommendations = content_recommendations.loc[content_recommendations['mountain_collective'] == 1]
    elif mtn_pass == "No":
        pass
    
    content_recommendations = content_recommendations[content_recommendations.ski_resort != mountain_name].head(20)

    # Collaborative model
    have_rated = list(user_df.loc[user, 'ski_resort'])
    not_rated = final_user_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]
    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    collaborative_recommendations = not_rated[['ski_resort', 'predicted_rating']]

    # Combine content-based and collaborative recommendations
    combined_recommendations = pd.merge(content_recommendations, collaborative_recommendations, on='ski_resort', how='left')
    combined_recommendations = combined_recommendations.drop_duplicates(subset=['ski_resort'])
    combined_recommendations.sort_values(by='predicted_rating', ascending=False, inplace=True)
    combined_recommendations.drop(columns=['ikon', 'mountain_collective', 'epic'], inplace=True)
    return combined_recommendations.head(n_recs)

In [114]:
hybrid_model_content()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December
Are you using a multi-resort pass?  Ikon


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
8,Jackson Hole,Teton Village,Wyoming,10450,4139,6311,215.0,215.0,4,41,38,17.0,436.314286,254.5,4.494822
7,Taos Ski Valley,Taos Ski Valley Ski Valley,New Mexico,12481,3281,9200,195.0,195.0,14,16,30,40.0,235.342857,160.542857,4.222966
3,Big Sky,Big Sky,Montana,11166,4350,7500,152.0,194.0,15,25,42,18.0,314.657143,208.142857,4.214887
