# Avant Ski - Content Based System

by: Stephanie Ciaccia

# Overview

Skiing holds a prominent place for those seeking winter recreational activities in the United States. With its stunning mountain ranges and diverse terrain, the country boasts numerous ski resorts that cater to all skill levels, from beginners to seasoned professionals. Skiing offers a unique blend of adventure, physical activity, and natural beauty, making it a popular choice for winter enthusiasts seeking both relaxation and excitement.

The ski market in the United States is thriving, contributing significantly to the economy. According to the [National Ski Areas Association (NSAA)](https://nsaa.org/webdocs/Media_Public/IndustryStats/Historical_Skier_Days_1979_2022.pdf), approximately 60.7 million skiers and snowboarders visited 473 ski resorts in the 2021-2022 winter season.

# Business Problem 
Skiing is an exhilarating winter activity enjoyed by many, but barriers such as high costs and limited accessibility often hinder people from fully experiencing its joys. Choosing the right ski resort can be overwhelming due to the multitude of options available, and existing websites lack dynamic filtering capabilities based on user preferences.

To address these challenges, I'm developing Avant Ski, a ski resort recommendation app. Avant Ski simplifies the ski resort selection process by leveraging data and user preferences. With dynamic filtering features, users can personalize their search based on budget, location, amenities, and skill level. By bridging the gap between ski enthusiasts and their dream destinations, Avant Ski makes skiing accessible to a wider audience, empowering them to plan unforgettable ski trips with confidence.

# Data Understanding

In [1]:
import pandas as pd
import numpy as np
import math
from datetime import datetime
import datetime
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
%matplotlib inline
import plotly
import plotly.express as px
import plotly.io as pio
from matplotlib.ticker import StrMethodFormatter

from collections import Counter
from nltk.corpus import stopwords

from IPython.display import display

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.neighbors import NearestNeighbors

from surprise import SVDpp, SVD
from surprise import accuracy
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

import glob
import os

In [2]:
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')

## Importing Data
### Data Source #1 - Final Feature Data
Importing the main ski resort feature dataframe, which I scraped from OnTheSnow and cleaned in the cleaning notebook.

In [77]:
content_df = pd.read_csv("data/cleaned_data_exports/scraped_feature_df_3.csv")

In [78]:
content_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 330 entries, 0 to 329
Columns: 103 entries, Unnamed: 0 to total_lifts
dtypes: float64(20), int64(70), object(13)
memory usage: 265.7+ KB


### Data Source #2 - Final User/Review Data

Importing final cleaned user review data from the cleaning notebook.

In [71]:
final_user_df = pd.read_csv("data/cleaned_data_exports/user_df_model_2.csv")

### Content Modeling

To begin content modeling, we need a feature matrix that stores all of the feature information for each resort. This matrix lets us calculate the similarities between item vectors, so we can determine which ski resorts are similar.

In [81]:
content_df.columns.to_list()

['Unnamed: 0',
 'ski_resort',
 'address',
 'city',
 'state',
 'zipcode',
 'summit',
 'drop',
 'base',
 'gondolas_and_trams',
 'fast_eight',
 'high_speed_sixes',
 'quad_chairs',
 'triple_chairs',
 'double_chairs',
 'surface_lifts',
 'total_runs',
 'longest_run',
 'skiable_terrain',
 'snow_making',
 'daysOpenLastYear',
 'averageSnowfall',
 'projectedOpening',
 'projectedClosing',
 'nov_snow',
 'dec_snow',
 'jan_snow',
 'feb_snow',
 'mar_snow',
 'apr_snow',
 'childrenWeekdayPrice',
 'childrenWeekendPrice',
 'teenagerWeekdayPrice',
 'teenagerWeekendPrice',
 'adultWeekdayPrice',
 'adultWeekendPrice',
 'seniorWeekdayPrice',
 'seniorWeekendPrice',
 'childrenPrice_season',
 'teenagerPrice_season',
 'adultPrice_season',
 'Url',
 'beginner_runs',
 'intermediate_runs',
 'advanced_runs',
 'expert_runs',
 'night_skiing',
 'epic',
 'mountain_collective',
 'ikon',
 'indy',
 'dec_mean_2_guests',
 'dec_min_2_guests',
 'dec_max_2_guests',
 'jan_mean_2_guests',
 'jan_min_2_guests',
 'jan_max_2_guests',
 

In [95]:
#making a copy of the final dataframe
content_matrix = content_df.copy()

In [96]:
#address, URL, date, and airport columns that are not used as model features
drop_list = ['address', 'zipcode', 'Url', 'projectedOpening', 'projectedClosing', 'Unnamed: 0',
             'daysOpenLastYear',
             'full_address', 'airport_1', 'distance_1', 'lat', 'long',
             'airport_2', 'distance_2', 'lat_2', 'long_2', 'airport_3', 'distance_3', 'lat_3', 'long_3']

content_matrix.drop(columns=drop_list, inplace=True)

In [97]:
content_matrix.head()

Unnamed: 0,ski_resort,city,state,summit,drop,base,gondolas_and_trams,fast_eight,high_speed_sixes,quad_chairs,...,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,latitude,longitude,total_lifts
0,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,3,6,4,1,...,1320,463,178,1260,331,122,825,39.19698,-120.235705,36
1,Mammoth Mountain,Mammoth Mountain Lakes,California,11053,3100,7953,3,9,2,1,...,589,325,126,699,144,64,246,37.648546,-118.972079,25
2,Donner Ski Ranch,Norden,California,8012,750,7031,0,0,0,0,...,739,349,165,996,231,83,643,39.317356,-120.354182,8
3,Sugar Bowl,Norden,California,8383,1500,6883,1,5,0,3,...,739,349,165,996,245,120,643,39.317356,-120.354182,12
4,Kirkwood,Kirkwood,California,9800,2000,7800,0,2,0,2,...,1179,420,150,950,309,114,590,38.702308,-120.072244,13


### One Hot Encoding Categorical Variables

I will one-hot encode the state column, as it is the only categorical column I am keeping as a feature. I want to keep it in the final model because a resort's location often plays an important role in deciding where to ski.

In [98]:
# Instantiating OHE
ohe = OneHotEncoder()

# fit and transforming
ohe_state = pd.DataFrame(ohe.fit_transform(content_matrix[['state']]).toarray())

# renaming based on original names
# (get_feature_names_out replaces get_feature_names, which was removed in scikit-learn 1.2)
ohe_state.columns = ohe.get_feature_names_out(['state'])

In [99]:
ohe_state

Unnamed: 0,state_Alaska,state_Arizona,state_California,state_Colorado,state_Connecticut,state_Idaho,state_Illinois,state_Indiana,state_Iowa,state_Maine,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
325,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
326,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
327,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
328,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [100]:
#setting index
ohe_state = ohe_state.set_index(content_matrix['ski_resort'])

In [101]:
ohe_state

Unnamed: 0_level_0,state_Alaska,state_Arizona,state_California,state_Colorado,state_Connecticut,state_Idaho,state_Illinois,state_Indiana,state_Iowa,state_Maine,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
ski_resort,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Palisades Tahoe,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Mammoth Mountain,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Donner Ski Ranch,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Sugar Bowl,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Kirkwood,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Oak Mountain,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Mt. Pleasant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hunt Hollow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Powder Ridge Connecticut,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [102]:
#resetting index as ski_resort
content_matrix = content_matrix.set_index("ski_resort")

In [103]:
#dropping the state, city, latitude, and longitude columns
final_content_matrix = content_matrix.drop(columns=["state", "city", 'latitude','longitude'])

In [104]:
#filling null matrix values with 0
final_content_matrix = final_content_matrix.fillna(0)

In [105]:
final_content_matrix.info()

<class 'pandas.core.frame.DataFrame'>
Index: 330 entries, Palisades Tahoe to Shawnee Mountain
Data columns (total 78 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   summit                330 non-null    int64  
 1   drop                  330 non-null    int64  
 2   base                  330 non-null    int64  
 3   gondolas_and_trams    330 non-null    int64  
 4   fast_eight            330 non-null    int64  
 5   high_speed_sixes      330 non-null    int64  
 6   quad_chairs           330 non-null    int64  
 7   triple_chairs         330 non-null    int64  
 8   double_chairs         330 non-null    int64  
 9   surface_lifts         330 non-null    int64  
 10  total_runs            330 non-null    int64  
 11  longest_run           330 non-null    int64  
 12  skiable_terrain       330 non-null    int64  
 13  snow_making           330 non-null    int64  
 14  averageSnowfall       330 non-null    int64  
 15  n

### Scaling Data

I will use StandardScaler to standardize every column to zero mean and unit variance. This keeps features with large raw values, such as summit elevation, from dominating the similarity calculation in the next step.

In [106]:
#instantiating StandardScaler
scaler = StandardScaler()

#scaling array
scaled = scaler.fit_transform(final_content_matrix)

#saving as dataframe
scaled_ski_df = pd.DataFrame(scaled, index=final_content_matrix.index, columns=final_content_matrix.columns)

In [107]:
scaled_ski_df

Unnamed: 0_level_0,summit,drop,base,gondolas_and_trams,fast_eight,high_speed_sixes,quad_chairs,triple_chairs,double_chairs,surface_lifts,...,mar_mean_4_guests,mar_min_4_guests,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,total_lifts
ski_resort,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Palisades Tahoe,1.186487,1.710642,0.902438,4.759106,2.238096,5.503542,0.006503,6.302092,1.902088,1.080304,...,2.541470,2.537409,2.180666,2.334115,1.841660,1.669874,2.000277,1.059825,1.368909,4.654503
Mammoth Mountain,1.722887,1.973881,1.465619,4.759106,3.594102,2.599995,0.006503,2.697776,1.325698,-1.290043,...,0.792226,1.139202,-0.155575,0.684433,0.415601,0.110550,-1.049510,-1.007546,-0.928538,2.801482
Donner Ski Ranch,0.908512,-0.500562,1.169411,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,1.902088,-0.341905,...,0.648301,2.019554,0.323818,0.971334,1.485145,0.936075,0.369375,-0.330304,0.646741,-0.062278
Sugar Bowl,1.007865,0.289154,1.121864,1.370746,1.786094,-0.303553,1.437096,-0.305821,-0.979864,-0.341905,...,0.902937,2.019554,0.323818,0.971334,1.485145,0.936075,0.597701,0.988536,0.646741,0.611548
Kirkwood,1.387336,0.815631,1.416465,-0.323434,0.430087,-0.303553,0.721800,2.097056,-0.403473,0.132165,...,1.733275,0.880274,1.730037,1.820084,1.073782,0.808216,1.641478,0.774670,0.436439,0.780005
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Oak Mountain,-0.594373,-0.605857,-0.527200,-0.323434,-0.473917,-0.303553,0.006503,-0.906540,-0.979864,0.132165,...,0.028315,-0.310791,0.419696,0.206265,0.388177,0.305118,0.173666,0.097428,-0.079395,-0.736104
Mt. Pleasant,-0.824680,-0.932273,-0.703897,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,-0.979864,-0.815974,...,-1.012374,-1.475964,-0.919408,-1.001111,-1.531517,-0.859511,-0.315604,-0.579814,-0.317473,-1.073017
Hunt Hollow,-0.693458,-0.421590,-0.768150,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,-0.403473,-0.815974,...,0.515447,0.336527,0.518771,0.469257,0.360753,0.391284,0.548774,0.240005,-0.142883,-0.904560
Powder Ridge Connecticut,-1.044275,-0.711153,-1.034802,-0.323434,-0.473917,-0.303553,-0.708794,-0.305821,0.172917,0.132165,...,-0.469887,-1.165251,1.084454,-0.367538,-1.175002,1.500322,-0.984274,-0.686747,-0.912666,-0.399191
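
To see what the scaling step does to a column, here is a small sketch that standardizes two columns by hand. The raw values are copied from `content_matrix.head()` above; the names `raw` and `z` are illustrative. Because only three rows are used, the resulting numbers will not match the full scaled table.

```python
import numpy as np
import pandas as pd

# Raw values for three resorts, copied from content_matrix.head() above.
raw = pd.DataFrame(
    {"summit": [9050, 11053, 8012], "total_lifts": [36, 25, 8]},
    index=["Palisades Tahoe", "Mammoth Mountain", "Donner Ski Ranch"],
)

# Standardize each column: subtract the mean and divide by the population
# standard deviation (ddof=0), which is the convention StandardScaler uses.
z = (raw - raw.mean()) / raw.std(ddof=0)

print(z.round(3))
```

After this step both columns are centered on 0 with unit spread, so a difference of a few thousand feet in summit elevation no longer outweighs a difference of a few lifts.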


In [108]:
#merging scaled_ski_df and one hot encoded dataframes
final_content_df = scaled_ski_df.join(ohe_state)

In [149]:
final_content_df.colu

<class 'pandas.core.frame.DataFrame'>
Index: 330 entries, Palisades Tahoe to Shawnee Mountain
Columns: 113 entries, summit to state_Wyoming
dtypes: float64(113)
memory usage: 303.9+ KB


### Cosine Similarity

I will start the content-based modeling by using cosine similarity to measure how similar each pair of ski resorts is.

In [109]:
sim_df = pd.DataFrame(cosine_similarity(final_content_df), index=final_content_df.index, columns=final_content_df.index)

In [110]:
sim_df.to_csv("data/sim_matrix_2.csv")

In [125]:
sim_df.head()

ski_resort,Palisades Tahoe,Mammoth Mountain,Donner Ski Ranch,Sugar Bowl,Kirkwood,Boreal,Sierra at Tahoe,Mt. Rose Ski Tahoe,Soda Springs,Wolf Creek,...,Elko SnoBowl,Eagle Point,Pine Knob,Whaleback,Little Switzerland,Oak Mountain,Mt. Pleasant,Hunt Hollow,Powder Ridge Connecticut,Shawnee Mountain
ski_resort,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Palisades Tahoe,1.0,0.812026,0.521999,0.722072,0.488924,0.347305,0.576504,-0.045906,0.238735,0.107452,...,-0.585955,-0.092606,-0.308238,-0.630418,-0.257438,-0.277569,-0.75017,-0.276447,-0.30644,-0.518049
Mammoth Mountain,0.812026,1.0,0.432824,0.749503,0.189126,0.408623,0.353622,0.13479,0.243831,0.277572,...,-0.504907,-0.056583,-0.413794,-0.611706,-0.41305,-0.275003,-0.538788,-0.351374,-0.385144,-0.362633
Donner Ski Ranch,0.521999,0.432824,1.0,0.698997,0.279095,0.566718,0.503167,-0.336018,0.674215,-0.153057,...,-0.708967,-0.096141,-0.365107,-0.644028,-0.058645,0.294224,-0.713626,-0.06652,-0.228173,-0.408549
Sugar Bowl,0.722072,0.749503,0.698997,1.0,0.387612,0.59692,0.508803,-0.079504,0.491566,0.172657,...,-0.697247,-0.028167,-0.357809,-0.671171,-0.284763,-0.012591,-0.717228,-0.273639,-0.37215,-0.565831
Kirkwood,0.488924,0.189126,0.279095,0.387612,1.0,0.383553,0.617336,-0.10382,0.236078,0.20523,...,-0.30993,0.23138,-0.233022,-0.46737,-0.003632,-0.114437,-0.575676,-0.145585,-0.130862,-0.567742
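
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand with NumPy on made-up two-dimensional vectors; `cosine_sim` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors chosen to show the three extreme cases.
same_direction = cosine_sim([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_sim([1.0, 0.0], [0.0, 3.0])       # unrelated vectors
opposite = cosine_sim([1.0, -1.0], [-2.0, 2.0])       # opposing vectors

print(same_direction, orthogonal, opposite)
```

Scores range from -1 to 1. Negative values appear in `sim_df` because standardized features can be negative: a resort that is above average on most features points in roughly the opposite direction from one that is below average on them.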


In [112]:
final_content_df.to_csv("data/cleaned_data_exports/final_content_df.csv")

In [113]:
sim_df.to_csv("data/cleaned_data_exports/similarity_matrix.csv")

### Function Building

In [28]:
# Input for mountain name
mountain_name = str(input("What is your favorite ski resort? "))

# input to ask user how many recommendations they would like
n_recs = int(input('How many recommendations would you like? '))
    
#what month would you like to travel
travel_date = str(input('What month would you like to travel? '))

What is your favorite ski resort? Park City Mountain
How many recommendations would you like? 3
What month would you like to travel? December


In [144]:
# Pulling out an individual mountain

y = sim_df.loc[[mountain_name]].T
cos_sim_df = y.reset_index().sort_values(by=mountain_name, ascending=False).head(n_recs + 1)

In [145]:
cos_sim_df

ski_resort,ski_resort.1,Park City Mountain
35,Park City Mountain,1.0
53,Breckenridge,0.749671
86,Keystone,0.730817
77,Vail,0.721524


In [151]:
#collecting the content_matrix row for each recommended resort
rec_list = []

for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#selecting the columns to show in the output
concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice",
                           "beginner_runs", "intermediate_runs", "adultWeekendPrice", "expert_runs"]]

concat_df = concat_df.reset_index()

concat_df

Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,beginner_runs,intermediate_runs,adultWeekendPrice,expert_runs
0,Park City Mountain,Park City Mountain,Utah,10026,3226,6800,215.0,8,41,239.0,23.0
1,Breckenridge,Breckenridge,Colorado,12998,3398,9600,149.0,13,23,179.0,28.0
2,Keystone,Keystone,Colorado,12408,3128,9280,195.0,16,43,225.0,0.0
3,Vail,Vail,Colorado,11570,3450,8120,225.0,23,35,245.0,2.0


In [152]:
#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

#for loop that changes the user input to the month abbreviation that's in the column names
selected_columns = []
for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]

#resetting index
result = result.reset_index()

result

Unnamed: 0,ski_resort,mar_mean_4_guests,mar_mean_2_guests
0,Park City Mountain,395,229
1,Breckenridge,436,258
2,Keystone,335,255
3,Vail,462,338
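
The month lookup can also be written as a dictionary. In the loop above, a month that is not in the list leaves `selected_columns` empty without any warning; the sketch below raises an error instead. `MONTH_ABBR` and `lodging_columns` are hypothetical names introduced for this illustration, and the column name pattern is the one used in this notebook.

```python
# Maps a full month name to the abbreviation used in the column names.
MONTH_ABBR = {
    "december": "dec", "january": "jan", "february": "feb",
    "march": "mar", "april": "apr", "may": "may",
}

def lodging_columns(travel_month):
    """Return the mean-price column names for the given travel month."""
    key = travel_month.strip().lower()
    if key not in MONTH_ABBR:
        raise ValueError(f"Unsupported month: {travel_month!r}")
    abbr = MONTH_ABBR[key]
    return [f"{abbr}_mean_4_guests", f"{abbr}_mean_2_guests"]

print(lodging_columns("March"))
```

The returned list can be passed directly to `rec_df[...]` in place of `selected_columns`.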


Testing the first part of the function by hard-coding example user inputs.

In [156]:
#favorite resort (hard-coded instead of using input())
mountain_name = "Park City Mountain"

#number of recommendations to return
n_recs = 3

#month of travel
travel_date = "March"
    
# Pulling out an individual resort
y = sim_df.loc[[mountain_name]].T

#sorting values by similarity score
cos_sim_df = y.reset_index().sort_values(by=mountain_name, ascending=False).head(n_recs + 1)

#collecting the content_matrix row for each recommended resort
rec_list = []

for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#selecting the columns to show in the output
concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice",
                           "beginner_runs", "intermediate_runs", "adultWeekendPrice", "expert_runs"]]

concat_df = concat_df.reset_index()

#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

#for loop that changes the user input to the month abbreviation that's in the column names
selected_columns = []

for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]

#resetting index
result = result.reset_index()

#merging dataframes 
final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
#dropping mountain name from the results
final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

#showing final dataframe
final_concat_df.head(n_recs)

Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,beginner_runs,intermediate_runs,adultWeekendPrice,expert_runs,mar_mean_4_guests,mar_mean_2_guests
1,Breckenridge,Breckenridge,Colorado,12998,3398,9600,149.0,13,23,179.0,28.0,436,258
2,Keystone,Keystone,Colorado,12408,3128,9280,195.0,16,43,225.0,0.0,335,255
3,Vail,Vail,Colorado,11570,3450,8120,225.0,23,35,245.0,2.0,462,338


Making the final function. It includes a line that drops the row where ski_resort matches the user's input, so the user's own favorite resort is never returned as a recommendation.

In [180]:
# Content-based model
def content_model():
    
    #user inputs
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    
    # Pulling out an individual resort
    y = sim_df.loc[[mountain_name]].T

    #sorting values by similarity score
    cos_sim_df = y.reset_index().sort_values(by=mountain_name, ascending=False).head(n_recs + 1)

    #collecting the content_matrix row for each recommended resort
    rec_list = []

    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to show in the output
    concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice", "adultWeekendPrice",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 
    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
    #dropping mountain name from the results
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

    #showing final dataframe
    return final_concat_df.head(n_recs)

In [181]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Breckenridge,Breckenridge,Colorado,12998,3398,9600,149.0,179.0,13,23,36,28.0,266,192
2,Keystone,Keystone,Colorado,12408,3128,9280,195.0,225.0,16,43,41,0.0,229,172
3,Vail,Vail,Colorado,11570,3450,8120,225.0,245.0,23,35,40,2.0,421,260
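
Because `content_model()` reads its inputs with `input()`, it is awkward to test or to call from an app. One possible alternative is to pass the inputs as arguments. The sketch below shows only the similarity lookup, under the assumption that the similarity matrix has resort names on both axes like `sim_df`. `similar_resorts` and `toy_sim` are hypothetical names; the toy values are the first five rows and columns of `sim_df.head()` above, rounded to three decimals.

```python
import pandas as pd

resorts = ["Palisades Tahoe", "Mammoth Mountain", "Donner Ski Ranch",
           "Sugar Bowl", "Kirkwood"]

# 5x5 corner of sim_df, rounded to three decimals.
toy_sim = pd.DataFrame(
    [[1.000, 0.812, 0.522, 0.722, 0.489],
     [0.812, 1.000, 0.433, 0.750, 0.189],
     [0.522, 0.433, 1.000, 0.699, 0.279],
     [0.722, 0.750, 0.699, 1.000, 0.388],
     [0.489, 0.189, 0.279, 0.388, 1.000]],
    index=resorts, columns=resorts,
)

def similar_resorts(sim, resort, n_recs):
    """Return the n_recs resorts most similar to `resort`, excluding itself."""
    if resort not in sim.index:
        raise ValueError(f"Unknown resort: {resort!r}")
    # Drop the resort's own score (always 1.0) before taking the top n.
    return sim.loc[resort].drop(labels=resort).nlargest(n_recs)

top_two = similar_resorts(toy_sim, "Palisades Tahoe", 2)
print(top_two)
```

Dropping the resort before taking the top `n_recs` removes the need to request `n_recs + 1` rows and filter afterwards, and the explicit check turns a misspelled resort name into a readable error.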


### Collaborative Model

Importing the final cleaned user/review surprise dataframe from the collaborative model notebook. 

In [160]:
final_user_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2807 entries, 0 to 2806
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   review_date  2807 non-null   object 
 1   state        2795 non-null   object 
 2   ski_resort   2795 non-null   object 
 3   rating       2795 non-null   float64
 4   review       2795 non-null   object 
 5   user_name    2783 non-null   object 
dtypes: float64(1), object(5)
memory usage: 131.7+ KB


In [162]:
#dropping the leftover CSV index column if it is present
final_user_df.drop(columns="Unnamed: 0", inplace=True, errors="ignore")

In [98]:
from surprise import Reader, Dataset
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

#copying final review dataframe
surprise_df = final_user_df.copy()

#dropping unneeded columns
surprise_df = surprise_df[['user_name', 'ski_resort', 'rating']]

# counting the number of reviews for each user
value_counts = surprise_df['user_name'].value_counts()

# selecting only users with at least three reviews (more than two)
selected_users = value_counts[value_counts > 2].index

# selecting only the rows where the user_name is in the selected_users list
surprise_df = surprise_df[surprise_df['user_name'].isin(selected_users)]

#saving for streamlit app
surprise_df.to_csv("data/cleaned_data_exports/surprise_df.csv")

In [163]:
surprise_df.head()

Unnamed: 0,user_name,ski_resort,rating
0,anon_1,Winter Park,4.0
1,anon_1,Arapahoe Basin,5.0
2,anon_1,Steamboat,5.0
3,anon_1,Copper Mountain,5.0
4,anon_2,Solitude Mountain,5.0


In [322]:
#saving Reader information
reader = Reader(rating_scale=(1, 5))

#loading final dataset
data = Dataset.load_from_df(surprise_df[['user_name', 'ski_resort', 'rating']], reader)

#making trainset
trainset = data.build_full_trainset()

#instantiating model and training
algo = SVD(n_factors=140, n_epochs=40, biased=True)
#algo = SVDpp(n_factors=125, n_epochs=35, init_mean=.02, reg_all=.03)
algo.fit(trainset) 

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7ff07058a730>

In [117]:
#saving new dataframe with only user information
user_df = surprise_df.reset_index()
user_df.set_index('user_name', inplace = True)
user_df.drop(columns = ['rating', 'index'], inplace =True)
user_df.head()

Unnamed: 0_level_0,ski_resort
user_name,Unnamed: 1_level_1
anon_1,Winter Park
anon_1,Arapahoe Basin
anon_1,Steamboat
anon_1,Copper Mountain
anon_2,Solitude Mountain


In [118]:
#looking at number of users and items
print('Number of users: ', trainset.n_users, '\n')
print('Number of items: ', trainset.n_items)

Number of users:  537 

Number of items:  269


# Collaborative and Content Based Models

### Final Collaborative Model

In [165]:
#Collaborative model
def collaborative_model():

    #user inputs
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))

    #resorts the user has already rated
    have_rated = list(user_df.loc[user, 'ski_resort'])

    #keeping only resorts the user has not rated yet
    not_rated = content_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]
    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)

    #predicting the user's rating for each unrated resort and sorting from highest to lowest
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)

    #selecting the columns to show in the output
    not_rated = not_rated[['ski_resort', 'state', 'city', "adultWeekdayPrice", "adultWeekendPrice", 'summit', 'drop',
                           'base','ikon', 'epic','mountain_collective',
                          'advanced_runs',  'intermediate_runs', 'expert_runs', 'predicted_rating']].copy()

    return not_rated.head(n_recs)

In [323]:
collaborative_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adultWeekdayPrice,adultWeekendPrice,summit,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
45,Telluride,Colorado,Telluride,209.0,219.0,13150,4425,8725,0,0,0,21,30,34.0,4.633803
47,Taos Ski Valley,New Mexico,Taos Ski Valley Ski Valley,195.0,195.0,12481,3281,9200,1,0,1,30,16,40.0,4.503358
12,Alta,Utah,Alta,159.0,,11068,2538,8530,1,0,1,0,0,0.0,4.472603


### Final Content Model

In [167]:
# Content-based model
def content_model():
    
    #user inputs
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    
    # Pulling out an individual resort
    y = sim_df.loc[[mountain_name]].T

    #sorting values by similarity score
    cos_sim_df = y.reset_index().sort_values(by=mountain_name, ascending=False).head(n_recs + 1)
    
    #collecting the content_matrix row for each recommended resort
    rec_list = []

    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to show in the output
    concat_df = rec_df[["city", "state", "summit", "drop", "base", "adultWeekdayPrice", "adultWeekendPrice", 
                           'ikon', 'epic','mountain_collective',
                        "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 
    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
    #dropping mountain name from the results
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

    #showing final dataframe
    return final_concat_df.head(n_recs)

In [171]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Snowbird
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Alta,Alta,Utah,11068,2538,8530,159.0,,1,0,1,0,0,0,0.0,352,277
2,Snowbasin,Huntsville,Utah,9350,2900,6450,149.0,169.0,1,0,1,9,33,52,6.0,240,215
3,Crystal Mountain Washington,Crystal Mountain,Washington,7012,3100,4400,99.0,99.0,1,0,0,8,31,32,29.0,293,210


## Cascade-Hybrid Model

For the final recommendation system, I will build a cascade hybrid model.

Collaborative user-based models are common on music and streaming platforms, but they have limitations when applied to ski trip planning, where the opportunity cost of a poor recommendation is much higher.

The hybrid model begins with the collaborative model, takes its top 50 resorts, and then refines the final recommendations with the content-based system, using the user's input as a guide.

Combining the models required a few adjustments:
- Since the collaborative model is not guaranteed to return the user's preferred mountain, I added it as the top collaborative recommendation. I did this by adding its row to the final output dataframe and setting its predicted rating to 5 so that it appears at the top of all results.
- I adjusted the content-based model to use the final dataframe from the collaborative model, which contains the top 50 results.
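
The first adjustment can be sketched on toy data. This is a minimal illustration, not the notebook's implementation: the function name `pin_favorite`, the placeholder resort names, and the ratings are all made up, and the only assumption carried over from the text is that the favorite gets a predicted rating of 5 and that the top 50 rows are kept.

```python
import pandas as pd


def pin_favorite(collab_df, mountain_name, top_n=50):
    """Put the user's favorite resort first with a rating of 5, then keep top_n rows."""
    # Drop the favorite if the collaborative model already returned it
    others = collab_df[collab_df["ski_resort"] != mountain_name]
    favorite = pd.DataFrame({"ski_resort": [mountain_name], "predicted_rating": [5.0]})
    pinned = pd.concat([favorite, others], ignore_index=True)
    # mergesort is stable, so the favorite stays ahead of any other resort rated 5
    pinned = pinned.sort_values("predicted_rating", ascending=False, kind="mergesort")
    return pinned.head(top_n).reset_index(drop=True)


# Toy predictions, not taken from the ratings dataset
collab_df = pd.DataFrame({
    "ski_resort": ["Resort A", "Resort B", "Resort C"],
    "predicted_rating": [4.1, 4.6, 3.9],
})
top = pin_favorite(collab_df, "Resort D", top_n=3)
print(top)
```

The resulting frame could then be passed to the content-based step as the candidate set.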

In [295]:
# User inputs
user = "Stephanie Ciaccia"
n_recs = 5
mountain_name = "Stevens Pass"
travel_date = "December"
mtn_pass = "Epic"
    
# Pulling out an individual resort
y = sim_df.loc[[mountain_name]].T
    
#sorting values by similarity score
cos_sim_df = y.reset_index().sort_values(by=mountain_name, ascending=False)
    
#collecting the matching content_matrix rows, in similarity order
rec_list = []
    
#grabbing rows from content_matrix for final output
for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#selecting the columns to display
concat_df = rec_df[["city", "state", "summit", "drop", "base","adultWeekdayPrice", "adultWeekendPrice",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs",
                        "ikon", "epic", "mountain_collective", 'indy']]
    
concat_df = concat_df.reset_index()

In [296]:
#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

selected_columns = []
for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]
result = result.reset_index()                        
content_recommendations = pd.merge(concat_df, result, on="ski_resort")

In [297]:
content_recommendations

Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,ikon,epic,mountain_collective,indy,dec_mean_4_guests,dec_mean_2_guests
0,Stevens Pass,Skykomish,Washington,5845,1800,4061,,,8,43,31,18.0,0,1,0,0,276,235
1,Boreal,Truckee,California,7700,500,7200,49.0,,26,29,44,0.0,0,0,0,0,275,231
2,Timberline Lodge,Timberline Lodge,Oregon,8540,3690,6000,,,0,0,0,0.0,0,0,0,0,255,208
3,Bear Valley,Bear Valley,California,8500,1900,6600,,,11,41,45,4.0,0,0,0,0,270,235
4,Mt. Hood Meadows,Mt. Hood,Oregon,7300,2777,4523,,,0,0,0,0.0,0,0,0,1,267,184
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
325,Elko SnoBowl,Elko,Nevada,7000,700,6300,20.0,20.0,0,0,0,0.0,0,0,0,0,195,164
326,Mt. Southington,Southington,Connecticut,525,425,100,60.0,60.0,43,43,14,0.0,0,0,0,0,231,150
327,Powder Ridge Connecticut,Middlefield,Connecticut,720,550,170,45.0,55.0,45,40,15,0.0,0,0,0,0,233,180
328,Villa Olivia,Bartlett,Illinois,500,180,320,40.0,40.0,0,0,0,0.0,0,0,0,0,238,169


In [298]:
#adding mountain pass filter
if mtn_pass == "Ikon":
    content_recommendations = content_recommendations.loc[content_recommendations['ikon'] == 1]
elif mtn_pass == "Epic":
    content_recommendations = content_recommendations.loc[content_recommendations['epic'] == 1]
elif mtn_pass == "Mountain_collective":
    content_recommendations = content_recommendations.loc[content_recommendations['mountain_collective'] == 1]
elif mtn_pass == "Indy":
    content_recommendations = content_recommendations.loc[content_recommendations['indy'] == 1]
elif mtn_pass == "No":
    pass
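
The `if`/`elif` chain above can also be written as a lookup table, which keeps the pass names in one place and rejects unexpected input instead of silently skipping the filter. This is a sketch under assumptions: `PASS_COLUMNS` and `filter_by_pass` are hypothetical names, the toy frame is invented, and lower-casing the input is a design choice that differs from the case-sensitive comparison in the cell above.

```python
import pandas as pd

# Lower-cased pass name -> indicator column in the recommendations frame
PASS_COLUMNS = {
    "ikon": "ikon",
    "epic": "epic",
    "mountain_collective": "mountain_collective",
    "indy": "indy",
}


def filter_by_pass(recommendations, mtn_pass):
    """Keep only resorts on the given pass; "No" returns the frame unchanged."""
    key = mtn_pass.strip().lower()
    if key == "no":
        return recommendations
    if key not in PASS_COLUMNS:
        raise ValueError(f"Unknown pass {mtn_pass!r}")
    return recommendations.loc[recommendations[PASS_COLUMNS[key]] == 1]


# Toy data with placeholder resort names
toy = pd.DataFrame({
    "ski_resort": ["Resort A", "Resort B", "Resort C"],
    "ikon": [1, 0, 0],
    "epic": [0, 1, 1],
    "mountain_collective": [1, 0, 0],
    "indy": [0, 0, 0],
})
epic_only = filter_by_pass(toy, "Epic")
print(epic_only)
```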

In [299]:
content_recommendations = content_recommendations[content_recommendations.ski_resort != mountain_name].head(20)

In [301]:
# Collaborative model
have_rated = list(user_df.loc[user, 'ski_resort'])
not_rated = final_user_df.copy()
not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]
not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
not_rated.reset_index(inplace=True)
not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
collaborative_recommendations = not_rated[['ski_resort', 'predicted_rating']]

# Combine content-based and collaborative recommendations
combined_recommendations = pd.merge(content_recommendations, collaborative_recommendations, on='ski_resort', how='left')
combined_recommendations = combined_recommendations.drop_duplicates(subset=['ski_resort'])
combined_recommendations.sort_values(by='predicted_rating', ascending=False, inplace=True)
combined_recommendations.drop(columns=['ikon', 'mountain_collective', 'epic', 'indy'], inplace=True)
combined_recommendations.head(n_recs)

Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
12,Mt. Sunapee,Newbury,New Hampshire,2743,1510,1233,,,29,47,24,0.0,266,197,4.262017
15,Attitash,Bartlett,New Hampshire,2350,1750,600,79.0,89.0,26,46,28,0.0,211,164,4.19288
17,Heavenly Mountain,Stateline,California,10067,3500,7170,189.0,225.0,7,60,27,5.0,318,219,4.16181
6,Okemo Mountain,Ludlow,Vermont,3344,2200,1144,,,33,38,21,9.0,290,266,4.069245
1,Stowe Mountain,Stowe Mountain,Vermont,4395,2360,2035,,,16,55,15,15.0,354,231,3.980972
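
The combination step above keeps the content-based shortlist and re-orders it by the collaborative model's predicted rating. A small sketch on toy data shows what the left merge does when a shortlisted resort has no predicted rating. The resort names and ratings are placeholders, not results from the notebook.

```python
import pandas as pd

# Content-based shortlist, ordered by similarity (placeholder names)
content_shortlist = pd.DataFrame({"ski_resort": ["Resort A", "Resort B", "Resort C"]})

# Collaborative predictions; Resort B has no rating in this toy example
collab = pd.DataFrame({
    "ski_resort": ["Resort C", "Resort A"],
    "predicted_rating": [4.5, 3.8],
})

# A left merge keeps every shortlisted resort, filling missing ratings with NaN
combined = content_shortlist.merge(collab, on="ski_resort", how="left")

# sort_values places NaN last by default, so unrated resorts sink to the bottom
combined = combined.sort_values("predicted_rating", ascending=False)
missing = int(combined["predicted_rating"].isna().sum())

print(combined)
print("Resorts without a predicted rating:", missing)
```

Counting the missing ratings is one way to see how much of the shortlist the collaborative model can actually rank.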


In [340]:
def hybrid_model_content():
    
    # User inputs
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
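    #pass prompt is disabled; defaulting to no pass filter so mtn_pass is always defined
    #mtn_pass = str(input('Are you using a multi-resort pass?  '))
    mtn_pass = "No"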
    
    # Pulling out an individual resort
    y = sim_df.loc[[mountain_name]].T
    
    #sorting values by similarity score
    cos_sim_df = y.reset_index().sort_values(by=mountain_name, ascending=False)
    
    #collecting the matching content_matrix rows, in similarity order
    rec_list = []
    
    #grabbing rows from content_matrix for final output
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "summit", "drop", "base","adultWeekdayPrice", "adultWeekendPrice",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs",
                        "ikon", "epic", "mountain_collective", 'indy','nov_snow', 'dec_snow',
                        'jan_snow', 'feb_snow', 'mar_snow','apr_snow']]
    
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()                        
    content_recommendations = pd.merge(concat_df, result, on="ski_resort")
    
    #adding mountain pass filter
    if mtn_pass == "Ikon":
        content_recommendations = content_recommendations.loc[content_recommendations['ikon'] == 1]
    elif mtn_pass == "Epic":
        content_recommendations = content_recommendations.loc[content_recommendations['epic'] == 1]
    elif mtn_pass == "Mountain_collective":
        content_recommendations = content_recommendations.loc[content_recommendations['mountain_collective'] == 1]
    elif mtn_pass == "Indy":
        content_recommendations = content_recommendations.loc[content_recommendations['indy'] == 1]
    elif mtn_pass == "No":
        pass
    
    content_recommendations = content_recommendations[content_recommendations.ski_resort != mountain_name].head(30)

    # Collaborative model
    have_rated = list(user_df.loc[user, 'ski_resort'])
    not_rated = final_user_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]
    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    collaborative_recommendations = not_rated[['ski_resort', 'predicted_rating']]

    # Combine content-based and collaborative recommendations
    combined_recommendations = pd.merge(content_recommendations, collaborative_recommendations, on='ski_resort', how='left')
    combined_recommendations = combined_recommendations.drop_duplicates(subset=['ski_resort'])
    combined_recommendations.sort_values(by='predicted_rating', ascending=False, inplace=True)
    combined_recommendations.drop(columns=['ikon', 'mountain_collective', 'epic', 'indy'], inplace=True)
    return combined_recommendations.head(n_recs)

# Results - Testing

I will test the model results with users whose mountain preferences have been defined:

- Stephanie C. - Enjoys mountains with advanced and expert terrain, good amenities, and proximity to public transportation.
- Alexandria K. - Prefers mountains where the majority of skiers are there for the sport and that do not feel overly "bougie". Skis expert terrain.
- Raghava K. - Enjoys large mountains, back bowls, and expert terrain. Parking and mountain amenities are also important.

#### Alexandria's Results

In [175]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Stevens Pass
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Boreal,Truckee,California,7700,500,7200,49.0,,0,0,0,26,29,44,0.0,275,231
2,Timberline Lodge,Timberline Lodge,Oregon,8540,3690,6000,,,0,0,0,0,0,0,0.0,255,208
3,Bear Valley,Bear Valley,California,8500,1900,6600,,,0,0,0,11,41,45,4.0,270,235


In [324]:
collaborative_model()

Name: Alexandria Kelly
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adultWeekdayPrice,adultWeekendPrice,summit,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
51,Breckenridge,Colorado,Breckenridge,149.0,179.0,12998,3398,9600,0,1,0,36,23,28.0,4.500829
19,Snowbasin,Utah,Huntsville,149.0,169.0,9350,2900,6450,1,0,1,52,33,6.0,4.49585
45,Telluride,Colorado,Telluride,209.0,219.0,13150,4425,8725,0,0,0,21,30,34.0,4.484695


In [327]:
hybrid_model_content()

Name: Alexandria Kelly
How many resort recommendations do you want? 3
What's your favorite ski resort? Stevens Pass
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
13,Brighton,Brighton,Utah,10500,1745,8755,85.0,85.0,0,0,0,0.0,375,322,4.325022
14,Kirkwood,Kirkwood,California,9800,2000,7800,,,0,0,0,0.0,371,304,4.168588
29,Mt. Baker,Bellingham,Washington,5000,1500,3500,87.04,87.04,0,0,0,0.0,265,151,4.090883


In [317]:
hybrid_model_content()

Name: Alexandria Kelly
How many resort recommendations do you want? 5
What's your favorite ski resort? Stevens Pass
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
10,June Mountain,June Lake,California,10090,2590,7545,,,16,40,28,16.0,313,289,3.991108
19,Mt. Bachelor,Bend,Oregon,9065,3365,5700,109.0,109.0,19,55,22,3.0,231,157,3.978858
14,Kirkwood,Kirkwood,California,9800,2000,7800,,,0,0,0,0.0,371,304,3.899944
9,Stowe Mountain,Stowe Mountain,Vermont,4395,2360,2035,,,16,55,15,15.0,354,231,3.867025
16,Crystal Mountain Washington,Crystal Mountain,Washington,7012,3100,4400,99.0,99.0,8,31,32,29.0,293,210,3.851127


#### Raghava's Results

In [242]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Telluride
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Aspen Snowmass,Aspen,Colorado,12510,4406,8104,189.0,199.0,0,0,0,0.0,624,316
2,Solitude Mountain,Brighton,Utah,10488,2494,7994,115.0,115.0,6,46,30,18.0,375,322
3,Beaver Creek,Vail,Colorado,11440,3340,8100,191.0,275.0,38,30,24,8.0,420,268


In [356]:
collaborative_model()

Name: Raghava Kamalesh
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adultWeekdayPrice,adultWeekendPrice,summit,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
20,Snowbasin,Utah,Huntsville,149.0,169.0,9350,2900,6450,1,0,1,52,33,6.0,4.875112
14,Snowbird,Utah,Snowbird,184.0,,11000,3240,7760,1,0,1,43,25,24.0,4.801601
93,Lutsen Mountains,Minnesota,Lutsen Mountains,95.0,105.0,1688,825,800,0,0,0,24,58,8.0,4.742234


In [357]:
hybrid_model_content()

Name: Raghava Kamalesh
How many resort recommendations do you want? 3
What's your favorite ski resort? Snowbird
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,...,expert_runs,nov_snow,dec_snow,jan_snow,feb_snow,mar_snow,apr_snow,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
1,Snowbasin,Huntsville,Utah,9350,2900,6450,149.0,169.0,9,33,...,6.0,14,68,67,68,57,16,240,215,4.875112
0,Alta,Alta,Utah,11068,2538,8530,159.0,,0,0,...,0.0,16,88,84,74,74,40,352,277,4.683351
8,Big Sky,Big Sky,Montana,11166,4350,7500,152.0,194.0,15,25,...,18.0,17,60,47,57,56,34,314,208,4.56084


#### Stephanie's Results

In [321]:
collaborative_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 5


Unnamed: 0,ski_resort,state,city,adultWeekdayPrice,adultWeekendPrice,summit,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
19,Snowbasin,Utah,Huntsville,149.0,169.0,9350,2900,6450,1,0,1,52,33,6.0,4.410153
37,Jackson Hole,Wyoming,Teton Village,215.0,215.0,10450,4139,6311,1,0,1,38,41,17.0,4.359738
154,Whiteface Mountain,New York,Wilmington,115.0,115.0,4650,3430,1220,0,0,0,31,46,0.0,4.350751
42,Steamboat,Colorado,Steamboat Springs,177.0,192.0,10568,3668,6900,1,0,0,40,43,5.0,4.323063
170,Mt. Sunapee,New Hampshire,Newbury,,,2743,1510,1233,0,1,0,24,47,0.0,4.262017


In [330]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Snowbird
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Alta,Alta,Utah,11068,2538,8530,159.0,,0,0,0,0.0,352,277
2,Snowbasin,Huntsville,Utah,9350,2900,6450,149.0,169.0,9,33,52,6.0,240,215
3,Crystal Mountain Washington,Crystal Mountain,Washington,7012,3100,4400,99.0,99.0,8,31,32,29.0,293,210


In [355]:
stephanie_hybrid = hybrid_model_content()

Name: Raghava K.
How many resort recommendations do you want? 3
What's your favorite ski resort? Snowbird
What month would you like to travel? December


KeyError: 'Raghava K.'
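
The `KeyError` above most likely comes from the `user_df.loc[user, 'ski_resort']` lookup, since the name entered does not match the name used in the earlier runs ("Raghava Kamalesh"). A minimal sketch of validating the name before running the model is shown below. The function `resolve_user` and the `known_users` list are hypothetical; in the notebook the known names would presumably come from the index of `user_df`.

```python
from difflib import get_close_matches


def resolve_user(name, known_users):
    """Return name if it is a known user, else raise with the closest matches."""
    if name in known_users:
        return name
    suggestions = get_close_matches(name, known_users, n=3, cutoff=0.5)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown user {name!r}.{hint}")


# Names taken from the prompts shown in this section
known_users = ["Stephanie Ciaccia", "Alexandria Kelly", "Raghava Kamalesh"]

print(resolve_user("Raghava Kamalesh", known_users))
try:
    resolve_user("Raghava K.", known_users)
except ValueError as err:
    message = str(err)
    print(message)
```

Raising a `ValueError` with suggestions gives the user something to act on, whereas the bare `KeyError` only reports the failed key.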

In [353]:
content_df.loc[content_df['ski_resort'] == "Snowbasin"] 

Unnamed: 0.1,Unnamed: 0,ski_resort,address,city,state,zipcode,summit,drop,base,gondolas_and_trams,...,long,airport_2,distance_2,lat_2,long_2,airport_3,distance_3,lat_3,long_3,total_lifts
0,0,Palisades Tahoe,PO Box 2007,Olympic Valley,California,96146,9050,2850,6200,3,...,-120.139563,Minden-Tahoe,47,39.000309,-119.750806,Greenville Muni,3357,41.446832,-80.391262,36
1,1,Mammoth Mountain,P.O. Box 24,Mammoth Mountain Lakes,California,93546,11053,3100,7953,3,...,-118.837772,Bryant,71,38.262419,-119.225709,Grand Rapids-Itasca County,2330,47.211103,-93.509845,25
2,2,Donner Ski Ranch,P.O. Box 66,Norden,California,95724,8012,750,7031,0,...,-120.139563,Reno/Tahoe International,54,39.498576,-119.768065,Mapleton Municipal,2085,42.178295,-95.793645,8
3,3,Sugar Bowl,P.O. Box 5,Norden,California,95724,8383,1500,6883,1,...,-120.139563,Reno/Tahoe International,54,39.498576,-119.768065,Mapleton Municipal,2085,42.178295,-95.793645,12
4,4,Kirkwood,PO Box 1,Kirkwood,California,95646,9800,2000,7800,0,...,-119.995335,Minden-Tahoe,43,39.000309,-119.750806,Nogales International,1165,31.417722,-110.847889,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
325,392,Oak Mountain,141 Novosel Way,Speculator,New York,12164,2400,650,1750,0,...,-74.517651,Saratoga Cty,65,43.051261,-73.861194,Hillsboro Municipal,1814,47.359408,-97.060416,4
326,394,Mt. Pleasant,23301 Plank Rd,Venango,Pennsylvania,16403,1540,340,1200,0,...,-80.214728,Erie Intl,32,42.082021,-80.176216,Nondalton,5284,59.979043,-154.839694,2
327,395,Hunt Hollow,7532 County Road 36,Naples,New York,14512,2030,825,1000,0,...,-77.713051,Hornell Muni,33,42.382144,-77.682113,Perry-Warsaw,48,42.741347,-78.052081,3
328,396,Powder Ridge Connecticut,99 Powder Hill Road,Middlefield,Connecticut,06455,720,550,170,0,...,-72.829478,Chester,23,41.383905,-72.505894,Piedmont Triad International,865,36.097747,-79.937297,6


In [343]:
stephanie_hybrid 

Unnamed: 0,ski_resort,city,state,summit,drop,base,adultWeekdayPrice,adultWeekendPrice,beginner_runs,intermediate_runs,...,expert_runs,nov_snow,dec_snow,jan_snow,feb_snow,mar_snow,apr_snow,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
19,Telluride,Telluride,Colorado,13150,4425,8725,209.0,219.0,16,30,...,34.0,16,49,47,52,49,4,425,308,4.633803
0,Alta,Alta,Utah,11068,2538,8530,159.0,,0,0,...,0.0,16,88,84,74,74,40,352,277,4.472603
1,Snowbasin,Huntsville,Utah,9350,2900,6450,149.0,169.0,9,33,...,6.0,14,68,67,68,57,16,240,215,4.370856


In [344]:
#using plotly express to plot December Airbnb costs for the recommended resorts
import plotly.express as px

fig = px.bar(stephanie_hybrid, x="ski_resort", y=["dec_mean_2_guests", "dec_mean_4_guests"],
            width=1000, height=500)
fig.update_layout(title_text='December Airbnb Costs',
                  title_x=0.5,
                  xaxis_title="Ski Resort",
                  yaxis_title="Nightly Price ($)",
                 plot_bgcolor='white',
                 font=dict(size=14),
                 barmode='group')

newnames = {'dec_mean_2_guests':'2 Guest Mean', 'dec_mean_4_guests': '4 Guest Mean'}
fig.for_each_trace(lambda t: t.update(name = newnames[t.name],
                                      legendgroup = newnames[t.name],
                                      hovertemplate = t.hovertemplate.replace(t.name, newnames[t.name])
                                     )
                  )

fig.update_traces(textposition='outside')               
              
fig.show()

In [350]:
#using plotly to plot monthly snowfall for the recommended resorts
fig = px.bar(stephanie_hybrid, x="ski_resort", y=['nov_snow', 'dec_snow', 'jan_snow', 'feb_snow', 'mar_snow','apr_snow'],
            width=1000, height=500)
fig.update_layout(title_text='Monthly Snowfall',
                  title_x=0.5,
                  xaxis_title="Ski Resort",
                  yaxis_title="Snowfall",
                 plot_bgcolor='white',
                 font=dict(size=14),
                 barmode='group')

fig.update_traces(textposition='outside')               
              
fig.show()

In [352]:
#using plotly to plot base, summit, and vertical drop for the recommended resorts
fig = px.bar(stephanie_hybrid, x="ski_resort", y=['base', 'summit', 'drop'],
            width=1000, height=500)
fig.update_layout(title_text='Mountain Vertical',
                  title_x=0.5,
                  xaxis_title="Ski Resort",
                  yaxis_title="Ft.",
                 plot_bgcolor='white',
                 font=dict(size=14),
                 barmode='group')

fig.update_traces(textposition='outside')               
              
fig.show()

# Conclusions

Based on the user analysis, the recommendation system performs well at suggesting ski resorts that align with user inputs. At times, however, the output does not seem entirely aligned with the user's preferences or past reviews. I believe this is due to the review dataset: not all resorts included in the content model were reviewed by users, and users with only 3 reviews were included in the dataset.
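
One way to quantify this limitation is to measure how many content-model resorts are covered by reviews at different minimum review counts per user. The sketch below uses invented users, placeholder resort names, and a hypothetical function `review_coverage`; it is not run against the OnTheSnow data.

```python
from collections import Counter


def review_coverage(content_resorts, reviews, min_reviews_per_user=3):
    """Share of content-model resorts reviewed by users with enough reviews."""
    resorts = set(content_resorts)
    if not resorts:
        return 0.0
    # Count reviews per user, then keep resorts reviewed by qualifying users
    reviews_per_user = Counter(user for user, _ in reviews)
    kept = {
        resort
        for user, resort in reviews
        if reviews_per_user[user] >= min_reviews_per_user
    }
    return len(kept & resorts) / len(resorts)


# Toy (user, resort) review pairs
content_resorts = ["Resort A", "Resort B", "Resort C", "Resort D"]
reviews = [
    ("user_1", "Resort A"),
    ("user_1", "Resort B"),
    ("user_1", "Resort C"),
    ("user_2", "Resort D"),
]

strict = review_coverage(content_resorts, reviews, min_reviews_per_user=3)
loose = review_coverage(content_resorts, reviews, min_reviews_per_user=1)
print(strict, loose)
```

Raising the minimum review count tends to improve the reliability of each user's profile but shrinks the set of resorts the collaborative model can rank, which is the trade-off described above.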

Though the results could use some fine-tuning, the recommendation system combines collaborative filtering and content-based approaches to provide recommendations based on user preferences and resort characteristics.

It is important to acknowledge that recommendations are inherently subjective, as they rely on individual preferences and the available dataset. To further enhance the system and ensure continuous optimization, user feedback is needed. By incorporating user feedback, the recommendations can be refined and the overall user experience improved, creating a more personalized system that caters to individual preferences.


# Next Steps

Next steps involve deploying a web application, expanding the dataset with additional user ratings and features, and refining the hybrid model to enhance its performance.

- The OnTheSnow ratings dataset did not have unique user IDs for each rating, which reduced the number of reviews used to create the collaborative model. As a result, not all ski resorts in the USA were included. By incorporating more reviews, more mountains will be included in the collaborative filtering process which could result in more accurate recommendations.

- Once additional user ratings are collected, the cascade hybrid model will be fine-tuned and the main algorithms re-run.

- Finally, additional feature characteristics related to the resort towns and mountains will be incorporated. These features could include ratings and assessments of mountain restaurants, parking information, lodging options, après-ski activities, ski rentals, and other amenities available in the resort towns. With these metrics, the recommendation system can provide a more comprehensive and personalized service that caters to diverse preferences and requirements and looks beyond skiing alone.
