# Avant Ski - Content Based System

by: Stephanie Ciaccia

# Overview

Skiing holds a prominent place for those seeking winter recreational activities in the United States. With its stunning mountain ranges and diverse terrain, the country boasts numerous ski resorts that cater to all skill levels, from beginners to seasoned professionals. Skiing offers a unique blend of adventure, physical activity, and natural beauty, making it a popular choice for winter enthusiasts seeking both relaxation and excitement.

The ski market in the United States is thriving, contributing significantly to the economy. According to the [National Ski Areas Association (NSAA)](https://nsaa.org/webdocs/Media_Public/IndustryStats/Historical_Skier_Days_1979_2022.pdf), approximately 60.7 million skiers and snowboarders visited 473 ski resorts in the 2021-2022 winter season.

# Business Problem 
Skiing is an exhilarating winter activity enjoyed by many, but barriers such as high costs and limited accessibility often hinder people from fully experiencing its joys. Choosing the right ski resort can be overwhelming due to the multitude of options available, and existing websites lack dynamic filtering capabilities based on user preferences.

To address these challenges, I'm developing Avant Ski, a ski resort recommendation app. Avant Ski simplifies the ski resort selection process by leveraging data and user preferences. With dynamic filtering features, users can personalize their search based on budget, location, amenities, and skill level. By bridging the gap between ski enthusiasts and their dream destinations, Avant Ski makes skiing accessible to a wider audience, empowering them to plan unforgettable ski trips with confidence.

# Data Understanding

In [101]:
import pandas as pd
import numpy as np
import math
import datetime
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
%matplotlib inline
import plotly
import plotly.express as px
import plotly.io as pio
from matplotlib.ticker import StrMethodFormatter

from collections import Counter
from nltk.corpus import stopwords

from IPython.display import display

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.neighbors import NearestNeighbors

from surprise import SVDpp, SVD
from surprise import accuracy
from surprise import Dataset
from surprise import Reader

import glob
import os

In [120]:
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')

## Importing Data
### Data Source #1 - Final Feature Data
Importing the main ski resort feature dataframe, which I scraped from OnTheSnow and cleaned in the cleaning notebook.

In [121]:
content_df = pd.read_csv("cleaned_data_exports/scraped_feature_df.csv")

### Data Source #2 - Final User/Review Data

Importing final cleaned user review data from the cleaning notebook.

In [122]:
final_user_df = pd.read_csv("cleaned_data_exports/user_df_model.csv")

### Content Modeling

To begin content modeling, we need a feature matrix that stores all of the feature information for each resort. This matrix lets us calculate the similarities between item vectors, so we can determine which ski resorts are similar.

In [123]:
content_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 329 entries, 0 to 328
Data columns (total 87 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Unnamed: 0                329 non-null    int64  
 1   ski_resort                329 non-null    object 
 2   address                   329 non-null    object 
 3   city                      329 non-null    object 
 4   state                     329 non-null    object 
 5   zipcode                   329 non-null    object 
 6   sumt                      329 non-null    int64  
 7   drop                      329 non-null    int64  
 8   base                      329 non-null    int64  
 9   gondolas_and_trams        329 non-null    float64
 10  fastEight                 329 non-null    float64
 11  highSpeedSixes            329 non-null    float64
 12  quadChairs                329 non-null    float64
 13  tripleChairs              329 non-null    float64
 14  doubleChai

In [124]:
#making a copy of the final dataframe
content_matrix = content_df.copy()

In [126]:
drop_list = ['full_address', 'address', 'zipcode', 'Url', 'projectedOpening', 'projectedClosing', 'Unnamed: 0']

content_matrix.drop(columns=drop_list, inplace=True)

In [127]:
content_matrix

Unnamed: 0,ski_resort,city,state,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,...,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,latitude,longitude,total_lifts
0,49 Degrees North,Chewelah,Washington,5774,1851,3932,0.0,1.0,0.0,1.0,...,394.0,216.142857,79.0,435.0,165.625000,79.0,375.0,48.276287,-117.715521,7.0
1,Afton Alps,Hastings,Minnesota,1530,350,1180,0.0,0.0,0.0,1.0,...,725.0,262.093750,103.0,749.0,194.750000,80.0,388.0,44.854416,-92.790839,21.0
2,Alpental,Snoquale Pass,Washington,5420,2280,3140,0.0,1.0,0.0,0.0,...,1200.0,347.687500,90.0,1200.0,245.875000,78.0,580.0,47.392335,-121.400094,5.0
3,Alpine Valley Ohio,Chesterland,Ohio,1500,230,1260,0.0,0.0,0.0,1.0,...,519.0,220.742857,94.0,719.0,167.125000,70.0,339.0,41.526814,-81.259820,5.0
4,Alpine Valley Wisconsin,East Troy,Wisconsin,1040,388,820,0.0,3.0,0.0,0.0,...,502.0,239.406250,65.0,750.0,227.947368,90.0,406.0,42.785292,-88.405096,13.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
324,Woods Valley,Westernville,New York,1400,500,900,0.0,0.0,0.0,0.0,...,666.0,245.428571,129.0,677.0,175.458333,85.0,350.0,43.305625,-75.382950,6.0
325,Yawgoo Valley,Slocum,Rhode Island,315,245,70,0.0,0.0,0.0,0.0,...,658.0,324.771429,150.0,699.0,266.000000,125.0,950.0,41.532874,-71.514729,4.0
326,Badger Pass,Yosete,California,7800,600,7200,0.0,0.0,0.0,0.0,...,559.0,283.685714,130.0,950.0,179.785714,67.0,259.0,37.747595,-119.584136,5.0
327,Shawnee Mountain,Shawnee on Delaware,Pennsylvania,1350,700,650,0.0,1.0,0.0,1.0,...,411.0,207.029412,71.0,473.0,141.695652,68.0,310.0,41.012317,-75.110733,9.0


In [128]:
content_matrix['state'].unique()

array(['Washington', 'Minnesota', 'Ohio', 'Wisconsin', 'Utah', 'Alaska',
       'New Mexico', 'Oregon', 'North Carolina', 'Michigan', 'Colorado',
       'Arizona', 'New Hampshire', 'Pennsylvania', 'California',
       'New York', 'Masshusetts', 'Montana', 'Maine', 'Idaho', 'Vermont',
       'Virginia', 'New Jersey', 'West Virginia', 'Illinois',
       'South Dakota', 'Nevada', 'Wyoming', 'Missouri', 'Connecticut',
       'Iowa', 'Tennessee', 'Indiana', 'Maryland', 'Rhode Island'],
      dtype=object)

In [129]:
content_matrix['ski_resort']

0             49 Degrees North
1                   Afton Alps
2                     Alpental
3           Alpine Valley Ohio
4      Alpine Valley Wisconsin
                ...           
324               Woods Valley
325              Yawgoo Valley
326                Badger Pass
327           Shawnee Mountain
328         Wachusett Mountain
Name: ski_resort, Length: 329, dtype: object

### One Hot Encoding Categorical Variables

I will one-hot encode the state column, as it is the only categorical column that goes into the model. I would like to keep it in the final model, as the location of a resort often plays an important role in deciding where to ski.
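
As a minimal sketch of what one-hot encoding produces, the toy example below uses `pd.get_dummies` rather than the `OneHotEncoder` used in this notebook; that swap is an assumption made to keep the example short. The three resorts and their states come from the tables above, and the name `toy` is hypothetical.

```python
import pandas as pd

# Toy data: three resorts and their states, taken from the tables in this notebook.
toy = pd.DataFrame(
    {"state": ["Washington", "Minnesota", "Washington"]},
    index=pd.Index(["49 Degrees North", "Afton Alps", "Alpental"], name="ski_resort"),
)

# One column per state, prefixed with "state_"; 1.0 marks the resort's state.
toy_ohe = pd.get_dummies(toy["state"], prefix="state", dtype=float)
print(toy_ohe)
```

Each row contains exactly one 1.0, so the encoded columns carry the same information as the original state column in a numeric form that a similarity metric can use.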

In [130]:
# Instantiating OHE
ohe = OneHotEncoder()

# fit and transforming
ohe_state = pd.DataFrame(ohe.fit_transform(content_matrix[['state']]).toarray())

# renaming based on original names (get_feature_names was removed in scikit-learn 1.2)
ohe_state.columns = ohe.get_feature_names_out(['state'])

In [131]:
ohe_state

Unnamed: 0,state_Alaska,state_Arizona,state_California,state_Colorado,state_Connecticut,state_Idaho,state_Illinois,state_Indiana,state_Iowa,state_Maine,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
324,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
325,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
326,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
327,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [132]:
#setting index
ohe_state = ohe_state.set_index(content_matrix['ski_resort'])

In [133]:
ohe_state

ski_resort,state_Alaska,state_Arizona,state_California,state_Colorado,state_Connecticut,state_Idaho,state_Illinois,state_Indiana,state_Iowa,state_Maine,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
49 Degrees North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Afton Alps,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alpental,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Alpine Valley Ohio,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alpine Valley Wisconsin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Woods Valley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Yawgoo Valley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Badger Pass,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Shawnee Mountain,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [134]:
#resetting index as ski_resort
content_matrix = content_matrix.set_index("ski_resort")

In [135]:
#dropping state and city columns
final_content_matrix = content_matrix.drop(columns=["state", "city"])

In [136]:
#filling null matrix values with 0
final_content_matrix = final_content_matrix.fillna(0)

In [137]:
final_content_matrix

ski_resort,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,tripleChairs,doubleChairs,surfeLifts,...,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,latitude,longitude,total_lifts
49 Degrees North,5774,1851,3932,0.0,1.0,0.0,1.0,0.0,4.0,1.0,...,394.0,216.142857,79.0,435.0,165.625000,79.0,375.0,48.276287,-117.715521,7.0
Afton Alps,1530,350,1180,0.0,0.0,0.0,1.0,2.0,14.0,4.0,...,725.0,262.093750,103.0,749.0,194.750000,80.0,388.0,44.854416,-92.790839,21.0
Alpental,5420,2280,3140,0.0,1.0,0.0,0.0,0.0,3.0,1.0,...,1200.0,347.687500,90.0,1200.0,245.875000,78.0,580.0,47.392335,-121.400094,5.0
Alpine Valley Ohio,1500,230,1260,0.0,0.0,0.0,1.0,2.0,0.0,2.0,...,519.0,220.742857,94.0,719.0,167.125000,70.0,339.0,41.526814,-81.259820,5.0
Alpine Valley Wisconsin,1040,388,820,0.0,3.0,0.0,0.0,4.0,0.0,6.0,...,502.0,239.406250,65.0,750.0,227.947368,90.0,406.0,42.785292,-88.405096,13.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Woods Valley,1400,500,900,0.0,0.0,0.0,0.0,0.0,2.0,4.0,...,666.0,245.428571,129.0,677.0,175.458333,85.0,350.0,43.305625,-75.382950,6.0
Yawgoo Valley,315,245,70,0.0,0.0,0.0,0.0,0.0,2.0,2.0,...,658.0,324.771429,150.0,699.0,266.000000,125.0,950.0,41.532874,-71.514729,4.0
Badger Pass,7800,600,7200,0.0,0.0,0.0,0.0,1.0,3.0,1.0,...,559.0,283.685714,130.0,950.0,179.785714,67.0,259.0,37.747595,-119.584136,5.0
Shawnee Mountain,1350,700,650,0.0,1.0,0.0,1.0,0.0,3.0,4.0,...,411.0,207.029412,71.0,473.0,141.695652,68.0,310.0,41.012317,-75.110733,9.0


### Scaling Data

I will use StandardScaler to standardize each numeric feature to zero mean and unit variance. This is necessary before modeling: without it, large-valued features such as summit elevation would dominate the similarity calculation.
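
To make the scaling step concrete, here is a minimal sketch of the z-score calculation written directly in NumPy, assuming the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The array `X` is a hypothetical two-column slice (summit elevation and total lifts) built from three rows shown above.

```python
import numpy as np

# Toy feature matrix: rows are resorts, columns are summit elevation and total lifts.
X = np.array([[5774.0, 7.0], [1530.0, 21.0], [5420.0, 5.0]])

# z = (x - column mean) / column standard deviation (population std, ddof=0)
z = (X - X.mean(axis=0)) / X.std(axis=0)

print(z.mean(axis=0))  # approximately 0 for each column
print(z.std(axis=0))   # approximately 1 for each column
```

After this transformation both columns are on a comparable scale, even though the raw elevation values are hundreds of times larger than the lift counts.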

In [138]:
#instantiating StandardScaler
scaler = StandardScaler()

#scaling array
scaled = scaler.fit_transform(final_content_matrix)

#saving as dataframe
scaled_ski_df = pd.DataFrame(scaled, index=final_content_matrix.index, columns=final_content_matrix.columns)

In [139]:
scaled_ski_df

ski_resort,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,tripleChairs,doubleChairs,surfeLifts,...,mar_max_4_guests,apr_mean_4_guests,apr_min_4_guests,apr_max_4_guests,may_mean_4_guests,may_min_4_guests,may_max_4_guests,latitude,longitude,total_lifts
49 Degrees North,0.305234,0.654541,0.170126,-0.323977,-0.027422,-0.304056,0.004345,-0.907215,1.352650,-0.812682,...,-0.802303,-0.659457,-0.915449,-0.641683,-0.744282,-0.502380,-0.433056,1.269453,-1.272368,-0.227054
Afton Alps,-0.832604,-0.927553,-0.714607,-0.323977,-0.478521,-0.304056,0.004345,0.293887,7.169931,0.617117,...,0.267857,-0.093674,-0.239267,0.239079,-0.253283,-0.465568,-0.380998,0.561764,-0.006018,2.133695
Alpental,0.210325,1.106718,-0.084492,-0.323977,-0.027422,-0.304056,-0.710413,-0.907215,0.770922,-0.812682,...,1.803585,0.960222,-0.605532,1.504124,0.608598,-0.539191,0.387862,1.086640,-1.459571,-0.564304
Alpine Valley Ohio,-0.840647,-1.054036,-0.688888,-0.323977,-0.478521,-0.304056,0.004345,0.293887,-0.974262,-0.336082,...,-0.398164,-0.602818,-0.492835,0.154930,-0.718995,-0.833682,-0.577217,-0.126429,0.579839,-0.564304
Alpine Valley Wisconsin,-0.963976,-0.887500,-0.830342,-0.323977,0.874776,-0.304056,-0.710413,1.494989,-0.974262,1.570315,...,-0.453127,-0.373020,-1.309888,0.241884,0.306368,-0.097455,-0.308917,0.133842,0.216809,0.784696
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Woods Valley,-0.867458,-0.769449,-0.804623,-0.323977,-0.478521,-0.304056,-0.710413,-0.907215,0.189194,0.617117,...,0.077103,-0.298869,0.493263,0.037121,-0.578509,-0.281512,-0.533168,0.241453,0.878426,-0.395679
Yawgoo Valley,-1.158352,-1.038226,-1.071457,-0.323977,-0.478521,-0.304056,-0.710413,-0.907215,0.189194,-0.336082,...,0.051238,0.678062,1.084923,0.098831,0.947872,1.190942,1.869520,-0.125175,1.074959,-0.732929
Badger Pass,0.848415,-0.664046,1.220747,-0.323977,-0.478521,-0.304056,-0.710413,-0.306664,0.770922,-0.812682,...,-0.268840,0.172183,0.521438,0.802880,-0.505556,-0.944116,-0.897576,-0.908022,-1.367307,-0.564304
Shawnee Mountain,-0.880863,-0.558644,-0.884995,-0.323977,-0.027422,-0.304056,0.004345,-0.907215,0.770922,0.617117,...,-0.747340,-0.771669,-1.140843,-0.535094,-1.147691,-0.907304,-0.693347,-0.232834,0.892256,0.110196


In [140]:
#merging scaled_ski_df and one hot encoded dataframes
final_content_df = scaled_ski_df.join(ohe_state)

In [141]:
final_content_df

ski_resort,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,tripleChairs,doubleChairs,surfeLifts,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
49 Degrees North,0.305234,0.654541,0.170126,-0.323977,-0.027422,-0.304056,0.004345,-0.907215,1.352650,-0.812682,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Afton Alps,-0.832604,-0.927553,-0.714607,-0.323977,-0.478521,-0.304056,0.004345,0.293887,7.169931,0.617117,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alpental,0.210325,1.106718,-0.084492,-0.323977,-0.027422,-0.304056,-0.710413,-0.907215,0.770922,-0.812682,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Alpine Valley Ohio,-0.840647,-1.054036,-0.688888,-0.323977,-0.478521,-0.304056,0.004345,0.293887,-0.974262,-0.336082,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alpine Valley Wisconsin,-0.963976,-0.887500,-0.830342,-0.323977,0.874776,-0.304056,-0.710413,1.494989,-0.974262,1.570315,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Woods Valley,-0.867458,-0.769449,-0.804623,-0.323977,-0.478521,-0.304056,-0.710413,-0.907215,0.189194,0.617117,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Yawgoo Valley,-1.158352,-1.038226,-1.071457,-0.323977,-0.478521,-0.304056,-0.710413,-0.907215,0.189194,-0.336082,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Badger Pass,0.848415,-0.664046,1.220747,-0.323977,-0.478521,-0.304056,-0.710413,-0.306664,0.770922,-0.812682,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Shawnee Mountain,-0.880863,-0.558644,-0.884995,-0.323977,-0.027422,-0.304056,0.004345,-0.907215,0.770922,0.617117,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Cosine Similarity

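I will start the content-based modeling by using cosine similarity to measure how similar each pair of ski resorts is. Scores range from -1 to 1, and a higher score means the two resorts have more similar feature vectors.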
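
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand in NumPy; the helper name `cosine_sim` and the vectors `u`, `v`, and `w` are hypothetical and only illustrate the formula that `sklearn.metrics.pairwise.cosine_similarity` applies to every pair of rows.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u = [1.0, 2.0, 0.0]
v = [2.0, 4.0, 0.0]   # points in the same direction as u
w = [-2.0, 1.0, 0.0]  # orthogonal to u

print(cosine_sim(u, v))  # close to 1: same direction
print(cosine_sim(u, w))  # close to 0: unrelated direction
```

Because the score depends only on direction, two resorts with proportionally similar standardized features score high regardless of the overall magnitude of their vectors.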

In [142]:
sim_df = pd.DataFrame(cosine_similarity(final_content_df), index=final_content_df.index, columns=final_content_df.index)

In [143]:
#sim_df.to_csv("data/sim_matrix_2.csv")

In [144]:
sim_df.head()

ski_resort,49 Degrees North,Afton Alps,Alpental,Alpine Valley Ohio,Alpine Valley Wisconsin,Alta,Alyeska,Andes Tower Hills,Angel Fire,Anthony Lakes Mountain,...,Winterplace Ski,Wisp,Wolf Creek,Wolf Ridge Ski,Woodbury,Woods Valley,Yawgoo Valley,Badger Pass,Shawnee Mountain,Wachusett Mountain
49 Degrees North,1.0,0.015845,-0.044947,0.027875,-0.063815,-0.161772,-0.011038,-0.261416,0.355375,0.426521,...,0.032279,-0.528757,0.467005,0.149064,-0.393591,0.187242,-0.250987,0.183395,0.259603,-0.141301
Afton Alps,0.015845,1.0,-0.054467,0.228904,0.104278,-0.054238,-0.166652,-0.147191,-0.014537,-0.253379,...,0.187529,-0.001911,-0.187475,0.062028,-0.062349,0.188048,0.035385,0.045532,0.224891,-0.01843
Alpental,-0.044947,-0.054467,1.0,-0.382228,-0.473356,0.571909,0.664177,-0.070504,-0.265901,-0.149778,...,-0.367566,-0.075008,0.077941,-0.542428,-0.014608,-0.288077,-0.146164,-0.132891,-0.471102,-0.317861
Alpine Valley Ohio,0.027875,0.228904,-0.382228,1.0,0.410744,-0.464258,-0.426995,0.072529,0.098683,0.251703,...,0.044214,0.001427,-0.208512,0.572488,0.068269,0.292331,0.069068,0.129514,0.500106,0.385854
Alpine Valley Wisconsin,-0.063815,0.104278,-0.473356,0.410744,1.0,-0.381725,-0.388276,0.071665,0.178701,0.001201,...,0.183667,0.25862,-0.112638,0.346344,-0.018196,0.202904,0.018337,-0.131854,0.336973,0.423667


In [227]:
#final_content_df.to_csv("data/final_content_df.csv")

In [228]:
#content_matrix.to_csv("data/content_matrix.csv")

### Function Building

In [147]:
# Input for mountain name
mountain_name = str(input("What is your favorite ski resort? "))

# input to ask user how many recommendations they would like
n_recs = int(input('How many recommendations would you like? '))
    
#what month would you like to travel
travel_date = str(input('What month would you like to travel? '))

What is your favorite ski resort? Arapahoe Basin
How many recommendations would you like? 3
What month would you like to travel? December


In [148]:
# Pulling out an individual mountain

y = final_content_df.loc[[mountain_name]]
y

ski_resort,sumt,drop,base,gondolas_and_trams,fastEight,highSpeedSixes,quadChairs,tripleChairs,doubleChairs,surfeLifts,...,state_Rhode Island,state_South Dakota,state_Tennessee,state_Utah,state_Vermont,state_Virginia,state_Washington,state_West Virginia,state_Wisconsin,state_Wyoming
Arapahoe Basin,2.255967,1.370225,2.371671,-0.323977,-0.027422,-0.304056,1.433861,-0.306664,-0.392534,0.140517,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [149]:
# Using cosine_similarity to return similarity scores
cos_sim = cosine_similarity(final_content_df, y)

# Create a dataframe with similarity scores for each ski resort
cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
cos_sim_df

#sorting in descending order
cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

#saving as new dataframe
cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
cos_sim_df

Unnamed: 0,ski_resort,0
0,Arapahoe Basin,1.0
1,Grand Targhee,0.617913
2,Taos Ski Valley,0.59178
3,Mammoth Mountain,0.5851


In [150]:
#collecting the content_matrix row for each recommended resort
rec_list = []

for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#keeping only the columns to display
concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]

concat_df = concat_df.reset_index()

concat_df

Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs
0,Arapahoe Basin,Dillon,Colorado,13050,2530,10780,99,79,7,20,49,24
1,Grand Targhee,Alta,Wyoming,9920,2270,7851,46,46,0,0,0,0
2,Taos Ski Valley,Taos Ski Valley,New Mexico,12481,3281,9200,150,150,14,16,30,40
3,Mammoth Mountain,Mammoth Lakes,California,11053,3100,7953,179,149,15,48,24,13


In [151]:
#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

#for loop that maps the user's month to the abbreviation used in the column names
selected_columns = []
for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]

#resetting index
result = result.reset_index()

result

Unnamed: 0,ski_resort,dec_mean_4_guests,dec_mean_2_guests
0,Arapahoe Basin,186.685714,141.861111
1,Grand Targhee,225.628571,181.472222
2,Taos Ski Valley,235.342857,160.542857
3,Mammoth Mountain,260.466667,191.96875
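
The month-to-column lookup in the cell above can also be expressed with a dictionary. This is a minimal sketch rather than the notebook's own code: the names `MONTH_ABBREVIATIONS` and `price_columns` are hypothetical, while the column suffixes `_mean_4_guests` and `_mean_2_guests` are the ones used above.

```python
# Maps a full month name to the abbreviation used in the price column names.
MONTH_ABBREVIATIONS = {
    "december": "dec", "january": "jan", "february": "feb",
    "march": "mar", "april": "apr", "may": "may",
}

def price_columns(travel_month):
    """Return the Airbnb price column names for a month, or [] if it is not covered."""
    abv = MONTH_ABBREVIATIONS.get(travel_month.strip().lower())
    if abv is None:
        return []
    return [abv + "_mean_4_guests", abv + "_mean_2_guests"]

print(price_columns("December"))
print(price_columns("July"))
```

Returning an empty list for an unsupported month matches the behavior of the loop above, where `selected_columns` stays empty when no month matches.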


Testing the first part of the function by hard-coding example user inputs.

In [152]:
#favorite ski resort, hard coded for testing
mountain_name = "Park City Mountain"

#number of recommendations to return
n_recs = 3

#month of travel
travel_date = "March"
    
# Pulling out an individual resort
y = final_content_df.loc[[mountain_name]]

# Using cosine_similarity to return similarity scores
cos_sim = cosine_similarity(final_content_df, y)

# Create a dataframe with similarity scores for each ski resort
cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
cos_sim_df

#sorting in descending order
cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

#saving as new dataframe
cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
cos_sim_df

#collecting the content_matrix row for each recommended resort
rec_list = []

for x in cos_sim_df['ski_resort']:
    rec_df = content_matrix.loc[[x]]
    rec_list.append(rec_df)

#concatenating all the dataframes in rec_list into a single dataframe
rec_df = pd.concat(rec_list)

#keeping only the columns to display
concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]

concat_df = concat_df.reset_index()

#filtering based on month to return airbnb prices and turning into dataframe
travel_date = travel_date.lower()

month = ["december", "january", "february", "march", "april", "may"]
month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

#for loop that maps the user's month to the abbreviation used in the column names
selected_columns = []
for x, y in zip(month_abv, month):
    if travel_date == y:
        selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

result = rec_df[selected_columns]

#resetting index
result = result.reset_index()

result

Unnamed: 0,ski_resort,mar_mean_4_guests,mar_mean_2_guests
0,Park City Mountain,395.342857,229.060606
1,Heavenly Mountain,364.363636,262.058824
2,Keystone,335.457143,255.194444
3,Steamboat,393.114286,280.055556


Making the final function. It includes a line that drops the row where ski_resort matches the user's input, so the user's own favorite resort is not part of their recommendations.

In [153]:
# Content-based model
def content_model():
    
    #user inputs
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    
    # Pulling out an individual resort
    y = final_content_df.loc[[mountain_name]]

    # Using cosine_similarity to return similarity scores
    cos_sim = cosine_similarity(final_content_df, y)

    # Create a dataframe with similarity scores for each ski resort
    cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)

    # sorting in descending order
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

    #saving as new dataframe
    cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
    cos_sim_df
    
    #collecting the content_matrix row for each recommended resort
    rec_list = []

    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #keeping only the columns to display
    concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 
    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
    #dropping mountain name from the results
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

    #returning final dataframe
    return final_concat_df.head(n_recs)
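
Because `content_model()` reads its inputs with `input()`, it is hard to test automatically. The sketch below shows one possible non-interactive variant of the core step (rank by cosine similarity, then exclude the query resort). The function name `recommend`, its parameters, and the toy dataframe are all hypothetical. It assumes a unique index and no all-zero feature rows.

```python
import numpy as np
import pandas as pd

def recommend(features, resort, n_recs=3):
    """Return the n_recs resorts most cosine-similar to `resort`, excluding it."""
    if resort not in features.index:
        raise KeyError(f"Unknown resort: {resort!r}")
    values = features.to_numpy(dtype=float)
    # Normalizing rows to unit length turns the dot product into cosine similarity.
    unit = values / np.linalg.norm(values, axis=1, keepdims=True)
    scores = pd.Series(unit @ unit[features.index.get_loc(resort)], index=features.index)
    return scores.drop(resort).sort_values(ascending=False).head(n_recs)

# Toy feature matrix with made-up values; rows stand in for scaled resort features.
toy = pd.DataFrame(
    {"f1": [1.0, 0.9, -1.0, 0.0], "f2": [1.0, 1.1, -1.0, 1.0]},
    index=["A", "B", "C", "D"],
)
recs = recommend(toy, "A", n_recs=2)
print(recs)
```

Dropping the query resort before taking `head(n_recs)` means the caller always gets up to `n_recs` other resorts, without needing the `n_recs + 1` adjustment used above.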

In [155]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Hunter Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Mt. Snow,West Dover,Vermont,3600,1700,1900,55,46,16,67,16,1,291.823529,237.083333
2,Okemo Mountain,Ludlow,Vermont,3344,2200,1144,55,46,33,38,21,9,290.914286,266.4
3,Seven Springs,Seven Springs,Pennsylvania,2994,750,2240,55,69,38,36,14,12,232.0,227.083333


### Collaborative Model

Importing the final cleaned user/review surprise dataframe from the collaborative model notebook. 

In [156]:
final_user_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2807 entries, 0 to 2806
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   2795 non-null   float64
 1   review_date  2807 non-null   object 
 2   state        2795 non-null   object 
 3   ski_resort   2795 non-null   object 
 4   rating       2795 non-null   float64
 5   review       2795 non-null   object 
 6   user_name    2783 non-null   object 
dtypes: float64(2), object(5)
memory usage: 153.6+ KB


In [157]:
final_user_df.drop(columns="Unnamed: 0", inplace=True)

In [158]:
from surprise import Reader, Dataset
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

#copying final review dataframe
surprise_df = final_user_df.copy()

#dropping unneeded columns
surprise_df = surprise_df[['user_name', 'ski_resort', 'rating']]

# counting the number of reviews for each user
value_counts = surprise_df['user_name'].value_counts()

# selecting only users with three or more reviews
selected_users = value_counts[value_counts > 2].index

# selecting only the rows where the user_name is in the selected_users list
surprise_df = surprise_df[surprise_df['user_name'].isin(selected_users)]

In [159]:
surprise_df.head()

Unnamed: 0,user_name,ski_resort,rating
0,anon_1,Winter Park,4.0
1,anon_1,Arapahoe Basin,5.0
2,anon_1,Steamboat,5.0
3,anon_1,Copper Mountain,5.0
4,anon_2,Solitude Mountain,5.0


In [160]:
#saving Reader information
reader = Reader(rating_scale=(1, 5))

#loading final dataset
data = Dataset.load_from_df(surprise_df[['user_name', 'ski_resort', 'rating']], reader)

#making trainset
trainset = data.build_full_trainset()

#instantiating model and training
algo = SVDpp(n_factors=140, n_epochs=120, init_mean=.01)
algo.fit(trainset) 

<surprise.prediction_algorithms.matrix_factorization.SVDpp at 0x7fab48073fa0>

In [161]:
#saving new dataframe with only user information
user_df = surprise_df.reset_index()
user_df.set_index('user_name', inplace = True)
user_df.drop(columns = ['rating', 'index'], inplace =True)
user_df.head()

user_name,ski_resort
anon_1,Winter Park
anon_1,Arapahoe Basin
anon_1,Steamboat
anon_1,Copper Mountain
anon_2,Solitude Mountain


In [162]:
#looking at number of users and items
print('Number of users: ', trainset.n_users, '\n')
print('Number of items: ', trainset.n_items)

Number of users:  537 

Number of items:  269
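
For reference, the Surprise documentation gives the SVD++ rating estimate behind `algo.predict(...).est` in the form below. The symbols follow that documentation rather than anything defined in this notebook.

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T}\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u} y_j\right)
$$

Here $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, $p_u$ and $q_i$ are the latent factor vectors (of length `n_factors`, 140 above), and the $y_j$ terms capture implicit feedback from the set $I_u$ of items the user has rated. With `rating_scale=(1, 5)`, the estimate is clipped to that range by default.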


# Collaborative and Content Based Models

### Final Collaborative Model

In [165]:
#Collaborative model
def collaborative_model():
    
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    
    have_rated = list(user_df.loc[user, 'ski_resort'])
    not_rated = content_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]  # & (not_rated['state'] == state)]
    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    not_rated = not_rated[['ski_resort', 'state', 'city', "adult_weekend", "adult_weekday", 'sumt', 'drop',
                           'base','ikon', 'epic','mountain_collective',
                          'advanced_runs',  'intermediate_runs', 'expert_runs', 'predicted_rating']].copy()

    return not_rated.head(n_recs)

In [166]:
collaborative_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
222,Schweitzer,Idaho,Sandpoint,60,60,6400,2400,4000,1,0,0,35,40,15,4.707028
258,Steamboat,Colorado,Steamboat Springs,106,167,10568,3668,6900,1,0,0,40,43,5,4.654241
83,Deer Valley,Utah,Park City,55,46,9570,3000,6570,1,0,0,10,31,32,4.501867
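
The ranking step in `collaborative_model` can be sketched in plain Python as follows. This is a simplified illustration, not the notebook's implementation: `recommend_unrated`, `toy_predict`, and the scores are hypothetical stand-ins for `algo.predict(user, resort).est`, and the sketch assumes that an unknown user has simply rated nothing.

```python
def recommend_unrated(user, all_resorts, rated_by_user, predict, n_recs=3):
    """Return the n_recs unrated resorts with the highest predicted rating."""
    # Unknown users are treated as having rated nothing (an assumption of this sketch).
    already_rated = set(rated_by_user.get(user, []))
    # dict.fromkeys drops duplicate resort names while keeping their order.
    candidates = [r for r in dict.fromkeys(all_resorts) if r not in already_rated]
    scored = [(resort, predict(user, resort)) for resort in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_recs]


# Hypothetical predictor with made-up scores, standing in for the fitted model.
toy_scores = {"Resort A": 4.7, "Resort B": 4.6, "Resort C": 4.5, "Resort D": 4.2}


def toy_predict(user, resort):
    return toy_scores[resort]


rated = {"anon_1": ["Resort B"]}
top = recommend_unrated("anon_1", list(toy_scores), rated, toy_predict, n_recs=2)
print(top)
```

Because `Resort B` has already been rated by the toy user, it is excluded even though its score would otherwise rank second.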


### Final Content Model

In [167]:
# Content-based model
def content_model():
    
    #user inputs
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    
    # Pulling out an individual resort
    y = final_content_df.loc[[mountain_name]]

    # Using cosine_similarity to return similarity scores
    cos_sim = cosine_similarity(final_content_df, y)

    # Create a dataframe with similarity scores based on ski resort
    cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)

    # sorting in descending order
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

    #keeping the top matches (one extra row because the favorite resort matches itself)
    cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
    
    #collecting the matching rows from content_matrix
    rec_list = []
    
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]  
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           'ikon', 'epic','mountain_collective',
                        "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 
    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    
    #dropping mountain name from the results
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]

    #showing final dataframe
    return final_concat_df.head(n_recs)

In [168]:
content_model()

How many resort recommendations do you want? 5
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Heavenly Mountain,Stateline,California,10067,3500,7170,225,189,0,1,0,7,60,27,5,318.909091,219.636364
2,Keystone,Keystone,Colorado,12408,3128,9280,390,131,0,1,0,16,43,41,0,229.142857,172.514286
3,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,1,0,0,12,43,40,5,246.771429,189.138889
4,Breckenridge,Breckenridge,Colorado,12998,3398,9600,55,116,0,1,0,13,23,36,28,266.028571,192.194444
5,Palisades Tahoe,Olympic Valley,California,9050,2850,6200,229,149,1,0,1,0,0,0,0,353.65625,284.09375
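
The content model above ranks resorts by cosine similarity to the user's favorite. A minimal pure-Python sketch of that ranking is shown below; the feature vectors and the helper names `cosine` and `most_similar` are illustrative assumptions, and the notebook itself relies on `cosine_similarity` over `final_content_df`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(features, target, n_recs=3):
    """Rank every resort except `target` by similarity to `target`."""
    target_vec = features[target]
    scores = [
        (name, cosine(vec, target_vec))
        for name, vec in features.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n_recs]


# Toy feature vectors; illustrative values only.
toy_features = {
    "Resort A": [1.0, 0.0, 1.0],
    "Resort B": [1.0, 0.1, 0.9],
    "Resort C": [0.0, 1.0, 0.0],
}
ranked = most_similar(toy_features, "Resort A", n_recs=2)
print(ranked)
```

Cosine similarity compares direction rather than magnitude, so features on very different scales (for example elevation in feet versus percentages of runs) are usually scaled before this step.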


## Cascade-Hybrid Model

I will be creating a cascade hybrid model for my final recommendation system. 

Collaborative user-based models are common on music and streaming platforms, but they have limitations when applied to ski trip planning, where the opportunity cost of a poor recommendation is higher.

The hybrid model begins with the collaborative model, takes its top 50 resorts, and then refines the final recommendations with the content based system, which uses the user's input as a guide.

In combining the models, a few adjustments needed to be made:
- Since it is not guaranteed that the user's favorite mountain will be selected by the collaborative model, I added the user's mountain as the top recommendation for the collaborative model. I did this by adding its row to the final output dataframe and then setting its predicted rating to 5, so that it appears at the top of all results.
- I adjusted the content based model to use the final dataframe from the collaborative model, which contains the top 50 results.
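
The two-stage cascade described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: `cascade_recommend`, the toy ratings, and the toy feature vectors are hypothetical, and instead of overwriting the favorite's predicted rating with 5 the sketch places the favorite in the shortlist directly, which has the same effect on what reaches the second stage.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def cascade_recommend(predicted, features, favorite, top_k=50, n_recs=3):
    """Stage 1: shortlist by predicted rating. Stage 2: rank by similarity."""
    if favorite not in features:
        raise ValueError(f"no content features for {favorite!r}")
    # Stage 1 (collaborative): favorite first, then the best-rated other resorts.
    others = sorted(
        (r for r in predicted if r != favorite), key=predicted.get, reverse=True
    )
    shortlist = [favorite] + others[: top_k - 1]
    # Resorts without content features are skipped.
    shortlist = [r for r in shortlist if r in features]
    # Stage 2 (content): rank the shortlist by similarity to the favorite.
    target = features[favorite]
    ranked = sorted(
        (r for r in shortlist if r != favorite),
        key=lambda r: cosine(features[r], target),
        reverse=True,
    )
    return ranked[:n_recs]


# Toy inputs; illustrative values only.
toy_predicted = {"A": 4.9, "B": 4.5, "C": 4.0, "D": 3.0}
toy_features = {
    "Fav": [1.0, 0.0],
    "A": [0.0, 1.0],
    "B": [1.0, 0.1],
    "C": [0.9, 0.5],
    "D": [1.0, 0.0],
}
result = cascade_recommend(toy_predicted, toy_features, "Fav", top_k=4, n_recs=2)
print(result)
```

In the toy data, `D` is the closest match to the favorite but never reaches the second stage because its predicted rating is too low, which is the trade-off a cascade makes.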

In [171]:
#Cascade-hybrid model
def hybrid_model():
    
    #inputs
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))

    #making a list of resorts the user has rated
    have_rated = list(user_df.loc[user, 'ski_resort'])
    
    #dropping rated list to create new dataframe of unrated resorts
    not_rated = content_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]

    #adding mountain_name row to the dataframe to ensure this is included in final results for content based model
    if mountain_name not in not_rated['ski_resort'].values:
        mountain_row = final_user_df.loc[final_user_df['ski_resort'] == mountain_name].copy()
        mountain_row['predicted_rating'] = algo.predict(user, mountain_name).est
        not_rated = pd.concat([not_rated, mountain_row])

    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    not_rated = not_rated[['ski_resort', 'state', 'city', 'sumt', 'drop', 'base', 'intermediate_runs',
                                  'advanced_runs', 'expert_runs', 'predicted_rating', 'ikon', 'epic',
                                  'mountain_collective']].copy()

    not_rated.loc[not_rated['ski_resort'] == mountain_name, 'predicted_rating'] = 5

    not_rated = not_rated.sort_values(by="predicted_rating", ascending=False)
    
    #narrowing the content features to the collaborative shortlist
    
    #saving top 50 ski resorts as list
    not_rated_top_list = not_rated['ski_resort'].head(50)

    #collecting the matching feature rows
    collab_list = []
    
    #grabbing rows from final_content_df
    for x in not_rated_top_list:
        try:
            col_df = final_content_df.loc[[x]]
            collab_list.append(col_df)
        except KeyError:
            continue

    collab_df = pd.concat(collab_list)
    
    #pulling out the favorite resort's feature row
    
    y = collab_df.loc[[mountain_name]]

    # Using cosine_similarity to return similarity scores
    cos_sim = cosine_similarity(collab_df, y)

    # Create a dataframe with similarity scores based on ski resort
    cos_sim_df = pd.DataFrame(data=cos_sim, index=collab_df.index)

    # sorting in descending order
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)

    #keeping the top matches (one extra row because the favorite resort matches itself)
    cos_sim_df = cos_sim_df.reset_index().head(n_recs + 1)
    
    #collecting the matching rows from content_matrix for final output
    rec_list = []
    
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]  
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()

    #merging dataframes 

    final_concat_df = pd.merge(concat_df, result, on="ski_resort")
    final_concat_df = final_concat_df[final_concat_df.ski_resort != mountain_name]
    
    #showing final dataframe
    return final_concat_df.head(n_recs)

In [172]:
hybrid_model()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Keystone,Keystone,Colorado,12408,3128,9280,390,131,16,43,41,0,229.142857,172.514286
2,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,12,43,40,5,246.771429,189.138889
3,Crested Butte Mountain,Mt. Crested Butte,Colorado,12162,3062,9375,294,96,14,25,25,36,255.323529,150.625
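
The month lookup used in the functions above silently returns no Airbnb columns when the input does not match a listed month. One possible alternative, sketched here with the hypothetical names `MONTH_ABBREVIATIONS` and `airbnb_columns`, is a dictionary lookup that rejects unsupported months explicitly. The column name pattern is taken from the outputs shown above.

```python
# Months covered by the Airbnb price columns, mapped to their column prefix.
MONTH_ABBREVIATIONS = {
    "december": "dec",
    "january": "jan",
    "february": "feb",
    "march": "mar",
    "april": "apr",
    "may": "may",
}


def airbnb_columns(travel_month):
    """Return the two Airbnb price column names for a travel month."""
    key = travel_month.strip().lower()
    if key not in MONTH_ABBREVIATIONS:
        supported = ", ".join(MONTH_ABBREVIATIONS)
        raise ValueError(f"unsupported month {travel_month!r}; choose from: {supported}")
    abbr = MONTH_ABBREVIATIONS[key]
    return [f"{abbr}_mean_4_guests", f"{abbr}_mean_2_guests"]


december_columns = airbnb_columns(" December ")
print(december_columns)
```

Raising an error is a design choice for the sketch; re-prompting the user would be an equally reasonable way to handle a typo.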


## Testing Model Results

I will be testing the model results with users whose mountain preferences have been defined:

- Alexandria Kelly - Mountains where the majority of skiers are there for the sport, and that do not feel overly "bougie". Skis expert terrain.
- Raghava Kamalesh - Enjoys apres ski, large mountains, expert terrain, and well known mountains.
- Joseph Lewis - Skis locally in the NY area. Is looking to explore more mountains.

#### Alexandria's Results

In [173]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Stevens Pass
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Boreal,Truckee,California,7700,500,7200,55,39,0,0,0,26,29,44,0,275.542857,231.333333
2,Bear Valley,Bear Valley,California,8500,1900,6600,55,46,0,0,0,11,41,45,4,270.285714,235.470588
3,White Pass,White Pass,Washington,6550,2050,4500,49,49,0,0,0,0,0,0,0,237.363636,212.058824


In [179]:
collaborative_model()

Name: Alexandria Kelly
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
248,Mt. Bohemia,Michigan,Michigan,55.0,68.0,1500.0,900.0,600.0,0.0,0.0,0.0,8.0,2.0,90.0,4.287116
16,Alta,Utah,Alta,55.0,159.0,11068.0,2538.0,8530.0,1.0,0.0,1.0,0.0,0.0,0.0,4.207366
26,Breckenridge,Colorado,Breckenridge,55.0,116.0,12998.0,3398.0,9600.0,0.0,1.0,0.0,36.0,23.0,28.0,4.201361


In [181]:
hybrid_model()

Name: Alexandria Kelly
How many resort recommendations do you want? 3
What's your favorite ski resort? Stevens Pass
What month would you like to travel? December


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests
1,Crystal Mountain Washington,Crystal Mountain,Washington,7012,3100,4400,99,99,8,31,32,29,293.411765,210.361111
2,Bridger Bowl,Bozeman,Montana,8700,2600,6100,40,85,12,28,18,42,273.617647,195.285714
3,Stowe Mountain,Stowe,Vermont,4395,2360,2035,55,46,16,55,15,15,354.066667,231.0


#### Raghava's Results

In [185]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Crested Butte Mountain
What month would you like to travel? February


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,feb_mean_4_guests,feb_mean_2_guests
1,Keystone,Keystone,Colorado,12408,3128,9280,390,131,0,1,0,16,43,41,0,310.588235,239.885714
2,Heavenly Mountain,Stateline,California,10067,3500,7170,225,189,0,1,0,7,60,27,5,366.735294,216.857143
3,Park City Mountain,Park City,Utah,10026,3226,6800,430,153,0,1,0,8,41,28,23,425.0,226.028571


In [186]:
collaborative_model()

Name: Raghava Kamalesh
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
27,Snowbird,Utah,Snowbird,94.0,110.0,11000.0,3240.0,7760.0,1.0,0.0,1.0,43.0,25.0,24.0,4.964686
2,Steamboat,Colorado,Steamboat Springs,106.0,167.0,10568.0,3668.0,6900.0,1.0,0.0,0.0,40.0,43.0,5.0,4.61208
11,Jackson Hole,Wyoming,Teton Village,140.0,194.0,10450.0,4139.0,6311.0,1.0,0.0,1.0,38.0,41.0,17.0,4.555539


In [187]:
hybrid_model()

Name: Raghava Kamalesh
How many resort recommendations do you want? 3
What's your favorite ski resort? Crested Butte Mountain
What month would you like to travel? February


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,feb_mean_4_guests,feb_mean_2_guests
1,Park City Mountain,Park City,Utah,10026,3226,6800,430,153,8,41,28,23,425.0,226.028571
2,Steamboat,Steamboat Springs,Colorado,10568,3668,6900,106,167,12,43,40,5,347.971429,276.2
3,Taos Ski Valley,Taos Ski Valley,New Mexico,12481,3281,9200,150,150,14,16,30,40,378.285714,195.8


#### Joseph's Results

In [188]:
collaborative_model()

Name: Joseph Lewis
How many resort recommendations do you want? 3


Unnamed: 0,ski_resort,state,city,adult_weekend,adult_weekday,sumt,drop,base,ikon,epic,mountain_collective,advanced_runs,intermediate_runs,expert_runs,predicted_rating
114,Whiteface Mountain,New York,Wilngton,90.0,90.0,4650.0,3430.0,1220.0,0.0,0.0,0.0,31.0,46.0,0.0,4.572699
10,Taos Ski Valley,New Mexico,Taos Ski Valley,150.0,150.0,12481.0,3281.0,9200.0,1.0,0.0,1.0,30.0,16.0,40.0,4.488344
42,Snowbasin,Utah,Huntsville,109.0,89.0,9350.0,2900.0,6450.0,1.0,0.0,1.0,52.0,33.0,6.0,4.385037


In [189]:
content_model()

How many resort recommendations do you want? 3
What's your favorite ski resort? Whiteface Mountain
What month would you like to travel? January


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,ikon,epic,mountain_collective,beginner_runs,intermediate_runs,advanced_runs,expert_runs,jan_mean_4_guests,jan_mean_2_guests
1,Wildcat Mountain,Jackson,New Hampshire,4062,2112,1950,55,67,0,0,0,21,46,33,0,126.36,115.685714
2,Bogus Basin,Boise,Idaho,7582,1800,5800,49,49,0,0,0,10,44,36,10,162.542857,97.382353
3,Ragged Mountain,Danbury,New Hampshire,2250,1250,1000,55,60,0,0,0,30,26,37,7,120.8,111.571429


In [190]:
hybrid_model()

Name: Joseph Lewis
How many resort recommendations do you want? 3
What's your favorite ski resort? Whiteface Mountain
What month would you like to travel? February


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,feb_mean_4_guests,feb_mean_2_guests
1,Wildcat Mountain,Jackson,New Hampshire,4062,2112,1950,55,67,21,46,33,0,128.64,115.057143
2,Bogus Basin,Boise,Idaho,7582,1800,5800,49,49,10,44,36,10,154.371429,98.428571
3,Beaver Mountain,Logan,Utah,8860,1600,7232,35,40,0,0,0,0,154.314286,102.941176


# Hybrid Model #2

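In this hybrid model, content based filtering is the primary filter and the collaborative model is secondary: the content model shortlists the 20 most similar resorts, and the collaborative model's predicted ratings order the final results. This places a larger emphasis on user inputs.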
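
The ordering of the two stages can be sketched in plain Python. The function `content_first_hybrid` and the toy scores are hypothetical; the sketch assumes the similarity scores to the favorite resort have already been computed and that resorts without a predicted rating sort last.

```python
def content_first_hybrid(similarity, predicted, top_k=20, n_recs=3):
    """Shortlist by content similarity, then order by predicted rating."""
    # Stage 1 (content): keep the top_k resorts most similar to the favorite.
    shortlist = sorted(similarity, key=similarity.get, reverse=True)[:top_k]
    # Stage 2 (collaborative): order the shortlist by predicted rating;
    # resorts with no prediction are placed at the end.
    ranked = sorted(
        shortlist,
        key=lambda resort: predicted.get(resort, float("-inf")),
        reverse=True,
    )
    return ranked[:n_recs]


# Toy inputs; illustrative values only.
toy_similarity = {"A": 0.99, "B": 0.95, "C": 0.90, "D": 0.10}
toy_predicted = {"A": 4.0, "B": 4.6, "D": 5.0}

picks = content_first_hybrid(toy_similarity, toy_predicted, top_k=3, n_recs=2)
print(picks)
```

Here `D` has the highest predicted rating but is dropped in the first stage because it is not similar enough to the favorite, which is how this variant gives more weight to the user's stated preference.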

In [174]:
def hybrid_model_content():
    # User inputs
    user = str(input('Name: '))
    n_recs = int(input('How many resort recommendations do you want? '))
    mountain_name = str(input("What's your favorite ski resort? "))
    travel_date = str(input('What month would you like to travel? '))
    mtn_pass = str(input('Are you using a multi-resort pass?  '))
    
    # Content-based model
    y = final_content_df.loc[[mountain_name]]
    cos_sim = cosine_similarity(final_content_df, y)
    cos_sim_df = pd.DataFrame(data=cos_sim, index=final_content_df.index)
    cos_sim_df.sort_values(by=0, ascending=False, inplace=True)
    cos_sim_df = cos_sim_df.reset_index()

    #collecting the matching rows from content_matrix for final output
    rec_list = []
    
    for x in cos_sim_df['ski_resort']:
        rec_df = content_matrix.loc[[x]]  
        rec_list.append(rec_df)

    #concatenating all the dataframes in rec_list into a single dataframe
    rec_df = pd.concat(rec_list)

    #selecting the columns to display
    concat_df = rec_df[["city", "state", "sumt", "drop", "base", "adult_weekend", "adult_weekday",
                           "beginner_runs", "intermediate_runs", "advanced_runs", "expert_runs",
                        "ikon", "epic", "mountain_collective"]]
    concat_df = concat_df.reset_index()

    #filtering based on month to return airbnb prices and turning into dataframe
    travel_date = travel_date.lower()

    month = ["december", "january", "february", "march", "april", "may"]
    month_abv = ["dec", "jan", "feb", "mar", "apr", "may"]

    selected_columns = []
    for x, y in zip(month_abv, month):
        if travel_date == y:
            selected_columns = [x + "_mean_4_guests", x + "_mean_2_guests"]

    result = rec_df[selected_columns]
    result = result.reset_index()                        
    content_recommendations = pd.merge(concat_df, result, on="ski_resort")
    
    #filtering by multi-resort pass
    if mtn_pass == "Ikon":
        content_recommendations = content_recommendations.loc[content_recommendations['ikon'] == 1]
    elif mtn_pass == "Epic":
        content_recommendations = content_recommendations.loc[content_recommendations['epic'] == 1]
    elif mtn_pass == "Mountain_collective":
        content_recommendations = content_recommendations.loc[content_recommendations['mountain_collective'] == 1]
    elif mtn_pass == "No":
        pass
    
    content_recommendations = content_recommendations[content_recommendations.ski_resort != mountain_name].head(20)

    # Collaborative model
    have_rated = list(user_df.loc[user, 'ski_resort'])
    not_rated = final_user_df.copy()
    not_rated = not_rated.loc[~not_rated['ski_resort'].isin(have_rated)]
    not_rated = not_rated.drop_duplicates(subset=['ski_resort'])
    not_rated.reset_index(inplace=True)
    not_rated['predicted_rating'] = not_rated['ski_resort'].apply(lambda x: algo.predict(user, x).est)
    not_rated.sort_values(by='predicted_rating', ascending=False, inplace=True)
    collaborative_recommendations = not_rated[['ski_resort', 'predicted_rating']]

    # Combine content-based and collaborative recommendations
    combined_recommendations = pd.merge(content_recommendations, collaborative_recommendations, on='ski_resort', how='left')
    combined_recommendations = combined_recommendations.drop_duplicates(subset=['ski_resort'])
    combined_recommendations.sort_values(by='predicted_rating', ascending=False, inplace=True)
    combined_recommendations.drop(columns=['ikon', 'mountain_collective', 'epic'], inplace=True)
    return combined_recommendations.head(n_recs)

In [175]:
hybrid_model_content()

Name: Stephanie Ciaccia
How many resort recommendations do you want? 3
What's your favorite ski resort? Park City Mountain
What month would you like to travel? December
Are you using a multi-resort pass?  Epic


Unnamed: 0,ski_resort,city,state,sumt,drop,base,adult_weekend,adult_weekday,beginner_runs,intermediate_runs,advanced_runs,expert_runs,dec_mean_4_guests,dec_mean_2_guests,predicted_rating
1,Keystone,Keystone,Colorado,12408,3128,9280,390,131,16,43,41,0,229.142857,172.514286,4.404293
13,Kirkwood,Kirkwood,California,9800,2000,7800,55,46,0,0,0,0,371.53125,304.967742,4.362896
4,Crested Butte Mountain,Mt. Crested Butte,Colorado,12162,3062,9375,294,96,14,25,25,36,255.323529,150.625,4.173471
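
The pass filter in `hybrid_model_content` compares the answer against exact strings such as `"Ikon"` and `"Epic"`, so an answer like `epic` would skip filtering without warning. A hedged sketch of a more forgiving lookup follows; `PASS_COLUMNS`, `pass_column`, `filter_by_pass`, and the toy resort records are assumptions made for illustration, while the column names `ikon`, `epic`, and `mountain_collective` come from the notebook.

```python
# Accepted answers (after normalisation) mapped to the indicator column.
PASS_COLUMNS = {
    "ikon": "ikon",
    "epic": "epic",
    "mountain collective": "mountain_collective",
}


def pass_column(answer):
    """Translate a free-text pass answer into a column name, or None for no pass."""
    key = answer.strip().lower().replace("_", " ")
    if key in ("", "no", "none"):
        return None
    if key not in PASS_COLUMNS:
        raise ValueError(f"unrecognised pass {answer!r}")
    return PASS_COLUMNS[key]


def filter_by_pass(resorts, answer):
    """Keep resorts covered by the chosen pass; keep all when no pass is used."""
    column = pass_column(answer)
    if column is None:
        return list(resorts)
    return [resort for resort in resorts if resort.get(column) == 1]


# Toy resort records; illustrative values only.
toy_resorts = [
    {"ski_resort": "Resort A", "ikon": 1, "epic": 0, "mountain_collective": 0},
    {"ski_resort": "Resort B", "ikon": 0, "epic": 1, "mountain_collective": 0},
    {"ski_resort": "Resort C", "ikon": 1, "epic": 0, "mountain_collective": 1},
]
epic_resorts = filter_by_pass(toy_resorts, "epic")
print([resort["ski_resort"] for resort in epic_resorts])
```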
