###### This notebook was a testing ground for seeing whether we could use KNN to recommend parks to visit based on a user's favorite park. Some of the code was inspired by a movie recommender tutorial: https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-1-knn-item-based-collaborative-filtering-637969614ea. We also used this notebook to build a few CSV files, which we transformed further before using them in HTML tables and JSON objects throughout our project.

##### First we imported dependencies and libraries, and did some data transformation in order to create a matrix for our KNN-based recommendation functions to run on.

In [1]:
# import dependencies
import pandas as pd
# utils import
from fuzzywuzzy import fuzz
# data science imports
import math
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# configure file paths
parks_filename = 'parks.csv'
ratings_filename = 'National Park Survey (Responses) - Form Responses.csv'

# read data into dataframes
df_parks = pd.read_csv(
    parks_filename,
    usecols=['parkID', 'surveyName'],
    dtype={'parkID': 'int32', 'surveyName': 'str'})

df_ratings = pd.read_csv(
    ratings_filename)



In [2]:
# Check the header of the survey results dataframe
df_ratings.head()

Unnamed: 0,Timestamp,Which of these landscapes would you most enjoy while visiting a National Park?,Which of these climates would you most enjoy while visiting a National Park?,Which of these activities would you most enjoy participating in while visiting a National Park?,"Please list any other features/requirements you would be interested in while visiting a National Park (e.g, family-friendly, group tours, ADA compliance, lodging).","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Acadia]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [American Samoa]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Arches]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Badlands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Big Bend]",...,"Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Shenandoah]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Theodore Roosevelt]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. 
Note: depending on browser, you may need to scroll to right to see 5.0. [Virgin Islands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Voyageurs]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [White Sands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Wind Cave]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Wrangell-St. Elias]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Yellowstone]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Yosemite]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Zion]"
0,7/28/2020 22:33:08,Mountains,Hot,"Water Activities (swimming, boating)",,3.0,,,,,...,,,,,,,,,4.0,
1,7/28/2020 22:36:48,Forest,Hot,"Water Activities (swimming, boating)","Crowd-free, family-friendly",,,4.0,,,...,,,,,,,,5.0,5.0,5.0
2,7/28/2020 22:45:09,Forest,Cold,"Water Activities (swimming, boating)",,,,,,,...,,4.5,,,,,,4.5,,
3,7/28/2020 22:49:17,Forest,Rain,"Land Activities (bouldering, hiking)",,,,,,,...,,,,,,,,3.5,4.5,
4,7/28/2020 22:53:59,Forest,Cold,"Leisure Activities (birdwatching, stargazing)",ADA compliance,,,,,,...,,,,,,,,,,


In [3]:
# remove the non-rating based columns of the survey dataframe
df_ratings.drop(df_ratings.columns[[0,1,2,3,4]], axis = 1, inplace = True)

In [4]:
# check the header to ensure only ratings are displaying
df_ratings.head()

Unnamed: 0,"Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Acadia]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [American Samoa]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Arches]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Badlands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Big Bend]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Biscayne]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Black Canyon of the Gunnison]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Bryce Canyon]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Canyonlands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. 
Note: depending on browser, you may need to scroll to right to see 5.0. [Capitol Reef]",...,"Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Shenandoah]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Theodore Roosevelt]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Virgin Islands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Voyageurs]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [White Sands]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Wind Cave]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Wrangell-St. Elias]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Yellowstone]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. 
[Yosemite]","Please rate how much you enjoyed each park on a scale of 0.5 (hated) to 5 (loved). Skip any parks you did not visit. Note: depending on browser, you may need to scroll to right to see 5.0. [Zion]"
0,3.0,,,,,,,,,,...,,,,,,,,,4.0,
1,,,4.0,,,,,5.0,,3.5,...,,,,,,,,5.0,5.0,5.0
2,,,,,,,,,,,...,,4.5,,,,,,4.5,,
3,,,,,,,,,,,...,,,,,,,,3.5,4.5,
4,,,,,,,,,,,...,,,,,,,,,,
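The rating columns still carry the full survey question as their header, with the park name in square brackets at the end. A minimal sketch of how the headers could be shortened to just the park name is shown here. It is not part of the original workflow, and the toy dataframe and the names `question`, `df_toy`, and `short_names` are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the survey dataframe: two rating columns whose headers
# end with the park name in square brackets (hypothetical sample data).
question = ('Please rate how much you enjoyed each park on a scale of '
            '0.5 (hated) to 5 (loved). ')
df_toy = pd.DataFrame(
    [[3.0, None]],
    columns=[question + '[Acadia]', question + '[Wrangell-St. Elias]'])

# Pull out whatever sits inside the final pair of brackets in each header.
short_names = df_toy.columns.str.extract(r'\[([^\]]+)\]\s*$', expand=False)
df_toy.columns = short_names

print(list(df_toy.columns))
```

Shorter headers would make `head()` output far easier to read, though the rest of this notebook relies on column position rather than column name.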


In [5]:
# generate a number list to iterate through the columns of the ratings dataframe
number_list = np.arange(0,62,1)

In [6]:
# build a list of rating data containing each userID (row position), parkID (column position), and rating
ratings_list = []
for index, row in df_ratings.iterrows():
    for i in number_list:
        # use .iloc for positional access, since the columns are labeled with the survey question text
        new_rating = [index + 1, i + 1, row.iloc[i]]
        print(new_rating)
        ratings_list.append(new_rating)

[1, 1, 3.0]
[1, 2, nan]
[1, 3, nan]
...
[1, 61, 4.0]
[1, 62, nan]
[2, 1, nan]
[2, 2, nan]
[2, 3, 4.0]
...
[221, 16, 5.0]
[221, 17, nan]
[221, 18, nan]
...

In [7]:
# Create a dataframe from the ratings list  
df_new = pd.DataFrame(ratings_list, columns = ['userID', 'parkID','rating'])

In [8]:
# check the ratings list dataframe
df_new

Unnamed: 0,userID,parkID,rating
0,1,1,3.0
1,1,2,
2,1,3,
3,1,4,
4,1,5,
...,...,...,...
13697,221,58,
13698,221,59,
13699,221,60,
13700,221,61,5.0
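The nested loop above turns the wide survey table (one row per respondent, one column per park) into a long table with one row per user and park. As a hedged alternative sketch, the same reshaping can be done with `DataFrame.melt`. The toy data and the names `wide` and `long_df` are assumptions for illustration, not the code we ran.

```python
import numpy as np
import pandas as pd

# Toy wide table: 2 respondents x 3 parks (hypothetical values).
wide = pd.DataFrame({
    '[Acadia]': [3.0, np.nan],
    '[Arches]': [np.nan, 4.0],
    '[Zion]': [np.nan, 5.0],
})

# Mirror the notebook's convention: IDs are 1-based row/column positions.
wide.columns = range(1, wide.shape[1] + 1)
wide.index = wide.index + 1

long_df = (
    wide.rename_axis('userID')
        .reset_index()
        .melt(id_vars='userID', var_name='parkID', value_name='rating')
        .astype({'parkID': 'int64'})
        .sort_values(['userID', 'parkID'])
        .reset_index(drop=True)
)

print(long_df)
```

Sorting by `userID` and then `parkID` reproduces the row order of the loop-built dataframe, with missing ratings kept as `NaN`.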


In [9]:
# pivot ratings dataframe so that parks are indexed rather than users and assign to a new dataframe
df_park_features = df_new.pivot(
    index='parkID',
    columns='userID',
    values='rating'
).fillna(0)

In [10]:
# export just the ratings dataframe to csv
df_new.to_csv('parkRatings.csv')

In [11]:
# group the user ratings by park ID, take the average rating for each park, and then output to CSV
# (select only the rating column so the meaningless mean of userID is not included)
df_grouped = df_new.groupby(['parkID'])[['rating']].mean()
df_grouped.to_csv('parkAverages.csv')
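Because unvisited parks are stored as `NaN`, the per-park mean only reflects respondents who actually rated the park (pandas skips `NaN` in `mean` by default). A small sketch with made-up ratings, where `toy_ratings` and `summary` are hypothetical names, shows the average alongside the number of ratings behind it:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format ratings: user 1 skipped park 2.
toy_ratings = pd.DataFrame({
    'userID': [1, 1, 2, 2],
    'parkID': [1, 2, 1, 2],
    'rating': [3.0, np.nan, 5.0, 4.0],
})

# mean ignores NaN; count reports how many real ratings each park received.
summary = toy_ratings.groupby('parkID')['rating'].agg(['mean', 'count'])
print(summary)
```

Keeping the count next to the mean can help flag parks whose average rests on only one or two responses.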

In [12]:
# pivot and create park-user matrix
park_user_mat = df_new.pivot(index='parkID', columns='userID', values='rating').fillna(0)
# create mapper from park name to index
park_to_idx = {
    park: i for i, park in 
    enumerate(list(df_parks.set_index('parkID').loc[park_user_mat.index].surveyName))
}

# transform matrix to scipy sparse matrix (stores only the non-zero entries, i.e. the parks a user actually rated)
park_user_mat_sparse = csr_matrix(park_user_mat.values)
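Most respondents rated only a handful of parks, so after `fillna(0)` the park-user matrix is mostly zeros. The sketch below uses a toy matrix (not our survey data) to illustrate what `csr_matrix` keeps. The names `dense`, `sparse`, and `sparsity` are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy park-user matrix: 3 parks x 4 users, 0 means "not rated".
dense = np.array([
    [3.0, 0.0, 0.0, 4.0],
    [0.0, 0.0, 4.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

sparse = csr_matrix(dense)

# nnz is the number of explicitly stored (non-zero) entries.
stored = sparse.nnz
sparsity = 1 - stored / (dense.shape[0] * dense.shape[1])

print(stored, sparsity)
```

One caveat of this encoding is that a missing rating and a rating of 0 become indistinguishable. That works here only because the survey scale starts at 0.5.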

###### Here we created our model, fit it, and defined functions that match user input against actual park names and then return recommendations (first by printing them, then as a returned list).

In [13]:
%env JOBLIB_TEMP_FOLDER=/tmp
# define model
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
# fit
model_knn.fit(park_user_mat_sparse)

env: JOBLIB_TEMP_FOLDER=/tmp


NearestNeighbors(algorithm='brute', leaf_size=30, metric='cosine',
                 metric_params=None, n_jobs=-1, n_neighbors=20, p=2,
                 radius=1.0)
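With `metric='cosine'`, the model measures the distance between two parks as one minus the cosine similarity of their rating vectors. Here is a minimal sketch on a toy matrix. The values and the names `toy`, `knn`, and `manual` are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy park-user matrix: parks 0 and 1 are rated similarly, park 2 is not.
toy_dense = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 3.0],
])
toy = csr_matrix(toy_dense)

knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=2)
knn.fit(toy)

# Query with park 0: the nearest neighbor is park 0 itself (distance ~0).
distances, indices = knn.kneighbors(toy[0], n_neighbors=2)

# Cosine distance computed by hand for parks 0 and 1.
a, b = toy_dense[0], toy_dense[1]
manual = 1 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(indices[0], distances[0], manual)
```

Since the queried park always comes back as its own nearest neighbor, the recommendation functions below ask for `n_recommendations+1` neighbors and drop the first one.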

In [14]:
# function to return the index of the park that best matches what a user entered (fuzzy matching is meant to tolerate typos)
def fuzzy_matching(mapper, fav_park, verbose=True):
    """
    return the closest match via fuzzy ratio. If no match found, return None
    
    Parameters
    ----------    
    mapper: dict, map park name to index of the park in data

    fav_park: str, name of user input park
    
    verbose: bool, print log if True

    Return
    ------
    index of the closest match
    """
    match_tuple = []
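    # get match
    # note: a threshold of 100 in practice accepts only exact (case-insensitive) matches;
    # lower it to tolerate typos in the user's input
    for surveyName, idx in mapper.items():
        ratio = fuzz.ratio(surveyName.lower(), fav_park.lower())
        if ratio >= 100:
            match_tuple.append((surveyName, idx, ratio))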
    # sort
    match_tuple = sorted(match_tuple, key=lambda x: x[2])[::-1]
    if not match_tuple:
        print('Oops! No match is found')
        return
    if verbose:
        print('Found possible matches in our database: {0}\n'.format([x[0] for x in match_tuple]))
    return match_tuple[0][1]


# function to print the top n recommendations for a given park
def make_recommendation(model_knn, data, mapper, fav_park, n_recommendations):
    """
    return top n similar park recommendations based on user's input park


    Parameters
    ----------
    model_knn: sklearn model, knn model

    data: park-user matrix

    mapper: dict, map park name to index of the park in data

    fav_park: str, name of user input park

    n_recommendations: int, top n recommendations

    Return
    ------
    None; the top n similar park recommendations are printed
    """
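    # fit
    model_knn.fit(data)
    # get input park index
    print('You have input park:', fav_park)
    idx = fuzzy_matching(mapper, fav_park, verbose=True)
    # stop early if fuzzy_matching found no park (it returns None in that case)
    if idx is None:
        return
    # inference
    print('Recommendation system start to make inference')
    print('......\n')
    distances, indices = model_knn.kneighbors(data[idx], n_neighbors=n_recommendations+1)
    # get list of raw idx of recommendations: sort by ascending distance, then
    # reverse and drop the first neighbor (the input park itself), so the list
    # runs from the farthest of the n neighbors to the nearest
    raw_recommends = \
        sorted(list(zip(indices.squeeze().tolist(), distances.squeeze().tolist())), key=lambda x: x[1])[:0:-1]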
    # get reverse mapper
    reverse_mapper = {v: k for k, v in mapper.items()}
    # print recommendations
    print('Recommendations for {}:'.format(fav_park))
    for i, (idx, dist) in enumerate(raw_recommends):
        print('{0}: {1}, with distance of {2}'.format(i+1, reverse_mapper[idx], dist))

# function to return the top n recommendations for a given park as a list
def make_recommendation_list(model_knn, data, mapper, fav_park, n_recommendations):
    """
    return top n similar park recommendations based on user's input park


    Parameters
    ----------
    model_knn: sklearn model, knn model

    data: park-user matrix

    mapper: dict, map park name to index of the park in data

    fav_park: str, name of user input park

    n_recommendations: int, top n recommendations

    Return
    ------
    list of top n similar park recommendations
    """
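    # fit
    model_knn.fit(data)
    # get input park index
    idx = fuzzy_matching(mapper, fav_park, verbose=True)
    # return an empty list if fuzzy_matching found no park (it returns None in that case)
    if idx is None:
        return []
    # inference
    
    distances, indices = model_knn.kneighbors(data[idx], n_neighbors=n_recommendations+1)
    # get list of raw idx of recommendations: sort by ascending distance, then
    # reverse and drop the first neighbor (the input park itself), so the list
    # runs from the farthest of the n neighbors to the nearest
    raw_recommends = \
        sorted(list(zip(indices.squeeze().tolist(), distances.squeeze().tolist())), key=lambda x: x[1])[:0:-1]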
    # get reverse mapper
    reverse_mapper = {v: k for k, v in mapper.items()}
    
    #initialize empty list
    rec_list = []
    
    for i, (idx, dist) in enumerate(raw_recommends):
        rec_list.append('{0}: {1}, {2}'.format(i+1, reverse_mapper[idx], dist))
    return rec_list
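`fuzzy_matching` keeps a candidate only when `ratio >= 100`, so a misspelled park name finds no match. The sketch below shows how a lower threshold changes that. It uses the standard library's `difflib` as a stand-in for `fuzz.ratio`, so scores may differ somewhat from fuzzywuzzy's, and the helper names `similarity` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score from 0 to 100 (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(names, query, threshold=60):
    """Return the best-scoring name at or above the threshold, else None."""
    scored = [(name, similarity(name, query)) for name in names]
    scored = [pair for pair in scored if pair[1] >= threshold]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])[0]


parks = ['Acadia', 'Arches', 'Zion']

strict = best_match(parks, 'Acadai', threshold=100)   # typo, exact match required
lenient = best_match(parks, 'Acadai', threshold=60)   # typo tolerated
exact = best_match(parks, 'acadia', threshold=100)    # only the case differs

print(strict, lenient, exact)
```

A strict threshold is a reasonable choice when the input comes from a fixed list of park names, as it does in the loop later in this notebook. A lower one suits free-text input.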

In [15]:
# test printing function on Acadia park
make_recommendation(
        model_knn=model_knn,
        data=park_user_mat_sparse,
        fav_park='Acadia',
        mapper=park_to_idx,
        n_recommendations=10)

You have input park: Acadia
Found possible matches in our database: ['Acadia']

Recommendation system start to make inference
......

Recommendations for Acadia:
1: Yosemite, with distance of 0.6311133588232232
2: Cuyahoga Valley, with distance of 0.6196384074702594
3: Dry Tortugas, with distance of 0.6020582269967976
4: Yellowstone, with distance of 0.5787418995717107
5: Zion, with distance of 0.5782640516407724
6: Everglades, with distance of 0.577053844759615
7: Rocky Mountain, with distance of 0.5735352769754769
8: Grand Canyon, with distance of 0.5223487648953334
9: Shenandoah, with distance of 0.5084027702001823
10: Great Smoky Mountains, with distance of 0.49715903976140896
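Note that the printed list runs from the largest distance to the smallest, so item 10 is the park most similar to Acadia. That follows from the `[:0:-1]` slice, which reverses the sorted neighbors and drops the input park. A small sketch with made-up (index, distance) pairs shows the difference between that slice and a nearest-first ordering:

```python
# Hypothetical (park index, cosine distance) pairs; index 0 is the query park.
pairs = [(0, 0.0), (7, 0.50), (3, 0.63), (5, 0.58)]

ranked = sorted(pairs, key=lambda p: p[1])

# Slice used in this notebook: reversed, excluding the query park at position 0.
as_in_notebook = ranked[:0:-1]

# Alternative: skip the query park and keep ascending distance (nearest first).
nearest_first = ranked[1:]

print(as_in_notebook)
print(nearest_first)
```

Either ordering contains the same parks. Only the rank labels change, which matters if the numbering is shown to users as a ranking.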


###### At first we weren't sure whether we would be able to run our ML code in real time on our site, so we decided to store the recommendations for each park in a CSV for future reference.

In [16]:
# create list of parks from df_parks
park_list = df_parks['surveyName'].to_list()

# initialize an empty list for storing recommendations for each park
store_recs = []

# iterate through park list and create and store park recommendations into a list of lists
for park in park_list:
    store = make_recommendation_list(
        model_knn=model_knn,
        data=park_user_mat_sparse,
        fav_park=park,
        mapper=park_to_idx,
        n_recommendations=10)
    store_recs.append(store)

Found possible matches in our database: ['Acadia']

Found possible matches in our database: ['American Samoa']

Found possible matches in our database: ['Arches']

Found possible matches in our database: ['Badlands']

Found possible matches in our database: ['Big Bend']

Found possible matches in our database: ['Biscayne']

Found possible matches in our database: ['Black Canyon of the Gunnison']

Found possible matches in our database: ['Bryce Canyon']

Found possible matches in our database: ['Canyonlands']

Found possible matches in our database: ['Capitol Reef']

Found possible matches in our database: ['Carlsbad Caverns']

Found possible matches in our database: ['Channel Islands']

Found possible matches in our database: ['Congaree']

Found possible matches in our database: ['Crater Lake']

Found possible matches in our database: ['Cuyahoga Valley']

Found possible matches in our database: ['Death Valley']

Found possible matches in our database: ['Denali']

Found possible matches

In [17]:
# check the results of the recommendation store
store_recs

[['1: Yosemite, 0.6311133588232232',
  '2: Cuyahoga Valley, 0.6196384074702594',
  '3: Dry Tortugas, 0.6020582269967976',
  '4: Yellowstone, 0.5787418995717107',
  '5: Zion, 0.5782640516407724',
  '6: Everglades, 0.577053844759615',
  '7: Rocky Mountain, 0.5735352769754769',
  '8: Grand Canyon, 0.5223487648953334',
  '9: Shenandoah, 0.5084027702001823',
  '10: Great Smoky Mountains, 0.49715903976140896'],
 ['1: North Cascades, 0.6541657493774591',
  '2: Gateway Arch, 0.6516334928541911',
  '3: Virgin Islands, 0.6506664941562508',
  '4: Wrangell-St. Elias, 0.6357610901945647',
  '5: Katmai, 0.6150998205402496',
  '6: White Sands, 0.5900466975871532',
  '7: Dry Tortugas, 0.5272083123435347',
  '8: Guadalupe Mountains, 0.4503502900706873',
  '9: Biscayne, 0.37230472096206046',
  '10: Gates of the Arctic, 0.33333333333333337'],
 ['1: Sequoia, 0.5171677051081278',
  '2: Joshua Tree, 0.5024240267864182',
  '3: Death Valley, 0.48192190990918804',
  '4: Mesa Verde, 0.4778478870896635',
  '5: C

In [18]:
# convert recommendations into a Series so it can be added to a DataFrame
store_series = pd.Series(store_recs)

In [19]:
# build new df of recommendations by adding recommendations to df_parks
df_recommendations = pd.concat([df_parks,store_series],axis=1)

In [20]:
# export recommendations df to csv
df_recommendations.to_csv('recommendations.csv')
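
###### A hedged sketch of one thing to watch for when reading a CSV like this back in: a column that holds Python lists is written as text, so each cell comes back as a string. The two-row frame and the `recs` column name below are hypothetical stand-ins, and `ast.literal_eval` is just one way to restore the lists.

```python
import ast
import io

import pandas as pd

# hypothetical two-park frame standing in for the recommendations dataframe
df = pd.DataFrame({
    'surveyName': ['Acadia', 'Arches'],
    'recs': [['1: Yosemite, 0.63', '2: Zion, 0.58'], ['1: Sequoia, 0.52']],
})

# write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

loaded = pd.read_csv(buffer)
# the list column comes back as plain text
was_string = isinstance(loaded.loc[0, 'recs'], str)

# parse each cell back into a real list
loaded['recs'] = loaded['recs'].apply(ast.literal_eval)
print(was_string, loaded.loc[0, 'recs'])
```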

###### When displaying each park's recommendations in an HTML table, a single string with each recommendation on its own line looked best, so we needed to convert the output of our make_recommendation_list function from a list of lists into a list of strings.

In [21]:
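# indices 0-9 for the ten recommendations stored for each park
numbers_to_10 = np.arange(0,10,1)
# create a number list to iterate through each list of recommended parks
number_list = np.arange(len(store_recs))
number_list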

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61])

In [22]:
# initialize an empty list for storing each park's recommendation string
all_string_recs = []

for i in number_list:
    # initialize an empty list for storing value names and recommendation strengths
    park_string_rec = []
    for j in numbers_to_10:
        # isolate each recommendation strength and store it as a percentage with one decimal
        rec_strength = round(float(store_recs[i][j].split(" ")[-1]) * 100,1)
        # isolate each park name and store it as a string
        park = ' '.join(store_recs[i][j].split(" ")[1:-1])
        # append the park name, recommendation strength to the recommendation string
        park_string_rec.append(park)   
        park_string_rec.append(str(rec_strength))
        park_string_rec.append("% match<br>")
    
    # join back together the components and append them to the list of park recommendations
    park_string_together = ''.join(park_string_rec)
    all_string_recs.append(park_string_together)

In [23]:
# convert the park recommendation string list into a series 
store_string_series = pd.Series(all_string_recs)

# check the results of the series of stored park recommendation strings.
# NOTE: we realized later that we needed to do some cleanup to add a space after the comma in each line
store_string_series

0     Yosemite,63.1% match<br>Cuyahoga Valley,62.0% ...
1     North Cascades,65.4% match<br>Gateway Arch,65....
2     Sequoia,51.7% match<br>Joshua Tree,50.2% match...
3     Kings Canyon,59.4% match<br>Arches,58.3% match...
4     Yosemite,66.9% match<br>Sequoia,66.2% match<br...
                            ...                        
57    Denali,70.0% match<br>Guadalupe Mountains,69.1...
58    Dry Tortugas,67.8% match<br>Denali,67.2% match...
59    Crater Lake,52.7% match<br>Joshua Tree,51.8% m...
60    Grand Teton,44.9% match<br>Arches,43.8% match<...
61    Kings Canyon,48.9% match<br>Yellowstone,48.8% ...
Length: 62, dtype: object
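
###### One possible way to do the cleanup mentioned in the note above, sketched under assumptions: `format_recs` is a hypothetical helper, not the function used in the project. It splits each stored string at the first `': '` and the last `', '`, which keeps a space after the comma and leaves park names containing spaces or periods intact.

```python
def format_recs(recs):
    """Turn ['1: Yosemite, 0.6311', ...] into 'Yosemite, 63.1% match<br>...'."""
    lines = []
    for rec in recs:
        # drop the leading rank ("1: ")
        _, rest = rec.split(': ', 1)
        # split the park name from the numeric value at the last comma
        name, value = rest.rsplit(', ', 1)
        strength = round(float(value) * 100, 1)
        lines.append('{0}, {1}% match<br>'.format(name, strength))
    return ''.join(lines)


# sample strings in the same shape as the entries of store_recs
sample = ['1: Yosemite, 0.6311133588232232',
          '4: Wrangell-St. Elias, 0.6357610901945647']
formatted = format_recs(sample)
print(formatted)
```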

In [24]:
# build new df by adding recommendation strings to df_parks
df_recommendation_strings = pd.concat([df_parks,store_string_series],axis=1)

# export recommendations df to csv
df_recommendation_strings.to_csv('recommendation_strings.csv')

###### Getting a sense of just how sparse our matrix is

In [25]:
# calculate total number of entries in the park-user matrix
num_entries = park_user_mat.shape[0] * park_user_mat.shape[1]
# calculate total number of entries with zero values
num_zeros = (park_user_mat==0).sum(axis=1).sum()
# calculate ratio of number of zeros to number of entries
ratio_zeros = num_zeros / num_entries
print('About {:.2%} of ratings in our data are missing'.format(ratio_zeros))

About 83.76% of ratings in our data are missing
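
###### The same ratio can also be read off a SciPy sparse matrix directly. This is a minimal sketch on a made-up 3x4 grid, not the survey data, and it assumes that a 0 means "no rating", as in the calculation above.

```python
import numpy as np
from scipy.sparse import csr_matrix

# made-up park-by-user grid; 0 stands for a missing rating
dense = np.array([
    [5.0, 0.0, 0.0, 3.5],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
sparse = csr_matrix(dense)

# total number of cells in the matrix
num_entries = sparse.shape[0] * sparse.shape[1]
# share of cells that hold no rating
ratio_zeros = 1 - sparse.count_nonzero() / num_entries

print('About {:.2%} of the cells are empty'.format(ratio_zeros))
```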
