In [2]:
!pip install haversine

Collecting haversine
  Downloading haversine-2.9.0-py2.py3-none-any.whl.metadata (5.8 kB)
Downloading haversine-2.9.0-py2.py3-none-any.whl (7.7 kB)
Installing collected packages: haversine
Successfully installed haversine-2.9.0


In [19]:
#import packages 
import pandas as pd
import haversine as hs
from haversine import Unit
from geopy.distance import distance
import numpy as np
import math

In [53]:
#Obtain locations dataframe from .csv file 
location_url = "https://raw.githubusercontent.com/cpflynn45/Final-Project/refs/heads/main/locations_df2_filtered.csv?token=GHSAT0AAAAAACYDCC5P2OUXGZSTHZQG6NGIZ23IF2Q"
locations_df = pd.read_csv(location_url)
print(locations_df.shape)

(37, 15)


In [54]:
#Obtain location review ratings dataframe from .csv file 
location_ratings_url = "https://raw.githubusercontent.com/cpflynn45/Final-Project/refs/heads/main/location_scores2.csv?token=GHSAT0AAAAAACYDCC5OA4WXQXADSG2PAYLSZ23IGDA"
reviews_df = pd.read_csv(location_ratings_url)
print(reviews_df.shape)

(37, 10)


In [153]:
pip install geopy

Collecting geopy
  Downloading geopy-2.4.1-py3-none-any.whl.metadata (6.8 kB)
Collecting geographiclib<3,>=1.52 (from geopy)
  Downloading geographiclib-2.0-py3-none-any.whl.metadata (1.4 kB)
Downloading geopy-2.4.1-py3-none-any.whl (125 kB)
Downloading geographiclib-2.0-py3-none-any.whl (40 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-2.0 geopy-2.4.1
Note: you may need to restart the kernel to use updated packages.


In [98]:
'''
The reccomendation class runs a series of methods and ultimately returns the
top 3 recommended study spot locations for the user, along with the distance to each location and its
similarity score.
'''
class reccomendation:
    '''
    The __init__ method initializes variables such as the user's location, the dataframes containing the
    locations and their ratings, and the maximum distance the user is willing to travel.
    Inputs: The user's latitude and longitude coordinates, the dataframe including each study spot location,
    the dataframe including each study spot's ratings, and the max distance the user is willing to travel in miles.
    '''
    def __init__(self, user_location_latitude, user_location_longitude, locations_dataframe, reviews_df, max_distance):
        #User's starting location using latitude and longitude coordinates. Convert to float. 
        self.user_location_latitude = float(user_location_latitude)
        self.user_location_longitude = float(user_location_longitude)
        self.user_location = (float(user_location_latitude), float(user_location_longitude))
        #Contains each study spot location data
        self.locations_dataframe = locations_dataframe

        #Contains each study spot's ratings 
        self.reviews_df = reviews_df
        #User's maximum distance range, in miles. 
        self.max_distance = max_distance

    '''
    The distance function loops through each location in the locations_dataframe and calculates
    the distance between the user and the location in miles. If the distance is within the user's maximum
    range, the distance is appended to the distance_list; otherwise None is appended. The locations dataframe
    is filtered to rows with a distance value, sorted from closest to furthest, and merged with the
    ratings in reviews_df.
    Output: A dataframe including only locations within the user's maximum distance range,
    sorted from closest to furthest, with each location's feature ratings and distance.
    '''
    def distance(self):
        #Define loc1 as the user's location
        loc1 = self.user_location
        #Initialize list that will include distance between user and each location 
        distance_list = []

        #Loop through each location, using row as an index
        for index, row in self.locations_dataframe.iterrows(): 
            loc2_latitude = row['gps_coordinates.latitude']
            loc2_longitude = row['gps_coordinates.longitude']
            loc2 = (loc2_latitude, loc2_longitude)

            #Calculate distance between locations using haversine function, setting unit to miles
            #haversine calculates the great-circle distance between two points on a sphere
            calculated_distance = hs.haversine(loc1, loc2, unit=Unit.MILES)

            #Append calculated distance if within range, otherwise append "None" 
            if calculated_distance <= self.max_distance:
                distance_list.append(calculated_distance)
            else:
                distance_list.append(None)

        #Add the calculated distances to a new column in the dataframe
        self.locations_dataframe['distance'] = distance_list

        #Filter the dataframe to include distance only in user's range, ie distance
        #with values that are not "None". 
        filtered_df = self.locations_dataframe.dropna(subset=['distance'])

        #Sort the filtered dataframe by distance
        sorted_df = filtered_df.sort_values(by='distance')

        #Merge sorted_df and reviews_df on the shared column, "place_id", keeping only desired columns
        sorted_df = pd.merge(sorted_df, self.reviews_df, on='place_id', how='left')[['Place_Name', 'place_id', 
                            'Internet', 'Price', 'Lighting', 'Noise', 'Comfort', 'Food', 'Service', 'Atmosphere', 'distance',]]

    
    
        return sorted_df

    
    '''
    The user inputs a rating for each feature and the function
    converts the ratings into a one-row dataframe.
    Input: rating_type, either "Hard Requirement" (minimum acceptable rating per feature) or any other
    value (importance of each feature); the user then enters each rating on a scale of 1-5
    Output: user_ratings_df, a dataframe with one row in which each column is a different feature
    '''
    def user_input(self, rating_type):
        if rating_type == "Hard Requirement":
            print("Please input your minimum requirement rating for each feature")
        
        else:
            print("Please input your rating for each feature based on personal importance")

        print("Rate each feature on a scale of 1-5 (1 is least, 5 is greatest):")
        #Initialize features list with each study spot feature
        features = ['Internet', 'Price', 'Lighting', 'Noise', 'Comfort', 'Food', 'Service', 'Atmosphere']
        #Initialize dictionary storing ratings for each feature
        user_ratings = {}
        
        #Loop through each feature in the features list and ask for user input on importance of each feature 
        for feature in features: 
            rating = int(input(f"Rate {feature} (1-5): "))
            #Add error to ensure ratings are on scale of 1-5 
            if rating < 1 or rating > 5: 
                raise ValueError("Rating must be within 1-5")
            #Store user's rating as the value for the key in the dictionary 
            user_ratings[feature] = rating
        
        #Convert the user ratings into a dataframe
        user_ratings_df = pd.DataFrame(user_ratings, index=[0])
        return user_ratings_df  
        
    def get_hard_requriement(self):
        return self.user_input("Hard Requirement")

    def get_importance_rating(self):
        return self.user_input("importance")

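    '''
    The filtered_requirements function keeps only the locations whose rating for every feature is at
    least the user's minimum requirement. Missing ratings are skipped rather than treated as failures.
    Input: sorted_df, the locations dataframe with feature ratings, and hard_req_df, the one-row
    dataframe of minimum ratings
    Output: a dataframe including only the locations that meet every hard requirement
    '''
    def filtered_requirements(self, sorted_df, hard_req_df):
        #Feature columns to check, taken from the hard requirements dataframe
        features = list(hard_req_df.columns)

        #Work on a copy and convert the feature ratings to numbers (invalid values become NaN)
        filtered_df = sorted_df.copy()
        filtered_df[features] = filtered_df[features].apply(pd.to_numeric, errors='coerce')
        hard_req = hard_req_df.apply(pd.to_numeric, errors='coerce').iloc[0]

        #Compare each feature with the requirement for the same feature, matching by column name;
        #a missing (NaN) rating does not disqualify a location
        meets_req = filtered_df[features].ge(hard_req, axis=1) | filtered_df[features].isna()

        #Keep only the rows where every feature passes
        return filtered_df[meets_req.all(axis=1)].copy()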

        
    '''
    The comparison function converts the user ratings to a numpy array and computes the cosine similarity
    between the user's preference vector and each study spot's rating vector.
    Input: user_ratings, a one-row dataframe scoring the importance of each feature on a scale of 1-5, and
    location_ratings_dataframe, a dataframe with one row of feature ratings per study spot
    Output: comparison_column, a list with one similarity value per study spot (higher means more similar)
    '''
    
    def compare(self, user_ratings, location_ratings_dataframe): 
        common_columns = ['Internet', 'Price', 'Lighting', 'Noise', 'Comfort', 'Food', 'Service', 'Atmosphere']
        
        comparison_column = []
        
        #Convert the user dataframe to a 1D numpy array of shape (8,) and normalize it to unit length
        common_user_numpy = user_ratings[common_columns].to_numpy(dtype=float).flatten()
        common_user_normalized = common_user_numpy / np.linalg.norm(common_user_numpy)

        for index, row in location_ratings_dataframe.iterrows():
            #Convert the location's feature ratings to a 1D numpy array of shape (8,)
            common_location_numpy = row[common_columns].to_numpy(dtype=float)

            #Normalize the location ratings to unit length
            common_location_normalized = common_location_numpy / np.linalg.norm(common_location_numpy)

            #Cosine similarity: dot product of the two unit vectors
            comparison_value = np.dot(common_location_normalized, common_user_normalized)

            #Weighted variant of the location ratings (not currently used)
            #weighted_location_ratings = common_location_numpy * common_user_numpy 

            #Add value to list
            comparison_column.append(comparison_value)

        
        return comparison_column

#Get user inputs for location, distance, and ratings
user_location_latitude = input("Enter the latitude coordinate of your starting point: ")
user_location_longitude = input("Enter the longitude coordinate of your starting point: ")
max_distance = float(input("How far are you willing to travel? (in miles): "))

#Create an instance of the `reccomendation` class
reccomendation_instance = reccomendation(user_location_latitude, user_location_longitude, locations_df, reviews_df, max_distance)


#Calculate distances and filter for the locations within specified range 
sorted_df = reccomendation_instance.distance()

#Get user's hard requirement values for each feature in a dataframe 
hard_requirement = reccomendation_instance.get_hard_requriement()


#Update sorted dataframe to only include locations that meet the hard requirements 
sorted_df = reccomendation_instance.filtered_requirements(sorted_df, hard_requirement)

#Get user's importance rating values for each feature in a dataframe 
importance_rating = reccomendation_instance.get_importance_rating()

#Use the compare function to compare user ratings with location ratings
sorted_df['comparison_value'] = reccomendation_instance.compare(importance_rating, sorted_df)  


#Round values to 4 decimal points
sorted_df['comparison_value'] = sorted_df['comparison_value'].round(4)

#Sort locations based on highest to lowest comparison values 
locations_df_sorted = sorted_df.sort_values(by='comparison_value', ascending=False)


#Show top 3 recommended places
print("\nTop 3 user recommendations based on your preferences:")
for i, row in locations_df_sorted.head(3).iterrows():
    distance = row['distance']
    print(f"{row['Place_Name']}. Distance is {distance:.2f} miles. Comparison Value is  {row['comparison_value']}")

#Test coordinates = UCLA, (34.070002762333004, -118.4438392271015)

Enter the latitude coordinate of your starting point:  34.070002762333004
Enter the longitude coordinate of your starting point:  -118.4438392271015
How far are you willing to travel? (in miles):  5


Please input your minimum requirement rating for each feature
Rate each feature on a scale of 1-5 (1 is least, 5 is greatest):


Rate Internet (1-5):  1
Rate Price (1-5):  2
Rate Lighting (1-5):  1
Rate Noise (1-5):  2
Rate Comfort (1-5):  1
Rate Food (1-5):  2
Rate Service (1-5):  1
Rate Atmosphere (1-5):  1


Please input your rating for each feature based on personal importance
Rate each feature on a scale of 1-5 (1 is least, 5 is greatest):


Rate Internet (1-5):  2
Rate Price (1-5):  3
Rate Lighting (1-5):  2
Rate Noise (1-5):  4
Rate Comfort (1-5):  5
Rate Food (1-5):  2
Rate Service (1-5):  3
Rate Atmosphere (1-5):  1



Top 3 user recommendations based on your preferences:
Caffe Luxxe. Distance is 1.56 miles. Comparison Value is  0.9349
Elysee Bakery & Cafe. Distance is 0.70 miles. Comparison Value is  0.8832
SIPP COFFEE HOUSE. Distance is 1.06 miles. Comparison Value is  0.8683
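
A minimal sketch of the hard-requirement check in plain Python, using the minimums entered in the run above. The three spots and their ratings are hypothetical, made up for illustration, and the helper `meets_requirements` is not part of the notebook. It only illustrates the intended rule: a location is kept when every rated feature is at least the user's minimum, and missing ratings are skipped.

```python
FEATURES = ['Internet', 'Price', 'Lighting', 'Noise', 'Comfort', 'Food', 'Service', 'Atmosphere']

def meets_requirements(ratings, minimums):
    """Return True if every rated feature is at least the user's minimum.

    Features with a missing rating (None) are skipped, not treated as failures.
    """
    for feature in FEATURES:
        value = ratings.get(feature)
        if value is None:
            continue
        if value < minimums[feature]:
            return False
    return True

# Minimums entered in the run above: 1, 2, 1, 2, 1, 2, 1, 1
minimums = dict(zip(FEATURES, [1, 2, 1, 2, 1, 2, 1, 1]))

# Hypothetical ratings for three made-up spots (not taken from the dataset)
spot_a = dict(zip(FEATURES, [4, 3, 5, 2, 4, 3, 4, 5]))     # meets every minimum
spot_b = dict(zip(FEATURES, [5, 5, 5, 1, 5, 5, 5, 5]))     # Noise 1 is below the minimum of 2
spot_c = dict(zip(FEATURES, [3, None, 4, 2, 3, 2, 1, 1]))  # Price is unrated, so it is skipped

for name, spot in [("spot_a", spot_a), ("spot_b", spot_b), ("spot_c", spot_c)]:
    print(name, meets_requirements(spot, minimums))
```

Each feature is matched to its minimum by name rather than by position, so the check does not depend on the column order of either table.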


Example output with haversine distance, a user rating of [5,4,3,2,4,5,2,3], and a max distance of 3 miles:
Top 3 user recommendations based on your preferences:
Bluestone Lane Westwood Coffee Shop. Distance is 0.51 miles. Comparison Value is  0.9092
Teaspoon Sawtelle. Distance is 2.01 miles. Comparison Value is  0.8944
Court of Sciences Store. Distance is 0.14 miles. Comparison Value is  0.8933

Google Maps distances (walking):
Bluestone Lane Westwood Coffee Shop: 0.6 miles
Teaspoon Sawtelle: 2.5 miles
Court of Sciences Store: 0.2 miles
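
The haversine figures are shorter than the walking distances, which is expected: haversine measures the great-circle distance over the Earth's surface, while a walking route follows streets. Below is a minimal pure-Python sketch of the formula. The function name `haversine_miles`, the radius constant, and the second point are assumptions for illustration; the haversine package may use a slightly different radius.

```python
import math

# Assumption: mean Earth radius in miles (the haversine package may use a slightly different value)
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(point1, point2):
    """Great-circle distance in miles between two (latitude, longitude) pairs given in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Test coordinates from the notebook (UCLA)
ucla = (34.070002762333004, -118.4438392271015)
# Hypothetical second point, 0.01 degrees of latitude north of the first
nearby = (ucla[0] + 0.01, ucla[1])

print(f"{haversine_miles(ucla, nearby):.2f} miles")
```

Because the result ignores roads, hills, and buildings, it is best read as a lower bound on the real travel distance.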

Euclidean distances:
Bluestone Lane Westwood Coffee Shop: . miles
Teaspoon Sawtelle: 3.3 miles by car
Court of Sciences Store: 0.2 miles

Euclidean [5, 4, 5, 5, 2, 5, 4, 3, 5]
The Study. Distance is 0.53 miles. Comparison Value is  0.9046
Bluestone Lane Westwood Coffee Shop. Distance is 0.51 miles. Comparison Value is  0.8869
Verve Coffee Roasters. Distance is 3.31 miles. Comparison Value is  0.8842
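
The comparison values in these outputs are cosine similarities between the user's rating vector and each spot's rating vector. A minimal pure-Python sketch of that calculation follows; the helper `cosine_similarity` and the spot ratings are hypothetical, chosen only to show how the score behaves, while the user vector is the [5,4,3,2,4,5,2,3] rating from the haversine example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# User rating from the haversine example
user = [5, 4, 3, 2, 4, 5, 2, 3]

# Hypothetical spot ratings (not taken from the dataset)
all_ones = [1] * 8    # rated 1 on every feature
all_fives = [5] * 8   # rated 5 on every feature

# Both spots point in the same direction, so they get the same score
print(round(cosine_similarity(user, all_ones), 4))
print(round(cosine_similarity(user, all_fives), 4))
```

Cosine similarity compares the shape of two rating profiles and ignores their overall magnitude, so a spot rated 1 everywhere scores the same as one rated 5 everywhere. This is one reason a separate minimum-rating filter is useful alongside it.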

In [None]:
Haversine 