# Location-Based Model

The following sources were used to build this model:

- https://stackoverflow.com/questions/41336756/find-the-closest-latitude-and-longitude
- https://towardsdatascience.com/location-based-recommendations-bb70af7b1538

To prepare the model, the necessary packages were imported and `data/all_beer_df.csv` was read into a dataframe. The dataframe was then filtered to the columns the recommendation system needs: latitude, longitude, brewery name, beer name, and overall review score.

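Next, a KNNBasic model from the `surprise` package was trained on a train-test split that holds out 25% of the ratings for testing (set explicitly with `test_size=.25`). In this setup each brewery plays the role of the "user" and each beer the "item". KNNBasic was chosen because it scored better than the alternatives tried on mean absolute error (MAE) and mean squared error (MSE). Five-fold cross-validation was also run to check for overfitting, and the scores were consistent across folds (RMSE 0.5927 ± 0.0014, MAE 0.4411 ± 0.0009).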
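The error metrics above are plain averages over the held-out predictions. The sketch below computes them from (true rating, predicted rating) pairs in plain Python; `rating_errors` is a hypothetical helper, not part of `surprise`, and the toy ratings are invented for illustration.

```python
from math import sqrt


def rating_errors(pairs):
    """Return (MSE, RMSE, MAE) for a list of (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    errors = [true - est for true, est in pairs]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return mse, sqrt(mse), mae                       # RMSE is the square root of MSE


# toy predictions, invented for illustration only
toy = [(4.5, 4.0), (3.5, 4.0), (4.0, 4.0), (5.0, 4.0)]
mse, rmse, mae = rating_errors(toy)
print(mse, rmse, mae)
```

Because RMSE is the square root of MSE, the hold-out MSE of 0.3540 corresponds to an RMSE of about 0.595, in line with the cross-validated RMSE of about 0.593.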

##### Finding the nearest breweries with the haversine formula

The `recommend_beers` function uses the haversine formula to measure how far each brewery is from the input location. It converts the input latitude and longitude to radians, then iterates through every row of the dataframe and computes the great-circle distance between the input location and that row's brewery. The distances are collected in a list called `distances` and attached to the dataframe as a `distance` column.
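For reference, the haversine distance between points (φ1, λ1) and (φ2, λ2) is d = 2R·atan2(√a, √(1 − a)), where a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2) and R ≈ 6371 km. Below is a standalone version of the per-row calculation; the name `haversine_km` is hypothetical, and clamping `a` to [0, 1] is an added safeguard against floating-point round-off that the notebook's loop does not include.

```python
from math import radians, sin, cos, sqrt, atan2

EARTH_RADIUS_KM = 6371.0  # same mean Earth radius used in recommend_beers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2) - radians(lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    a = min(1.0, max(0.0, a))  # guard against round-off pushing a outside [0, 1]
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


# one degree of longitude along the equator is about 111.19 km
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```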

The function then sorts the rows by distance and drops duplicate brewery names, which leaves the five breweries closest to the input coordinates. For each of those breweries it counts the reviews per beer and prints the five most-reviewed beers (via `value_counts()`), so the user gets both a destination and a list of beers to try. Note that this ranks beers by number of reviews, not by average `review_overall` score.
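The "sort by distance, then drop duplicate breweries" step amounts to keeping each brewery's smallest distance and taking the k smallest. A minimal plain-Python sketch of that idea, assuming the rows arrive as hypothetical `(brewery_name, lat, long)` tuples rather than a dataframe; `heapq.nsmallest` picks the k closest without sorting every brewery.

```python
import heapq
from math import radians, sin, cos, sqrt, atan2


def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    a = min(1.0, max(0.0, a))
    return r * 2 * atan2(sqrt(a), sqrt(1 - a))


def nearest_breweries(lat, lon, rows, k=5):
    """Return the k closest distinct breweries as (distance_km, name) tuples.

    rows: iterable of (brewery_name, lat, long), one per review (hypothetical format).
    """
    best = {}  # brewery name -> smallest distance seen, like sort + drop_duplicates
    for name, blat, blon in rows:
        d = haversine_km(lat, lon, blat, blon)
        if name not in best or d < best[name]:
            best[name] = d
    return heapq.nsmallest(k, ((d, name) for name, d in best.items()))


# invented rows on the equator, for illustration only
rows = [("A", 0.0, 1.0), ("A", 0.0, 0.5), ("B", 0.0, 2.0), ("C", 0.0, 0.1)]
print([name for _, name in nearest_breweries(0.0, 0.0, rows, k=2)])
```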

In [1]:
```python
# import necessary packages
import pandas as pd
```

In [2]:
```python
# read in all_beer_df.csv
all_beer_df = pd.read_csv('data/all_beer_df.csv', low_memory=False)
```

In [3]:
```python
# keep ONLY the columns the recommender needs: lat, long, brewery_name, beer_name and review_overall
columns_to_include = ['lat', 'long', 'brewery_name', 'beer_name', 'review_overall']

# create new dataframe with selected columns
locate_df = all_beer_df.loc[:, columns_to_include]
locate_df.head()
```

|   | lat | long | brewery_name | beer_name | review_overall |
|---|---|---|---|---|---|
| 0 | 37.711807 | -122.177658 | 21st Amendment Brewery | Hop Caen | 4.5 |
| 1 | 37.711807 | -122.177658 | 21st Amendment Brewery | 21 Rock | 3.5 |
| 2 | 37.711807 | -122.177658 | 21st Amendment Brewery | Harvest Moon | 4.0 |
| 3 | 37.711807 | -122.177658 | 21st Amendment Brewery | 21st Amendment IPA | 4.0 |
| 4 | 37.711807 | -122.177658 | 21st Amendment Brewery | Monk's Blood | 4.0 |


In [4]:
```python
# build a ratings dataset: brewery_name acts as the "user", beer_name as the "item"
# and review_overall as the rating, then fit a KNNBasic model
from surprise import KNNBasic
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import train_test_split

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(locate_df[['brewery_name', 'beer_name', 'review_overall']], reader)

# hold out 25% of the ratings for testing
trainset, testset = train_test_split(data, test_size=.25)

# user-based KNN with cosine similarity (similarity is computed between breweries)
sim_options = {'name': 'cosine', 'user_based': True}
model = KNNBasic(sim_options=sim_options)
model.fit(trainset)
predictions = model.test(testset)

accuracy.mse(predictions)
accuracy.mae(predictions)
```

```
Computing the cosine similarity matrix...
Done computing similarity matrix.
MSE: 0.3540
MAE:  0.4419

0.44193186454112665
```

In [5]:
```python
# checking for overfitting with 5-fold cross-validation
from surprise.model_selection import cross_validate

cross_validate(model, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```

```
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.5924  0.5901  0.5937  0.5938  0.5934  0.5927  0.0014  
MAE (testset)     0.4407  0.4402  0.4409  0.4428  0.4410  0.4411  0.0009  
Fit time          1.32    1.35    1.32    1.32    1.31    1.32    0.01    
Test time         9.58    9.33    9.02    9.10    9.22    9.25    0.20    


{'test_rmse': array([0.59237581, 0.59005371, 0.59374689, 0.59382234, 0.59344682]),
 'test_mae': array([0.44068214, 0.44017829, 0.44087082, 0.44281595, 0.44102904]),
 'fit_time': (1.3219170570373535,
  1.3513970375061035,
  1.3203928470611572,
  1.3194010257720947,
  1.3081858158111572),
 'test_time': (9.57987904548645,
  9.333570003509521,
  9.018429279327393,
  9.097063779830933,
  9.21616816520691)}
```
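The Mean and Std columns can be reproduced from the per-fold arrays, which makes "consistent" concrete: the RMSE spread across folds is roughly 0.2% of its mean. The sketch below uses Python's `statistics` module; the population standard deviation (`pstdev`) is the variant that matches the reported 0.0014 and 0.0009. Low fold-to-fold variance shows the scores are stable; comparing training error against test error would be a more direct overfitting check.

```python
from statistics import mean, pstdev

# per-fold scores copied from the cross_validate output above
fold_rmse = [0.59237581, 0.59005371, 0.59374689, 0.59382234, 0.59344682]
fold_mae = [0.44068214, 0.44017829, 0.44087082, 0.44281595, 0.44102904]

for label, scores in (("RMSE", fold_rmse), ("MAE", fold_mae)):
    # population standard deviation (divide by n), which matches the Std column
    print(f"{label}: mean={mean(scores):.4f} std={pstdev(scores):.4f}")
```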

In [6]:
```python
# plan for the recommendation function: find the 5 closest breweries using the lat and long columns
# and list the 5 most-reviewed beers from each, without repeating breweries
# source: https://stackoverflow.com/questions/41336756/find-the-closest-latitude-and-longitude
# use the haversine formula to compute the distance from the input point to every brewery
# sort by distance and use .drop_duplicates() so each brewery appears only once
# use .value_counts() to rank each brewery's beers by number of reviews
# use .index[:5].tolist() to return the top 5 beer names as a list
```

In [7]:
```python
from math import radians, sin, cos, sqrt, atan2
import pandas as pd

def recommend_beers(latitude, longitude, locate_df):
    # convert the input latitude and longitude to radians
    lat1 = radians(latitude)
    lon1 = radians(longitude)

    # calculate distances using the haversine formula
    R = 6371.0  # mean radius of the Earth in km
    distances = []
    for lat2, lon2 in zip(locate_df['lat'], locate_df['long']):
        lat2 = radians(lat2)
        lon2 = radians(lon2)
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        distance = R * c
        distances.append(distance)

    # add distances to a copy so the caller's dataframe is not modified
    df = locate_df.copy()
    df['distance'] = distances

    # find the 5 closest breweries (nearest row per brewery, duplicates dropped)
    closest_breweries = df.sort_values('distance').drop_duplicates('brewery_name').head(5)

    # list the 5 most-reviewed beers from each brewery
    for i, brewery in enumerate(closest_breweries['brewery_name']):
        beers = df.loc[df['brewery_name'] == brewery, 'beer_name']
        beers = beers.value_counts().index[:5].tolist()
        print(f"Brewery {i+1}: {brewery}")
        print(f"Top 5 beers: {beers}\n")
```


#### Testing the function with the coordinates of Flatiron School

In [8]:
```python
latitude = 40.70542603644052
longitude = -74.01387181694061
recommend_beers(latitude, longitude, locate_df)
```

```
Brewery 1: Brooklyn Brewery
Top 5 beers: ['Brooklyn Black Chocolate Stout', 'Brooklyn Lager', 'Brooklyn Monster Ale', 'Brooklyn Brown Ale', 'Brooklyn East India Pale Ale']

Brewery 2: Trap Rock Restaurant & Brewery
Top 5 beers: ['Kestrel&#8217;s Joy IPA', 'Ghost Pony Helles Lager', 'Chainsaw Double IPA', 'Hathor Red Lager', 'Schroeder Weiss']

Brewery 3: Port Jeff Brewing Company
Top 5 beers: ['Port Jeff Porter', 'Low Tide Black IPA', 'Schooner Ale', 'Boo Brew', 'Cold North Wind']

Brewery 4: Blue Point Brewing Company
Top 5 beers: ['Hoptical Illusion', 'Blue Point Toasted Lager', 'Blue Point Rastafa Rye Ale', 'Blue Point Blueberry', 'Blue Point Winter Ale']

Brewery 5: The Ship Inn
Top 5 beers: ['Best Bitter', 'Extra Special Bitter', 'Golden Wheat Light', 'Killer Bee', 'Spring Mild']
```
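The second brewery's beer name `Kestrel&#8217;s Joy IPA` still contains an HTML character reference (`&#8217;` is a right single quotation mark). Python's standard-library `html.unescape` decodes such references; the sketch below shows it on that name. Applying it to the whole column, for example with `locate_df['beer_name'].map(html.unescape)`, is a suggested cleanup step, not something the notebook currently does; `clean_beer_name` is a hypothetical helper.

```python
import html


def clean_beer_name(name):
    """Decode HTML character references such as &#8217; left in the beer names."""
    return html.unescape(name)


print(clean_beer_name("Kestrel&#8217;s Joy IPA"))  # Kestrel’s Joy IPA
```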



### Checking that the function works as intended

In [9]:
```python
# list the 5 most-reviewed beers from a specific brewery (ranked by review count)

def top_rated_beers(brewery_name, locate_df):
    beers = locate_df.loc[locate_df['brewery_name'] == brewery_name, 'beer_name']
    beers = beers.value_counts().index[:5].tolist()
    print(f"Top 5 beers from {brewery_name}: {beers}\n")

top_rated_beers('Glacier Brewhouse', locate_df)
```

```
Top 5 beers from Glacier Brewhouse: ['India Pale Ale', 'Big Woody Barley Wine', 'Ice Axe Ale (Imperial Blonde)', 'Amber Ale', 'Oatmeal Stout']
```
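Both functions rank beers by how often they were reviewed, which is not the same as how highly they were rated. If "top rated" should mean the highest average `review_overall`, one option is to rank by mean score with a minimum number of reviews. The plain-Python sketch below contrasts the two rankings; `most_reviewed` and `top_beers_by_rating` are hypothetical helpers, the `min_reviews` threshold is an assumed tuning knob, and the review pairs are invented for illustration.

```python
from collections import Counter, defaultdict


def most_reviewed(reviews, n=5):
    """Rank beers by review count, mirroring value_counts().index[:n]."""
    return [beer for beer, _ in Counter(beer for beer, _ in reviews).most_common(n)]


def top_beers_by_rating(reviews, n=5, min_reviews=1):
    """Rank beers by mean rating; ties broken by review count, then name."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for beer, score in reviews:
        totals[beer] += score
        counts[beer] += 1
    eligible = [b for b in counts if counts[b] >= min_reviews]
    eligible.sort(key=lambda b: (-totals[b] / counts[b], -counts[b], b))
    return eligible[:n]


# invented (beer_name, review_overall) pairs for one brewery
reviews = [("Lager", 3.0), ("Lager", 3.5), ("Lager", 3.0),
           ("Stout", 4.5), ("Stout", 4.0), ("Porter", 5.0)]
print(most_reviewed(reviews))                       # ['Lager', 'Stout', 'Porter']
print(top_beers_by_rating(reviews))                 # ['Porter', 'Stout', 'Lager']
print(top_beers_by_rating(reviews, min_reviews=2))  # ['Stout', 'Lager']
```

In pandas the same idea could be expressed with `groupby('beer_name')['review_overall'].agg(['mean', 'count'])` followed by filtering on `count` and `sort_values('mean', ascending=False)`.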

