# Location-Based Model

This model was built with the help of the sources listed below.

The first step was to import the necessary packages and read the CSV file into a dataframe. The dataframe is then cut down to the columns the recommendation system needs: latitude, longitude, brewery name, beer name, and the overall review score of each beer.

We then fit a KNNBasic model from the surprise package on a train-test split with test_size=.25, giving 75% training data and 25% testing data. The model is fit on (brewery_name, beer_name, review_overall) triples, so brewery_name takes the place of the user id. KNNBasic was used because it performed better on the two metrics considered, mean absolute error (MAE) and mean squared error (MSE).

The recommendation function draws on these sources:
https://stackoverflow.com/questions/41336756/find-the-closest-latitude-and-longitude
https://towardsdatascience.com/location-based-recommendations-bb70af7b1538

##### The recommend_beers function uses the haversine formula to measure distances.

The recommend_beers function converts the input latitude and longitude to radians and loops over every row of the dataframe. For each row it calculates the distance in kilometres between the input location and the brewery using the haversine formula, and stores the result in a list called distances.
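For reference, the calculation inside the loop can be written out as below. The symbols are chosen here for illustration: $\varphi$ is latitude and $\lambda$ is longitude, both in radians, subscript 1 is the input location, subscript 2 is a brewery, and $R = 6371$ km is the Earth radius used in the code.

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
$$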

The function then sorts the dataframe by distance and drops duplicated brewery_name values, which leaves the 5 closest breweries to the input latitude and longitude.
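The sort-and-deduplicate step can be sketched on a small made-up dataframe. The brewery names, beer names and distances here are hypothetical and not taken from all_beer_df.csv; the sketch keeps 3 breweries instead of 5 only to keep the toy data short.

```python
import pandas as pd

# Toy data (hypothetical): one row per review, distance already computed in km.
toy_df = pd.DataFrame({
    "brewery_name": ["A", "A", "B", "C", "C", "D"],
    "beer_name": ["a1", "a2", "b1", "c1", "c2", "d1"],
    "distance": [12.0, 12.0, 3.5, 40.0, 40.0, 7.25],
})

# Sort ascending by distance, keep the first row per brewery,
# then take the nearest three breweries.
nearest = (
    toy_df.sort_values("distance")
    .drop_duplicates("brewery_name")
    .head(3)
)
nearest_names = nearest["brewery_name"].tolist()
print(nearest_names)  # ['B', 'D', 'A']
```

Because every review row of a brewery carries the same distance, it does not matter which of its rows survives drop_duplicates; only the brewery name is used afterwards.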

For each of the 5 breweries, it then lists 5 of that brewery's beers, so the user gets a destination as well as what each one offers. Note that the beers are ranked with value_counts(), that is, by how many reviews they have, not by their average rating.
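The per-brewery beer list in the function is built with value_counts(), which orders beers by number of reviews. The sketch below, on made-up reviews, shows how that ranking can differ from one based on mean review_overall. The groupby variant is a hypothetical alternative for comparison, not what the notebook runs.

```python
import pandas as pd

# Toy reviews (hypothetical) for a single brewery "X".
reviews = pd.DataFrame({
    "brewery_name": ["X"] * 6,
    "beer_name": ["Pale Ale", "Pale Ale", "Pale Ale", "Stout", "Stout", "Porter"],
    "review_overall": [3.0, 3.5, 3.0, 4.5, 4.0, 5.0],
})
one_brewery = reviews[reviews["brewery_name"] == "X"]

# Ranking by number of reviews (what value_counts() does in the notebook).
most_reviewed = one_brewery["beer_name"].value_counts().index[:2].tolist()

# Ranking by mean rating (hypothetical alternative).
best_rated = (
    one_brewery.groupby("beer_name")["review_overall"]
    .mean()
    .sort_values(ascending=False)
    .index[:2]
    .tolist()
)

print(most_reviewed)  # ['Pale Ale', 'Stout']
print(best_rated)     # ['Porter', 'Stout']
```

A mean-based ranking is sensitive to beers with a single review, so in practice it would probably need a minimum review count; that threshold is left out of this sketch.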

In [1]:
#import necessary packages
import pandas as pd

In [2]:
# read in all_beer_df.csv 
all_beer_df = pd.read_csv('data/all_beer_df.csv', low_memory=False)

In [3]:
# create new pd.dataframe of ONLY columns lat, long, brewery_name, beer_name and review_overall 
columns_to_include = ['lat', 'long', 'brewery_name', 'beer_name', 'review_overall']

# create new dataframe with selected columns
locate_df = all_beer_df.loc[:, columns_to_include]
locate_df.head()

| | lat | long | brewery_name | beer_name | review_overall |
|---|---|---|---|---|---|
| 0 | 37.711807 | -122.177658 | 21st Amendment Brewery | Hop Caen | 4.5 |
| 1 | 37.711807 | -122.177658 | 21st Amendment Brewery | 21 Rock | 3.5 |
| 2 | 37.711807 | -122.177658 | 21st Amendment Brewery | Harvest Moon | 4.0 |
| 3 | 37.711807 | -122.177658 | 21st Amendment Brewery | 21st Amendment IPA | 4.0 |
| 4 | 37.711807 | -122.177658 | 21st Amendment Brewery | Monk's Blood | 4.0 |


In [4]:
# build a surprise dataset of (user, item, rating) triples;
# brewery_name stands in for the user id, beer_name is the item
# run a model with KNNBasic
from surprise import KNNBasic
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import train_test_split

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(locate_df[['brewery_name', 'beer_name', 'review_overall']], reader)

trainset, testset = train_test_split(data, test_size=.25)

sim_options = {'name': 'cosine', 'user_based': True}
model = KNNBasic(sim_options=sim_options)
model.fit(trainset)
predictions = model.test(testset)

accuracy.mse(predictions)
accuracy.mae(predictions)

Computing the cosine similarity matrix...
Done computing similarity matrix.
MSE: 0.3525
MAE:  0.4405


0.4404953647352639

In [5]:
# Checking for overfitting with cross validation
from surprise.model_selection import cross_validate

cross_validate(model, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.5929  0.5903  0.5994  0.5951  0.5904  0.5936  0.0034  
MAE (testset)     0.4419  0.4397  0.4450  0.4434  0.4399  0.4420  0.0020  
Fit time          1.29    1.32    1.33    1.33    1.33    1.32    0.02    
Test time         8.26    8.60    8.76    8.64    8.50    8.55    0.17    


{'test_rmse': array([0.5928843 , 0.59026479, 0.59935437, 0.59510948, 0.59038974]),
 'test_mae': array([0.44189593, 0.43973676, 0.44495726, 0.44337783, 0.4399302 ]),
 'fit_time': (1.2936038970947266,
  1.3231737613677979,
  1.333704948425293,
  1.3318729400634766,
  1.3349876403808594),
 'test_time': (8.259610176086426,
  8.595849990844727,
  8.75997281074524,
  8.640898942947388,
  8.495603322982788)}
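The single-split cell reports MSE while the cross-validation cell reports RMSE, so the two outputs are not directly comparable as printed. A quick check, using only the numbers shown above, puts them on the same scale:

```python
from math import sqrt

mse_single_split = 0.3525  # MSE printed by the train-test split cell
rmse_cv_mean = 0.5936      # mean RMSE from the cross-validation table

# RMSE is the square root of MSE.
rmse_single_split = sqrt(mse_single_split)
gap = abs(rmse_single_split - rmse_cv_mean)

print(round(rmse_single_split, 4))  # 0.5937
```

The single-split RMSE sits within about 0.001 of the cross-validated mean, and the MAE values (0.4405 versus 0.4420) are similarly close, which suggests the held-out error is stable across splits.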

In [6]:
# create a recommendation function that finds the 5 closest breweries from the lat and long columns
# and returns 5 beers from each brewery, without repeating beers or breweries
# https://stackoverflow.com/questions/41336756/find-the-closest-latitude-and-longitude - source
# use the haversine formula to compute the distance to every brewery
# use .sort_values() and .drop_duplicates() to keep the 5 nearest breweries
# use .value_counts() to rank each brewery's beers by number of reviews
# use .index and .tolist() to return the top 5 beer names as a list

In [7]:
from math import radians, sin, cos, sqrt, atan2
import pandas as pd

def recommend_beers(latitude, longitude, locate_df):
    # convert latitude and longitude to radians
    lat1 = radians(latitude)
    lon1 = radians(longitude)

    # calculate distances using haversine formula
    R = 6371.0  # radius of the earth in km
    distances = []
    for lat2, lon2 in zip(locate_df['lat'], locate_df['long']):
        lat2 = radians(lat2)
        lon2 = radians(lon2)
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        distance = R * c
        distances.append(distance)

    # add distances to a copy so the caller's dataframe is not modified
    locate_df = locate_df.assign(distance=distances)

    # find 5 closest breweries
    closest_breweries = locate_df.sort_values('distance').drop_duplicates('brewery_name').head(5)

    # find the 5 most-reviewed beers from each brewery (value_counts ranks by review count)
    for i, brewery in enumerate(closest_breweries['brewery_name']):
        beers = locate_df.loc[locate_df['brewery_name'] == brewery, 'beer_name']
        beers = beers.value_counts().index[:5].tolist()
        print(f"Brewery {i+1}: {brewery}")
        print(f"Top 5 beers: {beers}\n")
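One way to make the distance calculation easier to check is to pull it out into a small helper that can be tested against known answers. The sketch below is a hypothetical refactor: the names haversine_km and EARTH_RADIUS_KM are introduced here for illustration and do not appear in the notebook, but the arithmetic mirrors the loop body above.

```python
from math import radians, sin, cos, sqrt, atan2, pi

EARTH_RADIUS_KM = 6371.0  # same value as R in recommend_beers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, lam1, phi2, lam2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((phi2 - phi1) / 2) ** 2
        + cos(phi1) * cos(phi2) * sin((lam2 - lam1) / 2) ** 2
    )
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


# Sanity checks with known answers:
# a point is 0 km from itself, and a quarter of the equator is R * pi / 2.
same_point = haversine_km(37.711807, -122.177658, 37.711807, -122.177658)
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)

print(same_point)                 # 0.0
print(round(quarter_equator, 1))  # 10007.5
```

With a helper like this, the loop in recommend_beers could call haversine_km for each row, and the distance logic could be verified independently of the dataframe.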


In [8]:
latitude = 43.7848
longitude = 74.7278  # positive means east of Greenwich; western-hemisphere locations use negative longitudes
recommend_beers(latitude, longitude, locate_df)

Brewery 1: Glacier Brewhouse
Top 5 beers: ['India Pale Ale', 'Big Woody Barley Wine', 'Amber Ale', 'Ice Axe Ale (Imperial Blonde)', 'Oatmeal Stout']

Brewery 2: Black Bear Brewery
Top 5 beers: ['Black Bear Pail Ale', 'Black Bear Voodoo Bear Porter', 'Black Bear Gearhead Ale', 'Black Bear Blueberry Ale', 'Black Bear Slap Shot']

Brewery 3: Bar Harbor Brewing Company
Top 5 beers: ['Cadillac Mountain Stout', 'Thunder Hole Ale', 'Harbor Lighthouse Ale', '"True Blue" Blueberry Ale', 'Ginger Mild Brew']

Brewery 4: Atlantic Brewing Company
Top 5 beers: ['Bar Harbor Blueberry Ale', 'Coal Porter', 'Bar Harbor Real Ale', "Brother Adam's Bragget Ale", 'Mount Desert Island Ginger']

Brewery 5: Oak Pond Brewing Company
Top 5 beers: ['OPB Nut Brown Ale', 'Storyteller Doppelbock', 'White Fox Ale', 'Dooryard Ale', 'Oktoberfest']



### Checking that the function works as intended

In [9]:
# find the 5 most-reviewed beers from a specific brewery (same value_counts ranking as recommend_beers)

def top_rated_beers(brewery_name, locate_df):
    beers = locate_df.loc[locate_df['brewery_name'] == brewery_name, 'beer_name']
    beers = beers.value_counts().index[:5].tolist()
    print(f"Top 5 beers from {brewery_name}: {beers}\n")

top_rated_beers('Glacier Brewhouse', locate_df)

Top 5 beers from Glacier Brewhouse: ['India Pale Ale', 'Big Woody Barley Wine', 'Amber Ale', 'Ice Axe Ale (Imperial Blonde)', 'Oatmeal Stout']

