# Beer Recommendation Functions 

This notebook contains the functions needed for the beer recommendation system. Note that these functions depend on the ALS model and KNN algorithm having already been run and their outputs saved as csv files.

## Load Raw Data Function

In [0]:
def load_raw():

    '''
    This function returns the raw data for the beer ratings dataset as a pyspark dataframe.
    '''

    # Import dependencies 
    from  pyspark.sql.types import StructField, StructType, StringType, LongType, FloatType
    import pyspark.sql.functions as f 

    # Define the Data Dir 
    data_dir = 'dbfs:/FileStore/tables/Capstone/beer/data'   

    # Beer Schema - column names and types (True means that columns are nullable)
    beerSchema = StructType([
        StructField('brewery_id', LongType(), True), 
        StructField('brewery_name', StringType(), True), 
        StructField('review_time', LongType(), True), 
        StructField('review_overall', FloatType(), True), 
        StructField('review_aroma', FloatType(), True), 
        StructField('review_apperance', FloatType(), True), 
        StructField('review_profilename', StringType(), True), 
        StructField('beer_style', StringType(), True), 
        StructField('review_palate', FloatType(), True), 
        StructField('review_taste', FloatType(), True), 
        StructField('beer_name', StringType(), True), 
        StructField('beer_abv', FloatType(), True), 
        StructField('beer_beerid', LongType(), True)
    ]) 
    
    # Load the raw data using the file path and schema
    raw = spark.read.load(path = data_dir + '/beer_reviews.csv', 
                          format='csv', 
                          header=True, schema= beerSchema)
    
    # Return DataFrame
    return raw
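
Because `load_raw` passes an explicit schema together with `header=True`, Spark's CSV reader by default applies the schema by position and does not compare it with the header row (this is controlled by the `enforceSchema` option). A quick check of the header against the expected column names can catch a misspelled or reordered field before loading. The following is a minimal sketch using only the standard library; `check_header`, `EXPECTED_COLUMNS` and the sample header line are hypothetical and not part of the notebook.

```python
import csv
import io

# Column names expected by the schema in load_raw, in order
EXPECTED_COLUMNS = [
    'brewery_id', 'brewery_name', 'review_time', 'review_overall',
    'review_aroma', 'review_apperance', 'review_profilename', 'beer_style',
    'review_palate', 'review_taste', 'beer_name', 'beer_abv', 'beer_beerid',
]

def check_header(lines, expected=EXPECTED_COLUMNS):
    '''
    Hypothetical helper: compare the first csv row with the expected names.
    Returns a list of (position, found, expected) tuples, one per mismatch.
    '''
    header = next(csv.reader(lines))
    mismatches = []
    for i in range(max(len(header), len(expected))):
        found = header[i] if i < len(header) else None
        wanted = expected[i] if i < len(expected) else None
        if found != wanted:
            mismatches.append((i, found, wanted))
    return mismatches

# Made-up header line for illustration only (not read from the real file)
sample = io.StringIO(','.join(EXPECTED_COLUMNS).replace('apperance', 'appearance') + '\n')
print(check_header(sample))
```

An empty list means the header and the schema agree. On Databricks, the first line of the real file could be passed in instead of the sample.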

## Recommendations from Beer/Brewery

This function returns the top 10 recommendations for a given beer name or brewery name. Names can be passed as arguments or entered at a prompt. With `prompt=True` (the default), the prompted values replace any arguments passed in.

In [0]:
def rec_from_name(name: str = None, brewery:str = None, prompt=True):

    '''
    Function to return recommendations for specific beers. 
    Can also return recommendations for other breweries' beers based on a brewery the user indicates.
    Both beer and brewery can be specified for more precise matching. 

    Parameters: 

    name (str):     the name of the beer (case sensitive). Optional if brewery is indicated.
    brewery (str):  the name of the brewery (case sensitive). Optional if beer is indicated.
    prompt (bool):  whether to prompt user for beer/brewery
    '''

    # Import Dependencies
    import pyspark.sql.functions as f

    # Recommendations CSV Directory
    rec_dir = 'dbfs:/FileStore/tables/Capstone/beer/beer_recs'
    new_rec_dir = rec_dir + '/new'
    
    # Load csv of new user recommendations 
    # (Contains beer, recommended beer and the distance)
    new_recs = spark.read.load(path = new_rec_dir + "/new_user_recommendations.csv", 
                      format='csv', header=True)
    
    # Whether to prompt the user for beer/brewery names
    if prompt: 
        print("Please enter the name of the beer you'd like similar recommendations for. (Case Sensitive)\nOr press enter to lookup by brewery.")
        name = input('')
        print("If you'd like, input the name of the brewery, or press enter to skip.")
        brewery = input('')

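    ## Filter the recommendations for the specified beer/brewery
    # If neither beer nor brewery is specified, there is nothing to look up
    if name in [None, ''] and brewery in [None, '']:
        print('No beer or brewery name given; cannot make recommendations.')
        return
    # If only beer is specified
    elif name not in [None, ''] and brewery in [None, '']:
        beer_recs = new_recs.filter(f.col('home_beer_name') == name) 
    # If only brewery is specified 
    elif name in [None, ''] and brewery not in [None, '']: 
        beer_recs = new_recs.filter(f.col('home_brewery_name') == brewery) 
    # If both are specified
    else: 
        beer_recs = new_recs.filter((f.col('home_beer_name') == name) & (f.col('home_brewery_name') == brewery))

    # Stop early if nothing in the recommendations csv matches the input
    if beer_recs.limit(1).count() == 0:
        print('No match found in the database (names are case sensitive).')
        return
    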
    # If the name is specified
    if name not in [None, '']:
        # Print the beer found in the database 
        found = beer_recs.select('home_beer_name', 'home_brewery_name').distinct().rdd.flatMap(list).collect()
        print(f'\nMaking Recommendations for:\n\t{found[0]} from {found[1]}')

        # Order recommendations by the distance (cast to a number, since the csv is read as strings),
        # keep the 10 nearest, and select/rename the required columns 
        top10 = beer_recs.orderBy(f.col('nbr_dis').cast('double').asc()).limit(10)\
            .select(f.col('rec_brewery_name').alias('Brewery Name'), 
                    f.col('rec_beer_name').alias('Beer Name'))
        
        # Display the portion of the csv showing the recommendations 
        # (best way for now when it comes to formatting printing)
        top10.display()
    
    else: 
        # Display the brewery found
        found = beer_recs.select('home_brewery_name').distinct().rdd.flatMap(list).collect()
        print(f'\nMaking recommendations based on beers from:\n\t{found[0]}')
        
        # Order by distance (cast to a number, since the csv is read as strings), 
        # and select top 10 beers recommended considering all beers in specified brewery
        top10 = beer_recs.filter(f.col('rec_brewery_name') != brewery)\
            .orderBy(f.col('nbr_dis').cast('double').asc())\
            .select(f.col('rec_beer_name').alias('Beer Name'), 
                    f.col('rec_brewery_name').alias('Brewery Name')).limit(10)
        # Display top 10 recommendations
        top10.display()
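
The filtering cases in `rec_from_name` reduce to one rule: add an equality condition for every input that was actually supplied, then keep the rows with the smallest distance. The following is a minimal sketch of that rule in plain Python (no Spark). The helpers `build_filters` and `nearest` and the rows are made up for illustration; only the column names come from the function above.

```python
def build_filters(name=None, brewery=None):
    '''
    Hypothetical helper: map the inputs to column == value conditions.
    An empty dict means that neither a beer nor a brewery was specified.
    '''
    filters = {}
    if name not in [None, '']:
        filters['home_beer_name'] = name
    if brewery not in [None, '']:
        filters['home_brewery_name'] = brewery
    return filters

def nearest(rows, filters, n=10):
    '''Return the n matching rows with the smallest distance (nbr_dis).'''
    if not filters:
        # Nothing was specified, so there is nothing to recommend from
        return []
    matched = [r for r in rows if all(r.get(c) == v for c, v in filters.items())]
    return sorted(matched, key=lambda r: r['nbr_dis'])[:n]

# Made-up rows for illustration only
rows = [
    {'home_beer_name': 'Beer A', 'home_brewery_name': 'Brewery X', 'rec_beer_name': 'Beer C', 'nbr_dis': 0.30},
    {'home_beer_name': 'Beer A', 'home_brewery_name': 'Brewery X', 'rec_beer_name': 'Beer D', 'nbr_dis': 0.10},
    {'home_beer_name': 'Beer B', 'home_brewery_name': 'Brewery X', 'rec_beer_name': 'Beer E', 'nbr_dis': 0.20},
    {'home_beer_name': 'Beer A', 'home_brewery_name': 'Brewery Y', 'rec_beer_name': 'Beer F', 'nbr_dis': 0.05},
]

# Beer only, then beer and brewery together
print([r['rec_beer_name'] for r in nearest(rows, build_filters(name='Beer A'))])
print([r['rec_beer_name'] for r in nearest(rows, build_filters(name='Beer A', brewery='Brewery X'))])
```

Specifying both inputs narrows the match, which helps when two breweries use the same beer name.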

## Recommendations from Username

This function checks the user's experience. Experienced users (those who have positively rated at least 3 beers) get their top 5 personalized recommendations. Inexperienced users get 10 recommendations for each beer they have positively rated so far. If the user has no positively rated beers, or the username has no previous ratings at all, a message is printed instead.

In [0]:
def get_exp_rec():

    '''Function to return recommendations for a given username. 
    Takes no parameters, though prompts for a username (not case sensitive).
    Calls rec_from_name on positively rated beers by user if fewer than 3 are positively rated.'''

    # Import dependencies 
    import pyspark.sql.functions as f

    # Experienced Users ratings directory
    rec_dir = 'dbfs:/FileStore/tables/Capstone/beer/beer_recs'
    exp_rec_dir = rec_dir + '/experienced'
    
    # Prompt the user for the username
    print('Please Type Username (not case sensitive):')
    username = input('')
    print(f'Making Recommendations for: {username}')

    # Load recommendations for experienced users
    exp_recs = spark.read.load(path = exp_rec_dir + '/exp_user_recommendations.csv', 
                      format='csv', header=True)
    
    # Filter by the username, keep the 5 highest predicted ratings, select required columns
    # (predRating is read from the csv as a string, so cast it before sorting)
    user_recs = exp_recs.filter(f.col('username') == username.lower())\
        .orderBy(f.col('predRating').cast('float').desc()).limit(5)\
            .select(f.col('brewery_name').alias('Brewery'), 
                    f.col('beer_name').alias('Beer'))
    
    # IF there are no recommendations for the user 
    if user_recs.count() == 0: 
        # Check if they exist in the system
        raw = load_raw()
        # Compare lowercased names on both sides so the lookup is not case sensitive
        raw = raw.withColumn('profilename_lower', f.lower(f.col('review_profilename')))
        user_revs = raw.filter(f.col('profilename_lower') == username.lower())

        # If the user has no reviews 
        if user_revs.count() == 0:
            # User not found message
            print(f'Username {username} not in Database')
            return
        else: 
            # If the user has no positive reviews 
            pos_revs = user_revs.filter(f.col('review_taste') >= 3)
            if pos_revs.count() == 0: 
                # Unable to recommend, they can search by name
                print('Not enough beers rated positively for recommendations.\nPlease rate more beers for personalized recommendations.')
            else: 
                # Run rec_from_name on all positive ratings from user if they have any 
                print('Cannot make personalized recommendations (too few positively rated beers).\nRecommending based on all positively rated beers by user:')
                liked_beers = pos_revs.select(f.col('beer_name')).distinct().rdd.flatMap(list).collect()
                for b in liked_beers: 
                    # Here, the recommendations by name function can be run without prompting the user
                    rec_from_name(name = b, prompt=False)
    else: 
        # Display top 5 recommendations for experienced users
        user_recs.display()
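
The branching in `get_exp_rec` can be hard to follow inside the Spark calls. The following is a minimal sketch of the same decision flow in plain Python, assuming a rating counts as positive when `review_taste` is at least 3, as in the function above. The helper `choose_path`, its return labels and the sample ratings are hypothetical.

```python
def choose_path(n_personal_recs, user_ratings, positive_threshold=3.0):
    '''
    Hypothetical helper mirroring the branches of get_exp_rec.

    n_personal_recs: number of precomputed recommendation rows for the user
    user_ratings:    list of (beer_name, review_taste) tuples for the user
    Returns a (label, beers) tuple; beers is only filled for the fallback.
    '''
    # Experienced user: precomputed recommendations exist
    if n_personal_recs > 0:
        return ('personalized', [])
    # No reviews at all: the username is not in the database
    if not user_ratings:
        return ('unknown_user', [])
    # Distinct beers the user rated positively
    liked = sorted({beer for beer, taste in user_ratings if taste >= positive_threshold})
    if not liked:
        return ('no_positive_ratings', [])
    # Fall back to item-based recommendations for each liked beer
    return ('fallback_by_beer', liked)

# Made-up ratings for illustration only
print(choose_path(0, [('Beer A', 4.0), ('Beer B', 2.5), ('Beer A', 3.5)]))
```

Only the last branch calls `rec_from_name`, once per liked beer and with `prompt=False`.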

## Wrapper Function for System

This function is the main entry point for users of the system. It asks whether the user wants recommendations by username or by beer/brewery name, then runs the appropriate function for either type of user.

In [0]:
def make_recommendations(): 

   '''
   Function to make recommendations by username or by beer name.
   Wrapper for the functions `get_exp_rec` and `rec_from_name`.
   '''

   # Prompting user for recommendation types, reprompts if the input is invalid.
   user = 0
   while user not in [1,2]: 
      print('''
Select an option (1/2):
   1. Recommendations by username.
   2. Beer/Brewery specific recommendations.
            ''')
      user = input("")
      try: 
         user = int(user)
      except ValueError: 
         # Non-numeric input: reset so the loop prompts again
         user = 0

   # For usernames, simply runs the appropriate function
   if user == 1: 
      print('Making recommendations by username.')
      get_exp_rec()
   # For beer names, runs the correct function
   elif user == 2: 
      print('Making item-specific recommendations.')
      # Note prompt here is used to ask the user for the names
      rec_from_name(prompt=True)
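
The reprompting loop is easier to test when the input source can be swapped out. The following is a minimal sketch of that idea, not the notebook's implementation; `prompt_choice` and its `read`/`show` parameters are hypothetical names.

```python
def prompt_choice(read=input, valid=(1, 2), show=print):
    '''
    Hypothetical helper: keep asking until the reply is one of `valid`.
    `read` and `show` default to input/print but can be replaced in tests.
    '''
    while True:
        show('Select an option (1/2):')
        reply = read('')
        try:
            choice = int(reply)
        except ValueError:
            # Empty or non-numeric reply: ask again
            continue
        if choice in valid:
            return choice

# Scripted replies stand in for a real user: three invalid, then a valid one
replies = iter(['', 'abc', '3', '2'])
print(prompt_choice(read=lambda _: next(replies), show=lambda _: None))  # prints 2
```

Catching only `ValueError` keeps other errors, such as a keyboard interrupt, from being silently swallowed by the loop.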