# Social Computing - Summer 2018
# Exercise 5 - Group Recommender System 

### Background
In exercise 3 we built a simple collaborative filtering recommender system for movies using the MovieLens dataset. In this exercise we will reuse and extend that system to build a group recommender system for restaurants. A group recommender provides recommendations for a group of users instead of a single user. A group recommendation is an aggregation of the single-user recommendations generated for each group member. There are two ways to do this aggregation:
* The first is to generate recommendations (predicted ratings) for individual members of the group (like a single user recommender), then aggregate those individual predicted ratings into predicted ratings (recommendations) for the group.
* The second is to aggregate individual user ratings (actual ratings) to build a group model, and then create predicted ratings for the model, i.e. treat the group model as a single user and create predicted ratings for that user (using the single user recommender system).

In both cases, the aggregation is done according to a social choice strategy (e.g. maximum satisfaction, minimum misery, etc.).
For more information about group recommenders and aggregation strategies, please refer to the paper: Group Recommender Systems: New Perspectives in the Social Web by Cantador and Castells (you will find it in the reading material for this exercise).
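
To make the idea of a social choice strategy concrete, the following is a minimal sketch of three common aggregation functions applied to the ratings of one restaurant. The member names, the rating values, and the function names are hypothetical, and the strategy definitions follow common usage in the group recommender literature rather than a specific definition from the paper.

```python
from statistics import mean

# Hypothetical predicted ratings (0 to 1) of three group members for one restaurant.
member_ratings = {"alice": 0.9, "bob": 0.4, "carol": 0.7}

def least_misery_rating(ratings):
    """Group rating is the rating of the least satisfied member."""
    return min(ratings.values())

def most_pleasure_rating(ratings):
    """Group rating is the rating of the most satisfied member."""
    return max(ratings.values())

def average_rating(ratings):
    """Group rating is the mean of the members' ratings."""
    return mean(ratings.values())

print(least_misery_rating(member_ratings))   # 0.4
print(most_pleasure_rating(member_ratings))  # 0.9
print(average_rating(member_ratings))        # roughly 0.67
```

Least misery is the most cautious of the three: a single low rating is enough to pull the group rating down, no matter how much the other members like the restaurant.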

In this exercise, we will use the first design option: the system should generate predicted ratings for each member of the group, and those ratings will be aggregated into a group rating based on the "Least Misery" strategy.

The group recommender system we will build in this exercise is a social-context-aware recommender system. Its input is a subset of the anonymized restaurant rating dataset that the students submitted in the experiment part of this class. The "restaurant domain expertise" will be used as the social context parameter that influences the output of the group recommender.

The dataset is available in two comma-separated files: ratings.csv and domain-expertise.csv.
* ratings.csv: Each record contains the ratings of one participant for one restaurant according to the rating attributes: price, clumsiness, service, hippieness, location, social overlap. The first two columns in the csv are the participant ID and the restaurant ID respectively. The rating values for each of the attributes are between 0 and 100.
* domain-expertise.csv: Each record contains the restaurant expertise rating of a participant as estimated by another participant. The file has 3 columns: from_participant_id (the ID of the participant who did the rating), to_participant_id (the ID of the participant being rated), and domain_expertise (the rating value between 0 and 100).


### The Exercise
Write a simple group recommender system in Python. The entry point to the program should be a method that takes the following arguments:
* group: a list of integers which represents the participant IDs of some participants forming a group
* ratings_path: path to the file ratings.csv
* domain_expertise_path: path to the file domain-expertise.csv

Pre-processing: 

The restaurant ratings in the dataset are multi-valued (because there are several rating attributes for a single restaurant). Your program should calculate a single value that represents the overall rating for each restaurant by each participant. The single-value rating should be between 0 and 1 (divide the different rating values by 100 and calculate the average).
The same applies to the domain expertise ratings (divide the rating values by 100).
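
The pre-processing step can be sketched as follows. This is only an illustration under assumptions: the sample rows, the header names, and the helper name `overall_ratings` are hypothetical and are not taken from the actual dataset.

```python
import csv
import io

# Hypothetical sample in the column layout described above; header names are assumptions.
SAMPLE_RATINGS = """participant_id,restaurant_id,price,clumsiness,service,hippieness,location,social_overlap
1,r1,80,60,70,50,90,40
1,r2,20,30,40,50,60,70
2,r1,100,100,100,100,100,100
"""

def overall_ratings(csv_file):
    """Return {participant_id: {restaurant_id: overall rating in [0, 1]}}."""
    ratings = {}
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        participant_id, restaurant_id = row[0], row[1]
        # Scale each of the six attribute ratings from 0-100 to 0-1, then average them
        attribute_values = [float(value) / 100 for value in row[2:8]]
        overall = sum(attribute_values) / len(attribute_values)
        ratings.setdefault(participant_id, {})[restaurant_id] = overall
    return ratings

sample = overall_ratings(io.StringIO(SAMPLE_RATINGS))
print(sample["1"]["r1"])  # roughly 0.65
```

With a real file, `open(ratings_path, newline="")` would take the place of `io.StringIO(...)`. The domain expertise values need only the division by 100, since each record holds a single value.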

Single-user recommendations:

The program should generate individual recommendations for each participant in the group. A delegation-based method that takes into consideration the domain expertise of the other participants in the group will be used. The idea is that a member's preference is influenced by the opinion of another group member depending on how much expertise she thinks this other person has in the domain in question: if she thinks that the other group member is an expert in restaurants, then she will be influenced by that member's opinion when choosing a restaurant, for example for a group dinner. The delegation-based method is formulated as follows:
$$pred'(u,i) = \frac{1}{\left|\sum_{v \in G} d_{u,v}\right|}\sum_{v \in G \wedge v \neq u} d_{u,v} \cdot pred(v,i)$$
* $pred'(u,i)$ is the social-context-aware predicted rating of participant u to the restaurant i
* $v$ is another member in the group $G$
* $d_{u,v}$ is the domain expertise rating from participant u to participant v
* $pred(v,i)$ is the predicted rating of participant v to the restaurant i
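
As a worked illustration of the formula, the sketch below computes $pred'(u,i)$ for one participant and one restaurant. The function name, the participant labels, and all numbers are hypothetical. It also assumes that a participant has no expertise rating for herself, so the normalizing sum runs over the other group members only.

```python
def social_prediction(user, item, group, expertise, base_pred):
    """pred'(u, i): expertise-weighted average of the other members' base predictions."""
    others = [v for v in group if v != user]
    # Assumption: d_{u,u} is undefined, so only the other members are summed
    normalizer = abs(sum(expertise[user][v] for v in others))
    if normalizer == 0:
        raise ValueError("user assigned no expertise to the other group members")
    weighted = sum(expertise[user][v] * base_pred[v][item] for v in others)
    return weighted / normalizer

# Hypothetical group: u considers v a restaurant expert (0.8) and w much less so (0.2)
group = ["u", "v", "w"]
expertise = {"u": {"v": 0.8, "w": 0.2}}
base_pred = {"v": {"r1": 0.9}, "w": {"r1": 0.4}}

result = social_prediction("u", "r1", group, expertise, base_pred)
print(result)  # roughly 0.8, i.e. (0.8 * 0.9 + 0.2 * 0.4) / (0.8 + 0.2)
```

Note that the result is pulled towards the base prediction of v (0.9), because u rates v's expertise four times as high as w's.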

According to the formula, you will notice that there are two different predicted ratings: pred and pred'.
* pred: represents a normal collaborative filtering predicted rating of a certain user for a certain item. We call this the base rating.
* pred': is the social-context-aware predicted rating, which is a function of the base ratings of the other group members and of those members' domain expertise as perceived by the current user.

This means your program should calculate two different ratings for each group member.
The program should start by calculating the base predicted ratings of all members in the group for all restaurants (re-use your code from exercise 3). Then, for each group member, the program should calculate the social-context-aware predicted ratings for all restaurants.
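
The base predicted rating can be sketched as a similarity-weighted average of the other users' ratings. This is a minimal sketch, not the required solution: it assumes a similarity of 1 / (1 + squared Euclidean distance) over commonly rated restaurants, and the function names and toy ratings are hypothetical.

```python
def similarity(ratings, user_a, user_b):
    """Similarity in (0, 1] based on the restaurants both users rated; 0 if none."""
    common = set(ratings[user_a]) & set(ratings[user_b])
    if not common:
        return 0.0
    squared_distance = sum((ratings[user_a][i] - ratings[user_b][i]) ** 2 for i in common)
    return 1 / (1 + squared_distance)

def base_prediction(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item, or None."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        score = similarity(ratings, user, other)
        weighted_sum += score * ratings[other][item]
        similarity_sum += score
    if similarity_sum == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / similarity_sum

# Hypothetical overall ratings, already scaled to the range 0 to 1
toy_ratings = {
    "a": {"r1": 0.5},
    "b": {"r1": 0.5, "r2": 1.0},
    "c": {"r1": 1.0, "r2": 0.0},
}
prediction = base_prediction(toy_ratings, "a", "r2")
print(prediction)  # roughly 0.56, closer to b's rating because b is more similar to a
```

When no prediction is possible, a fallback is needed, for example the restaurant's average rating across all participants.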

The final step in the program should be the calculation of the group recommendation, i.e. the application of the aggregation strategy. The strategy we are going to use in this exercise is "Least Misery".

The output should be a list of Python tuples, sorted by the group's predicted restaurant ratings (highest first). Each tuple has the following two attributes: the restaurant's ID and the group's social-context-aware predicted rating. You are free to design your recommendation engine the way you want (the provided code below is just a suggested design). Clean, readable, and documented code is expected, and those aspects will be part of the overall grade of the exercise.
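
The expected output format can be illustrated with a small sketch that applies "Least Misery" to several restaurants and sorts the result. The participant IDs, restaurant IDs, ratings, and the function name `rank_for_group` are hypothetical.

```python
def rank_for_group(social_preds):
    """Map {participant_id: {restaurant_id: rating}} to a sorted list of
    (restaurant_id, group_rating) tuples, highest group rating first."""
    if not social_preds:
        return []
    members = list(social_preds)
    # Only rank restaurants for which every member has a predicted rating
    restaurants = set.intersection(*(set(social_preds[m]) for m in members))
    ranking = [(r, min(social_preds[m][r] for m in members)) for r in restaurants]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)

# Hypothetical social-context-aware predictions for a group of three participants
example_preds = {
    1: {"r1": 0.9, "r2": 0.6, "r3": 0.3},
    2: {"r1": 0.5, "r2": 0.7, "r3": 0.8},
    3: {"r1": 0.8, "r2": 0.6, "r3": 0.9},
}
print(rank_for_group(example_preds))
# [('r2', 0.6), ('r1', 0.5), ('r3', 0.3)]
```

Restaurant r3 ends up last although two members rate it highly, because the rating of the least satisfied member (0.3) determines the group rating.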

Note: You can test your recommender with the following groups (identified by the participant IDs): [63,117,116], [160,161,162], [178,134,91], [67,198,197]


import math

# Least misery strategy for group recommendation aggregation
def least_misery(social_preds, restaurants, group):
    """Return (restaurant, group_rating) tuples sorted by group rating, highest first."""
    group_ratings = []
    for restaurant in restaurants:
        # The least satisfied member determines the group rating. Using min() avoids
        # capping the result at an arbitrary initial value.
        grp_rating = min(member_preds[restaurant] for member_preds in social_preds.values())
        group_ratings.append((restaurant, grp_rating))
    group_ratings.sort(key=lambda tup: tup[1], reverse=True)
    return group_ratings


def calculate_social_context_aware_predictions(group, restaurants, base_predicted_ratings, dom_expertise):
    """Apply the delegation-based formula for each group member and each restaurant."""
    social_preds = {}  # {participant_id: {restaurant_id: social_rating}}
    for member in group:
        social_preds[member] = {}
        others = [other for other in group if other != member]
        # Normalization term: sum of the expertise ratings this member gave to the others
        dom_sum = sum(dom_expertise[str(member)][str(other)] for other in others)
        for restaurant in restaurants:
            weighted_sum = sum(
                dom_expertise[str(member)][str(other)]*base_predicted_ratings[str(other)][str(restaurant)]
                for other in others)
            # Assumption: fall back to 0 if the member rated everyone's expertise as 0,
            # which would otherwise raise a ZeroDivisionError
            social_preds[member][restaurant] = weighted_sum/abs(dom_sum) if dom_sum else 0

    return social_preds

# Similarity score derived from the squared Euclidean distance between two users
def calculate_similarity_score(ratings, user_id1, user_id2):
    user1_ratings = ratings[user_id1]
    user2_ratings = ratings[user_id2]

    # Find the restaurants rated by both users
    common_items = [item for item in user2_ratings if item in user1_ratings]

    if len(common_items) == 0:  # no common ratings between the two users: similarity is 0
        return 0

    # Accumulate the squared differences between the two users' ratings of the same restaurant
    sum_of_squares_of_differences = 0
    for item_id in common_items:
        sum_of_squares_of_differences += (user1_ratings[item_id] - user2_ratings[item_id])**2

    return 1 / (1 + sum_of_squares_of_differences)

    

def cf_recommend(target_user_id, restaurants, ratings, avg_ratings):
    """Return {restaurant_id: predicted_rating} for the target user."""
    weighted_ratings = {}   # {restaurant_id: sum of similarity-weighted ratings}
    similarity_scores = {}  # {restaurant_id: sum of similarity scores}
    predicted_ratings = {}  # {restaurant_id: predicted_rating}
    for user_id in ratings:
        if user_id == target_user_id:
            continue
        similarity_score = calculate_similarity_score(ratings, target_user_id, user_id)
        if similarity_score > 0:
            for restaurant_id in ratings[user_id]:
                # Only predict restaurants that the target user has not rated yet
                if restaurant_id not in ratings[target_user_id]:
                    weighted_ratings[restaurant_id] = (weighted_ratings.get(restaurant_id, 0)
                                                       + ratings[user_id][restaurant_id]*similarity_score)
                    similarity_scores[restaurant_id] = similarity_scores.get(restaurant_id, 0) + similarity_score

    for restaurant_id in weighted_ratings:
        # Weighted rating / sum of the similarity scores of the users who rated that restaurant
        predicted_ratings[restaurant_id] = weighted_ratings[restaurant_id]/similarity_scores[restaurant_id]

    # Fall back to the restaurant's average rating where no prediction could be made
    # (this includes the restaurants the target user has already rated)
    for restaurant in restaurants:
        if restaurant not in weighted_ratings:
            predicted_ratings[restaurant] = avg_ratings[restaurant]

    return predicted_ratings

def calculate_base_predictions(item, restaurants, ratings, avg_ratings):
    return cf_recommend(item, restaurants, ratings, avg_ratings)
    

def get_restaurants(ratings_path):
    # Using the ratings, return a list of unique restaurant IDs (in order of first appearance)
    restaurants_list = list()
    with open(ratings_path, 'r') as f:
        lines = f.readlines()[1:]  # skip the header row
    for line in lines:
        rating_data = line.strip().split(',')
        if rating_data[1] not in restaurants_list:
            restaurants_list.append(rating_data[1])

    return restaurants_list



def get_participants(ratings_path):
    # Using the ratings, return the set of unique participant IDs
    participants = set()
    with open(ratings_path, 'r') as f:
        lines = f.readlines()[1:]  # skip the header row

    for line in lines:
        ratings_data = line.split(',')
        participants.add(ratings_data[0])
    return participants


def create_ratings_dictionary(path):
    ratings_dictionary = {}  # {participant_id: {restaurant_id: overall_rating}}
    rating_sums = {}         # {restaurant_id: sum of overall ratings}
    count = {}               # {restaurant_id: number of ratings}

    with open(path, 'r') as file:
        next(file, None)  # skip the header row
        for line in file:
            ratings_info = line.strip().split(',')
            participant_id, restaurant_id = ratings_info[0], ratings_info[1]
            # Scale the six attribute ratings from 0-100 to 0-1 and average them.
            # float() is used for every column so that non-integer values do not fail.
            overall = sum(float(value)/100 for value in ratings_info[2:8])/6
            ratings_dictionary.setdefault(participant_id, {})[restaurant_id] = overall
            rating_sums[restaurant_id] = rating_sums.get(restaurant_id, 0) + overall
            count[restaurant_id] = count.get(restaurant_id, 0) + 1

    # Average overall rating of each restaurant across all participants
    avg_ratings = {key: rating_sums[key]/count[key] for key in rating_sums}

    return ratings_dictionary, avg_ratings

def create_domain_dictionary(domain_expertise_path):
    # Build {from_participant_id: {to_participant_id: expertise rating scaled to 0-1}}
    domain_dict = {}
    with open(domain_expertise_path, 'r') as f:
        lines = f.readlines()[1:]  # skip the header row
    for line in lines:
        domain_data = line.strip().split(',')
        if domain_data[0] not in domain_dict:
            domain_dict[domain_data[0]] = {}
        domain_dict[domain_data[0]][domain_data[1]] = float(domain_data[2])/100

    return domain_dict

# Group recommender (Main program)
def group_recommender(group, ratings_path, domain_expertise_path):
    # Pre-processing:
    #-----------------
    ratings, avg_ratings = create_ratings_dictionary(ratings_path)
    
    dom_expertise = create_domain_dictionary(domain_expertise_path)

    restaurants = list(get_restaurants(ratings_path))
    participants = list(get_participants(ratings_path))
    
    # Calculate single user recommendations (predicted ratings):
    #-------------------------------------------------------------
    # For each participant in the group, calculate the base ratings for each restaurant
    # For each participant in the group, calculate the social-context-aware predicted ratings (given the base predicted ratings)
    base_preds = {}
    for item in participants:
        base_preds[item] = calculate_base_predictions(item, restaurants, ratings, avg_ratings)
   
    # Calculate group recommendations (group predicted ratings) - based on the "Least Misery" strategy:
    #--------------------------------------------------------------------------------------------------
    # Given the social-context-aware predicted ratings for each group member, aggregate those ratings 
    # into group recommendations for each restaurant based on the "Least Misery" strategy (sorted by predicted ratings: highest first)
    social_preds = calculate_social_context_aware_predictions(group, restaurants, base_preds, dom_expertise)
    # Return the list of (restaurant_id, predicted_rating) tuples, as required by the exercise
    return least_misery(social_preds, restaurants, group)

# Test (Call your main program to test it with the sample groups from the exercise description above)
for recommendation in group_recommender([160,161,162], "ratings.csv", "domain_expertise.csv"):
    print(recommendation)

('ChIJi8mM2MR1nkcREr7RKfG-r-A', 2.675)
('ChIJ0SA6ol7fnUcRY94JlDkvHIM', 2.62)
('ChIJn1i9catjnkcRUfGk3lw0nfs', 2.5949999999999993)
('ChIJ9UAZIc2SHyMRvjDjbD8RAZ8', 2.5799999999999996)
('ChIJl1jXp-x1nkcRWOz1L2XZjOg', 2.5749999999999997)
('ChIJk_E9pRJ1nkcR_gj14oF1w4U', 2.5599999999999996)
('ChIJPVuIx17MHkcRLTw4MxAEGqg', 2.54)
('ChIJs9m2L4h1nkcRH33znwtpr64', 2.5199999999999996)
('ChIJV8ScsvR1nkcRNjb_cjiH1_M', 2.4949999999999997)
('ChIJvUdRyzDEyIARh86Fi1C2hqI', 2.48)
('ChIJvU0_gel1nkcRT4LaXa2PuNI', 2.465)
('ChIJF4NRh96TyzsRuQItK64EJW4', 2.45)
('ChIJ56fJIvd1nkcRvJ7vYPfBkME', 2.4199999999999995)
('ChIJS0pws8-WyzsRXFPfkYpmjNY', 2.415)
('ChIJ8_kwAYTYnUcRf7LU6AZtsrQ', 2.4049999999999994)
('ChIJ_Q8c8ESRyzsRLZngwt4GTh8', 2.395)
('ChIJbyS9yOR1nkcRUCwH80RQmX8', 2.383)
('ChIJc1iaGGDfnUcRUvk6-kRjpQ0', 2.3749999999999996)
('ChIJPYeBMlvfnUcRkpE-ahhWlVo', 2.369999999999999)
('ChIJpXFwOM6WyzsRDNt4k3XaefI', 2.36)
('ChIJJxx6Yvl1nkcR3wKvmRIZuW8', 2.36)
('ChIJZRsKt-p3nkcRC4npIl4H8wo', 2.36)
('ChIJKf2foPQHdkcRji