### Collaborative Filtering (CF)
__What is Collaborative Filtering?__

Collaborative filtering finds users with similar tastes and recommends what those users like. This type of recommender system does not use item features. Instead, it groups users with similar behavior and recommends to each user what their group prefers.

There are four main families of techniques for building collaborative filtering recommender systems:

   1. `Memory-Based`
   2. Model-Based
   3. Hybrid
   4. Deep Learning
   
   
__Types of Memory-based Collaborative Filtering__

User-based collaborative filtering:   
Measures the similarity between the target user and other users.   

Item-based collaborative filtering:   
Measures the similarity between the items that the target user has rated or interacted with and other items.    
This method often scales better, since items tend to have more stable interaction patterns than users.
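
The difference between the two variants comes down to which axis of the rating matrix is compared. The sketch below illustrates this on a small hypothetical matrix (`toy_ratings` and its values are made up for illustration, not taken from the dataset): comparing rows gives user-to-user similarity, and comparing columns gives item-to-item similarity.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: rows are users, columns are places, 0 = not rated
toy_ratings = pd.DataFrame(
    [[2, 1, 0, 0],
     [2, 0, 0, 1],
     [0, 0, 2, 2]],
    index=["u1", "u2", "u3"],
    columns=["p1", "p2", "p3", "p4"],
)

# User-based: compare rows (one vector per user)
toy_user_sim = pd.DataFrame(
    cosine_similarity(toy_ratings),
    index=toy_ratings.index,
    columns=toy_ratings.index,
)

# Item-based: compare columns, so transpose first (one vector per item)
toy_item_sim = pd.DataFrame(
    cosine_similarity(toy_ratings.T),
    index=toy_ratings.columns,
    columns=toy_ratings.columns,
)

print(toy_user_sim.round(2))
print(toy_item_sim.round(2))
```

The user similarity table is 3 x 3 and the item similarity table is 4 x 4, so which variant is cheaper depends on whether the data has more users or more items.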


<center><img src="image/item_user_image.png" width="600" hight="800"></center>





In [106]:
import pandas as pd
# import cosine similarity function
from sklearn.metrics.pairwise import cosine_similarity

In [107]:
# Load Rating dataset
rating_final = pd.read_csv('./data_set/rating_final.csv')

In [108]:
duplicated_value = rating_final[['userID', 'placeID']].duplicated().sum()
print("there is:", duplicated_value, "duplicated_value")
# Each (userID, placeID) pair appears only once, so there is no need to average (or weight-average) repeated ratings or to drop extra records

there is: 0 duplicated_value
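
This check matters because `DataFrame.pivot`, used later to build the rating matrix, raises a `ValueError` when an index/column pair occurs more than once. Had duplicates been found, one possible way to handle them is to collapse each repeated pair to its mean rating. The sketch below shows this on hypothetical data (`dup_ratings` is made up for illustration; averaging is one choice among several, such as keeping the latest rating).

```python
import pandas as pd

# Hypothetical ratings with one repeated (userID, placeID) pair
dup_ratings = pd.DataFrame({
    "userID": ["U1", "U1", "U2"],
    "placeID": [10, 10, 20],
    "rating": [2, 0, 1],
})

# Count repeated pairs, mirroring the duplicate check on rating_final
n_dups = dup_ratings[["userID", "placeID"]].duplicated().sum()
print("repeated pairs:", n_dups)

# Collapse each repeated pair to its mean rating
deduped = (
    dup_ratings
    .groupby(["userID", "placeID"], as_index=False)["rating"]
    .mean()
)
print(deduped)
```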


#### User-item matrix

The system organizes user interactions (ratings, clicks, purchases)      
into a matrix with one row per user and one column per item. This matrix   
is typically sparse, because most users engage with only a small fraction   
of the available items. Handling this sparsity well is key to accurate   
recommendations.   
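
To make the idea of sparsity concrete, the sketch below measures it as the share of user-item cells that hold no rating. The data (`toy_long`) is hypothetical. Sparsity is measured here before any missing values are filled, on the assumption that a real rating of 0 should not be counted as "not rated".

```python
import pandas as pd

# Hypothetical long-format ratings: 5 ratings from 3 users over 4 places
toy_long = pd.DataFrame({
    "userID":  ["U1", "U1", "U2", "U3", "U3"],
    "placeID": [10, 20, 10, 30, 40],
    "rating":  [2, 0, 1, 2, 1],
})

# One row per user, one column per place; unrated cells stay NaN
toy_matrix = toy_long.pivot(index="userID", columns="placeID", values="rating")

# Sparsity = share of user-place cells with no rating
sparsity = toy_matrix.isna().to_numpy().mean()
print(f"{toy_matrix.shape[0]} users x {toy_matrix.shape[1]} places, sparsity = {sparsity:.2f}")
```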

<center><img src="image/user_item_matrix.png" width="600" hight="800"></center>


In [109]:
# Build the user-place rating matrix
user_place_df = rating_final.pivot(columns='placeID', index='userID', values='rating')
# Fill NaN with zero to mark places the user has not rated
user_place_df.fillna(0, inplace=True)
# Show the first 5 rows
user_place_df.head()

userID,132560,132561,132564,132572,132583,132584,132594,132608,132609,132613,...,135080,135081,135082,135085,135086,135088,135104,135106,135108,135109
U1001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
U1002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
U1003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
U1004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0
U1005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Memory-based approaches 
Let’s start with the most basic way to implement collaborative filtering: the memory-based approach. It uses simple arithmetic operations or similarity metrics, computed directly on the interaction data, to measure how alike two users or two items are. For example, to relate two users, the items each of them has historically liked are compared to produce a similarity score for the pair.

`Cosine similarity` is a common similarity metric; Euclidean distance and Pearson’s correlation are other popular choices. These metrics are geometric in the sense that each user's row (or each item's column) of the matrix is treated as a vector. With cosine similarity, the similarity of two users is the cosine of the angle between their rating vectors: 1 when the vectors point in the same direction and 0 when they are orthogonal.
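
As a quick check of the definition, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand and compares the result with scikit-learn's `cosine_similarity`. The helper `cosine_sim` and the vectors `u` and `v` are hypothetical, and returning 0.0 for an all-zero vector is a choice made here for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

# Hypothetical rating vectors of two users over four places
u = [2, 1, 0, 0]
v = [2, 0, 0, 1]

manual = cosine_sim(u, v)                     # dot = 4, both lengths = sqrt(5)
library = cosine_similarity([u], [v])[0, 0]   # scikit-learn expects 2-D input
pearson = np.corrcoef(u, v)[0, 1]             # Pearson correlation, for comparison

print(f"manual cosine:  {manual:.3f}")
print(f"sklearn cosine: {library:.3f}")
print(f"pearson:        {pearson:.3f}")
```

Pearson's correlation centers each vector on its mean before comparing, so it generally gives a different value from cosine similarity on the same pair.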

The memory-based approach is further divided into user-to-user-based and item-to-item-based collaborative filtering.

    

In [110]:
# Compute pairwise cosine similarity between users (rows of the rating matrix)
user_sim_matrix = cosine_similarity(user_place_df)
# Wrap the result in a DataFrame labelled by user ID
user_sim_df = pd.DataFrame(
    user_sim_matrix,
    index=user_place_df.index, 
    columns=user_place_df.index, # rows and columns are both users, since similarity is pairwise between users
)
# Show the first 5 rows
user_sim_df.head()

userID,U1001,U1002,U1003,U1004,U1005,U1006,U1007,U1008,U1009,U1010,...,U1129,U1130,U1131,U1132,U1133,U1134,U1135,U1136,U1137,U1138
U1001,1.0,0.227921,0.166957,0.0,0.059761,0.111456,0.188982,0.0,0.106904,0.0,...,0.0,0.0,0.0,0.353553,0.0,0.083478,0.0,0.0,0.14825,0.0
U1002,0.227921,1.0,0.266371,0.158362,0.095346,0.088911,0.075378,0.0,0.426401,0.0,...,0.0,0.0,0.0,0.402911,0.0,0.199778,0.0,0.322329,0.413919,0.355335
U1003,0.166957,0.266371,1.0,0.0,0.0,0.325645,0.0,0.0,0.374817,0.0,...,0.0,0.0,0.0,0.118056,0.0,0.439024,0.0,0.059028,0.476463,0.208232
U1004,0.0,0.158362,0.0,1.0,0.166091,0.07744,0.131306,0.0,0.037139,0.0,...,0.0,0.0,0.0,0.350931,0.0,0.0,0.0,0.280745,0.103005,0.0
U1005,0.059761,0.095346,0.0,0.166091,1.0,0.0,0.237171,0.0,0.0,0.447214,...,0.0,0.0,0.0,0.084515,0.0,0.0,0.0,0.0,0.124035,0.0


##### User-to-User Collaborative Filtering
User-to-user-based collaborative filtering recommends items that a particular user might like by finding similar users, using purchase history or ratings on various items, and then suggesting the items liked by these similar users.

In [111]:
# Define a function that selects the top k users most similar to a given user ID
def fetch_sim_users(user_place_data, user_id, k=5) -> pd.Series:
    """Return the top k users most similar to the selected user.

    The selected user's row is separated from all other users, and
    its cosine similarity with every other user is computed.

    Args:
        user_place_data (pandas DataFrame): user-place rating matrix,
            indexed by userID

        user_id (str): ID of the selected user

        k (int, optional): Number of similar users to return. Defaults to 5.

    Returns:
        pd.Series: Similarity scores indexed by userID, sorted from
            most to least similar
    """
    # cosine_similarity fails on an empty row selection, so reject unknown IDs early
    if user_id not in user_place_data.index:
        raise KeyError(f'Unknown user ID: {user_id}')

    # Row of the selected user (kept as a one-row DataFrame)
    user_place_record = user_place_data[user_place_data.index == user_id]
    # Rows of all other users
    other_users_place_records = user_place_data[user_place_data.index != user_id]
    # Cosine similarity between the selected user and every other user
    sim_score = cosine_similarity(user_place_record, other_users_place_records)[0]
    # IDs of the other users, in the same order as the scores
    user_indices = other_users_place_records.index
    
    # Label each score with its user ID
    sim_ser = pd.Series(data=sim_score, index=user_indices, name=f'Users are similar to {user_id}')
    # Sort from most similar to least similar
    sim_ser = sim_ser.sort_values(ascending=False)
    # Return the top k users most similar to the selected user
    return sim_ser.iloc[:k]


# Find the top 5 users most similar to user U1001
fetch_sim_users(user_place_data=user_place_df, user_id='U1001', k=5)

userID
U1036    0.417392
U1054    0.417392
U1092    0.404061
U1116    0.397033
U1055    0.395437
Name: Users are similar to U1001, dtype: float64
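
When a full user-to-user similarity table has already been computed, the same top-k lookup can be read from it instead of recomputing similarities on each call. The sketch below shows one way to do this on hypothetical data (`demo_ratings`, `demo_sim_df` and the helper `top_k_from_matrix` are illustrative names, not part of the notebook).

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def top_k_from_matrix(sim_df, user_id, k=5):
    """Read the k most similar users from a precomputed similarity table."""
    # Drop the user's own entry (always 1.0), then keep the k largest scores
    return sim_df.loc[user_id].drop(labels=user_id).nlargest(k)

# Hypothetical user-place matrix, 0 = not rated
demo_ratings = pd.DataFrame(
    [[2, 1, 0, 0],
     [2, 0, 0, 1],
     [0, 0, 2, 2],
     [1, 1, 0, 0]],
    index=["u1", "u2", "u3", "u4"],
    columns=["p1", "p2", "p3", "p4"],
)

# Pairwise user similarity, labelled by user ID on both axes
demo_sim_df = pd.DataFrame(
    cosine_similarity(demo_ratings),
    index=demo_ratings.index,
    columns=demo_ratings.index,
)

top2 = top_k_from_matrix(demo_sim_df, "u1", k=2)
print(top2)
```

Precomputing the table trades memory (users x users) for faster lookups, which may or may not be worthwhile depending on how many users there are.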

In [126]:
# Define a function that builds a table of candidate places for a specific user ID 

def sim_users_rec(userid: str) -> pd.DataFrame:
    """
    Recommendation table of places visited by the most similar users.

    Relies on the notebook-level `user_place_df` and `rating_final` frames.
    Places the given user has already rated are not filtered out.

    Args:
        userid (str): User ID

    Returns:
        pd.DataFrame: One row per unique placeID, indexed by the userID
            of a similar user who rated it
    """

    # IDs of the 5 most similar users (the index of the returned Series)
    sim_users_id = fetch_sim_users(user_place_data=user_place_df, user_id=userid, k=5).index
    
    # All places visited by the similar users
    # isin filters on several user IDs at once
    sim_users_rec_df = rating_final[rating_final['userID'].isin(sim_users_id)][['userID', 'placeID']] 
    
    # Set userID as the index
    sim_users_rec_df = sim_users_rec_df.set_index('userID')
    # Keep one row per placeID (drop_duplicates compares columns only, not the index)
    unique_users_rec_df = sim_users_rec_df.drop_duplicates()
    # Return the candidate places, from which recommendations can be sampled at random
    return unique_users_rec_df

# Try recommending some places for user 'U1001'
# Randomly select 5 places to recommend
sim_users_rec('U1001').sample(5)   

userID,placeID
U1055,135047
U1036,135064
U1055,132572
U1055,132754
U1054,132856
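
Random sampling treats every place visited by a similar user as equally good. A possible refinement, sketched below on hypothetical data, is to score each place by the similarity-weighted sum of the neighbours' ratings and to skip places the target user has already rated. This is a swapped-in variant for illustration, not the method used above: the function `weighted_recommendations`, its parameters, and `rec_matrix` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def weighted_recommendations(rating_matrix, user_id, k=2, n=3):
    """Rank unseen places by the similarity-weighted sum of neighbour ratings.

    rating_matrix: users x places DataFrame with NaN for unrated places.
    """
    filled = rating_matrix.fillna(0)
    # Similarity of the target user to every other user
    sims = pd.Series(
        cosine_similarity(filled.loc[[user_id]], filled)[0],
        index=filled.index,
    ).drop(labels=user_id)
    # The k most similar users and their similarity scores
    neighbours = sims.nlargest(k)
    # Weight each neighbour's ratings by similarity, then sum per place
    scores = filled.loc[neighbours.index].mul(neighbours, axis=0).sum()
    # Keep only places the target user has not rated and that scored above zero
    unseen = rating_matrix.loc[user_id].isna()
    return scores[unseen & (scores > 0)].sort_values(ascending=False).head(n)

# Hypothetical user-place ratings (NaN = not rated)
rec_matrix = pd.DataFrame(
    [[2, 1, np.nan, np.nan, np.nan],
     [2, np.nan, np.nan, 1, 2],
     [np.nan, np.nan, 2, 2, np.nan],
     [1, 1, np.nan, np.nan, 1]],
    index=["u1", "u2", "u3", "u4"],
    columns=["p1", "p2", "p3", "p4", "p5"],
)

recs = weighted_recommendations(rec_matrix, "u1", k=2, n=3)
print(recs)
```

In this toy case the neighbours of `u1` are `u4` and `u2`, so `p5` (rated by both) ranks above `p4` (rated only by `u2`), while `p1` and `p2` are excluded because `u1` has already rated them.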
