# Recommendations based on Jaccard index
The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic for gauging the similarity and diversity of sample sets. For two finite sets, it is defined as the size of their intersection divided by the size of their union, so it ranges from 0 (no shared elements) to 1 (identical sets):

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
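
As a small worked example of the formula, the sketch below computes the index for two hand-made sets of book titles. The helper name `jaccard` and the sample data are illustrative assumptions, not part of the dataset. Returning 0.0 for two empty sets is also a choice made here, since the formula is undefined in that case.

```python
def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        # Assumption: treat two empty sets as having no similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical reading lists for two users
alice = {"Dune", "Emma", "Ulysses", "Beloved"}
bob = {"Dune", "Emma", "Hamlet"}

# 2 shared titles out of 5 distinct titles overall
print(jaccard(alice, bob))  # 0.4
```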

Your objective is to create a Python function that uses the Jaccard index to find users with similar interests and recommend content from them. You will then embed this function in the Streamlit app.


### 1. Loading the data
Load a subset of the ratings dataset. The code below relies on its `User-ID`, `ISBN`, and `Book-Rating` columns.

In [1]:
import pandas as pd
df = pd.read_csv('./data/BX-Book-Ratings-Subset.csv', sep=';', encoding='latin-1')

### 2. Generating the recommendations

In [None]:
import itertools

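# ID of the user we want recommendations for
user_id = 98783

def get_jaccard_recommendations(user_id, threshold=0.3):
  # collect the set of rated ISBNs per user
  users = df.groupby('User-ID')['ISBN'].apply(set)

  # the target user's books are the same in every comparison, so look them up once
  a = users[user_id]

  new_content = []
  similar_users = []

  for user, b in users.items():
    # Jaccard index: size of the intersection divided by the size of the union
    j = len(a & b) / len(a | b)

    # tweak the threshold: the closer to 0.0, the more diverse the recommendations.
    # j == 1.0 means identical sets (including the user themselves), so nothing new to recommend
    if j > threshold and j != 1.0:
      # books the similar user has rated that the target user has not
      new_content.append(b - a)
      similar_users.append(user)

  # flatten the list of sets into a single list of ISBNs
  new_content = list(itertools.chain.from_iterable(new_content))

  df_recommendations = df[df['User-ID'].isin(similar_users) & df['ISBN'].isin(new_content)]

  # sort_values returns a new DataFrame, so the result must be assigned
  df_recommendations = df_recommendations.sort_values('Book-Rating', ascending=False)

  return df_recommendations

get_jaccard_recommendations(user_id)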
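
To see how the threshold changes which users count as similar, the following sketch applies the same logic to a tiny hand-made dictionary instead of the ratings file. The user IDs, book codes, and the helper `similar_users` are made up for illustration, and the sketch does not depend on pandas.

```python
def similar_users(ratings, target, threshold):
    """Return {user: new_books} for users whose Jaccard index with
    `target` is above `threshold` but below 1.0."""
    a = ratings[target]
    result = {}
    for user, b in ratings.items():
        j = len(a & b) / len(a | b)
        if threshold < j < 1.0:
            # Books this user has rated that the target has not
            result[user] = b - a
    return result


# Hypothetical data: user ID -> set of rated books
ratings = {
    1: {"A", "B", "C", "D"},
    2: {"A", "B", "C", "E"},       # J = 3/5 = 0.6
    3: {"A", "F", "G"},            # J = 1/6, about 0.17
    4: {"B", "C", "D", "H", "I"},  # J = 3/6 = 0.5
}

print(similar_users(ratings, 1, 0.3))   # users 2 and 4 qualify
print(similar_users(ratings, 1, 0.55))  # only user 2 qualifies
```

A lower threshold admits more users and therefore more varied recommendations, while a higher threshold keeps only the closest matches.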