## Movie Recommendation System with LLaMA AI

#### What is LLaMA? 
#### LLaMA (Large Language Model Meta AI) is a family of large language models released by Meta as part of its ongoing research in natural language processing and machine learning.
###### Tutorial Reference: https://www.youtube.com/watch?v=epidA1fBFtI&t=52s
###### Kaggle Dataset: https://www.kaggle.com/datasets/shivamb/netflix-shows

##### Install the model with the command: `ollama pull llama2`

In [1]:
# pip3 install numpy pandas faiss-gpu requests
import pandas as pd
import faiss 
import requests 
import numpy as np

df = pd.read_csv('netflix_titles.csv')
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [2]:
def create_textual_representation(row):
    textual_representation = f"""
    Type: {row['type']},
    Title: {row['title']}, 
    Cast: {row['cast']},
    Released {row['release_year']},
    Genres: {row['listed_in']},
    Description: {row['description']},
    """
    return textual_representation

In [3]:
# Apply the function 'create_textual_representation' to each row of the DataFrame 'df'
# and assign the result to a new column called 'textual_representation'.
# The 'axis=1' argument specifies that the function should be applied to each row.
df['textual_representation'] = df.apply(create_textual_representation, axis=1)

In [4]:
print(df['textual_representation'].values[0])


    Type: Movie,
    Title: Dick Johnson Is Dead, 
    Cast: nan,
    Released 2020,
    Genres: Documentaries,
    Description: As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.,
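
The output above shows `Cast: nan` for a title with no cast listed, so the literal string "nan" ends up in the text that is later embedded. A minimal sketch of one way to avoid that, assuming missing fields should simply be omitted. The names `FIELDS`, `is_missing`, and `create_clean_representation` are hypothetical and not part of the original notebook, and this variant emits one `Label: value` line per field rather than the exact layout used above:

```python
import math

# Hypothetical mapping of output labels to the notebook's column names.
FIELDS = [
    ("Type", "type"),
    ("Title", "title"),
    ("Cast", "cast"),
    ("Released", "release_year"),
    ("Genres", "listed_in"),
    ("Description", "description"),
]

def is_missing(value):
    """True for None and for float NaN (how pandas marks empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def create_clean_representation(row):
    """Build the text for one row, skipping fields whose value is missing."""
    lines = [
        f"{label}: {row[column]}"
        for label, column in FIELDS
        if not is_missing(row[column])
    ]
    return "\n".join(lines)
```

Because it only indexes `row[column]`, the function should work both with a plain dict and with `df.apply(create_clean_representation, axis=1)`.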
    


In [5]:
dim = 4096  # length of each embedding vector; must match the size the model returns
index = faiss.IndexFlatL2(dim)  # FAISS index that compares embeddings by L2 (Euclidean) distance
size = len(df['textual_representation'])
X = np.zeros((size, dim), dtype='float32')  # zero-filled array of shape (size, dim) that will hold the embeddings

print(X)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
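
To make concrete what a flat L2 index computes, here is a small NumPy sketch of exhaustive nearest-neighbor search. It is an illustration only, not how FAISS is implemented, and `l2_search` and the demo arrays are hypothetical. Like `IndexFlatL2`, it reports squared Euclidean distances:

```python
import numpy as np

def l2_search(X, queries, k):
    """Exhaustive search: for every query row, return the squared L2
    distances and row indices of the k nearest rows of X."""
    # Broadcasting gives diff the shape (n_queries, n_vectors, dim).
    diff = queries[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    # Sort each query's distances and keep the k smallest.
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx

# Made-up 2-D vectors, small enough to check by hand.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]], dtype="float32")
q_demo = np.array([[0.9, 0.1]], dtype="float32")
D_demo, I_demo = l2_search(X_demo, q_demo, k=2)
print(I_demo)
```

The broadcasted difference array grows with `n_queries * n_vectors * dim`, so this sketch only suits small demos; the FAISS index is what handles the full dataset.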


In [None]:
try:
    for i, representation in enumerate(df['textual_representation']):
        if i % 200 == 0:
            # prints progress every 200 instances
            print(f'Processed {i} instances')
        # sends the text to the local embeddings API and stores the returned vector in row i of X
        res = requests.post('http://localhost:8888/api/embeddings', json={'model': 'llama2', 'prompt': representation})
        res.raise_for_status()  # raise on HTTP errors instead of failing later on a missing key
        embedding = res.json()['embedding']
        X[i] = np.array(embedding)

    # adds all embeddings to the index once every row of X has been filled
    index.add(X)
    print('Processing completed')
except Exception as e:
    print(f'Processing failed: {e}')
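
The loop above stops at the first failed request and leaves the remaining rows of `X` as zeros. One possible alternative, sketched under the assumption that you would rather record failures and carry on, takes the embedding call as a parameter. `embed_all` and `embed_fn` are hypothetical names; in this notebook `embed_fn` would wrap the `requests.post` call:

```python
import numpy as np

def embed_all(texts, embed_fn, dim):
    """Embed every text with embed_fn and return (matrix, kept, failed).

    embed_fn is any callable mapping a string to a sequence of `dim` floats.
    `kept` lists the original positions of the rows stored in `matrix`;
    `failed` lists positions whose call raised or returned the wrong length.
    """
    rows, kept, failed = [], [], []
    for i, text in enumerate(texts):
        try:
            vec = np.asarray(embed_fn(text), dtype="float32")
            if vec.shape != (dim,):
                raise ValueError(f"expected {dim} values, got shape {vec.shape}")
            rows.append(vec)
            kept.append(i)
        except Exception as exc:
            print(f"Row {i} failed: {exc}")
            failed.append(i)
    matrix = np.vstack(rows) if rows else np.zeros((0, dim), dtype="float32")
    return matrix, kept, failed
```

Since failed rows are left out of `matrix`, row `j` of the index then corresponds to DataFrame position `kept[j]`, so search results need to be mapped back through `kept`.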

In [None]:
faiss.write_index(index, 'index')  # saves the FAISS index to a file named 'index' on disk

In [None]:
index = faiss.read_index('index') # loads index from file named 'index'

In [None]:
keyword = 'Shutter'
# Shows the rows of df whose 'title' column contains the keyword
print(df[df['title'].str.contains(keyword)])

show_id = 1200  # Positional row index into the full DataFrame (df.iloc ignores the filter above)
fav_movie = df.iloc[show_id]
print(fav_movie)

# Sends a POST request to the local API to get the embedding for the textual representation
response = requests.post('http://localhost:8888/api/embeddings', json = {'model':'llama2', 'prompt': fav_movie['textual_representation']})
# Converts the embedding in the response JSON into a float32 NumPy array of shape (1, dim),
# the 2-D layout that FAISS expects for queries
embedding = np.array([response.json()['embedding']], dtype='float32')

In [None]:
# Searches the FAISS index for the 5 closest matches to the given embedding.
# D contains the distances, and I contains the indices of the top matches.
# FAISS expects a 2-D query array, so reshape to (1, dim) in case the embedding is 1-D.
D, I = index.search(embedding.reshape(1, -1), 5)
print(I)

# Retrieves the textual representations corresponding to the top match indices.
best_matches = np.array(df['textual_representation'])[I.flatten()]

for match in best_matches:
    print(f'MATCH:\n{match}\n')
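
Because the favorite title is itself stored in the index, the nearest neighbor returned is typically that same row at distance 0. A small sketch of filtering it out, assuming you request one extra neighbor and know the row position used for the query. `drop_query_row` is a hypothetical helper and the demo values are made up:

```python
import numpy as np

def drop_query_row(distances, indices, query_pos, k):
    """Remove query_pos from one row of search results and keep the k best.

    distances and indices are the 1-D result rows for a single query
    (for example D[0] and I[0]), already sorted by increasing distance.
    """
    keep = indices != query_pos
    return distances[keep][:k], indices[keep][:k]

# Made-up search results for a query taken from row 1200.
D_row = np.array([0.0, 1.5, 2.0, 2.7], dtype="float32")
I_row = np.array([1200, 17, 503, 42])
top_d, top_i = drop_query_row(D_row, I_row, query_pos=1200, k=3)
print(top_i)
```

In this notebook it could be called with `D[0]`, `I[0]`, and the row position passed to `df.iloc`, after searching for 6 neighbors instead of 5.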