<a href="https://colab.research.google.com/github/sahug/ds-bert/blob/main/BERT%20NLP%20-%20Recommender%20System%20using%20BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**BERT NLP - Recommender System using BERT**

Recommends videos that match a user's stated interests by comparing BERT embeddings of video titles with an embedding of the interest text.

**Load Dataset**

In [1]:
import zipfile
path_to_zip_file = "/content/US_videos_data.csv.zip"
directory_to_extract_to = "/content/"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

In [2]:
import pandas as pd
df = pd.read_csv("/content/US_videos_data.csv")

In [3]:
df = df.drop_duplicates(subset = ['title'])

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description
0,0,IhO1FcjDMV4,Meatpacking: Last Week Tonight with John Olive...,2021-02-22T07:30:01Z,UC3XTzVzaHQEd30rQbuvCtTQ,LastWeekTonight,24,21.22.02,[none],1067147,60111,1221,6866,https://i.ytimg.com/vi/IhO1FcjDMV4/default.jpg,False,False,The pandemic has thrown into high relief some ...
1,1,p4Wy84AOzj0,"Best 3D Pen Art Wins $5,000 Challenge! | ZHC C...",2021-02-21T20:43:28Z,UCPAk4rqVIwg1NCXh61VJxbg,ZHC Crafts,26,21.22.02,[none],1047854,50662,690,4879,https://i.ytimg.com/vi/p4Wy84AOzj0/default.jpg,False,False,I can't believe we made art using 3d pens and ...
2,2,4eKXwKDdXYA,100 Days - [Minecraft Superflat],2021-02-20T18:00:01Z,UC9FkeEFIGd9FXRfxpTltXYA,Luke TheNotable,20,21.22.02,luke thenotable|luke|the|notable|luke thenotab...,6133266,372753,7961,39551,https://i.ytimg.com/vi/4eKXwKDdXYA/default.jpg,False,False,This video is intended for audiences 13+ years...
3,3,XHR5mt2gBjo,Amazing! Luke Bryan Calls 15-Year-Old Casey Bi...,2021-02-22T01:53:12Z,UCAMPco9PqjBbI_MLsDOO4Jw,American Idol,24,21.22.02,American Idol|singing competition|Katy Perry|R...,790238,14267,129,1257,https://i.ytimg.com/vi/XHR5mt2gBjo/default.jpg,False,False,Small but MIGHTY! Casey Bishop completely blow...
4,4,C-icyHEb7W4,Game Theory: Did Reddit Just SOLVE FNAF?,2021-02-20T19:05:26Z,UCo_IB5145EVNcf8hw1Kku7w,The Game Theorists,20,21.22.02,fnaf|five nights at freddy's|fnaf 4|fnaf theor...,3248661,225780,2872,31885,https://i.ytimg.com/vi/C-icyHEb7W4/default.jpg,False,False,Get Yourself Some BRAND NEW Theory Wear! ► htt...


In [5]:
ds = df.copy()

**Data Preparation**

In [6]:
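# Keep only the text columns; .copy() avoids a SettingWithCopyWarning when columns are added later.
ds = ds[["title", "description"]].copy()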
ds.head()

Unnamed: 0,title,description
0,Meatpacking: Last Week Tonight with John Olive...,The pandemic has thrown into high relief some ...
1,"Best 3D Pen Art Wins $5,000 Challenge! | ZHC C...",I can't believe we made art using 3d pens and ...
2,100 Days - [Minecraft Superflat],This video is intended for audiences 13+ years...
3,Amazing! Luke Bryan Calls 15-Year-Old Casey Bi...,Small but MIGHTY! Casey Bishop completely blow...
4,Game Theory: Did Reddit Just SOLVE FNAF?,Get Yourself Some BRAND NEW Theory Wear! ► htt...


**Preprocessor and Encoder**

In [7]:
%pip install -qq tensorflow_hub
%pip install -qq tensorflow_text

In [8]:
import tensorflow_hub as hub
# Not called directly: importing tensorflow_text registers the ops the BERT preprocessor needs.
import tensorflow_text as text

In [9]:
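# Text preprocessing model and a small BERT encoder (2 layers, hidden size 128, 2 attention heads) from TF Hub.
preprocessor = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
# trainable=True only matters for fine-tuning; this notebook uses the encoder for inference only.
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1", trainable=True)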

In [10]:
import tensorflow as tf

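def get_bert_embeddings(sentence, preprocessor, encoder):
  """Return the BERT pooled_output embedding, shape (1, hidden_size), for one string."""
  # The parameter is named `sentence` so it does not shadow the tensorflow_text module imported as `text`.
  text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
  encoder_inputs = preprocessor(text_input)
  outputs = encoder(encoder_inputs)
  # Note: a new Keras model is built on every call, which is slow when applied row by row.
  embedding_model = tf.keras.Model(text_input, outputs['pooled_output'])
  sentences = tf.constant([sentence])
  return embedding_model(sentences)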

In [11]:
ds['encodings'] = ds['title'].apply(lambda x: get_bert_embeddings(x, preprocessor, encoder))

In [12]:
ds.head()

Unnamed: 0,title,description,encodings
0,Meatpacking: Last Week Tonight with John Olive...,The pandemic has thrown into high relief some ...,"((tf.Tensor(-0.99999803, shape=(), dtype=float..."
1,"Best 3D Pen Art Wins $5,000 Challenge! | ZHC C...",I can't believe we made art using 3d pens and ...,"((tf.Tensor(-0.9999718, shape=(), dtype=float3..."
2,100 Days - [Minecraft Superflat],This video is intended for audiences 13+ years...,"((tf.Tensor(-0.99999654, shape=(), dtype=float..."
3,Amazing! Luke Bryan Calls 15-Year-Old Casey Bi...,Small but MIGHTY! Casey Bishop completely blow...,"((tf.Tensor(-0.99996674, shape=(), dtype=float..."
4,Game Theory: Did Reddit Just SOLVE FNAF?,Get Yourself Some BRAND NEW Theory Wear! ► htt...,"((tf.Tensor(-0.99998075, shape=(), dtype=float..."


**Finding Similarity**

In [13]:
import re

def preprocess_text(interest):
  # Lowercase the query, then replace each run of non-alphanumeric characters with a single space.
  text = interest.lower()
  text = re.sub(r'[^A-Za-z0-9]+', ' ', text)
  return text
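
As a quick check of what this cleaning step produces, the sketch below re-declares the function so that it runs on its own and applies it to one of the queries used later in this notebook. The other two inputs are illustrative and not from the dataset; they show that a trailing separator leaves a trailing space and that non-ASCII letters are treated as separators.

```python
import re

def preprocess_text(interest):
  # Lowercase, then replace each run of non-alphanumeric characters with one space.
  text = interest.lower()
  return re.sub(r'[^A-Za-z0-9]+', ' ', text)

print(repr(preprocess_text("Action, Hollywood, Thrillers")))  # 'action hollywood thrillers'
print(repr(preprocess_text("Rock & Roll!")))                  # 'rock roll ' (trailing space)
print(repr(preprocess_text("México")))                        # 'm xico' (the accented letter is dropped)
```

Because accented characters are replaced, queries in languages other than English may lose information before they reach the encoder.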

In [14]:
from sklearn.metrics.pairwise import cosine_similarity

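def get_recommendation(interest):
  # Embed the cleaned query with the same preprocessor and encoder used for the titles.
  query_text = preprocess_text(interest)
  query_encoding = get_bert_embeddings(query_text, preprocessor, encoder)
  # Both encodings have shape (1, hidden_size), so cosine_similarity returns a 1x1 matrix.
  ds['similarity_score'] = ds['encodings'].apply(lambda x: cosine_similarity(x, query_encoding)[0][0])
  # Sort so the most similar titles come first.
  result = ds.sort_values(by=['similarity_score'], ascending=False)
  return result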
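
The function above calls `cosine_similarity` once per row. One possible alternative, sketched here under the assumption that the embeddings have been stacked into a single 2-D NumPy array, scores every title in one matrix operation and returns only the top matches. The names `top_k_similar`, `title_vectors` and `query_vector` are hypothetical, and the small hand-written vectors stand in for real BERT embeddings.

```python
import numpy as np

def top_k_similar(title_vectors, query_vector, k=3):
  """Return (indices, scores) of the k rows most cosine-similar to the query."""
  title_vectors = np.asarray(title_vectors, dtype=float)
  query_vector = np.asarray(query_vector, dtype=float).ravel()
  # Cosine similarity = dot product divided by the product of the L2 norms.
  # Assumes no vector is all zeros (that would divide by zero).
  norms = np.linalg.norm(title_vectors, axis=1) * np.linalg.norm(query_vector)
  scores = title_vectors @ query_vector / norms
  order = np.argsort(-scores)[:k]  # highest score first
  return order, scores[order]

# Toy 2-D vectors standing in for embeddings (illustrative only).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
indices, scores = top_k_similar(vectors, [1.0, 0.0], k=2)
print(indices, scores)
```

With real data, `title_vectors` could be built once from the `encodings` column, so each query would need only one encoder call and one matrix product.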

In [15]:
get_recommendation("Action, Hollywood, Thrillers")

Unnamed: 0,title,description,encodings,similarity_score
1234,Tom and Jerry - Movie Review,Head to http://www.squarespace.com/chrisstuckm...,"((tf.Tensor(-0.9999908, shape=(), dtype=float3...",0.978082
1836,Raya and the Last Dragon - Movie Review,Chris Stuckmann reviews Raya and the Last Drag...,"((tf.Tensor(-0.99999577, shape=(), dtype=float...",0.974213
427,Honest Trailers | The Simpsons Movie,►►Subscribe to ScreenJunkies!► https://fandom....,"((tf.Tensor(-0.99999416, shape=(), dtype=float...",0.969347
4039,Jazmine Sullivan - Pick Up Your Feelings (Offi...,“Pick Up Your Feelings” (Official Video) out n...,"((tf.Tensor(-0.99999624, shape=(), dtype=float...",0.967798
150,"Sam Fischer, Demi Lovato - What Other People S...",Sam Fischer & Demi Lovato – What Other People ...,"((tf.Tensor(-0.9999976, shape=(), dtype=float3...",0.965828
...,...,...,...,...
5842,OLYMPIQUE LYONNAIS - PARIS SAINT-GERMAIN (2 - ...,OLYMPIQUE LYONNAIS vs PARIS SAINT-GERMAIN High...,"((tf.Tensor(-0.99995637, shape=(), dtype=float...",0.725659
5820,Resumen y goles | Costa Rica 0-3 México | Preo...,"Gustó, goleó y calificó... El Tri dio un treme...","((tf.Tensor(-0.99995387, shape=(), dtype=float...",0.724430
4209,"🔴 Live: Grammys 2021 Red Carpet | March 14th, ...",Watch interviews with music's biggest stars li...,"((tf.Tensor(-0.9999675, shape=(), dtype=float3...",0.717134
4100,2021 F1 Testing | Day 1 Afternoon Session | Ba...,Watch all of F1 2021 Testing Live on F1 TV - h...,"((tf.Tensor(-0.9999534, shape=(), dtype=float3...",0.708580


In [16]:
get_recommendation("Arsenal, Europa league, Premier league")

Unnamed: 0,title,description,encodings,similarity_score
59,LIVERPOOL 0-2 EVERTON | PREMIER LEAGUE HIGHLIGHTS,Everton recorded a long overdue win at Anfield...,"((tf.Tensor(-0.9998858, shape=(), dtype=float3...",0.948221
5611,Texas vs. Abilene Christian - First Round NCAA...,The opening round finished with another stunne...,"((tf.Tensor(-0.99992967, shape=(), dtype=float...",0.921210
2223,Barcelona’s EPIC comeback! All the action from...,Barcelona might win some silverware yet this s...,"((tf.Tensor(-0.9993373, shape=(), dtype=float3...",0.918895
5608,Loyola Chicago vs. Illinois - Second Round NCA...,The first one seed fell in the 2021 NCAA Tourn...,"((tf.Tensor(-0.99975157, shape=(), dtype=float...",0.913364
135,FC Porto vs. Juventus: Extended Highlights | U...,Presented by HeinekenA bittersweet win for Por...,"((tf.Tensor(-0.999879, shape=(), dtype=float32...",0.912981
...,...,...,...,...
253,What Your Fortnite Skin Says Of You..,What Your Fortnite Skin Says About You.. This ...,"((tf.Tensor(-0.9999835, shape=(), dtype=float3...",0.540596
2829,i'm moving on... (the full explanation),"SIMPLISTIC SATURDAY - it's a sad day, but i ...","((tf.Tensor(-0.99998415, shape=(), dtype=float...",0.537656
5642,He Was The Heart & Soul of The Legion Of Boom....,Get 20% OFF + Free Shipping @Manscaped with co...,"((tf.Tensor(-0.99989814, shape=(), dtype=float...",0.533453
5639,It’s The Fact That Huda Beauty...,Hey guys! Today I’ll be testing the newly refo...,"((tf.Tensor(-0.99998856, shape=(), dtype=float...",0.532366


In [17]:
get_recommendation("Music, Taylor Swift, Imagine Dragons")

Unnamed: 0,title,description,encodings,similarity_score
2023,Tate McRae - slower (Lyric Video),get slower: https://smarturl.it/slowerxstream ...,"((tf.Tensor(-0.99998844, shape=(), dtype=float...",0.972059
4054,Imagine Dragons - Follow You (Lyric Video),"Listen to Follow You + Cutthroat, out now: htt...","((tf.Tensor(-0.99998516, shape=(), dtype=float...",0.963154
4062,Nick Jonas - Spaceman (Official Video),Listen to “Spaceman” at https://NickJonas.lnk....,"((tf.Tensor(-0.99999577, shape=(), dtype=float...",0.961594
423,DRAGON BALL LEGENDS Video and Stuff #12,*Although the video says the new Broly appeare...,"((tf.Tensor(-0.99999416, shape=(), dtype=float...",0.961167
3606,Maroon 5 - Beautiful Mistakes ft. Megan Thee S...,“Beautiful Mistakes” featuring Megan Thee Stal...,"((tf.Tensor(-0.99999565, shape=(), dtype=float...",0.959511
...,...,...,...,...
4214,What Happened to My Nail,Would have rather broken my jaw than a nail tb...,"((tf.Tensor(-0.9999665, shape=(), dtype=float3...",0.730034
5007,Where have I been?,,"((tf.Tensor(-0.99996555, shape=(), dtype=float...",0.729173
17,i broke my foot...,lol oops,"((tf.Tensor(-0.99987155, shape=(), dtype=float...",0.718878
613,How Reddit almost CRASHED the Economy with a m...,The GameStop short squeeze has probably been o...,"((tf.Tensor(-0.99996406, shape=(), dtype=float...",0.715393
