## YouTube Recomendation system

### Content-based filtering:
### is a recommendation system that makes recommendations to users based on their past interactions with content. The idea behind content-based filtering is to recommend content that is similar to what the user has already shown an interest in.

In [1]:
#pip install --upgrade pandas==1.3.4

In [2]:
import joblib
import pickle

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk.corpus import stopwords
import re
import nltk
from surprise import Reader, Dataset, SVD, SVDpp, model_selection, NormalPredictor, KNNBasic, KNNWithMeans, KNNWithZScore, KNNBaseline, BaselineOnly, NMF, SlopeOne, CoClustering, accuracy
from surprise.accuracy import rmse
from surprise.model_selection import cross_validate, train_test_split

# Set up inline plotting
%matplotlib inline

# Silence warnings
import warnings
warnings.simplefilter('ignore')

# Uncomment if needed:
# nltk.download('stopwords')

In [4]:
pd.__version__

'1.3.4'

In [5]:
youtube= pd.read_csv("IN_youtube_trending_data.csv")

In [6]:
youtube=youtube[:30000]

In [7]:
youtube.head(2)

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description
0,Iot0eF6EoNA,Sadak 2 | Official Trailer | Sanjay | Pooja | ...,2020-08-12T04:31:41Z,UCGqvJPRcv7aVFun-eTsatcA,FoxStarHindi,24,2020-08-12T00:00:00Z,sadak|sadak 2|mahesh bhatt|vishesh films|pooja...,9885899,224925,3979409,350210,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,False,False,Three Streams. Three Stories. One Journey. Sta...
1,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tan...,2020-08-11T09:00:11Z,UCm9SZAl03Rev9sFwloCdz1g,Rehaan Records,10,2020-08-12T00:00:00Z,[None],11308046,655450,33242,405146,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,False,False,Singer/Lyrics: Karan Aujla Feat Tania Music/ D...


In [8]:
quartiles = youtube['likes'].quantile([0, 0.2, 0.4, 0.6, 0.8, 1.0])

In [9]:
# Create a new column for ratings based on the quartiles
youtube['Rating'] = pd.cut(youtube['likes'], bins=quartiles, labels=[0,1,2,3,4])

In [10]:
youtube.head(5)

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description,Rating
0,Iot0eF6EoNA,Sadak 2 | Official Trailer | Sanjay | Pooja | ...,2020-08-12T04:31:41Z,UCGqvJPRcv7aVFun-eTsatcA,FoxStarHindi,24,2020-08-12T00:00:00Z,sadak|sadak 2|mahesh bhatt|vishesh films|pooja...,9885899,224925,3979409,350210,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,False,False,Three Streams. Three Stories. One Journey. Sta...,4
1,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tan...,2020-08-11T09:00:11Z,UCm9SZAl03Rev9sFwloCdz1g,Rehaan Records,10,2020-08-12T00:00:00Z,[None],11308046,655450,33242,405146,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,False,False,Singer/Lyrics: Karan Aujla Feat Tania Music/ D...,4
2,KX06ksuS6Xo,Diljit Dosanjh: CLASH (Official) Music Video |...,2020-08-11T07:30:02Z,UCZRdNleCgW-BGUJf-bbjzQg,Diljit Dosanjh,10,2020-08-12T00:00:00Z,clash diljit dosanjh|diljit dosanjh|diljit dos...,9140911,296533,6179,30058,https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg,False,False,CLASH official music video performed by DILJIT...,4
3,UsMRgnTcchY,Dil Ko Maine Di Kasam Video | Amaal M Ft.Ariji...,2020-08-10T05:30:49Z,UCq-Fj5jknLsUf-MWSy4_brA,T-Series,10,2020-08-12T00:00:00Z,hindi songs|2020 hindi songs|2020 new songs|t-...,23564512,743931,84162,136942,https://i.ytimg.com/vi/UsMRgnTcchY/default.jpg,False,False,Gulshan Kumar and T-Series presents Bhushan Ku...,4
4,WNSEXJJhKTU,"Baarish (Official Video) Payal Dev,Stebin Ben ...",2020-08-11T05:30:13Z,UCye6Oz0mg46S362LwARGVcA,VYRLOriginals,10,2020-08-12T00:00:00Z,VYRL Original|Mohsin Khan|Shivangi Joshi|Payal...,6783649,268817,8798,22984,https://i.ytimg.com/vi/WNSEXJJhKTU/default.jpg,False,False,VYRL Originals brings to you ‘Baarish’ - the b...,4


In [11]:
youtube['Rating'].unique()

[4, 2, 1, 3, 0, NaN]
Categories (5, int64): [0 < 1 < 2 < 3 < 4]

In [12]:
youtube.dropna(inplace=True)

In [13]:
# Get the number of rows in the youtube DataFrame
num_rows = len(youtube)

# Create a list of values for the new column
values = list(range(1, num_rows + 1))

# Assign the new column to 'userID' in the youtube DataFrame
youtube['userID'] = values

In [14]:
youtube.shape

(29400, 18)

In [15]:
youtube.columns

Index(['video_id', 'title', 'publishedAt', 'channelId', 'channelTitle',
       'categoryId', 'trending_date', 'tags', 'view_count', 'likes',
       'dislikes', 'comment_count', 'thumbnail_link', 'comments_disabled',
       'ratings_disabled', 'description', 'Rating', 'userID'],
      dtype='object')

In [16]:
youtube.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29400 entries, 0 to 29999
Data columns (total 18 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   video_id           29400 non-null  object  
 1   title              29400 non-null  object  
 2   publishedAt        29400 non-null  object  
 3   channelId          29400 non-null  object  
 4   channelTitle       29400 non-null  object  
 5   categoryId         29400 non-null  int64   
 6   trending_date      29400 non-null  object  
 7   tags               29400 non-null  object  
 8   view_count         29400 non-null  int64   
 9   likes              29400 non-null  int64   
 10  dislikes           29400 non-null  int64   
 11  comment_count      29400 non-null  int64   
 12  thumbnail_link     29400 non-null  object  
 13  comments_disabled  29400 non-null  bool    
 14  ratings_disabled   29400 non-null  bool    
 15  description        29400 non-null  object  
 16  Rati

In [17]:
YouTube_df = youtube[['userID','video_id','title','channelTitle','categoryId','tags','likes','description','channelId','thumbnail_link','Rating']]

In [18]:
YouTube_df.head(2)

Unnamed: 0,userID,video_id,title,channelTitle,categoryId,tags,likes,description,channelId,thumbnail_link,Rating
0,1,Iot0eF6EoNA,Sadak 2 | Official Trailer | Sanjay | Pooja | ...,FoxStarHindi,24,sadak|sadak 2|mahesh bhatt|vishesh films|pooja...,224925,Three Streams. Three Stories. One Journey. Sta...,UCGqvJPRcv7aVFun-eTsatcA,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4
1,2,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tan...,Rehaan Records,10,[None],655450,Singer/Lyrics: Karan Aujla Feat Tania Music/ D...,UCm9SZAl03Rev9sFwloCdz1g,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4


In [19]:
YouTube_df.title

0        Sadak 2 | Official Trailer | Sanjay | Pooja | ...
1        Kya Baat Aa : Karan Aujla (Official Video) Tan...
2        Diljit Dosanjh: CLASH (Official) Music Video |...
3        Dil Ko Maine Di Kasam Video | Amaal M Ft.Ariji...
4        Baarish (Official Video) Payal Dev,Stebin Ben ...
                               ...                        
29995    India celebrate a win for the ages at the Gabb...
29996    Jethalal Ke Haath Ki Chai | Taarak Mehta Ka Oo...
29997    Bruised, abused but conquered! India stun Aust...
29998    Kumkum Bhagya | Premiere Episode 1751 Preview ...
29999    SISTERS Season 2 | Episode 5 | Girl Formula | ...
Name: title, Length: 29400, dtype: object

In [20]:
YouTube_df["title"] = YouTube_df["title"].str.split('|').str[0].str.strip()

In [21]:
YouTube_df.head(2)

Unnamed: 0,userID,video_id,title,channelTitle,categoryId,tags,likes,description,channelId,thumbnail_link,Rating
0,1,Iot0eF6EoNA,Sadak 2,FoxStarHindi,24,sadak|sadak 2|mahesh bhatt|vishesh films|pooja...,224925,Three Streams. Three Stories. One Journey. Sta...,UCGqvJPRcv7aVFun-eTsatcA,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4
1,2,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tania,Rehaan Records,10,[None],655450,Singer/Lyrics: Karan Aujla Feat Tania Music/ D...,UCm9SZAl03Rev9sFwloCdz1g,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4


In [22]:
YouTube_df['tags'] = YouTube_df['tags'].str.replace('|', ' ')

In [23]:
YouTube_df.head(3)

Unnamed: 0,userID,video_id,title,channelTitle,categoryId,tags,likes,description,channelId,thumbnail_link,Rating
0,1,Iot0eF6EoNA,Sadak 2,FoxStarHindi,24,sadak sadak 2 mahesh bhatt vishesh films pooja...,224925,Three Streams. Three Stories. One Journey. Sta...,UCGqvJPRcv7aVFun-eTsatcA,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4
1,2,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tania,Rehaan Records,10,[None],655450,Singer/Lyrics: Karan Aujla Feat Tania Music/ D...,UCm9SZAl03Rev9sFwloCdz1g,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4
2,3,KX06ksuS6Xo,Diljit Dosanjh: CLASH (Official) Music Video,Diljit Dosanjh,10,clash diljit dosanjh diljit dosanjh diljit dos...,296533,CLASH official music video performed by DILJIT...,UCZRdNleCgW-BGUJf-bbjzQg,https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg,4


In [24]:
YouTube_df.isnull().sum()

userID            0
video_id          0
title             0
channelTitle      0
categoryId        0
tags              0
likes             0
description       0
channelId         0
thumbnail_link    0
Rating            0
dtype: int64

In [25]:
YouTube_df.dropna(inplace=True)

In [26]:
YouTube_df.drop_duplicates(subset=['title'], inplace=True)

In [27]:
#YouTube_df['description'].apply(lambda x:x.split())
YouTube_df['description'][1]#apply(lambda x:x.split())

'Singer/Lyrics: Karan Aujla Feat Tania Music/ Desi Crew Mix & Master / Dc Studio’sVideo/ Sukh Sanghera Project By / Deep Rehaan Sukh Bajwa & Jeewan Chahal Produced By / Sandeep RehaanLabel / Rehaan RecordsOnline Promotions - Global Digital SolutionDigital Partner / Coin Digital Social media Promotions: Gk DigitialSpotify: https://spoti.fi/3kDfnNuYoutube Music: https://bit.ly/31N2tUkGaana: https://bit.ly/2PHrUkLhttps://wynk.in/u/SSWNC5VO3Website: WWW.RehaanRecords.CAFB: https://m.facebook.com/RehaanRecords/INSTA: Instagram/rehaanrecords'

In [28]:
YouTube_df.head(2)

Unnamed: 0,userID,video_id,title,channelTitle,categoryId,tags,likes,description,channelId,thumbnail_link,Rating
0,1,Iot0eF6EoNA,Sadak 2,FoxStarHindi,24,sadak sadak 2 mahesh bhatt vishesh films pooja...,224925,Three Streams. Three Stories. One Journey. Sta...,UCGqvJPRcv7aVFun-eTsatcA,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4
1,2,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tania,Rehaan Records,10,[None],655450,Singer/Lyrics: Karan Aujla Feat Tania Music/ D...,UCm9SZAl03Rev9sFwloCdz1g,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4


In [29]:
YouTube_df['joint_tags'] = YouTube_df['channelTitle'] + YouTube_df['description'] + YouTube_df['tags'] 


In [30]:
YouTube_df['joint_tags'][:5]

0    FoxStarHindiThree Streams. Three Stories. One ...
1    Rehaan RecordsSinger/Lyrics: Karan Aujla Feat ...
2    Diljit DosanjhCLASH official music video perfo...
3    T-SeriesGulshan Kumar and T-Series presents Bh...
4    VYRLOriginalsVYRL Originals brings to you ‘Baa...
Name: joint_tags, dtype: object

In [31]:
YouTube_df.isnull().sum()

userID            0
video_id          0
title             0
channelTitle      0
categoryId        0
tags              0
likes             0
description       0
channelId         0
thumbnail_link    0
Rating            0
joint_tags        0
dtype: int64

In [32]:
#YouTube_df['joint_tags']=YouTube_df['joint_tags'].apply(lambda x:x.split())

In [33]:
YouTube_df['joint_tags'][1]

'Rehaan RecordsSinger/Lyrics: Karan Aujla Feat Tania Music/ Desi Crew Mix & Master / Dc Studio’sVideo/ Sukh Sanghera Project By / Deep Rehaan Sukh Bajwa & Jeewan Chahal Produced By / Sandeep RehaanLabel / Rehaan RecordsOnline Promotions - Global Digital SolutionDigital Partner / Coin Digital Social media Promotions: Gk DigitialSpotify: https://spoti.fi/3kDfnNuYoutube Music: https://bit.ly/31N2tUkGaana: https://bit.ly/2PHrUkLhttps://wynk.in/u/SSWNC5VO3Website: WWW.RehaanRecords.CAFB: https://m.facebook.com/RehaanRecords/INSTA: Instagram/rehaanrecords[None]'

In [34]:
new = YouTube_df.drop(columns=['channelTitle','categoryId','tags','description','channelId'])#'likes'
#new.head()

In [35]:
# new['joint_tags'] = new['joint_tags'].apply(lambda x: " ".join(x))
new.head()

Unnamed: 0,userID,video_id,title,likes,thumbnail_link,Rating,joint_tags
0,1,Iot0eF6EoNA,Sadak 2,224925,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4,FoxStarHindiThree Streams. Three Stories. One ...
1,2,x-KbnJ9fvJc,Kya Baat Aa : Karan Aujla (Official Video) Tania,655450,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4,Rehaan RecordsSinger/Lyrics: Karan Aujla Feat ...
2,3,KX06ksuS6Xo,Diljit Dosanjh: CLASH (Official) Music Video,296533,https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg,4,Diljit DosanjhCLASH official music video perfo...
3,4,UsMRgnTcchY,Dil Ko Maine Di Kasam Video,743931,https://i.ytimg.com/vi/UsMRgnTcchY/default.jpg,4,T-SeriesGulshan Kumar and T-Series presents Bh...
4,5,WNSEXJJhKTU,"Baarish (Official Video) Payal Dev,Stebin Ben",268817,https://i.ytimg.com/vi/WNSEXJJhKTU/default.jpg,4,VYRLOriginalsVYRL Originals brings to you ‘Baa...


In [36]:
from transformers import AutoTokenizer, AutoModel
import torch
import numpy as np
import pandas as pd

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [37]:
def generate_embeddings(texts, max_seq_length=256):
    """Generate embeddings for a list of texts and return as a NumPy array."""
    embeddings = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", padding='max_length', truncation=True, max_length=max_seq_length)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = model(**inputs)
        embedding = outputs.last_hidden_state.mean(dim=1).cpu().numpy()
        embeddings.append(embedding)
    return np.vstack(embeddings)


In [38]:
# Apply the function to the entire column and store the results 
embeddings_array = generate_embeddings(new['joint_tags'])

# embeddings_array now contains your embeddings
print(embeddings_array)

[[ 0.1060425  -0.06211875  0.37668958 ...  0.05721436  0.09904254
  -0.250424  ]
 [ 0.16316964 -0.11441816  0.33752966 ...  0.07034065  0.00362955
  -0.21658388]
 [ 0.07894272 -0.07638603  0.34717637 ...  0.11601932  0.08807094
  -0.33818817]
 ...
 [ 0.06445204 -0.1905932   0.4141702  ... -0.00462532 -0.05667549
  -0.01262415]
 [ 0.10386483 -0.10505175  0.2714658  ... -0.05957988  0.10783472
  -0.1528971 ]
 [ 0.04888593  0.03631538  0.5183073  ...  0.01218523  0.01840801
   0.06771469]]


In [39]:
embeddings_array.shape

(7312, 768)

In [40]:
# from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
# # Define additional parameters
# ngram_range = (2,3)
# max_features = 4000

# # Create the TF-IDF vectorizer with the specified parameters
# tfidf_vectorizer = TfidfVectorizer(
#     ngram_range=ngram_range,
#     max_features=max_features
# )


In [41]:
# vector = tfidf_vectorizer.fit_transform(new['cleen_joint_tags']).toarray()

In [42]:
from sklearn.preprocessing import MinMaxScaler ,StandardScaler
# Apply min-max scaling to your data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(embeddings_array)

In [43]:
scaled_data

array([[-0.02449328,  0.03028919, -0.03922281, ...,  0.66897017,
         0.6567135 , -0.5929439 ],
       [ 0.29304612, -0.3708813 , -0.24385814, ...,  0.78518724,
        -0.15469514, -0.39128333],
       [-0.1751266 , -0.07915008, -0.19344798, ...,  1.1896156 ,
         0.5634091 , -1.1159495 ],
       ...,
       [-0.25567257, -0.95519346,  0.15663712, ...,  0.12145597,
        -0.6675396 ,  0.8241567 ],
       [-0.0365978 , -0.29903483, -0.5890835 , ..., -0.36509895,
         0.73148364, -0.01176011],
       [-0.34219632,  0.7853431 ,  0.7008194 , ...,  0.27029264,
        -0.02901652,  1.3029132 ]], dtype=float32)

In [44]:
from sklearn.decomposition import PCA

pca = PCA(n_components=0.99)

vector_reduced = pca.fit_transform(scaled_data)

In [45]:
vector_reduced.shape

(7312, 429)

In [46]:
from sklearn.neighbors import NearestNeighbors

# Assuming 'X' is your feature matrix and 'k' is the number of neighbors to consider
knn_model = NearestNeighbors(n_neighbors=6, metric='cosine')
knn_model.fit(vector_reduced)


In [47]:
new[new['title'] == 'Dil Ko Maine Di Kasam Video'].index[0]

3

### Prediction 

In [48]:
def recommend(movie):
    index = new[new['title'] == movie].index[0]
    query_vector = vector_reduced[index].reshape(1, -1)  # Reshape the query vector for KNN
    distances, indices = knn_model.kneighbors(query_vector)

    # 'indices' contains the indices of the k-nearest neighbors, including the query itself
    # 'distances' contains the corresponding distances/similarities

    # Print the top 5 recommended movies (excluding the query itself)
    recommended_movies = new.iloc[indices[0][1:6]]
    for movie in recommended_movies['title']:
        print(movie)

In [49]:
with open("KNN_model.pkl", "wb") as f: 
    joblib.dump(knn_model, f) 

In [50]:
recommend('Diljit Dosanjh: CLASH (Official) Music Video')

PEED: Diljit Dosanjh (Official) Music Video
Peed Diljit Dosanjh Lyrical Video Song
Diljit Dosanjh: Born To Shine (Official Music Video) G.O.A.T
Diljit Dosanjh: Welcome To My Hood (Official Music Video)
ASLE - Gurman Sandhu


In [51]:
recommend('NEW! Ep 2967 - Jethalal Drops Babita!')

Noodle Party!
NEW! Ep 3021 - Gokuldham Mein Disturbance
NEW! Ep 2997 - Police in Gokuldham!
NEW! Ep 2974 - Taraak Mehta Gets Late!
NEW! Ep 2993 - Special Juice for Goli!


In [52]:
recommend("first Crazy RIDE on Kawasaki NINJA H2")

Finally bought my DreamBike - Ninja H2
bass Life mai HAYABUSA honi chaheye
I Surprised my Friends With NINJA H2!!
No Competition : Jass Manak Ft DIVINE (Full Video) Satti Dhillon
Yaara Tere Warga : Jass Manak (Official Video) Sunidhi Chauhan


In [53]:
recommend('IPL 2020 - Patanjali IPL As Sponsor With 10 Big News')

Top of Switzerland 🇨🇭
My German Girlfriend's Home
Top of the Island Sunset
Best Cinema in the World!
Taarak Mehta Ka Ooltah Chashmah - Ep 2909 - Full Episode - 20th January 2020


### Some advantages of content-based filtering are that it does not rely on other users' behavior, it can handle new items without a large user base, and it can provide personalized recommendations even for niche interests. However, one limitation is that it may not be able to capture the diversity of user interests, as it tends to recommend similar items to what the user has already consumed.

In [54]:
import pickle
with open("Embedding_vector.pkl", "wb") as f: 
    joblib.dump(vector_reduced, f) 

## colaborative recomendation ststem

### Scikit-learn's Surprise library is a popular Python library for building collaborative filtering recommendation systems. Here's a general overview of how to build a collaborative recommendation system using the Surprise library:

#### Data preparation: The first step is to prepare your data in a format that can be used by the Surprise library. This typically involves creating a user-item matrix, where each row represents a user and each column represents an item. The values in the matrix represent the user's rating or interaction with the item.

In [55]:
new = new.reset_index()
titles = new[['title']]
indices = pd.Series(new.index, index=new['title'])

In [56]:
indices[100:150]

title
TSP's Rabish ki report Unlock 2.0                                                                       100
BADSHAH – BKL (Official Lyrical Video)                                                                  101
அட இவ்வளவு நாள் இது தெரியாம போச்சே                                                                      102
GEOMETRIC SHAPE FOOD CHALLENGE                                                                          103
DIPIKA’ S BIRTHDAY CELEBRATION                                                                          104
DNA: Russia August 12 को लॉन्च करेगा Corona Vaccine                                                     105
MASTER CHEF                                                                                             106
We Made 4 Wheeler Quad-Cycle                                                                            107
Yeh Rishta Kya Kehlata Hai: Meet Naira's daughter!                                                      108
રણછોડ રંગીલા ( ગોવાળીય

In [57]:
#pickle.dump(indices,open('indices.pkl','wb'))
with open("indices.pkl", "wb") as f: 
    joblib.dump(indices, f) 

### Create the dataset: Use Surprise's Dataset module to create a dataset object from your Pandas DataFrame.

In [58]:
reader=Reader(rating_scale=(0, 4))
data = Dataset.load_from_df(youtube[['categoryId','video_id','Rating']], reader)#  'categoryId','video_id','channelId'

### Select an algorithm: Choose a collaborative filtering algorithm to use with the Surprise library. Some popular algorithms include Singular Value Decomposition (SVD), k-Nearest Neighbors (k-NN), and Non-negative Matrix Factorization (NMF).

In [59]:
benchmark = []
# Iterate over all algorithms
for algorithm in [SVD(), SVDpp(), SlopeOne(), NMF(), NormalPredictor(), KNNBaseline(), KNNBasic()]:
    # Perform cross validation
    results = cross_validate(algorithm, data, measures=['MAE'], verbose=False,cv = 3)  # cv = 1 onluy try
    
    # Get results & append algorithm name
    tmp = pd.DataFrame.from_dict(results).mean(axis=0)
    tmp = tmp.append(pd.Series([str(algorithm).split(' ')[0].split('.')[-1]], index=['Algorithm']))
    benchmark.append(tmp)

Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.


In [60]:
surprise_results = pd.DataFrame(benchmark).set_index('Algorithm').sort_values('test_mae')

In [61]:
surprise_results

Unnamed: 0_level_0,test_mae,fit_time,test_time
Algorithm,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
KNNBaseline,0.271471,0.052099,0.131551
SlopeOne,0.283524,4.55669,18.443906
KNNBasic,0.283847,0.007494,0.10206
NMF,0.293733,0.702378,0.062654
SVDpp,0.346799,162.522396,46.896243
SVD,0.426704,0.387405,0.362012
NormalPredictor,1.516644,0.031652,0.047435


### Hyperparameter tuning: Finally, you may want to fine-tune the hyperparameters of your model to improve its performance. The Surprise library includes functions for performing grid search or random search to find the optimal hyperparameters for your model.

In [62]:
from surprise import Dataset, SVD
from surprise.model_selection import GridSearchCV

In [63]:
knn = KNNBaseline()
param_grid = {"n_epochs": [250,500,650],"k" :[10,20,30,40,50]}
gs = GridSearchCV(KNNBaseline, param_grid, measures=["rmse", "mae"], cv=3)

In [64]:
gs.fit(data)

Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matr

In [65]:

# combination of parameters that gave the best RMSE score
print(gs.best_params["mae"])

{'n_epochs': 250, 'k': 10}


### Model training & Model evaluation: After training the model, evaluate its performance using metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE)

In [66]:
knn_baseline = KNNBaseline(n_epochs= 250,k=10, min_k=1,verbose=False)
cross_validate(knn_baseline, data, measures=['RMSE','MAE'],cv=3)

{'test_rmse': array([0.54682804, 0.5592981 , 0.54715568]),
 'test_mae': array([0.26910918, 0.27455702, 0.27083453]),
 'fit_time': (0.05290627479553223, 0.06251931190490723, 0.0468897819519043),
 'test_time': (0.14154696464538574, 0.12503886222839355, 0.10940933227539062)}

In [67]:
trainset = data.build_full_trainset()
knn_baseline.fit(trainset)

<surprise.prediction_algorithms.knns.KNNBaseline at 0x2cf26d7fd60>

In [68]:
#youtube[youtube['likes'] >= 5000]

### Generate recommendations: Use the trained model to generate personalized recommendations for users based on their past interactions with items.

In [70]:
knn_baseline.predict(1, 302, 3)

Prediction(uid=1, iid=302, r_ui=3, est=2.0041456740223786, details={'was_impossible': False})

In [71]:
# Save the SVD object as a pickle file
with open('KNNBaseline.pkl', 'wb') as f:
    joblib.dump(knn_baseline, f)


In [72]:
#youtube[youtube['categoryId'] ==1]

### Hybrid :
### A hybrid recommendation system combines multiple recommendation techniques, such as content-based and collaborative filtering, to provide more accurate and diverse recommendations to users. The system involves collecting user behavior and preference data, selecting and training the recommendation algorithms, combining the recommendations, and evaluating and refining the system. Additional features, such as user demographics, can be incorporated for further personalization. The key is to fine-tune the system to meet the specific needs of the application.

In [73]:
smd =YouTube_df[['title','likes','categoryId','video_id','thumbnail_link','Rating']]
smd.head(5)

Unnamed: 0,title,likes,categoryId,video_id,thumbnail_link,Rating
0,Sadak 2,224925,24,Iot0eF6EoNA,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4
1,Kya Baat Aa : Karan Aujla (Official Video) Tania,655450,10,x-KbnJ9fvJc,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4
2,Diljit Dosanjh: CLASH (Official) Music Video,296533,10,KX06ksuS6Xo,https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg,4
3,Dil Ko Maine Di Kasam Video,743931,10,UsMRgnTcchY,https://i.ytimg.com/vi/UsMRgnTcchY/default.jpg,4
4,"Baarish (Official Video) Payal Dev,Stebin Ben",268817,10,WNSEXJJhKTU,https://i.ytimg.com/vi/WNSEXJJhKTU/default.jpg,4


In [74]:
id_map= YouTube_df[['video_id','title','categoryId','channelId']].set_index('title')
id_map.head(2)#	movieId	id  title	

Unnamed: 0_level_0,video_id,categoryId,channelId
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sadak 2,Iot0eF6EoNA,24,UCGqvJPRcv7aVFun-eTsatcA
Kya Baat Aa : Karan Aujla (Official Video) Tania,x-KbnJ9fvJc,10,UCm9SZAl03Rev9sFwloCdz1g


In [75]:
indices_map = id_map.set_index('categoryId')

In [76]:
indices_map.head(2)

Unnamed: 0_level_0,video_id,channelId
categoryId,Unnamed: 1_level_1,Unnamed: 2_level_1
24,Iot0eF6EoNA,UCGqvJPRcv7aVFun-eTsatcA
10,x-KbnJ9fvJc,UCm9SZAl03Rev9sFwloCdz1g


In [77]:
#pickle.dump(indices_map,open('indices_map.pkl','wb'))
#indices_map=indices_map.to_csv('indices_map.csv')
with open('indices_map.pkl', 'wb') as f:
    joblib.dump(indices_map, f)

In [78]:
indices_map

Unnamed: 0_level_0,video_id,channelId
categoryId,Unnamed: 1_level_1,Unnamed: 2_level_1
24,Iot0eF6EoNA,UCGqvJPRcv7aVFun-eTsatcA
10,x-KbnJ9fvJc,UCm9SZAl03Rev9sFwloCdz1g
10,KX06ksuS6Xo,UCZRdNleCgW-BGUJf-bbjzQg
10,UsMRgnTcchY,UCq-Fj5jknLsUf-MWSy4_brA
10,WNSEXJJhKTU,UCye6Oz0mg46S362LwARGVcA
...,...,...
10,fDzf3eB6R4E,UCLsMef624nZ4ME6WoSElPTg
17,q3ydfiwIN9U,UCujuVKmt_utAQZJghxlRMIQ
24,kFG5YliTlLU,UC6-F5tO8uklgE9Zy8IvbdFw
24,5lb4DUPXTao,UC6-F5tO8uklgE9Zy8IvbdFw


In [79]:
id_map.loc['Sadak 2']['categoryId']

24

### Prediction 

In [80]:
def hybrid(userId, title):
    index = new[new['title'] == title].index[0]
    query_vector = vector_reduced[index].reshape(1, -1)
    distances, indices = knn_model.kneighbors(query_vector)

    # Extract the indices of the recommended videos (excluding the query itself)
    video_indices = indices[0][1:6]
    
    # Use .iloc to select rows by integer position from the 'smd' DataFrame
    recommended_videos = smd.iloc[video_indices][['title', 'categoryId', 'video_id', 'likes']]
    
    # Calculate the 'est' value for each recommended video based on user preferences
    recommended_videos['est'] = recommended_videos['categoryId'].apply(lambda x: knn_baseline.predict(userId, indices_map.iloc[x]['video_id']).est)
    
    # Sort the recommended videos by the number of likes in descending order
    recommended_videos = recommended_videos.sort_values('likes', ascending=False)
    
    # Return the top 5 recommended videos
    return recommended_videos.head(5)


In [82]:
 knn_baseline.predict(1, indices_map.iloc[24]['video_id'])

Prediction(uid=1, iid='lw4ZqLZSBow', r_ui=None, est=2.793735047079548, details={'actual_k': 0, 'was_impossible': False})

In [83]:
hybrid(1,'Ahmedabad welcomes PM Modi!')

Unnamed: 0,title,categoryId,video_id,likes,est
2281,Precious moments: PM Modi feeding peacocks at ...,25,axbpbQTIiZo,248007,2.121955
13394,PM Modi's address to the nation,25,ClSk5l6OBx4,57306,2.121955
22694,Details of the New Parliament Building,25,_AOzpFXocKI,49356,2.121955
29181,PM Modi launches pan-India rollout of COVID-19...,25,TyzF9r_8VjI,12580,2.121955
20396,PM Modi visits Bharat Biotech facility in Hyde...,25,SnSVmBcZOHQ,8482,2.121955


In [84]:
hybrid(1,"BTS (방탄소년단) ‘Life Goes On’ Official MV : on my pillow")

Unnamed: 0,title,categoryId,video_id,likes,est
18789,BTS (방탄소년단) 'Life Goes On' Official MV,10,-5q5mZbe3V8,5178144,2.636435
18413,BTS (방탄소년단) 'Life Goes On' Official Teaser 1,10,Wq5S8Dt_HQE,3365225,2.636435
18601,BTS (방탄소년단) 'Life Goes On' Official Teaser 2,10,t3KZnCgiMW0,2550351,2.636435
19831,BTS (방탄소년단) 'Life Goes On' Official MV : in th...,10,RvcP6V4h_q4,2218612,2.636435
1314,TXT (투모로우바이투게더) 'Drama [Japanese Ver.]' Offici...,10,UUOGVgComrU,944347,2.636435


In [85]:
hybrid(1,'Apple iPhone 12 Pro')

Unnamed: 0,title,categoryId,video_id,likes,est
23535,5 Masaledaar Gadgets I bought Online !,28,ICE7f9q5GNs,173205,3.403106
28062,Trying 10 Weird Smartphone Life Hacks !,28,4ypTy72o_tM,149462,3.403106
12140,Apple iPhone 12 Series is Here ! *Shocking Ind...,28,3Qgtz3lpvxA,103464,3.403106
17354,iPhone 12 Pro Max,28,s-INp0FxjXA,102540,3.403106
22413,Shifting from iPhone 12 to Nokia 3310 for a Day !,28,8bglQlBonD4,98990,3.403106


In [86]:
hybrid(1,'Baarish (Official Video) Payal Dev,Stebin Ben')

Unnamed: 0,title,categoryId,video_id,likes,est
13618,Chhalaang: Care Ni Karda,10,WGdoaRVOm8o,482466,2.636435
24503,NACHDI TU,10,qwYbkZMOOjU,113212,2.636435
6724,Hasina Pagal Deewani: Indoo Ki Jawani,10,LG6H-r-7IUk,77422,2.636435
17655,Mukhda (Official Video),10,Rn_zZD1YT5s,53677,2.636435
6733,Sumit Goswami,10,P_GCSNFXHgo,34580,2.636435


In [87]:
hybrid(1,'Types of Ex Girlfriends')

Unnamed: 0,title,categoryId,video_id,likes,est
17724,Behan Bhai Aur Diwali,22,y4WnqX9QwR0,21557,2.316475
4020,Rich Mom Vs Normal Mom,22,cWewnrC1qUk,16719,2.316475
5922,Badi Behan Vs Choti Behan,22,boHP_GuoDwU,14168,2.316475
1704,Indian Maids,22,3szVkjPhs_Y,6788,2.316475
19897,Before Marriage Vs After Marriage,22,XQNNu7EcjTU,6597,2.316475


In [88]:
hybrid(1,'NEW! Ep 3006 - Abdul Missing?!')

Unnamed: 0,title,categoryId,video_id,likes,est
8233,NEW! Ep 2999 - Jethalal Smuggling Fafda for Ta...,24,GpF0tRUibFg,55644,2.793735
8680,NEW! Ep 3001 - Will Taarak Get To Eat Fafda?!,24,GraKhk7yqFM,51697,2.793735
23986,NEW! Ep 3061 - Jetha Explains the Pizza Mystery,24,CK0Z3f8VKMM,48886,2.793735
1833,NEW! Ep 2974 - Taraak Mehta Gets Late!,24,sf7NDiQfSIo,40777,2.793735
1419,NEW! Ep 2973 - Jetha Invites Babita Over For B...,24,m0tdMLJWxb8,39433,2.793735


In [89]:
hybrid(1,'When 10R meets NINJA H2 !!')

Unnamed: 0,title,categoryId,video_id,likes,est
59,how JS FILMS bought KAWASAKI H2,22,lh7QPjPpXas,123335,2.316475
367,Accidental Wheelie on NINJA H2,22,zkArAMNUyAY,53613,2.316475
10896,why JS Films QUIT YouTube ?,22,l2KT4hbguLQ,29340,2.316475
14219,Surprising my Friends with NEW Bike!!,22,FG9MXlvkUkc,16773,2.316475
16568,SUBSCRIBERS ne Keh ke LELI 😭😂,22,r9RE9eO_qYs,12130,2.316475


In [90]:
hybrid(1,'FULL MATCH - Roman Reigns vs. Murphy: SmackDown LIVE, August 13, 2019')

Unnamed: 0,title,categoryId,video_id,likes,est
7117,FULL MATCH - Roman Reigns vs. Baron Corbin – U...,17,XsM2Fi8J4zw,48155,3.203106
6971,FULL MATCH - Roman Reigns vs. Drew McIntyre: W...,17,1pb8jp8uOp4,43634,3.203106
13570,FULL MATCH - Shinsuke Nakamura vs. Roman Reign...,17,UG4Wpk5kD0s,40442,3.203106
4625,FULL MATCH: Rusev vs. Roman Reigns – U.S. Titl...,17,1e1-xibMDkU,27412,3.203106
5568,FULL MATCH - The Miz vs. Roman Reigns – Interc...,17,Q2rBqioG_KQ,25240,3.203106


In [91]:
hybrid(1,'IPL 2020 - Patanjali IPL As Sponsor With 10 Big News')

Unnamed: 0,title,categoryId,video_id,likes,est
11094,"PubG, FauG and BhabhiG",22,b76U2005tH4,49036,2.316475
10378,I broke it..,22,Z4VrggAJ7AY,48693,2.316475
7300,IPL 2020 - Match 01,17,lmfk_6ZVoBI,19438,3.203106
4678,"IPL 2020 - Schedule Update,Bhajji Out With 10 ...",17,mZtvQBqMc4Y,15993,3.203106
19229,IPL 2021 - First List Of All 82 Released Playe...,17,qb_QJVXC38o,13401,3.203106


In [92]:
hybrid(1,"DNA: Russia August 12 को लॉन्च करेगा Corona Vaccine")

Unnamed: 0,title,categoryId,video_id,likes,est
21752,Farmers Protest: किसानों के समर्थन में उतरीं P...,25,beAYdqZCUrQ,25424,2.121955
20762,Singhu Border: सड़क पर डटे हैं प्रदर्शनकारी किसान,25,VTeftt4jz1s,18121,2.121955
22325,Farmers Protest: किसानों ने सरकार का खारिज किय...,25,suZwWlS3hYw,15264,2.121955
24789,महाराष्ट्र किसान जत्था: दिल्ली की तरफ बढ़ा एक क...,25,DrtBVNZJnj0,11657,2.121955
27298,Farmers Protest: कृषि कानूनों के खिलाफ आज किसा...,25,16pM_jfJg6c,8698,2.121955


In [93]:
hybrid(1,"KOHLI & CO start with a fabulous WIN!")

Unnamed: 0,title,categoryId,video_id,likes,est
7794,DHONI's boys are off to a GREAT START!,17,QrL-V0SJ8dE,54629,3.203106
7508,SHREYAS IYER vs KL RAHUL - Battle of the Young...,17,QrL-V0SJ8dE,51248,3.203106
8186,KOHLI & CO start with a fabulous WIN!,17,JZbs3i_HyR4,47838,3.203106
8035,STELLAR STOINIS seals the deal for DELHI,17,1yUg1L-OYFY,42786,3.203106
11587,Is it OVER for MSD's CHENNAI?,17,YRXt8UAF4Yg,38702,3.203106


In [94]:
hybrid(1,"Finally bought my DreamBike - Ninja H2")

Unnamed: 0,title,categoryId,video_id,likes,est
12,first Crazy RIDE on Kawasaki NINJA H2,22,KrXRnIESDxM,110209,2.316475
42,I Surprised my Friends With NINJA H2!!,22,7Mz71xlUlw8,89781,2.316475
22908,bass Life mai HAYABUSA honi chaheye,22,2TGgdI1L0Hw,48490,2.316475
29016,Why Narula Family Gets Emotional? 😭,24,uCNvWSb0wL8,27480,2.793735
28461,Sammy's Birthday Vlog 🥳♥️👑 @Mr Mrs Narula,24,LIedUszADEA,22958,2.793735


In [95]:
hybrid(1,"Gujarati song pucho to khara")

Unnamed: 0,title,categoryId,video_id,likes,est
4791,Garaj Matlabi,10,qlmVMdCfnzs,17538,2.636435
8843,Kok Dado Mane Yaad Kari Lejo Re,10,23PU1XgaV3Q,14726,2.636435
1508,Jignesh Barot ( Kaviraj) ચોય જ્યાં મારી હંભાળ ...,10,QwUw7Wb0I-A,12893,2.636435
28571,Firki Levi Chhe Free Ma - Kaushik Bharwad,10,XT_tvQ8oeow,10518,2.636435
13414,LIVE : Rakesh Barot Garba - Navratri 2020 - Day 4,10,8PqTkKoncBE,3307,2.636435


In [96]:
hybrid(1,"BADSHAH – BKL (Official Lyrical Video)")

Unnamed: 0,title,categoryId,video_id,likes,est
6319,Butterfly : Jass Manak (Full Video) Satti Dhillon,10,b1c6i0VT7ak,381520,2.636435
5372,AYEN KIVEN : Gippy Grewal Ft. Amrit Maan (Full...,10,VU6ZUcxFgqE,166883,2.636435
14342,Level Up (Official Video ) - IKKA Ft. DIVINE &...,10,9fgXP0FBZRI,108715,2.636435
26457,"HOT LAUNDE - Badshah Ft. Fotty Seven, Bali",10,iuqfU9Ll300,108643,2.636435
10066,THE POWER OF DREAMS - Badshah ft. Lisa Mishra,10,WdktZ2fNQC4,11819,2.636435


In [97]:
hybrid(1,'Diljit Dosanjh: CLASH (Official) Music Video')

Unnamed: 0,title,categoryId,video_id,likes,est
13596,Diljit Dosanjh: Welcome To My Hood (Official M...,10,1ExdPa00uz4,239088,2.636435
4576,Diljit Dosanjh: Born To Shine (Official Music ...,10,dCmp56tSSmA,229701,2.636435
1700,PEED: Diljit Dosanjh (Official) Music Video,10,cXUndHRKmXQ,162861,2.636435
14737,ASLE - Gurman Sandhu,10,ApvvUgXWc1w,87768,2.636435
9746,Peed Diljit Dosanjh Lyrical Video Song,10,tq3BdjJZW84,20032,2.636435


In [98]:
hybrid(1,'After TikTok Ban')

Unnamed: 0,title,categoryId,video_id,likes,est
6531,Girls in Online Classes,24,WDnLu877Adk,11222,2.793735
2116,When You Are 25+,24,IsooRg-gGB4,6692,2.793735
2702,Indians During Video Calls,24,dchGzOCUC-k,6079,2.793735
19003,Girls in Online Exams,24,-7HBWFF32_0,5566,2.793735
14625,Hyderabadi Friend,24,thT_-gqQf8E,2957,2.793735


In [99]:
smd.head()


Unnamed: 0,title,likes,categoryId,video_id,thumbnail_link,Rating
0,Sadak 2,224925,24,Iot0eF6EoNA,https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg,4
1,Kya Baat Aa : Karan Aujla (Official Video) Tania,655450,10,x-KbnJ9fvJc,https://i.ytimg.com/vi/x-KbnJ9fvJc/default.jpg,4
2,Diljit Dosanjh: CLASH (Official) Music Video,296533,10,KX06ksuS6Xo,https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg,4
3,Dil Ko Maine Di Kasam Video,743931,10,UsMRgnTcchY,https://i.ytimg.com/vi/UsMRgnTcchY/default.jpg,4
4,"Baarish (Official Video) Payal Dev,Stebin Ben",268817,10,WNSEXJJhKTU,https://i.ytimg.com/vi/WNSEXJJhKTU/default.jpg,4


In [100]:
#smd.to_csv('smd.csv')

In [101]:
#pickle.dump(smd,open('smd.pkl','wb'))
with open("smd.pkl", "wb") as f: 
    joblib.dump(smd, f) 

In [102]:
#pickle.dump(smd,open('smd.pkl','wb'))
with open("new.pkl", "wb") as f: 
    joblib.dump(new, f) 

In [103]:
smd['thumbnail_link'][0][:]

'https://i.ytimg.com/vi/Iot0eF6EoNA/default.jpg'

In [104]:
video_list=smd[['title','likes','thumbnail_link']]

In [105]:
with open("video_list.pkl", "wb") as f: 
    joblib.dump(video_list, f) 

In [106]:
video_list['title'][150:200]

158                              Sabaat Episode 19 Promo
159    Cash Latest Promo - 15th August 2020 - Anasuya...
161                   Ram mandir darshan with modi jii 😱
162                          Color Photo Official Teaser
163                          ALL ABOUT BHABHI’S BIRTHDAY
164                           Daily Current Affairs #317
165                                                 9 PM
166                        Kia Sonet - Too many Features
167                      Madan Gowri Lockdown Photoshoot
168    Opening ALL Of My RAKHI GIFTS! Makeup, BAG, CL...
169          Unique Pencil Sharpener Unboxing & Giveaway
170             DOCTOR (Official Video) Sidhu Moose Wala
173                  Sadak 2 Trailer पे पब्लिक का गुस्सा
175                          No Worries (Official Video)
181                                             Menamama
182    Gunjan Saxena movie से Indian Airforce नाराज़?...
183    Lord Shree Krishna Janmotsav - Janamashtami Ce...
184                            

In [107]:
smd.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7312 entries, 0 to 29996
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   title           7312 non-null   object  
 1   likes           7312 non-null   int64   
 2   categoryId      7312 non-null   int64   
 3   video_id        7312 non-null   object  
 4   thumbnail_link  7312 non-null   object  
 5   Rating          7312 non-null   category
dtypes: category(1), int64(2), object(3)
memory usage: 608.1+ KB


In [108]:
new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7312 entries, 0 to 7311
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   index           7312 non-null   int64   
 1   userID          7312 non-null   int64   
 2   video_id        7312 non-null   object  
 3   title           7312 non-null   object  
 4   likes           7312 non-null   int64   
 5   thumbnail_link  7312 non-null   object  
 6   Rating          7312 non-null   category
 7   joint_tags      7312 non-null   object  
dtypes: category(1), int64(3), object(4)
memory usage: 407.3+ KB
