Content-based recommendation systems are a type of recommender system used in information retrieval and e-commerce to provide personalized recommendations to users. These systems recommend items (such as articles, products, movies, music, etc.) to users based on the characteristics or attributes of the items and the preferences of the users.

Here's how content-based recommendation systems work:

Item Representation: Each item in the system is described or represented using a set of relevant attributes or features. For example, in a movie recommendation system, these attributes could include genres, actors, directors, and user ratings.

User Profile: The system creates a user profile that captures the preferences and interests of the user. This profile is typically constructed based on the user's historical interactions, such as items they have rated or liked in the past.

Matching: The system compares the attributes of the items to the user's profile to determine how well they match. This is often done using mathematical similarity measures like cosine similarity or Euclidean distance.

Recommendation: Items that are most similar to the user's profile are recommended to the user. The items that receive the highest similarity scores are typically ranked and presented as recommendations.

Advantages of content-based recommendation systems:

Transparency: Content-based systems are often more transparent than other recommendation approaches like collaborative filtering because they recommend items based on explicit features.

Cold Start Problem: Content-based systems can make recommendations even for new users or items because they rely on item attributes.

Reduced Data Sparsity: They are less affected by data sparsity issues compared to collaborative filtering, which can struggle when there's limited user-item interaction data.

However, content-based recommendation systems also have limitations:

Limited Serendipity: They may have difficulty introducing users to new or unexpected items because recommendations are based on existing user preferences.

Feature Engineering: The quality of recommendations depends heavily on the quality and relevance of the item features, which often requires careful feature engineering.

Over-Specialization: They can sometimes recommend items that are too similar to each other, leading to a lack of diversity in recommendations.

Content-based recommendation systems are commonly used alongside other recommendation techniques, such as collaborative filtering, to improve the quality and diversity of recommendations. Hybrid recommendation systems that combine multiple approaches aim to overcome the limitations of individual methods.

In [4]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import warnings
from tqdm.notebook import tqdm

import joblib
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer, PorterStemmer
from nltk.corpus import stopwords
import nltk

# Download NLTK data
nltk.download('stopwords')
nltk.download('wordnet')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [5]:


# if you load in your dataset in google colab
# Load the dataset (replace 'your_dataset.csv' with the actual file name)
#df = pd.read_csv('wiki_movie_plots_deduped.csv')



# if load dataset in gooogle drive
from google.colab import drive
import pandas as pd

# Mount Google Drive
drive.mount('/content/drive')

# Define the path to your CSV file on Google Drive
csv_file_path = '/content/drive/My Drive/Content-based recommendation systems/wiki_movie_plots_deduped.csv'

# Load the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)

# Display the head (first 5 rows) of the dataset
df.head()



Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Unnamed: 0,Release Year,Title,Origin/Ethnicity,Director,Cast,Genre,Wiki Page,Plot
0,1901,Kansas Saloon Smashers,American,Unknown,,unknown,https://en.wikipedia.org/wiki/Kansas_Saloon_Sm...,"A bartender is working at a saloon, serving dr..."
1,1901,Love by the Light of the Moon,American,Unknown,,unknown,https://en.wikipedia.org/wiki/Love_by_the_Ligh...,"The moon, painted with a smiling face hangs ov..."
2,1901,The Martyred Presidents,American,Unknown,,unknown,https://en.wikipedia.org/wiki/The_Martyred_Pre...,"The film, just over a minute long, is composed..."
3,1901,"Terrible Teddy, the Grizzly King",American,Unknown,,unknown,"https://en.wikipedia.org/wiki/Terrible_Teddy,_...",Lasting just 61 seconds and consisting of two ...
4,1902,Jack and the Beanstalk,American,"George S. Fleming, Edwin S. Porter",,unknown,https://en.wikipedia.org/wiki/Jack_and_the_Bea...,The earliest known adaptation of the classic f...


In [4]:
# List the column names (features) of your dataset
column_names = df.columns.tolist()
print(column_names)


['Release Year', 'Title', 'Origin/Ethnicity', 'Director', 'Cast', 'Genre', 'Wiki Page', 'Plot']


In [11]:
df.shape

(34886, 9)

The dataset you've provided appears to be a collection of movie plot summaries. Here's an overview of the data structure and the kind of information it contains:

- Release Year: The year when the movie was released.

- Title: The title of the movie.

- Origin/Ethnicity: This could refer to the country or ethnic origin of the movie.
- Director: The name of the director of the movie.

- Cast: The cast involved in the movie. This field contains missing values in the initial rows.

- Genre: The genre of the movie. Initial entries are labeled as 'unknown'.

- Wiki Page: The Wikipedia page URL for the movie.

- Plot: A brief summary of the movie's plot.

From this initial glance, it seems this dataset can provide insights into the evolution of cinema over time, particularly in terms of genres, directors, and the representation of different cultures or ethnicities in films. It could also be used for tasks like content analysis, studying trends in movie themes and stories, or even for natural language processing tasks like summarization or genre classification based on plot descriptions.



In [None]:
# Merge all columns into a single text column
df['merged_text'] = df.apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)

# Display the head of the dataset with the merged_text column
df[['merged_text']].head()


# this cell just want to show me the dataset after merging, after finish cleaning text we have to do this step not now

Unnamed: 0,merged_text
0,1901 Kansas Saloon Smashers American Unknown u...
1,1901 Love by the Light of the Moon American Un...
2,1901 The Martyred Presidents American Unknown ...
3,"1901 Terrible Teddy, the Grizzly King American..."
4,1902 Jack and the Beanstalk American George S....


Before applying TF-IDF (Term Frequency-Inverse Document Frequency) or any other natural language processing (NLP) techniques, it's common to perform several preprocessing steps on your text data. These **preprocessing steps** can include:



**Text Cleaning**:

- Removing any HTML tags or special characters.

- Lowercasing all text to ensure consistency.
- Removing punctuation marks.
- Handling contractions and special characters (e.g., "don't" to "do not").


**Tokenization:**

- Splitting the text into individual words or tokens.
- Tokenization is usually performed by splitting the text using whitespace or other delimiters.

**Stopword Removal:**

Removing common stopwords (e.g., "the," "and," "is") that do not carry significant meaning.


**Stemming or Lemmatization:**

- Reducing words to their root form Stemming and lemmatization are two common techniques.


**Join Tokens Back into Text**:

- After preprocessing, you may want to join the tokens back into a single text string.

# Data Cleaning of wiki-movie dataset


**Action Items**:

Lower-Case the whole data frame

Director: Removing 'Director:' and 'Cast:'

Director, Cast: Removing '\r\n', '/n' and '/r'

Genre: Replacing '/' with Space

Director, Cast, Genre: Removing 'Uknonwn' and 'Nan'

Director: Separating Directors and Actors names

Director, Cast: Checking if the names are separated with ' and '

Director, Cast: Merging the first names and last names together

Director, Cast: Adding the words of 'Director' and 'Actor' as prefix

Plot: Removing English Stopwords

Doc: Removing special characters

In [6]:
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
import numpy as np

In [7]:

# Load the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)

In [8]:
# Use a Subset of Dataset, helping to manage memory usage and avoid running out of RAM(overcome RAM memory limitations)
# Randomly sample a fraction of the dataset (e.g., 1.25 %)
# Set a random seed for reproducibility
np.random.seed(42)  # Replace 42 with your chosen seed number
data = df.sample(frac=0.0125)



In [9]:
data.shape

(436, 8)

In [10]:



# Cleaning up the 'Director' and 'Cast' columns
data["Director"] = data["Director"].str.replace("director:", "", regex=False)
data["Director"] = data["Director"].str.replace("cast:", "", regex=False)
data["Director"] = data["Director"].str.replace("\r\n", " ", regex=False)
data["Cast"] = data["Cast"].str.replace("\r\n", " ", regex=False)

# Cleaning up the 'Genre' column
data["Genre"] = data["Genre"].str.replace("/", " ", regex=False)

# Removing 'unknown' and 'nan' from 'Director', 'Cast', 'Genre'
data["Director"] = data["Director"].str.replace("unknown", "", regex=False)
data["Cast"] = data["Cast"].str.replace("unknown", "", regex=False)
data["Genre"] = data["Genre"].str.replace("unknown", "", regex=False)

data["Director"] = data["Director"].str.replace("nan", "", regex=False)
data["Cast"] = data["Cast"].str.replace("nan", "", regex=False)
data["Genre"] = data["Genre"].str.replace("nan", "", regex=False)

# Separating names in 'Director' and 'Cast', merging first and last names, and adding prefixes
data["Director"] = data["Director"].str.replace(" and ", ",", regex=False)
data["Cast"] = data["Cast"].str.replace(" and ", ",", regex=False)

data["Director"] = data["Director"].str.replace(" ", "", regex=False)
data["Cast"] = data["Cast"].str.replace(" ", "", regex=False)

data["Director"] = data["Director"].str.replace(",", " ", regex=False)
data["Cast"] = data["Cast"].str.replace(",", " ", regex=False)

data["Director"] = np.where(data["Director"].str.len() > 0,
                          'director ' + data["Director"],
                          data["Director"])

data["Cast"] = np.where(data["Cast"].str.len() > 0,
                      'actor ' + data["Cast"],
                      data["Cast"])

data["Director"] = data["Director"].str.replace(" ", " director", regex=False)
data["Cast"] = data["Cast"].str.replace(" ", " actor", regex=False)

# Removing English stopwords from 'Plot'
def remove_stopwords(text):
    words = text.split()
    return ' '.join(word for word in words if word not in ENGLISH_STOP_WORDS)

data['Plot'] = data['Plot'].apply(remove_stopwords)

# Show a sample of the cleaned data
data.sample(5)


Unnamed: 0,Release Year,Title,Origin/Ethnicity,Director,Cast,Genre,Wiki Page,Plot
21320,2013,Welcome to the Punch,British,director directorDirector:EranCreevy,actor actorDirector:EranCreevy\nCast:JamesMcAv...,,https://en.wikipedia.org/wiki/Welcome_to_the_P...,"Four men emerge building wearing gas masks, ha..."
23011,1972,Intimate Confessions Of A Chinese Courtesan,Hong Kong,director directorChorYuen,actor actorLilyHo actorBettyPeiTi actorYuehHua,,https://en.wikipedia.org/wiki/Intimate_Confess...,In regarded outrageous taboo-smashing Shaw Bro...
3578,1943,Hi Diddle Diddle,American,director directorAndrewL.Stone,actor actorAdolpheMenjou actorMarthaScott acto...,comedy,https://en.wikipedia.org/wiki/Hi_Diddle_Diddle,Young Janie Prescott married sailor sweetheart...
20542,1990,Nuns on the Run,British,director directorJonathanLynn,actor actorEricIdle actorRobbieColtrane actor,comedy,https://en.wikipedia.org/wiki/Nuns_on_the_Run,"After boss killed bank heist, London gangsters..."
33674,2012,Kaizoku Sentai Gokaiger vs. Space Sheriff Gava...,Japanese,director directorShojiroNakazawa,actor actorRyotaOzawa actorYukiYamada actorMao...,tokusatsu,https://en.wikipedia.org/wiki/Kaizoku_Sentai_G...,The film begins Gokai Galleon chased Super Dim...


# Merging the document

In [11]:
column_weights = {"Release Year": 10,
                  "Title": 1,
                  "Origin/Ethnicity": 5,
                  "Director": 5,
                  "Cast": 1,
                  "Genre": 10,
                  "Plot": 1}

# Initialize a new "doc" column with empty strings
df["doc"] = ""

# Iterate through the columns and concatenate their values based on weights
for col, weight in column_weights.items():
    df["doc"] += df[col].astype(str) + ' '  # Convert to string before concatenation

# Apply the weight to the "doc" column
df["doc"] = df["doc"].str.repeat(weight)

# Optionally, you can strip the trailing space in the "doc" column
df["doc"] = df["doc"].str.strip()

df.head()


Unnamed: 0,Release Year,Title,Origin/Ethnicity,Director,Cast,Genre,Wiki Page,Plot,doc
0,1901,Kansas Saloon Smashers,American,Unknown,,unknown,https://en.wikipedia.org/wiki/Kansas_Saloon_Sm...,"A bartender is working at a saloon, serving dr...",1901 Kansas Saloon Smashers American Unknown n...
1,1901,Love by the Light of the Moon,American,Unknown,,unknown,https://en.wikipedia.org/wiki/Love_by_the_Ligh...,"The moon, painted with a smiling face hangs ov...",1901 Love by the Light of the Moon American Un...
2,1901,The Martyred Presidents,American,Unknown,,unknown,https://en.wikipedia.org/wiki/The_Martyred_Pre...,"The film, just over a minute long, is composed...",1901 The Martyred Presidents American Unknown ...
3,1901,"Terrible Teddy, the Grizzly King",American,Unknown,,unknown,"https://en.wikipedia.org/wiki/Terrible_Teddy,_...",Lasting just 61 seconds and consisting of two ...,"1901 Terrible Teddy, the Grizzly King American..."
4,1902,Jack and the Beanstalk,American,"George S. Fleming, Edwin S. Porter",,unknown,https://en.wikipedia.org/wiki/Jack_and_the_Bea...,The earliest known adaptation of the classic f...,1902 Jack and the Beanstalk American George S....


# Removing Special Characters

In [12]:
df["doc"] = df["doc"].str.replace("[^a-z 0-9]+", "", regex=True)

In [13]:
pd.set_option('display.max_colwidth', None)
df[["doc"]].sample(10)

Unnamed: 0,doc
19730,1964 he igh right un ritish alph homas irk ogarde eorge hakiris usan trasberg thriller n 1957 uno usan trasberg an merican archaeology student is visiting yprus and staying with the family of her fathers best friend r ndros oseph urst he witnesses an attack by two gunmen which results in the death of two ritish soldiers but is unable to identify the killers to the local ritish intelligence officer ajor cuire irk ogardeuno then realises that fugitive eneral kyros regoire slan is hiding in the house and r ndros is an collaborator fighter aghios eorge hakiris wants to kill uno in part because of her growing romantic relationship with cuireaghios organises an ambush to kill uno but she is saved by r ndros son mile who is mortally wounded uno escapes and is rescued by cuire who brings her to his apartment aghios leads an attack on cuires apartment which is unsuccessful in part because of help from fellow ritish intelligence officer aker enholm lliott who had an affair with cuires wifeuno flies to thens and realises that aghios is on the plane n arrival aghios tries to kill her again mortally wounding aker but is shot dead by cuire uno is reunited with cuire
30655,2005 asthuri aan amil ohithadas rasanna eera asmine drama he movie starts with a mature run umar rasanna now a collector of the district he story is narrated in a flashback where his father azhaniappan wants run to get married is father used to be a rich film producer and had borrowed money from a rich person but is unable to repay his debt he rich man wants his daughter to get married to run umar run does not want to get marriedmashankari eera asmine and run study in the same college ma is a lively mischievous girl on campus but at home faces acute poverty and takes up various odd jobs to take care of her house as well as to keep away from her sisters husband who makes advances towards her n the meantime run umars father attempts suicide because he is deeply in debt mashankari steps in and helps run who is preparing for exams he does everything to ensure that he becomes an officer e in turn promises to marry her and rescue her from her situation owever ma after an unfortunate encounter with her brotherinlaw lands in prison and how the couple finally manages to reunite is brought out in the climax
2498,1939 he rizona ildcat merican erbert eeds ane ithers eo arrillo western he orphaned ary ane atterson ane ithers is under the guardianship of anuel ernandez eo arrillo once known as the bandit l ato who led a gang of outlaws ary ane wants ernandez to revive the l ato gang to rescue the feckless onald illiam ill enry the lone survivor of a stage coach robbery engineered by the towns crooked sheriff enry ilcoxonts been a decade since l ato rode and ernandez is now too fat for his bandit costume ary ane aids the rescue by vandalizing the saddles of the sheriff and his posse hen l ato does rescue onald he is arrested uring the ensuing trial ary ane provides special pyrotechnics and the courtroom is evacuatedhen ary ane finds the stash from the stagecoach robbery hidden in the sheriffs office ernandez is appointed as the new sheriff12
4154,1945 alk in the un merican ewis ilestone ana ndrews ichard onte war drama n eptember 1943 the diverse group of fiftythree soldiers comprising a lead latoon of the exas ivision anxiously await their upcoming llied invasion of taly on a beach near alerno taly landing barge carries them to their objective during the predawn hours and the increasing danger of their situation is demonstrated when their young platoon leader ieutenant and obert owell is wounded by a shell fragment that destroys half of his face latoon ergeant ete alverson att illis takes over command and orders gt ddie orter erbert udley to lead the men to the beach while he tries to find the company commander and confirm their ordersirst aid man cilliams terling olloway remains with and and the rest of the men hit the beach and dig in while trying to elude the shelling and machinegun fire gt ill yne ana ndrews a corporal in the novel wonders what they will do if alverson does not return and after the sun rises the sergeants send the men into the woods to protect them from enemy aircraft yne remains on the beach to wait for alverson but learns from cilliams that both and and alverson are dead oon after cilliams is shot by an enemy airplane when he goes to a bluff to view the aerial attack on the beachheadyne walks to the woods and there discovers that three other men have been hit including gt oskins ames ardwell who was the senior surviving oskins wound means he cannot continue and orter as the next senior is forced to take command oskins warns yne as he is leaving to keep an eye on orter because he suspects orter is going to crack under the pressure of commandorter yne and gt ard loyd ridges then lead the men in three squads along a road toward their objective a bridge that they are to blow up that is near a farmhouse orter knows that the sixmile journey will be a dangerous one and grows agitated e warns the men to watch out for enemy tanks and aircraft s they walk the men shoot the breeze and discuss their likes and dislikes the nature of war and the food they wish they were eating nemy aircraft appear and one of them strafes the platoon as they run for cover in a ditch ome of the men are killed while one while is wounded vt mith orter grows increasingly agitatedfterwards orter is distracted when two retreating talian soldiers surrender to the platoon and confirm that they are on the right road he talians warn them that the area is controlled by erman troops and soon after the platoon meets a small reconnaissance patrol of merican soldiers fter the patrols motorcycle driver offers to ride to the farmhouse and report back orter becomes even more edgy as minutes pass without the drivers return inally yne tells the men to take a break while he sits with orter s machine gunner ivera ichard onte and his pal ake riedman eorge yne razz each other orter begins to break down and tells ard also called armer that he is putting yne in charge orter has a complete breakdown when a erman armored car approaches but ynes quick thinking prevails and the men blast the car with grenades and machinegun firehe bazooka men who yne had sent ahead to search for tanks blow up two tanks and another armored car but expend all of their bazooka ammunitionyne leaves a private named ohnson to guard the stillcrying orter yne pushes on and as the men march riedman tells ivera that he is a traveling salesman who is selling democracy to the natives he men finally reach the farmhouse but when a small patrol attempts to crawl through the field in front of the house they are shot at by the ermans and two men are killed yne and ard are baffled about what to do next when indy ohn reland a calm introspective soldier suggests circling around the farm via the river and blowing up the bridge without first taking the house yne sends two patrols headed by ard and indy to accomplish the mission then orders ivera to strafe the house while he leads a column of men in an attack on the house which he hopes will distract the ermans he remaining men nervously wait for their comrades to reach the bridge until finally ivera opens fire and yne and his men go over the stone wall and into the field ynes sight blurs as he crawls toward the house and when he comes across the body of ankin hris rake one of the fallen men still cradling his beloved ommygun the platoons constant refrain obody dies resounds through his headhe bridge is blown up and despite heavy losses the platoon captures the house hen at exactly noon indy ard and the remaining men wander through the house as armer fulfills his dream of eating an apple and yne adds another notch to the butt of ankins pet ommygun
20156,1973 he lockhouse ritish live ees eter ellers harles znavour world war ii n ay a mixed group of forced labourers held by erman forces take shelter from the bombardment inside a erman bunker but are then entombed when the entrances are blocked by shelling damage y coincidence the bunker is a storehouse so the prisoners have enough food and wine to last them for years owever they are trapped not for years but permanently and the film analyzes how they deal with their underground prison with their relationships and with death
28384,2013 mericanorn onfused esi alayalam artin rakkat ulquer almaan acob regory parna opinath comedy he story revolves around two spoiled youngsters ones saac ulquer almaan and his cousin orah acob regory ohns is the son of a billionaire named saac alu lex who is settled in ew ork while orahs mother left for aris with her new husband ohns and orah enjoy their luxurious life by driving luxurious cars going to pubs etc ssac decides to sent them to ndia telling them its a vacation ssac then blocks their ard by which they starts living through poverty of erala where they become famous
23541,2005 bout ove ong ong himoyama en ee hinyen hang ibai nan unknown n okyo he is a hinese computer graphic artist seeking to enrich his exposure while she is a apanese painter struggling to recover from a broken relationship his is the simplest and sweetest of the three stories starting with hidden mutual attraction and ending in their first meeting his is also the only one of the three stories with a subplot a really glorified use of the term of his friendship with two other art students both girls one hinese and one apanesen aipei she is a local girl suffering from a broken heart and he is a apanese visitor she asks in the middle of the night to help putting up a wall unit his is the only story with a scene of brief libido drive which however quickly subsides he rest of the story is on his helping her by asking her exboyfriend whether there is a chance of getting back together mong the three this is the story that plays most on the language barrier thing with some absolutely hilarious scenes resultingn hanghai he is a apanese student renting a room from her mother he probably has a crush on him at first sight but keeps it deeply hidden when she sees how devoted he is to his girlfriend ut when he gets a postcard from the girlfriend ending their relationship her attraction to him intensifies although she never reveals it his is the most poignant of the three stories omance aside this story also takes a quick jab at the maddening scene of urban development of hanghai
26274,2004 akeer orbidden ines ollywood hmed han unil hetty unny eol ohail han ohn braham auheed yrusi action drama romance aran ohail han and indiya auheed yrusi are childhood friends and live with arans brother rjun ana unny eol arans feelings for indiya are more than just of a friend however she is unaware of his feelings and falls in love with a sweet boy named aahil ohn braham gradually hen aran find this out he warns ahil to stay away from indiya but he doesnt and makes a love letter for her he next day aran sees the letter and tears it apart then arans friend ony poorva gnihotri humiliates him about falling in love with indiya and being so poor o aahil gets angry and tries to kill himself owever when ahils brother anju unil hetty who is a car mechanic finds him unconscious and badly wounded he cannot control his anger and goes looking for ony owever when anju gets to the college he sees aran sitting down wearing onys acket with his name on the back starts beating him up in public aran is hospitalised when anju finds out that he has beaten aran and not ony rjun a very powerful wealthy communist and gangster wants to avenge anju for badly beating up his brother owever rjun is unaware of indiyas feelings for aahil and arans for indiya aahil advises anju to give up his violent ways anju hands himself to rjun and gets badly beaten up by him aahil apologizes to anju and decides to leave indiyane month later aahil starts working in a cafe indiya tries to meet aahil everyday but aahil escapes every time ut soon indiya and aahil fall in love again anju also approves her eanwhile aran returns from hospital and decides to meet rjun and tell him about his love rjun is happy to know that aran is in love with indiya hat night ony challenges aahil for a fight aahil beats up ony brutally eanwhile aran proposes to indiya he tells him that she considers aran as a best friend but loves aahil aran becomes furious fter beating ony aahil takes his gun to kill aran anju enters rjuns bungalow to tell rjun about arans reality e beats up all his goons rjun cant control and starts beating anju ut ony comes to anjus aid and tells rjun about arans reality arans love turns to obsession and forces indiya to marry him on the spot ut aahil reaches there dangerous fight ensues between aahil and aran aran hits aahil repeatedly with an iron rod in the face and makes him unconscious rjun comes there to rescue indiya and talk to aran aran shoots anjus arm aahil regains consciousness aran is just about to shoot indiya and aahil but indiya tells aran that she hates him because of his actions rjun shoots aran with tears on his eyes aran dies on the spot rjun takes arans body to the church and is heartbroken aahil indiya and anju apologise to and pacify rjun saying that he was not wrong
26577,2009 hortkut ollywood eeraj ora kshaye hanna rshad arsi mrita ao comedy he story focuses on hekhar kshaye hanna who is currently an assistant director hoping to write and direct his own movie soon is friend aju rshad arsi is a struggling actor who has been waiting for hekhar to write a film script so he can star in owever when hekhar rejects him as the films hero aju decides to steal hekhars film script and release the film under his own name he film is released and turns out to be a blockbuster leading aju to stardom eartbroken hekhar at this time is left by his girlfriend ansi mrita ao ack of confidence hekhar writes a new and better film script and wants to star aju in it however when aju rejects the script hekhar decides to film the movie without aju knowing he is in it hekhar and his lowbudget film crew follow aju around everywhere to complete their movie ventually when the movie is finished it turns out that ajus role in the film was embarrassing and humiliating and turns aju into a box office bomb hekhars film succeeds and he is offered more movies aju turns into a flop and decides to steal someone elses script
12165,1994 ops and obbersons merican ichael itchie hevy hase ack alance ianne iest ason ames ichter comedy hen the police discover that a mob hitman has moved in next door to the obbersons they want to find out what he is up to o they set up a stakeout in the obbersons home ardnosed toughasnails ake tone ack alance and his young partner ony oore avid arry ray are assigned to the stakeout but now its a question of whether ake can last long enough to capture the bad guys he obbersons want to help and by doing so they drive ake crazy


# Removing Stopwords

In [14]:
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer,PorterStemmer
from nltk.corpus import stopwords

In [15]:
import nltk
nltk.download('stopwords')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [16]:
stops = stopwords.words('english')
print(stops)

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

In [17]:
import nltk
nltk.download('wordnet')


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [18]:
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Text preprocessing function
def preprocess(sentence):
    tokenizer = RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(sentence)

    filtered_words = [w for w in tokens if len(w) > 2 if not w in stops]  # Use the 'stops' variable
    stem_words = [stemmer.stem(w) for w in filtered_words]
    lemma_words = [lemmatizer.lemmatize(w) for w in stem_words]
    return " ".join(lemma_words)

df['doc_clean'] = df['doc'].map(lambda s: preprocess(s))


In [19]:
df[["doc", 'doc_clean']].sample(10)

Unnamed: 0,doc,doc_clean
6668,1956 hat ertain eeling merican elvin rank ob ope va arie aint earl ailey eorge anders erry athers comedy beautiful sophisticated ew ork woman who goes by the name unreath enry va arie aint seems to have it all he is not only the private secretary to the wealthy and popular cartoonist arry arkin eorge anders she is also his fianceeut back in ort uron ichigan when she was a girl she was plain old thel ankowski nd she used to be married to another cartoonist the talented but neurotic rancis ignan ob ope who was once an associate of the famed l appne day when arkins syndicate complains that his boyanddog comic strip nips and unty hasnt been as funny as it used to be unreath hatches a scheme arry is leaving on a business trip and she is busy planning their honeymoon so why not hire ignan to ghostwrite the stripignan doesnt want to do it and certainly cant stand the snooty arkin but he needs the money for his psychiatrist who is trying to find out why any setback or stress leads to ignan experiencing a bad case of nausea lot of interesting developments take place in arkins anhattan penthouse while the cartoonist is away ignans strips are humorous and a hit ld feelings begin to stir in unreath having him around arkins housekeeper ussie begins to play matchmaker young orphan orman arrives one day because arkin intends to adopt him ignan is impressed until he discovers that arkins interested only in the publicity not in the child he program erson to erson is coming to do a live interview so arkin wants a cute boy and happy puppy there by his side just like his cartoon figures nips and untyignan is offended e is supposed to find a small dog for arkin but instead brings home one called appy a gigantic hound ignan also draws a cartoon using arkins name portraying nips as a juvenile delinquent nd if that werent enough arkin comes home to find ignan and unreath dressed in matching pajamas each having drunk one too many martiniarkin fires ignan just before the live appearance which ignan proceeds to interrupt by declaring his love for arkins fiancee unreath decides to dump her betrothed and her fancy new name and live happily ever after with ignan orman and appy,1956 hat ertain eel merican elvin rank ope ari aint earl ailey eorg ander erri ather comedi beauti sophist ork woman goe name unreath enri ari aint seem privat secretari wealthi popular cartoonist arri arkin eorg ander also fianceeut back ort uron ichigan girl plain old thel ankowski use marri anoth cartoonist talent neurot ranci ignan ope associ fame appn day arkin syndic complain boyanddog comic strip nip unti hasnt funni use unreath hatch scheme arri leav busi trip busi plan honeymoon hire ignan ghostwrit stripignan doesnt want certainli cant stand snooti arkin need money psychiatrist tri find setback stress lead ignan experienc bad case nausea lot interest develop take place arkin anhattan penthous cartoonist away ignan strip humor hit feel begin stir unreath around arkin housekeep ussi begin play matchmak young orphan orman arriv one day arkin intend adopt ignan impress discov arkin interest public child program erson erson come live interview arkin want cute boy happi puppi side like cartoon figur nip untyignan offend suppos find small dog arkin instead bring home one call appi gigant hound ignan also draw cartoon use arkin name portray nip juvenil delinqu werent enough arkin come home find ignan unreath dress match pajama drunk one mani martiniarkin fire ignan live appear ignan proce interrupt declar love arkin fiance unreath decid dump betroth fanci new name live happili ever ignan orman appi
2455,1938 ky iant merican ew anders oan ontaine ichard ix arry arey drama adventure pon reaching retirement age olonel ornelius tockton arry arey is forced to leave the military accepting a job running the ransorld ir ines chool of eronautics in lendale alifornia tag ahill ichard ix an old friend from the war is the pilot on the commercial airliner taking him to lendale he colonel asks him to join the school staff but tag would rather fly hen the colonel arranges for tag a reservist to be recalled to active duty he orders him to take the assignment as his assistant tag reluctantly compliestockton imposes military discipline on the civilian school wo trainee mechanics are dismissed on the spot for being too slow tag warns his boss that he is pushing the men too hard but tockton disagrees hen tockton inspects the newest batch of students he is greatly displeased to find his own son en hester orris among them e would rather have him stay in the diplomatic service but en wants to design aircraften and tag become rivals for the affections of eg awrence oan ontaine the cousin of fellow school pilot and friend ergie erguson aul uilfoyle espite only seeing eg a couple of times tag impulsively proposes to her only to find she has already agreed to marry entag and ergie are assigned a dangerous pioneering mapping flight from alifornia to laska to ussia tockton pays them an awkward visit observing that their aircraft could carry three t is obvious that he wants his son to go along tag obligesen has a falling out with eg over his flying and she breaks off their engagement hen tag finds out he proposes again she accepts after he agrees this will be his last flight hey get married in uma immediately although there is no honeymoon as the mapping expedition departs within hours he flight becomes uncomfortably awkward after tag informs en about his marriageuring the flight the rudder becomes jammed forcing an emergency landing in the rctic wilderness to effect repairs hen they try to take off the landing gear proves too weak and the aircraft flips over en and tag are unharmed but ergies legs are broken hey devise a travois to carry ergie on the 300 mile trek to the coast hen it becomes apparent that they will not make it with the injured man as a burden ergie insists they leave him behind but they refusefter en and tag fall asleep however ergie drags himself out of their tent to freeze to death ventually tag becomes too exhausted to go on en is glad to leave him behind but then recalls the time tag stood up for him against his father after a near crash e turns around gets tag to his feet and supports him as they trudge along hortly afterward they stumble upon a settlementhen they return to the school eg rushes into ens arms eeing how she feels tag tells her to get their marriage annulled,1938 iant merican ander oan ontain ichard arri arey drama adventur pon reach retir age olonel orneliu tockton arri arey forc leav militari accept job run ransorld ine chool eronaut lendal alifornia tag ahil ichard old friend war pilot commerci airlin take lendal colonel ask join school staff tag would rather fli hen colonel arrang tag reservist recal activ duti order take assign assist tag reluctantli compliestockton impos militari disciplin civilian school traine mechan dismiss spot slow tag warn bos push men hard tockton disagre hen tockton inspect newest batch student greatli displeas find son hester orri among would rather stay diplomat servic want design aircraften tag becom rival affect awrenc oan ontain cousin fellow school pilot friend ergi erguson aul uilfoyl espit see coupl time tag impuls propos find alreadi agre marri entag ergi assign danger pioneer map flight alifornia laska ussia tockton pay awkward visit observ aircraft could carri three obviou want son along tag obligesen fall fli break engag hen tag find propos accept agre last flight hey get marri uma immedi although honeymoon map expedit depart within hour flight becom uncomfort awkward tag inform marriageur flight rudder becom jam forc emerg land rctic wilder effect repair hen tri take land gear prove weak aircraft flip tag unharm ergi leg broken hey devi travoi carri ergi 300 mile trek coast hen becom appar make injur man burden ergi insist leav behind refuseft tag fall asleep howev ergi drag tent freez death ventual tag becom exhaust glad leav behind recal time tag stood father near crash turn around get tag foot support trudg along hortli afterward stumbl upon settlementhen return school rush en arm ee feel tag tell get marriag annul
13108,1997 he ainmaker merican rancis ord oppola att amon laire anes anny eito on oight ary ay lace ickey ourke anny lover irginia adsen oy cheider drama udy aylor is a graduate of the niversity of emphis aw chool nlike most of his fellow grads he has no highpaying job lined up and is forced to apply for parttime positions while serving drinks at a emphis baresperate for a job he is introduced to yman ruiser tone a ruthless but successful ambulancechasing lawyer who makes him an associate o earn his fee udy is required to hunt for potential clients at a local hospital e meets eck hifflet a lessthanethical former insurance assessorturnedparalegal who has failed the bar exam six times owever eck is resourceful in gathering information and an expert on insurance lawsuitsudy has a case of insurance bad faith which could be worth several million dollars in damages hen tone is raided by the udy and eck set up a practice themselves hey file suit on behalf of a middleaged couple ot and uddy lack whose 22yearold son onny ay is dying of leukemia but could have been saved with a bone marrow transplant denied by their insurance carrier reat enefitudy passes the ennessee bar exam but has never argued a case before a judge and jury ow he finds himself up against a group of experienced lawyers from a large firm headed by eo rummond an attorney who uses unscrupulous tactics to win his caseshe original judge arvey ale is set to dismiss because he sees it as a socalled lottery case that slows down the judicial process ut a more sympathetic judge yrone ipler takes over when ale suffers a fatal heart attack ipler a former civil rights attorney immediately denies the insurance companys petition for dismissalhile seeking new clients udy meets elly iker a battered wife whose husband liffs beatings have put her in the hospital udy persuades elly to file for divorce which leads to a confrontation with liff that results in the abusive husbands death o keep udy from being implicated elly tells the police she killed her husband in selfdefense he district attorney declines to prosecuteonny ay dies but not before giving a video deposition at his home he case goes to trial where rummond gets the vital testimony of udys key witness ackie emanczyk stricken from the record evertheless thanks to udys determination and skillful crossexamination of reat enefits president ilfred eeley the jury finds for the plaintifft is a great triumph for udy and eck until the insurance company declares bankruptcy allowing it to avoid paying punitive damages here is no payout for the grieving parents and no fee for udyeciding that this success will create unrealistic expectations for future clients udy decides to abandon his new practice and teach law e and elly leave town together,1997 ainmak merican ranci ord oppola att amon lair ane anni eito oight ari lace ickey ourk anni lover irginia adsen cheider drama udi aylor graduat nivers emphi chool nlike fellow grad highpay job line forc appli parttim posit serv drink emphi baresper job introduc yman ruiser tone ruthless success ambulancechas lawyer make associ earn fee udi requir hunt potenti client local hospit meet eck hifflet lessthaneth former insur assessorturnedparaleg fail bar exam six time owev eck resourc gather inform expert insur lawsuitsudi case insur bad faith could worth sever million dollar damag hen tone raid udi eck set practic hey file suit behalf middleag coupl uddi lack whose 22yearold son onni die leukemia could save bone marrow transplant deni insur carrier reat enefitudi pas ennesse bar exam never argu case judg juri find group experienc lawyer larg firm head rummond attorney use unscrupul tactic win casesh origin judg arvey ale set dismiss see socal lotteri case slow judici process sympathet judg yrone ipler take ale suffer fatal heart attack ipler former civil right attorney immedi deni insur compani petit dismissalhil seek new client udi meet elli iker batter wife whose husband liff beat put hospit udi persuad elli file divorc lead confront liff result abus husband death keep udi implic elli tell polic kill husband selfdefens district attorney declin prosecuteonni die give video deposit home case goe trial rummond get vital testimoni udi key wit acki emanczyk stricken record evertheless thank udi determin skill crossexamin reat enefit presid ilfr eeley juri find plaintifft great triumph udi eck insur compani declar bankruptci allow avoid pay punit damag payout griev parent fee udyecid success creat unrealist expect futur client udi decid abandon new practic teach law elli leav town togeth
21309,2013 n ear ritish irector eremy overing irector eremy overingast ain e aestecker lice nglert llen eech unknown fter dating for just two weeks om ain e aestecker invites ucy lice nglert to go with him and some friends to a festival he night before om plans to take ucy to the ilairney ouse otel which he booked online and is hidden away on a series of remote roads in the rish countryside efore making their way to the hotel the couple stop at a pub and a confrontation occurs between om and some of the localsn the empty back road to the hotel om and ucy find themselves going in circles despite following the signs and their satnav stops working hey eventually realise that they keep returning to the same point no matter which route they take and are unable to find their way back to the main road trange things begin happening including ucy spotting a man in a white mask and someone attempting to grab her from the darknesshile speeding down the road away from their attacker om clips a man in the road e and ucy pick up the man who says his name is ax llen eech ax claims to be under attack by the same people stalking the couple owever he is eventually revealed to be the true culprit om kicks ax out of the car following a harrowing confrontation and ax breaks oms wrist in a subsequent fightucy and om take their torches to hide in the woods from him when their car runs out of petrol n the darkness om is grabbed and disappears ucy returns to the car alone and finds a petrol can in the front seat fter refilling the tank and with the satnav now mysteriously working again ucy drives on and eventually finds the hotel but discovers that it is abandoned he car park is a graveyard of derelict cars suggesting that she and om are not the first victimsax returns in a and over and pursues ucy hen ucy is able to stop the car she finds a tube running from the exhaust pipe into the boot he opens the boot and discovers om bound inside dead from carbon monoxide poisoning from the tube forced into his throats day breaks ucy finds the way back to the main road but as she drives over a lonely moor towards it she sees ax standing in road in the distance ax stretches out his arms and smiles at her ucy slams her foot on the pedal and accelerates towards ax,2013 ear ritish irector eremi over irector eremi overingast aesteck louse nglert llen eech unknown fter date two week aesteck invit uci louse nglert friend festiv night plan take uci ilairney ous otel book onlin hidden away seri remot road rish countrysid efor make way hotel coupl stop pub confront occur localsn empti back road hotel uci find go circl despit follow sign satnav stop work hey eventu realis keep return point matter rout take unabl find way back main road trang thing begin happen includ uci spot man white mask someon attempt grab darknesshil speed road away attack clip man road uci pick man say name llen eech claim attack peopl stalk coupl owev eventu reveal true culprit kick car follow harrow confront break om wrist subsequ fightuci take torch hide wood car run petrol dark grab disappear uci return car alon find petrol front seat fter refil tank satnav mysteri work uci drive eventu find hotel discov abandon car park graveyard derelict car suggest first victimsax return pursu uci hen uci abl stop car find tube run exhaust pipe boot open boot discov bound insid dead carbon monoxid poison tube forc throat day break uci find way back main road drive lone moor toward see stand road distanc stretch arm smile uci slam foot pedal acceler toward
17831,2009 aos ast ancer ustralian ruce eresford ruce reenwoodhi aoyle acachlanoan hen biography n the era of aos ultural evolution in the 60s70s 11yearold hinese boy i unxin resides in a rural village commune in handong rovince destined to labour in the fields s often occurred in those times government officials fanning out across the nation seeking young candidates for centralized training arrive at this school t first bypassed but selected after a plea by his teacher during the school visit i seems bewildered although piqued by the gruff preliminary inspection screening at the provincial capital city of ingdao orwarded to a eijing audition for a place in adame aos ance cademy he is admitted for ballet training based on a series of physique and flexibility examinationsears of arduous training follow i surpassing his initial lukewarm interest and mediocre performance after inspiration from senior teacher han whose advocacy of classical ussian ballet as opposed to the politically aimed physically strident form required by adame ao leads to the teachers apparent banishment ater during the course of a groundbreaking cultural visit to hina mericanbased nglish ballet director en tevenson impressed by is standout talent seeks him as an exchange student at his the ouston allet is determined courage garners a formerly disparaging teacher to influence the cademy to allow him the opportunity for a threemonth stay in the nited tatesis encounters with life cause questioning of the hinese ommunist arty dictates upon which he has been raised and he begins a relationship with an aspiring merican dancer lizabeth ackey uickly attracting the attention of the local ballet scene i together with tevenson requests a time extension in merica but the hinese government refuses verwhelmed by the opportunities offered in merica and in love with ackey i is determined to stay ith legal advice that the hinese government would recognize certain residence rights arising from an international marriage i and ackey rush into a marriage o declare personal responsibility for his decision and hopefully avoid consequences for his family and tevenson i visits the hinese onsulate in ouston he hinese resident diplomat forcibly detains i in an attempt to coerce his return to hina nknown to i the situation quickly evolves when the media and high level government agents both in the and hina become involved hen i perseveres in his refusal to repatriate the hinese overnment agrees to release him but revokes his citizenship and declares he can never return to the land of his birthi and lizabeth are set to depart for lorida but i is persuaded to stay by tevenson for his ballet company dooming lizabeths prospects of dancing success urdened by this plus concerned for and unable to communicate with his family unxin continues to excel as a dancer but his relationship with lizabeth disintegrates and their marriage ends ive years later as a show of goodwill the hinese government allows is parents to visit him in the where they finally witness his performance of he ite of pring and even reunite with him on stage i is eventually granted permission to visit hina ogether with his new wife ary cendry amilla ergotis an ustralian ballerina and coming back to the village of his youth he rejoins his family and his former teacher han who expresses regret that he never got to see i perform i and cendry give an impromptu outdoor ballet performance to the villages uproarious cheerlosing credits announce that i unxin danced in hina with the ouston allet in 1995 a performance broadcast to an audience of over 500 million people e and ary cendry now live in ustralia with their three children en tevenson left the ouston allet after 27 years as rtistic irector cclaimed as one of the worlds leading choreographers he is now rtistic irector of the exas allet heater harles oster still practices law in ouston e is recognized internationally as an authority on mmigration aw lizabeth ackey iz danced with the klahoma allet for some years he is now a speech therapist working mainly with children,2009 ao ast ancer ustralian ruce eresford ruce reenwoodhi aoyl acachlanoan hen biographi era ao ultur evolut 60s70 11yearold hines boy unxin resid rural villag commun handong rovinc destin labour field often occur time govern offici fan across nation seek young candid central train arriv school first bypass select plea teacher school visit seem bewild although piqu gruff preliminari inspect screen provinci capit citi ingdao orward eij audit place adam ao anc cademi admit ballet train base seri physiqu flexibl examinationsear arduou train follow surpass initi lukewarm interest mediocr perform inspir senior teacher han whose advocaci classic ussian ballet oppos polit aim physic strident form requir adam lead teacher appar banish ater cours groundbreak cultur visit hina mericanbas nglish ballet director tevenson impress standout talent seek exchang student ouston allet determin courag garner formerli disparag teacher influenc cademi allow opportun threemonth stay nite tatesi encount life caus question hines ommunist arti dictat upon rais begin relationship aspir merican dancer lizabeth ackey uickli attract attent local ballet scene togeth tevenson request time extens merica hines govern refus verwhelm opportun offer merica love ackey determin stay ith legal advic hines govern would recogn certain resid right aris intern marriag ackey rush marriag declar person respons decis hope avoid consequ famili tevenson visit hines onsul ouston hines resid diplomat forcibl detain attempt coerc return hina nknown situat quickli evolv medium high level govern agent hina becom involv hen persever refus repatri hines overn agre releas revok citizenship declar never return land birthi lizabeth set depart lorida persuad stay tevenson ballet compani doom lizabeth prospect danc success urden plu concern unabl commun famili unxin continu excel dancer relationship lizabeth disintegr marriag end ive year later show goodwil hines govern allow parent visit final wit perform ite pring even reunit stage eventu grant permiss visit hina ogeth new wife ari cendri amilla ergoti ustralian ballerina come back villag youth rejoin famili former teacher han express regret never got see perform cendri give impromptu outdoor ballet perform villag uproari cheerlos credit announc unxin danc hina ouston allet 1995 perform broadcast audienc 500 million peopl ari cendri live ustralia three child tevenson left ouston allet year rtistic irector cclaim one world lead choreograph rtistic irector exa allet heater harl oster still practic law ouston recogn intern author mmigrat lizabeth ackey danc klahoma allet year speech therapist work mainli child
31733,1972 ala itrula atha elugu nknown nan unknown harmaiah aster evanand and atyam aster urendra are friends and favorite students of havani rasad aggayya atyam is the son of hushaiah ikkilineni a rich landlord harmaiah is the son of otaiah ummadi enkateswara ao a laborer agaraju an unruly kid is the son of apaiah agabhushanam village president nimosity between land lords apaiah and hushaiah is reflected in the lives of their children agaraju challenges atyam about their ability to buy tickets for a circus show atyam unsuccessfully tries to get money from his parents hushaiah and hantamma emalata s part of the plan to get money harmaiah lies that atyam committed suicide hantamma dies of shock and atyam and harmaiah decide to never lie againrasad files a case against apaiah about mixing salt in ammonia fertilizer harmaiah tells the ollector and it causes apaiah to lose reputation and his father otaiah to lose his job harmaiah escapes from the house and hides in hushaiahs cattle house to avoid the ire of his father otaiah searches for his son and reaches the cattle house with a lamp and the grass is on fire hushaiahs group catches otaiah and misunderstands that apaiah sent him atyam comes to rescue and gives evidence before the ollector to save otaiah one of the village elders apaiah hushaiah and otaiah can understand the intensity of the friendship between harmaiah and atyam and their commitment to truth atyam and harmaiah escape from village and face several problems in the city rasad comes to know that militant revolutionaries led by his childhood friend hanu rishnam aju are going to kill apaiah and tells atyam and harma to inform apaiah ilitants and police face a fight and the rebel leader is injured and caught he movie ends with village elders hushaih apaiah and otaiah recognising the need for honest people like harmaiah and atyam in village,1972 ala itrula atha elugu nknown nan unknown harmaiah aster evanand atyam aster urendra friend favorit student havani rasad aggayya atyam son hushaiah ikkilineni rich landlord harmaiah son otaiah ummadi enkateswara labor agaraju unruli kid son apaiah agabhushanam villag presid nimos land lord apaiah hushaiah reflect live child agaraju challeng atyam abil buy ticket circu show atyam unsuccess tri get money parent hushaiah hantamma emalata part plan get money harmaiah lie atyam commit suicid hantamma die shock atyam harmaiah decid never lie againrasad file case apaiah mix salt ammonia fertil harmaiah tell ollector caus apaiah lose reput father otaiah lose job harmaiah escap hous hide hushaiah cattl hous avoid ire father otaiah search son reach cattl hous lamp grass fire hushaiah group catch otaiah misunderstand apaiah sent atyam come rescu give evid ollector save otaiah one villag elder apaiah hushaiah otaiah understand intens friendship harmaiah atyam commit truth atyam harmaiah escap villag face sever problem citi rasad come know milit revolutionari led childhood friend hanu rishnam aju go kill apaiah tell atyam harma inform apaiah ilit polic face fight rebel leader injur caught movi end villag elder hushaih apaiah otaiah recognis need honest peopl like harmaiah atyam villag
14466,2004 odzilla inal ars merican yuhei itamura sutomu itagawa asahiro atsuoka ei ikukawa science fiction ears after an initial attack of okyo in 1954 odzilla is entrapped under ice in ntarctica after a battle with the original otengo n later years environmental disasters cause the appearance of giant monsters and superhumans dubbed mutants who are then recruited into the arth efense orce to battle the monsters n upgraded otengo commanded by aptain ordon battles and destroys anda but the ship is wrecked in the process and its captain ouglas ordon is suspended from the utant soldier hinichi zaki is tasked with guarding a biologist r iyuki tonashi who is sent to study a mummified monster hey are teleported to nfant sland where they encounter the hobijin fairies of othra who reveal the mummified monster as igan an alien cyborg sent to destroy the arth who was ultimately defeated by othra hey warn that a battle between good and evil will happen soon and that zaki must choose a side iant monsters begin attacking several major cities he engage the creatures who mysteriously vanish at the same moment when an alien mothership appears over okyo he aliens named iliens warn that an incoming planet called orath will soon impact the arth peace pact is signed between arth and the iliens eanwhile inilla odzillas son is found in the forest by a boy and his grandfatheristrusting the iliens zaki iyuki and her sister nna discover that orath they saw is actually a hologram and that the aliens have replaced several members of the with duplicates fter their kind is exposed killing his superior to assume command the ilien ontroller reveals their plans to use humans as a food source while taking control of all the mutants save zaki also has the monsters placed under his control and awakens igan to have them to wipe out the he group escapes with igan pursuing them ordon convincing them to travel to ntarctica to release odzilla who easily destroys igan he otengo then guides odzilla into battle with the other monsters and returns to okyo to engage the iliens fter penetrating the mothership the group is captured and brought before as he has summoned orath to arth hough odzilla destroys orath just before it crashes it unleashes onster and the two monsters battlen upgraded igan aids onster but is intercepted by othra who is gravely wounded while managing to destroy the cyborg n the ilien ship as a fight breaks out while unable to take control of the human due to the hobijins blessing reveals that both he and zaki are superior beings known are eizers is fatally wounded but he triggers the ships selfdestruct as the group fall back to the otengo moments before the mothership explodes odzilla and onster continue their battle as the latter transforms into its true form eizer hidorah eizer hidorah initially gets the upperhand but odzilla emerges victorious in the end inilla shows up at the scene and convinces odzilla not to destroy the otengo he survivors watch odzilla and inilla return to the ocean,2004 odzilla inal ar merican yuhei itamura sutomu itagawa asahiro atsuoka ikukawa scienc fiction ear initi attack okyo 1954 odzilla entrap ice ntarctica battl origin otengo later year environment disast caus appear giant monster superhuman dub mutant recruit arth efens orc battl monster upgrad otengo command aptain ordon battl destroy anda ship wreck process captain ougla ordon suspend utant soldier hinichi zaki task guard biologist iyuki tonashi sent studi mummifi monster hey teleport nfant sland encount hobijin fairi othra reveal mummifi monster igan alien cyborg sent destroy arth ultim defeat othra hey warn battl good evil happen soon zaki must choos side iant monster begin attack sever major citi engag creatur mysteri vanish moment alien mothership appear okyo alien name ilien warn incom planet call orath soon impact arth peac pact sign arth ilien eanwhil inilla odzilla son found forest boy grandfatheristrust ilien zaki iyuki sister nna discov orath saw actual hologram alien replac sever member duplic fter kind expo kill superior assum command ilien ontrol reveal plan use human food sourc take control mutant save zaki also monster place control awaken igan wipe group escap igan pursu ordon convinc travel ntarctica releas odzilla easili destroy igan otengo guid odzilla battl monster return okyo engag ilien fter penetr mothership group captur brought summon orath arth hough odzilla destroy orath crash unleash onster two monster battlen upgrad igan aid onster intercept othra grave wound manag destroy cyborg ilien ship fight break unabl take control human due hobijin bless reveal zaki superior be known eizer fatal wound trigger ship selfdestruct group fall back otengo moment mothership explod odzilla onster continu battl latter transform true form eizer hidorah eizer hidorah initi get upperhand odzilla emerg victori end inilla show scene convinc odzilla destroy otengo survivor watch odzilla inilla return ocean
34191,2011 he yrammmid ussian ldar alavatov leksei erebryakov yodor ondarchuk yotr yodorov crime ussia early 1990s ergei amontov is looking for where to apply himself and his intellect nd so he orders a mockup of a security paper with imperial script rich ornament watermarks and his own portrait in the centern active advertising campaign begins little more than two weeks is enough to make people lined up for the mamonts owerful bankers and state structures are in confusion no one has a clue how to stop it and the has already accumulated more than 10 million investorsurthermore amontov is concerned that there are no rich people in the country and all oviet industry is exposed to privatization e accumulates private greed and decides to make honest privatization is way is blocked by the agent of estern imperialism elyavsky an allusion to oris erezovsky with his egabank an allusion to ogo elyavsky comes from the top he makes connections in the remlin and is in charge of television elyavsky proposes to share ussia amontov refuses do not trade with ussia which attracts the financial inspectorate who without checking documents imposes unthinkable demands for paying taxes upon him which amontov executes here is still enough money to ruin the bank of elyavsky n the country by that time are already 20 million investors and every week the number increases by a million the mamont goes on par with ruble amontov threatens to seize power with the help of investors who are facing ruin uring a oneminute audience with the resident amontov appears as a guardian for the state amid a corrupt environment and asks for a change in the law to allow foreigners to be involved in their financial system in order to subordinate the estern oligarchy and thereby make eltsins ussia leader of the world ut elyavsky begins to threaten the life of amontovs daughter and he eventually falls into a trap n the stankino ower the battered amontov again refuses to cooperate with elyavsky despite the proposed opportunity to become the head of state amontov hopes to leave with his daughter defending himself by having a recording of a conversation with a representative of the where he offered similar privileges from his assistant era but she escaping from the people of elyavsky drops the recording in a park and on charges of nonpayment of taxes amontov gets in prison and comes out after 7 years,2011 yrammmid ussian ldar alavatov leksei erebryakov yodor ondarchuk yotr yodorov crime ussia earli 1990 ergei amontov look appli intellect order mockup secur paper imperi script rich ornament watermark portrait centern activ advertis campaign begin littl two week enough make peopl line mamont ower banker state structur confus one clue stop alreadi accumul million investorsurthermor amontov concern rich peopl countri oviet industri expo privat accumul privat greed decid make honest privat way block agent estern imperi elyavski allus ori erezovski egabank allus ogo elyavski come top make connect remlin charg televis elyavski propos share ussia amontov refus trade ussia attract financi inspector without check document impos unthink demand pay tax upon amontov execut still enough money ruin bank elyavski countri time alreadi million investor everi week number increas million mamont goe par rubl amontov threaten seiz power help investor face ruin ure oneminut audienc resid amontov appear guardian state amid corrupt environ ask chang law allow foreign involv financi system order subordin estern oligarchi therebi make eltsin ussia leader world elyavski begin threaten life amontov daughter eventu fall trap stankino ower batter amontov refus cooper elyavski despit propos opportun becom head state amontov hope leav daughter defend record convers repres offer similar privileg assist era escap peopl elyavski drop record park charg nonpay tax amontov get prison come year
9599,1979 he love merican oss agen ohn axon oanna assidy oosevelt rier oan londell action am ellog ohn axon is an excop who works as a modern day bounty hunter in os ngeles e works for bailbondsman ill chwartz eenan ynn and is assigned to bring in exconvicts and criminals who have skipped bail ellog is frustrated over the low amount of money he receives from his jobs ecently divorced ellogs exwife is threatening to end weekend visitation rights to their young daughter over missing several alimony payments ne day ellog is offered a large offthebook 20000 bounty by his former police commander t ruger oward onig to bring in an excovict named ictor ale oosevelt rier who is suspected in the murders of various former prison guards in the area ale was brutalized in prison by the guards who used a fivepound leathercovered steel glove called the riot glove and has been using a copy of it to murder the prison guards who used to beat him with it ellog takes the job aware that the 20 grand reward will solve all of his financial problemsver the course of the film the acton switches back and forth between the lives of the protagonist ellog who narrates several aspects of his life as well as the antagonist ale who in between murdering the former prison guards makes a living as a guitarist in a jazz band and is popular and well liked among the tenants in the lowincome housing project where he lives ale soon realizes that ellog is on his tail when he learns that ellog has been asking questions about ales whereabouts ale begins stalking ellog as well as making phone calls to his house to stop trying to find himt the climax ellog and ale finally meet face to face when ellog tracks ale to the roof of ales apartment building where ale whom annoys ellog during most of the film by addressing him as hound dog offers to make bringing him in a challenge by giving ellog his riot glove to fight him with ellog accepts and a brutal and climatic brawl occurs on the roof of the building where both men batter each other senseless he fight ends in a stalemate when both of them collapse against a wall exhausted and ellog concedes defeat by removing the riot glove o show that he does not hold any grudge against him ale helps ellog up and begins to escort him from the building until t ruger suddenly shows up and shoots ale to death ruger tells ellog that the bounty for ictor ale was to kill him not to bring him in alive he residents of the building after hearing the gunshots rush up and literally beat ruger to death for killing one of their own leaving behind the battered and bloodied ellog on the floor n a final voiceover ellog explains that he nevertheless received the 20000 bounty for ictor ale and having used the money to pay off all of his debts was able to regain visitation rights to his daughter,1979 love merican os agen ohn axon oanna assidi oosevelt rier oan londel action ellog ohn axon excop work modern day bounti hunter ngele work bailbondsman ill chwartz eenan ynn assign bring exconvict crimin skip bail ellog frustrat low amount money receiv job ecent divorc ellog exwif threaten end weekend visit right young daughter miss sever alimoni payment day ellog offer larg offthebook 20000 bounti former polic command ruger oward onig bring excovict name ictor ale oosevelt rier suspect murder variou former prison guard area ale brutal prison guard use fivepound leathercov steel glove call riot glove use copi murder prison guard use beat ellog take job awar grand reward solv financi problemsv cours film acton switch back forth live protagonist ellog narrat sever aspect life well antagonist ale murder former prison guard make live guitarist jazz band popular well like among tenant lowincom hous project live ale soon realiz ellog tail learn ellog ask question ale whereabout ale begin stalk ellog well make phone call hous stop tri find himt climax ellog ale final meet face face ellog track ale roof ale apart build ale annoy ellog film address hound dog offer make bring challeng give ellog riot glove fight ellog accept brutal climat brawl occur roof build men batter senseless fight end stalem collaps wall exhaust ellog conced defeat remov riot glove show hold grudg ale help ellog begin escort build ruger suddenli show shoot ale death ruger tell ellog bounti ictor ale kill bring aliv resid build hear gunshot rush liter beat ruger death kill one leav behind batter bloodi ellog floor final voiceov ellog explain nevertheless receiv 20000 bounti ictor ale use money pay debt abl regain visit right daughter
5022,1949 ulsa merican tuart eisler usan ayward obert reston drama he plot revolved around the ulsa klahoma oil boom of the 1920s and detailed how obsession with accumulating wealth and power can tend to corrupt moral character2 he story begins with the death of rancher else ansing who is killed by an oil well blowout while visiting a well operated by anner etroleum to report that pollution from the oil production has killed some of his cattle4 he plot thickens as ansings daughter herokee acquires drilling rights and meets rad rady a geologist who wants the oil drillers to limit their drilling in order to minimize oil field depletion and to preserve the areas grasslands4 fire in a derrick tailing pool started by im edbird a herokee who had been made a rich owner of oil land through crooked dealings of oilmen and who later renounces his holdings results in an extravagant fire scene for which the movie got its scar nomination2 n its aftermath in recognition of the destruction caused by improper oil drilling and how money and power can corrupt even those who love the land the oil drillers and the geologist learn to work together2,1949 ulsa merican tuart eisler usan ayward obert reston drama plot revolv around ulsa klahoma oil boom 1920 detail ob accumul wealth power tend corrupt moral character2 stori begin death rancher el an kill oil well blowout visit well oper anner etroleum report pollut oil product kill cattle4 plot thicken an daughter heroke acquir drill right meet rad radi geologist want oil driller limit drill order minim oil field deplet preserv area grasslands4 fire derrick tail pool start edbird heroke made rich owner oil land crook deal oilman later renounc hold result extravag fire scene movi got scar nomination2 aftermath recognit destruct caus improp oil drill money power corrupt even love land oil driller geologist learn work together2



Lemmatization typically takes more time compared to stemming because it is a more complex and linguistically informed process. Here are some reasons why lemmatization is slower than stemming:

Linguistic Knowledge: Lemmatization relies on linguistic knowledge and rules to find the base or dictionary form (lemma) of a word. This involves understanding the part of speech of the word (e.g., noun, verb, adjective) and applying rules to convert it to its canonical form. Stemming, on the other hand, uses heuristics and simpler rules to remove suffixes and prefixes without considering the grammatical context.

Accuracy: Lemmatization aims to produce accurate base forms of words, which requires more extensive analysis and rule-based transformations. Stemming is a more aggressive approach that may produce less accurate results but is faster because it doesn't consider the context as much.

Lexicon or Dictionary: Lemmatization often requires access to a lexicon or dictionary of words and their lemmas. Stemming usually doesn't rely on a dictionary and operates purely based on pattern matching and rules.

Complexity of the English Language: English has a complex vocabulary with irregular forms and exceptions. Lemmatization algorithms need to account for these irregularities, which can make the process more time-consuming.

Part of Speech: Lemmatization considers the part of speech of a word, which adds an extra layer of complexity. Stemming treats all words as if they were nouns, which simplifies the process.

Contextual Analysis: Lemmatization may require contextual analysis to determine the correct base form based on the surrounding words and the sentence structure. Stemming typically doesn't consider context in this way.

Due to these factors, lemmatization is generally more accurate in producing meaningful base forms of words, but it comes at the cost of increased computational complexity and time. In contrast, stemming is faster but may produce less accurate results and may not always result in valid words. The choice between lemmatization and stemming depends on the specific requirements of the natural language processing task and the trade-off between accuracy and efficiency.

# TF-IDF

In [None]:
# TF-IDF Vectorization
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['doc_clean'])
column_names = vectorizer.get_feature_names_out()
df_tf_idf = pd.DataFrame(X.toarray(), columns=column_names)


# Cosine Similarity

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Cosine Similarity
df_cos_sim = pd.DataFrame(cosine_similarity(df_tf_idf, df_tf_idf))
df_cos_sim

# Converting Cosine Similarity Dataframe to Top-K Items

In [None]:
import warnings

# hide pandas warning messages
warnings.filterwarnings('ignore')

In [None]:
from tqdm.notebook import tqdm

K = 10
df_top_k = pd.DataFrame(index=df.index)

for col in tqdm(df.index):
    sim_scores = list(enumerate(df_cos_sim[col]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:K+1]
    movie_indices = [i[0] for i in sim_scores]
    df_top_k[col] = movie_indices


In [None]:
# saving similarity top-k dataframe

# Transpose
df_top_k = df_top_k.T

df_top_k.sample(10)

# saving similarity top-k dataframe
df_top_k.to_parquet("../data/movie_top_k_t.parquet")


# Streamlit Application

'
'
'