# Extracting Featues from Text

In [59]:
import pandas as pd

In [60]:
df = pd.read_csv("https://raw.githubusercontent.com/theleadio/datascience_demo/master/netflix_titles.csv")

In [61]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


## Importing the TFIDVectorizer Class

In [62]:
#TF-IDF
#TF - Term Frequency
#IDF - Inverse Document Frequency

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words = "english") #initialise the object


## Transform documents and query into vectors

In [63]:
from sklearn.metrics.pairwise import cosine_similarity


In [64]:
def feature_to_query(query_text):
    # transforming documents and query into vectors
    feature = tfidf.fit_transform(df["description"]) # fit the description column to the model
    query = query_text # the terms that we want to examine
    query_feature = tfidf.transform([query])  #transform the query into a vector form

    # Using Cosine Similarity to find the document vectors that are close to our query vector
    cosims = cosine_similarity(query_feature, feature).flatten()
    results = cosims.argsort()[-6:-1] # sort the coisms from smallest to biggest and find top 5 closest vectrors. Extract the top 5 biggest similarity from the results
    for r in results:
        print(df.iloc[r]["description"])
        print("------")



## Trying out different features to query

In [65]:
feature_to_query("quitted his job")


A grad student leaves her boyfriend in Delhi for a job interview in Mumbai, where she reconnects with an old flame for whom she still has feelings.
------
A teacher starts her job at a high school but is haunted by a suspicious death that occurred there weeks before... and begins fearing for her own life.
------
After being fired from her job and dumped by her cheating boyfriend, a comedian has a one-night stand. Weeks later, she finds out she's pregnant.
------
A gifted detective takes a job in a small town so he can spend more time with his family. But he's soon drawn into a web of disturbing murder cases.
------
Seeking job opportunities, a young man arrives in Cairo and becomes increasingly involved with the family of a wealthy businessman.
------


In [66]:
feature_to_query("adventure in the wild")


For Como and his friends, each new adventure is a lesson on why it's important to be considerate — and to take care of one another!
------
Two history buffs with an eye for valuables traverse through forgotten mines and abandoned landmarks in the wild, wild West to score collectibles.
------
When Arnold and his crew win a trip to San Lorenzo, their adventure in the wild forces them to take the same risky path as Arnold's missing parents.
------
Fievel and his family head west for what turns out to be a wild adventure. Deep in cowboy country, the intrepid mouse faces down a nasty feline.
------
A newly engaged couple's romantic vacation in Jamaica turns into a mischievous adventure that tests their union in wild and unexpected ways.
------


In [67]:
feature_to_query("zombie disease")

In the wake of a zombie apocalypse, survivors hold on to the hope of humanity by banding together to wage a fight for their own survival.
------
As a zombie outbreak sweeps the country, a dad and his daughter take a harrowing train journey in an attempt to reach the only city that's still safe.
------
After a mysterious disease kills every resident over 22 years old, survivors of a town must fend for themselves when the government quarantines them.
------
In the dark, early days of a zombie apocalypse, complete strangers band together to find the strength they need to survive and get back to loved ones.
------
When she finally encounters two other survivors, a woman alone in a world decimated by a zombie epidemic struggles to trust her new companions.
------


In [68]:
feature_to_query("hunted by assassins")

In a near-future world, single people are hunted and forced to find mates within 45 days, or be turned into animals and banished to the wilderness.
------
A withdrawn young woman hunted by a malicious cult is abducted by a brooding stranger and undergoes a bizarre transformation.
------
Demoted to an academy job, a cop trains five foolhardy students as assassins in his risky revenge plot against police corruption and the underworld.
------
Tatsumi sets out on a journey to help his poor village. When he's rescued by a band of assassins, he joins their fight against the corrupt government.
------
A young boy is wanted for a crime he has no recollection of committing and must go on the run, hunted by two powerful kings and their forces.
------


In [69]:
feature_to_query("love rejected glow up")

A determined entrepreneur navigates a love triangle between a young charmer and an older executive, leading her down an unconventional path to love.
------
When shy Kumar gets stuck on the losing end of a love triangle, the Love Doctor, Mokia, has a strategy for Kumar to win the love of his life.
------
Rejected by the Marines but eager to serve his country after 9/11, a simple-minded man performs amateur terrorist surveillance with a young runaway.
------
After he is rejected by the woman he loves and obesity-related issues kill his uncle, a lonely, overweight artist undergoes a major transformation.
------
Eager to marry but constantly rejected by women, a bachelor hopes to win over a former crush by accepting help from an unlikely source: her mother.
------
