# NLP with Machine Learning Assignment Solutions

## 0. Create a New Environment

Run the following commands in the Terminal (Mac) or Anaconda Prompt (PC):


#### 1. view, create and switch environments
```
conda env list
conda create --name nlp_machine_learning
conda env list
conda activate nlp_machine_learning
```

#### 2. install and view packages
```
conda install python jupyter notebook pandas matplotlib scikit-learn spacy openpyxl numpy
conda install -c conda-forge vaderSentiment
conda list
```

#### 3. additional spacy download
```
python -m spacy download en_core_web_sm
```
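As an optional sanity check after installation, the sketch below reports which of the installed packages cannot be found in the active environment. The helper name `missing_modules` and the list `REQUIRED_MODULES` are made up for illustration, and the import names are an assumption based on the packages listed above (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages installed above (assumed mapping:
# scikit-learn is imported as "sklearn"; the rest use their package name).
REQUIRED_MODULES = [
    "pandas", "matplotlib", "sklearn", "spacy",
    "openpyxl", "numpy", "vaderSentiment",
]

def missing_modules(module_names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in module_names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", missing if missing else "none")
```

If anything is reported as missing, check that the _nlp_machine_learning_ environment is the one currently activated.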

## 1. Sentiment Analysis

1. Create a new _nlp_machine_learning_ environment
2. Launch Jupyter Notebook
3. Read in the _movie_reviews.csv_ file
4. Apply sentiment analysis to the _movie_info_ column
5. Sort the sentiment scores to return the top 10 and bottom 10 sentiment scores and their corresponding movie titles

In [1]:
# import pandas
import pandas as pd

# view full movie_info text
pd.set_option('display.max_colwidth', None)

# read in the movie reviews data
df = pd.read_csv('../Data/movie_reviews.csv')
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit."
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity."


In [2]:
# sentiment analysis
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# create an analyzer object
analyzer = SentimentIntensityAnalyzer()

In [3]:
# view one movie
df.movie_info[0]

'Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his "boy," Ethan (Dennis Quaid) and Ethan\'s wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah\'s baby granddaughter, CJ. The problem is that CJ\'s mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey\'s soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey\'s adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ\'s best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.'

In [4]:
# get sentiment for one movie (reuses the analyzer created above)
analyzer.polarity_scores(df.movie_info[0])['compound']

0.9837
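
The compound score is a normalized value between -1 (most negative) and 1 (most positive). The sketch below turns it into a coarse label, assuming the ±0.05 cutoffs suggested in the VADER documentation. `label_sentiment` is a hypothetical helper written for this example, not part of the vaderSentiment library.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# the compound score computed above for the first movie
print(label_sentiment(0.9837))
```

A function like this could be applied to a column of compound scores with `.apply`, in the same way as `get_sentiment`.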

In [5]:
# define a function to get sentiment for any movie
def get_sentiment(text):
    return analyzer.polarity_scores(text)['compound']

In [6]:
# apply the function on the entire column
df['sentiment'] = df['movie_info'].apply(get_sentiment)
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237


In [7]:
# top 10 most positive movies
df[['movie_title', 'movie_info', 'sentiment']].sort_values(by='sentiment', ascending=False).head(10)

Unnamed: 0,movie_title,movie_info,sentiment
23,Breakthrough,"BREAKTHROUGH is based on the inspirational true story of one mother's unfaltering love in the face of impossible odds. When Joyce Smith's adopted son John falls through an icy Missouri lake, all hope seems lost. But as John lies lifeless, Joyce refuses to give up. Her steadfast belief inspires those around her to continue to pray for John's recovery, even in the face of every case history and scientific prediction. From producer DeVon Franklin (Miracles from Heaven) and adapted for the screen by Grant Nieporte (Seven Pounds) from Joyce Smith's own book, BREAKTHROUGH is an enthralling reminder that faith and love can create a mountain of hope, and sometimes even a miracle.",0.9915
81,Missing Link,"This April, meet Mr. Link (Galifianakis): 8 feet tall, 630 lbs, and covered in fur, but don't let his appearance fool you... he is funny, sweet, and adorably literal, making him the world's most lovable legend at the heart of Missing Link, the globe-trotting family adventure from LAIKA. Tired of living a solitary life in the Pacific Northwest, Mr. Link recruits fearless explorer Sir Lionel Frost (Jackman) to guide him on a journey to find his long-lost relatives in the fabled valley of Shangri-La. Along with adventurer Adelina Fortnight (Saldana), our fearless trio of explorers encounter more than their fair share of peril as they travel to the far reaches of the world to help their new friend. Through it all, the three learn that sometimes you can find a family in the places you least expect.",0.9909
130,The Laundromat,"When her idyllic vacation takes an unthinkable turn, Ellen Martin (Academy Award winner Meryl Streep) begins investigating a fake insurance policy, only to find herself down a rabbit hole of questionable dealings that can be linked to a Panama City law firm and its vested interest in helping the world's wealthiest citizens amass even larger fortunes. The charming -- and very well-dressed -- founding partners Jürgen Mossack (Academy Award winner Gary Oldman) and Ramón Fonseca (Golden Globe nominee Antonio Banderas) are experts in the seductive ways shell companies and offshore accounts help the rich and powerful prosper. They are about to show us that Ellen's predicament only hints at the tax evasion, bribery and other illicit absurdities that the super wealthy indulge in to support the world's corrupt financial system.",0.9908
48,Five Feet Apart,"Stella Grant (Haley Lu Richardson) is every bit a seventeen-year-old... she's attached to her laptop and loves her best friends. But unlike most teenagers, she spends much of her time living in a hospital as a cystic fibrosis patient. Her life is full of routines, boundaries and self-control -- all of which is put to the test when she meets an impossibly charming fellow CF patient named Will Newman (Cole Sprouse). There's an instant flirtation, though restrictions dictate that they must maintain a safe distance between them. As their connection intensifies, so does the temptation to throw the rules out the window and embrace that attraction. Further complicating matters is Will's potentially dangerous rebellion against his ongoing medical treatment. Stella gradually inspires Will to live life to the fullest, but can she ultimately save the person she loves when even a single touch is off limits?",0.9889
156,UglyDolls,"In the adorably different town of Uglyville, weird is celebrated, strange is special and beauty is embraced as more than simply meets the eye. Here, the free-spirited Moxy (Clarkson) and her UglyDoll friends live every day in a whirlwind of bliss, letting their freak flags fly in a celebration of life and its endless possibilities. In this all-new story, the UglyDolls will go on a journey beyond the comfortable borders of Uglyville. There, they will confront what it means to be different, struggle with their desire to be loved, and ultimately discover that you don't have to be perfect to be amazing because who you truly are is what matters most.",0.9862
93,Red Joan,"In a picturesque village in England, Joan Stanley (Academy Award (R) winner Dame Judi Dench), lives in contented retirement. Then suddenly her tranquil existence is shattered as she's shockingly arrested by MI5. For Joan has been hiding an incredible past; she is one of the most influential spies in living history... Cambridge University in the 1930s, and the young Joan (Sophie Cookson), a demure physics student, falls intensely in love with a seductively attractive Russian saboteur, Leo. Through him, she begins to see that the world is on a knife-edge and perhaps must be saved from itself in the race to military supremacy. Post-war and now working at a top secret nuclear research facility, Joan is confronted with the impossible: Would you betray your country and your loved ones, if it meant saving them? What price would you pay for peace? Inspired by an extraordinary true story, Red Joan is the taut and emotional discovery of one woman's sacrifice in the face of incredible circumstances. A woman to whom we perhaps all owe our freedom.",0.9848
49,Giant Little Ones,"Franky Winter (Josh Wiggins) and Ballas Kohl (Darren Mann) have been best friends since childhood. They are high school royalty: handsome, stars of the swim team and popular with girls. They live a perfect teenage life - until the night of Franky's epic 17th birthday party, when Franky and Ballas are involved in an unexpected incident that changes their lives forever. Giant Little Ones is a heartfelt and intimate coming-of-age story about friendship, self-discovery, and the power of love without labels.",0.9839
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",0.9837
36,Dumbo,"From Disney and visionary director Tim Burton, the all-new grand live-action adventure ""Dumbo"" expands on the beloved classic story where differences are celebrated, family is cherished and dreams take flight. Circus owner Max Medici (Danny DeVito) enlists former star Holt Farrier (Colin Farrell) and his children Milly (Nico Parker) and Joe (Finley Hobbins) to care for a newborn elephant whose oversized ears make him a laughingstock in an already struggling circus. But when they discover that Dumbo can fly, the circus makes an incredible comeback, attracting persuasive entrepreneur V.A. Vandevere (Michael Keaton), who recruits the peculiar pachyderm for his newest, larger-than-life entertainment venture, Dreamland. Dumbo soars to new heights alongside a charming and spectacular aerial artist, Colette Marchant (Eva Green), until Holt learns that beneath its shiny veneer, Dreamland is full of dark secrets.",0.9801
71,Long Shot,"Fred Flarsky (Seth Rogen) is a gifted and free-spirited journalist with an affinity for trouble. Charlotte Field (Charlize Theron) is one of the most influential women in the world. Smart, sophisticated, and accomplished, she's a powerhouse diplomat with a talent for... well, mostly everything. The two have nothing in common, except that she was his babysitter and childhood crush. When Fred unexpectedly reconnects with Charlotte, he charms her with his self-deprecating humor and his memories of her youthful idealism. As she prepares to make a run for the Presidency, Charlotte impulsively hires Fred as her speechwriter, much to the dismay of her trusted advisors. A fish out of water on Charlotte's elite team, Fred is unprepared for her glamorous lifestyle in the limelight. However, sparks fly as their unmistakable chemistry leads to a round-the-world romance and a series of unexpected and dangerous incidents.",0.9778


In [8]:
# top 10 most negative movies
df[['movie_title', 'movie_info', 'sentiment']].sort_values(by='sentiment', ascending=True).head(10)

Unnamed: 0,movie_title,movie_info,sentiment
7,All Is True,"The year is 1613, Shakespeare is acknowledged as the greatest writer of the age. But disaster strikes when his renowned Globe Theatre burns to the ground, and devastated, Shakespeare returns to Stratford, where he must face a troubled past and a neglected family. Haunted by the death of his only son Hamnet, he struggles to mend the broken relationships with his wife and daughters. In so doing, he is ruthlessly forced to examine his own failings as husband and father. His very personal search for the truth uncovers secrets and lies within a family at war.",-0.9955
148,The Wind,"An unseen evil haunts the homestead in this chilling, folkloric tale of madness, paranoia, and otherworldly terror. Lizzy (Caitlin Gerard) is a tough, resourceful frontierswoman settling a remote stretch of land on the 19th-century American frontier. Isolated from civilization in a desolate wilderness where the wind never stops howling, she begins to sense a sinister presence that seems to be borne of the land itself, an overwhelming dread that her husband (Ashley Zukerman) dismisses as superstition. When a newlywed couple arrives on a nearby homestead, their presence amplifies Lizzy's fears, setting into motion a shocking chain of events. Masterfully blending haunting visuals with pulse-pounding sound design, director Emma Tammi evokes a godforsaken world in which the forces of nature come alive with quivering menace.",-0.9838
83,Nightmare Cinema,"In this twisted horror anthology, five strangers are drawn to an abandoned theater and forced to watch their deepest and darkest fears play out before them. Lurking in the shadows is the Projectionist, who preys upon their souls with his collection of disturbing films. As each reel spins its sinister tale, the characters find frightening parallels to their own lives.",-0.9756
154,Triple Threat,"TRIPLE THREAT, the newest feature from Johnson, is an adrenaline fueled and gritty action thriller starring some of the biggest names in action today. Michael Jai White (BLACK DYNAMITE; UNDISPUTED 2: LAST MAN STANDING), Scott Adkins (Marvel's DOCTOR STRANGE; THE EXPENDABLES 2), Michael Bisping (xXx: RETURN OF XANDER CAGE) star as a group of professional assassins hired to take out a billionaire's daughter who is intent on bringing down a major crime syndicate. A down and out team of mercenaries, played by Tony Jaa (ONG BAK TRILOGY; xXx: RETURN OF XANDER CAGE), Iko Uwais (THE RAID 1 & 2; STAR WARS: THE FORCE AWAKENS) and Tiger Chen (MAN OF TAI CHI), must take on the assassins and stop them before they kill their target. The film co-stars JeeJa Yanin (CHOCOLATE) Michael Wong (Cold War) and Celina Jade (WOLF WARRIOR 2).",-0.9696
11,Angel of Mine,"Noomi Rapace (The Girl with the Dragon Tattoo) stars as a woman on the edge in this intense psychological thriller. Having suffered a tragic loss years earlier, Lizzie (Rapace) is trying to rebuild her life when she suddenly becomes obsessed with a neighbor's daughter, believing the girl to be her own child. As Lizzie's shocking, threatening acts grow increasingly dangerous, they lead to an explosive confrontation with the girl's angry, defensive mother (Yvonne Strahovski, ""The Handmaid's Tale"").",-0.9687
27,Charlie Says,"Three young women were sentenced to death for the infamous Manson murders. Their sentences became life imprisonment when the death penalty was lifted in California. One young graduate student was sent in to teach them. Through her, we witness their transformations as they face the reality of their horrific crimes.",-0.9643
113,The Curse of La Llorona,"In 1970s Los Angeles, La Llorona is stalking the night -- and the children. Ignoring the eerie warning of a troubled mother suspected of child endangerment, a social worker and her own small kids are soon drawn into a frightening supernatural realm. Their only hope to survive La Llorona's deadly wrath may be a disillusioned priest and the mysticism he practices to keep evil at bay, on the fringes where fear and faith collide.",-0.9628
87,Pet Sematary,"Based on the seminal horror novel by Stephen King, Pet Sematary follows Dr. Louis Creed (Jason Clarke), who, after relocating with his wife Rachel (Amy Seimetz) and their two young children from Boston to rural Maine, discovers a mysterious burial ground hidden deep in the woods near the family's new home. When tragedy strikes, Louis turns to his unusual neighbor, Jud Crandall (John Lithgow), setting off a perilous chain reaction that unleashes an unfathomable evil with horrific consequences.",-0.959
142,The Standoff at Sparrow Creek,"After a mass shooting at a police funeral, reclusive ex-cop Gannon finds himself unwittingly forced out of retirement when he realizes that the killer belongs to the same militia he joined after quitting the force. Understanding that the shooting could set off a chain reaction of copycat violence across the country, Gannon quarantines his fellow militiamen in the remote lumber mill they call their headquarters. There, he sets about a series of grueling interrogations, intent on ferreting out the killer and turning him over to the authorities to prevent further bloodshed.",-0.959
40,El Chicano,"When L.A.P.D. Detective Diego Hernandez is assigned a career-making case investigating a vicious cartel, he uncovers links to his brother's supposed suicide and a turf battle that's about to swallow his neighborhood. Torn between playing by the book and seeking justice, he resurrects the masked street legend El Chicano. Now, out to take down his childhood buddy turned gang boss, he sets off a bloody war to defend his city and avenge his brother's murder",-0.9578


## 2. Text Classification

1. Clean and normalize the _movie_info_ column using the _maven_text_preprocessing.py_ module
2. Create a Count Vectorizer
* Remove stop words
* Set the minimum document frequency to 10%
3. Create a Naïve Bayes model and a Logistic Regression model to predict which movies are directed by women vs men, using the Count Vectorizer output as the features
4. Compare their accuracy scores and classification reports
5. Using the better performing model, return the top 5 movies that the model predicts are most likely directed by a woman
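
To make the 10% minimum document frequency concrete, here is a hand-rolled sketch of the filtering rule: a term is kept only if it appears in at least `min_df * number_of_documents` documents. It is an illustration of the idea and not CountVectorizer's actual implementation. The function name `filter_by_min_df`, the whitespace tokenization, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter

def filter_by_min_df(docs, min_df):
    """Keep terms whose document frequency is at least min_df * len(docs)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.split()))  # count each term once per document
    min_count = min_df * len(docs)
    return sorted(term for term, count in doc_freq.items() if count >= min_count)

# toy corpus (made up for illustration)
docs = [
    "dog journey home",
    "dog family farm",
    "family secret war",
    "young woman family",
]

# with min_df=0.5, a term must appear in at least 2 of the 4 documents
print(filter_by_min_df(docs, min_df=0.5))
```

A higher threshold leaves fewer, more common terms, which keeps the feature matrix small on a dataset of this size.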

In [9]:
# decrease the column width
pd.set_option('display.max_colwidth', 100)

# view movie reviews
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Eth...",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their a...",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and se...",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this can...",0.9237


In [10]:
# import the text preprocessing steps we created in the last section
import maven_text_preprocessing

# apply them to the reviews
df['text_clean'] = maven_text_preprocessing.clean_and_normalize(df.movie_info)
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment,text_clean
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Eth...",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their a...",0.9837,bailey voice josh gad live good life michigan farm boy ethan dennis quaid ethans wife hannah mar...
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and se...",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this can...",0.9237,separate owner dog set 400mile journey safety security place call home way meet series new frien...


In [11]:
# import libraries
from sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

In [12]:
# create a count vectorizer matrix
cv = CountVectorizer(stop_words='english', min_df=.1)
X = cv.fit_transform(df.text_clean)

In [13]:
# view the features / inputs X
X_df = pd.DataFrame(X.toarray(), columns=cv.get_feature_names_out())
X_df.head()

Unnamed: 0,begin,discover,family,film,follow,force,friend,home,leave,life,...,man,new,set,star,story,turn,woman,world,year,young
0,1,0,0,0,0,0,1,0,1,3,...,0,2,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,1,1,0,1,...,0,1,1,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,2,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,0,0,0,0,1,0,0,0,0,1,...,0,0,0,0,0,0,0,1,0,0


In [14]:
# view the target / output y
y = df.director_gender
y.head()

0    female
1      male
2      male
3    female
4    female
Name: director_gender, dtype: object

In [15]:
# view the number of directors of each gender
y.value_counts()

director_gender
male      134
female     32
Name: count, dtype: int64

In [16]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_df, y, test_size=0.2, random_state=42)

# Naive Bayes model
nb = MultinomialNB()
nb.fit(X_train, y_train)

# Predict
y_pred_nb = nb.predict(X_test)

# Evaluate
print(classification_report(y_test, y_pred_nb))
print("Accuracy:", accuracy_score(y_test, y_pred_nb))

              precision    recall  f1-score   support

      female       0.25      0.20      0.22         5
        male       0.87      0.90      0.88        29

    accuracy                           0.79        34
   macro avg       0.56      0.55      0.55        34
weighted avg       0.78      0.79      0.78        34

Accuracy: 0.7941176470588235


In [17]:
# Logistic Regression model
lr = LogisticRegression()
lr.fit(X_train, y_train)

# Predict
y_pred_lr = lr.predict(X_test)

# Evaluate
print(classification_report(y_test, y_pred_lr))
print("Accuracy:", accuracy_score(y_test, y_pred_lr))

              precision    recall  f1-score   support

      female       0.29      0.40      0.33         5
        male       0.89      0.83      0.86        29

    accuracy                           0.76        34
   macro avg       0.59      0.61      0.60        34
weighted avg       0.80      0.76      0.78        34

Accuracy: 0.7647058823529411


In [18]:
# Which is the better performing model?
# Naive Bayes has the higher accuracy (0.79 vs. 0.76), but our main goal is to correctly identify
# female-directed movies. Logistic Regression has higher precision (0.29 vs. 0.25) and higher recall
# (0.40 vs. 0.20) for the female class, so we'll move forward with Logistic Regression for the next task.
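
The reasoning above can be checked by recomputing the female-class metrics from confusion counts. The counts in this sketch are not printed anywhere in the notebook. They are inferred from the two classification reports (support of 5 female and 29 male test rows), so treat them as an assumption. `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)

# counts for the 'female' class, inferred from the reports above (assumption)
nb_female = binary_metrics(tp=1, fp=3, fn=4)
lr_female = binary_metrics(tp=2, fp=5, fn=3)

# accuracy of always predicting 'male' on the 34 test rows (29 are male)
baseline_accuracy = round(29 / 34, 2)

print("Naive Bayes (precision, recall, f1):", nb_female)
print("Logistic Regression (precision, recall, f1):", lr_female)
print("Always-male baseline accuracy:", baseline_accuracy)
```

With classes this imbalanced, a model that always predicts male would score higher on accuracy than either classifier, which is why the comparison leans on the female-class precision and recall instead.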

In [19]:
# which movies are most likely directed by women
import numpy as np

pd.set_option('display.max_colwidth', None) # view full text once again

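# calculate probability scores for each movie
# (look up the 'female' column by label instead of assuming it is column 0;
#  note that X_df includes the training rows, so these scores are not out-of-sample)
female_idx = list(lr.classes_).index('female')
df['female_director_prediction'] = lr.predict_proba(X_df)[:, female_idx]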

# display the top scores
(df[['movie_title', 'movie_info', 'directors',
     'director_gender', 'female_director_prediction']]
 .sort_values('female_director_prediction', ascending=False)
 .head())

Unnamed: 0,movie_title,movie_info,directors,director_gender,female_director_prediction
55,Greta,"A sweet, naïve young woman trying to make it on her own in New York City, Frances (Chloë Grace Moretz) doesn't think twice about returning the handbag she finds on the subway to its rightful owner. That owner is Greta (Isabelle Huppert), an eccentric French piano teacher with a love for classical music and an aching loneliness. Having recently lost her mother, Frances quickly grows closer to widowed Greta. The two become fast friends - but Greta's maternal charms begin to dissolve and grow increasingly disturbing as Frances discovers that nothing in Greta's life is what it seems in this suspense thriller from Academy Award (R) winning director Neil Jordan.",Neil Jordan,male,0.840252
27,Charlie Says,"Three young women were sentenced to death for the infamous Manson murders. Their sentences became life imprisonment when the death penalty was lifted in California. One young graduate student was sent in to teach them. Through her, we witness their transformations as they face the reality of their horrific crimes.",Mary Harron,female,0.721752
140,The Secret Life of Pets 2,"THE SECRET LIFE OF PETS 2 will follow summer 2016's blockbuster about the lives our pets lead after we leave for work or school each day. Illumination founder and CEO Chris Meledandri and his longtime collaborator Janet Healy will produce the sequel to the comedy that had the best opening ever for an original film, animated or otherwise. THE SECRET LIFE OF PETS 2 will see the return of writer Brian Lynch (Minions) and once again be directed by Chris Renaud (Despicable Me series, Dr. Seuss' The Lorax).","Chris Renaud, Jonathan Del Val",male,0.692048
76,Mary Magdalene,"Set in the Holy Land in the first century C.E., a young woman leaves her small fishing village and traditional family behind to join a radical new social movement. At its head is a charismatic leader, Jesus of Nazareth, who promises that the world is changing. Mary is searching for a new way of living, and an authenticity that is denied her by the rigid hierarchies of the day. As the notoriety of the group spread and more are drawn to follow Jesus' inspirational message, Mary's spiritual journey places her at the heart of a story that will lead to the capital city of Jerusalem, where she must confront the reality of Jesus' destiny and her own place within it.",Garth Davis,male,0.671338
69,Little,"Marsai Martin (TV's Black-ish) stars in and executive produces Universal Pictures' LITTLE, a comedy from producer Will Packer (Girls Trip, Ride Along and Think Like a Man series) based on an idea the young actress pitched. Directed by Tina Gordon (Peeples), the film tells the story of a woman who-when the pressures of adulthood become too much to bear-gets the chance to relive the carefree life of her younger self.",Tina Gordon Chism,female,0.620955


## 3. Topic Modeling

1. Using the same preprocessed data as the last assignment, create a Tfidf Vectorizer
* Remove stop words
* Start with min_df=0.05 and max_df=0.2
2. Create an NMF model to find the main topics in the movie descriptions
* Start with n_components=2
3. Tweak the model by updating the Tfidf Vectorizer parameters and number of topics
4. Interpret and name the topics
5. For two of the topics, return the top movies that contain the topic

In [20]:
# decrease the column width
pd.set_option('display.max_colwidth', 100)

# view movie reviews
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment,text_clean,female_director_prediction
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Eth...",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their a...",0.9837,bailey voice josh gad live good life michigan farm boy ethan dennis quaid ethans wife hannah mar...,0.545264
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and se...",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this can...",0.9237,separate owner dog set 400mile journey safety security place call home way meet series new frien...,0.107074


In [21]:
# imports
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

In [22]:
# create a tfidf vectorizer matrix
tv = TfidfVectorizer(stop_words='english', min_df=.02, max_df=.2) # start with 0.05 and 0.2 and slowly tweak
                                                                  # end with 0.02 and 0.2
Xt = tv.fit_transform(df.text_clean)

In [23]:
# view the matrix
Xt_df = pd.DataFrame(Xt.toarray(), columns=tv.get_feature_names_out())
Xt_df.head()

Unnamed: 0,abandon,ability,academy,act,action,adventure,allnew,ambition,american,amy,...,win,winner,woman,wood,work,write,writer,year,york,young
0,0.0,0.0,0.0,0.0,0.0,0.173939,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.242848,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.250586,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [24]:
# apply nmf with n topics
nmf_model = NMF(n_components=6, random_state=42) # start with n_components=2 and increase by 1
                                                 # end with n_components=6
W = nmf_model.fit_transform(Xt_df)  # document-topic matrix
H = nmf_model.components_  # topic-term matrix
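
To show how the two matrices relate, here is a small pure-Python sketch with made-up numbers. Multiplying the document-topic matrix by the topic-term matrix gives an approximation of the original document-term matrix, and the largest value in each row points to a document's strongest topic or a topic's strongest term. The names `W_toy`, `H_toy`, `terms`, and `matmul` are hypothetical and kept separate from the notebook's `W` and `H`.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# toy factors (made up): 3 documents, 2 topics, 4 terms
W_toy = [[1.0, 0.0],
         [0.0, 2.0],
         [0.3, 0.7]]                # document-topic weights
H_toy = [[0.2, 0.0, 0.8, 0.0],
         [0.0, 0.6, 0.0, 0.4]]      # topic-term weights
terms = ["family", "evil", "home", "horror"]

# 3 x 4 reconstruction of the document-term matrix
X_approx = matmul(W_toy, H_toy)

# strongest topic per document and strongest term per topic
dominant_topic = [row.index(max(row)) for row in W_toy]
top_term = [terms[row.index(max(row))] for row in H_toy]

print(dominant_topic)
print(top_term)
```

In the notebook, W has one row per movie and one column per topic, while H has one row per topic and one column per term in the TF-IDF vocabulary.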

In [25]:
# create a function to view the top terms in each topic
def display_topics(H, num_words=10):
    feature_names = tv.get_feature_names_out()  # look up the vocabulary once
    for topic_num, topic_array in enumerate(H):
        top_features = topic_array.argsort()[::-1][:num_words]  # indices of the largest weights
        top_words = [feature_names[i] for i in top_features]
        print("Topic", topic_num+1, ":", ', '.join(top_words))

display_topics(H)

Topic 1 : family, father, grow, face, home, young, try, return, turn, son
Topic 2 : film, true, base, star, comedy, inspire, follow, tell, feature, event
Topic 3 : friend, good, live, dream, love, leave, meet, school, childhood, help
Topic 4 : academy, award, winner, nominee, help, violent, turn, jason, skill, include
Topic 5 : set, force, war, universe, man, black, neighborhood, decade, police, adventure
Topic 6 : child, sinister, evil, mother, deadly, horror, play, draw, supernatural, social


In [26]:
# document-topic weights; column names are hand-picked labels based on the top terms of each topic above
doc_topics = pd.DataFrame(W)
doc_topics.columns = ['family', 'true stories', 'friends', 'award winners', 'adventure', 'horror']
doc_topics

Unnamed: 0,family,true stories,friends,award winners,adventure,horror
0,0.000000,0.000000,0.386611,0.000000,0.000000,0.000000
1,0.000000,0.000000,0.160890,0.000000,0.175711,0.000000
2,0.010004,0.048791,0.111847,0.002226,0.003119,0.000000
3,0.076959,0.124083,0.000000,0.003979,0.000000,0.003756
4,0.000000,0.094903,0.122133,0.000570,0.039085,0.072190
...,...,...,...,...,...,...
161,0.000000,0.000000,0.000000,0.000000,0.135256,0.130529
162,0.000000,0.356354,0.000000,0.000000,0.000000,0.020535
163,0.000000,0.143371,0.000000,0.012895,0.000000,0.238026
164,0.000000,0.000000,0.306076,0.000000,0.000000,0.000000
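
To summarize each document by its strongest topic, `idxmax` can be applied across the topic columns. This is a small sketch that uses the first three rows of the output above as a stand-in; `doc_topics_demo` and `summary` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Stand-in for doc_topics: the first three rows shown above
doc_topics_demo = pd.DataFrame(
    [
        [0.000000, 0.000000, 0.386611, 0.000000, 0.000000, 0.000000],
        [0.000000, 0.000000, 0.160890, 0.000000, 0.175711, 0.000000],
        [0.010004, 0.048791, 0.111847, 0.002226, 0.003119, 0.000000],
    ],
    columns=['family', 'true stories', 'friends', 'award winners', 'adventure', 'horror'],
)

# label of the highest-weighted topic in each row, and that weight
dominant_topic = doc_topics_demo.idxmax(axis=1)
top_weight = doc_topics_demo.max(axis=1)

summary = pd.DataFrame({'dominant_topic': dominant_topic, 'weight': top_weight})
print(summary)
```

A document can load on several topics at once (row 1 is split between `friends` and `adventure`), so the dominant label is a simplification of the full weight vector.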


In [27]:
# view full text once again
pd.set_option('display.max_colwidth', None)

# combine with movie title and info
movies_topics = pd.concat([df[['movie_title', 'movie_info']], doc_topics], axis=1)
movies_topics.head()

Unnamed: 0,movie_title,movie_info,family,true stories,friends,award winners,adventure,horror
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",0.0,0.0,0.386611,0.0,0.0,0.0
1,A Dog's Way Home,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",0.0,0.0,0.16089,0.0,0.175711,0.0
2,A Tuba to Cuba,"The leader of New Orleans' famed Preservation Hall Jazz Band seeks to fulfill his late father's dream of retracing their musical roots to the shores of Cuba in search of the indigenous music that gave birth to New Orleans jazz. A TUBA TO CUBA celebrates the triumph of the human spirit expressed through the universal language of music and challenges us to resolve to build bridges, not walls.",0.010004,0.048791,0.111847,0.002226,0.003119,0.0
3,A Vigilante,"A once abused woman, Sadie (Olivia Wilde), devotes herself to ridding victims of their domestic abusers while hunting down the husband she must kill to truly be free. A Vigilante is a thriller inspired by the strength and bravery of real domestic abuse survivors and the incredible obstacles to safety they face.",0.076959,0.124083,0.0,0.003979,0.0,0.003756
4,After,"Based on Anna Todd's best-selling novel which became a publishing sensation on social storytelling platform Wattpad, AFTER follows Tessa (Langford), a dedicated student, dutiful daughter and loyal girlfriend to her high school sweetheart, as she enters her first semester in college. Armed with grand ambitions for her future, her guarded world opens up when she meets the dark and mysterious Hardin Scott (Tiffin), a magnetic, brooding rebel who makes her question all she thought she knew about herself and what she wants out of life.",0.0,0.094903,0.122133,0.00057,0.039085,0.07219


In [28]:
# sort on family
movies_topics.sort_values('family', ascending=False).head()

Unnamed: 0,movie_title,movie_info,family,true stories,friends,award winners,adventure,horror
160,Us,"Haunted by an unexplainable and unresolved trauma from her past and compounded by a string of eerie coincidences, Adelaide feels her paranoia elevate to high-alert as she grows increasingly certain that something bad is going to befall her family. After spending a tense beach day with their friends, the Tylers (Emmy winner Elisabeth Moss, Tim Heidecker, Cali Sheldon, Noelle Sheldon), Adelaide and her family return to their vacation home. When darkness falls, the Wilsons discover the silhouette of four figures holding hands as they stand in the driveway. Us pits an endearing American family against a terrifying and uncanny opponent: doppelgängers of themselves.",0.297273,0.0,0.0,0.014277,0.0,0.0
56,Gwen,"Gwen is a young girl desperately trying to hold her home together--struggling with her mother's mysterious illness, her father's absence and a ruthless mining company encroaching on their land. As a growing darkness begins to take grip of her home, the local community grows suspicious and turns on Gwen and her family.",0.260287,0.0,0.0,0.0,0.0,0.029774
7,All Is True,"The year is 1613, Shakespeare is acknowledged as the greatest writer of the age. But disaster strikes when his renowned Globe Theatre burns to the ground, and devastated, Shakespeare returns to Stratford, where he must face a troubled past and a neglected family. Haunted by the death of his only son Hamnet, he struggles to mend the broken relationships with his wife and daughters. In so doing, he is ruthlessly forced to examine his own failings as husband and father. His very personal search for the truth uncovers secrets and lies within a family at war.",0.257575,0.0,0.0,0.0,0.035706,0.0
33,Don't Come Back from the Moon,"DON'T COME BACK FROM THE MOON is a story of abandonment, when all the men in a remote California desert town walk away from their families, one by one. They leave their angry, frustrated sons and daughters behind -- kids who act out, engage in acts of petty burglary and vandalism, and look for love and family connection in the aftermath of their abandonment, all the while trying to understand why their fathers have ""gone to the moon,"" leaving them to traverse the difficult path to adulthood alone.",0.236933,0.0,0.028529,0.0,0.0,0.0
47,Fighting with My Family,"FIGHTING WITH MY FAMILY is a heartwarming comedy based on the incredible true story of WWE Superstar Paige(TM). Born into a tight-knit wrestling family, Paige and her brother Zak are ecstatic when they get the once-in-a-lifetime opportunity to try out for WWE. But when only Paige earns a spot in the competitive training program, she must leave her family and face this new, cut-throat world alone. Paige's journey pushes her to dig deep, fight for her family, and ultimately prove to the world that what makes her different is the very thing that can make her a star.",0.212328,0.091144,0.0,0.0,0.0,0.0


In [29]:
# sort on true stories
movies_topics.sort_values('true stories', ascending=False).head(5)

Unnamed: 0,movie_title,movie_info,family,true stories,friends,award winners,adventure,horror
84,On the Basis of Sex,The film tells an inspiring and spirited true story that follows young lawyer Ruth Bader Ginsburg as she teams with her husband Marty to bring a groundbreaking case before the U.S. Court of Appeals and overturn a century of gender discrimination. The feature will premiere in 2018 in line with Justice Ginsburg's 25th anniversary on the Supreme Court.,0.0,0.422488,0.0,0.0,0.0,0.002615
145,The Upside,"Inspired by a true story, The Upside is a heartfelt comedy about a recently paroled ex-convict (Kevin Hart) who strikes up an unusual and unlikely friendship with a paralyzed billionaire (Bryan Cranston). Directed by Neil Burger and written by Jon Hartmere, The Upside is based on the hit 2011 French film The Intouchables.",0.0,0.401136,0.0,0.0,0.0,0.0
162,What Men Want,"Inspired by the Nancy Meyers hit romantic comedy WHAT WOMEN WANT, this film follows the story of a female sports agent (Henson) who has been constantly boxed out by her male colleagues. When she gains the power to hear mens' thought, she is able to shift the paradigm to her advantage as she races to sign the NBA's next superstar",0.0,0.356354,0.0,0.0,0.0,0.020535
66,King of Thieves,A true crime film about a crew of retired crooks who pull off a major heist in London's jewelry district. What starts off as their last criminal hurrah quickly turns into a brutal nightmare due to greed. Based on infamous true events.,0.0,0.354077,0.0,0.0,0.0,0.0
69,Little,"Marsai Martin (TV's Black-ish) stars in and executive produces Universal Pictures' LITTLE, a comedy from producer Will Packer (Girls Trip, Ride Along and Think Like a Man series) based on an idea the young actress pitched. Directed by Tina Gordon (Peeples), the film tells the story of a woman who-when the pressures of adulthood become too much to bear-gets the chance to relive the carefree life of her younger self.",0.018764,0.346127,0.016354,0.0,0.0,0.0
