### NLP with Machine Learning

In this chapter, we discuss tasks that can be done using traditional NLP methods, including rules-based and supervised and unsupervised machine learning techniques.

What is machine learning (ML)?

In simple terms, we are trying to teach machines how to learn like humans.

Definition: ML algorithms enable computers to learn from data and make decisions.

The algorithms fall under two main categories:

- Supervised Learning: Using labeled historical data to predict future or unseen outcomes.
  Examples: (Numeric) What will house prices look like for the next 12 months? (Text) How can I flag a suspicious email as spam?
- Unsupervised Learning: Finding patterns and relationships in data.
  Examples: (Numeric) How can I segment my customers? (Text) What hidden themes are in these product reviews?

Common Algorithms:

There are many machine learning algorithms. We can use the following common ML algorithms for natural language processing once we have preprocessed the text data.

- Supervised Learning:
  - Regression (Linear, Regularized, Time Series Analysis)
  - Classification (Logistic Regression, Decision Trees, Random Forest, Gradient Boosted Trees, Naive Bayes)
- Unsupervised Learning (DBSCAN, Hierarchical Clustering, Principal Component Analysis, Non-Negative Matrix Factorization)

Traditional NLP:

Common NLP tasks are often solved using traditional NLP methods, such as simple rules-based techniques or more advanced ML algorithms.

NLP Tasks we will be covering:

- Sentiment Analysis: Identifying the positivity or negativity of text (Technique: Rules-based, Library: VADER, Input format: Raw text (because order matters))
- Text Classification: Classifying text as one label or another (Technique: Supervised Learning (Naive Bayes), Library: scikit-learn, Input format: CV/TF-IDF)
- Topic Modeling: Finding themes within a corpus of text (that is, many text documents) (Techniques: Unsupervised learning, Library: scikit-learn, Input format: CV/TF-IDF)
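To see why the input format matters, here is a minimal sketch of what a count-based representation does to word order. It assumes "CV" refers to a count-vectorized (bag-of-words) input; the `bag_of_words` helper and the two example sentences are illustrative and not part of the chapter's dataset.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.

    The counts keep which words occur and how often, but not their order.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Two illustrative sentences with opposite meanings but the same words
a = "The lemonade was good, not bad."
b = "The lemonade was bad, not good."

print(bag_of_words(a))
print(bag_of_words(a) == bag_of_words(b))  # same counts for both sentences
```

Both sentences collapse to the same counts, which is fine for finding topics or labels but loses the information a rules-based sentiment tool needs, such as which word a "not" applies to.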

Traditional vs Modern NLP:

When should I use traditional/modern NLP techniques?

Note that traditional NLP involves machine learning techniques, and modern NLP involves deep learning techniques. When both are an option, it is recommended to start simple. We can ask the following questions to decide which one to use.

- What is my NLP goal?
  If my goal is sentiment analysis/text classification/topic modeling, these can be performed with traditional techniques. If my goal is text generation/machine translation/question answering, the traditional techniques may not be sufficient.
- How much data do I have?
  If I have a small dataset, traditional techniques are usually the better fit; if I have a large dataset, modern techniques become an option.

Similar to Chapter 2, before moving forward, we will create a new environment called 'nlp_machine_learning' and install the following packages:

- jupyter
- matplotlib
- notebook
- numpy
- openpyxl
- pandas
- python
- scikit-learn
- spacy

We also want to install the `vaderSentiment` package. The usual install command will fail because the package is not available in the default Anaconda channel. Instead, we can install it from the community-maintained 'conda-forge' channel.
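Once the environment is set up, a quick way to confirm that the packages can be found is to look them up by their import names. This is a minimal sketch using only the standard library; the `is_installed` helper is hypothetical, and note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be found for import."""
    return importlib.util.find_spec(module_name) is not None

# Import names (not package names) for a few of the packages listed above
for name in ["pandas", "sklearn", "spacy", "vaderSentiment"]:
    print(name, "OK" if is_installed(name) else "missing")
```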

#### Sentiment Analysis

Sentiment analysis is used to determine the positivity or negativity of text. A score between -1 (most negative) and +1 (most positive) is assigned to each block of text.

Note: This will be applied to raw text.

This can be done with libraries such as `VADER`, classification techniques, or modern NLP techniques. Here we will be using `VADER` (Valence Aware Dictionary and sEntiment Reasoner). This works well with informal text (social media text, online reviews).

Note: Steps with different libraries are mostly similar.

Step 1: Import `SentimentIntensityAnalyzer`.

Step 2: Identify the corresponding text.

Step 3: Create a new `SentimentIntensityAnalyzer` object.

Step 4: Obtain polarity scores.

In the output, the most important value is the `compound` score, which summarizes the overall positivity or negativity of the text. It is calculated using a series of rules. First, VADER assigns predefined sentiment weights to words (amazing = 2.8, horrible = -2.5). It then adjusts for modifiers (negation such as "not", intensifiers such as "very", capitalization, punctuation, ...) and normalizes the result into a final score between -1 and +1.
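The idea of "word weights, then modifiers, then a final score" can be sketched in a few lines. This is a simplified toy, not VADER's actual implementation: the lexicon holds only two words (a positive weight of 2.8 for *amazing* and a negative weight of magnitude 2.5 for *horrible*), and the booster and negation values are hypothetical. The last step, x / sqrt(x² + 15), follows the normalization that VADER is understood to use to squash the summed weights into the range -1 to +1.

```python
import math

# Toy lexicon: every word that is not listed here scores 0.
TOY_LEXICON = {"amazing": 2.8, "horrible": -2.5}
BOOSTERS = {"very": 0.3}      # hypothetical bump applied to the next word
NEGATIONS = {"not", "never"}  # hypothetical negation list
NEGATION_FACTOR = -0.75       # hypothetical: flip the sign and dampen

def toy_compound(text, alpha=15):
    """Sum adjusted word weights, then squash the total into (-1, 1)."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    total = 0.0
    for i, word in enumerate(words):
        weight = TOY_LEXICON.get(word, 0.0)
        if weight == 0.0:
            continue
        prev = words[i - 1] if i > 0 else ""
        # An intensifier directly before the word pushes it further from 0
        if prev in BOOSTERS:
            weight += math.copysign(BOOSTERS[prev], weight)
        # A negation one or two words earlier flips and dampens the weight
        if prev in NEGATIONS or (i > 1 and words[i - 2] in NEGATIONS):
            weight *= NEGATION_FACTOR
        total += weight
    return total / math.sqrt(total * total + alpha)

for sentence in ["amazing", "very amazing", "not amazing", "horrible", "lemon"]:
    print(sentence, round(toy_compound(sentence), 4))
```

The real library applies many more rules (capitalization, punctuation, emoji, idioms), so its scores will differ from this toy on most inputs.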

In [1]:
import pandas as pd

# create a list of sentences
data = [
    "When life gives you lemons, make lemonade! 🙂",
    "She bought 2 lemons for $1 at Maven Market.",
    "A dozen lemons will make a gallon of lemonade. [AllRecipes]",
    "lemon, lemon, lemons, lemon, lemon, lemons",
    "He's running to the market to get a lemon — there's a great sale today.",
    "iced tea is my favorite",
    "I didn't like the taste of that lemonade at all.",
    "My lemons went bad before I could use them, unfortunately.",
]

# expand the column width to see the full sentences
pd.set_option('display.max_colwidth', None)

# turn it into a dataframe
data_df = pd.DataFrame(data, columns=["sentence"])
data_df.head()

# make a copy of the dataframe
df = data_df.copy()
df.head()

Unnamed: 0,sentence
0,"When life gives you lemons, make lemonade! 🙂"
1,She bought 2 lemons for $1 at Maven Market.
2,A dozen lemons will make a gallon of lemonade. [AllRecipes]
3,"lemon, lemon, lemons, lemon, lemon, lemons"
4,He's running to the market to get a lemon — there's a great sale today.


In [2]:
### Import the VADER library

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [3]:
### First, we will test the code with the first sentence.

test = df['sentence'][0]
test

'When life gives you lemons, make lemonade! 🙂'

In [4]:
analyzer = SentimentIntensityAnalyzer()
analyzer.polarity_scores(test)

{'neg': 0.0, 'neu': 0.75, 'pos': 0.25, 'compound': 0.4587}

The output above gives the proportions of the text that are negative, neutral, and positive (`neg`, `neu`, `pos`), along with the overall sentiment score (`compound`).

In [5]:
analyzer.polarity_scores(test)['compound']

0.4587
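If a label is more useful than a number, the compound score can be bucketed. The sketch below uses cutoffs of ±0.05, which is the convention suggested in the VADER documentation; the `label_from_compound` helper is hypothetical, and the cutoff is a parameter so it can be tuned for a given dataset.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores taken from the example sentences in this section
for score in [0.4587, 0.0, -0.2755]:
    print(score, label_from_compound(score))
```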

In [6]:
### Now we turn it into a function and apply it to the entire column.

def get_sentiment(text):
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)['compound']

In [7]:
df['sentence'].apply(get_sentiment)

0    0.4587
1    0.0000
2    0.0000
3    0.0000
4    0.6249
5    0.4588
6   -0.2755
7   -0.7096
Name: sentence, dtype: float64
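The `get_sentiment` function above creates a new analyzer on every call, and each new analyzer has to load its lexicon again. For larger columns, one option is to build the analyzer once and reuse it. The sketch below shows that pattern; `make_scorer` is a hypothetical helper that accepts any object with a `polarity_scores` method.

```python
def make_scorer(analyzer):
    """Return a scoring function that reuses one shared analyzer object."""
    def score(text):
        # Same lookup as get_sentiment, without re-creating the analyzer
        return analyzer.polarity_scores(text)["compound"]
    return score

# Usage with VADER (assumes vaderSentiment is installed and df is defined):
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   get_sentiment_shared = make_scorer(SentimentIntensityAnalyzer())
#   df["sentence"].apply(get_sentiment_shared)
```

The scores should be the same as before; only the setup cost changes.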

In [8]:
df['sentiment'] = df['sentence'].apply(get_sentiment)
df

Unnamed: 0,sentence,sentiment
0,"When life gives you lemons, make lemonade! 🙂",0.4587
1,She bought 2 lemons for $1 at Maven Market.,0.0
2,A dozen lemons will make a gallon of lemonade. [AllRecipes],0.0
3,"lemon, lemon, lemons, lemon, lemon, lemons",0.0
4,He's running to the market to get a lemon — there's a great sale today.,0.6249
5,iced tea is my favorite,0.4588
6,I didn't like the taste of that lemonade at all.,-0.2755
7,"My lemons went bad before I could use them, unfortunately.",-0.7096


Task: Create two lists: the top 10 feel-good movies and the top 10 darkest movies, according to the data.

1. Read in the _movie_reviews.csv_ file
2. Apply sentiment analysis to the _movie_info_ column
3. Sort the sentiment scores to return the top 10 and bottom 10 sentiment scores and their corresponding movie titles

In [10]:
df = pd.read_csv('Chapter3_movie_reviews.csv')
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit."
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity."


In [11]:
### Step 1: Import packages for sentiment analysis
### Step 2: Create a new object for sentiment analysis

analyzer = SentimentIntensityAnalyzer()
analyzer.polarity_scores(df['movie_info'][0])

### Sentiment score for the first movie is highly positive.
### Now we can apply it to the entire column.

{'neg': 0.051, 'neu': 0.694, 'pos': 0.255, 'compound': 0.9837}

In [14]:
def get_sentiment(text):
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)['compound']

In [15]:
df['movie_info'].apply(get_sentiment)

0      0.9837
1      0.9237
2      0.9360
3     -0.0334
4      0.9349
        ...  
161   -0.2732
162    0.9158
163   -0.5106
164    0.9081
165    0.1365
Name: movie_info, Length: 166, dtype: float64

In [16]:
df['sentiment'] = df['movie_info'].apply(get_sentiment)
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237


In [18]:
### Top 10 feel-good movies

df[['movie_title','movie_info','sentiment']].sort_values(by='sentiment', ascending=False).head(10)

Unnamed: 0,movie_title,movie_info,sentiment
23,Breakthrough,"BREAKTHROUGH is based on the inspirational true story of one mother's unfaltering love in the face of impossible odds. When Joyce Smith's adopted son John falls through an icy Missouri lake, all hope seems lost. But as John lies lifeless, Joyce refuses to give up. Her steadfast belief inspires those around her to continue to pray for John's recovery, even in the face of every case history and scientific prediction. From producer DeVon Franklin (Miracles from Heaven) and adapted for the screen by Grant Nieporte (Seven Pounds) from Joyce Smith's own book, BREAKTHROUGH is an enthralling reminder that faith and love can create a mountain of hope, and sometimes even a miracle.",0.9915
81,Missing Link,"This April, meet Mr. Link (Galifianakis): 8 feet tall, 630 lbs, and covered in fur, but don't let his appearance fool you... he is funny, sweet, and adorably literal, making him the world's most lovable legend at the heart of Missing Link, the globe-trotting family adventure from LAIKA. Tired of living a solitary life in the Pacific Northwest, Mr. Link recruits fearless explorer Sir Lionel Frost (Jackman) to guide him on a journey to find his long-lost relatives in the fabled valley of Shangri-La. Along with adventurer Adelina Fortnight (Saldana), our fearless trio of explorers encounter more than their fair share of peril as they travel to the far reaches of the world to help their new friend. Through it all, the three learn that sometimes you can find a family in the places you least expect.",0.9909
130,The Laundromat,"When her idyllic vacation takes an unthinkable turn, Ellen Martin (Academy Award winner Meryl Streep) begins investigating a fake insurance policy, only to find herself down a rabbit hole of questionable dealings that can be linked to a Panama City law firm and its vested interest in helping the world's wealthiest citizens amass even larger fortunes. The charming -- and very well-dressed -- founding partners Jürgen Mossack (Academy Award winner Gary Oldman) and Ramón Fonseca (Golden Globe nominee Antonio Banderas) are experts in the seductive ways shell companies and offshore accounts help the rich and powerful prosper. They are about to show us that Ellen's predicament only hints at the tax evasion, bribery and other illicit absurdities that the super wealthy indulge in to support the world's corrupt financial system.",0.9908
48,Five Feet Apart,"Stella Grant (Haley Lu Richardson) is every bit a seventeen-year-old... she's attached to her laptop and loves her best friends. But unlike most teenagers, she spends much of her time living in a hospital as a cystic fibrosis patient. Her life is full of routines, boundaries and self-control -- all of which is put to the test when she meets an impossibly charming fellow CF patient named Will Newman (Cole Sprouse). There's an instant flirtation, though restrictions dictate that they must maintain a safe distance between them. As their connection intensifies, so does the temptation to throw the rules out the window and embrace that attraction. Further complicating matters is Will's potentially dangerous rebellion against his ongoing medical treatment. Stella gradually inspires Will to live life to the fullest, but can she ultimately save the person she loves when even a single touch is off limits?",0.9889
156,UglyDolls,"In the adorably different town of Uglyville, weird is celebrated, strange is special and beauty is embraced as more than simply meets the eye. Here, the free-spirited Moxy (Clarkson) and her UglyDoll friends live every day in a whirlwind of bliss, letting their freak flags fly in a celebration of life and its endless possibilities. In this all-new story, the UglyDolls will go on a journey beyond the comfortable borders of Uglyville. There, they will confront what it means to be different, struggle with their desire to be loved, and ultimately discover that you don't have to be perfect to be amazing because who you truly are is what matters most.",0.9862
93,Red Joan,"In a picturesque village in England, Joan Stanley (Academy Award (R) winner Dame Judi Dench), lives in contented retirement. Then suddenly her tranquil existence is shattered as she's shockingly arrested by MI5. For Joan has been hiding an incredible past; she is one of the most influential spies in living history... Cambridge University in the 1930s, and the young Joan (Sophie Cookson), a demure physics student, falls intensely in love with a seductively attractive Russian saboteur, Leo. Through him, she begins to see that the world is on a knife-edge and perhaps must be saved from itself in the race to military supremacy. Post-war and now working at a top secret nuclear research facility, Joan is confronted with the impossible: Would you betray your country and your loved ones, if it meant saving them? What price would you pay for peace? Inspired by an extraordinary true story, Red Joan is the taut and emotional discovery of one woman's sacrifice in the face of incredible circumstances. A woman to whom we perhaps all owe our freedom.",0.9848
49,Giant Little Ones,"Franky Winter (Josh Wiggins) and Ballas Kohl (Darren Mann) have been best friends since childhood. They are high school royalty: handsome, stars of the swim team and popular with girls. They live a perfect teenage life - until the night of Franky's epic 17th birthday party, when Franky and Ballas are involved in an unexpected incident that changes their lives forever. Giant Little Ones is a heartfelt and intimate coming-of-age story about friendship, self-discovery, and the power of love without labels.",0.9839
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",0.9837
36,Dumbo,"From Disney and visionary director Tim Burton, the all-new grand live-action adventure ""Dumbo"" expands on the beloved classic story where differences are celebrated, family is cherished and dreams take flight. Circus owner Max Medici (Danny DeVito) enlists former star Holt Farrier (Colin Farrell) and his children Milly (Nico Parker) and Joe (Finley Hobbins) to care for a newborn elephant whose oversized ears make him a laughingstock in an already struggling circus. But when they discover that Dumbo can fly, the circus makes an incredible comeback, attracting persuasive entrepreneur V.A. Vandevere (Michael Keaton), who recruits the peculiar pachyderm for his newest, larger-than-life entertainment venture, Dreamland. Dumbo soars to new heights alongside a charming and spectacular aerial artist, Colette Marchant (Eva Green), until Holt learns that beneath its shiny veneer, Dreamland is full of dark secrets.",0.9801
71,Long Shot,"Fred Flarsky (Seth Rogen) is a gifted and free-spirited journalist with an affinity for trouble. Charlotte Field (Charlize Theron) is one of the most influential women in the world. Smart, sophisticated, and accomplished, she's a powerhouse diplomat with a talent for... well, mostly everything. The two have nothing in common, except that she was his babysitter and childhood crush. When Fred unexpectedly reconnects with Charlotte, he charms her with his self-deprecating humor and his memories of her youthful idealism. As she prepares to make a run for the Presidency, Charlotte impulsively hires Fred as her speechwriter, much to the dismay of her trusted advisors. A fish out of water on Charlotte's elite team, Fred is unprepared for her glamorous lifestyle in the limelight. However, sparks fly as their unmistakable chemistry leads to a round-the-world romance and a series of unexpected and dangerous incidents.",0.9778


In [19]:
### Top 10 dark movies

df[['movie_title','movie_info','sentiment']].sort_values(by='sentiment', ascending=False).tail(10)

Unnamed: 0,movie_title,movie_info,sentiment
40,El Chicano,"When L.A.P.D. Detective Diego Hernandez is assigned a career-making case investigating a vicious cartel, he uncovers links to his brother's supposed suicide and a turf battle that's about to swallow his neighborhood. Torn between playing by the book and seeking justice, he resurrects the masked street legend El Chicano. Now, out to take down his childhood buddy turned gang boss, he sets off a bloody war to defend his city and avenge his brother's murder",-0.9578
142,The Standoff at Sparrow Creek,"After a mass shooting at a police funeral, reclusive ex-cop Gannon finds himself unwittingly forced out of retirement when he realizes that the killer belongs to the same militia he joined after quitting the force. Understanding that the shooting could set off a chain reaction of copycat violence across the country, Gannon quarantines his fellow militiamen in the remote lumber mill they call their headquarters. There, he sets about a series of grueling interrogations, intent on ferreting out the killer and turning him over to the authorities to prevent further bloodshed.",-0.959
87,Pet Sematary,"Based on the seminal horror novel by Stephen King, Pet Sematary follows Dr. Louis Creed (Jason Clarke), who, after relocating with his wife Rachel (Amy Seimetz) and their two young children from Boston to rural Maine, discovers a mysterious burial ground hidden deep in the woods near the family's new home. When tragedy strikes, Louis turns to his unusual neighbor, Jud Crandall (John Lithgow), setting off a perilous chain reaction that unleashes an unfathomable evil with horrific consequences.",-0.959
113,The Curse of La Llorona,"In 1970s Los Angeles, La Llorona is stalking the night -- and the children. Ignoring the eerie warning of a troubled mother suspected of child endangerment, a social worker and her own small kids are soon drawn into a frightening supernatural realm. Their only hope to survive La Llorona's deadly wrath may be a disillusioned priest and the mysticism he practices to keep evil at bay, on the fringes where fear and faith collide.",-0.9628
27,Charlie Says,"Three young women were sentenced to death for the infamous Manson murders. Their sentences became life imprisonment when the death penalty was lifted in California. One young graduate student was sent in to teach them. Through her, we witness their transformations as they face the reality of their horrific crimes.",-0.9643
11,Angel of Mine,"Noomi Rapace (The Girl with the Dragon Tattoo) stars as a woman on the edge in this intense psychological thriller. Having suffered a tragic loss years earlier, Lizzie (Rapace) is trying to rebuild her life when she suddenly becomes obsessed with a neighbor's daughter, believing the girl to be her own child. As Lizzie's shocking, threatening acts grow increasingly dangerous, they lead to an explosive confrontation with the girl's angry, defensive mother (Yvonne Strahovski, ""The Handmaid's Tale"").",-0.9687
154,Triple Threat,"TRIPLE THREAT, the newest feature from Johnson, is an adrenaline fueled and gritty action thriller starring some of the biggest names in action today. Michael Jai White (BLACK DYNAMITE; UNDISPUTED 2: LAST MAN STANDING), Scott Adkins (Marvel's DOCTOR STRANGE; THE EXPENDABLES 2), Michael Bisping (xXx: RETURN OF XANDER CAGE) star as a group of professional assassins hired to take out a billionaire's daughter who is intent on bringing down a major crime syndicate. A down and out team of mercenaries, played by Tony Jaa (ONG BAK TRILOGY; xXx: RETURN OF XANDER CAGE), Iko Uwais (THE RAID 1 & 2; STAR WARS: THE FORCE AWAKENS) and Tiger Chen (MAN OF TAI CHI), must take on the assassins and stop them before they kill their target. The film co-stars JeeJa Yanin (CHOCOLATE) Michael Wong (Cold War) and Celina Jade (WOLF WARRIOR 2).",-0.9696
83,Nightmare Cinema,"In this twisted horror anthology, five strangers are drawn to an abandoned theater and forced to watch their deepest and darkest fears play out before them. Lurking in the shadows is the Projectionist, who preys upon their souls with his collection of disturbing films. As each reel spins its sinister tale, the characters find frightening parallels to their own lives.",-0.9756
148,The Wind,"An unseen evil haunts the homestead in this chilling, folkloric tale of madness, paranoia, and otherworldly terror. Lizzy (Caitlin Gerard) is a tough, resourceful frontierswoman settling a remote stretch of land on the 19th-century American frontier. Isolated from civilization in a desolate wilderness where the wind never stops howling, she begins to sense a sinister presence that seems to be borne of the land itself, an overwhelming dread that her husband (Ashley Zukerman) dismisses as superstition. When a newlywed couple arrives on a nearby homestead, their presence amplifies Lizzy's fears, setting into motion a shocking chain of events. Masterfully blending haunting visuals with pulse-pounding sound design, director Emma Tammi evokes a godforsaken world in which the forces of nature come alive with quivering menace.",-0.9838
7,All Is True,"The year is 1613, Shakespeare is acknowledged as the greatest writer of the age. But disaster strikes when his renowned Globe Theatre burns to the ground, and devastated, Shakespeare returns to Stratford, where he must face a troubled past and a neglected family. Haunted by the death of his only son Hamnet, he struggles to mend the broken relationships with his wife and daughters. In so doing, he is ruthlessly forced to examine his own failings as husband and father. His very personal search for the truth uncovers secrets and lies within a family at war.",-0.9955


In [None]:
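### You may have noticed that VADER does a good job overall of identifying feel-good and dark movies. However, it also misses in places: for example, The Laundromat, a film about tax evasion and bribery, ranks among the most positive.
### This can be addressed with more advanced models.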

#### Text Classification