### NLP with Machine Learning

In this chapter, we discuss tasks that can be done using traditional NLP methods, including rules-based techniques and both supervised and unsupervised machine learning techniques.

What is machine learning (ML)?

In simple terms, we are trying to teach machines how to learn like humans.

Definition: ML algorithms enable computers to learn and make decisions from data.

The algorithms fall under two main categories:

- Supervised Learning: Use historical data to predict the future.
  Examples: (Numeric) What will house prices look like for the next 12 months? (Text) How can I flag a suspicious email as spam?
- Unsupervised Learning: Find patterns and relationships in data.
  Examples: (Numeric) How can I segment my customers? (Text) What hidden themes are in these product reviews?
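
The difference between the two categories can be sketched on toy numeric data. This is a minimal illustration, assuming scikit-learn is installed; the numbers are made up, and `LinearRegression` and `AgglomerativeClustering` (a hierarchical clustering method) are just one possible choice from each category.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import AgglomerativeClustering

# Supervised: every training example comes with a known answer (y_train)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
reg = LinearRegression().fit(X_train, y_train)
prediction = reg.predict(np.array([[5.0]]))[0]  # predict an unseen value

# Unsupervised: only the features are given, and the algorithm looks for groups
points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
groups = AgglomerativeClustering(n_clusters=2).fit_predict(points)

print(prediction)
print(groups)
```

The cluster numbers themselves are arbitrary; what matters is which points end up in the same group.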

Common Algorithms:

There are many machine learning algorithms. We can use the following common ML algorithms for natural language processing once we have preprocessed the text data.

- Supervised Learning: Regression (Linear, Regularized, Time Series Analysis) and Classification (Logistic Regression, Decision Trees, Random Forest, Gradient Boosted Trees, Naive Bayes)
- Unsupervised Learning: DBSCAN, Hierarchical Clustering, Principal Component Analysis, Non-Negative Matrix Factorization

Traditional NLP:

Common NLP tasks are often solved using traditional NLP methods, such as simple rules-based techniques or more advanced ML algorithms.

NLP Tasks we will be covering:

- Sentiment Analysis: Identifying the positivity or negativity of text (Technique: Rules-based, Library: VADER, Input format: Raw text (because order matters))
- Text Classification: Classifying text as one label or another (Technique: Supervised Learning (Naive Bayes), Library: scikit-learn, Input format: CountVectorizer (CV)/TF-IDF)
- Topic Modeling: Finding themes within a corpus of text (that is, many text documents) (Technique: Unsupervised Learning, Library: scikit-learn, Input format: CV/TF-IDF)

Traditional vs Modern NLP:

When should I use traditional/modern NLP techniques?

Note that traditional NLP involves machine learning techniques, and modern NLP involves deep learning techniques. If we have an option to choose from, it is recommended to start simple. We can ask the following questions to determine which one to choose.

- What is my NLP goal?
  If my goal is sentiment analysis/text classification/topic modeling, these can be performed with traditional techniques. If my goal is text generation/machine translation/question answering, the traditional techniques may not be sufficient.
- How much data do I have?
  If I have a small dataset, I can use traditional techniques; if I have a large dataset, modern techniques can be used.

Similar to Chapter 2, before moving forward, we will create a new environment called 'nlp_machine_learning' and install the following packages:

- jupyter
- matplotlib
- notebook
- numpy
- openpyxl
- pandas
- python
- scikit-learn
- spacy

We also want to install the `vaderSentiment` package. If we run the usual command to install a package, we will get an error because the package is not available in the default Anaconda channel. Instead, we can install it from 'conda-forge', an alternative channel maintained by the community.

#### Sentiment Analysis

Sentiment analysis is used to determine the positivity or negativity of text. Each block of text is given a score between -1 (most negative) and +1 (most positive).

Note: This will be applied to raw text.

This can be done with libraries such as `VADER`, classification techniques, or modern NLP techniques. Here we will be using `VADER` (Valence Aware Dictionary and sEntiment Reasoner). This works well with informal text (social media text, online reviews).

Note: Steps with different libraries are mostly similar.

Step 1: Import `SentimentIntensityAnalyzer`.

Step 2: Identify the corresponding text.

Step 3: Create a new `SentimentIntensityAnalyzer` object.

Step 4: Obtain polarity scores.

In the output, the most important score is the `compound` score, which summarizes the overall positivity/negativity of the text. It is calculated using a series of rules. First, VADER assigns predefined sentiment weights to words (for example, amazing = 2.8, horrible = -2.5). Then it incorporates modifiers (not, very, caps, punctuation, ...) and computes a final score.
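
The last step, squashing the summed word weights into the range -1 to +1, can be sketched as follows. This mirrors the normalization used by VADER (a constant of 15 in the denominator) but is a simplified illustration only: it skips the modifier rules entirely, and the function name `normalize` and the raw sums passed to it are assumptions made for the example.

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded sum of word weights into the open interval (-1, 1)."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

# hypothetical raw sums of word weights for three pieces of text
for raw in (2.8, -2.5, 0.0):
    print(raw, round(normalize(raw), 4))
```

Larger absolute sums move the score closer to -1 or +1 without ever reaching them, and a text with no sentiment-bearing words stays at 0.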

In [1]:
import pandas as pd

# create a list of sentences
data = [
    "When life gives you lemons, make lemonade! 🙂",
    "She bought 2 lemons for $1 at Maven Market.",
    "A dozen lemons will make a gallon of lemonade. [AllRecipes]",
    "lemon, lemon, lemons, lemon, lemon, lemons",
    "He's running to the market to get a lemon — there's a great sale today.",
    "iced tea is my favorite",
    "I didn't like the taste of that lemonade at all.",
    "My lemons went bad before I could use them, unfortunately.",
]

# expand the column width to see the full sentences
pd.set_option('display.max_colwidth', None)

# turn it into a dataframe
data_df = pd.DataFrame(data, columns=["sentence"])
data_df.head()

# make a copy of the dataframe
df = data_df.copy()
df.head()

Unnamed: 0,sentence
0,"When life gives you lemons, make lemonade! 🙂"
1,She bought 2 lemons for $1 at Maven Market.
2,A dozen lemons will make a gallon of lemonade. [AllRecipes]
3,"lemon, lemon, lemons, lemon, lemon, lemons"
4,He's running to the market to get a lemon — there's a great sale today.


In [2]:
### Import the VADER library

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [3]:
### First, we will test the code with the first sentence.

test = df['sentence'][0]
test

'When life gives you lemons, make lemonade! 🙂'

In [4]:
analyzer = SentimentIntensityAnalyzer()
analyzer.polarity_scores(test)

{'neg': 0.0, 'neu': 0.75, 'pos': 0.25, 'compound': 0.4587}

The above output tells us what proportion of the sentence is negative/neutral/positive, along with the final sentiment score (`compound`).

In [5]:
analyzer.polarity_scores(test)['compound']

0.4587

In [6]:
### Now we turn it into a function and apply it to the entire column.

def get_sentiment(text):
    # reuse the analyzer created above instead of rebuilding it for every row
    return analyzer.polarity_scores(text)['compound']

In [7]:
df['sentence'].apply(get_sentiment)

0    0.4587
1    0.0000
2    0.0000
3    0.0000
4    0.6249
5    0.4588
6   -0.2755
7   -0.7096
Name: sentence, dtype: float64

In [8]:
df['sentiment'] = df['sentence'].apply(get_sentiment)
df

Unnamed: 0,sentence,sentiment
0,"When life gives you lemons, make lemonade! 🙂",0.4587
1,She bought 2 lemons for $1 at Maven Market.,0.0
2,A dozen lemons will make a gallon of lemonade. [AllRecipes],0.0
3,"lemon, lemon, lemons, lemon, lemon, lemons",0.0
4,He's running to the market to get a lemon — there's a great sale today.,0.6249
5,iced tea is my favorite,0.4588
6,I didn't like the taste of that lemonade at all.,-0.2755
7,"My lemons went bad before I could use them, unfortunately.",-0.7096
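
If a label is easier to work with than a number, the compound score can be bucketed. The sketch below assumes the commonly used cut-offs of +0.05 and -0.05; the function name `label_sentiment` and the `threshold` parameter are hypothetical, and the cut-off can be adjusted for a given dataset.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score to a coarse label (the threshold is an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# three of the compound scores from the table above
scores = [0.4587, 0.0, -0.2755]
labels = [label_sentiment(s) for s in scores]
print(labels)
```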


Task: Create two lists containing the top 10 feel-good movies and the top 10 darkest movies according to the data.

1. Read in the _movie_reviews.csv_ file
2. Apply sentiment analysis to the _movie_info_ column
3. Sort the sentiment scores to return the top 10 and bottom 10 sentiment scores and their corresponding movie titles

In [9]:
df = pd.read_csv('Chapter3_movie_reviews.csv')
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit."
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity."


In [10]:
### Step 1: Import packages for sentiment analysis (already done in In [2])
### Step 2: Create a new object for sentiment analysis

analyzer = SentimentIntensityAnalyzer()
analyzer.polarity_scores(df['movie_info'][0])

### The sentiment score for the first movie is highly positive.
### Now we can apply it to the entire column.

{'neg': 0.051, 'neu': 0.694, 'pos': 0.255, 'compound': 0.9837}

In [11]:
def get_sentiment(text):
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)['compound']

In [12]:
df['movie_info'].apply(get_sentiment)

0      0.9837
1      0.9237
2      0.9360
3     -0.0334
4      0.9349
        ...  
161   -0.2732
162    0.9158
163   -0.5106
164    0.9081
165    0.1365
Name: movie_info, Length: 166, dtype: float64

In [13]:
df['sentiment'] = df['movie_info'].apply(get_sentiment)
df.head(2)

Unnamed: 0,movie_title,rating,genre,in_theaters_date,movie_info,directors,director_gender,tomatometer_rating,audience_rating,critics_consensus,sentiment
0,A Dog's Journey,PG,"Drama, Kids & Family",5/17/19,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",Gail Mancuso,female,50,92,"A Dog's Journey is as sentimental as one might expect, but even cynical viewers may find their ability to resist shedding a tear stretched to the puppermost limit.",0.9837
1,A Dog's Way Home,PG,Drama,1/11/19,"Separated from her owner, a dog sets off on an 400-mile journey to get back to the safety and security of the place she calls home. Along the way, she meets a series of new friends and manages to bring a little bit of comfort and joy to their lives.",Charles Martin Smith,male,60,71,"A Dog's Way Home may not quite be a family-friendly animal drama fan's best friend, but this canine adventure is no less heartwarming for its familiarity.",0.9237


In [14]:
### Top 10 feel-good movies

df[['movie_title','movie_info','sentiment']].sort_values(by='sentiment', ascending=False).head(10)

Unnamed: 0,movie_title,movie_info,sentiment
23,Breakthrough,"BREAKTHROUGH is based on the inspirational true story of one mother's unfaltering love in the face of impossible odds. When Joyce Smith's adopted son John falls through an icy Missouri lake, all hope seems lost. But as John lies lifeless, Joyce refuses to give up. Her steadfast belief inspires those around her to continue to pray for John's recovery, even in the face of every case history and scientific prediction. From producer DeVon Franklin (Miracles from Heaven) and adapted for the screen by Grant Nieporte (Seven Pounds) from Joyce Smith's own book, BREAKTHROUGH is an enthralling reminder that faith and love can create a mountain of hope, and sometimes even a miracle.",0.9915
81,Missing Link,"This April, meet Mr. Link (Galifianakis): 8 feet tall, 630 lbs, and covered in fur, but don't let his appearance fool you... he is funny, sweet, and adorably literal, making him the world's most lovable legend at the heart of Missing Link, the globe-trotting family adventure from LAIKA. Tired of living a solitary life in the Pacific Northwest, Mr. Link recruits fearless explorer Sir Lionel Frost (Jackman) to guide him on a journey to find his long-lost relatives in the fabled valley of Shangri-La. Along with adventurer Adelina Fortnight (Saldana), our fearless trio of explorers encounter more than their fair share of peril as they travel to the far reaches of the world to help their new friend. Through it all, the three learn that sometimes you can find a family in the places you least expect.",0.9909
130,The Laundromat,"When her idyllic vacation takes an unthinkable turn, Ellen Martin (Academy Award winner Meryl Streep) begins investigating a fake insurance policy, only to find herself down a rabbit hole of questionable dealings that can be linked to a Panama City law firm and its vested interest in helping the world's wealthiest citizens amass even larger fortunes. The charming -- and very well-dressed -- founding partners Jürgen Mossack (Academy Award winner Gary Oldman) and Ramón Fonseca (Golden Globe nominee Antonio Banderas) are experts in the seductive ways shell companies and offshore accounts help the rich and powerful prosper. They are about to show us that Ellen's predicament only hints at the tax evasion, bribery and other illicit absurdities that the super wealthy indulge in to support the world's corrupt financial system.",0.9908
48,Five Feet Apart,"Stella Grant (Haley Lu Richardson) is every bit a seventeen-year-old... she's attached to her laptop and loves her best friends. But unlike most teenagers, she spends much of her time living in a hospital as a cystic fibrosis patient. Her life is full of routines, boundaries and self-control -- all of which is put to the test when she meets an impossibly charming fellow CF patient named Will Newman (Cole Sprouse). There's an instant flirtation, though restrictions dictate that they must maintain a safe distance between them. As their connection intensifies, so does the temptation to throw the rules out the window and embrace that attraction. Further complicating matters is Will's potentially dangerous rebellion against his ongoing medical treatment. Stella gradually inspires Will to live life to the fullest, but can she ultimately save the person she loves when even a single touch is off limits?",0.9889
156,UglyDolls,"In the adorably different town of Uglyville, weird is celebrated, strange is special and beauty is embraced as more than simply meets the eye. Here, the free-spirited Moxy (Clarkson) and her UglyDoll friends live every day in a whirlwind of bliss, letting their freak flags fly in a celebration of life and its endless possibilities. In this all-new story, the UglyDolls will go on a journey beyond the comfortable borders of Uglyville. There, they will confront what it means to be different, struggle with their desire to be loved, and ultimately discover that you don't have to be perfect to be amazing because who you truly are is what matters most.",0.9862
93,Red Joan,"In a picturesque village in England, Joan Stanley (Academy Award (R) winner Dame Judi Dench), lives in contented retirement. Then suddenly her tranquil existence is shattered as she's shockingly arrested by MI5. For Joan has been hiding an incredible past; she is one of the most influential spies in living history... Cambridge University in the 1930s, and the young Joan (Sophie Cookson), a demure physics student, falls intensely in love with a seductively attractive Russian saboteur, Leo. Through him, she begins to see that the world is on a knife-edge and perhaps must be saved from itself in the race to military supremacy. Post-war and now working at a top secret nuclear research facility, Joan is confronted with the impossible: Would you betray your country and your loved ones, if it meant saving them? What price would you pay for peace? Inspired by an extraordinary true story, Red Joan is the taut and emotional discovery of one woman's sacrifice in the face of incredible circumstances. A woman to whom we perhaps all owe our freedom.",0.9848
49,Giant Little Ones,"Franky Winter (Josh Wiggins) and Ballas Kohl (Darren Mann) have been best friends since childhood. They are high school royalty: handsome, stars of the swim team and popular with girls. They live a perfect teenage life - until the night of Franky's epic 17th birthday party, when Franky and Ballas are involved in an unexpected incident that changes their lives forever. Giant Little Ones is a heartfelt and intimate coming-of-age story about friendship, self-discovery, and the power of love without labels.",0.9839
0,A Dog's Journey,"Bailey (voiced again by Josh Gad) is living the good life on the Michigan farm of his ""boy,"" Ethan (Dennis Quaid) and Ethan's wife Hannah (Marg Helgenberger). He even has a new playmate: Ethan and Hannah's baby granddaughter, CJ. The problem is that CJ's mom, Gloria (Betty Gilpin), decides to take CJ away. As Bailey's soul prepares to leave this life for a new one, he makes a promise to Ethan to find CJ and protect her at any cost. Thus begins Bailey's adventure through multiple lives filled with love, friendship and devotion as he, CJ (Kathryn Prescott), and CJ's best friend Trent (Henry Lau) experience joy and heartbreak, music and laughter, and few really good belly rubs.",0.9837
36,Dumbo,"From Disney and visionary director Tim Burton, the all-new grand live-action adventure ""Dumbo"" expands on the beloved classic story where differences are celebrated, family is cherished and dreams take flight. Circus owner Max Medici (Danny DeVito) enlists former star Holt Farrier (Colin Farrell) and his children Milly (Nico Parker) and Joe (Finley Hobbins) to care for a newborn elephant whose oversized ears make him a laughingstock in an already struggling circus. But when they discover that Dumbo can fly, the circus makes an incredible comeback, attracting persuasive entrepreneur V.A. Vandevere (Michael Keaton), who recruits the peculiar pachyderm for his newest, larger-than-life entertainment venture, Dreamland. Dumbo soars to new heights alongside a charming and spectacular aerial artist, Colette Marchant (Eva Green), until Holt learns that beneath its shiny veneer, Dreamland is full of dark secrets.",0.9801
71,Long Shot,"Fred Flarsky (Seth Rogen) is a gifted and free-spirited journalist with an affinity for trouble. Charlotte Field (Charlize Theron) is one of the most influential women in the world. Smart, sophisticated, and accomplished, she's a powerhouse diplomat with a talent for... well, mostly everything. The two have nothing in common, except that she was his babysitter and childhood crush. When Fred unexpectedly reconnects with Charlotte, he charms her with his self-deprecating humor and his memories of her youthful idealism. As she prepares to make a run for the Presidency, Charlotte impulsively hires Fred as her speechwriter, much to the dismay of her trusted advisors. A fish out of water on Charlotte's elite team, Fred is unprepared for her glamorous lifestyle in the limelight. However, sparks fly as their unmistakable chemistry leads to a round-the-world romance and a series of unexpected and dangerous incidents.",0.9778


In [15]:
### Top 10 darkest movies (the 10 lowest sentiment scores; the most negative appears last)

df[['movie_title','movie_info','sentiment']].sort_values(by='sentiment', ascending=False).tail(10)

Unnamed: 0,movie_title,movie_info,sentiment
40,El Chicano,"When L.A.P.D. Detective Diego Hernandez is assigned a career-making case investigating a vicious cartel, he uncovers links to his brother's supposed suicide and a turf battle that's about to swallow his neighborhood. Torn between playing by the book and seeking justice, he resurrects the masked street legend El Chicano. Now, out to take down his childhood buddy turned gang boss, he sets off a bloody war to defend his city and avenge his brother's murder",-0.9578
142,The Standoff at Sparrow Creek,"After a mass shooting at a police funeral, reclusive ex-cop Gannon finds himself unwittingly forced out of retirement when he realizes that the killer belongs to the same militia he joined after quitting the force. Understanding that the shooting could set off a chain reaction of copycat violence across the country, Gannon quarantines his fellow militiamen in the remote lumber mill they call their headquarters. There, he sets about a series of grueling interrogations, intent on ferreting out the killer and turning him over to the authorities to prevent further bloodshed.",-0.959
87,Pet Sematary,"Based on the seminal horror novel by Stephen King, Pet Sematary follows Dr. Louis Creed (Jason Clarke), who, after relocating with his wife Rachel (Amy Seimetz) and their two young children from Boston to rural Maine, discovers a mysterious burial ground hidden deep in the woods near the family's new home. When tragedy strikes, Louis turns to his unusual neighbor, Jud Crandall (John Lithgow), setting off a perilous chain reaction that unleashes an unfathomable evil with horrific consequences.",-0.959
113,The Curse of La Llorona,"In 1970s Los Angeles, La Llorona is stalking the night -- and the children. Ignoring the eerie warning of a troubled mother suspected of child endangerment, a social worker and her own small kids are soon drawn into a frightening supernatural realm. Their only hope to survive La Llorona's deadly wrath may be a disillusioned priest and the mysticism he practices to keep evil at bay, on the fringes where fear and faith collide.",-0.9628
27,Charlie Says,"Three young women were sentenced to death for the infamous Manson murders. Their sentences became life imprisonment when the death penalty was lifted in California. One young graduate student was sent in to teach them. Through her, we witness their transformations as they face the reality of their horrific crimes.",-0.9643
11,Angel of Mine,"Noomi Rapace (The Girl with the Dragon Tattoo) stars as a woman on the edge in this intense psychological thriller. Having suffered a tragic loss years earlier, Lizzie (Rapace) is trying to rebuild her life when she suddenly becomes obsessed with a neighbor's daughter, believing the girl to be her own child. As Lizzie's shocking, threatening acts grow increasingly dangerous, they lead to an explosive confrontation with the girl's angry, defensive mother (Yvonne Strahovski, ""The Handmaid's Tale"").",-0.9687
154,Triple Threat,"TRIPLE THREAT, the newest feature from Johnson, is an adrenaline fueled and gritty action thriller starring some of the biggest names in action today. Michael Jai White (BLACK DYNAMITE; UNDISPUTED 2: LAST MAN STANDING), Scott Adkins (Marvel's DOCTOR STRANGE; THE EXPENDABLES 2), Michael Bisping (xXx: RETURN OF XANDER CAGE) star as a group of professional assassins hired to take out a billionaire's daughter who is intent on bringing down a major crime syndicate. A down and out team of mercenaries, played by Tony Jaa (ONG BAK TRILOGY; xXx: RETURN OF XANDER CAGE), Iko Uwais (THE RAID 1 & 2; STAR WARS: THE FORCE AWAKENS) and Tiger Chen (MAN OF TAI CHI), must take on the assassins and stop them before they kill their target. The film co-stars JeeJa Yanin (CHOCOLATE) Michael Wong (Cold War) and Celina Jade (WOLF WARRIOR 2).",-0.9696
83,Nightmare Cinema,"In this twisted horror anthology, five strangers are drawn to an abandoned theater and forced to watch their deepest and darkest fears play out before them. Lurking in the shadows is the Projectionist, who preys upon their souls with his collection of disturbing films. As each reel spins its sinister tale, the characters find frightening parallels to their own lives.",-0.9756
148,The Wind,"An unseen evil haunts the homestead in this chilling, folkloric tale of madness, paranoia, and otherworldly terror. Lizzy (Caitlin Gerard) is a tough, resourceful frontierswoman settling a remote stretch of land on the 19th-century American frontier. Isolated from civilization in a desolate wilderness where the wind never stops howling, she begins to sense a sinister presence that seems to be borne of the land itself, an overwhelming dread that her husband (Ashley Zukerman) dismisses as superstition. When a newlywed couple arrives on a nearby homestead, their presence amplifies Lizzy's fears, setting into motion a shocking chain of events. Masterfully blending haunting visuals with pulse-pounding sound design, director Emma Tammi evokes a godforsaken world in which the forces of nature come alive with quivering menace.",-0.9838
7,All Is True,"The year is 1613, Shakespeare is acknowledged as the greatest writer of the age. But disaster strikes when his renowned Globe Theatre burns to the ground, and devastated, Shakespeare returns to Stratford, where he must face a troubled past and a neglected family. Haunted by the death of his only son Hamnet, he struggles to mend the broken relationships with his wife and daughters. In so doing, he is ruthlessly forced to examine his own failings as husband and father. His very personal search for the truth uncovers secrets and lies within a family at war.",-0.9955
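
As an alternative to sorting the whole table, pandas can pick the extremes directly. The sketch below uses a small made-up DataFrame (hypothetical titles and scores) rather than the movie data, and shows `nlargest` and `nsmallest` returning the same kind of top and bottom lists.

```python
import pandas as pd

# hypothetical titles and sentiment scores, for illustration only
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "sentiment": [0.98, -0.95, 0.10, -0.50],
})

# highest scores first
feel_good = toy.nlargest(2, "sentiment")["movie_title"].tolist()
# lowest scores first
darkest = toy.nsmallest(2, "sentiment")["movie_title"].tolist()

print(feel_good)
print(darkest)
```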


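In [16]:
### You may have noticed that it has done a good job identifying feel-good and darkest movies. However, it has also made some mistakes.
### For example, The Laundromat, whose description covers tax evasion and bribery, appears in the feel-good list.
### This can be addressed with more advanced models.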

#### Text Classification

This is a supervised learning NLP example. Text classification is used to categorize text into groups based on labeled data.

Example: Suppose we have a set of emails that are already labeled as spam or not spam. Given a new email, a text classification model trained on those labels will tell us whether it is spam or not spam.

Text Classification Algorithms:

We can input vectorized text data into any classification algorithm (KNN, Logistic Regression, Decision Trees, Random Forest, Gradient Boosted Trees).

We will also focus on Naive Bayes, another classification algorithm that works particularly well on text data.

Which algorithm should I choose?

There is no single correct answer for this. For small datasets (< 10k rows), we can start with Naive Bayes and other simple models (Logistic Regression and KNN).

For medium-sized datasets (< 100k rows), start with Logistic Regression and other classification techniques such as Decision Trees, Random Forests, and Gradient Boosted Trees.

For large datasets (> 1M rows), start with Gradient Boosted Trees and potentially move on to modern NLP techniques with LLMs.

##### Naive Bayes

This is a common technique used for text classification. It is based on Bayes' theorem and makes the "naive" assumption that the features are conditionally independent given the class.

Example: Spam or not? Naive Bayes assumes that the words 'ASAP' and '$' appear in an email independently of each other, given the class. In reality, they have a high chance of appearing together. Nevertheless, the algorithm works surprisingly well on text data. The chance that an email is spam given that it contains 'ASAP': P(Spam|ASAP) = (P(ASAP|Spam)*P(Spam))/P(ASAP)

We can extend the above probability calculation to two words: P(Spam|ASAP, $). Under the independence assumption, this is proportional to P(ASAP|Spam)*P($|Spam)*P(Spam).
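
The two-word calculation can be worked through with numbers. All probabilities in this sketch are made up for illustration, and the function name `naive_bayes_posterior` is hypothetical; the point is only to show how the independence assumption turns the calculation into a product of per-word probabilities.

```python
import math

def naive_bayes_posterior(prior_spam, likelihoods_spam, likelihoods_ham):
    """Return P(Spam | words), assuming words are independent given the class."""
    score_spam = prior_spam * math.prod(likelihoods_spam)
    score_ham = (1 - prior_spam) * math.prod(likelihoods_ham)
    # dividing by the sum plays the role of the denominator in Bayes' theorem
    return score_spam / (score_spam + score_ham)

# hypothetical values: P(Spam) = 0.2, P(ASAP|Spam) = 0.5, P($|Spam) = 0.4,
# P(ASAP|Not spam) = 0.05, P($|Not spam) = 0.1
posterior = naive_bayes_posterior(0.2, [0.5, 0.4], [0.05, 0.1])
print(round(posterior, 3))
```

With these numbers the spam score is 0.04 and the not-spam score is 0.004, so the email is classified as spam with a probability of about 0.91.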

To perform Naive Bayes (NB), we will be using scikit-learn's `MultinomialNB`. The input should be the output of a `CountVectorizer` or `TfidfVectorizer`.
Naive Bayes has very few parameters to tune; in scikit-learn, the main one is the smoothing parameter `alpha`.

Note: If we consider word counts, we use `MultinomialNB`. However, if we only examine the presence or absence of keywords, we can use `BernoulliNB`.
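
Before turning to real data, the overall workflow can be sketched on a toy spam example. The four training messages and their labels are made up for illustration; only `CountVectorizer` and `MultinomialNB` from scikit-learn are used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# hypothetical labeled training messages
train_texts = [
    "win money now asap",
    "free money asap",
    "meeting at noon",
    "lunch tomorrow at noon",
]
train_labels = ["spam", "spam", "ham", "ham"]

# turn the text into word counts, then fit the classifier on those counts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB()
model.fit(X_train, train_labels)

# new messages must be transformed with the same fitted vectorizer
new_texts = ["free money now", "lunch meeting"]
predictions = model.predict(vectorizer.transform(new_texts))
print(list(predictions))
```

Note that the vectorizer is fitted once on the training text and only applied (`transform`) to new text, so that both share the same columns.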

##### GOAL: Predict which reviews are high priority (vs low priority) that we need to address right away

In [17]:
# import libraries
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

In [18]:
### Import Data (pop chip reviews)

reviews = pd.read_excel('Chapter3_Popchip_Reviews.xlsx')
reviews.head(3)

Unnamed: 0,Id,UserId,Rating,Priority,Title,Text
0,23689,A21SYVGVNG8RAS,5,Low,Yummy snacks!,Popchips are the bomb!! I use the parmesan garlic to scoop up cottage cheese as a healthy alternative to chips and dip. My healthy eating program is saved.
1,23690,AQJYXC0MPRQJL,5,Low,Great chip that is different from the rest,"I like the puffed nature of this chip that makes it more unique in the chip market. I ordered the Salt and Vinegar and absolutely love that flavor, hands down my favorite chip ever. I have tried the cheddar and regular flavors as well. The cheddar is about a 4/5 and the regular is about a 3/5 because I prefer strong flavors and obviously that would not be the case for the regular. The Salt and Vinegar is kind of weak compared to some regular S&V chips, but is quite flavorful and makes you wanting to come back for more."
2,23691,A30NYUHEDLWI0Y,5,Low,Great Alternative to Potato Chips,"I just love these chips! I was always a big fan of potato chips, but haven't had one since I discovered popchips. They are great for dipping or all alone. I am constantly re-ordering them. One note however-if you are on a low salt diet these chips are probably not for you. They are high in sodium. We go through a case every two months. If you love them it pays to join the subscribe and save program through Amazon. You save money and stay supplied!"


Note that the reviews have already been labeled as high or low priority. The goal is to identify, when a new review comes in, whether it is a high- or low-priority case.

In [19]:
reviews.shape

### Thus, there are 564 reviews

(564, 6)

In [20]:
### Check how many high priority and low priority label cases we have in the data.

reviews['Priority'].value_counts()

Priority
Low     447
High    117
Name: count, dtype: int64
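
The counts above show that the two classes are imbalanced. As a rough point of reference (a sketch, not part of the original analysis), the snippet below computes the accuracy a model would reach by always predicting the majority class; any classifier we build should be judged against that number rather than against 50%.

```python
# class counts taken from the value_counts() output above
counts = {"Low": 447, "High": 117}

total = sum(counts.values())
# accuracy of a "model" that always predicts the most common class
baseline_accuracy = max(counts.values()) / total

print(total, round(baseline_accuracy, 4))
```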

**Note**: The next step is to clean the reviews. That is, apply all the text cleaning techniques we carried out in Chapter 2 using Pandas and spaCy. These are available in the Python script "Chapter3_maven_text_preprocessing", which we will import directly to clean the reviews (the `Text` column).

In [21]:
import Chapter3_maven_text_preprocessing

In [22]:
Chapter3_maven_text_preprocessing.clean_and_normalize(reviews['Text'])

0    popchip bomb   use parmesan garlic scoop cottage cheese healthy alternative chip dip   healthy eat program save
1    like puff nature chip make unique chip market   order salt vinegar absolutely love flavor hand favorite chip   try cheddar regular flavor   cheddar 45 regular 35 prefer strong flavor obviously case regular   salt vinegar kind weak compare regular sv chip flavorful make want come
2    ...

In [24]:
reviews['Text_Clean'] = Chapter3_maven_text_preprocessing.clean_and_normalize(reviews['Text'])
reviews.head(3)

Unnamed: 0,Id,UserId,Rating,Priority,Title,Text,Text_Clean
0,23689,A21SYVGVNG8RAS,5,Low,Yummy snacks!,Popchips are the bomb!! I use the parmesan garlic to scoop up cottage cheese as a healthy alternative to chips and dip. My healthy eating program is saved.,popchip bomb use parmesan garlic scoop cottage cheese healthy alternative chip dip healthy eat program save
1,23690,AQJYXC0MPRQJL,5,Low,Great chip that is different from the rest,"I like the puffed nature of this chip that makes it more unique in the chip market. I ordered the Salt and Vinegar and absolutely love that flavor, hands down my favorite chip ever. I have tried the cheddar and regular flavors as well. The cheddar is about a 4/5 and the regular is about a 3/5 because I prefer strong flavors and obviously that would not be the case for the regular. The Salt and Vinegar is kind of weak compared to some regular S&V chips, but is quite flavorful and makes you wanting to come back for more.",like puff nature chip make unique chip market order salt vinegar absolutely love flavor hand favorite chip try cheddar regular flavor cheddar 45 regular 35 prefer strong flavor obviously case regular salt vinegar kind weak compare regular sv chip flavorful make want come
2,23691,A30NYUHEDLWI0Y,5,Low,Great Alternative to Potato Chips,"I just love these chips! I was always a big fan of potato chips, but haven't had one since I discovered popchips. They are great for dipping or all alone. I am constantly re-ordering them. One note however-if you are on a low salt diet these chips are probably not for you. They are high in sodium. We go through a case every two months. If you love them it pays to join the subscribe and save program through Amazon. You save money and stay supplied!",love chip big fan potato chip not discover popchip great dipping constantly reorder note howeverif low salt diet chip probably high sodium case month love pay join subscribe save program amazon save money stay supply


**First Fit**: CV, Naive Bayes

In [27]:
cv = CountVectorizer()
X = cv.fit_transform(reviews['Text_Clean'])
X_df = pd.DataFrame(X.toarray(),  columns=cv.get_feature_names_out())
X_df

Unnamed: 0,08,08ounce,0br,10,100,1000,100150,100cal,100calories,100cals,...,yuck,yucky,yum,yummy,yummybr,zero,zesty,zip,ziplock,zowie
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
559,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
560,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
561,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
562,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


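There are far too many columns (terms), which may lead to overfitting. We will therefore add some parameters to reduce the number of columns: stop_words='english' drops common English stop words, min_df=0.2 keeps only terms that appear in at least 20% of the reviews, and ngram_range=(1,2) adds two-word phrases (bigrams) alongside single words.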

In [30]:
cv = CountVectorizer(stop_words='english', min_df=0.2, ngram_range=(1,2))
X = cv.fit_transform(reviews['Text_Clean'])
X_df = pd.DataFrame(X.toarray(),  columns=cv.get_feature_names_out())
X_df

Unnamed: 0,bag,buy,calorie,chip,eat,flavor,good,great,like,love,popchip,potato,potato chip,salt,snack,taste,try
0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0
1,0,0,0,4,0,3,0,0,1,1,0,0,0,2,0,0,1
2,0,0,0,3,0,0,0,1,0,2,1,1,1,1,0,0,0
3,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0
4,1,0,0,2,1,2,0,1,2,0,0,1,1,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
559,0,0,0,3,3,1,1,5,0,1,1,4,3,0,0,1,0
560,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1
561,0,0,0,2,0,1,0,2,0,0,0,0,0,0,0,2,0
562,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0


In [31]:
### The X_df becomes our input to the model. Output of the model is priority.

y = reviews['Priority']
y.head()

0     Low
1     Low
2     Low
3    High
4     Low
Name: Priority, dtype: object

In [32]:
### Train/Test split
X_train, X_test, y_train, y_test = train_test_split(X_df, y, test_size=0.2, random_state=42)

### Model
model = MultinomialNB()
model.fit(X_train, y_train)

### Predict
y_pred = model.predict(X_test)

### Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.8407079646017699
              precision    recall  f1-score   support

        High       0.60      0.16      0.25        19
         Low       0.85      0.98      0.91        94

    accuracy                           0.84       113
   macro avg       0.73      0.57      0.58       113
weighted avg       0.81      0.84      0.80       113



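Accuracy is the share of test reviews labeled correctly, so a value closer to 1 is better. Here, however, the classes are imbalanced (94 Low versus 19 High in the test set), so a model that always predicted Low would already reach about 83% accuracy.

The classification report gives more detail. Precision is 0.60 for High and 0.85 for Low: 60% of the reviews predicted as high priority really are high priority, and 85% of those predicted as low priority really are low priority.

Recall tells us how many of the actual cases were found. When the priority is actually low, the model identifies 98% of them correctly. For the actual high-priority cases, however, it identifies only 16%.

We can improve on this with additional steps.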
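
The report's numbers for the High class can be reproduced by hand. The notebook does not print the confusion matrix, so the four counts below are an assumption: they are the integers consistent with the rounded report (19 High and 94 Low test reviews).

```python
# Assumed confusion-matrix counts for the Naive Bayes test split.
# They are inferred from the rounded classification report,
# not printed by the notebook.
tp = 3    # High reviews predicted High
fn = 16   # High reviews predicted Low
fp = 2    # Low reviews predicted High
tn = 92   # Low reviews predicted Low

total = tp + fn + fp + tn

precision_high = tp / (tp + fp)   # of the reviews flagged High, how many really are
recall_high = tp / (tp + fn)      # of the truly High reviews, how many were flagged
accuracy = (tp + tn) / total

# Baseline: always predict the majority class (Low)
baseline_accuracy = (fp + tn) / total

print(f"precision (High): {precision_high:.2f}")
print(f"recall (High):    {recall_high:.2f}")
print(f"accuracy:         {accuracy:.4f}")
print(f"always-Low baseline accuracy: {baseline_accuracy:.4f}")
```

Under these assumed counts, the model beats the always-Low baseline by a single review, which is why accuracy alone says little about how well high-priority cases are found.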

Now that we have trained the model, we can test it on unseen data.

In [33]:
new_reviews = pd.Series([
    "Pop chips are my favorite! I love these chips so much.",
    "Taste bad. I don't like the flavor options or taste.",
    "Solid snack."
])

In [34]:
new_reviews_clean = Chapter3_maven_text_preprocessing.clean_and_normalize(new_reviews)

In [36]:
### When we are vectorizing the new data, we do not want to fit again because we want to have the matrix with the same terms as for the training data.
### Thus, we only transform the data.

pd.DataFrame(cv.transform(new_reviews_clean).toarray(), columns=cv.get_feature_names_out())

Unnamed: 0,bag,buy,calorie,chip,eat,flavor,good,great,like,love,popchip,potato,potato chip,salt,snack,taste,try
0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,2,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


In [37]:
new_review_df = pd.DataFrame(cv.transform(new_reviews_clean).toarray(), columns=cv.get_feature_names_out())

In [38]:
model.predict(new_review_df)

array(['Low', 'High', 'Low'], dtype='<U4')

Among the three reviews, only the second is predicted as high priority.
This makes sense, as the second review is "Taste bad. I don't like the flavor options or taste."

Now we try a different vectorizer and a different model.

**Second Fit**: TF-IDF, Logistic Regression

In [40]:
tv = TfidfVectorizer(stop_words='english', min_df=0.2, ngram_range=(1,2))
Xt = tv.fit_transform(reviews['Text_Clean'])
Xt_df = pd.DataFrame(Xt.toarray(),  columns=tv.get_feature_names_out())
Xt_df

Unnamed: 0,bag,buy,calorie,chip,eat,flavor,good,great,like,love,popchip,potato,potato chip,salt,snack,taste,try
0,0.000000,0.000000,0.0,0.392603,0.656435,0.000000,0.000000,0.000000,0.000000,0.000000,0.644170,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,0.000000,0.000000,0.0,0.561185,0.000000,0.537701,0.000000,0.000000,0.195524,0.213766,0.000000,0.000000,0.000000,0.513094,0.000000,0.000000,0.220814
2,0.000000,0.000000,0.0,0.517908,0.000000,0.000000,0.000000,0.295101,0.000000,0.526082,0.283255,0.277355,0.333330,0.315684,0.000000,0.000000,0.000000
3,0.000000,0.690063,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.512918,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.510616,0.000000
4,0.252776,0.000000,0.0,0.340747,0.284866,0.435318,0.000000,0.291234,0.474884,0.000000,0.000000,0.273721,0.328962,0.000000,0.000000,0.236376,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
559,0.000000,0.000000,0.0,0.216106,0.361330,0.092028,0.103897,0.615680,0.000000,0.109758,0.118193,0.462925,0.417263,0.000000,0.000000,0.099942,0.000000
560,0.381673,0.000000,0.0,0.000000,0.000000,0.328649,0.000000,0.439742,0.000000,0.000000,0.422089,0.000000,0.000000,0.000000,0.459181,0.000000,0.404891
561,0.000000,0.000000,0.0,0.399843,0.000000,0.255407,0.000000,0.683486,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.554742,0.000000
562,0.000000,0.000000,0.0,0.000000,0.000000,0.537244,0.606536,0.000000,0.586074,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
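
In the count matrix, row 0 had a count of 1 for each of chip, eat, and popchip. In the TF-IDF matrix the same three terms get different weights, and the very common term chip is weighted lowest. With scikit-learn's default setting (norm='l2'), each row is also scaled to unit length. A quick check on the printed values of row 0:

```python
import math

# Non-zero TF-IDF weights of row 0, copied from the table above
row0 = {"chip": 0.392603, "eat": 0.656435, "popchip": 0.644170}

# Euclidean (L2) length of the row
row0_norm = math.sqrt(sum(weight ** 2 for weight in row0.values()))

# Term with the smallest weight in this row
lowest_term = min(row0, key=row0.get)

print(f"L2 norm of row 0: {row0_norm:.4f}")
print("lowest-weighted term:", lowest_term)
```

The norm comes out at 1 up to rounding in the printed values, so TF-IDF rows are comparable across short and long reviews.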


In [42]:
### Now we have new input for our model (instead of counts)
### However, the output is the same.

### Train/Test split
Xt_train, Xt_test, yt_train, yt_test = train_test_split(Xt_df, y, test_size=0.2, random_state=42)

### Model
model_lr = LogisticRegression()
model_lr.fit(Xt_train, yt_train)

### Predict
y_pred_lr = model_lr.predict(Xt_test)

### Evaluate
print("Accuracy:", accuracy_score(yt_test, y_pred_lr))
print(classification_report(yt_test, y_pred_lr))

Accuracy: 0.8407079646017699
              precision    recall  f1-score   support

        High       1.00      0.05      0.10        19
         Low       0.84      1.00      0.91        94

    accuracy                           0.84       113
   macro avg       0.92      0.53      0.51       113
weighted avg       0.87      0.84      0.78       113



Note that the accuracy is identical to that of Naive Bayes. However, precision for the High class improves (60% -> 100%): every review that the logistic regression model predicts as high priority is in fact high priority.

Also, when the actual priority is low, the logistic regression model flags all of them correctly.
Nonetheless, recall for the high-priority cases is only 5% with the logistic regression model, which is lower than with the Naive Bayes model (16% -> 5%). With 19 high-priority reviews in the test set, 5% recall corresponds to a single review, so the perfect precision rests on very few predictions.

Since our original goal is to identify high-priority cases, this is not good, and we need to make our model better.
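
The f1-score column of the two reports summarizes this trade-off in one number, since F1 is the harmonic mean of precision and recall. The sketch below recomputes it for the High class. The counts are an assumption inferred from the rounded reports (the notebook does not print confusion matrices), and f1_from_counts is a helper written for this illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Assumed counts for the High class (inferred from the rounded reports)
nb_f1 = f1_from_counts(tp=3, fp=2, fn=16)   # Naive Bayes
lr_f1 = f1_from_counts(tp=1, fp=0, fn=18)   # logistic regression

print(f"Naive Bayes F1 (High):         {nb_f1:.2f}")
print(f"Logistic regression F1 (High): {lr_f1:.2f}")
```

Despite its perfect precision, the logistic regression model scores lower on F1 for High because it misses almost all of the high-priority reviews.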

Also note that the logistic regression model can return the probability of each class (High and Low) through predict_proba, instead of only a label. Before moving forward, we take a quick look at the high- and low-priority probabilities for each case.

In [43]:
model_lr.predict_proba(Xt_df)

array([[0.2590195 , 0.7409805 ],
       [0.22819851, 0.77180149],
       [0.07517457, 0.92482543],
       ...,
       [0.26803343, 0.73196657],
       [0.24644181, 0.75355819],
       [0.20264458, 0.79735542]])

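The first column is the probability of high priority and the second is the probability of low priority (the columns follow the order of model_lr.classes_). Each row sums to 1.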
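
The column order is not arbitrary: scikit-learn stores the class labels in sorted order in the classes_ attribute, and predict_proba uses that order. A minimal sketch on made-up data (X_toy, y_toy, and toy_lr are illustrative names):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: one feature, string labels like the Priority column
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array(["Low", "Low", "High", "High"])

toy_lr = LogisticRegression().fit(X_toy, y_toy)

# Columns of predict_proba follow classes_, which is sorted: 'High' before 'Low'
print(toy_lr.classes_)

proba = toy_lr.predict_proba(X_toy)

# Look the column up instead of hard-coding 0
high_col = list(toy_lr.classes_).index("High")
print(proba[:, high_col])
print(proba.sum(axis=1))   # each row sums to 1
```

Looking up the column index through classes_ is a little safer than writing [:,0] directly, in case the labels are later renamed.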

In [46]:
### Prediction with the logistic regression model:

model_lr.predict_proba(Xt_df)[:,0]

array([0.2590195 , 0.22819851, 0.07517457, 0.55084249, 0.28135196,
       0.42143348, 0.13014302, 0.45950577, 0.19000063, 0.20928769,
       0.08128715, 0.11913647, 0.18970972, 0.12345081, 0.17529228,
       0.12367452, 0.22577408, 0.09994249, 0.27437717, 0.33074725,
       0.27090502, 0.24832781, 0.23109895, 0.13185631, 0.26086998,
       0.34978928, 0.53080851, 0.15765899, 0.28110348, 0.20123437,
       0.20087253, 0.16332925, 0.13850747, 0.15590012, 0.11812842,
       0.09899561, 0.32010859, 0.21458308, 0.09494073, 0.19633111,
       0.20512783, 0.10352704, 0.19914808, 0.27270382, 0.22237213,
       0.22405244, 0.11570263, 0.24937863, 0.19781639, 0.13421686,
       0.17134759, 0.1937814 , 0.17032817, 0.33000877, 0.39657518,
       0.20886955, 0.12573129, 0.17000677, 0.30940847, 0.13326123,
       0.17506651, 0.07552167, 0.13144144, 0.15706054, 0.12930077,
       0.28183166, 0.25899659, 0.14538294, 0.57666407, 0.1204422 ,
       0.16132391, 0.30459573, 0.10361138, 0.36380046, 0.07670

In [45]:
### Prediction with Naive Bayes model:

model.predict_proba(X_df)[:,0]

array([2.00606025e-01, 3.05999182e-01, 4.50902259e-02, 4.56157826e-01,
       3.20883714e-01, 4.18204899e-01, 1.54005429e-01, 3.75503577e-01,
       1.78683480e-01, 2.83890796e-01, 7.26104077e-02, 1.33490244e-05,
       1.93733762e-01, 1.38622499e-01, 1.10357785e-01, 1.47288781e-01,
       2.39527407e-01, 2.50905850e-02, 3.54375244e-01, 6.02176877e-01,
       2.46526372e-01, 7.57451449e-01, 1.93450910e-01, 1.88707238e-01,
       5.37637671e-01, 3.74549055e-01, 4.91842071e-01, 8.87985501e-02,
       3.07093068e-01, 2.22732121e-01, 1.43487730e-01, 2.02852387e-01,
       1.54249969e-01, 1.68274969e-01, 4.21893217e-02, 9.62578354e-02,
       3.87608333e-01, 1.75239894e-01, 9.01422882e-02, 1.92344597e-01,
       1.99513854e-01, 9.23946972e-02, 1.37767006e-01, 2.53028853e-01,
       2.13769548e-01, 1.42396287e-01, 1.05727556e-01, 2.34878470e-01,
       1.64877877e-01, 8.20076513e-02, 2.11173147e-01, 1.15203761e-01,
       1.89430799e-01, 2.74138256e-01, 3.97848365e-01, 2.53163800e-01,
      

In [47]:
### Now we will add these details to the dataframe

reviews['Predictions_NB'] = model.predict_proba(X_df)[:,0]

reviews['Predictions_LR'] = model_lr.predict_proba(Xt_df)[:,0]
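
Because these columns hold probabilities rather than labels, the cut-off used to flag a review becomes a choice. The sketch below applies two cut-offs to the first five logistic regression probabilities printed above. The 0.25 cut-off and the helper flag_high are hypothetical, chosen only to illustrate the idea, and most of these rows were part of the training data.

```python
# First five logistic regression probabilities for 'High' and the true labels,
# copied from the outputs above (rows 0-4)
p_high = [0.2590195, 0.22819851, 0.07517457, 0.55084249, 0.28135196]
actual = ["Low", "Low", "Low", "High", "Low"]

def flag_high(probabilities, threshold):
    """Label a review 'High' when its probability reaches the threshold."""
    return ["High" if p >= threshold else "Low" for p in probabilities]

# With two classes, predict() picks the more probable class, i.e. a 0.5 cut-off
default_flags = flag_high(p_high, threshold=0.5)

# Hypothetical lower cut-off: flags more reviews as High
lenient_flags = flag_high(p_high, threshold=0.25)

print("actual: ", actual)
print("0.50:   ", default_flags)
print("0.25:   ", lenient_flags)
```

A lower cut-off catches more reviews as High at the cost of more false alarms (rows 0 and 4 here), which may be an acceptable trade when missing a high-priority review is the bigger concern.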

In [48]:
reviews.head(3)

Unnamed: 0,Id,UserId,Rating,Priority,Title,Text,Text_Clean,Predictions_NB,Predictions_LR
0,23689,A21SYVGVNG8RAS,5,Low,Yummy snacks!,Popchips are the bomb!! I use the parmesan garlic to scoop up cottage cheese as a healthy alternative to chips and dip. My healthy eating program is saved.,popchip bomb use parmesan garlic scoop cottage cheese healthy alternative chip dip healthy eat program save,0.200606,0.259019
1,23690,AQJYXC0MPRQJL,5,Low,Great chip that is different from the rest,"I like the puffed nature of this chip that makes it more unique in the chip market. I ordered the Salt and Vinegar and absolutely love that flavor, hands down my favorite chip ever. I have tried the cheddar and regular flavors as well. The cheddar is about a 4/5 and the regular is about a 3/5 because I prefer strong flavors and obviously that would not be the case for the regular. The Salt and Vinegar is kind of weak compared to some regular S&V chips, but is quite flavorful and makes you wanting to come back for more.",like puff nature chip make unique chip market order salt vinegar absolutely love flavor hand favorite chip try cheddar regular flavor cheddar 45 regular 35 prefer strong flavor obviously case regular salt vinegar kind weak compare regular sv chip flavorful make want come,0.305999,0.228199
2,23691,A30NYUHEDLWI0Y,5,Low,Great Alternative to Potato Chips,"I just love these chips! I was always a big fan of potato chips, but haven't had one since I discovered popchips. They are great for dipping or all alone. I am constantly re-ordering them. One note however-if you are on a low salt diet these chips are probably not for you. They are high in sodium. We go through a case every two months. If you love them it pays to join the subscribe and save program through Amazon. You save money and stay supplied!",love chip big fan potato chip not discover popchip great dipping constantly reorder note howeverif low salt diet chip probably high sodium case month love pay join subscribe save program amazon save money stay supply,0.04509,0.075175


In [50]:
### Let's sort the values to determine which cases would have a high probability for high priority under Naive Bayes:

reviews.sort_values(by='Predictions_NB', ascending=False).head(10)

Unnamed: 0,Id,UserId,Rating,Priority,Title,Text,Text_Clean,Predictions_NB,Predictions_LR
550,24239,A2ZKS33N6Y3EPC,3,High,"Taste more like ""Tomato and Basil"" than ""Chili and Lime""","NOTE: This review is for the Chili and Lime Flavor Popchip. Amazon had a separate page for it but then merged the product and its reviews into one.<br /><br />It's hard to objectively review food since everyone's palate and tastes are different. So what I can say about this particular Popchip flavor that should be useful for most folks out there is that it doesn't really taste like Chili and Lime you're ""probably"" expecting. The Chili and Lime most folks probably are expecting if they grew up on Frito Lay products is very sharp and sweet (and of course artificial) - but it's what we liked if we ate more than a bag.<br /><br />The best way I can describe this flavor is that it has a ""tomato"" like taste to it with a somewhat tangy sour note that is suppose to be the lime component. Together they turn into an odd combination that registers other flavors in your mind than Chili and Lime - at least it did to me and others who tasted it with me. If you eat the skin of a green bell pepper, you can kind of get at what Popchips were trying to do with the Chili taste on this version, but I have no idea how some sour salt can be akin to lime. For myself personally, I thought it tasted like ""Tomato and Basil"" you would find on Pita chip flavors and baked snacks.<br /><br />Whether or not you agree with my above description of the flavor, I would highly suggest you try to get this in a sample pack and try it out first. 
BBQ + Salt & Vinegar Popchips are still my staples for now.",note review chili lime flavor popchip amazon separate page merge product review onebr br hard objectively review food everyone palate taste different particular popchip flavor useful folk not taste like chili lime probably expect chili lime folk probably expect grow frito lay product sharp sweet course artificial like eat bagbr br good way describe flavor tomato like taste somewhat tangy sour note suppose lime component turn odd combination register flavor mind chili lime taste eat skin green bell pepper kind popchip try chili taste version idea sour salt akin lime personally think taste like tomato basil find pita chip flavor bake snacksbr br agree description flavor highly suggest try sample pack try bbq salt vinegar popchip staple,0.973989,0.478529
96,23785,AE5AHEH3NLPBZ,3,High,Tastes Like Celery,"I really like pop chips, but this flavor isn't the best. I was expecting these to taste like chili peppers and lime (Spicy, Sweet, and Tart), but instead of going for a chili pepper taste, they went for a chili the food taste. This wouldn't be so bad, except they taste overwhelmingly of tomato and celery. The reason they didn't call them Tomato and Celery Chips is because it is sounds gross and no one would buy that, and unfortunately it tastes like it sounds.",like pop chip flavor not good expect taste like chili pepper lime spicy sweet tart instead go chili pepper taste go chili food taste not bad taste overwhelmingly tomato celery reason not tomato celery chip sound gross buy unfortunately taste like sound,0.854032,0.495605
463,24152,A2ZMMQ4W17EK2N,2,High,Original PopChips,"Bought the Original flavor from the store and just tried them tonight. They were very greasy and salty. I did not like them a lot. I will not purchase this original flavor again. However I can't complain because I got the 3 ounce bag for only $1.00 at the store while they were on sale. I tried the BBq flavor and they are delicious. I bought the sea salt & vinegar, and cheddar but haven't tried those yet.",buy original flavor store try tonight greasy salty like lot purchase original flavor not complain get 3 ounce bag 100 store sale try bbq flavor delicious buy sea salt vinegar cheddar not try,0.760037,0.439521
21,23710,ASIMCC20UVK58,5,Low,Great Chips Less Fat,"I eat chips almost every day and decided I wanted to find something that tastes as good but is lighter on unnecessary fat than regular types of chips. I bought a case of Popchips BBQ. These are satisfying and taste great. They don't taste exactly like any full fat chip products I've had mainly because they're not greasy at all, but they have a nice BBQ potato chip flavor. These are thick, crunchy, and light. I first bought the .8 oz bags and this serving size is on the small side for me with lunch (would probably be alright for a snack). 3 of the .8 oz bags works for me which of course bumps up the fat intake, but considering the same volume of ""regular"" chips has much more fat it is a significant fat decrease overall which is what I was looking for. I find the 3 ounce bags to be perfect. Even eating all 3 ounces works out to significantly less fat and calories than eating the same volume of other chips. This makes Popchips very satisfying to me, and I have bought many cases through Amazon.<br /><br />Heads up (mid-2011): Unfortunately the price has gone up significantly for these chips through Amazon, causing me to cancel my subscribe & save subscriptions. Popchips have popped up in local stores for significantly less per ounce. I love the convenience of the portioned bags and subscription but it's hard to justify paying double for the same product.<br /><br />The flavors are pretty straight forward but here's my thoughts...<br />Original flavor: Tastes like a plain potato chip minus the grease. Not my favorite flavor, but good for what it is. This flavor would probably be good with some kind of dip.<br />Chedder: Cheddar quickly became tied with BBQ for my favorite. Like BBQ the cheddar flavor is very strong. Great chips.<br />Salt & Pepper: Very strong pepper. To enjoy these you have to really like pepper. 
I like them, but they're not a favorite.<br />Sea Salt & Vinegar: I'm not a fan of vinegar, but strangely I enjoy this flavor. They're indeed salty with a fairly strong vinegar flavor.",eat chip day decide want find taste good light unnecessary fat regular type chip buy case popchip bbq satisfy taste great not taste exactly like fat chip product ve mainly greasy nice bbq potato chip flavor thick crunchy light buy 8 oz bag serve size small lunch probably alright snack 3 8 oz bag work course bump fat intake consider volume regular chip fat significant fat decrease overall look find 3 ounce bag perfect eat 3 ounce work significantly fat calorie eat volume chip make popchip satisfying buy case amazonbr br head mid2011 unfortunately price go significantly chip amazon cause cancel subscribe save subscription popchip pop local store significantly ounce love convenience portion bag subscription hard justify pay double productbr br flavor pretty straight forward here thoughtsbr original flavor taste like plain potato chip minus grease favorite flavor good flavor probably good kind dipbr chedder cheddar quickly tie bbq favorite like bbq cheddar flavor strong great chipsbr salt pepper strong pepper enjoy like pepper like favoritebr sea salt vinegar m fan vinegar strangely enjoy flavor salty fairly strong vinegar flavor,0.757451,0.248328
157,23846,A1HYH206E18XVC,5,Low,Tangy and terrific,"When I asked my older daughter to describe this flavor, she said to be sure to mention the word tangy. That is a fair description as the lime does heighten the taste buds and enhances the slight heat from the chili.<br /><br />My family really enjoys this flavor and it is among our favorites. We have tried most of the other varieties of Popchips and have our own preferences. My older daughter likes salt and pepper, barbecue, and this flavor the best. I like barbecue, sour cream and onion, and this flavor the best. My wife likes salt and vinegar and this flavor the best. My younger daughter does not like this flavor. She prefers barbecue, cheese, and sour cream and onion. Our least favorite is the original, probably because it is so plain by comparison.<br /><br />To me, Popchips are sort of a cross of potato chips, popcorn, and rice cakes. They are potato, but popped like popcorn and sort of puffy like rice cakes. They definitely have more flavor than many rice cakes and are a nice alternative to popcorn. They also can be used with dips although they never seem to last very long in our house.",ask old daughter describe flavor say sure mention word tangy fair description lime heighten taste bud enhance slight heat chilibr br family enjoy flavor favorite try variety popchip preference old daughter like salt pepper barbecue flavor good like barbecue sour cream onion flavor good wife like salt vinegar flavor good young daughter like flavor prefer barbecue cheese sour cream onion favorite original probably plain comparisonbr br popchip sort cross potato chip popcorn rice cake potato pop like popcorn sort puffy like rice cake definitely flavor rice cake nice alternative popcorn dip long house,0.728147,0.305065
363,24052,AEU9NQ5EDBW91,4,Low,Great introduction to these chips,"Okay, I personally loved how these chips taste. They are super crunchy and flavorful. I chose to take away one star because the Sea Salt & Vinegar flavor was way too salty. BBQ was my favorite. Next time I order, I am getting just the BBQ. They taste just like the greasy unhealthier versions made by other brands. The Cheddar was good but you can tell it was a healthy version of this flavor chips. The rest of the flavors I would say taste as they normally do, nothing special or out of the typical range of those flavors.",okay personally love chip taste super crunchy flavorful choose away star sea salt vinegar flavor way salty bbq favorite time order get bbq taste like greasy unhealthier version brand cheddar good tell healthy version flavor chip rest flavor taste normally special typical range flavor,0.721433,0.428386
497,24186,A2H7CZLHBP0N3D,1,High,Tastes like cardboard!,"I read about these on [...] and decided to try them. I bought the variety pack so as not to get stuck with one flavor if I didn't like them. I really tried to like these because of the health benefits but I ended up throwing out the whole case after tasting one of each flavor. If you are looking for these as a substitute for potato chips, forget it. They really do taste like dissolving cardboard. Stick with baked Lays!",read decide try buy variety pack stick flavor not like try like health benefit end throw case taste flavor look substitute potato chip forget taste like dissolve cardboard stick baked lay,0.713274,0.453579
300,23989,A39222ZAUZUN9W,5,Low,Best Chips Ever,"These chips are a regular in my house now. In fact they are the only chips I buy. The BBQ flavor is my favorite. I buy these in single serving packs because if I buy a bigger bag, it is too hard to stop eating them.",chip regular house fact chip buy bbq flavor favorite buy single serve pack buy big bag hard stop eat,0.679353,0.442072
170,23859,A1Y8ZODMDEIYYK,5,Low,Delicious,"My favorite PopChips flavor is the Parmesan Garlic flavor, but this is definitely my new second favorite. It does have a lot of lime flavor, with the chili flavor being a little less present. That said, it is there and it is tasty and not spicy. I was really pleased with this purchase and will definitely buy them again.",favorite popchip flavor parmesan garlic flavor definitely new second favorite lot lime flavor chili flavor little present say tasty spicy pleased purchase definitely buy,0.665765,0.533245
453,24142,AOZ8OAFA0MF9N,1,High,"Other Flavors are Great, these Not So Much","The name of this flavor sounds fantastic, and I wondered why they didn't include them in their multi-flavor assortment. Then I tasted them. These chips are terrible, there is no parmesan flavor to speak of, and they somehow messed up the garlic as well. Avoid these and the also-no-incuded-assortment flavor of ""jalapeno"". Very disappointed.<br /><br />The cheddar, vinegar & salt and peppered flavors are FANTASTIC!",flavor sound fantastic wonder not include multiflavor assortment taste chip terrible parmesan flavor speak mess garlic avoid alsonoincudedassortment flavor jalapeno disappointedbr br cheddar vinegar salt pepper flavor fantastic,0.661523,0.522654
