# Classifying Food Poisoning Messages using Unsupervised Learning

## 1. Problem Definition


>The objective of this notebook is to identify text messages that report food poisoning among patrons who dined at a restaurant in suburban Bangalore.



## 2. Methodology

The objective of this notebook is achieved by using an unsupervised learning technique known as `clustering`. 

Clustering is an unsupervised learning method in which we draw inferences from datasets consisting of input data without labeled responses. It is generally used to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples.



We will be using `MiniBatchKMeans` as the estimator to conduct the unsupervised learning. 

`MiniBatchKMeans` is a variant of the `KMeans` algorithm that uses mini-batches to reduce computation time while still attempting to optimise the same objective function.
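
As a rough illustration of that relationship, the sketch below fits both estimators on synthetic two-dimensional data and prints the value each one reaches for the shared objective, the within-cluster sum of squared distances (`inertia_`). The data, the `batch_size` and the `n_init` values are assumptions chosen for the example, not settings taken from this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

# Two well-separated synthetic blobs (illustrative data, not the notebook's messages).
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(200, 2)),
])

# Full-batch k-means updates the centroids using every sample on each iteration.
full = KMeans(n_clusters=2, n_init=10, random_state=42).fit(points)

# Mini-batch k-means updates the centroids from small random subsets instead.
mini = MiniBatchKMeans(n_clusters=2, batch_size=64, n_init=3, random_state=42).fit(points)

# Both report the same objective: the within-cluster sum of squared distances.
print("KMeans inertia:         ", full.inertia_)
print("MiniBatchKMeans inertia:", mini.inertia_)
```

On data this small the speed difference is negligible; the mini-batch variant is mainly of interest when the number of samples is large.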

## 3. Importing Modules

Run the following line in a new cell before importing the modules to install the required packages:

`!pip install -r requirements.txt`

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer

## 4. Importing and Preparing the Unlabeled Data for Training the Model

In [2]:
texts = pd.read_csv("./food-poisoning-messages.csv", index_col=False, header=None)
pd.set_option('display.max_colwidth', None)
len(texts)

2000

In [3]:
texts.head()

Unnamed: 0,0
0,"Recently, there have been concerning reports about an uptick in cases of food poisoning associated with a popular restaurant in the city. Numerous individuals who dined at this establishment have reported symptoms such as severe nausea, vomiting, and stomach cramps within hours of consuming their meals. Health authorities are actively investigating the matter, urging anyone who has experienced similar symptoms to seek medical attention and report their cases."
1,"It has come to our attention that there is an ongoing investigation into a local eatery due to multiple instances of food poisoning reported by customers. Health officials are working diligently to trace the source of the contamination and identify the specific food items responsible for the illnesses. In the meantime, it is strongly advised to avoid dining at this particular restaurant until further notice."
2,"Amidst growing concerns about food safety, it has been discovered that a restaurant in our community is facing allegations of negligence leading to a surge in food poisoning cases. Customers have reported severe gastrointestinal distress, and the local health department is actively examining the establishment's hygiene practices. It is recommended that individuals who have recently patronized this restaurant and experienced symptoms like diarrhea or fever seek immediate medical attention."
3,"Disturbing reports have emerged about a local food establishment where several patrons have fallen ill with symptoms consistent with food poisoning. The affected individuals have reported consuming meals from this specific restaurant, prompting health authorities to initiate an urgent investigation. As a precautionary measure, individuals who have dined at this establishment recently are advised to monitor their health closely and report any symptoms to the local health department."
4,"In light of recent events, it has been brought to our attention that there is an ongoing investigation into a suspected outbreak of food poisoning linked to a popular restaurant chain. Numerous patrons have reported falling seriously ill after consuming meals at various branches of this establishment. Authorities are conducting thorough inspections and laboratory tests to pinpoint the exact cause of the contamination and ensure the safety of the public."


### Shuffling the Data and Converting to Significant Numerical Values using `TfidfVectorizer`

`TfidfVectorizer` converts the given data into a matrix of tf-idf features.

tf-idf is the product of two statistics, term frequency and inverse document frequency, and there are several ways to compute each of them. The resulting score estimates how important a term is to a document relative to the rest of the corpus.
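
To make the product concrete, the sketch below recomputes the scores by hand for a tiny made-up corpus and compares them with `TfidfVectorizer`. It assumes scikit-learn's default settings: raw term counts for tf, the smoothed idf `ln((1 + n) / (1 + df(t))) + 1`, and L2 normalisation of each document row. The three documents are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical mini-corpus used only for this illustration.
docs = [
    "food poisoning after dinner",
    "dinner with old friends",
    "nausea after food",
]

# Term frequency: raw counts of each vocabulary term per document.
tf = CountVectorizer().fit_transform(docs).toarray().astype(float)

# Inverse document frequency with scikit-learn's default smoothing.
n_docs = tf.shape[0]
df = (tf > 0).sum(axis=0)                    # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1

# tf-idf is the product, followed by L2 normalisation of every row.
manual = tf * idf
manual /= np.linalg.norm(manual, axis=1, keepdims=True)

# Compare with the library's result under default settings.
reference = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(manual, reference))
```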

In [4]:
texts = texts.sample(frac=1, random_state=42)

In [5]:
texts.head()

Unnamed: 0,0
1860,"To delve into the nuances of perfumery, I attended a fragrance blending masterclass. Understanding base, middle, and top notes, and crafting a personalized fragrance not only heightened olfactory senses but also unveiled the artistry behind creating unique scents."
353,"A recent meal at a restaurant has resulted in symptoms of food poisoning, including nausea and persistent digestive discomfort."
1333,"Reconnecting with old friends over a video call, distance can't dull true friendships."
905,"Customers who dined at a local establishment reported experiencing symptoms consistent with food poisoning, leading to heightened scrutiny of the restaurant's hygiene practices."
1289,"On a quest for intellectual stimulation, I joined a philosophy reading group. Engaging in deep discussions on existentialism, ethics, and metaphysics broadened my perspectives and fostered a community of critical thinkers committed to exploring the profound questions of existence."


Now we need to convert this DataFrame into a list to pass it through the `TfidfVectorizer`.

In [6]:
texts.to_numpy().tolist()[:10]

[['To delve into the nuances of perfumery, I attended a fragrance blending masterclass. Understanding base, middle, and top notes, and crafting a personalized fragrance not only heightened olfactory senses but also unveiled the artistry behind creating unique scents.'],
 ['A recent meal at a restaurant has resulted in symptoms of food poisoning, including nausea and persistent digestive discomfort.'],
 ["Reconnecting with old friends over a video call, distance can't dull true friendships."],
 ["Customers who dined at a local establishment reported experiencing symptoms consistent with food poisoning, leading to heightened scrutiny of the restaurant's hygiene practices."],
 ['On a quest for intellectual stimulation, I joined a philosophy reading group. Engaging in deep discussions on existentialism, ethics, and metaphysics broadened my perspectives and fostered a community of critical thinkers committed to exploring the profound questions of existence.'],
 ['A weekend immersed in immer

This is a 2D list (a list of single-element lists), which needs to be flattened into a 1D list.

In [7]:
texts = [text for row in texts.to_numpy().tolist() for text in row]  # flatten the 2D list into a 1D list
texts[:10]

['To delve into the nuances of perfumery, I attended a fragrance blending masterclass. Understanding base, middle, and top notes, and crafting a personalized fragrance not only heightened olfactory senses but also unveiled the artistry behind creating unique scents.',
 'A recent meal at a restaurant has resulted in symptoms of food poisoning, including nausea and persistent digestive discomfort.',
 "Reconnecting with old friends over a video call, distance can't dull true friendships.",
 "Customers who dined at a local establishment reported experiencing symptoms consistent with food poisoning, leading to heightened scrutiny of the restaurant's hygiene practices.",
 'On a quest for intellectual stimulation, I joined a philosophy reading group. Engaging in deep discussions on existentialism, ethics, and metaphysics broadened my perspectives and fostered a community of critical thinkers committed to exploring the profound questions of existence.',
 'A weekend immersed in immersive theate

Now the list can be passed through the vectorizer.

In [8]:
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

## 5. Training the Model

In [9]:
number_of_clusters = 2
model = MiniBatchKMeans(n_clusters=number_of_clusters, random_state=42)
model.fit(X);



### Grouping the Texts Based on Clusters


In [10]:
clusters = model.labels_

grouped_texts = {}
for i,text in enumerate(texts):
    cluster = clusters[i]
    if cluster not in grouped_texts:
        grouped_texts[cluster] = []
    grouped_texts[cluster].append(text)

In [11]:
clusters

array([1, 1, 1, ..., 1, 1, 0], dtype=int32)
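
Because the printed array is truncated, a quick count of how many messages fall into each cluster can serve as a sanity check. The sketch below uses a short hypothetical label array (`demo_labels`) in place of `clusters`.

```python
import numpy as np

# Hypothetical cluster labels standing in for the notebook's `clusters` array.
demo_labels = np.array([1, 1, 0, 1, 0, 1, 1, 0], dtype=np.int32)

# Count how many samples were assigned to each cluster id.
cluster_ids, counts = np.unique(demo_labels, return_counts=True)
sizes = dict(zip(cluster_ids.tolist(), counts.tolist()))
print(sizes)
```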

In [12]:
pd.DataFrame(clusters,texts)

Unnamed: 0,0
"To delve into the nuances of perfumery, I attended a fragrance blending masterclass. Understanding base, middle, and top notes, and crafting a personalized fragrance not only heightened olfactory senses but also unveiled the artistry behind creating unique scents.",1
"A recent meal at a restaurant has resulted in symptoms of food poisoning, including nausea and persistent digestive discomfort.",1
"Reconnecting with old friends over a video call, distance can't dull true friendships.",1
"Customers who dined at a local establishment reported experiencing symptoms consistent with food poisoning, leading to heightened scrutiny of the restaurant's hygiene practices.",1
"On a quest for intellectual stimulation, I joined a philosophy reading group. Engaging in deep discussions on existentialism, ethics, and metaphysics broadened my perspectives and fostered a community of critical thinkers committed to exploring the profound questions of existence.",1
...,...
"Took a pottery class, getting hands-on with clay is therapeutic.",0
"Seeking intellectual stimulation, I enrolled in a philosophy of science course. The discussions on scientific methodology, the philosophy behind empirical inquiry, and the intersection of science and ethics deepened my understanding of the scientific endeavor.",1
Experiencing nausea and diarrhea after dining out; suspecting foodborne contamination.,1
Spent the evening making homemade pizza with fresh toppings. Better than takeout!,1


### Identifying the Food Poisoning Related Cluster

In [13]:
keywords = ["food poisoning", "sick", "hospitalized", "nausea", "vomiting", "cramps", "diarrhea"]

# Pick the first cluster whose combined text mentions any keyword.
food_poisoning_cluster = [cluster for cluster, cluster_texts in grouped_texts.items()
                          if any(keyword in " ".join(cluster_texts).lower() for keyword in keywords)][0]
food_poisoning_cluster

1
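
Selecting the first cluster that contains any keyword can pick the wrong cluster when a keyword such as "sick" also appears in unrelated messages. A possible alternative, sketched below, is to choose the cluster with the highest fraction of messages that mention a keyword. The helper names `keyword_fraction` and `pick_cluster` and the demo data are hypothetical and not part of the original notebook.

```python
def keyword_fraction(messages, keywords):
    """Return the fraction of messages containing at least one keyword (case-insensitive)."""
    if not messages:
        return 0.0
    hits = sum(any(keyword in message.lower() for keyword in keywords) for message in messages)
    return hits / len(messages)


def pick_cluster(grouped, keywords):
    """Return the cluster id whose messages mention the keywords most often."""
    return max(grouped, key=lambda cluster: keyword_fraction(grouped[cluster], keywords))


# Hypothetical grouped texts: cluster 0 contains one incidental use of "sick".
demo_grouped = {
    0: ["Took a pottery class.", "Felt sick of the traffic today.", "Movie night with friends."],
    1: ["Nausea and vomiting after dinner.", "Suspected food poisoning at the cafe."],
}
demo_keywords = ["food poisoning", "sick", "nausea", "vomiting", "cramps", "diarrhea"]

chosen = pick_cluster(demo_grouped, demo_keywords)
print(chosen)
```

In this demo a first-match rule would return cluster 0 because of the single "sick" message, whereas the fraction-based rule prefers cluster 1, where every message matches.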

## 6. Predicting Values

Now let's use the trained model to predict if a given text is related to food poisoning or not.

In [14]:
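def predictCluster(text):
    # Vectorize with the fitted tf-idf vocabulary, then assign the nearest cluster.
    vector = vectorizer.transform([text])
    return model.predict(vector)[0]

def isFoodPoisoning(text):
    # True when the predicted cluster is the one identified as food poisoning related.
    return bool(predictCluster(text) == food_poisoning_cluster)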

In [15]:
example_text = "I have nausea and stomachache because of food poisoning."
if isFoodPoisoning(example_text):
    print("The message is food poisoning related.")
else:
    print("The message is not food poisoning related.")

The message is food poisoning related.
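
The same prediction step can be applied to several messages at once, and the distance from each message to every centroid gives a rough sense of how clear-cut an assignment is. The sketch below assumes a small made-up corpus and `demo_*` names so that it runs on its own; `transform` on a fitted k-means estimator returns the distances to the cluster centres.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus standing in for the notebook's messages.
demo_corpus = [
    "Severe nausea and vomiting after eating at the restaurant.",
    "Food poisoning symptoms started hours after the meal.",
    "Stomach cramps and nausea after dining at the restaurant.",
    "Reconnecting with old friends over a video call.",
    "Joined a philosophy reading group with friends.",
    "Took a pottery class and made a clay bowl.",
]
demo_vectorizer = TfidfVectorizer(stop_words="english")
demo_model = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=42)
demo_model.fit(demo_vectorizer.fit_transform(demo_corpus))

# Vectorize new messages with the fitted vocabulary (transform, not fit_transform).
new_messages = [
    "I have nausea after the meal at the restaurant.",
    "Video call with old friends tonight.",
]
new_vectors = demo_vectorizer.transform(new_messages)

predicted = demo_model.predict(new_vectors)    # nearest centroid per message
distances = demo_model.transform(new_vectors)  # distance to every centroid

for message, cluster_id, row in zip(new_messages, predicted, distances):
    print(cluster_id, np.round(row, 3), message)
```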


## 7. Conclusion

A `MiniBatchKMeans` model was trained on tf-idf features using unsupervised learning and then used to classify whether a given text message is related to food poisoning.