# Naive Bayes
Naive Bayes is a family of classification algorithms based on Bayes' theorem and the assumption that features are conditionally independent given the class. Despite this "naive" assumption, Naive Bayes classifiers perform well in many real-world settings, and they are particularly successful at text classification.
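Written out for a class $y$ and features $x_1, \dots, x_n$, Bayes' theorem combined with the independence assumption gives the posterior below. The denominator is the same for every class, so a prediction only needs to compare the numerators:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{y} \, P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$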

We are going to use Naive Bayes to predict whether a person has COVID-19 based on a few symptoms.

```python
# Modules
import pandas as pd
import numpy as np
import sklearn as sk
from sklearn.naive_bayes import CategoricalNB
```

# Getting the data
The raw data are stored as text, but scikit-learn needs numeric input (`CategoricalNB` expects each category encoded as a non-negative integer), so for the headache column "no" is encoded as 0 and "yes" as 1.
The roommate columns are left out of this DataFrame because including them made the results worse.
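The yes/no encoding can be sketched in plain Python so it runs without pandas. The raw `headache` strings below are hypothetical, reconstructed by reversing the 0/1 values in the table; with pandas, the same dictionary could be passed to `Series.map`.

```python
# Hypothetical raw strings, reconstructed from the encoded headache column
raw_headache = ["no", "yes", "no", "no", "yes", "no", "yes"]

# Map "no" -> 0 and "yes" -> 1; an unexpected value raises KeyError instead of passing silently
ENCODING = {"no": 0, "yes": 1}
headache = [ENCODING[value] for value in raw_headache]

print(headache)  # [0, 1, 0, 0, 1, 0, 1]
```

Using a strict dictionary lookup means a typo such as "Yes" fails loudly rather than being silently encoded as a wrong category.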

```python
data = {"shivers": [1, 0, 1, 0, 0, 1, 1],
        "running nose": [0, 0, 1, 1, 0, 0, 1],
        "headache": [0, 1, 0, 0, 1, 0, 1],
        "test results": [0, 0, 1, 0, 1, 0, 1]}
df = pd.DataFrame(data)
df
```

| | shivers | running nose | headache | test results |
|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 1 | 0 | 1 |
| 3 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 1 | 1 |
| 5 | 1 | 0 | 0 | 0 |
| 6 | 1 | 1 | 1 | 1 |


# Training with CategoricalNB

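```python
X = df.drop(columns=["test results"])  # features: the three symptoms
y = df["test results"]                 # target: 0 = negative, 1 = positive
clf = CategoricalNB(force_alpha=True)  # default alpha=1 (Laplace smoothing)
clf.fit(X, y)
print("result for predicting:")
print(clf.predict(X.iloc[3:4]))        # predict row 3
```

```
result for predicting:
[0]
```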


Row 3 is part of the training data, so with all of its features and the class labels available this is an easy case: the classifier predicts 0, which matches that row's actual test result.
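`CategoricalNB` does not use raw counts: with its default `alpha=1` it applies additive (Laplace) smoothing, P(x_i = t | y = c) = (N_tic + alpha) / (N_c + alpha * n_i), where n_i is the number of categories of feature i. The minimal standard-library sketch below assumes that formula (as given in the scikit-learn documentation) and empirical class priors (the `fit_prior=True` default). For row 3, class 0 scores 4/63 and class 1 scores 36/875, so class 0 wins. The normalized values (about 0.61 and 0.39) are what one would expect `clf.predict_proba` to report for that row, but that output is not shown in this notebook. `smoothed_score`, `ALPHA` and `N_CATEGORIES` are hypothetical names for illustration.

```python
from fractions import Fraction

# Toy dataset from the table: (shivers, running nose, headache) -> test result
X = [(1, 0, 0), (0, 0, 1), (1, 1, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)]
y = [0, 0, 1, 0, 1, 0, 1]

ALPHA = 1         # additive (Laplace) smoothing, assumed to match CategoricalNB's default
N_CATEGORIES = 2  # every feature takes the values 0 or 1

def smoothed_score(instance, c):
    """Prior times smoothed likelihoods: an unnormalized P(class c | instance)."""
    rows = [x for x, label in zip(X, y) if label == c]
    score = Fraction(len(rows), len(X))  # empirical prior P(c)
    for i, value in enumerate(instance):
        matches = sum(1 for x in rows if x[i] == value)
        score *= Fraction(matches + ALPHA, len(rows) + ALPHA * N_CATEGORIES)
    return score

instance = X[3]  # row 3: no shivers, running nose, no headache
scores = {c: smoothed_score(instance, c) for c in (0, 1)}
total = sum(scores.values())
for c, s in scores.items():
    print(c, s, round(float(s / total), 4))
print("predicted:", max(scores, key=scores.get))  # predicted: 0
```

Exact fractions keep the arithmetic readable; scikit-learn itself works with log probabilities to avoid underflow when there are many features.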

# Naive Bayes by hand


```python
df = pd.DataFrame(data)

# Separate the features (symptoms) and the target (test results)
X = df[['shivers', 'running nose', 'headache']]
y = df['test results']

# Calculate the prior probabilities P(negative) and P(positive)
p_negative = sum(y == 0) / len(y)
p_positive = sum(y == 1) / len(y)

# Calculate the conditional probabilities P(symptom value | class) for row number 5 (index 4)
new_instance = X.iloc[4]

p_shivers_given_negative = sum((X.loc[y == 0, 'shivers'] == new_instance['shivers'])) / sum(y == 0)
p_running_nose_given_negative = sum((X.loc[y == 0, 'running nose'] == new_instance['running nose'])) / sum(y == 0)
p_headache_given_negative = sum((X.loc[y == 0, 'headache'] == new_instance['headache'])) / sum(y == 0)

p_shivers_given_positive = sum((X.loc[y == 1, 'shivers'] == new_instance['shivers'])) / sum(y == 1)
p_running_nose_given_positive = sum((X.loc[y == 1, 'running nose'] == new_instance['running nose'])) / sum(y == 1)
p_headache_given_positive = sum((X.loc[y == 1, 'headache'] == new_instance['headache'])) / sum(y == 1)

# Likelihood of row number 5 under each class: the product of the conditionals (independence assumption)
p_row_given_negative = p_shivers_given_negative * p_running_nose_given_negative * p_headache_given_negative
p_row_given_positive = p_shivers_given_positive * p_running_nose_given_positive * p_headache_given_positive

# Compare prior * likelihood for each class to determine the predicted class
predicted_class = 0 if p_negative * p_row_given_negative > p_positive * p_row_given_positive else 1

print("Predicted Class for row number 5:", predicted_class)
# The two values below are the likelihoods P(row | class), not posterior probabilities
print("prob for being neg", p_row_given_negative)
print("prob for pos:", p_row_given_positive)
```

```
Predicted Class for row number 5: 0
prob for being neg 0.09375
prob for pos: 0.07407407407407407
```
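The values printed above are the likelihoods P(row | class), not posterior probabilities. A minimal sketch that multiplies in the class prior and normalizes over both classes, using the same unsmoothed counts and exact fractions (`naive_bayes_posteriors` is a hypothetical helper, not part of any library):

```python
from fractions import Fraction

# Toy dataset from the table: (shivers, running nose, headache) -> test result
X = [(1, 0, 0), (0, 0, 1), (1, 1, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)]
y = [0, 0, 1, 0, 1, 0, 1]

def naive_bayes_posteriors(features, labels, instance):
    """Return P(class | instance) for each class, using unsmoothed counts."""
    joint = {}
    for c in sorted(set(labels)):
        rows = [x for x, label in zip(features, labels) if label == c]
        prob = Fraction(len(rows), len(features))  # prior P(c)
        for i, value in enumerate(instance):
            # conditional P(x_i = value | c), counted exactly as in the cell above
            prob *= Fraction(sum(1 for x in rows if x[i] == value), len(rows))
        joint[c] = prob  # P(c) * P(instance | c)
    evidence = sum(joint.values())  # P(instance), summed over the classes
    if evidence == 0:
        raise ValueError("instance has zero probability under every class")
    return {c: p / evidence for c, p in joint.items()}

posteriors = naive_bayes_posteriors(X, y, X[4])  # row number 5 (index 4)
print(posteriors)  # {0: Fraction(27, 43), 1: Fraction(16, 43)}
print("predicted:", max(posteriors, key=posteriors.get))  # predicted: 0
```

For row 5 this gives P(negative | row) = 27/43 ≈ 0.63 and P(positive | row) = 16/43 ≈ 0.37, so the prediction stays negative once the priors are included. Without smoothing, a symptom value never seen in a class drives that class's probability to zero, which is one reason `CategoricalNB` smooths its counts.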


# Conclusion
This outcome is understandable, even though row 5's actual test result is positive.
The model predicts negative because only one of the three symptoms (headache) is present, which makes a positive result less likely given the training data.

As a result, the algorithm does not classify this person as positive: with a single symptom, it is more likely that they are suffering from something other than COVID-19.
