# Predict Disease by User Symptoms using Naive Bayes

- User enters their symptoms
- Train a Bernoulli Naive Bayes classifier on the disease-symptom dataset
- Predict the most likely disease from the symptoms and output its probability
- Optionally, report the probability for a specific disease the user selects

### Import Necessary Libraries

In [42]:
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
# to turn user symptoms into a valid list of 0s and 1s
from Convert import to_valid_list
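
The `Convert` module is not shown in this notebook, so the exact behavior of `to_valid_list` is unknown. As a rough illustration of the idea only, the sketch below turns a comma-separated symptom string into a list of 0s and 1s. The name `to_valid_list_sketch` and its `symptom_columns` parameter are hypothetical; the real helper takes only the symptom string.

```python
def to_valid_list_sketch(user_symptoms, symptom_columns):
    """Hypothetical stand-in for Convert.to_valid_list (not the real helper).

    Turns a comma-separated symptom string into a list of 0s and 1s,
    one entry per symptom column, in column order.
    """
    # normalize the input: split on commas, trim spaces, lowercase
    entered = {s.strip().lower() for s in user_symptoms.split(",") if s.strip()}
    # 1 if the user reported the symptom, 0 otherwise
    return [1 if col.lower() in entered else 0 for col in symptom_columns]


# a few symptom names, assumed here purely for illustration
columns = ["cough", "fever", "shortness of breath", "pain chest"]
vector = to_valid_list_sketch("cough, shortness of breath, wheezing", columns)
print(vector)  # [1, 0, 1, 0]
```

In this sketch a symptom that is not among the columns (here "wheezing") is silently ignored; the real helper may handle that case differently.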

### Import and Visualize Dataset
- The dataset is based on https://impact.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html
- The original dataset is a list of diseases and their associated symptoms
- We created a new dataset from part of the original: each row is labelled with a disease, each symptom is a column, and a cell is 1 if the symptom is associated with the disease and 0 otherwise.
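
The conversion script itself is not part of this notebook. One possible way to build such a 0/1 table from a list of disease-symptom pairs is sketched below with `pd.crosstab`; the `pairs` data is made up and is not taken from the original knowledge base.

```python
import pandas as pd

# made-up (disease, symptom) pairs standing in for the original list
pairs = pd.DataFrame(
    {
        "disease": ["asthma", "asthma", "asthma", "pneumonia", "pneumonia"],
        "symptom": ["cough", "wheezing", "shortness of breath", "cough", "fever"],
    }
)

# count each (disease, symptom) pair, then clip to 0/1 in case a pair repeats
binary = pd.crosstab(pairs["disease"], pairs["symptom"]).clip(upper=1)

# move the disease label from the index into a regular column
binary = binary.reset_index()
binary.columns.name = None
print(binary)
```

Each symptom becomes a column, and a disease row holds 1 only for the symptoms listed against it.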

In [43]:
dataset = pd.read_csv('disease.csv')

# display first five rows of dataset
dataset.head()

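| Unnamed: 0 | cough | fever | shortness of breath | pain chest | diarrhea | vomiting | unresponsiveness | asthenia | dyspnea | pain abdominal | ... | cicatrisation | mediastinal shift | impaired cognition | snuffle | chill | headache | guaiac positive | decreased body weight | sore to touch | disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | hypertensive disease |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | coronary heart disease |
| 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | failure heart congestive |
| 3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | asthma |
| 4 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | chronic obstructive airway disease |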
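
The `Unnamed: 0` column in the preview typically appears when a CSV was saved together with its row index. Loaded this way it stays in the frame as an ordinary column, so it is worth checking whether it should count as a feature. Whether `to_valid_list` accounts for it is not visible here, so the sketch below only demonstrates the pandas behavior on a tiny in-memory stand-in, not on `disease.csv`.

```python
import io
import pandas as pd

# tiny made-up stand-in for disease.csv, written with the default row index
toy = pd.DataFrame({"cough": [1, 0], "fever": [0, 1], "disease": ["asthma", "flu"]})
csv_text = toy.to_csv()  # index=True by default, so the first header cell is empty

# default read: the unnamed index column comes back as "Unnamed: 0"
with_extra = pd.read_csv(io.StringIO(csv_text))
# index_col=0 treats the first column as the row index instead
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # ['Unnamed: 0', 'cough', 'fever', 'disease']
print(list(without_extra.columns))  # ['cough', 'fever', 'disease']
```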


### Naive Bayes algorithm with Bernoulli Classifier
- The function below computes the probability of each class (disease) given the features (user symptoms)
- The Bernoulli classifier is used because the features are binary (0/1)
- The model is trained on the entire dataset rather than on a training split, because every disease in the dataset must be available when making predictions and computing probabilities.
- Parameters:
  - user_symptoms: symptoms entered by the user, as a comma-separated string
  - disease: "All diseases" (the default when the user selects nothing from the dropdown) or a specific disease from our dataset, selected from the dropdown menu
    - if "All diseases" is passed, the function returns the disease with the highest probability
    - if a specific disease is selected, the function returns the probability of the symptoms indicating that disease
- Returns a statement with the disease and the probability of the symptoms indicating that disease
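
To make the probability calculation concrete, here is a minimal pure-Python sketch of Bernoulli Naive Bayes with additive smoothing. It is intended to mirror what `BernoulliNB` does with its default settings (`alpha=1.0`, priors learned from the data), but it is an illustration on made-up data, not the library's implementation. The name `bernoulli_nb_posteriors` is hypothetical.

```python
import math


def bernoulli_nb_posteriors(rows, labels, x, alpha=1.0):
    """Posterior P(class | x) under Bernoulli Naive Bayes with additive smoothing."""
    log_scores = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        # log prior: fraction of training rows that belong to this class
        log_score = math.log(len(members) / len(rows))
        for j in range(len(x)):
            # smoothed estimate of P(feature j = 1 | class c)
            p = (sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
            # absent symptoms also count: they contribute (1 - p)
            log_score += math.log(p if x[j] == 1 else 1 - p)
        log_scores[c] = log_score
    # normalize so the posteriors sum to 1
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# made-up data: one row per disease; columns are cough, fever, wheezing
rows = [[1, 0, 1], [1, 1, 0]]
labels = ["asthma", "pneumonia"]
posteriors = bernoulli_nb_posteriors(rows, labels, x=[1, 0, 1])
print(posteriors)
```

On this toy data the posterior for asthma is about 0.8 and for pneumonia about 0.2. Because the posteriors over all diseases sum to 1, the highest-ranked disease can still have a modest probability when there are many classes.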

In [44]:
def NaiveBayes(user_symptoms, disease):
    """Return a statement with the predicted (or requested) disease and its probability.

    user_symptoms: comma-separated symptom string entered by the user
    disease: "All diseases" or the name of a disease in the dataset
    """
    # convert the user symptoms into a list of 0s and 1s, wrapped as a single sample
    symptoms_list = [to_valid_list(user_symptoms)]

    # X = features of the dataset (all columns except the target)
    X = dataset.drop(columns=['disease'])
    # y = target (classes) of the dataset
    y = dataset['disease']

    # fit the Bernoulli classifier on the full dataset (all diseases)
    model = BernoulliNB()
    model.fit(X, y)

    # predict the target class (disease) for the user symptoms
    result = model.predict(symptoms_list)[0]
    # probabilities of all classes (diseases) for the user symptoms, one column per disease
    prob = pd.DataFrame(model.predict_proba(symptoms_list), columns=model.classes_)

    # return the disease with the highest probability, together with that probability
    if disease == "All diseases":
        return "Based on the symptoms, the predicted disease is {}. The probability of your symptoms indicating {} is {}%".format(result, result, prob.at[0, result]*100)

    # otherwise return the probability of the specific disease selected by the user
    return "The probability of your symptoms indicating {} is {}%".format(disease, prob.at[0, disease]*100)
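
The sketch below shows, on made-up data, how the columns of `predict_proba` line up with `model.classes_`, and how the full set of probabilities can be ranked. It also wraps the query in a DataFrame with the training column names; recent scikit-learn versions typically warn when a model fitted on a DataFrame receives a plain list. The toy data and variable names are assumptions for illustration only.

```python
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

# made-up training data in the same layout as the dataset: 0/1 symptoms plus a label
toy = pd.DataFrame(
    {
        "cough": [1, 1, 0],
        "fever": [0, 1, 1],
        "wheezing": [1, 0, 0],
        "disease": ["asthma", "pneumonia", "influenza"],
    }
)
X = toy.drop(columns=["disease"])
y = toy["disease"]
model = BernoulliNB().fit(X, y)

# wrap the 0/1 vector in a DataFrame so its columns match the training features
query = pd.DataFrame([[1, 0, 1]], columns=X.columns)

# one probability per class, labelled via model.classes_ and sorted high to low
ranked = pd.Series(model.predict_proba(query)[0], index=model.classes_)
ranked = ranked.sort_values(ascending=False)
print(ranked)
```

The first entry of `ranked` is the same disease that `model.predict` returns, and looking up any other disease name gives the value the function reports for a specific disease.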

### Sample Function Output

- The user enters symptoms and checks against 'All diseases'
- Call NaiveBayes() to get the disease with the highest probability

In [45]:
user_sym = "cough, shortness of breath, wheezing"
print(NaiveBayes(user_sym, "All diseases"))

Based on the symptoms, the predicted disease is asthma. The probability of your symptoms indicating asthma is 38.795226368630445%


----------------------------------------------------------------------------------------------------------------------
- The user enters symptoms and a specific disease to check against (in this case, 'pneumonia')
- Call NaiveBayes() to get the probability of that specific disease

In [46]:
user_sym = "cough, shortness of breath, wheezing"
print(NaiveBayes(user_sym, "pneumonia"))

The probability of your symptoms indicating pneumonia is 4.849403296078814%
