## ***Let's first understand what is going on in the algorithm.***

Naive Bayes is a classification algorithm that uses Bayes' rule to classify. The algorithm is first trained on a training dataset, then tuned using cross-validation data, and finally tested on test data. Let's dig deeper into the mathematics of this algorithm:

**Conditional probability :**

$$ P(A|B) = \frac {P(A\cap B)}{P(B)} $$

The above equation means that the probability of A given B equals the probability of both A and B occurring, divided by the probability of B.

Similarly we can write $P(B|A)$ as:

$$ P(B|A) = \frac {P(A\cap B)}{P(A)} $$

Rearranging the above equation gives:

$$ P(B|A)\ .P(A) = P(A\cap B)$$

Substituting this value of $P(A \cap B)$ into the initial equation:

$$ P(A|B) = \frac {P(B|A)\ .P(A)}{P(B)} $$

**The equation above is Bayes' rule.**

1. **P(A|B)** is called Posterior probability 
2. **P(A)** is called Prior probability
3. **P(B)** is called Normalizing probability
4. **P(B|A)** is called Likelihood probability
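
As a quick sanity check of Bayes' rule, the sketch below plugs in made-up numbers. The helper name `bayes_posterior` and the prior and likelihood values are assumptions chosen only for illustration; they are not taken from any dataset.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) for a binary event A using Bayes' rule."""
    prior_not_a = 1.0 - prior_a
    # Normalizing probability P(B), expanded over A and not-A
    evidence = likelihood_b_given_a * prior_a + likelihood_b_given_not_a * prior_not_a
    # Posterior = likelihood * prior / normalizing probability
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical numbers: P(A) = 0.4, P(B|A) = 0.9, P(B|not A) = 0.2
posterior = bayes_posterior(0.4, 0.9, 0.2)
print(round(posterior, 3))  # 0.36 / (0.36 + 0.12) = 0.75
```

Even though the prior of A is only 0.4, observing B raises the posterior to 0.75 because B is much more likely under A than under not-A.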

## Generally in NBC, these cases can appear:

**1. B is UNIVARIATE**
- B is Discrete
    - Binomial
    - Multinomial
- B is continuous

**2. B is MULTI-VARIATE**
- all features in B are continuous
- all features in B are discrete - binomial or multinomial
- some features in B are discrete and some are continuous

### **DATASET : Breast_Cancer_Dataset**

# CASE 1 : B is UNIVARIATE CONTINUOUS

As you can see below, our dataset has 33 columns, but we are taking **B => `radius_mean`** and  **A => `diagnosis`** for this case.

- $\textbf{P(diagnosis='M' | radiusMean=}x_i\textbf{) = }$

$$ \frac{P(radiusMean=x_i \mid diagnosis='M').P(diagnosis='M')}{P(radiusMean=x_i \mid diagnosis='M').P(diagnosis='M') + P(radiusMean=x_i \mid diagnosis='B').P(diagnosis='B') }$$

- $\textbf{P(diagnosis='B' | radiusMean=}x_i\textbf{) = }$

$$ \frac{P(radiusMean=x_i \mid diagnosis='B').P(diagnosis='B')}{P(radiusMean=x_i \mid diagnosis='M').P(diagnosis='M') + P(radiusMean=x_i \mid diagnosis='B').P(diagnosis='B') }$$

Let's see how each of the four probabilities in the two expressions above is calculated for this case:

1. $\textbf{P(A|B)}$ is the posterior probability, that is, the probability of the tumor being 'B' or 'M' for a given radius_mean value.
<br>

2. $\textbf{P(A)}$ is the prior probability: the probability of the tumor being 'B' or 'M' regardless of radius_mean. It is calculated as the relative frequency of 'B' or 'M' in the training data (the relative frequency in a sample is used as the estimate of the probability in the population).
<br>

3. $\textbf{P(B)}$, the normalizing probability, is rewritten as the denominator in the two expressions above. P(B) is the probability that radius_mean = x (some value), obtained by adding the likelihood of radius_mean = x for tumor 'B' weighted by P('B') and the likelihood of radius_mean = x for tumor 'M' weighted by P('M').<br>These values are calculated from the data itself.
<br>

4. $\textbf{P(B|A)}$, the likelihood probability, is the probability of radius_mean = x given that A = 'B' or A = 'M', depending on which class you take for A. Because radius_mean is continuous, it is evaluated as a Gaussian density whose mean and standard deviation are estimated per class.
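
The four quantities can be put together in a few lines. The sketch below is a minimal illustration on synthetic numbers, not the tumor data: the arrays `radius_b` and `radius_m`, their means and spreads, and the helper name `posteriors` are all assumptions made up for the example. It shows that the two posteriors share the same denominator and therefore sum to 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for radius_mean per class (illustrative assumptions)
radius_b = rng.normal(loc=12.0, scale=1.8, size=300)  # pretend 'B' values
radius_m = rng.normal(loc=17.5, scale=3.2, size=200)  # pretend 'M' values

# Priors: relative frequency of each class
prior_b = len(radius_b) / (len(radius_b) + len(radius_m))
prior_m = 1.0 - prior_b

# Likelihood parameters: one Gaussian per class
mu_b, sigma_b = radius_b.mean(), radius_b.std()
mu_m, sigma_m = radius_m.mean(), radius_m.std()

def posteriors(x):
    """Return (P('B' | x), P('M' | x)) under the per-class Gaussian assumption."""
    like_b = stats.norm.pdf(x, loc=mu_b, scale=sigma_b)
    like_m = stats.norm.pdf(x, loc=mu_m, scale=sigma_m)
    evidence = like_b * prior_b + like_m * prior_m  # normalizing probability
    return like_b * prior_b / evidence, like_m * prior_m / evidence

post_b, post_m = posteriors(np.array([10.0, 14.0, 20.0]))
print(post_b)
print(post_b + post_m)  # each entry should be 1 up to rounding
```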

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as s
import matplotlib.pyplot as plt

### Prepping DF

In [None]:
data = pd.read_csv('/home/pramila/Desktop/DataSets/TumorData.csv')

In [None]:
datacp = data.copy()

In [None]:
datacp.head()

For this case, we first take the `diagnosis` and `radius_mean` columns.

In [None]:
datacp1 = datacp.iloc[:,1:3]

In [None]:
datacp1.shape

### Dividing the data for testing, cross-validation and training

In [None]:
datacp1_training = datacp1.iloc[:int(0.7*len(datacp1)), :]

In [None]:
remaining_data = datacp1.iloc[int(0.7*len(datacp1)):, :]

In [None]:
cross_validation_data = remaining_data.iloc[:int(0.5*len(remaining_data)), :]

In [None]:
testing_data = remaining_data.iloc[int(0.5*len(remaining_data)):, :]
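
The split above takes rows in file order. If the file happened to be sorted (for example by diagnosis), the three parts could end up with different class mixes, so shuffling first is a common precaution. The helper `split_frame` below is a hypothetical sketch of the same 70 / 15 / 15 split with a shuffle added; the demo frame is made up.

```python
import pandas as pd

def split_frame(df, train_frac=0.7, seed=0):
    """Shuffle rows, then split into training / cross-validation / test parts."""
    # sample(frac=1.0) returns all rows in random order; the seed keeps it reproducible
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(train_frac * len(shuffled))
    remaining = shuffled.iloc[n_train:]
    n_cv = int(0.5 * len(remaining))
    return shuffled.iloc[:n_train], remaining.iloc[:n_cv], remaining.iloc[n_cv:]

# Tiny made-up frame with the same two columns used in this case
demo = pd.DataFrame({"diagnosis": list("BMBMBMBMBM") * 2, "radius_mean": range(20)})
train, cv, test = split_frame(demo)
print(len(train), len(cv), len(test))  # 14 3 3
```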

### Prior Probabilities for each class

In [None]:
# check the column names before calculating the prior probabilities
datacp1_training.columns

- `pr_pb_b` - prior probability of diagnosis = 'B'

- `pr_pb_m` - prior probability of diagnosis = 'M'

In [None]:
pr_pb_b = len(datacp1_training.loc[datacp1_training['diagnosis']=='B', 'diagnosis']) / len(datacp1_training)
pr_pb_m = len(datacp1_training.loc[datacp1_training['diagnosis']=='M', 'diagnosis']) / len(datacp1_training)

In [None]:
pr_pb_b

In [None]:
pr_pb_m

### Parameters' values for Likelihood Probability for each class

In [None]:
all_benign_data = datacp1_training.loc[datacp1_training['diagnosis'] == 'B', 'radius_mean']
all_malign_data = datacp1_training.loc[datacp1_training['diagnosis'] == 'M', 'radius_mean']

mu_b = np.array(all_benign_data).mean()
sigma_b = np.array(all_benign_data).std()

mu_m = np.array(all_malign_data).mean()
sigma_m = np.array(all_malign_data).std()

In [None]:
mu_b

### Posterior Probability 

- `p_x_a_equal_B` - Likelihood probability of **x** for class 'B'

- `p_x_a_equal_M` - Likelihood probability of **x** for class 'M'

In [None]:
def posterior_probability(x):
    '''Takes each value of x and returns posterior probability of 'B' for the given x
    '''
    p_x_a_equal_B = s.norm.pdf(x, loc=mu_b, scale=sigma_b)
    p_x_a_equal_M = s.norm.pdf(x, loc=mu_m, scale=sigma_m)
    pp_B_given_x = ((p_x_a_equal_B)*(pr_pb_b)) / ((p_x_a_equal_B*pr_pb_b) + (p_x_a_equal_M*pr_pb_m))
    return pp_B_given_x

`posterior_p_c1_cvd` : Posterior probability for case 1, on Cross Validation Data

In [None]:
posterior_p_c1_cvd = posterior_probability(cross_validation_data.iloc[:, 1])

### Comparing predicted values with original values

#### - Cross Validation data

In [None]:
predicted_values_cvd= np.uint(posterior_p_c1_cvd > 0.5)

In [None]:
# encode labels as 1 ('B') and 0 ('M'); reassigning avoids modifying a slice in place
cross_validation_data = cross_validation_data.replace({'diagnosis': {'B': 1, 'M': 0}})

In [None]:
#cross_validation_data

`accuracy_c1_cv` : Accuracy on Cross-Validation data for case 1

In [None]:
accuracy_c1_cv = np.count_nonzero(np.uint(predicted_values_cvd == cross_validation_data.iloc[:,0])) / len(cross_validation_data)

#### - training data

In [None]:
# encode labels as 1 ('B') and 0 ('M'); reassigning avoids modifying a slice in place
datacp1_training = datacp1_training.replace({'diagnosis': {'B': 1, 'M': 0}})

In [None]:
#datacp1_training

`posterior_p_c1_td` : posterior probability for x in training data, for case 1

In [None]:
posterior_p_c1_td = posterior_probability(datacp1_training.iloc[:,1])

In [None]:
#posterior_p_c1_td

In [None]:
predicted_values_td = np.uint(posterior_p_c1_td > 0.5)

In [None]:
#predicted_values_td

In [None]:
accuracy_c1_td = np.count_nonzero(np.uint(predicted_values_td == datacp1_training.iloc[:,0]))/len(datacp1_training)
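
The accuracy calculation is repeated for each data split, so it can be wrapped in a small helper. The function `accuracy_from_posteriors` below is a hypothetical sketch, and the demo posteriors and labels are made up. With labels encoded as 1 for 'B' and 0 for 'M', the same helper could be applied to `testing_data`, which is set aside above but not yet evaluated.

```python
import numpy as np

def accuracy_from_posteriors(posterior_b, labels, threshold=0.5):
    """Fraction of rows where the thresholded posterior of 'B' matches the label (1='B', 0='M')."""
    predicted = (np.asarray(posterior_b) > threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels)))

# Made-up posteriors and labels, only to show the call
demo_posteriors = [0.92, 0.10, 0.55, 0.40]
demo_labels = [1, 0, 0, 0]
print(accuracy_from_posteriors(demo_posteriors, demo_labels))  # 3 of 4 correct: 0.75
```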

# CASE 2 : B is MULTIVARIATE, all features are continuous

The prior probabilities remain the same for both 'B' and 'M'. We just have to calculate the posterior probability using the normalizing and likelihood probabilities.
Since B in P(A|B) is now a multivariate continuous random variable, we need to calculate the covariance matrix from the training data and use it to find the joint probability density of the features, and from that the posterior probability. Let's see how this works mathematically:

**B => `radius_mean` $\cap$ `texture_mean`**

**A => `diagnosis`**

$$ P(A='B'\mid B) = \frac{P(B\mid A='B').P(A='B')}{P(B\mid A='M').P(A='M') + P(B\mid A='B').P(A='B')}$$

**To calculate** $P(B \mid A)$ :

$$ P(radius\_mean=x \cap texture\_mean=y \mid A) \ = \ \frac{1}{ ({\sqrt{2\pi}})^n  {\sqrt{|\Sigma |}}} \exp\left( -\frac{1}{2} \begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix} \right) $$

Here $\Sigma$ represents the covariance matrix and $n$ is the number of features (here $n = 2$). The means and $\Sigma$ are estimated separately for each class A.
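
To make the density formula concrete, the sketch below evaluates it term by term with NumPy and compares the result with `scipy.stats.multivariate_normal.pdf`. The helper name `mvn_density` and the mean, covariance and query point are made-up assumptions, not values estimated from the tumor data.

```python
import numpy as np
from scipy import stats

def mvn_density(point, mean, cov):
    """Evaluate the multivariate normal density directly from the formula."""
    point = np.asarray(point, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.shape[0]                      # number of features
    diff = point - mean                    # (x - mu)
    norm_const = 1.0 / (np.sqrt(2.0 * np.pi) ** n * np.sqrt(np.linalg.det(cov)))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using solve instead of an explicit inverse
    quad = diff @ np.linalg.solve(cov, diff)
    return norm_const * np.exp(-0.5 * quad)

# Made-up parameters for two features
demo_mean = np.array([12.0, 18.0])
demo_cov = np.array([[3.0, 0.5], [0.5, 16.0]])
demo_point = np.array([13.0, 20.0])

by_hand = mvn_density(demo_point, demo_mean, demo_cov)
by_scipy = stats.multivariate_normal.pdf(demo_point, mean=demo_mean, cov=demo_cov)
print(by_hand, by_scipy)  # the two values should agree up to rounding
```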

### Prepping DF

In [None]:
datacp.head()

In [None]:
datacp2 = datacp.iloc[:,:4]

In [None]:
datacp2.head()

### Dividing the data for testing, cross-validation and training

In [None]:
datacp2_training = datacp2.iloc[:int(0.7*len(datacp2)),:]
remaining_data_cp2 = datacp2.iloc[int(0.7*len(datacp2)):,:]
datacp2_cv = remaining_data_cp2.iloc[: int(0.5*len(remaining_data_cp2)),:]
datacp2_test = remaining_data_cp2.iloc[int(0.5*len(remaining_data_cp2)):, :]

### Parameters for each class: Covariance Matrix and Mean

In [None]:
benign_data = datacp2_training.loc[datacp2_training['diagnosis'] == 'B', ['radius_mean', 'texture_mean']]
malign_data = datacp2_training.loc[datacp2_training['diagnosis'] == 'M', ['radius_mean', 'texture_mean']]

In [None]:
benign_data

#### Means for each class 

In [None]:
mu_b = np.array(benign_data.iloc[:,:].mean())
mu_m = np.array(malign_data.iloc[:,:].mean())

In [None]:
mu_b

In [None]:
mu_m

#### Covariance Matrices for each class

In [None]:
cov_b = np.array(benign_data.iloc[:,:].cov())

In [None]:
cov_m = np.array(malign_data.iloc[:,:].cov())

### Posterior Probability 

**`p_of_B_given_A_equal_b`**: Likelihood prob. of B = x, when given that diagnosis is benign

**`p_of_B_given_A_equal_m`**: Likelihood prob. of B = x, when given that diagnosis is malignant

**`pp_of_A_equal_b_given_B`**: Posterior prob. of A = 'B' (benign) for given value of B = x

In [None]:
def posterior_probability_cp2(x):
    '''Takes rows of (radius_mean, texture_mean) and returns posterior probability of 'B' for each row
    '''
    p_of_B_given_A_equal_b = s.multivariate_normal.pdf(x, mean=mu_b, cov=cov_b)
    p_of_B_given_A_equal_m = s.multivariate_normal.pdf(x, mean=mu_m, cov=cov_m)
    pp_of_A_equal_b_given_B = (pr_pb_b * p_of_B_given_A_equal_b) / ((pr_pb_b * p_of_B_given_A_equal_b) + (pr_pb_m * p_of_B_given_A_equal_m))
    return pp_of_A_equal_b_given_B

In [None]:
datacp2_cv

**`posterior_probabilities_cp2`** : posterior probabilities for case 2

In [None]:
posterior_probabilities_cp2 = posterior_probability_cp2(datacp2_cv.iloc[:,2:])

In [None]:
answer_from_algo = (np.uint(posterior_probabilities_cp2 > 0.5))  #predicted answers

In [None]:
answer_from_algo

In [None]:
# encode labels as 1 ('B') and 0 ('M'); reassigning avoids modifying a slice in place
datacp2_cv = datacp2_cv.replace({'diagnosis': {'B': 1, 'M': 0}})

In [None]:
#datacp2_cv

In [None]:
actual_answers = datacp2_cv['diagnosis'] #actual answers

### Comparing predicted VS actual

In [None]:
accuracy = np.count_nonzero(np.uint(actual_answers == answer_from_algo))/len(answer_from_algo)

In [None]:
accuracy

## Case - Tumor Dataset + all features + all features continuous

$$ P(A|B) = \frac {P(B|A)\ .P(A)}{P(B)} $$

Let's see how each of the four probabilities is calculated for this case:

1. $\textbf{P(A|B)}$ is the <b>Posterior Prob.</b>, i.e. the probability of the tumor being 'B' or 'M' for a given feature vector value.<br>
<br>

2. $\textbf{P(A)}$ is the <b>Prior probability</b>: the probability of the tumor being 'B' or 'M' regardless of the feature vector. It is calculated as the relative frequency of 'B' or 'M' (the relative frequency in a sample is used as the estimate of the probability in the population).
<br>

3. $\textbf{P(B)}$, the <b>Normalizing Probability</b>, is rewritten as $P(B) = P(B \mid tumor='B').P(tumor='B') + P(B \mid tumor='M').P(tumor='M')$. Since we are now taking all the columns into account, B is equal to radius_mean=x $\cap$ texture_mean=y $\cap$ ... $\cap$ last_feature=something.
<br>

4. $\textbf{P(B|A)}$, the <b>Likelihood Probability</b>, is the probability of that feature vector given A = 'B' or A = 'M', depending on which class you take for A.
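
With many features the joint density values can become very small, so one common precaution is to work with log densities and normalize with log-sum-exp. The sketch below illustrates this on synthetic data. The feature count, the class means and spreads, and the helper name `posterior_benign_all_features` are assumptions for the example, not the tumor data or the referenced notebook's method.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(1)
n_features = 5  # stand-in; the real dataset has more feature columns
# Synthetic feature matrices per class (illustrative assumptions)
features_b = rng.normal(loc=0.0, scale=1.0, size=(200, n_features))
features_m = rng.normal(loc=2.0, scale=1.5, size=(150, n_features))

# Priors from relative class frequencies
prior_b = len(features_b) / (len(features_b) + len(features_m))
prior_m = 1.0 - prior_b

# Per-class mean vector and covariance matrix
mean_b, cov_b = features_b.mean(axis=0), np.cov(features_b, rowvar=False)
mean_m, cov_m = features_m.mean(axis=0), np.cov(features_m, rowvar=False)

def posterior_benign_all_features(rows):
    """P(A='B' | feature vector) for each row, computed in log space."""
    log_joint_b = stats.multivariate_normal.logpdf(rows, mean=mean_b, cov=cov_b) + np.log(prior_b)
    log_joint_m = stats.multivariate_normal.logpdf(rows, mean=mean_m, cov=cov_m) + np.log(prior_m)
    # log P(B) = log( exp(log_joint_b) + exp(log_joint_m) ), evaluated stably
    log_evidence = logsumexp(np.vstack([log_joint_b, log_joint_m]), axis=0)
    return np.exp(log_joint_b - log_evidence)

# Two query rows: one near the synthetic 'B' centre, one near the 'M' centre
demo_rows = np.vstack([np.zeros(n_features), np.full(n_features, 2.0)])
print(posterior_benign_all_features(demo_rows))
```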

### => Reach : NBC-BreastCancerData.ipynb in Assignment-ML repo

# Case - Multivariate + Multinomial data 

### => Reach : NBC-MushroomDataset.ipynb + NBC-MushroomDataset-allColumns.ipynb in Assignment-ML repo