### Likelihood
1) Likelihood measures how probable the observed data are under a given set of model parameters, i.e. how plausible those parameter values are given the data. For a single observation with binary label Y ∈ {0, 1}:

<b>Likelihood = Y * P(Y=1) + (1-Y) * P(Y=0)</b>

2) The objective is to find the parameters that maximize this likelihood.
This is called Maximum Likelihood Estimation (MLE).
 
3) <b>Log_likelihood = -[Y * log(P(Y=1)) + (1-Y) * log(P(Y=0))]</b><br>
Because of the leading minus sign, this is strictly the <b>negative</b> log-likelihood, so it needs to be minimised. <br>
This equation is the cost function for logistic regression.

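4) Because log is strictly increasing, maximising the likelihood is equivalent to maximising the log-likelihood, i.e. to minimising the negative log-likelihood. Taking the log also turns the product of per-sample likelihoods into a sum, which is easier to optimise.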

5) The negative log-likelihood, averaged over the samples, is also known as <b>Binary Cross-Entropy</b>. It is the standard loss for binary classification in ANNs (Artificial Neural Networks).
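A minimal sketch of points 1 to 5 in plain Python (standard library only). The helper names are hypothetical, and the last part assumes a single shared parameter `p` for all samples, which is a simplification for illustration rather than the logistic model itself:

```python
import math

def likelihood(y, p1):
    # Per-sample likelihood as written above: Y * P(Y=1) + (1 - Y) * P(Y=0)
    return y * p1 + (1 - y) * (1 - p1)

def bernoulli_pmf(y, p1):
    # The usual Bernoulli form: P(Y=1)^Y * P(Y=0)^(1 - Y)
    return p1 ** y * (1 - p1) ** (1 - y)

def neg_log_likelihood(y, p1):
    # -[Y * log(P(Y=1)) + (1 - Y) * log(P(Y=0))] for one sample
    return -(y * math.log(p1) + (1 - y) * math.log(1 - p1))

def binary_cross_entropy(ys, ps):
    # Binary cross-entropy: mean negative log-likelihood over the samples
    return sum(neg_log_likelihood(y, p) for y, p in zip(ys, ps)) / len(ys)

# For labels Y in {0, 1} the two likelihood forms agree, and the
# negative log-likelihood is simply -log of the per-sample likelihood
for y in (0, 1):
    for p in (0.2, 0.5, 0.9):
        assert math.isclose(likelihood(y, p), bernoulli_pmf(y, p))
        assert math.isclose(neg_log_likelihood(y, p), -math.log(likelihood(y, p)))

# log is strictly increasing, so (on this grid of candidate values) the p that
# maximises the product of likelihoods also minimises the summed NLL
ys = [1, 0, 1, 1]
grid = [i / 100 for i in range(1, 100)]
best_by_likelihood = max(grid, key=lambda p: math.prod(likelihood(y, p) for y in ys))
best_by_nll = min(grid, key=lambda p: sum(neg_log_likelihood(y, p) for y in ys))
print(best_by_likelihood, best_by_nll)  # 0.75 0.75
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

The grid optimum 0.75 is the fraction of 1s in `ys`, which is the closed-form MLE for a single Bernoulli parameter.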

#### Mathematical Equations

<img src="log_reg3.png" align="left">
<img src="log_reg4.png" align="left">

Likelihood = Y * P(Y=1) + (1-Y) * P(Y=0)

Log_likelihood =  -[Y * log(P(Y=1)) + (1-Y) * log(P(Y=0))]

<img src="log_reg6.png" align="left">


<img src="log_reg7.png" align="left">




<img src="log_reg9.png" align="left">
<img src="log_reg10.png" align="left">
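A compact written-out version of the derivation, assuming n independent samples with labels y_i ∈ {0, 1} and the logistic model p_i = σ(θᵀx_i). The symbols θ, x_i and σ are notation introduced here and may not match the figures exactly:

$$
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i},
  \qquad p_i = P(Y=1 \mid x_i; \theta) = \sigma(\theta^\top x_i) = \frac{1}{1 + e^{-\theta^\top x_i}} \\
\log L(\theta) &= \sum_{i=1}^{n} \Big[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\Big] \\
J(\theta) &= -\frac{1}{n} \log L(\theta)
  = -\frac{1}{n} \sum_{i=1}^{n} \Big[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\Big] \\
\frac{\partial J}{\partial \theta} &= \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)\, x_i
\end{aligned}
$$

Each term inside J(θ) is the per-sample Log_likelihood expression with P(Y=1) = p_i and P(Y=0) = 1 - p_i. The gradient follows from σ'(z) = σ(z)(1 - σ(z)).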

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

```python
# Original dataset: true labels (y_true) and predicted probability of class 1 (P1)
data = {'y_true': [0,   0,   1,   0,   1,   0,   1,   1,   0,   1],
        'P1':     [0.3, 0.4, 0.7, 0.6, 0.8, 0.2, 0.2, 0.9, 0.7, 0.8]}
df = pd.DataFrame(data)
df.head()
```

|   | y_true | P1  |
|---|--------|-----|
| 0 | 0      | 0.3 |
| 1 | 0      | 0.4 |
| 2 | 1      | 0.7 |
| 3 | 0      | 0.6 |
| 4 | 1      | 0.8 |


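```python
# Modified dataset: P1 changed in row 3 (0.6 -> 0.1) and row 5 (0.2 -> 0.7).
# Uncomment to rerun the cells below on it.
# data = {'y_true': [0,   0,   1,   0,   1,   0,   1,   1,   0,   1],
#         'P1':     [0.3, 0.4, 0.7, 0.1, 0.8, 0.7, 0.2, 0.9, 0.7, 0.8]}
# df = pd.DataFrame(data)
# df.head()
```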

```python
# Probability of class 0 is the complement of the probability of class 1
df['P0'] = 1 - df['P1']
df = df[['y_true', 'P0', 'P1']]
df.head(7)
# 3,5,6
```

|   | y_true | P0  | P1  |
|---|--------|-----|-----|
| 0 | 0      | 0.7 | 0.3 |
| 1 | 0      | 0.6 | 0.4 |
| 2 | 1      | 0.3 | 0.7 |
| 3 | 0      | 0.4 | 0.6 |
| 4 | 1      | 0.2 | 0.8 |
| 5 | 0      | 0.8 | 0.2 |
| 6 | 1      | 0.8 | 0.2 |


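```python
# Per-sample likelihood: P1 when y_true = 1, P0 when y_true = 0
df['Likelihood'] = df['y_true']*df['P1'] + (1-df['y_true'])*df['P0']
# Negative of the likelihood (not the negative log-likelihood)
df['Neg_likelihood'] = -1*df['Likelihood']
# Log_likelihood = -[Y * log(P(Y=1)) + (1-Y) * log(P(Y=0))], i.e. the negative log-likelihood
df['Log_likelihood'] = -df['y_true']*np.log(df['P1']) - (1-df['y_true'])*np.log(df['P0'])
df.head()
```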

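|   | y_true | P0  | P1  | Likelihood | Neg_likelihood | Log_likelihood |
|---|--------|-----|-----|------------|----------------|----------------|
| 0 | 0      | 0.7 | 0.3 | 0.7        | -0.7           | 0.356675       |
| 1 | 0      | 0.6 | 0.4 | 0.6        | -0.6           | 0.510826       |
| 2 | 1      | 0.3 | 0.7 | 0.7        | -0.7           | 0.356675       |
| 3 | 0      | 0.4 | 0.6 | 0.4        | -0.4           | 0.916291       |
| 4 | 1      | 0.2 | 0.8 | 0.8        | -0.8           | 0.223144       |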


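```python
# Note: a sum of per-sample likelihoods is only a rough score;
# the likelihood of the whole dataset is their product
ll_sum = df['Likelihood'].sum()
nll_sum = df['Neg_likelihood'].sum()
logll_sum = df['Log_likelihood'].sum()  # summed negative log-likelihood
print('likelihood sum', ll_sum)
print('nll_sum', nll_sum)
print('logll_sum', logll_sum)
```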

```
likelihood sum 6.199999999999999
nll_sum -6.199999999999999
logll_sum 5.728668129878102
```
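To connect the per-row values to the dataset level: for independent samples, the likelihood of the whole dataset is the product of the per-sample likelihoods, and -log of that product equals the summed `Log_likelihood` column. A standard-library sketch with the values copied from the DataFrame above (variable names are illustrative):

```python
import math

# Same values as the DataFrame above: true labels and predicted P(Y=1)
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
p1 = [0.3, 0.4, 0.7, 0.6, 0.8, 0.2, 0.2, 0.9, 0.7, 0.8]

# Per-sample likelihood: P(Y=1) if y = 1, otherwise P(Y=0) = 1 - P(Y=1)
per_sample = [p if y == 1 else 1 - p for y, p in zip(y_true, p1)]

# Independent samples: the dataset likelihood is the product of the per-sample values
dataset_likelihood = math.prod(per_sample)

# -log turns the product into a sum of per-sample negative log-likelihoods
nll_total = -math.log(dataset_likelihood)
nll_sum_of_terms = sum(-math.log(lik) for lik in per_sample)

# Binary cross-entropy is the mean negative log-likelihood
bce = nll_total / len(y_true)

print(dataset_likelihood, nll_total, nll_sum_of_terms, bce)
```

With many samples the raw product of probabilities below 1 underflows toward 0, which is one practical reason to work with the summed log-likelihood instead.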


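```python
# Sums recorded from a different version of the data
# (they do not match either the original or the "Modified" dataset above)
# likelihood sum 5.699999999999999
# nll_sum -5.699999999999999
# logll_sum 6.709497382889828
```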
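The original and "Modified" datasets can be compared directly to see why the log form is preferred. A standard-library sketch; the helper name `likelihood_sum_and_nll` is hypothetical:

```python
import math

def likelihood_sum_and_nll(y_true, p1):
    # Returns (sum of per-sample likelihoods, sum of per-sample negative log-likelihoods)
    per_sample = [p if y == 1 else 1 - p for y, p in zip(y_true, p1)]
    return sum(per_sample), -sum(math.log(lik) for lik in per_sample)

y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
p1_original = [0.3, 0.4, 0.7, 0.6, 0.8, 0.2, 0.2, 0.9, 0.7, 0.8]
p1_modified = [0.3, 0.4, 0.7, 0.1, 0.8, 0.7, 0.2, 0.9, 0.7, 0.8]  # rows 3 and 5 changed

lik_orig, nll_orig = likelihood_sum_and_nll(y_true, p1_original)
lik_mod, nll_mod = likelihood_sum_and_nll(y_true, p1_modified)

print(f"original: likelihood sum = {lik_orig:.4f}, NLL sum = {nll_orig:.4f}")
print(f"modified: likelihood sum = {lik_mod:.4f}, NLL sum = {nll_mod:.4f}")
```

Both likelihood sums come out at 6.2, yet the NLL sum rises from about 5.73 to about 5.90. Row 3 improves from 0.4 to 0.9 and row 5 drops from 0.8 to 0.3; the plain sum treats these as cancelling, while the log penalises the drop more than it rewards the gain.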

#### Probability vs Likelihood
1) Probability treats the parameters as fixed and asks how likely a particular outcome (or dataset) is, written P(data | θ).

2) Likelihood treats the observed data as fixed and reads the same expression as a function of the parameters, L(θ | data) = P(data | θ). We vary the parameters of the assumed distribution to find the values under which the observed data are most probable.
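A small coin-flip sketch of the distinction. The coin example, the candidate values and the helper `sequence_probability` are illustrative assumptions:

```python
import math

def sequence_probability(outcomes, p):
    # Probability of an observed 0/1 sequence from independent flips with P(1) = p
    return math.prod(p if o == 1 else 1 - p for o in outcomes)

# Probability: the parameter is fixed (p = 0.7) and we ask about different outcomes
print(sequence_probability([1, 1, 0], 0.7))
print(sequence_probability([1, 1, 1], 0.7))

# Likelihood: the observed data are fixed and we vary the parameter p
observed = [1, 1, 0]
candidates = [0.3, 0.5, 2 / 3, 0.9]
for p in candidates:
    print(f"L(p={p:.2f} | data) = {sequence_probability(observed, p):.4f}")

best = max(candidates, key=lambda p: sequence_probability(observed, p))
print(f"most likely p among the candidates: {best:.2f}")
```

Among the candidates, p = 2/3 (the observed fraction of 1s) has the highest likelihood. Likelihood values across different p are not a probability distribution over p and need not sum to 1.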

#### MLE (Maximum Likelihood Estimation)

1) Maximum likelihood estimation defines a likelihood function that gives the (conditional) probability of observing the data sample under a chosen probability distribution and its parameters, and then picks the parameter values that maximize it.

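2) The maximum likelihood estimator generalizes readily to the case where the goal is to estimate a conditional probability P(y | x; θ) in order to predict y given x. Logistic regression is an example: it models P(y = 1 | x; θ) and fits θ by maximizing this conditional likelihood.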
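A minimal sketch of MLE for the conditional model P(y | x; θ) using logistic regression on one feature: θ = (w, b), p = sigmoid(w·x + b), fitted by plain gradient descent on the mean negative log-likelihood. The toy data, learning rate and step count are arbitrary illustrative choices; in practice a library optimiser would normally be used:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(w, b, xs, ys):
    # Summed -[y*log(p) + (1-y)*log(1-p)] with p = P(y=1 | x; w, b) = sigmoid(w*x + b)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic_mle(xs, ys, lr=0.5, steps=5000):
    # Maximise the likelihood by gradient descent on the mean negative log-likelihood
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]  # p_i - y_i
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy 1-D data that is not perfectly separable (with separable data the
# likelihood keeps growing as |w| grows, so no finite MLE exists)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

w, b = fit_logistic_mle(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
print("NLL at (0, 0):", neg_log_likelihood(0.0, 0.0, xs, ys))
print("NLL at MLE:   ", neg_log_likelihood(w, b, xs, ys))
```

At the optimum the gradient vanishes, so the residuals p_i - y_i sum to zero: the mean predicted probability matches the observed fraction of 1s.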

#### A Simple Explanation of Likelihood and MLE
https://machinelearningmastery.com/what-is-maximum-likelihood-estimation-in-machine-learning/