# Logistic Regression

## Probability, Odds, log(Odds)

In [4]:
# example of converting between probability and log-odds
from math import log
from math import exp

# define our probability of success
prob = 0.8
print('Probability %.1f' % prob)

# convert probability to odds
odds = prob / (1 - prob)
print('Odds %.1f' % odds)

# convert back to probability
#prob = odds / (odds + 1)
prob = 1 / ( 1+ (1/odds))
print('Probability %.1f' % prob)

# convert odds to log-odds
logodds = log(odds)
print('Log-Odds %.1f' % logodds)

# convert log-odds to a probability
prob = 1 / (1 + exp(-logodds))
print('Probability %.1f' % prob)

Probability 0.8
Odds 4.0
Probability 0.8
Log-Odds 1.4
Probability 0.8
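
The conversions in the cell above follow from a few identities. As a worked check for $p = 0.8$ (natural logarithm, matching `math.log`):

$$
\text{odds} = \frac{p}{1-p} = \frac{0.8}{0.2} = 4,
\qquad
\text{log-odds} = \ln\frac{p}{1-p} = \ln 4 \approx 1.386,
\qquad
p = \frac{\text{odds}}{1+\text{odds}} = \frac{1}{1+e^{-\text{log-odds}}} = 0.8
$$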


## Likelihood Function


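L = min $\sum_{i=1}^{n} -\left( \hat{y}_i \, y_i + (1 - \hat{y}_i)(1 - y_i) \right)$

logL = min $\sum_{i=1}^{n} -\left( \log(\hat{y}_i) \, y_i + \log(1 - \hat{y}_i)(1 - y_i) \right)$

Here $y_i$ is the true label (0 or 1) and $\hat{y}_i$ is the predicted probability of class 1. Minimizing the negative log-likelihood is equivalent to maximizing the log-likelihood.

cross entropy = -(log(q(class1)) * p(class1) + log(q(class0)) * p(class0)), where q represents the estimated probability distribution and p represents the true probability distribution over class 0 and class 1.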

https://machinelearningmastery.com/logistic-regression-with-maximum-likelihood-estimation/


In [6]:
# test of Bernoulli likelihood function
 
# likelihood function for Bernoulli distribution
def likelihood(y, yhat):
    return yhat * y + (1 - yhat) * (1 - y)
 
# test for y=1
y, yhat = 1, 0.9
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
y, yhat = 1, 0.1
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
# test for y=0
y, yhat = 0, 0.1
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
y, yhat = 0, 0.9
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))

y=1.0, yhat=0.9, likelihood: 0.900
y=1.0, yhat=0.1, likelihood: 0.100
y=0.0, yhat=0.1, likelihood: 0.900
y=0.0, yhat=0.9, likelihood: 0.100
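
To connect the per-example likelihood above to the negative log-likelihood that gets minimized, here is a minimal sketch. The labels and predictions are made up for illustration, and `negative_log_likelihood` with its `eps` guard is a hypothetical helper, not something from the referenced article.

```python
from math import log

def likelihood(y, yhat):
    """Bernoulli likelihood of label y (0 or 1) under predicted probability yhat."""
    return yhat * y + (1 - yhat) * (1 - y)

def negative_log_likelihood(ys, yhats, eps=1e-12):
    """Sum of -log(likelihood) over all examples; eps guards against log(0)."""
    total = 0.0
    for y, yhat in zip(ys, yhats):
        total += -log(max(likelihood(y, yhat), eps))
    return total

# hypothetical labels and predictions, chosen only for illustration
ys = [1, 1, 0, 0]
good = [0.9, 0.8, 0.1, 0.2]   # confident and correct
bad = [0.1, 0.2, 0.9, 0.8]    # confident and wrong

nll_good = negative_log_likelihood(ys, good)
nll_bad = negative_log_likelihood(ys, bad)
print('NLL good: %.3f' % nll_good)   # NLL good: 0.657
print('NLL bad: %.3f' % nll_bad)     # NLL bad: 7.824
```

Predictions that put high probability on the true label give a small loss, while confident wrong predictions are penalized heavily.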


# Cross Entropy:

In information theory, the cross-entropy between two probability distributions `p` and `q` over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution `q`, rather than the true distribution `p`. 
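
As a small illustration of the definition above, the sketch below computes cross-entropy in bits (base-2 logarithm) for two discrete distributions. The distributions are hypothetical and `cross_entropy` is an illustrative helper.

```python
from math import log2

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits between two discrete distributions."""
    # terms with p_i == 0 contribute nothing and are skipped
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]        # true distribution (hypothetical)
q_bad = [0.9, 0.1]    # poorly matched estimate (hypothetical)

h_pp = cross_entropy(p, p)       # coding with the true distribution
h_pq = cross_entropy(p, q_bad)   # coding with the mismatched estimate
print('H(p, p) = %.3f bits' % h_pp)   # H(p, p) = 1.000 bits
print('H(p, q) = %.3f bits' % h_pq)   # H(p, q) = 1.737 bits
```

When `q` equals `p` the cross-entropy reduces to the entropy of `p`; any mismatch costs extra bits on average.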

# Shuffle a deck of cards

1. First, fill the array with the values in order.

2. Go through the array and exchange each element 
   with a randomly chosen element in the range 
   from itself to the end.

In [1]:
import random

def shuffle(card, n):
    """Shuffle the first n elements of card in place and return it."""
    for i in range(0, n - 1):
        # pick a random index from i to n-1 (randint is inclusive on both ends)
        s = random.randint(i, n - 1)
        card[i], card[s] = card[s], card[i]
    return card

# Driver Code:
if __name__ == '__main__':
    a = [i for i in range(0, 52)]
    print('input:', a, '\n')
    n = len(a)

    new_a = shuffle(a, n)
    print('shuffled:', new_a, '\n')

    # cross check: every card should still be present
    print(set(new_a))

input: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51] 

shuffled: [11, 8, 27, 5, 14, 13, 26, 50, 23, 47, 2, 41, 25, 40, 15, 45, 29, 21, 34, 51, 48, 22, 0, 6, 35, 28, 1, 32, 19, 30, 33, 3, 46, 49, 44, 36, 42, 24, 4, 10, 18, 9, 17, 16, 7, 38, 39, 20, 43, 31, 12, 37] 

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51}


# Shuffle a given array using Fisher–Yates shuffle Algorithm

https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

Given an array, write a program to generate a random permutation of its elements. This question is also asked as “shuffle a deck of cards” or “randomize a given array”. Here, shuffle means that every permutation of the array elements should be equally likely.

Fisher–Yates shuffle algorithm, worked example:

0. Start by writing out the numbers from 1 to 8.
1. For the first roll, pick a random number from 1 to 8: this time it is 6, so swap the 6th and 8th numbers in the list.
2. Pick the next random number from 1 to 7; it turns out to be 2, so swap the 2nd and 7th numbers and move on.
3. Pick the next random number from 1 to 6; it happens to be 6, so the 6th number in the list (which, after the first swap, is now number 8) stays in place. Proceed the same way until the permutation is complete.

```
Range  Roll  Scratch          Result
             1 2 3 4 5 6 7 8
1–8    6     1 2 3 4 5 8 7    6
1–7    2     1 7 3 4 5 8      2 6
1–6    6     1 7 3 4 5        8 2 6
1–5    1     5 7 3 4          1 8 2 6
1–4    3     5 7 4            3 1 8 2 6
1–3    3     5 7              4 3 1 8 2 6
1–2    1     7                5 4 3 1 8 2 6

-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n−1 do
     j ← random integer such that i ≤ j < n
     exchange a[i] and a[j]

```       

In [169]:
import random

def shuffle(l,n):
    for i in range(0, n-1):
        j = random.randint(i,n-1)
        print('i=%.0f, j=%.0f'% (i, j))
        #j = i + (random.randint(0, n)%(n-i))
        l[i], l[j] = l[j], l[i]
        print('l=', l, '\n')
    return l
    

if __name__ == '__main__':
    a = [i for i in range(1, 4)]
    n = len(a)
    print('input:', a )
    print(shuffle(a,n))
    

input: [1, 2, 3]
i=0, j=1
l= [2, 1, 3] 

i=1, j=2
l= [2, 3, 1] 

[2, 3, 1]
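
One way to sanity-check that every permutation is equally likely is to shuffle a small list many times and count how often each ordering appears. The sketch below does this for three elements. The function name `fisher_yates`, the seed, and the trial count are arbitrary choices made for illustration.

```python
import random
from collections import Counter

def fisher_yates(items, rng=random):
    """Return a shuffled copy of items using the forward Fisher-Yates loop."""
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        j = rng.randint(i, n - 1)   # randint is inclusive on both ends
        a[i], a[j] = a[j], a[i]
    return a

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 60000
counts = Counter(tuple(fisher_yates([1, 2, 3], rng)) for _ in range(trials))

# each of the 3! = 6 orderings should appear with frequency close to 1/6
for perm, c in sorted(counts.items()):
    print(perm, '%.3f' % (c / trials))
```

In everyday code the standard library's `random.shuffle` performs an in-place shuffle, so a hand-written loop like this is mainly useful for learning or interviews.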


In [34]:
import numpy as np

def calc_distance(X1, X2):
    """Euclidean distance between two points given as NumPy arrays."""
    return (sum((X1 - X2)**2))**0.5

X = np.array([[1,5],[5,8],[8,1],[9,5]]) # 4 points, 2 coordinates each
print(X.shape)

if __name__ == '__main__':
    # brute force: compare every unordered pair of points once
    distances = []
    for i in range(len(X)-1):
        for j in range(i+1, len(X)):
            distances += [calc_distance(X[i],X[j])]

    # smallest pairwise distance
    print(min(distances))

(4, 2)
4.123105625617661


In [33]:
points = [(1,5), (5,8), (8,1), (9,5)]

def euclideanDistance(X1, X2):
    return pow((X1[0] - X2[0]) **2 + (X1[1] - X2[1])** 2, .5)
         
distances = []
for i in range(len(points)-1):
    for j in range(i+1, len(points)):
        distances += [euclideanDistance(points[i],points[j])]
print (min(distances))


4.123105625617661
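
The same brute-force search over pairs can be written more compactly with the standard library. This is a sketch that assumes Python 3.8 or newer (for `math.dist`). It also reports which pair is closest, which the cells above do not.

```python
from itertools import combinations
from math import dist  # available from Python 3.8

points = [(1, 5), (5, 8), (8, 1), (9, 5)]

# evaluate every unordered pair once and keep the closest one
p, q = min(combinations(points, 2), key=lambda pair: dist(*pair))
closest = dist(p, q)
print(p, q, closest)
```

This is still O(n^2) in the number of points, which is fine for a handful of points like these.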
