# Logistic Regression
Logistic Regression is a statistical method for analyzing and predicting binary outcomes in a dataset. It is a type of generalized linear model (GLM) that uses the logistic (sigmoid) function to model the probability of an event, such as success or failure, based on one or more predictor variables. The model treats a linear combination of the predictor variables as the log-odds (the logit) of the event. Applying the sigmoid, which is the inverse of the logit, turns that linear combination into a probability between 0 and 1, and thresholding the probability (commonly at 0.5) classifies each observation into one of two categories. It is widely used in fields such as medicine, the social sciences, and marketing for its simplicity, interpretability, and effectiveness in modeling relationships between binary outcomes and predictor variables.
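The relationship between the sigmoid, the logit, and the final 0/1 classification can be checked numerically. This is a small sketch assuming NumPy; the helper names `sigmoid`, `logit`, and `classify` and the 0.5 threshold are illustrative choices, not functions from any particular library.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds: the inverse of the sigmoid, defined for 0 < p < 1."""
    return np.log(p / (1.0 - p))

def classify(p, threshold=0.5):
    """Turn probabilities into 0/1 labels using a decision threshold."""
    return (p >= threshold).astype(int)

z = np.array([-2.0, 0.0, 3.0])   # hypothetical linear scores w^T x + b
p = sigmoid(z)
print(p)                         # probabilities; p[1] is exactly 0.5
print(logit(p))                  # recovers z, up to floating-point error
print(classify(p))               # [0 1 1]
```

A score of exactly 0 sits on the decision boundary, because it corresponds to a probability of 0.5.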

## Common Use-cases
Logistic Regression has a wide range of applications across various domains. Some common use-cases include:

* **Medical Diagnosis**: Predicting the presence or absence of a disease based on patient data, such as age, gender, and medical test results.

* **Credit Scoring**: Assessing the creditworthiness of a borrower by predicting the probability of loan default based on financial and demographic information.

* **Marketing**: Predicting customer behavior, such as the likelihood of purchasing a product, subscribing to a service, or responding to an advertisement, based on customer demographics and past behavior.

* **Spam Detection**: Classifying emails as spam or not spam based on features like email content, sender information, and email metadata.

* **Employee Attrition**: Predicting employee turnover in a company based on factors like job satisfaction, salary, and workload, to help organizations identify and address potential issues.

* **Political Analysis**: Forecasting election results or predicting voter preferences based on demographic, socioeconomic, and political factors.

* **Customer Churn**: Identifying the likelihood of customers discontinuing their use of a service or product, enabling businesses to target retention efforts more effectively.

* **Social Sciences**: Investigating relationships between binary outcomes and explanatory variables in various fields, such as psychology, economics, and sociology.

These are just a few examples, and Logistic Regression can be applied to a multitude of other problems involving binary outcomes and predictor variables.



## Important Variables
* **X**: The input data, with dimensions (n, m), where n is the number of input features per example and m is the number of examples in the dataset. Written as X = [x(1) x(2) ... x(m)], each column of X is one example and each row is one input feature, so X has n rows and m columns.


* **Y**: The target, or label, values associated with the input data: the ground truth or observed outcomes that the logistic regression model should predict. For a two-class problem, **Y** consists of binary values, typically 0 and 1, with one label per example (shape (1, m)).

* **Z**: The linear output, computed as *Z = W^T X + b* (in NumPy, `np.dot(W.T, X) + b`), with one value per example (shape (1, m)).

* **W**: The weight vector, with one weight per input feature (shape (n, 1)).

* **b**: A scalar constant added to the linear combination of features and weights. It is the intercept of the linear output Z and shifts the model's decision boundary.

* **A**: The output of the activation function, A = sigmoid(Z). In logistic regression the activation function is the sigmoid, so **A** holds the predicted probability that y = 1 for each of the m training examples.

* **m**: The number of training examples or the size of the training dataset. It is used to calculate the average cost over all training examples or to compute the average gradient for updating the model parameters during the optimization process.
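The shapes above, and the "average cost over all training examples" mentioned for m, can be checked with a small forward pass. This sketch assumes NumPy and the binary cross-entropy cost, which is the standard cost for logistic regression but is not spelled out in this notebook; the sizes n = 3 and m = 5, the helper `compute_cost`, and the random data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(A, Y, eps=1e-12):
    """Average binary cross-entropy over the m examples (columns)."""
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)  # avoid log(0)
    return -(1.0 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

n, m = 3, 5                               # hypothetical: 3 features, 5 examples
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))           # shape (n, m): one example per column
Y = rng.integers(0, 2, size=(1, m))       # shape (1, m): binary labels
W = np.zeros((n, 1))                      # shape (n, 1)
b = 0.0                                   # scalar bias

Z = W.T @ X + b                           # shape (1, m)
A = sigmoid(Z)                            # shape (1, m); all 0.5 here since W = 0
J = compute_cost(A, Y)
print(Z.shape, A.shape, J)                # J equals log(2) ≈ 0.693 when every A is 0.5
```

With all-zero weights every prediction is 0.5, so the starting cost is log(2) regardless of the labels; training should push it lower.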

### Back-Prop Variables
* **dZ**: The partial derivative of the loss with respect to the linear output **Z**. For a sigmoid output and the cross-entropy loss it simplifies to dZ = A - Y (the 1/m averaging is applied later, when computing dW and db). During back-propagation, **dZ** is used to compute the gradients for **W** and **b**, which are then used to update the model's parameters during optimization.

* **dA**: The derivative of the loss with respect to the activation output **A**: dA = -Y/A + (1 - Y)/(1 - A).

* **dW**: The partial derivative of the cost function with respect to the weight parameters **W**, computed as dW = (1/m) X dZ^T (shape (n, 1)). It is the gradient of the weights: it gives the direction and magnitude by which the weights should change to reduce the cost. During training you compute dW repeatedly and use it to update the weights iteratively until they converge. The bias gradient is computed the same way, as db = (1/m) * sum(dZ).

* **g'(Z)**: The derivative of the activation function g (the sigmoid in logistic regression). For the sigmoid, g'(Z) = A(1 - A), and dZ = dA * g'(Z), computed element-wise.
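The back-propagation quantities above fit together through the chain rule. The derivation below assumes the standard binary cross-entropy loss for a single example, which is the usual choice for logistic regression but is not defined elsewhere in this notebook; lowercase symbols are per-example values and uppercase symbols stack them over the m columns.

$$
\begin{aligned}
\mathcal{L}(a, y) &= -\left[\,y \log a + (1 - y)\log(1 - a)\,\right], \qquad a = g(z) = \frac{1}{1 + e^{-z}} \\
da &= \frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} \\
g'(z) &= a\,(1 - a) \\
dz &= da \cdot g'(z) = -y\,(1 - a) + (1 - y)\,a = a - y \\
dW &= \frac{1}{m}\, X\, dZ^{T}, \qquad db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}
\end{aligned}
$$

The cancellation in the dz line is why code for logistic regression rarely computes dA explicitly and goes straight to dZ = A - Y.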

#### W^T operation
* **W^T** is the transpose of the weight vector **W**. The letter "T" denotes the transpose operation, which means the rows and columns of the matrix or vector are interchanged. In the context of logistic regression, the weight vector W contains the weights associated with each input feature.

* When you see **W^T**, the weight vector has been transposed from a column vector to a row vector (or vice versa). Here W has shape (n, 1), so W^T has shape (1, n); this makes the matrix product W^T X with the (n, m) input matrix X valid and yields a (1, m) result.
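A quick shape check shows why the transpose is needed. This sketch assumes NumPy; the sizes n = 4 and m = 10 and the constant values in W and X are arbitrary.

```python
import numpy as np

n, m = 4, 10                 # hypothetical: 4 features, 10 examples
W = np.full((n, 1), 0.5)     # column vector, shape (n, 1)
X = np.ones((n, m))          # one example per column, shape (n, m)

print(W.shape, W.T.shape)    # (4, 1) (1, 4)
Z = W.T @ X                  # (1, n) @ (n, m) -> (1, m)
print(Z.shape)               # (1, 10); every entry is 4 * 0.5 = 2.0

# Without the transpose the inner dimensions (1 and n) do not match:
try:
    W @ X                    # (n, 1) @ (n, m) is invalid unless n == 1
except ValueError as err:
    print("shape mismatch:", err)
```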

```python
import numpy as np

# Sizes for this example: n input features, m training examples
n = 3
m = 100

# Our input data set: random values with shape (n, m), one example per column
X = np.random.rand(n, m)

# Y holds the ground-truth labels, shape (1, m); all zeros here as a placeholder
Y = np.zeros((1, m))

# Initialize the bias as 0
b = 0.0

# W is the weight vector, which contains one coefficient for each
# predictor variable (feature) in the model
"""
For binary logistic regression with a separate bias b, the weights form a
column vector with shape (n_features, 1), where n_features is the number of
predictor variables in the dataset. (Some formulations instead fold the bias
into the weights, giving shape (n_features + 1, 1) plus a constant feature
of 1; here the bias is kept as the separate scalar b.)

The weights are usually initialized to small random values or to zeros.
Because the logistic regression cost is convex, zero initialization works
fine; small random values only become necessary to break symmetry in
networks with hidden layers.

The weights are then iteratively updated during the training process using
an optimization algorithm, such as gradient descent, to minimize the cost.
"""
# Initialize with small random values
W = np.random.randn(n, 1) * 0.01

# Or initialize with zeros
# W = np.zeros((n, 1))

##################################################################################
"""
Calculating A: given a feature matrix X with dimensions (n, m), where n is the
number of features and m is the number of training examples, and a weight vector
W with dimensions (n, 1), first calculate the weighted sum Z:
    Z = W^T X + b        (in NumPy: Z = np.dot(W.T, X) + b)
Then, for logistic regression, the sigmoid function is applied to Z:
    A = sigmoid(Z)
where
    sigmoid(z) = 1 / (1 + exp(-z))
"""
def sigmoid(z):
    # Maps any real input z to a probability value between 0 and 1
    return 1 / (1 + np.exp(-z))

# Forward pass: Z and A both have shape (1, m)
Z = np.dot(W.T, X) + b
A = sigmoid(Z)

# Because A and Y are arrays of the same shape, subtracting them computes
# A(1)-Y(1), A(2)-Y(2), ... for the entire length of the arrays at once.
# dZ is the derivative of the loss with respect to the linear output Z
dZ = A - Y

# Now here is where some vectorization magic occurs: instead of using for loops,
# which are a huge no-no in deep learning, we use NumPy vector operations to
# drastically reduce the runtime of the program.

# dW: gradient of the cost with respect to the weights, averaged over all
# m examples; shapes (n, m) @ (m, 1) -> (n, 1)
dW = (1 / m) * np.dot(X, dZ.T)

# db: gradient of the cost with respect to the bias, averaged over all examples
db = (1 / m) * np.sum(dZ)

# Gradient descent update: step against the gradient, scaled by a learning rate
learning_rate = 0.01   # assumed value, not from the original; tune for your data
W = W - learning_rate * dW
b = b - learning_rate * db
```
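To confirm that the vectorized gradients compute the same thing as an explicit loop over examples, the sketch below computes dW and db both ways on hypothetical random data (the sizes and seed are arbitrary) and compares them. It assumes NumPy and the shape conventions above: X is (n, m), while Y and dZ are (1, m).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, m = 3, 50                                  # hypothetical sizes
X = rng.standard_normal((n, m))
Y = rng.integers(0, 2, size=(1, m))
W = rng.standard_normal((n, 1)) * 0.01
b = 0.0

# Loop version: one example (column) at a time
dW_loop = np.zeros((n, 1))
db_loop = 0.0
for i in range(m):
    x_i = X[:, [i]]                           # shape (n, 1)
    a_i = sigmoid(W.T @ x_i + b)              # shape (1, 1)
    dz_i = a_i - Y[0, i]                      # shape (1, 1)
    dW_loop += x_i * dz_i                     # accumulate x_i * dz_i
    db_loop += dz_i.item()
dW_loop /= m
db_loop /= m

# Vectorized version: all m examples at once
A = sigmoid(W.T @ X + b)                      # shape (1, m)
dZ = A - Y                                    # shape (1, m)
dW_vec = (1 / m) * X @ dZ.T                   # shape (n, 1)
db_vec = (1 / m) * np.sum(dZ)

print(np.allclose(dW_loop, dW_vec), np.isclose(db_loop, db_vec))  # True True
```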

## Only One For Loop
For the most efficient implementation of gradient descent, the only for loop should be the one that controls how many optimization iterations (gradient descent steps) the regression routine runs. Everything inside that loop should use vectorized operations to keep computation time low.
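Putting the pieces together, a minimal training routine can keep a single Python loop over gradient descent iterations and vectorize everything inside it. This is a sketch under stated assumptions, not a reference implementation: the function names `train_logistic_regression` and `predict`, the learning rate, the iteration count, and the toy dataset are all hypothetical choices, and the cost is assumed to be binary cross-entropy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, learning_rate=0.1, num_iterations=1000):
    """Batch gradient descent; the only Python loop is over iterations."""
    n, m = X.shape
    W = np.zeros((n, 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        # Forward pass over all m examples at once
        A = sigmoid(W.T @ X + b)                                  # (1, m)
        A_safe = np.clip(A, 1e-12, 1 - 1e-12)                     # avoid log(0)
        cost = -np.mean(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe))
        costs.append(cost)
        # Backward pass
        dZ = A - Y                                                # (1, m)
        dW = (X @ dZ.T) / m                                       # (n, 1)
        db = np.sum(dZ) / m                                       # scalar
        # Gradient descent update
        W = W - learning_rate * dW
        b = b - learning_rate * db
    return W, b, costs

def predict(W, b, X, threshold=0.5):
    """Return 0/1 predictions with shape (1, m)."""
    return (sigmoid(W.T @ X + b) >= threshold).astype(int)

# Hypothetical toy data: the label is 1 when the sum of the two features is positive
rng = np.random.default_rng(42)
X = rng.standard_normal((2, 200))
Y = (X.sum(axis=0, keepdims=True) > 0).astype(int)               # (1, 200)

W, b, costs = train_logistic_regression(X, Y, learning_rate=0.5, num_iterations=500)
accuracy = np.mean(predict(W, b, X) == Y)
print(f"first cost {costs[0]:.3f}, last cost {costs[-1]:.3f}, accuracy {accuracy:.2f}")
```

Because the cost is convex and this toy data is linearly separable, the recorded cost should fall from log(2) ≈ 0.693 (its value at W = 0, b = 0) as the iterations proceed.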

## Element-wise vs. Matrix Multiplication: The Differences

Element-wise multiplication and matrix multiplication are two different types of mathematical operations on arrays in linear algebra.

Element-wise multiplication, also called the Hadamard product, is performed by multiplying corresponding elements of two arrays or matrices of the same shape. The resulting array has the same shape as the input arrays, and each element in the output array is the product of the corresponding elements in the input arrays. In NumPy, this operation is performed using the * operator (or np.multiply()).

For example, given two arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

c = a * b   # element-wise: [1*4, 2*5, 3*6]
print(c)    # [ 4 10 18]
```


Matrix multiplication, on the other hand, is a more complex operation, where two matrices are multiplied together to produce a new matrix. The key requirement is that the number of columns in the first matrix must be equal to the number of rows in the second matrix. The resulting matrix will have the number of rows of the first matrix and the number of columns of the second matrix. In NumPy, this operation is performed using the dot() function or the @ operator.


For example, given two matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.dot(A, B)   # equivalently A @ B
print(C)           # [[19 22]
                   #  [43 50]]
```


In summary:

* **Operands**: element-wise multiplication multiplies corresponding elements of two arrays of the same shape, while matrix multiplication combines each row of the first matrix with each column of the second, which requires the number of columns of the first matrix to equal the number of rows of the second.
* **Result shape**: element-wise multiplication returns an array of the same shape as its inputs, while matrix multiplication returns a matrix with the number of rows of the first matrix and the number of columns of the second.
* **NumPy syntax**: element-wise multiplication uses the * operator (or np.multiply()), while matrix multiplication uses the dot() function or the @ operator.
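The differences can be seen side by side on the same 2x2 matrices used above. The second half shows a common pitfall that relies on NumPy broadcasting, which goes beyond the same-shape case described here: `*` on a row vector and a column vector does not raise an error but broadcasts to a full matrix, whereas `@` gives their inner product. This sketch assumes NumPy; the names `row` and `col` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B              # same as np.multiply(A, B): [[5, 12], [21, 32]]
matrix_product = A @ B           # same as np.dot(A, B): [[19, 22], [43, 50]]
print(elementwise)
print(matrix_product)

# Pitfall: shapes (1, 3) and (3, 1) are valid for both operators
row = np.array([[1, 2, 3]])      # shape (1, 3)
col = np.array([[4], [5], [6]])  # shape (3, 1)
print(row @ col)                 # [[32]]: inner product 1*4 + 2*5 + 3*6
print((row * col).shape)         # (3, 3): broadcasting, not a matrix product
```

In logistic regression code both operators appear: W^T X is a matrix product (`@` or `np.dot`), while terms such as Y * log(A) in the cost are element-wise, so mixing up (1, m) and (m, 1) shapes can silently produce a wrong result instead of an error.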