
**Logistic regression** is a widely used supervised learning algorithm for classification, particularly for tasks with binary outcomes: yes/no, true/false, or 0/1.

**What it does:**
It models the relationship between one or more independent variables (features) and a single binary dependent variable.
It predicts the probability of an event occurring, instead of simply providing a yes/no answer.
This probability output allows for more nuanced interpretations and decision-making.

**How it works:**
Under the hood, it passes a linear combination of the features through a sigmoid function, which produces a smooth S-shaped curve.
This curve maps any real-valued score to a probability between 0 and 1.
Training on labeled data adjusts the coefficients of the linear equation so that the predicted probabilities best match the observed outcomes.
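The mapping described above can be sketched in a few lines. This is a minimal illustration, not the notebook's model: the `weights`, `bias`, and input values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and bias for a two-feature model (illustrative values only).
weights = np.array([0.8, -1.5])
bias = 0.2

# Two example inputs; each row is one sample.
X_demo = np.array([[1.0, 0.0],
                   [0.0, 2.0]])

scores = X_demo @ weights + bias     # linear part: w . x + b
probabilities = sigmoid(scores)      # squashed into (0, 1)
print(probabilities)
```

A positive score gives a probability above 0.5 and a negative score gives one below 0.5, which is why 0.5 is the usual decision threshold.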

**When to use it:**
When you have a binary classification problem.
When you want to understand the influence of each feature on the outcome through the learned coefficients.
When interpretability and explainability of the model are important.

**Advantages:**
Simple to implement and understand.
Efficient and computationally inexpensive.
Provides probabilistic outputs for informed decision-making.
Offers interpretability through feature coefficients.
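One common way to read the coefficients is as odds ratios: exponentiating a coefficient gives the multiplicative change in the odds of the positive class per unit increase in that feature. The sketch below assumes synthetic data in which only the first feature matters; the alias `SkLogisticRegression` and all variable names are chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression as SkLogisticRegression

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(500, 2))

# Synthetic labels: feature 0 raises the chance of class 1, feature 1 has no effect.
true_logits = 2.0 * X_syn[:, 0]
y_syn = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logits))).astype(int)

model = SkLogisticRegression().fit(X_syn, y_syn)
coefficients = model.coef_[0]
odds_ratios = np.exp(coefficients)   # change in odds per unit increase of each feature
print(coefficients, odds_ratios)
```

The coefficient for the first feature should come out clearly positive and the second close to zero, mirroring how the labels were generated.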

**Disadvantages:**
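In its basic form, limited to binary classification; multi-class problems need an extension such as one-vs-rest.
Assumes a linear relationship between the features and the log-odds of the outcome.
May not perform well on data with highly non-linear relationships unless the features are transformed first.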
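The linearity limitation can be stated precisely. Using the standard formulation (the symbols $w$, $b$, and $\sigma$ are notation introduced here, not taken from the notebook), the model is linear in the log-odds rather than in the probability itself:

```latex
p(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}},
\qquad
\log \frac{p(y = 1 \mid x)}{1 - p(y = 1 \mid x)} = w^\top x + b
```

The decision boundary $w^\top x + b = 0$ is therefore a hyperplane, which is why classes that cannot be separated by a straight line or plane need transformed features or a different model.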

**Examples of applications:**
Predicting spam emails.
Identifying fraudulent transactions.
Classifying medical images as cancerous or benign.
Assessing creditworthiness of loan applicants.

**Beyond the basics:**
Logistic regression can be extended to multi-class classification with approaches such as one-vs-rest.
Regularization techniques can be used to prevent overfitting and improve model generalization.
Logistic regression can be combined with other algorithms in ensemble methods for better performance.
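The first two points can be sketched with scikit-learn. This example assumes the bundled iris dataset (three classes) rather than the notebook's data; `OneVsRestClassifier` trains one binary model per class, and `C` is the inverse regularization strength of the L2 penalty. Variable names are chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris_X, iris_y = load_iris(return_X_y=True)   # three classes
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(
    iris_X, iris_y, test_size=0.2, random_state=0, stratify=iris_y
)

# Smaller C means stronger regularization; C=1.0 is scikit-learn's default.
base_model = SkLogisticRegression(C=1.0, max_iter=1000)
ovr = OneVsRestClassifier(base_model).fit(iris_X_train, iris_y_train)

print(len(ovr.estimators_))                   # one binary classifier per class
print(ovr.score(iris_X_test, iris_y_test))    # mean accuracy on the held-out split
```

Lowering `C` shrinks the coefficients toward zero, which can help when there are many features relative to the number of samples.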

Other Reference Links: https://developer.ibm.com/articles/implementing-logistic-regression-from-scratch-in-python/

**Without Built-In Function (from scratch)**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [None]:
data=pd.read_csv('/content/drive/MyDrive/AI and ML/sample_text - Sheet1.csv')
data

Unnamed: 0,tweet_id,text,task1
0,1123757263427186690,"hate wen females hit ah nigga with tht bro 😂😂,...",HOF
1,1123733301397733380,RT @airjunebug: When you're from the Bay but y...,HOF
2,1123734094108659712,RT @DonaldJTrumpJr: Dear Democrats: The Americ...,NOT
3,1126951188170199049,RT @SheLoveTimothy: He ain’t on drugs he just ...,HOF
4,1126863510447710208,RT @TavianJordan: Summer ‘19 I’m coming for yo...,NOT
...,...,...,...
995,1126798721025544193,"RT @prodnose: Good morning, everyone.\nFollowi...",NOT
996,1126833089190219777,@cheezitking123 this what you get for tryna ge...,NOT
997,1130037092845670400,earphones ko 😭😭😭😭😭😭😭,NOT
998,1127028455651123201,RT @nj_linguist: @realgonegirl @elivalley I th...,NOT


In [None]:
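label_encoder = preprocessing.LabelEncoder()
# Caution: this replaces each tweet's text with an arbitrary integer ID, so the words are lost.
# The class labels (HOF / NOT) are in the 'task1' column.
data['text'] = label_encoder.fit_transform(data['text'])
data['text'].unique()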

array([954, 614, 443, 558, 578,  55, 826, 378, 301,  90, 366, 841, 845,
       218, 330, 274, 313, 868, 116, 519, 743, 487,  16, 226, 865, 143,
       591,  70, 468, 788, 495, 809, 516, 465, 513, 917, 180, 946, 888,
       858, 693, 982, 635,  33, 694, 110,  98, 405, 850, 125, 869, 573,
       664, 299,  65, 160, 983, 522, 786, 436, 191, 387, 866,  92, 233,
       821, 955, 564, 951, 977, 373, 822, 402, 281, 784, 633, 385, 133,
       565, 294, 476, 370,  34, 612, 260,  37, 937, 878, 461, 839, 426,
       439,  96, 474, 121, 607, 997,  62, 646, 762, 734, 806, 978, 947,
       718, 798, 339,  77, 800, 639, 340, 338, 147, 349, 374, 856,  51,
       770, 496, 262, 427, 569, 144, 900, 106, 916, 310,  61, 126, 433,
       748, 802,  24, 289, 508, 293,  59, 790, 747,  66, 155, 566, 532,
        43, 577, 510, 941, 634, 253, 592, 557, 199, 985, 610, 675, 778,
       350, 198, 602, 531,   9, 234, 749, 381, 650, 417, 148, 145, 989,
       285,   4, 724, 936, 597, 980, 604, 547, 717, 814, 714, 79

In [None]:
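# Caution: x holds the class labels ('task1') and y holds the integer-encoded tweet text,
# so everything downstream learns to predict a tweet ID from its label, not a label from its text.
x = data['task1']
y = data['text']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1234)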

In [None]:
x_train

281    HOF
42     HOF
255    NOT
906    HOF
394    NOT
      ... 
204    HOF
53     HOF
294    HOF
723    NOT
815    NOT
Name: task1, Length: 800, dtype: object
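The `x_train` output above contains only the class labels HOF and NOT, which suggests the feature and target columns are the wrong way round. A minimal sketch of the usual preparation follows: encode the labels, keep the text as the feature. It assumes a small made-up DataFrame with the same column names as the CSV; the rows and the `toy_` names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the CSV: same column names, made-up rows.
toy = pd.DataFrame({
    "text": ["you are awful", "have a nice day", "what a lovely morning",
             "i hate this", "awful awful people", "nice work everyone"],
    "task1": ["HOF", "NOT", "NOT", "HOF", "HOF", "NOT"],
})

# Encode the labels, not the text. Classes are sorted, so HOF -> 0 and NOT -> 1.
toy_encoder = LabelEncoder()
toy["label"] = toy_encoder.fit_transform(toy["task1"])

toy_x = toy["text"]     # features: raw tweet text
toy_y = toy["label"]    # target: encoded class

toy_x_train, toy_x_test, toy_y_train, toy_y_test = train_test_split(
    toy_x, toy_y, test_size=0.33, random_state=1234, stratify=toy_y
)
print(list(toy_encoder.classes_), len(toy_x_train), len(toy_x_test))
```

Writing the encoded labels to a new column keeps the original `task1` strings available for inspection.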

In [None]:
vectorizer = TfidfVectorizer()
train_feature = vectorizer.fit_transform(x_train)
test_feature  = vectorizer.transform(x_test)

In [None]:
print(train_feature)

  (0, 0)	1.0
  (1, 0)	1.0
  (2, 1)	1.0
  (3, 0)	1.0
  (4, 1)	1.0
  (5, 0)	1.0
  (6, 1)	1.0
  (7, 0)	1.0
  (8, 0)	1.0
  (9, 1)	1.0
  (10, 1)	1.0
  (11, 1)	1.0
  (12, 0)	1.0
  (13, 1)	1.0
  (14, 0)	1.0
  (15, 1)	1.0
  (16, 1)	1.0
  (17, 0)	1.0
  (18, 1)	1.0
  (19, 0)	1.0
  (20, 1)	1.0
  (21, 1)	1.0
  (22, 1)	1.0
  (23, 1)	1.0
  (24, 1)	1.0
  :	:
  (775, 1)	1.0
  (776, 0)	1.0
  (777, 0)	1.0
  (778, 1)	1.0
  (779, 1)	1.0
  (780, 0)	1.0
  (781, 0)	1.0
  (782, 1)	1.0
  (783, 1)	1.0
  (784, 1)	1.0
  (785, 0)	1.0
  (786, 0)	1.0
  (787, 0)	1.0
  (788, 1)	1.0
  (789, 0)	1.0
  (790, 0)	1.0
  (791, 1)	1.0
  (792, 1)	1.0
  (793, 0)	1.0
  (794, 1)	1.0
  (795, 0)	1.0
  (796, 0)	1.0
  (797, 0)	1.0
  (798, 1)	1.0
  (799, 1)	1.0
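The matrix above has only two columns, each entry 1.0, because the vectorizer was fitted on the label strings rather than on tweet text. For comparison, here is a sketch of TF-IDF applied to a few short texts; the sentences and `toy_` names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_train_texts = ["you are awful", "have a nice day", "awful awful people"]
toy_test_texts = ["nice people", "unseen words only"]

toy_vectorizer = TfidfVectorizer()
# Learn the vocabulary and idf weights from the training texts only.
toy_train_features = toy_vectorizer.fit_transform(toy_train_texts)
# Reuse the fitted vocabulary; words never seen in training are ignored.
toy_test_features = toy_vectorizer.transform(toy_test_texts)

print(sorted(toy_vectorizer.vocabulary_))
print(toy_train_features.shape, toy_test_features.shape)
```

With real text, each column corresponds to a word in the training vocabulary, and a test document made up entirely of unseen words becomes an all-zero row.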


In [None]:
terms_train1= train_feature.toarray()

In [None]:
term_test1=test_feature.toarray()

In [None]:
class LogisticRegression:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, learning_rate=0.001, n_iters=1000):
        self.lr = learning_rate
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        """Learn weights and bias from X (n_samples, n_features) and 0/1 labels y."""
        n_samples, n_features = X.shape

        # initialise parameters at zero
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        # gradient descent
        for _ in range(self.n_iters):
            # linear score: weighted sum of the features plus bias
            linear_model = np.dot(X, self.weights) + self.bias
            # squash the score into a probability
            y_predicted = self._sigmoid(linear_model)

            # compute gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))  # derivative w.r.t. weights
            db = (1 / n_samples) * np.sum(y_predicted - y)         # derivative w.r.t. bias
            # update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict(self, X):
        """Return class 1 where the predicted probability exceeds 0.5, else class 0."""
        linear_model = np.dot(X, self.weights) + self.bias
        y_predicted = self._sigmoid(linear_model)
        return np.where(y_predicted > 0.5, 1, 0)

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return np.sum(y_true == y_pred) / len(y_true)
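The `dw` and `db` expressions in `fit` match the gradients of the mean binary cross-entropy loss. The class does not name its loss, so treating it as cross-entropy is an assumption here (the commented-out variant in the next cell does name it). With $\hat{y}_i = \sigma(w^\top x_i + b)$, labels $y_i \in \{0, 1\}$, and learning rate $\eta$:

```latex
L(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \Big]

\frac{\partial L}{\partial w} = \frac{1}{n} X^\top (\hat{y} - y),
\qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)

w \leftarrow w - \eta \, \frac{\partial L}{\partial w},
\qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
```

These gradients are only meaningful when the targets are 0 or 1, since the loss treats $y_i$ as the probability of the positive class.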

In [None]:
# class LogisticRegression():
#     def __init__(self):
#         self.losses = []
#         self.train_accuracies = []

#     def fit(self, x, y, epochs):
#         x = self._transform_x(x)
#         y = self._transform_y(y)

#         self.weights = np.zeros(x.shape[1])
#         self.bias = 0

#         for i in range(epochs):
#             x_dot_weights = np.matmul(self.weights, x.transpose()) + self.bias
#             pred = self._sigmoid(x_dot_weights)
#             loss = self.compute_loss(y, pred)
#             error_w, error_b = self.compute_gradients(x, y, pred)
#             self.update_model_parameters(error_w, error_b)

#             pred_to_class = [1 if p > 0.5 else 0 for p in pred]
#             self.train_accuracies.append(accuracy_score(y, pred_to_class))
#             self.losses.append(loss)

#     def compute_loss(self, y_true, y_pred):
#         # binary cross entropy
#         y_zero_loss = y_true * np.log(y_pred + 1e-9)
#         y_one_loss = (1-y_true) * np.log(1 - y_pred + 1e-9)
#         return -np.mean(y_zero_loss + y_one_loss)

#     def compute_gradients(self, x, y_true, y_pred):
#         # derivative of binary cross entropy
#         difference =  y_pred - y_true
#         gradient_b = np.mean(difference)
#         gradients_w = np.matmul(x.transpose(), difference)
#         gradients_w = np.array([np.mean(grad) for grad in gradients_w])

#         return gradients_w, gradient_b

#     def update_model_parameters(self, error_w, error_b):
#         self.weights = self.weights - 0.1 * error_w
#         self.bias = self.bias - 0.1 * error_b

#     def predict(self, x):
#         x_dot_weights = np.matmul(x, self.weights.transpose()) + self.bias
#         probabilities = self._sigmoid(x_dot_weights)
#         return [1 if p > 0.5 else 0 for p in probabilities]

#     def _sigmoid(self, x):
#         return np.array([self._sigmoid_function(value) for value in x])

#     def _sigmoid_function(self, x):
#         if x >= 0:
#             z = np.exp(-x)
#             return 1 / (1 + z)
#         else:
#             z = np.exp(x)
#             return z / (1 + z)

#     def _transform_x(self, x):
#         x = copy.deepcopy(x)
#         return x.values

#     def _transform_y(self, y):
#         y = copy.deepcopy(y)
#         return y.values.reshape(y.shape[0], 1)

In [None]:
itr=[]
acc=[]

In [None]:
regressor = LogisticRegression(learning_rate=0.0001, n_iters=383)

In [None]:
regressor.fit(terms_train1, y_train)

In [None]:
predictions = regressor.predict(term_test1)
itr.append(regressor.n_iters)  # record the iteration count actually used (383), not a hard-coded 500

In [None]:
print("LR classification accuracy:", accuracy(y_test, predictions))
acc.append(accuracy(y_test, predictions))

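LR classification accuracy: 0.0

An accuracy of 0.0 here reflects the data preparation rather than the classifier. `y_train` and `y_test` hold the integer-encoded tweet text (one ID per tweet, ranging into the hundreds), while `predict` only ever returns 0 or 1, so the predictions almost never equal the targets.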


**With Built-In Function (scikit-learn)**

In [None]:
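# Import under an alias so the from-scratch LogisticRegression class defined earlier is not overwritten
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
clf = SkLogisticRegression()  # defaults; tol, max_iter, class_weight, random_state can be set here
clf.fit(train_feature, y_train)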

In [None]:
y_pred = clf.predict(test_feature)

In [None]:
# Use a separate name so the accuracy() helper defined earlier is not overwritten
sk_accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(sk_accuracy * 100))

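Accuracy: 0.00%

The built-in classifier treats every integer-encoded tweet ID in `y_train` as its own class, and the IDs in `y_test` belong to tweets held out of training, so it has almost no chance of predicting them.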
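For reference, the built-in route is often written as a single pipeline that takes raw text as input and encoded class labels as the target. The sketch below assumes made-up sentences and labels (0 standing in for HOF, 1 for NOT); it is an illustration of the pattern, not a rerun of this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical texts with labels: 0 stands in for HOF, 1 for NOT.
toy_texts = ["you are awful", "have a nice day", "what a lovely morning",
             "i hate this", "awful awful people", "nice work everyone"]
toy_labels = [0, 1, 1, 0, 0, 1]

# The pipeline fits TF-IDF on the training text, then fits the classifier on the result.
toy_pipeline = make_pipeline(TfidfVectorizer(), SkLogisticRegression())
toy_pipeline.fit(toy_texts, toy_labels)

toy_predictions = toy_pipeline.predict(["awful people", "lovely day"])
print(toy_predictions)
```

Wrapping both steps in one pipeline ensures the vectorizer is fitted only on training text and applied consistently at prediction time.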
