Q1. Explain the difference between linear regression and logistic regression models. Provide an example of 
a scenario where logistic regression would be more appropriate.
Ans:-Linear Regression:

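Linear regression is a supervised machine learning algorithm used for predicting a continuous outcome variable (also known as the dependent variable) based on one or more independent variables. The relationship between the variables is modeled as a linear equation, and the algorithm finds the best-fitting line, i.e., the one that minimizes the sum of squared errors.

Logistic Regression:

Logistic regression is a supervised machine learning algorithm used for classification. It also computes a linear combination of the independent variables, but passes it through the logistic (sigmoid) function, so the output is the probability that an instance belongs to the positive class. Its parameters are fitted by minimizing the log loss (equivalently, maximizing the likelihood) rather than the sum of squared errors.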
Key Differences:

Outcome Variable:

Linear regression predicts a continuous outcome variable.
Logistic regression predicts the probability of an instance belonging to a specific class (binary classification).
Equation Form:

Linear regression uses a linear equation.
Logistic regression uses the logistic (sigmoid) function to model probabilities.
Output Range:

Linear regression output ranges from −∞ to +∞.
Logistic regression output ranges from 0 to 1 (representing probabilities).
Application:

Linear regression is suitable for regression problems, such as predicting house prices, stock prices, or temperature.
Logistic regression is suitable for binary classification problems, like predicting whether an email is spam or not, whether a patient has a particular disease, etc.
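To make the difference in output range concrete, here is a minimal sketch that fits scikit-learn's LinearRegression and LogisticRegression on the same binary target. The data and the query points are hypothetical, generated with NumPy for illustration only. Outside the training range the linear model's predictions fall below 0 or above 1, while predict_proba of the logistic model always stays between 0 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature in [0, 10], binary 0/1 target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 1.5, size=200) > 5).astype(int)

# Linear regression treats the 0/1 label as a continuous target
lin = LinearRegression().fit(X, y)
# Logistic regression models P(y = 1 | x) through the sigmoid
log_reg = LogisticRegression().fit(X, y)

# Query points, some outside the training range
X_new = np.array([[-5.0], [0.0], [5.0], [10.0], [15.0]])
lin_pred = lin.predict(X_new)
log_prob = log_reg.predict_proba(X_new)[:, 1]

for x, lp, pp in zip(X_new[:, 0], lin_pred, log_prob):
    print(f"x = {x:5.1f}  linear: {lp:6.2f}  logistic P(y=1): {pp:.3f}")
```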
Scenario for Logistic Regression:

Let's consider an example scenario where logistic regression is more appropriate:

Scenario: Predicting Student Admission

Problem Type: Binary classification
Objective: Predict whether a student is admitted (1) or not admitted (0) to a university based on their exam scores.
Features: Exam scores (independent variable)
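Outcome: Admission status (0 or 1).
Why Logistic Regression: The outcome is binary, so a predicted admission probability between 0 and 1 can be thresholded into admit/reject, whereas linear regression could predict values below 0 or above 1 that cannot be interpreted as probabilities.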
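A minimal sketch of the admission scenario, assuming synthetic data: two exam scores per student and a made-up labeling rule (admitted when the noisy average score exceeds 65). The scores are standardized before fitting, and the two applicants are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: two exam scores per student (30-100)
rng = np.random.default_rng(42)
scores = rng.uniform(30, 100, size=(300, 2))
# Assumed labeling rule for the toy data: admitted (1) if the noisy average exceeds 65
admitted = (scores.mean(axis=1) + rng.normal(0, 5, size=300) > 65).astype(int)

# Standardize the scores, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(scores, admitted)

# Admission probability and predicted class for two hypothetical applicants
applicants = np.array([[45.0, 50.0], [85.0, 90.0]])
probs = model.predict_proba(applicants)[:, 1]
labels = model.predict(applicants)
for s, p, lab in zip(applicants, probs, labels):
    print(f"scores={s}, P(admitted)={p:.3f}, predicted class={lab}")
```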

Q2. What is the cost function used in logistic regression, and how is it optimized?
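The cost function minimized in the code below is the binary cross-entropy (log loss), averaged over the m training examples. Here σ is the sigmoid, X includes the leading column of ones (X_b in the code) so θ₀ is the bias, and α is the learning rate (learning_rate in the code). The gradient has the same form as in linear regression, with the sigmoid prediction in place of the linear one. Because the cost is convex in θ, gradient descent with a small enough learning rate does not get stuck in local minima; libraries such as scikit-learn use other optimizers (e.g., the lbfgs or liblinear solvers) for the same objective.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y}^{(i)} = \sigma\!\left(\theta^\top x^{(i)}\right)

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]

\nabla_\theta J(\theta) = \frac{1}{m} X^\top (\hat{y} - y), \qquad \theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)
```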

In [None]:
import numpy as np

# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = (4 + 3 * X + np.random.randn(100, 1) > 5).astype(int)

# Add bias term to the features
X_b = np.c_[np.ones((100, 1)), X]

# Initialize model parameters
theta = np.random.randn(2, 1)

# Logistic function (sigmoid)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

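# Logistic loss (binary cross-entropy); probabilities are clipped so log() never receives 0
def logistic_loss(y, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))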

# Gradient descent
def gradient_descent(X, y, theta, learning_rate, epochs):
    m = len(y)
    for epoch in range(epochs):
        logits = X.dot(theta)
        y_pred = sigmoid(logits)
        loss = logistic_loss(y, y_pred)
        
        # Compute gradients
        gradients = X.T.dot(y_pred - y) / m
        
        # Update parameters using gradient descent
        theta -= learning_rate * gradients

        if epoch % 100 == 0:
            print(f'Epoch {epoch}, Loss: {loss}')

    return theta

# Hyperparameters
learning_rate = 0.01
epochs = 1000

# Run gradient descent
theta_optimized = gradient_descent(X_b, y, theta, learning_rate, epochs)

# Display the optimized parameters
print('Optimized Parameters:')
print('Theta_0:', theta_optimized[0][0])
print('Theta_1:', theta_optimized[1][0])
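One way to confirm that the gradient used above, Xᵀ(σ(Xθ) − y)/m, really is the derivative of the log loss is a finite-difference check. The sketch below compares it with central differences on small random data; the data, the helper names (bce_loss, analytic_grad, numerical_grad) and eps are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce_loss(theta, X, y):
    # Binary cross-entropy averaged over the rows of X
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(theta, X, y):
    # Same formula as the gradient-descent loop above
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numerical_grad(theta, X, y, eps=1e-6):
    # Central difference along each parameter
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step.flat[j] = eps
        grad.flat[j] = (bce_loss(theta + step, X, y) - bce_loss(theta - step, X, y)) / (2 * eps)
    return grad

# Hypothetical check data: bias column plus two random features
rng = np.random.default_rng(0)
X_chk = np.c_[np.ones((50, 1)), rng.normal(size=(50, 2))]
y_chk = rng.integers(0, 2, size=(50, 1))
theta_chk = rng.normal(size=(3, 1))

g_analytic = analytic_grad(theta_chk, X_chk, y_chk)
g_numeric = numerical_grad(theta_chk, X_chk, y_chk)
print('Max abs difference:', np.max(np.abs(g_analytic - g_numeric)))
```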


Q3. Explain the concept of regularization in logistic regression and how it helps prevent overfitting.
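Regularization adds a penalty on the size of the coefficients to the log loss J(θ) (the unregularized binary cross-entropy). Large coefficients let the model fit noise in the training data; penalizing them trades a little training fit for a simpler decision boundary that tends to generalize better. With λ ≥ 0 the regularization strength, m the number of training examples and n the number of features (the bias θ₀ is conventionally left unpenalized), the two common variants are shown below. The exact scaling of the penalty differs between texts and libraries; in scikit-learn the strength is set through C, which acts as an inverse of λ (smaller C means a stronger penalty). L2 shrinks all coefficients toward zero, while L1 can set some of them exactly to zero.

```latex
J_{\text{L2}}(\theta) = J(\theta) + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
\qquad
J_{\text{L1}}(\theta) = J(\theta) + \frac{\lambda}{m} \sum_{j=1}^{n} \lvert \theta_j \rvert
```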

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model with L2 regularization (Ridge)
# The parameter C is the inverse of the regularization strength
# Smaller C values result in stronger regularization
model = LogisticRegression(penalty='l2', C=0.1, random_state=42)

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
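To see the effect of the regularization strength, the sketch below refits the same model for several values of C (the grid of C values and max_iter=1000 are illustrative choices) and prints the L2 norm of the coefficient vector alongside train and test accuracy. Smaller C should give a smaller coefficient norm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Smaller C = stronger L2 penalty = coefficients pulled closer to zero
coef_norms = {}
for C in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(penalty='l2', C=C, max_iter=1000)
    model.fit(X_train, y_train)
    coef_norms[C] = np.linalg.norm(model.coef_)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"C={C:<6} ||w||={coef_norms[C]:.3f}  train acc={train_acc:.3f}  test acc={test_acc:.3f}")
```

A large gap between train and test accuracy at high C would indicate overfitting; reducing C is the usual first response.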


Q4. What is the ROC curve, and how is it used to evaluate the performance of the logistic regression 
model?

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train a logistic regression model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Predict probabilities on the test set
y_probs = model.predict_proba(X_test)[:, 1]

# Compute ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'AUC = {roc_auc:.2f}')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()
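The ROC curve plots the true positive rate against the false positive rate as the decision threshold on the predicted probability moves from strict to lenient, and the AUC summarizes it as the probability that a random positive is scored above a random negative (0.5 = random, 1.0 = perfect). The sketch below builds the curve by hand, integrates it with the trapezoidal rule and compares the result with scikit-learn's roc_auc_score. The helper roc_points is hypothetical, and choosing the threshold that maximizes TPR − FPR (Youden's J statistic) is just one possible way to pick an operating point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(random_state=42).fit(X_train, y_train)
y_probs = model.predict_proba(X_test)[:, 1]

def roc_points(y_true, scores):
    """FPR and TPR for every distinct threshold, from strictest to most lenient."""
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    pos = np.sum(y_true == 1)
    neg = np.sum(y_true == 0)
    tpr, fpr = [], []
    for t in thresholds:
        pred = scores >= t  # predict positive when the probability reaches t
        tpr.append(np.sum(pred & (y_true == 1)) / pos)
        fpr.append(np.sum(pred & (y_true == 0)) / neg)
    return np.array(fpr), np.array(tpr), thresholds

fpr_m, tpr_m, thr_m = roc_points(y_test, y_probs)
# Area under the curve with the trapezoidal rule
auc_manual = np.sum(np.diff(fpr_m) * (tpr_m[1:] + tpr_m[:-1]) / 2)
print(f"manual AUC = {auc_manual:.4f}, sklearn AUC = {roc_auc_score(y_test, y_probs):.4f}")

# One possible operating point: the threshold that maximizes TPR - FPR
best = np.argmax(tpr_m - fpr_m)
print(f"threshold={thr_m[best]:.3f}, TPR={tpr_m[best]:.3f}, FPR={fpr_m[best]:.3f}")
```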


Q5. What are some common techniques for feature selection in logistic regression? How do these 
techniques help improve the model's performance?
Ans:-Feature selection is a process of choosing a subset of relevant features to use in the model. In logistic regression, selecting the most informative features can lead to a simpler and more interpretable model, reduce the risk of overfitting, and potentially improve the model's performance. Here are some common techniques for feature selection in logistic regression:
1. Recursive Feature Elimination (RFE):
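RFE fits the model, removes the least important feature(s) (for logistic regression, those with the smallest absolute coefficients), and repeats on the remaining features until the desired number is left. Its cross-validated variant, RFECV, additionally chooses that number by evaluating model performance at each step.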

In [None]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Create a logistic regression model
model = LogisticRegression()

# Initialize RFE with the logistic regression model
rfe = RFE(model, n_features_to_select=5)  # Choose the desired number of features

# Fit RFE and get the selected features
X_selected = rfe.fit_transform(X, y)
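To inspect which columns RFE kept and whether the reduced model holds up, the sketch below reads rfe.support_ and rfe.ranking_ and compares cross-validated accuracy with and without selection. It assumes a synthetic dataset with 5 informative features; RFE is placed inside a pipeline so that selection is redone within each fold and the test folds do not leak into it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Assumed synthetic data: 20 features, 5 of them informative
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
selected = np.flatnonzero(rfe.support_)  # indices of the kept columns
print("selected feature indices:", selected)
print("ranking (1 = kept):", rfe.ranking_)

# Cross-validated accuracy: all features vs. RFE inside a pipeline
full = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
pipe = make_pipeline(RFE(LogisticRegression(max_iter=1000), n_features_to_select=5),
                     LogisticRegression(max_iter=1000))
subset = cross_val_score(pipe, X, y, cv=5).mean()
print(f"all 20 features: {full:.3f}  RFE (5 features): {subset:.3f}")
```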


2. Feature Importance from Trees:
For ensemble methods like Random Forest or Gradient Boosting, you can use the feature importance scores to select the most relevant features.

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest model
model_rf = RandomForestClassifier()

# Fit the model
model_rf.fit(X, y)

# Get feature importances
feature_importances = model_rf.feature_importances_

# Select top features based on importance scores
top_features = X[:, feature_importances.argsort()[-5:][::-1]]
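Tree-based importances only rank the features; the logistic regression is then trained on the chosen columns. A minimal sketch, assuming synthetic data, 200 trees, and a cut-off of 5 features; the forest is fitted on the training split only so that the test split stays unseen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed synthetic data: 20 features, 5 of them informative
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Rank features with a random forest fitted on the training split
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
top_idx = np.argsort(rf.feature_importances_)[::-1][:5]  # 5 most important columns
print("top feature indices:", top_idx)

# Fit the logistic regression on the selected columns only
log_reg = LogisticRegression(max_iter=1000).fit(X_train[:, top_idx], y_train)
print(f"test accuracy with 5 features: {log_reg.score(X_test[:, top_idx], y_test):.3f}")
```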


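3. L1 Regularization (Lasso):
The L1 penalty drives the coefficients of less informative features exactly to zero, so the features with non-zero coefficients form the selected subset.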

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Create a logistic regression model with L1 regularization
model_lasso = LogisticRegression(penalty='l1', solver='liblinear')

# Fit the model
model_lasso.fit(X, y)

# Get selected features (non-zero coefficients)
selected_features = X[:, np.abs(model_lasso.coef_)[0] > 0]
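How many features L1 keeps depends on C. The sketch below (synthetic data and the C grid are assumptions) counts non-zero coefficients for a few values; typically a smaller C, i.e. a stronger penalty, leaves fewer features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Assumed synthetic data: 20 features, 5 of them informative
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)

# Count non-zero coefficients as the L1 penalty weakens (C grows)
n_kept = {}
for C in [0.01, 0.1, 1.0]:
    model = LogisticRegression(penalty='l1', solver='liblinear', C=C)
    model.fit(X, y)
    n_kept[C] = int(np.count_nonzero(model.coef_))
    print(f"C={C}: {n_kept[C]} of {X.shape[1]} coefficients are non-zero")
```

In practice C is usually tuned by cross-validation rather than fixed by hand.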


Q6. How can you handle imbalanced datasets in logistic regression? What are some strategies for dealing 
with class imbalance?
Ans:-Handling imbalanced datasets in logistic regression is crucial to ensure that the model does not become biased toward the majority class. Here are some strategies for dealing with class imbalance in logistic regression:

1. Resampling Techniques:
Oversampling the Minority Class:

Increase the number of instances in the minority class by duplicating or generating synthetic samples (e.g., using SMOTE - Synthetic Minority Over-sampling Technique).
Undersampling the Majority Class:

Decrease the number of instances in the majority class by randomly removing samples.
2. Different Thresholds and Evaluation Metrics:
Adjust Classification Threshold:

By default, logistic regression uses a threshold of 0.5 for class prediction. Adjust the threshold based on the desired balance between precision and recall.
Use Appropriate Evaluation Metrics:

Instead of accuracy, use metrics like precision, recall, F1-score, or area under the precision-recall curve (AUC-PR) that are more sensitive to imbalanced classes.
3. Anomaly Detection:
Treat the minority class as an anomaly and use anomaly detection techniques.
4. Ensemble Methods:
Use ensemble methods such as Random Forest or Gradient Boosting, which, especially when combined with class weighting or resampling, can handle class imbalance better than a single model.
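A minimal sketch of two of these strategies on a hypothetical dataset with roughly 5% positives: random oversampling of the minority class (plain duplication with NumPy, not SMOTE, which lives in the separate imbalanced-learn package) and lowering the decision threshold. The 0.2 threshold is an arbitrary illustrative value, and precision, recall and F1 are reported instead of accuracy. Resampling is applied to the training split only, so the test split keeps its real class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: roughly 5% positives
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

def report(name, y_true, y_pred):
    print(f"{name:<28} precision={precision_score(y_true, y_pred, zero_division=0):.3f} "
          f"recall={recall_score(y_true, y_pred):.3f} f1={f1_score(y_true, y_pred):.3f}")

# 1. Baseline with the default 0.5 threshold
base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
report("baseline (threshold 0.5)", y_test, base.predict(X_test))

# 2. Random oversampling: duplicate minority rows until both classes are equal
rng = np.random.default_rng(0)
minority = np.flatnonzero(y_train == 1)
majority = np.flatnonzero(y_train == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
over = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
report("random oversampling", y_test, over.predict(X_test))

# 3. Lower decision threshold applied to the baseline probabilities
probs = base.predict_proba(X_test)[:, 1]
report("baseline (threshold 0.2)", y_test, (probs >= 0.2).astype(int))
```

Lowering the threshold can only add positive predictions, so recall never decreases while precision usually drops; the right trade-off depends on the cost of each error type.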

Q7. Can you discuss some common issues and challenges that may arise when implementing logistic 
regression, and how they can be addressed? For example, what can be done if there is multicollinearity 
among the independent variables?
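Multicollinearity is commonly detected with the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing feature j on all the other features; a common rule of thumb treats values above 5 or 10 as problematic. The sketch below computes it with scikit-learn's LinearRegression on hypothetical data in which x2 is nearly a copy of x1; the vif helper is written here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: x2 is almost a copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.05, size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), R^2_j from regressing column j on the other columns."""
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

vifs = vif(X)
print("VIF per feature:", np.round(vifs, 1))
```

Once high VIFs are found, common remedies are to drop one feature from each highly correlated group, combine the correlated features (for example with PCA), or keep them and add L2 regularization, which stabilizes the coefficient estimates.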