In [None]:
(1)In logistic regression, the prediction function applies the sigmoid function to a linear combination
of the input features, h(x) = sigmoid(theta . x), which maps the result to a probability between 0 and 1.
The cost function used in logistic regression is the cross-entropy (log) loss,
which penalizes the gap between the predicted probabilities and the actual labels.

In [7]:
import numpy as np

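def sigmoid(z):
    # Map any real value to the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(theta, X):
    # Predicted probability of class 1 for each row of X
    return sigmoid(np.dot(X, theta))

def cost_function(theta, X, y):
    # Mean cross-entropy between the labels y and the predicted probabilities
    m = len(y)
    h = predict(theta, X)
    h = np.clip(h, 1e-15, 1 - 1e-15)  # keep log() finite if h reaches 0 or 1
    cost = -1/m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return cost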

In [None]:
(2)There are several types of logistic regression models based on the number of categories in 
the target variable and the problem being solved. Some common types of logistic regression include: 
1. Binary Logistic Regression: This is the most common type of logistic regression where the target variable 
    has only two possible outcomes, typically coded as 0 and 1. It is used for binary classification problems. 
2. Multinomial Logistic Regression: In multinomial logistic regression, the target variable has three or more 
    categories that are not ordered. This type of logistic regression is used for multi-class classification 
    problems where the classes are not ordered. 
3. Ordinal Logistic Regression: In ordinal logistic regression, the target variable has three or more ordered categories. 
    This type of logistic regression is used when the target variable has a natural order or ranking. 
4. Regularized Logistic Regression: Regularized logistic regression includes regularization terms in the cost
    function to prevent overfitting. Common regularization techniques include L1 (Lasso) regularization and
    L2 (Ridge) regularization.
5. Imbalanced Logistic Regression: This refers to logistic regression applied to a dataset in which the target
    variable is imbalanced, meaning that one class is much more prevalent than the other. It is not a separate
    model: techniques such as oversampling, undersampling, or class weights are used to handle the imbalance.
These are some of the common types of logistic regression models used in different machine learning and
statistical applications. The choice of which type of logistic regression to use depends on the nature of the
problem and the characteristics of the data.
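
As a rough illustration of item 4, the sketch below adds an L2 penalty to the cross-entropy cost. The helper name `regularized_cost`, the parameter `lam`, and the toy data are assumptions made for this example, and leaving the bias term out of the penalty is a common convention rather than a requirement.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Cross-entropy cost plus an L2 (Ridge) penalty. Hypothetical helper."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    # Assumption: theta[0] is the bias term and is not penalized.
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty

# Toy data made up for illustration; the first column is the bias term.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])
theta = np.array([0.0, 1.0])

print("Cost without penalty:", regularized_cost(theta, X, y, lam=0.0))
print("Cost with lam=1:     ", regularized_cost(theta, X, y, lam=1.0))
```

With `lam=0` the function reduces to the plain cross-entropy cost; a larger `lam` raises the cost of large weights, which pushes the optimizer toward smaller ones.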

In [None]:
(3)Linear regression and logistic regression are two different types of regression models used in statistical analysis
and machine learning. Here are some key differences between the two:
    1. Target Variable:
        - Linear Regression: The target variable is continuous and can take any real value. The goal is to
          predict a quantitative outcome.
        - Logistic Regression: The target variable is categorical. The model predicts the probability of a
          binary outcome (e.g., yes/no, 0/1).
    2. Output:
        - Linear Regression: The output is a continuous value that represents the predicted outcome.
        - Logistic Regression: The output is a probability value between 0 and 1, which can be converted into
          class labels based on a threshold (e.g., if p >= 0.5, class 1; else class 0).
    3. Model Assumptions:
        - Linear Regression: Assumes a linear relationship between the independent variables and the target
          variable.
        - Logistic Regression: Does not assume a linear relationship between the independent variables and the
          target variable itself. Instead, it assumes that the log-odds of the target are a linear function of
          the independent variables.
    4. Cost Function:
        - Linear Regression: The cost function is based on the sum of squared errors (SSE) or mean squared
          error (MSE).
        - Logistic Regression: The cost function is the cross-entropy loss, which is the negative
          log-likelihood. Minimizing it is equivalent to maximizing the likelihood of observing the target
          variable given the input features.
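
The differences in output and model assumptions can be seen in a small sketch. The parameters `theta` and the inputs `X` below are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters and inputs; the first column of X is the bias term.
theta = np.array([-3.0, 2.0])
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 3.0]])

linear_output = X @ theta                     # linear regression: any real value
probabilities = sigmoid(linear_output)        # logistic regression: values in (0, 1)
labels = (probabilities >= 0.5).astype(int)   # class labels from a 0.5 threshold

# The log-odds of the logistic output recover the linear combination.
log_odds = np.log(probabilities / (1 - probabilities))

print("Linear output:", linear_output)
print("Probabilities:", probabilities)
print("Labels:       ", labels)
print("Log-odds:     ", log_odds)
```

The last line shows the sense in which logistic regression is still a linear model: the relationship is linear in the log-odds, not in the probability.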

In [8]:
# (4)
import numpy as np

# Dataset
X = np.array([[0.5, 1],
              [1, 2],
              [1.5, 2.5],
              [2, 3]])
y = np.array([0, 0, 1, 1])

# Preprocess the data
m = len(y)
X = np.hstack((np.ones((m, 1)), X))  # Add bias term

# Initialize parameters
theta = np.zeros(X.shape[1])

# Gradient Descent parameters
alpha = 0.01
iterations = 3

# Gradient Descent
for i in range(iterations):
    h = sigmoid(np.dot(X, theta))
    gradient = np.dot(X.T, (h - y)) / m
    theta -= alpha * gradient

# Optimized parameters after 3 iterations
print("Optimized parameters after 3 iterations:", theta)

# Prediction for [1, 1.5] with optimized parameters
new_X = np.array([1, 1.5])
new_X = np.insert(new_X, 0, 1)  # Add bias term
prediction = predict(theta, new_X)
print("Prediction for [1, 1.5] with optimized parameters:", prediction)

Optimized parameters after 3 iterations: [-7.27555640e-05  7.39376323e-03  9.20007767e-03]
Prediction for [1, 1.5] with optimized parameters: 0.505280084756206


In [None]:
(5)The K-Nearest Neighbors (KNN) algorithm is a simple yet powerful supervised machine learning algorithm used 
for both classification and regression tasks. It's a non-parametric and instance-based learning algorithm, 
meaning it doesn't make assumptions about the underlying data distribution and learns directly from the 
training instances themselves.

Here's how the KNN algorithm works:

1. Store Training Data: KNN stores all available cases and their class labels (for classification) or the output
    values (for regression).
2. Choose the Number of Neighbors (K): K is a hyperparameter that represents the number of nearest neighbors to
    consider when making predictions. It's typically chosen based on experimentation and cross-validation.
3. Calculate Distance: For a given unseen input instance, the algorithm calculates the distance between that
    instance and every instance in the training set. Common distance metrics include Euclidean distance,
    Manhattan distance, and Minkowski distance.
4. Find K Nearest Neighbors: It then identifies the K nearest neighbors based on the calculated distances.
    These are the K instances with the smallest distances to the unseen instance.
5. Majority Vote (Classification) / Mean (Regression): For classification tasks, the algorithm assigns
    the class label that is most common among the K nearest neighbors. In other words,
    it performs a majority vote among the neighbors. For regression tasks, it calculates
    the mean of the output values of the K nearest neighbors.
6. Make Prediction: Finally, it assigns the predicted class label or output value to the
    unseen instance based on the majority vote (classification) or mean (regression)
    calculated in the previous step.
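
The regression variant of these steps can be sketched in a few lines. The helper `knn_regress` and the one-feature dataset are made up for this illustration, and Euclidean distance is assumed as the metric.

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k):
    """Predict a numeric value as the mean target of the k nearest neighbors. Illustrative helper."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to each training point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    return y_train[nearest].mean()                       # average their output values

# Hypothetical one-feature dataset.
X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0]])
y_train = np.array([10.0, 20.0, 30.0, 60.0, 70.0])

x_new = np.array([2.4])
print("k=2:", knn_regress(X_train, y_train, x_new, k=2))
print("k=3:", knn_regress(X_train, y_train, x_new, k=3))
```

For classification, the only change is the final step: the mean is replaced by a majority vote over the neighbors' labels.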

In [None]:
(6)Choosing the optimal value of k in a K-Nearest Neighbors (KNN) model is an important step in building an effective 
predictive model. Here are some common methods to determine the optimal k value: 
    1. Cross-Validation: One of the most common methods is k-fold cross-validation. The number of folds is a
        different quantity from the number of neighbors k, so it is called n here. Split the training data into
        n subsets, train the model on n-1 of them, and validate it on the remaining subset, rotating until each
        subset has served as the validation set. Repeat this for each candidate value of k, and select the one
        that gives the best average performance metric (e.g., accuracy, F1 score).
    2. Grid Search: Use grid search or random search techniques to search through a range of k values and evaluate
        the model's performance on a validation set. This allows you to systematically explore different k values
        and select the one that performs the best.
    3. Elbow Method: Plot the error rate (e.g., 1 - accuracy) of the KNN model for different values of k.
        Look for the point where the error rate stops decreasing significantly with increasing k.
        This point is often referred to as the "elbow point" and can be a good indication of the optimal k value.
    4. Domain Knowledge: Consider the nature of your data and the problem you are trying to solve. 
            Some datasets or problems may have an inherent optimal k value based on the underlying patterns in the data. 
            Domain knowledge can help guide the selection of k. 
    5. Experimentation: Lastly, it may be beneficial to experiment with different values of k and evaluate the
        model's performance on a validation set. This trial-and-error approach can help you gain insights into
        how different k values affect the model's performance.
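
A minimal sketch of the cross-validation approach, using leave-one-out validation (each point is held out once) because the dataset is tiny. The helpers `knn_predict` and `loo_accuracy` are hypothetical names, and the six points are a toy dataset chosen for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k):
    """Majority vote among the k nearest neighbors (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: hold out each point in turn and predict it from the rest."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # every index except the held-out one
        hits += knn_predict(X[mask], y[mask], X[i], k) == y[i]
    return hits / len(y)

# Toy dataset: three points per class.
X = np.array([[0.5, 0.5], [0.5, 1.0], [1.0, 1.0],
              [2.0, 2.5], [2.5, 3.0], [3.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Score each candidate k and keep the best one (ties go to the smaller k).
scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print("Accuracy per k:", scores)
print("Selected k:", best_k)
```

On this dataset k=5 scores worst: once a point is held out, all five remaining points vote, and the held-out point's own class is always in the minority. This is a small-scale picture of why an overly large k hurts.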
                

In [9]:
# (7)
import numpy as np

# Given dataset
X = np.array([[0.5, 0.5],
              [0.5, 1],
              [1, 1],
              [2, 2.5],
              [2.5, 3],
              [3, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Input to predict
new_X = np.array([1.5, 1])

# Function to calculate Euclidean distance between two points
def euclidean_distance(point1, point2):
    return np.sqrt(np.sum((point1 - point2)**2))

# Function to predict class for a given input and k
def predict_class(X_train, y_train, new_X, k):
    distances = [euclidean_distance(new_X, x) for x in X_train]
    nearest_indices = np.argsort(distances)[:k]
    nearest_labels = y_train[nearest_indices]
    unique_labels, counts = np.unique(nearest_labels, return_counts=True)
    majority_label = unique_labels[np.argmax(counts)]
    return majority_label

# Predict class for k=2 and k=3
k_values = [2, 3]
for k in k_values:
    prediction = predict_class(X, y, new_X, k)
    print(f"Prediction for [1.5, 1] with k={k}: Class {prediction}")


Prediction for [1.5, 1] with k=2: Class 0
Prediction for [1.5, 1] with k=3: Class 0


In [None]:
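(8)Yes, if the dataset is imbalanced, the predictions made by the K-Nearest Neighbors (KNN) algorithm
can be biased towards the majority class. This bias occurs because KNN classifies a new instance by a majority
vote among its nearest neighbors. When one class has far more training instances, its points are more likely to
fill the neighborhood and outvote the minority class, especially for larger values of k.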
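
The effect can be reproduced on a small example. The sketch below uses a made-up imbalanced dataset and a hypothetical helper `knn_vote`; distance-weighted voting (weight 1/distance) is shown as one possible mitigation, not the only one.

```python
import numpy as np

def knn_vote(X_train, y_train, x_new, k, weighted=False):
    """KNN classification with plain or distance-weighted voting. Illustrative helper."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    # Each neighbor votes with weight 1, or with 1/distance when weighted=True.
    # The small constant avoids division by zero for an exact match.
    weights = 1.0 / (distances[nearest] + 1e-9) if weighted else np.ones(k)
    classes = np.unique(y_train)
    totals = [weights[y_train[nearest] == c].sum() for c in classes]
    return classes[np.argmax(totals)]

# Hypothetical imbalanced dataset: five points of class 0, two of class 1.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
                    [3.0, 3.0], [3.2, 3.0]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1])
x_new = np.array([2.9, 2.9])  # lies right next to the minority class

print("Plain vote, k=5:   ", knn_vote(X_train, y_train, x_new, k=5))
print("Weighted vote, k=5:", knn_vote(X_train, y_train, x_new, k=5, weighted=True))
```

With k=5 the plain vote picks class 0 even though the query sits beside both class 1 points, because only two minority neighbors exist to vote. Weighting by inverse distance lets the two close neighbors outweigh the three distant ones.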