
**Consider an online platform that suggests connections between users based on a set of features. In this scenario, we can use logistic regression to build a model that predicts whether a user is likely to accept a connection request from another user.**

## Generate synthetic data

In [48]:
# import packages
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

In [49]:
# Generate synthetic data
np.random.seed(42)
n_samples=1000

In [50]:
# Features
interest=np.random.randint(0,10,n_samples)
mutual_connections=np.random.randint(0,20,n_samples)
activity_level=np.random.uniform(0,100,n_samples)
profile_completeness=np.random.uniform(0,1,n_samples)
distance=np.random.uniform(0,100,n_samples)

In [51]:
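# Target variable: 1 for accepted, 0 for not accepted.
# Note: the labels are drawn uniformly at random, independently of the features,
# so no model can be expected to do better than chance on this data.
acceptance=np.random.choice([0,1],n_samples)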
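
Because the target above is generated without reference to the features, there is no signal for logistic regression to find. The following is a minimal sketch of one alternative, in which acceptance depends on two of the features through a logistic function. The weights (`0.3`, `-0.03`), the centering constants, and the names `p_accept` and `acceptance_dependent` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000

# Two features drawn on the same ranges as in the notebook
mutual_connections = rng.integers(0, 20, n_samples)
distance = rng.uniform(0, 100, n_samples)

# Hypothetical weights (assumptions): more mutual connections raise the
# odds of acceptance, greater distance lowers them.
logit = 0.3 * (mutual_connections - 10) - 0.03 * (distance - 50)
p_accept = 1.0 / (1.0 + np.exp(-logit))

# Draw each label from a Bernoulli distribution with its own probability
acceptance_dependent = rng.binomial(1, p_accept)

print("Share of accepted requests:", acceptance_dependent.mean())
```

With a target built this way, a model trained on the features has a relationship to recover, so its test accuracy becomes a meaningful number.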

In [52]:
# Create a DataFrame
data=pd.DataFrame(
    {
        'Interest':interest,
        'Mutual Connections':mutual_connections,
        'Activity Level':activity_level,
        'Profile Completeness':profile_completeness,
        'Distance':distance,
        'Acceptance':acceptance,
    }
)
# Display the DataFrame
data

| | Interest | Mutual Connections | Activity Level | Profile Completeness | Distance | Acceptance |
|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 10.410965 | 0.520170 | 57.931447 | 1 |
| 1 | 3 | 14 | 72.433882 | 0.142876 | 48.962446 | 0 |
| 2 | 7 | 7 | 57.838692 | 0.775346 | 64.474525 | 1 |
| 3 | 4 | 10 | 27.416067 | 0.271409 | 22.982673 | 0 |
| 4 | 6 | 11 | 7.941937 | 0.496695 | 55.266227 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | 9 | 8 | 24.193168 | 0.126002 | 67.170886 | 0 |
| 996 | 9 | 16 | 71.395263 | 0.196419 | 72.930980 | 1 |
| 997 | 7 | 10 | 82.253479 | 0.951445 | 57.506057 | 1 |
| 998 | 1 | 6 | 80.395851 | 0.175492 | 20.681789 | 1 |


In [53]:
# Split the data into training and testing sets

# FEATURES: every column except 'Acceptance' (the independent variables)
X=data.drop('Acceptance',axis=1)

# TARGET: the 'Acceptance' column (1 for accepted, 0 for not accepted)
y=data['Acceptance']

# test_size=0.2 holds out 20% of the rows for testing and keeps 80% for training.
# random_state=42 seeds the shuffle, so rerunning the code reproduces the same split,
# which keeps model evaluation consistent across runs.
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)
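
As a quick check of what `test_size` and `random_state` do, here is a small sketch on a stand-in frame. The `demo` data and all variable names in it are hypothetical. It also shows the optional `stratify` argument, which the notebook does not use, as one way to keep the class ratio the same in both parts of the split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Small stand-in frame (hypothetical data): 100 rows, balanced target
demo = pd.DataFrame({
    'Interest': np.arange(100) % 10,
    'Acceptance': np.arange(100) % 2,
})
X_demo = demo.drop('Acceptance', axis=1)
y_demo = demo['Acceptance']

# The same seed gives the same split on every call
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

print(X_tr.shape, X_te.shape)        # (80, 1) (20, 1)
print(X_te.index.equals(X_te2.index))  # True

# stratify=y_demo preserves the 50/50 class ratio in the test set
_, _, _, y_te_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(y_te_strat.value_counts().to_dict())
```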

## Build and train the logistic regression model

In [54]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,classification_report

In [55]:
# Create a logistic regression model
model=LogisticRegression()

In [56]:
# Train the model
model.fit(X_train, y_train)
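
The features here sit on very different scales (`Profile Completeness` in 0 to 1, `Distance` in 0 to 100), and `LogisticRegression` applies L2 regularization by default, which is sensitive to feature scale. One common option, shown below as a sketch and not as the notebook's method, is to standardize the features inside a pipeline. The demo data and the name `scaled_model` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical two-feature data on mismatched scales
X_demo = np.column_stack([
    rng.uniform(0, 1, 200),    # e.g. a 0-1 feature
    rng.uniform(0, 100, 200),  # e.g. a 0-100 feature
])
y_demo = rng.integers(0, 2, 200)

# StandardScaler is fit on the training data only, then applied
# automatically whenever the pipeline predicts.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_demo, y_demo)

# Coefficients are now expressed per standard deviation of each feature
coefs = scaled_model.named_steps['logisticregression'].coef_
print(coefs.shape)  # (1, 2)
```

Keeping the scaler inside the pipeline means new rows passed to `predict` or `predict_proba` are scaled the same way as the training rows.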

## Make predictions and evaluate the model

In [57]:
# Make predictions on the test set
y_pred=model.predict(X_test)
y_pred


array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1,
       0, 1])

In [58]:
# Evaluate the model
accuracy=accuracy_score(y_test,y_pred)
classification_rep=classification_report(y_test,y_pred)

In [59]:
# Print the results
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_rep)

Accuracy: 0.51
Classification Report:
               precision    recall  f1-score   support

           0       0.46      0.18      0.26        95
           1       0.52      0.81      0.63       105

    accuracy                           0.51       200
   macro avg       0.49      0.49      0.45       200
weighted avg       0.49      0.51      0.46       200
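
To put the 0.51 accuracy in context, it helps to compare it with a trivial baseline. The support column above gives 95 negatives and 105 positives in the test set, so a rule that always predicts "accepted" would already score 105/200. The sketch below computes that baseline; the array names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Test-set class counts taken from the support column of the report
y_true = np.array([0] * 95 + [1] * 105)

# Baseline: predict "accepted" (the majority class) for every request
y_always_accept = np.ones_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_always_accept)
print(baseline_accuracy)  # 0.525
```

The model's accuracy is slightly below this baseline, which is what one would expect when the target was generated at random.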



## Use the model for user suggestions

In [60]:
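# Generate synthetic data for new users
# Note: 'Activity Level' is given here on a 0-1 scale, while the training data
# used 0-100, and an 'Interest' of 10 falls outside the training range of 0-9.
new_users_data = {
    'Interest': [8, 3, 6, 1, 10],
    'Mutual Connections': [15, 5, 8, 3, 18],
    'Activity Level': [0.7, 0.4, 0.6, 0.2, 0.9],
    'Profile Completeness': [0.8, 0.5, 0.7, 0.3, 0.9],
    'Distance': [50, 20, 30, 5, 80],
}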

new_users_df = pd.DataFrame(new_users_data)

In [61]:
# Predicted probability of acceptance (class 1) from the trained model
new_users_predictions = model.predict_proba(new_users_df)[:, 1]

# Set a threshold for connection suggestion
suggestion_threshold = 0.5
suggested_connections = new_users_df[new_users_predictions > suggestion_threshold]

# Display suggested connections
print("Suggested Connections:")
print(suggested_connections)

Suggested Connections:
   Interest  Mutual Connections  Activity Level  Profile Completeness  \
4        10                  18             0.9                   0.9   

   Distance  
4        80  
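
The suggestion threshold controls how selective the recommendations are: raising it can only shrink the set of suggested users. The sketch below illustrates this on hypothetical data in which the label does depend on a feature. The data, the name `clf`, and the thresholds other than 0.5 are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical data: the label depends on the first feature plus noise
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Column 1 of predict_proba holds the probability of class 1 (accepted)
proba = clf.predict_proba(X_demo)[:, 1]

# Count how many users would be suggested at each threshold
thresholds = [0.3, 0.5, 0.7]
counts = {t: int((proba > t).sum()) for t in thresholds}
for t in thresholds:
    print(f"threshold={t}: {counts[t]} suggestions")
```

A lower threshold favors recall (more suggestions, more of them rejected), while a higher one favors precision; which trade-off suits the platform is a product decision and not something the model decides.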
