<a href="https://colab.research.google.com/github/shamiraty/ANOMALIES-DETECTION-AND-TREATMENT/blob/main/SUPERVISED%20MULTIPLE%20REGRESSION%20ANALYSIS%20FOR%20PROJECT%20SUPERVISION.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PREDICTIVE ANALYTICS**
## **SUPERVISED MULTIPLE REGRESSION ANALYSIS FOR PROJECT SUPERVISION**
___
## **Key Objectives:**
- To explore the predictive power of supervised multiple regression analysis in project supervision.
- To evaluate the relationship between family circumstances (number of dependents and wives) and the number of projects a person supervises.
- To provide actionable insights for optimizing project management strategies based on regression analysis.

## **Domains:**
- Educational Research
- Operational Research
- Information Systems
- Project Management
- Statistics
- Healthcare and Medicine
- Finance and Economics

## **FINDINGS**

## **KEY METRICS**

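These are rough rule-of-thumb bands for a correlation coefficient r, which ranges from -1 to 1. They are not strict cut-offs.

### **Negative Correlation:**

- 0: No Correlation
- 0 to -0.4: Weak Negative Correlation
- -0.5: Moderate Negative Correlation
- -0.6 to -0.9: Strong Negative Correlation
- -1: Perfect Negative Correlation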

### **Positive Correlation:**

- 0: No Correlation
- 0 to 0.4: Weak Positive Correlation
- 0.5: Moderate Positive Correlation
- 0.6 to 0.9: Strong Positive Correlation
- 1: Perfect Positive Correlation
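
The bands above can be expressed as a small lookup. This is a minimal sketch. The listed bands leave gaps (for example between 0.4 and 0.5), so the cut-offs used here are assumptions chosen to close them: |r| < 0.5 is weak, 0.5 up to 0.6 is moderate, and 0.6 up to 1 is strong. `describe_correlation` is a hypothetical helper name.

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient r in [-1, 1] to the bands listed above.

    Assumed cut-offs (the listed bands leave gaps between values):
    r == 0 none, 0 < |r| < 0.5 weak, 0.5 <= |r| < 0.6 moderate,
    0.6 <= |r| < 1 strong, |r| == 1 perfect.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"correlation must lie in [-1, 1], got {r}")
    if r == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    strength = abs(r)
    if strength == 1:
        label = "Perfect"
    elif strength >= 0.6:
        label = "Strong"
    elif strength >= 0.5:
        label = "Moderate"
    else:
        label = "Weak"
    return f"{label} {direction} Correlation"


for value in (-0.95, -0.3, 0.0, 0.51, 0.7, 1.0):
    print(f"{value:+.2f}: {describe_correlation(value)}")
```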

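### **Interpretation of the Analysis**

> The coefficient of determination (R-squared) is 0.2612. This means approximately 26.12% of the variance in the number of projects supervised is explained, in-sample, by a linear model using the number of dependents and wives. The adjusted R-squared penalises the two predictors given only 17 observations, and it is lower at 0.1556.

> R-squared is not itself a correlation coefficient. Its square root is the multiple correlation R = √0.2612 ≈ 0.51, which indicates a moderate relationship on the scale above. R is always non-negative, so it does not give a direction. The fitted coefficients do: holding the other predictor fixed, each additional dependent is associated with about 0.71 fewer projects (B1 = -0.7130), and each additional wife with about 0.21 more projects (B2 = 0.2095).

> This suggests that family circumstances (dependents and wives) are associated with the number of projects a person supervises, but they are not the sole determinant. Roughly 74% of the variance is left unexplained, so other factors not included in the model likely play a significant role.

> Therefore, a person's number of wives and dependents may have some bearing on how many projects they supervise, but neither is a strong determinant. The sample is small and includes mean-imputed values, so these results should be treated as exploratory, not as evidence of cause and effect.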
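
The step from R-squared to a correlation-scale value can be checked directly. This is a minimal sketch that assumes the reported R-squared of 0.2612 and the coefficients printed in step 12.

```python
import math

r_squared = 0.2612                  # reported coefficient of determination
multiple_r = math.sqrt(r_squared)   # multiple correlation R, always >= 0

# R says how closely the fitted values track the actual values,
# but not in which direction each predictor moves the outcome.
print(f"Multiple R = {multiple_r:.3f}")

# Direction comes from the individual coefficients instead (values from step 12).
coefficients = {"Dependant": -0.7130, "Wives": 0.2095}
for name, b in coefficients.items():
    direction = "fewer" if b < 0 else "more"
    print(f"Each extra unit of {name} -> about {abs(b):.2f} {direction} projects")
```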

### 1. IMPORT LIBRARIES

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import plotly.express as px
import plotly.graph_objects as go
```

### 2. LOAD DATASET

```python
# Load the data
df = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vRTGib3CNme9UJVor9GXUEgEJ9jz10epYli0ZiJAR9d_t_zMOgpL2OxIWeWkzi09g/pub?output=csv')
```
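
The published sheet is read as CSV, so any blank cell becomes `NaN`, which is what the next step detects. Below is a minimal offline sketch of that behaviour. It uses a small hypothetical CSV string with the same column names; the rows are invented for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical rows with the dataset's column names; two cells are left blank.
csv_text = """Name,Dependant,Wives,Projects
A,1,2,6
B,,3,5
C,4,1,
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.dtypes)          # numeric columns containing blanks are parsed as float64
print(sample.isnull().sum())  # expected counts: Name 0, Dependant 1, Wives 0, Projects 1
```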

### 3. DETECT DATA ANOMALIES (MISSING VALUES)

```python
# Count missing (null) values in each column and plot them as a bar chart
null_counts = df.isnull().sum()
fig = px.bar(
    x=null_counts.index,
    y=null_counts.values,
    labels={'x': 'Columns', 'y': 'Null Count'},
    title='Count of Null Values by Column',
)
fig.update_layout(plot_bgcolor='rgba(0,0,0,0)')
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()

# Cell-by-cell view: True marks a missing value
print(df.isnull())
```


```text
     Name  Dependant  Wives  Projects
0   False      False  False     False
1   False      False  False     False
2   False       True  False      True
3   False      False  False     False
4   False      False  False     False
5   False       True  False     False
6   False      False  False     False
7   False      False   True      True
8   False      False  False     False
9   False       True  False     False
10  False      False  False     False
11  False      False  False     False
12  False      False  False      True
13  False       True  False     False
14  False      False  False     False
15  False      False  False     False
16  False      False  False     False
```
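
In the boolean grid above, the missing cells are:

- Dependant in rows 2, 5, 9 and 13
- Wives in row 7
- Projects in rows 2, 7 and 12

A per-column or per-row count is easier to scan than the full grid. Below is a minimal sketch on a hypothetical placeholder frame that reproduces only that missing-value pattern. Every non-missing value is set to 1.0; the real values are not reproduced.

```python
import numpy as np
import pandas as pd

# Placeholder frame: 17 rows of 1.0, with NaN placed where the
# isnull() output above shows True. The real data values are not used here.
mask_frame = pd.DataFrame(1.0, index=range(17), columns=["Dependant", "Wives", "Projects"])
mask_frame.loc[[2, 5, 9, 13], "Dependant"] = np.nan
mask_frame.loc[[7], "Wives"] = np.nan
mask_frame.loc[[2, 7, 12], "Projects"] = np.nan

# Missing values per column (the same counts the bar chart plots)
print(mask_frame.isnull().sum())

# Row labels of rows containing at least one missing value
print(mask_frame[mask_frame.isnull().any(axis=1)].index.tolist())
```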


### 4. REPLACE MISSING VALUES WITH THE COLUMN MEAN

```python
from sklearn.impute import SimpleImputer

# Initialize the imputer: each missing value is replaced by its column mean
imputer = SimpleImputer(strategy='mean')
# Columns to be imputed
columns_to_impute = ['Dependant', 'Wives', 'Projects']
# Impute missing values in the selected columns
df[columns_to_impute] = imputer.fit_transform(df[columns_to_impute])
# After imputation, list any rows that still contain missing values (expected: none)
print(df[df[columns_to_impute].isnull().any(axis=1)])
print(df.isnull())
```

```text
Empty DataFrame
Columns: [Name, Dependant, Wives, Projects]
Index: []
     Name  Dependant  Wives  Projects
0   False      False  False     False
1   False      False  False     False
2   False      False  False     False
3   False      False  False     False
4   False      False  False     False
5   False      False  False     False
6   False      False  False     False
7   False      False  False     False
8   False      False  False     False
9   False      False  False     False
10  False      False  False     False
11  False      False  False     False
12  False      False  False     False
13  False      False  False     False
14  False      False  False     False
15  False      False  False     False
16  False      False  False     False
```
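
`SimpleImputer(strategy='mean')` replaces each missing cell with the mean of the observed values in that column, so it matches pandas' `fillna` with the column means. The table in step 14 shows two such imputed means for Jones: `Dependant` of 4.230769 and `Projects` of 5.357143. The target column `Projects` is imputed as well, which pulls those rows toward the average and is worth keeping in mind when reading R-squared. Below is a minimal sketch on hypothetical values, not the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not the dataset
sample = pd.DataFrame({
    "Dependant": [1.0, np.nan, 4.0, 3.0],
    "Projects": [6.0, 5.0, np.nan, 7.0],
})

# Column means computed over the observed (non-missing) values only
column_means = sample.mean()            # Dependant: 8/3, Projects: 6.0
imputed = sample.fillna(column_means)   # same result as SimpleImputer(strategy='mean') here

print(imputed)
# Mean imputation leaves each column's mean unchanged
print(imputed.mean())
```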


### 5. FEATURE SELECTION

```python
# Predictors (X1 = number of dependants, X2 = number of wives) and target (Y = projects)
X = df[['Dependant', 'Wives']]
Y = df['Projects']
```

### 6. FIT A LINEAR REGRESSION MODEL

```python
# Ordinary least squares fit of Projects on Dependant and Wives
model = LinearRegression()
model.fit(X, Y)
```
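
`LinearRegression` fits ordinary least squares. It chooses B0, B1 and B2 to minimise the sum of squared errors between `Projects` and B0 + B1·Dependant + B2·Wives. Below is a minimal sketch of the same computation using NumPy's least-squares solver. The data are hypothetical, generated from known coefficients so the result can be checked.

```python
import numpy as np

# Hypothetical predictors (dependants, wives) and an exact target
# built from known coefficients B0 = 2, B1 = 3, B2 = -1.
X = np.array([[1, 2], [3, 3], [4, 1], [2, 5], [0, 1]], dtype=float)
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]

# Prepend a column of ones so the first fitted coefficient is the intercept B0
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

b0, b1, b2 = coef
print(f"B0={b0:.4f}, B1={b1:.4f}, B2={b2:.4f}")
```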

### 7. PREDICTION

```python
# In-sample predictions for the same rows used to fit the model
predictions = model.predict(X)
```

### 8. REGRESSION COEFFICIENTS (B0, B1, B2)

```python
intercept = model.intercept_   # B0
coefficients = model.coef_     # [B1, B2], in the column order of X
```

### 9. CALCULATE R-SQUARED COEFFICIENT OF DETERMINATION

```python
r2 = r2_score(Y, predictions)
```

### 10. CALCULATE ADJUSTED R-SQUARED

```python
n = len(Y)       # number of observations
p = X.shape[1]   # number of predictors
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

### 11. CALCULATE SUM OF SQUARED ERRORS (SSE) AND REGRESSION SUM OF SQUARES (SSR)

```python
sse = np.sum((Y - predictions) ** 2)            # sum of squared errors (residuals)
ssr = np.sum((predictions - np.mean(Y)) ** 2)   # regression (explained) sum of squares
```
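
For reference, the quantities computed in this step and in step 10 are written below with $\hat{y}_i$ for `predictions`, $\bar{y}$ for `np.mean(Y)`, $n$ observations and $p$ predictors. The decomposition $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$ holds for a least-squares fit that includes an intercept, as here.

$$
\begin{aligned}
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \mathrm{SSR} + \mathrm{SSE} \\
R^2 &= \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\end{aligned}
$$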

### 12. DISPLAY REGRESSION COEFFICIENTS

```python
print(f'INTERCEPT (Bo): {intercept:.4f}')
print(f'B1 COEFFICIENT for X1 (number of Dependants): {coefficients[0]:.4f}')
print(f'B2 COEFFICIENT for X2 (number of Wives): {coefficients[1]:.4f}')
```

```text
INTERCEPT (Bo): 7.5749
B1 COEFFICIENT for X1 (number of Dependants): -0.7130
B2 COEFFICIENT for X2 (number of Wives): 0.2095
```
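
The fitted equation is Projects ≈ B0 + B1·Dependant + B2·Wives. As a check, plugging the printed (rounded) coefficients into it reproduces the `Y_predicted` values in step 14 to about three decimal places. `predict_projects` is a hypothetical helper name.

```python
# Rounded coefficients printed above
B0, B1, B2 = 7.5749, -0.7130, 0.2095


def predict_projects(dependants: float, wives: float) -> float:
    """Evaluate the fitted multiple regression equation."""
    return B0 + B1 * dependants + B2 * wives


# Kim: 1 dependant, 2 wives; John: 1 dependant, 1 wife (inputs from step 14)
print(f"Kim:  {predict_projects(1, 2):.4f}")   # table shows 7.280923
print(f"John: {predict_projects(1, 1):.4f}")   # table shows 7.071438
```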


### 13. DISPLAY MEASURES OF VARIATION

```python
print(f'R-SQUARED (Coefficient of Determination): {r2:.4f}')
print(f'ADJUSTED R-SQUARED: {adjusted_r2:.4f}')
print(f'SUM SQUARED ERROR (SSE): {sse:.4f}')
```

```text
R-SQUARED (Coefficient of Determination): 0.2612
ADJUSTED R-SQUARED: 0.1556
SUM SQUARED ERROR (SSE): 92.5096
```
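
R-squared and adjusted R-squared can be recomputed from the two sums of squares alone. This minimal check uses the SSE and SSR values shown in the step 14 table, with n = 17 rows and p = 2 predictors.

```python
sse = 92.509612   # sum of squared errors (step 14 table)
ssr = 32.704674   # regression sum of squares (step 14 table)
n, p = 17, 2      # observations, predictors

sst = sse + ssr   # total sum of squares (valid for least squares with an intercept)
r2 = ssr / sst    # equivalently 1 - sse / sst
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"SST = {sst:.4f}")
print(f"R-squared = {r2:.4f}")
print(f"Adjusted R-squared = {adjusted_r2:.4f}")
```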


### 14. CREATE PREDICTION TABLE DATAFRAME

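```python
# Prediction table. Note: sse and ssr are single totals for the whole model,
# so the 'SSE' and 'SSR' columns repeat the same value on every row.
result_df = pd.DataFrame({'Name': df['Name'],
                          'No of Dependant': df['Dependant'],
                          'No of Wives': df['Wives'],
                          'Projects Actual Y': Y,
                          'Y_predicted': predictions,
                          'SSE': sse,
                          'SSR': ssr})
display(result_df.head())
```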

|   | Name | No of Dependant | No of Wives | Projects Actual Y | Y_predicted | SSE | SSR |
|---|---|---|---|---|---|---|---|
| 0 | Kim | 1.0 | 2.0 | 6.0 | 7.280923 | 92.509612 | 32.704674 |
| 1 | Sameer | 3.0 | 3.0 | 5.0 | 6.064449 | 92.509612 | 32.704674 |
| 2 | Jones | 4.230769 | 4.0 | 5.357143 | 5.396421 | 92.509612 | 32.704674 |
| 3 | John | 1.0 | 1.0 | 11.0 | 7.071438 | 92.509612 | 32.704674 |
| 4 | Hafsa | 2.0 | 7.0 | 4.0 | 7.615371 | 92.509612 | 32.704674 |


### 15. RESIDUALS (ACTUAL MINUS PREDICTED)

```python
# Residual = actual minus predicted number of projects
residuals = Y - predictions
residuals_df = pd.DataFrame({'Actual': Y, 'Predicted': predictions, 'Residuals': residuals})
display(residuals_df.head())
```

|   | Actual | Predicted | Residuals |
|---|---|---|---|
| 0 | 6.0 | 7.280923 | -1.280923 |
| 1 | 5.0 | 6.064449 | -1.064449 |
| 2 | 5.357143 | 5.396421 | -0.039279 |
| 3 | 11.0 | 7.071438 | 3.928562 |
| 4 | 4.0 | 7.615371 | -3.615371 |
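
Each residual is `Actual - Predicted`. Squaring the residuals and summing them over all 17 rows gives the SSE of 92.5096. This minimal sketch uses only the five rows shown above, so its sum is a partial SSE, not the full one.

```python
import numpy as np

# First five rows of the residuals table above
actual = np.array([6.0, 5.0, 5.357143, 11.0, 4.0])
predicted = np.array([7.280923, 6.064449, 5.396421, 7.071438, 7.615371])

residuals = actual - predicted
squared_errors = residuals ** 2   # each row's contribution to SSE

print(np.round(residuals, 6))
print(np.round(squared_errors, 6))
print(f"Partial SSE over these 5 rows: {squared_errors.sum():.4f}")
```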


### 16. ACTUAL VS PREDICTED SCATTER PLOT

```python
# Each point is one person: actual projects (x) against predicted projects (y).
# The dashed line is y = x; points on it would be predicted perfectly.
scatter_fig = go.Figure()
scatter_fig.add_trace(go.Scatter(x=Y, y=predictions, mode='markers', name='Predicted'))
scatter_fig.add_trace(go.Scatter(x=[min(Y), max(Y)], y=[min(Y), max(Y)], mode='lines',
                                 name='Perfect Prediction (y = x)', line=dict(color='red', dash='dash')))
scatter_fig.update_layout(title='Actual vs Predicted Projects', xaxis_title='Actual Y | number of Projects', yaxis_title='Predicted Y')
scatter_fig.update_layout(plot_bgcolor='rgba(0,0,0,0)')
scatter_fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
scatter_fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgray')
scatter_fig.show()
```

### 17. PROJECTS PER OBSERVATION

```python
# Projects supervised per row, coloured by the project count.
# 'Projects' is numeric, so Plotly applies a continuous colour scale.
scatter_fig = px.scatter(df, x=df.index, y='Projects', color='Projects')

scatter_fig.update_layout(
    showlegend=True,
    plot_bgcolor='rgba(0,0,0,0)',
    xaxis=dict(showgrid=True, gridwidth=1, gridcolor='skyblue', title='Row index'),
    yaxis=dict(showgrid=True, gridwidth=1, gridcolor='skyblue'),
)
scatter_fig.show()
```

# 🔆 Contact Information

> WhatsApp
- +255675839840
- +255656848274

> YouTube
[YouTube Channel](https://www.youtube.com/channel/UCjepDdFYKzVHFiOhsiVVffQ)

> Telegram
- +255656848274
- +255738144353
  
> PlayStore
[PlayStore Developer Page](https://play.google.com/store/apps/dev?id=7334720987169992827&hl=en_US&pli=1)

> GitHub
[GitHub Profile](https://github.com/shamiraty/)