# Unique Segment Risk Score & Action Tiers for Each Individual Client

## Mechanics

Instead of the group averages used so far (Avg dp1, Avg dp3, Avg Arrears), this approach scores each client directly from the values in their own profile:


| Field                            | Value Used                            |
|----------------------------------|---------------------------------------|
| Client's own Arrears Balance     | How much money they owe               |
| Client's own dp1 Score           | Their personal credit score           |
| Client's own dp3 Score           | Their personal alternate credit score |
| Client's own Mobile Availability | Y/N                                   |
| Client's own Email Availability  | Y/N                                   |

## Formula

    Segment_Risk_Score =
          (Arrears Balance / 100)
        + (100 - dp1 Score)
        + (100 - (dp3 Score / 10))
        + ((1 - Mobile Availability) * 100 / 2)
        + ((1 - Email Availability) * 100 / 3)

Mobile Availability and Email Availability are binary flags: 1 if the contact channel is available (Y), 0 if it is not (N).

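- If Mobile Availability is 1 (available), then (1 - 1) = 0 -> no penalty.
- If Mobile Availability is 0 (not available), then (1 - 0) = 1 -> the full penalty of 100 / 2 = 50 is added.
- The email term works the same way, with a maximum penalty of 100 / 3 ≈ 33.33.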
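
As a quick check of the formula, the sketch below applies it to one client from the sample output later in this notebook (UID 920546839: arrears 534, dp1 76, dp3 855, mobile Y, email N). The function name `segment_risk_score` and its argument names are illustrative and not part of the original pipeline.

```python
def segment_risk_score(arrears, dp1, dp3, mobile_available, email_available):
    """Per-client Segment Risk Score; flags are 1 (available) or 0 (not available)."""
    return round(
        arrears / 100
        + (100 - dp1)
        + (100 - dp3 / 10)
        + (1 - mobile_available) * 100 / 2
        + (1 - email_available) * 100 / 3,
        2,
    )


# 5.34 + 24 + 14.5 + 0 + 33.33...
score = segment_risk_score(arrears=534, dp1=76, dp3=855,
                           mobile_available=1, email_available=0)
print(score)  # 77.17
```

The result matches the `Segment_Risk_Score` of 77.17 shown for that client in the sample table.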

## Good Things


- **More precise:** Every client gets a risk score based on their own values rather than a group average.
- **Better targeting:** No "group dilution"; high-risk clients are not hidden inside otherwise good groups.
- **Ready for machine learning:** If we build ML models in the future, they can learn from individual-level signals.

## Challenges 

- **Missing values:** Some clients might have missing dp1, dp3, arrears, or mobile/email info. These need careful handling.
- **Noise:** Sparse or extreme values in some fields can produce outlier scores.
- **Complexity:** Harder to explain to non-technical business stakeholders ("Why is this client's risk 135.2 and that one's 118.4?").
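
One way to make the missing-value issue visible before scoring is to count the gaps per field and flag affected clients instead of dropping them silently. The sketch below uses a small made-up DataFrame. The column name `Has_Missing_Input` is hypothetical, and whether such clients should be dropped, imputed, or routed to manual review is a decision this notebook does not settle.

```python
import pandas as pd

# Toy data for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "Arrears Balance": [120.0, None, 950.0],
    "dp1 Score": [64.0, 71.0, None],
    "dp3 Score": [810.0, 640.0, 702.0],
})

score_inputs = ["Arrears Balance", "dp1 Score", "dp3 Score"]

# How many clients are missing each scoring input.
missing_counts = df[score_inputs].isna().sum()

# Flag clients with at least one missing input so they can be reviewed separately.
df["Has_Missing_Input"] = df[score_inputs].isna().any(axis=1)

print(missing_counts)
print(df["Has_Missing_Input"].tolist())  # [False, True, True]
```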

In [3]:
import pandas as pd


df = pd.read_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Modified/cleaned_connected_data_with_zones.csv")

# 1. Clean mobile and email flags
df['Mobile Flag'] = df['Mobile Flag'].str.strip().str.upper()
df['Email Flag'] = df['Email Flag'].str.strip().str.upper()

# 2. Create binary flags
df['Mobile_Available'] = df['Mobile Flag'].apply(lambda x: 1 if x == 'Y' else 0)
df['Email_Available'] = df['Email Flag'].apply(lambda x: 1 if x == 'Y' else 0)

# 3. Drop rows with missing important fields.
#    .copy() makes df_clean an independent DataFrame, so adding a column below
#    does not raise pandas' SettingWithCopyWarning.
important_fields = ['Arrears Balance', 'dp1 Score', 'dp3 Score']
df_clean = df.dropna(subset=important_fields).copy()

# 4. Calculate Individual Segment Risk Score
df_clean['Segment_Risk_Score'] = (
    (df_clean['Arrears Balance'] / 100)
    + (100 - df_clean['dp1 Score'])
    + (100 - (df_clean['dp3 Score'] / 10))
    + ((1 - df_clean['Mobile_Available']) * 100 / 2)
    + ((1 - df_clean['Email_Available']) * 100 / 3)
).round(2)

# 5. Save
df_clean.to_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Unique_Scores_for_Client/clients_with_individual_segment_score.csv", index=False)

print("\n Successfully saved file with individual client risk scores")



 Successfully saved file with individual client risk scores


# Assign Action Tiers to Individual Clients based on their Segment Risk Score

In [None]:
import pandas as pd

# Step 1: Load the individual client score file
df = pd.read_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Unique_Scores_for_Client/clients_with_individual_segment_score.csv")

# Step 2: Define a function to assign Action Tier based on Segment_Risk_Score 
def assign_action_tier(score):
    if score < 70:
        return 'Tier 1 - Low Risk'
    elif score < 100:
        return 'Tier 2 - Medium Risk'
    else:
        return 'Tier 3 - High Risk'

# Step 3: Apply the function to each client
df['Action_Tier'] = df['Segment_Risk_Score'].apply(assign_action_tier)

# Step 4: Save the updated file
df.to_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Unique_Scores_for_Client/clients_with_individual_segment_score_with_tiers.csv", index=False)

print("\n Successfully assigned Action Tiers and saved new file!")
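
The same thresholds can also be applied without a row-wise `apply`. The sketch below uses `pd.cut` with `right=False`, so each bin includes its lower edge and excludes its upper edge, matching the `< 70` and `< 100` comparisons in `assign_action_tier`. The scores here are toy values chosen to exercise the boundaries.

```python
import numpy as np
import pandas as pd

scores = pd.Series([65.12, 70.0, 99.99, 100.0, 276.52], name="Segment_Risk_Score")

tiers = pd.cut(
    scores,
    bins=[-np.inf, 70, 100, np.inf],
    labels=["Tier 1 - Low Risk", "Tier 2 - Medium Risk", "Tier 3 - High Risk"],
    right=False,  # intervals are [low, high)
)
print(tiers.tolist())
```

One behavioural difference to keep in mind: a missing score falls through to 'Tier 3 - High Risk' in `assign_action_tier`, whereas `pd.cut` leaves it as NaN.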


In [None]:
import pandas as pd

# Step 1: Load the dataset with scores and tiers
df = pd.read_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Unique_Scores_for_Client/clients_with_individual_segment_score_with_tiers.csv")

# Step 2: Sort clients by Segment Risk Score in ascending order
df_sorted = df.sort_values(by='Segment_Risk_Score', ascending=True)

# Step 3: Save the sorted file
df_sorted.to_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Unique_Scores_for_Client/clients_sorted_by_risk.csv", index=False)

print("\n Successfully sorted clients by risk and saved to 'clients_sorted_by_risk.csv'!")


##  PPT 

In [1]:
import pandas as pd

df = pd.read_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/Dataset/Unique_Scores_for_Client/clients_with_individual_segment_score_with_tiers.csv")


print(df.head())


         UID Account Open Date  Arrears Balance Last Payment Date  \
0  320509935        1996-07-24             16.0        2025-03-19   
1  920546839        2007-01-10            534.0        2025-03-02   
2  857544841        2005-07-09           2821.0               NaN   
3  741235324        2013-12-04            309.0        2025-03-28   
4  153693439        2016-10-17           3879.0               NaN   

  Last Collections Action Mobile Flag Email Flag dp1 Result Code  \
0                     SMS           Y          N        Deceased   
1                     SMS           Y          N        Resident   
2                   Phone           N          N        Resident   
3                     SMS           Y          N        Resident   
4                   Phone           N          N        Resident   

  dp1 IVA/CCJ Flag  dp1 Score  ... dp3 Match Level dp3 Score dp3 Mobile Flag  \
0                N        0.0  ...             7.0     963.0               Y   
1               

In [2]:
df.head().to_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/sample_client_scores.csv", index=False)


In [4]:
import pandas as pd

df = pd.read_csv("/Users/rg/ACADEMICS/Interview/Connected Data Comapany/MAY/sample_client_scores.csv")


print(df.head())


         UID  dp1 Score  dp3 Score  Arrears Balance  \
0  320509935        0.0      963.0             16.0   
1  920546839       76.0      855.0            534.0   
2  857544841       39.0      665.0           2821.0   
3  741235324       82.0      893.0            309.0   
4  153693439       16.0      296.0           3879.0   

                    Residency Zone  Segment_Risk_Score           Action_Tier  
0   Zone 4 - Neither says Resident              137.19    Tier 3 - High Risk  
1  Zone 2 - Only dp1 says Resident               77.17  Tier 2 - Medium Risk  
2               Zone 1 - Agreement              206.04    Tier 3 - High Risk  
3               Zone 1 - Agreement               65.12     Tier 1 - Low Risk  
4               Zone 1 - Agreement              276.52    Tier 3 - High Risk  
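
To help explain a score such as the 137.19 for the first client above, whose arrears are only 16, it can help to list each term's contribution. This is a sketch: `score_breakdown` is a hypothetical helper, and the flags (mobile Y, email N) are read from the earlier `df.head()` output.

```python
def score_breakdown(arrears, dp1, dp3, mobile_available, email_available):
    """Return each term of the Segment Risk Score separately (flags are 1 or 0)."""
    return {
        "arrears": arrears / 100,
        "dp1": 100 - dp1,
        "dp3": 100 - dp3 / 10,
        "no_mobile": (1 - mobile_available) * 100 / 2,
        "no_email": (1 - email_available) * 100 / 3,
    }


parts = score_breakdown(arrears=16, dp1=0, dp3=963,
                        mobile_available=1, email_available=0)
for name, value in parts.items():
    print(f"{name:>10}: {value:7.2f}")
print(f"{'total':>10}: {sum(parts.values()):7.2f}")
```

For this client the dp1 term alone contributes 100 points, which is enough to place them in Tier 3 regardless of the small arrears balance.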


### 🎯 Segment Risk Score & Action Tier Assignment

We engineered a custom **Segment Risk Score** using the following weighted formula:

\[
\text{Segment Risk Score} = \frac{\text{Arrears Balance}}{100} + (100 - \text{dp1 Score}) + \left(100 - \frac{\text{dp3 Score}}{10}\right) + \frac{(1 - \text{Mobile Available}) \times 100}{2} + \frac{(1 - \text{Email Available}) \times 100}{3}
\]

Each client is then classified into an **Action Tier** based on their score:

| Tier     | Score Range | Suggested Strategy             |
|----------|-------------|--------------------------------|
| Tier 1   | < 70        | Light-touch (email/SMS)        |
| Tier 2   | 70 to <100  | Moderate effort (calls)        |
| Tier 3   | ≥ 100       | Escalated / high effort        |

🧩 These two new dimensions — `Segment Risk Score` and `Action Tier` — were added to the original dataset.  
This allows for **targeted debt recovery strategies** per client based on quantified risk.
