<a href="https://colab.research.google.com/github/olumideadekunle/-AI-Development-Workflow/blob/main/Jumia_Retail_Churn_Prediction_Lifecycle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Designing a Retail Churn Prediction Lifecycle for Jumia

- **Introduction:** Define churn and why it matters for Jumia.
- **Data Science Lifecycle Steps (CRISP-DM or OSEMN):**
  1. **Business Understanding:** Define churn, objectives, KPIs.
  2. **Data Understanding:** What kind of data Jumia might have (purchase history, logins, complaints, delivery issues, etc.).
  3. **Data Preparation:** Cleaning, feature engineering (frequency, recency, monetary value, customer complaints, etc.).
  4. **Modeling:** Logistic regression, Random Forest, XGBoost.
  5. **Evaluation:** Precision, recall, F1 (since false negatives are costly).
  6. **Deployment:** Integrate into CRM for retention campaigns.
  7. **Monitoring:** Track model drift, update quarterly.
- **Recommendations:** Loyalty programs, personalized offers, feedback loops.
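
The recency, frequency, and monetary features named in the Data Preparation step can be sketched as follows. This is a minimal illustration, not Jumia's actual pipeline: the `orders` table, its column names, the snapshot date, and the 90-day inactivity rule used as a churn flag are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical order log; column names and values are illustrative, not Jumia data.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-20", "2023-11-02",
        "2024-02-14", "2024-03-01", "2024-03-28",
    ]),
    "amount": [50.0, 30.0, 120.0, 20.0, 25.0, 40.0],
})

snapshot = pd.Timestamp("2024-04-01")  # date the features are computed "as of"
churn_window_days = 90                 # assumed rule: no order in the last 90 days

# One row per customer: last order date, number of orders, total spend.
rfm = orders.groupby("customer_id").agg(
    last_order=("order_date", "max"),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Recency = days between the snapshot and the customer's last order.
rfm["recency_days"] = (snapshot - rfm["last_order"]).dt.days
rfm["churned"] = rfm["recency_days"] > churn_window_days
rfm = rfm.drop(columns="last_order")
print(rfm)
```

Here the churn flag is derived from recency only to keep the sketch short. When training a model, the label would normally come from a later observation window than the features, otherwise recency alone predicts it perfectly.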

Final touch: end the write-up with a Summary section.
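
The Evaluation step in the outline above prefers precision, recall, and F1 because missed churners (false negatives) are costly. The sketch below computes the three metrics by hand on made-up labels, so the effect of false negatives on recall is visible. The helper name `precision_recall_f1` and the label vectors are illustrative assumptions; in the notebook itself, scikit-learn's metric functions would be the natural choice.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # loyal customers flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for ten customers: four actually churned, the model catches two.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

On these labels accuracy is 0.70 while recall is only 0.50: half of the churners are missed, which accuracy alone does not reveal.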

## Business Funding Data (Hands-On with Colab)

This part is hands-on: the Business Funding dataset is explored and preprocessed in Python.

In [1]:
from google.colab import files
uploaded = files.upload()


Saving Business Funding Data.csv to Business Funding Data.csv


In [3]:
import pandas as pd

csv_path = "Business Funding Data.csv"
try:
    df = pd.read_csv(csv_path, encoding="utf-8")
except UnicodeDecodeError:
    # latin-1 maps every byte to a character, so this fallback cannot raise
    # UnicodeDecodeError (non-Latin text may come out garbled, though).
    df = pd.read_csv(csv_path, encoding="latin-1")

df.head()

|  | Website Domain | Effective date | Found At | Financing Type | Financing Type Normalized | Categories | Investors | Investors Count | Amount | Amount Normalized | Source Urls |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | trafigura.com |  | 2024-03-14T01:00:00+01:00 |  |  | [] |  |  | $1.9b | 1900000000 | https://www.tradefinanceglobal.com/posts/trafi... |
| 1 | zenobe.com |  | 2024-05-31T02:00:00+02:00 |  |  | [] | avivainvestors.com, lloydsbankinggroup.com, sa... | 9.0 | $522.7 million | 522700000 | https://realassets.ipe.com/news/aviva-among-le... |
| 2 | zenobe.com |  | 2024-07-24T02:00:00+02:00 |  |  | ["private_equity"] |  |  | £41.7m | 53671000 | https://www.innovationnewsnetwork.com/zenobe-a... |
| 3 | canva.com |  | 2024-05-01T02:00:00+02:00 |  |  | [] | stackcapitalgroup.com | 1.0 | US$8 million | 8000000 | https://www.globenewswire.com/news-release/202... |
| 4 | fidelity.com |  | 2024-04-11T02:00:00+02:00 |  |  | [] | chevychasetrust.com | 1.0 | $1.96 million | 1960000 | https://www.defenseworld.net/2024/04/11/chevy-... |


In [4]:
from google.colab import sheets
sheet = sheets.InteractiveSheet(df=df)

https://docs.google.com/spreadsheets/d/1ijTMTnJBJ4Eo6EHWrgbOD2zpO0X3_ewayIJ1npU3pYE/edit#gid=0


In [5]:
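df.info()               # prints dtypes and non-null counts per column
df.describe()           # summary statistics for the numeric columns
df.isnull().sum()       # missing values per column
df.duplicated().sum()   # number of fully duplicated rows
# A notebook cell displays only its last expression; wrap the calls above in display() or print() to see every result.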


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 11 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     object 
 1   Effective date             6 non-null      object 
 2   Found At                   26 non-null     object 
 3   Financing Type             8 non-null      object 
 4   Financing Type Normalized  8 non-null      object 
 5   Categories                 26 non-null     object 
 6   Investors                  13 non-null     object 
 7   Investors Count            13 non-null     float64
 8   Amount                     26 non-null     object 
 9   Amount Normalized          26 non-null     int64  
 10  Source Urls                26 non-null     object 
dtypes: float64(1), int64(1), object(9)
memory usage: 2.4+ KB


np.int64(0)

In [7]:
text_cols = df.select_dtypes(include="object").columns
df[text_cols] = df[text_cols].fillna("Unknown")   # text columns only, so numeric columns keep their dtype
# Assign the result back instead of calling fillna(..., inplace=True) on a column selection, which pandas 3.0 will not support
df['Amount Normalized'] = df['Amount Normalized'].fillna(df['Amount Normalized'].median())


In [8]:
df.drop_duplicates(inplace=True)


In [10]:
# Normalise domains: trim whitespace and lower-case (domain names are case-insensitive)
df['Website Domain'] = df['Website Domain'].str.strip().str.lower()


In [12]:
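# One-hot encode the categorical columns; drop_first=True drops one level per column to avoid redundant indicators
df = pd.get_dummies(df, columns=['Financing Type', 'Categories'], drop_first=True)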

In [14]:
from sklearn.preprocessing import StandardScaler

# Standardise the amount to zero mean and unit variance
scaler = StandardScaler()
df[['Amount Normalized']] = scaler.fit_transform(df[['Amount Normalized']])

In [15]:
df.to_csv("Cleaned_Business_Funding_Data.csv", index=False)
files.download("Cleaned_Business_Funding_Data.csv")



## Answer the Questions (in Markdown/Text cells)

- Observations from data exploration
- Steps taken to clean/preprocess
- Justifications for each step
- Reflection on importance of preprocessing
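
To ground the observations and justifications listed above, it can help to tabulate data quality before and after cleaning. The helper below is a minimal sketch: `quality_report` is a hypothetical name, and the `sample` frame is made up for illustration (it only borrows two column names from the dataset).

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count, and missing-value percentage."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(1),
    })


# Tiny made-up frame standing in for the real data.
sample = pd.DataFrame({
    "Investors": ["a.com", None, None, "b.com"],
    "Amount Normalized": [100, 200, 300, 400],
})

report = quality_report(sample)
duplicate_rows = int(sample.duplicated().sum())

print(report)
print("duplicate rows:", duplicate_rows)
```

Running the same helper on `df` before and after the cleaning cells gives a compact before/after comparison to cite in the written answers.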

At the end, I have two notebooks:

- Jumia_Retail_Churn_Prediction_Lifecycle.ipynb (theory write-up)
- Business_Funding_Data_Preprocessing.ipynb (hands-on, with the cleaned dataset)