# Predictive Modeling: Donor Retention in Non-Profit Organizations

## Introduction
Non-Profit Organizations (NPOs) play a vital role in social, educational, and economic development, but they rely heavily on fundraising. This makes understanding donor behavior critical: one of the major challenges NPOs face is maintaining donor engagement and loyalty over time.


This project aims to tackle this issue by analyzing donor behavior to identify both loyal and at-risk donors. The insights gained will support NPOs in designing more effective fundraising and communication strategies, ultimately enhancing their ability to sustain and grow their initiatives.

To achieve this, we use the [Donor Data](https://www.kaggle.com/datasets/maheshpandey87/donor-data/data)
from Kaggle, which comprises six tables detailing donors, donations, campaigns, project results, and engagement activities. These data provide a comprehensive view of donor interactions and contributions, forming the basis for our analysis.

## Methodology
#### **Exploratory Data Analysis**
- Data preparation
- Business understanding
- Data understanding
#### **Major questions**
These questions deepen our grasp of the business and data context, steering the analysis toward the NPO’s core concerns and enabling a thorough exploration of donor dynamics.

This part is crucial, as it serves as a compass that guides our analysis and helps address stakeholders' concerns or uncover relevant information.
#### **Modeling**
#### **Model Evaluation**
### **Recommendations**
### **References**
#### **Contact information**
The contact information of the two data scientists who worked on this project is provided.
The scientists collaborated equally on this project.
- name
- email
- phone number
- LinkedIn


## EDA – Exploratory Data Analysis

### Data preparation

**Importing libraries and loading the data**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load each table of the donor dataset
donors = pd.read_csv('donors.csv')
campaigns = pd.read_csv('campaigns.csv')
donations = pd.read_csv('donations_linked.csv')
engagement_history = pd.read_csv('engagement_history.csv')
engagement_outcomes = pd.read_csv('engagement_outcomes.csv')
impact = pd.read_csv('impact.csv')

**Merging all Tables**

In [2]:
# Chain left merges: donors -> donations -> campaigns -> engagement history -> impact
Donors_donations = pd.merge(donors, donations, on='DonorID', how='left')
Donors_donations_Campaigns = pd.merge(Donors_donations, campaigns, on='CampaignID', how='left')
Donors_donations_Campaigns_engagement_history = pd.merge(Donors_donations_Campaigns, engagement_history, on='DonorID', how='left')
Donors_donations_Campaigns_engagement_history_impacts = pd.merge(Donors_donations_Campaigns_engagement_history, impact, on='CampaignID', how='left')

Comment: the dataset contains 7 tables; we merge 6 of them because donors.csv and donations.csv are the same tables.
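
The row count of the merged table depends on how many matching rows each key has in the right-hand tables. A minimal sketch on made-up rows (the `*_toy` frames are hypothetical; only the key column names mirror the dataset) illustrates how chained left merges multiply rows, and how the `validate` argument of `pd.merge` can flag an unexpected key relationship:

```python
import pandas as pd

# Hypothetical toy tables; only the key columns mirror the real dataset.
donors_toy = pd.DataFrame({"DonorID": ["D1", "D2"]})
donations_toy = pd.DataFrame({
    "DonorID": ["D1", "D1"],
    "DonationID": ["T1", "T2"],
    "CampaignID": ["C1", "C1"],
})
impact_toy = pd.DataFrame({
    "CampaignID": ["C1", "C1", "C1"],
    "ImpactType": ["Youth Reached", "Workshops Delivered", "Mentorship Hours"],
})

# One donor can match many donations; validate raises if DonorID
# is not unique on the left-hand side.
step1 = pd.merge(donors_toy, donations_toy, on="DonorID", how="left",
                 validate="one_to_many")

# Each donation row is then repeated once per impact row of its campaign.
step2 = pd.merge(step1, impact_toy, on="CampaignID", how="left")

print(len(donors_toy), len(step1), len(step2))
```

In this toy case the donor without donations survives the left merge with null donation fields, which is why a null check on `DonationID` is a natural next step.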

In [3]:
# all tables are merged into one
df=Donors_donations_Campaigns_engagement_history_impacts

In [4]:
# the merged table
df.head()

Unnamed: 0,DonorID,Age,Gender,Location,JoinDate,DonationID,DonationDate,Amount,CampaignID,CampaignName,StartDate,EndDate,TargetAmount,ActualAmount,Channel,Date,EngagementOutcome,ImpactType,Value,Cost
0,DNR00001,56,Male,QLD,2020-10-03 22:59:27.552825,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Workshops Delivered,31.0,100.0
1,DNR00001,56,Male,QLD,2020-10-03 22:59:27.552825,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Youth Reached,851.0,25.0
2,DNR00001,56,Male,QLD,2020-10-03 22:59:27.552825,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Volunteers Engaged,450.0,25.0
3,DNR00001,56,Male,QLD,2020-10-03 22:59:27.552825,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Training Sessions,100.0,75.0
4,DNR00001,56,Male,QLD,2020-10-03 22:59:27.552825,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Mentorship Hours,406.0,35.0


**Table's shape**

In [5]:
df.shape

(440083, 20)

**Null values**

In [6]:
# Inspect rows that contain at least one null value
df[df.isnull().any(axis=1)]
# Drop rows with no donation record (null DonationID)
df = df.dropna(subset=['DonationID'])

**Duplicates**

In [7]:
# count fully duplicated rows
duplicates = df[df.duplicated()]
print("Duplicate rows:", duplicates.shape[0])

Duplicate rows: 205
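
The cell above only counts duplicated rows. If the goal is to remove them, one possible approach is `DataFrame.drop_duplicates`. The sketch below uses a hypothetical toy frame rather than the merged table:

```python
import pandas as pd

# Hypothetical toy frame with one fully duplicated row.
toy = pd.DataFrame({
    "DonorID": ["D1", "D1", "D2"],
    "DonationID": ["T1", "T1", "T2"],
    "Amount": [50.0, 50.0, 20.0],
})

n_before = len(toy)
# keep="first" retains the first occurrence of each duplicated row.
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)
print("Removed rows:", n_before - len(deduped))
```

Applied to the merged table, the equivalent call would be `df = df.drop_duplicates()`.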


In [8]:
# to verify run the following code
# df.isnull().sum().sort_values(ascending=False)

**Reformatting**

In [9]:
# Reformat JoinDate as a plain date (drop the time component).
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
df = df.assign(JoinDate=pd.to_datetime(df['JoinDate']).dt.date)


In [10]:
# After cleaning, save the merged dataset
df.to_csv('Data.csv', index=False)

### Business understanding

In today’s fast-evolving fundraising landscape, nonprofit organisations face growing pressure to engage supporters meaningfully, demonstrate impact, and build trust through transparency. Traditional, instinct-driven approaches no longer meet modern donor expectations.

Adopting a data-informed approach is not just a technical upgrade; it is a cultural shift that aligns mission, message, and measurement. With the right mindset and existing tools, even small organisations can use data to boost engagement and impact (Analytics for Good Institute, 2020).

Delivering measurable change increasingly depends on how well nonprofits engage and retain supporters. As donor expectations rise and attention spans shrink, sustaining long-term relationships becomes a major challenge (Funraise, 2024). Data-driven organisations are up to three times more likely to achieve mission-aligned growth and build lasting trust (McKinsey & Company, 2021).

Our objectives focus on reducing donor churn, strengthening relationships, and improving alignment between fundraising and impact. We aim to engage donors, inspire new strategies for attraction and retention, and demonstrate data-inspired analysis and modeling in the context of NPOs.

### Data Understanding
A synthetic dataset was created to reflect the operational characteristics of a nonprofit focused on youth and community welfare. It includes relational tables for donors, donations, campaigns, impact outcomes, and engagement activities. These tables are structured to support a complete engagement analysis from start to finish.

Although the data is fictitious, it reflects realistic patterns based on sector research. These patterns include donor attrition, variations in donation behavior, and inconsistent campaign performance (Virtuous, 2024). The goal is to encourage the use of data in nonprofit environments and to provide the NPO community with a clear example of what a data-informed approach can accomplish. This simulation allows for open analysis while maintaining privacy and ethical standards.

The structure of the dataset makes it possible to demonstrate how nonprofit engagement can be improved through analytics without relying on sensitive real-world data.


## Major questions
This section follows up on the two previous sections. It builds on our business and data understanding and allows us to leverage the data to extract meaningful and pertinent information.

This analysis covers three aspects.

In [13]:
df.head()

Unnamed: 0,DonorID,Age,Gender,Location,JoinDate,DonationID,DonationDate,Amount,CampaignID,CampaignName,StartDate,EndDate,TargetAmount,ActualAmount,Channel,Date,EngagementOutcome,ImpactType,Value,Cost
0,DNR00001,56,Male,QLD,2020-10-03,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Workshops Delivered,31.0,100.0
1,DNR00001,56,Male,QLD,2020-10-03,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Youth Reached,851.0,25.0
2,DNR00001,56,Male,QLD,2020-10-03,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Volunteers Engaged,450.0,25.0
3,DNR00001,56,Male,QLD,2020-10-03,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Training Sessions,100.0,75.0
4,DNR00001,56,Male,QLD,2020-10-03,DNT002656,2015-06-16,124.99,CAMP005,Youth Initiative 005,59:27.6,59:27.6,54063.32,62346.29,Social Media,27/12/2022,Not Reached,Mentorship Hours,406.0,35.0


Planned comparisons:
- Comparison of campaigns to outcomes.
- Comparison between outcomes and channels.

A stacked bar chart may suit both comparisons.

To understand engagement, we should count donations and responses and look at how frequently donors interact.
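
As a starting point for the outcome-by-channel comparison, a cross-tabulation gives the counts that a stacked bar chart would display. The sketch below runs on hypothetical engagement rows: the `Channel` and `EngagementOutcome` column names come from the merged table, while the `Donated` outcome label and the counts are made up for illustration. Counting on the engagement table rather than on the merged table is an assumption that avoids counting repeated rows introduced by the merges.

```python
import pandas as pd

# Hypothetical engagement rows; "Donated" is an illustrative outcome label.
toy_engagement = pd.DataFrame({
    "Channel": ["Email", "Email", "Email", "Event", "Event", "Phone Call"],
    "EngagementOutcome": ["Donated", "Not Reached", "Donated",
                          "Donated", "Not Reached", "Not Reached"],
})

# Rows: channels, columns: outcomes, values: number of engagements.
counts = pd.crosstab(toy_engagement["Channel"],
                     toy_engagement["EngagementOutcome"])

# Convert counts to per-channel shares so channels of different size compare fairly.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(2))
```

With matplotlib installed, `shares.plot(kind="bar", stacked=True)` would draw the stacked bar chart from this table.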

**How many times is a donor likely to donate?**

In [None]:
# Group by DonorID to get donation count
donation_counts = donations.groupby('DonorID')['DonationID'].count()

**Plot distribution**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of how many donations each donor has made
plt.figure(figsize=(8, 4))
sns.histplot(donation_counts, bins=20, kde=False, color='blue')
plt.title('Number of Donations per Donor')
plt.xlabel('Donation Count')
plt.ylabel('Number of Donors')
plt.tight_layout()
plt.show()

What percentage of donors who gave in the first year kept giving in the following years?
How many donors do we have per year?
What is the year-over-year donor retention rate?
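
The year-over-year retention rate asks, for each year, what share of the previous year's donors gave again. A minimal sketch on hypothetical donations follows. The toy rows and the `retention` and `first_year_share` names are illustrative, not results from the dataset:

```python
import pandas as pd

# Hypothetical donation records; column names follow the donations table.
toy_donations = pd.DataFrame({
    "DonorID": ["D1", "D2", "D3", "D1", "D2", "D1", "D4"],
    "DonationDate": ["2020-03-01", "2020-06-10", "2020-09-05",
                     "2021-02-11", "2021-07-19", "2022-01-30", "2022-05-02"],
})
toy_donations["Year"] = pd.to_datetime(toy_donations["DonationDate"]).dt.year

# Set of distinct donors who gave in each calendar year.
donors_by_year = toy_donations.groupby("Year")["DonorID"].apply(set)
years = sorted(donors_by_year.index)

# Compare each year with the one before it.
rows = []
for prev, curr in zip(years, years[1:]):
    previous_donors = donors_by_year.loc[prev]
    retained = previous_donors & donors_by_year.loc[curr]
    rows.append({
        "Year": int(curr),
        "PreviousDonors": len(previous_donors),
        "Retained": len(retained),
        "RetentionRate": len(retained) / len(previous_donors),
    })
retention = pd.DataFrame(rows)

# Share of first-year donors who gave again in any later year.
first_cohort = donors_by_year.loc[years[0]]
later_donors = set().union(*donors_by_year.loc[years[1:]])
first_year_share = len(first_cohort & later_donors) / len(first_cohort)

print(retention)
print("First-year donors who gave again:", round(first_year_share, 2))
```

This sketch assumes the years in the data are consecutive; a gap year would need explicit handling.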

In [None]:
# Segment donors by recency, frequency, and monetary value (RFM)
import numpy as np

# Convert donation date to datetime
donations['DonationDate'] = pd.to_datetime(donations['DonationDate'])

# Create donor-level summary
today = pd.to_datetime('today')
donor_summary = donations.groupby('DonorID').agg(
    FirstDonationDate=('DonationDate', 'min'),
    LastDonationDate=('DonationDate', 'max'),
    Frequency=('DonationID', 'count'),
    Monetary=('Amount', 'sum')
).reset_index()
donor_summary['Recency'] = (today - donor_summary['LastDonationDate']).dt.days

# Merge with full donor list
donor_full_summary = pd.merge(donors[['DonorID']], donor_summary, on='DonorID', how='left')
donor_full_summary['Recency'] = donor_full_summary['Recency'].fillna(np.inf)
donor_full_summary['Frequency'] = donor_full_summary['Frequency'].fillna(0)
donor_full_summary['Monetary'] = donor_full_summary['Monetary'].fillna(0)

# Assign segments
def assign_segment(row):
    r = row['Recency']
    f = row['Frequency']
    m = row['Monetary']

    if np.isinf(r) or pd.isna(r):
        return 'Never Donated'
    elif r <= 365 and f == 1:
        return 'New Donors'
    elif r <= 365 and f >= 4 and m >= 750:
        return 'Champions'
    elif r <= 1095 and f >= 4:
        return 'Loyal Donors'
    elif r <= 1095 and 2 <= f <= 4 and m >= 750:
        return 'High Value Potentials'
    elif r > 1095 and f >= 2 and m >= 250:
        return 'Lapsed but Valuable'
    elif 365 < r <= 1095 and f >= 2:
        return 'At Risk'
    elif r > 1095 and f == 1:
        return 'Lost or Inactive'
    elif f <= 2 and m < 250:
        return 'Low Frequency'
    else:
        return 'Misc Donors'

donor_full_summary['Segment'] = donor_full_summary.apply(assign_segment, axis=1)

# Return segment summary for confirmation
segment_summary = donor_full_summary['Segment'].value_counts().reset_index()
segment_summary.columns = ['Segment', 'Count']
segment_summary
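
Because `today` changes every time the notebook runs, recency values and segment sizes will drift between runs. One possible alternative, sketched here on hypothetical rows, measures recency against a fixed snapshot date derived from the data. Taking the snapshot as one day after the last recorded donation is an assumption for illustration, not the project's method.

```python
import pandas as pd

# Hypothetical donation records; column names follow the donations table.
toy_donations = pd.DataFrame({
    "DonorID": ["D1", "D1", "D2"],
    "DonationDate": pd.to_datetime(["2022-01-10", "2022-12-01", "2021-06-15"]),
    "Amount": [100.0, 50.0, 30.0],
})

# Assumption: the snapshot is the day after the last recorded donation.
snapshot = toy_donations["DonationDate"].max() + pd.Timedelta(days=1)

# Donor-level recency, frequency, and monetary summary.
rfm = toy_donations.groupby("DonorID").agg(
    LastDonationDate=("DonationDate", "max"),
    Frequency=("DonationDate", "count"),
    Monetary=("Amount", "sum"),
).reset_index()
rfm["Recency"] = (snapshot - rfm["LastDonationDate"]).dt.days

print(rfm)
```

With a fixed snapshot, rerunning the segmentation on the same data yields the same segments.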

In [19]:
df.ImpactType.value_counts()

ImpactType,count
Workshops Delivered,87889
Youth Reached,87889
Volunteers Engaged,87889
Training Sessions,87889
Mentorship Hours,87889


In [17]:
df.Channel.value_counts()

Channel,count
Email,98430
Phone Call,89280
Newsletter,86760
Social Media,86750
Event,78225


## Recommendations

## References
- Analytics for Good Institute. (2020). Data science for social impact: Challenges and strategies in the nonprofit sector. https://analyticsforgood.org/reports

- Funraise. (2024). The state of nonprofit engagement. https://www.funraise.org

- Virtuous. (2024). State of nonprofit donor experience. https://www.virtuous.org/resources

## Contact information

Below: contact information for the two data scientists in charge of the project.

- First Name: Haender Michael

- Last Name: Jean Louis

- Email: michaelhaenderjeanlouis@gmail.com

- Phone Number: +509 41 75 0264

- LinkedIn: https://www.linkedin.com/in/michael-haender-jean-louis-4b7320316?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=ios_app


- First Name:
- Last Name:
- Email:
- Phone Number:
- LinkedIn:


For further inquiries, feedback, or collaboration on this analysis, feel free to reach out.