<div style="text-align: center; padding: 20px;">
    <img src="austin-distel-744oGeqpxPQ-unsplash.jpeg" alt="Subscription Renewal Insights for a SaaS Company" width="450"/>
</div>


<!-- Image source - https://unsplash.com/photos/person-using-macbook-pro-744oGeqpxPQ
-->

    
A SaaS company wants to uncover what drives its clients to renew their subscriptions. It has collected data on client details, subscription records, and economic indicators, and would like to connect these datasets to better understand its clients’ behavior.

The company has tasked you with analyzing these datasets to identify the key factors influencing clients’ decisions to renew their subscriptions.

Your analysis will show which customers are renewing their subscriptions and the reasons behind their renewals. The company can use these insights to make informed decisions that increase renewal rates and improve customer loyalty, helping it stay competitive and ensure long-term growth.


## The Data

The company has provided three datasets for your analysis. A summary of each dataset is given below.

### `client_details.csv`

| Column         | Description|
|----------------|---------------------------------------------------------------|
| `client_id`    | Unique identifier for each client. |
| `company_size` | Size of the company (Small, Medium, Large).|
| `industry`     | Industry to which the client belongs (Fintech, Gaming, Crypto, AI, E-commerce).|
| `location`     | Location of the client (New York, New Jersey, Pennsylvania, Massachusetts, Connecticut).|

### `subscription_records.csv`

| Column             | Description   |
|--------------------|---------------|
| `client_id`        | Unique identifier for each client.|
| `subscription_type`| Type of subscription (Yearly, Monthly).|
| `start_date`       | Start date of the subscription - YYYY-MM-DD.|
| `end_date`         | End date of the subscription - YYYY-MM-DD.|
| `renewed`          | Indicates whether the subscription was renewed (True, False).|

### `economic_indicators.csv`

| Column           | Description                                       |
|------------------|---------------------------------------------------|
| `start_date`     | Start date of the economic indicator (Quarterly) - YYYY-MM-DD.|
| `end_date`       | End date of the economic indicator (Quarterly) - YYYY-MM-DD.|
| `inflation_rate` | Inflation rate in the period.|
| `gdp_growth_rate`| Gross Domestic Product (GDP) growth rate in the period.|


In [2]:
# Re-run this cell
# Import required libraries
import pandas as pd

# Import data
client_details = pd.read_csv('data/client_details.csv')
subscription_records = pd.read_csv('data/subscription_records.csv', parse_dates=['start_date', 'end_date'])
economic_indicators = pd.read_csv('data/economic_indicators.csv', parse_dates=['start_date', 'end_date'])
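The `economic_indicators` output further down includes an `Unnamed: 0` column, which usually means the CSV was saved together with its row index. A minimal sketch of how `read_csv` could absorb that column, assuming the file's first column is such an index; the inline CSV text is a hypothetical stand-in for `data/economic_indicators.csv`, built from the first two rows shown later. The notebook itself was run without `index_col`, so its outputs still show the extra column.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for data/economic_indicators.csv: the
# leading unnamed column mimics a row index that was saved with the data.
csv_text = """,start_date,end_date,inflation_rate,gdp_growth_rate
0,2018-01-01,2018-03-31,5.77,3.51
1,2018-04-01,2018-06-30,1.17,2.15
"""

# parse_dates converts the listed columns to datetimes on load;
# index_col=0 uses the unnamed first column as the index instead of
# keeping it as an extra "Unnamed: 0" data column.
econ = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['start_date', 'end_date'],
    index_col=0,
)

print(econ.dtypes)
print(list(econ.columns))
```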

In [3]:
# check the data before beginning any analysis
# info() lists each column's dtype and non-null count, which shows whether any values are missing;
# it prints its report directly (and returns None), so it is called without print()
client_details.info()
subscription_records.info()
economic_indicators.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   client_id     100 non-null    int64 
 1   company_size  100 non-null    object
 2   industry      100 non-null    object
 3   location      100 non-null    object
dtypes: int64(1), object(3)
memory usage: 3.2+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   client_id          100 non-null    int64         
 1   subscription_type  100 non-null    object        
 2   start_date         100 non-null    datetime64[ns]
 3   end_date           100 non-null    datetime64[ns]
 4   renewed            100 non-null    bool          
dtypes: bool(1), datetime64[ns](2), int64(1), object(1)
memory usage: 3.3+ KB
<class 'pandas.core
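`info()` answers the missing-value question indirectly, through the non-null counts. A more direct check, sketched here on a hypothetical toy frame rather than the real data, counts missing cells per column; the same calls would apply unchanged to `client_details`, `subscription_records`, or `economic_indicators`.

```python
import pandas as pd

# Hypothetical toy frame with one missing value, standing in for any of the three datasets.
toy = pd.DataFrame({
    'client_id': [1, 2, 3],
    'industry': ['Fintech', None, 'Gaming'],
})

# isna() flags each missing cell; sum() counts the flags per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A single True/False answer for the whole frame.
has_missing = toy.isna().any().any()
print(has_missing)
```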

### Question 1
#### How many total Fintech and Crypto clients does the company have?

In [4]:
# investigate the client information
# determine how to filter for crypto and fintech clients

print(client_details.head())
print(client_details['industry'].unique())

    client_id company_size    industry       location
0  4280387012        Large     Fintech       New York
1  2095513148        Small     Fintech     New Jersey
2  7225516707       Medium     Fintech   Pennsylvania
3  8093537819        Large      Crypto       New York
4  4387541014       Medium  E-commerce  Massachusetts
['Fintech' 'Crypto' 'E-commerce' 'AI' 'Gaming']


In [5]:
# so we know to use 'Fintech' and 'Crypto' as industry filters
fin_crypto = client_details[client_details['industry'].isin(['Crypto', 'Fintech'])]

# count the matching rows to get the total number of these clients
total_fintech_crypto_clients = fin_crypto.shape[0]
total_fintech_crypto_clients

47

In [13]:
print(f"There are a total of {total_fintech_crypto_clients} Fintech and Crypto clients.")

There are a total of 47 Fintech and Crypto clients.
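`fin_crypto.shape[0]` counts rows, which equals the number of clients only if each `client_id` appears once in `client_details`. The sketch below, on a hypothetical miniature table, shows the equivalent boolean-mask count, a distinct-ID count for the case where clients might repeat, and a per-industry breakdown of the same total.

```python
import pandas as pd

# Hypothetical stand-in for client_details with a handful of industries.
clients = pd.DataFrame({
    'client_id': [101, 102, 103, 104, 105],
    'industry': ['Fintech', 'Crypto', 'Gaming', 'Fintech', 'AI'],
})

target = ['Fintech', 'Crypto']

# isin() returns a boolean mask; summing it counts the matching rows.
row_count = clients['industry'].isin(target).sum()

# If a client could appear on more than one row, count distinct IDs instead.
distinct_clients = clients.loc[clients['industry'].isin(target), 'client_id'].nunique()

# Per-industry breakdown of the same total.
breakdown = clients['industry'].value_counts().loc[target]

print(row_count, distinct_clients)
print(breakdown)
```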


### Question 2
#### Which industry has the highest renewal rate?

In [6]:
# will need to join the client_details table and the subscription_records table
print(client_details.head())
print(subscription_records.head())

# join on client_id, then group by industry and sum the renewed flags

    client_id company_size    industry       location
0  4280387012        Large     Fintech       New York
1  2095513148        Small     Fintech     New Jersey
2  7225516707       Medium     Fintech   Pennsylvania
3  8093537819        Large      Crypto       New York
4  4387541014       Medium  E-commerce  Massachusetts
    client_id subscription_type start_date   end_date  renewed
0  1131383004            Yearly 2020-11-11 2021-11-11    False
1  4309371709           Monthly 2021-05-24 2021-06-23     True
2  3183675157            Yearly 2021-12-25 2022-12-25     True
3  5371694837           Monthly 2020-03-14 2020-04-13     True
4  5157113076           Monthly 2019-11-07 2019-12-07    False


In [7]:
# join the two tables on client_id with pd.merge
df = pd.merge(subscription_records, client_details, how='inner', on='client_id')
df.head()

|   | client_id | subscription_type | start_date | end_date | renewed | company_size | industry | location |
|---|-----------|-------------------|------------|----------|---------|--------------|----------|----------|
| 0 | 1131383004 | Yearly | 2020-11-11 | 2021-11-11 | False | Large | Fintech | Massachusetts |
| 1 | 4309371709 | Monthly | 2021-05-24 | 2021-06-23 | True | Large | E-commerce | Connecticut |
| 2 | 3183675157 | Yearly | 2021-12-25 | 2022-12-25 | True | Small | Gaming | New Jersey |
| 3 | 5371694837 | Monthly | 2020-03-14 | 2020-04-13 | True | Large | AI | Pennsylvania |
| 4 | 5157113076 | Monthly | 2019-11-07 | 2019-12-07 | False | Medium | Gaming | Massachusetts |
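An inner join keeps only `client_id`s present in both tables, so subscriptions without a matching client would drop out silently. A minimal sketch on hypothetical toy tables: `validate='many_to_one'` asks pandas to raise a `MergeError` if `client_id` repeats on the client side, and `indicator=True` adds a `_merge` column that shows which rows found a match.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; client 4 has no client record.
subs = pd.DataFrame({
    'client_id': [1, 2, 3, 4],
    'renewed': [True, False, True, True],
})
clients = pd.DataFrame({
    'client_id': [1, 2, 3],
    'industry': ['Gaming', 'Fintech', 'Gaming'],
})

# validate='many_to_one' raises MergeError if client_id repeats in clients.
# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'.
check = pd.merge(subs, clients, how='left', on='client_id',
                 validate='many_to_one', indicator=True)
print(check['_merge'].value_counts())

# An inner join silently drops the unmatched subscription.
inner = pd.merge(subs, clients, how='inner', on='client_id')
print(len(subs), len(inner))
```

Comparing `len(subscription_records)` with `len(df)` would give the same signal on the real tables.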


In [8]:
# group by industry and sum the renewed flags (True counts as 1), giving the number of renewals per industry
renewed_count = df.groupby('industry')['renewed'].sum().reset_index().sort_values('renewed', ascending=False)

# extract the industry with the most renewals
top_industry = renewed_count.iloc[0,0]
top_industry

'Gaming'

In [15]:
print(f"The top industry for subscription renewals is {top_industry}.")

The top industry for subscription renewals is Gaming.
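Summing the boolean `renewed` column counts renewals, whereas the question asks for a renewal *rate*: renewals divided by the number of subscriptions in each industry, which is what `.mean()` of a boolean column returns. The two can point to different industries when industries differ in size. A sketch on hypothetical data constructed so that they disagree:

```python
import pandas as pd

# Hypothetical data: Gaming has more renewals, Fintech a higher renewal rate.
toy = pd.DataFrame({
    'industry': ['Gaming'] * 5 + ['Fintech'] * 2,
    'renewed': [True, True, True, False, False, True, True],
})

summary = toy.groupby('industry')['renewed'].agg(
    renewals='sum',         # number of renewed subscriptions
    subscriptions='count',  # number of subscriptions in the industry
    renewal_rate='mean',    # share renewed: renewals / subscriptions
)
print(summary)

top_by_count = summary['renewals'].idxmax()
top_by_rate = summary['renewal_rate'].idxmax()
print(top_by_count, top_by_rate)
```

On the notebook's data the rate version would be `df.groupby('industry')['renewed'].mean().idxmax()`; its result is not shown here, so whether it also returns Gaming is unverified.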


### Question 3
#### For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed?

In [9]:
# since it's the renewal we are interested in, we need the end_date from subscription_records,
# but only where the client renewed, matched to the economic indicator period that contains that date

# this requires combining all three tables (df already joins the first two)

print(client_details.head())
print(subscription_records.head())
print(economic_indicators.head())

    client_id company_size    industry       location
0  4280387012        Large     Fintech       New York
1  2095513148        Small     Fintech     New Jersey
2  7225516707       Medium     Fintech   Pennsylvania
3  8093537819        Large      Crypto       New York
4  4387541014       Medium  E-commerce  Massachusetts
    client_id subscription_type start_date   end_date  renewed
0  1131383004            Yearly 2020-11-11 2021-11-11    False
1  4309371709           Monthly 2021-05-24 2021-06-23     True
2  3183675157            Yearly 2021-12-25 2022-12-25     True
3  5371694837           Monthly 2020-03-14 2020-04-13     True
4  5157113076           Monthly 2019-11-07 2019-12-07    False
   Unnamed: 0 start_date   end_date  inflation_rate  gdp_growth_rate
0           0 2018-01-01 2018-03-31            5.77             3.51
1           1 2018-04-01 2018-06-30            1.17             2.15
2           2 2018-07-01 2018-09-30            1.56             1.82
3           3 2018-10-

In [10]:
# filter df for subscriptions that were renewed
renewed = df[df['renewed']]
# merge_asof requires the left table to be sorted by its key (economic_indicators is already in start_date order)
renewed = renewed.sort_values('end_date', ascending=True)

# match each renewal date (end_date) to the most recent quarter whose start_date is on or before it
renewed_economic = pd.merge_asof(renewed, economic_indicators, left_on='end_date', right_on='start_date', direction='backward')
renewed_economic.head()

|   | client_id | subscription_type | start_date_x | end_date_x | renewed | company_size | industry | location | Unnamed: 0 | start_date_y | end_date_y | inflation_rate | gdp_growth_rate |
|---|-----------|-------------------|--------------|------------|---------|--------------|----------|----------|------------|--------------|------------|----------------|-----------------|
| 0 | 4519356806 | Monthly | 2018-03-04 | 2018-04-03 | True | Small | Gaming | Massachusetts | 1 | 2018-04-01 | 2018-06-30 | 1.17 | 2.15 |
| 1 | 3683504527 | Monthly | 2018-04-12 | 2018-05-12 | True | Medium | Gaming | Pennsylvania | 1 | 2018-04-01 | 2018-06-30 | 1.17 | 2.15 |
| 2 | 7462725203 | Monthly | 2018-05-21 | 2018-06-20 | True | Medium | E-commerce | Massachusetts | 1 | 2018-04-01 | 2018-06-30 | 1.17 | 2.15 |
| 3 | 6751372012 | Monthly | 2018-05-29 | 2018-06-28 | True | Large | E-commerce | New York | 1 | 2018-04-01 | 2018-06-30 | 1.17 | 2.15 |
| 4 | 6774252233 | Monthly | 2018-12-12 | 2019-01-11 | True | Medium | Fintech | New Jersey | 4 | 2019-01-01 | 2019-03-31 | 6.91 | 3.44 |
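With `direction='backward'`, `merge_asof` matches each renewal date to the last quarter whose `start_date` is on or before it; it never looks at the quarter's `end_date`. The sketch below runs on hypothetical toy data (the two quarters reuse values from the indicator output above), makes the match explicit, and adds a check that every renewal date falls inside its matched quarter. The `suffixes` argument is an optional readability choice, not something the notebook uses.

```python
import pandas as pd

# Hypothetical quarters (values copied from the first two indicator rows shown above).
quarters = pd.DataFrame({
    'start_date': pd.to_datetime(['2018-01-01', '2018-04-01']),
    'end_date': pd.to_datetime(['2018-03-31', '2018-06-30']),
    'inflation_rate': [5.77, 1.17],
})
# Hypothetical renewals: one in each quarter.
renewals = pd.DataFrame({
    'client_id': [10, 11],
    'end_date': pd.to_datetime(['2018-02-15', '2018-04-03']),
})

# merge_asof needs both keys sorted; direction='backward' picks, for each
# renewal, the last quarter whose start_date is on or before the renewal date.
matched = pd.merge_asof(
    renewals.sort_values('end_date'),
    quarters.sort_values('start_date'),
    left_on='end_date',
    right_on='start_date',
    direction='backward',
    suffixes=('_sub', '_econ'),
)
print(matched[['client_id', 'end_date_sub', 'start_date', 'end_date_econ', 'inflation_rate']])

# Sanity check: each renewal date should fall inside its matched quarter.
inside = matched['end_date_sub'].between(matched['start_date'], matched['end_date_econ'])
print(inside.all())
```

On the notebook's `renewed_economic`, the analogous check would compare `end_date_x` against `start_date_y` and `end_date_y`.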


In [11]:
# find the mean inflation rate
average_inflation_for_renewals = renewed_economic['inflation_rate'].mean()
average_inflation_for_renewals

4.418909090909092

In [17]:
print(f"The average inflation rate at the time clients renewed their subscriptions is {average_inflation_for_renewals:.2f}%.")

The average inflation rate at the time clients renewed their subscriptions is 4.42%.
