<div style="text-align: center; padding: 20px;">
    <img src="austin-distel-744oGeqpxPQ-unsplash.jpeg" alt="​​Subscription Renewal Insights for a SaaS Company" width="450"/>
</div>


<!-- Image source - https://unsplash.com/photos/person-using-macbook-pro-744oGeqpxPQ
-->

    
A SaaS company seeks to uncover what drives its clients to renew subscriptions. It has collected data on client details, subscription records, and economic indicators and would like to connect them to better understand its clients’ behavior.

They’ve tasked you with analyzing these datasets to identify the key factors influencing clients’ decisions to renew their subscriptions. 

Your analysis will provide them with insights into which customers are renewing their products and the reasons behind their renewals. The company can leverage these insights to make informed decisions to increase renewal rates and improve customer loyalty, helping them stay competitive and ensure long-term growth.


## The Data

The company has provided you with three datasets for your analysis. A summary of each dataset is provided below.

## `client_details.csv`

| Column         | Description|
|----------------|---------------------------------------------------------------|
| `client_id`    | Unique identifier for each client. |
| `company_size` | Size of the company (Small, Medium, Large).|
| `industry`     | Industry to which the client belongs (Fintech, Gaming, Crypto, AI, E-commerce).|
| `location`     | Location of the client (New York, New Jersey, Pennsylvania, Massachusetts, Connecticut).|

## `subscription_records.csv`

| Column             | Description   |
|--------------------|---------------|
| `client_id`        | Unique identifier for each client.|
| `subscription_type`| Type of subscription (Yearly, Monthly).|
| `start_date`       | Start date of the subscription - YYYY-MM-DD.|
| `end_date`         | End date of the subscription - YYYY-MM-DD.|
| `renewed`          | Indicates whether the subscription was renewed (True, False).|

## `economic_indicators.csv`

| Column           | Description                                       |
|------------------|---------------------------------------------------|
| `start_date`     | Start date of the economic indicator (Quarterly) - YYYY-MM-DD.|
| `end_date`       | End date of the economic indicator (Quarterly) - YYYY-MM-DD.|
| `inflation_rate` | Inflation rate in the period.|
| `gdp_growth_rate`| Gross Domestic Product (GDP) growth rate in the period.|


A SaaS company seeks to uncover what drives its clients to renew their subscriptions. They've asked you to answer the following questions:

- How many total Fintech and Crypto clients does the company have? Store as an integer variable called `total_fintech_crypto_clients`.
- Which industry has the highest renewal rate? Store as a string variable called `top_industry`.
- For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed? Store as a float variable called `average_inflation_for_renewals`.

In [1]:
# Re-run this cell
# Import required libraries
import pandas as pd

# Import data
client_details = pd.read_csv('data/client_details.csv')
subscription_records = pd.read_csv('data/subscription_records.csv', parse_dates = ['start_date','end_date'])
economic_indicators = pd.read_csv('data/economic_indicators.csv', parse_dates = ['start_date','end_date']) 

In [2]:
client_details.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   client_id     100 non-null    int64 
 1   company_size  100 non-null    object
 2   industry      100 non-null    object
 3   location      100 non-null    object
dtypes: int64(1), object(3)
memory usage: 3.3+ KB


In [3]:
subscription_records.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   client_id          100 non-null    int64         
 1   subscription_type  100 non-null    object        
 2   start_date         100 non-null    datetime64[ns]
 3   end_date           100 non-null    datetime64[ns]
 4   renewed            100 non-null    bool          
dtypes: bool(1), datetime64[ns](2), int64(1), object(1)
memory usage: 3.4+ KB


In [4]:
economic_indicators.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Unnamed: 0       21 non-null     int64         
 1   start_date       21 non-null     datetime64[ns]
 2   end_date         21 non-null     datetime64[ns]
 3   inflation_rate   21 non-null     float64       
 4   gdp_growth_rate  21 non-null     float64       
dtypes: datetime64[ns](2), float64(2), int64(1)
memory usage: 972.0 bytes
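
The `Unnamed: 0` column above is not listed in the data description. A column like this typically appears when a DataFrame's index was written into the CSV. The following is a minimal sketch, using a made-up in-memory CSV rather than the project file, of how `index_col=0` keeps such a column out of the data columns. The values and the `csv_text` name are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical two-row CSV that mimics a file saved together with its index
csv_text = """,start_date,end_date,inflation_rate,gdp_growth_rate
0,2018-01-01,2018-03-31,2.0,3.0
1,2018-04-01,2018-06-30,2.5,2.8
"""

# Default read: the unnamed first column is kept as 'Unnamed: 0'
default_read = pd.read_csv(io.StringIO(csv_text), parse_dates=['start_date', 'end_date'])

# index_col=0 treats the first column as the index instead of a data column
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=['start_date', 'end_date'])

print(list(default_read.columns))
print(list(indexed_read.columns))
```

The extra column is harmless for the three questions here, so leaving the original import unchanged is also reasonable.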


### Question 1

How many total Fintech and Crypto clients does the company have? Store as an integer variable called total_fintech_crypto_clients.

In [7]:
client_details['industry'].value_counts()

industry
Crypto        25
Fintech       22
Gaming        22
E-commerce    20
AI            11
Name: count, dtype: int64

In [9]:
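def is_fintech_or_crypto(x):
    """Return 1 for a Fintech or Crypto industry label, otherwise 0."""
    if x in ['Fintech', 'Crypto']:
        return 1
    return 0

# Loop-based alternative (not used below):
# total_fintech_crypto_clients = 0
# for industry in client_details['industry']:
#     total_fintech_crypto_clients += is_fintech_or_crypto(industry)

# Count the rows whose industry is Fintech or Crypto; int() converts the NumPy integer to a plain int
total_fintech_crypto_clients = int(client_details['industry'].apply(lambda x: x in ['Fintech','Crypto']).sum())
total_fintech_crypto_clients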

47
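
The same count can also be written without `apply`, using a vectorized membership test. This is a sketch on a small made-up frame (`toy_clients` is a hypothetical stand-in for `client_details`), not the project data.

```python
import pandas as pd

# Toy stand-in for client_details (values are illustrative only)
toy_clients = pd.DataFrame({
    'client_id': [1, 2, 3, 4, 5],
    'industry': ['Fintech', 'Gaming', 'Crypto', 'AI', 'Crypto'],
})

# isin() builds a boolean mask; summing the mask counts the True values
toy_total = int(toy_clients['industry'].isin(['Fintech', 'Crypto']).sum())
print(toy_total)  # 3
```

Applied to the real data, this should agree with adding the Crypto (25) and Fintech (22) rows of the `value_counts()` output above.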

### Question 2

Which industry has the highest renewal rate? Store as a string variable called top_industry.


In [10]:
merged_df = pd.merge(client_details, subscription_records, on='client_id')
merged_df

|     | client_id  | company_size | industry   | location      | subscription_type | start_date | end_date   | renewed |
|-----|------------|--------------|------------|---------------|-------------------|------------|------------|---------|
| 0   | 4280387012 | Large        | Fintech    | New York      | Yearly            | 2022-11-25 | 2023-11-25 | True    |
| 1   | 2095513148 | Small        | Fintech    | New Jersey    | Monthly           | 2021-11-03 | 2021-12-03 | False   |
| 2   | 7225516707 | Medium       | Fintech    | Pennsylvania  | Yearly            | 2021-01-19 | 2022-01-19 | True    |
| 3   | 8093537819 | Large        | Crypto     | New York      | Monthly           | 2019-09-14 | 2019-10-14 | False   |
| 4   | 4387541014 | Medium       | E-commerce | Massachusetts | Monthly           | 2018-11-08 | 2018-12-08 | False   |
| ... | ...        | ...          | ...        | ...           | ...               | ...        | ...        | ...     |
| 95  | 9159056053 | Medium       | Gaming     | Pennsylvania  | Yearly            | 2022-05-28 | 2023-05-28 | False   |
| 96  | 1077708772 | Small        | Crypto     | New York      | Yearly            | 2019-07-06 | 2020-07-05 | False   |
| 97  | 4361672518 | Small        | AI         | Pennsylvania  | Monthly           | 2019-01-24 | 2019-02-23 | False   |
| 98  | 6751372012 | Large        | E-commerce | New York      | Monthly           | 2018-05-29 | 2018-06-28 | True    |


In [11]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   client_id          100 non-null    int64         
 1   company_size       100 non-null    object        
 2   industry           100 non-null    object        
 3   location           100 non-null    object        
 4   subscription_type  100 non-null    object        
 5   start_date         100 non-null    datetime64[ns]
 6   end_date           100 non-null    datetime64[ns]
 7   renewed            100 non-null    bool          
dtypes: bool(1), datetime64[ns](2), int64(1), object(4)
memory usage: 5.7+ KB
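
The `info()` output shows that the merge kept all 100 rows. One way to make that expectation explicit is to let `pd.merge` check the key relationship itself. The sketch below uses two made-up frames (`toy_clients` and `toy_records` are hypothetical) and assumes each `client_id` appears exactly once in each table, which the source does not state for the real data.

```python
import pandas as pd

# Toy stand-ins for client_details and subscription_records
toy_clients = pd.DataFrame({'client_id': [1, 2, 3], 'industry': ['AI', 'Gaming', 'Crypto']})
toy_records = pd.DataFrame({'client_id': [1, 2, 3], 'renewed': [True, False, True]})

# validate='one_to_one' raises MergeError if a key is duplicated on either side;
# indicator=True adds a '_merge' column showing where each row came from
checked = pd.merge(toy_clients, toy_records, on='client_id', how='outer',
                   validate='one_to_one', indicator=True)

# Rows present in only one of the two tables
unmatched = checked[checked['_merge'] != 'both']
print(len(checked), len(unmatched))
```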


In [19]:
grouped_df = merged_df.groupby('industry', as_index=False).agg(
    renewal_rate = ('renewed', 'mean')
).sort_values('renewal_rate', ascending=False)
grouped_df


|   | industry   | renewal_rate |
|---|------------|--------------|
| 4 | Gaming     | 0.727273     |
| 0 | AI         | 0.636364     |
| 3 | Fintech    | 0.545455     |
| 2 | E-commerce | 0.45         |
| 1 | Crypto     | 0.44         |


In [20]:
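# Store the industry from the first (highest-rate) row, as the question asks
top_industry = grouped_df.iloc[0]['industry']
top_industry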

'Gaming'
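
Because `renewed` is boolean, its mean per group is the share of renewed subscriptions, which is why `'mean'` works as a renewal rate. A small sketch on made-up data (`toy_subscriptions` is hypothetical) shows this, using `idxmax()` as an alternative to sorting:

```python
import pandas as pd

# Toy data: Gaming renews 2 of 2, AI 1 of 2, Crypto 0 of 2
toy_subscriptions = pd.DataFrame({
    'industry': ['Gaming', 'Gaming', 'AI', 'AI', 'Crypto', 'Crypto'],
    'renewed': [True, True, True, False, False, False],
})

# Mean of a boolean column = fraction of True values per group
toy_rates = toy_subscriptions.groupby('industry')['renewed'].mean()

# idxmax() returns the index label (here, the industry) of the largest rate
toy_top_industry = toy_rates.idxmax()
print(toy_top_industry)  # Gaming
```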

### Question 3

For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed? Store as a float variable called average_inflation_for_renewals.


In [22]:
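# Match each subscription to the quarter with the latest start_date on or before its end_date.
# merge_asof requires both inputs to be sorted by their merge keys.
subscriptions_with_inflation = pd.merge_asof(subscription_records.sort_values(by='end_date'), 
                                             economic_indicators.sort_values(by='start_date'), 
                                             left_on='end_date', 
                                             right_on='start_date', 
                                             direction='backward')

# Average inflation rate across renewed subscriptions only
average_inflation_for_renewals = subscriptions_with_inflation[subscriptions_with_inflation['renewed']]['inflation_rate'].mean()
average_inflation_for_renewals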

4.418909090909092
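
To see what `direction='backward'` does, here is a minimal sketch on made-up dates and rates (`toy_subs` and `toy_econ` are hypothetical, not the project data). Each subscription end date is matched to the most recent quarter start on or before it.

```python
import pandas as pd

# Toy subscriptions: only the renewal date and outcome matter here
toy_subs = pd.DataFrame({
    'client_id': [1, 2, 3],
    'end_date': pd.to_datetime(['2021-02-15', '2021-04-01', '2021-06-30']),
    'renewed': [True, False, True],
})

# Toy quarterly indicators with illustrative inflation rates
toy_econ = pd.DataFrame({
    'start_date': pd.to_datetime(['2021-01-01', '2021-04-01']),
    'inflation_rate': [1.5, 3.0],
})

# Both frames must be sorted by their merge keys
matched = pd.merge_asof(
    toy_subs.sort_values('end_date'),
    toy_econ.sort_values('start_date'),
    left_on='end_date',
    right_on='start_date',
    direction='backward',
)

# Keep renewed rows only, then average their matched inflation rates
toy_average = matched.loc[matched['renewed'], 'inflation_rate'].mean()
print(matched[['end_date', 'start_date', 'inflation_rate']])
print(toy_average)  # 2.25
```

An end date that falls exactly on a quarter start (client 2 here) is matched to that quarter, since exact matches are allowed by default.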

# Solution

In [23]:
# Import required libraries
import pandas as pd

# Import data
client_details = pd.read_csv('data/client_details.csv')
subscription_records = pd.read_csv('data/subscription_records.csv', parse_dates = ['start_date','end_date'])
economic_indicators = pd.read_csv('data/economic_indicators.csv', parse_dates = ['start_date','end_date'])

##### Question 1 - How many total Fintech and Crypto clients does the company have?  ##### 
# Define a function that returns 1 if the input is either 'Fintech' or 'Crypto', otherwise returning 0
def is_fintech_or_crypto(x):
    if x in ['Fintech','Crypto']:
        return 1
    else:
        return 0
    
# Loop through the 'industry' column in client_details and increment the total_fintech_crypto_clients counter for every Fintech or Crypto client
total_fintech_crypto_clients = 0
for industry in client_details['industry']:
    total_fintech_crypto_clients += is_fintech_or_crypto(industry)

# Alternate approach 1 - Apply the custom function directly to the 'industry' column to calculate the total number of Fintech and Crypto clients
# total_fintech_crypto_clients = client_details['industry'].apply(is_fintech_or_crypto).sum()
    
# Alternate approach 2 - Use a lambda function to calculate the total number of Fintech and Crypto clients
# total_fintech_crypto_clients = client_details['industry'].apply(lambda x: x in ['Fintech','Crypto']).sum()
    
  
##### Question 2 - Which industry has the highest renewal rate?   ##### 
# Merge client details with subscription records
subscriptions = pd.merge(subscription_records, client_details, on = 'client_id', how = 'left')

# Group by industry and calculate renewal rate
industry_renewal_rates = subscriptions.groupby('industry')['renewed'].mean()

# Find the industry with the highest renewal rate, save as variable 'top_industry'
top_industry = industry_renewal_rates.sort_values(ascending = False).index[0]


##### Question 3 - For clients that renewed their subscriptions, what was the average inflation rate when their subscriptions were renewed? #####
# Merge subscription records with economic indicators to get the inflation rate at the subscription end date (i.e., renewal time)
subscriptions_with_inflation = pd.merge_asof(subscription_records.sort_values(by='end_date'), 
                                             economic_indicators, 
                                             left_on='end_date', 
                                             right_on='start_date', 
                                             direction='backward')

# Calculate the average inflation rate for renewed subscriptions
average_inflation_for_renewals = subscriptions_with_inflation.loc[subscriptions_with_inflation['renewed'], 'inflation_rate'].mean()
average_inflation_for_renewals


4.418909090909092