In [1]:
# Importing libraries
import pandas as pd
from datetime import datetime
from pretty_html_table import build_table

# Supportive Notebooks
To calculate the successful sales events from the raw MIS files, we pass them through several functions. These functions are defined across multiple notebooks.

In [2]:
# Importing files for required data cleaning and processing
%run py_base_files/savings_account.ipynb
%run py_base_files/personal_loan.ipynb
%run py_base_files/checker.ipynb
%run py_base_files/mail_trigger.ipynb
%run py_base_files/error_logs.ipynb

# Reading Supportive Files
### Payin-Payout Rules
Successful sale events are calculated according to predefined rules, and ShiftPay pays commission to the SPs based on them. In short, these rules define how ShiftPay deducts a commission from the sale amount and disburses the remaining funds to the respective Shift Partner.
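
As a rough illustration of how such a rule could be applied, the sketch below joins a small rule table onto a few sales and derives the payout. The `Payin` and `CommissionPct` columns and all values are hypothetical; the real layout of payin_payout_rules.csv is not shown in this notebook.

```python
import pandas as pd

# Hypothetical rule table: column names and values are illustrative only.
rules_example = pd.DataFrame({
    'ProductType': ['Savings Account', 'Personal Loan'],
    'Payin': [500.0, 2000.0],        # amount received per successful sale
    'CommissionPct': [20.0, 25.0],   # share retained by ShiftPay
})

# Hypothetical successful sales, one row per lead.
sales_example = pd.DataFrame({
    'LeadId': [101, 102, 103],
    'ProductType': ['Savings Account', 'Personal Loan', 'Savings Account'],
})

# Attach the matching rule to each sale, then split payin into payout and commission.
priced = sales_example.merge(rules_example, on='ProductType', how='left')
priced['TotalPayin'] = priced['Payin']
priced['TotalPayout'] = priced['Payin'] * (1 - priced['CommissionPct'] / 100)

print(priced[['LeadId', 'ProductType', 'TotalPayin', 'TotalPayout']])
```

Under these assumptions, the difference between `TotalPayin` and `TotalPayout` is the commission ShiftPay keeps.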
### Miscellaneous Leads
This is the leads data generated by SPs using ShiftPay's application to sell the listed financial products. When we push this model to production, this data will come from ShiftPay's database. For simplicity, this project keeps it in a .csv file. In the future the volume will be large, and it will not be feasible to pass all of misc_leads along with the supportive files. Instead, we'll pull only the leads required for each MIS.
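
Pulling only the required leads could look roughly like the sketch below. It assumes the MIS and the leads table share a `LeadId` key, which is an assumption about the file layouts; in production the same filter would more likely be pushed into the database query.

```python
import pandas as pd

# Illustrative data; the shared LeadId key is an assumption.
leads_example = pd.DataFrame({
    'LeadId': [1, 2, 3, 4],
    'SPId': ['a', 'b', 'c', 'd'],
})
mis_example = pd.DataFrame({'LeadId': [2, 4, 4]})

# Keep only the leads that actually appear in the MIS.
required_ids = mis_example['LeadId'].unique()
required_leads = leads_example[leads_example['LeadId'].isin(required_ids)]

print(required_leads)
```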


In [11]:
# Reading supportive files
rules = pd.read_csv('supported_tables/payin_payout_rules.csv')
leads = pd.read_csv('supported_tables/misc_leads.csv')

# Reading Raw MIS
For simplicity, this project reads these .csv files from the local machine. In production, a specific team will drop these files into an Azure container, and that storage event will trigger a pipeline that runs this notebook and produces the successful payout events files. Those files will then be saved to another Azure container, which triggers a second pipeline to upsert the records into the database.

In [4]:
# Reading all the raw MIS from financial organization
mis_bank_A_df = pd.read_csv('input_mis/mis_bank_A.csv')
mis_bank_B_df = pd.read_csv('input_mis/mis_bank_B.csv')

# MIS and Supported File Parsing
**Why data cleaning is required for these MIS?**
\
ShiftPay has seen past instances where the success event columns in these MIS arrived with unexpected data types, which can cause errors during processing. For example, our calculations assume the YYYY-MM-DD format, but we have received DD-MM-YYYY HH:MM:SS instead. To handle such cases, we pass every MIS through the checker.ipynb notebook, which cleans the raw .csv data, including handling null values and dropping duplicates.
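
A minimal sketch of this kind of cleaning is shown below. It is not the actual logic in checker.ipynb; the function name `clean_mis_example` and the `SuccessDate` column are hypothetical. It drops duplicates and nulls, then accepts either of the two date formats mentioned above.

```python
import pandas as pd

def clean_mis_example(df, date_col='SuccessDate'):
    """Hypothetical cleaner; not the actual checker.ipynb logic."""
    # Remove exact duplicate rows and rows with no success date
    df = df.drop_duplicates().dropna(subset=[date_col]).copy()
    raw = df[date_col].astype(str).str.strip()
    # Try each expected format; values that do not match become NaT
    iso = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
    dmy = pd.to_datetime(raw, format='%d-%m-%Y %H:%M:%S', errors='coerce')
    df[date_col] = iso.fillna(dmy).dt.date
    # Drop rows whose date matched neither format
    return df.dropna(subset=[date_col]).reset_index(drop=True)

raw_mis = pd.DataFrame({
    'LeadId': [1, 2, 3, 1],
    'SuccessDate': ['2024-03-05', '05-03-2024 14:30:00', None, '2024-03-05'],
})
cleaned = clean_mis_example(raw_mis)
print(cleaned)
```

Parsing each format explicitly, rather than letting pandas guess, avoids silently swapping day and month on ambiguous values.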

In [5]:
# Passing MIS through the checker
mis_bank_A_df = mis_bank_A_cleaner(mis_bank_A_df)
mis_bank_B_df = mis_bank_B_cleaner(mis_bank_B_df)

# Passing supportive files through the checker
rules = rules_cleaner(rules)
leads = leads_cleaner(leads)

# Payout Calculation
We pass the cleaned MIS to the product-type notebooks for success event calculations. The py_base_files folder contains one notebook per product type, because success events are calculated differently for each product type.

In [6]:
# Performing calculations on these MIS
mis_bank_A_success = savings_account(mis_bank_A_df, rules, leads)
mis_bank_B_success = personal_loan(mis_bank_B_df, rules, leads)

# Saving Final Output Files
Now we have the final output for successful sale events. We're going to save these results to the output folder.
\
\
**Note:**
\
An error may occur while calculating these events. To handle this, the calculations include error handling: if one fails, the result variable holds an error string instead of a DataFrame.
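
The pattern described in the note can be sketched as follows. The wrapper `safe_calculation` and the `failing_step` function are hypothetical and only illustrate the idea of returning either a DataFrame or an error string; the product-type notebooks may implement it differently.

```python
import pandas as pd

def safe_calculation(func, *args):
    """Run func and return its DataFrame, or the error message as a str."""
    try:
        return func(*args)
    except Exception as exc:
        return f'{type(exc).__name__}: {exc}'

def failing_step(df):
    # Hypothetical calculation that references a missing column
    return df[['MissingColumn']]

sample = pd.DataFrame({'LeadId': [1]})
result_ok = safe_calculation(lambda df: df.assign(Flag=1), sample)
result_err = safe_calculation(failing_step, sample)

print(type(result_ok).__name__)
print(result_err)
```

The export cell can then branch on `isinstance` to decide whether to save the result or log the error.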

In [7]:
# Exporting final results
error_check = [('mis_bank_A_success', mis_bank_A_success), 
    ('mis_bank_B_success', mis_bank_B_success)
    ]

for var_name, var_value in error_check:
    if isinstance(var_value, pd.DataFrame):
        # Saving DataFrame to .csv file
        var_value.to_csv(f'output/{var_name}.csv', index=False)
        print(f'{var_name} saved to .csv file successfully.')

    elif isinstance(var_value, str):
        # Error handling
        error_logs(var_name, var_value)
        print(f'{var_name} encountered an error. Please check the error_logs.')

mis_bank_A_success saved to .csv file successfully.
mis_bank_B_success saved to .csv file successfully.


If an error occurs during an iteration of this model, an error log is appended to the error_logs.csv file.
\
\
We now have the successful events and their respective payin and payout. In production, these .csv files will be uploaded to the Azure container, and this storage event will trigger a pipeline that upserts these records into the database and updates the Status and SubStatus of misc_leads. The SP's wallet will also be updated with the payout.
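
For reference, appending an error log to a .csv file as described above might look like the sketch below. `append_error_log` is a hypothetical stand-in for the `error_logs` function defined in error_logs.ipynb, and the `Source` and `Error` column names are assumptions; only `CreatedAt` is used elsewhere in this notebook.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

def append_error_log(path, source, message):
    """Hypothetical stand-in for error_logs(): append one row to a CSV log."""
    row = pd.DataFrame([{
        'CreatedAt': datetime.now().date().isoformat(),
        'Source': source,
        'Error': message,
    }])
    # Write the header only when the file is being created
    row.to_csv(path, mode='a', header=not os.path.exists(path), index=False)

# A temporary folder keeps the example away from the real output directory
log_path = os.path.join(tempfile.mkdtemp(), 'error_logs.csv')
append_error_log(log_path, 'mis_bank_A_success', 'KeyError: LeadId')
append_error_log(log_path, 'mis_bank_B_success', 'ValueError: bad date')

logged = pd.read_csv(log_path)
print(logged)
```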

# Report Generation
Management would like a summary report for every iteration, pushed through an automated email. This report should summarize the success events and errors.

In [8]:
# Report Summary
success_cases = [('mis_bank_A_success', mis_bank_A_success), 
    ('mis_bank_B_success', mis_bank_B_success)
    ]

success_df = pd.DataFrame({})
current_date = datetime.now().date()

# Concatenating all the success events
for var_name, var_value in success_cases:
    if isinstance(var_value, pd.DataFrame):
        success_df = pd.concat(
            [var_value, success_df],
            ignore_index=True
        )

# Summary metrics
created_at = current_date

success_summary = (
    success_df
    .groupby(['SourceType', 'MediumType', 'ProductType'])
    .agg(
        UniqueSP = ('SPId', 'nunique'),
        TotalLeads = ('LeadId', 'count'),
        TotalPayin = ('TotalPayin', 'sum'),
        TotalPayout = ('TotalPayout', 'sum')
    )
    .reset_index()
)

success_summary['CreatedAt'] = current_date
success_summary['Revenue'] = success_summary['TotalPayin'] - success_summary['TotalPayout']
success_summary = success_summary[['CreatedAt','SourceType','MediumType','ProductType','UniqueSP','TotalLeads','TotalPayin','TotalPayout','Revenue']]

In [9]:
# Error Summary
errors = pd.read_csv('output/error_logs.csv')
errors = errors_cleaner(errors)

# Filtering the errors for current iteration
errors = (
    errors
    .query('CreatedAt == @current_date & CreatedAt == CreatedAt.max()')
)

In [10]:
# Final HTML summary
body_1 = '''
<p>Hey team,</p>
<p>Please find the below success summary for the current iteration of MIS automation.</p><br>
'''
body_2 = build_table(success_summary, 'blue_light')

# Concatenating the above results 
success_body = body_1 + body_2

# Checking for errors
if len(errors) > 0:
    body_3 = '''
    <br><p>We encountered an error while processing the MIS listed below. Please check the error_logs for the exact errors.</p><br>
    '''
    body_4 = build_table(errors, 'red_light')
    error_body = body_3 + body_4
else:
    error_body = '''
    <br><p>No errors were detected during this iteration.</p><br>
    '''
body_5 = '''
<p>Thanks,</p>
<p>ShiftPay Analytics Team</p>
'''

final_body = success_body + error_body + body_5
email(final_body)

Email sent successfully!
