# Product & Channel Adoption
## Client Channel and Product Usage Analysis

## Objective:
The purpose of this script is to analyze the usage patterns of different channels and products among clients in the database.

In [0]:
from sqlalchemy import create_engine
import pandas as pd 
import numpy as np 



## Libraries Used:
- `sqlalchemy`: Provides `create_engine`, which builds the engine used to connect to the database.
- `pandas`: Used for data manipulation and analysis, and to read query results into DataFrames and write them back.
- `numpy`: Imported alongside pandas for numerical computations; it is not called directly in this notebook.

## Code Details:
1. **Import Statements:**
   - `from sqlalchemy import create_engine`: Imports the `create_engine` function from the `sqlalchemy` library, which is used to create a connection engine for the database.
   - `import pandas as pd`: Imports the `pandas` library using the alias `pd`, which is a popular library for data manipulation and analysis in Python.
   - `import numpy as np`: Imports the `numpy` library using the alias `np`, which is commonly used for numerical computations in Python, often alongside Pandas.

2. **Code Usage:**
   - `create_engine`: The `create_engine` function is used to create an engine object that connects to a database. This engine object can then be used by Pandas or SQLAlchemy to interact with the database.

3. **Data Handling:**
   This cell only imports libraries and performs no data operations. The engine built with `create_engine` is later passed to `pd.read_sql_query` to run the adoption query and to `DataFrame.to_sql` to write the results back to Redshift.
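
As a minimal illustration of how the engine and pandas fit together, the sketch below uses an in-memory SQLite engine as a stand-in for Redshift (an assumption, so it runs without credentials) and a hypothetical `clients` table:

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the Redshift engine used in this notebook
engine = create_engine("sqlite://")

# Hypothetical sample table, written with DataFrame.to_sql
pd.DataFrame(
    {"client_id": [1, 2, 3],
     "client_category": ["INDIVIDUAL", "CORPORATE", "INDIVIDUAL"]}
).to_sql("clients", con=engine, index=False)

# pandas runs the SQL through the engine and returns the result as a DataFrame
df = pd.read_sql_query(
    "SELECT client_category, COUNT(*) AS n FROM clients GROUP BY client_category",
    engine,
)
print(df)
```

The same two calls, `read_sql_query` and `to_sql`, are what the later cells use against the Redshift engine.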




In [0]:
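import json
from sqlalchemy.engine import URL

# Load the Redshift credentials stored in the workspace
with open('/Workspace/Credentials/db_data.json', 'r') as fp:
    data = json.load(fp)

host = data['redshift']['host']
user = data['redshift']['user']
passwd = data['redshift']['passwd']
database = data['redshift']['database']

# URL.create escapes special characters (e.g. '@', '/') in the password;
# 5439 is the default Redshift port
conn = create_engine(URL.create(
    "postgresql+psycopg2", username=user, password=passwd,
    host=host, port=5439, database=database,
))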
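
Interpolating a password directly into a URL string breaks when it contains characters such as `@` or `/`, while SQLAlchemy's `URL.create` escapes them. The sketch below wraps that in a hypothetical helper, `build_redshift_url`, which also checks that the four keys read from `db_data.json` are present, and round-trips an invented password through `render_as_string` and `make_url` without opening a connection. The host and credentials are made up for illustration.

```python
from sqlalchemy.engine import URL, make_url

REQUIRED_KEYS = {"host", "user", "passwd", "database"}

def build_redshift_url(creds: dict) -> URL:
    """Hypothetical helper: build a Redshift URL from the 'redshift' entry of db_data.json."""
    missing = REQUIRED_KEYS - set(creds)
    if missing:
        raise KeyError(f"missing credential keys: {sorted(missing)}")
    # URL.create percent-escapes reserved characters such as '@' in the password
    return URL.create(
        drivername="postgresql+psycopg2",
        username=creds["user"],
        password=creds["passwd"],
        host=creds["host"],
        port=5439,  # default Redshift port
        database=creds["database"],
    )

# Hypothetical credentials with reserved characters in the password
url = build_redshift_url(
    {"host": "redshift.example.com", "user": "analyst",
     "passwd": "p@ss/word", "database": "analytics"}
)
rendered = url.render_as_string(hide_password=False)
print(rendered)
# Parsing the rendered string recovers the original password and host
print(make_url(rendered).password, make_url(rendered).host)
```

The resulting `URL` object can be passed straight to `create_engine`; building it this way avoids hand-escaping the password.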




In [0]:
# Show floats with two decimal places in DataFrame output
pd.set_option('display.float_format', lambda x: '%.2f' % x)

from datetime import datetime, timedelta

# Take one reference timestamp so all derived dates are consistent
run_ts = datetime.today()

today = run_ts.strftime('%Y-%m-%d')
yesterday = (run_ts - timedelta(days=1)).strftime('%Y-%m-%d')
print(today)
print(yesterday)

last_2_wks = (run_ts - timedelta(days=14)).strftime('%Y-%m-%d')
print('------------------------------------')
print(last_2_wks)

print('\n')
now = run_ts.strftime('%Y-%m-%d %H:%M:%S')

# Start of a 30-minute window ending at the reference time
last_30_mins = (run_ts - timedelta(minutes=30)).strftime('%Y-%m-%d %H:%M:%S')
trunc_last_30_mins = (run_ts - timedelta(minutes=30)).strftime('%Y-%m-%d %H:%M')
print(last_30_mins, 'to', now)


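## Objective:
The purpose of the credentials cell is to load database connection credentials from a JSON file and use them to create a SQLAlchemy connection engine for Amazon Redshift through the PostgreSQL `psycopg2` driver.

## Code Details:
1. **Import Statement:**
   - `import json`: Imports the `json` module, which provides functions for working with JSON data in Python.

2. **Loading JSON Data:**
   ```python
   with open('/Workspace/Credentials/db_data.json', 'r') as fp:
       data = json.load(fp)
   ```

3. **Creating the Engine:**
   - The `host`, `user`, `passwd`, and `database` values are read from `data['redshift']`, and `create_engine` builds a `postgresql+psycopg2` engine on port 5439, the default Redshift port. The engine is stored in `conn` and reused by every later query and write.
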


## Query Details:
The query starts by defining Common Table Expressions (CTEs), each returning the distinct `client_id` values for one channel, transaction type, or account product:
- `mobile_clients`: Clients using the 'MOBILE' channel.
- `ussd_clients`: Clients using the 'USSD' channel.
- `chatbot_clients`: Clients using the 'CHATBOT' channel.
- `wallet_clients`: Clients using the 'WALLET' channel.
- `bills_clients`: Clients with bill payments.
- `airtime_clients`: Clients buying airtime or data.
- `cardless_clients`: Clients performing cardless transactions.
- `interbank_clients`: Clients with interbank outflows.
- `card_clients`: Clients with card transactions.
- `USA_clients`: Clients with a 'Universal Savings Account'.
- `fd_clients`: Clients with a 'Term Fixed Deposit'.
- `target_savings_clients`: Clients with a 'Target Savings Account'.
- `child_savings_clients`: Clients with a 'Child Savings' account.
- `corp_current_clients`: Clients with a 'Corporate Current Account'.
- `ind_current_clients`: Clients with an 'Individual Current Account'.
- `staff_current_clients`: Clients with a 'Staff Current Account'.
- `credit_clients`: Clients with at least one loan whose `loan_status` is 'Active'.
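
Each CTE-plus-`LEFT JOIN` pair amounts to a membership test: a client gets `'1'` if its `client_id` appears in the CTE and `'0'` otherwise, and clients with no matching rows still appear because the join is a `LEFT JOIN`. The pandas sketch below reproduces that logic for the four channel flags on small hypothetical frames shaped like `dwh_all_clients` and `dwh_all_transactions`, which can be handy for checking flag definitions on a sample before running the full query.

```python
import pandas as pd

# Hypothetical sample data shaped like dwh_all_clients and dwh_all_transactions
clients = pd.DataFrame({"client_id": [1, 2, 3, 4],
                        "client_category": ["IND", "IND", "CORP", "IND"]})
transactions = pd.DataFrame({"client_id": [1, 1, 2, 3, 3],
                             "channel": ["MOBILE", "USSD", "MOBILE", "WALLET", "WALLET"]})

flags = clients.copy()
for channel in ["MOBILE", "USSD", "CHATBOT", "WALLET"]:
    # Like the CTE: SELECT DISTINCT client_id ... WHERE channel = <channel>
    users = transactions.loc[transactions["channel"] == channel, "client_id"].unique()
    # Like LEFT JOIN + CASE WHEN x.client_id IS NOT NULL THEN '1' ELSE '0' END
    flags[f"uses_{channel.lower()}"] = flags["client_id"].isin(users).map({True: "1", False: "0"})

print(flags)
```

Client 4 has no transactions at all and still appears with every flag set to `'0'`, mirroring the `LEFT JOIN` behaviour.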

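The main query then left-joins each CTE to the `dwh_all_clients` table, so every client in that table appears in the output and gets `'0'` for any CTE it is missing from. Every flag is returned as the string `'1'` or `'0'`. The report has the following columns: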
- `client_id`: Unique identifier for the client.
- `client_category`: Category of the client.
- `uses_mobile`: Flag indicating if the client uses the 'MOBILE' channel.
- `uses_ussd`: Flag indicating if the client uses the 'USSD' channel.
- `uses_chatbot`: Flag indicating if the client uses the 'CHATBOT' channel.
- `uses_wallet`: Flag indicating if the client uses the 'WALLET' channel.
- `pays_bills`: Flag indicating if the client pays bills.
- `buys_airtime_data`: Flag indicating if the client buys airtime or data.
- `does_cardless`: Flag indicating if the client performs cardless transactions.
- `does_interbank_outflow`: Flag indicating if the client has interbank outflows.
- `has_card`: Flag indicating if the client has card transactions (from `dwh_card_transactions`).
- `has_usa`: Flag indicating if the client has a 'Universal Savings Account'.
- `has_fd`: Flag indicating if the client has a 'Term Fixed Deposit'.
- `has_target_savings`: Flag indicating if the client has a 'Target Savings Account'.
- `has_child_savings`: Flag indicating if the client has a 'Child Savings' account.
- `has_corp_current`: Flag indicating if the client has a 'Corporate Current Account'.
- `has_indv_current`: Flag indicating if the client has an 'Individual Current Account'.
- `has_staff_current`: Flag indicating if the client has a 'Staff Current Account'.
- `has_credit`: Flag indicating if the client has an active credit (loan) account.
- `run_date`: Date the query ran, taken from the database's `current_date`.

The resulting DataFrame `query_data` holds a row for each client with these flags, giving a snapshot of channel and product usage at the time of execution.
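
Because the `CASE` expressions return the strings `'1'` and `'0'`, the flag columns need converting to integers before they can be averaged. The sketch below computes the share of clients in each `client_category` that use each channel or product; `sample` is a hypothetical stand-in for `query_data` with only two flag columns.

```python
import pandas as pd

# Hypothetical stand-in for query_data; flag columns hold the strings '1' / '0'
sample = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "client_category": ["IND", "IND", "CORP", "IND"],
    "uses_mobile": ["1", "0", "1", "1"],
    "has_fd": ["0", "0", "1", "1"],
    "run_date": ["2024-01-31"] * 4,
})

# Every column except the identifiers and run_date is a 0/1 flag
flag_cols = [c for c in sample.columns
             if c not in ("client_id", "client_category", "run_date")]
flags = sample[flag_cols].astype(int)

# Share of clients in each category that use each channel or product
adoption = flags.groupby(sample["client_category"]).mean()
print(adoption)
```

On the real output, replacing `sample` with `query_data` gives adoption rates for all eighteen flags at once.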

## Execution Time:
The `%%time` magic at the top of the query cell reports how long the query takes to run.


In [0]:
%%time

query_data = pd.read_sql_query('''


        WITH mobile_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_transactions
            WHERE channel = 'MOBILE'
        ),
        ussd_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_transactions
            WHERE channel = 'USSD'
        ),
        chatbot_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_transactions
            WHERE channel = 'CHATBOT'
        ),
        wallet_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_transactions
            WHERE channel = 'WALLET'
        ),
        bills_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_bills_only
        ),
        airtime_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_airtime_only
        ),
        cardless_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_cardless_transactions
        ),
        interbank_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_interbank_outflows
        ),
        card_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_card_transactions
        ),
        USA_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Universal Savings Account'
        ),
        fd_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Term Fixed Deposit'
        ),
        target_savings_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Target Savings Account'
        ),
        child_savings_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Child Savings'
        ),
        corp_current_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Corporate Current Account'
        ),
        ind_current_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Individual Current Account'
        ),
        staff_current_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_all_accounts
            WHERE product_name = 'Staff Current Account'
        ),
        credit_clients AS (
            SELECT DISTINCT client_id
            FROM dwh_loan_details
            WHERE loan_status = 'Active'
        )

            

        SELECT DISTINCT
            dac.client_id,
            dac.client_category,
            CASE WHEN mc.client_id IS NOT NULL THEN '1' ELSE '0' END AS uses_mobile,
            CASE WHEN uc.client_id IS NOT NULL THEN '1' ELSE '0' END AS uses_ussd,
            CASE WHEN cc.client_id IS NOT NULL THEN '1' ELSE '0' END AS uses_chatbot,
            CASE WHEN wc.client_id IS NOT NULL THEN '1' ELSE '0' END AS uses_wallet,
            CASE WHEN bc.client_id IS NOT NULL THEN '1' ELSE '0' END AS pays_bills,
            CASE WHEN ac.client_id IS NOT NULL THEN '1' ELSE '0' END AS buys_airtime_data,
            CASE WHEN ccless.client_id IS NOT NULL THEN '1' ELSE '0' END AS does_cardless,
            CASE WHEN ic.client_id IS NOT NULL THEN '1' ELSE '0' END AS does_interbank_outflow,
            CASE WHEN dc.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_card,
            CASE WHEN uc1.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_usa,
            CASE WHEN fd.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_fd,
            CASE WHEN tsc.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_target_savings,
            CASE WHEN csc.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_child_savings,
            CASE WHEN ccc.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_corp_current,
            CASE WHEN icc.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_indv_current,
            CASE WHEN scc.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_staff_current,
            CASE WHEN cc1.client_id IS NOT NULL THEN '1' ELSE '0' END AS has_credit,

            current_date as run_date
            
        FROM dwh_all_clients dac
        -- Every flag CTE is joined on dac.client_id; the account products come from the CTEs,
        -- so no direct join to dwh_all_accounts (one row per account) is needed
        LEFT JOIN mobile_clients mc ON dac.client_id = mc.client_id
        LEFT JOIN ussd_clients uc ON dac.client_id = uc.client_id
        LEFT JOIN chatbot_clients cc ON dac.client_id = cc.client_id
        LEFT JOIN wallet_clients wc ON dac.client_id = wc.client_id
        LEFT JOIN bills_clients bc ON dac.client_id = bc.client_id
        LEFT JOIN airtime_clients ac ON dac.client_id = ac.client_id
        LEFT JOIN cardless_clients ccless ON dac.client_id = ccless.client_id
        LEFT JOIN interbank_clients ic ON dac.client_id = ic.client_id
        LEFT JOIN card_clients dc ON dac.client_id = dc.client_id
        LEFT JOIN USA_clients uc1 ON dac.client_id = uc1.client_id
        LEFT JOIN fd_clients fd ON dac.client_id = fd.client_id
        LEFT JOIN target_savings_clients tsc ON dac.client_id = tsc.client_id
        LEFT JOIN child_savings_clients csc ON dac.client_id = csc.client_id
        LEFT JOIN corp_current_clients ccc ON dac.client_id = ccc.client_id
        LEFT JOIN ind_current_clients icc ON dac.client_id = icc.client_id
        LEFT JOIN staff_current_clients scc ON dac.client_id = scc.client_id
        LEFT JOIN credit_clients cc1 ON dac.client_id = cc1.client_id




''', conn)

query_data
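
The output is meant to hold one row per client, but `SELECT DISTINCT` only removes fully identical rows: a client listed twice in `dwh_all_clients` with different `client_category` values, for example, would still appear twice. A minimal sanity check, written as a hypothetical helper `check_one_row_per_client` and exercised on an invented sample:

```python
import pandas as pd

def check_one_row_per_client(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the rows whose client_id occurs more than once."""
    dupes = df[df["client_id"].duplicated(keep=False)]
    if not dupes.empty:
        print(f"{dupes['client_id'].nunique()} client_id value(s) appear more than once")
    return dupes

# Hypothetical sample: client 2 appears with two different categories
sample = pd.DataFrame({"client_id": [1, 2, 2],
                       "client_category": ["IND", "IND", "CORP"]})
dupes = check_one_row_per_client(sample)
print(dupes)
```

In the notebook this would be called as `check_one_row_per_client(query_data)` before the results are written out.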

In [0]:
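# Disabled alternative: write the results to a Databricks table instead of Redshift.
# Remove the surrounding triple quotes to enable it.
'''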
from pyspark.sql import SparkSession

df = query_data
display(df)

# Create a SparkSession if not already created
spark = SparkSession.builder.getOrCreate()

# Convert Pandas DataFrame to Spark DataFrame
spark_df = spark.createDataFrame(df)

# Write Spark DataFrame to table in Databricks
spark_df.write \
    .mode("overwrite") \
    .saveAsTable("vfd_databricks.default.prod_chan_adoption")
'''

In [0]:
%%time

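df = query_data

# Target table in the Redshift public schema
table_name = 'dwh_client_prod_chan_adoption'

# Write the DataFrame to Redshift: if_exists='replace' drops and recreates the table on each run,
# and method='multi' inserts up to `chunksize` rows per INSERT statement
df.to_sql(name=table_name, con=conn, if_exists='replace', index=False, chunksize=5000, method='multi')

# Report the time the write finished, not the time the notebook started
print(f"Data successfully written on: {datetime.today().strftime('%Y-%m-%d %H:%M:%S')}\n")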
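
A simple way to confirm the write is to count the rows in the target table afterwards and compare with the DataFrame. The sketch below runs the same `to_sql` call pattern against an in-memory SQLite engine standing in for Redshift (an assumption, so it runs anywhere), with a small hypothetical DataFrame and a small `chunksize` so several batched INSERTs are issued.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # stand-in for the Redshift engine `conn`
table_name = "dwh_client_prod_chan_adoption"

# Hypothetical data: 12 rows written in batches of 5
df = pd.DataFrame({"client_id": range(12), "uses_mobile": ["1", "0"] * 6})
df.to_sql(name=table_name, con=engine, if_exists="replace", index=False,
          chunksize=5, method="multi")

# Read back the row count and compare it with the DataFrame that was written
written = pd.read_sql_query(f"SELECT COUNT(*) AS n FROM {table_name}", engine)["n"].iloc[0]
assert written == len(df), f"expected {len(df)} rows, found {written}"
print(f"{written} rows in {table_name}")
```

Because of `if_exists="replace"`, rerunning the write leaves the count at the DataFrame's length rather than doubling it.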


## Resulting Tables (Redshift `public` Schema)

* `dwh_client_prod_chan_adoption`: client-level list of the products and channels each client uses, as `'1'`/`'0'` flags (POS transactions are excluded).