# LioJotstar Merger: Data Analysis with Python for Strategic Optimization

## 3. Feature Engineering
This notebook uses the pandas library to engineer new features from the subscriber tables prepared in the Data Wrangling notebook. The goal is to extract more meaningful information for the strategic merger analysis of LioCinema and Jotstar.

### Importing Required Libraries

In [4]:
import pandas as pd

### Loading Required DataFrames from Saved Parquet Files

In [6]:
try:
    jotstar_subscribers_df = pd.read_parquet('Parquet Data Files/02. Data Wrangling/Jotstar_db/subscribers.parquet')
    print("Jotstar - Subscribers table loaded successfully.")
    liocinema_subscribers_df = pd.read_parquet('Parquet Data Files/02. Data Wrangling/LioCinema_db/subscribers.parquet')
    print("LioCinema - Subscribers table loaded successfully.")
    print("\nData Loading Complete.")

except FileNotFoundError as e:
    print("Error: One or more Parquet files not found. Please check the file paths.")
    print(f"Details: {e}")
except Exception as e:
    print("An error occurred during data import.")
    print(f"Details: {e}")

Jotstar - Subscribers table loaded successfully.
LioCinema - Subscribers table loaded successfully.

Data Loading Complete.


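### Adding a 'Plan Change Type' Column to the Jotstar and LioCinema Subscribers Tables
'Plan Change Type' indicates whether a user's subscription plan was upgraded, downgraded, or left unchanged. A user with no Plan Change Date is classed as "No Change". Otherwise, the change counts as an upgrade when the new plan is in a higher tier (Jotstar: Free < VIP < Premium; LioCinema: Free < Basic < Premium) and as a downgrade in all other cases.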

In [8]:
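# Adding to Jotstar Subscribers table (plan tiers, lowest to highest: Free < VIP < Premium)
def get_plan_change_type(row):
    # No recorded plan change date means the user kept their original plan
    if pd.isna(row['Plan Change Date']):
        return "No Change"

    # Moving to Premium (the top tier), or from Free to VIP, is an upgrade
    if row['New Subscription Plan'] == "Premium" or (row['Subscription Plan'] == "Free" and row['New Subscription Plan'] == "VIP"):
        return "Upgrade"
    else:
        return "Downgrade"

# axis=1 passes each row (as a Series) to the function
jotstar_subscribers_df['Plan Change Type'] = jotstar_subscribers_df.apply(get_plan_change_type, axis = 1)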
jotstar_subscribers_df.head(20)

| | User ID | Age Group | City Tier | Subscription Date | Subscription Plan | Last Active Date | Plan Change Date | New Subscription Plan | Plan Change Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | UIDJS0000751588f | 18-24 | Tier 1 | 2024-06-10 | Premium | NaT | NaT | Premium | No Change |
| 1 | UIDJS000093eeb86 | 18-24 | Tier 1 | 2024-11-09 | Free | NaT | NaT | Free | No Change |
| 2 | UIDJS00010d7fa1e | 25-34 | Tier 1 | 2024-08-08 | Free | NaT | NaT | Free | No Change |
| 3 | UIDJS00013411a85 | 35-44 | Tier 2 | 2024-05-31 | VIP | NaT | NaT | VIP | No Change |
| 4 | UIDJS0003a3f54cf | 35-44 | Tier 1 | 2024-09-20 | Premium | NaT | NaT | Premium | No Change |
| 5 | UIDJS0003c1e814d | 45+ | Tier 1 | 2024-09-29 | Free | NaT | NaT | Free | No Change |
| 6 | UIDJS0005148a254 | 18-24 | Tier 1 | 2024-11-10 | Premium | NaT | NaT | Premium | No Change |
| 7 | UIDJS00053f36fed | 25-34 | Tier 1 | 2024-03-07 | Premium | NaT | NaT | Premium | No Change |
| 8 | UIDJS00054eb2210 | 25-34 | Tier 3 | 2024-04-12 | Premium | NaT | NaT | Premium | No Change |
| 9 | UIDJS0005f952957 | 25-34 | Tier 3 | 2024-09-29 | VIP | 2024-12-02 | NaT | VIP | No Change |


In [9]:
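# Adding to LioCinema Subscribers table (plan tiers, lowest to highest: Free < Basic < Premium)
def get_plan_change_type(row):
    # No recorded plan change date means the user kept their original plan
    if pd.isna(row['Plan Change Date']):
        return "No Change"

    # Moving to Premium (the top tier), or from Free to Basic, is an upgrade
    if row['New Subscription Plan'] == "Premium" or (row['Subscription Plan'] == "Free" and row['New Subscription Plan'] == "Basic"):
        return "Upgrade"
    else:
        return "Downgrade"

# axis=1 passes each row (as a Series) to the function
liocinema_subscribers_df['Plan Change Type'] = liocinema_subscribers_df.apply(get_plan_change_type, axis = 1)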
liocinema_subscribers_df.head(20)

| | User ID | Age Group | City Tier | Subscription Date | Subscription Plan | Last Active Date | Plan Change Date | New Subscription Plan | Plan Change Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | UIDLC00000bea68a | 25-34 | Tier 3 | 2024-10-24 | Free | NaT | NaT | Free | No Change |
| 1 | UIDLC00009202848 | 18-24 | Tier 1 | 2024-09-18 | Basic | NaT | NaT | Basic | No Change |
| 2 | UIDLC0001086afc3 | 35-44 | Tier 2 | 2024-03-23 | Premium | NaT | 2024-04-23 | Free | Downgrade |
| 3 | UIDLC000186abd93 | 18-24 | Tier 3 | 2024-09-07 | Free | 2024-10-23 | NaT | Free | No Change |
| 4 | UIDLC0002189b09f | 18-24 | Tier 2 | 2024-10-07 | Premium | NaT | 2024-11-07 | Free | Downgrade |
| 5 | UIDLC000362cba39 | 25-34 | Tier 3 | 2024-06-09 | Free | 2024-07-29 | NaT | Free | No Change |
| 6 | UIDLC000436e2f4d | 45+ | Tier 3 | 2024-09-04 | Premium | NaT | NaT | Premium | No Change |
| 7 | UIDLC00044a5d021 | 45+ | Tier 2 | 2024-07-22 | Free | 2024-09-15 | NaT | Free | No Change |
| 8 | UIDLC00044bbb294 | 25-34 | Tier 2 | 2024-04-09 | Free | NaT | 2024-09-09 | Basic | Upgrade |
| 9 | UIDLC00047bf87da | 18-24 | Tier 1 | 2024-10-29 | Free | NaT | NaT | Free | No Change |
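Both `get_plan_change_type` functions express the same rule: a change is an upgrade when the new plan sits in a higher tier than the old one. The sketch below expresses that rule column-wise instead of with a row-wise `apply`, which can be slow on large tables. It assumes the tier orderings implied by the rules above; `PLAN_RANKS` and `plan_change_type_vectorized` are hypothetical names introduced here, not part of the notebook. It reproduces the original labels for every combination of recognised plans except one edge case: a row with a Plan Change Date whose plan stays Premium, which the original functions label "Upgrade" and this sketch labels "Downgrade".

```python
import pandas as pd

# Hypothetical tier rankings (lowest to highest), inferred from the upgrade rules above.
PLAN_RANKS = {
    "Jotstar": {"Free": 0, "VIP": 1, "Premium": 2},
    "LioCinema": {"Free": 0, "Basic": 1, "Premium": 2},
}


def plan_change_type_vectorized(df: pd.DataFrame, ranks: dict) -> pd.Series:
    """Label each row "No Change", "Upgrade" or "Downgrade" without a row-wise apply."""
    # Map plan names to numeric tiers; unrecognised plans become NaN.
    # astype(str) guards against categorical columns read back from Parquet.
    old_rank = df["Subscription Plan"].astype(str).map(ranks)
    new_rank = df["New Subscription Plan"].astype(str).map(ranks)

    # Start from "Downgrade" (the else-branch of get_plan_change_type) ...
    result = pd.Series("Downgrade", index=df.index)
    # ... mark moves to a higher tier as upgrades (comparisons with NaN evaluate to False) ...
    result.loc[new_rank > old_rank] = "Upgrade"
    # ... and finally override every row without a Plan Change Date.
    result.loc[df["Plan Change Date"].isna()] = "No Change"
    return result


# Usage example on a small, made-up Jotstar-style table.
sample = pd.DataFrame({
    "Subscription Plan": ["Free", "Premium", "Free", "VIP"],
    "New Subscription Plan": ["Free", "Free", "VIP", "Premium"],
    "Plan Change Date": pd.to_datetime([None, "2024-04-23", "2024-09-09", "2024-10-01"]),
})
sample["Plan Change Type"] = plan_change_type_vectorized(sample, PLAN_RANKS["Jotstar"])
print(sample)
```

Applying it to the full tables would be a matter of passing `PLAN_RANKS["Jotstar"]` or `PLAN_RANKS["LioCinema"]` alongside the matching subscribers DataFrame.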


### Adding a 'Plan Transition' Column to the Jotstar and LioCinema Subscribers Tables
'Plan Transition' records how each user moved between subscription plans by combining their original and new plans into a single value (e.g., "Free → Premium", "Premium → VIP", "VIP → Premium"). Users who did not change plans receive a same-plan value such as "Free → Free" or "Premium → Premium", so every user has a transition and plan stability can be analysed alongside plan changes. Any plan pair not covered by the mapping is labelled "Unknown Transition".

In [11]:
# Adding to Jotstar Subscribers table
def get_plan_transition(row):
    # Map each (original plan, new plan) pair to a readable "Old → New" label
    transition_map = {
        ("Free", "Free"): "Free → Free",
        ("VIP", "VIP"): "VIP → VIP",
        ("Premium", "Premium"): "Premium → Premium",
        ("Free", "VIP"): "Free → VIP",
        ("Free", "Premium"): "Free → Premium",
        ("VIP", "Free"): "VIP → Free",
        ("VIP", "Premium"): "VIP → Premium",
        ("Premium", "Free"): "Premium → Free",
        ("Premium", "VIP"): "Premium → VIP",
    }
    # Pairs outside the map (e.g. unexpected plan names) are labelled "Unknown Transition"
    return transition_map.get((row['Subscription Plan'], row['New Subscription Plan']), "Unknown Transition")

jotstar_subscribers_df['Plan Transition'] = jotstar_subscribers_df.apply(get_plan_transition, axis = 1)
jotstar_subscribers_df.head(20)

| | User ID | Age Group | City Tier | Subscription Date | Subscription Plan | Last Active Date | Plan Change Date | New Subscription Plan | Plan Change Type | Plan Transition |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UIDJS0000751588f | 18-24 | Tier 1 | 2024-06-10 | Premium | NaT | NaT | Premium | No Change | Premium → Premium |
| 1 | UIDJS000093eeb86 | 18-24 | Tier 1 | 2024-11-09 | Free | NaT | NaT | Free | No Change | Free → Free |
| 2 | UIDJS00010d7fa1e | 25-34 | Tier 1 | 2024-08-08 | Free | NaT | NaT | Free | No Change | Free → Free |
| 3 | UIDJS00013411a85 | 35-44 | Tier 2 | 2024-05-31 | VIP | NaT | NaT | VIP | No Change | VIP → VIP |
| 4 | UIDJS0003a3f54cf | 35-44 | Tier 1 | 2024-09-20 | Premium | NaT | NaT | Premium | No Change | Premium → Premium |
| 5 | UIDJS0003c1e814d | 45+ | Tier 1 | 2024-09-29 | Free | NaT | NaT | Free | No Change | Free → Free |
| 6 | UIDJS0005148a254 | 18-24 | Tier 1 | 2024-11-10 | Premium | NaT | NaT | Premium | No Change | Premium → Premium |
| 7 | UIDJS00053f36fed | 25-34 | Tier 1 | 2024-03-07 | Premium | NaT | NaT | Premium | No Change | Premium → Premium |
| 8 | UIDJS00054eb2210 | 25-34 | Tier 3 | 2024-04-12 | Premium | NaT | NaT | Premium | No Change | Premium → Premium |
| 9 | UIDJS0005f952957 | 25-34 | Tier 3 | 2024-09-29 | VIP | 2024-12-02 | NaT | VIP | No Change | VIP → VIP |
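Because every entry in `transition_map` follows the pattern "Old → New", the same column could also be built with vectorized string concatenation. The sketch below is one possible alternative, not the notebook's method; `plan_transition_vectorized` and its `valid_plans` parameter are hypothetical. Restricting the output to a known set of plan names preserves the "Unknown Transition" fallback of the original functions.

```python
import pandas as pd


def plan_transition_vectorized(df: pd.DataFrame, valid_plans: set) -> pd.Series:
    """Build "Old → New" labels column-wise, falling back to "Unknown Transition"."""
    # astype(str) also handles categorical plan columns, which do not support "+".
    old = df["Subscription Plan"].astype(str)
    new = df["New Subscription Plan"].astype(str)
    transition = old + " → " + new

    # Only pairs made of recognised plans keep their label, mirroring transition_map.get(...)
    known = old.isin(valid_plans) & new.isin(valid_plans)
    return transition.where(known, "Unknown Transition")


# Usage example with Jotstar's plan names and a made-up, unrecognised "Gold" plan.
sample = pd.DataFrame({
    "Subscription Plan": ["Free", "VIP", "Gold"],
    "New Subscription Plan": ["Premium", "VIP", "Free"],
})
print(plan_transition_vectorized(sample, {"Free", "VIP", "Premium"}).tolist())
# ['Free → Premium', 'VIP → VIP', 'Unknown Transition']
```

For LioCinema, the same helper would be called with `{"Free", "Basic", "Premium"}`. Deriving labels from the data avoids maintaining a nine-entry dictionary per platform, at the cost of needing the explicit `valid_plans` set to catch unexpected values.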


In [12]:
# Adding to LioCinema Subscribers table
def get_plan_transition(row):
    # Map each (original plan, new plan) pair to a readable "Old → New" label
    transition_map = {
        ("Free", "Free"): "Free → Free",
        ("Basic", "Basic"): "Basic → Basic",
        ("Premium", "Premium"): "Premium → Premium",
        ("Free", "Basic"): "Free → Basic",
        ("Free", "Premium"): "Free → Premium",
        ("Basic", "Free"): "Basic → Free",
        ("Basic", "Premium"): "Basic → Premium",
        ("Premium", "Free"): "Premium → Free",
        ("Premium", "Basic"): "Premium → Basic",
    }
    # Pairs outside the map (e.g. unexpected plan names) are labelled "Unknown Transition"
    return transition_map.get((row['Subscription Plan'], row['New Subscription Plan']), "Unknown Transition")

liocinema_subscribers_df['Plan Transition'] = liocinema_subscribers_df.apply(get_plan_transition, axis = 1)
liocinema_subscribers_df.head(20)

| | User ID | Age Group | City Tier | Subscription Date | Subscription Plan | Last Active Date | Plan Change Date | New Subscription Plan | Plan Change Type | Plan Transition |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UIDLC00000bea68a | 25-34 | Tier 3 | 2024-10-24 | Free | NaT | NaT | Free | No Change | Free → Free |
| 1 | UIDLC00009202848 | 18-24 | Tier 1 | 2024-09-18 | Basic | NaT | NaT | Basic | No Change | Basic → Basic |
| 2 | UIDLC0001086afc3 | 35-44 | Tier 2 | 2024-03-23 | Premium | NaT | 2024-04-23 | Free | Downgrade | Premium → Free |
| 3 | UIDLC000186abd93 | 18-24 | Tier 3 | 2024-09-07 | Free | 2024-10-23 | NaT | Free | No Change | Free → Free |
| 4 | UIDLC0002189b09f | 18-24 | Tier 2 | 2024-10-07 | Premium | NaT | 2024-11-07 | Free | Downgrade | Premium → Free |
| 5 | UIDLC000362cba39 | 25-34 | Tier 3 | 2024-06-09 | Free | 2024-07-29 | NaT | Free | No Change | Free → Free |
| 6 | UIDLC000436e2f4d | 45+ | Tier 3 | 2024-09-04 | Premium | NaT | NaT | Premium | No Change | Premium → Premium |
| 7 | UIDLC00044a5d021 | 45+ | Tier 2 | 2024-07-22 | Free | 2024-09-15 | NaT | Free | No Change | Free → Free |
| 8 | UIDLC00044bbb294 | 25-34 | Tier 2 | 2024-04-09 | Free | NaT | 2024-09-09 | Basic | Upgrade | Free → Basic |
| 9 | UIDLC00047bf87da | 18-24 | Tier 1 | 2024-10-29 | Free | NaT | NaT | Free | No Change | Free → Free |
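With both columns in place, two quick checks can be useful before exporting: a plan-to-plan count matrix built with `pd.crosstab`, and a cross-check that the two new features agree. Plan Change Type is derived from Plan Change Date, whereas Plan Transition is derived from the plan names, so a row labelled "No Change" whose old and new plans differ (or a changed row whose plan stays the same) would point to a data-quality issue. This is a minimal sketch that is not part of the original notebook; the helper names `transition_matrix` and `inconsistent_rows` are hypothetical.

```python
import pandas as pd


def transition_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Count users for each (original plan, new plan) pair."""
    return pd.crosstab(df["Subscription Plan"], df["New Subscription Plan"])


def inconsistent_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where Plan Change Type disagrees with the plan names."""
    same_plan = df["Subscription Plan"].astype(str) == df["New Subscription Plan"].astype(str)
    flagged_no_change = df["Plan Change Type"] == "No Change"
    # A consistent row is either (No Change and same plan) or (changed and different plan).
    return df[flagged_no_change != same_plan]


# Usage example on a small, made-up LioCinema-style table; the last row is deliberately inconsistent.
sample = pd.DataFrame({
    "Subscription Plan": ["Free", "Premium", "Free", "Basic"],
    "New Subscription Plan": ["Free", "Free", "Basic", "Basic"],
    "Plan Change Type": ["No Change", "Downgrade", "Upgrade", "Downgrade"],
})
print(transition_matrix(sample))
print(inconsistent_rows(sample))
```

On the real tables, an empty result from `inconsistent_rows` would confirm that the two features are consistent for every user.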


### Exporting Processed DataFrames to Parquet Files

In [14]:
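from pathlib import Path

jotstar_output = Path('Parquet Data Files/03. Feature Engineering/Jotstar_db/subscribers.parquet')
liocinema_output = Path('Parquet Data Files/03. Feature Engineering/LioCinema_db/subscribers.parquet')

# to_parquet does not create missing folders, so create the output directories first
for output_path in (jotstar_output, liocinema_output):
    output_path.parent.mkdir(parents=True, exist_ok=True)

# Saving transformed Jotstar and LioCinema Subscribers DataFrames in Parquet Format
jotstar_subscribers_df.to_parquet(jotstar_output, index = False)
liocinema_subscribers_df.to_parquet(liocinema_output, index = False)

print("Both transformed DataFrames are saved as Parquet files successfully.")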

Both transformed DataFrames are saved as Parquet files successfully.


## Next Notebook: "4. Formulating Key Metrics for Data Overview"