## Data loading

### Subtask:
Load the four provided CSV files into pandas DataFrames.


**Reasoning**:
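Load each of the four monthly CSV files into its own pandas DataFrame, rename the month-specific amount column (for example `Dec 2024`) to `aop_amount`, and add a `month` column holding the first day of that month.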



In [21]:
import pandas as pd

try:
    df_dec = pd.read_csv('Budget_FY2024_25 - Dec\'24.csv')
    df_dec = df_dec.rename(columns={'Dec 2024': 'aop_amount'})
    df_dec['month'] = '01-Dec-2024'

    df_jan = pd.read_csv('Budget_FY2024_25 - Jan\'25.csv')
    df_jan = df_jan.rename(columns={'Jan 2025': 'aop_amount'})
    df_jan['month'] = '01-Jan-2025'

    df_feb = pd.read_csv('Budget_FY2024_25 - Feb\'25.csv')
    df_feb = df_feb.rename(columns={'Feb 2025': 'aop_amount'})
    df_feb['month'] = '01-Feb-2025'

    df_mar = pd.read_csv('Budget_FY2024_25 - Mar\'25.csv')
    df_mar = df_mar.rename(columns={'Mar 2025': 'aop_amount'})
    df_mar['month'] = '01-Mar-2025'

    print("DataFrames loaded successfully.")
except FileNotFoundError:
    print("One or more CSV files not found.")
except KeyError as e:
    print(f"Error: Column not found - {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


display(df_dec)


DataFrames loaded successfully.


Unnamed: 0,AccountId,AccountName,Owner,aop_amount,month
0,2637409966,user-management-prod,Shrey Agarwal,200012,01-Dec-2024
1,3079193758,IOT Prod,Sunil Bansal,115010,01-Dec-2024
2,17820698743,risk-devops-tools,Bhaskar Gopalan,0,01-Dec-2024
3,21891570382,risk-lending,Bhaskar Gopalan,4495,01-Dec-2024
4,28914776520,edc_prod_ocl,Sunil Bansal,31000,01-Dec-2024
...,...,...,...,...,...
116,952840895893,User Management,Shrey Agarwal,15004,01-Dec-2024
117,968032760053,Fraud AWS,Bhaskar Gopalan,295988,01-Dec-2024
118,975049957371,tpap-dwh,Sunil Agrawal,0,01-Dec-2024
119,984308723723,commonsaas_ocl,Mukesh Meena,14012,01-Dec-2024
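
The four load steps above repeat the same pattern, so they could be folded into one helper. The sketch below is only an illustration: `load_month` is a hypothetical name, and the in-memory sample stands in for the real files with made-up values. It also checks for the amount column explicitly, because `DataFrame.rename` ignores missing keys by default and would not raise the `KeyError` that the cell above tries to catch.

```python
import io
import pandas as pd

def load_month(source, amount_col, month_label):
    """Read one monthly budget CSV and normalise it to the shared layout."""
    df = pd.read_csv(source)
    if amount_col not in df.columns:
        # rename() silently skips missing keys, so fail loudly here instead
        raise KeyError(f"expected column {amount_col!r}, found {list(df.columns)}")
    df = df.rename(columns={amount_col: "aop_amount"})
    df["month"] = month_label
    return df

# Tiny in-memory stand-in for one monthly file (made-up values)
sample = io.StringIO(
    "AccountId,AccountName,Owner,Dec 2024\n"
    "111,demo-a,Alice,100\n"
    "222,demo-b,Bob,0\n"
)
df_demo = load_month(sample, "Dec 2024", "01-Dec-2024")
print(df_demo.columns.tolist())
```

With the real files, the same helper would be called once per month, passing the file path in place of the in-memory sample.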


## Data wrangling

### Subtask:
Combine the four individual DataFrames (df_dec, df_jan, df_feb, df_mar) into a single DataFrame.


**Reasoning**:
Concatenate the four DataFrames, rename `AccountId` and `AccountName` to `account_id` and `account_name`, cast both to string, parse the `month` column into dates, and convert the `aop_amount` column to numeric.



In [25]:
import pandas as pd

try:
    # Concatenate the DataFrames
    df_combined = pd.concat([df_dec, df_jan, df_feb, df_mar], ignore_index=True)
    print(df_combined.columns)

    df_combined = df_combined.rename(columns={'AccountId': 'account_id', 'AccountName': 'account_name'})

    # Convert 'account_id' and 'account_name' to string
    df_combined['account_id'] = df_combined['account_id'].astype(str)
    df_combined['account_name'] = df_combined['account_name'].astype(str)

    # Convert 'month' column to datetime.date
    df_combined['month'] = pd.to_datetime(df_combined['month']).dt.date

    # Convert 'aop_amount' to numeric
    df_combined['aop_amount'] = pd.to_numeric(df_combined['aop_amount'], errors='coerce')

except Exception as e:
    print(f"Error: {e}")

display(df_combined)

Index(['AccountId', 'AccountName', 'Owner', 'aop_amount', 'month'], dtype='object')


Unnamed: 0,account_id,account_name,Owner,aop_amount,month
0,2637409966,user-management-prod,Shrey Agarwal,200012,2024-12-01
1,3079193758,IOT Prod,Sunil Bansal,115010,2024-12-01
2,17820698743,risk-devops-tools,Bhaskar Gopalan,0,2024-12-01
3,21891570382,risk-lending,Bhaskar Gopalan,4495,2024-12-01
4,28914776520,edc_prod_ocl,Sunil Bansal,31000,2024-12-01
...,...,...,...,...,...
482,954976288980,tocom-aisensy,Anand Shankar,0,2025-03-01
483,968032760053,Fraud AWS,Bhaskar Gopalan,313999,2025-03-01
484,975049957371,tpap-dwh,Sunil Agrawal,0,2025-03-01
485,984308723723,commonsaas_ocl,Mukesh Meena,14012,2025-03-01
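
The IDs in this output have between 10 and 12 digits. If they are AWS account IDs (an assumption based on names such as `Fraud AWS`; AWS account IDs are 12 digits long), the shorter ones may have lost leading zeros, either in the source files or when pandas parsed the column as integers. A minimal sketch of two ways to handle that, using one ID from the output above plus an in-memory sample:

```python
import io
import pandas as pd

# IDs as pandas would parse them from an unquoted numeric CSV column
ids = pd.Series([2637409966, 968032760053])

# Option 1: restore the assumed 12-digit width after the fact
padded = ids.astype(str).str.zfill(12)
print(padded.tolist())  # ['002637409966', '968032760053']

# Option 2: if the file itself still holds the zeros, read the column as text
raw = io.StringIO("AccountId,Dec 2024\n002637409966,200012\n")
df_text = pd.read_csv(raw, dtype={"AccountId": str})
print(df_text["AccountId"].iloc[0])  # 002637409966
```

Whether padding is appropriate depends on what the downstream consumer of `account_id` expects, so this is worth confirming against the source system first.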


## Data preparation

### Subtask:
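Extend `df_combined` so that each account has a row for every month in the April 2024 to March 2026 range that falls on or after the account's first available month, filling months without budget data with the account's average AOP amount.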


**Reasoning**:
Calculate the average aop_amount for each account_id based on available data.

Identify the months in the range from April 2024 to March 2026 that fall on or after each account_id's first available month.

Exclude months that already exist in df_combined for each account_id.

Generate new rows using the calculated average aop_amount.

Append these new rows to df_combined to complete the dataset.



In [27]:
import pandas as pd

# Ensure 'month' column is in datetime format
df_combined['month'] = pd.to_datetime(df_combined['month'])

# Define the full range of months from April 2024 to March 2026
full_months = pd.date_range(start="2024-04-01", end="2026-03-01", freq='MS')

# Calculate average 'aop_amount' per 'account_id' based on available data
account_avg = df_combined.groupby('account_id')['aop_amount'].mean().reset_index()

# Get the first available month for each account_id
first_months = df_combined.groupby('account_id')['month'].min().reset_index()
first_months.columns = ['account_id', 'first_available_month']

# Prepare a list to store new rows
new_rows = []

# Iterate over each account_id
for _, row in account_avg.iterrows():
    account_id = row['account_id']
    avg_amount = row['aop_amount']
    first_available_month = first_months.loc[first_months['account_id'] == account_id, 'first_available_month'].values[0]

    # Candidate months: every month in the range on or after the first available month
    missing_months = [m for m in full_months if m >= first_available_month]

    # Exclude months that already exist in df_combined for this account_id
    existing_months = df_combined[df_combined['account_id'] == account_id]['month'].tolist()

    final_missing_months = [m for m in missing_months if m not in existing_months]

    # Get other account details (e.g., account_name, Owner) from existing records
    account_details = df_combined[df_combined['account_id'] == account_id].iloc[0]

    # Create new rows for missing months
    for month in final_missing_months:
        new_rows.append({
            'account_id': account_id,
            'account_name': account_details['account_name'],
            'Owner': account_details['Owner'],
            'aop_amount': avg_amount,
            'month': month
        })

# Convert new rows to DataFrame
df_new = pd.DataFrame(new_rows)

# Append to original DataFrame
df_combined = pd.concat([df_combined, df_new], ignore_index=True)

# Sort by account_id and month
df_combined = df_combined.sort_values(by=['account_id', 'month']).reset_index(drop=True)

# Display final DataFrame
print(df_combined)

        account_id         account_name          Owner  aop_amount      month
0     116631508635  fastag_non prod_ocl   Mukesh Meena     3007.00 2024-12-01
1     116631508635  fastag_non prod_ocl   Mukesh Meena     3007.00 2025-01-01
2     116631508635  fastag_non prod_ocl   Mukesh Meena     2996.00 2025-02-01
3     116631508635  fastag_non prod_ocl   Mukesh Meena     3007.00 2025-03-01
4     116631508635  fastag_non prod_ocl   Mukesh Meena     3004.25 2025-04-01
...            ...                  ...            ...         ...        ...
1971  997924011427                 ckyc  Abhinav Gupta     8000.50 2025-11-01
1972  997924011427                 ckyc  Abhinav Gupta     8000.50 2025-12-01
1973  997924011427                 ckyc  Abhinav Gupta     8000.50 2026-01-01
1974  997924011427                 ckyc  Abhinav Gupta     8000.50 2026-02-01
1975  997924011427                 ckyc  Abhinav Gupta     8000.50 2026-03-01

[1976 rows x 5 columns]
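
The per-account loop above can also be expressed without explicit iteration. The sketch below is one possible vectorised variant, not the notebook's method: it builds an account-by-month grid with `pd.MultiIndex.from_product`, keeps the months on or after each account's first month, and fills gaps with the per-account mean. The two-account sample and the shortened month range are made up for illustration.

```python
import pandas as pd

# Made-up sample in the same layout as df_combined
df = pd.DataFrame({
    "account_id": ["A", "A", "B"],
    "account_name": ["alpha", "alpha", "beta"],
    "Owner": ["Ann", "Ann", "Bob"],
    "aop_amount": [100.0, 200.0, 50.0],
    "month": pd.to_datetime(["2024-12-01", "2025-01-01", "2025-01-01"]),
})
full_months = pd.date_range("2024-12-01", "2025-03-01", freq="MS")

# One row per (account, month) pair across the whole range
grid = pd.MultiIndex.from_product(
    [df["account_id"].unique(), full_months], names=["account_id", "month"]
).to_frame(index=False)

# Per-account first month, mean amount, and descriptive columns
stats = df.groupby("account_id").agg(
    first_month=("month", "min"),
    avg_amount=("aop_amount", "mean"),
    account_name=("account_name", "first"),
    Owner=("Owner", "first"),
).reset_index()

filled = (
    grid.merge(stats, on="account_id")
    .query("month >= first_month")  # drop months before the first data point
    .merge(df[["account_id", "month", "aop_amount"]],
           on=["account_id", "month"], how="left")
)
# Keep real amounts where present, otherwise use the account average
filled["aop_amount"] = filled["aop_amount"].fillna(filled["avg_amount"])
filled = (
    filled[["account_id", "account_name", "Owner", "aop_amount", "month"]]
    .sort_values(["account_id", "month"])
    .reset_index(drop=True)
)
print(filled)
```

In this sample, account `A` keeps its two real amounts and gets 150.0 for February and March 2025, while account `B` starts in January 2025 and gets no December 2024 row.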


## Data saving

### Subtask:
Save the prepared DataFrame `df_combined` to a CSV file named 'final_budget_data.csv'.


**Reasoning**:
Save the `df_combined` DataFrame to a CSV file named 'final_budget_data.csv' without the index.



In [28]:
try:
    df_combined.to_csv('final_budget_data.csv', index=False)
    print("DataFrame saved to final_budget_data.csv")
except Exception as e:
    print(f"An error occurred while saving the DataFrame: {e}")

DataFrame saved to final_budget_data.csv
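
Before relying on the saved file, it may be worth checking that no account/month pair appears twice and that the column types survive a round trip. The sketch below is illustrative only: it uses two rows taken from the output above and writes to an in-memory buffer instead of `final_budget_data.csv`.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "account_id": ["116631508635", "116631508635"],
    "aop_amount": [3007.0, 3004.25],
    "month": pd.to_datetime(["2025-03-01", "2025-04-01"]),
})

# Each (account_id, month) pair should appear exactly once
duplicate_rows = df.duplicated(subset=["account_id", "month"]).sum()
print(f"duplicate account/month rows: {duplicate_rows}")

# Round trip through CSV; without dtype/parse_dates the IDs would come
# back as integers and the months as plain strings
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"account_id": str}, parse_dates=["month"])
print(restored.dtypes)
```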
