## Data loading

### Subtask:
Load the four CSV files into pandas DataFrames.


**Reasoning**:
Load the four CSV files into pandas DataFrames, reading the `account_id` column as a string so that leading zeros are preserved.



In [51]:
import pandas as pd

df1 = pd.read_csv('act1.csv', dtype={'account_id': str})
df2 = pd.read_csv('act2.csv', dtype={'account_id': str})
df3 = pd.read_csv('act3.csv', dtype={'account_id': str})
df4 = pd.read_csv('act4.csv', dtype={'account_id': str})
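
The `dtype={'account_id': str}` argument matters because IDs such as `002637409966` start with zeros that integer parsing would drop. A minimal sketch of the difference, assuming a small inline CSV (the `sample_csv` string is illustrative and only reuses two IDs that appear in the outputs of this notebook):

```python
import io
import pandas as pd

# Illustrative two-row CSV standing in for one of the account files.
sample_csv = (
    "account_id,account_name\n"
    "002637409966,user-management-prod\n"
    "521026337032,ppblpg2-nonprod\n"
)

# Default parsing infers an integer column, so the leading zeros are lost.
as_int = pd.read_csv(io.StringIO(sample_csv))

# Forcing str keeps each ID exactly as it is written in the file.
as_str = pd.read_csv(io.StringIO(sample_csv), dtype={'account_id': str})

print(as_int['account_id'].tolist())
print(as_str['account_id'].tolist())
```

Comparing the two printed lists shows why the string dtype is set at load time rather than converted afterwards: once the column has been parsed as an integer, the original zero padding cannot be recovered reliably.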

## Data preparation

### Subtask:
Prepare the dataframes for merging.


**Reasoning**:
Standardize column names to lowercase and ensure the `account_id` column exists and has a consistent string data type across all four dataframes. Also handle the other columns relevant to the merge, as per the task description.



In [52]:
import pandas as pd

# Standardize column names to lowercase
df1.columns = df1.columns.str.lower()
df2.columns = df2.columns.str.lower()
df3.columns = df3.columns.str.lower()
df4.columns = df4.columns.str.lower()

# Ensure 'account_id' is string type in all dataframes
for df in [df1, df2, df3, df4]:
    if df['account_id'].dtype != 'object':
        df['account_id'] = df['account_id'].astype(str)

# Display the column names of each dataframe to confirm changes
display(df1.columns)
display(df2.columns)
display(df3.columns)
display(df4.columns)


Index(['account_id', 'account_name', 'owner', 'entity'], dtype='object')

Index(['account_id', 'account_name'], dtype='object')

Index(['account_id', 'account_name', 'owner'], dtype='object')

Index(['account_id', 'account_name'], dtype='object')
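
The same standardization can be written once and applied to every frame. The sketch below is only an illustration: `normalize_columns` is a hypothetical helper, and stripping surrounding whitespace is an extra assumption that the cell above does not make.

```python
import pandas as pd

def normalize_columns(df):
    """Return a copy of df with stripped, lowercase column names (hypothetical helper)."""
    out = df.copy()
    out.columns = out.columns.str.strip().str.lower()
    return out

# Made-up frame with inconsistent header casing and padding.
raw = pd.DataFrame({' Account_ID ': ['001'], 'Account_Name': ['alpha']})
clean = normalize_columns(raw)

print(list(clean.columns))
```

Returning a copy leaves the input frame untouched, which makes it easier to compare headers before and after the change.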

## Data wrangling

### Subtask:
Merge the four dataframes (df1, df2, df3, df4) into a single dataframe.


**Reasoning**:
Merge df1 and df2, then merge the result with df3 and df4 in turn, coalescing the overlapping columns after each merge and selecting the required columns.



In [53]:
# Merge df2 into df1
merged_df = pd.merge(df1, df2, on='account_id', how='outer', suffixes=('_df1', '_df2'))
merged_df['account_name'] = merged_df['account_name_df1'].combine_first(merged_df['account_name_df2'])
merged_df = merged_df[['account_id', 'account_name', 'owner', 'entity']]

# Merge df3 into merged_df
merged_df = pd.merge(merged_df, df3, on='account_id', how='outer', suffixes=('_merged', '_df3'))
merged_df['account_name'] = merged_df['account_name_merged'].combine_first(merged_df['account_name_df3'])
merged_df['owner'] = merged_df['owner_merged'].combine_first(merged_df['owner_df3'])
merged_df = merged_df[['account_id', 'account_name', 'owner', 'entity']]

# Merge df4 into merged_df
merged_df = pd.merge(merged_df, df4, on='account_id', how='outer', suffixes=('_merged', '_df4'))
merged_df['account_name'] = merged_df['account_name_merged'].combine_first(merged_df['account_name_df4'])
merged_df = merged_df[['account_id', 'account_name', 'owner', 'entity']]

# Display the merged dataframe
display(merged_df)



Unnamed: 0,account_id,account_name,owner,entity
0,Premium Support,Premium Support,Parakh Agarwal,OCL
1,002637409966,user-management-prod,Shrey Agarwal,OCL
2,003079193758,IOT Prod,Sunil Bansal,OCL
3,015680176265,PPSL Payer,Parakh Agarwal,PPSL
4,017820698743,risk-devops-tools,Bhaskar Gopalan,OCL
...,...,...,...,...
323,,International Expansion,Sunil Bansal,OCL
324,,International Expansion,Sunil Bansal,OCL
325,,International Expansion,Sunil Bansal,OCL
326,,International Expansion,Sunil Bansal,OCL
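
An alternative way to coalesce overlapping columns, which is not the approach used in the cell above, is to stack the frames and take the first non-null value per `account_id`. The sketch below uses made-up frames `a` and `b`; the order of the list passed to `pd.concat` decides which source wins, mirroring the precedence that `combine_first` gives the left-hand frame.

```python
import pandas as pd

# Made-up inputs: `a` lacks a name for account 002, `b` has no owner column.
a = pd.DataFrame({
    'account_id': ['001', '002'],
    'account_name': ['alpha', None],
    'owner': ['Ann', None],
})
b = pd.DataFrame({
    'account_id': ['002', '003'],
    'account_name': ['beta', 'gamma'],
})

# Stack the rows; columns missing from a frame are filled with NaN.
stacked = pd.concat([a, b], ignore_index=True)

# first() returns the first non-null value of each column within a group.
# Note: groupby drops rows whose account_id is missing unless dropna=False is passed.
combined = stacked.groupby('account_id', as_index=False, sort=False).first()

print(combined)
```

This yields one row per `account_id` in a single step, so no suffix handling or later de-duplication is needed for IDs that are present.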


In [54]:
filtered_df = merged_df[merged_df['account_id'] == '521026337032']

# Display the result
print(filtered_df)

       account_id     account_name owner entity
137  521026337032  ppblpg2-nonprod   NaN    NaN


## Data cleaning

### Subtask:
Clean the merged dataframe by handling missing values and removing duplicate rows.


**Reasoning**:
Remove duplicate rows based on "account_id", keeping the first occurrence.



In [55]:
# Remove duplicate rows based on 'account_id'
merged_df = merged_df.drop_duplicates(subset='account_id', keep='first')

# Display the cleaned dataframe
display(merged_df)

Unnamed: 0,account_id,account_name,owner,entity
0,Premium Support,Premium Support,Parakh Agarwal,OCL
1,002637409966,user-management-prod,Shrey Agarwal,OCL
2,003079193758,IOT Prod,Sunil Bansal,OCL
3,015680176265,PPSL Payer,Parakh Agarwal,PPSL
4,017820698743,risk-devops-tools,Bhaskar Gopalan,OCL
...,...,...,...,...
288,Equity DR,Equity DR,,
289,New 1,<Place Holder for NOS Account>,,
290,New 2,<Place Holder for PSPL Gratification Account>,,
291,New 3,Confluent Kafka,Mukesh Meena,OCL
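
One point worth keeping in mind with `keep='first'`: the first row for each `account_id` is kept as it is, and values from later duplicates are not used to fill its gaps. A minimal sketch with a made-up frame `dupes`:

```python
import pandas as pd

# Made-up data: account 001 appears twice, and only the second row has an owner.
dupes = pd.DataFrame({
    'account_id': ['001', '001', '002'],
    'owner': [None, 'Ann', 'Bob'],
})

# Keep the first row per account_id; the later 'Ann' row is discarded.
deduped = dupes.drop_duplicates(subset='account_id', keep='first')

print(deduped)
```

If the more complete row should win, the frame would need to be sorted or coalesced before duplicates are dropped.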


## Data preparation

### Subtask:
Reorder the columns in the merged dataframe to match the specified order.


**Reasoning**:
Reorder the columns of the merged dataframe to match the specified order and display the columns to verify the change.



In [56]:
# Reorder the columns
merged_df = merged_df.reindex(columns=['account_id', 'account_name', 'owner', 'entity'])

# Display the reordered columns
display(merged_df.columns)

Index(['account_id', 'account_name', 'owner', 'entity'], dtype='object')

## Data saving

### Subtask:
Save the merged and cleaned dataframe `merged_df` to a new CSV file named "combined_account_details.csv".


**Reasoning**:
Save the merged dataframe `merged_df` to a CSV file named "combined_account_details.csv" without the index.



In [58]:
# Write the result under the file name given in the subtask, without the index column
merged_df.to_csv('combined_account_details.csv', index=False)
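
A quick round-trip check can confirm that the saved file still carries string IDs. This is a sketch only: it writes a one-row made-up frame to a temporary directory (the name `roundtrip_check.csv` is hypothetical) rather than touching the real output file.

```python
import os
import tempfile
import pandas as pd

# One-row frame reusing an ID that appears in the outputs of this notebook.
frame = pd.DataFrame({
    'account_id': ['002637409966'],
    'account_name': ['user-management-prod'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'roundtrip_check.csv')
    # index=False keeps the row index out of the file.
    frame.to_csv(path, index=False)
    # Read back with the same dtype override used when loading the inputs.
    reloaded = pd.read_csv(path, dtype={'account_id': str})

print(reloaded['account_id'].iloc[0])
```

Anyone reading the saved CSV later needs the same `dtype` override, since the file itself stores no type information.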