# *Project: Vanguard A/B Test Results Analysis*
---

### CONTEXT:
- Company: Vanguard, the US-based investment management company (website: https://investor.vanguard.com/).
- Role: newly hired DATA ANALYST in the Customer Experience (CX) team. The team launched an exciting digital experiment, is eager to uncover the results, and needs your help!
- Task: Analyze the results of the digital experiment conducted by the team. Primary objective: decode the experiment's performance.
- The critical question: Would these changes encourage more clients to complete the process?
- Belief: Vanguard believed that a more intuitive, modern UI with timely in-context prompts (cues, messages, or instructions within the user’s current task) could make the online process smoother for clients.
---
The team ran an **A/B test** from `3/15/2017` to `6/20/2017`:
- Control Group: Clients interacted with Vanguard's traditional online process (old UI).
- Test Group: Clients experienced the new, spruced-up digital interface (new UI).
---

Both groups navigated through an identical process sequence:
- an initial page (start), 
- three subsequent steps (step 1, step 2, step 3), 
- and finally, a confirmation page signaling process completion.
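
Because every client follows the same sequence, the steps can be encoded as an ordinal scale, which makes it easy to find the furthest step a visit reached and whether it reached the confirmation page. A minimal sketch, assuming the `process_step` labels `start`, `step_1`, `step_2`, `step_3`, `confirm` (as in the web data) and using a hypothetical toy DataFrame rather than the real footprints:

```python
import pandas as pd

# Ordinal position of each step in the process funnel
STEP_ORDER = {"start": 0, "step_1": 1, "step_2": 2, "step_3": 3, "confirm": 4}

# Hypothetical toy events: visit "a" completes, visit "b" stops at step_2
events = pd.DataFrame({
    "visit_id": ["a", "a", "a", "a", "a", "b", "b", "b"],
    "process_step": ["start", "step_1", "step_2", "step_3", "confirm",
                     "start", "step_1", "step_2"],
})

events["step_rank"] = events["process_step"].map(STEP_ORDER)

# Furthest step reached per visit, and whether that step is the confirmation page
furthest = events.groupby("visit_id")["step_rank"].max()
completed = furthest == STEP_ORDER["confirm"]
print(completed)
```

Taking the maximum rank (rather than the last event) keeps a visit counted as complete even if the client navigated back after confirming.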
---
**The goal is to see if the new design leads to a better user experience and higher process completion rates.**
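
One common way to check whether the new design raises the completion rate is a two-sided, two-proportion z-test on completed visits per group. This is a minimal standard-library sketch; the choice of test is an assumption rather than the team's stated method, and the counts below are hypothetical placeholders, not results from this experiment:

```python
from math import erf, sqrt


def two_proportion_ztest(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p_a == p_b, using the pooled proportion."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical counts: control (old UI) vs test (new UI)
z, p = two_proportion_ztest(success_a=650, n_a=1000, success_b=700, n_b=1000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

A positive `z` means the test group (`b`) completed more often than the control group (`a`).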

---

### Datasets and CSV files

| Dataset                         | DataFrame Name                                               | Description                                                                 |
|---------------------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------|
| **Client Profiles**              | `df_final_demo`                                             | Demographics of clients including age, gender, and account details.        |
| **Digital Footprints – Part 1 & 2** | `digital_footprints` (merge of `df_final_web_data_pt_1` and `df_final_web_data_pt_2`) | Detailed trace of client online interactions; parts 1 & 2 should be merged.|
| **Experiment Roster**            | `df_final_experiment_clients`                                | List of clients who participated in the grand experiment.                  |
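
The footprints only become an A/B comparison once each event is tagged with its client's group from the Experiment Roster. A minimal sketch on toy data, assuming the roster has a `client_id` column and a group column named `Variation` with values `Test`/`Control` (the column name and values are assumptions to check against `df_final_experiment_clients`):

```python
import pandas as pd

# Hypothetical roster: one row per client; client 3 has no group assigned
roster = pd.DataFrame({"client_id": [1, 2, 3], "Variation": ["Test", "Control", None]})

# Hypothetical web events keyed by client_id (client 4 is not in the roster)
footprints = pd.DataFrame({
    "client_id": [1, 1, 2, 3, 4],
    "process_step": ["start", "confirm", "start", "start", "start"],
})

# Inner join keeps only events from clients listed in the roster;
# validate="many_to_one" raises if a client_id appears twice in the roster
tagged = footprints.merge(roster, on="client_id", how="inner", validate="many_to_one")

# Drop events from roster clients without an assigned group
tagged = tagged.dropna(subset=["Variation"])
print(tagged["Variation"].value_counts())
```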


---
### STEP 01: Merging web data part 1 and part 2 into a single DataFrame

In [4]:
import pandas as pd
import numpy as np
print(pd.__version__, np.__version__)

3.0.0 2.4.1


In [9]:
import pandas as pd
from IPython.display import display
# Load the two raw .txt parts of the web data (comma-separated) into DataFrames ---------------------------------------------
df_final_web_data_pt_1 = pd.read_csv("../data/raw/df_final_web_data_pt_1.txt",sep=",")
df_final_web_data_pt_2 = pd.read_csv("../data/raw/df_final_web_data_pt_2.txt",sep=",")

# Display the 2 DataFrames to ensure they loaded correctly --------------------------------------------------------------
display(df_final_web_data_pt_1.head())
display(df_final_web_data_pt_2.head())

# Save each DataFrame to CSV ---------------------------------------------------------------------------------------------
print(f"WEB DATA PART 1 shape: {df_final_web_data_pt_1.shape}")
df_final_web_data_pt_1.to_csv("../data/interim/digital_footprints_pt_1.csv",index=False)

print(f"WEB DATA PART 2 shape: {df_final_web_data_pt_2.shape}")
df_final_web_data_pt_2.to_csv("../data/interim/digital_footprints_pt_2.csv",index=False)

print("Saved both parts as CSV files.")

# Merge the two parts (same columns) --------------------------------------------------------------------------------
df_final_web_data = pd.concat([df_final_web_data_pt_1, df_final_web_data_pt_2],ignore_index=True)
print(f"Final Merged part shape: {df_final_web_data.shape}")

# Save merged CSV ---------------------------------------------------------------------------------------------
df_final_web_data.to_csv('../data/interim/digital_footprints.csv', index=False)

print("Merged digital footprints parts and saved in '../data/interim/digital_footprints.csv'")

# Display the merged DataFrame to ensure it loaded correctly ---------------------------------------------------------------------------------------------
display(df_final_web_data.head())

|   | client_id | visitor_id | visit_id | process_step | date_time |
|---|-----------|------------|----------|--------------|-----------|
| 0 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:27:07 |
| 1 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:26:51 |
| 2 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:19:22 |
| 3 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:19:13 |
| 4 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:18:04 |


|   | client_id | visitor_id | visit_id | process_step | date_time |
|---|-----------|------------|----------|--------------|-----------|
| 0 | 763412 | 601952081_10457207388 | 397475557_40440946728_419634 | confirm | 2017-06-06 08:56:00 |
| 1 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | confirm | 2017-06-01 11:59:27 |
| 2 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | step_3 | 2017-06-01 11:58:48 |
| 3 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | step_2 | 2017-06-01 11:58:08 |
| 4 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | step_1 | 2017-06-01 11:57:58 |


WEB DATA PART 1 shape: (343141, 5)
WEB DATA PART 2 shape: (412264, 5)
Saved both parts as CSV files.
Final Merged part shape: (755405, 5)
Merged digital footprints parts and saved in '../data/interim/digital_footprints.csv'


|   | client_id | visitor_id | visit_id | process_step | date_time |
|---|-----------|------------|----------|--------------|-----------|
| 0 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:27:07 |
| 1 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:26:51 |
| 2 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:19:22 |
| 3 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:19:13 |
| 4 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:18:04 |
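
The printed shapes already agree (343,141 + 412,264 = 755,405 rows). A small sanity-check pattern can make that explicit in code, and also count exact duplicate rows before any analysis. A minimal sketch on two hypothetical toy parts standing in for `df_final_web_data_pt_1` and `df_final_web_data_pt_2`; whether duplicates should be dropped is an analysis decision the source does not state:

```python
import pandas as pd

# Toy stand-ins for the two raw parts (same columns, hypothetical values)
pt_1 = pd.DataFrame({"client_id": [1, 1], "process_step": ["start", "step_1"],
                     "date_time": ["2017-04-17 15:18:04", "2017-04-17 15:19:13"]})
pt_2 = pd.DataFrame({"client_id": [2, 2], "process_step": ["start", "start"],
                     "date_time": ["2017-06-01 11:57:58", "2017-06-01 11:57:58"]})

# Both parts must have identical columns before stacking them
assert list(pt_1.columns) == list(pt_2.columns), "column mismatch between parts"

merged = pd.concat([pt_1, pt_2], ignore_index=True)

# No rows gained or lost: merged length equals the sum of the parts
assert len(merged) == len(pt_1) + len(pt_2)

# Count exact duplicate rows (here, client 2's repeated start event)
n_dupes = merged.duplicated().sum()
print(f"{len(merged)} rows, {n_dupes} exact duplicates")
deduped = merged.drop_duplicates(ignore_index=True)
```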


---
### STEP 02: Merge & Save in CSV files

In [None]:
import pandas as pd
from IPython.display import display

# Save each raw part as CSV
df_final_web_data_pt_1.to_csv("../data/interim/final_web_data_pt_1.csv", index=False)
df_final_web_data_pt_2.to_csv("../data/interim/final_web_data_pt_2.csv", index=False)

# Merge the two parts (same columns) into one web-data DataFrame
df_final_web_data = pd.concat([df_final_web_data_pt_1, df_final_web_data_pt_2], ignore_index=True)

# Save the merged data: one copy is loaded by the cells below, one goes to data/processed
df_final_web_data.to_csv("../data/df_final_web_data.csv", index=False)
df_final_web_data.to_csv("../data/processed/final_web_data_merged.csv", index=False)
print("Merged web data saved in '../data/df_final_web_data.csv'")

# Display the merged DataFrame to ensure it loaded correctly
display(df_final_web_data.head())


In [None]:
# load csv to verify
digital_footprints = pd.read_csv('../data/df_final_web_data.csv')
digital_footprints.head()
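
When the merged file is read back from CSV, `date_time` comes back as plain text, so it needs converting before any time-based analysis such as ordering events or measuring time per step. A minimal sketch on hypothetical toy rows, assuming the `YYYY-MM-DD HH:MM:SS` format seen in the outputs of STEP 01:

```python
import pandas as pd

# Toy rows in the same shape as digital_footprints (hypothetical values)
df = pd.DataFrame({
    "visit_id": ["v1", "v1", "v1"],
    "process_step": ["step_2", "start", "step_1"],
    "date_time": ["2017-04-17 15:19:13", "2017-04-17 15:18:04", "2017-04-17 15:18:40"],
})

# Parse text timestamps; an explicit format fails loudly on malformed values
df["date_time"] = pd.to_datetime(df["date_time"], format="%Y-%m-%d %H:%M:%S")

# Put each visit's events in chronological order
df = df.sort_values(["visit_id", "date_time"], ignore_index=True)

# Seconds until the next event within the same visit (NaN for the last event)
df["seconds_to_next"] = (
    df.groupby("visit_id")["date_time"].shift(-1) - df["date_time"]
).dt.total_seconds()
print(df)
```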

---
### Load the 3 datasets

In [None]:
import pandas as pd

# client_profiles dataset
client_profiles = pd.read_csv("../data/df_final_demo.csv")

# digital_footprints dataset (main dataset for client behaviour)
digital_footprints = pd.read_csv("../data/df_final_web_data.csv")

# experiment_roster dataset
experiment_roster = pd.read_csv("../data/df_final_experiment_clients.csv")
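
Since the A/B test ran from 3/15/2017 to 6/20/2017, events outside that window could be flagged or excluded before comparing groups. A minimal sketch on hypothetical timestamps, assuming the window includes the whole of the end date (an assumption; the source gives only the dates):

```python
import pandas as pd

# Toy events; the first and last fall outside the experiment window
events = pd.DataFrame({"date_time": pd.to_datetime([
    "2017-03-14 23:59:59", "2017-03-15 00:00:00",
    "2017-06-20 18:30:00", "2017-06-21 00:00:00",
])})

start = pd.Timestamp("2017-03-15")
end_exclusive = pd.Timestamp("2017-06-21")  # include all of 6/20/2017

in_window = (events["date_time"] >= start) & (events["date_time"] < end_exclusive)
print(f"{(~in_window).sum()} event(s) outside the test window")
events_in_window = events[in_window]
```

Using an exclusive upper bound on the following day avoids missing events later on 6/20/2017.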


### First View of the 3 datasets

In [None]:
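from IPython.display import display

display(digital_footprints.head())  # only a cell's last expression is auto-rendered, so display explicitly
print(digital_footprints.shape)     # .shape is an attribute, not a method
digital_footprints.info()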

---
### Dealing with null values in the dataset
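
Before deciding how to treat missing values, it helps to count them per column. A minimal sketch on a toy client-profiles frame; the column names and values are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy client profiles with some missing values (hypothetical columns and data)
profiles = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "age": [34.0, np.nan, 51.0, np.nan],
    "gender": ["F", "M", None, "U"],
})

# Count and share (%) of missing values per column, most affected first
null_summary = pd.DataFrame({
    "n_missing": profiles.isna().sum(),
    "pct_missing": profiles.isna().mean().mul(100).round(1),
}).sort_values("n_missing", ascending=False)
print(null_summary)
```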

In [None]:
# SABINA COMMENT: What do you gather from this chart?

# SABINA COMMENT: Your logons_6months variable is not well-formatted, this
# chart is completely wrong (even though the idea is useful). Please re-do.

# TODO: deal with null values for the process-step composition by variation (normalized to 1)

# TODO: document each cleaning step in detail next to its code and plot

# TODO: convert abbreviated numbers such as "15K" to 15000
# a = "15K"  # the information is correct, but it can't be used in calculations until converted
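
The "15K" to 15000 fix noted above can be sketched as a small parser applied column-wise. This is a minimal sketch, assuming "K" means thousand and "M" means million; the function name `parse_abbreviated` is hypothetical, and the suffix convention should be checked against the actual data:

```python
import pandas as pd

# Suffix multipliers; assumes "K" = thousand and "M" = million (hypothetical convention)
MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_abbreviated(value) -> float:
    """Convert strings like '15K' or '2.5M' to numbers; plain numbers pass through."""
    if pd.isna(value):
        return float("nan")
    text = str(value).strip().upper().replace(",", "")
    if text and text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)


raw = pd.Series(["15K", "2.5M", "300", None], dtype="object")
print(raw.map(parse_abbreviated).tolist())
```

Unparseable values raise `ValueError`, which surfaces formatting problems instead of silently turning them into NaN.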
