# Tableau Data Preparation

## Objectives

* Prepare and format data for Tableau visualization of all three hypotheses
* Export the cleaned datasets as CSV files that Tableau can import directly
* Create focused datasets for hypothesis testing in Tableau

## Inputs

* data/inputs/cleaned_bank_data.csv

## Outputs

All files are written to `data/outputs/`:

* **hyp1_points_tableau.csv** - Points Earned vs Attrition analysis
* **hyp2_creditscore_tableau.csv** - Credit Score vs Attrition analysis
* **hyp3_tenure_tableau.csv** - Tenure vs Attrition analysis

## Additional Comments

* This notebook creates three focused datasets for building Tableau dashboards
* Each CSV contains the `Exited` flag plus the single variable tested by its hypothesis
* All datasets include an `Attrition_Status` column that maps `Exited` (1/0) to "Attrited"/"Retained", so Tableau charts show readable labels instead of 0/1
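
The `Attrition_Status` column is built with `Series.map` and a `{1: 'Attrited', 0: 'Retained'}` dictionary. `map` silently returns `NaN` for any `Exited` value missing from the dictionary (for example a stray `2` or a missing value), and `to_csv` writes those as empty cells. A minimal sketch of a guarded version, assuming `Exited` should only ever hold 0 or 1; `add_attrition_status` and `STATUS_LABELS` are hypothetical names, not part of the notebook:

```python
import pandas as pd

# Hypothetical constant: labels for the binary Exited flag (1 = left the bank, 0 = stayed)
STATUS_LABELS = {1: "Attrited", 0: "Retained"}


def add_attrition_status(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with an Attrition_Status column derived from Exited.

    Series.map yields NaN for values not in STATUS_LABELS, so raise an error
    instead of exporting blank labels.
    """
    out = frame.copy()
    out["Attrition_Status"] = out["Exited"].map(STATUS_LABELS)
    unmapped = out.loc[out["Attrition_Status"].isna(), "Exited"].unique()
    if len(unmapped) > 0:
        raise ValueError(f"Unexpected Exited values: {list(unmapped)}")
    return out


# Hypothetical sample rows shaped like the Tenure dataset
sample = pd.DataFrame({"Exited": [1, 0, 1], "Tenure": [2, 1, 8]})
labelled = add_attrition_status(sample)
print(labelled["Attrition_Status"].tolist())  # ['Attrited', 'Retained', 'Attrited']
```

Failing early here is a design choice: a blank label is easy to miss in a Tableau chart, whereas an exception stops the export before a bad file is written.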

---

# Load and Prepare Data for Tableau

Load the cleaned dataset and check its shape and first few rows.

```python
# Import necessary libraries
import os

import pandas as pd
```

```python
# Load the cleaned dataset
df = pd.read_csv("../data/inputs/cleaned_bank_data.csv")

# Display basic information about the dataset
print(f"Dataset shape: {df.shape}")
df.head()
```

```text
Dataset shape: (10000, 19)
```

| | Unnamed: 0 | RowNumber | CustomerId | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Complain | SatisfactionScore | CardType | PointEarned | AgeGroup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 15598695 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 1 | 2 | DIAMOND | 464 | 40-49 |
| 1 | 1 | 2 | 15649354 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 1 | 3 | DIAMOND | 456 | 40-49 |
| 2 | 2 | 3 | 15737556 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 1 | 3 | DIAMOND | 377 | 40-49 |
| 3 | 3 | 4 | 15671610 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 0 | 5 | GOLD | 350 | 30-39 |
| 4 | 4 | 5 | 15625092 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 5 | GOLD | 425 | 40-49 |
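
The `head()` output shows an `Unnamed: 0` column whose values repeat the row index. pandas gives that name to a column with an empty header, which typically happens when a DataFrame was saved with `to_csv()` without `index=False` and then read back. The Tableau exports in this notebook select only named columns, so the column never reaches Tableau; if you also want it out of `df`, a minimal sketch follows. The inline CSV is hypothetical sample data, and passing `index_col=0` to `read_csv` is an alternative when the first column is known to be an old index.

```python
import io

import pandas as pd

# Hypothetical two-row CSV mimicking a file that was saved with its index:
# the empty first header is read back as "Unnamed: 0".
csv_text = """,RowNumber,CustomerId,Exited
0,1,15598695,1
1,2,15649354,0
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'RowNumber', 'CustomerId', 'Exited']

# Keep every column whose name does not start with "Unnamed:"
clean = raw.loc[:, ~raw.columns.str.startswith("Unnamed:")]
print(list(clean.columns))  # ['RowNumber', 'CustomerId', 'Exited']
```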


---

# Data Transformations for Tableau

Build one focused dataset per hypothesis (the `Exited` flag plus the variable under test), add a readable `Attrition_Status` label, and export each one to CSV.

```python
# Make sure the output folder exists before writing any CSV files
os.makedirs("../data/outputs", exist_ok=True)

# Readable labels for the binary Exited flag (1 = customer left, 0 = customer stayed)
status_labels = {1: 'Attrited', 0: 'Retained'}

# Hypothesis 1: Points vs Attrition
df_hyp1 = df[['Exited', 'PointEarned']].copy()
df_hyp1['Attrition_Status'] = df_hyp1['Exited'].map(status_labels)
df_hyp1.to_csv("../data/outputs/hyp1_points_tableau.csv", index=False)
print("✅ Hypothesis 1 data exported: hyp1_points_tableau.csv")
print(f"   Dataset shape: {df_hyp1.shape}")

# Hypothesis 2: Credit Score vs Attrition
df_hyp2 = df[['Exited', 'CreditScore']].copy()
df_hyp2['Attrition_Status'] = df_hyp2['Exited'].map(status_labels)
df_hyp2.to_csv("../data/outputs/hyp2_creditscore_tableau.csv", index=False)
print("✅ Hypothesis 2 data exported: hyp2_creditscore_tableau.csv")
print(f"   Dataset shape: {df_hyp2.shape}")

# Hypothesis 3: Tenure vs Attrition
df_hyp3 = df[['Exited', 'Tenure']].copy()
df_hyp3['Attrition_Status'] = df_hyp3['Exited'].map(status_labels)
df_hyp3.to_csv("../data/outputs/hyp3_tenure_tableau.csv", index=False)
print("✅ Hypothesis 3 data exported: hyp3_tenure_tableau.csv")
print(f"   Dataset shape: {df_hyp3.shape}")

print("\n All three hypothesis datasets ready for Tableau analysis!")

# Display a sample of each dataset (display() is built into Jupyter/IPython)
print("\n Sample data preview:")
print("Hypothesis 1 (Points):")
display(df_hyp1.head(3))
print("\nHypothesis 2 (Credit Score):")
display(df_hyp2.head(3))
print("\nHypothesis 3 (Tenure):")
display(df_hyp3.head(3))
```
```text
✅ Hypothesis 1 data exported: hyp1_points_tableau.csv
   Dataset shape: (10000, 3)
✅ Hypothesis 2 data exported: hyp2_creditscore_tableau.csv
   Dataset shape: (10000, 3)
✅ Hypothesis 3 data exported: hyp3_tenure_tableau.csv
   Dataset shape: (10000, 3)

 All three hypothesis datasets ready for Tableau analysis!

 Sample data preview:
```

Hypothesis 1 (Points):

| | Exited | PointEarned | Attrition_Status |
|---|---|---|---|
| 0 | 1 | 464 | Attrited |
| 1 | 0 | 456 | Retained |
| 2 | 1 | 377 | Attrited |

Hypothesis 2 (Credit Score):

| | Exited | CreditScore | Attrition_Status |
|---|---|---|---|
| 0 | 1 | 619 | Attrited |
| 1 | 0 | 608 | Retained |
| 2 | 1 | 502 | Attrited |

Hypothesis 3 (Tenure):

| | Exited | Tenure | Attrition_Status |
|---|---|---|---|
| 0 | 1 | 2 | Attrited |
| 1 | 0 | 1 | Retained |
| 2 | 1 | 8 | Attrited |
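
The three hypothesis blocks above repeat the same steps with a different feature column and file name. A minimal sketch of factoring them into one helper, assuming the same `Exited` + feature + `Attrition_Status` layout; `export_hypothesis` and `HYPOTHESES` are hypothetical names, the mini dataset is made up, and the demo writes to a temporary folder instead of `../data/outputs`:

```python
import os
import tempfile

import pandas as pd

STATUS_LABELS = {1: "Attrited", 0: "Retained"}


def export_hypothesis(df: pd.DataFrame, feature: str, path: str) -> pd.DataFrame:
    """Write Exited, one feature and Attrition_Status to `path`; return the subset."""
    subset = df[["Exited", feature]].copy()
    subset["Attrition_Status"] = subset["Exited"].map(STATUS_LABELS)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    subset.to_csv(path, index=False)
    return subset


# Hypothetical mini dataset standing in for cleaned_bank_data.csv
demo = pd.DataFrame({
    "Exited": [1, 0, 1],
    "PointEarned": [464, 456, 377],
    "CreditScore": [619, 608, 502],
    "Tenure": [2, 1, 8],
})

# Feature-to-file mapping matching the notebook's three exports
HYPOTHESES = {
    "PointEarned": "hyp1_points_tableau.csv",
    "CreditScore": "hyp2_creditscore_tableau.csv",
    "Tenure": "hyp3_tenure_tableau.csv",
}

out_dir = tempfile.mkdtemp()  # temporary folder so the demo leaves ../data/outputs alone
exported = {}
for feature, filename in HYPOTHESES.items():
    exported[filename] = export_hypothesis(demo, feature, os.path.join(out_dir, filename))
    print(f"{filename}: {exported[filename].shape}")
```

With this layout, adding a fourth hypothesis only needs one more dictionary entry.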


---

# Export Data for Tableau

The CSV files were written in the previous step; confirm that all three exist and report their sizes before importing them into Tableau.

```python
# Summary of exported Tableau datasets
print("TABLEAU EXPORT SUMMARY")
print("=" * 50)

# Check that all three files were created successfully
files_to_check = [
    "../data/outputs/hyp1_points_tableau.csv",
    "../data/outputs/hyp2_creditscore_tableau.csv",
    "../data/outputs/hyp3_tenure_tableau.csv",
]

for file in files_to_check:
    if os.path.exists(file):
        file_size = os.path.getsize(file)
        print(f"SUCCESS: {file} - {file_size:,} bytes")
    else:
        print(f"ERROR: {file} - NOT FOUND")

print("\nAll datasets are ready for Tableau analysis!")
# Derive the row count from the source data instead of hard-coding it
print(f"Each file contains {len(df):,} rows with Attrition_Status labels")
print("Import these files into Tableau to create hypothesis dashboards")
```

```text
TABLEAU EXPORT SUMMARY
==================================================
SUCCESS: ../data/outputs/hyp1_points_tableau.csv - 160,050 bytes
SUCCESS: ../data/outputs/hyp2_creditscore_tableau.csv - 160,037 bytes
SUCCESS: ../data/outputs/hyp3_tenure_tableau.csv - 140,522 bytes

All datasets are ready for Tableau analysis!
Each file contains 10,000 rows with Attrition_Status labels
Import these files into Tableau to create hypothesis dashboards
```
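
The summary cell confirms that each file exists and prints its size, but the row-count line is printed rather than checked. A minimal sketch that reads each CSV back and compares its columns and row count with what was expected; `verify_export` is a hypothetical helper, the two-row demo file is made up, and in the notebook `expected_rows` would be `len(df)`:

```python
import os
import tempfile

import pandas as pd


def verify_export(path: str, expected_rows: int, feature: str) -> list:
    """Return a list of problems found in one exported CSV (an empty list means it looks OK)."""
    if not os.path.exists(path):
        return [f"{path} - NOT FOUND"]

    data = pd.read_csv(path)
    problems = []

    # Each Tableau file should hold Exited, the hypothesis feature, and the label column
    expected_cols = ["Exited", feature, "Attrition_Status"]
    if list(data.columns) != expected_cols:
        problems.append(f"{path}: columns {list(data.columns)}, expected {expected_cols}")

    # Row count should match the source DataFrame
    if len(data) != expected_rows:
        problems.append(f"{path}: {len(data)} rows, expected {expected_rows}")

    # Empty label cells are read back as NaN
    if "Attrition_Status" in data.columns and data["Attrition_Status"].isna().any():
        problems.append(f"{path}: missing Attrition_Status labels")

    return problems


# Demo on a hypothetical two-row file in a temporary folder
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "hyp3_tenure_tableau.csv")
pd.DataFrame(
    {"Exited": [1, 0], "Tenure": [2, 1], "Attrition_Status": ["Attrited", "Retained"]}
).to_csv(good_path, index=False)

print(verify_export(good_path, expected_rows=2, feature="Tenure"))  # []
print(verify_export(good_path, expected_rows=3, feature="Tenure"))  # one row-count problem
print(verify_export(os.path.join(tmp_dir, "missing.csv"), expected_rows=2, feature="Tenure"))
```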
