# ✅ Script Execution Test – Clean Python Scripts

### **Ironhack Data Science and Machine Learning Bootcamp**  
📅 **Date:** December 12, 2024  
📅 **Submission Date:** December 13, 2024  
👩‍💻 **Author:** Ginosca Alejandro Dávila

### 📁 Project: Ironhack Payments – Cohort Analysis  
**Environment:** Google Colab  
**Goal:** Validate that the clean `.py` scripts run successfully as standalone processes, launched from notebook cells with `!python` shell commands. This covers the scripts for data cleaning and exploratory analysis, plus those for the upcoming notebooks (cohort analysis and Streamlit app). The scripts are designed to run in both Colab and local environments; this test exercises them in Colab.

---

### 🧪 Scripts Tested So Far:
- `1_data_cleaning_ironhack_payments.py`
- `2_eda_ironhack_payments.py`

### ⏳ Scripts to Be Tested:
- `3_cohort_analysis_metrics.py` *(in development)*
- `4_streamlit_app_dev.py` *(optional Streamlit app)*

---

### 📌 Notes:
- Mount Google Drive before running scripts.
- Scripts are located in:  
  `/content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/clean/`
- Outputs such as CSVs and plots are saved in the `cleaned_project_datasets/`, `eda_outputs/`, and (for the cohort metrics script) `cohort_outputs/` folders within the project base path.
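
Before running the cells below, it can help to confirm that every expected script is present in the folder above. The following is a minimal sketch using `pathlib`. The helper name `missing_scripts` is hypothetical, and the file list simply mirrors the four scripts named in this notebook.

```python
from pathlib import Path

# Script names as listed in this notebook (Steps 1-4)
EXPECTED_SCRIPTS = [
    "1_data_cleaning_ironhack_payments.py",
    "2_eda_ironhack_payments.py",
    "3_cohort_analysis_metrics.py",
    "4_streamlit_app_dev.py",
]

def missing_scripts(project_base_path, names=EXPECTED_SCRIPTS):
    """Return the expected script names that are not found in scripts/clean/."""
    scripts_dir = Path(project_base_path) / "scripts" / "clean"
    return [name for name in names if not (scripts_dir / name).is_file()]
```

After mounting Drive, `missing_scripts(project_base_path)` should return an empty list when all four scripts are in place.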


---

## 🗂️ Mount Google Drive and Set Project Path

This step sets up the working environment in **Google Colab** by:

- 📦 Mounting your Google Drive
- 🔍 Attempting to use a default project path
- 📥 Prompting for manual input if the default path is not found

The `project_base_path` will be used to access your cleaned scripts and verify their outputs.

> 📌 **Note:** This notebook is intended for use in **Google Colab only**. Local execution is not supported.


In [1]:
import sys
import os

# ✅ Safe print for encoding compatibility in some terminals
def safe_print(text):
    try:
        print(text)
    except UnicodeEncodeError:
        print(text.encode("ascii", errors="ignore").decode())

# ✅ Mount Google Drive and set project path (Colab only)
from google.colab import drive
drive.mount('/content/drive')

# 📁 Try default project path first
default_path = 'MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en'
full_default_path = os.path.join('/content/drive', default_path)

if os.path.exists(full_default_path):
    project_base_path = full_default_path
    safe_print(f"✅ Colab project path set to: {project_base_path}")
else:
    # 🔄 Prompt user for project path if default fails
    safe_print("\n📂 Default path not found. Please input the relative path to your project inside Google Drive.")
    safe_print("👉 Example: 'MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en'")
    user_path = input("📥 Your path: ").strip()
    project_base_path = os.path.join('/content/drive', user_path)

    if not os.path.exists(project_base_path):
        raise FileNotFoundError(f"❌ Path does not exist: {project_base_path}\nPlease check your input.")

    safe_print(f"✅ Colab project path set to: {project_base_path}")


Mounted at /content/drive
✅ Colab project path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en


---

## 🧼 Step 1: Execute Data Cleaning Script

This step runs the standalone script `1_data_cleaning_ironhack_payments.py`, which processes the raw datasets and saves cleaned versions into the `cleaned_project_datasets/` folder.

The script performs the following operations:

- 📂 Loads and inspects both raw datasets (`cash_df`, `fees_df`)
- 🧠 Classifies variables and converts datetime fields
- 🧹 Handles missing values and resolves user ID inconsistencies
- 🔤 Standardizes categorical variables for reliable grouping
- 💰 Validates monetary columns for outliers or incorrect values
- 🧾 Saves cleaned datasets to disk for use in EDA and cohort analysis

The script detects its environment (Colab or local), runs as a standalone `.py` file, and uses an encoding-safe print function for all console output. Running this cell checks that the data preparation phase executes independently and as expected.
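
This notebook does not show the script's own environment check. One caveat applies to any such check: a `!python` call starts a fresh interpreter, so `google.colab` is not in `sys.modules` there unless the script imports it. A check based on `sys.modules` could therefore report a local environment even on Colab, as the logs below do. The following is a minimal sketch of a check that also works in a subprocess. It assumes that `google.colab` being importable is a good enough signal for Colab, and the name `running_in_colab` is hypothetical.

```python
import importlib.util
import sys

def running_in_colab():
    """Heuristic Colab check that also works in a fresh `!python` subprocess."""
    # Inside the notebook kernel, google.colab has usually been imported already
    if "google.colab" in sys.modules:
        return True
    # In a subprocess it has not been imported yet, so check whether it is importable
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent `google` package is not installed at all
        return False
```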


In [2]:
script_path = os.path.join(project_base_path, "scripts/clean/1_data_cleaning_ironhack_payments.py")
!python "$script_path"


✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en
📄 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/project_dataset/extract - cash request - data analyst.csv
📄 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/project_dataset/extract - fees - data analyst - .csv
❌ File not found.
🔎 Tried: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/project_dataset/extract - cash request - data analyst.csv
🔎 Tried: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/project_dataset/extract - fees - data analyst - .csv
📌 Make sure the file names and directories are spelled correctly.
Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/I
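
The log above shows that neither raw CSV file was found at the expected path, and the script suggests checking the spelling. One way to diagnose this is to compare the expected names with the files that actually exist in `project_dataset/`. The following is a minimal sketch using the standard library's `difflib`. The helper `suggest_filenames` is hypothetical, and the expected names are copied from the log.

```python
import difflib
from pathlib import Path

# Expected file names, copied from the "Looking for" lines in the log above
EXPECTED = [
    "extract - cash request - data analyst.csv",
    "extract - fees - data analyst - .csv",
]

def suggest_filenames(dataset_dir, expected=EXPECTED):
    """Map each missing expected file name to the closest existing names, best match first."""
    folder = Path(dataset_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Folder does not exist: {folder}")
    existing = [p.name for p in folder.iterdir() if p.is_file()]
    suggestions = {}
    for name in expected:
        if name not in existing:
            suggestions[name] = difflib.get_close_matches(name, existing, n=3, cutoff=0.6)
    return suggestions
```

For example, `suggest_filenames(os.path.join(project_base_path, "project_dataset"))` returns an empty dict when both files exist. It raises an error if the folder itself is missing, which would point to a directory problem rather than a file name problem.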

---

## 📊 Step 2: Execute Exploratory Data Analysis Script

This script loads the cleaned datasets, performs a full exploratory data analysis (EDA), and saves both visual and tabular outputs to the `eda_outputs/` folder.

It includes:
- 📈 Descriptive statistics and categorical breakdowns  
- 📅 Time-based trends in cash requests, fees, and incidents  
- 🧍 User-level behavior analysis (e.g., frequency, MAU)  
- 💸 Revenue breakdowns by fee type and request outcome  
- 🔄 Transfer type evolution (instant vs. regular)  
- 🔗 Merging of cash and fee datasets  
- 💾 Export of key aggregates for cohort analysis:
  - `user_first_request.csv`
  - `monthly_active_users.csv`
  - `transfer_type_share.csv`
  - `merged_cash_fee.csv`
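
The two user-level exports, `user_first_request.csv` and `monthly_active_users.csv`, can be illustrated in plain Python. The following is a minimal sketch under two assumptions: each cash request is available as a `(user_id, created_at)` pair, and activity is bucketed by calendar month. The function names are hypothetical, and the real script presumably works on `cash_df` with pandas instead.

```python
from collections import defaultdict
from datetime import datetime

def month_key(ts):
    """Return the calendar month of a timestamp as 'YYYY-MM'."""
    return ts.strftime("%Y-%m")

def user_first_request(requests):
    """Map each user_id to the timestamp of their earliest cash request (assumed input shape)."""
    first = {}
    for user_id, created_at in requests:
        if user_id not in first or created_at < first[user_id]:
            first[user_id] = created_at
    return first

def monthly_active_users(requests):
    """Count distinct users with at least one request in each month."""
    users_by_month = defaultdict(set)
    for user_id, created_at in requests:
        users_by_month[month_key(created_at)].add(user_id)
    return {month: len(users) for month, users in sorted(users_by_month.items())}

# Example with made-up rows (not project data)
sample = [
    (1, datetime(2019, 12, 10)),
    (2, datetime(2019, 12, 10)),
    (1, datetime(2020, 1, 15)),
]
print(monthly_active_users(sample))  # {'2019-12': 2, '2020-01': 1}
```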


In [3]:
script_path = os.path.join(project_base_path, "scripts/clean/2_eda_ironhack_payments.py")
!python "$script_path"


✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en
✅ Cleaned datasets loaded successfully.
cash_df shape: (23970, 17)
fees_df shape: (21057, 13)
🔍 First 5 rows of cash_df:
     id  final_user_id  amount    status transfer_type          reimbursement_date                  created_at                  updated_at  user_id                moderated_at  deleted_account_id cash_request_received_date money_back_date send_at recovery_status reco_creation reco_last_update
0     5          804.0   100.0  rejected       regular  2020-01-09 19:05:21.596363  2019-12-10 19:05:21.596873  2019-12-11 16:47:42.407830    804.0  2019-12-11 16:47:42.405646                 NaN                        NaN             NaN     NaN             NaN           NaN              NaN
1    70          231.0   100.0  rejected       regular  2020-01-09 19:50:12.347780  2019-12-10 19:50:12.347780  2019-12-11 14:24:22.900054  

---

## 📈 Step 3: Execute Cohort Metrics Script

This script calculates key cohort-level performance metrics based on the outputs from the EDA phase. The results are saved as `.csv` files and visual plots inside the `cohort_outputs/` folder.

The script includes:
- 🔢 Cohort assignment by user’s first cash request
- 🔁 Frequency of service usage across cohorts
- 📉 Retention matrix calculation (raw and filtered)
- 🧯 Incident rate evaluation by cohort
- 💰 Revenue per cohort and cumulative trends
- 📊 ARPU (Average Revenue Per User) and Customer Lifetime Value (CLV) estimates
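
A retention matrix is commonly defined as the share of each cohort's users who are active N months after their cohort month. The following is a minimal sketch under that definition, using an assumed `(user_id, created_at)` input. The names `retention_matrix` and `months_between` are hypothetical, and the script's own definition (including its "filtered" variant) may differ.

```python
from collections import defaultdict
from datetime import datetime

def months_between(start, end):
    """Whole calendar months from start to end (e.g. Dec 2019 -> Feb 2020 is 2)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def retention_matrix(requests):
    """Return {cohort_month: {months_since_first: share_of_cohort_active}}."""
    # Cohort = calendar month of each user's first request
    first = {}
    for user_id, created_at in requests:
        if user_id not in first or created_at < first[user_id]:
            first[user_id] = created_at

    cohort_size = defaultdict(int)
    for ts in first.values():
        cohort_size[ts.strftime("%Y-%m")] += 1

    # Distinct active users per (cohort, month offset)
    active = defaultdict(set)
    for user_id, created_at in requests:
        start = first[user_id]
        offset = months_between(start, created_at)
        active[(start.strftime("%Y-%m"), offset)].add(user_id)

    matrix = defaultdict(dict)
    for (cohort, offset), users in sorted(active.items()):
        matrix[cohort][offset] = len(users) / cohort_size[cohort]
    return dict(matrix)

# Example with made-up rows (not project data)
sample = [
    (1, datetime(2019, 12, 5)),
    (2, datetime(2019, 12, 20)),
    (1, datetime(2020, 1, 3)),
    (3, datetime(2020, 1, 9)),
]
print(retention_matrix(sample))  # {'2019-12': {0: 1.0, 1: 0.5}, '2020-01': {0: 1.0}}
```

Month 0 is always 1.0 by construction, because every user is active in their own cohort month. Later columns show how much of each cohort returns.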

Each output is saved with an indexed filename for traceability.  
Running this cell checks that the KPIs are generated programmatically and are ready for dashboard integration.
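
This step reads the files exported in Step 2, so a quick pre-flight check can report which inputs are missing before the script runs. The following is a minimal sketch using the `csv` module. It assumes the `eda_outputs/data/` location shown in the log below, and `eda_output_status` is a hypothetical helper. It reports a `(rows, columns)` shape for each file, excluding the header row.

```python
import csv
from pathlib import Path

# File names taken from Step 2's export list
EDA_FILES = [
    "user_first_request.csv",
    "monthly_active_users.csv",
    "transfer_type_share.csv",
    "merged_cash_fee.csv",
]

def eda_output_status(project_base_path, names=EDA_FILES):
    """Map each expected EDA export to its (rows, columns) shape, or None if missing."""
    data_dir = Path(project_base_path) / "eda_outputs" / "data"
    status = {}
    for name in names:
        path = data_dir / name
        if not path.is_file():
            status[name] = None
            continue
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])          # first line = column names
            n_rows = sum(1 for _ in reader)    # remaining lines = data rows
        status[name] = (n_rows, len(header))
    return status
```

Any `None` entry in `eda_output_status(project_base_path)` suggests rerunning Step 2 first.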


In [4]:
script_path = os.path.join(project_base_path, "scripts/clean/3_cohort_analysis_metrics.py")
!python "$script_path"


✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en
📄 Loading EDA output files...

📁 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/eda_outputs/data/user_first_request.csv
✅ Loaded user_first_request.csv → Shape: (11793, 4)
📁 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/eda_outputs/data/monthly_active_users.csv
✅ Loaded monthly_active_users.csv → Shape: (13, 3)
📁 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/eda_outputs/data/transfer_type_share.csv
✅ Loaded transfer_type_share.csv → Shape: (13, 4)
📁 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/eda_outputs/data/merged_cash_fee.csv
✅ Loaded merged_cash_fee.csv → Sh

---

## 💻 Step 4: Execute Streamlit App Script (Optional)

This script sets up a basic interactive dashboard using **Streamlit** to display cohort analysis results from previous steps.

It includes:
- 🧩 Layout of key metrics and charts
- 📊 Integration of cohort visualizations
- 🔄 Real-time refresh logic for saved outputs
- 🌐 Local or cloud-based app launch using `streamlit run`
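
Streamlit apps are started with `streamlit run <script>` rather than plain `python`. The following is a minimal sketch that builds that command for the current interpreter in the `python -m streamlit run` form. The helper `streamlit_command` is hypothetical. `--server.port` and `--server.headless` are Streamlit config options, and 8501 is Streamlit's default port.

```python
import sys

def streamlit_command(script_path, port=8501, headless=True):
    """Build the argument list for launching a Streamlit app with the current Python."""
    cmd = [sys.executable, "-m", "streamlit", "run", str(script_path),
           "--server.port", str(port)]
    if headless:
        # Don't try to open a browser window (useful on a remote runtime)
        cmd += ["--server.headless", "true"]
    return cmd
```

To launch the app in the background, pass the result to `subprocess.Popen`. Viewing it from Colab additionally requires exposing the port, which this sketch does not cover.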

> ⚠️ Note: This script is optional and intended for local or web-based execution.  
It may not display correctly in Google Colab unless manually adapted or downloaded.

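Running the script with plain `python`, as below, only tests whether it loads its cohort data and plot paths without errors. Rendering the dashboard itself requires launching it with `streamlit run`, with the `streamlit` package installed.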


In [5]:
script_path = os.path.join(project_base_path, "scripts/clean/4_streamlit_app_dev.py")
!python "$script_path"


✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en
✅ Cohort data files loaded successfully.
🖼️ Plot file paths loaded successfully.
Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/clean/4_streamlit_app_dev.py", line 110, in <module>
    import streamlit as st
ModuleNotFoundError: No module named 'streamlit'
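
The traceback shows that `streamlit` is not installed in this Colab runtime. The following is a minimal sketch that installs a package only when its module cannot be found, using `importlib.util.find_spec` and `pip` through the current interpreter. The name `ensure_package` is hypothetical. In a notebook cell, `!pip install streamlit` achieves the same result.

```python
import importlib
import importlib.util
import subprocess
import sys

def ensure_package(module_name, pip_name=None):
    """Install a package with pip if its module cannot be found; return True if installed."""
    if importlib.util.find_spec(module_name) is not None:
        return False  # already available, nothing to do
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or module_name])
    importlib.invalidate_caches()  # let this interpreter see the newly installed package
    return True

# Usage in Colab before rerunning Step 4:
# ensure_package("streamlit")
```

Installing the package removes this import error. To render the dashboard, the app still has to be launched with `streamlit run`.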
