# 📦 Export Notebooks to Python Scripts

### **Ironhack Data Science and Machine Learning Bootcamp**  
📅 **Date:** December 12, 2024  
📅 **Submission Date:** December 13, 2024  
👩‍💻 **Author:** Ginosca Alejandro Dávila  

---

### 🧾 Project: Ironhack Payments – Cohort Analysis  
📁 **Goal:** Operationalize this project by exporting all key analysis notebooks into `.py` files that can be run programmatically, version-controlled, or submitted for review.

---

### ✅ Notebooks to Export:

📓 `1_data_cleaning_ironhack_payments.ipynb` → Data loading, cleaning, validation, and export  
📓 `2_eda_ironhack_payments.ipynb` → Exploratory Data Analysis and cohort setup  
📓 `3_cohort_analysis_metrics.ipynb` → Metric calculations: usage frequency, incident rate, revenue, retention  
📓 `4_streamlit_app_dev.ipynb` → (Optional) Development of interactive Streamlit app  

> ❌ `export_ironhack_payments_notebooks_to_py.ipynb` will **not** be exported, as it serves only for conversion automation.
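This notebook lists its export targets explicitly. If you would rather discover them automatically while still skipping the conversion notebook, here is a minimal sketch. `find_notebooks_to_export` and `EXCLUDED_NOTEBOOKS` are hypothetical names and are not part of the project.

```python
import tempfile
from pathlib import Path

# Hypothetical exclusion set: the conversion notebook itself is never exported
EXCLUDED_NOTEBOOKS = {"export_ironhack_payments_notebooks_to_py.ipynb"}


def find_notebooks_to_export(notebook_dir, excluded=EXCLUDED_NOTEBOOKS):
    """Return the sorted .ipynb filenames in notebook_dir, minus excluded ones."""
    # Non-recursive glob, so copies inside .ipynb_checkpoints/ are not picked up
    return sorted(
        path.name
        for path in Path(notebook_dir).glob("*.ipynb")
        if path.name not in excluded
    )


if __name__ == "__main__":
    # Demo on a throwaway folder containing empty placeholder files
    with tempfile.TemporaryDirectory() as tmp:
        for name in [
            "2_eda_ironhack_payments.ipynb",
            "1_data_cleaning_ironhack_payments.ipynb",
            "export_ironhack_payments_notebooks_to_py.ipynb",
            "notes.txt",
        ]:
            Path(tmp, name).touch()
        print(find_notebooks_to_export(tmp))
        # ['1_data_cleaning_ironhack_payments.ipynb', '2_eda_ironhack_payments.ipynb']
```

An explicit list is the safer choice here because each exported notebook also needs its own header entry. Automatic discovery only helps if every discovered notebook receives a header as well.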

---

### 💡 Purpose of This Notebook:

This notebook automates the export of `.ipynb` notebooks into `.py` scripts using **`nbconvert`**, allowing:

- 🔁 Seamless reproducibility of your analysis pipeline  
- ✅ Compliance with submission or deployment requirements  
- 🔍 Easier version control, code review, and editing  
- 🛠️ Support for CLI execution, scheduled runs, and modular reuse
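The same conversion can also be run from the command line, which suits the CLI and scheduled-run use cases above. The sketch below assumes `nbconvert` is installed in the running interpreter, so that `python -m nbconvert` resolves. `build_nbconvert_command` and `run_nbconvert` are hypothetical helpers. The `--TemplateExporter.exclude_markdown=True` flag should be the traitlets command-line form of the `exclude_markdown` option used in this notebook, but confirm it with `jupyter nbconvert --help-all` on your version.

```python
import subprocess
import sys


def build_nbconvert_command(notebook_path, output_dir, exclude_markdown=False):
    """Build an nbconvert command list that exports one notebook to a script."""
    cmd = [
        sys.executable, "-m", "nbconvert",  # same interpreter as this process
        "--to", "script",
        "--output-dir", output_dir,
    ]
    if exclude_markdown:
        # Command-line form of the exporter's exclude_markdown option
        cmd.append("--TemplateExporter.exclude_markdown=True")
    cmd.append(notebook_path)
    return cmd


def run_nbconvert(notebook_path, output_dir, exclude_markdown=False):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    cmd = build_nbconvert_command(notebook_path, output_dir, exclude_markdown)
    subprocess.run(cmd, check=True)


# Example: print the command for a clean export without running it
print(" ".join(build_nbconvert_command(
    "notebooks/1_data_cleaning_ironhack_payments.ipynb",
    "scripts/clean",
    exclude_markdown=True,
)))
```

The plain CLI route does not add the custom headers and attribution block that this notebook injects. For that reason, the Python `PythonExporter` API remains the better fit here.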

---

### 📂 Export Destinations

All exported Python scripts will be saved to the following subfolders inside the project’s `scripts/` directory:

- 📁 `scripts/annotated/` → Scripts **with markdown comments** (human-readable, ideal for code review and learning)  
- 📁 `scripts/clean/` → Scripts **without markdown comments** (leaner, ideal for production use)
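The mapping from one notebook to its two destinations fits in a small pure function, assuming the `scripts/annotated/` and `scripts/clean/` layout above. `script_output_paths` is a hypothetical helper. It uses `os.path.splitext`, which strips only the final extension. By contrast, `str.replace('.ipynb', '.py')` would also rewrite `.ipynb` if it appeared elsewhere in a filename.

```python
import os


def script_output_paths(project_base_path, nb_file):
    """Return the annotated and clean .py destinations for one notebook."""
    stem, ext = os.path.splitext(nb_file)
    if ext != ".ipynb":
        raise ValueError(f"Expected a .ipynb file, got: {nb_file}")
    script_name = stem + ".py"
    scripts_dir = os.path.join(project_base_path, "scripts")
    return {
        "annotated": os.path.join(scripts_dir, "annotated", script_name),
        "clean": os.path.join(scripts_dir, "clean", script_name),
    }


# Example: destinations for the EDA notebook under a placeholder project root
paths = script_output_paths("project-1-ironhack-payments-2-en", "2_eda_ironhack_payments.ipynb")
print(paths["annotated"])
print(paths["clean"])
```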

---

### 🧩 Compatibility Notes

- ✅ All `.py` files are designed to work **both in Google Colab** and **locally**.
- 📦 Project paths are set dynamically using an `is_colab()` environment check and paths relative to the project root.
- 🚫 Interactive preview features (like `display()`) are gracefully handled to avoid errors when run outside Jupyter/Colab.
- 🔐 Overwrite protection, timestamped logging, and modular function design are incorporated for safe automation.
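One common way to keep `display()` calls harmless outside Jupyter/Colab is sketched here. It assumes the scripts call `display()` directly. This is an illustration, not necessarily how the project scripts implement it. The fallback `display` is hypothetical and is used only when IPython is not installed. In recent IPython versions, `IPython.display.display` already falls back to printing when no IPython shell is running.

```python
import sys


def is_colab():
    """True when running inside Google Colab (same check as in Step 1)."""
    return "google.colab" in sys.modules


try:
    # Available whenever IPython is installed (always the case in Jupyter/Colab)
    from IPython.display import display
except ImportError:
    def display(*objs):
        """Hypothetical plain-script fallback: print each object instead."""
        for obj in objs:
            print(obj)


display("Preview works in Jupyter, Colab, and plain Python.")
```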

---

## 🗂️ Step 1: Mount Google Drive and Set Project Path

This step ensures the notebook is compatible with both **Google Colab** and **local environments**.

- 📦 If running in **Colab**, the notebook will:
  - Mount Google Drive
  - Attempt to use the default project path
  - Prompt for manual input if the default is not found

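- 💻 If running **locally**, the base path is set two directory levels above the script's location (`__file__`). When `__file__` is undefined, as it is inside Jupyter, the current working directory is used as the starting point instead.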

The goal is to dynamically assign the `project_base_path`, which points to your project folder:  
`project-1-ironhack-payments-2-en/`

This allows the export script to locate your `notebooks/` folder and save scripts to the correct `scripts/` subdirectories, no matter where it's run.
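The local branch in this step assumes the code lives exactly two folders below the project root (`'..', '..'`). That assumption may not hold. For example, Jupyter usually starts in the notebook's own folder, so running from `notebooks/` would land one level above the project root. One alternative is to walk upward until a folder containing `notebooks/` is found. This is a swapped-in technique, not what the cell does, and `find_project_root` and its `marker` parameter are hypothetical.

```python
import tempfile
from pathlib import Path


def find_project_root(start, marker="notebooks"):
    """Walk upward from `start` and return the first folder containing `marker/`."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}/' folder found at or above {current}")


if __name__ == "__main__":
    # Demo: build a throwaway project tree and search from a nested folder
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp, "project-1-ironhack-payments-2-en")
        (root / "notebooks").mkdir(parents=True)
        nested = root / "scripts" / "clean"
        nested.mkdir(parents=True)
        print(find_project_root(nested) == root.resolve())  # True
```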

---


```python
import sys
import os

# ✅ Safe print for cross-platform compatibility
def safe_print(text):
    try:
        print(text)
    except UnicodeEncodeError:
        print(text.encode("ascii", errors="ignore").decode())

# ✅ Check if running in Google Colab
def is_colab():
    return 'google.colab' in sys.modules

# ✅ Determine base project path
if is_colab():
    from google.colab import drive
    drive.mount('/content/drive')

    default_path = 'MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en'
    full_default_path = os.path.join('/content/drive', default_path)

    if os.path.exists(full_default_path):
        project_base_path = full_default_path
        safe_print(f"✅ Colab project path set to: {project_base_path}")
    else:
        safe_print("\n📂 Default path not found. Please input the relative path to your project inside Google Drive.")
        safe_print("👉 Example: 'MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en'")
        user_path = input("📥 Your path: ").strip()
        project_base_path = os.path.join('/content/drive', user_path)

        if not os.path.exists(project_base_path):
            raise FileNotFoundError(f"❌ Path does not exist: {project_base_path}\nPlease check your input.")

        safe_print(f"✅ Colab project path set to: {project_base_path}")
else:
    try:
        script_dir = os.path.dirname(os.path.abspath(__file__))
    except NameError:
        # __file__ is undefined in Jupyter; fall back to the working directory
        script_dir = os.getcwd()

    project_base_path = os.path.abspath(os.path.join(script_dir, '..', '..'))
    safe_print(f"✅ Local environment detected. Base path set to: {project_base_path}")
```


Mounted at /content/drive
✅ Colab project path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en


## 📦 Step 2: Export Notebooks to Python Scripts

This step converts all main project notebooks into `.py` script files for operational use.  
Each notebook is exported twice:

- 📁 `scripts/annotated/` → Includes markdown comments (for readability and review)
- 📁 `scripts/clean/` → Excludes markdown (for cleaner execution or production use)

> Notebooks missing from the `notebooks/` folder are skipped with a ❌ message. Existing scripts with the same name are **overwritten** without confirmation.
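In the cell below, the header is prepended to the `PythonExporter` output. As far as we know, nbconvert's Python template starts that output with `#!/usr/bin/env python` and `# coding: utf-8`. Prepending the header therefore pushes the shebang off line 1, where Unix expects it. This only matters when a script is launched directly as `./script.py`; `python script.py` is unaffected. Here is a minimal sketch of a fix, where `wrap_script` is a hypothetical helper:

```python
def wrap_script(code, header, footer):
    """Insert `header` after nbconvert's shebang/coding preamble, then append `footer`."""
    lines = code.splitlines(keepends=True)
    preamble = []
    # Keep leading '#!' and '# coding' lines first so the shebang stays on line 1
    while lines and (lines[0].startswith("#!") or lines[0].startswith("# coding")):
        preamble.append(lines.pop(0))
    return "".join(preamble) + header + "".join(lines) + footer


# Example with a stand-in for nbconvert output
sample = "#!/usr/bin/env python\n# coding: utf-8\n\nprint('hi')\n"
print(wrap_script(sample, "# HEADER\n\n", "\n# FOOTER\n"))
```

In the loop, `annotated_code = wrap_script(annotated_code, header, footer)` would replace the plain string concatenation. The same change applies to `clean_code`.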


```python
import os
from nbconvert import PythonExporter
from nbformat import read

# --- 📁 Define notebook and script directories ---
notebook_dir = os.path.join(project_base_path, 'notebooks')
script_base_dir = os.path.join(project_base_path, 'scripts')
annotated_dir = os.path.join(script_base_dir, 'annotated')
clean_dir = os.path.join(script_base_dir, 'clean')
os.makedirs(annotated_dir, exist_ok=True)
os.makedirs(clean_dir, exist_ok=True)

# --- 📓 Notebook headers ---
notebook_headers = {
    "1_data_cleaning_ironhack_payments.ipynb": """\
# 🧼 Data Cleaning Script – Ironhack Payments
# 📓 Source Notebook: 1_data_cleaning_ironhack_payments.ipynb
# 🧠 Description: Loads, inspects, and cleans raw cash request and fee datasets.
# 📅 Date: December 13, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
# 🛠️ Bootcamp: Ironhack Data Science and Machine Learning
""",
    "2_eda_ironhack_payments.ipynb": """\
# 📊 Exploratory Data Analysis Script – Ironhack Payments
# 📓 Source Notebook: 2_eda_ironhack_payments.ipynb
# 🔍 Description: Analyzes cleaned data, visualizes user behavior, and prepares cohort aggregates.
# 📅 Date: December 13, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
# 🛠️ Bootcamp: Ironhack Data Science and Machine Learning
""",
    "3_cohort_analysis_metrics.ipynb": """\
# 📈 Cohort Metrics Script – Ironhack Payments
# 📓 Source Notebook: 3_cohort_analysis_metrics.ipynb
# 📊 Description: Calculates monthly cohort KPIs: user retention, frequency, incidents, and revenue.
# 📅 Date: December 13, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
# 🛠️ Bootcamp: Ironhack Data Science and Machine Learning
""",
    "4_streamlit_app_dev.ipynb": """\
# 💻 Streamlit App Script – Ironhack Payments Dashboard
# 📓 Source Notebook: 4_streamlit_app_dev.ipynb
# 🌐 Description: Prepares a web-based dashboard using Streamlit to visualize cohort KPIs.
# 📅 Date: December 13, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
# 🛠️ Bootcamp: Ironhack Data Science and Machine Learning
"""
}

# --- 📌 Standard Attribution Block ---
attribution_block = """
# ------------------------------------------------------------------------------
# 🛡️ License & Attribution
#
# © 2024 Ginosca Alejandro Dávila
# Project: Ironhack Payments – Cohort Analysis
# Bootcamp: Ironhack Data Science and Machine Learning
#
# This work is provided for educational purposes under the MIT License.
# You may reuse, modify, or redistribute with attribution.
# ------------------------------------------------------------------------------
""".strip()

# --- Exporters ---
# The default PythonExporter turns markdown cells into comments
annotated_exporter = PythonExporter()
# exclude_markdown drops markdown cells entirely from the output
clean_exporter = PythonExporter()
clean_exporter.exclude_markdown = True

# --- 🔁 Convert notebooks ---
for nb_file in notebook_headers:
    nb_path = os.path.join(notebook_dir, nb_file)
    if not os.path.exists(nb_path):
        print(f"❌ {nb_file} not found. Skipping.")
        continue

    with open(nb_path, 'r', encoding='utf-8') as f:
        nb_node = read(f, as_version=4)

    header = notebook_headers[nb_file].strip() + "\n\n"
    footer = "\n\n" + attribution_block + "\n"

    # Annotated version
    annotated_code, _ = annotated_exporter.from_notebook_node(nb_node)
    annotated_code = header + annotated_code + footer
    annotated_out_path = os.path.join(annotated_dir, nb_file.replace('.ipynb', '.py'))
    with open(annotated_out_path, 'w', encoding='utf-8') as f_out:
        f_out.write(annotated_code)
    print(f"✅ Exported (annotated): {annotated_out_path}")

    # Clean version
    clean_code, _ = clean_exporter.from_notebook_node(nb_node)
    clean_code = header + clean_code + footer
    clean_out_path = os.path.join(clean_dir, nb_file.replace('.ipynb', '.py'))
    with open(clean_out_path, 'w', encoding='utf-8') as f_out:
        f_out.write(clean_code)
    print(f"✅ Exported (clean): {clean_out_path}")
```


✅ Exported (annotated): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/annotated/1_data_cleaning_ironhack_payments.py
✅ Exported (clean): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/clean/1_data_cleaning_ironhack_payments.py
✅ Exported (annotated): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/annotated/2_eda_ironhack_payments.py
✅ Exported (clean): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/clean/2_eda_ironhack_payments.py
✅ Exported (annotated): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/annotated/3_cohort_analysis_metrics.py
✅ Exported (clean): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 2/Week 2 - Day 4/project-1-ironhack-payments-2-en/scripts/cle