# 📦 Export Notebooks to Python Scripts

### **Data Science and Machine Learning Bootcamp – Ironhack Puerto Rico**  
📅 **Date:** December 20, 2024  
👩‍💻 **Author:** Ginosca Alejandro Dávila  

---

### 🧾 Project: Online Retail II – Sales Analysis & Customer Segmentation  
📁 **Goal:** Operationalize this project by exporting Colab notebooks into `.py` scripts for reproducibility, automation, and clean version control.

---

### ✅ Notebooks to Export:

📓 `1_data_cleaning_online_retail_ii.ipynb` → Full cleaning pipeline: loading, validation, normalization, and export-ready output  
📓 `2_eda_online_retail_ii.ipynb` → Exploratory Data Analysis including trends, customers, and product behavior  
📓 `3_sql_analysis_sales_performance_online_retail_ii.ipynb` → SQL-based KPIs and customer segmentation from normalized data  
📓 `4_mysql_real_env_setup_online_retail_ii.ipynb` → MySQL schema creation, CSV loading, and credential handling  

> ❌ `export_notebooks_to_py_online_retail_ii.ipynb` and `test_clean_scripts_colab_online_retail_ii.ipynb` are used only for testing and conversion, and will **not** be exported.

---

### 💡 Purpose of This Notebook:

This notebook automates the export of `.ipynb` files to `.py` using **`nbconvert`**, allowing:

- 🔁 Reproducibility across environments  
- 🧱 Clear structure for production pipelines and modular reuse  
- 📂 Scripted versions for CLI use, reviews, and deployment

---

### 📂 Export Destinations

All generated `.py` scripts will be saved in the following subfolders inside the project’s `scripts/` directory:

- 📁 `scripts/python/annotated/` → Scripts **with markdown comments** (for clarity, review, and collaboration)  
- 📁 `scripts/python/clean/` → Scripts **without markdown comments** (for minimal production use)
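
As a minimal sketch of this layout, the two destinations for a notebook can be derived from the project root and the notebook's file name. The helper name `export_targets` and the `/tmp/project` root are hypothetical and used only for illustration.

```python
from pathlib import Path

def export_targets(project_root, notebook_name):
    """Return the annotated and clean .py destinations for one notebook."""
    base = Path(project_root) / "scripts" / "python"
    # Swap the .ipynb suffix for .py and keep only the file name
    script_name = Path(notebook_name).with_suffix(".py").name
    return {
        "annotated": base / "annotated" / script_name,
        "clean": base / "clean" / script_name,
    }

# Hypothetical project root, for illustration only
targets = export_targets("/tmp/project", "2_eda_online_retail_ii.ipynb")
print(targets["annotated"].as_posix())
print(targets["clean"].as_posix())
```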

---

### 🧩 Compatibility Notes

- ✅ Designed to run in **both Google Colab** and **local environments**
- 🛠️ Environment detection handled via `is_colab()` helper
- 🧼 `safe_print()` ensures encoding-safe output in all terminals
- 🔐 Existing scripts are overwritten on each run; overwrite-safe logic can be added if earlier versions must be preserved
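
One possible shape for overwrite-safe exporting is sketched below. It is not part of this notebook: the helper `write_script` and its `.bak` backup convention are assumptions made for illustration. The idea is to skip the write when the content is unchanged and to keep a copy of the previous version otherwise.

```python
import shutil
import tempfile
from pathlib import Path

def write_script(path, text, backup=True):
    """Write text to path and report what happened.

    Returns "created", "unchanged", or "updated". When the file exists
    with different content and backup is True, the old version is
    copied to <name>.bak before being replaced.
    """
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        return "created"
    if path.read_text(encoding="utf-8") == text:
        return "unchanged"
    if backup:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(text, encoding="utf-8")
    return "updated"

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "clean" / "demo.py"
    statuses = [
        write_script(target, "print('v1')\n"),  # first export
        write_script(target, "print('v1')\n"),  # identical re-export
        write_script(target, "print('v2')\n"),  # changed notebook
    ]
    backup_text = (target.parent / "demo.py.bak").read_text(encoding="utf-8")

print(statuses)
```

Returning a status string keeps the helper easy to log from an export loop without changing what gets written.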

---


## 🗂️ Step 1: Mount Google Drive and Set Project Path

This step ensures that the notebook runs correctly in both **Google Colab** and **local environments**.

- 📦 If running in **Colab**, the notebook will:
  - Mount Google Drive
  - Try to use the default project path:
    `MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql`
  - Prompt the user to input a path if the default is not found

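- 💻 If running **locally**, the base path is derived from the script’s location, falling back to the current working directory when `__file__` is not defined (as in a notebook kernel)  
  (the code walks **three levels up**, which reaches the root folder only when the file sits three folders deep, e.g. in `scripts/python/annotated/` or `scripts/python/clean/`)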

We use this logic to dynamically assign `project_base_path`, which points to the root project folder.  
This allows the notebook to:
- Locate `notebooks/` for reading input
- Save `.py` scripts in the correct subdirectories:
  - `scripts/python/annotated/`
  - `scripts/python/clean/`
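
A fixed number of `..` steps depends on where the file happens to live. As a hedged alternative, the root can be located by walking upward until a known folder appears. The sketch below assumes that a `notebooks/` folder marks the project root; `find_project_root` is a hypothetical helper, not something this notebook defines.

```python
import tempfile
from pathlib import Path

def find_project_root(start, marker="notebooks"):
    """Walk upward from start until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}/' folder found above {start}")

# Build a throwaway project tree to compare both approaches
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve() / "retail-project"
    (root / "notebooks").mkdir(parents=True)
    script_dir = root / "scripts" / "python" / "clean"
    script_dir.mkdir(parents=True)

    found = find_project_root(script_dir)  # marker-based search
    fixed = script_dir.parents[2]          # fixed "three levels up" rule
    same = (found == root == fixed)

print(same)
```

Both approaches agree when the script sits in `scripts/python/clean/`; only the marker-based search keeps working if the file is moved to a different depth.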

---


In [1]:
import sys
import os

# ✅ Safe print for encoding issues (especially on Windows or some shells)
def safe_print(text):
    try:
        print(text)
    except UnicodeEncodeError:
        print(text.encode("ascii", errors="ignore").decode())

# ✅ Check if running in Google Colab
def is_colab():
    return 'google.colab' in sys.modules

# ✅ Set base project path dynamically based on environment
if is_colab():
    from google.colab import drive
    drive.mount('/content/drive')

    # 📁 Default project path in Google Drive
    default_path = 'MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql'
    full_default_path = os.path.join('/content/drive', default_path)

    if os.path.exists(full_default_path):
        project_base_path = full_default_path
        safe_print(f"✅ Colab project path set to: {project_base_path}")
    else:
        # 🔁 Prompt user to enter a custom path
        safe_print("\n📂 Default path not found. Please input the relative path to your project inside Google Drive.")
        safe_print("👉 Example: `MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql`")
        user_path = input("📥 Your path: ").strip()
        project_base_path = os.path.join('/content/drive', user_path)

        if not os.path.exists(project_base_path):
            raise FileNotFoundError(f"❌ Path does not exist: {project_base_path}\nPlease check your input.")

        safe_print(f"✅ Colab project path set to: {project_base_path}")
else:
    try:
        script_dir = os.path.dirname(os.path.abspath(__file__))
    except NameError:
        script_dir = os.getcwd()

    # 📁 Project root is assumed to be three levels up from the script's folder (e.g., scripts/python/clean/)
    project_base_path = os.path.abspath(os.path.join(script_dir, '..', '..', '..'))
    safe_print(f"✅ Local environment detected. Base path set to: {project_base_path}")


Mounted at /content/drive
✅ Colab project path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql
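
The fallback in `safe_print()` drops characters that cannot be encoded rather than replacing them. The sketch below illustrates this by simulating an ASCII-only terminal with a strict text wrapper; the optional `stream` parameter is an addition made here for the demo and is not part of the helper above.

```python
import io

def safe_print(text, stream=None):
    """Same fallback as the notebook's helper, plus an optional target stream."""
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        # Drop anything that cannot be represented in ASCII
        print(text.encode("ascii", errors="ignore").decode(), file=stream)

# Simulated ASCII-only terminal (an assumption for the demo)
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict")

safe_print("✅ Export finished", stream=ascii_stream)
ascii_stream.flush()
result = raw.getvalue().decode("ascii")
print(repr(result))
```

The emoji is removed and the rest of the message, including the space that followed it, is preserved.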


---

## 📦 Step 2: Export Notebooks to Python Scripts

This step converts the main project notebooks into `.py` script files for operational use.  
Each notebook is exported twice:

- 📁 `scripts/python/annotated/` → Includes markdown comments (for readability and review)  
- 📁 `scripts/python/clean/` → Excludes markdown (for lean execution or production environments)

> Notebooks that are missing from `notebooks/` are skipped with a message.  
> Scripts with the same name will be **overwritten** to keep versions up to date.
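
To make the difference between the two variants concrete, here is a simplified stand-in for the conversion that works directly on the notebook's JSON. It is not how `nbconvert` is implemented, and the real exporter's output differs in detail; `notebook_to_script` and the tiny demo notebook are hypothetical.

```python
import json

def notebook_to_script(nb_json, include_markdown=True):
    """Concatenate code cells; optionally emit markdown cells as # comments."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb["cells"]:
        source = cell["source"]
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        if cell["cell_type"] == "code":
            chunks.append(source)
        elif cell["cell_type"] == "markdown" and include_markdown:
            commented = (f"# {line}".rstrip() for line in source.splitlines())
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"

# Tiny in-memory notebook: one markdown cell, one code cell
demo_nb = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["## Load data\n", "Read the CSV."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "rows = [1, 2, 3]"},
    ],
})

annotated = notebook_to_script(demo_nb, include_markdown=True)
clean = notebook_to_script(demo_nb, include_markdown=False)
print(annotated)
print(clean)
```

The annotated text carries the markdown as comments above the code, while the clean text contains only the code cell.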

---


In [2]:
import os
from nbconvert import PythonExporter
from nbformat import read

# --- 📁 Define notebook and export directories ---
notebook_dir = os.path.join(project_base_path, 'notebooks')
script_base_dir = os.path.join(project_base_path, 'scripts', 'python')
annotated_dir = os.path.join(script_base_dir, 'annotated')
clean_dir = os.path.join(script_base_dir, 'clean')
os.makedirs(annotated_dir, exist_ok=True)
os.makedirs(clean_dir, exist_ok=True)

# --- 📓 Notebook headers ---
notebook_headers = {
    "1_data_cleaning_online_retail_ii.ipynb": """\
# 🧼 Data Cleaning Script – Online Retail II
# 📓 Source Notebook: 1_data_cleaning_online_retail_ii.ipynb
# 📊 Description: Loads, cleans, and prepares the Online Retail II dataset for analysis.
# 🏫 School: Ironhack Puerto Rico
# 🎓 Bootcamp: Data Science and Machine Learning
# 📅 Date: December 20, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
""",
    "2_eda_online_retail_ii.ipynb": """\
# 📊 Exploratory Data Analysis Script – Online Retail II
# 📓 Source Notebook: 2_eda_online_retail_ii.ipynb
# 🔍 Description: Explores trends in sales, products, customers, and revenue.
# 🏫 School: Ironhack Puerto Rico
# 🎓 Bootcamp: Data Science and Machine Learning
# 📅 Date: December 20, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
""",
    "3_sql_analysis_sales_performance_online_retail_ii.ipynb": """\
# 🧮 SQL-Based Sales Analysis Script – Online Retail II
# 📓 Source Notebook: 3_sql_analysis_sales_performance_online_retail_ii.ipynb
# 🧠 Description: Answers business questions using SQL on normalized retail tables.
# 🏫 School: Ironhack Puerto Rico
# 🎓 Bootcamp: Data Science and Machine Learning
# 📅 Date: December 20, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
""",
    "4_mysql_real_env_setup_online_retail_ii.ipynb": """\
# 🛠️ MySQL Environment Setup Script – Online Retail II
# 📓 Source Notebook: 4_mysql_real_env_setup_online_retail_ii.ipynb
# ⚙️ Description: Automates MySQL DB creation and loads cleaned CSVs into tables.
# 🏫 School: Ironhack Puerto Rico
# 🎓 Bootcamp: Data Science and Machine Learning
# 📅 Date: December 20, 2024
# 👩‍💻 Author: Ginosca Alejandro Dávila
"""
}

# --- 📌 Standard Attribution Block ---
attribution_block = """
# ------------------------------------------------------------------------------
# 🛡️ License & Attribution
#
# © 2024 Ginosca Alejandro Dávila
# Project: Online Retail II – Sales Analysis & Customer Segmentation
# 🏫 School: Ironhack Puerto Rico
# 🎓 Bootcamp: Data Science and Machine Learning
# 📅 Date: December 20, 2024
# This work is provided for educational purposes under the MIT License.
# You may reuse, modify, or redistribute with attribution.
# ------------------------------------------------------------------------------
""".strip()

# --- Exporters ---
annotated_exporter = PythonExporter()
clean_exporter = PythonExporter()
clean_exporter.exclude_markdown = True

# --- 🔁 Convert notebooks ---
for nb_file in notebook_headers:
    nb_path = os.path.join(notebook_dir, nb_file)
    if not os.path.exists(nb_path):
        safe_print(f"❌ {nb_file} not found. Skipping.")
        continue

    with open(nb_path, 'r', encoding='utf-8') as f:
        nb_node = read(f, as_version=4)

    header = notebook_headers[nb_file].strip() + "\n\n"
    footer = "\n\n" + attribution_block + "\n"

    # 📝 Export Annotated Version
    annotated_code, _ = annotated_exporter.from_notebook_node(nb_node)
    annotated_code = header + annotated_code + footer
    annotated_out_path = os.path.join(annotated_dir, nb_file.replace('.ipynb', '.py'))
    with open(annotated_out_path, 'w', encoding='utf-8') as f_out:
        f_out.write(annotated_code)
    safe_print(f"✅ Exported (annotated): {annotated_out_path}")

    # 🧼 Export Clean Version
    clean_code, _ = clean_exporter.from_notebook_node(nb_node)
    clean_code = header + clean_code + footer
    clean_out_path = os.path.join(clean_dir, nb_file.replace('.ipynb', '.py'))
    with open(clean_out_path, 'w', encoding='utf-8') as f_out:
        f_out.write(clean_code)
    safe_print(f"✅ Exported (clean): {clean_out_path}")


✅ Exported (annotated): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/annotated/1_data_cleaning_online_retail_ii.py
✅ Exported (clean): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/clean/1_data_cleaning_online_retail_ii.py
✅ Exported (annotated): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/annotated/2_eda_online_retail_ii.py
✅ Exported (clean): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/clean/2_eda_online_retail_ii.py
✅ Exported (annotated): /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/annotated/3_sql_analysis_sales_performance_online_retail_ii.py
✅ Ex