# ✅ Script Testing Notebook

### **Data Science and Machine Learning Bootcamp – Ironhack Puerto Rico**  
📅 **December 20, 2024**  
👩‍💻 **Author:** Ginosca Alejandro Dávila  

---

## 🧪 Purpose

This notebook tests the **clean (`.py`) versions** of the main project scripts for:

📁 **Project:** Online Retail II – Sales Analysis & Customer Segmentation  
📂 **Path:** `retail-sales-segmentation-sql/scripts/python/clean/`

---

## 🧩 Functionality Checked

- Correct environment setup and project path detection
- Compatibility with Google Colab
- Successful execution of `.py` scripts with no markdown or extra output
- Proper data loading and output generation by each script

---

## 📄 Scripts Tested

1. `1_data_cleaning_online_retail_ii.py` – Loads and cleans raw Excel data
2. `2_eda_online_retail_ii.py` – Runs Exploratory Data Analysis on the cleaned dataset
3. `3_sql_analysis_sales_performance_online_retail_ii.py` – Executes SQL-based analysis and exports results
4. `4_mysql_real_env_setup_online_retail_ii.py` – Creates MySQL schema and loads tables
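
Before running anything, it can help to confirm that all four files are where the notebook expects them. The sketch below is a hypothetical pre-flight check and not part of the project: `find_missing_scripts` and `SCRIPT_NAMES` are names introduced here, and the helper assumes the scripts sit under `scripts/python/clean/` relative to the project root.

```python
from pathlib import Path

# Script names as listed above; the helper itself is a hypothetical addition.
SCRIPT_NAMES = [
    "1_data_cleaning_online_retail_ii.py",
    "2_eda_online_retail_ii.py",
    "3_sql_analysis_sales_performance_online_retail_ii.py",
    "4_mysql_real_env_setup_online_retail_ii.py",
]

def find_missing_scripts(project_root, names=SCRIPT_NAMES):
    """Return the script names that are not present under scripts/python/clean/."""
    clean_dir = Path(project_root) / "scripts" / "python" / "clean"
    return [name for name in names if not (clean_dir / name).is_file()]
```

Calling `find_missing_scripts(project_base_path)` once the project path is set should return an empty list when every script is in place.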

---

> ✅ This notebook checks that each script runs correctly in Google Colab using the current project structure.

---

## 🔗 Step 1: Mount Google Drive and Set Project Path

This step ensures the notebook can access your project files from Google Drive.  
It will attempt to use the default path:

```
MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql
```

- If the default path exists, it is automatically assigned to `project_base_path`.
- If it is not found, you are prompted to enter the path manually, relative to `/content/drive`.
- This fallback keeps the notebook portable across machines and collaborators.

Once set, `project_base_path` will be used to locate all Python scripts to test.

---


In [1]:
# ✅ Cross-platform setup for local or Colab environments
import os
import sys
from pathlib import Path

# ✅ Safe print for encoding compatibility
def safe_print(text):
    try:
        print(text)
    except UnicodeEncodeError:
        print(text.encode("ascii", errors="ignore").decode())

# ✅ Detect environment and set base project path
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')

    # Try default path
    default_path = Path('/content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql')
    if default_path.exists():
        project_base_path = default_path
        safe_print(f"✅ Colab project path set to: {project_base_path}")
    else:
        safe_print("\n📂 Default path not found. Please input the relative path to your project inside Google Drive.")
        user_path = input("📥 Your path: ").strip()
        user_path = Path('/content/drive') / user_path
        if not user_path.exists():
            raise FileNotFoundError(f"❌ Path does not exist: {user_path}\nPlease check your input.")
        project_base_path = user_path
        safe_print(f"✅ Colab project path set to: {project_base_path}")
else:
    # Assume notebook is in /notebooks and set base path one level up
    notebook_path = Path.cwd()
    project_base_path = notebook_path.parent
    safe_print(f"✅ Local project path set to: {project_base_path}")


Mounted at /content/drive
✅ Colab project path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql


---

## 🧼 Step 2: Test `1_data_cleaning_online_retail_ii.py`

This step executes the clean version of the **data cleaning script**, which performs the following tasks:

- Loads the raw Excel dataset from `/data/`
- Performs data inspection, cleaning, and normalization
- Exports a cleaned flat file and relational tables into `/cleaned_data/`

The script is located at:

```
scripts/python/clean/1_data_cleaning_online_retail_ii.py
```

> ✅ Output messages will confirm successful execution or display any issues for debugging.
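
Reading the printed messages works, but a programmatic pass/fail signal can be easier to act on. The following is a minimal sketch of an alternative to the `!python` shell escape, assuming it is acceptable to launch the script with the same interpreter that runs the notebook; `run_script` is a hypothetical helper, not something the project provides.

```python
import subprocess
import sys

def run_script(script_path):
    """Run a Python script in a child interpreter and report whether it succeeded."""
    result = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,  # collect stdout/stderr instead of streaming them
        text=True,
    )
    # A non-zero return code means the script raised or exited with an error status.
    return result.returncode == 0, result.stdout, result.stderr
```

With this helper, `ok, out, err = run_script(script_path)` gives a boolean that can be asserted on, while `out` and `err` keep the messages available for debugging.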

---


In [2]:
# ✅ Run 1_data_cleaning_online_retail_ii.py
script_path = os.path.join(project_base_path, "scripts/python/clean/1_data_cleaning_online_retail_ii.py")
!python "$script_path"

✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql
📄 Looking for: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/data/online_retail_II.xlsx
✅ Excel sheets loaded successfully.
🧾 Combined dataset shape: (1067371, 8)
🧾 Inspecting: Online Retail II - Raw Combined Dataset
🔹 First 5 Rows:
  Invoice StockCode                          Description  Quantity         InvoiceDate  Price  Customer ID         Country
0  489434     85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12 2009-12-01 07:45:00   6.95      13085.0  United Kingdom
1  489434    79323P                   PINK CHERRY LIGHTS        12 2009-12-01 07:45:00   6.75      13085.0  United Kingdom
2  489434    79323W                  WHITE CHERRY LIGHTS        12 2009-12-01 07:45:00   6.75      13085.0  United Kingdom
3  489434     22041  
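
The script reports a local environment even though the notebook itself runs in Colab. A likely explanation, assuming the script uses the same `'google.colab' in sys.modules` test as the setup cell, is that `!python` starts a fresh interpreter in which `google.colab` has not been imported. The sketch below probes a child interpreter to illustrate this; `colab_loaded_in_child` is a hypothetical helper.

```python
import subprocess
import sys

def colab_loaded_in_child():
    """Ask a fresh interpreter whether google.colab is already in sys.modules."""
    probe = "import sys; print('google.colab' in sys.modules)"
    out = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # raise if the probe itself fails to run
    )
    return out.stdout.strip() == "True"
```

If this returns `False` inside Colab, the "Local environment detected" message is expected for scripts launched this way and does not by itself indicate a wrong base path.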

---

## 📊 Step 3: Test `2_eda_online_retail_ii.py`

This step executes the clean version of the **Exploratory Data Analysis (EDA)** script.  
It loads the cleaned dataset and generates:

- Descriptive statistics
- Visualizations of monthly trends, product performance, and customer behavior
- CSV and plot exports into `/eda_outputs/`

The script is located at:

```
scripts/python/clean/2_eda_online_retail_ii.py
```

> 📁 Output files will be saved to `eda_outputs/plots/` and `eda_outputs/data/`.
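
One way to confirm the exports once the script has run is to count the files in each output folder. This is a sketch under the assumption that `eda_outputs/` sits directly under the project root; `count_outputs` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path

def count_outputs(project_root, subfolders=("plots", "data")):
    """Count files in each eda_outputs subfolder; a missing folder counts as 0."""
    base = Path(project_root) / "eda_outputs"
    counts = {}
    for name in subfolders:
        folder = base / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts
```

A result with a zero for either folder after a run suggests the script did not write where the notebook expects.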

---


In [3]:
# ✅ Run 2_eda_online_retail_ii.py
script_path = os.path.join(project_base_path, "scripts/python/clean/2_eda_online_retail_ii.py")
!python "$script_path"

✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql
✅ Cleaned flat dataset loaded successfully.
cleaned_full_df shape: (766226, 9)
🧾 Inspecting: cleaned_online_retail_II.csv
🔹 First 5 Rows:
   invoice_no stock_code                         description  unit_price  quantity  line_revenue        invoice_date  customer_id         country
0      489434      21232      strawberry ceramic trinket box        1.25        24          30.0 2009-12-01 07:45:00        13085  united kingdom
1      489434      21523  doormat fancy font home sweet home        5.95        10          59.5 2009-12-01 07:45:00        13085  united kingdom
2      489434      21871                 save the planet mug        1.25        24          30.0 2009-12-01 07:45:00        13085  united kingdom
3      489434      22041         record frame 7" single size        2.10        48         100.8 2009-

---

## 🧮 Step 4: Test `3_sql_analysis_sales_performance_online_retail_ii.py`

This step runs the clean version of the **SQL-based sales analysis** script, which:

- Loads cleaned data and answers business questions using SQL
- Joins and queries normalized relational tables
- Saves results and visualizations from SQL analysis

The script is located at:

```
scripts/python/clean/3_sql_analysis_sales_performance_online_retail_ii.py
```

> 📁 SQL analysis outputs are saved to `sql_outputs/notebook_outputs/`.
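
To make "joins and queries normalized relational tables" concrete, here is a minimal sketch using an in-memory SQLite database. The SQL engine, the table layouts for `invoices` and `invoice_items`, and the query itself are assumptions made for illustration (only the `customers` columns are confirmed by this notebook's output); the sample rows reuse values from the previews printed in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed, simplified schema: only the columns needed for the join.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoices (invoice_no INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE invoice_items (invoice_no INTEGER, stock_code TEXT, line_revenue REAL);
""")

conn.execute("INSERT INTO customers VALUES (13085, 'united kingdom')")
conn.execute("INSERT INTO invoices VALUES (489434, 13085)")
conn.executemany(
    "INSERT INTO invoice_items VALUES (?, ?, ?)",
    [(489434, "21232", 30.0), (489434, "21523", 59.5)],
)

# Revenue per country: line items -> invoices -> customers.
revenue_by_country = conn.execute("""
    SELECT c.country, SUM(i.line_revenue) AS revenue
    FROM invoice_items AS i
    JOIN invoices AS inv ON inv.invoice_no = i.invoice_no
    JOIN customers AS c ON c.customer_id = inv.customer_id
    GROUP BY c.country
    ORDER BY revenue DESC
""").fetchall()

conn.close()
print(revenue_by_country)
```

The same three-table join pattern applies to any business question that needs revenue from line items attributed to a customer attribute.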

---


In [4]:
# ✅ Run 3_sql_analysis_sales_performance_online_retail_ii.py
script_path = os.path.join(project_base_path, "scripts/python/clean/3_sql_analysis_sales_performance_online_retail_ii.py")
!python "$script_path"

✅ Local environment detected. Base path set to: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql
✅ All cleaned data files found in: /content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/cleaned_data
✅ All normalized relational tables loaded successfully.
📄 customers.csv → (5852, 2)
📄 products.csv → (4624, 3)
📄 invoices.csv → (36607, 3)
📄 invoice_items.csv → (766226, 5)
📄 Previewing: customers.csv
🔹 Shape: 5852 rows × 2 columns
🔹 Column Names:
['customer_id', 'country']

🔹 First 5 rows:
   customer_id         country
0        13085  united kingdom
1        13078  united kingdom
2        15362  united kingdom
3        18102  united kingdom
4        12682          france

🔹 Data Types and Non-Null Counts:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5852 entries, 0 to 5851
Data columns (total 2 columns):
 #   Co

---

## 🛠️ Step 5 (Local Only): Test `4_mysql_real_env_setup_online_retail_ii.py`

> ⚠️ This step is intended **only for local environments** where a MySQL server is installed and running.

This script does the following:

- Loads credentials from `.env` in the `config/` folder
- Creates the `retail_sales` MySQL database with 4 tables
- Loads cleaned `.csv` files into the database

**This will not work in Google Colab.**  
Skip this step unless you're running this notebook locally (e.g., Jupyter on your machine).
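
A quick way to decide whether to run this step is to check whether anything is listening on the MySQL port. The sketch below assumes the server uses MySQL's default port 3306 on the local machine; `mysql_port_open` is a hypothetical helper, and an open port only shows that some service accepts connections there, not that the credentials are valid.

```python
import socket

def mysql_port_open(host="127.0.0.1", port=3306, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `mysql_port_open()` returns `False`, skipping the cell avoids a predictable connection failure.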

The script is located at:

```
scripts/python/clean/4_mysql_real_env_setup_online_retail_ii.py
```

---


In [5]:
# ✅ Run 4_mysql_real_env_setup_online_retail_ii.py (Local only)
script_path = os.path.join(project_base_path, "scripts/python/clean/4_mysql_real_env_setup_online_retail_ii.py")
!python "$script_path"

❌ Missing files:
 - /cleaned_data/customers.csv
 - /cleaned_data/products.csv
 - /cleaned_data/invoices.csv
 - /cleaned_data/invoice_items.csv
Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/clean/4_mysql_real_env_setup_online_retail_ii.py", line 59, in <module>
    import dotenv
ModuleNotFoundError: No module named 'dotenv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/Ironhack/Week 3/Week 3 - Day 4/project-2-eda-sql/retail-sales-segmentation-sql/scripts/python/clean/4_mysql_real_env_setup_online_retail_ii.py", line 61, in <module>
    get_ipython().system('pip install -q python-dotenv')
    ^^^^^^^^^^^
NameError: name 'get_ipython' is not defined
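
The traceback shows the script falling back to `get_ipython().system(...)` when `dotenv` is missing. `get_ipython` exists only inside IPython, so that fallback fails when the script is launched with `python`. One possible fix, sketched here as a suggestion and not as the script's actual code, is to install through the running interpreter's own pip. `ensure_module` is a hypothetical helper, and its `installer` parameter exists only so the behaviour can be tested without touching the network.

```python
import importlib
import subprocess
import sys

def ensure_module(module_name, pip_name=None, installer=None):
    """Import module_name, installing it with pip first if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        if installer is None:
            # Use the current interpreter's pip so this also works outside IPython.
            def installer(pkg):
                subprocess.check_call(
                    [sys.executable, "-m", "pip", "install", "-q", pkg]
                )
        installer(pip_name or module_name)
        importlib.invalidate_caches()  # make the new package visible to import
        return importlib.import_module(module_name)
```

In the script, `dotenv = ensure_module("dotenv", pip_name="python-dotenv")` would replace the `try`/`get_ipython()` pair, since the import name and the package name differ.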
