# Data Loading Template
This notebook template provides a standardized way to load data files across all projects using the custom file handler module.

## 📁 Expected Project Structure
```plaintext
Your Project/
├── 01_project_management/
├── 02_data/
├── 03_notebooks/         ← Run notebooks from here
│   ├── src/              ← Custom modules live here
│   │   └── file_handler.py
│   └── template.ipynb    ← This template
├── 04_analyses/
└── 05_results/
```
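
One way to confirm the layout before running anything is to check for the expected top-level folders. The sketch below is only illustrative: `EXPECTED_FOLDERS` and `missing_project_folders` are hypothetical names that are not part of `file_handler.py`, and the folder list is simply copied from the tree above.

```python
from pathlib import Path

# Assumption: folder names copied from the expected project structure above.
EXPECTED_FOLDERS = [
    "01_project_management",
    "02_data",
    "03_notebooks",
    "04_analyses",
    "05_results",
]


def missing_project_folders(project_root: Path) -> list[str]:
    """Return the expected folder names that do not exist under project_root."""
    return [
        name for name in EXPECTED_FOLDERS
        if not (project_root / name).is_dir()
    ]


if __name__ == "__main__":
    # When run from 03_notebooks/, the project root is the parent directory.
    print(missing_project_folders(Path.cwd().parent))
```

An empty list means every expected folder is present.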

## 🔧 Setup Instructions
**Prerequisites**:

* Run this notebook from the `03_notebooks/` directory
* Ensure `file_handler.py` is saved in `03_notebooks/src/`
* Required packages: `pandas`, `chardet`, `openpyxl` (for Excel files)
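
The package prerequisite can be checked up front instead of waiting for an import error. This is a minimal sketch using only the standard library; `REQUIRED_PACKAGES` and `find_missing_packages` are hypothetical names introduced here for illustration.

```python
from importlib.util import find_spec

# Assumption: package names taken from the prerequisites list above.
REQUIRED_PACKAGES = ["pandas", "chardet", "openpyxl"]


def find_missing_packages(packages: list[str]) -> list[str]:
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in packages if find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are available.")
```

Any package reported as missing can then be installed with your usual package manager before running the notebook.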

---

## 📥 Import Custom File Handler
The file handler module provides:

* **Interactive path selection**: Choose input/output folders through prompts
* **Automatic file detection**: Supports CSV, Excel (.xlsx/.xls), and Pickle files
* **Smart encoding detection**: Automatically detects CSV encoding and delimiters
* **Multi-sheet Excel support**: Choose specific sheets from Excel files
* **Error handling**: Graceful handling of file access issues

---

## 🚀 Usage Guide
### Step 1: Set up project paths
The `setup_paths()` function will:

1. Detect your project root directory
2. List the available folders and let you choose an input folder
3. Let you select a subfolder if any exist
4. Let you choose an output folder (or reuse the input folder)
5. Create the output directory if it doesn't exist
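
The non-interactive parts of these steps could look roughly like the sketch below. It is not the code in `file_handler.py`: the helper names are hypothetical, and it assumes the project root is the directory that contains `03_notebooks/`.

```python
from pathlib import Path


def detect_project_root(start: Path, notebooks_dir: str = "03_notebooks") -> Path:
    """Walk upwards from start and return the parent of the notebooks folder.

    Assumption: the notebooks folder sits directly under the project root.
    Falls back to start itself if no such folder is found.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if candidate.name == notebooks_dir:
            return candidate.parent
    return start


def list_project_folders(root: Path) -> list[str]:
    """Return the visible sub-folder names of root, sorted alphabetically."""
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )


def ensure_output_dir(path: Path) -> Path:
    """Create the output directory (and any parents) if it does not exist."""
    path.mkdir(parents=True, exist_ok=True)
    return path
```

In the real function, the folder choices come from interactive prompts; here they would be passed in by the caller.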

### Step 2: Load your data
The `load_data_with_detection()` function will:

1. Scan the input folder for supported file types
2. Let you choose which file to load
3. Detect the file format and load the file with the matching reader
4. Return a pandas DataFrame ready for analysis
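
The scanning and format-detection steps can be sketched with the standard library alone. The names `FORMAT_BY_SUFFIX` and `find_data_files` are hypothetical, and the real module may detect formats differently; the extensions are the ones this template lists as supported.

```python
from pathlib import Path

# Assumption: extension-to-format mapping based on the supported formats
# named in this template.
FORMAT_BY_SUFFIX = {
    ".csv": "CSV",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".pkl": "Pickle",
    ".pickle": "Pickle",
}


def find_data_files(folder: Path) -> list[tuple[Path, str]]:
    """Return (path, format label) pairs for supported files in folder."""
    matches = []
    for path in sorted(folder.iterdir()):
        label = FORMAT_BY_SUFFIX.get(path.suffix.lower())
        if path.is_file() and label is not None:
            matches.append((path, label))
    return matches
```

Each label would then typically map to a pandas reader: `pd.read_csv` for CSV, `pd.read_excel` for Excel, and `pd.read_pickle` for Pickle.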

### Supported file formats:

* **CSV files** (`.csv`) - with encoding and delimiter detection
* **Excel files** (`.xlsx`, `.xls`) - with sheet selection
* **Pickle files** (`.pkl`, `.pickle`) - pandas DataFrame pickles
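
Delimiter detection for CSV files can be approximated by counting how many columns each candidate delimiter produces on the header line. The sketch below is an assumption about the approach, not the module's actual code; `CANDIDATE_DELIMITERS`, `count_columns`, and `suggest_delimiter` are hypothetical names. Encoding detection is left out here; the prerequisites suggest the module relies on `chardet` for that.

```python
import csv
import io

# Assumption: a small, fixed set of candidate delimiters.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def count_columns(sample: str, delimiter: str) -> int:
    """Count the fields in the first row of sample for a given delimiter."""
    reader = csv.reader(io.StringIO(sample), delimiter=delimiter)
    first_row = next(reader, [])
    return len(first_row)


def suggest_delimiter(sample: str) -> tuple[str, dict[str, int]]:
    """Return the delimiter yielding the most columns, plus all counts.

    Ties are resolved in favour of the earlier candidate in the list.
    """
    counts = {d: count_columns(sample, d) for d in CANDIDATE_DELIMITERS}
    best = max(counts, key=counts.get)
    return best, counts


best, counts = suggest_delimiter("DATE,MONTH,BASEL_cloud_cover\n19600101,1,7\n")
print(repr(best), counts)
```

Taking the delimiter with the most columns is a simple heuristic; it can be misled by free-text fields that contain many commas or semicolons, which is one reason to keep a manual override.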

---
## 💡 Pro Tips

* **Consistent structure**: Always run notebooks from `03_notebooks/` so that imports from `src/` resolve predictably
* **File organization**: Keep raw data in `02_data/` and processed data in the appropriate project folders
* **Output management**: Use a separate output folder for each analysis stage
* **Version control**: Keep custom modules in `src/` so they can be tracked under version control

---

In [1]:
# Import the custom file handler and data exporter modules from the src folder
# Make sure this notebook is run from the 03_notebooks/ directory
# and that file_handler.py and data_exporter.py are saved in 03_notebooks/src/
import sys
from pathlib import Path

# Add src folder to Python path
sys.path.append(str(Path.cwd() / 'src'))

# Import custom modules
from file_handler import setup_paths, load_data_with_detection
from data_exporter import export_data_interactive, quick_export

print("✅ All modules imported successfully!")

✅ All modules imported successfully!


In [2]:
# Set up project paths interactively
project_root, input_path, output_path = setup_paths()

# Load data with automatic detection
df = load_data_with_detection(input_path)

📍 Current directory: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\03_notebooks
📁 Project root: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML

📥 SELECT INPUT FOLDER

📋 Available folders in project:
   1: 01_roject_management
   2: 02_data
   3: 03_notebooks
   4: 04_analysis
   5: 05_results

>>> Choose input folder number (1-5):  2

✅ Selected: 02_data

----------------------------------------
📂 Subfolders in '02_data':
   0: Use '02_data' (parent folder)
   1: Merged_data
   2: Original_data
   3: Processed_data

>>> Choose subfolder (0-3) [Enter for 0]:  2

✅ Input path set to: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\02_data\Original_data


📤 SELECT OUTPUT FOLDER

📋 Available folders in project:
   1: 01_roject_management
   2: 02_data
   3: 03_notebooks
   4: 04_analysis
   5: 05_results

   💡 Press Enter to use input folder: 02_data\Original_data

>>> Choose output folder number (1-5) [Enter for input folder]:  2

✅ Selected: 02_data

----------------------------------------
📂 Subfolders in '02_data':
   0: Use '02_data' (parent folder)
   1: Merged_data
   2: Original_data
   3: Processed_data

>>> Choose subfolder (0-3) [Enter for 0]:  3

✅ PROJECT SETUP COMPLETE!

   📥 Input path:  C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\02_data\Original_data
   📤 Output path: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\02_data\Processed_data


📋 Available data files:
   1: 📊 Dataset-weather-prediction-dataset-processed.csv (CSV)

>>> Choose file number (1-1):  1

✅ Selected file: Dataset-weather-prediction-dataset-processed.csv (CSV)

🔍 Detecting encoding for Dataset-weather-prediction-dataset-processed.csv...
   Detected encoding: ascii (confidence: 100.0%)

🔍 Analyzing potential delimiters:
   1: Delimiter ',' - Found 170 columns
   2: Delimiter ';' - Found 1 columns
   3: Delimiter TAB - Found 1 columns
   4: Delimiter '|' - Found 1 columns

💡 Suggested: Option 1 (',') with 170 columns

>>> Choose delimiter option (1-4) [Enter for suggested]:  1

✅ Using delimiter: ','

✅ Loaded data: 22950 rows × 170 columns


In [3]:
df.head()

| Unnamed: 0 | DATE | MONTH | BASEL_cloud_cover | BASEL_wind_speed | BASEL_humidity | BASEL_pressure | BASEL_global_radiation | BASEL_precipitation | BASEL_snow_depth | BASEL_sunshine | ... | VALENTIA_cloud_cover | VALENTIA_humidity | VALENTIA_pressure | VALENTIA_global_radiation | VALENTIA_precipitation | VALENTIA_snow_depth | VALENTIA_sunshine | VALENTIA_temp_mean | VALENTIA_temp_min | VALENTIA_temp_max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 19600101 | 1 | 7 | 2.1 | 0.85 | 1.018 | 0.32 | 0.09 | 0 | 0.7 | ... | 5 | 0.88 | 1.0003 | 0.45 | 0.34 | 0 | 4.7 | 8.5 | 6.0 | 10.9 |
| 1 | 19600102 | 1 | 6 | 2.1 | 0.84 | 1.018 | 0.36 | 1.05 | 0 | 1.1 | ... | 7 | 0.91 | 1.0007 | 0.25 | 0.84 | 0 | 0.7 | 8.9 | 5.6 | 12.1 |
| 2 | 19600103 | 1 | 8 | 2.1 | 0.9 | 1.018 | 0.18 | 0.3 | 0 | 0.0 | ... | 7 | 0.91 | 1.0096 | 0.17 | 0.08 | 0 | 0.1 | 10.5 | 8.1 | 12.9 |
| 3 | 19600104 | 1 | 3 | 2.1 | 0.92 | 1.018 | 0.58 | 0.0 | 0 | 4.1 | ... | 7 | 0.86 | 1.0184 | 0.13 | 0.98 | 0 | 0.0 | 7.4 | 7.3 | 10.6 |
| 4 | 19600105 | 1 | 6 | 2.1 | 0.95 | 1.018 | 0.65 | 0.14 | 0 | 5.4 | ... | 3 | 0.8 | 1.0328 | 0.46 | 0.0 | 0 | 5.7 | 5.7 | 3.0 | 8.4 |


---
## 1. ...

---
## 2. ...

---
## 3. Export processed data

In [21]:
# Set up project paths interactively
project_root, input_path, output_path = setup_paths()

# Load data with automatic detection AND capture filename
df, original_filename = load_data_with_detection_enhanced(input_path)

# Now you can use the original filename in your export
print("🏁 FINALIZING DATA EXPORT")
print("="*50)

# Summary of what you've accomplished
print(f"📊 Original dataset: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"⚡ Scaled dataset: {df_scaled.shape[0]:,} rows × {df_scaled.shape[1]} columns")

# Export the final processed data using the captured filename
print("\n📤 Exporting final scaled dataset...")
exported_file = export_data_interactive(
    df=df_scaled,
    output_path=output_path,
    original_filename=original_filename  # Use the captured filename
)

print("\n🎉 Data processing and export completed!")
print(f"📁 Final file: {exported_file}")

🏁 FINALIZING DATA EXPORT
📊 Original dataset: 22,950 rows × 171 columns
⚡ Scaled dataset: 22,950 rows × 169 columns

📤 Exporting final scaled dataset...

📤 INTERACTIVE DATA EXPORT

📊 Data Preview:
   Shape: 22950 rows × 169 columns
   Memory usage: 29.59 MB

   First 3 rows:
      MONTH  BASEL_cloud_cover  BASEL_wind_speed  BASEL_humidity  \
0 -1.599964           0.660514          -0.02793        0.826097   
1 -1.599964           0.244897          -0.02793        0.735760   
2 -1.599964           1.076130          -0.02793        1.277781   

   BASEL_pressure  BASEL_global_radiation  BASEL_precipitation  \
0       -0.001949               -1.101066            -0.265148   
1       -0.001949               -1.058108             1.658760   
2       -0.001949               -1.251420             0.155707   

   BASEL_snow_depth  BASEL_sunshine  BASEL_temp_mean  ...  \
0         -0.179228       -0.902918        -0.528623  ...   
1         -0.179228       -0.810126        -0


>>> Choose export format (1-6) [Enter for CSV]:  1

✅ Selected format: CSV - Comma Separated Values (most compatible)

----------------------------------------
📝 SELECT FILENAME
----------------------------------------

💡 Suggested filenames:
   1: Dataset-weather-prediction-dataset-processed_exported_20250526_0732.csv
   2: Dataset-weather-prediction-dataset-processed_processed_20250526_0732.csv
   3: Dataset-weather-prediction-dataset-processed_cleaned_20250526_0732.csv
   4: Dataset-weather-prediction-dataset-processed_scaled_20250526_0732.csv
   5: Dataset-weather-prediction-dataset-processed_final_20250526_0732.csv
   6: Enter custom filename

>>> Choose filename option (1-6) [Enter for 1]:  Dataset-weather-prediction-dataset-processed_scalled.csv

❌ Please enter a valid number

>>> Choose filename option (1-6) [Enter for 1]:  Dataset-weather-prediction-dataset_scalled

❌ Please enter a valid number

>>> Choose filename option (1-6) [Enter for 1]:  6

>>> Enter custom filename (without extension):  Dataset-weather-prediction-dataset_scalled

📤 Exporting to: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\02_data\Processed_data\Dataset-weather-prediction-dataset_scalled.csv
🔄 Exporting to CSV...

📝 Select CSV encoding:
   1: UTF-8 (recommended, universal)
   2: UTF-8 with BOM (Excel compatible)
   3: Latin-1 (Western European)

>>> Choose encoding (1-3) [Enter for UTF-8]:  1

✅ CSV exported with utf-8 encoding

✅ EXPORT SUCCESSFUL!
   📁 File: Dataset-weather-prediction-dataset_scalled.csv
   📍 Location: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\02_data\Processed_data
   📊 Size: 73.29 MB
   📅 Exported: 2025-05-26 07:34:05

🎉 Data processing and export completed!
📁 Final file: C:\Users\User\Dropbox\Personal\CareerFoundry\07 Machine Learning\ML\02_data\Processed_data\Dataset-weather-prediction-dataset_scalled.csv
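
The suggested filenames in the export log follow the pattern `<original stem>_<label>_<YYYYMMDD_HHMM>.csv`. A minimal sketch of how such suggestions could be generated is shown below. It is inferred from the log output only; `LABELS` and `suggest_filenames` are hypothetical names and are not taken from `data_exporter.py`.

```python
from datetime import datetime
from pathlib import Path

# Assumption: labels copied from the suggestions shown in the export log.
LABELS = ["exported", "processed", "cleaned", "scaled", "final"]


def suggest_filenames(original_filename, extension=".csv", now=None):
    """Build timestamped filename suggestions from the original file's stem."""
    stem = Path(original_filename).stem
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M")
    return [f"{stem}_{label}_{stamp}{extension}" for label in LABELS]


for name in suggest_filenames(
    "Dataset-weather-prediction-dataset-processed.csv",
    now=datetime(2025, 5, 26, 7, 32),
):
    print(name)
```

Passing `now` explicitly keeps the output reproducible; leaving it out uses the current time.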
