### Pipeline Orchestrator

This notebook finds available Finviz data, allows the user to select which date(s) to process, and then executes the main processing pipeline (`run_sequence.py`) for each selected date.

**Workflow:**
1.  **Setup:** Configure paths and define the default date selection rule.
2.  **Find Data Files:** Scan the `Downloads` directory for the most recent `df_finviz*.parquet` files.
3.  **Select Dates:** Extract available dates and apply the default selection rule.
4.  **(Optional) Refine Selection:** Interactively prompt the user to override the default date selection.
5.  **Execute Pipeline:** For each selected date, generate a `config.py` file and run the external processing script.

### Setup and Configuration

In [26]:
import sys
from pathlib import Path
import pandas as pd

# --- Project Path Setup ---
NOTEBOOK_DIR = Path.cwd()
ROOT_DIR = NOTEBOOK_DIR.parent if NOTEBOOK_DIR.name == 'notebooks' else NOTEBOOK_DIR
SRC_DIR = ROOT_DIR / 'src'
# Make the project root and src/ importable so that `import utils` resolves
for path in (ROOT_DIR, SRC_DIR):
    if str(path) not in sys.path:
        sys.path.append(str(path))
import utils

# --- Data File Configuration ---
DOWNLOADS_DIR = Path.home() / "Downloads"
DATA_FILE_PREFIX = 'df_finviz'
DATA_FILE_EXTENSION = 'parquet'
DATA_FILES_TO_SCAN = 100

# --- Analysis Run Configuration ---
# Default rule for selecting which dates to process.
# slice(-1, None, None) -> Processes only the most recent date.
DATE_SLICE = slice(-1, None, None)

# --- config.py Generation Parameters ---
DEST_DIR = ROOT_DIR / 'data'
ANNUAL_RISK_FREE_RATE = 0.04
TRADING_DAYS_PER_YEAR = 252

# --- Notebook Setup ---
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
%load_ext autoreload
%autoreload 2

# --- Verification ---
print(f"Project Root Directory: {ROOT_DIR}")
print(f"Scanning for data files in: {DOWNLOADS_DIR}")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Project Root Directory: c:\Users\ping\Files_win10\python\py311\stocks
Scanning for data files in: C:\Users\ping\Downloads
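`ANNUAL_RISK_FREE_RATE` and `TRADING_DAYS_PER_YEAR` are passed together into `config.py`, which suggests the pipeline converts the annual rate into a per-trading-day rate. How `run_sequence.py` actually does this is not shown in this notebook. The sketch below only illustrates the two common conventions, simple division and geometric de-compounding, using the values from the setup cell; both function names are hypothetical.

```python
# Hypothetical helpers: two common ways to turn an annual rate into a daily one.
# Which convention (if either) the pipeline uses is an assumption, not shown here.

def daily_rate_simple(annual_rate: float, periods_per_year: int) -> float:
    """Simple (arithmetic) conversion: annual rate divided by periods per year."""
    return annual_rate / periods_per_year


def daily_rate_compounded(annual_rate: float, periods_per_year: int) -> float:
    """Geometric conversion: the daily rate that compounds back to the annual rate."""
    return (1.0 + annual_rate) ** (1.0 / periods_per_year) - 1.0


ANNUAL_RISK_FREE_RATE = 0.04
TRADING_DAYS_PER_YEAR = 252

simple = daily_rate_simple(ANNUAL_RISK_FREE_RATE, TRADING_DAYS_PER_YEAR)
compounded = daily_rate_compounded(ANNUAL_RISK_FREE_RATE, TRADING_DAYS_PER_YEAR)
print(f"simple:     {simple:.6%}")
print(f"compounded: {compounded:.6%}")
```

At a 4% annual rate the two results differ only slightly, but downstream calculations should use one convention consistently.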


### Step 1: Find and Display Available Data

In [27]:
print("--- Step 1: Finding recent data files ---")

# Collect up to DATA_FILES_TO_SCAN of the most recent files in DOWNLOADS_DIR
# whose names match '<DATA_FILE_PREFIX>*.<DATA_FILE_EXTENSION>'.
found_files = utils.get_recent_files_in_directory(
    directory_path=DOWNLOADS_DIR,
    prefix=DATA_FILE_PREFIX,
    extension=DATA_FILE_EXTENSION,
    count=DATA_FILES_TO_SCAN
)

if not found_files:
    print(f"No files matching '{DATA_FILE_PREFIX}*.{DATA_FILE_EXTENSION}' found.")
    available_dates = []
else:
    # Extract dates and display them neatly
    available_dates = utils.extract_and_sort_dates_from_filenames(found_files)
    print(f"\nFound {len(available_dates)} unique dates available for processing:")
    utils.print_list_in_columns(available_dates, num_columns=5)

--- Step 1: Finding recent data files ---

Found 40 unique dates available for processing:
  0    2025-04-25    1    2025-04-28    2    2025-04-29    3    2025-04-30    4    2025-05-01
  5    2025-05-02    6    2025-05-05    7    2025-05-06    8    2025-05-07    9    2025-05-08
  10   2025-05-09    11   2025-05-12    12   2025-05-13    13   2025-05-14    14   2025-05-15
  15   2025-05-16    16   2025-05-19    17   2025-05-20    18   2025-05-21    19   2025-05-22
  20   2025-05-23    21   2025-05-27    22   2025-05-28    23   2025-05-29    24   2025-05-30
  25   2025-06-02    26   2025-06-03    27   2025-06-04    28   2025-06-05    29   2025-06-06
  30   2025-06-09    31   2025-06-10    32   2025-06-11    33   2025-06-12    34   2025-06-13
  35   2025-06-16    36   2025-06-17    37   2025-06-18    38   2025-06-19    39   2025-06-20
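The `utils` helpers used in this step are defined outside this notebook. As a rough sketch of what they might do, the code below lists the newest files matching a prefix and extension (ordered by modification time) and pulls `YYYY-MM-DD` dates out of their names. It assumes each date appears in the filename in that form, which matches the dates printed above but is not confirmed here; `find_recent_files` and `extract_sorted_dates` are hypothetical stand-ins for `utils.get_recent_files_in_directory` and `utils.extract_and_sort_dates_from_filenames`.

```python
import re
from pathlib import Path

# Assumed filename convention: a YYYY-MM-DD date somewhere in the name.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_recent_files(directory: Path, prefix: str, extension: str, count: int) -> list[Path]:
    """Return up to `count` files named '<prefix>*.<extension>', newest first by mtime."""
    matches = [p for p in directory.glob(f"{prefix}*.{extension}") if p.is_file()]
    matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return matches[:count]


def extract_sorted_dates(files: list[Path]) -> list[str]:
    """Return the unique YYYY-MM-DD dates found in the filenames, oldest first."""
    dates = set()
    for path in files:
        match = DATE_PATTERN.search(path.name)
        if match:
            dates.add(match.group())
    # ISO-formatted dates sort chronologically as plain strings.
    return sorted(dates)
```

For example, `extract_sorted_dates(find_recent_files(Path.home() / "Downloads", "df_finviz", "parquet", 100))` would return an oldest-to-newest list like the one printed above, provided the filename assumption holds.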


### Step 2: Select Dates for Processing (Default)

In [28]:
if available_dates:
    # Apply the default slice defined in the setup cell
    dates_to_process = available_dates[DATE_SLICE]
    print("\n--- Step 2: Applying default selection rule ---")
    print(f"Default rule '{DATE_SLICE}' selected {len(dates_to_process)} date(s):")
    print(dates_to_process)
else:
    dates_to_process = []


--- Step 2: Applying default selection rule ---
Default rule 'slice(-1, None, None)' selected 1 date(s):
['2025-06-20']
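`DATE_SLICE` is an ordinary Python `slice` applied to the chronologically sorted `available_dates` list, so negative indices count back from the most recent date. The short example below uses a stand-in list in place of the real one to show how a few rules behave.

```python
# Stand-in for available_dates: sorted oldest -> newest, as printed in Step 1.
dates = ["2025-06-16", "2025-06-17", "2025-06-18", "2025-06-19", "2025-06-20"]

latest_only = dates[slice(-1, None, None)]   # same as dates[-1:]
last_four = dates[slice(-4, None)]           # same as dates[-4:]
every_other = dates[slice(None, None, 2)]    # same as dates[::2]

print(latest_only)   # ['2025-06-20']
print(last_four)     # ['2025-06-17', '2025-06-18', '2025-06-19', '2025-06-20']
print(every_other)   # ['2025-06-16', '2025-06-18', '2025-06-20']
```

Applied to the 40 dates found in Step 1, a rule such as `slice(-4, None)` selects the same four dates shown in Step 3.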


### Step 3: (OPTIONAL) Interactively Refine Selection

In [29]:
if available_dates:
    # Ask the user for a replacement DATE_SLICE; the returned slice is compared with the current one below
    NEW_DATE_SLICE = utils.prompt_for_slice_update("DATE_SLICE", DATE_SLICE)
    
    # If the slice was changed, update the list of dates to process
    if NEW_DATE_SLICE != DATE_SLICE:
        DATE_SLICE = NEW_DATE_SLICE
        dates_to_process = available_dates[DATE_SLICE]
        print(f"\nUpdated selection. Now processing {len(dates_to_process)} date(s):")
        print(dates_to_process)

   DATE_SLICE updated.

Updated selection. Now processing 4 date(s):
['2025-06-17', '2025-06-18', '2025-06-19', '2025-06-20']
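`utils.prompt_for_slice_update` is not shown here, and how it reads the user's answer is not documented. One plausible approach, sketched below as a hypothetical `parse_slice` helper, is to accept text in Python's `start:stop:step` notation (for example `-4:`) and convert it into a `slice` object, with blank fields becoming `None`.

```python
def parse_slice(text: str) -> slice:
    """Convert 'start:stop:step' text (e.g. '-4:', '::2', '-1') into a slice.

    Hypothetical helper: blank fields become None. A bare integer such as '-1'
    is treated as a one-element slice, so indexing a list still returns a list.
    """
    text = text.strip()
    if ":" not in text:
        index = int(text)
        # slice(i, i + 1) fails for i == -1 (stop would be 0), so use None there.
        return slice(index, None if index == -1 else index + 1)
    parts = text.split(":")
    if len(parts) > 3:
        raise ValueError(f"Too many ':' in slice text: {text!r}")
    values = [int(part) if part.strip() else None for part in parts]
    return slice(*values)
```

Because `slice` objects compare equal when their `(start, stop, step)` values match, the notebook's `NEW_DATE_SLICE != DATE_SLICE` check works as intended on a result like this.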


### Step 4: Execute Pipeline

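This cell iterates through the final list of selected dates, regenerates `config.py` for each one, and executes `run_sequence.py` with `%run -i`, which runs the script inside the notebook's namespace. A failure inside the script (reported as `SystemExit: 1` in the output below) does not stop the loop, so "Finished processing" is printed for every date regardless of outcome.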
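`utils.create_pipeline_config_file` is also defined outside this notebook. A minimal sketch of the idea, assuming `config.py` simply holds module-level constants that `run_sequence.py` (or the notebooks it runs) import, is shown below. The function `write_config_file` and the constant names it produces are hypothetical; the real file's contents are not shown here.

```python
from pathlib import Path


def write_config_file(config_path: Path, settings: dict[str, object]) -> None:
    """Write each setting as an upper-case module-level constant, e.g. DATE_STR = '2025-06-20'.

    Hypothetical sketch: repr() yields valid Python literals for str, int, float,
    bool and None. Path values are converted to plain strings first.
    """
    lines = ["# Auto-generated by the pipeline orchestrator. Do not edit by hand.\n"]
    for name, value in settings.items():
        if isinstance(value, Path):
            value = str(value)
        lines.append(f"{name.upper()} = {value!r}\n")
    config_path.write_text("".join(lines), encoding="utf-8")
```

Using `repr()` instead of manual quoting keeps Windows paths such as `C:\Users\...` valid Python, because the backslashes are escaped in the written file.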

In [30]:
print("\n--- Step 4: Starting processing sequence ---")

if not dates_to_process:
    print("No dates to process. Halting execution.")
else:
    for date_str in dates_to_process:
        print(f"\n{'='*20} PROCESSING DATE: {date_str} {'='*20}")
        
        # 1. Create the config.py file for the current date
        utils.create_pipeline_config_file(
            config_path=ROOT_DIR / 'config.py',
            date_str=date_str,
            downloads_dir=DOWNLOADS_DIR,
            dest_dir=DEST_DIR,
            annual_risk_free_rate=ANNUAL_RISK_FREE_RATE,
            trading_days_per_year=TRADING_DAYS_PER_YEAR
        )

        # 2. Run the external processing script
        print(f"Executing run_sequence.py for {date_str}...")
        %run -i {ROOT_DIR / 'run_sequence.py'}
        print(f"--- Finished processing for {date_str} ---")

    print(f"\n{'='*25} ALL PROCESSING COMPLETE {'='*25}")


--- Step 4: Starting processing sequence ---

Successfully created config file: c:\Users\ping\Files_win10\python\py311\stocks\config.py
Executing run_sequence.py for 2025-06-17...
Starting notebook execution sequence...

--- Running py0_get_yloader_OHLCV_data_v1.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb py0_get_yloader_OHLCV_data_v1.ipynb
Successfully executed py0_get_yloader_OHLCV_data_v1.ipynb
Output saved to: executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb

--- Running py1_clean_df_finviz_v14.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py1_clean_df_finviz_v14.ipynb py1_clean_df_finviz_v14.ipynb
Successfully executed py1_clean_df_finviz_v14.ipynb
Output saved to: executed\executed_py1_clean_df_finviz_v14.ipynb

--- Running py2_clea

SystemExit: 1

--- Finished processing for 2025-06-17 ---

Successfully created config file: c:\Users\ping\Files_win10\python\py311\stocks\config.py
Executing run_sequence.py for 2025-06-18...
Starting notebook execution sequence...

--- Running py0_get_yloader_OHLCV_data_v1.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb py0_get_yloader_OHLCV_data_v1.ipynb
Successfully executed py0_get_yloader_OHLCV_data_v1.ipynb
Output saved to: executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb

--- Running py1_clean_df_finviz_v14.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py1_clean_df_finviz_v14.ipynb py1_clean_df_finviz_v14.ipynb
Successfully executed py1_clean_df_finviz_v14.ipynb
Output saved to: executed\executed_py1_clean_df_finviz_v14.ipynb

--- Running py2_clean_d

SystemExit: 1

--- Finished processing for 2025-06-18 ---

Successfully created config file: c:\Users\ping\Files_win10\python\py311\stocks\config.py
Executing run_sequence.py for 2025-06-19...
Starting notebook execution sequence...

--- Running py0_get_yloader_OHLCV_data_v1.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb py0_get_yloader_OHLCV_data_v1.ipynb
Successfully executed py0_get_yloader_OHLCV_data_v1.ipynb
Output saved to: executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb

--- Running py1_clean_df_finviz_v14.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py1_clean_df_finviz_v14.ipynb py1_clean_df_finviz_v14.ipynb
Successfully executed py1_clean_df_finviz_v14.ipynb
Output saved to: executed\executed_py1_clean_df_finviz_v14.ipynb

--- Running py2_clean_d

SystemExit: 1

--- Finished processing for 2025-06-19 ---

Successfully created config file: c:\Users\ping\Files_win10\python\py311\stocks\config.py
Executing run_sequence.py for 2025-06-20...
Starting notebook execution sequence...

--- Running py0_get_yloader_OHLCV_data_v1.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb py0_get_yloader_OHLCV_data_v1.ipynb
Successfully executed py0_get_yloader_OHLCV_data_v1.ipynb
Output saved to: executed\executed_py0_get_yloader_OHLCV_data_v1.ipynb

--- Running py1_clean_df_finviz_v14.ipynb ---

Running command: c:\Users\ping\Files_win10\python\py311\.venv\Scripts\jupyter nbconvert --to notebook --execute --output executed\executed_py1_clean_df_finviz_v14.ipynb py1_clean_df_finviz_v14.ipynb
Successfully executed py1_clean_df_finviz_v14.ipynb
Output saved to: executed\executed_py1_clean_df_finviz_v14.ipynb

--- Running py2_clean_d

SystemExit: 1

--- Finished processing for 2025-06-20 ---
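In this run, `run_sequence.py` exited with `SystemExit: 1` for every date while the `py2_` notebook was running, and the loop carried on regardless. If a failed date should halt the batch, one option is to launch the script in a separate process and check its exit code. This is a sketch under two assumptions: that `run_sequence.py` reads everything it needs from the generated `config.py` (a child process does not see the notebook variables that `%run -i` shares), and that it signals failure with a non-zero exit status, as the `SystemExit: 1` suggests. `run_script` and `process_dates` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable


def run_script(script_path: Path, cwd: Path) -> int:
    """Run a Python script in a child process, echo its output, and return its exit code."""
    # sys.executable reuses the notebook's interpreter (and virtual environment).
    completed = subprocess.run(
        [sys.executable, str(script_path)], cwd=cwd, capture_output=True, text=True
    )
    print(completed.stdout, end="")
    if completed.stderr:
        print(completed.stderr, end="", file=sys.stderr)
    return completed.returncode


def process_dates(dates: list[str], script_path: Path, cwd: Path,
                  prepare: Callable[[str], None]) -> list[str]:
    """Run the script once per date, stopping at the first failure.

    `prepare(date_str)` is called before each run (for example, to write config.py).
    Returns the dates that completed successfully.
    """
    completed_dates = []
    for date_str in dates:
        prepare(date_str)
        exit_code = run_script(script_path, cwd)
        if exit_code != 0:
            print(f"{script_path.name} failed for {date_str} (exit code {exit_code}); stopping.")
            break
        completed_dates.append(date_str)
    return completed_dates
```

In the notebook this might be called as `process_dates(dates_to_process, ROOT_DIR / 'run_sequence.py', ROOT_DIR, prepare=...)`, with `prepare` wrapping the existing `utils.create_pipeline_config_file` call. Capturing output means it appears only after each run finishes, which is the trade-off for having it shown reliably in the notebook.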

