### Corrected and Improved Code (Cell 1: Setup)

This first cell now not only sets up the path for your `src` modules but also defines key directory variables (`DATA_DIR`, `ROOT_DIR`) that you can use throughout your notebook.


In [22]:
import sys
from pathlib import Path
import pandas as pd
import os
from IPython.display import display, Markdown  # Assuming you use these for display

# --- 1. PANDAS OPTIONS (No change) ---
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1500)

# --- 2. IPYTHON AUTORELOAD (No change) ---
%load_ext autoreload
%autoreload 2

# --- 3. ROBUST PATH CONFIGURATION (MODIFIED) ---

# Get the current working directory of the notebook
NOTEBOOK_DIR = Path.cwd()

# Find the project ROOT directory by going two levels up from the notebook's location.
# Note: this assumes the notebook sits exactly two folders below the root;
# adjust the number of .parent calls if you move the notebook.
ROOT_DIR = NOTEBOOK_DIR.parent.parent

# Define key project directories relative to the ROOT
DATA_DIR = ROOT_DIR / 'output' / 'selection_results'
SRC_DIR = ROOT_DIR / 'src'
# You could also define an output directory here if needed
OUTPUT_DIR = ROOT_DIR / 'output' 

# Add the 'src' directory to the Python path so you can import 'utils'
if str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))

# --- 4. VERIFICATION (IMPROVED) ---
print(f"✅ Project Root Directory: {ROOT_DIR}")
print(f"✅ Source Directory (for utils): {SRC_DIR}")
print(f"✅ Data Directory (for input): {DATA_DIR}")

# Verify that the key directories exist. This helps catch path errors early.
assert ROOT_DIR.exists(), f"ROOT directory not found at: {ROOT_DIR}"
assert SRC_DIR.exists(), f"Source directory not found at: {SRC_DIR}"
assert DATA_DIR.exists(), f"Data directory not found at: {DATA_DIR}"

# --- 5. IMPORT YOUR CUSTOM MODULE ---
# This will now work correctly
import utils
print("\n✅ Successfully imported 'utils' module.")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
✅ Project Root Directory: c:\Users\ping\Files_win10\python\py311\stocks
✅ Source Directory (for utils): c:\Users\ping\Files_win10\python\py311\stocks\src
✅ Data Directory (for input): c:\Users\ping\Files_win10\python\py311\stocks\output\selection_results

✅ Successfully imported 'utils' module.
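
The setup cell reaches the root with a fixed number of `.parent` hops, which depends on where the notebook is saved. A minimal sketch of an alternative, assuming the root is the nearest ancestor that contains a `src` folder (the helper name `find_project_root` and the demo folder names are hypothetical, not part of `utils`):

```python
from pathlib import Path
import tempfile


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


# Demo on a throwaway directory tree (names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "stocks"
    notebook_dir = root / "notebooks" / "deep" / "deeper"
    (root / "src").mkdir(parents=True)
    notebook_dir.mkdir(parents=True)

    found = find_project_root(notebook_dir)
    demo_ok = found == root.resolve()
    print(demo_ok)
```

In the notebook this would be called as `ROOT_DIR = find_project_root(Path.cwd())`, so the lookup keeps working if the notebook moves to a different depth.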


### Corrected Code (Cell 2: Execution)

Now, in your second cell, you simply use the `DATA_DIR` variable we defined in the setup cell. This removes the fragile relative path `..\data`.

In [8]:
# Use the DATA_DIR variable defined in the setup cell
selected_file_path, _, file_list = utils.main_processor(
    data_dir=DATA_DIR,  # <-- Use the absolute path variable
    downloads_dir=None,
    downloads_limit=60,
    clean_name_override=None,
    # start_file_pattern='2025-04-25',
    start_file_pattern='2025-06-13',    
    contains_pattern=''
)

print(f'selected_file_path: {selected_file_path}')
# print(f'output_path: {output_path}')
print(f'file_list: {file_list}\n')

df = pd.read_parquet(selected_file_path)
print(f'df:\n{df.head()}\n')

# # The .info() method prints to stdout directly, so no need for an f-string
# print('df_OHLCV.info():')
# df_OHLCV.info()

<span style='color:#00ffff;font-weight:500'>[Downloads] Scanned latest 60 files • Found 0 '2025-06-13' matches</span>

**Available 'starting with '2025-06-13'' files:**

- (1) `[SELECTION_RESULTS]` `2025-06-13_my_selection_run_1.csv` <span style='color:#00ffff'>(0.00 MB, 2025-06-13 20:10)</span>

- (2) `[SELECTION_RESULTS]` `2025-06-13_my_selection_run_1.parquet` <span style='color:#00ffff'>(0.01 MB, 2025-06-13 20:10)</span>

- (3) `[SELECTION_RESULTS]` `2025-06-13_my_selection_run_1_params.json` <span style='color:#00ffff'>(0.00 MB, 2025-06-13 20:11)</span>

- (4) `[SELECTION_RESULTS]` `2025-06-13_short_term_mean_reversion.csv` <span style='color:#00ffff'>(0.00 MB, 2025-06-15 17:22)</span>

- (5) `[SELECTION_RESULTS]` `2025-06-13_short_term_mean_reversion.parquet` <span style='color:#00ffff'>(0.01 MB, 2025-06-15 17:22)</span>

- (6) `[SELECTION_RESULTS]` `2025-06-13_short_term_mean_reversion_params.json` <span style='color:#00ffff'>(0.00 MB, 2025-06-15 17:22)</span>


Input a number to select file (1-6)



    **Selected paths:**
    - Source: `c:\Users\ping\Files_win10\python\py311\stocks\output\selection_results\2025-06-13_my_selection_run_1.parquet`
    - Destination: `c:\Users\ping\Files_win10\python\py311\stocks\output\selection_results\2025-06-13_my_selection_run_1_clean.parquet`
    

selected_file_path: c:\Users\ping\Files_win10\python\py311\stocks\output\selection_results\2025-06-13_my_selection_run_1.parquet
file_list: ['2025-06-13_my_selection_run_1.csv', '2025-06-13_my_selection_run_1.parquet', '2025-06-13_my_selection_run_1_params.json', '2025-06-13_short_term_mean_reversion.csv', '2025-06-13_short_term_mean_reversion.parquet', '2025-06-13_short_term_mean_reversion_params.json']

df:
        ROE %  Rel Volume  ATR/Price %  Change %  Debt/Eq   Price    RSI  Avg Volume, M     z_RSI  z_Change%  z_RelVolume  z_ATR/Price%  final_score  Weight_EW  Weight_IV  Weight_SW
Ticker                                                                                                                                                                               
BEKE     6.53        5.08     3.140227     -2.74     0.32   18.47  42.52           9.33 -0.965888  -0.695387     8.271048      0.351228     2.200533        0.1   0.093222   0.142504
ADBE    52.25        2.79     2.348856   

In [10]:
file_list

['2025-06-13_my_selection_run_1.csv',
 '2025-06-13_my_selection_run_1.parquet',
 '2025-06-13_my_selection_run_1_params.json',
 '2025-06-13_short_term_mean_reversion.csv',
 '2025-06-13_short_term_mean_reversion.parquet',
 '2025-06-13_short_term_mean_reversion_params.json']

In [14]:
df1 = pd.read_parquet(DATA_DIR / file_list[1])
print(f'df1:\n{df1.head()}\n')

df1:
        ROE %  Rel Volume  ATR/Price %  Change %  Debt/Eq   Price    RSI  Avg Volume, M     z_RSI  z_Change%  z_RelVolume  z_ATR/Price%  final_score  Weight_EW  Weight_IV  Weight_SW
Ticker                                                                                                                                                                               
BEKE     6.53        5.08     3.140227     -2.74     0.32   18.47  42.52           9.33 -0.965888  -0.695387     8.271048      0.351228     2.200533        0.1   0.093222   0.142504
ADBE    52.25        2.79     2.348856     -5.32     0.57  391.68  39.31           3.80 -1.252564  -1.908837     3.680317     -0.422064     1.884760        0.1   0.124630   0.122055
BF-B    23.14        1.96     4.160363     -3.11     0.68   26.44  21.13           3.57 -2.876169  -0.869409     2.016428      1.348059     1.579432        0.1   0.070364   0.102282
ONON    15.84        1.75     3.731343     -6.23     0.27   52.26  37.38           5.

In [15]:
df2 = pd.read_parquet(DATA_DIR / file_list[4])
print(f'df2:\n{df2.head()}\n')

df2:
        Avg Volume, M  Debt/Eq  ROE %   Price  Rel Volume    RSI  Change %  ATR/Price %     z_RSI  z_Change%  z_RelVolume  z_ATR/Price%  final_score  Weight_EW  Weight_IV  Weight_SW
Ticker                                                                                                                                                                               
BEKE             9.33     0.32   6.53   18.47        5.08  42.52     -2.74     3.140227 -0.965888  -0.695387     8.271048      0.351228     2.200533        0.1   0.093222   0.142504
ADBE             3.80     0.57  52.25  391.68        2.79  39.31     -5.32     2.348856 -1.252564  -1.908837     3.680317     -0.422064     1.884760        0.1   0.124630   0.122055
BF-B             3.57     0.68  23.14   26.44        1.96  21.13     -3.11     4.160363 -2.876169  -0.869409     2.016428      1.348059     1.579432        0.1   0.070364   0.102282
ONON             5.33     0.27  15.84   52.26        1.75  37.38     -6.23     3.7313

In [16]:
# # Create a df2 with an extra column 'location'
# data3 = {'age': [45, 50],
#          'id': [4, 5],
#          'location': ['NY', 'CA'],
#          'name': ['David', 'Eve']}
# df2 = pd.DataFrame(data3)

print("\nOriginal df2 with an extra column:")
print(df2)
# Original df2 with an extra column:
#    age  id location   name
# 0   45   4       NY  David
# 1   50   5       CA    Eve

# Robust solution to reorder common columns and keep extra ones
# Get columns from df1 that are present in df2
ordered_common_cols = [col for col in df1.columns if col in df2.columns]

# Get columns that are only in df2
extra_cols = [col for col in df2.columns if col not in df1.columns]

# Combine the lists: ordered common columns first, then the extras
new_order = ordered_common_cols + extra_cols

# Apply the new, robust order
df2_reordered = df2[new_order]

print("\nRobustly reordered df2:")
print(df2_reordered)
# Robustly reordered df2:
#    id   name  age location
# 0   4  David   45       NY
# 1   5    Eve   50       CA


Original df2 with an extra column:
        Avg Volume, M  Debt/Eq  ROE %   Price  Rel Volume    RSI  Change %  ATR/Price %     z_RSI  z_Change%  z_RelVolume  z_ATR/Price%  final_score  Weight_EW  Weight_IV  Weight_SW
Ticker                                                                                                                                                                               
BEKE             9.33     0.32   6.53   18.47        5.08  42.52     -2.74     3.140227 -0.965888  -0.695387     8.271048      0.351228     2.200533        0.1   0.093222   0.142504
ADBE             3.80     0.57  52.25  391.68        2.79  39.31     -5.32     2.348856 -1.252564  -1.908837     3.680317     -0.422064     1.884760        0.1   0.124630   0.122055
BF-B             3.57     0.68  23.14   26.44        1.96  21.13     -3.11     4.160363 -2.876169  -0.869409     2.016428      1.348059     1.579432        0.1   0.070364   0.102282
ONON             5.33     0.27  15.84   52.26        1
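
The reordering logic in the cell above can be wrapped in a small helper and paired with an order-insensitive equality check. This is a sketch on toy data taken from the commented-out example; the helper name `reorder_like` is hypothetical:

```python
import pandas as pd


def reorder_like(df: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Return `df` with shared columns in `reference` order, extras appended."""
    common = [col for col in reference.columns if col in df.columns]
    extras = [col for col in df.columns if col not in reference.columns]
    return df[common + extras]


# Toy frames: same data, different column order, one extra column
ref = pd.DataFrame({"id": [4, 5], "name": ["David", "Eve"], "age": [45, 50]})
other = pd.DataFrame(
    {"age": [45, 50], "id": [4, 5], "location": ["NY", "CA"], "name": ["David", "Eve"]}
)

reordered = reorder_like(other, ref)
print(reordered.columns.tolist())  # ['id', 'name', 'age', 'location']

# check_like=True makes the comparison ignore row/column order,
# so this passes even though the two frames list columns differently.
pd.testing.assert_frame_equal(ref, other.drop(columns="location"), check_like=True)
```

Applied to the notebook's frames, `pd.testing.assert_frame_equal(df1, df2, check_like=True)` would confirm whether the two parquet files differ only in column order.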

In [18]:
print(f'df1:\n{df1}\n')

df1:
        ROE %  Rel Volume  ATR/Price %  Change %  Debt/Eq   Price    RSI  Avg Volume, M     z_RSI  z_Change%  z_RelVolume  z_ATR/Price%  final_score  Weight_EW  Weight_IV  Weight_SW
Ticker                                                                                                                                                                               
BEKE     6.53        5.08     3.140227     -2.74     0.32   18.47  42.52           9.33 -0.965888  -0.695387     8.271048      0.351228     2.200533        0.1   0.093222   0.142504
ADBE    52.25        2.79     2.348856     -5.32     0.57  391.68  39.31           3.80 -1.252564  -1.908837     3.680317     -0.422064     1.884760        0.1   0.124630   0.122055
BF-B    23.14        1.96     4.160363     -3.11     0.68   26.44  21.13           3.57 -2.876169  -0.869409     2.016428      1.348059     1.579432        0.1   0.070364   0.102282
ONON    15.84        1.75     3.731343     -6.23     0.27   52.26  37.38           5.

In [12]:
for file in file_list:
    df = pd.read_parquet(DATA_DIR / file)
    print(f'{file} columns:\n{df.columns.tolist()}')
    print('==========================')


2025-04-28_df_finviz_merged_stocks_etfs.parquet columns:
['No.', 'Company', 'Index', 'Sector', 'Industry', 'Country', 'Exchange', 'Info', 'MktCap AUM, M', 'Rank', 'Market Cap, M', 'P/E', 'Fwd P/E', 'PEG', 'P/S', 'P/B', 'P/C', 'P/FCF', 'Book/sh', 'Cash/sh', 'Dividend %', 'Dividend TTM', 'Dividend Ex Date', 'Payout Ratio %', 'EPS', 'EPS next Q', 'EPS this Y %', 'EPS next Y %', 'EPS past 5Y %', 'EPS next 5Y %', 'Sales past 5Y %', 'Sales Q/Q %', 'EPS Q/Q %', 'EPS YoY TTM %', 'Sales YoY TTM %', 'Sales, M', 'Income, M', 'EPS Surprise %', 'Revenue Surprise %', 'Outstanding, M', 'Float, M', 'Float %', 'Insider Own %', 'Insider Trans %', 'Inst Own %', 'Inst Trans %', 'Short Float %', 'Short Ratio', 'Short Interest, M', 'ROA %', 'ROE %', 'ROI %', 'Curr R', 'Quick R', 'LTDebt/Eq', 'Debt/Eq', 'Gross M %', 'Oper M %', 'Profit M %', 'Perf 3D %', 'Perf Week %', 'Perf Month %', 'Perf Quart %', 'Perf Half %', 'Perf Year %', 'Perf YTD %', 'Beta', 'ATR', 'ATR/Price %', 'Volatility W %', 'Volatility M %', 

Note: rename `'ROI %'` to `'ROIC %'`.
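
One way to apply that column rename is `DataFrame.rename` with a mapping. A minimal sketch on a made-up one-row frame (the values and the `COLUMN_RENAMES` name are placeholders, not data from the files above):

```python
import pandas as pd

# Mapping of old column names to new ones (hypothetical constant name)
COLUMN_RENAMES = {"ROI %": "ROIC %"}

# Placeholder frame standing in for a loaded parquet file
df_old = pd.DataFrame({"ROE %": [1.0], "ROI %": [2.0]})

# rename() returns a new frame; keys missing from the frame are ignored by default
df_new = df_old.rename(columns=COLUMN_RENAMES)
print(df_new.columns.tolist())  # ['ROE %', 'ROIC %']
```

Because missing keys are ignored by default, the same mapping can be applied to files that already use the new name.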

In [26]:
# Get a list of all .csv files in the directory
# Using .glob('*.csv') is a safe way to get only the files you want
file_list = [f.name for f in DATA_DIR.glob('*.csv')]
file_list

['2025-04-25_my_selection_run_1.csv',
 '2025-04-28_my_selection_run_1.csv',
 '2025-04-29_my_selection_run_1.csv',
 '2025-04-30_my_selection_run_1.csv',
 '2025-05-01_my_selection_run_1.csv',
 '2025-05-02_my_selection_run_1.csv',
 '2025-05-05_my_selection_run_1.csv',
 '2025-05-06_my_selection_run_1.csv',
 '2025-05-07_my_selection_run_1.csv',
 '2025-05-08_my_selection_run_1.csv',
 '2025-05-09_my_selection_run_1.csv',
 '2025-05-12_my_selection_run_1.csv',
 '2025-05-13_my_selection_run_1.csv',
 '2025-05-14_my_selection_run_1.csv',
 '2025-05-15_my_selection_run_1.csv',
 '2025-05-16_my_selection_run_1.csv',
 '2025-05-19_my_selection_run_1.csv',
 '2025-05-20_my_selection_run_1.csv',
 '2025-05-21_my_selection_run_1.csv',
 '2025-05-22_my_selection_run_1.csv',
 '2025-05-23_my_selection_run_1.csv',
 '2025-05-27_my_selection_run_1.csv',
 '2025-05-28_my_selection_run_1.csv',
 '2025-05-29_my_selection_run_1.csv',
 '2025-05-30_my_selection_run_1.csv',
 '2025-06-02_my_selection_run_1.csv',
 '2025-06-03
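
If you also want to restrict the listing by a name fragment (for example the `df_finviz` files seen earlier), the fragment can go directly into the glob pattern. A sketch on a temporary folder with empty stand-in files; wrapping the result in `sorted()` is an added suggestion, since `glob()` does not promise any particular order:

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)  # stand-in for DATA_DIR
    for name in [
        "2025-04-28_df_finviz_merged_stocks_etfs.parquet",
        "2025-06-13_my_selection_run_1.parquet",
        "2025-06-13_my_selection_run_1.csv",
    ]:
        (data_dir / name).touch()

    # Extension only
    csv_files = sorted(p.name for p in data_dir.glob("*.csv"))
    # Extension plus a required fragment in the file name
    finviz_parquet = sorted(p.name for p in data_dir.glob("*df_finviz*.parquet"))

print(csv_files)       # ['2025-06-13_my_selection_run_1.csv']
print(finviz_parquet)  # ['2025-04-28_df_finviz_merged_stocks_etfs.parquet']
```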

In [29]:
import pathlib
import os

# # --- Setup: Create a dummy directory and files for a runnable example ---
# # (You can replace this with your actual DATA_DIR)
# DATA_DIR = pathlib.Path('./my_data_folder')
# DATA_DIR.mkdir(exist_ok=True)
# print(f'DATA_DIR: {DATA_DIR}') 

# # Create some example files to rename
# dummy_files = [
#     '2025-06-13_my_selection_run_1.parquet',
#     '2025-06-14_my_selection_run_1.parquet',
#     '2025-06-15_some_other_file.parquet', # This file should be ignored
# ]
# for fname in dummy_files:
#     (DATA_DIR / fname).touch()

# --- Core Renaming Logic ---

# Define the strings for replacement
string_to_find = 'my_selection_run_1'
string_to_replace = 'short_term_mean_reversion'

print(f"Scanning for files in: {DATA_DIR.resolve()}")
print("-" * 20)

# It's better to iterate directly over the Path objects from glob()
for old_file_path in DATA_DIR.glob('*.json'):
    # Check if the string to find is actually in the filename
    if string_to_find in old_file_path.name:
        
        # Create the new filename using string's .replace() method
        new_filename = old_file_path.name.replace(string_to_find, string_to_replace)
        
        # Create the full path for the new file
        # (It's in the same directory as the old file)
        new_file_path = old_file_path.parent / new_filename
        
        # Rename the file on the file system
        old_file_path.rename(new_file_path)
        
        print(f"Renamed: '{old_file_path.name}' -> '{new_file_path.name}'")
    else:
        print(f"Skipped: '{old_file_path.name}' (did not contain '{string_to_find}')")

print("-" * 20)
print("Renaming complete. Current files:")
for f in DATA_DIR.glob('*.json'):
    print(f"- {f.name}")

Scanning for files in: C:\Users\ping\Files_win10\python\py311\stocks\output\selection_results
--------------------
Renamed: '2025-04-25_my_selection_run_1_params.json' -> '2025-04-25_short_term_mean_reversion_params.json'
Renamed: '2025-04-28_my_selection_run_1_params.json' -> '2025-04-28_short_term_mean_reversion_params.json'
Renamed: '2025-04-29_my_selection_run_1_params.json' -> '2025-04-29_short_term_mean_reversion_params.json'
Renamed: '2025-04-30_my_selection_run_1_params.json' -> '2025-04-30_short_term_mean_reversion_params.json'
Renamed: '2025-05-01_my_selection_run_1_params.json' -> '2025-05-01_short_term_mean_reversion_params.json'
Renamed: '2025-05-02_my_selection_run_1_params.json' -> '2025-05-02_short_term_mean_reversion_params.json'
Renamed: '2025-05-05_my_selection_run_1_params.json' -> '2025-05-05_short_term_mean_reversion_params.json'
Renamed: '2025-05-06_my_selection_run_1_params.json' -> '2025-05-06_short_term_mean_reversion_params.json'
Renamed: '2025-05-07_my_selec

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'c:\\Users\\ping\\Files_win10\\python\\py311\\stocks\\output\\selection_results\\2025-06-13_my_selection_run_1_params.json' -> 'c:\\Users\\ping\\Files_win10\\python\\py311\\stocks\\output\\selection_results\\2025-06-13_short_term_mean_reversion_params.json'
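
The `FileExistsError` above is raised because a file with the target name is already present; on Windows, `Path.rename` refuses to overwrite it. A hedged sketch of a variant that checks for the target first and skips those files (the function name `rename_matching` and the demo file names are hypothetical):

```python
from pathlib import Path
import tempfile


def rename_matching(directory: Path, pattern: str, old: str, new: str) -> dict:
    """Rename files matching `pattern` whose name contains `old`; never overwrite."""
    results = {"renamed": [], "skipped_exists": []}
    # sorted() materializes the listing before any file is renamed
    for old_path in sorted(directory.glob(pattern)):
        if old not in old_path.name:
            continue
        new_path = old_path.with_name(old_path.name.replace(old, new))
        if new_path.exists():
            # Target already present: leave both files untouched
            results["skipped_exists"].append(old_path.name)
            continue
        old_path.rename(new_path)
        results["renamed"].append(new_path.name)
    return results


# Demo on a temporary folder with empty stand-in files
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for name in [
        "2025-06-12_my_selection_run_1_params.json",
        "2025-06-13_my_selection_run_1_params.json",
        "2025-06-13_short_term_mean_reversion_params.json",  # pre-existing target
    ]:
        (demo_dir / name).touch()

    summary = rename_matching(
        demo_dir, "*.json", "my_selection_run_1", "short_term_mean_reversion"
    )
    remaining = sorted(p.name for p in demo_dir.iterdir())

print(summary)
```

The explicit `exists()` check also matters on POSIX systems, where `Path.rename` would silently replace an existing target file instead of raising.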