### Corrected and Improved Code (Cell 1: Setup)

This first cell now not only sets up the path for your `src` modules but also defines key directory variables (`DATA_DIR`, `ROOT_DIR`) that you can use throughout your notebook.


In [3]:
import sys
from pathlib import Path
import pandas as pd
import os
from IPython.display import display, Markdown  # Assuming you use these for display

# --- 1. PANDAS OPTIONS (No change) ---
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1500)

# --- 2. IPYTHON AUTORELOAD (No change) ---
%load_ext autoreload
%autoreload 2

# --- 3. ROBUST PATH CONFIGURATION (MODIFIED) ---

# Get the current working directory of the notebook
NOTEBOOK_DIR = Path.cwd()

# Find the project ROOT directory by going two levels up from the notebook's location.
# NOTE: this assumes the notebook sits exactly two folders below the root;
# adjust the number of `.parent` hops if you move the notebook.
ROOT_DIR = NOTEBOOK_DIR.parent.parent

# Define key project directories relative to the ROOT
DATA_DIR = ROOT_DIR / 'data'
SRC_DIR = ROOT_DIR / 'src'
# You could also define an output directory here if needed
OUTPUT_DIR = ROOT_DIR / 'output'

# Add the 'src' directory to the Python path so you can import 'utils'
if str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))

# --- 4. VERIFICATION (IMPROVED) ---
print(f"✅ Project Root Directory: {ROOT_DIR}")
print(f"✅ Source Directory (for utils): {SRC_DIR}")
print(f"✅ Data Directory (for input): {DATA_DIR}")

# Verify that the key directories exist. This helps catch path errors early.
assert ROOT_DIR.exists(), f"ROOT directory not found at: {ROOT_DIR}"
assert SRC_DIR.exists(), f"Source directory not found at: {SRC_DIR}"
assert DATA_DIR.exists(), f"Data directory not found at: {DATA_DIR}"

# --- 5. IMPORT YOUR CUSTOM MODULE ---
# This will now work correctly
import utils
print("\n✅ Successfully imported 'utils' module.")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
✅ Project Root Directory: c:\Users\ping\Files_win10\python\py311\stocks
✅ Source Directory (for utils): c:\Users\ping\Files_win10\python\py311\stocks\src
✅ Data Directory (for input): c:\Users\ping\Files_win10\python\py311\stocks\data

✅ Successfully imported 'utils' module.
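
The `ROOT_DIR` lookup in the setup cell depends on how deep the notebook sits inside the project. As a minimal sketch of a depth-independent alternative, the helper below walks upward until it finds a folder that contains both `src` and `data`. The name `find_project_root` and the choice of marker folders are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path


def find_project_root(start: Path, markers=("src", "data")) -> Path:
    """Return the nearest ancestor of `start` (or `start` itself) that
    contains every directory named in `markers`.

    Hypothetical helper: the marker names are an assumption about the
    project layout, so change them to match your own repository.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (start, *start.parents):
        if all((candidate / name).is_dir() for name in markers):
            return candidate
    raise FileNotFoundError(
        f"No directory containing {markers} found at or above {start}"
    )
```

In the setup cell this could stand in for the `.parent.parent` line, for example `ROOT_DIR = find_project_root(Path.cwd())`. The existing `assert` checks would still be worth keeping as a second line of defense.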


### Corrected Code (Cell 2: Execution)

Now, in your second cell, you simply use the `DATA_DIR` variable we defined in the setup cell. This removes the fragile relative path `..\data`.

In [11]:
# Use the DATA_DIR variable defined in the setup cell
path_OHLCV, _, dir_list = utils.main_processor(
    data_dir=DATA_DIR,  # <-- Use the absolute path variable
    downloads_dir=None,
    downloads_limit=60,
    clean_name_override=None,
    # start_file_pattern='2025-04-25',
    start_file_pattern='2025-04-28',    
    contains_pattern=''
)

print(f'path_OHLCV: {path_OHLCV}')
df_OHLCV = pd.read_parquet(path_OHLCV)
print(f'df_OHLCV:\n{df_OHLCV.head()}\n')

# The .info() method prints to stdout directly, so no need for an f-string
print('df_OHLCV.info():')
df_OHLCV.info()

<span style='color:#00ffff;font-weight:500'>[Downloads] Scanned latest 60 files • Found 0 '2025-04-28' matches</span>

**Available 'starting with '2025-04-28'' files:**

- (1) `[DATA]` `2025-04-28_df_common_tickers_stocks_etfs.parquet` <span style='color:#00ffff'>(0.01 MB, 2025-05-14 11:35)</span>

- (2) `[DATA]` `2025-04-28_df_finviz_merged_stocks_etfs.parquet` <span style='color:#00ffff'>(0.85 MB, 2025-05-14 11:35)</span>

- (3) `[DATA]` `2025-04-28_df_finviz_n_ratios_stocks_etfs.parquet` <span style='color:#00ffff'>(0.83 MB, 2025-05-14 11:35)</span>

- (4) `[DATA]` `2025-04-28_df_finviz_stocks_etfs.parquet` <span style='color:#00ffff'>(0.53 MB, 2025-05-14 11:32)</span>

- (5) `[DATA]` `2025-04-28_df_perf_ratios_stocks_etfs.parquet` <span style='color:#00ffff'>(0.32 MB, 2025-05-14 11:34)</span>


Input a number to select file (1-5)



    **Selected paths:**
    - Source: `c:\Users\ping\Files_win10\python\py311\stocks\data\2025-04-28_df_common_tickers_stocks_etfs.parquet`
    - Destination: `c:\Users\ping\Files_win10\python\py311\stocks\data\2025-04-28_df_common_tickers_stocks_etfs_clean.parquet`
    

path_OHLCV: c:\Users\ping\Files_win10\python\py311\stocks\data\2025-04-28_df_common_tickers_stocks_etfs.parquet
df_OHLCV:
Empty DataFrame
Columns: []
Index: [AAPL, MSFT, NVDA, AMZN, GOOGL]

df_OHLCV.info():
<class 'pandas.core.frame.DataFrame'>
Index: 1522 entries, AAPL to FELG
Empty DataFrame


In [12]:
dir_list

['2025-04-28_df_common_tickers_stocks_etfs.parquet',
 '2025-04-28_df_finviz_merged_stocks_etfs.parquet',
 '2025-04-28_df_finviz_n_ratios_stocks_etfs.parquet',
 '2025-04-28_df_finviz_stocks_etfs.parquet',
 '2025-04-28_df_perf_ratios_stocks_etfs.parquet']

In [13]:
for file in dir_list:
    df = pd.read_parquet(DATA_DIR / file)
    print(f'{file} columns:\n{df.columns.tolist()}')
    print('==========================')


2025-04-28_df_common_tickers_stocks_etfs.parquet columns:
[]
2025-04-28_df_finviz_merged_stocks_etfs.parquet columns:
['No.', 'Company', 'Index', 'Sector', 'Industry', 'Country', 'Exchange', 'Info', 'MktCap AUM, M', 'Rank', 'Market Cap, M', 'P/E', 'Fwd P/E', 'PEG', 'P/S', 'P/B', 'P/C', 'P/FCF', 'Book/sh', 'Cash/sh', 'Dividend %', 'Dividend TTM', 'Dividend Ex Date', 'Payout Ratio %', 'EPS', 'EPS next Q', 'EPS this Y %', 'EPS next Y %', 'EPS past 5Y %', 'EPS next 5Y %', 'Sales past 5Y %', 'Sales Q/Q %', 'EPS Q/Q %', 'EPS YoY TTM %', 'Sales YoY TTM %', 'Sales, M', 'Income, M', 'EPS Surprise %', 'Revenue Surprise %', 'Outstanding, M', 'Float, M', 'Float %', 'Insider Own %', 'Insider Trans %', 'Inst Own %', 'Inst Trans %', 'Short Float %', 'Short Ratio', 'Short Interest, M', 'ROA %', 'ROE %', 'ROI %', 'Curr R', 'Quick R', 'LTDebt/Eq', 'Debt/Eq', 'Gross M %', 'Oper M %', 'Profit M %', 'Perf 3D %', 'Perf Week %', 'Perf Month %', 'Perf Quart %', 'Perf Half %', 'Perf Year %', 'Perf YTD %', 'Beta'

Note: change `'ROI %'` to `'ROIC %'`.
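
One way to read the note above is as a column rename. The sketch below assumes that reading and uses `DataFrame.rename`. The helper name `rename_roi_column` is hypothetical, and the small frame holds placeholder values only, not real Finviz data.

```python
import pandas as pd


def rename_roi_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with 'ROI %' renamed to 'ROIC %'.

    Hypothetical helper. `rename` ignores missing labels by default,
    so frames without an 'ROI %' column pass through unchanged.
    """
    return df.rename(columns={"ROI %": "ROIC %"})


# Placeholder values for illustration only (not real data).
sample = pd.DataFrame(
    {"ROA %": [1.0], "ROE %": [2.0], "ROI %": [3.0]},
    index=["AAPL"],
)
renamed = rename_roi_column(sample)
print(renamed.columns.tolist())
```

Because `rename` returns a new frame, the original `sample` keeps its `'ROI %'` label unless you reassign the result.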

In [15]:
df = pd.read_parquet(DATA_DIR / dir_list[0])
df

AAPL
MSFT
NVDA
AMZN
GOOGL
...
JAVA
AIRR
TFI
PAAA
FELG
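
The output above shows that the common-tickers file has an index of ticker symbols and no columns, which is why pandas reports it as an empty DataFrame. As a small sketch of working with that shape, the example below builds a stand-in frame with three of the tickers and pulls them out of the index. The column name `Ticker` is an assumption, since the source does not show the index name.

```python
import pandas as pd

# Stand-in for the index-only frame read from the parquet file.
tickers_df = pd.DataFrame(index=pd.Index(["AAPL", "MSFT", "NVDA"]))

# The frame counts as "empty" because it has zero columns,
# even though the index still carries one row label per ticker.
print(tickers_df.empty, tickers_df.shape)

# The tickers live in the index, so read them from there.
tickers = tickers_df.index.tolist()

# Optional: turn them into a regular column ("Ticker" is an assumed name).
as_column = pd.DataFrame({"Ticker": tickers})
print(as_column)
```

With the real file, the same `.index.tolist()` call on `df` would give the full list of symbols.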
