# 1. Project Title: Market Fear Regime Identification



#### Purpose: Data mining course final project
#### Description:
This project aims to identify periods of market fear using financial indicators and machine learning techniques. It involves collecting historical financial data, preprocessing it, and applying clustering algorithms to classify market regimes by fear level.

#### Tools and Technologies:
- Programming Language: Python
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
- Data Sources: Yahoo Finance, Kaggle datasets, CoinGecko API
- Environment: Jupyter Notebook
- Visualization: Matplotlib, Seaborn
- Machine Learning Algorithms: K-Means Clustering

#### Timeline and Steps Involved:
- Week 1:
  1. Data Collection: Gather historical financial data, including coin prices.
  2. Data Preprocessing: Clean the data, handle missing values, and normalize features.
- Week 2:
  3. Feature Selection: Identify relevant features that indicate market fear (further details TODO).
  4. Clustering: Apply K-Means clustering to classify market regimes based on the selected features.
- Week 3:
  5. Visualization: Visualize the clustering results to interpret the different market fear regimes.
  6. Analysis: Analyze the identified regimes and their characteristics.
  7. Documentation: Document the entire process and draft a report summarizing the findings.
  8. Presentation: Prepare a presentation to showcase the project results on **8th June 2026**.
  9. Submission: Submit the final report and code repository by **6th June 2026**.
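
The preprocessing and clustering steps listed above can be sketched roughly as follows. This is only an illustration on synthetic data: the two feature names (`daily_return`, `volatility`), the choice of `k=2`, and the random seed are assumptions, not the project's final settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix (assumption: two features).
rng = np.random.default_rng(0)
calm = rng.normal(loc=[0.01, 0.02], scale=0.005, size=(100, 2))
stressed = rng.normal(loc=[-0.03, 0.08], scale=0.005, size=(100, 2))
features = pd.DataFrame(
    np.vstack([calm, stressed]), columns=["daily_return", "volatility"]
)

# Preprocessing: standardize so no single feature dominates the distances.
scaled = StandardScaler().fit_transform(features)

# Clustering: K-Means with k=2 (assumed: one fear-like and one calmer regime).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["regime"] = kmeans.fit_predict(scaled)

# Analysis: per-regime averages help decide which cluster label means what.
summary = features.groupby("regime")[["daily_return", "volatility"]].mean()
print(summary)
```

K-Means labels are arbitrary integers, so the cluster with lower average return and higher volatility would have to be named the fear regime after inspecting `summary`.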


  
  

In [18]:
import yfinance as yf
import pandas as pd
import requests
import os
from datetime import datetime

## Data Selection: Coins selected from API

In [19]:
# Note: yfinance requires the -USD suffix at the end of coin tickers
core_coins = [
    'BTC-USD', 'ETH-USD', 'BNB-USD', 'SOL-USD', 'XRP-USD', 
    'ADA-USD', 'DOGE-USD', 'DOT-USD', 'LTC-USD', 'TRX-USD',
    'AVAX-USD', 'MATIC-USD', 'LINK-USD', 'ATOM-USD', 'UNI-USD',
    'AAVE-USD', 'XLM-USD', 'ALGO-USD', 'FIL-USD', 'VET-USD' 
]

print(f"total {len(core_coins)}")

total 20


In [20]:
# Time period for data retrieval
start_date = "2018-01-01"
end_date = datetime.now().strftime('%Y-%m-%d')

In [21]:
# data directories

RAW_DATA_PATH = "../data/row"
CLEAN_DATA_PATH = "../data/processed"

# print("cwd:", os.getcwd())
# print("RAW_DATA_PATH (abs):", os.path.abspath(RAW_DATA_PATH))
# print("RAW_DATA_PATH exists:", os.path.exists(RAW_DATA_PATH))
print("files in RAW_DATA_PATH:", os.listdir(RAW_DATA_PATH))



files in RAW_DATA_PATH: []


## Fetching Raw Data and primary cleaning

In [22]:

print("down load...")
data = yf.download(core_coins, start=start_date, end=end_date, progress=False)

# we only need the closing prices
if 'Adj Close' in data.columns.get_level_values(0):
    prices = data['Adj Close']
else:
    prices = data['Close']

# remove -USD suffix, so that column names are cleaner
prices.columns = [c.replace('-USD', '') for c in prices.columns]

# fill missing values with forward fill
prices = prices.ffill()

print("finished！total days:", len(prices))
print(prices.head())

down load...


finished！total days: 2926
            AAVE       ADA  ALGO  ATOM  AVAX       BNB           BTC  \
Date                                                                   
2018-01-01   NaN  0.728657   NaN   NaN   NaN   8.41461  13657.200195   
2018-01-02   NaN  0.782587   NaN   NaN   NaN   8.83777  14982.099609   
2018-01-03   NaN  1.079660   NaN   NaN   NaN   9.53588  15201.000000   
2018-01-04   NaN  1.114120   NaN   NaN   NaN   9.21399  15599.200195   
2018-01-05   NaN  0.999559   NaN   NaN   NaN  14.91720  17429.500000   

                DOGE  DOT         ETH        FIL      LINK         LTC  MATIC  \
Date                                                                            
2018-01-01  0.008909  NaN  772.640991  19.480200  0.733563  229.033005    NaN   
2018-01-02  0.009145  NaN  884.443970  20.110600  0.673712  255.684006    NaN   
2018-01-03  0.009320  NaN  962.719971  19.827499  0.681167  245.367996    NaN   
2018-01-04  0.009644  NaN  980.921997  20.417801  0.984368  241.

In [23]:
print(type(prices))
print(prices.columns)   
prices.head()
prices.tail()

<class 'pandas.core.frame.DataFrame'>
Index(['AAVE', 'ADA', 'ALGO', 'ATOM', 'AVAX', 'BNB', 'BTC', 'DOGE', 'DOT',
       'ETH', 'FIL', 'LINK', 'LTC', 'MATIC', 'SOL', 'TRX', 'UNI', 'VET', 'XLM',
       'XRP'],
      dtype='object')


Date,AAVE,ADA,ALGO,ATOM,AVAX,BNB,BTC,DOGE,DOT,ETH,FIL,LINK,LTC,MATIC,SOL,TRX,UNI,VET,XLM,XRP
2025-12-31,146.157852,0.33283,0.110272,1.926895,12.304885,863.257385,87508.828125,0.117294,1.789199,2967.037598,1.295499,12.188008,76.777306,0.216415,124.484467,0.284271,0.000163,0.010411,0.200651,1.839973
2026-01-01,148.690201,0.356212,0.120008,2.065787,13.549633,863.054626,88731.984375,0.12668,1.996028,3000.394287,1.485947,12.586159,79.843811,0.216415,126.761124,0.286523,0.000163,0.011062,0.208512,1.877936
2026-01-02,165.010849,0.393758,0.127333,2.16483,13.79994,880.844177,89944.695312,0.141673,2.161198,3124.422607,1.501941,13.263242,81.82328,0.216415,132.133667,0.288772,0.000163,0.011684,0.218848,2.005817
2026-01-03,163.000946,0.389375,0.127868,2.238555,13.9884,878.639465,90603.1875,0.143058,2.125039,3125.91748,1.494478,13.231463,82.052773,0.216415,133.298477,0.295363,0.000163,0.011707,0.222008,2.017407
2026-01-04,163.916565,0.400091,0.135829,2.337571,14.227369,894.383667,91413.492188,0.149308,2.13985,3140.710449,1.490114,13.39876,82.133148,0.216415,133.899612,0.293874,0.000163,0.012114,0.232537,2.090021
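
In the rows above, MATIC and UNI show the same value on every date, which may indicate that forward filling is carrying an old price forward for tickers that no longer return fresh data. A minimal sketch for flagging such columns is given below; `trailing_flat_days` is a hypothetical helper and the toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def trailing_flat_days(prices: pd.DataFrame) -> pd.Series:
    """Count, per column, how many final rows repeat the previous value."""
    result = {}
    for col in prices.columns:
        # The first row (NaN diff) is treated as a change so a match always exists.
        changed = prices[col].diff().fillna(1).ne(0).to_numpy()
        last_change = np.flatnonzero(changed)[-1]
        result[col] = len(changed) - 1 - last_change
    return pd.Series(result, name="trailing_flat_days")


# Toy data: LIVE keeps moving, STALE is frozen for its last three rows.
toy = pd.DataFrame({
    "LIVE": [1.0, 1.1, 1.2, 1.3, 1.4],
    "STALE": [2.0, 2.1, 2.1, 2.1, 2.1],
})
flat = trailing_flat_days(toy)
print(flat[flat >= 3])
```

Applied to `prices`, columns with a long trailing flat run would be candidates for dropping or re-downloading before any return or volatility features are computed.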


## Fetch Fear & Greed

#### Important Note: Fear & Greed Index vs Regime

These two terms look similar but mean different things, so they need to be distinguished:

- **Fear & Greed Index (fg_raw)**: A pre-calculated sentiment score (0-100) from alternative.me. This is just one input feature we collect, like price or volatility.
- **Regime**: The market state we IDENTIFY using K-Means clustering. We use multiple features (price, volatility, fg_raw, market breadth) to determine whether the market is in a Fear Regime or a Greed Regime.

So the goal is not to predict the sentiment index, but to identify which regime the market is in by combining multiple signals including (but not limited to) the sentiment index.
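
As a rough illustration of how the signals named above might be combined into one feature table, the sketch below derives a return, a rolling volatility, and a market-breadth column from a price matrix and appends `fg_raw`. The helper name `build_regime_features`, the 7-day window, the use of BTC as the reference coin, and the breadth definition (share of coins with a positive daily return) are all assumptions for this example.

```python
import numpy as np
import pandas as pd


def build_regime_features(prices: pd.DataFrame, fg_raw: pd.Series,
                          vol_window: int = 7) -> pd.DataFrame:
    """Combine price-based signals with the sentiment index (hypothetical helper)."""
    log_ret = np.log(prices).diff()
    feats = pd.DataFrame(index=prices.index)
    feats["btc_return"] = log_ret["BTC"]
    feats["btc_volatility"] = log_ret["BTC"].rolling(vol_window).std()
    # Breadth: share of coins with a positive return that day; coins with
    # no data on a given day are left out of the average.
    up = (log_ret > 0).astype(float).where(log_ret.notna())
    feats["breadth"] = up.mean(axis=1)
    feats["fg_raw"] = fg_raw.reindex(prices.index).ffill()
    return feats.dropna()


# Toy inputs: BTC rises every day, ETH falls every day.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
toy_prices = pd.DataFrame(
    {"BTC": np.linspace(100, 109, 10), "ETH": np.linspace(50, 41, 10)},
    index=idx,
)
toy_fg = pd.Series(np.arange(20.0, 30.0), index=idx)
feats = build_regime_features(toy_prices, toy_fg, vol_window=3)
print(feats.head())
```

The resulting table is what a clustering step would consume; `fg_raw` sits next to the other columns as one input among several rather than as a target.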

In [None]:
# get Fear & Greed Index data (limit=0 returns the full history)
url = "https://api.alternative.me/fng/?limit=0"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
fg_json = response.json()['data']

# convert to DataFrame (fg = fear & greed)
fg_df = pd.DataFrame(fg_json)
fg_df['value'] = fg_df['value'].astype(float)
# timestamps arrive as strings; cast to int before parsing them as Unix seconds
fg_df['date'] = pd.to_datetime(fg_df['timestamp'].astype(int), unit='s')
fg_df = fg_df[['date', 'value']].rename(columns={'value': 'fg_raw'})
fg_df.set_index('date', inplace=True)
fg_df.sort_index(inplace=True)

print(f"finished, total {len(fg_df)} days")
fg_df.tail()

finished, total 2892 days


date,fg_raw
2026-01-01,20.0
2026-01-02,28.0
2026-01-03,29.0
2026-01-04,25.0
2026-01-05,26.0


## Combine all data



In [28]:
data_all = prices.join(fg_df, how='left')

#### Clean

In [None]:
# if some dates are missing, fill with the previous day's value
data_all['fg_raw'] = data_all['fg_raw'].ffill()
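
The effect of the left join followed by a forward fill can be seen on a small made-up example. The dates and values below are hypothetical; the point is that every price date is kept, gaps inside the sentiment series are filled from the previous day, and rows before the first sentiment observation stay `NaN`.

```python
import pandas as pd

price_idx = pd.date_range("2024-01-01", periods=5, freq="D", name="Date")
toy_prices = pd.DataFrame(
    {"BTC": [100.0, 101.0, 102.0, 103.0, 104.0]}, index=price_idx
)

# Sentiment starts one day late and skips 2024-01-04 (hypothetical gaps).
fg_idx = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-05"])
toy_fg = pd.DataFrame({"fg_raw": [30.0, 35.0, 40.0]}, index=fg_idx)

# Left join: the price index decides which rows exist.
joined = toy_prices.join(toy_fg, how="left")
n_missing = int(joined["fg_raw"].isna().sum())

# Forward fill only looks backwards, so the leading gap remains.
joined["fg_raw"] = joined["fg_raw"].ffill()
print(f"missing before ffill: {n_missing}")
print(joined)
```

If the leading `NaN` rows matter for later steps, they would need to be dropped or handled explicitly, since `ffill` alone does not touch them.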

In [None]:
# save final data (reuses CLEAN_DATA_PATH defined above)
data_all.to_csv(os.path.join(CLEAN_DATA_PATH, "full_market_matrix.csv"))
print("save  final data to processed folder")
data_all.tail()

save  final data to processed folder


Date,AAVE,ADA,ALGO,ATOM,AVAX,BNB,BTC,DOGE,DOT,ETH,...,LINK,LTC,MATIC,SOL,TRX,UNI,VET,XLM,XRP,fg_raw
2025-12-31,146.157852,0.33283,0.110272,1.926895,12.304885,863.257385,87508.828125,0.117294,1.789199,2967.037598,...,12.188008,76.777306,0.216415,124.484467,0.284271,0.000163,0.010411,0.200651,1.839973,21.0
2026-01-01,148.690201,0.356212,0.120008,2.065787,13.549633,863.054626,88731.984375,0.12668,1.996028,3000.394287,...,12.586159,79.843811,0.216415,126.761124,0.286523,0.000163,0.011062,0.208512,1.877936,20.0
2026-01-02,165.010849,0.393758,0.127333,2.16483,13.79994,880.844177,89944.695312,0.141673,2.161198,3124.422607,...,13.263242,81.82328,0.216415,132.133667,0.288772,0.000163,0.011684,0.218848,2.005817,28.0
2026-01-03,163.000946,0.389375,0.127868,2.238555,13.9884,878.639465,90603.1875,0.143058,2.125039,3125.91748,...,13.231463,82.052773,0.216415,133.298477,0.295363,0.000163,0.011707,0.222008,2.017407,29.0
2026-01-04,163.916565,0.400091,0.135829,2.337571,14.227369,894.383667,91413.492188,0.149308,2.13985,3140.710449,...,13.39876,82.133148,0.216415,133.899612,0.293874,0.000163,0.012114,0.232537,2.090021,25.0


## Final Data Preparation

In [26]:
print(f"Coins range: {len(data_all.columns) - 1}")  # minus 1 for fg_raw column
print(f"data range: {data_all.index.min().date()} to {data_all.index.max().date()}")
print(f"Total days data: {len(data_all)}")
print("\n:")
data_all[['BTC', 'ETH', 'SOL', 'fg_raw']].tail()

Coins range: 20
data range: 2018-01-01 to 2026-01-04
Total days data: 2926

:


Date,BTC,ETH,SOL,fg_raw
2025-12-31,87508.828125,2967.037598,124.484467,21.0
2026-01-01,88731.984375,3000.394287,126.761124,20.0
2026-01-02,89944.695312,3124.422607,132.133667,28.0
2026-01-03,90603.1875,3125.91748,133.298477,29.0
2026-01-04,91413.492188,3140.710449,133.899612,25.0
