# Chess Opening Recommender: Style Matching & Clustering

## Overview  
In this phase, we take the **per-player style vectors** computed in Phase 2 and find which **elite players** our target user most closely resembles. We will:

1. Build a style database of reference (elite) players.  
2. Optionally visualize the “style space” with PCA.  
3. Cluster elite players into style archetypes.  
4. Compute distances between the user’s vector and each elite player to identify top stylistic neighbors.
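Step 2 is optional. The snippet below is a rough sketch of what it involves: it standardizes the elite style features and projects them onto two principal components with scikit-learn's `PCA`. It is not the project's `reduce_dimensions` helper, whose code isn't shown here. The helper name `project_style_space` and the feature names in the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_style_space(style_vectors: pd.DataFrame, n_components: int = 2):
    """Hypothetical helper: project per-player style features onto their top principal components.

    Returns (coords, explained_variance_ratio).
    """
    features = style_vectors.drop(columns=["player"])
    # PCA is scale-sensitive, so z-score every metric before projecting.
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=n_components)
    coords = pd.DataFrame(
        pca.fit_transform(scaled),
        columns=[f"PC{i + 1}" for i in range(n_components)],
    )
    coords.insert(0, "player", style_vectors["player"].to_numpy())
    # Share of total variance each component captures; a low sum means the 2-D view hides a lot.
    return coords, pca.explained_variance_ratio_


# Toy data with hypothetical feature names, standing in for the elite style vectors.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.normal(size=(10, 4)),
    columns=["aggression", "tactical", "draw_rate", "avg_length"],
)
toy.insert(0, "player", [f"p{i}" for i in range(10)])

coords, evr = project_style_space(toy)
print(coords.head())
print("explained variance ratio:", evr.round(3))
```

The `coords` frame can be passed straight to `plt.scatter(coords["PC1"], coords["PC2"])`. If `evr.sum()` is low, the 2-D picture is only a rough guide, and distances should still be computed in the full feature space.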

## Goal  
- Identify a small set of **style peers** among stronger players whose openings we will use.

## Purpose  
- Translate raw style metrics into actionable comparisons.  
- Lay the groundwork for personalized opening recommendations based on stylistic similarity.


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances
from tqdm import tqdm
from pathlib import Path
```

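```python
# Absolute paths from the author's machine; point DATA_DIR at your own checkout.
DATA_DIR = Path("/Users/nicholasvega/Downloads/chess-opening-recommender/src/data")
USER_STYLE_CSV = DATA_DIR / "Chessanonymous1_style_vector.csv"  # target user's style vector
ELITE_STYLE_CSV = DATA_DIR / "elite_style_vectors.csv"          # one style vector per elite player
```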

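```python
# User file: first column is the index (one row per style feature), values in a 'value' column.
user_style_df = pd.read_csv(USER_STYLE_CSV, index_col=0)
user_style_series = user_style_df["value"]

# Elite file: one row per player, a 'player' column plus one column per style feature.
elite_style_vectors = pd.read_csv(ELITE_STYLE_CSV)
```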
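The two files have different shapes: the user's vector is read as a Series indexed by feature name, while the elite file is wide, with a `player` column and one column per feature. A distance is only meaningful if both sides list the same features in the same order, so a quick alignment check like the sketch below can catch mismatches early. The in-memory CSVs, the `feature` column name, and the feature names are hypothetical stand-ins for the real files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSVs; the real files live under DATA_DIR.
USER_CSV = io.StringIO("feature,value\naggression,0.61\ndraw_rate,0.18\ntactical,0.44\n")
ELITE_CSV = io.StringIO(
    "player,aggression,tactical,draw_rate\n"
    "gm_a,0.70,0.50,0.10\n"
    "gm_b,0.30,0.40,0.35\n"
)

user_series = pd.read_csv(USER_CSV, index_col=0)["value"]
elite_df = pd.read_csv(ELITE_CSV)

# Every elite column except the name is a style feature.
feature_cols = [c for c in elite_df.columns if c != "player"]

# Reorder the user's features to match the elite columns; features absent from the user file become NaN.
aligned_user = user_series.reindex(feature_cols)
missing = aligned_user[aligned_user.isna()].index.tolist()
extra = sorted(set(user_series.index) - set(feature_cols))
if missing or extra:
    raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")

print(aligned_user)
```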

### 3.1 Cluster Elite Style Vectors 

```python
import sys

# Make the project's src/ directory importable so the recommender package resolves.
sys.path.append("/Users/nicholasvega/Downloads/chess-opening-recommender/src")
from recommender.clustering import cluster_styles, reduce_dimensions, find_style_neighbors
```

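```python
# Cluster on the numeric style features only; the player name is not a feature.
# cluster_styles returns the clustered features, the fitted scaler, and the k-means model.
clustered_elite, scaler, kmeans_model = cluster_styles(
    elite_style_vectors.drop(columns=["player"]),
    n_clusters=4,
)

# Re-attach player names (relies on cluster_styles preserving the row index).
clustered_elite["player"] = elite_style_vectors["player"]

print("Players per cluster:")
print(clustered_elite["cluster"].value_counts())
```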

```text
Players per cluster:
cluster
0    112
1     62
2     58
3     41
Name: count, dtype: int64
```
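`cluster_styles` comes from the project's `recommender.clustering` module, and its implementation isn't shown here. Judging only from its return values (a labelled frame, a scaler, and a k-means model), one plausible equivalent is to z-score the features with `StandardScaler` and fit scikit-learn's `KMeans`. The sketch below is a hypothetical stand-in under that assumption (`cluster_styles_sketch` is not the project's function), demonstrated on synthetic data with hypothetical feature names.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_styles_sketch(features: pd.DataFrame, n_clusters: int = 4, random_state: int = 42):
    """Hypothetical stand-in for recommender.clustering.cluster_styles.

    Returns a copy of `features` with a 'cluster' column, plus the fitted scaler and model.
    """
    scaler = StandardScaler()
    scaled = scaler.fit_transform(features)
    # n_init is set explicitly because its default changed across scikit-learn versions.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(scaled)
    clustered = features.copy()  # keeps the input index so a 'player' column can be re-attached
    clustered["cluster"] = labels
    return clustered, scaler, kmeans


# Toy data: three well-separated blobs of 20 "players" each in a hypothetical 3-feature space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
toy = pd.DataFrame(
    np.vstack([c + rng.normal(scale=0.3, size=(20, 3)) for c in centers]),
    columns=["aggression", "tactical", "draw_rate"],
)

clustered, scaler, model = cluster_styles_sketch(toy, n_clusters=3)
print(clustered["cluster"].value_counts())
```

Because the result keeps the input's index, re-attaching `player` by index alignment works as long as the features frame was not re-indexed between loading and clustering.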


### 3.2 Find User's Top Stylistic Neighbors 

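```python
from IPython.display import display  # available implicitly in Jupyter; imported explicitly here

# Numeric style features: every column except the player name and the cluster label.
feature_cols = [c for c in clustered_elite.columns if c not in ("player", "cluster")]

top_peers = find_style_neighbors(
    user_vector=user_style_series,
    style_vectors=clustered_elite[["player"] + feature_cols],
    scaler=scaler,  # scaler fitted during clustering
    top_n=5,
)

print("Top 5 stylistic peers:")
display(top_peers)
```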

Top 5 stylistic peers:

| # | player | distance |
|---|--------------|----------|
| 0 | Attack2GM | 0.677016 |
| 1 | Neftegor | 0.725788 |
| 2 | Arteler | 0.790166 |
| 3 | Sakh_chess_2 | 0.823168 |
| 4 | rtahmass | 0.913448 |
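Likewise, `find_style_neighbors` lives in the project module and isn't reproduced here. The output only tells us it returns a `player`/`distance` table sorted from closest to farthest. A minimal sketch, assuming it transforms both the user vector and the elite vectors with the scaler fitted during clustering and ranks players by Euclidean distance (computed with scikit-learn's `pairwise_distances`), could look like this. The helper name `find_style_neighbors_sketch` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler


def find_style_neighbors_sketch(user_vector: pd.Series, style_vectors: pd.DataFrame, scaler, top_n: int = 5):
    """Hypothetical stand-in: rank players by Euclidean distance to the user in standardized space."""
    feature_cols = [c for c in style_vectors.columns if c not in ("player", "cluster")]
    # Put the user's features in the same order as the columns the scaler was fitted on.
    aligned = user_vector.reindex(feature_cols)
    if aligned.isna().any():
        raise ValueError(f"user vector is missing features: {aligned[aligned.isna()].index.tolist()}")
    user_scaled = scaler.transform(aligned.to_frame().T)  # one-row DataFrame with matching names
    elite_scaled = scaler.transform(style_vectors[feature_cols])
    dists = pairwise_distances(user_scaled, elite_scaled, metric="euclidean").ravel()
    result = pd.DataFrame({"player": style_vectors["player"].to_numpy(), "distance": dists})
    # Smallest distance first, matching the ordering of the table above.
    return result.nsmallest(top_n, "distance").reset_index(drop=True)


elite = pd.DataFrame({
    "player": ["gm_a", "gm_b", "gm_c", "gm_d"],
    "aggression": [0.70, 0.30, 0.55, 0.10],
    "tactical": [0.50, 0.40, 0.45, 0.90],
    "draw_rate": [0.10, 0.35, 0.20, 0.05],
})
scaler = StandardScaler().fit(elite.drop(columns=["player"]))

# The user's features arrive in a different order than the elite columns.
user = pd.Series({"draw_rate": 0.20, "aggression": 0.55, "tactical": 0.45})
peers = find_style_neighbors_sketch(user, elite, scaler, top_n=2)
print(peers)
```

Reindexing the user Series before calling `scaler.transform` matters: recent scikit-learn versions check that feature names match those seen at fit time and reject a frame whose columns are out of order, while a plain NumPy array would be accepted and silently misaligned.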
