# 03 · Event Recommender (Learning-to-Rank)

Goal: recommend games that fit a specific **event** (number of players, duration, tags).

Pipeline:
1) Load `data/games_features.parquet`
2) Select candidates with a hard filter (players/time) plus tag similarity (TF-IDF)
3) Train LambdaMART (LightGBM ranker) on synthetic data
4) `recommend_event(players, duration, tags, k)` returns the top `k` games

In [8]:
# (Optional) if lightgbm is missing, install it in Anaconda:
# conda install -c conda-forge lightgbm
import os, numpy as np, pandas as pd
import lightgbm as lgb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DATA_DIR = "data"
games_path = os.path.join(DATA_DIR, "games_features.parquet")
assert os.path.exists(games_path), "Missing data/games_features.parquet; run 01_feature_engineering.ipynb"

games = pd.read_parquet(games_path)
games["tags"] = (games["categories"].fillna("") + " " + games["mechanics"].fillna("")).str.lower()
print("Number of games:", len(games))

Number of games: 5


## TF-IDF vectorizer for tags (categories + mechanics)

In [9]:
vec = TfidfVectorizer(token_pattern=r"[a-zA-Z\-]+")
X = vec.fit_transform(games["tags"])  # sparse matrix, one row per game
X.shape

(5, 101)
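The effect of the custom `token_pattern` is easiest to see on a toy corpus. The following is a minimal sketch, assuming made-up tag strings (`toy_tags` is not taken from the dataset): hyphenated tags survive as single tokens, and only the game that contains the query term gets a non-zero cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy tag strings, invented for illustration (not rows from games_features.parquet)
toy_tags = [
    "medical cooperative hand-management",
    "trains set-collection",
    "party word",
]

# Same token pattern as the notebook: letters and hyphens form one token
toy_vec = TfidfVectorizer(token_pattern=r"[a-zA-Z\-]+")
toy_X = toy_vec.fit_transform(toy_tags)

# Hyphenated tags stay single vocabulary entries
vocab = sorted(toy_vec.vocabulary_)

# Cosine similarity of every toy game to the query "cooperative"
toy_sims = cosine_similarity(toy_X, toy_vec.transform(["cooperative"])).ravel()
best = int(toy_sims.argmax())
print(vocab)
print(toy_sims, "-> best match:", toy_tags[best])
```

A consequence of this tokenization is that a query tag only matches when it is spelled the same way as in the data: `hand-management` and `hand management` produce different tokens.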

## Candidate selection for an event

In [10]:
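def candidates(players:int, duration:int, tags:list|None, topk:int=50, time_tolerance:float=1.2):
    # Hard filter: player count within the game's range, playing time within the tolerance
    ok = (games["min_players"] <= players) & (games["max_players"] >= players) & (games["playing_time"] <= int(duration * time_tolerance))
    g = games[ok].copy()
    if tags and not g.empty:
        q = " ".join(t.lower() for t in tags)
        qv = vec.transform([q])
        # Rows of X are positional, so select by position rather than by g.index labels
        g["sim"] = cosine_similarity(X[np.flatnonzero(ok.to_numpy())], qv).ravel()
    else:
        g["sim"] = 0.0
    # g.get("pop_norm", 0) would return the scalar 0 (no .fillna) if the column were missing
    pop = g["pop_norm"].fillna(0) if "pop_norm" in g.columns else 0.0
    g["cand_score"] = 0.7 * g["sim"] + 0.3 * pop
    return g.sort_values("cand_score", ascending=False).head(topk)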
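The two stages of `candidates` (a hard filter, then a blended score) can be traced by hand on a small table. This is a sketch under stated assumptions: the `toy` catalogue and all of its values are invented, and `sim` is given directly instead of being computed by TF-IDF.

```python
import pandas as pd

# Hypothetical toy catalogue; names and numbers are invented for illustration
toy = pd.DataFrame({
    "name": ["Short Duel", "Mid Co-op", "Long Epic", "Quick Party"],
    "min_players": [2, 2, 3, 3],
    "max_players": [2, 4, 6, 8],
    "playing_time": [30, 70, 120, 20],
    "sim": [0.0, 0.6, 0.2, 0.0],
    "pop_norm": [0.9, 0.5, 0.8, 0.4],
})

players, duration, time_tolerance = 4, 60, 1.2
time_limit = int(duration * time_tolerance)  # 20% slack on the requested duration

# Hard filter: player count inside [min, max] and playing time within the slack
fits = (
    (toy["min_players"] <= players)
    & (toy["max_players"] >= players)
    & (toy["playing_time"] <= time_limit)
)
shortlist = toy[fits].copy()

# Soft score: the same 0.7 / 0.3 blend of tag similarity and popularity
shortlist["cand_score"] = 0.7 * shortlist["sim"] + 0.3 * shortlist["pop_norm"]
shortlist = shortlist.sort_values("cand_score", ascending=False)
print(shortlist[["name", "cand_score"]])
```

The 70-minute game passes only because of the tolerance; with `time_tolerance=1.0` it would be filtered out before scoring.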

## Synthetic training set for LambdaMART
In practice the ranker would be trained on real *plays*; this is only a demo. For each query (event)
we take up to 20 candidates and label the top one (by `cand_score`) as positive.

In [11]:
rng = np.random.default_rng(42)
events = []
player_opts = [2,3,4,5,6]
dur_opts = [30,45,60,90,120]
tag_opts = [[], ["cooperative"], ["party","word"], ["card-drafting"], ["strategy"], ["family"], ["deckbuilding"]]

for p in player_opts:
    for dur in dur_opts:
        for tags in tag_opts:
            cands = candidates(p, dur, tags, topk=20)
            if len(cands) == 0:
                continue
            cands = cands.reset_index(drop=False).rename(columns={"index":"rowidx"})
            cands["label"] = 0
            cands.loc[cands.index[0], "label"] = 1  # top candidate is the (synthetic) positive
            cands["players_req"] = p
            cands["duration_req"] = dur
            cands["tags_req"] = " ".join(tags)
            events.append(cands)

data = pd.concat(events, ignore_index=True)
print("Training rows:", len(data), "| #queries:", data.groupby(["players_req","duration_req","tags_req"]).ngroups)
data.head(3)

Training rows: 399 | #queries: 175


| | rowidx | bgg_id | name | min_players | max_players | playing_time | year | categories | mechanics | description | ... | rating_count | weight | tags | pop_norm | sim | cand_score | label | players_req | duration_req | tags_req |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 173346 | 7 Wonders Duel | 2 | 2 | 30 | 2015 | ancient card game city building civilization e... | end game bonuses income melding and splaying m... | In many ways 7 Wonders Duel resembles its pare... | ... | 104654 | 2.2267 | ancient card game city building civilization e... | 0.68676 | 0.0 | 0.206028 | 1 | 2 | 30 | |
| 1 | 3 | 68448 | 7 Wonders | 2 | 7 | 30 | 2010 | ancient card game city building civilization e... | closed drafting end game bonuses hand manageme... | You are the leader of one of the 7 great citie... | ... | 110220 | 2.3146 | ancient card game city building civilization e... | 0.259467 | 0.0 | 0.07784 | 0 | 2 | 30 | |
| 2 | 0 | 173346 | 7 Wonders Duel | 2 | 2 | 30 | 2015 | ancient card game city building civilization e... | end game bonuses income melding and splaying m... | In many ways 7 Wonders Duel resembles its pare... | ... | 104654 | 2.2267 | ancient card game city building civilization e... | 0.68676 | 0.0 | 0.206028 | 1 | 2 | 30 | cooperative |
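Because every query carries exactly one positive label, a ranking metric such as NDCG (the metric passed to LightGBM in the training cell) depends only on the position of that single positive: with gain `2**label - 1`, it equals `1 / log2(1 + rank)`. The helper functions below are a hypothetical sketch of that arithmetic, not part of the notebook.

```python
import numpy as np

def dcg(labels_in_ranked_order):
    """Discounted cumulative gain with the usual 2**label - 1 gain."""
    rel = np.asarray(labels_in_ranked_order, dtype=float)
    positions = np.arange(1, rel.size + 1)
    return float(np.sum((2.0 ** rel - 1.0) / np.log2(positions + 1)))

def ndcg(labels_in_ranked_order):
    """DCG normalised by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0

ndcg_top = ndcg([1, 0, 0])    # positive ranked first
ndcg_third = ndcg([0, 0, 1])  # positive ranked third: 1 / log2(4)
print(ndcg_top, ndcg_third)
```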


## Ranker features + LightGBM training (LambdaMART)

In [12]:
# Penalties for a player-count mismatch / exceeding the requested time
# (players_center_dist is always 0 here because candidates() already filters on player count)
data["players_center_dist"] = np.maximum(0, np.maximum(data["min_players"] - data["players_req"], data["players_req"] - data["max_players"]))
data["time_over"] = np.maximum(0, data["playing_time"] - data["duration_req"])  # minutes over the requested duration

feat_cols = ["sim", "pop_norm", "weight", "players_center_dist", "time_over", "playing_time"]
X_train = data[feat_cols]
y_train = data["label"]
group = data.groupby(["players_req","duration_req","tags_req"], sort=False).size().values  # group (query) sizes, in row order

train_set = lgb.Dataset(X_train, label=y_train, group=group)
params = dict(objective="lambdarank", metric="ndcg", learning_rate=0.05, num_leaves=63)
model = lgb.train(params, train_set, num_boost_round=120)  # no verbose_eval
print("Model trained. Features:", feat_cols)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000075 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 16
[LightGBM] [Info] Number of data points in the train set: 399, number of used features: 3
Model trained. Features: ['sim', 'pop_norm', 'weight', 'players_center_dist', 'time_over', 'playing_time']
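LightGBM reads `group` as the sizes of consecutive blocks of rows, so the sizes have to be listed in the same order as the rows appear in the training frame. The sketch below, using a hypothetical `qid` column, shows how a default `groupby` (which sorts by key) can return the sizes in a different order than the rows.

```python
import pandas as pd

# Hypothetical query ids, in the order their rows appear in the training frame
rows = pd.DataFrame({"qid": ["b", "b", "b", "a", "a", "c"]})

# Default groupby sorts by key (a, b, c): sizes no longer follow the row order
sizes_sorted = rows.groupby("qid").size().tolist()

# sort=False keeps first-appearance order (b, a, c), matching the row blocks
sizes_in_order = rows.groupby("qid", sort=False).size().tolist()

print("sorted by key:", sizes_sorted)
print("in row order: ", sizes_in_order)
```

Both lists sum to the number of rows, so LightGBM would accept either one without an error; only the second assigns each row to the right query.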


## Inference: recommendations for an event

In [13]:
def recommend_event(players:int, duration:int, tags:list|None, k:int=5):
    # requires a trained 'model'
    try:
        _ = model
    except NameError:
        raise RuntimeError("Model is not trained; run the cell with lgb.train().")

    df = candidates(players, duration, tags, topk=40)
    if df.empty:
        return df
    df = df.copy()
    df["players_center_dist"] = np.maximum(0, np.maximum(df["min_players"] - players, players - df["max_players"]))
    df["time_over"] = np.maximum(0, df["playing_time"] - duration)
    feat_cols = ["sim", "pop_norm", "weight", "players_center_dist", "time_over", "playing_time"]
    df["score"] = model.predict(df[feat_cols])
    cols = ["name","min_players","max_players","playing_time","weight","categories","mechanics","score"]
    return df.sort_values("score", ascending=False).head(k)[cols]

print("Recommendations for 4 players / 60 min / cooperative:")
display(recommend_event(4, 60, ["cooperative"], k=5))
print("\nRecommendations for 6 players / 30 min / party, word:")
display(recommend_event(6, 30, ["party","word"], k=5))

Recommendations for 4 players / 60 min / cooperative:

| | name | min_players | max_players | playing_time | weight | categories | mechanics | score |
|---|---|---|---|---|---|---|---|---|
| 3 | 7 Wonders | 2 | 7 | 30 | 2.3146 | ancient card game city building civilization e... | closed drafting end game bonuses hand manageme... | -0.283695 |
| 2 | Pandemic | 2 | 4 | 45 | 2.3956 | medical travel | action points chaining contracts cooperative g... | -2.187207 |
| 1 | Ticket to Ride | 2 | 5 | 60 | 1.8216 | trains | connections contracts end game bonuses hand ma... | -4.523301 |

Recommendations for 6 players / 30 min / party, word:

| | name | min_players | max_players | playing_time | weight | categories | mechanics | score |
|---|---|---|---|---|---|---|---|---|
| 3 | 7 Wonders | 2 | 7 | 30 | 2.3146 | ancient card game city building civilization e... | closed drafting end game bonuses hand manageme... | -0.283695 |


### Notes
- In practice, replace the synthetic training set with *plays* logs from BGG.
- `time_tolerance` in `candidates` can be adjusted (the default 1.2 means 20% slack).
- More features can be added (e.g. overlap of the player-count interval, year of release).
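One possible reading of the player-interval overlap feature is sketched below. It assumes the event is described by a range of expected players rather than the single `players` value used in this notebook; the function name `player_range_overlap` and its arguments are hypothetical.

```python
import numpy as np

def player_range_overlap(event_min, event_max, game_min, game_max):
    """Fraction of the event's possible player counts that a game supports (0..1)."""
    game_min = np.asarray(game_min)
    game_max = np.asarray(game_max)
    lo = np.maximum(event_min, game_min)
    hi = np.minimum(event_max, game_max)
    covered = np.maximum(0, hi - lo + 1)  # integer player counts, both ends inclusive
    return covered / (event_max - event_min + 1)

# Event expecting 3 to 6 players; three games supporting 2-4, 2-7 and 7-10 players
overlap = player_range_overlap(3, 6, [2, 2, 7], [4, 7, 10])
print(overlap)
```

Unlike the hard filter in `candidates`, a feature like this is graded, so the ranker could prefer games that cover the whole expected range over games that only cover part of it.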