# `RankingPipeline`

This script tests the complete pipeline, whose score-computation methods can be configured.

## Computing scores

The `run` method launches the score computation.

In [None]:
import os
from getpass import getpass

cache_dir = input("Indicate path to all Hugging Face caches:")
os.environ["HF_DATASETS_CACHE"] = cache_dir
os.environ["HF_HUB_CACHE"] = cache_dir
os.environ["HF_TOKEN"] = getpass("Enter your HuggingFace token:")

In [None]:
from pathlib import Path
from rank_comparia.pipeline import RankingPipeline

### `RankingPipeline` parameters

- `method`: ranking method to use: `elo_random`, `elo_ordered`, or `ml`
- `include_votes`: whether to use the vote data
- `include_reactions`: whether to use the reaction data
- `bootstrap_samples`: number of samples used to compute the *bootstrap* version
- `batch`: whether matches are processed in batches
- `export_path`: path to the folder where the graphs are exported
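
To make the `method` and `bootstrap_samples` parameters more concrete, here is a minimal sketch of Elo scoring combined with bootstrap resampling. It is not the `rank_comparia` implementation: the function names, the K-factor of 32, the base rating of 1000, and the choice of the median as the summary statistic are assumptions made for illustration only.

```python
import random
import statistics

BASE_RATING = 1000.0  # assumed starting rating, not taken from rank_comparia
K_FACTOR = 32.0       # assumed update step, a conventional Elo value


def elo_scores(matches):
    """Run sequential Elo updates over (model_a, model_b, outcome) tuples.

    outcome is 1.0 if model_a wins, 0.0 if model_b wins, 0.5 for a tie.
    """
    ratings = {}
    for model_a, model_b, outcome in matches:
        r_a = ratings.setdefault(model_a, BASE_RATING)
        r_b = ratings.setdefault(model_b, BASE_RATING)
        # Expected score of model_a under the standard logistic Elo curve.
        expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
        delta = K_FACTOR * (outcome - expected_a)
        ratings[model_a] = r_a + delta
        ratings[model_b] = r_b - delta  # zero-sum update
    return ratings


def bootstrap_median_elo(matches, n_samples=5, seed=0):
    """Resample matches with replacement and return the median rating per model."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_samples):
        # Drawing with replacement also randomizes the processing order.
        resample = rng.choices(matches, k=len(matches))
        runs.append(elo_scores(resample))
    models = sorted({m for a, b, _ in matches for m in (a, b)})
    medians = {}
    for model in models:
        # A model can be absent from a resample; only use runs where it played.
        values = [run[model] for run in runs if model in run]
        medians[model] = statistics.median(values) if values else BASE_RATING
    return medians


# Hypothetical toy data: A always beats B and C, B always beats C.
toy_matches = [("A", "B", 1.0)] * 8 + [("A", "C", 1.0)] * 8 + [("B", "C", 1.0)] * 8
medians = bootstrap_median_elo(toy_matches, n_samples=5)
print(medians)
```

Because sequential Elo depends on the order in which matches are processed, repeating the computation over several resamples and summarizing the runs gives a more stable estimate than a single pass.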

In [None]:
pipeline = RankingPipeline(
    method="elo_random",
    include_votes=True,
    include_reactions=True,
    bootstrap_samples=5,
    batch=False,
    export_path=Path("output"),
)

In [None]:
pipeline.matches

In [None]:
pipeline.match_list()

In [None]:
scores = pipeline.run()

In [None]:
scores

### Another way to compute scores

Here only the vote data is used.

In [None]:
pipeline = RankingPipeline(
    method="elo_random",
    include_votes=True,
    include_reactions=False,
    bootstrap_samples=5,
    batch=False,
)
scores_votes = pipeline.run()

In [None]:
scores_votes

## Pipeline with an alternative ranker

Using the `MaximumLikelihood` ranker (`method="ml"`).
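
The legend of the comparison chart further down labels these scores as "BT score", which suggests a Bradley-Terry model. As a rough illustration of what a maximum-likelihood fit of that model can look like, here is a minimal sketch using the classic minorization-maximization iteration. The function name and the input format are hypothetical, and nothing here is taken from the `MaximumLikelihood` class itself.

```python
from collections import defaultdict


def bradley_terry(matches, n_iter=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumes every model wins at least once and that the comparison graph is
    connected; otherwise the maximum-likelihood estimate does not exist.
    Returns strengths normalized to sum to 1.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    for winner, loser in matches:
        wins[winner] += 1.0
        wins[loser] += 0.0  # register the loser so every model has an entry
        pair_counts[frozenset((winner, loser))] += 1.0

    strengths = {model: 1.0 for model in wins}
    for _ in range(n_iter):
        updated = {}
        for model in strengths:
            # Sum of n_ij / (p_i + p_j) over every opponent j of this model.
            denom = sum(
                count / (strengths[model] + strengths[other])
                for pair, count in pair_counts.items()
                if model in pair
                for other in pair - {model}
            )
            updated[model] = wins[model] / denom
        total = sum(updated.values())
        strengths = {model: value / total for model, value in updated.items()}
    return strengths


# Hypothetical toy data: A beats B three times, B beats A once.
two_models = bradley_terry([("A", "B")] * 3 + [("B", "A")])
print(two_models)
```

Under this model the probability that model `i` beats model `j` is `p_i / (p_i + p_j)`, so unlike sequential Elo the result does not depend on the order of the matches.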

In [None]:
pipeline = RankingPipeline(method="ml", include_votes=True, include_reactions=True, bootstrap_samples=5, batch=False)

In [None]:
scores_ml = pipeline.run()

In [None]:
pipeline = RankingPipeline(method="ml", include_votes=True, include_reactions=False, bootstrap_samples=5, batch=False)
scores_ml_votes = pipeline.run()

## Comparing the different methods

In [None]:
import polars as pl

pl.concat(
    [
        scores.select("model", "median").rename(mapping={"median": "score_elo"}),
        scores_votes.select("model", "median").rename(mapping={"median": "score_elo_votes"}),
        scores_ml.select("model", "median").rename(mapping={"median": "score_ml"}),
        scores_ml_votes.select("model", "median").rename(mapping={"median": "score_ml_votes"}),
    ],
    how="align",
)

In [None]:
import polars as pl
import altair as alt

df_pl = pl.concat(
    [
        scores.select("model", "median").rename(mapping={"median": "score_elo"}),
        scores_votes.select("model", "median").rename(mapping={"median": "score_elo_votes"}),
        scores_ml.select("model", "median").rename(mapping={"median": "score_ml"}),
        scores_ml_votes.select("model", "median").rename(mapping={"median": "score_ml_votes"}),
    ],
    how="align",
).sort("score_elo", descending=True)

df = df_pl.to_pandas()
df_long = df.melt(
    id_vars=["model"],
    value_vars=["score_elo", "score_elo_votes", "score_ml", "score_ml_votes"],
    var_name="score_type",
    value_name="score",
)
legend_labels = {
    "score_elo": "Elo score (all data)",
    "score_elo_votes": "Elo score (votes data)",
    "score_ml": "BT score (all data)",
    "score_ml_votes": "BT score (votes data)",
}
df_long["score_type"] = df_long["score_type"].map(legend_labels)

chart = (
    alt.Chart(df_long)
    .mark_circle(size=80)
    .encode(
        x=alt.X("model:N", sort=df["model"].tolist(), title="Model"),
        y=alt.Y("score:Q", title="Score", scale=alt.Scale(domain=[500, 1300])),
        color=alt.Color("score_type:N", title="Score Type"),
        tooltip=["model", "score", "score_type"],
    )
    .properties(width=600, height=400)
)

chart

## Scores by category

The `run_category` and `run_all_categories` methods compute scores for a given category or for all categories (those whose total number of matches exceeds a threshold).
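
The idea of scoring only categories with enough matches can be sketched as follows. This is an illustration, not the library's code: the match format, the `min_matches` name, the strict comparison, and the use of a plain win rate in place of a real ranker are all assumptions.

```python
from collections import defaultdict


def split_by_category(matches, min_matches):
    """Group matches by category and keep categories above the threshold.

    Assumes each match is a dict with a "categories" list; whether the
    threshold is inclusive is an assumption (strict comparison used here).
    """
    by_category = defaultdict(list)
    for match in matches:
        for category in match["categories"]:
            by_category[category].append(match)
    return {c: ms for c, ms in by_category.items() if len(ms) > min_matches}


def win_rates(matches):
    """Placeholder scorer: fraction of matches won by each model."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for match in matches:
        games[match["winner"]] += 1
        games[match["loser"]] += 1
        wins[match["winner"]] += 1
    return {model: wins[model] / games[model] for model in games}


# Hypothetical toy data; "Science" is an invented category name.
toy_matches = [
    {"winner": "A", "loser": "B", "categories": ["Education"]},
    {"winner": "A", "loser": "C", "categories": ["Education", "Science"]},
    {"winner": "B", "loser": "C", "categories": ["Education"]},
    {"winner": "C", "loser": "A", "categories": ["Science"]},
]

kept = split_by_category(toy_matches, min_matches=2)
per_category = {category: win_rates(ms) for category, ms in kept.items()}
print(per_category)
```

With a threshold of 2, only "Education" (three matches) is scored, while "Science" (two matches) is dropped.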

In [None]:
pipeline = RankingPipeline(
    method="elo_random",
    include_votes=True,
    include_reactions=True,
    bootstrap_samples=5,
    batch=False,
)

In [None]:
pipeline.run_category("Education")

In [None]:
results = pipeline.run_all_categories()