Each row of the dataset is a single game with the following features (in the order in the vector):

- Team won the game (1 or -1)
- Cluster ID (related to location)
- Game mode (eg All Pick)
- Game type (eg. Ranked)
- till end: Each element is an indicator for a hero.
Value of 1 indicates that a player from team '1' played as that hero and '-1' for the other team.
Hero can be selected by only one player each game. This means that each row has five '1' and five '-1' values.

We don't need these columns in futher development, so we dropped them:
- Cluster ID – represents the region of the game.
- Game mode – indicates the mode (e.g., All Pick, Captains Mode).
- Game type – ranked or unranked.


In [25]:
import xgboost as xgb
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import joblib
from tabulate import tabulate
import json
from transformers import pipeline

In [24]:

with open("./heroes.json", "r") as f:
    hero_data = json.load(f)
    hero_id_to_name = {str(hero["id"]): hero["localized_name"] for hero in hero_data["heroes"]}
    
with open("./heroes_skills.json", "r", encoding="utf-8") as f:
    heroes_skills_data = json.load(f)
    hero_tag_to_info = {hero["tag"]: hero for hero in heroes_skills_data}

# Load training dataset
train_dataset_path = "./dota2Train.csv"
df_train = pd.read_csv(train_dataset_path)

test_dataset_path = "./dota2Test.csv"
df_test = pd.read_csv(test_dataset_path)


non_hero_columns = ['winner', 'cluster_id', 'game_mode', 'game_type']
num_heroes = df_train.shape[1] - len(non_hero_columns)
hero_ids = list(range(1, num_heroes + 1))
hero_names = [hero_id_to_name.get(str(hero_id), f"Unknown_Hero_{hero_id}") for hero_id in hero_ids]
hero_columns = [f'hero_{i}' for i in range(num_heroes)]

df_train.columns = non_hero_columns + hero_names
df_train = df_train.drop(columns=['cluster_id', 'game_mode', 'game_type'])

df_test.columns = non_hero_columns + hero_names
df_test = df_test.drop(columns=['cluster_id', 'game_mode', 'game_type'])


# Display dataset with better formatting
print("\n===== Dota 2 Training Dataset Overview =====\n")
print(tabulate(df_train.head(), headers='keys', tablefmt='fancy_grid'))


===== Dota 2 Training Dataset Overview =====

╒════╤══════════╤═════════════╤═══════╤════════╤═══════════════╤══════════════════╤═══════════════╤═══════════════╤══════════════╤══════════╤═════════════╤════════════════╤══════════════════╤════════╤═════════╤═════════╤═════════════╤════════════════╤════════╤════════╤═══════════════════╤══════════════╤════════╤══════════╤═══════════════════╤════════╤════════╤═════════════════╤═══════════╤══════════════╤════════════════╤════════╤════════╤══════════╤══════════╤══════════╤═════════════╤═══════════╤═══════════════╤═════════════════╤══════════════╤═════════════════╤═════════════════╤═════════════════╤════════════════════╤═════════╤════════════════════╤═════════╤════════╤═════════════════╤══════════╤═════════════╤═══════════╤════════════════════╤═══════════════╤═════════════╤══════════╤══════════════╤═══════════════╤══════════╤═════════════════╤═══════════════╤═════════════════╤══════════╤══════════╤════════════╤════════╤═══════════╤═══════════

In [3]:
def filter_valid_drafts(df):
    team_1_heroes = df.iloc[:, 1:].apply(lambda row: (row == 1).sum(), axis=1)
    team_2_heroes = df.iloc[:, 1:].apply(lambda row: (row == -1).sum(), axis=1)
    return df[(team_1_heroes == 5) & (team_2_heroes == 5)]

df = filter_valid_drafts(df_train)

# **Generate Training Data for Draft Stage**
draft_samples = []
labels = []

for _, row in df.iterrows():
    ally_picks = []
    enemy_picks = []

    for hero in hero_names:
        if row[hero] == 1:
            ally_picks.append(hero)
        elif row[hero] == -1:
            enemy_picks.append(hero)

    for i in range(len(ally_picks)):  # Create different draft states
        current_state = {hero: 0 for hero in hero_names}
        for picked_hero in ally_picks[:i]:
            current_state[picked_hero] = 1
        for picked_hero in enemy_picks:
            current_state[picked_hero] = -1
        draft_samples.append(list(current_state.values()))
        labels.append(hero_names.index(ally_picks[i]))  # Next hero to pick

# **Convert to DataFrame**
X = pd.DataFrame(draft_samples, columns=hero_names)
y = np.array(labels)

# **Encode hero labels**
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# **Train XGBoost Model**
model = xgb.XGBClassifier(
    objective='multi:softprob',
    num_class=len(label_encoder.classes_),
    eval_metric='mlogloss',
    tree_method="hist"  # Faster training
)
model.fit(X, y)

# **Save Model & Label Encoder**
joblib.dump(model, "xgboost_dota_draft_model.pkl")
joblib.dump(label_encoder, "label_encoder.pkl")
print("Model training completed and saved!")

Model training completed and saved!


In [30]:
# **Load Model & Label Encoder**
model = joblib.load("xgboost_dota_draft_model.pkl")
label_encoder = joblib.load("label_encoder.pkl")

df_test = filter_valid_drafts(df_test)

# **Prepare test data**
X_test = df_test.drop(columns=['winner'])
y_test = df_test['winner']

# **Predict heroes**
y_pred_encoded = model.predict(X_test)
y_pred = label_encoder.inverse_transform(y_pred_encoded)  # Convert back to hero names

# **Evaluate Accuracy**
correct_predictions = (y_pred == y_test).sum()
total_predictions = len(y_test)
accuracy = correct_predictions / total_predictions if total_predictions > 0 else 0

print(f"📊 Model Accuracy on Test Data: {accuracy:.4f}")

text_generator = pipeline("text-generation", model="gpt2")

def generate_hero_explanation(hero_name, ally_picks, enemy_picks):
    """
    Uses NLP to generate a custom explanation for why this hero is good for the draft.
    """
    hero_info = next((hero for hero in heroes_skills_data if hero["name"] == hero_name or hero["tag"] == hero_name.lower().replace(" ", "_")), None)
    
    if not hero_info:
        return f"ℹ️ No details available for {hero_name}."
    
    # Get hero details
    hype = hero_info.get("hype", "No hype description available.")
    abilities = ", ".join([ability["name"] for ability in hero_info.get("abilities", [])])
    role = hero_info.get("attributes", {}).get("Role", "Unknown Role")

    # Identify synergy heroes
    ally_names = [hero_id_to_name.get(str(h), f"Hero_{h}") for h in ally_picks]
    enemy_names = [hero_id_to_name.get(str(h), f"Hero_{h}") for h in enemy_picks]

    # Construct an NLP prompt
    prompt = (
        f"Hero: {hero_name}\n"
        f"Abilities: {abilities}\n"
        f"Role: {role}\n"
        f"Why is {hero_name} a good pick here?"
    )

    # Use NLP model to generate explanation
    explanation = text_generator(prompt, max_length=300, num_return_sequences=1, pad_token_id=50256)[0]["generated_text"]

    return f"🌟 **{hero_name}**: {hype}\n🛠 **Abilities**: {abilities}\n🎭 **Role**: {role}\n📝 **Why this pick?**: {explanation}"
  
# **Hero Recommendation Function**
def recommend_next_heroes(current_picks, enemy_picks, top_n=3):
    """
    Given the current draft state (ally picks) and enemy picks,
    predict the best next heroes considering counter picks.
    """
    if len(current_picks) >= 5:
        return "Draft complete: No more heroes can be picked."

    draft_state = {hero: 0 for hero in hero_names}
    for hero in current_picks:
        if hero in draft_state:
            draft_state[hero] = 1
    for hero in enemy_picks:
        if hero in draft_state:
            draft_state[hero] = -1
    
    draft_array = np.array([list(draft_state.values())])
    hero_probs = model.predict_proba(draft_array)[0]
    sorted_heroes = np.argsort(hero_probs)[::-1]  # Sort heroes by probability
    
    recommended_heroes = []
    explanations = []
    
    for recommended_hero in sorted_heroes:
        real_hero = label_encoder.inverse_transform([recommended_hero])[0]  # Convert back to hero ID
        real_hero_name = hero_id_to_name.get(str(real_hero), f"Unknown_Hero_{real_hero}")  # Convert to name
        if real_hero_name not in current_picks and real_hero_name not in enemy_picks:
            recommended_heroes.append(real_hero_name)
            explanations.append(generate_hero_explanation(real_hero_name, current_picks, enemy_picks))
            if len(recommended_heroes) == top_n:
                break
    
    return f"🛡 **Recommended Heroes**: {', '.join(recommended_heroes)}\n\n" + "\n\n".join(explanations)





📊 Model Accuracy on Test Data: 0.0000


Device set to use cuda:0


In [31]:

# **Example Usage**
sample_ally_picks = ["Axe", "Pudge", "Dazzle"]
sample_enemy_picks = ["Invoker", "Juggernaut", "Luna"]
print("🔥 Testing Hero Recommendation with Ally and Enemy Picks...")
print("✅ Ally Picks:", sample_ally_picks)
print("❌ Enemy Picks:", sample_enemy_picks)
print(recommend_next_heroes(sample_ally_picks, sample_enemy_picks))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


🔥 Testing Hero Recommendation with Ally and Enemy Picks...
✅ Ally Picks: ['Axe', 'Pudge', 'Dazzle']
❌ Enemy Picks: ['Invoker', 'Juggernaut', 'Luna']
🛡 **Recommended Heroes**: Enchantress, Nature's Prophet, Gyrocopter

🌟 **Enchantress**: Harmful up close and lethal at a distance, Enchantress skewers foes with attacks imbued to become more damaging the further they fly. Whether inflicting powerful slows on her enemies or charming forest creatures to fight her battles, she is never short of tools to win a fight.
🛠 **Abilities**: Untouchable, Enchant, Nature's Attendants, Impetus
🎭 **Role**: Support,Jungler,Pusher,Durable,Disabler
📝 **Why this pick?**: Hero: Enchantress
Abilities: Untouchable, Enchant, Nature's Attendants, Impetus
Role: Support,Jungler,Pusher,Durable,Disabler
Why is Enchantress a good pick here? She plays a bit like a Midrange Hunter. That means that she's also something of a bruiser, meaning she can easily get a very low HP, with her ability to heal herself and the team w