# LLM Short-List Player Data Concept
---
By shortlisting the player list, we can still generate decent team results for regional and league-specific queries without the dataset being truncated by the LLM's context limit.

Cutting the dataset down to the top 500 players by overall VLR rating (R2.0) also reduces the number of tokens in the LLM prompt request.
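As a point of reference, a literal "top 500 by rating" cut can be expressed with pandas `DataFrame.nlargest`. This is a minimal sketch, not the notebook's method (the pipeline below shortlists with a 500-nearest-neighbour vector search); `shortlist_top_n` is a hypothetical helper, and it assumes an `R2.0` rating column like the one loaded below.

```python
import pandas as pd

def shortlist_top_n(players: pd.DataFrame, n: int = 500, rating_col: str = "R2.0") -> pd.DataFrame:
    """Keep the n highest-rated rows (hypothetical helper, not part of the notebook)."""
    # nlargest returns the rows ordered by rating, highest first;
    # if n exceeds the number of rows, every row is returned
    return players.nlargest(n, rating_col)

demo = pd.DataFrame({"Player": ["a", "b", "c"], "R2.0": [1.10, 1.37, 0.73]})
print(shortlist_top_n(demo, n=2)["Player"].tolist())  # ['b', 'a']
```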

#### Pros:
---
- Fits within the context length limit because the token input is shorter (slightly cheaper per request)


#### Cons:
---
- Works only for team building and stat queries on the data
- Does not give player data such as their recent team, latest games played, etc.
- Need to add the ability to do more than team building and stats

- RAG dataset routing needed...
- Does not know each league's eligibility restrictions, such as gender. Ex: Game Changers is treated as female-only -> assume any Game Changers participant is female
    - Needed for building an all-female team
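The Game Changers assumption above could be encoded as an explicit column, so that an all-female team request becomes a simple filter instead of something the LLM has to infer. A minimal sketch: `add_assumed_female_flag` and the `assumed_female` column are hypothetical names, and the flag is only as accurate as the stated assumption.

```python
import pandas as pd

def add_assumed_female_flag(players: pd.DataFrame, league_col: str = "league_name") -> pd.DataFrame:
    """Add a hypothetical 'assumed_female' column: True for any Game Changers participant."""
    out = players.copy()
    # Normalize "game_changers_na" and "Game Changers NA" to the same lowercase, space-separated form
    leagues = out[league_col].str.lower().str.replace("_", " ", regex=False)
    out["assumed_female"] = leagues.str.contains("game changers", regex=False)
    return out

demo = pd.DataFrame({
    "Player": ["florescent", "RND", "Miku"],
    "league_name": ["game_changers_na", "challengers_br", "game changers latam"],
})
print(add_assumed_female_flag(demo)["assumed_female"].tolist())  # [True, False, True]
```

With the flag in place, an all-female request could be answered from `players[players["assumed_female"]]`.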

In [7]:
import os
import pandas as pd
import json
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from IPython.display import Markdown, display

from dotenv import load_dotenv

load_dotenv()

# Gemini SDK and LangChain chat wrappers for the LLM we use
import google.generativeai as genai
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage

## Import Player Data

In [10]:
# Import data
# filename = "../player_stats_splits/league_split/"
filename = "Z:\\VCT HACK\\Parakeet\\"
player_data = pd.read_parquet(filename+"all_leagues_players_stats.parquet").drop(['team_name'], axis=1)
player_data = player_data.sort_values(by='R2.0', ascending=False)
player_data

| | Rnd | R2.0 | ACS | K:D | KAST | ADR | KPR | APR | FKPR | FDPR | ... | K | D | A | FK | FD | Player | Agents | Roles | region | league_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | challengers_na |
| 1363 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game_changers_na |
| 994 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game_changers_na |
| 995 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game_changers_na |
| 996 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game_changers_na |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 989 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 134 | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers_br |
| 990 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 134 | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers_br |
| 991 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 134 | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers_br |
| 992 | 245 | 0.71 | 143.3 | 0.66 | 68% | 96.3 | 0.51 | 0.41 | 0.04 | 0.11 | ... | 126 | 191 | 100 | 11 | 26 | roud | [omen, viper] | [controller] | SEA | challengers_sea_hk_and_tw |


In [11]:
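# Drop duplicate rows for the same player in the same region and league.
# .copy() makes the result an independent DataFrame, so later column assignments don't trigger SettingWithCopyWarning.
player_data = player_data.drop_duplicates(subset=['region', 'league_name', 'Player']).copy()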
player_data

| | Rnd | R2.0 | ACS | K:D | KAST | ADR | KPR | APR | FKPR | FDPR | ... | K | D | A | FK | FD | Player | Agents | Roles | region | league_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | challengers_na |
| 1363 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 341 | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game_changers_na |
| 1366 | 327 | 1.36 | 308.4 | 1.45 | 75% | 187.3 | 1.05 | 0.26 | 0.31 | 0.21 | ... | 343 | 237 | 84 | 101 | 69 | Miku | [yoru, raze, jett] | [duelist] | LATAM | game_changers_latam |
| 1368 | 423 | 1.34 | 297.2 | 1.49 | 73% | 198.6 | 1.05 | 0.22 | 0.26 | 0.18 | ... | 444 | 297 | 91 | 108 | 75 | Lied | [raze, iso, gekko] | [initiator, duelist] | LATAM | game_changers_latam |
| 1370 | 328 | 1.31 | 287.9 | 1.41 | 74% | 182.8 | 1.03 | 0.17 | 0.26 | 0.17 | ... | 339 | 241 | 55 | 86 | 57 | miNt | [raze, jett] | [duelist] | JP | game_changers_jpn |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 985 | 217 | 0.74 | 158.5 | 0.70 | 60% | 107.0 | 0.57 | 0.18 | 0.05 | 0.09 | ... | 123 | 175 | 40 | 10 | 19 | Nay | [viper, cypher, gekko] | [initiator, controller, sentinel] | SEA | challengers_sea_id |
| 988 | 203 | 0.73 | 162.8 | 0.66 | 66% | 104.1 | 0.54 | 0.29 | 0.03 | 0.06 | ... | 109 | 165 | 59 | 7 | 12 | Xty | [cypher, killjoy, gekko] | [initiator, sentinel] | SEA | challengers_sea_ph |
| 989 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 134 | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers_br |
| 992 | 245 | 0.71 | 143.3 | 0.66 | 68% | 96.3 | 0.51 | 0.41 | 0.04 | 0.11 | ... | 126 | 191 | 100 | 11 | 26 | roud | [omen, viper] | [controller] | SEA | challengers_sea_hk_and_tw |


In [12]:
player_data['region'].value_counts()

region
SEA      185
NA       134
EMEA      81
BR        69
LATAM     55
JP        35
KR        29
INTL      20
LAS       18
VN         7
LAN        4
N/A        3
Name: count, dtype: int64

In [13]:
# Add long names for each region for better vector search
region_long_names = {
    'NA': 'North America',
    'SEA': 'Southeast Asia',
    'EMEA': 'Europe, Middle East, and Africa',
    'BR': 'Brazil',
    'LATAM': 'Latin America',
    'JP': 'Japan',
    'KR': 'South Korea',
    'INTL': 'International',
    'LAS': 'Latin America South',
    'VN': 'Vietnam',
    'LAN': 'Latin America North'
}

player_data['region_long'] = player_data['region'].map(region_long_names)
player_data

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  player_data['region_long'] = player_data['region'].map(region_long_names)


| | Rnd | R2.0 | ACS | K:D | KAST | ADR | KPR | APR | FKPR | FDPR | ... | D | A | FK | FD | Player | Agents | Roles | region | league_name | region_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | challengers_na | North America |
| 1363 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game_changers_na | North America |
| 1366 | 327 | 1.36 | 308.4 | 1.45 | 75% | 187.3 | 1.05 | 0.26 | 0.31 | 0.21 | ... | 237 | 84 | 101 | 69 | Miku | [yoru, raze, jett] | [duelist] | LATAM | game_changers_latam | Latin America |
| 1368 | 423 | 1.34 | 297.2 | 1.49 | 73% | 198.6 | 1.05 | 0.22 | 0.26 | 0.18 | ... | 297 | 91 | 108 | 75 | Lied | [raze, iso, gekko] | [initiator, duelist] | LATAM | game_changers_latam | Latin America |
| 1370 | 328 | 1.31 | 287.9 | 1.41 | 74% | 182.8 | 1.03 | 0.17 | 0.26 | 0.17 | ... | 241 | 55 | 86 | 57 | miNt | [raze, jett] | [duelist] | JP | game_changers_jpn | Japan |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 985 | 217 | 0.74 | 158.5 | 0.70 | 60% | 107.0 | 0.57 | 0.18 | 0.05 | 0.09 | ... | 175 | 40 | 10 | 19 | Nay | [viper, cypher, gekko] | [initiator, controller, sentinel] | SEA | challengers_sea_id | Southeast Asia |
| 988 | 203 | 0.73 | 162.8 | 0.66 | 66% | 104.1 | 0.54 | 0.29 | 0.03 | 0.06 | ... | 165 | 59 | 7 | 12 | Xty | [cypher, killjoy, gekko] | [initiator, sentinel] | SEA | challengers_sea_ph | Southeast Asia |
| 989 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers_br | Brazil |
| 992 | 245 | 0.71 | 143.3 | 0.66 | 68% | 96.3 | 0.51 | 0.41 | 0.04 | 0.11 | ... | 191 | 100 | 11 | 26 | roud | [omen, viper] | [controller] | SEA | challengers_sea_hk_and_tw | Southeast Asia |
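The warning above ("A value is trying to be set on a copy of a slice from a DataFrame") is pandas' `SettingWithCopyWarning`. It appears because pandas may flag the frame returned by `drop_duplicates` as a possible copy of the original, which makes a later column assignment ambiguous. A minimal sketch of the usual fix on a toy frame (the data here is illustrative, not the real dataset): call `.copy()` on the deduplicated frame before adding columns.

```python
import warnings
import pandas as pd

raw = pd.DataFrame({
    "Player": ["florescent", "florescent", "RND"],
    "league_name": ["challengers_na", "challengers_na", "challengers_br"],
    "region": ["NA", "NA", "BR"],
})
region_long_names = {"NA": "North America", "BR": "Brazil"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() gives an independent DataFrame, so the assignment below is unambiguous
    deduped = raw.drop_duplicates(subset=["region", "league_name", "Player"]).copy()
    deduped["region_long"] = deduped["region"].map(region_long_names)

# Match by name so the check also works on pandas versions that no longer define the class
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(deduped), len(copy_warnings))  # 2 0
```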


In [14]:
# Clean up league names for easier tokenization
player_data['league_name'] = player_data['league_name'].str.replace("_", " ")
player_data

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  player_data['league_name'] = player_data['league_name'].str.replace("_", " ")


| | Rnd | R2.0 | ACS | K:D | KAST | ADR | KPR | APR | FKPR | FDPR | ... | D | A | FK | FD | Player | Agents | Roles | region | league_name | region_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | challengers na | North America |
| 1363 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game changers na | North America |
| 1366 | 327 | 1.36 | 308.4 | 1.45 | 75% | 187.3 | 1.05 | 0.26 | 0.31 | 0.21 | ... | 237 | 84 | 101 | 69 | Miku | [yoru, raze, jett] | [duelist] | LATAM | game changers latam | Latin America |
| 1368 | 423 | 1.34 | 297.2 | 1.49 | 73% | 198.6 | 1.05 | 0.22 | 0.26 | 0.18 | ... | 297 | 91 | 108 | 75 | Lied | [raze, iso, gekko] | [initiator, duelist] | LATAM | game changers latam | Latin America |
| 1370 | 328 | 1.31 | 287.9 | 1.41 | 74% | 182.8 | 1.03 | 0.17 | 0.26 | 0.17 | ... | 241 | 55 | 86 | 57 | miNt | [raze, jett] | [duelist] | JP | game changers jpn | Japan |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 985 | 217 | 0.74 | 158.5 | 0.70 | 60% | 107.0 | 0.57 | 0.18 | 0.05 | 0.09 | ... | 175 | 40 | 10 | 19 | Nay | [viper, cypher, gekko] | [initiator, controller, sentinel] | SEA | challengers sea id | Southeast Asia |
| 988 | 203 | 0.73 | 162.8 | 0.66 | 66% | 104.1 | 0.54 | 0.29 | 0.03 | 0.06 | ... | 165 | 59 | 7 | 12 | Xty | [cypher, killjoy, gekko] | [initiator, sentinel] | SEA | challengers sea ph | Southeast Asia |
| 989 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers br | Brazil |
| 992 | 245 | 0.71 | 143.3 | 0.66 | 68% | 96.3 | 0.51 | 0.41 | 0.04 | 0.11 | ... | 191 | 100 | 11 | 26 | roud | [omen, viper] | [controller] | SEA | challengers sea hk and tw | Southeast Asia |


In [16]:
# Load Large Language Model
# Local alternative (not used):
# tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
# model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# GOOGLE_API_KEY must be set in the environment or in the .env file read by load_dotenv()
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

model = genai.GenerativeModel("gemini-1.5-flash")

KeyError: 'GOOGLE_API_KEY'
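The `KeyError` means `GOOGLE_API_KEY` was neither in the environment nor in a `.env` file that `load_dotenv()` picked up. A small sketch that fails with a clearer message instead; `require_env` is a hypothetical helper, not part of `python-dotenv` or the Gemini SDK.

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable or fail with a setup hint (hypothetical helper)."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to the .env file read by load_dotenv() "
            f"or export it before starting the notebook."
        )
    return value

# Usage in the notebook:
# genai.configure(api_key=require_env("GOOGLE_API_KEY"))
```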

In [171]:
# Load vector search model
vector_model = SentenceTransformer("all-MiniLM-L6-v2")



In [None]:
# Baseline: send the entire unfiltered player table to the LLM in a single request
context = """
You are a chatbot that builds teams for Valorant e-sports tournaments. Valorant is a first-person shooter by Riot Games played in teams of five.
You are provided with a dataset of player statistics.
The dataset contains the following columns:

Rnd: Rounds Played
R2.0: Rating
ACS: Average Combat Score
K:D: Kill-Death Ratio
KAST: Kill, Assist, Trade, Survive Percentage
ADR: Average Damage Per Round
KPR: Kills Per Round
APR: Assists Per Round
FKPR: First Kills Per Round
FDPR: First Deaths Per Round
HS%: Headshot Percentage
CL%: Clutch Success Percentage
CL: Clutches Won-Played Ratio
KMax: Maximum Kills in a single map
K: Kills
D: Deaths
A: Assists
FK: First Kills
FD: First Deaths
Player: The Player's Name
Agents: A list of agents that a player plays in a game
Roles: A list of roles that a player fills (duelist, initiator, controller, sentinel)
region: Region code the player plays in
league_name: League the player plays in
region_long: Full name of the player's region

Use these definitions and the dataset(s) provided to answer questions about an optimal Valorant team.

If you build a team, return:
1. A list of the players on the team, with each player's region, league, agents played, and role
2. Why this is a good team
3. Any text from the prompt after the Note below that was cut off by the context length limit

Answer any questions about players and teams.

Note: The system prompt context length is limited to 128,000 tokens. If the prompt exceeds this limit, the excess tokens will be cut off. Please keep this in mind when formulating your questions or requests.
"""

# The LangChain chat wrapper provides .invoke() with System/Human messages;
# the genai.GenerativeModel created earlier has no .invoke() method.
# ChatGoogleGenerativeAI reads GOOGLE_API_KEY from the environment.
chat_model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

response = chat_model.invoke([
    SystemMessage(content=context),
    HumanMessage(content='Make me a high performing team for the VCT Challengers tournament with international players with this data: ' + player_data.to_string())
])

# Render the reply as Markdown (keeps its lists and headings intact)
display(Markdown(response.content))

### Step 1 - Query

In [200]:
query = "Make me a high performing team for valorant"

query_embedding = vector_model.encode(query, convert_to_tensor=True)

### Step 2 - Determine whether to apply a filter or use the entire dataset

In [173]:
# Ask the LLM which column (if any) the query should be filtered on.
#
# Prompt-engineering notes:
#   Determine which category should be filtered with pandas and return it as a single-line response.
#   If no specific filter is needed, return a single-line response of "none".
#   Note: The dataset has the league name and the league's region in the league_name column.
#   The vector search should be able to differentiate between them.

# TEMPORARY UNTIL WE REPLACE WITH LLM REPLY

filter_prompt = f"""
Here is the user query: {query}

Determine which column should be filtered. The options are: league_name, region, none

Return league_name if the user specifies a league such as VCT Challengers, Game Changers, or VCT International.
Return region if the user specifies a region of the world, such as NA, EMEA, Brazil, or Vietnam.
Return none if they want a team made up of players from anywhere in the world.

Respond with a single word only: league_name, region, or none.
"""

filter_reply = model.generate_content(filter_prompt).text.strip().lower()

# Map the reply onto one of the filter options handled in Step 3
if "league_name" in filter_reply:
    filter_column_select = "league_name"
elif "region" in filter_reply:
    filter_column_select = "region"
else:
    filter_column_select = "none"

filter_column_select

'none'
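Substring checks on the raw reply are fragile: any reply that merely contains the word "region" (for example "region_long") is treated as a region filter. A stricter sketch that only accepts an exact option word, after lowercasing and stripping punctuation such as backticks or a trailing period; `parse_filter_column` and `VALID_FILTERS` are hypothetical names.

```python
import re

VALID_FILTERS = ("league_name", "region", "none")

def parse_filter_column(reply: str) -> str:
    """Map a free-text LLM reply onto one of the known filter options (hypothetical helper)."""
    # Split on anything that is not a lowercase letter or underscore,
    # so "League_name." or "`region`" still yield an exact word
    words = re.split(r"[^a-z_]+", reply.lower())
    for option in VALID_FILTERS:
        if option in words:
            return option
    return "none"

print(parse_filter_column("League_name."))  # league_name
```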

### Step 3 - Determine which filter to apply to the matched column

In [174]:
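# Build the text corpus to embed, depending on the selected filter column
corpus = pd.DataFrame(index=player_data.index)

#filter_column_select = "none"    # Manual testing

if filter_column_select == "league_name":
    # League filter: embed only the league name
    corpus['text'] = player_data['league_name']

elif filter_column_select == "region":
    # Region filter: embed the region code and its long name (str() guards against NaN for unmapped codes)
    corpus['text'] = player_data.apply(lambda row: ' | '.join(str(row[col]) for col in ['region', 'region_long']), axis=1)
else:
    # No filter: embed every column as "column: value" pairs
    corpus['text'] = player_data.apply(lambda row: ' | '.join(f"{col}: {row[col]}" for col in player_data.columns), axis=1)

corpus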

| | text |
|---|---|
| 0 | Rnd: 340 \| R2.0: 1.37 \| ACS: 283.9 \| K:D: 1.44... |
| 1324 | Rnd: 340 \| R2.0: 1.37 \| ACS: 283.9 \| K:D: 1.44... |
| 1325 | Rnd: 327 \| R2.0: 1.3599999999999999 \| ACS: 308... |
| 1327 | Rnd: 423 \| R2.0: 1.34 \| ACS: 297.2 \| K:D: 1.49... |
| 954 | Rnd: 328 \| R2.0: 1.31 \| ACS: 287.9 \| K:D: 1.41... |
| ... | ... |
| 942 | Rnd: 221 \| R2.0: 0.75 \| ACS: 186.1 \| K:D: 0.77... |
| 1320 | Rnd: 224 \| R2.0: 0.75 \| ACS: 151.2 \| K:D: 0.69... |
| 1321 | Rnd: 217 \| R2.0: 0.74 \| ACS: 135.4 \| K:D: 0.67... |
| 943 | Rnd: 220 \| R2.0: 0.73 \| ACS: 190.1 \| K:D: 0.72... |
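The corpus above contains float artifacts such as `R2.0: 1.3599999999999999`, which add tokens without adding meaning to the embedding text. A minimal sketch, assuming each row is passed as a dict, that formats floats with Python's `g` format spec (up to 6 significant digits) before joining; `row_to_text` is a hypothetical helper.

```python
import math

def row_to_text(row: dict) -> str:
    """Serialize one player row as 'col: value | ...' with compact floats (hypothetical helper)."""
    parts = []
    for col, value in row.items():
        if isinstance(value, float) and not math.isnan(value):
            # :g drops float noise and trailing zeros, so 1.3599999999999999 -> 1.36
            value = f"{value:g}"
        parts.append(f"{col}: {value}")
    return " | ".join(parts)

print(row_to_text({"Player": "florescent", "Rnd": 340, "R2.0": 1.3599999999999999}))
# Player: florescent | Rnd: 340 | R2.0: 1.36
```

In the notebook this could replace the inline lambda, e.g. `player_data.apply(lambda row: row_to_text(row.to_dict()), axis=1)`. NumPy `float64` values are a subclass of Python `float`, so they take the same formatting path.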


In [175]:
# Vectorize data
stats_vectors = vector_model.encode(corpus['text'].tolist(), convert_to_tensor=True)
stats_vectors = np.array(stats_vectors.cpu())

In [183]:
# Create vectors index

d = stats_vectors.shape[1]  
index = faiss.IndexFlatL2(d)  # l2 distance
index.add(stats_vectors)  # Add vectors to the index
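`faiss.IndexFlatL2` does exact brute-force search: `index.search` returns squared L2 distances in `D` and row positions in `I`, nearest first. The NumPy sketch below mirrors that on a toy example to make the output easier to interpret; it is an illustration, not a replacement for FAISS at scale. For unit-length vectors, ||a - b||^2 = 2 - 2 cos(a, b), so ranking by L2 distance then gives the same order as ranking by cosine similarity.

```python
import numpy as np

def flat_l2_search(vectors: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-nearest-neighbour search by squared L2 distance (brute force)."""
    # Pairwise squared distances, shape (n_queries, n_vectors)
    d2 = ((queries[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=-1)
    # Positions of the k smallest distances per query, nearest first
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype=np.float32)
queries = np.array([[0.9, 0.1]], dtype=np.float32)
D, I = flat_l2_search(vectors, queries, k=2)
print(I)  # [[1 0]]
```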

In [184]:
query_vector = vector_model.encode([query], convert_to_tensor=True).cpu().numpy()

k = 500  # Number of nearest neighbors
D, I = index.search(query_vector, k)  # Search

# Get corresponding rows from the DataFrame
results = player_data.iloc[I[0]]

### Step 4 - Filter the dataset and sort by rating, best performing first

### League check: does the query specify a league such as Challengers, Game Changers, or International?

In [185]:
# See what the breakdown is for leagues
results['league_name'].value_counts()

league_name
challengers na                 105
challengers br                  51
challengers sea sg and my       36
challengers sea ph              28
challengers jpn                 26
challengers kr                  26
game changers emea              21
game changers sea               18
challengers apac                18
challengers italy               17
game changers latam             17
challengers sea id              16
challengers latam               15
challengers latam s             14
challengers portugal            14
challengers sea hk and tw       13
game changers series brazil     13
game changers na                11
challengers sea th              10
challengers sea vn               7
game changers sa                 7
game changers jpn                7
challengers south asia           6
challengers latam n              4
Name: count, dtype: int64
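The cells in this step only count rows per league; no filter is applied yet. A minimal sketch of the league filter that could run when Step 2 returns `league_name`, assuming the space-separated league names produced by the cleanup step; `filter_by_league` is a hypothetical helper.

```python
import pandas as pd

def filter_by_league(players: pd.DataFrame, league_keyword: str) -> pd.DataFrame:
    """Keep rows whose league_name contains the keyword, ignoring case (hypothetical helper)."""
    # regex=False treats the keyword literally
    mask = players["league_name"].str.contains(league_keyword, case=False, regex=False)
    return players[mask]

demo = pd.DataFrame({
    "Player": ["florescent", "RND", "Miku"],
    "league_name": ["game changers na", "challengers br", "game changers latam"],
})
print(filter_by_league(demo, "Game Changers")["Player"].tolist())  # ['florescent', 'Miku']
```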

### Region check: does the query specify a region?

In [186]:
# Breakdown for region
results['region'].value_counts()

region
SEA      139
NA       116
BR        64
EMEA      52
JP        33
LATAM     32
KR        26
LAS       14
INTL      13
VN         7
LAN        4
Name: count, dtype: int64
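Users may name a region by code ("NA") or in full ("Brazil"), which is why `region_long` was added. A minimal sketch of a region filter that accepts either form, ignoring case; `filter_by_region` is a hypothetical helper and the toy rows are illustrative.

```python
import pandas as pd

def filter_by_region(players: pd.DataFrame, region_query: str) -> pd.DataFrame:
    """Keep rows whose region code or long region name equals the query, ignoring case (hypothetical helper)."""
    q = region_query.strip().lower()
    code_match = players["region"].str.lower() == q
    # NaN in region_long (unmapped codes such as 'N/A') compares as False,
    # so only the code can match those rows
    long_match = players["region_long"].str.lower() == q
    return players[code_match | long_match]

demo = pd.DataFrame({
    "Player": ["florescent", "RND", "roud"],
    "region": ["NA", "BR", "SEA"],
    "region_long": ["North America", "Brazil", "Southeast Asia"],
})
print(filter_by_region(demo, "brazil")["Player"].tolist())  # ['RND']
```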

Estimated token count, all player data with no filter: 172,132

Shortlisted across all leagues to 500 matches: 55,070

Estimates from https://token-counter.app/meta/llama-3.1 (a Llama 3.1 tokenizer, so counts for Gemini may differ)

In [187]:
results.to_string()

'      Rnd  R2.0    ACS   K:D KAST    ADR   KPR   APR  FKPR  FDPR  HS%  CL%      CL  KMax    K    D    A   FK   FD       Player                         Agents                              Roles region                  league_name                      region_long\n162   276  1.10  235.0  1.13  74%  157.5  0.81  0.24  0.15  0.13  27%   8%    2/25    34  224  199   66   41   36         DaFt            [raze, neon, gekko]               [duelist, initiator]     VN           challengers sea vn                          Vietnam\n278   355  1.06  261.4  1.11  72%  165.3  0.88  0.23  0.23  0.22  22%         0/11    30  312  280   82   81   77       Virtyy             [neon, raze, jett]                          [duelist]  LATAM            challengers latam                    Latin America\n27    546  1.19  264.7  1.24  75%  167.0  0.92  0.23  0.20  0.16  30%  14%    5/35    32  505  408  125  107   87    Dantedeu5            [jett, raze, gekko]               [duelist, initiator]    LAS          c

### Check whether the token count exceeds our safety limit. If it does, drop rows sorted by rating, lowest first, until the count is back under the limit
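One way to trim without calling a tokenizer repeatedly is to estimate each row's token cost once, take a cumulative sum in rating order, and keep rows until the budget is reached. A minimal sketch, assuming roughly 4 characters per token (a common rule of thumb for English text, not an exact count for Gemini or Llama); `trim_to_token_budget` and its `budget_tokens` parameter are hypothetical.

```python
import pandas as pd

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer

def trim_to_token_budget(df: pd.DataFrame, budget_tokens: int, rating_col: str = "R2.0") -> pd.DataFrame:
    """Drop the lowest-rated rows until the estimated token count fits the budget (hypothetical helper)."""
    ranked = df.sort_values(by=rating_col, ascending=False)
    # Approximate each row's text as its values joined by spaces, then estimate tokens from length
    row_text = ranked.astype(str).apply(" ".join, axis=1)
    est_tokens = row_text.str.len() // CHARS_PER_TOKEN + 1
    # Running total in rating order; rows past the budget (the lowest-rated ones) are cut
    keep = est_tokens.cumsum() <= budget_tokens
    return ranked[keep]

demo = pd.DataFrame({"Player": ["a", "b", "c"], "R2.0": [1.0, 1.3, 0.8]})
print(trim_to_token_budget(demo, budget_tokens=4)["Player"].tolist())  # ['b', 'a']
```

Everything here is vectorized, so the cost is a single pass over the rows rather than one tokenizer call per candidate cut.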

In [188]:
# TODO: Not yet sure how to implement this step without slowing down the code

In [189]:
shortlist = results.sort_values(by='R2.0', ascending=False)
shortlist

| | Rnd | R2.0 | ACS | K:D | KAST | ADR | KPR | APR | FKPR | FDPR | ... | D | A | FK | FD | Player | Agents | Roles | region | league_name | region_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1324 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | game changers na | North America |
| 0 | 340 | 1.37 | 283.9 | 1.44 | 76% | 188.5 | 1.00 | 0.21 | 0.16 | 0.10 | ... | 237 | 70 | 54 | 35 | florescent | [jett, raze] | [duelist] | NA | challengers na | North America |
| 1325 | 327 | 1.36 | 308.4 | 1.45 | 75% | 187.3 | 1.05 | 0.26 | 0.31 | 0.21 | ... | 237 | 84 | 101 | 69 | Miku | [yoru, raze, jett] | [duelist] | LATAM | game changers latam | Latin America |
| 1327 | 423 | 1.34 | 297.2 | 1.49 | 73% | 198.6 | 1.05 | 0.22 | 0.26 | 0.18 | ... | 297 | 91 | 108 | 75 | Lied | [raze, iso, gekko] | [duelist, initiator] | LATAM | game changers latam | Latin America |
| 954 | 328 | 1.31 | 287.9 | 1.41 | 74% | 182.8 | 1.03 | 0.17 | 0.26 | 0.17 | ... | 241 | 55 | 86 | 57 | miNt | [raze, jett] | [duelist] | JP | game changers jpn | Japan |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 936 | 217 | 0.76 | 159.5 | 0.73 | 62% | 107.6 | 0.55 | 0.13 | 0.08 | 0.11 | ... | 165 | 28 | 17 | 23 | sherb | [cypher, killjoy] | [sentinel] | NA | challengers na | North America |
| 938 | 233 | 0.76 | 162.9 | 0.72 | 64% | 111.4 | 0.56 | 0.21 | 0.08 | 0.06 | ... | 181 | 49 | 18 | 15 | kayle | [breach, cypher, killjoy] | [initiator, sentinel] | JP | challengers jpn | Japan |
| 942 | 221 | 0.75 | 186.1 | 0.77 | 59% | 128.1 | 0.64 | 0.17 | 0.13 | 0.26 | ... | 184 | 37 | 29 | 57 | fainz | [jett, neon, raze] | [duelist] | EMEA | challengers portugal | Europe, Middle East, and Africa |
| 943 | 220 | 0.73 | 190.1 | 0.72 | 69% | 122.2 | 0.61 | 0.25 | 0.13 | 0.21 | ... | 185 | 54 | 28 | 47 | RND | [neon, jett, raze] | [duelist] | BR | challengers br | Brazil |


In [190]:
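# Note: this starts from the full player_data, not the vector-search results above
shortlist = player_data.sort_values(by='R2.0', ascending=False)

# Condense players who appear in multiple leagues into one row.
# Numeric stats take the max across leagues. Text-valued stats (KAST, HS%, CL%, CL) use 'first',
# i.e. the value from the player's highest-rated row, because 'max' on strings compares them
# lexicographically ("9%" > "76%").
shortlist = shortlist.groupby('Player').agg({
    'Rnd': 'max',
    'R2.0': 'max',
    'ACS': 'max',
    'K:D': 'max',
    'KAST': 'first',
    'ADR': 'max',
    'KPR': 'max',
    'APR': 'max',
    'FKPR': 'max',
    'FDPR': 'max',
    'HS%': 'first',
    'CL%': 'first',
    'CL': 'first',
    'KMax': 'max',
    'K': 'max',
    'D': 'max',
    'A': 'max',
    'FK': 'max',
    'FD': 'max',
    'Agents': 'first',
    'Roles': 'first',
    'league_name': ', '.join,  # consolidate all leagues the player plays in
    'region': 'first',
    'region_long': 'first'
}).reset_index()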


shortlist

| | Player | Rnd | R2.0 | ACS | K:D | KAST | ADR | KPR | APR | FKPR | ... | K | D | A | FK | FD | Agents | Roles | league_name | region | region_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 123 | 210 | 1.09 | 240.6 | 1.09 | 67% | 161.6 | 0.83 | 0.17 | 0.14 | ... | 174 | 160 | 35 | 30 | 33 | [jett, raze, neon] | [duelist] | challengers sea hk and tw | SEA | Southeast Asia |
| 1 | 999kvmil | 212 | 0.89 | 183.8 | 0.87 | 67% | 121.4 | 0.62 | 0.25 | 0.08 | ... | 131 | 151 | 52 | 18 | 22 | [neon, raze, jett] | [duelist] | challengers portugal, challengers italy | EMEA | Europe, Middle East, and Africa |
| 2 | Absol | 208 | 1.03 | 233.1 | 1.04 | 69% | 150.8 | 0.82 | 0.24 | 0.22 | ... | 171 | 165 | 49 | 46 | 35 | [raze, yoru, phoenix] | [duelist] | challengers jpn | JP | Japan |
| 3 | Addicted | 406 | 1.06 | 182.7 | 1.04 | 69% | 125.9 | 0.66 | 0.29 | 0.06 | ... | 266 | 256 | 118 | 26 | 21 | [sova, gekko, deadlock] | [controller, initiator] | challengers portugal, challengers br | EMEA | Europe, Middle East, and Africa |
| 4 | Adenina | 326 | 0.99 | 180.6 | 1.00 | 78% | 113.4 | 0.64 | 0.46 | 0.04 | ... | 208 | 208 | 150 | 14 | 24 | [gekko, skye, kayo] | [initiator] | game changers latam | LATAM | Latin America |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 484 | xenom | 536 | 1.28 | 217.4 | 1.37 | 74% | 146.4 | 0.82 | 0.32 | 0.08 | ... | 439 | 321 | 172 | 42 | 24 | [omen, viper, brimstone] | [controller] | challengers br | BR | Brazil |
| 485 | xeus | 426 | 1.26 | 274.1 | 1.33 | 70% | 178.6 | 0.96 | 0.19 | 0.20 | ... | 411 | 308 | 79 | 86 | 61 | [raze, jett] | [duelist] | game changers emea | EMEA | Europe, Middle East, and Africa |
| 486 | xffero | 234 | 1.03 | 209.4 | 1.02 | 72% | 139.0 | 0.72 | 0.26 | 0.06 | ... | 168 | 164 | 62 | 14 | 18 | [omen, killjoy, viper] | [controller, sentinel] | challengers sea ph | SEA | Southeast Asia |
| 487 | yoman | 398 | 1.01 | 187.5 | 1.02 | 71% | 124.3 | 0.65 | 0.36 | 0.08 | ... | 258 | 252 | 142 | 31 | 33 | [fade, kayo, sova] | [initiator] | challengers kr | KR | South Korea |
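Some stat columns in this dataset are stored as text (`KAST`, `HS%`, and `CL%` like `"76%"`, `CL` like `"2/25"`), and `max` on strings compares them character by character rather than numerically. A minimal sketch, assuming percentages are strings like `"76%"` and blank when missing (as in the `CL%` column of the dump above), that converts them to floats so numeric aggregation behaves as expected; `percent_to_float` is a hypothetical helper.

```python
import pandas as pd

# Text comparison goes character by character, so "9%" sorts above "76%"
print(max(["9%", "76%"]))  # 9%

def percent_to_float(values: pd.Series) -> pd.Series:
    """Convert strings like '76%' to 76.0; blanks and malformed values become NaN (hypothetical helper)."""
    return pd.to_numeric(values.astype(str).str.rstrip("%"), errors="coerce")

kast = pd.Series(["76%", "9%", ""])
print(percent_to_float(kast).max())  # 76.0
```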


## Note: The current configuration does not account for team role imbalance; it relies on the LLM's reasoning to balance roles
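If role balance should not be left entirely to the LLM, a cheap post-check can report which core roles a proposed five-player team leaves uncovered. A minimal sketch, assuming each player's `Roles` list from the dataset, and assuming a balanced team should cover duelist, initiator, controller, and sentinel at least once; `missing_roles` and `REQUIRED_ROLES` are hypothetical names.

```python
from __future__ import annotations

# Role names as they appear in the dataset's Roles column
REQUIRED_ROLES = {"duelist", "initiator", "controller", "sentinel"}

def missing_roles(team: dict[str, list[str]]) -> set[str]:
    """Return the core roles no player on the team covers (hypothetical helper).

    `team` maps player name -> that player's Roles list from the dataset.
    """
    covered = {role for roles in team.values() for role in roles}
    return REQUIRED_ROLES - covered

team = {"RND": ["duelist"], "roud": ["controller"], "Xty": ["initiator", "sentinel"]}
print(missing_roles(team))  # set()
```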

### Step 5 - Pass each player's data to the LLM to compile the report

In [193]:
# Convert the data to a string for the prompt (intended to keep the token count small)
data_pass = shortlist.to_string()
data_pass

'          Player  Rnd  R2.0    ACS   K:D KAST    ADR   KPR   APR  FKPR  FDPR  HS%  CL%      CL  KMax    K    D    A   FK   FD                         Agents                              Roles                                                                                                league_name region                      region_long\n0            123  210  1.09  240.6  1.09  67%  161.6  0.83  0.17  0.14  0.16  28%  17%    4/23    33  174  160   35   30   33             [jett, raze, neon]                          [duelist]                                                                                  challengers sea hk and tw    SEA                   Southeast Asia\n1       999kvmil  212  0.89  183.8  0.87  67%  121.4  0.62  0.25  0.08  0.10  26%   5%    1/21    16  131  151   52   18   22             [neon, raze, jett]                          [duelist]                                                                    challengers portugal, challengers italy   EMEA  Europe, Midd
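`to_string()` pads every column to its widest value; in the output above, `league_name` is padded to fit the longest merged league list on every row. A format without padding may therefore be smaller. A sketch comparing character counts on a toy frame built from rows shown above; characters are only a proxy for tokens (runs of spaces may tokenize cheaply), so any choice should be re-checked with the model's own token counter.

```python
import pandas as pd

demo = pd.DataFrame({
    "Player": ["123", "999kvmil"],
    "R2.0": [1.09, 0.89],
    "league_name": ["challengers sea hk and tw", "challengers portugal, challengers italy"],
})

as_table = demo.to_string()        # column-aligned, padded with spaces, includes the index
as_csv = demo.to_csv(index=False)  # comma-separated, no padding, no index

print(len(as_table), len(as_csv))
```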

In [201]:
def prompt(user_query):
    # Build the full prompt: instructions, column definitions, and the shortlisted data (global data_pass)
    prompt_text = f"""
        You are a Valorant team-building chatbot that builds teams and answers team composition questions.

        Here is the user query: {user_query}

        The dataset contains the following columns:
        Rnd: Rounds Played
        R2.0: Overall Rating
        ACS: Average Combat Score
        K:D: Kill-Death Ratio
        KAST: Kill, Assist, Trade, Survive Percentage
        ADR: Average Damage Per Round
        KPR: Kills Per Round
        APR: Assists Per Round
        FKPR: First Kills Per Round
        FDPR: First Deaths Per Round
        HS%: Headshot Percentage
        CL%: Clutch Success Percentage
        CL: Clutches Won-Played Ratio
        KMax: Maximum Kills in a single map
        K: Kills
        D: Deaths
        A: Assists
        FK: First Kills
        FD: First Deaths
        Player: Player's Name
        Agents: List of agents that the player plays
        Roles: List of roles that the player fills (duelist, initiator, controller, sentinel)
        league_name: Leagues that the player plays in
        region: Region code where the player plays
        region_long: Full name of the region where the player plays

        Task: Use this information to analyze and interpret the data. You may be asked to:

        Create a Valorant e-sports team based on this data
        Answer specific questions about why you selected certain players for the team based on the data
        Answer questions about each player on the team
        Answer questions about players in general

        If the user does not specify a league or region, factor in all leagues, regions, and players.

        Here is the dataset: {data_pass}
        """
    return prompt_text

In [202]:
response = model.generate_content(prompt(query))

In [203]:
# Prettify Response
display(Markdown(response.text))

Here is a high-performing Valorant team based on the provided dataset:

**Duelist:**

* **Player:**  "Zanatsu" 
* **Reason:** "Zanatsu" has a high ACS (266.2) and a strong K:D ratio (1.22) which indicates consistent kills and good survival. They also have a high KMax (36), demonstrating the ability to carry rounds in clutches. They excel with various duelists, suggesting adaptability.  

**Sentinel:**

* **Player:**  "Zellsis"
* **Reason:** "Zellsis" shows a strong KAST (74%) and a solid ACS (176.9), demonstrating their ability to survive rounds, provide assists, and contribute to wins. Their experience with Sentinel agents like Cypher, Breach, and Viper makes them a well-rounded sentinel with versatile map control options.

**Initiator:**

* **Player:** "Sacy"
* **Reason:** "Sacy" has a high KAST (74%) which indicates strong round contribution through kills, assists, trades, and survival. They also have a high number of FKPR (0.10) meaning they often start rounds with a kill and control the tempo. "Sacy" has played a range of Initiators, showcasing adaptability and versatility in supporting the team's pushes.  

**Controller:**

* **Player:** "gMd" 
* **Reason:** "gMd" possesses an exceptional overall rating (1.08), a solid ACS (213.1), and good KAST (76%), signifying consistent and impactful gameplay.  Their proficiency with Omen, Brimstone, and Viper allows for tactical map control and effective use of smokes and ultimate abilities.  

**Flex:**

* **Player:** "xavi8k"
* **Reason:** "xavi8k" is a high-performing sentinel with good ACS (220.7) and a decent KAST (75%). They show a strong headshot percentage (27%), indicating accuracy and precision. "xavi8k" is adept with both Cypher and Killjoy, demonstrating strong skills with both sentinels. 

**Team Breakdown:**

This team boasts strong individual performances and a good mix of agents with complementary skills. 

* The duelist "Zanatsu" will create entry opportunities and lead aggressive pushes.
* "Zellsis" will provide crucial information and map control with their Sentinel abilities.
* "Sacy" will control the pace of the rounds with their initiators, gathering information, and creating space for their team. 
* "gMd" will create the necessary vision and smokes for the team's pushes, allowing them to control the map. 
* "xavi8k" will provide additional support and map control as a flex player, filling in wherever needed with their sentinel skills.

This team composition features a strong foundation of strong individual players with diverse skills and experience, combined with a balanced agent mix for excellent map control and versatility. 
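Because the team comes straight from the LLM, it is worth checking the reply against the data before trusting it. For example, the reply above calls Breach and Viper Sentinel agents, although in Valorant Breach is an Initiator and Viper a Controller. A minimal sketch that flags any quoted player name not present in the shortlist; the regex assumes the `**Player:** "Name"` format used by this particular reply, and `unknown_players` is a hypothetical helper.

```python
import re

# Matches lines like: * **Player:**  "Zanatsu"  (the format used by the reply above)
PLAYER_PATTERN = re.compile(r'\*\*Player:\*\*\s*"([^"]+)"')

def unknown_players(response_text: str, roster: set) -> list:
    """Return player names quoted in the reply that are not in the dataset (hypothetical helper)."""
    mentioned = PLAYER_PATTERN.findall(response_text)
    # Case-insensitive comparison in case the model re-capitalizes a handle
    known = {name.lower() for name in roster}
    return [name for name in mentioned if name.lower() not in known]

reply = '* **Player:**  "Zanatsu"\n* **Player:** "gMd"'
print(unknown_players(reply, {"Zanatsu"}))  # ['gMd']
```

In the notebook this would be called as `unknown_players(response.text, set(shortlist["Player"]))`; an empty list means every named player exists in the data.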
