# Imports and constants

In [1]:
from getpass import getpass
import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.core import Settings, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

from IPython.display import display, Markdown

In [None]:
HF_TOKEN = getpass()
os.environ['HUGGINGFACEHUB_API_TOKEN'] = HF_TOKEN
MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"



In [4]:
docs = SimpleDirectoryReader("data").load_data()



In [5]:
embed_model = FastEmbedEmbedding()


In [6]:
llm = HuggingFaceInferenceAPI(model_name=MODEL_NAME, token=HF_TOKEN)

In [7]:
Settings.llm = llm
Settings.embed_model = embed_model

# Index your documents

In [8]:
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents=docs, storage_context=storage_context, embed_model=embed_model)
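
Before embedding, an index typically splits each document into smaller chunks so that retrieval can return focused passages. The block below is a minimal, standard-library-only sketch of fixed-size word chunking with overlap. It is an illustration under simplifying assumptions, not LlamaIndex's own splitter; `chunk_words`, `chunk_size`, and `overlap` are hypothetical names chosen for this example.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 14-word sample standing in for a document.
sample = " ".join(f"w{i}" for i in range(1, 15))
chunks = chunk_words(sample, chunk_size=8, overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some words twice.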

# Load index

In [9]:
mydb = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = mydb.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embed_model)

In [10]:
query_engine = index.as_query_engine()
resp = query_engine.query("What are the types of football data normally available?")
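
Conceptually, a vector query embeds the question, compares it with the stored chunk embeddings, and passes the closest chunks to the LLM as context. The sketch below shows only the comparison step, using cosine similarity over made-up 3-dimensional vectors. The chunk ids, vectors, and the `top_k` helper are hypothetical; the real embeddings come from `embed_model` and the real search is performed by the vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
toy_chunks = {
    "event_data": [0.9, 0.1, 0.0],
    "tracking_data": [0.7, 0.3, 0.1],
    "acknowledgements": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
retrieved = top_k(toy_query, toy_chunks, k=2)
print(retrieved)
```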

In [11]:
resp.response

"\n\nThe context information provided mentions the availability of football data for tracking and analyzing football matches and players. However, the specific types of football data are not explicitly mentioned. Based on the context, it can be inferred that the football data may include statistics such as the number of goals scored, assists, passes, shots on target, possession, and fouls committed by individual players and teams during a match. It may also include information about the location and timing of these events, as well as details about the teams' formations, lineups, and tactics."

# Hybrid Search
Hybrid search combines keyword-based (sparse) text retrieval with vector similarity (dense) search.

In [12]:
query_engine = index.as_query_engine(similarity_top_k=2, sparse_top_k=12, vector_store_query_mode="hybrid")
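
Here `similarity_top_k` and `sparse_top_k` set how many candidates the dense and sparse retrievers each contribute, and the two candidate lists then have to be merged into one ranking. Support for the `"hybrid"` mode and the way results are merged vary by vector store, so the block below is only a sketch of one common merging technique, reciprocal rank fusion, and not the method used by LlamaIndex or Chroma. The chunk ids and the constant `k=60` are assumptions made for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=2):
    """Merge several ranked lists: each item scores 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by id.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


# Hypothetical results from a dense (vector) and a sparse (keyword) retriever.
dense_ranking = ["chunk_a", "chunk_b", "chunk_c"]
sparse_ranking = ["chunk_b", "chunk_d", "chunk_a"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking], top_n=2)
print(fused)
```

Rank-based fusion avoids having to put keyword scores and similarity scores on a common scale, which is why it is a frequent default choice for this kind of merge.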

In [13]:
resp = query_engine.query("Which player is closely similar to Messi and why?")

display(Markdown(str(resp)))



According to the results presented in the text, Paulo Dybala is closely similar to Messi in terms of their playing style. This claim is supported by the fact that Dybala was ranked as the fourth most similar player to Messi based on the Player2Vec vector analysis. This analysis compares the playing styles of all the players in the dataset, and Dybala's style was found to be most similar to Messi's.

In [14]:
resp = query_engine.query("Who helped in Liverpool's success and how?")

display(Markdown(str(resp)))



Fabinho played a crucial role in Liverpool's success during the 2018/19 season. He joined Liverpool in 2018 after the departure of Emre Can, and since then, Liverpool was unbeaten in 21 games of the first 2 ½ matches played by Fabinho. His performance as a holding midfielder was exceptional, and he was considered the best holding midfielder in the English Premier League for that season. Fabinho's presence in the team helped Liverpool maintain their high-pressing, counter-attacking style of play, which was a crucial factor in their success.

In [15]:
resp = query_engine.query("How can player2vec vector help in analysis?")

display(Markdown(str(resp)))



The player2vec vector can help in analysis by capturing both playing styles of teams and players, but it has a downside in choosing the optimum grid resolution. The vector is formed by overlaying a grid over the football field and counting the number of actions performed in each grid cell. However, a coarse grid may miss key variations between locations, and a finer-grained grid substantially increases data sparsity because then fewer actions happen in a single grid cell. The high-dimensional nature of the vector prevents immediate interpretation, making it challenging to separate certain traits from the generated vector. While player2vec alone may not detect a player's shot actions, it can still help in analysis by capturing both playing styles of teams and players, but more data and research are required to statistically prove its robustness. The vector's length of 32 dimensions makes it difficult for straightforward observation or comparison, necessitating the intervention of sports scientists to link the outcomes of real performance and the practical influence on players' play styles.

In [16]:
resp = query_engine.query("The context you have right now is my Master's thesis. I am applying to a university for a PhD in sports analytics in football. Using the context, give an abstract of the thesis.")

display(Markdown(str(resp)))



This thesis explores the application of reinforcement learning algorithms in football team selection and player replacement strategies. The study aims to address the challenges of team selection and player replacement in football, which are complex and multifaceted problems. The proposed approach involves the development of a decision-making framework that utilizes historical match data and machine learning techniques to predict the expected outcome of a match based on the selected team and the replacement of injured or suspended players. The framework is evaluated using real-world match data from the English Premier League, and the results demonstrate the potential of reinforcement learning algorithms in improving team selection and player replacement strategies in football. The study also highlights the importance of contextual information, such as team form, player fitness, and opposition analysis, in decision-making processes. Overall, this thesis contributes to the growing body of research in sports analytics and provides insights into the application of reinforcement learning algorithms in football team selection and player replacement strategies.