# RAG system based on football articles

```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
OPEN_AI_API_KEY = os.getenv("OPEN_AI_API_KEY")
OPEN_AI_MODEL = os.getenv("OPEN_AI_MODEL")

# instantiate the OpenAI client
client = OpenAI(api_key=OPEN_AI_API_KEY)
```

## Creation of the knowledge base
We first need to upload our file, specifying its purpose (`"assistants"`, the value used for files that will be searched through a vector store). Once uploaded, the file can be referenced across various endpoints.

The uploaded file then has to be added to a [Vector Store](https://platform.openai.com/docs/api-reference/vector-stores), which splits it into smaller chunks and embeds them. The chunked-up file constitutes the knowledge base of our RAG system.
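The vector store performs the chunking on OpenAI's side, so the notebook never implements it. Purely to illustrate what "splitting into chunks" means, here is a minimal sketch of fixed-size chunking with overlap, the general idea behind the vector store's default chunking (documented as 800-token chunks with a 400-token overlap). The helper `chunk_words` is hypothetical and counts whitespace-separated words rather than model tokens, so it only approximates what the service does.

```python
def chunk_words(text, max_words=8, overlap=4):
    """Split text into overlapping windows of at most max_words words.

    Illustrative only: counts whitespace-separated words, not model tokens.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Twelve placeholder words: w1 ... w12
text = " ".join(f"w{i}" for i in range(1, 13))
for chunk in chunk_words(text, max_words=8, overlap=4):
    print(chunk)
# prints "w1 w2 w3 w4 w5 w6 w7 w8" then "w5 w6 w7 w8 w9 w10 w11 w12"
```

The overlap gives text near a window boundary a second chance to appear alongside its surrounding context in the neighbouring chunk, which helps retrieval when a key judgement in an article falls near a cut.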

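```python
def upload_file_to_vector_store(path):
    # upload a file with an "assistants" purpose, closing the handle afterwards
    with open(path, "rb") as f:
        file = client.files.create(file=f, purpose="assistants")
    # create a vector store
    vector_store = client.vector_stores.create(
        name="Some Fake Football Articles"
    )
    # add the file to the vector store and wait until it has been chunked and indexed,
    # so that queries issued right afterwards can already search it
    vs_file = client.vector_stores.files.create_and_poll(
        vector_store_id=vector_store.id,
        file_id=file.id
    )
    if vs_file.status != "completed":
        raise RuntimeError(f"File indexing did not complete: {vs_file.status}")

    # return the vector store (we'll need its ID)
    return vector_store

vector_store = upload_file_to_vector_store("data/fake_football_articles.json")
```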

## Build a very simple Q&A flow

Once our knowledge base is created, we can start interacting with our RAG system using the [Responses API](https://platform.openai.com/docs/api-reference/responses).

The function `generate_response()` takes the user's prompt as a parameter and returns an answer generated by the [model](https://platform.openai.com/docs/models) set in the `.env` file (`OPEN_AI_MODEL`), grounded in the knowledge base just created.

It does so by calling `client.responses.create()` ([official docs](https://platform.openai.com/docs/api-reference/responses/create)) with the `file_search` tool pointed at our vector store. The call can be extended to:
- limit how many chunks are retrieved for the response with the `max_num_results` field of the `file_search` tool. A lower number passes less loosely related context to the model, which may reduce hallucinations, at the cost of possibly leaving out relevant articles.
- add web search support by adding another tool of type `web_search_preview`.
- request additional output data with the `include` parameter. For example, `include=["file_search_call.results"]` returns the chunks retrieved to generate the answer, together with the relevance score assigned to each chunk.
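The three extensions above only change the arguments passed to `client.responses.create()`. As a sketch, a hypothetical helper `build_request` can assemble those keyword arguments as a plain dict, so each option can be toggled and inspected without calling the API. The field names (`max_num_results` on the `file_search` tool, the `web_search_preview` tool type, and `"file_search_call.results"` for `include`) follow the API reference at the time of writing and are worth double-checking against the current docs.

```python
def build_request(prompt, model, vector_store_id, instructions,
                  max_num_results=None, web_search=False, include_results=False):
    """Assemble keyword arguments for client.responses.create(**kwargs).

    Hypothetical helper: it builds a dict and does not call the API itself.
    """
    file_search = {"type": "file_search", "vector_store_ids": [vector_store_id]}
    if max_num_results is not None:
        # Cap how many retrieved chunks are passed to the model
        file_search["max_num_results"] = max_num_results
    tools = [file_search]
    if web_search:
        # Let the model also search the web
        tools.append({"type": "web_search_preview"})
    kwargs = {"model": model, "input": prompt, "instructions": instructions, "tools": tools}
    if include_results:
        # Ask the API to return the retrieved chunks and their scores
        kwargs["include"] = ["file_search_call.results"]
    return kwargs


# "my-model" and "vs_placeholder" are placeholders, not real identifiers
kwargs = build_request("Is Thierry Doumbia a good player?", model="my-model",
                       vector_store_id="vs_placeholder",
                       instructions="Summarise the articles.",
                       max_num_results=5, include_results=True)
print(kwargs["tools"])
# [{'type': 'file_search', 'vector_store_ids': ['vs_placeholder'], 'max_num_results': 5}]
```

With the notebook's objects in scope, it would be used as `client.responses.create(**build_request(prompt, OPEN_AI_MODEL, vector_store.id, instructions, max_num_results=5, include_results=True))`, where `instructions` stands for the instruction string used in `generate_response()`. Keeping the request as data makes it easy to compare answers across different settings.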

```python
def generate_response(prompt):
    response = client.responses.create(
        model=OPEN_AI_MODEL,
        input=prompt,
        instructions="You need to perform a file search of the provided document, and summarise all the articles you can find about the player provided by the user, if there's any (otherwise, simply say 'No articles on the provided player'). Focus on strengths and weaknesses, as well as the evolution of the judgement over time in case of multiple articles.",
        tools=[
            {
                "type": "file_search",
                "vector_store_ids": [vector_store.id],  # vector store containing our file
            },
        ],
    )
    # output_text concatenates the text of every assistant message in the output,
    # skipping non-text parts (e.g. refusals) that have no .text attribute
    return response.output_text
```
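`generate_response()` returns only the answer text. When the request also asks for `file_search_call.results`, the retrieved chunks come back in a separate `file_search_call` item of `response.output`. Below is a minimal sketch of reading both from the plain-dict form of the response (for example `response.model_dump()`), assuming the output item shapes described in the Responses API reference. `split_output` is a hypothetical helper and the sample data is made up.

```python
def split_output(response_dict):
    """Return (answer_text, retrieved_chunks) from a Responses API result given as a dict."""
    texts, chunks = [], []
    for item in response_dict.get("output", []):
        if item.get("type") == "message":
            # Keep only text parts; refusals and other part types are skipped
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
        elif item.get("type") == "file_search_call":
            # "results" is None unless include=["file_search_call.results"] was requested
            for hit in item.get("results") or []:
                chunks.append((hit.get("filename"), hit.get("score"), hit.get("text")))
    return " ".join(texts), chunks


# Made-up response shaped like response.model_dump(), for illustration only
sample = {
    "output": [
        {"type": "file_search_call",
         "results": [{"filename": "fake_football_articles.json", "score": 0.87, "text": "chunk text"}]},
        {"type": "message",
         "content": [{"type": "output_text", "text": "Summary of the articles."}]},
    ]
}
answer, chunks = split_output(sample)
print(answer)  # Summary of the articles.
print(chunks)  # [('fake_football_articles.json', 0.87, 'chunk text')]
```

Printing the chunks next to the answer is a quick way to check whether a summary is actually backed by the retrieved articles.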

```python
response = generate_response("Is Thierry Doumbia a good player?")
```

```python
print(response)
```

### Evaluation of Thierry Doumbia

**Strengths:**
1. **Acceleration:** Doumbia has remarkable speed, making him a significant threat to defenders. His acceleration is highlighted as a key aspect of his game, which allows him to create opportunities and exploit weaknesses in defensive setups.
   
2. **Potential for Development:** There is interest from higher leagues, suggesting that clubs recognize his raw pace as an asset, especially in more competitive environments like the Zandora League.

**Weaknesses:**
1. **Technical Limitations:** Despite his athleticism, Doumbia lacks technical finesse, which results in turnovers when he is pressured in tight spaces. This deficiency may limit his effectiveness, especially when facing more organized defenses.

2. **Lack of Tactical Discipline:** As Doumbia is considered for moves to larger clubs, some scouts express concerns about his tactical awareness, indicating that he may need to improve in that area to succeed at a higher level.

### Overa

```python
response = generate_response("What aspects of the game should Marco Bianchi work on?")
```

```python
print(response)
```

Marco Bianchi has been the subject of several articles that highlight both his strengths and areas for improvement:

1. **Strengths**:
   - **Passing Accuracy and Tempo Control**: Bianchi is praised for his passing accuracy and his ability to dictate the tempo of games, making him a pivotal player for his team, Vercelli United. His style has even drawn comparisons to classic deep-lying playmakers.
   - **Vision**: His vision on the field is regarded as impressive, allowing him to create opportunities and manage plays effectively.

2. **Weaknesses**:
   - **Defensive Work Rate**: Critics have raised concerns about Bianchi's defensive contributions. His work rate in defense is often questioned, as opponents have been able to exploit the spaces he leaves behind.
   - **Athleticism**: Some analysts feel that Bianchi lacks the necessary athleticism to compete at the highest levels of football. This concern is particularly pertinent with suggestions that a move abroad could expose these weak