# MLB Editorial Crew: Mixture of Agents with Crew AI and Groq

In this notebook, we will showcase the concept of [Mixture of Agents (MoA)](https://arxiv.org/pdf/2406.04692) using [Crew AI](https://www.crewai.com/) and the [Groq API](https://console.groq.com/playground). The Mixture of Agents approach uses multiple AI agents, each backed by a different language model, to complete a task collaboratively. Combining the strengths and perspectives of several models can yield a more robust and nuanced result. In our project, multiple MLB Writer agents, each using a different language model (`llama3-8b-8192`, `gemma2-9b-it`, and `mixtral-8x7b-32768`), will independently generate game recap articles from game data collected by other Crew AI agents. An MLB Editor agent then aggregates these diverse outputs, synthesizing the best elements of each article into a final, polished game recap. The process demonstrates collaborative AI in practice and shows how integrating multiple models can improve the quality of generated content.

### Setup

In [1]:
# Import packages
import os
import statsapi
from datetime import date, datetime, timedelta
import pandas as pd
import numpy as np
from crewai_tools import tool
from crewai import Agent, Task, Crew, Process
from langchain_groq import ChatGroq

We will configure multiple LLMs using [LangChain Groq](https://python.langchain.com/v0.2/docs/integrations/chat/groq/). All of them authenticate with a Groq API key, which you can create [here](https://console.groq.com/keys); `ChatGroq` reads it from the `GROQ_API_KEY` environment variable. The models are two sizes of Meta's Llama 3 (70B and 8B), Google's Gemma 2 (9B), and Mistral AI's Mixtral 8x7B. The 70B Llama model powers the researcher, statistician, and editor agents, while each of the three smaller models powers one writer agent, so the writers produce diverse drafts of the MLB game recap.
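A small, optional pre-flight check makes a missing key fail early with a readable message. `require_groq_api_key` is a hypothetical helper, not part of `langchain_groq`; it only assumes the key lives in the `GROQ_API_KEY` environment variable, which is where `ChatGroq` looks by default.

```python
import os


def require_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the Groq API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Create a key at https://console.groq.com/keys "
            f"and export it, e.g. `export {env_var}=...`."
        )
    return key
```

Calling `require_groq_api_key()` once before constructing the `ChatGroq` clients surfaces configuration problems before any agent starts working.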

In [3]:
llm_llama70b = ChatGroq(model_name="llama3-70b-8192")   # researcher, statistician, editor
llm_llama8b = ChatGroq(model_name="llama3-8b-8192")     # writer 1
llm_gemma2 = ChatGroq(model_name="gemma2-9b-it")        # writer 2
llm_mixtral = ChatGroq(model_name="mixtral-8x7b-32768") # writer 3

### Define Tools

First, we define specialized tools that some of the agents use to gather and process MLB game data. The tools fetch game information and player box scores through the [MLB-Stats API](https://github.com/toddrob99/MLB-StatsAPI). Agents equipped with these tools can call them with details from the user prompt (such as the game date and team) and ground the MLB game recaps in accurate, up-to-date external data.

- **get_game_info**: Fetches high-level information about an MLB game, including teams, scores, and key players.
- **get_batting_stats**: Retrieves detailed player batting statistics for a specified MLB game.
- **get_pitching_stats**: Retrieves detailed player pitching statistics for a specified MLB game.

For more information on tool use/function calling with Groq, check out this [Function Calling 101 cookbook post](https://github.com/groq/groq-api-cookbook/blob/main/tutorials/function-calling-101-ecommerce/Function-Calling-101-Ecommerce.ipynb).
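The verbose log later in this notebook shows an agent emitting `Action:` and `Action Input:` lines when it decides to call a tool. As a simplified illustration only (not CrewAI's actual implementation), a dispatcher could parse those lines and call the matching Python function from a registry. `dispatch_tool_call`, `TOOLS`, and the stub `fake_get_game_info` are all hypothetical.

```python
import json
from typing import Callable, Dict


def fake_get_game_info(game_date: str, team_name: str) -> str:
    """Hypothetical stand-in for the real tool; returns a canned string."""
    return f"Game info for {team_name} on {game_date}"


# Registry mapping tool names (as the model writes them) to Python callables
TOOLS: Dict[str, Callable[..., str]] = {"get_game_info": fake_get_game_info}


def dispatch_tool_call(model_output: str) -> str:
    """Parse 'Action:' / 'Action Input:' lines and invoke the named tool."""
    action, action_input = None, None
    for line in model_output.splitlines():
        line = line.strip()
        if line.startswith("Action Input:"):
            action_input = line[len("Action Input:"):].strip()
        elif line.startswith("Action:"):
            action = line[len("Action:"):].strip()
    if action is None or action_input is None:
        raise ValueError("No tool call found in model output")
    if action not in TOOLS:
        raise KeyError(f"Unknown tool: {action}")
    kwargs = json.loads(action_input)  # arguments arrive as a JSON object
    return TOOLS[action](**kwargs)
```

The tool's docstring matters in this loop: it is what tells the model which arguments to put in the JSON object.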

In [4]:
@tool
def get_game_info(game_date: str, team_name: str) -> str:
    """Gets high-level information on an MLB game.
    
    Params:
    game_date: The date of the game of interest, in the form "yyyy-mm-dd". 
    team_name: MLB team name. Both full name (e.g. "New York Yankees") or nickname ("Yankees") are valid. If multiple teams are mentioned, use the first one
    """
    sched = statsapi.schedule(start_date=game_date, end_date=game_date)
    if not sched:
        return f"No MLB games found on {game_date}."
    sched_df = pd.DataFrame(sched)
    # Keep games whose summary mentions the team (case-insensitive literal match)
    game_info_df = sched_df[sched_df['summary'].str.contains(team_name, case=False, na=False, regex=False)]
    if game_info_df.empty:
        return f"No game found for '{team_name}' on {game_date}."

    # If several games match (e.g. a doubleheader), use the first one
    game = game_info_df.iloc[0]

    game_info = f'''
        Game ID: {game["game_id"]}
        Home Team: {game["home_name"]}
        Home Score: {game["home_score"]}
        Away Team: {game["away_name"]}
        Away Score: {game["away_score"]}
        Winning Team: {game.get("winning_team", "N/A")}
        Series Status: {game["series_status"]}
    '''

    return game_info


@tool 
def get_batting_stats(game_id: str) -> str:
    """Gets player boxscore batting stats for a particular MLB game
    
    Params:
    game_id: The 6-digit ID of the game
    """
    boxscores=statsapi.boxscore_data(game_id)
    player_info_df = pd.DataFrame(boxscores['playerInfo']).T.reset_index()

    away_batters_box = pd.DataFrame(boxscores['awayBatters']).iloc[1:]
    away_batters_box['team_name'] = boxscores['teamInfo']['away']['teamName']

    home_batters_box = pd.DataFrame(boxscores['homeBatters']).iloc[1:]
    home_batters_box['team_name'] = boxscores['teamInfo']['home']['teamName']

    batters_box_df = pd.concat([away_batters_box, home_batters_box]).merge(player_info_df, left_on = 'name', right_on = 'boxscoreName')
    return str(batters_box_df[['team_name','fullName','position','ab','r','h','hr','rbi','bb','sb']])


@tool 
def get_pitching_stats(game_id: str) -> str:
    """Gets player boxscore pitching stats for a particular MLB game
    
    Params:
    game_id: The 6-digit ID of the game
    """
    boxscores=statsapi.boxscore_data(game_id)
    player_info_df = pd.DataFrame(boxscores['playerInfo']).T.reset_index()

    away_pitchers_box = pd.DataFrame(boxscores['awayPitchers']).iloc[1:]
    away_pitchers_box['team_name'] = boxscores['teamInfo']['away']['teamName']

    home_pitchers_box = pd.DataFrame(boxscores['homePitchers']).iloc[1:]
    home_pitchers_box['team_name'] = boxscores['teamInfo']['home']['teamName']

    pitchers_box_df = pd.concat([away_pitchers_box,home_pitchers_box]).merge(player_info_df, left_on = 'name', right_on = 'boxscoreName')
    return str(pitchers_box_df[['team_name','fullName','ip','h','r','er','bb','k','note']])
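To see how `get_game_info` picks a game, here is a pandas-free sketch of the same matching rule: a case-insensitive substring match against each game's `summary`, keeping the first hit. `find_game` is hypothetical, and the schedule rows are made-up examples that carry only the two keys used here (real `statsapi.schedule()` rows have many more).

```python
from typing import Optional


def find_game(schedule: list[dict], team_name: str) -> Optional[dict]:
    """Return the first game whose summary mentions team_name (case-insensitive)."""
    needle = team_name.lower()
    for game in schedule:
        if needle in (game.get("summary") or "").lower():
            return game
    return None  # no game for that team on that date


# Made-up rows for illustration only
schedule = [
    {"game_id": 1, "summary": "Boston Red Sox @ Chicago White Sox (Final)"},
    {"game_id": 2, "summary": "New York Yankees @ Baltimore Orioles (Final)"},
]
```

Because the rule is a plain substring test, `find_game(schedule, "Sox")` returns the first row even though two teams match, which is why the tool's docstring asks for a specific team name.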


### Define Agents

In CrewAI, agents are autonomous entities designed to perform specific roles and achieve particular goals. Each agent uses a language model (LLM) and may have specialized tools to help execute tasks. Here are the agents in this demonstration:

- **MLB Researcher**: Uses the `get_game_info` tool to gather high-level game information.
- **MLB Statistician**: Retrieves player batting and pitching boxscore stats using the `get_batting_stats` and `get_pitching_stats` tools.
- **MLB Writers**: Three agents, each backed by a different LLM (Llama 3 8B, Gemma 2 9B, Mixtral 8x7B), that independently write game recap articles.
- **MLB Editor**: Combines the writers' articles into the final game recap.
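The roster above can also be written down as plain data. This sketch uses a hypothetical `AgentSpec` dataclass (not part of CrewAI) to make the Mixture of Agents structure explicit: the three writers share a role and have no tools, and they differ only in the model behind them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    role: str
    model: str
    tools: tuple[str, ...] = ()


ROSTER = [
    AgentSpec("MLB Researcher", "llama3-70b-8192", ("get_game_info",)),
    AgentSpec("MLB Statistician", "llama3-70b-8192", ("get_batting_stats", "get_pitching_stats")),
    AgentSpec("MLB Writer", "llama3-8b-8192"),
    AgentSpec("MLB Writer", "gemma2-9b-it"),
    AgentSpec("MLB Writer", "mixtral-8x7b-32768"),
    AgentSpec("MLB Editor", "llama3-70b-8192"),
]

# The proposer layer of the MoA: same role, different underlying models
writers = [agent for agent in ROSTER if agent.role == "MLB Writer"]
```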

#### Mixture of Agents

In this demo, the MLB Researcher and MLB Statistician agents use tool calling to gather the input data, but the Mixture of Agents framework itself consists of the three MLB Writer agents and the MLB Editor. This makes our MoA architecture a simple two-layer design: three writers propose drafts, and the editor aggregates them. More complex architectures, such as stacking several proposer layers, could improve the output further:

![Mixture of Agents diagram](mixture_of_agents_diagram.png)
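Stripped of CrewAI and the LLM calls, the layered MoA pattern is a short loop. In this minimal sketch, proposers and the aggregator are plain Python callables standing in for models, and all names are hypothetical: each layer's proposers see the prompt plus the previous layer's outputs, and the aggregator combines the last layer's responses. With one proposer layer of three writers and an editor as aggregator, it reduces to the two-layer design used here.

```python
from typing import Callable, Sequence

# A "model" here is any function from (prompt, prior responses) to text
Model = Callable[[str, Sequence[str]], str]


def mixture_of_agents(prompt: str, layers: Sequence[Sequence[Model]], aggregator: Model) -> str:
    """Run each proposer layer in turn, then aggregate the final layer's outputs."""
    previous: list[str] = []
    for layer in layers:
        # Every proposer in a layer sees the same prompt and the previous layer's outputs
        previous = [model(prompt, previous) for model in layer]
    return aggregator(prompt, previous)


def make_writer(name: str) -> Model:
    """Toy proposer standing in for one writer LLM."""
    return lambda prompt, prior: f"{name} recap of: {prompt}"


def editor(prompt: str, drafts: Sequence[str]) -> str:
    """Toy aggregator standing in for the editor LLM."""
    return f"Edited from {len(drafts)} drafts"


toy_result = mixture_of_agents(
    "Yankees @ Orioles",
    layers=[[make_writer("llama"), make_writer("gemma"), make_writer("mixtral")]],
    aggregator=editor,
)
```

Adding a second list to `layers` is all it takes to try a deeper architecture in this toy model.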

In [5]:
mlb_researcher = Agent(
    llm=llm_llama70b,
    role="MLB Researcher",
    goal="Identify and return info for the MLB game related to the user prompt by returning the exact results of the get_game_info tool",
    backstory="An MLB researcher that identifies games for statisticians to analyze stats from",
    tools=[get_game_info],
    verbose=True,
    allow_delegation=False
)

mlb_statistician = Agent(
    llm=llm_llama70b,
    role="MLB Statistician",
    goal="Retrieve player batting and pitching stats for the game identified by the MLB Researcher",
    backstory="An MLB Statistician analyzing player boxscore stats for the relevant game",
    tools=[get_batting_stats, get_pitching_stats],
    verbose=True,
    allow_delegation=False
)

mlb_writer_llama = Agent(
    llm=llm_llama8b,
    role="MLB Writer",
    goal="Write a detailed game recap article using the provided game information and stats",
    backstory="An experienced and honest writer who does not make things up",
    tools=[],  # The writer does not need additional tools
    verbose=True,
    allow_delegation=False
)

mlb_writer_gemma = Agent(
    llm=llm_gemma2,
    role="MLB Writer",
    goal="Write a detailed game recap article using the provided game information and stats",
    backstory="An experienced and honest writer who does not make things up",
    tools=[],  # The writer does not need additional tools
    verbose=True,
    allow_delegation=False
)

mlb_writer_mixtral = Agent(
    llm=llm_mixtral,
    role="MLB Writer",
    goal="Write a detailed game recap article using the provided game information and stats",
    backstory="An experienced and honest writer who does not make things up",
    tools=[],  # The writer does not need additional tools
    verbose=True,
    allow_delegation=False
)

mlb_editor = Agent(
    llm=llm_llama70b,
    role="MLB Editor",
    goal="Edit multiple game recap articles to create the best final product.",
    backstory="An experienced editor that excels at taking the best parts of multiple texts to create the best final product",
    tools=[],  # The editor does not need additional tools
    verbose=True,
    allow_delegation=False
)

### Define Tasks

Tasks in Crew AI are specific assignments given to agents, detailing the actions they need to perform to achieve a particular goal. Tasks can have dependencies and context. Here, we define several tasks that our agents will execute to generate and refine an MLB game recap.

- **collect_game_info**: This task gathers high-level information about an MLB game based on the user prompt. It is handled by the MLB Researcher agent and uses the `get_game_info` tool. This task must be completed before any other tasks can begin.

- **retrieve_batting_stats**: This task retrieves detailed player batting stats for the identified game. It depends on `collect_game_info` and is handled by the MLB Statistician agent. It does not depend on `retrieve_pitching_stats`, so the two could in principle run in parallel, but here they run sequentially (see the note below).

- **retrieve_pitching_stats**: This task retrieves detailed player pitching stats for the identified game. It also depends on `collect_game_info` and is handled by the MLB Statistician agent. Like `retrieve_batting_stats`, it could in principle run in parallel with its sibling but runs sequentially here.

- **write_game_recap_llama**, **write_game_recap_gemma**, **write_game_recap_mixtral**: Three separate tasks, each handled by a different MLB Writer agent (using Llama 3 8B, Gemma 2 9B, and Mixtral 8x7B). Each writer generates a game recap article from the collected data. These tasks depend on `collect_game_info`, `retrieve_batting_stats`, and `retrieve_pitching_stats`.

- **edit_game_recap**: This task involves synthesizing the best parts of the three game recap articles into a final, polished article. It is handled by the MLB Editor agent and depends on the completion of all previous tasks.

*Note: Since the CrewAI update of 7/29/24, `async_execution=True` no longer works with this Crew workflow, so all tasks here run sequentially.*
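The dependencies listed above form a small directed acyclic graph. As an illustration using Python's standard-library `graphlib` (Python 3.9+, unrelated to CrewAI), grouping tasks into stages whose prerequisites are all finished shows which tasks could, in principle, run concurrently, even though this workflow executes them one at a time. The editor lists the three writer tasks, which transitively covers every earlier task; `parallel_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

DATA_TASKS = {"collect_game_info", "retrieve_batting_stats", "retrieve_pitching_stats"}

# Each task maps to the tasks it depends on (mirrors the task list above)
DEPENDENCIES = {
    "collect_game_info": set(),
    "retrieve_batting_stats": {"collect_game_info"},
    "retrieve_pitching_stats": {"collect_game_info"},
    "write_game_recap_llama": set(DATA_TASKS),
    "write_game_recap_gemma": set(DATA_TASKS),
    "write_game_recap_mixtral": set(DATA_TASKS),
    "edit_game_recap": {"write_game_recap_llama", "write_game_recap_gemma", "write_game_recap_mixtral"},
}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks within a stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

The result has four stages: game info, the two stats tasks, the three writers, and the editor, which matches the two-layer MoA on top of a two-step data pipeline.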

In [6]:
collect_game_info = Task(
    description='''
    Identify the correct game related to the user prompt and return game info using the get_game_info tool. 
    Unless a specific date is provided in the user prompt, use {default_date} as the game date
    User prompt: {user_prompt}
    ''',
    expected_output='High-level information of the relevant MLB game',
    agent=mlb_researcher
)

retrieve_batting_stats = Task(
    description='Retrieve ONLY boxscore batting stats for the relevant MLB game',
    expected_output='A table of batting boxscore stats',
    agent=mlb_statistician,
    dependencies=[collect_game_info],
    context=[collect_game_info]
)

retrieve_pitching_stats = Task(
    description='Retrieve ONLY boxscore pitching stats for the relevant MLB game',
    expected_output='A table of pitching boxscore stats',
    agent=mlb_statistician,
    dependencies=[collect_game_info],
    context=[collect_game_info]
)

write_game_recap_llama = Task(
    description='''
    Write a game recap article using the provided game information and stats.
    Key instructions:
    - Include things like final score, top performers and winning/losing pitcher.
    - Use ONLY the provided data and DO NOT make up any information, such as specific innings when events occurred, that isn't explicitly from the provided input.
    - Do not print the box score
    ''',
    expected_output='An MLB game recap article',
    agent=mlb_writer_llama,
    dependencies=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats],
    context=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats]
)

write_game_recap_gemma = Task(
    description='''
    Write a game recap article using the provided game information and stats.
    Key instructions:
    - Include things like final score, top performers and winning/losing pitcher.
    - Use ONLY the provided data and DO NOT make up any information, such as specific innings when events occurred, that isn't explicitly from the provided input.
    - Do not print the box score
    ''',
    expected_output='An MLB game recap article',
    agent=mlb_writer_gemma,
    dependencies=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats],
    context=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats]
)

write_game_recap_mixtral = Task(
    description='''
    Write a succinct game recap article using the provided game information and stats.
    Key instructions:
    - Structure with the following sections:
          - Introduction (game result, winning/losing pitchers, top performer on the winning team)
          - Other key performers on the winning team
          - Key performers on the losing team
          - Conclusion (including series result)
    - Use ONLY the provided data and DO NOT make up any information, such as specific innings when events occurred, that isn't explicitly from the provided input.
    - Do not print the box score or write out the section names
    ''',
    expected_output='An MLB game recap article',
    agent=mlb_writer_mixtral,
    dependencies=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats],
    context=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats]
)

edit_game_recap = Task(
    description='''
    You will be provided three game recap articles from multiple writers. Take the best of
    all three to output the optimal final article.
    
    Pay close attention to the original instructions:

    Key instructions:
        - Structure with the following sections:
          - Introduction (game result, winning/losing pitchers, top performer on the winning team)
          - Other key performers on the winning team
          - Key performers on the losing team
          - Conclusion (including series result)
        - Use ONLY the provided data and DO NOT make up any information, such as specific innings when events occurred, that isn't explicitly from the provided input.
        - Do not print the box score or write out the section names

    It is especially important that the final product contains no false information, such as the inning
    in which an event occurred. If a piece of information is present in one article and not the others, it is probably false.
    ''',
    expected_output='An MLB game recap article',
    agent=mlb_editor,
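    dependencies=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats,
                  write_game_recap_llama, write_game_recap_gemma, write_game_recap_mixtral],
    # context must include the three drafts, or the editor never sees the articles it is asked to merge
    context=[collect_game_info, retrieve_batting_stats, retrieve_pitching_stats,
             write_game_recap_llama, write_game_recap_gemma, write_game_recap_mixtral]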
)

### Define and Execute Crew

In this final step, we define and execute our Crew, which consists of the agents and tasks we've set up. Because no `process` is specified, the Crew uses CrewAI's default sequential process: it runs the tasks in the order listed and passes each task the outputs of the tasks named in its `context`. Calling `kickoff()` starts the workflow; its `inputs` dictionary fills the `{user_prompt}` and `{default_date}` placeholders in the task descriptions, and the Crew handles the rest, from data collection to generating and editing the final game recap.
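A simplified model of that sequential flow, assuming each task receives exactly the outputs of the tasks named in its `context`: run tasks in list order, store each output, and hand a task only its declared context. `ToyTask` and `run_sequential` are hypothetical and ignore everything an LLM would actually do; the point is that whatever a task needs to read, such as the editor needing the writers' drafts, has to appear in its `context`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    name: str
    run: Callable[[list[str]], str]  # receives the outputs of its context tasks
    context: list[str] = field(default_factory=list)


def run_sequential(tasks: list[ToyTask]) -> dict[str, str]:
    """Execute tasks in list order, passing each one only its declared context."""
    outputs: dict[str, str] = {}
    for task in tasks:
        missing = [name for name in task.context if name not in outputs]
        if missing:
            raise ValueError(f"{task.name} needs {missing}, which have not run yet")
        outputs[task.name] = task.run([outputs[name] for name in task.context])
    return outputs


toy_tasks = [
    ToyTask("collect_game_info", lambda ctx: "game info"),
    ToyTask("write_a", lambda ctx: f"draft A using {ctx}", ["collect_game_info"]),
    ToyTask("write_b", lambda ctx: f"draft B using {ctx}", ["collect_game_info"]),
    ToyTask("edit", lambda ctx: f"{len(ctx)} drafts merged", ["write_a", "write_b"]),
]
toy_outputs = run_sequential(toy_tasks)
```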

In [7]:
crew = Crew(
    agents=[mlb_researcher, mlb_statistician, mlb_writer_llama, mlb_writer_gemma, mlb_writer_mixtral, mlb_editor],
    tasks=[
        collect_game_info, 
        retrieve_batting_stats, retrieve_pitching_stats,
        write_game_recap_llama, write_game_recap_gemma, write_game_recap_mixtral,
        edit_game_recap
        ],
    verbose=False
)

In [8]:
user_prompt = 'Write a recap of the Yankees game on July 14, 2024'
default_date = datetime.now().date() - timedelta(1) # Set default date to yesterday in case no date is specified

result = crew.kickoff(inputs={"user_prompt": user_prompt, "default_date": str(default_date)})



> Entering new CrewAgentExecutor chain...
Thought: I need to identify the correct game related to the user prompt and return game info using the get_game_info tool. The user prompt specifies a date, July 14, 2024, so I will use that as the game date.

Action: get_game_info
Action Input: {"game_date": "2024-07-14", "team_name": "Yankees"}


        Game ID: 747009
        Home Team: Baltimore Orioles
        Home Score: 6
        Away Team: New York Yankees
        Away Score: 5
        Winning Team: Baltimore Orioles
        Series Status: NYY wins 2-1

Thought: I now know the final answer
Final Answer: 
        Game ID: 747009
        Home Team: Baltimore Orioles
        Home Score: 6
        Away Team: New York Yankees
        Away Score: 5
        Winning Team: Baltimore Orioles
        Series Status: NYY wins 2-1

> Finished chain.


> Entering new CrewAgentExecutor chain...
I understand the ta

As you can see from the output, each of our MLB Writer agents made mistakes in its game recap: hallucinating the sequence of events (such as the specific innings in which events occurred, which cannot be inferred from the data provided), analyzing the data incorrectly (Dean Kremer did not pick up the win), or formatting poorly (e.g., not including the series result in the conclusion). By drawing on all three articles, however, our editing agent resolves these issues and delivers a factual recap that follows all of our instructions.
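The editor prompt encodes a simple heuristic: a detail that appears in only one article is probably false. The LLM applies it in natural language, but the rule itself resembles a majority vote. As a toy illustration only, with each article reduced to a hypothetical set of placeholder claim labels (extracting claims from prose is out of scope), keeping claims supported by at least two of the three drafts looks like this; `consensus_claims` is a hypothetical helper.

```python
from collections import Counter


def consensus_claims(articles: list[set[str]], min_support: int = 2) -> set[str]:
    """Keep claims that appear in at least `min_support` of the articles."""
    counts = Counter(claim for article in articles for claim in article)
    return {claim for claim, n in counts.items() if n >= min_support}


# Placeholder claim labels, one set per writer's draft
drafts = [
    {"final score", "top performer", "inning detail"},
    {"final score", "top performer", "series result"},
    {"final score", "series result"},
]
kept = consensus_claims(drafts)
```

A strict vote would also drop a correct detail that only one writer happened to include, which is one reason the prompt says such information is *probably* false rather than certainly false.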

In [9]:
print(result)

The Baltimore Orioles edged the New York Yankees 6-5 to take the final game of the three-game series. The Orioles' winning pitcher was Craig Kimbrel, who earned his sixth win of the season despite blowing the save. The losing pitcher was Clay Holmes, who suffered his fourth loss of the season and also blew a save opportunity. The Orioles' top performer was Gunnar Henderson, who hit a home run and drove in two runs.

Gunnar Henderson's teammate, Anthony Santander, also had a strong game, hitting a home run and driving in a run. Cedric Mullins added two RBIs for the Orioles. On the Yankees' side, Trent Grisham had a standout game, hitting a home run and driving in two runs. Ben Rice also drove in three runs for the Yankees.

Despite the loss, the Yankees took the series 2-1.
