<a href="https://colab.research.google.com/github/rreimche/genai-exam/blob/main/SLRev.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prerequisites:
- Google Colab
- A Google Colab secret named "HF_API_TOKEN" containing a Hugging Face API token with read access.
- A Google Colab secret named "GROQ_KEY" containing a Groq API key (used by the tool-calling agents).

In [30]:
#@title Settings
from google.colab import userdata

download_dir = "papers"  # name of the directory the papers to review are downloaded to
huggingface_apikey = userdata.get('HF_API_TOKEN')  # use colab secrets to store the huggingface api key
groq_key = userdata.get('GROQ_KEY')  # use colab secrets to store the groq api key


In [31]:
#@title Declare bibtex references

bibtext_references = """

@misc{rasheed2024codeporilargescaleautonomoussoftware,
      title={CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology},
      author={Zeeshan Rasheed and Malik Abdul Sami and Kai-Kristian Kemell and Muhammad Waseem and Mika Saari and Kari Systä and Pekka Abrahamsson},
      year={2024},
      eprint={2402.01411},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2402.01411},
},
@misc{hong2024metagptmetaprogrammingmultiagent,
      title={MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework},
      author={Sirui Hong and Mingchen Zhuge and Jiaqi Chen and Xiawu Zheng and Yuheng Cheng and Ceyao Zhang and Jinlin Wang and Zili Wang and Steven Ka Shing Yau and Zijuan Lin and Liyang Zhou and Chenyu Ran and Lingfeng Xiao and Chenglin Wu and Jürgen Schmidhuber},
      year={2024},
      eprint={2308.00352},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2308.00352},
},


@misc{qian2024chatdevcommunicativeagentssoftware,
      title={ChatDev: Communicative Agents for Software Development},
      author={Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun},
      year={2024},
      eprint={2307.07924},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2307.07924},
},

@misc{zhang2024empoweringagilebasedgenerativesoftware,
      title={Empowering Agile-Based Generative Software Development through Human-AI Teamwork},
      author={Sai Zhang and Zhenchang Xing and Ronghui Guo and Fangzhou Xu and Lei Chen and Zhaoyuan Zhang and Xiaowang Zhang and Zhiyong Feng and Zhiqiang Zhuang},
      year={2024},
      eprint={2407.15568},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2407.15568},
},

@misc{nguyen2024agilecoderdynamiccollaborativeagents,
      title={AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology},
      author={Minh Huynh Nguyen and Thang Phan Chau and Phong X. Nguyen and Nghi D. Q. Bui},
      year={2024},
      eprint={2406.11912},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2406.11912},
},


"""

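A minimal sketch, assuming the arXiv IDs are only needed as plain strings: the `eprint` fields of a BibTeX string like the one above could be collected with a regular expression. The names `extract_eprint_ids` and `sample` are hypothetical and not part of the notebook, which installs bibtexparser for proper parsing.

```python
import re


def extract_eprint_ids(bibtex: str) -> list[str]:
    """Return the values of all `eprint={...}` fields, in order of appearance."""
    return re.findall(r"eprint\s*=\s*\{([^}]+)\}", bibtex)


# Shortened, hypothetical sample in the same shape as the references above.
sample = """
@misc{rasheed2024codepori,
      eprint={2402.01411},
      archivePrefix={arXiv},
},
@misc{hong2024metagpt,
      eprint={2308.00352},
      archivePrefix={arXiv},
},
"""

print(extract_eprint_ids(sample))  # ['2402.01411', '2308.00352']
```

Each returned ID could then be passed to the downloader as the `eprint_id`. A regular expression is enough for this fixed field layout, but it would not handle nested braces or other BibTeX quoting styles.
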
In [32]:
#@title Install dependencies
!pip install -q autogen arxiv scholarly crossrefapi beautifulsoup4 requests cloudscraper pymupdf nltk autogen-agentchat "autogen-ext[openai]" groq
!pip install -q --pre bibtexparser;

In [74]:
#@title Preparation for agents: model client connections
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core.tools import FunctionTool
from autogen_core.models import UserMessage
#from autogen_agentchat.ui import Console
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

# INSTANTIATE MODEL CLIENTS

# This client will be used for paper summarization
text_model_client = OpenAIChatCompletionClient(
    #model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    model="mistralai/Mistral-7B-Instruct-v0.3",
    base_url="https://router.huggingface.co/hf-inference/v1",
    api_key=huggingface_apikey,
    model_info={
        "vision": False,
        "function_calling": False,
        "json_output": False,
        "family": "unknown",
    },
)

# The agents that call tools (downloader, parser) need a second inference endpoint:
# the Hugging Face endpoint does not return tool calls in a form
# that the autogen agent understands, so Groq is used for them instead.
tool_model_client = OpenAIChatCompletionClient(
    model="llama3-70b-8192",
    base_url="https://api.groq.com/openai/v1",
    api_key=groq_key,
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": False,
        "family": "unknown",
    },
)


# CHECK CONNECTION

messages = [
    UserMessage(content="What is the capital of France?", source="user"),
]
response = await text_model_client.create(messages=messages)
response_downloader = await tool_model_client.create(messages=messages)

print(response.content)
print(response_downloader.content)

The capital of France is Paris. It's an iconic city known for landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Paris is located in the northern part of the country. It's not only a major cultural hub but also a significant center of economy, education, and politics in France and Europe.
The capital of France is Paris.


In [34]:
#@title Downloader agent
import arxiv
import os
import datetime
import json
import sys

# HELPER AND TOOL FUNCTIONS

def maybe_create_dir(directory: str) -> None:
    """
        A helper function to create the downloads directory
        in colab root if the directory is not yet present.
    """

    try:
        os.makedirs(directory, exist_ok=True)
        #print(f"Directory '{directory}' created successfully.")
    except OSError as e:
        raise Exception(f"Error creating directory '{directory}': {e}") from e

def download_paper(eprint_id: str) -> str:
    """
        Receives an eprint_id of a paper on arxiv, downloads the specified paper
        and saves it in the downloads directory.

        :param eprint_id: string, the arxiv eprint_id.
        :return: string with filepath of the downloaded paper.
        :raises Exception: if the paper is not found or the download fails.
    """

    assert len(eprint_id) > 0, "eprint_id cannot be empty"

    try:
        client = arxiv.Client()
        search = arxiv.Search(id_list=[eprint_id], max_results=1)

        # next() with a default returns None instead of raising StopIteration
        # when arXiv has no entry for this ID.
        arxiv_result = next(client.results(search), None)
    except Exception as e:
        raise Exception(f"ERROR: arXiv lookup failed for: {eprint_id}: {e}") from e

    if arxiv_result is None:
        raise Exception(f"ERROR: No arXiv result found for ID {eprint_id}.")

    try:
        maybe_create_dir(download_dir)
        # The file is named after the paper title, with spaces replaced by underscores.
        filepath = os.path.join(download_dir, f"{arxiv_result.title.replace(' ', '_')}.pdf")
        arxiv_result.download_pdf(filename=filepath)
    except Exception as e:
        raise Exception(f"ERROR: arXiv download failed for: {eprint_id}: {e}") from e

    return filepath


# DOWNLOADER AGENT WITH TOOL

downloader_tool = FunctionTool(download_paper, description="Given a string denoting an eprint_id of a page on arxiv, downloads a paper")

downloader_agent = AssistantAgent(
    name="downloader",
    model_client=tool_model_client,  # the tool-calling client defined above
    tools=[downloader_tool],
    reflect_on_tool_use=True,
    system_message="""
        Use downloader tool to download specified paper(s) from arxiv.
        Here is an example of how you must respond in success case:
          'mypapers/veryinterestingpaper.pdf'
        If download was not successful, respond like this:
          'ERROR: Download for paper with eprint id XXX failed for the following reason: REASON'.
      """,
)


# CHECK IF DOWNLOAD WORKS

async def downloader_run() -> None:
    response = await downloader_agent.on_messages(
        [TextMessage(content="Download the paper with eprint_id 2402.01411", source="user")],
        cancellation_token=CancellationToken(),
    )


    #print(response.inner_messages)
    #print(response.chat_message)
    print(response.chat_message.content)



# Use asyncio.run(downloader_run()) when running in a script.
await downloader_run()

The paper with eprint_id 2402.01411 has been successfully downloaded. The paper is titled "CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology" and has been saved as a PDF file in the "papers" directory.

Filename: CodePori:_Large-Scale_System_for_Autonomous_Software_Development_Using_Multi-Agent_Technology.pdf
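
The filename above keeps the colon from the paper title, which some file systems and tools do not accept. A minimal sketch of one possible way to derive a safer name, assuming plain ASCII filenames are acceptable; `safe_filename` is a hypothetical helper and is not used by the notebook:

```python
import re


def safe_filename(title: str, extension: str = ".pdf") -> str:
    """Build a filename from a title, keeping only ASCII letters, digits, dots and hyphens."""
    # Every run of other characters (spaces, colons, slashes, non-ASCII letters, ...)
    # collapses into a single underscore.
    stem = re.sub(r"[^A-Za-z0-9.\-]+", "_", title).strip("_")
    return f"{stem}{extension}"


print(safe_filename("CodePori: Large-Scale System"))  # CodePori_Large-Scale_System.pdf
```

Adopting such a helper in the downloader would change the saved path, so any hard-coded path used when testing the parser would need to change with it.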


In [57]:
#@title Parser agent
import pymupdf
#import nltk
#from nltk.tokenize import word_tokenize

#nltk.download('punkt_tab')

def parse_pdf(filepath: str) -> str:
    try:
        with pymupdf.open(filepath) as doc:
            text = ""
            for page in doc:
                text += page.get_text()
            # print(f"Successfully parsed: {title} (arXiv ID: {eprint_id})")
            return text

    except Exception as e:
        # print(f"Error parsing {title} ({filepath}): {e}")
        raise Exception(f"ERROR: Error parsing {filepath}: {e}")


parser_tool = FunctionTool(
    parse_pdf,
    description="A tool that takes a filepath of a pdf file, parses the file and returns the text"
  )

parser_agent = AssistantAgent(
    name="parser",
    model_client=tool_model_client,
    system_message="""
        For a given filepath of a scientific paper,
        you use the provided tool, which parses the paper and returns text.
        Then you return the text in your response.
      """,
    tools=[parser_tool]
)

# CHECK IF PARSING WORKS

test_paper_text = ""  # we'll need this to test the summarizer
async def parser_run() -> str:
    response = await parser_agent.on_messages(
        [TextMessage(content="Parse this paper: papers/CodePori:_Large-Scale_System_for_Autonomous_Software_Development_Using_Multi-Agent_Technology.pdf", source="user")],
        cancellation_token=CancellationToken(),
    )


    #print(response.inner_messages)
    #print(response.chat_message)
    text = response.chat_message.content
    #print(text)
    return text



# Use asyncio.run(parser_run()) when running in a script.
test_paper_text = await parser_run()
print(test_paper_text)

CodePori: Large-Scale System for Autonomous Software
Development Using Multi-Agent Technology
Zeeshan Rasheedb, Abdul Malik Samib,, Kai-Kristian Kemellb, Muhammad Waseemb, Mika
Saarib, Kari Syst¨ab, Pekka Abrahamssonb
aFaculty of Information Technology and Communication Science, Tampere University
bFaculty of Information Technology, Jyv¨askyl¨a University
Abstract
Context: Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs)
have transformed the field of Software Engineering (SE). Existing LLM-based multi-agent mod-
els have successfully addressed basic dialogue tasks. However, the potential of LLMs for more
challenging tasks, such as automated code generation for large and complex projects, has been
investigated in only a few existing works.
Objective: This paper aims to investigate the potential of LLM-based agents in the software
industry, particularly in enhancing productivity and reducing time-to-market for complex soft-
ware solutions. Our primary objective

In [78]:
#@title Summarizer agent


summarizer_agent = AssistantAgent(
    name="summarizer",
    model_client=text_model_client,
    system_message="""
        You are a qualified and experienced reviewer and critic of scientific papers.
        You will be given a paper text and you will summarize it as your response.
        You must fit the summarization in 300 words.
        It is very important to depict the key contributions of the paper.
    """
)

# CHECK IF SUMMARIZING WORKS

async def summarizer_run(paper_text: str) -> None:
    response = await summarizer_agent.on_messages(
        [TextMessage(content=paper_text, source="user")],
        cancellation_token=CancellationToken(),
    )

    #print(paper_text)
    #print(response.inner_messages)
    print(response.chat_message.models_usage)
    print(response.chat_message.content)



# Use asyncio.run(summarizer_run(test_paper_text)) when running in a script.
await summarizer_run(test_paper_text)


RequestUsage(prompt_tokens=23933, completion_tokens=243)
The paper "CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology" presents a novel system designed to automate code generation for large and complex software projects using large language models (LLMs). The system utilizes six LLM-based multi-agents, each responsible for a specific task in the software development process, such as system design, code development, code review, code verification, and test engineering. The paper discusses the potential benefits of such a system in enhancing productivity and reducing time-to-market for complex software solutions.

The proposed system, CodePori, was evaluated using the HumanEval benchmark and manually tested. The results show that CodePori improved code accuracy and efficiency by 89% and 85%, respectively, compared to existing systems. The paper also discusses the challenges faced by current systems in handling complex tasks and the potential be
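
The summarizer's system message asks for at most 300 words, but nothing in the notebook verifies that the model complies. A minimal sketch of such a check, assuming that whitespace-separated tokens are an adequate approximation of words; `within_word_limit` is a hypothetical helper:

```python
def within_word_limit(text: str, limit: int = 300) -> bool:
    """Return True if `text` has at most `limit` whitespace-separated words."""
    return len(text.split()) <= limit


print(within_word_limit("The capital of France is Paris."))  # True
print(within_word_limit("one two three four", limit=3))      # False
```

The check could be applied to the content of the summarizer's response before the summary is used further.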