<a href="https://colab.research.google.com/github/hellomomiji/info7374-llm/blob/main/Assignment5_yang_jiang.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Agents are an emerging field that uses reflection, tools, planning, and multi-agent collaboration.

In this assignment, we will build a research agent. We will use serverless LLM endpoints. To get started, create an account with Together AI or Anthropic. They should provide you with a few dollars' worth of credits, which should be enough to complete the assignment. You are free to choose any other provider such as OpenAI, Mistral, Fireworks, or Groq. I encourage you to play around with different models to get a feel for how they work. For this assignment, the API usage cost should be at most a couple of dollars; depending on the model you choose and how many attempts you make, it may be only a couple of cents. For OpenAI, Anthropic, and Mistral, double-check which model you are using. The flagship models are significantly more expensive than the smaller models (pricing between models can vary by 50x). For the purposes of this assignment, it is sufficient to use the smallest/cheapest models.

Together AI, Fireworks, and Groq run open-source models. For these, it’s better to run the mid-to-large tier models. Mixtral is a good place to start. It’s good to play around with different models.

Many providers are compatible with OpenAI's client library, but some (like Anthropic) are not. Use whatever makes sense.

You can run this on Colab with a CPU, or locally and submit the Jupyter notebook as your submission. Since we are using third party providers for the LLMs, we will not load the model locally. If you run on Colab, take special care to not leak your API key. Here’s an example of how to properly use secrets in Colab.
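As a minimal sketch of keeping the key out of the notebook source, the helper below reads a secret from Colab when available and otherwise falls back to an environment variable, so the same notebook also runs locally. The function name `load_api_key` is hypothetical, and the fallback assumes the environment variable has the same name as the Colab secret.

```python
import os

def load_api_key(name):
    """Return the secret called `name` from Colab secrets, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        return userdata.get(name)
    except Exception:
        # Not on Colab, or the secret is missing or not shared with this notebook:
        # fall back to an environment variable with the same name
        key = os.environ.get(name)
        if key is None:
            raise RuntimeError(f"No secret or environment variable named {name!r}")
        return key
```

Either way the key never appears as a literal in a cell, so it is not saved in the notebook or its outputs.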

## Research Agent
Build an LLM-based research agent that can take a research topic, find relevant information, and generate a short summary (~1 paragraph) on the given topic.



In [None]:
!pip install together



In [None]:
from google.colab import userdata
from together import Together
import openai

togetherai_api_key = userdata.get('togetherai')

client = Together(api_key=togetherai_api_key)

# client = openai.OpenAI(
#     api_key=togetherai_api_key,
#     base_url="https://api.together.xyz/v1"
# )

In [None]:
# Client wrapper
def get_response(model, sys_prompt, user_prompt):
  response = client.chat.completions.create(
      model=model,
      messages=[
          {
              "role": "system",
              "content": sys_prompt
          },
          {
              "role": "user",
              "content": user_prompt
          }
      ],
      max_tokens=512,
      stop=None,
      temperature=0.7
  )
  return response


## Tools to Implement (20 points, 4 points each):



1. Topic Breakdown Tool: Create a tool that takes a broad research topic and breaks it down into smaller, more focused subtopics or subqueries. You can use an LLM to generate these subtopics based on the main topic.


In [None]:
# Topic Breakdown Tool

model = "mistralai/Mistral-7B-Instruct-v0.3"

topic_breakdown_sys = "You are a research assistant doing research on AI and Machine Learning."
topic_breakdown_user = "Generate five concise and more focused subtopics for this research topic: {topic}."

def topic_breakdown(topic):
  # Ask the LLM to split the broad topic into focused subtopics
  response = get_response(
      model=model,
      sys_prompt=topic_breakdown_sys,
      user_prompt=topic_breakdown_user.format(topic=topic)
  )

  subtopics = response.choices[0].message.content
  return subtopics

In [None]:
# topic_breakdown("LLM")

' 1. Development and analysis of language modeling techniques in LLM: This subtopic will focus on the exploration of various language modeling methods, their applications, and comparative analysis to improve the efficiency and effectiveness of language generation in LLM.\n\n2. Advancements in LLM training strategies: This subtopic will delve into the latest strategies for training large language models, including data selection, preprocessing, and optimization techniques, with a focus on reducing training time and improving model performance.\n\n3. Evaluation and benchmarking of LLM performance: This subtopic will focus on the development and implementation of suitable evaluation metrics and benchmark datasets to assess the performance of large language models in various aspects such as fluency, accuracy, and generalization.\n\n4. Exploring applications of LLM in real-world scenarios: This subtopic will examine the practical applications of large language models in areas like natural l
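The tool returns the subtopics as one numbered string. If later steps need them individually, a small parser can split that string into a list. This is a sketch that assumes the model keeps each item on one line starting with `N.` or `N)`, as in the output above; `parse_numbered_list` is a hypothetical helper name, and any continuation lines of a multi-line item are dropped.

```python
import re

def parse_numbered_list(text):
    """Split '1. foo\\n\\n2. bar' style LLM output into ['foo', 'bar']."""
    items = []
    for line in text.splitlines():
        # Optional indent, a number, '.' or ')', then the item text
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items

sample = " 1. Alpha: first.\n\n2. Beta: second.\nnot an item"
subtopic_list = parse_numbered_list(sample)
```

Lines that do not start with a number, including blank lines, are ignored, so a preamble such as "Here are five subtopics:" does not end up in the list.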

2. Query Expansion Tool: Develop a tool to expand the subqueries generated by the Topic Breakdown Tool. The tool should generate related keywords, synonyms, and phrases to enhance the search results.



In [None]:
# Query Expansion Tool

query_expansion_sys = "You are a research assistant doing research on AI and Machine Learning."
query_expansion_user = "Expand each subtopic {subtopics} and generate five keywords, synonyms and phrases following this format: {subtopics}\n [keywords]: []"

def query_expansion(subtopics):
  # Ask the LLM for related keywords, synonyms, and phrases per subtopic
  response = get_response(
      model=model,
      sys_prompt=query_expansion_sys,
      user_prompt=query_expansion_user.format(subtopics=subtopics)
  )

  expanded_queries = response.choices[0].message.content
  return expanded_queries

In [None]:
# query_expansion(""" 1. Development and analysis of language modeling techniques in LLM: This subtopic will focus on the exploration of various language modeling methods, their applications, and comparative analysis to improve the efficiency and effectiveness of language generation in LLM.
# 2. Advancements in LLM training strategies: This subtopic will delve into the latest strategies for training large language models, including data selection, preprocessing, and optimization techniques, with a focus on reducing training time and improving model performance.
# 3. Evaluation and benchmarking of LLM performance: This subtopic will focus on the development and implementation of suitable evaluation metrics and benchmark datasets to assess the performance of large language models in various aspects such as fluency, accuracy, and generalization.
# 4. Exploring applications of LLM in real-world scenarios: This subtopic will examine the practical applications of large language models in areas like natural language understanding, text generation, and conversational AI systems, as well as their potential impact on industries such as healthcare, education, and customer service.
# 5. Ethical, social, and legal implications of LLM: This subtopic will address the ethical, social, and legal considerations surrounding the development and deployment of large language models""")

' 1. Language Modeling Methods: Neural Networks, Transformers, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Generative Adversarial Networks (GAN), Transfer Learning, Fine-tuning, BERT, GPT, T5, Language Model Perplexity\n2. LLM Training Strategies: Data Augmentation, Transfer Learning, Data Parallelism, Gradient Checkpointing, Mixed Precision Training, Batch Normalization, Learning Rate Scheduling, Federated Learning, Meta-Learning\n3. Evaluation Metrics for LLM: Perplexity, BLEU, ROUGE, METEOR, BERTScore, Automatic Speech Recognition (ASR), Machine Translation (MT) Evaluation metrics, Human Evaluation\n4. Real-world Applications of LLM: Chatbots, Virtual Assistants, Voice Assistants, Natural Language Understanding, Text Summarization, Sentiment Analysis, Machine Translation, Question Answering, Text Generation for Creative Writing, Personalized Content Generation\n5. Ethical, Social, and Legal Implications of LL'
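The expanded output is a single long string, which is unwieldy as one search query. One option is to turn each `N. Title: kw1, kw2, ...` line into its own short query. The sketch below assumes that line format (it matches the output above but is not guaranteed by the model); `build_search_queries` and `max_keywords` are hypothetical names.

```python
import re

def build_search_queries(expanded, max_keywords=3):
    """Turn 'N. Title: kw1, kw2, ...' lines into one short query string per subtopic."""
    queries = []
    for line in expanded.splitlines():
        match = re.match(r"\s*\d+\.\s*([^:]+):\s*(.*)", line)
        if not match:
            continue  # skip blank lines and lines without a 'Title:' part
        title = match.group(1).strip()
        keywords = [k.strip() for k in match.group(2).split(",") if k.strip()]
        # Keep the title plus only the first few keywords to keep the query focused
        queries.append(" ".join([title] + keywords[:max_keywords]))
    return queries

demo = (" 1. LLM Training Strategies: Data Augmentation, Transfer Learning, "
        "Data Parallelism, Mixed Precision\n2. Evaluation Metrics for LLM: Perplexity, BLEU")
queries = build_search_queries(demo)
```

A line that was cut off before its colon (as can happen when the response hits `max_tokens`) is skipped rather than searched as a fragment.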


3. Search Tool: Create a wrapper around the You API, the Brave Search API, or Serper.dev. Please note that the free tier is 1000 queries/month. Consider creating a mock while developing, and switch to calling the real API once the agent is more stable. Additionally, consider caching the search results.
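The mock and caching suggestions can be sketched as follows. `mock_search` and `cached` are hypothetical names, and the shape of a hit (a dict with `description` and `snippets` keys) is an assumption based on the response printed further down in this notebook.

```python
import functools

def mock_search(query):
    """Stand-in for the real search call; returns a list of hit dicts."""
    return [{"description": f"Mock result for: {query}",
             "snippets": [f"Mock snippet about {query}."]}]

def cached(search_fn):
    """Wrap a search function so each distinct query is only sent once."""
    store = {}

    @functools.wraps(search_fn)
    def wrapper(query):
        if query not in store:
            store[query] = search_fn(query)  # only reached on a cache miss
        return store[query]

    wrapper.store = store  # exposed so the cache can be inspected or cleared
    return wrapper

# Develop against the mock; swap in the real search function later
search = cached(mock_search)
first = search("llm evaluation")
second = search("llm evaluation")  # served from the cache
```

Because the wrapper takes any function with the same signature, switching from the mock to the real API is a one-line change and does not touch the rest of the agent.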


In [None]:
cache = {}

In [None]:
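# Search Tool
import requests

you_api_key = userdata.get('youapi')

def get_ai_snippets_for_query(query):
    # Return cached hits so repeated queries do not consume API quota
    if query in cache:
        return cache[query]
    headers = {"X-API-Key": you_api_key}
    params = {"query": query}
    response = requests.get(
        "https://api.ydc-index.io/search",
        params=params,
        headers=headers,
    )
    response.raise_for_status()  # fail loudly on HTTP errors (bad key, quota exceeded)
    hits = response.json()['hits']
    cache[query] = hits
    return hits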

In [None]:
get_ai_snippets_for_query("1. Language Modeling Methods: Neural Networks, Transformers, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Generative Adversarial Networks (GAN), Transfer Learning, Fine-tuning, BERT, GPT, T5, Language Model Perplexity\n2. LLM Training Strategies: Data Augmentation, Transfer Learning, Data Parallelism, Gradient Checkpointing, Mixed Precision Training, Batch Normalization, Learning Rate Scheduling, Federated Learning, Meta-Learning\n3. Evaluation Metrics for LLM: Perplexity, BLEU, ROUGE, METEOR, BERTScore, Automatic Speech Recognition (ASR), Machine Translation (MT) Evaluation metrics, Human Evaluation\n4. Real-world Applications of LLM: Chatbots, Virtual Assistants, Voice Assistants, Natural Language Understanding, Text Summarization, Sentiment Analysis, Machine Translation, Question Answering, Text Generation for Creative Writing, Personalized Content Generation\n5. Ethical, Social, and Legal Implications of LL")

[{'description': 'A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised ...',
  'snippets': ['As of June 2024, The Instruction fine tuned variant of the Llama 3 70 billion parameter model is the most powerful open LLM according to the LMSYS Chatbot Arena Leaderboard, being more powerful than GPT-3.5 but not as powerful as GPT-4. As of 2024, the largest and most capable models are all based on the Transformer architecture. Some recent implementations are based on other architectures, such as recurrent neural network variants and Mamba (a state space model).',
   'A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by
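Each hit is a dict, and passing the raw list to an LLM prompt includes a lot of punctuation and repeated keys. A possible preprocessing step is to flatten the hits into plain text with a length cap. This is a sketch: `hits_to_text` and `max_chars` are hypothetical, and it relies only on the `description` and `snippets` keys visible in the output above.

```python
def hits_to_text(hits, max_chars=4000):
    """Join each hit's description and snippets into one string, capped at max_chars."""
    parts = []
    for hit in hits:
        parts.append(hit.get("description", ""))
        parts.extend(hit.get("snippets", []))
    # Drop empty pieces, put one piece per line, then truncate to the budget
    text = "\n".join(p for p in parts if p)
    return text[:max_chars]

demo_hits = [
    {"description": "d1", "snippets": ["s1", "s2"]},
    {"snippets": ["s3"]},  # a hit without a description is still handled
]
flat = hits_to_text(demo_hits)
```

The character cap is a crude stand-in for a token budget; it keeps the prompt within the model's context window at the cost of possibly cutting the last snippet mid-sentence.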


4. Critique Tool: Create a tool that critiques the summary, offers suggestions for how to improve it, and potentially suggests other relevant topics to search for.


In [None]:
# Critique Tool

critique_sys = "You are a professor in AI and Machine Learning, who is reviewing a summary of a research paper."
critique_user = "Please critique the following summary and offer suggestions of how to improve and potentially other relevant topics to search for: {summary}."

def critique(summary):
  response = get_response(
      model=model,
      sys_prompt=critique_sys,
      user_prompt=critique_user.format(summary=summary)
  )

  # Named `feedback` to avoid shadowing the function name
  feedback = response.choices[0].message.content
  return feedback


In [None]:
list(cache.values())[0]
critique("""The search results indicate that Large Language Models (LLMs) are a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process. The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. These models are capable of unsupervised training, and are often used for generative AI to produce content based on input prompts in human language. Notable examples include OpenAI's GPT-3 model and LightOn's Paradigm. Perplexity is a common metric used to evaluate the performance of LLMs.""")

' The summary provides a decent overview of Large Language Models (LLMs) and their role in natural language processing. However, there are a few areas where it could be improved and expanded:\n\n1. **Context and Background**: Although the summary briefly mentions the purpose of LLMs, it would be beneficial to provide more context about why LLMs were developed and what challenges they aim to address in the field of NLP.\n\n2. **Training Process**: While the summary mentions that LLMs learn statistical relationships from large amounts of text, it would be helpful to delve deeper into the specifics of the training process, such as the types of data used, the preprocessing steps, and the optimization techniques employed.\n\n3. **Architecture Details**: The summary mentions that LLMs are based on decoder-only transformer architectures, but it does not provide much detail about the architecture itself. You might want to explain the key components of the transformer model and how they contrib


5. Summarizer Tool (optional): Create a tool that takes some input and summarizes its content using an LLM.


In [None]:
# Summarizer Tool
summarizer_sys = "You are a research assistant doing research on AI and Machine Learning."
summarizer_user = "Please summarize the search results in one paragraph: {search_results}."

def summarize(search_results):
  response = get_response(
      model=model,
      sys_prompt=summarizer_sys,
      user_prompt=summarizer_user.format(search_results=search_results)
  )

  summary = response.choices[0].message.content
  return summary


In [None]:
# Summary Optimization Tool
summary_optim_sys = "You are a research assistant doing research on AI and Machine Learning."
summary_optim_user = "Improve the summary based on the suggestions. summary: {summary} suggestions: {suggestions}."

def improve_summary(summary, suggestions):
  response = get_response(
      model=model,
      sys_prompt=summary_optim_sys,
      user_prompt=summary_optim_user.format(summary=summary, suggestions=suggestions)
  )
  improved_summary = response.choices[0].message.content
  return improved_summary
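
The critique and improvement steps can also be run more than once. The sketch below alternates the two for a fixed number of rounds; `refine` is a hypothetical helper, and the two stub functions stand in for the LLM-backed tools so the loop can be exercised without API calls.

```python
def refine(summary, critique_fn, improve_fn, rounds=2):
    """Alternate critique and rewrite `rounds` times; return the final summary and all drafts."""
    history = [summary]
    for _ in range(rounds):
        suggestions = critique_fn(summary)
        summary = improve_fn(summary, suggestions)
        history.append(summary)
    return summary, history

# Stubs standing in for the LLM-backed critique and improvement tools
def stub_critique(summary):
    return "add!"

def stub_improve(summary, suggestions):
    return summary + suggestions[-1]

final, drafts = refine("a", stub_critique, stub_improve, rounds=3)
```

A fixed round count keeps API cost predictable. Each round costs two LLM calls, so `rounds=2` adds four calls per research topic.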


In [None]:
summarize(list(cache.values())[0])

" As of 2024, Large Language Models (LLMs) are computational models designed for natural language processing tasks such as language generation. These models, such as OpenAI’s GPT-3 and Google's Transformer-based models, acquire their abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process. The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. These models are capable of unsupervised training, and are generative models that can consider billions of parameters and have various applications, including text generation, machine translation, and sentiment analysis. Pre-trained LLMs have achieved striking success in natural language processing (NLP), leading to a paradigm shift from supervised learning. The most powerful open LLM, as of June 2024, is The Instruction fine-tuned

## Workflow (30 points)
Implement an agent workflow that uses all of these tools. In the agent workflow, the agent should be provided with all the tools and it should decide which tool to use. For the individual tool implementations, if you use a call to an LLM you do not need to provide any tools.

Sample Agent Workflow:
1. The agent receives a research topic from the user.
2. It uses the Topic Breakdown Tool to generate subtopics or subqueries.
3. The Query Expansion Tool expands the subqueries with related keywords and phrases.
4. The Search Tool uses the expanded queries and subqueries to gather relevant information from various sources.
5. The agent generates the summary incorporating the search results. (optional)
6. The agent critiques the summary, and improves the results. (optional)
7. The agent presents the final summary to the user.

The sample workflow is the minimum implementation requirement. Feel free to add more tools, add loops in the workflow, etc.
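
One way to let the agent choose its own tools is a registry of tools plus a loop that asks a decision function for the next tool name. The sketch below is an assumption about how this could be structured, not a required design: `run_agent`, `demo_tools`, and `scripted_decide` are hypothetical names, the tools are stubs, and in a real agent `decide` would prompt an LLM with the tool descriptions and the current state and parse the tool name from its reply.

```python
def run_agent(topic, tools, decide, max_steps=10):
    """Repeatedly ask `decide` for the next tool name and apply that tool to the state."""
    state = topic
    trace = []
    for _ in range(max_steps):  # hard cap so a confused policy cannot loop forever
        name = decide(state, trace)
        if name == "finish":
            break
        if name not in tools:
            raise ValueError(f"Unknown tool: {name}")
        state = tools[name](state)
        trace.append(name)
    return state, trace

# Stub tools standing in for the LLM and search calls
demo_tools = {
    "topic_breakdown": lambda s: f"subtopics({s})",
    "search": lambda s: f"results({s})",
    "summarize": lambda s: f"summary({s})",
}

def scripted_decide(state, trace):
    """Fixed policy used for the demo; an LLM call would replace this."""
    order = ["topic_breakdown", "search", "summarize"]
    return order[len(trace)] if len(trace) < len(order) else "finish"

final_state, tool_trace = run_agent("LLM", demo_tools, scripted_decide)
```

Keeping the decision function separate from the loop makes it possible to test the control flow with a scripted policy first, then swap in the LLM-based choice once the tools are stable.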

In [None]:
def research_agent():
  print("Welcome to your personal research agent!")
  input_topic = input("Please enter a research topic related to AI and machine learning: ")

  subtopics = topic_breakdown(input_topic)
  print("\n 1. Subtopics that are generated by the Topic Breakdown Tool: \n")
  print("====" * 10)
  print(subtopics)
  print("\n")

  expanded_queries = query_expansion(subtopics)
  print("\n 2. Expanded queries that are generated by the Query Expansion Tool: \n")
  print("====" * 10)
  print(expanded_queries)
  print("\n")

  search_results = get_ai_snippets_for_query(expanded_queries)
  print("\n 3. Search results that are generated by the Search Tool: \n")
  print("====" * 10)
  for result in search_results:
    for key, value in result.items():
      print(f"{key}: {value}")
  print("\n")

  summary = summarize(search_results)
  print("\n 4. Summary that is generated by the Summarizer Tool: \n")
  print("====" * 10)
  print(summary)
  print("\n")

  critique_res = critique(summary)
  print("\n 5. Critique that is generated by the Critique Tool: \n")
  print("====" * 10)
  print(critique_res)
  print("\n")

  improve_summary_res = improve_summary(summary, critique_res)
  print("\n 6. Improved summary that is generated by the Summary Optimization Tool: \n")
  print("====" * 10)
  print(improve_summary_res)
  print("\n")

  print("Thank you for using your personal research agent!")

In [None]:
research_agent()

Welcome to your personal research agent!
Please enter a research topic related to AI and machine learning: computer vision

 1. Subtopics that are generated by the Topic Breakdown Tool: 

 1. Deep Learning Algorithms for Object Detection in Computer Vision: Analyzing the effectiveness and efficiency of popular deep learning models such as YOLO, Faster R-CNN, and SSD for real-time object detection and recognition.

2. Computer Vision in Autonomous Vehicles: Investigating the advancements and challenges in using computer vision for self-driving cars, including obstacle detection, lane tracking, and traffic sign recognition.

3. Facial Recognition Systems: Evaluating the accuracy, privacy concerns, and ethical implications of facial recognition technology in various applications, such as security systems, social media platforms, and law enforcement.

4. GANs (Generative Adversarial Networks) in Computer Vision: Exploring the use of GANs for generating high-quality images, synthetic data, 