# Docs: Claude Report Revision

## Introduction
This Python notebook documents the Claude Report Revision tool, which revises a Markdown research report using the Anthropic API and the Google Custom Search API. It imports libraries for web requests, JSON data, regular expressions, and file operations, and defines a helper function that strips a leading "Here ...:" preamble line from model output.

## Dependencies
The following libraries are used in this notebook:
- `concurrent.futures`: Provides a high-level interface for asynchronously executing callables.
- `anthropic`: Anthropic API client library.
- `requests`: Allows sending HTTP requests.
- `json`: Enables JSON encoding and decoding.
- `ast`: Safely evaluates Python literals; used here to parse the model's list of search queries.
- `re`: Supports regular expressions (imported, but not used in the code below).
- `os`: Provides a way to interact with the operating system.
- `googleapiclient.discovery`: Google API client library for service discovery.

## Environment Variables
The notebook requires the following environment variables to be set:
- `ANTHROPIC_API_KEY`: Your Anthropic API key.
- `GOOGLE_SEARCH`: Your Google Search API key (read into the `GOOGLE_API_KEY` variable).
- `GOOGLE_SEARCH_ENGINE_ID`: Your Custom Search Engine ID (read into the `GOOGLE_CSE_ID` variable).

Make sure to set these environment variables on your PC or replace the `os.getenv()` calls with your API keys as strings.
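
A startup check can catch missing keys before any API call is made. The following is a minimal sketch, assuming the variable names that the first code cell passes to `os.getenv()`; the helper `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical and not part of the notebook.

```python
import os

# Names the notebook's first code cell passes to os.getenv().
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "GOOGLE_SEARCH", "GOOGLE_SEARCH_ENGINE_ID")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report anything missing in the current process environment.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.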

## Model Selection
The notebook allows you to select models for different tasks:
- `RESEARCH_MODEL`: The model used for performing the research task (default: "claude-3-haiku-20240307").
- `REPORT_MODEL`: The model used for generating the final comprehensive report (default: "claude-3-sonnet-20240229").

## Functions
The notebook includes the following functions:

### remove_first_line
Removes the first line of the given text if it starts with "Here" and ends with a colon.
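
To illustrate the rule, the sketch below repeats the helper from the code cell and runs it on two hand-written inputs. The sample strings are made up for illustration and are not real model output.

```python
def remove_first_line(text):
    # Same logic as the notebook's helper: drop the first line only when it
    # starts with "Here" and ends with a colon.
    lines = text.strip().split("\n")
    if lines[0].startswith("Here") and lines[0].strip().endswith(":"):
        return "\n".join(lines[1:])
    return text


# Illustrative inputs (assumed, not taken from an actual API response).
with_preamble = "Here is the updated report:\n# Title\nBody text."
without_preamble = "# Title\nBody text."

print(remove_first_line(with_preamble))     # preamble line is dropped
print(remove_first_line(without_preamble))  # text is returned unchanged
```

Both conditions must hold, so a first line such as "Here we go" (no trailing colon) is left in place.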

### generate_text
Generates text using the specified model and prompt. It sends a request to the Anthropic Messages API, prints the result, and returns the generated text with any leading "Here ...:" preamble line removed.
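
The notebook reads the generated text from the first content block of the JSON response. A slightly more defensive variant might look like the sketch below; `extract_text` is a hypothetical helper, and the `sample` dictionary is a hand-written stand-in for a response body, not captured output.

```python
def extract_text(response_json):
    """Join the text of every 'text' block in a Messages API response body."""
    blocks = response_json.get("content", [])
    parts = [block.get("text", "") for block in blocks if block.get("type") == "text"]
    if not parts:
        # Fail with a clear message instead of a bare KeyError or IndexError.
        raise ValueError("Response contains no text content blocks")
    return "".join(parts).strip()


# Hand-written stand-in for a parsed response body (assumption for illustration).
sample = {"content": [{"type": "text", "text": "Here is the report:\n# Title"}]}
print(extract_text(sample))
```

Raising a `ValueError` here keeps an unexpected response shape from surfacing as an opaque indexing error further down.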

### search_web
Performs a web search using the Google Custom Search API. It extracts the title and snippet of each search result and returns them as a list of formatted strings. If the search fails, it prints the error and returns an empty list.
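
The formatting step can be tried without an API key by separating it from the network call. This is a minimal sketch; `format_results` is a hypothetical helper, and `sample_result` is a hand-written dictionary that only mimics the `items` list the notebook reads.

```python
def format_results(result):
    """Turn a search result dictionary into 'Title/Snippet' strings."""
    formatted = []
    for item in result.get("items", []):
        title = item.get("title", "")
        snippet = item.get("snippet", "")
        formatted.append(f"Title: {title}\nSnippet: {snippet}\n")
    return formatted


# Hand-written example data (assumption for illustration, not a real search result).
sample_result = {"items": [{"title": "Example page", "snippet": "A short summary."}]}
print("\n".join(format_results(sample_result)))
```

A result with no `items` key yields an empty list, which matches what the notebook returns when nothing is found.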

### revise_report
Analyzes the given research report, generates search queries to gather additional information, performs web searches, and updates the report with the new information. It also identifies and removes redundant or duplicate information.
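
The query-generation step depends on the model returning a Python list literal. The sketch below shows one way to parse such a reply with `ast.literal_eval`; `parse_queries` is a hypothetical helper and is slightly stricter than the notebook's own check, since it requires the reply to end with `]` and keeps only string entries.

```python
import ast


def parse_queries(response_text):
    """Parse a model reply such as '["query one", "query two"]' into a list of strings."""
    text = response_text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return []
    try:
        parsed = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # literal_eval raises SyntaxError for malformed input and
        # ValueError for input that is valid Python but not a literal.
        return []
    if not isinstance(parsed, list):
        return []
    return [query for query in parsed if isinstance(query, str)]


print(parse_queries('["solar panel efficiency 2023", "perovskite cell stability"]'))
print(parse_queries("Sorry, I cannot help with that."))
```

Returning an empty list on any failure lets the revision continue without search data instead of stopping.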

## Usage
To use the notebook:
1. Set the required environment variables or replace them with your API keys.
2. Run the notebook.
3. Enter the path to the markdown research report file when prompted.
4. The notebook will revise the research report and save the revised version as a new file with the suffix "_revised.md" in the same directory.
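
The output path in step 4 is derived from the input path with `os.path.splitext`. A minimal sketch follows; `revised_path` is a hypothetical name, and the example paths are made up.

```python
import os


def revised_path(report_path):
    """Replace the file extension with the '_revised.md' suffix."""
    base, _extension = os.path.splitext(report_path)
    return base + "_revised.md"


print(revised_path("reports/topic.md"))
print(revised_path("notes"))
```

Because only the extension is replaced, the revised file lands in the same directory as the original.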

## Error Handling
The notebook includes error handling for various scenarios:
- If the specified research report file is not found, an error message is displayed, and the program exits.
- If an exception occurs during the report revision process, an error message is displayed, and the program exits.
- If there are unsupported characters in the generated report, an error message is displayed, and the user is prompted to try again with a different report or modify the generated report to remove the unsupported characters.

## Summary
This notebook provides a guide on using the Claude Report Revision tool to revise and enhance research reports. It utilizes the Anthropic API and Google Custom Search API to gather additional information and update the report accordingly.

# Main

In [None]:
import concurrent.futures
import anthropic
import requests
import json
import ast
import re
import os
from googleapiclient.discovery import build

# Set these environment variables on your PC, or replace the os.getenv() calls with your API keys as strings
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")  # Replace with your Anthropic API key
GOOGLE_API_KEY = os.getenv("GOOGLE_SEARCH")  # Replace with your Google Search API key
GOOGLE_CSE_ID = os.getenv("GOOGLE_SEARCH_ENGINE_ID")  # Replace with your Custom Search Engine ID

# Select models for tasks
RESEARCH_MODEL = "claude-3-haiku-20240307"  # set the model which will perform the research task
REPORT_MODEL = "claude-3-sonnet-20240229"  # set the model which will generate the final comprehensive report

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

In [None]:
def remove_first_line(text):
    lines = text.strip().split("\n")
    if lines[0].startswith("Here") and lines[0].strip().endswith(":"):
        return "\n".join(lines[1:])
    return text

def generate_text(prompt, model, max_tokens=4000, temperature=0.4):
    headers = {
        "x-api-key": ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json"
    }
    data = {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "system": "You are a world-class researcher. Analyze the given information and generate a detailed, comprehensive, and well-structured report.",
        "messages": [{"role": "user", "content": prompt}],
    }
    try:
        response = requests.post("https://api.anthropic.com/v1/messages", headers=headers, json=data)
        response.raise_for_status()  # Raise an exception for 4xx or 5xx status codes
        response_json = response.json()
        response_text = response_json['content'][0]['text']
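        # Strip the "Here ...:" preamble once, then print and return the same text
        cleaned_text = remove_first_line(response_text.strip())
        print(cleaned_text)
        return cleaned_text
    except requests.exceptions.RequestException as e:
        # Chain the original exception so the traceback keeps the root cause
        raise Exception(f"Request failed: {e}") from e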

def search_web(search_term, api_key, cse_id):
    try:
        service = build("customsearch", "v1", developerKey=api_key)
        result = service.cse().list(q=search_term, cx=cse_id).execute()
        parsed_data = json.loads(json.dumps(result))
        relevant_text = []
        for item in parsed_data.get('items', []):
            title = item.get('title', '')
            snippet = item.get('snippet', '')
            relevant_text.append(f"Title: {title}\nSnippet: {snippet}\n")
        print("\n".join(relevant_text))
        return relevant_text
    except Exception as e:
        print(f"Error occurred during web search: {e}")
        return []

def revise_report(research_report):
    search_data = []
    all_queries = []
    search_cache = {}

    print("Analyzing report and generating search queries...")
    analysis_prompt = f"Analyze the following research report and identify areas of the report that need more detail or further information:\n\n{research_report}\n\n---\n\nGenerate 3 to 5 search queries to gather additional information to enhance the report. Return your queries in a Python-parseable list. Return nothing but the list. Do so in one line. Start your response with [\""
    queries_response = generate_text(analysis_prompt, model=RESEARCH_MODEL)

    if queries_response.startswith('[') and ']' in queries_response:
        try:
            queries = ast.literal_eval(queries_response)
        except (SyntaxError, ValueError):  # literal_eval raises ValueError for non-literal content
            print("Error: Invalid search query format. Skipping queries.")
            queries = []
    else:
        print("Error: Search query format not found. Skipping queries.")
        queries = []

    all_queries.extend(queries)

    def search_and_cache(query):
        if query in search_cache:
            return search_cache[query]
        else:
            search_results = search_web(query, GOOGLE_API_KEY, GOOGLE_CSE_ID)
            search_cache[query] = search_results
            return search_results

    with concurrent.futures.ThreadPoolExecutor() as executor:
        search_results = list(executor.map(search_and_cache, queries))
        search_data.extend(search_results)

    print("Updating report with additional information...")
    update_prompt = f"Update the following research report by incorporating the new information from the searches. Additionally, identify areas of the report which are redundant/duplicate areas of information, and make necessary changes to the verbiage as needed in order to get points across better. However, avoid using hyperbole or terms of grandeur. The goal is to improve this report:\n\n{research_report}\n\n---\n\nAdditional search data:\n\n{str(search_data)}\n\n---\n\nGenerate an updated report that includes the new information and provides more detail in the identified areas. Remember to revise the Table of Contents as needed. Use Markdown for formatting."
    updated_report = generate_text(update_prompt, model=REPORT_MODEL, max_tokens=4000)
    print("Report revision completed!")
    return updated_report

# User input
research_report = input("Enter the path to the markdown research report file: ")

# Read the research report from the file
try:
    with open(research_report, "r", encoding="utf-8") as file:
        report_content = file.read()
except FileNotFoundError:
    print(f"Error: File '{research_report}' not found.")
    exit(1)

# Revise the research report
revised_report = None
try:
    revised_report = revise_report(report_content)
except Exception as e:
    print(f"Error: {e}")
    exit(1)

# Save the revised report to a file with the format "[original_filename]_revised.md" in the same directory
if revised_report:
    report_filename = os.path.splitext(research_report)[0] + "_revised.md"
    try:
        with open(report_filename, "w", encoding="utf-8") as file:
            file.write(revised_report)
        print(f"Revised report saved as '{report_filename}'.")
    except UnicodeEncodeError as e:
        print("Error: Unable to save the report due to unsupported characters. Please try again with a different research report or modify the generated report to remove any unsupported characters.")
        print(f"Error details: {e}")