# ai_workflows custom GPT assembler

This notebook assembles a system prompt and knowledge-base content for a custom GPT that helps people use the ai_workflows package. It's designed to run locally, within an ai_workflows development environment.

It writes the system prompt to `~/ai-workflows/gpt-system-prompt.md` and the knowledge base files to `~/ai-workflows/kb_*.md` (where `~` is your user home directory).

## Before running this

1. Make sure the local documentation has been built (see README for details)
2. If there are any new modules, make sure that they are added to the modules list in the code block below
3. Configure your local `.ini` file as discussed below
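
As a quick check for step 1, the sketch below looks for the built HTML pages that the code block reads (`index.html` plus one `ai_workflows.<module>.html` page per module, under `../docs/build/html`). The helper name `missing_doc_pages` is hypothetical and not part of `ai_workflows`; treat this as one possible pre-flight check rather than a required step.

```python
from pathlib import Path


def missing_doc_pages(html_dir, modules):
    """Return the expected HTML page names that are not present in html_dir."""
    expected = ["index.html"] + [f"ai_workflows.{m}.html" for m in modules]
    return [name for name in expected if not (Path(html_dir) / name).is_file()]


# "../docs/build/html" is the relative path the code block in this notebook reads from
missing = missing_doc_pages("../docs/build/html", ["llm_utilities", "document_utilities"])
if missing:
    print("Build the docs first; missing pages:", ", ".join(missing))
else:
    print("All expected documentation pages found.")
```

If any pages are reported missing, build the documentation (see README) before running the notebook.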

## Configuration

The notebook begins by loading credentials and configuration from an `.ini` file stored in `~/.hbai/ai-workflows.ini`. The `~` in the path refers to the current user's home directory, and the `.ini` file contents should follow this format (with keys, models, and paths as appropriate):

    [openai]
    openai-api-key=keyhere-with-sk-on-front
    openai-model=gpt-4o
    azure-api-key=keyhere-or-blank
    azure-api-base=azure-base-url-here
    azure-api-engine=gpt-4o
    azure-api-version=2024-02-01

    [anthropic]
    anthropic-api-key=keyhere
    anthropic-model=

    [aws]
    aws-profile=
    bedrock-model=
    bedrock-region=us-east-1

    [langsmith]
    langsmith-api-key=leave-blank-unless-you're-using-langsmith

You can set up OpenAI, Azure, Anthropic, or Bedrock as the LLM, leaving the settings for the other providers blank. You also don't need to supply a LangSmith API key unless you're using LangSmith.
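
To see which provider a given `.ini` file would enable, a minimal sketch along these lines may help. It parses an inline sample with `configparser` instead of your real file. The helper `configured_providers` is hypothetical, and so is its rule for deciding that a provider is "configured" (a non-blank API key, or a non-blank `bedrock-model` for Bedrock); `LLMInterface` may apply different logic.

```python
import configparser

# Inline sample following the format above, with only Anthropic filled in
SAMPLE_INI = """
[openai]
openai-api-key=
openai-model=gpt-4o
azure-api-key=
azure-api-base=
azure-api-engine=gpt-4o
azure-api-version=2024-02-01

[anthropic]
anthropic-api-key=example-key
anthropic-model=

[aws]
aws-profile=
bedrock-model=
bedrock-region=us-east-1

[langsmith]
langsmith-api-key=
"""


def configured_providers(parser):
    """List providers whose identifying setting is non-blank (an assumed rule)."""
    checks = {
        "OpenAI": ("openai", "openai-api-key"),
        "Azure": ("openai", "azure-api-key"),
        "Anthropic": ("anthropic", "anthropic-api-key"),
        "Bedrock": ("aws", "bedrock-model"),
    }
    return [
        name
        for name, (section, option) in checks.items()
        if parser.get(section, option, fallback="").strip()
    ]


parser = configparser.RawConfigParser()
parser.read_string(SAMPLE_INI)
print(configured_providers(parser))  # prints ['Anthropic']
```

To check your own file, replace `read_string(SAMPLE_INI)` with `read()` on the expanded path to `~/.hbai/ai-workflows.ini`.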

If you don't have an API key for an AI provider yet, [see here to learn what that is and how to get one](https://www.linkedin.com/pulse/those-genai-api-keys-christopher-robert-l5rie/).


In [None]:
modules = [
    "llm_utilities",
    "document_utilities",
]

# for convenience, auto-reload modules when they've changed
%load_ext autoreload
%autoreload 2

import logging
import configparser
import os
from ai_workflows.llm_utilities import LLMInterface
from ai_workflows.document_utilities import DocumentInterface

# set log level to WARNING
logging.basicConfig(level=logging.WARNING)

# load credentials and other configuration from a local ini file
inifile_location = os.path.expanduser("~/.hbai/ai-workflows.ini")
inifile = configparser.RawConfigParser()
inifile.read(inifile_location)

# load configuration
openai_api_key = inifile.get("openai", "openai-api-key")
openai_model = inifile.get("openai", "openai-model")
azure_api_key = inifile.get("openai", "azure-api-key")
azure_api_base = inifile.get("openai", "azure-api-base")
azure_api_engine = inifile.get("openai", "azure-api-engine")
azure_api_version = inifile.get("openai", "azure-api-version")
anthropic_api_key = inifile.get("anthropic", "anthropic-api-key")
anthropic_model = inifile.get("anthropic", "anthropic-model")
aws_profile = inifile.get("aws", "aws-profile")
bedrock_model = inifile.get("aws", "bedrock-model")
bedrock_region = inifile.get("aws", "bedrock-region")
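# the [files] settings are optional here: they aren't in the sample .ini above and aren't used below,
# so fall back to empty strings instead of raising an error when the section is missing
input_dir = os.path.expanduser(inifile.get("files", "input-dir", fallback=""))
output_dir = os.path.expanduser(inifile.get("files", "output-dir", fallback=""))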
langsmith_api_key = inifile.get("langsmith", "langsmith-api-key")

# initialize the LLM
llm = LLMInterface(
    openai_api_key=openai_api_key,
    openai_model=openai_model,
    azure_api_key=azure_api_key,
    azure_api_base=azure_api_base,
    azure_api_engine=azure_api_engine,
    azure_api_version=azure_api_version,
    langsmith_api_key=langsmith_api_key,
    anthropic_api_key=anthropic_api_key,
    anthropic_model=anthropic_model,
    bedrock_model=bedrock_model,
    bedrock_region=bedrock_region,
    bedrock_aws_profile=aws_profile,
    max_tokens=16384                       # (note that we require a higher output token limit for this task)
)

# initialize our document processor (no LLM needed)
doc_interface = DocumentInterface()

# function to condense/clean Markdown for knowledge base
def condense_for_kb(md):
    prompt = f"""I am preparing documentation to ground an LLM-based coding assistant. This coding assistant is designed for people using the `ai_workflows` Python package. Following is some documentation converted from the original package's HTML documentation. Please revise this Markdown as follows:

1. Remove anything about Sphinx, the ReadTheDocs theme, or anything else about how the documentation was produced
2. Clean up the formatting, starting with a second-level heading (`##`) at the top

Please retain all essential documentation, including examples, explanations, and hyperlinks. Please also return the revised Markdown without any enclosing code block or other formatting (just return the Markdown text alone).

Here is the Markdown to revise, enclosed in |@| delimiters:
|@|{md}|@|

Your revised version: """
    return llm.get_llm_response(prompt)

# function to condense/clean Markdown for system prompt
def condense_for_system_prompt(md):
    prompt = f"""I am preparing a system prompt to ground an LLM-based coding assistant. This coding assistant is designed for people using the `ai_workflows` Python package. Following is the overview converted from the original package's HTML documentation. Please revise this Markdown as follows:

1. Remove anything about Sphinx, the ReadTheDocs theme, or anything else about how the documentation was produced
2. Clean up the formatting, starting with a second-level heading (`##`) at the top
3. Condense to include only the most essential details about installing and using the package, plus any core concepts that are important for users to understand
4. Skip discussion of internal implementation details, but be sure to retain important guidance about how to specify JSON output specifications and work with JSON responses

Please return the revised Markdown without any enclosing code block or other formatting (just return the Markdown text alone).

Here is the Markdown to revise, enclosed in |@| delimiters:
|@|{md}|@|

Your revised version: """
    return llm.get_llm_response(prompt)

# convert top-level docs page to Markdown for system prompt and knowledge base
orig_index_md = doc_interface.convert_to_markdown("../docs/build/html/index.html")
index_filename = "kb_overview.md"
index_url = "https://ai-workflows.readthedocs.io/en/latest/"
kb_file_list = f"   - {index_filename}: overview documentation for the `ai_workflows` package (available at {index_url})\n"
system_prompt_overview = condense_for_system_prompt(orig_index_md)
index_md = f"# ai_workflows overview documentation (available at {index_url}):\n\n" + condense_for_kb(orig_index_md)
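output_path = os.path.expanduser(f"~/ai-workflows/{index_filename}")
os.makedirs(os.path.dirname(output_path), exist_ok=True)  # create the output directory if it doesn't exist yet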
with open(output_path, "w") as f:
    f.write(index_md)
print(f"Wrote overview knowledge base item to {output_path}")

# assemble knowledge base
for module in modules:
    url = f"https://ai-workflows.readthedocs.io/en/latest/ai_workflows.{module}.html"
    filename = f"kb_{module}.md"
    kb_file_list += f"   - {filename}: reference documentation for the `{module}` module (available at {url})\n"
    module_md = f"# {module} reference documentation (available at {url}):\n\n" + condense_for_kb(doc_interface.convert_to_markdown(f"../docs/build/html/ai_workflows.{module}.html"))
    output_path = os.path.expanduser(f"~/ai-workflows/{filename}")
    with open(output_path, "w") as f:
        f.write(module_md)
    print(f"Wrote {module} knowledge base item to {output_path}")

# construct system prompt
system_prompt_md = f"""You are a coding assistant for users of the `ai_workflows` Python package. Many of these users might be new to Python or otherwise inexperienced (they might, for example, be a social scientist or a program evaluator who seeks to use `ai_workflows` to systematize a workflow with AI assistance). In assisting the user, consider the following:

1. The condensed overview documentation included below
2. The full reference documentation in the knowledge base, including the following files:
{kb_file_list}
# Condensed overview documentation enclosed in |@| delimiters (available at {index_url}):

|@|
{system_prompt_overview}
|@|

Additional instructions:

1. **Prioritize `DocumentInterface` for Document-Based Workflows:**
   - When the user’s task begins with files or document strings (e.g., PDFs, Word documents, Markdown), default to using `DocumentInterface` methods for processing. This includes:
     - Converting files to Markdown (`convert_to_markdown()`).
     - Extracting structured data from files or Markdown (`convert_to_json()` or `markdown_to_json()`).
   - Avoid recommending standalone NLP libraries (e.g., spaCy, NLTK) for text extraction unless explicitly requested.

2. **Prioritize `LLMInterface` for Direct LLM Interaction:**
   - When the user needs to interact directly with an LLM (e.g., generating a response to a query or extracting structured data from arbitrary text), default to using `LLMInterface` methods, such as:
     - `get_llm_response()` for free-form text outputs.
     - `get_json_response()` for structured outputs.
   - Emphasize the flexibility of `LLMInterface` methods for tasks unrelated to initial document processing, such as generating summaries or answering questions.

3. **Assume Integration with LLMs:**
   - For any structured information extraction or response generation task, assume the user wants to use LLM-based workflows. Avoid suggesting non-LLM solutions unless explicitly requested.

4. **Default to JSON Outputs for Structured Data:**
   - For tasks involving structured outputs (e.g., extracting names, lists, or entities), prioritize `convert_to_json()` or `markdown_to_json()` when starting from documents.
   - Use `get_json_response()` when the input is not document-based.

5. **Require LLM for All JSON Operations:**
   - Remember that an `LLMInterface` is always required for the `DocumentInterface` methods that convert to JSON (because they use an LLM to convert to JSON).

6. **Remember that `DocumentInterface` Methods Generate JSON in Batches:**
   - Remember that `DocumentInterface` methods that convert to JSON (e.g., `convert_to_json()`, `markdown_to_json()`) generate JSON in batches. This means that the output is a list of JSON results, each following the structure specified in the `json_output_spec` parameter. The `merge_dicts()` method can be used to combine these results into a single dictionary.

7. **Encourage Customization of Prompts:**
   - Highlight the importance of customizing `json_context`, `json_job`, and `json_output_spec` parameters to tailor LLM behavior for the user’s needs.
   - Provide specific examples of prompt customization for tasks like name extraction or question generation.

8. **Be Concise, Specific, and Accurate**
   - Unless instructed otherwise, always be concise, specific, and accurate in your responses.

9. **Provide Hyperlinks**
   - Always embed hyperlinks in every response where specific documentation, methods, or tools from `ai_workflows` are mentioned. Assume users will want clickable references for all methods and key concepts. Hyperlinks must be integrated inline whenever possible."""

# write to system prompt file
output_path = os.path.expanduser("~/ai-workflows/gpt-system-prompt.md")
with open(output_path, "w") as f:
    f.write(system_prompt_md)
print(f"Wrote system prompt to {output_path}")

## What to do next

The code block above will report which files have been output. Use those files to update the GPT's system prompt and knowledge base.
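
If you'd like to double-check what was written before uploading, the following sketch lists the system prompt and knowledge base files in the output directory. The helper `list_gpt_files` is hypothetical; the file names (`gpt-system-prompt.md` and `kb_*.md`) are the ones the code block above writes.

```python
from pathlib import Path


def list_gpt_files(output_dir):
    """Return (system prompt path or None, sorted list of knowledge base paths)."""
    out = Path(output_dir).expanduser()
    prompt = out / "gpt-system-prompt.md"
    kb_files = sorted(out.glob("kb_*.md"))
    return (prompt if prompt.is_file() else None), kb_files


prompt, kb_files = list_gpt_files("~/ai-workflows")
print("System prompt:", prompt or "not found")
for path in kb_files:
    print("Knowledge base file:", path)
```

The system prompt file holds the text for the GPT's instructions, and each `kb_*.md` file is a separate knowledge base upload.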