# watsonx.ai with LangChain

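Test conversation: a system prompt containing the package summaries, followed by one user message describing an issue.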

In [1]:
messages = [
    {   "role": "system",
        "content": """
You are an AI assistant that helps with software issue localization.

Following package summaries are available for your reference:

[PACKAGE-SUMMARIES-START]
# llm

## Semantic Summary
The `llm` package provides a comprehensive interface for interacting with the OpenAI API to generate semantic descriptions, package summaries, and suggestions using language models. It is designed to handle API interactions with robust retry mechanisms and exponential backoff strategies. Additionally, it includes utilities for creating tailored prompts for analyzing Python code, identifying relevant packages, localizing issues, and suggesting code changes. This package is particularly useful for developers seeking to automate and enhance their code analysis and documentation processes.

## Contained code structure names
`__init__.py`, `api.py`, `prompts.py`, `call_llm_to_generate_semantic_description`, `call_llm_to_generate_package_summary`, `call_llm_for_localization`, `call_llm_for_suggestions`, `retry_with_exponential_backoff`, `completions_with_backoff`, `parsed_completions_with_backoff`, `call_llm`, `prompt_generate_semantic_description`, `prompt_generate_package_summary`, `prompt_identify_relevant_packages`, `prompt_localize_to_files`, `prompt_generate_change_suggestions`

# repository_analyzer

## Semantic Summary
The `repository_analyzer` package provides tools for analyzing code repositories by generating semantic descriptions and summaries. It leverages language model APIs to process code files and documentation, enabling users to extract meaningful insights and summaries from complex codebases. The package focuses on two main functionalities: creating semantic descriptions of individual code files and generating comprehensive summaries for entire packages based on their documentation.

## Contained code structure names
`__init__.py`, `file_analyzer.py`, `package_summary.py`, `generate_semantic_description`, `generate_package_summary`

# se_agent

## Semantic Summary
The `se_agent` package is a comprehensive software engineering tool designed to facilitate the management and enhancement of GitHub projects. It integrates with GitHub to automate the process of analyzing issues, suggesting code changes, and updating the understanding of a project’s codebase. The package features a webhook listener for real-time interactions, an issue analyzer, and a localization mechanism to pinpoint relevant code files. Additionally, it includes functionalities for onboarding projects by cloning and managing repositories, generating semantic documentation, and interacting with GitHub’s API for seamless project management.

## Contained code structure names
`__init__.py`, `change_suggester.py`, `github_listener.py`, `issue_analyzer.py`, `localizer.py`, `main.py`, `onboard_agent.py`, `project.py`, `project_info.py`, `project_manager.py`, `suggest_changes`, `get_project_manager`, `webhook`, `process_issue_event`, `run_server`, `IGNORE_TOKEN`, `logger`, `analyze_issue`, `RelevantPackages`, `FileLocalizationSuggestion`, `FileLocalizationSuggestions`, `localize_issue`, `main`, `Project`, `ProjectInfo`, `ProjectManager`, `onboard_agent`, `load_checkpoint`, `save_checkpoint`, `delete_checkpoint`, `clone_repository`, `pull_latest_changes`, `update_codebase_understanding`, `onboard`, `create_hierarchical_document`, `fetch_package_summaries`, `fetch_package_details`, `fetch_code_files`, `post_issue_comment`, `fetch_issue_comments`, `add_project`, `get_project`, `list_projects`.

# util

## Semantic Summary
The `util` package provides utility scripts for file and folder management within a directory. It includes functionality to count files and folders recursively, with options for filtering by file extension. The package is designed to handle common errors such as missing directories and offers a command-line interface for user interaction, making it convenient for integrating into various workflows.

## Contained code structure names
`__init__.py`, `file_count.py`, `folder_count.py`, `count_files_in_folder_recursive`, `count_folders_in_folder_recursive`
[PACKAGE-SUMMARIES-END]

You understand the issues raised and discussed by the user.
Analyze any code snippets provided.
Based on the package summaries above, identify the packages most relevant to the discussion.
And finally, return a list of high-level packages (as a JSON array) that you think are most relevant for the issue and discussion.

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{{"properties": {{"relevant_packages": {{"items": {{"type": "string"}}, "title": "Relevant Packages", "type": "array"}}}}, "required": ["relevant_packages"]}}
```
"""
    },
    {   "role": "user",
        "content": """
Issue: lets move to llm agnostic implementation

Description: Our llm interface implemented in api.py is currently openai specific.
Let's transform that to llm provider agnostic implementation.
Let's use langchain for that.
"""
    }
]
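
One detail in the system prompt above: the output schema is written with doubled braces (`{{` and `}}`), which is how literal braces are escaped in Python format strings and in format-style prompt templates. Because this notebook sends the string as-is rather than through a template, the model sees the braces doubled. A minimal sketch of the difference, using plain `str.format` on a shortened, hypothetical stand-in for the schema line:

```python
# Shortened stand-in for the schema line in the system prompt (hypothetical).
escaped_schema = '{{"required": ["relevant_packages"]}}'

# Sent as-is (what this notebook does): the doubled braces are kept.
sent_as_is = escaped_schema

# Passed through a format-style template: doubled braces collapse to single ones.
sent_via_template = escaped_schema.format()

print(sent_as_is)
print(sent_via_template)
```

The recorded outputs further down suggest the models cope with the doubled braces, so this is an observation rather than a required fix.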

Expected output structure.

In [2]:
from pydantic import BaseModel

class RelevantPackages(BaseModel):
    relevant_packages: list[str]

Read the watsonx.ai credentials from environment variables.

In [3]:
import os

apikey = os.getenv("WATSONX_APIKEY")
project_id = os.getenv("WATSONX_PROJECT_ID")
url = os.getenv("WATSONX_URL")
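
`os.getenv` returns `None` for an unset variable, so a missing credential would only surface later, when the model is constructed or invoked. A small sketch of an up-front check follows; `missing_env_vars` is a hypothetical helper, not part of the notebook or of `langchain_ibm`.

```python
import os

# The three variables read by the cell above.
REQUIRED_VARS = ("WATSONX_APIKEY", "WATSONX_PROJECT_ID", "WATSONX_URL")

def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not environ.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    # Report rather than raise, so the rest of the notebook can still be read.
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dictionary.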

## Experiment with Chat Model

Instantiate a chat model using IBM watsonx.ai and LangChain.

In [4]:
from langchain_ibm import ChatWatsonx

chat_model = ChatWatsonx(
    model_id="mistralai/mistral-large",
    url=url,
    apikey=apikey,
    project_id=project_id,
    params={
        "temperature": 0.7,
        "max_tokens": 512,
    },
)

A function to transform the test conversation into the message format expected by the chat model.

In [5]:
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
    AIMessage,
)

def transform_to_langchain_message_format(messages):
    """Transforms role/content dicts into LangChain message objects."""
    transformed_messages = []
    for message in messages:
        role = message['role']
        content = message['content']
        # Create corresponding Langchain message object based on role
        if role == 'user':
            transformed_message = HumanMessage(content=content)
        elif role == 'assistant':
            transformed_message = AIMessage(content=content)
        elif role == 'system':
            transformed_message = SystemMessage(content=content)
        else:
            raise ValueError(f"Unknown role: {role}")

        transformed_messages.append(transformed_message)
    return transformed_messages

Invoke the chat model with structured output.

In [6]:
chat_model.with_structured_output(RelevantPackages).invoke(transform_to_langchain_message_format(messages))

RelevantPackages(relevant_packages=['llm'])

## Experiment with Language Model

Instantiate a language model using IBM watsonx.ai and LangChain.

In [7]:
from langchain_ibm import WatsonxLLM

language_model = WatsonxLLM(
    model_id="meta-llama/llama-3-405b-instruct",
    project_id=project_id,
    url=url,
    apikey=apikey,
    params={
        "decoding_method": "greedy",
        "max_new_tokens": 900,
        "repetition_penalty": 1,
    },
)

A function to transform the test conversation into the single prompt string expected by the language model.

In [8]:
def transform_to_single_prompt_string(messages):
    """Transforms a list of messages to a single prompt string."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        role = message['role']
        content = message['content']
        prompt += f"<|start_header_id|>{role}<|end_header_id|>{content}<|eot_id|>"
    prompt += "<|start_header_id|>assistant<|end_header_id|>"
    return prompt
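
For comparison, Meta's published Llama 3 prompt format places a blank line (`\n\n`) after each `<|end_header_id|>`. The function above omits it, which may be why the completion shown further down starts with `\n\n`. A variant sketch that follows that layout is given here; `to_llama3_prompt` is a hypothetical name, stripping the content is a choice made for this sketch, and whether the difference matters for this model on watsonx.ai is untested here.

```python
def to_llama3_prompt(messages):
    """Build a Llama 3 style prompt with a blank line after each role header."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        role = message["role"]
        # Strip surrounding whitespace so the triple-quoted content does not
        # add extra newlines around the header separator (sketch choice).
        content = message["content"].strip()
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>")
    # Open the assistant turn so the model continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

example = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hi"},
]
print(to_llama3_prompt(example))
```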

In [9]:
language_model.invoke(input=transform_to_single_prompt_string(messages))

'\n\nBased on the issue and discussion, I have analyzed the code snippets and package summaries provided. The issue revolves around transforming the current OpenAI-specific LLM interface in `api.py` to an LLM provider-agnostic implementation using LangChain.\n\nThe most relevant packages to this discussion are:\n\n* `llm`: This package currently contains the OpenAI-specific LLM interface in `api.py`, which needs to be transformed.\n* `langchain`: Although not listed in the provided package summaries, LangChain is mentioned in the discussion as the tool to be used for the LLM provider-agnostic implementation.\n\nHere is the list of high-level packages that I think are most relevant for the issue and discussion:\n\n```\n{"relevant_packages": ["llm"]}\n```\n\nNote that `langchain` is not included in the output as it is not part of the provided package summaries. However, it is understood to be a crucial component in the solution.'

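Invoke the language model with structured output. `WatsonxLLM` does not implement `with_structured_output`, so this call raises `NotImplementedError`.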

In [None]:
language_model.with_structured_output(RelevantPackages).invoke(input=transform_to_single_prompt_string(messages))

# =>
#   NotImplementedError

Alternatively, we can post-process the raw completion using `PydanticOutputParser`.

In [10]:
from langchain_core.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=RelevantPackages)

Let's build a pipeline from the language model and the parser.

In [11]:
chain = language_model | parser

Now invoke the chain with the input.

In [12]:
chain.invoke(input=transform_to_single_prompt_string(messages))

RelevantPackages(relevant_packages=['llm'])
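
To make the post-processing step concrete: the raw completion shown earlier wraps the JSON in a fenced block surrounded by prose, so something has to pull the JSON out before it can be validated. The following is a hand-rolled sketch of that idea using only the standard library. `extract_relevant_packages` is a hypothetical helper and is not how `PydanticOutputParser` is implemented.

```python
import json
import re

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example

def extract_relevant_packages(completion):
    """Return the `relevant_packages` list from the first fenced JSON object."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(\{.*?\})\s*" + re.escape(FENCE)
    match = re.search(pattern, completion, flags=re.DOTALL)
    if match is None:
        raise ValueError("No fenced JSON object found in the completion")
    data = json.loads(match.group(1))
    packages = data["relevant_packages"]
    # Mirror the shape of the RelevantPackages model: a list of strings.
    if not isinstance(packages, list) or not all(isinstance(p, str) for p in packages):
        raise ValueError("relevant_packages must be a list of strings")
    return packages

sample = f'Some reasoning first.\n\n{FENCE}\n{{"relevant_packages": ["llm"]}}\n{FENCE}\n\nA closing note.'
print(extract_relevant_packages(sample))
```

In practice the parser-based chain above is the simpler option; the sketch only illustrates what kind of work the parsing step has to do.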