<a href="https://colab.research.google.com/github/langroid/langroid/blob/main/examples/kg-chat/DependencyChatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<img width="700" src="https://raw.githubusercontent.com/langroid/langroid/main/docs/assets/langroid_neo4j_logos.png" alt="Langroid">


# Overview

This colab is for readers curious about combining LLMs and knowledge graphs in the software supply chain security domain.
It introduces the **Dependency Chatbot**, an LLM-powered application equipped with a suite of specialized tools. It combines a Neo4j knowledge graph with an LLM for:

* crafting queries in Neo4j's native query language,
* constructing detailed dependency graphs via the DepsDev API,
* searching the web for broader insights.




# Motivation: Software Supply Chain Security

Software supply chain security is a rapidly growing field, driven by a significant increase in software supply chain attacks. It focuses primarily on understanding and managing the dependencies in your software supply chain. As open-source and third-party components have become ubiquitous in software development, organizations increasingly need to vet and monitor the components they rely on to ensure the integrity and security of their software. As the field evolves, developers and organizations will need to stay proactive in addressing supply chain vulnerabilities and implementing robust security measures.

Managing dependencies starts with the ability to identify direct and transitive dependencies. Normally, this involves obtaining the full dependency graph and writing custom code to answer questions about dependencies. In this colab, we introduce a far simpler approach built on two key ideas:
- store the dependency graph in a graph database, specifically Neo4j,
- use an LLM-powered agent that translates a user's questions into Cypher, the query language of Neo4j.
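
For contrast, the "custom code" route can be sketched in a few lines of plain Python. This is only an illustration of what direct and transitive dependencies mean; the toy graph and its package names are made up, and nothing here is part of the chatbot itself.

```python
from collections import deque

# Toy dependency graph (made-up names): package -> list of direct dependencies
DEPS = {
    "app": ["web", "db"],
    "web": ["http", "json"],
    "db": ["json"],
    "http": [],
    "json": [],
}


def direct_dependencies(graph, package):
    """Packages that `package` depends on directly."""
    return set(graph.get(package, []))


def transitive_dependencies(graph, package):
    """All packages reachable from `package` (breadth-first), excluding itself."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, []))
    seen.discard(package)  # guard against a cycle leading back to the root
    return seen


print(sorted(direct_dependencies(DEPS, "app")))
print(sorted(transitive_dependencies(DEPS, "app")))
```

Every new kind of question needs another function like these, which is the effort the graph database and the LLM-generated Cypher queries are meant to remove.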


# PyPI Package Dependency Chatbot

This application combines an LLM with a Knowledge Graph (KG) to create a Retrieval-Augmented Generation (RAG) application for improved understanding of dependencies.

It focuses on PyPI packages and relies on [DepsDev](https://deps.dev/) to obtain the dependencies of a given package. More details about this chatbot can be found [here](https://github.com/langroid/langroid/tree/main/examples/kg-chat).

## Dependency Chatbot Architecture

![Arch](https://github.com/langroid/langroid/blob/main/docs/assets/DepChatbot.png?raw=true)

The chatbot comprises one agent, `Neo4jChatAgent`, that has access to four tools:

1.   `GraphSchemaTool`: gets the schema of the Neo4j knowledge graph.
2.   `CypherRetrievalTool`: generates Cypher queries to retrieve information from the Neo4j knowledge graph (Cypher is the query language for Neo4j).
3.   `DepGraphTool`: builds the dependency graph for a given package version, using the API at [DepsDev](https://deps.dev/).
4.   `GoogleSearchTool`: finds the package version and type. It can also answer broader questions from the web once the relevant information has been obtained from the dependency graph.
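
As a rough mental model of how an agent with tools operates, the sketch below routes a structured tool request to the agent method of the same name. It mirrors the naming convention visible later in this notebook (a tool's `request` string matches a method on the agent), but `ToyToolCall` and `ToyAgent` are hypothetical stand-ins and not Langroid's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyToolCall:
    """Hypothetical stand-in for a structured tool message produced by the LLM."""
    request: str
    arguments: dict = field(default_factory=dict)


class ToyAgent:
    """Hypothetical agent: each tool is handled by the method named in `request`."""

    # Only these request names may be dispatched (names taken from this notebook).
    TOOLS = {"get_schema", "construct_dependency_graph"}

    def get_schema(self) -> str:
        return "(:Package)-[:DEPENDS_ON]->(:Package)"

    def construct_dependency_graph(
        self, package_name: str, package_version: str, package_type: str
    ) -> str:
        return f"would build graph for {package_type}/{package_name}@{package_version}"

    def handle(self, call: ToyToolCall) -> str:
        # Reject anything that is not an enabled tool before looking up a method.
        if call.request not in self.TOOLS:
            return f"unknown tool: {call.request}"
        return getattr(self, call.request)(**call.arguments)


agent = ToyAgent()
print(agent.handle(ToyToolCall("get_schema")))
print(agent.handle(ToyToolCall("drop_database")))
```

Keeping an explicit set of enabled tools means the model can only trigger behavior that was deliberately exposed.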



## Workflow
The Dependency Chatbot's workflow is as follows:


1.   The chatbot asks the user to provide the package name.
2.   The chatbot tries to identify the version and verify that the package is a PyPI package.
3.   The user confirms the package details.
4.   The chatbot will construct the dependency graph of the package including transitive dependencies.
5.   At this stage, the user can ask the chatbot any question about the dependency graph, such as:
  *   What are the packages at level 2?
  *   Tell me 3 interesting things about the dependency graph.
6.   For questions that the chatbot can't answer from the graph, it can use a web search tool to obtain additional information. For example, the chatbot uses the web search tool to identify the package version.
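
To make step 5 more concrete, here are two Cypher queries of the kind the agent might generate, written as Python strings. They are illustrative guesses, not output captured from the chatbot. They assume the schema created by the graph-construction query in this notebook (`Package` nodes with `name` and `version`, linked by `DEPENDS_ON`), and the parameter values are placeholders.

```python
# Packages exactly two DEPENDS_ON hops from the root
# (one possible reading of "level 2").
LEVEL_2_QUERY = """
MATCH (root:Package {name: $name, version: $version})-[:DEPENDS_ON*2]->(dep:Package)
RETURN DISTINCT dep.name AS name, dep.version AS version
"""

# Number of distinct packages reachable from the root at any depth.
TRANSITIVE_COUNT_QUERY = """
MATCH (root:Package {name: $name, version: $version})-[:DEPENDS_ON*1..]->(dep:Package)
RETURN count(DISTINCT dep) AS numDependencies
"""

# Placeholder parameters; substitute the package under analysis.
params = {"name": "example-package", "version": "1.0.0"}

print(LEVEL_2_QUERY.strip())
print(TRANSITIVE_COUNT_QUERY.strip())
```

Passing `name` and `version` as query parameters, rather than pasting them into the query text, keeps user-supplied values out of the Cypher source.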



## Implementation
We developed this application using the following tools/APIs:

*   [Langroid](https://github.com/langroid/langroid): a framework for developing LLM applications.
*   [Neo4j](https://neo4j.com/): a graph database management system.
*   Cypher Query Language: a graph query language that lets you retrieve data from the graph. It is like SQL for graphs.
*   [DepsDev](https://deps.dev/): Open Source Insights is a service developed and hosted by Google to help developers better understand the structure, construction, and security of open source software packages.


## Required environment settings:

Before proceeding with the implementation, ensure that you have the necessary environment settings and keys in place.

*   `OPENAI_API_KEY`
*   GoogleSearchTool requires two keys:
    *   `GOOGLE_API_KEY`: [set up a Google API key](https://developers.google.com/custom-search/v1/introduction#identify_your_application_to_google_with_api_key),
    *   `GOOGLE_CSE_ID`: [set up a Google Custom Search Engine (CSE) and get the CSE ID](https://developers.google.com/custom-search/docs/tutorial/creatingcse)
*   Neo4j settings:
    *   `username`: typically neo4j
    *   `password`: your-neo4j-password
    *   `uri`: uri-to-access-neo4j-database
    *   `database`: typically neo4j

    These Neo4j settings are filled in later in this colab:
    
    ```python
    neo4j_settings = Neo4jSettings(
      uri="",
      username="neo4j",
      password="",
      database="neo4j",
    )
    ```

**NOTE:** You can set up a free account at [Neo4j Aura](https://neo4j.com/cloud/platform/aura-graph-database/) to get access to a Neo4j graph database.
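
If you would rather not type credentials into the notebook, one option is to collect them from environment variables first. The sketch below is a minimal example of that idea. The `NEO4J_*` variable names and both helper functions are assumptions made for this illustration; only the four field names (`uri`, `username`, `password`, `database`) come from the settings shown above.

```python
import os


def neo4j_settings_from_env(env=None) -> dict:
    """Collect Neo4j connection values; the NEO4J_* names are assumed for this sketch."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", ""),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
        "database": env.get("NEO4J_DATABASE", "neo4j"),
    }


def missing_fields(settings: dict) -> list:
    """Names of settings that are still empty."""
    return [key for key, value in settings.items() if not value]


# Example with a made-up environment instead of the real os.environ
example = neo4j_settings_from_env({"NEO4J_URI": "neo4j+s://example.invalid"})
print(missing_fields(example))
```

Because the dictionary keys match the fields above, the result could be passed on as `Neo4jSettings(**settings)` once nothing is missing.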


## Install, setup, import

In [None]:
# Silently install Langroid, suppress all output (~2-4 mins)
!pip install -q --upgrade langroid &> /dev/null

In [None]:
# Silently install Langroid's Neo4j extra, suppress all output
!pip install -q "langroid[neo4j]" &> /dev/null

## Environment settings

This code will ask the user to provide the `OPENAI_API_KEY`, `GOOGLE_API_KEY`, and `GOOGLE_CSE_ID`.

In [None]:
# OpenAI API Key: Enter your key in the dialog box that will show up below
# NOTE: colab often struggles with showing this input box,
# if so, simply insert your API key in this cell, though it's not ideal.
import os

from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass('Enter your OPENAI_API_KEY key:', stream=None)

In [None]:
# Google keys for the web search tool
os.environ['GOOGLE_API_KEY'] = getpass('Enter your GOOGLE_API_KEY key:', stream=None)
os.environ['GOOGLE_CSE_ID'] = getpass('Enter your GOOGLE_CSE_ID key:', stream=None)
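
Since a missing key only surfaces later as a failed API call, a quick check at this point can save a confusing debugging session. The helper below is a small sketch for that purpose; `missing_keys` is a hypothetical name, and the list of keys simply repeats the three variables set above.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_keys(env=None, required=REQUIRED_KEYS) -> list:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


# Example with a made-up environment: one key set, one blank, one absent
fake_env = {"OPENAI_API_KEY": "placeholder-value", "GOOGLE_API_KEY": "   "}
print(missing_keys(fake_env))
```

Calling `missing_keys()` with no arguments checks the real environment of the running notebook.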

In [None]:
# various unfortunate things that need to be done to
# control notebook behavior.

# (a) output width

from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

# (b) logging related
import logging
import warnings

logging.basicConfig(level=logging.ERROR)
warnings.filterwarnings('ignore')
# Raise the level of every logger registered so far to ERROR
for logger_name in logging.root.manager.loggerDict:
    logging.getLogger(logger_name).setLevel(logging.ERROR)


In [None]:
from langroid.agent.special.neo4j.neo4j_chat_agent import (
    Neo4jChatAgent,
    Neo4jChatAgentConfig,
    Neo4jSettings,
)
from langroid.language_models.openai_gpt import OpenAIGPTConfig, OpenAIChatModel
from langroid.utils.constants import NO_ANSWER
from langroid.utils.configuration import set_global, Settings
from langroid.agent.tool_message import ToolMessage
from langroid.agent.tools.google_search_tool import GoogleSearchTool

from langroid.agent.task import Task

## Define the tools

In [None]:
# Define the tool `DepGraphTool` that will construct the dependency graph
# and answer user's questions
class DepGraphTool(ToolMessage):
    request = "construct_dependency_graph"
    purpose = f"""Get package <package_version>, <package_type>, and <package_name>.
    For the <package_version>, obtain the recent version, it should be a number.
    For the <package_type>, return if the package is PyPI or not.
      Otherwise, return {NO_ANSWER}.
    For the <package_name>, return the package name provided by the user.
    ALL strings are in lower case.
    """
    package_version: str
    package_type: str
    package_name: str
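
The tool's `purpose` asks the LLM for a numeric version and lower-case strings, but nothing enforces that. As a hedged illustration, the helpers below show one way such arguments could be checked before use. Both functions are hypothetical and not part of the notebook, and the pattern is deliberately simple: it accepts dotted numbers only, so legitimate PyPI versions with suffixes such as `1.0rc1` would be rejected.

```python
import re

# Dotted numeric versions only, e.g. "2" or "2.31.0" (a simplification).
_VERSION_RE = re.compile(r"\d+(\.\d+)*")


def looks_like_version(text: str) -> bool:
    """True if `text` is a plain dotted number such as '2.31.0'."""
    return _VERSION_RE.fullmatch(text.strip()) is not None


def normalize_args(package_name: str, package_version: str, package_type: str) -> dict:
    """Lower-case and strip the fields, rejecting non-numeric versions."""
    if not looks_like_version(package_version):
        raise ValueError(f"not a numeric version: {package_version!r}")
    return {
        "package_name": package_name.strip().lower(),
        "package_version": package_version.strip(),
        "package_type": package_type.strip().lower(),
    }


print(looks_like_version("2.31.0"))
print(looks_like_version("latest"))
print(normalize_args(" Example-Package ", "1.0.0", "PyPI"))
```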


In [None]:
# Defining the class of the `DependencyGraphAgent`
class DependencyGraphAgent(Neo4jChatAgent):
    def construct_dependency_graph(self, msg: DepGraphTool) -> str:
        check_db_exist = (
            "MATCH (n) WHERE n.name = $name AND n.version = $version RETURN n LIMIT 1"
        )
        response = self.read_query(
            check_db_exist, {"name": msg.package_name, "version": msg.package_version}
        )
        if response.success and response.data:
            # self.config.database_created = True
            return "Database Exists"
        else:
            construct_dependency_graph = CONSTRUCT_DEPENDENCY_GRAPH.format(
                package_type=msg.package_type.lower(),
                package_name=msg.package_name,
                package_version=msg.package_version,
            )
            if self.write_query(construct_dependency_graph):
                self.config.database_created = True
                return "Database is created!"
            else:
                return f"""
                    Database is not created!
                    It seems the package {msg.package_name} was not found.
                    """

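The agent fills in the Cypher template with `str.format`, which is why the query text doubles its literal braces (`{{` and `}}`). The sketch below demonstrates this on a shortened, made-up template. It also adds a conservative allow-list check, because values substituted this way become part of the query text. The `render` helper and the allowed character set are assumptions of this example, not something the notebook does.

```python
import re

# Shortened, made-up template in the style of the notebook's Cypher query.
TEMPLATE = """
with "{package_name}" as name, "{package_version}" as version
call {{ with name, version
        return name + "@" + version as label
}}
return label
"""

# Conservative allow-list (an assumption of this sketch, not a PyPI rule).
_SAFE = re.compile(r"[A-Za-z0-9._-]+")


def render(package_name: str, package_version: str) -> str:
    """Fill the template, refusing values that could alter the query text."""
    for value in (package_name, package_version):
        if not _SAFE.fullmatch(value):
            raise ValueError(f"refusing to interpolate {value!r}")
    return TEMPLATE.format(package_name=package_name, package_version=package_version)


query = render("example-package", "1.0.0")
print(query)
```

After formatting, each `{{` becomes a single `{`, so Neo4j receives an ordinary `call { ... }` subquery.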
In [None]:
# CONSTRUCT_DEPENDENCY_GRAPH is the Cypher query that will be used for constructing the dependency graph
CONSTRUCT_DEPENDENCY_GRAPH = """
        with "{package_type}" as system, "{package_name}" as name, "{package_version}" as version

        call apoc.load.json("https://api.deps.dev/v3alpha/systems/"+system+"/packages/"
                            +name+"/versions/"+version+":dependencies")
        yield value as r

        call {{ with r
                unwind r.nodes as package
                merge (p:Package:PyPi {{name: package.versionKey.name, version: package.versionKey.version}})
                return collect(p) as packages
        }}
        call {{ with r, packages
            unwind r.edges as edge
            with packages[edge.fromNode] as from, packages[edge.toNode] as to, edge
            merge (from)-[rel:DEPENDS_ON]->(to) ON CREATE SET rel.requirement
            = edge.requirement
            return count(*) as numRels
        }}

        match (root:Package:PyPi) where root.imported is null
        set root.imported = true
        with "{package_type}" as system, root.name as name, root.version as version
        call apoc.load.json("https://api.deps.dev/v3alpha/systems/"+system+"/packages/"
                            +name+"/versions/"+version+":dependencies")
        yield value as r

        call {{ with r
                unwind r.nodes as package
                merge (p:Package:PyPi {{name: package.versionKey.name, version: package.versionKey.version}})
                return collect(p) as packages
        }}
        call {{ with r, packages
                unwind r.edges as edge
                with packages[edge.fromNode] as from, packages[edge.toNode] as to, edge
                merge (from)-[rel:DEPENDS_ON]->(to) ON CREATE SET
                rel.requirement = edge.requirement
                return count(*) as numRels
        }}
        return size(packages) as numPackages, numRels
        """

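The query treats the API response as a list of `nodes` plus a list of `edges` whose `fromNode` and `toNode` fields are positions in the node list (hence `packages[edge.fromNode]`). The plain-Python sketch below resolves those positions the same way. The sample response is made up for illustration and contains only the fields the query reads; a real response may carry more.

```python
# Made-up response in the shape the Cypher query above consumes.
response = {
    "nodes": [
        {"versionKey": {"name": "app", "version": "1.0.0"}},
        {"versionKey": {"name": "lib-a", "version": "2.1.0"}},
        {"versionKey": {"name": "lib-b", "version": "0.3.0"}},
    ],
    "edges": [
        {"fromNode": 0, "toNode": 1, "requirement": ">=2.0"},
        {"fromNode": 1, "toNode": 2, "requirement": ""},
    ],
}


def resolve_edges(resp: dict) -> list:
    """Turn index-based edges into (from_package, to_package, requirement) triples."""
    packages = [
        (node["versionKey"]["name"], node["versionKey"]["version"])
        for node in resp["nodes"]
    ]
    return [
        (packages[edge["fromNode"]], packages[edge["toNode"]], edge["requirement"])
        for edge in resp["edges"]
    ]


for source, target, requirement in resolve_edges(response):
    print(source, "->", target, repr(requirement))
```

Each triple corresponds to one `DEPENDS_ON` relationship that the query merges, with `requirement` stored as a property on it.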
## Define the dependency agent

In [None]:
# We also need to provide Neo4j environment variables before defining the `dependency_agent`
neo4j_settings = Neo4jSettings(
    uri="",
    username="neo4j",
    password="",
    database="neo4j",
)

dependency_agent = DependencyGraphAgent(
        config=Neo4jChatAgentConfig(
            neo4j_settings=neo4j_settings,
            use_tools=True,
            use_functions_api=False,
            llm=OpenAIGPTConfig(
                chat_model=OpenAIChatModel.GPT4_TURBO,
            ),
        ),
    )

## Define the task

In [None]:
# Define the dependency task that will orchestrate the work for the `dependency_agent`
system_message = f"""You are an expert in Dependency graphs and analyzing them using
    Neo4j.

    FIRST, I'll give you the name of the package that I want to analyze.

    THEN, you can also use the `web_search` tool/function to find out information about a package,
      such as version number and package type (PyPi or not).

    If unable to get this info, you can ask me and I can tell you.

    DON'T forget to include the package name in your questions.

    After receiving this information, make sure the package version is a number and the
    package type is PyPi.
    THEN ask the user if they want to construct the dependency graph,
    and if so, use the tool/function `construct_dependency_graph` to construct
      the dependency graph. Otherwise, say `Couldn't retrieve package type or version`
      and {NO_ANSWER}.
    After constructing the dependency graph successfully, you will have access to Neo4j
    graph database, which contains dependency graph.
    You will try your best to answer my questions. Note that:
    1. You can use the tool `get_schema` to get node label and relationships in the
    dependency graph.
    2. You can use the tool `retrieval_query` to get relevant information from the
      graph database. I will execute this query and send you back the result.
      Make sure your queries comply with the database schema.
    3. Use the `web_search` tool/function to get information if needed.
    """

task = Task(
    dependency_agent,
    name="DependencyAgent",
    system_message=system_message,
)

dependency_agent.enable_message(DepGraphTool)
dependency_agent.enable_message(GoogleSearchTool)
task.set_color_log(enable=False)
task.run()