# Classifier-Extractor Module POC (LLM+LangChain)
- Public version: redacted, using synthetic vulnerability reports
- For Vulnerability Case Analysis Automation
- Jayden Kim

#### Overview
- POC to validate the technical feasibility of the business requirements
- LangChain-based LLM Implementation
- This module is part of a larger automation pipeline being built by Team ABC and DS
- Core business problem
    - Classify a new submission into one of the predetermined categories and extract the relevant reproduction (repro) steps
- ChatGPT-generated synthetic dataset used for more secure exploration
- LangChain's FewShotPromptTemplate used to place the few-shot examples into the prompt
- Pydantic used to ensure that the output is properly structured for the next module in the pipeline

#### LLM Setup

In [1]:
# import packages
import os, json
from dotenv import load_dotenv

from langchain.chat_models import AzureChatOpenAI
from langchain import LLMChain, PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.output_parsers import PydanticOutputParser

from pydantic import BaseModel, Field, validator
from typing import List


In [2]:
# load env variables for the LLM
load_dotenv()
BASE_URL = os.getenv("BASE_URL")
API_KEY = os.getenv("API_KEY")
DEPLOYMENT_NAME = os.getenv("DEPLOYMENT_NAME")
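
`os.getenv` returns `None` when a variable is not set, so a missing entry in `.env` would only surface later, when the LLM object is created or called. A minimal sketch of a fail-fast check, assuming the same three variable names as above; `require_env` and `load_llm_settings` are hypothetical helpers, not part of the notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_llm_settings() -> dict:
    """Collect the three settings this notebook reads from the environment."""
    names = ("BASE_URL", "API_KEY", "DEPLOYMENT_NAME")
    return {name: require_env(name) for name in names}
```

Calling `load_llm_settings()` right after `load_dotenv()` would report the first missing name instead of passing `None` through.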

In [3]:
# instantiate the llm object (Azure OpenAI gpt-35-turbo)
llm = AzureChatOpenAI(
    openai_api_base=BASE_URL,
    openai_api_version = "2023-03-15-preview",
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    openai_api_type="azure",
    temperature=0  # minimizes sampling randomness for more repeatable responses
)

In [5]:
# quick sanity check for this instance 
prompt_template = """What is a good name for a dog that likes {something}?"""
llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_template))
llm_chain.predict(something="a cat")

'Felix'

#### Examples for Few-shot Learning

In [6]:
#ChatGPT-generated synthetic test submission to test the POC
test_submission = {"description": """A cross-site scripting (XSS) vulnerability has been identified in the ABC Web Application version 1.2.3. 
                                    The application fails to properly sanitize user input in the search functionality, 
                                    allowing an attacker to inject malicious scripts into the returned search results. 
                                    This can lead to session hijacking, data theft, or further attacks.
                                    \n\nReproduction Steps:\n1. Navigate to the ABC Web Application search page.
                                    \n2. In the search input field, enter a script tag with a payload.\n3. 
                                    Submit the search request and observe the execution of the injected script.
                                    \n\nReference URLs:\n- ABC Web Application Official Website: https://www.example.com/abc-web-app\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-54321\n- CVE-2022-54321: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-54321""", 
                   "vulnerabilityCategory": "Cross-Site Scripting (XSS)", 
                   "extracted_repro_steps": ["1. Navigate to the ABC Web Application search page.", "2. In the search input field, enter a script tag with a payload.", "3. Submit the search request and observe the execution of the injected script."]
                   }

In [8]:
#ChatGPT-generated synthetic example submissions for few-shot learning of the LLM (total 9 examples)
##SQL Injection (SQLi) x 2
##Remote Code Execution (RCE) x 2
##Cross-Site Request Forgery (CSRF) x 2
##Information Disclosure x 2
##Cross-Site Scripting (XSS) x 1
example_submissions = [
    {
        "description": "XYZ Content Management System version 4.8.2 is susceptible to SQL injection attacks. An attacker can manipulate input fields to execute unauthorized SQL queries, potentially leading to data breaches or unauthorized access to the underlying database.\n\nReproduction Steps:\n1. Access the XYZ Content Management System login page.\n2. Enter a specially crafted SQL injection payload in the username or password field.\n3. Submit the login request and observe the response, which may reveal sensitive information or provide unauthorized access.\n\nReference URLs:\n- XYZ Content Management System Official Website: https://www.example.com/xyz-cms\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-12345\n- CVE-2022-12345: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-12345",
        "vulnerabilityCategory": "SQL Injection (SQLi)",
        "extracted_repro_steps": [
            "1. Access the XYZ Content Management System login page.",
            "2. Enter a specially crafted SQL injection payload in the username or password field.",
            "3. Submit the login request and observe the response, which may reveal sensitive information or provide unauthorized access."
        ]
    },
    {
        "description": "An RCE vulnerability has been discovered in XYZ Server Management Software version 2.5.0. Attackers can exploit this flaw to execute arbitrary commands on the targeted server, potentially compromising the entire system.\n\nReproduction Steps:\n1. Establish a network connection to the vulnerable server running XYZ Server Management Software.\n2. Craft a malicious command payload to exploit the RCE vulnerability.\n3. Send the payload to the server and observe the successful execution of arbitrary commands.\n\nReference URLs:\n- XYZ Server Management Software Official Website: https://www.example.com/xyz-server-management\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-87654\n- CVE-2022-87654: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-87654",
        "vulnerabilityCategory": "Remote Code Execution (RCE)",
        "extracted_repro_steps": [
            "1. Establish a network connection to the vulnerable server running XYZ Server Management Software.",
            "2. Craft a malicious command payload to exploit the RCE vulnerability.",
            "3. Send the payload to the server and observe the successful execution of arbitrary commands."
        ]
    },
    {
        "description": "The XYZ Banking Portal version 3.7.1 is vulnerable to cross-site request forgery (CSRF). Attackers can trick authenticated users into unknowingly performing unauthorized actions, such as transferring funds or changing account settings.\n\nReproduction Steps:\n1. Craft a malicious webpage that includes forged requests targeting sensitive actions on the XYZ Banking Portal.\n2. Persuade the victim to visit the malicious webpage while being logged into their XYZ Banking Portal account.\n3. Observe the successful execution of unauthorized actions upon the victim's interaction with the malicious webpage.\n\nReference URLs:\n- XYZ Banking Portal Official Website: https://www.example.com/xyz-banking-portal\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-34567\n- CVE-2022-34567: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-34567",
        "vulnerabilityCategory": "Cross-Site Request Forgery (CSRF)",
        "extracted_repro_steps": [
            "1. Craft a malicious webpage that includes forged requests targeting sensitive actions on the XYZ Banking Portal.",
            "2. Persuade the victim to visit the malicious webpage while being logged into their XYZ Banking Portal account.",
            "3. Observe the successful execution of unauthorized actions upon the victim's interaction with the malicious webpage."
        ]
    },
    {
        "description": "The ABC Mobile App version 2.2.0 inadvertently exposes sensitive user data, including personal information and payment details, due to improper access controls. Unauthorized users can access this information, leading to potential identity theft or fraud.\n\nReproduction Steps:\n1. Install the ABC Mobile App on a device.\n2. Attempt to access specific features or functionality without proper authentication or authorization.\n3. Observe the unintended exposure of sensitive information in the application's responses or data transmission.\n\nReference URLs:\n- ABC Mobile App Official Website: https://www.example.com/abc-mobile-app\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-87654\n- CVE-2022-87654: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-87654",
        "vulnerabilityCategory": "Information Disclosure",
        "extracted_repro_steps": [
            "1. Install the ABC Mobile App on a device.",
            "2. Attempt to access specific features or functionality without proper authentication or authorization.",
            "3. Observe the unintended exposure of sensitive information in the application's responses or data transmission."
        ]
    },
    {
        "description": "The XYZ Blogging Platform version 1.0.3 is vulnerable to stored cross-site scripting (XSS) attacks. Attackers can inject malicious scripts into the platform's comment section, which will be executed when viewed by other users.\n\nReproduction Steps:\n1. Create a new blog post on the XYZ Blogging Platform.\n2. Post a comment containing a script tag with a payload.\n3. View the blog post and observe the execution of the injected script.\n\nReference URLs:\n- XYZ Blogging Platform Official Website: https://www.example.com/xyz-blogging-platform\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-54321\n- CVE-2022-54321: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-54321",
        "vulnerabilityCategory": "Cross-Site Scripting (XSS)",
        "extracted_repro_steps": [
            "1. Create a new blog post on the XYZ Blogging Platform.",
            "2. Post a comment containing a script tag with a payload.",
            "3. View the blog post and observe the execution of the injected script."
        ]
    },
    {
        "description": "The authentication system of the ABC platform version 1.1.0 is vulnerable to blind SQL injection. By exploiting this vulnerability, an attacker can infer information from the database by analyzing the application's responses to crafted SQL queries.\n\nReproduction Steps:\n1. Access the ABC platform login page.\n2. Craft a SQL injection payload in the login fields to trigger a time delay or conditional response from the application.\n3. Observe the behavior of the application's responses, which can reveal information about the underlying database.\n\nReference URLs:\n- ABC Platform Official Website: https://www.example.com/abc-platform\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-23456\n- CVE-2022-23456: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-23456",
        "vulnerabilityCategory": "SQL Injection (SQLi)",
        "extracted_repro_steps": [
            "1. Access the ABC platform login page.",
            "2. Craft a SQL injection payload in the login fields to trigger a time delay or conditional response from the application.",
            "3. Observe the behavior of the application's responses, which can reveal information about the underlying database."
        ]
    },
    {
        "description": "The firmware of XYZ network devices version 2.0.1 is vulnerable to command injection. Attackers can inject malicious commands through specially crafted requests, potentially gaining unauthorized access or control over the affected devices.\n\nReproduction Steps:\n1. Establish a network connection to the XYZ network device.\n2. Craft a malicious command injection payload to execute arbitrary commands on the device.\n3. Send the payload to the device and verify the successful execution of the injected commands.\n\nReference URLs:\n- XYZ Network Devices Official Website: https://www.example.com/xyz-network-devices\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-87654\n- CVE-2022-87654: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-87654",
        "vulnerabilityCategory": "Remote Code Execution (RCE)",
        "extracted_repro_steps": [
            "1. Establish a network connection to the XYZ network device.",
            "2. Craft a malicious command injection payload to execute arbitrary commands on the device.",
            "3. Send the payload to the device and verify the successful execution of the injected commands."
        ]
    },
    {
        "description": "The ABC Admin Panel version 2.0.0 contains a cross-site request forgery (CSRF) vulnerability. Attackers can create malicious web pages or send crafted requests to trick authenticated administrators into executing unintended actions.\n\nReproduction Steps:\n1. Craft a malicious webpage that includes forged requests targeting sensitive actions in the ABC Admin Panel.\n2. Trick an authenticated administrator into visiting the malicious webpage or clicking on a malicious link.\n3. Observe the successful execution of unauthorized actions upon the administrator's interaction with the malicious content.\n\nReference URLs:\n- ABC Admin Panel Official Website: https://www.example.com/abc-admin-panel\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-45678\n- CVE-2022-45678: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-45678",
        "vulnerabilityCategory": "Cross-Site Request Forgery (CSRF)",
        "extracted_repro_steps": [
            "1. Craft a malicious webpage that includes forged requests targeting sensitive actions in the ABC Admin Panel.",
            "2. Trick an authenticated administrator into visiting the malicious webpage or clicking on a malicious link.",
            "3. Observe the successful execution of unauthorized actions upon the administrator's interaction with the malicious content."
        ]
    },
    {
        "description": "The configuration files of XYZ application version 1.2.3 contain sensitive information, such as database credentials and API keys, that are accessible to unauthorized users. This can lead to potential misuse or unauthorized access to critical systems.\n\nReproduction Steps:\n1. Locate the configuration files in the XYZ application installation directory.\n2. Open the files using a text editor or file viewer.\n3. Observe the presence of sensitive information, including database credentials, API keys, or other confidential data.\n\nReference URLs:\n- XYZ Application Official Website: https://www.example.com/xyz-application\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-78901\n- CVE-2022-78901: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-78901",
        "vulnerabilityCategory": "Information Disclosure",
        "extracted_repro_steps": [
            "1. Locate the configuration files in the XYZ application installation directory.",
            "2. Open the files using a text editor or file viewer.",
            "3. Observe the presence of sensitive information, including database credentials, API keys, or other confidential data."
        ]
    }
]
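
Because the examples are written by hand, it may be worth checking the category balance and confirming that each labeled step really appears in its description. The sketch below is one possible check using only the standard library; `category_counts`, `find_example_problems`, and the short `sample_examples` stand-in data are hypothetical and only mirror the structure of `example_submissions`:

```python
from collections import Counter

REQUIRED_KEYS = ("description", "vulnerabilityCategory", "extracted_repro_steps")


def category_counts(examples):
    """Count how many examples carry each category label."""
    return Counter(ex["vulnerabilityCategory"] for ex in examples)


def find_example_problems(examples):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for i, ex in enumerate(examples):
        missing = [key for key in REQUIRED_KEYS if key not in ex]
        if missing:
            problems.append(f"example {i}: missing keys {missing}")
            continue
        for step in ex["extracted_repro_steps"]:
            # Each labeled step should be copied verbatim from the description.
            if step not in ex["description"]:
                problems.append(f"example {i}: step not in description: {step!r}")
    return problems


# Stand-in data with the same shape as the notebook's examples.
sample_examples = [
    {
        "description": "Reproduction Steps:\n1. Open the page.\n2. Submit.",
        "vulnerabilityCategory": "SQL Injection (SQLi)",
        "extracted_repro_steps": ["1. Open the page.", "2. Submit."],
    },
    {
        "description": "Reproduction Steps:\n1. Open the page.",
        "vulnerabilityCategory": "SQL Injection (SQLi)",
        "extracted_repro_steps": ["1. Open a page."],  # deliberately mismatched
    },
]
```

Run against the real list, `category_counts(example_submissions)` should agree with the per-category counts in the comment at the top of the cell.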

In [9]:
# Create a formatter for the examples
example_template = PromptTemplate(input_variables=["description", "vulnerabilityCategory", "extracted_repro_steps"],
                                 template="""
                                            Description: {description}
                                            Vulnerability Category: {vulnerabilityCategory}
                                            Extracted Repro Steps: {extracted_repro_steps}
                                """
                                 )

In [12]:
#sanity check
print(example_template.format(**example_submissions[0]))


                                            Description: XYZ Content Management System version 4.8.2 is susceptible to SQL injection attacks. An attacker can manipulate input fields to execute unauthorized SQL queries, potentially leading to data breaches or unauthorized access to the underlying database.

Reproduction Steps:
1. Access the XYZ Content Management System login page.
2. Enter a specially crafted SQL injection payload in the username or password field.
3. Submit the login request and observe the response, which may reveal sensitive information or provide unauthorized access.

Reference URLs:
- XYZ Content Management System Official Website: https://www.example.com/xyz-cms
- Vendor Security Advisory: https://www.example.com/security/advisory-2022-12345
- CVE-2022-12345: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-12345
                                            Vulnerability Category: SQL Injection (SQLi)
                                            Extracted R
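
The output above shows that the indentation inside the triple-quoted template is carried into the prompt. In plain Python string formatting, a list value is also rendered as its Python repr (brackets and quotes). A minimal sketch of one way to avoid both, using `textwrap.dedent` and joining the steps before formatting; `EXAMPLE_TEMPLATE` and `format_example` are hypothetical names, and this plain-string version only stands in for the `PromptTemplate` above:

```python
from textwrap import dedent

# dedent strips the common leading whitespace, so the code stays indented
# while the prompt text starts at column zero.
EXAMPLE_TEMPLATE = dedent("""\
    Description: {description}
    Vulnerability Category: {vulnerabilityCategory}
    Extracted Repro Steps:
    {extracted_repro_steps}""")


def format_example(example: dict) -> str:
    """Format one example, rendering the steps as plain lines."""
    fields = dict(example)
    fields["extracted_repro_steps"] = "\n".join(example["extracted_repro_steps"])
    return EXAMPLE_TEMPLATE.format(**fields)
```

Whether cleaner whitespace changes the model's answers is something to measure rather than assume; the main benefit here is a prompt that is easier to read during sanity checks.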

#### Prompt Engineering

In [13]:
# prefix (instructions prior to the examples)
prefix = """
Given the description, classify it into one of these categories:
- Cross-Site Scripting (XSS)
- SQL Injection (SQLi)
- Remote Code Execution (RCE)
- Cross-Site Request Forgery (CSRF)
- Information Disclosure

And then, extract the relevant reproduction(repro) steps from the description.
If you can't find anything relevant, do not make up anything. Just output "None"
The classification will be based on the text description. 

Here are some examples:
"""
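
The category list is typed directly into the prompt text, so any later check on the model's answer would need a second copy of it. A small sketch, assuming it is acceptable to generate the prefix from one shared list; `CATEGORIES`, `build_prefix`, and `is_known_category` are hypothetical names, and the instruction wording is copied from the prefix above:

```python
CATEGORIES = [
    "Cross-Site Scripting (XSS)",
    "SQL Injection (SQLi)",
    "Remote Code Execution (RCE)",
    "Cross-Site Request Forgery (CSRF)",
    "Information Disclosure",
]


def build_prefix(categories):
    """Render the instruction text with one bullet per category."""
    bullets = "\n".join(f"- {name}" for name in categories)
    return (
        "\nGiven the description, classify it into one of these categories:\n"
        f"{bullets}\n\n"
        "And then, extract the relevant reproduction(repro) steps from the description.\n"
        "If you can't find anything relevant, do not make up anything. Just output \"None\"\n"
        "The classification will be based on the text description. \n\n"
        "Here are some examples:\n"
    )


def is_known_category(label, categories=CATEGORIES):
    """Check a predicted label against the same list used in the prompt."""
    return label in categories
```

With this arrangement, adding a sixth category means editing one list, and the prompt and the check stay in sync.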

In [14]:
#Create a Pydantic JSON parser for output validation
class Response(BaseModel):
    vulnerabilityCategory: str=Field(description="classification of the description")
    extracted_repro_steps: List[str]=Field(description="relevant reproduction steps parsed from the description")

output_parser = PydanticOutputParser(pydantic_object=Response)
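
The `Response` model above checks types only: any string is accepted as a category. The sketch below shows, with the standard library alone, the kind of extra check that could be applied to the model's raw JSON; `ParsedResponse`, `parse_response`, and `ALLOWED_CATEGORIES` are hypothetical names. In the notebook itself, a similar rule could probably be written as a Pydantic validator on `vulnerabilityCategory`, which is an assumption rather than something shown here.

```python
import json
from dataclasses import dataclass
from typing import List

ALLOWED_CATEGORIES = frozenset({
    "Cross-Site Scripting (XSS)",
    "SQL Injection (SQLi)",
    "Remote Code Execution (RCE)",
    "Cross-Site Request Forgery (CSRF)",
    "Information Disclosure",
})


@dataclass
class ParsedResponse:
    vulnerabilityCategory: str
    extracted_repro_steps: List[str]


def parse_response(raw: str, allowed=ALLOWED_CATEGORIES) -> ParsedResponse:
    """Parse the model's JSON text and reject anything outside the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    category = data.get("vulnerabilityCategory")
    steps = data.get("extracted_repro_steps")
    if category not in allowed:
        raise ValueError(f"unknown category: {category!r}")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError("extracted_repro_steps must be a list of strings")
    return ParsedResponse(category, steps)
```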

In [15]:
#Few-shot Learning Prompt Template
fewshot_template = FewShotPromptTemplate(
    examples=example_submissions,
    example_prompt=example_template,
    prefix=prefix,
    suffix="{format_instructions}\n{description}",
    input_variables=["description"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()} 
)
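
Conceptually, the few-shot template produces one string: the prefix, then each formatted example, then the suffix with its variables filled in. The sketch below is a plain-Python approximation of that assembly, not LangChain's implementation; `render_prompt` is a hypothetical helper, and the blank-line separator is an assumption about how the pieces are joined.

```python
def render_prompt(prefix, example_strings, suffix_template, separator="\n\n", **variables):
    """Join prefix, examples, and the filled-in suffix into one prompt string."""
    suffix = suffix_template.format(**variables)
    pieces = [prefix, *example_strings, suffix]
    # Skip empty pieces so an empty prefix does not add a stray separator.
    return separator.join(piece for piece in pieces if piece)


demo_prompt = render_prompt(
    "Classify the description.",
    ["Description: A\nVulnerability Category: X", "Description: B\nVulnerability Category: Y"],
    "{format_instructions}\n{description}",
    format_instructions="Return JSON.",
    description="New submission text",
)
```

Since every example is included in every call, the prompt grows with the number of examples, which matters for the model's context limit and per-call cost.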

In [16]:
#Sanity Check: the final prompt
print(fewshot_template.format(description=test_submission["description"]))


Given the description, classify it into one of these categories:
- Cross-Site Scripting (XSS)
- SQL Injection (SQLi)
- Remote Code Execution (RCE)
- Cross-Site Request Forgery (CSRF)
- Information Disclosure

And then, extract the relevant reproduction(repro) steps from the description.
If you can't find anything relevant, do not make up anything. Just output "None"
The classification will be based on the text description. 

Here are some examples:



                                            Description: XYZ Content Management System version 4.8.2 is susceptible to SQL injection attacks. An attacker can manipulate input fields to execute unauthorized SQL queries, potentially leading to data breaches or unauthorized access to the underlying database.

Reproduction Steps:
1. Access the XYZ Content Management System login page.
2. Enter a specially crafted SQL injection payload in the username or password field.
3. Submit the login request and observe the response, which may reveal se

#### LLM Response

In [17]:
test_submission["description"]

'A cross-site scripting (XSS) vulnerability has been identified in the ABC Web Application version 1.2.3. \n                                    The application fails to properly sanitize user input in the search functionality, \n                                    allowing an attacker to inject malicious scripts into the returned search results. \n                                    This can lead to session hijacking, data theft, or further attacks.\n                                    \n\nReproduction Steps:\n1. Navigate to the ABC Web Application search page.\n                                    \n2. In the search input field, enter a script tag with a payload.\n3. \n                                    Submit the search request and observe the execution of the injected script.\n                                    \n\nReference URLs:\n- ABC Web Application Official Website: https://www.example.com/abc-web-app\n- Vendor Security Advisory: https://www.example.com/security/advisory-2022-

In [18]:
llm_chain = LLMChain(llm=llm, prompt=fewshot_template)
new_description = test_submission["description"]
result = llm_chain.run(new_description)
parsed_result = output_parser.parse(result).dict()
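
The cell above assumes the model returns parseable JSON on the first try. For a pipeline component, some handling of malformed responses is likely needed. A minimal sketch of a bounded retry, written against a generic `call_llm` callable so it runs without any LLM access; `run_with_retry` and `REQUIRED_KEYS` are hypothetical names:

```python
import json
from typing import Callable

REQUIRED_KEYS = {"vulnerabilityCategory", "extracted_repro_steps"}


def run_with_retry(call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until it returns a JSON object with the required keys."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
            return data
        last_error = ValueError("response is missing required keys")
    raise RuntimeError(f"no valid response after {max_attempts} attempts") from last_error
```

With `temperature=0`, a retry may well return the same text, so the final `RuntimeError` path (for example, routing the submission to manual review) is at least as important as the retry itself.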

In [19]:
parsed_result["vulnerabilityCategory"]

'Cross-Site Scripting (XSS)'

In [25]:
parsed_result["extracted_repro_steps"]

['1. Navigate to the ABC Web Application search page.',
 '2. In the search input field, enter a script tag with a payload.',
 '3. Submit the search request and observe the execution of the injected script.']

#### Conclusion
- The GPT model produced the expected category and repro steps for the single synthetic test submission
- Further validation with a larger set of actual submissions from the stakeholders is needed to test the robustness of this approach
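
For the larger validation run, the two outputs can be scored separately. The sketch below assumes a labeled dataset with the same keys as `test_submission` and a `predict` callable that returns a dict shaped like `parsed_result`; `evaluate` and `normalize_steps` are hypothetical helpers, and exact match on whitespace-normalized steps is only one possible scoring rule.

```python
def normalize_steps(steps):
    """Collapse whitespace so formatting differences do not count as errors."""
    return [" ".join(step.split()) for step in steps]


def evaluate(predict, labeled):
    """Return category accuracy and step exact-match rate over labeled items."""
    category_hits = 0
    steps_hits = 0
    for item in labeled:
        predicted = predict(item["description"])
        if predicted["vulnerabilityCategory"] == item["vulnerabilityCategory"]:
            category_hits += 1
        expected_steps = normalize_steps(item["extracted_repro_steps"])
        if normalize_steps(predicted["extracted_repro_steps"]) == expected_steps:
            steps_hits += 1
    total = len(labeled)
    return {
        "total": total,
        "category_accuracy": category_hits / total if total else 0.0,
        "steps_exact_match": steps_hits / total if total else 0.0,
    }
```

In the notebook, `predict` could wrap the chain call and parser from the LLM Response section; reporting the two rates separately shows whether failures come from classification or from extraction.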