## A tutorial to generate evidence in a standard and structured format.

### Benefits of using standardized formats for forensic evidence
- Consistency: easier to compare and analyze different pieces of evidence
- Interoperability: exchange of evidence across different systems and platforms
- Accuracy: reduces the risk of errors and omissions
- Automation: facilitates the use of automated tools and technologies, such as machine learning algorithms, for evidence analysis

### Solution: Structured Threat Information eXpression (STIX)
- Shares information about cyber threats
    - think of it as a common language that everyone in the cybersecurity community can use to communicate effectively
    - helps the community improve its threat intelligence capabilities
- Includes basic predefined objects that can be used as `digital forensics evidence`
    - email, URL, identity, etc.
- Community support: maintained by the Organization for the Advancement of Structured Information Standards (OASIS)
    - open source
    - tools and library support
- Adaptability: flexible and can be extended to accommodate new types of threat information as the cybersecurity landscape evolves.

### Example of `email-message`
```
in STIX
    {
        "type": "email-message",
        "id": "email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97",
        "is_multipart": false,
        "subject": "Urgent Benefits Package Update",
        "from_ref": "email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798",
        "body": "Please click the link to review the changes to your benefits package."
    }

vs.

without STIX
    "Email": {
        "From": "support@banksecure.com",
        "Subject": "Urgent: Verify Your Account Now",
        "Content": "strange email asking to verify account details urgently"
    }
```
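
To make the comparison concrete, the sketch below converts the non-STIX record above into the two STIX objects it would need: an `email-addr` object and an `email-message` object that points to it through `from_ref`. The helper names `stix_id` and `to_stix_email` are hypothetical, and the random UUIDv4 identifiers are an assumption made for brevity (STIX 2.1 recommends deterministic UUIDv5 identifiers for cyber-observable objects).

```python
import json
import uuid


def stix_id(object_type):
    """Build an identifier of the form '<type>--<uuid>' (random UUIDv4, an assumption)."""
    return f"{object_type}--{uuid.uuid4()}"


def to_stix_email(record):
    """Convert a flat {'From', 'Subject', 'Content'} record into STIX-style objects."""
    sender = {
        "type": "email-addr",
        "id": stix_id("email-addr"),
        "value": record["From"],
    }
    message = {
        "type": "email-message",
        "id": stix_id("email-message"),
        "is_multipart": False,
        "subject": record["Subject"],
        "from_ref": sender["id"],  # reference to the sender object, not the raw address
        "body": record["Content"],
    }
    return [sender, message]


flat_record = {
    "From": "support@banksecure.com",
    "Subject": "Urgent: Verify Your Account Now",
    "Content": "strange email asking to verify account details urgently",
}

stix_objects = to_stix_email(flat_record)
print(json.dumps(stix_objects, indent=4))
```

The sender becomes its own object, so other evidence (for example a relationship or a second message) can refer to the same address by `id` instead of repeating the string.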

### Goal
- Capture threat information in STIX directly from the conversation
- Evidence entities and/or relationships are in the STIX

### Step 1: Download libraries and files for the lab
- Make sure you download the necessary libraries and files.
- All downloaded and saved files are located in the `content` folder if you are using Google Colab

In [None]:
!pip install python-dotenv
!pip install dspy-ai==2.4.17
!pip install graphviz

# ✅ Download required file
!wget https://raw.githubusercontent.com/frankwxu/digital-forensics-lab/main/AI4Forensics/CKIM2024/PhishingAttack/PhishingAttackScenarioDemo/conversation.txt

# ✅ Import libraries
import dspy
import os
import openai
import json
from dotenv import load_dotenv
from IPython.display import display

In [None]:
from google.colab import files
uploaded = files.upload()

### Step 2: Configure DSPy with OpenAI
- You `MUST` have an OpenAI API key
- Load the OpenAI API key from the `openai_api_key.txt` file
- Or, hard-code your OpenAI API key

In [None]:
def set_dspy():
    # ==============set OpenAI environment=========
    # Path to your API key file
    key_file_path = "openai_api_key.txt"

    # Load the API key from the file
    with open(key_file_path, "r") as file:
        openai_api_key = file.read().strip()

    # Set the API key as an environment variable
    os.environ["OPENAI_API_KEY"] = openai_api_key
    openai.api_key = os.environ["OPENAI_API_KEY"]
    turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=2000, temperature=0.5)
    dspy.settings.configure(lm=turbo)
    return turbo
    # ==============end of set OpenAI environment=========


def set_dspy_hardcode_openai_key():
    os.environ["OPENAI_API_KEY"] = "sk-proj-yourapikeyhere"
    openai.api_key = os.environ["OPENAI_API_KEY"]
    turbo = dspy.OpenAI(model="gpt-3.5-turbo", temperature=0, max_tokens=2000)
    dspy.settings.configure(lm=turbo)
    return turbo


# Provide `openai_api_key.txt` containing your OpenAI API key
turbo = set_dspy()
# Alternatively, hard-code your OpenAI API key inside `set_dspy_hardcode_openai_key()`
# turbo = set_dspy_hardcode_openai_key()
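
As a possible alternative to the two functions above, the key lookup can be written so that it prefers an existing environment variable and only falls back to the key file. This is a minimal sketch using the standard library only; `resolve_openai_key` is a hypothetical helper, not part of DSPy or the OpenAI client.

```python
import os


def resolve_openai_key(key_file_path="openai_api_key.txt", env_var="OPENAI_API_KEY"):
    """Return the API key from the environment if set, otherwise from the key file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key

    if os.path.exists(key_file_path):
        with open(key_file_path, "r") as file:
            key = file.read().strip()

    if not key:
        # Fail early with a clear message instead of sending an empty key to the API
        raise RuntimeError(f"No API key found in ${env_var} or in {key_file_path!r}.")
    return key
```

It would be used as `os.environ["OPENAI_API_KEY"] = resolve_openai_key()` before the language model is created, which avoids keeping a key in the notebook itself.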

### Step 3: Load the cyber incident report (e.g., conversation)

In [None]:
def load_text_file(file_path):
    """
    Load a text file and return its contents as a string.

    Parameters:
    file_path (str): The path to the text file.

    Returns:
    str: The contents of the text file.
    """
    try:
        with open(file_path, "r") as file:
            contents = file.read()
        return contents
    except FileNotFoundError:
        return "File not found."
    except Exception as e:
        return f"An error occurred: {e}"


conversation = load_text_file("conversation.txt")
print(conversation)

Alice: Hey Bob, I just got a strange email from support@banksecure.com. It says I need to verify my account details urgently. The subject line was "Urgent: Verify Your Account Now". The email looks suspicious to me.

Bob: Hi Alice, that does sound fishy. Can you forward me the email? I’ll take a look at the headers to see where it came from.

Alice: Sure, forwarding it now.

Bob: Got it. Let’s see... The email came from IP address 192.168.10.45, but the domain banksecure.com is not their official domain. It's actually registered to someone in Russia.

Alice: That’s definitely not right. Should I be worried?

Bob: We should investigate further. Did you click on any links or download any attachments?

Alice: I did click on a link that took me to a page asking for my login credentials. I didn't enter anything though. The URL was http://banksecure-verification.com/login.

Bob: Good call on not entering your details. Let’s check the URL. This domain was just registered two days ago. It’s hi

### Step 4: Tell the LLM `WHAT` the inputs/outputs are by defining a DSPy Signature

- A signature is one of the basic building blocks in DSPy's prompt programming
- It is a declarative specification of the input/output behavior of a DSPy module
    - Think of it as a function signature
- It lets you tell the LLM what it needs to do
    - You don't need to specify how the LLM should be asked to do it
- The following signature identifies a list of evidence based on the conversation
    - Inherits from `dspy.Signature`
    - Exactly `ONE` input, e.g., the conversation
    - Exactly `ONE` output, e.g., cyber threat information in JSON

In [None]:
class STIXGenerator(dspy.Signature):
    """Describe a conversation in STIX, which stands for Structured Threat Information eXpression, is a standardized language for representing cyber threat information."""

    question: str = dspy.InputField(
        desc="a conversation describing a cyber incident between an IT Security Specialist and an employee."
    )

    answer: str = dspy.OutputField(
        desc="the formalized STIX in JSON representing cyber threat information based on the conversation, e.g., [{object 1}, {object 2}, ... {object n}]"
    )

### Step 5: Tell the LLM `HOW` to generate the answer

The following function generates and saves threat information from a conversation using a specified signature.

#### Parameters:
- `signature` (dspy.Signature): The signature defining the input and output structure for evidence identification.
- `conversation` (str): The conversation text to analyze for threat information.
- `output_file` (str): The file path where the identified threat information will be saved as JSON.

#### Returns:
None. The function saves the result to a file and prints a confirmation message.

In [None]:
def generate_answer_CoT(signature, conversation, output_file):
    # Build a chain-of-thought module from the signature and run it on the conversation
    generate_answer = dspy.ChainOfThought(signature)
    answer = generate_answer(question=conversation).answer  # here we use the module
    print(answer)

    # Parse before opening the file so a malformed answer does not leave an empty file behind
    result = json.loads(answer)
    with open(output_file, "w") as json_file:
        json.dump(result, json_file, indent=4)
    print(f"The evidence has been saved to the file {output_file}")
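
`json.loads` raises an error if the model wraps its answer in a Markdown code fence or adds text around the JSON, which chat models sometimes do. The sketch below is one defensive way to handle that, assuming the answer contains a single JSON array as requested by the signature; `extract_json_array` is a hypothetical helper, not part of DSPy.

```python
import json


def extract_json_array(answer):
    """Parse the text between the first '[' and the last ']' of an LLM answer."""
    start = answer.find("[")
    end = answer.rfind("]")
    if start == -1 or end < start:
        raise ValueError("No JSON array found in the answer.")
    return json.loads(answer[start : end + 1])


# Hypothetical model answer wrapped in a Markdown fence (not real model output)
fence = "`" * 3
raw_answer = (
    fence + "json\n"
    '[{"type": "url", "value": "http://banksecure-verification.com/login"}]\n'
    + fence
)

objects = extract_json_array(raw_answer)
print(objects[0]["value"])
```

The slicing is deliberately simple: it would still fail if the surrounding text itself contains square brackets, so it is a fallback for the common fenced case rather than a general parser.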

### Step 6: Generate entities using `STIXGenerator`

In [None]:
output_file = "03_output.json"
generate_answer_CoT(
    STIXGenerator,
    conversation,
    output_file,
)
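
Because the output comes from an LLM, it is worth a quick structural check before treating it as evidence. The sketch below, under the assumption that the output is a list of STIX-style objects, verifies that each object has a `type`, that its `id` has the form `<type>--<uuid>`, and that every `*_ref` or `*_refs` property points to an object in the same list. `check_stix_objects` is a hypothetical helper and does not replace a full STIX validator.

```python
import uuid


def check_stix_objects(objects):
    """Return a list of problems found in a list of STIX-style dictionaries."""
    problems = []
    known_ids = {obj.get("id") for obj in objects if isinstance(obj, dict)}

    for index, obj in enumerate(objects):
        if not isinstance(obj, dict):
            problems.append(f"object {index}: not a JSON object")
            continue

        obj_type = obj.get("type")
        obj_id = str(obj.get("id", ""))
        prefix, separator, suffix = obj_id.partition("--")

        if not obj_type:
            problems.append(f"object {index}: missing 'type'")
        if not separator or prefix != obj_type:
            problems.append(f"object {index}: id {obj_id!r} does not start with '<type>--'")
        else:
            try:
                uuid.UUID(suffix)
            except ValueError:
                problems.append(f"object {index}: id {obj_id!r} does not end with a UUID")

        # Every reference property should resolve to an object in the same list
        for key, value in obj.items():
            if key.endswith("_ref"):
                references = [value]
            elif key.endswith("_refs") and isinstance(value, list):
                references = value
            else:
                continue
            for reference in references:
                if reference not in known_ids:
                    problems.append(f"object {index}: {key} -> {reference!r} not found")

    return problems


# Hypothetical sample: the third object has a malformed id
sample = [
    {
        "type": "email-addr",
        "id": "email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798",
        "value": "support@banksecure.com",
    },
    {
        "type": "email-message",
        "id": "email-message--c79b6bde-4f4c-4b38-a8c8-fb82921d6b97",
        "is_multipart": False,
        "from_ref": "email-addr--0c0d2094-df97-45a7-9e9c-223569a9e798",
    },
    {"type": "url", "id": "url-1", "value": "http://banksecure-verification.com/login"},
]

for problem in check_stix_objects(sample):
    print(problem)
```

To run it on the file produced in this step, load `03_output.json` with `json.load` and pass the resulting list to `check_stix_objects`.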

### Step 7: Inspect the last prompt sent to the LLM

You want to check:
- Prompt Description Section: the description in the signature
- Format Section: `Follow the following format.`
    - Pay attention to the newly inserted field `REASONING: Let's think step by step ...`
- Result Section: the threat information in JSON

In [None]:
turbo.inspect_history(n=1)
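
The three sections listed above can also be separated programmatically once the prompt text is available as a string. The sketch below assumes that the sections are divided by lines containing only `---`, which is how prompts from this DSPy version are usually laid out; the sample prompt is hypothetical and heavily abbreviated, and `split_prompt_sections` is an illustrative helper, not a DSPy function.

```python
def split_prompt_sections(prompt_text):
    """Split a prompt into sections on lines that contain only '---'."""
    sections, current = [], []
    for line in prompt_text.splitlines():
        if line.strip() == "---":
            sections.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    sections.append("\n".join(current).strip())
    return [section for section in sections if section]


# Hypothetical, heavily abbreviated prompt text (not real model output)
sample_prompt = """Describe a conversation in STIX ...

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}

---

Question: Alice: Hey Bob, ...
Reasoning: Let's think step by step in order to ...
Answer: [...]"""

description, format_spec, result = split_prompt_sections(sample_prompt)
print(format_spec.splitlines()[0])
```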