## Machine Learning in Systems Security: Initial Assessment of LLM Effectiveness Against Social Engineering Attacks

This notebook is an initial exploration of how well Large Language Models (LLMs) detect whether malicious information is being solicited in LLM-generated chats that simulate normal human interaction. We test the following models with a zero-shot approach:

- Llama-3.1-8B-Instruct
- Gemini 2.0 Flash
- GPT-4

### Prerequisites:

Install any of the packages imported below that are not already available in your environment. In a notebook cell, prefix the command with `!`, for example:

```!pip install openai```
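
If you are unsure which packages are missing, a small check can list them before you start. The sketch below is a hypothetical helper (it is not part of `helpers.py`): it uses `importlib.util.find_spec` to test whether each module can be found, and the mapping from module names to pip package names is our assumption based on the imports in this notebook.

```python
import importlib.util

# Assumed mapping: module name imported in this notebook -> pip package name.
# The two do not always match (e.g. `dotenv` ships as `python-dotenv`).
REQUIRED = {
    "torch": "torch",
    "numpy": "numpy",
    "pandas": "pandas",
    "transformers": "transformers",
    "huggingface_hub": "huggingface_hub",
    "langchain_core": "langchain-core",
    "langchain_huggingface": "langchain-huggingface",
    "dotenv": "python-dotenv",
    "google.genai": "google-genai",
    "openai": "openai",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for module, pip_name in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec raises when the parent of a dotted name (e.g. `google`) is absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


to_install = missing_packages(REQUIRED)
if to_install:
    print("!pip install " + " ".join(to_install))
else:
    print("All required packages are installed.")
```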

### What are the goals of this project?
- Can we do more with a less pipelined approach?
- Chain-of-Thought (CoT) is a prompting strategy that can help the model "think" through whether a conversation is considered malicious.
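
As an illustration of the Chain-of-Thought idea, the sketch below builds a prompt that asks the model to reason before answering and then reads back only the final verdict. The template wording, the `Final answer:` convention, and the names `build_cot_prompt` and `parse_cot_answer` are hypothetical; nothing in this notebook defines them yet.

```python
import re
from typing import Optional

# Hypothetical CoT variant of the zero-shot instruction: the model reasons first,
# then commits to a verdict on a line that is easy to parse.
COT_TEMPLATE = (
    "The following is an interaction between two people.\n\n"
    "{convo}\n\n"
    "Think step by step: identify what each speaker asks for, whether any "
    "request targets sensitive data (credentials, financial or personal "
    "details), and whether the requester uses pressure or deception.\n"
    "Then, on the last line, write exactly 'Final answer: yes' if malicious "
    "information is solicited, or 'Final answer: no' otherwise."
)


def build_cot_prompt(convo: str) -> str:
    """Fill the hypothetical CoT template with one conversation."""
    return COT_TEMPLATE.format(convo=convo)


def parse_cot_answer(response: str) -> Optional[bool]:
    """Return True/False from the last 'Final answer:' line, or None if there is none."""
    matches = re.findall(r"final answer:\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if not matches:
        return None
    # If the model revises itself, trust its last stated verdict.
    return matches[-1].lower() == "yes"
```

Only the final line is scored, so the reasoning text can be kept for error analysis without affecting the yes/no label.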

In [None]:
import json
import torch

# Base and Modelling Packages:
import re
import numpy as np
import pandas as pd
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig
from huggingface_hub import login

# LLM response-formatting packages (Langchain):
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated since LangChain 0.3
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from typing import Optional

# Miscellaneous:
from dotenv import load_dotenv

# Import helper methods:
from helpers import *

# Load the annotated test set:
with open('data/annotated_test.json') as f:
    data = json.load(f)

### Initial Step:

First, we preprocess the annotated test data loaded from the `data` folder (`data/annotated_test.json`). See `helpers.py` for details on how the data is preprocessed:

In [None]:
msgs, ground_truth = preprocess_test_data(data)

### Llama 3.1 Model Testing:

The following cells test, with a zero-shot prompt, how accurately Llama 3.1 detects whether malicious information is solicited in a conversation. The prompt is defined in `helpers.py`; for reference, it reads:

```The following is an interaction between two people. Determine whether malicious information is solicited from the conversation chain. Reply with yes or no. <|eot_id|> <|start_header_id|> {convo} <|end_header_id|>```
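
In this prompt the conversation sits between `<|start_header_id|>` and `<|end_header_id|>`, which is where Llama 3's documented prompt format places the role name (`system`, `user`, or `assistant`). For comparison, the sketch below lays out the same instruction in that documented single-turn format. `format_llama3_prompt` is a hypothetical helper; in practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` produces the model's own template and is the less error-prone option.

```python
# Hypothetical helper: place the zero-shot instruction and one conversation into
# the system / user / assistant layout of the Llama 3 prompt format.
SYSTEM_PROMPT = (
    "The following is an interaction between two people. Determine whether "
    "malicious information is solicited from the conversation chain. "
    "Reply with yes or no."
)


def format_llama3_prompt(convo: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return a single-turn Llama 3-style prompt that ends with the assistant header."""
    # Note: if the tokenizer also prepends <|begin_of_text|> when encoding,
    # drop it here to avoid a duplicated BOS token.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{convo}<|eot_id|>"
        # Generation prompt: the model writes its answer after this header.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = format_llama3_prompt("A: Hi, this is IT support.\nB: Hello?")
print(prompt)
```

With this layout, the model's answer is everything after the final `<|end_header_id|>\n\n`, which makes extracting it from the pipeline output straightforward.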

In [None]:
# Define the model and tokenizer for Llama 3.1-8B:

TOKEN = ''  # Paste your Hugging Face access token here (Llama 3.1 is a gated model)
model_name = "meta-llama/Llama-3.1-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(model_name, token=TOKEN, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, token=TOKEN)

# Llama 3.1 defines no padding token, so reuse the EOS token for padding during generation
# (this also silences the "Setting `pad_token_id` to `eos_token_id`" warning):
generation_config = model.generation_config
generation_config.pad_token_id = tokenizer.eos_token_id

In [None]:
# Wrap the loaded model in a text-generation pipeline and run every conversation through it:
pipeline = transformers.pipeline("text-generation", model=model, torch_dtype=torch.float16, tokenizer=tokenizer, device='cuda')
llama_llm_results = get_llama_results(pipeline, msgs)

In [None]:
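# Keep only the text generated after the prompt, then map a standalone "yes" to True (malicious).
# Entries that do not match the expected layout yield an empty answer instead of raising AttributeError.
filtered_results = []
for entry in llama_llm_results:
    match = re.search(r'<\|end_header_id\|>.*\n\n(.*)', entry)
    answer = match.group(1).strip() if match else ''
    filtered_results.append(bool(re.search(r'\byes\b', answer, flags=re.IGNORECASE)))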
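
The extraction above maps every reply that does not contain "yes" to `False`, so an empty or off-topic reply silently counts as benign. A more defensive sketch keeps a third outcome for unparseable replies. `parse_yes_no` is a hypothetical helper; it should be applied to the generated continuation only, because the echoed prompt itself contains the words "yes or no". The same function could also be applied to the raw text collected from Gemini and GPT-4 later in the notebook.

```python
import re
from typing import Optional

# Hypothetical parser: the first standalone "yes" or "no" (any case) decides the label.
_ANSWER = re.compile(r"\b(yes|no)\b", flags=re.IGNORECASE)


def parse_yes_no(generated: str) -> Optional[bool]:
    """Map a model reply to True (yes), False (no) or None (no clear answer).

    Word boundaries stop "yesterday" or "cannot" from matching, and only the
    first standalone answer counts, so "No, yesterday's call was benign" -> False.
    """
    match = _ANSWER.search(generated)
    if match is None:
        return None
    return match.group(1).lower() == "yes"
```

Counting the `None` cases separately shows how often a model fails to follow the "Reply with yes or no" instruction, instead of folding those failures into the benign class.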

In [None]:
print(f"Llama-3.1-8B Results: {get_accuracy(filtered_results, ground_truth)}")
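
`get_accuracy` is defined in `helpers.py` and not shown here. Accuracy alone can look good for a model that answers "no" to most conversations when malicious examples are rare, so it may be worth reporting precision, recall, and F1 for the malicious class as well. The sketch below is a hypothetical pure-Python helper, assuming `ground_truth` is a list of booleans (True = malicious information solicited) aligned with the predictions.

```python
from typing import Optional, Sequence


def binary_metrics(predictions: Sequence[Optional[bool]], labels: Sequence[bool]) -> dict:
    """Accuracy, precision, recall and F1 with 'malicious' (True) as the positive class.

    Unparseable predictions (None) count as incorrect: as false negatives when
    the label is True, and as plain errors otherwise. They are also reported separately.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p is True and y)
    fp = sum(1 for p, y in pairs if p is True and not y)
    fn = sum(1 for p, y in pairs if p is not True and y)
    tn = sum(1 for p, y in pairs if p is False and not y)
    total = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "unparsed": sum(1 for p in predictions if p is None),
    }
```

For example, `binary_metrics(filtered_results, ground_truth)` would report these numbers for the Llama run, under the boolean-label assumption above.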

### Google Gemini 2.0 Flash:

We apply the same zero-shot prompt to Gemini 2.0 Flash, passing the instruction as a system instruction and the conversation as the content:

In [None]:
# Run this command if you don't have this package installed:
!pip install -q -U google-genai

In [None]:
from google import genai
from google.genai.types import GenerateContentConfig
import time

# Paste your Gemini API key here:
client = genai.Client(api_key='')

# Zero-shot instruction, sent as a system instruction. Adjacent string literals are
# concatenated, so no stray indentation whitespace ends up inside the instruction.
system_instruction = [
    "The following is an interaction between two people.",
    "Determine whether malicious information has been solicited "
    "from the conversation chain. Reply with yes or no.",
]

llm_results_gemini = []

for count, convo in enumerate(msgs, start=1):
    response = client.models.generate_content(
        model='models/gemini-2.0-flash',
        contents=f"{convo}",
        config=GenerateContentConfig(system_instruction=system_instruction),
    )
    llm_results_gemini.append(response.text)
    print(f"Prompts processed: {count}")
    print("--------")

    # Gemini 2.0 Flash accepts only 15 requests per minute (free tier);
    # waiting 5 s between requests keeps the loop under that limit.
    time.sleep(5)
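
With `time.sleep(5)` the loop sends at most 12 requests per minute (60 / 5), and each request's own latency is added on top of the 5 s wait. A hypothetical alternative is to wait only for whatever remains of a 60 / 15 = 4 s slot since the previous request. `MinIntervalLimiter` below is a sketch of that idea, not part of the `google-genai` SDK; it enforces a minimum gap between calls and does not track the server-side quota.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: keep at least `period / max_calls` seconds between calls."""

    def __init__(self, max_calls: int, period: float = 60.0):
        self.interval = period / max_calls
        self._last = None  # monotonic timestamp of the previous call, if any

    def wait(self) -> float:
        """Sleep only for the part of the interval not already spent; return the sleep time."""
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - time.monotonic())
            if delay:
                time.sleep(delay)
        self._last = time.monotonic()
        return delay


limiter = MinIntervalLimiter(max_calls=15, period=60.0)
print(limiter.interval)  # 4.0 seconds between requests
```

Inside the loop, `limiter.wait()` would replace `time.sleep(5)` and be called just before `client.models.generate_content(...)`. Because the quota is enforced by Google's servers, leaving a small margin (for example `max_calls=14`) may still be prudent.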

### OpenAI GPT-4:
 

In [None]:
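from openai import OpenAI

# Paste your OpenAI API key here:
client = OpenAI(api_key='')

llm_results_gpt = []

# Zero-shot run with the same instruction as the Gemini cell, sent as a system message:
for count, convo in enumerate(msgs, start=1):
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "The following is an interaction between two people. "
                        "Determine whether malicious information has been solicited "
                        "from the conversation chain. Reply with yes or no."},
            {"role": "user", "content": f"{convo}"},
        ],
    )
    llm_results_gpt.append(completion.choices[0].message.content)
    print(f"Prompts processed: {count}")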

### Initial Proposal: