

How to build a free LLM cybersecurity lab with Google Colab and Ollama

Learn how to set up a free cybersecurity lab using Large Language Models (LLMs), Google Colab, and Ollama. Explore practical use cases for AI in cybersecurity.

Introduction

In this notebook, we will explore the powerful combination of Large Language Models (LLMs) and cybersecurity using Ollama. Ollama allows us to download and run LLMs locally, providing several advantages for cybersecurity applications:

    Data privacy: Process sensitive information without sending it to external servers.
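    Customization: Tailor models to specific cybersecurity tasks (e.g., custom system prompts and parameters in a Modelfile) or run models you have fine-tuned elsewhere.
    Offline capability: Once a model is downloaded, perform analysis without an internet connection.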
    Cost-effective: Avoid usage fees associated with cloud-based LLM services.

We'll demonstrate two practical use-cases:

    Generating malware information cards
    Creating a cybersecurity news digest

While we're using Google Colab for this demonstration due to its accessibility, keep in mind some limitations:

    Runtime limits (sessions typically disconnect after 12 hours)
    Variability in GPU availability
    Potential network bandwidth constraints

For production use, consider running this notebook on a dedicated machine or cloud instance with more stable resources.

Let's dive in and see how we can leverage LLMs for cybersecurity tasks!

Installing Ollama

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Downloading ollama...
>>> Installing ollama to /usr/local/bin...
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


Ollama is now installed. Ollama is an open-source tool that simplifies running large language models locally.
It provides an easy way to download, run, and manage various LLMs on your machine.

If you encounter any GPU-related issues, try the following troubleshooting steps:
1. Ensure CUDA is properly installed and configured.
2. Check if the correct GPU drivers are installed.
3. Verify that Ollama has access to the GPU resources.
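One way to check steps 2 and 3 from a notebook cell is to ask nvidia-smi, the utility that ships with the NVIDIA driver, which GPUs it can see. This is a minimal sketch that assumes an NVIDIA GPU runtime (Runtime > Change runtime type in Colab); list_gpus is a hypothetical helper, not part of Ollama.

```python
import shutil
import subprocess


def list_gpus() -> list[str]:
    """Return one "name, memory" line per GPU reported by nvidia-smi, or [] if none is visible."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, no NVIDIA GPU is usable
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The binary exists but could not talk to a driver or GPU
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    for gpu in gpus:
        print("GPU:", gpu)
else:
    print("No NVIDIA GPU visible; Ollama should fall back to the CPU (much slower).")
```

An empty result on a GPU runtime usually points at a driver problem (step 2) rather than at Ollama itself.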

Now, let's start Ollama in a separate thread so we can use it throughout this notebook.


Reference: https://stackoverflow.com/a/77828874

In [None]:
import os
import asyncio
import threading
import time

# NB: Depending on your runtime and backend, you may need these settings for Ollama to find CUDA.
# Add the CUDA binaries to PATH
os.environ['PATH'] += ':/usr/local/cuda/bin'
# Set LD_LIBRARY_PATH to include both the NVIDIA driver libraries and the CUDA libraries
os.environ['LD_LIBRARY_PATH'] = '/usr/lib64-nvidia:/usr/local/cuda/lib64'

async def run_process(cmd, stdout=None, stderr=None):
    """Run cmd; echo its output if no stdout/stderr targets are given, otherwise just wait for it."""
    print('>>> starting', *cmd)
    process = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=stdout or asyncio.subprocess.PIPE,
        stderr=stderr or asyncio.subprocess.PIPE
    )

    if stdout is None and stderr is None:
        async def pipe(lines):
            async for line in lines:
                print(line.decode().strip())

        await asyncio.gather(
            pipe(process.stdout),
            pipe(process.stderr),
        )
    else:
        await process.wait()

async def start_ollama_serve():
    # Discard the server logs; `ollama serve` keeps running until the runtime stops
    await run_process(['ollama', 'serve'],
                      stdout=asyncio.subprocess.DEVNULL,
                      stderr=asyncio.subprocess.DEVNULL)

def run_async_in_thread(loop, coro):
    asyncio.set_event_loop(loop)
    loop.run_until_complete(coro)
    loop.close()

# Create a new event loop that will run in a new thread
new_loop = asyncio.new_event_loop()

# Start ollama serve in a background (daemon) thread so the cell won't block execution
thread = threading.Thread(target=run_async_in_thread, args=(new_loop, start_ollama_serve()), daemon=True)
thread.start()

# Wait 5 s for Ollama to start
time.sleep(5)

>>> starting ollama serve
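The fixed time.sleep(5) can be too short on a cold runtime and needlessly long on a warm one. A hedged alternative is to poll the server until it responds. This sketch assumes Ollama's default address 127.0.0.1:11434 (from the install log) and that its root URL returns HTTP 200 once the server is up; wait_for_ollama is a hypothetical helper.

```python
import time
import urllib.request


def wait_for_ollama(base_url: str = "http://127.0.0.1:11434", timeout: float = 30.0) -> bool:
    """Poll the Ollama server until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The root URL replies with a short status message once the server is listening
            with urllib.request.urlopen(base_url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts are all OSError subclasses
            pass
        time.sleep(0.5)
    return False


# Usage in the notebook, in place of time.sleep(5):
# assert wait_for_ollama(), "Ollama did not start"
```

Polling returns as soon as the server is ready and fails loudly, instead of silently continuing, if it never comes up.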


In [None]:
# We first need to install dependencies
!pip install openai pydantic instructor

Collecting openai
  Downloading openai-1.40.6-py3-none-any.whl.metadata (22 kB)
Collecting instructor
  Downloading instructor-1.3.7-py3-none-any.whl.metadata (14 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
  Downloading jiter-0.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting tenacity<9.0.0,>=8.4.1 (from instructor)
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.40.6-py3-none-any.whl (361 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [3

For this first example, we will use Gemma 2 (gemma2) from Google in its 9-billion-parameter version, which is the default gemma2 tag in Ollama. At the time of writing, it is one of the best small (<12B parameters) open-weight models.

In [None]:
!ollama pull gemma2


In [None]:
MODEL = "gemma2"
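Before calling the model, it can help to confirm that the pull actually finished. This sketch queries Ollama's /api/tags endpoint, which lists locally available models by name (for example gemma2:latest). local_model_names and has_model are hypothetical helpers, and the default URL assumes the server started earlier in this notebook.

```python
import json
import urllib.request


def local_model_names(base_url: str = "http://127.0.0.1:11434") -> list[str]:
    """Return the names of the models already pulled into the local Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(names: list[str], model: str) -> bool:
    """True if `model` is in `names`; a name without a tag is treated as ':latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in names


# Usage in the notebook, after the pull:
# print(has_model(local_model_names(), MODEL))
```

The ":latest" rule mirrors how Ollama resolves an untagged name, so pulling only gemma2:9b would not satisfy MODEL = "gemma2".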

In [None]:
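from datetime import datetime

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field


class Malware(BaseModel):
    # Field descriptions become part of the JSON schema that instructor passes to the model
    name: str = Field(description="Common name of the malware family")
    first_seen: datetime = Field(description="Date the malware was first publicly observed")
    language: str = Field(description="Programming language(s) it is written in")
    architecture: str = Field(description="Targeted CPU architectures")
    developer: str = Field(description="Attributed developer or threat actor, or 'Unknown'")
    description: str = Field(description="Short summary of what the malware does")


# Create the client once: instructor wraps an OpenAI client pointed at the local Ollama server
client = instructor.from_openai(
    OpenAI(
        base_url="http://127.0.0.1:11434/v1",
        api_key="ollama",  # required by the client, but unused by Ollama
    ),
    mode=instructor.Mode.JSON,
)


def get_malware_description(malware_name: str) -> Malware:
    # instructor validates the model's JSON reply and returns a Malware instance
    return client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "user",
                "content": f"Describe the malware {malware_name}.",
            }
        ],
        response_model=Malware,
    )


malwares = ["Wannacry", "Stuxnet", "Darkcomet"]
for malware in malwares:
    resp = get_malware_description(malware)
    print(resp.model_dump_json(indent=2))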

{
  "name": "WannaCry",
  "first_seen": "2017-05-12T00:00:00Z",
  "language": "C/Assembly",
  "architecture": "x86 and x64",
  "developer": "Lazarus Group (North Korean state-sponsored)",
  "description": "A highly contagious ransomware attack that exploited a vulnerability in Microsoft Windows to encrypt victims' data"
}
{
  "name": "Stuxnet",
  "first_seen": "2010-06-21T00:00:00",
  "language": "Assembly",
  "architecture": "x86",
  "developer": "Unknown (Possibly Israel and United States)",
  "description": "Highly sophisticated malware specifically designed to sabotage Iranian nuclear facilities. It targeted industrial control systems (SCADA), exploiting vulnerabilities in Siemens PLCs to disrupt the uranium enrichment process."
}
{
  "name": "Darkcomet",
  "first_seen": "2006-09-15T00:00:00Z",
  "language": "C#",
  "architecture": "32-bit, 64-bit",
  "developer": "Unknown",
  "description": "Darkcomet is a popular modular remote access trojan (RAT) that has been used for various m
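The base_url and api_key arguments above work because Ollama exposes an OpenAI-compatible API under /v1. To make that concrete, here is a minimal standard-library sketch of the same kind of chat request, assuming the /v1/chat/completions route and the OpenAI response shape (choices[0].message.content). It leaves out what instructor adds on top, namely the JSON-schema prompting and Pydantic validation; chat is a hypothetical helper.

```python
import json
import urllib.request


def chat(prompt: str, model: str = "gemma2", base_url: str = "http://127.0.0.1:11434/v1") -> str:
    """Send one user message to an OpenAI-compatible chat endpoint and return the reply text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key; OpenAI-style clients always send one
            "Authorization": "Bearer ollama",
        },
        method="POST",
    )
    # Generous timeout: the first request may wait for the model to load into memory
    with urllib.request.urlopen(request, timeout=300) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage in the notebook:
# print(chat("Describe the malware Stuxnet.", model=MODEL))
```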

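Define the ChatGPT Query Function. Create a function, call_gpt, that sends a prompt to ChatGPT (the gpt-3.5-turbo model) with a SOC-analyst system prompt and returns the reply. Unlike the Ollama example above, this calls OpenAI's hosted API: OpenAI() reads an API key from the OPENAI_API_KEY environment variable, usage is billed, and the prompt data leaves your machine.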

In [None]:
def call_gpt(prompt):
    messages = [
        {
            "role": "system",
            "content": "You are a cybersecurity SOC analyst with more than 25 years of experience."
        },
        {
            "role": "user",
            "content": prompt
        }
    ]
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=2048,
        n=1,
        stop=None,
        temperature=0.7
    )
    return response.choices[0].message.content
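If you want to keep the threat data local, as the introduction suggests, a hedged variant is to pass the client in and point it at Ollama instead of OpenAI. call_llm and make_local_client are hypothetical names; the openai import sits inside make_local_client so call_llm itself has no hard dependency, and the default model name assumes the gemma2 pull from earlier.

```python
SYSTEM_PROMPT = "You are a cybersecurity SOC analyst with more than 25 years of experience."


def make_local_client(base_url: str = "http://127.0.0.1:11434/v1"):
    """Build an OpenAI client that talks to the local Ollama server instead of api.openai.com."""
    from openai import OpenAI  # imported lazily; only needed when a real client is built

    return OpenAI(base_url=base_url, api_key="ollama")  # key is required by the client, ignored by Ollama


def call_llm(prompt: str, client, model: str = "gemma2", temperature: float = 0.7) -> str:
    """Same request as call_gpt, but with the client and model supplied by the caller."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        temperature=temperature,
    )
    return response.choices[0].message.content


# Usage in the notebook:
# local = make_local_client()
# print(call_llm("List common ransomware initial access vectors.", local, model=MODEL))
```

Injecting the client keeps the prompt logic identical across backends and makes the function easy to test with a stub.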

Create the Threat Analysis Function. Next, define analyze_threat_data, which takes a file path as an argument, reads the raw threat data, and uses call_gpt to query ChatGPT three times: to identify and categorize potential threats, to extract indicators of compromise (IoCs), and to describe the context behind the identified threats.

In [None]:
def analyze_threat_data(file_path):
    # Read the raw threat data from the provided file
    with open(file_path, 'r') as file:
        raw_data = file.read()
    # Query ChatGPT to identify and categorize potential threats
    identified_threats = call_gpt(f"Analyze the following threat data and identify potential threats: {raw_data}")
    # Extract IoCs from the threat data
    extracted_iocs = call_gpt(f"Extract all indicators of compromise (IoCs) from the following threat data: {raw_data}")
    # Obtain a detailed context or narrative behind the identified threats
    threat_context = call_gpt(f"Provide a detailed context or narrative behind the identified threats in this data: {raw_data}")
    # Print the results
    print("Identified Threats:", identified_threats)
    print("\nExtracted IoCs:", extracted_iocs)
    print("\nThreat Context:", threat_context)
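An LLM can miss indicators or invent plausible-looking ones, so a deterministic regex pass is a useful cross-check on the "Extract all indicators of compromise" answer. This is a minimal sketch with deliberately simple patterns (IPv4, MD5, SHA-256, http/https URLs); extract_iocs and IOC_PATTERNS are hypothetical names, and domains, IPv6 and defanged indicators are out of scope.

```python
import re

# Deliberately simple patterns; they miss defanged indicators (e.g. hxxp, 1.2.3[.]4), SHA-1 and IPv6
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
}


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return unique matches per IoC type, in order of first appearance."""
    found = {}
    for kind, pattern in IOC_PATTERNS.items():
        # Trim sentence punctuation that the URL pattern may swallow
        matches = (m.rstrip(".,;:)]") for m in pattern.findall(text))
        found[kind] = list(dict.fromkeys(matches))  # de-duplicate, keep order
    return found


# Usage in the notebook, next to the LLM answer:
# print(extract_iocs(open(file_path).read()))
```

Anything the regex finds but the LLM omits (or the reverse) is worth a manual look.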

Run the Script. Finally, put it all together and run the main script.
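The script expects a path to a .txt file of raw threat data. If you don't have one at hand, this sketch writes a small hypothetical sample; the log lines are invented for illustration and use the RFC 5737 documentation range 203.0.113.0/24 and the reserved .example domain, so they point at nothing real. write_sample is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical log lines for testing only: documentation IP range and reserved .example domain
SAMPLE_THREAT_DATA = """\
2024-01-15 03:12:44 firewall ALLOW outbound 10.0.0.23:49812 -> 203.0.113.45:443
2024-01-15 03:12:45 proxy GET http://update-check.example/payload.bin user-agent=curl/7.68.0
2024-01-15 03:13:02 edr process=powershell.exe cmdline="-nop -w hidden" parent=winword.exe
"""


def write_sample(path: str = "sample_threat_data.txt") -> Path:
    """Write the hypothetical log lines above to `path` and return the path."""
    target = Path(path)
    target.write_text(SAMPLE_THREAT_DATA, encoding="utf-8")
    return target


print(write_sample())
```

Enter sample_threat_data.txt at the prompt to run the analysis on it.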

---



In [None]:
if __name__ == "__main__":
    file_path = input("Enter the path to the raw threat data .txt file: ")
    analyze_threat_data(file_path)