In [17]:
%load_ext autoreload
%autoreload 2



# Exploring Security Advisories

This notebook explores the `Advisories` class and the parsed security advisory documents: their Markdown blocks, sections, and chunks, followed by a vector search over them.

In [18]:
from snyk_ai.advisories import Advisories, Advisory, _Section, _Chunk
from snyk_ai.utils.markdown import BlockType
from snyk_ai import Models

# Load all advisories
ADVISORIES = Advisories("../data/advisories")
print(f"Loaded {len(ADVISORIES)} advisories")
print(f"Files: {' '.join(ADVISORIES.filenames)}")

# LLM used for summarizing code blocks
MODEL = Models.Llama_3_2

Loaded 8 advisories
Files: advisory-001.md advisory-002.md advisory-003.md advisory-004.md advisory-005.md advisory-006.md advisory-007.md advisory-008.md


## Advisory Overview

In [19]:
# List all advisories with their titles
for adv in ADVISORIES:
    print(f"{adv.filename}: {adv.title} ({len(adv.blocks)} blocks, {len(adv.sections)} sections)")

advisory-001.md: Cross-Site Scripting (XSS) in express-validator (43 blocks, 13 sections)
advisory-002.md: SQL Injection in webapp-auth (56 blocks, 15 sections)
advisory-003.md: Dependency Confusion in secure-config (73 blocks, 15 sections)
advisory-004.md: Path Traversal in data-processor (61 blocks, 15 sections)
advisory-005.md: Remote Code Execution in file-handler (65 blocks, 15 sections)
advisory-006.md: Cross-Site Request Forgery (CSRF) in api-client (57 blocks, 15 sections)
advisory-007.md: Server-Side Request Forgery (SSRF) in http-server (67 blocks, 15 sections)
advisory-008.md: Insecure Deserialization in json-parser (61 blocks, 15 sections)


## Exploring a Single Advisory

In [20]:
adv = ADVISORIES["advisory-003.md"]

print(f"{adv.title}\n\n{adv.executive_summary}")

Dependency Confusion in secure-config

A dependency confusion vulnerability has been discovered in the `secure-config` package affecting versions 3.0.0 through 3.1.9. This vulnerability allows attackers to potentially inject malicious packages into the dependency resolution process by exploiting missing package integrity checks and scoped package naming conflicts.


In [21]:
print("Markdown blocks:")
for i, block in enumerate(adv.blocks):
    content_preview = block.content[:50].replace("\n", " ")
    if len(block.content) > 50:
        content_preview += "..."
    print(f"  {i:2}: {block.type.value:12} | {content_preview}")

Markdown blocks:
   0: header       | Security Advisory: Dependency Confusion in secure-...
   1: paragraph    | **CVE ID:** CVE-2024-1237   **Package:** secure-co...
   2: header       | Executive Summary
   3: paragraph    | A dependency confusion vulnerability has been disc...
   4: header       | Vulnerability Details
   5: header       | Description
   6: paragraph    | The `secure-config` package fails to verify packag...
   7: header       | Affected Versions
   8: table        | | Version Range | Status | Fixed Version | |------...
   9: header       | Attack Vector
  10: paragraph    | Dependency confusion attacks exploit the package r...
  11: list_item    | Publish a malicious package with the same name to ...
  12: list_item    | Use a version number higher than the private packa...
  13: list_item    | Wait for automated builds or installations to pull...
  14: header       | Vulnerable Configuration Example
  15: code_block   | {   "name": "my-application",   "version": "

## Sections

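A section is a header together with the blocks that follow it, up to the next header. The header itself is not counted among the section's blocks, and a header with no blocks of its own (such as "Vulnerability Details" in `advisory-003.md`) does not show up as a section in the output below.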
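
The grouping rule can be illustrated with a minimal sketch. This is not the `snyk_ai` implementation: `Block`, `Section`, and `group_into_sections` are hypothetical stand-ins, and skipping header-only sections is an assumption based on the output for `advisory-003.md`, where "Vulnerability Details" is not listed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical stand-in for a parsed Markdown block."""
    type: str      # e.g. "header", "paragraph", "table"
    content: str


@dataclass
class Section:
    """Hypothetical stand-in: one header plus the blocks that follow it."""
    header: Block
    blocks: List[Block] = field(default_factory=list)


def group_into_sections(blocks: List[Block]) -> List[Section]:
    """Attach each block to the most recent header; drop header-only sections."""
    sections: List[Section] = []
    current: Optional[Section] = None
    for block in blocks:
        if block.type == "header":
            # A new header closes the previous section (kept only if non-empty).
            if current is not None and current.blocks:
                sections.append(current)
            current = Section(header=block)
        elif current is not None:
            current.blocks.append(block)
        # Blocks that appear before the first header are ignored in this sketch.
    if current is not None and current.blocks:
        sections.append(current)
    return sections


demo = [
    Block("header", "Vulnerability Details"),
    Block("header", "Description"),
    Block("paragraph", "The package fails to verify package integrity."),
    Block("header", "Affected Versions"),
    Block("table", "| Version Range | Status | Fixed Version |"),
]
for sec in group_into_sections(demo):
    print(sec.header.content, [b.type for b in sec.blocks])
```

In this sketch "Vulnerability Details" is dropped because the next header follows it immediately, which matches the way the section count (15) is lower than the number of header blocks one might expect.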

In [22]:
# Section breakdown with chunks
print(f"Total sections: {len(adv.sections)}")

for i, sec in enumerate(adv.sections):
    print(f'\n---\n\n{i+1:2}. Section: "{sec.header.content}"')
    print()
    print(f"    Markdown blocks ({len(sec.blocks)}): {' '.join([b.type.value for b in sec.blocks])}")
    print()
    
    chunks = sec.get_chunks(model=MODEL)
    print(f"    Chunks ({len(chunks)}):")
    for chunk in chunks:
        # Truncate long chunk text to 70 characters for display
        text = (chunk.text[:70] + "...") if len(chunk.text) > 70 else chunk.text
        print(f"      [{chunk.source_type.value}] {text}")


Total sections: 15

---

 1. Section: "Security Advisory: Dependency Confusion in secure-config"

    Markdown blocks (1): paragraph

    Chunks (1):
      [paragraph] **CVE ID:** CVE-2024-1237   **Package:** secure-config   **Ecosystem:*...

---

 2. Section: "Executive Summary"

    Markdown blocks (1): paragraph

    Chunks (2):
      [paragraph] A dependency confusion vulnerability has been discovered in the `secur...
      [paragraph] This vulnerability allows attackers to potentially inject malicious pa...

---

 3. Section: "Description"

    Markdown blocks (1): paragraph

    Chunks (3):
      [paragraph] The `secure-config` package fails to verify package integrity and does...
      [paragraph] Additionally, the package uses an unscoped name that could conflict wi...
      [paragraph] This creates a dependency confusion attack vector where attackers coul...

---

 4. Section: "Affected Versions"

    Markdown blocks (1): table

    Chunks (3):
      [table] version_range: ">=
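
The output above suggests that paragraphs are split into roughly one chunk per sentence and that each table row becomes a `key: "value"` chunk. How `get_chunks` actually does this is not shown in this notebook, so the following is only a sketch under that assumption. `split_sentences` and `table_rows_to_chunks` are hypothetical helpers, and the version numbers in `demo_table` are illustrative values, not data from the advisory.

```python
import re
from typing import List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def table_rows_to_chunks(table: str) -> List[str]:
    """Turn each data row of a Markdown pipe table into 'key: "value"' text."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]

    def cells(line: str) -> List[str]:
        return [c.strip() for c in line.strip("|").split("|")]

    # Header cells become snake_case keys, e.g. "Version Range" -> "version_range".
    keys = [h.lower().replace(" ", "_") for h in cells(lines[0])]
    chunks = []
    for line in lines[2:]:  # skip the header row and the |---| separator row
        pairs = zip(keys, cells(line))
        chunks.append(", ".join(f'{k}: "{v}"' for k, v in pairs))
    return chunks


# Illustrative table; the values are made up for this example.
demo_table = """
| Version Range | Status | Fixed Version |
|---|---|---|
| >=1.0.0 <1.2.0 | Vulnerable | 1.2.0 |
"""

print(split_sentences(
    "The package fails to verify integrity. Versions 3.0.0 through 3.1.9 are affected."
))
print(table_rows_to_chunks(demo_table))
```

The splitter only breaks on punctuation followed by whitespace, so the dots inside version numbers such as `3.1.9` do not end a sentence.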

In [23]:
# Build the vector database that ADVISORIES.search() queries
ADVISORIES.init_vectordb(MODEL)



In [24]:
# Free-text query; results come back ordered by ascending distance d
ADVISORIES.search(
    "Explain how path traversal attacks work and show me a vulnerable code example."
)

[SearchResult(advisory-004.md:Vulnerable Code Example, d=0.360),
 SearchResult(advisory-004.md:Description, d=0.372),
 SearchResult(advisory-004.md:Attack Vector, d=0.415),
 SearchResult(advisory-004.md:Executive Summary, d=0.419),
 SearchResult(advisory-008.md:Executive Summary, d=0.469)]
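
The `d` values above are listed in ascending order, so a smaller `d` presumably means a closer match. The distance metric and embedding model behind `init_vectordb` are not shown in this notebook. As a rough illustration, here is a sketch of nearest-neighbour ranking that assumes cosine distance and uses toy 3-dimensional vectors. `cosine_distance`, `rank`, and `toy_index` are hypothetical names, and the vectors are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query: List[float], index: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k entries closest to the query, smallest distance first."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda kv: kv[1])[:k]


# Toy embeddings, invented for illustration only.
toy_index = {
    "advisory-004.md:Description": [0.9, 0.1, 0.0],
    "advisory-004.md:Attack Vector": [0.7, 0.3, 0.1],
    "advisory-008.md:Executive Summary": [0.1, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

for key, d in rank(query_vec, toy_index, k=2):
    print(f"{key}, d={d:.3f}")
```

A real vector database would replace the linear scan in `rank` with an index structure, but the ordering principle is the same.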