# Vulnerability Assessment (DSPy + Gemini) - Google Colab

This notebook clones the LLM-Assisted-Container-Security-Analysis repository from GitHub, installs dependencies, and runs a DSPy/Gemini-based vulnerability assessment on combined scanner reports (Trivy + Grype).

## Before Running
1. You'll need a **GEMINI_API_KEY** (Google AI Studio API key)
2. Run all cells in order from top to bottom
3. When prompted, enter your API key in the secure input field

## Step 1: Clone the Repository

In [None]:
import os

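# Clone the repository into /content if not already present
repo_dir = '/content/LLM-Assisted-Container-Security-Analysis'
if not os.path.exists(repo_dir):
    !git clone https://github.com/satyam-thakur/LLM-Assisted-Container-Security-Analysis.git {repo_dir}
    # Report success only if the clone actually created the directory
    print('✓ Repository cloned successfully' if os.path.isdir(repo_dir) else '✗ Clone failed; check the output above')
else:
    print('✓ Repository already exists')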

# Change to the repository directory
os.chdir('/content/LLM-Assisted-Container-Security-Analysis')
print(f'Current directory: {os.getcwd()}')

## Step 2: Install Dependencies

In [None]:
# Install required packages from vul_analysis requirements
!pip install -q -r vul_analysis/requirements.txt
print('✓ Dependencies installed successfully')

## Step 3: Configure API Key

Enter your Google Gemini API key. You can create one in [Google AI Studio](https://aistudio.google.com/app/apikey).
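If you would rather not paste the key every session, one option is to store it in Colab's Secrets panel and read it with `google.colab.userdata`. The sketch below is not part of the repository; `load_api_key` is a hypothetical helper name. It checks the environment first, then Colab Secrets, and returns `None` if neither has the key.

```python
import os


def load_api_key(name='GEMINI_API_KEY'):
    """Return the key from the environment, else from Colab Secrets, else None.

    Hypothetical helper, not provided by the repository.
    """
    value = os.environ.get(name)
    if value:
        return value
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return None
    try:
        return userdata.get(name)
    except Exception:
        # Secret is missing, or the notebook has not been granted access to it
        return None
```

If `load_api_key()` returns a value, assigning it to `os.environ['GEMINI_API_KEY']` before running the next cell skips the interactive prompt.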

In [None]:
import os
from getpass import getpass

# Prompt for API key securely
if 'GEMINI_API_KEY' not in os.environ:
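    api_key = getpass('Enter your GEMINI_API_KEY: ').strip()
    # Fail early instead of storing an empty key that breaks later API calls
    if not api_key:
        raise ValueError('GEMINI_API_KEY cannot be empty')
    os.environ['GEMINI_API_KEY'] = api_key
    print('✓ API key configured')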
else:
    print('✓ API key already configured')

# Set default model if not specified
if 'GEMINI_MODEL' not in os.environ:
    os.environ['GEMINI_MODEL'] = 'gemini-1.5-flash'

# Optional: Set temperature
if 'LM_TEMPERATURE' not in os.environ:
    os.environ['LM_TEMPERATURE'] = '0.2'

## Step 4: Verify Files and Setup

In [None]:
import sys
import os

# Add the repository root to Python path
repo_root = '/content/LLM-Assisted-Container-Security-Analysis'
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

# Verify key files exist
scanner_file = os.path.join(repo_root, 'Scanner/combined_results/hyperledger_fabric-peer_1.1.0_combined.json')
vul_analysis_dir = os.path.join(repo_root, 'vul_analysis')

print('Checking files:')
print(f'  Scanner file exists: {os.path.exists(scanner_file)}')
print(f'  vul_analysis directory exists: {os.path.exists(vul_analysis_dir)}')
print(f'  Scanner file path: {scanner_file}')

# Create outputs directory if it doesn't exist
outputs_dir = os.path.join(vul_analysis_dir, 'outputs')
os.makedirs(outputs_dir, exist_ok=True)
print(f'  Outputs directory: {outputs_dir}')

## Step 5: Run Vulnerability Assessment

This cell loads the scanner results, runs the AI-powered assessment, and displays a summary.

**Note:** This may take several minutes depending on the number of vulnerabilities.
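For a rough idea of how long a run might take, the reference figure from the Notes section (about 650 vulnerabilities in 10-30 minutes) can be scaled to your own input. This is a back-of-the-envelope sketch that assumes runtime grows linearly with the number of findings, which rate limits may not honor; `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(n_vulns, ref_count=650, ref_minutes=(10, 30)):
    """Scale the reference timing linearly to n_vulns findings.

    Assumes linear scaling; real runs depend on API rate limits and latency.
    """
    low, high = ref_minutes
    return (low * n_vulns / ref_count, high * n_vulns / ref_count)


# Example: a report with 130 findings
low, high = estimate_minutes(130)
print(f'Estimated runtime: ~{low:.0f}-{high:.0f} minutes')
```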

In [None]:
import json
from vul_analysis.run_assessment import assess

# Define input file path
INPUT_FILE = '/content/LLM-Assisted-Container-Security-Analysis/Scanner/combined_results/hyperledger_fabric-peer_1.1.0_combined.json'

print('Starting vulnerability assessment...')
print(f'Input file: {INPUT_FILE}')
print('This may take several minutes...\n')

# Run the assessment
result = assess(INPUT_FILE, output_dir='/content/LLM-Assisted-Container-Security-Analysis/vul_analysis/outputs')

print('\n✓ Assessment complete!')
print(f'JSON report: {result["json_path"]}')
print(f'Markdown report: {result["md_path"]}')

## Step 6: Display Results Summary

In [None]:
# Display summary DataFrame
print('\n=== Vulnerability Assessment Summary ===')
summary_df = result['summary_df']
print(f'Total vulnerabilities assessed: {len(summary_df)}')
# Cast to bool so that ~ negates logically even if the column has object dtype
affected = summary_df['affected'].astype(bool)
print(f'\nAffected vulnerabilities: {affected.sum()}')
print(f'Not affected: {(~affected).sum()}')

# Show label distribution
print('\n=== Label Distribution ===')
print(summary_df['label'].value_counts())

# Show risk distribution
print('\n=== Risk Distribution ===')
print(summary_df['risk'].value_counts())

# Display first few results
print('\n=== First 10 Results ===')
display(summary_df[['vuln_id', 'package_name', 'affected', 'label', 'risk']].head(10))

## Step 7: View Detailed Results (Optional)

In [None]:
# Display full dataframe with reasons and remediation
print('=== Detailed Results ===')
display(summary_df[['vuln_id', 'package_name', 'affected', 'label', 'risk', 'reason', 'remediation']])

## Step 8: View JSON Output (Optional)

In [None]:
# Load and display JSON output
with open(result['json_path'], 'r', encoding='utf-8') as f:
    json_output = json.load(f)

print('=== JSON Output Structure ===')
print(f"Keys: {list(json_output.keys())}")
print("\nInput metadata:")
print(f"  Image: {json_output['input']['image']}")
print(f"  Scanners: {json_output['input']['scanners_used']}")
print(f"  Total vulnerabilities: {json_output['input']['total_vulnerabilities']}")
print(f"\nAssessed: {len(json_output['output'])} vulnerabilities")

# Show the first result as an example, if any were assessed
print('\n=== Example Result ===')
if json_output['output']:
    print(json.dumps(json_output['output'][0], indent=2))
else:
    print('(no results)')

## Step 9: Preview Markdown Report (Optional)

In [None]:
# Display first 50 lines of markdown report
with open(result['md_path'], 'r', encoding='utf-8') as f:
    md_content = f.read()

lines = md_content.split('\n')
preview_lines = min(50, len(lines))
print(f'=== Markdown Report Preview (first {preview_lines} lines) ===')
print('\n'.join(lines[:preview_lines]))
if len(lines) > preview_lines:
    print(f'\n... ({len(lines) - preview_lines} more lines)')

## Step 10: Download Results

Run this cell to download the JSON and Markdown reports to your local machine.

In [None]:
from google.colab import files

# Download JSON report
print('Downloading JSON report...')
files.download(result['json_path'])

# Download Markdown report
print('Downloading Markdown report...')
files.download(result['md_path'])

print('✓ Downloads initiated')

## Notes

- **DSPy Integration**: If DSPy cannot be configured with Gemini in the Colab environment, the pipeline automatically falls back to direct Gemini API calls.
- **Output Location**: All outputs are saved to `/content/LLM-Assisted-Container-Security-Analysis/vul_analysis/outputs/`.
- **Performance**: Processing ~650 vulnerabilities may take 10-30 minutes depending on API rate limits and response times.
- **Labels Used**: `vulnerable`, `code_not_present`, `code_not_reachable`, `mitigated`, `fixed`, `false_positive`
- **Custom Input**: To analyze a different scanner file, modify the `INPUT_FILE` path in Step 5.
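
When pointing `INPUT_FILE` at a different scanner file, it can help to confirm that the file exists and parses as JSON before starting a long run. The following is a minimal sketch; `check_input_file` is a hypothetical helper and does not validate the combined Trivy + Grype schema that the pipeline expects.

```python
import json
import os


def check_input_file(path):
    """Return the parsed JSON if path is a readable JSON file.

    Raises FileNotFoundError if the file is missing and
    json.JSONDecodeError if its contents are not valid JSON.
    Hypothetical helper; it does not check the scanner schema.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f'Scanner file not found: {path}')
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)
```

Calling `check_input_file(INPUT_FILE)` right after editing the path in Step 5 surfaces a typo in seconds rather than partway through the assessment.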

## Troubleshooting

- **API Key Issues**: Make sure your `GEMINI_API_KEY` is valid and has sufficient quota.
- **Import Errors**: Re-run Step 2 to ensure all dependencies are installed.
- **Rate Limits**: If you hit rate limits, the script may fail partway through. Consider adding rate limiting or retry logic.
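
One way to add the retry logic mentioned above is a small wrapper that retries a call with exponential backoff and jitter. This is a generic sketch, not code from the repository: `call_with_retry` and its parameters are hypothetical, and it catches every `Exception` for simplicity, so you may want to narrow that to the rate-limit error your client actually raises.

```python
import random
import time


def call_with_retry(fn, *args, max_attempts=5, base_delay=2.0,
                    sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on failure with exponential backoff.

    Hypothetical helper. Waits base_delay * 2**(attempt - 1) seconds plus up
    to one second of random jitter between attempts, and re-raises the last
    error once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)
```

Wrapping the whole Step 5 call, for example `call_with_retry(assess, INPUT_FILE, output_dir=...)`, restarts the full assessment on each retry. Applying the wrapper around the individual model calls inside the pipeline would likely waste less work, assuming the code is structured to allow it.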

## Repository

GitHub: [satyam-thakur/LLM-Assisted-Container-Security-Analysis](https://github.com/satyam-thakur/LLM-Assisted-Container-Security-Analysis)