# Step 1: Extract STIG Findings from SCAP XML

This notebook parses a SCAP XML results file with `EnhancedSTIGParser` and exports the extracted STIG findings as JSON for the next step in the pipeline.

**Input:** SCAP XML file (e.g., `xml_files/sample_data/node2.example.com-STIG-20250710162433.xml`)

**Output:** 
- Enhanced findings JSON file
- Ansible targets JSON file
- Processing summary

In [2]:
# Import required libraries
import sys
import os
import json
from pathlib import Path
from datetime import datetime

# Add src to path - use absolute path resolution
notebook_dir = Path(__file__).parent if '__file__' in globals() else Path.cwd()
src_dir = notebook_dir.parent / 'src'
sys.path.insert(0, str(src_dir))

# Import our enhanced STIG parser
from stig_parser_complete_info_extraction import EnhancedSTIGParser  # type: ignore

print("📦 Libraries imported successfully")
print(f"🐍 Python version: {sys.version.split()[0]}")
print(f"📁 Current working directory: {os.getcwd()}")
print(f"📂 Source directory: {src_dir}")

📦 Libraries imported successfully
🐍 Python version: 3.11.12
📁 Current working directory: /Users/wjackson/Developer/AI-Building-Blocks/ansible_playbook_from_stig/notebooks
📂 Source directory: /Users/wjackson/Developer/AI-Building-Blocks/ansible_playbook_from_stig/src


In [4]:
# Configuration - Update these paths as needed
SCAP_XML_FILE = "../../xml_files/sample_data/node2.example.com-STIG-20250710162433.xml"
OUTPUT_DIR = "../findings"

# Create timestamp for this run
RUN_TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
print(f"🕐 Run timestamp: {RUN_TIMESTAMP}")

# Verify input file exists
scap_file_path = Path(SCAP_XML_FILE)
if not scap_file_path.exists():
    print(f"❌ SCAP XML file not found: {scap_file_path}")
    print("Please update SCAP_XML_FILE path in the cell above")
else:
    print(f"✅ SCAP XML file found: {scap_file_path}")
    print(f"📏 File size: {scap_file_path.stat().st_size / (1024*1024):.1f} MB")

🕐 Run timestamp: 20250714_110147
✅ SCAP XML file found: ../../xml_files/sample_data/node2.example.com-STIG-20250710162433.xml
📏 File size: 31.5 MB


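The check above only prints a message when the input file is missing, so later cells still run and fail further from the cause. A minimal sketch of a fail-fast alternative, assuming a hypothetical helper named `require_input_file` (not part of the project):

```python
from pathlib import Path


def require_input_file(path: str) -> Path:
    """Return the resolved path, or raise if it is not an existing file."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        # Raising stops the notebook run at the configuration cell.
        raise FileNotFoundError(f"SCAP XML file not found: {resolved}")
    return resolved


# Possible use in the configuration cell, where SCAP_XML_FILE is defined:
# scap_file_path = require_input_file(SCAP_XML_FILE)
```

Raising here keeps the error next to the setting that needs to change.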
In [5]:
# Initialize the enhanced STIG parser
print("🚀 Initializing Enhanced STIG Parser...")
parser = EnhancedSTIGParser()

# Parse the SCAP XML file
print(f"🔍 Parsing SCAP XML file: {SCAP_XML_FILE}")
findings = parser.parse_stig_file(SCAP_XML_FILE)

print(f"\n📊 Parsing Results:")
print(f"   Total findings extracted: {len(findings)}")

if findings:
    # Get summary statistics
    summary = parser.get_findings_summary()
    print(f"\n📈 Summary Statistics:")
    print(f"   Failed findings: {summary['failed_count']}")
    print(f"   Actionable findings: {summary['actionable_count']}")
    print(f"   Critical severity: {summary['critical_count']}")
    print(f"   High severity: {summary['high_count']}")

    print(f"\n🔍 By Severity: {summary['by_severity']}")
    print(f"🎯 By Target Type: {summary['by_target_type']}")
    print(f"📋 By Status: {summary['by_status']}")
else:
    print("❌ No findings extracted. Check the XML file format.")

🚀 Initializing Enhanced STIG Parser...
🔍 Parsing SCAP XML file: ../../xml_files/sample_data/node2.example.com-STIG-20250710162433.xml
🔍 Parsing enhanced STIG file: ../../xml_files/sample_data/node2.example.com-STIG-20250710162433.xml
📋 Detected namespaces: ['xccdf', 'arf', 'ds', 'oval', 'cpe']
📄 Document metadata: ARF
🔄 Phase 1: Extracting rule definitions...
🔍 Processed 1529 rule elements, extracted 1529 definitions
📚 Found 1529 rule definitions
🔄 Phase 2: Extracting test results and merging...
🎯 Processing TestResult: xccdf_org.open-scap_testresult_xccdf_org.ssgproject.content_profile_stig
🔍 Processed 1 TestResult elements
✅ Created 1529 enhanced findings

📊 Parsing Results:
   Total findings extracted: 1529

📈 Summary Statistics:
   Failed findings: 1529
   Actionable findings: 435
   Critical severity: 0
   High severity: 69

🔍 By Severity: {'medium': 1221, 'high': 69, 'low': 119, 'unknown': 120}
🎯 By Target Type: {'unknown': 1094, 'pa

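The parser log reports the namespaces it detected in the document. To inspect a file's namespace declarations independently of the parser, the standard library can list them. This is a minimal sketch; `detect_namespaces` is a hypothetical helper and is not how `EnhancedSTIGParser` is known to work.

```python
import io
import xml.etree.ElementTree as ET


def detect_namespaces(source) -> dict[str, str]:
    """Map each declared namespace prefix to its URI (first declaration wins)."""
    namespaces: dict[str, str] = {}
    # "start-ns" events carry a (prefix, uri) tuple for every xmlns declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces.setdefault(prefix or "(default)", uri)
    return namespaces


# Tiny in-memory document standing in for a real ARF file.
sample = io.BytesIO(
    b"<arf:asset-report-collection "
    b'xmlns:arf="http://scap.nist.gov/schema/asset-reporting-format/1.1" '
    b'xmlns:xccdf="http://checklists.nist.gov/xccdf/1.2">'
    b"<xccdf:Benchmark/></arf:asset-report-collection>"
)
print(sorted(detect_namespaces(sample)))  # ['arf', 'xccdf']
```

`source` can also be a file path such as `SCAP_XML_FILE`; the whole file is read, because declarations may appear on any element.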
In [6]:
# Show sample findings for inspection
if findings:
    print("🔍 Sample Findings (first 3):")
    for i, finding in enumerate(findings[:3]):
        print(f"\n📋 Finding {i+1}:")
        print(f"   Rule ID: {finding.rule_id}")
        print(f"   Severity: {finding.severity}")
        print(f"   Status: {finding.status}")
        print(f"   Title: {finding.title[:60]}...")

        if finding.target_info:
            print(f"   Target Type: {finding.target_info.target_type}")
            print(f"   Target Name: {finding.target_info.target_name}")
            print(f"   Ansible Module: {finding.target_info.ansible_module}")
        else:
            print(f"   Target Info: None (manual review required)")

        print(f"   Compliance: CCI={len(finding.compliance.cci_refs)}, NIST={len(finding.compliance.nist_refs)}")

🔍 Sample Findings (first 3):

📋 Finding 1:
   Rule ID: xccdf_org.ssgproject.content_rule_prefer_64bit_os
   Severity: medium
   Status: unknown
   Title: Prefer to use a 64-bit Operating System when supported...
   Target Type: unknown
   Target Name: prefer_64bit_os
   Ansible Module: debug
   Compliance: CCI=0, NIST=1

📋 Finding 2:
   Rule ID: xccdf_org.ssgproject.content_rule_package_prelink_removed
   Severity: medium
   Status: unknown
   Title: Package "prelink" Must not be Installed...
   Target Type: package
   Target Name: prelink
   Ansible Module: yum
   Compliance: CCI=0, NIST=1

📋 Finding 3:
   Rule ID: xccdf_org.ssgproject.content_rule_disable_prelink
   Severity: medium
   Status: unknown
   Title: Disable Prelinking...
   Target Type: unknown
   Target Name: disable_prelink
   Ansible Module: debug
   Compliance: CCI=0, NIST=1
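
The three samples suggest that the target type and name follow from the rule ID: `package_prelink_removed` maps to a `package` target named `prelink`, while IDs without a recognizable pattern fall back to `unknown` with the `debug` module. The sketch below reproduces that mapping for these samples only. The regular expression and the `classify_rule` helper are assumptions made for illustration, not the parser's actual logic.

```python
import re

RULE_PREFIX = "xccdf_org.ssgproject.content_rule_"
# Hypothetical pattern: only "package_<name>_installed|removed" is deterministic.
PACKAGE_RULE = re.compile(r"^package_(?P<name>.+)_(?:installed|removed)$")


def classify_rule(rule_id: str) -> dict[str, str]:
    """Guess a target type, target name, and Ansible module from a rule ID."""
    short_id = rule_id.removeprefix(RULE_PREFIX)
    match = PACKAGE_RULE.match(short_id)
    if match:
        return {
            "target_type": "package",
            "target_name": match["name"],
            "ansible_module": "yum",
        }
    # No recognizable pattern: placeholder target that needs review.
    return {
        "target_type": "unknown",
        "target_name": short_id,
        "ansible_module": "debug",
    }


print(classify_rule("xccdf_org.ssgproject.content_rule_package_prelink_removed"))
```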


In [8]:
# Export findings to JSON files
if findings:
    # Create output directory
    output_dir = Path(OUTPUT_DIR)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Generate output filenames with timestamp
    base_name = scap_file_path.stem
    findings_file = output_dir / f"{base_name}_{RUN_TIMESTAMP}_enhanced_findings.json"
    targets_file = output_dir / f"{base_name}_{RUN_TIMESTAMP}_ansible_targets.json"
    
    # Export enhanced findings
    print(f"💾 Exporting enhanced findings to: {findings_file}")
    parser.export_findings_json(str(findings_file))

    # Export ansible targets
    print(f"💾 Exporting ansible targets to: {targets_file}")
    parser.export_ansible_targets(str(targets_file))

    print(f"\n✅ Export completed successfully!")
    print(f"📁 Output files:")
    print(f"   Enhanced findings: {findings_file}")
    print(f"   Ansible targets: {targets_file}")

    # Store variables for next notebook
    ENHANCED_FINDINGS_FILE = str(findings_file)
    ANSIBLE_TARGETS_FILE = str(targets_file)

    print(f"\n🔄 Variables for next notebook:")
    print(f"   ENHANCED_FINDINGS_FILE = '{ENHANCED_FINDINGS_FILE}'")
    print(f"   ANSIBLE_TARGETS_FILE = '{ANSIBLE_TARGETS_FILE}'")
    print(f"   RUN_TIMESTAMP = '{RUN_TIMESTAMP}'")
else:
    print("❌ No findings to export")

💾 Exporting enhanced findings to: ../findings/node2.example.com-STIG-20250710162433_20250714_110147_enhanced_findings.json
💾 Exported 1529 enhanced findings to ../findings/node2.example.com-STIG-20250710162433_20250714_110147_enhanced_findings.json
💾 Exporting ansible targets to: ../findings/node2.example.com-STIG-20250710162433_20250714_110147_ansible_targets.json
🎯 Exported 435 Ansible targets to ../findings/node2.example.com-STIG-20250710162433_20250714_110147_ansible_targets.json

✅ Export completed successfully!
📁 Output files:
   Enhanced findings: ../findings/node2.example.com-STIG-20250710162433_20250714_110147_enhanced_findings.json
   Ansible targets: ../findings/node2.example.com-STIG-20250710162433_20250714_110147_ansible_targets.json

🔄 Variables for next notebook:
   ENHANCED_FINDINGS_FILE = '../findings/node2.example.com-STIG-20250710162433_20250714_110147_enhanced_findings.json'
   ANSIBLE_TARGETS_FILE = '../findings/node2.example.com-STIG-2025071016
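
Variables set in this notebook are not visible to a separate notebook, so the printed paths have to be copied by hand. One alternative is for the next notebook to look up the newest export itself. This is a minimal sketch; `latest_export` is a hypothetical helper that relies on the filename scheme used above.

```python
from pathlib import Path


def latest_export(output_dir: str, suffix: str) -> Path:
    """Return the newest file in output_dir whose name ends with suffix."""
    candidates = sorted(Path(output_dir).glob(f"*{suffix}"))
    if not candidates:
        raise FileNotFoundError(f"No '*{suffix}' files in {output_dir}")
    # Names embed a YYYYMMDD_HHMMSS run timestamp, so for a single input file
    # the lexicographically last name is also the most recent run.
    return candidates[-1]


# Possible use in the next notebook:
# ENHANCED_FINDINGS_FILE = str(latest_export("../findings", "_enhanced_findings.json"))
# ANSIBLE_TARGETS_FILE = str(latest_export("../findings", "_ansible_targets.json"))
```

Sorting by name is only reliable while all exports come from the same input file. With several input files, filtering on the base name first would be safer.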

In [9]:
# Final summary and next steps
if findings:
    summary = parser.get_findings_summary()
    actionable_count = summary['actionable_count']
    total_count = summary['total_findings']
    failed_count = summary['failed_count']
    
    print("🎯 EXTRACTION SUMMARY")
    print("=" * 50)
    print(f"Total findings extracted: {total_count}")
    print(f"Actionable with targets: {actionable_count}")
    print(f"Failed findings: {failed_count}")
    print(f"Manual review needed: {total_count - actionable_count}")

    if actionable_count > 0:
        print(f"\n✅ Ready for Step 2: Process {actionable_count} actionable findings")
        print(f"📝 Use the variables above in the next notebook (02_process_deterministic.ipynb)")
    else:
        print(f"\n⚠️  No actionable findings found")
        print(f"📝 All {total_count} findings require manual review")

    # Failed and actionable findings overlap, so the buckets below are split
    # only on whether a deterministic target was found. This keeps them
    # disjoint and guarantees they add up to the total.
    needs_review_count = total_count - actionable_count
    print(f"\n📊 Processing Strategy:")
    print(f"   Deterministic targets: {actionable_count} findings")
    print(f"   LLM classification or manual review: {needs_review_count} findings")
else:
    print("❌ Extraction failed - check the SCAP XML file format")

🎯 EXTRACTION SUMMARY
==================================================
Total findings extracted: 1529
Actionable with targets: 435
Failed findings: 1529
Manual review needed: 1094

✅ Ready for Step 2: Process 435 actionable findings
📝 Use the variables above in the next notebook (02_process_deterministic.ipynb)

📊 Processing Strategy:
   Deterministic targets: 435 findings
   LLM classification or manual review: 1094 findings
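
Bucket counts only add up to the total when every finding lands in exactly one bucket. "Failed" and "actionable" are independent flags, so one finding can carry both, and subtracting both counts from the total can go negative. A minimal sketch of a disjoint split follows; `FindingStub` and its field names are hypothetical stand-ins for the parser's finding objects.

```python
from dataclasses import dataclass


@dataclass
class FindingStub:
    """Hypothetical stand-in for a parsed finding; the field names are assumptions."""
    rule_id: str
    failed: bool
    actionable: bool


def processing_buckets(findings: list[FindingStub]) -> dict[str, int]:
    """Split findings on one criterion so each lands in exactly one bucket."""
    deterministic = sum(1 for f in findings if f.actionable)
    return {
        "deterministic": deterministic,
        "needs_review": len(findings) - deterministic,
    }


stubs = [
    FindingStub("rule_a", failed=True, actionable=True),
    FindingStub("rule_b", failed=True, actionable=False),
    FindingStub("rule_c", failed=True, actionable=False),
]
buckets = processing_buckets(stubs)
print(buckets)  # {'deterministic': 1, 'needs_review': 2}
```

Applied to the totals reported above (1529 findings, 435 actionable), the same split leaves 1529 - 435 = 1094 findings for LLM classification or manual review.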
