# F2. Vulnerabilities Identification

Once ``cpe_whitelist.csv`` has been compiled, the NVD CVE API can be called to ingest the CVEs associated with the user-defined CPEs.

## Information returned

Common Vulnerability Scoring System (CVSS) v3 is the preferred version for returned metrics. CVSSv2 is intended as a fallback, used only when no CVSSv3 information is available. Note that the code in this section currently reads only the CVSS v3.1 and v3.0 metrics.

Types of information returned:
* CVE ID
* CVE Title
* Attack Complexity
* Attack Vector
* Availability Impact
* Base Score
* Base Severity
* Confidentiality Impact
* Exploitability Score
* Impact Score
* Integrity Impact
* Privileges Required
* Scope
* User Interaction
* Source
* Description
* Last Modified Date
* Vulnerability Published Date
* Type (Vulnerability Reference Lists)
* URL (Vulnerability Reference Lists)
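
The version preference described above could be sketched as follows. This is a minimal illustration, not the notebook's implementation: ``pick_cvss`` is a hypothetical helper, the sample data is invented, and the key names (``cvssMetricV31``, ``cvssMetricV30``, ``cvssMetricV2``, ``cvssData``, ``exploitabilityScore``, ``impactScore``) are assumed to follow the NVD CVE API 2.0 response layout.

```python
def pick_cvss(metrics: dict) -> dict:
    """Return a flat dict of CVSS fields, preferring v3.1, then v3.0, then v2."""
    for key in ("cvssMetricV31", "cvssMetricV30", "cvssMetricV2"):
        entries = metrics.get(key) or []
        if not entries:
            continue
        entry = entries[0]                  # first scoring source listed
        data = entry.get("cvssData", {})
        return {
            "cvssVersion": data.get("version"),
            "baseScore": data.get("baseScore"),
            # Assumption: v3 keeps baseSeverity inside cvssData,
            # while v2 keeps it on the metric entry itself.
            "baseSeverity": data.get("baseSeverity") or entry.get("baseSeverity"),
            # Assumption: both scores sit next to cvssData, not inside it.
            "exploitabilityScore": entry.get("exploitabilityScore"),
            "impactScore": entry.get("impactScore"),
        }
    return {}                               # no CVSS data of any version


# Invented sample data shaped like the "metrics" object of one CVE record.
sample_v3 = {
    "cvssMetricV31": [{
        "cvssData": {"version": "3.1", "baseScore": 7.8, "baseSeverity": "HIGH"},
        "exploitabilityScore": 1.8,
        "impactScore": 5.9,
    }],
    "cvssMetricV2": [{
        "cvssData": {"version": "2.0", "baseScore": 6.8},
        "baseSeverity": "MEDIUM",
    }],
}
sample_v2_only = {"cvssMetricV2": sample_v3["cvssMetricV2"]}

print(pick_cvss(sample_v3))
print(pick_cvss(sample_v2_only))
```

Because v2 uses different field names for some metrics (for example, access vector rather than attack vector), a v2 fallback would leave several of the v3-specific columns empty unless they are mapped explicitly.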

## Intended Purpose of Code

The code below was generated by AI as a simple-to-use tool for interacting with the NVD CVE API.

Key features:
* Loads ``cpe_whitelist.csv``, generated in __F1. Asset Inventory__
    * The whitelist is either loaded into a DataFrame or saved to a new file to which ingested CVE data is appended
        * In either case, ingested CVE data must be joined to its corresponding CPE
            * As a result, duplicate CVEs are expected and are necessary for risk-scoring granularity
* Ingests CVEs for the CPEs recorded in ``cpe_whitelist.csv``
    * Columns for CVE data should reflect the types of information listed in the __Information returned__ section above
* Joins CVE data to the corresponding CPEs
    * A surrogate key is generated for the joined dataset
    * A new row is created for each CVE that corresponds to a CPE, so every row pairs one CVE with one (1) CPE
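
The join and surrogate key described in the list above could look roughly like this. It is a sketch under assumptions rather than the notebook's actual code: the ``asset`` column, the sample CPE and CVE values, and the ``rowKey`` name for the surrogate key are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the whitelist and the ingested CVE rows.
cpe_a = "cpe:2.3:a:vendor_a:product_a:1.0:*:*:*:*:*:*:*"
cpe_b = "cpe:2.3:a:vendor_b:product_b:2.0:*:*:*:*:*:*:*"

whitelist_df = pd.DataFrame({
    "cpeName": [cpe_a, cpe_b],
    "asset":   ["laptop-01", "server-01"],
})
cve_df = pd.DataFrame({
    "cveID":   ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0001"],
    "cpeName": [cpe_a, cpe_a, cpe_b],
})

# One output row per CVE-CPE pair; a CVE that affects two CPEs appears twice.
joined = whitelist_df.merge(cve_df, on="cpeName", how="left")

# Surrogate key: a simple running integer, independent of the business columns.
joined.insert(0, "rowKey", range(1, len(joined) + 1))

print(joined[["rowKey", "asset", "cveID"]])
```

A left join keeps whitelist entries that have no CVEs at all (their ``cveID`` would be empty), which may or may not be wanted for risk scoring.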

## Known Issues



In [30]:
# ingest_all_cves.ipynb  –  pull every CVE for every CPE in the whitelist
import os, time, requests, pandas as pd
from pathlib import Path

# ── config ─────────────────────────────────────────────────────────────
api_url      = "https://services.nvd.nist.gov/rest/json/cves/2.0"
api_key      = os.getenv("NVD_api_key")   # read the key from the environment; never hard-code it in the notebook
whitelist    = Path("../data/cpe_whitelist.csv")
rate_secs    = 1.0
per_page     = 2000
progress_every = 25                # how often to print a status line
# ───────────────────────────────────────────────────────────────────────

def fetch_cves_for_cpe(cpe_uri: str) -> list[dict]:
    parts = cpe_uri.split(":")
    if len(parts) < 6:
        return []
    cpe_query = ":".join(parts[:6]) if parts[5] == "*" else cpe_uri

    all_items, start = [], 0
    headers = {"apiKey": api_key}
    
    while True:
        params = {
            "cpeName":        cpe_query,
            "resultsPerPage": per_page,
            "startIndex":     start,
        }
        r = requests.get(api_url, headers=headers, params=params, timeout=30)
        if r.status_code != 200:
            print(f"⚠️ {cpe_query[:70]} → {r.status_code}")
            break

        data   = r.json()
        items  = data.get("vulnerabilities", [])
        all_items.extend(items)

        start += per_page
        if start >= data.get("totalResults", 0) or not items:
            break
        time.sleep(rate_secs)
    return all_items

def flatten(v: dict, cpe_uri: str) -> dict:
    cve      = v["cve"]
    metrics  = cve.get("metrics", {})
    cvss31   = metrics.get("cvssMetricV31", [{}])[0].get("cvssData", {})
    cvss30   = metrics.get("cvssMetricV30", [{}])[0].get("cvssData", {})
    cvss     = cvss31 or cvss30
    descr    = next((d["value"] for d in cve.get("descriptions", []) if d["lang"] == "en"), "")
    cwes     = [
        d["value"] for w in cve.get("weaknesses", [])
        for d in w.get("description", []) if d.get("lang") == "en"
    ]
    
    refs     = " | ".join(r["url"] for r in cve.get("references", [])[:10])
    tags = ", ".join(tag for r in cve.get("references", [])[:10] for tag in r.get("tags", []))
    
    return {
        "cveID": cve["id"],
        "cpeName": cpe_uri,
        "published": cve.get("published"),
        "last_modified": cve.get("lastModified"),
        "vectorString": cvss.get("vectorString"),
        "baseScore": cvss.get("baseScore"),
        "baseSeverity": cvss.get("baseSeverity"),
        "attackVector": cvss.get("attackVector"),
        "attackComplexity": cvss.get("attackComplexity"),
        "privilegesRequired": cvss.get("privilegesRequired"),
        "userInteraction": cvss.get("userInteraction"),
        "scope": cvss.get("scope"),
        "confidentialityImpact": cvss.get("confidentialityImpact"),
        "integrityImpact": cvss.get("integrityImpact"),
        "availabilityImpact": cvss.get("availabilityImpact"),
        #"exploitabilityScore": ,
        #"impactScore": ,
        "cwes": ";".join(cwes) if cwes else None,
        "description": descr[:1000],
        "references": refs,
        "tags": tags,
        "full_json": v,
    }

# ── 1. load whitelist ─────────────────────────────────────────────────
cpe_list = (
    pd.read_csv(whitelist, dtype=str)["cpeName"]
      .dropna()
      .unique()
)
print(f"📋  {len(cpe_list):,} unique CPEs to query")

# ── 2. query API ───────────────────────────────────────────────────────
rows = []
for idx, cpe in enumerate(cpe_list, start=1):
    if idx % progress_every == 0 or idx == 1:
        print(f"  → {idx}/{len(cpe_list)}   {cpe[:70]}…")

    for vuln in fetch_cves_for_cpe(cpe):
        rows.append(flatten(vuln, cpe))

print("✔️  API queries finished")

# ── 3. build DataFrame & de-dup ────────────────────────────────────────
df = (
    pd.DataFrame(rows)
      .drop_duplicates(subset=["cveID", "cpeName"])
      .reset_index(drop=True)
)
print(f"🗂  {df.shape[0]:,} CVE–CPE rows collected")

📋  2 unique CPEs to query
  → 1/2   cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:*:*:classic:*:*:*…
✔️  API queries finished
🗂  316 CVE–CPE rows collected
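
The query loop above stops paging a CPE as soon as a response is not 200, so a transient failure drops the remaining CVEs for that CPE. One possible mitigation is a small retry wrapper with exponential backoff, sketched below. ``get_with_retry`` and its parameters are hypothetical, the set of retryable status codes is an assumption, and the request is passed in as a callable so the example runs without network access.

```python
import time
from typing import Callable, Optional


def get_with_retry(
    fetch: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until it returns status 200 or the attempts run out."""
    for attempt in range(max_attempts):
        status, payload = fetch()
        if status == 200:
            return payload
        if status not in (403, 429, 503):   # assumed retryable codes
            return None                     # permanent error: give up at once
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
    return None


# Demo with a fake endpoint that fails twice before succeeding.
responses = iter([(503, {}), (503, {}), (200, {"totalResults": 0})])
delays: list[float] = []                    # record waits instead of sleeping

result = get_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

In the notebook, the callable would wrap the existing ``requests.get`` call and return ``(r.status_code, r.json())`` on success.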


In [31]:
df.head(4)

| | cveID | cpeName | published | last_modified | vectorString | baseScore | baseSeverity | attackVector | attackComplexity | privilegesRequired | userInteraction | scope | confidentialityImpact | integrityImpact | availabilityImpact | cwes | description | references | tags | full_json |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CVE-2021-39836 | cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:... | 2021-09-29T16:15:08.513 | 2024-11-21T06:20:20.730 | CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H | 7.8 | HIGH | LOCAL | LOW | NONE | REQUIRED | UNCHANGED | HIGH | HIGH | HIGH | CWE-416 | Acrobat Reader DC versions 2021.005.20060 (and... | https://helpx.adobe.com/security/products/acro... | Release Notes, Vendor Advisory, Release Notes,... | {'cve': {'id': 'CVE-2021-39836', 'sourceIdenti... |
| 1 | CVE-2021-39837 | cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:... | 2021-09-29T16:15:08.573 | 2024-11-21T06:20:20.890 | CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H | 7.8 | HIGH | LOCAL | LOW | NONE | REQUIRED | UNCHANGED | HIGH | HIGH | HIGH | CWE-416 | Acrobat Reader DC versions 2021.005.20060 (and... | https://helpx.adobe.com/security/products/acro... | Release Notes, Vendor Advisory, Release Notes,... | {'cve': {'id': 'CVE-2021-39837', 'sourceIdenti... |
| 2 | CVE-2021-39838 | cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:... | 2021-09-29T16:15:08.633 | 2024-11-21T06:20:21.040 | CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H | 7.8 | HIGH | LOCAL | LOW | NONE | REQUIRED | UNCHANGED | HIGH | HIGH | HIGH | CWE-416 | Acrobat Reader DC versions 2021.005.20060 (and... | https://helpx.adobe.com/security/products/acro... | Release Notes, Vendor Advisory, Release Notes,... | {'cve': {'id': 'CVE-2021-39838', 'sourceIdenti... |
| 3 | CVE-2021-39839 | cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:... | 2021-09-29T16:15:08.693 | 2024-11-21T06:20:21.190 | CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H | 7.8 | HIGH | LOCAL | LOW | NONE | REQUIRED | UNCHANGED | HIGH | HIGH | HIGH | CWE-416 | Acrobat Reader DC versions 2021.005.20060 (and... | https://helpx.adobe.com/security/products/acro... | Release Notes, Vendor Advisory, Release Notes,... | {'cve': {'id': 'CVE-2021-39839', 'sourceIdenti... |
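
If the collected rows are saved to a file, as the key features describe, two details are worth handling: the ``full_json`` column holds Python dicts, and writing the DataFrame index produces an extra ``Unnamed: 0`` column when the file is read back. The following is a minimal sketch, assuming a CSV target. The sample row is invented, and an in-memory buffer stands in for a real file path.

```python
import io
import json

import pandas as pd

# Invented one-row stand-in for the collected CVE-CPE DataFrame.
demo = pd.DataFrame({
    "cveID":     ["CVE-0000-0001"],
    "full_json": [{"cve": {"id": "CVE-0000-0001"}}],
})

# Serialize the nested dicts as JSON text so they survive the round trip.
out = demo.assign(full_json=demo["full_json"].map(json.dumps))

buffer = io.StringIO()              # stand-in for a path such as a .csv file
out.to_csv(buffer, index=False)     # index=False avoids an "Unnamed: 0" column
buffer.seek(0)

restored = pd.read_csv(buffer, dtype=str)
restored["full_json"] = restored["full_json"].map(json.loads)

print(list(restored.columns))
print(restored.loc[0, "full_json"]["cve"]["id"])
```

Without ``json.dumps``, pandas would write the dicts using their Python ``repr``, which ``json.loads`` cannot parse on reload.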
