# F2. Vulnerability Identification

Once ``cpe_whitelist.csv`` has been compiled, the NVD CVE API can be called to ingest the CVEs related to the user-defined CPEs.

## Information returned

Common Vulnerability Scoring System (CVSS) v3 is the preferred version for returned information. CVSS v2 is used only when no CVSS v3 data is available for a given CVE.
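
The version preference described above can be sketched as follows. This is a minimal illustration and not the notebook's implementation: the `flatten` function further down reads only `cvssMetricV31` and `cvssMetricV30`. The helper name `pick_cvss` is hypothetical, and the `cvssMetricV2` key is assumed to use the same list-of-entries layout as the v3 keys in the NVD response.

```python
from typing import Optional


def pick_cvss(metrics: dict) -> tuple[Optional[str], dict]:
    """Return (version_label, metric_entry) for the most preferred CVSS version present."""
    # Preference order: v3.1, then v3.0, then v2 as the last resort.
    preference = (("cvssMetricV31", "3.1"), ("cvssMetricV30", "3.0"), ("cvssMetricV2", "2.0"))
    for key, label in preference:
        entries = metrics.get(key) or []  # tolerate a missing key or an empty list
        if entries:
            return label, entries[0]      # first entry, mirroring the notebook's flatten()
    return None, {}


# Illustrative input only: a record that carries nothing but a v2 score.
sample = {"cvssMetricV2": [{"cvssData": {"baseScore": 7.5}}]}
version, entry = pick_cvss(sample)
print(version, entry["cvssData"]["baseScore"])
```

Note that CVSS v2 uses different field names from v3 (for example `accessVector` rather than `attackVector`), so a real fallback would also need a column mapping that this sketch leaves out.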

Types of information returned:
* CVE ID
* Attack Complexity
* Attack Vector
* Availability Impact
* Base Score
* Base Severity
* Confidentiality Impact
* Exploitability Score
* Impact Score
* Integrity Impact
* Privileges Required
* Scope
* User Interaction
* Source
* Description
* Last Modified Date
* Vulnerability Published Date
* Type (Vulnerability Reference Lists)
* URL (Vulnerability Reference Lists)
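
Some of the listed fields (Source, Exploitability Score, Impact Score) are not attributes of the CVSS vector itself. The sketch below assumes they sit beside `cvssData` inside each metric entry of the API response; that layout, the helper name `score_fields`, and the sample values are assumptions for illustration, so check them against a live response before relying on them.

```python
def score_fields(metric_entry: dict) -> dict:
    """Pull the fields assumed to sit beside cvssData in one metric entry."""
    return {
        "source": metric_entry.get("source"),
        "exploitabilityScore": metric_entry.get("exploitabilityScore"),
        "impactScore": metric_entry.get("impactScore"),
    }


# Illustrative entry; the numbers are made up for the example.
entry = {
    "source": "nvd@nist.gov",
    "cvssData": {"baseScore": 4.8},
    "exploitabilityScore": 1.7,
    "impactScore": 2.7,
}
print(score_fields(entry))
```

Using `.get` keeps the row well-formed (values become `None`) when an entry lacks one of these keys.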

## Intended Purpose of Code

The code below was generated by AI as a simple-to-use tool for interacting with the NVD CVE API.

Key features:
* Loads ``cpe_whitelist.csv`` generated in __F1. Asset Inventory__
    * The whitelist is either loaded into a DataFrame or saved to a new file to which the ingested CVE data is appended
        * In either case, the ingested CVE data must be joined to its corresponding CPE
            * Because of this, duplicate CVEs are expected and are necessary for risk-scoring granularity
* Ingests CVEs for the CPEs recorded in ``cpe_whitelist.csv``
    * Columns for CVE data should reflect the types of information listed in the __Information returned__ section above
* Joins CVE data to the corresponding CPEs
    * A surrogate key is generated for the joined dataset
    * One row is created for each pairing of a CVE with a CPE
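
The join and surrogate key described in the list above can be sketched on toy data. The asset titles, CPE strings, and CVE IDs below are placeholders, not real records, and `validate="one_to_many"` is an optional safeguard that the notebook's own merge does not use.

```python
import pandas as pd

assets = pd.DataFrame({
    "Title": ["Asset A", "Asset B"],
    "cpeName": [
        "cpe:2.3:a:vendor_a:product_a:1.0:*:*:*:*:*:*:*",
        "cpe:2.3:a:vendor_b:product_b:2.0:*:*:*:*:*:*:*",
    ],
})
cves = pd.DataFrame({
    # The same placeholder CVE affects both assets, so it appears twice.
    "cveID": ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0001"],
    "cpeName": [assets["cpeName"][0], assets["cpeName"][0], assets["cpeName"][1]],
})

# validate="one_to_many" raises MergeError if a CPE is listed twice in the asset table.
joined = pd.merge(assets, cves, on="cpeName", how="inner", validate="one_to_many")
joined.insert(0, "sid", range(len(joined)))  # surrogate key: one integer per row

print(joined[["sid", "Title", "cveID"]])
```

The repeated CVE ID is intentional: each row describes one CVE on one asset, which is why the surrogate key, not the CVE ID, identifies a row.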

To handle a whitelist CPE that has no associated CVEs, a placeholder row is explicitly inserted for every CPE that returns no CVEs. This allows users to:
* Track all inventoried assets, even those with zero known vulnerabilities.
* Preserve continuity between the asset inventory and the vulnerability table, so that every asset appears at least once.
* Avoid accidental data loss or gaps in final reports.
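
A compact way to build such a placeholder row is sketched below. The column names and the description text are taken from the notebook code that follows; the helper name `placeholder_row` and the example CPE string are hypothetical.

```python
COLUMNS = [
    "cveID", "cpeName", "published", "last_modified", "vectorString",
    "baseScore", "baseSeverity", "attackVector", "attackComplexity",
    "privilegesRequired", "userInteraction", "scope",
    "confidentialityImpact", "integrityImpact", "availabilityImpact",
    "cwes", "description", "references", "tags", "full_json",
]


def placeholder_row(cpe: str) -> dict:
    """Return a row with every column set to None except the CPE and a marker text."""
    row = dict.fromkeys(COLUMNS)  # every column starts as None
    row["cpeName"] = cpe
    row["description"] = "NO CVEs FOUND FOR THIS ASSET"
    return row


row = placeholder_row("cpe:2.3:a:vendor_a:product_a:1.0:*:*:*:*:*:*:*")
print(row["cpeName"], "|", row["description"])
```

Deriving the row from one column list keeps the placeholder in step with the real rows if a column is added later.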

In [4]:
import os, time, requests, pandas as pd
from pathlib import Path

# ── config ─────────────────────────────────────────────────────────────
api_url      = "https://services.nvd.nist.gov/rest/json/cves/2.0"
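api_key      = os.getenv("NVD_api_key")  # read from the environment; never hard-code the key
if not api_key:
    raise RuntimeError("Set the NVD_api_key environment variable before running this cell")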
whitelist    = Path("../data/cpe_whitelist.csv")
rate_secs    = 1.0
per_page     = 2000
progress_every = 25
# ───────────────────────────────────────────────────────────────────────

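def fetch_cves_for_cpe(cpe_uri: str) -> list[dict]:
    """Return every vulnerability record the API lists for one CPE, following pagination."""
    parts = cpe_uri.split(":")
    # a usable CPE 2.3 string needs at least cpe:2.3:part:vendor:product:version
    if len(parts) < 6:
        return []
    # when the version field is a wildcard, query on the first six components only
    cpe_query = ":".join(parts[:6]) if parts[5] == "*" else cpe_uri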
    all_items, start = [], 0
    headers = {"apiKey": api_key}
    while True:
        params = {
            "cpeName": cpe_query,
            "resultsPerPage": per_page,
            "startIndex": start,
        }
        r = requests.get(api_url, headers=headers, params=params, timeout=30)
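        if r.status_code != 200:
            # NOTE: a failed request stops paging here, so the caller cannot tell
            # an API error apart from a CPE with no CVEs; check this log output
            print(f"⚠️ {cpe_query[:70]} → {r.status_code}")
            break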
        data = r.json()
        items = data.get("vulnerabilities", [])
        all_items.extend(items)
        start += per_page
        if start >= data.get("totalResults", 0) or not items:
            break
        time.sleep(rate_secs)
    return all_items

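def flatten(v: dict, cpe_uri: str) -> dict:
    """Flatten one vulnerability record into a single row tagged with the queried CPE."""
    cve      = v["cve"]
    metrics  = cve.get("metrics", {})
    # prefer CVSS v3.1, then v3.0; CVSS v2 metrics are not read here
    cvss31   = metrics.get("cvssMetricV31", [{}])[0].get("cvssData", {})
    cvss30   = metrics.get("cvssMetricV30", [{}])[0].get("cvssData", {})
    cvss     = cvss31 or cvss30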
    descr    = next((d["value"] for d in cve.get("descriptions", []) if d["lang"] == "en"), "")
    cwes     = [
        d["value"] for w in cve.get("weaknesses", [])
        for d in w.get("description", []) if d.get("lang") == "en"
    ]
    # only the first 10 references (and their tags) are kept
    refs     = " | ".join(r["url"] for r in cve.get("references", [])[:10])
    tags     = ", ".join(tag for r in cve.get("references", [])[:10] for tag in r.get("tags", []))
    return {
        "cveID": cve["id"],
        "cpeName": cpe_uri,
        "published": cve.get("published"),
        "last_modified": cve.get("lastModified"),
        "vectorString": cvss.get("vectorString"),
        "baseScore": cvss.get("baseScore"),
        "baseSeverity": cvss.get("baseSeverity"),
        "attackVector": cvss.get("attackVector"),
        "attackComplexity": cvss.get("attackComplexity"),
        "privilegesRequired": cvss.get("privilegesRequired"),
        "userInteraction": cvss.get("userInteraction"),
        "scope": cvss.get("scope"),
        "confidentialityImpact": cvss.get("confidentialityImpact"),
        "integrityImpact": cvss.get("integrityImpact"),
        "availabilityImpact": cvss.get("availabilityImpact"),
        "cwes": ";".join(cwes) if cwes else None,
        "description": descr[:1000],
        "references": refs,
        "tags": tags,
        "full_json": v,
    }

# ── 1. load whitelist ─────────────────────────────────────────────────
assets = pd.read_csv(whitelist, dtype=str)
cpe_list = assets["cpeName"].dropna().unique()
print(f"📋  {len(cpe_list):,} unique CPEs to query")

# ── 2. query API, record CPEs with no CVEs ─────────────────────────────
rows = []
for idx, cpe in enumerate(cpe_list, start=1):
    if idx % progress_every == 0 or idx == 1:
        print(f"  → {idx}/{len(cpe_list)}   {cpe[:70]}…")
    vulns = fetch_cves_for_cpe(cpe)
    if vulns:
        for vuln in vulns:
            rows.append(flatten(vuln, cpe))
    else:
        # Insert row for CPEs with no CVEs
        rows.append({
            "cveID": None,
            "cpeName": cpe,
            "published": None,
            "last_modified": None,
            "vectorString": None,
            "baseScore": None,
            "baseSeverity": None,
            "attackVector": None,
            "attackComplexity": None,
            "privilegesRequired": None,
            "userInteraction": None,
            "scope": None,
            "confidentialityImpact": None,
            "integrityImpact": None,
            "availabilityImpact": None,
            "cwes": None,
            "description": "NO CVEs FOUND FOR THIS ASSET",
            "references": None,
            "tags": None,
            "full_json": None
        })

print("✔️  API queries finished")

# ── 3. build DataFrame & de-dup ────────────────────────────────────────
df = (
    pd.DataFrame(rows)
      .drop_duplicates(subset=["cveID", "cpeName"])
      .reset_index(drop=True)
)
print(f"🗂  {df.shape[0]:,} CVE–CPE rows collected")

📋  5 unique CPEs to query
  → 1/5   cpe:2.3:a:alteryx:alteryx_server:2022.1.1.42590:*:*:*:*:*:*:*…
✔️  API queries finished
🗂  320 CVE–CPE rows collected
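
In `fetch_cves_for_cpe`, a non-200 response ends the loop and returns whatever was collected so far, so a failed request for a CPE is recorded the same way as a CPE with no CVEs. One way to reduce that risk is to retry with a growing delay and to report failure explicitly. The sketch below is an assumption-laden illustration, not part of the notebook: `get_with_retry`, its parameters, and the fake fetcher are all hypothetical, and the fetcher is injected so the example runs without network access.

```python
import time
from typing import Callable, Optional


def get_with_retry(
    fetch: Callable[[], tuple[int, dict]],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until it reports HTTP 200; return None if every attempt fails."""
    for attempt in range(max_attempts):
        status, payload = fetch()
        if status == 200:
            return payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    return None  # None means "request failed", which differs from an empty result


# Demo with a fake fetcher that fails twice and then succeeds.
responses = iter([(503, {}), (403, {}), (200, {"vulnerabilities": []})])
delays = []
result = get_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

A caller could then write the "NO CVEs FOUND" placeholder only when the result is an empty list, and flag the CPE for a rerun when the result is `None`.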


In [18]:
# check that everything looks ok
df.head(4)

Unnamed: 0,cveID,cpeName,published,last_modified,vectorString,baseScore,baseSeverity,attackVector,attackComplexity,privilegesRequired,userInteraction,scope,confidentialityImpact,integrityImpact,availabilityImpact,cwes,description,references,tags,full_json
0,CVE-2023-26961,cpe:2.3:a:alteryx:alteryx_server:2022.1.1.4259...,2023-08-08T20:15:10.080,2024-11-21T07:52:07.460,CVSS:3.1/AV:N/AC:L/PR:H/UI:R/S:C/C:L/I:L/A:N,4.8,MEDIUM,NETWORK,LOW,HIGH,REQUIRED,CHANGED,LOW,LOW,NONE,CWE-79,Alteryx Server 2022.1.1.42590 does not employ ...,http://alteryx.com | https://gist.github.com/D...,"Vendor Advisory, Exploit, Third Party Advisory...","{'cve': {'id': 'CVE-2023-26961', 'sourceIdenti..."
1,CVE-2020-14728,cpe:2.3:a:oracle:suitecommerce_advanced:-:*:*:...,2020-08-27T00:15:12.050,2024-11-21T05:03:59.317,CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:L/A:N,5.4,MEDIUM,NETWORK,LOW,LOW,REQUIRED,CHANGED,LOW,LOW,NONE,NVD-CWE-noinfo,Vulnerability in the SuiteCommerce Advanced (S...,https://system.netsuite.com/app/help/helpcente...,"Permissions Required, Vendor Advisory, Permiss...","{'cve': {'id': 'CVE-2020-14728', 'sourceIdenti..."
2,CVE-2020-14729,cpe:2.3:a:oracle:suitecommerce_advanced:-:*:*:...,2020-08-27T00:15:12.097,2024-11-21T05:03:59.443,CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:H/A:N,5.4,MEDIUM,NETWORK,HIGH,LOW,REQUIRED,UNCHANGED,LOW,HIGH,NONE,NVD-CWE-noinfo,Vulnerability in SuiteCommerce Advanced (SCA) ...,https://system.netsuite.com/app/help/helpcente...,"Permissions Required, Vendor Advisory, Permiss...","{'cve': {'id': 'CVE-2020-14729', 'sourceIdenti..."
3,,cpe:2.3:a:oracle:suitecommerce_advanced:2020.1...,,,,,,,,,,,,,,,NO CVEs FOUND FOR THIS ASSET,,,


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 320 entries, 0 to 319
Data columns (total 20 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   cveID                  318 non-null    object 
 1   cpeName                320 non-null    object 
 2   published              318 non-null    object 
 3   last_modified          318 non-null    object 
 4   vectorString           318 non-null    object 
 5   baseScore              318 non-null    float64
 6   baseSeverity           318 non-null    object 
 7   attackVector           318 non-null    object 
 8   attackComplexity       318 non-null    object 
 9   privilegesRequired     318 non-null    object 
 10  userInteraction        318 non-null    object 
 11  scope                  318 non-null    object 
 12  confidentialityImpact  318 non-null    object 
 13  integrityImpact        318 non-null    object 
 14  availabilityImpact     318 non-null    object 
 15  cwes  

In [21]:
df.describe(include='all')

Unnamed: 0,cveID,cpeName,published,last_modified,vectorString,baseScore,baseSeverity,attackVector,attackComplexity,privilegesRequired,userInteraction,scope,confidentialityImpact,integrityImpact,availabilityImpact,cwes,description,references,tags,full_json
count,318,320,318,318,318,318.0,318,318,318,318,318,318,318,318,318,318,320,318,318,318
unique,318,5,318,318,21,,3,2,2,3,2,2,3,3,3,34,152,45,13,318
top,CVE-2025-27174,cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:...,2025-03-11T18:15:35.600,2025-04-28T16:48:26.390,CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H,,HIGH,LOCAL,LOW,NONE,REQUIRED,UNCHANGED,HIGH,HIGH,HIGH,CWE-416,Acrobat Reader DC version 22.001.2011x (and ea...,https://helpx.adobe.com/security/products/acro...,"Vendor Advisory, Vendor Advisory","{'cve': {'id': 'CVE-2025-27174', 'sourceIdenti..."
freq,1,315,1,1,147,,183,310,312,313,317,314,284,185,199,97,20,65,161,1
mean,,,,,,6.728302,,,,,,,,,,,,,,
std,,,,,,1.329364,,,,,,,,,,,,,,
min,,,,,,2.5,,,,,,,,,,,,,,
25%,,,,,,5.5,,,,,,,,,,,,,,
50%,,,,,,7.8,,,,,,,,,,,,,,
75%,,,,,,7.8,,,,,,,,,,,,,,


In [22]:
assets

| | Title | cpeName |
|---|---|---|
| 0 | Alteryx Server 2022.1.1.42590 | cpe:2.3:a:alteryx:alteryx_server:2022.1.1.4259... |
| 1 | Oracle SuiteCommerce Advanced | cpe:2.3:a:oracle:suitecommerce_advanced:-:*:*:... |
| 2 | Oracle SuiteCommerce Advanced 2020.1.4 | cpe:2.3:a:oracle:suitecommerce_advanced:2020.1... |
| 3 | Adobe Acrobat Reader 20.004.30006 Classic Edition | cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:... |
| 4 | Tableau Desktop 2021.1 | cpe:2.3:a:tableau:tableau_desktop:2021.1:*:*:*... |


In [15]:
# merge the CVE rows with the asset list so each CVE row carries its asset title
vulnerabilities = pd.merge(assets, df, how='inner', on='cpeName')
# add a surrogate key so that every row has a unique identifier
vulnerabilities['sid'] = vulnerabilities.index
# move "sid" to the first column
vulnerabilities.insert(0, "sid", vulnerabilities.pop("sid"))
vulnerabilities.head(4)

Unnamed: 0,sid,Title,cpeName,cveID,published,last_modified,vectorString,baseScore,baseSeverity,attackVector,...,userInteraction,scope,confidentialityImpact,integrityImpact,availabilityImpact,cwes,description,references,tags,full_json
0,0,Alteryx Server 2022.1.1.42590,cpe:2.3:a:alteryx:alteryx_server:2022.1.1.4259...,CVE-2023-26961,2023-08-08T20:15:10.080,2024-11-21T07:52:07.460,CVSS:3.1/AV:N/AC:L/PR:H/UI:R/S:C/C:L/I:L/A:N,4.8,MEDIUM,NETWORK,...,REQUIRED,CHANGED,LOW,LOW,NONE,CWE-79,Alteryx Server 2022.1.1.42590 does not employ ...,http://alteryx.com | https://gist.github.com/D...,"Vendor Advisory, Exploit, Third Party Advisory...","{'cve': {'id': 'CVE-2023-26961', 'sourceIdenti..."
1,1,Oracle SuiteCommerce Advanced,cpe:2.3:a:oracle:suitecommerce_advanced:-:*:*:...,CVE-2020-14728,2020-08-27T00:15:12.050,2024-11-21T05:03:59.317,CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:L/A:N,5.4,MEDIUM,NETWORK,...,REQUIRED,CHANGED,LOW,LOW,NONE,NVD-CWE-noinfo,Vulnerability in the SuiteCommerce Advanced (S...,https://system.netsuite.com/app/help/helpcente...,"Permissions Required, Vendor Advisory, Permiss...","{'cve': {'id': 'CVE-2020-14728', 'sourceIdenti..."
2,2,Oracle SuiteCommerce Advanced,cpe:2.3:a:oracle:suitecommerce_advanced:-:*:*:...,CVE-2020-14729,2020-08-27T00:15:12.097,2024-11-21T05:03:59.443,CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:H/A:N,5.4,MEDIUM,NETWORK,...,REQUIRED,UNCHANGED,LOW,HIGH,NONE,NVD-CWE-noinfo,Vulnerability in SuiteCommerce Advanced (SCA) ...,https://system.netsuite.com/app/help/helpcente...,"Permissions Required, Vendor Advisory, Permiss...","{'cve': {'id': 'CVE-2020-14729', 'sourceIdenti..."
3,3,Oracle SuiteCommerce Advanced 2020.1.4,cpe:2.3:a:oracle:suitecommerce_advanced:2020.1...,,,,,,,,...,,,,,,,NO CVEs FOUND FOR THIS ASSET,,,


In [14]:
# demonstrates how a CPE that returned no CVEs is handled
vulnerabilities[vulnerabilities['Title'] == "Tableau Desktop 2021.1"]

Unnamed: 0,sid,Title,cpeName,cveID,published,last_modified,vectorString,baseScore,baseSeverity,attackVector,...,userInteraction,scope,confidentialityImpact,integrityImpact,availabilityImpact,cwes,description,references,tags,full_json
319,319,Tableau Desktop 2021.1,cpe:2.3:a:tableau:tableau_desktop:2021.1:*:*:*...,,,,,,,,...,,,,,,,NO CVEs FOUND FOR THIS ASSET,,,


In [16]:
# save to file
vulnerabilities.to_csv('../data/vuln_catalogue.csv')
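
Two details are worth checking when the catalogue is written to CSV: the DataFrame index is written as an extra unnamed column unless `index=False` is passed, and dictionaries in a column such as `full_json` are written as their Python string form, which `json.loads` cannot read back. The sketch below shows one possible treatment on toy data; the frame contents and the placeholder CVE ID are illustrative, and an in-memory buffer stands in for the output file.

```python
import io
import json

import pandas as pd

frame = pd.DataFrame({
    "sid": [0, 1],
    # placeholder ID, not a real CVE; the second row mimics a CPE with no CVEs
    "full_json": [{"cve": {"id": "CVE-0000-0001"}}, None],
})

# Serialize dictionaries to JSON text; leave missing values untouched.
frame["full_json"] = frame["full_json"].map(
    lambda v: json.dumps(v) if isinstance(v, dict) else v
)

buffer = io.StringIO()             # stands in for the CSV file on disk
frame.to_csv(buffer, index=False)  # index=False avoids an extra unnamed column
buffer.seek(0)

restored = pd.read_csv(buffer)
first = json.loads(restored.loc[0, "full_json"])
print(first["cve"]["id"], list(restored.columns))
```

Whether to drop the index depends on what later notebooks expect to read, so treat `index=False` as an option to evaluate, not a required change.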