# F1. Asset Inventory: Ingesting CPEs

Before CVEs can be ingested and processed for risk scoring, each ``cpe_name`` must be identified and ingested, because this application relies on it to filter its NVD CVE API calls. When no ``cpe_name`` is defined in the NVD CVE API call, the results are not limited to a product and CVEs for all ``cpe_names`` are returned.
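
As a rough illustration of why the CPE name matters, the sketch below builds a CVE query that is filtered to one CPE. It assumes the NVD CVE API 2.0 endpoint and its ``cpeName`` query parameter; ``build_cve_query`` is a hypothetical helper that is not part of this project, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint: the NVD CVE API 2.0 (sibling of the /cpes/2.0 endpoint).
CVE_API_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_cve_query(cpe_name, start_index=0, results_per_page=100):
    """Return a CVE API URL filtered to one CPE name (hypothetical helper)."""
    params = {
        "cpeName": cpe_name,
        "startIndex": start_index,
        "resultsPerPage": results_per_page,
    }
    return f"{CVE_API_URL}?{urlencode(params)}"


url = build_cve_query("cpe:2.3:a:microsoft:exchange_server:2019:-:*:*:*:*:*:*")
# The CPE name survives a round trip through URL encoding.
print(parse_qs(urlsplit(url).query)["cpeName"][0])
```

Leaving ``cpeName`` out of ``params`` would produce an unfiltered query, which is what saving a whitelist of CPE names is meant to avoid.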

## Scope Management

To keep the scope of this tool manageable:

* Each CPE is likely to have multiple CVEs.
* In some cases, a CPE can have hundreds of CVEs
    * especially if the CPE is associated with a well-established product (e.g. Windows products)
* When saving CPEs to ``cpe_whitelist.csv``, only save the CPEs necessary for inventorying assets
    * improves risk scoring and overall application performance
    * prevents system errors/disruptions
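
One way to keep the whitelist small is to drop repeated CPE names before they are used downstream, since saving overlapping searches can append the same CPE more than once. The sketch below assumes rows shaped like ``csv.DictReader`` output for ``cpe_whitelist.csv`` (columns ``WrittenAt``, ``Title``, ``cpeName``); ``dedupe_by_cpe_name`` is a hypothetical helper, not part of the tool.

```python
def dedupe_by_cpe_name(rows):
    """Keep the first row seen for each cpeName, preserving file order."""
    seen = set()
    unique = []
    for row in rows:
        name = row["cpeName"]
        if name not in seen:
            seen.add(name)
            unique.append(row)
    return unique


# Illustrative rows only; timestamps are made up for the example.
rows = [
    {"WrittenAt": "t1", "Title": "Microsoft Exchange Server 2019",
     "cpeName": "cpe:2.3:a:microsoft:exchange_server:2019:-:*:*:*:*:*:*"},
    {"WrittenAt": "t2", "Title": "Microsoft Exchange Server 2019",
     "cpeName": "cpe:2.3:a:microsoft:exchange_server:2019:-:*:*:*:*:*:*"},
    {"WrittenAt": "t2", "Title": "Microsoft Exchange Server",
     "cpeName": "cpe:2.3:a:microsoft:exchange_server:-:*:*:*:*:*:*:*"},
]
print(len(dedupe_by_cpe_name(rows)))  # 2
```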
 
## Keyword Search Tips

Because the CPE API keyword search is __not__ an exact-match search, it is recommended that users:
* start with vendor name only (e.g. Adobe)
* refine search by adding product keyword (e.g. Acrobat)
* further refine search by adding version number (e.g. 20.004.30006)
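
The refinement steps above can be sketched as a small helper that produces the keyword strings from broadest to narrowest. ``refinement_steps`` is a hypothetical name used only for illustration; each returned string is what a user would type at the tool's keyword prompt.

```python
def refinement_steps(vendor, product=None, version=None):
    """Return keyword strings from broadest to narrowest."""
    steps = []
    parts = []
    for part in (vendor, product, version):
        if not part:
            break  # stop at the first missing level of detail
        parts.append(part.strip())
        steps.append(" ".join(parts))
    return steps


print(refinement_steps("Adobe", "Acrobat", "20.004.30006"))
# ['Adobe', 'Adobe Acrobat', 'Adobe Acrobat 20.004.30006']
```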

## Intended Purpose of Code

The code below was generated by AI to provide a simple-to-use tool that interacts with the NVD CPE API.

Key features:
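* Takes user input to keyword search the CPE API
* Prints the matching titles and CPE names
* Prompts the user with a yes/no/exit choice before saving results to file
* Saves search results to ``../data/cpe_whitelist.csv``
    * Does not overwrite previous outputs to file
    * Appends new outputs to existing outputs in file
* Accepts ``undo`` to remove the most recently written batch from the whitelist and log it to ``../data/cpe_undo_log.csv``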

In [2]:
import csv, time, requests, os
from pathlib import Path
from datetime import datetime

# --- config -----------------------------------------------------------
api_url = "https://services.nvd.nist.gov/rest/json/cpes/2.0"
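# Read the key from the environment rather than hard-coding a secret in the notebook.
# NVD_API_KEY is an assumed variable name; set it before running this cell.
api_key = os.environ.get("NVD_API_KEY", "")
# NVD rate-limits keyless requests more strictly, so wait longer without a key.
rate_secs = 1.2 if api_key else 6.0
whitelist = Path("../data/cpe_whitelist.csv")
header = ['WrittenAt', 'Title', 'cpeName']
UNDO_LOG = Path("../data/cpe_undo_log.csv")
undo_header = ['UnwrittenAt', 'Title', 'cpeName']
# ----------------------------------------------------------------------

def search_cpe_names(keyword):
    all_results = []
    start_index = 0
    results_per_page = 100
    # Only send the apiKey header when a key is configured.
    headers = {"apiKey": api_key} if api_key else {}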
    while True:
        params = {
            "keywordSearch": keyword,
            "resultsPerPage": results_per_page,
            "startIndex": start_index
        }
        # A timeout keeps the loop from hanging indefinitely on a stalled connection.
        response = requests.get(api_url, params=params, headers=headers, timeout=30)
        if response.status_code != 200:
            print(f"Error fetching data for '{keyword}': {response.status_code} - {response.text}")
            break
        data = response.json()
        cpe_matches = data.get('products', [])
        if not cpe_matches:
            break
        for item in cpe_matches:
            metadata = item.get('cpe', {}).get('titles', [])
            title = next((t['title'] for t in metadata if t.get('lang') == 'en'), metadata[0]['title'] if metadata else '')
            cpe_uri = item.get('cpe', {}).get('cpeName', '')
            if cpe_uri:
                all_results.append({'title': title, 'cpeName': cpe_uri})
        total_results = data.get('totalResults', 0)
        start_index += results_per_page
        if start_index >= total_results:
            break
        time.sleep(rate_secs)
    return all_results

def write_entries_to_csv(entries, path, header):
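    # Create the target directory if needed so the first append does not fail.
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or os.path.getsize(path) == 0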
    now = datetime.now().isoformat(timespec='milliseconds')
    rows_to_write = [[now, e['title'], e['cpeName']] for e in entries]
    with open(path, "a", newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows_to_write)
    return now, len(rows_to_write)

def log_removal_to_undo_log(removed_rows, undo_log, undo_header):
    undo_write_header = not undo_log.exists() or os.path.getsize(undo_log) == 0
    now = datetime.now().isoformat(timespec='milliseconds')
    rows_to_log = [[now, title, cpename] for _, title, cpename in removed_rows]
    with open(undo_log, "a", newline='', encoding='utf-8') as undofile:
        undowriter = csv.writer(undofile)
        if undo_write_header:
            undowriter.writerow(undo_header)
        undowriter.writerows(rows_to_log)

def undo_last_write(path, header, undo_log, undo_header):
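    # Remove last batch of rows from whitelist and record them in undo log with UnwrittenAt
    if not path.exists():
        print("No record(s) found.")
        return
    with open(path, "r", newline='', encoding='utf-8') as infile:
        # Skip blank rows so the timestamp lookup below cannot hit an empty row.
        lines = [row for row in csv.reader(infile) if row]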
    if lines and lines[0] == header:
        header_row = lines[0]
        data_rows = lines[1:]
    else:
        header_row = header
        data_rows = lines
    if not data_rows:
        print("No record(s) found.")
        return
    # Find the most recent WrittenAt timestamp (last batch)
    last_written_at = data_rows[-1][0]
    rows_to_remove = [row for row in data_rows if row[0] == last_written_at]
    if not rows_to_remove:
        print("No record batch found.")
        return
    # Remove these rows from data_rows
    remaining_rows = [row for row in data_rows if row[0] != last_written_at]
    with open(path, "w", newline='', encoding='utf-8') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(header_row)
        writer.writerows(remaining_rows)
    # Log these removals to undo log with UnwrittenAt
    log_removal_to_undo_log(rows_to_remove, undo_log, undo_header)
    print(f"Undid last write: removed {len(rows_to_remove)} rows and logged them to the undo log.")

def main():
    while True:
        user_input = input("Enter keywords, or type 'undo' to undo last write, or 'exit': ").strip()
        if user_input.lower() in ("exit", "e"):
            print("Exiting program. Goodbye!")
            break
        if user_input.lower() == "undo":
            undo_last_write(whitelist, header, UNDO_LOG, undo_header)
            continue
        # Drop empty keywords (blank input or stray commas) so no empty search is sent.
        search_keywords = [kw.strip() for kw in user_input.split(',') if kw.strip()]
        print("Search Keywords:", search_keywords)
        all_results = []
        for keyword in search_keywords:
            if keyword.lower() in ("exit", "e"):
                print("Exiting program. Goodbye!")
                return
            time.sleep(rate_secs)
            matches = search_cpe_names(keyword)
            print(f"Number of matching results: {len(matches)}")
            print(f"\nMatches for '{keyword}':")
            if matches:
                for match in matches:
                    print(f"   Title: {match['title']}")
                    print(f" - CPE Name: {match['cpeName']}")
            else:
                print("No matches found.")
            all_results.extend(matches)
        if all_results:
            while True:
                save_input = input(f"\nDo you want to save {len(all_results)} results to file? (yes/no/exit): ").strip().lower()
                if save_input in ("yes", "y"):
                    now, nrows = write_entries_to_csv(all_results, whitelist, header)
                    print(f"\n{nrows} results appended to {whitelist.resolve()} at {now}")
                    break
                elif save_input in ("no", "n"):
                    print("\nYou chose not to save. Please enter new keywords.")
                    break
                elif save_input in ("exit", "e"):
                    print("Exiting program. Goodbye!")
                    return
                else:
                    print("Invalid input. Please enter 'yes', 'no', or 'exit'.")
        else:
            print("\nNo results to save. Please enter new keywords.")

if __name__ == "__main__":
    main()

Enter keywords, or type 'undo' to undo last write, or 'exit':  microsoft exchange


Search Keywords: ['microsoft exchange']
Number of matching results: 286

Matches for 'microsoft exchange':
   Title: Computer Associates Unicenter Management Microsoft Exchange
 - CPE Name: cpe:2.3:a:ca:unicenter_management_microsoft_exchange:-:*:*:*:*:*:*:*
   Title: Microsoft exchange_instant_messenger
 - CPE Name: cpe:2.3:a:microsoft:exchange_instant_messenger:-:*:*:*:*:*:*:*
   Title: Microsoft exchange_instant_messenger 4.5
 - CPE Name: cpe:2.3:a:microsoft:msn_messenger_service_for_exchange:4.5:*:*:*:*:*:*:*
   Title: Microsoft exchange_instant_messenger 4.6
 - CPE Name: cpe:2.3:a:microsoft:msn_messenger_service_for_exchange:4.6:*:*:*:*:*:*:*
   Title: Microsoft Exchange Server
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:-:*:*:*:*:*:*:*
   Title: Microsoft exchange_srv 4.0
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:4.0:*:*:*:*:*:*:*
   Title: Microsoft exchange_srv 5.0
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:5.0:*:*:*:*:*:*:*
   Title: Microsoft exchange_srv 5.


Do you want to save 286 results to file? (yes/no/exit):  Microsoft Exchange Server 2019


Invalid input. Please enter 'yes', 'no', or 'exit'.



Do you want to save 286 results to file? (yes/no/exit):  n



You chose not to save. Please enter new keywords.


Enter keywords, or type 'undo' to undo last write, or 'exit':  Microsoft Exchange Server 2019


Search Keywords: ['Microsoft Exchange Server 2019']
Number of matching results: 15

Matches for 'Microsoft Exchange Server 2019':
   Title: Microsoft Exchange Server 2019 Cumulative Update 1
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:2019:cumulative_update_1:*:*:*:*:*:*
   Title: Microsoft Exchange Server 2019 Cumulative Update 2
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:2019:cumulative_update_2:*:*:*:*:*:*
   Title: Microsoft Exchange Server 2019 Cumulative Update 3
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:2019:cumulative_update_3:*:*:*:*:*:*
   Title: Microsoft Exchange Server 2019 Cumulative Update 4
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:2019:cumulative_update_4:*:*:*:*:*:*
   Title: Microsoft Exchange Server 2019 Cumulative Update 5
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:2019:cumulative_update_5:*:*:*:*:*:*
   Title: Microsoft Exchange Server 2019
 - CPE Name: cpe:2.3:a:microsoft:exchange_server:2019:-:*:*:*:*:*:*
   Title: Microsoft Exchan


Do you want to save 15 results to file? (yes/no/exit):  y



15 results appended to C:\Users\hgbtx\Desktop\MIS433\final-project\cyber-risk-scoring\data\cpe_whitelist.csv at 2025-06-08T22:12:59.597


Enter keywords, or type 'undo' to undo last write, or 'exit':  e


Exiting program. Goodbye!
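
The first search in the session above returned 286 results, including a product from a different vendor. Because the keyword search is not exact, results could also be narrowed locally by parsing each CPE name. The sketch below assumes the CPE 2.3 formatted-string layout (the ``cpe:2.3`` prefix followed by 11 colon-separated attributes) and match dictionaries shaped like those returned by ``search_cpe_names``. ``parse_cpe_name`` and ``filter_by_vendor`` are hypothetical helpers, and the colon handling is a simplification that only covers a colon escaped by a single backslash.

```python
import re

# Attribute order assumed from the CPE 2.3 formatted-string binding.
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other")


def parse_cpe_name(cpe_name):
    """Split a CPE 2.3 formatted string into a dict of its attributes."""
    # Split on colons that are not preceded by a backslash escape.
    pieces = re.split(r"(?<!\\):", cpe_name)
    if len(pieces) != 13 or pieces[0] != "cpe" or pieces[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe_name!r}")
    return dict(zip(CPE_FIELDS, pieces[2:]))


def filter_by_vendor(matches, vendor):
    """Keep only matches whose CPE vendor attribute equals `vendor`."""
    return [m for m in matches
            if parse_cpe_name(m["cpeName"])["vendor"] == vendor]


matches = [
    {"title": "Computer Associates Unicenter Management Microsoft Exchange",
     "cpeName": "cpe:2.3:a:ca:unicenter_management_microsoft_exchange:-:*:*:*:*:*:*:*"},
    {"title": "Microsoft Exchange Server",
     "cpeName": "cpe:2.3:a:microsoft:exchange_server:-:*:*:*:*:*:*:*"},
]
print([m["title"] for m in filter_by_vendor(matches, "microsoft")])
# ['Microsoft Exchange Server']
```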
