# **1) Temporal Analysis**

In [0]:
-- 1) Temporal Analysis
-- yearly reserved vulnerability counts
-- Check whether all records fall in 2024

SELECT
  YEAR(date_reserved) AS year,
  COUNT(*) AS vulnerability_count
FROM workspace.cve_silver.core
WHERE date_reserved IS NOT NULL
GROUP BY year
ORDER BY year

year,vulnerability_count
2023,1519
2024,35595
2025,1613


Databricks visualization. Run in Databricks to view.

In [0]:
-- yearly published vulnerability counts
-- Check whether all records fall in 2024

SELECT
  YEAR(date_published) AS year,
  COUNT(*) AS vulnerability_count_published
FROM workspace.cve_silver.core
WHERE date_published IS NOT NULL
GROUP BY year
ORDER BY year

year,vulnerability_count_published
2024,32924
2025,5370


Databricks visualization. Run in Databricks to view.
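
The two yearly counts above do not show how individual CVEs move between years. A minimal sketch that cross-tabulates reservation year against publication year, using only the `core` table and columns already queried here, would show how many CVEs reserved in one year were published in the next:

```sql
-- Sketch: how many CVEs cross a calendar-year boundary between
-- reservation and publication.
SELECT
  YEAR(date_reserved) AS year_reserved,
  YEAR(date_published) AS year_published,
  COUNT(*) AS vulnerability_count
FROM workspace.cve_silver.core
WHERE date_reserved IS NOT NULL
  AND date_published IS NOT NULL
GROUP BY YEAR(date_reserved), YEAR(date_published)
ORDER BY year_reserved, year_published
```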

In [0]:
-- Publication latency analysis: days between date_reserved and date_published

SELECT
  cve_id,
  date_reserved,
  date_published,
  DATEDIFF(
    date_published,
    date_reserved
  ) AS publication_latency_days
FROM workspace.cve_silver.core
WHERE date_reserved IS NOT NULL
  AND date_published IS NOT NULL
ORDER BY publication_latency_days DESC


cve_id,date_reserved,date_published,publication_latency_days
CVE-2024-0028,2023-11-16T22:58:45.676Z,2025-09-05T16:10:01.094Z,659
CVE-2024-25621,2024-02-08T22:26:33.511Z,2025-11-06T18:36:21.566Z,637
CVE-2024-21927,2024-01-03T16:43:09.233Z,2025-09-23T21:33:54.121Z,629
CVE-2024-21935,2024-01-03T16:43:14.976Z,2025-09-23T21:38:22.057Z,629
CVE-2024-21947,2024-01-03T16:43:21.322Z,2025-09-06T17:10:47.951Z,612
CVE-2024-21970,2024-01-03T16:43:28.699Z,2025-09-06T17:20:19.749Z,612
CVE-2024-21977,2024-01-03T16:43:30.196Z,2025-09-05T12:58:39.312Z,611
CVE-2024-26008,2024-02-14T09:18:43.245Z,2025-10-14T15:23:04.753Z,608
CVE-2024-25011,2024-02-02T21:33:13.076Z,2025-09-18T11:38:18.371Z,594
CVE-2024-31573,2024-04-05T00:00:00.000Z,2025-10-17T00:00:00.000Z,560


Databricks visualization. Run in Databricks to view.

Databricks visualization. Run in Databricks to view.
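
The query above lists only the longest delays. To check the typical latency quoted in the findings, a sketch like the following summarizes the whole distribution. It assumes Databricks SQL's `percentile_approx` (an approximate quantile) and `count_if` functions; the 365-day cut-off for "more than a year" is an arbitrary choice.

```sql
-- Sketch: summary statistics for publication latency (in days).
WITH latency AS (
  SELECT DATEDIFF(date_published, date_reserved) AS publication_latency_days
  FROM workspace.cve_silver.core
  WHERE date_reserved IS NOT NULL
    AND date_published IS NOT NULL
)
SELECT
  COUNT(*) AS cve_count,
  PERCENTILE_APPROX(publication_latency_days, 0.5) AS median_latency_days,
  PERCENTILE_APPROX(publication_latency_days, 0.9) AS p90_latency_days,
  MAX(publication_latency_days) AS max_latency_days,
  -- CVEs published more than 365 days after reservation
  COUNT_IF(publication_latency_days > 365) AS published_after_one_year
FROM latency
```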

In [0]:
-- Seasonal vulnerability patterns
-- Monthly counts of reserved and published vulnerabilities

SELECT
  month,
  SUM(vulnerability_count_reserved) AS vulnerability_count_reserved,
  SUM(vulnerability_count_published) AS vulnerability_count_published
FROM (
  SELECT
    MONTH(date_reserved) AS month,
    COUNT(*) AS vulnerability_count_reserved,
    0 AS vulnerability_count_published
  FROM workspace.cve_silver.core
  WHERE date_reserved IS NOT NULL
  GROUP BY MONTH(date_reserved)

  UNION ALL

  SELECT
    MONTH(date_published) AS month,
    0 AS vulnerability_count_reserved,
    COUNT(*) AS vulnerability_count_published
  FROM workspace.cve_silver.core
  WHERE date_published IS NOT NULL
  GROUP BY MONTH(date_published)
) AS combined
GROUP BY month
ORDER BY month


month,vulnerability_count_reserved,vulnerability_count_published
1,4116,3320
2,3344,2834
3,3268,3573
4,3086,3484
5,3160,3781
6,2968,2835
7,2552,2991
8,3114,2794
9,2479,2483
10,3782,3410


Databricks visualization. Run in Databricks to view.
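
Because this query groups on `MONTH(...)` alone, the same calendar month from 2023, 2024, and 2025 is summed together, so the totals may partly reflect which months the data set covers rather than seasonality. A sketch that keeps the year by truncating each date to the start of its month (assuming Databricks SQL's `date_trunc`):

```sql
-- Sketch: reserved and published counts per calendar month, per year.
SELECT
  month_start,
  SUM(reserved) AS vulnerability_count_reserved,
  SUM(published) AS vulnerability_count_published
FROM (
  -- one row per reservation event
  SELECT DATE_TRUNC('MONTH', date_reserved) AS month_start, 1 AS reserved, 0 AS published
  FROM workspace.cve_silver.core
  WHERE date_reserved IS NOT NULL

  UNION ALL

  -- one row per publication event
  SELECT DATE_TRUNC('MONTH', date_published), 0, 1
  FROM workspace.cve_silver.core
  WHERE date_published IS NOT NULL
) AS events
GROUP BY month_start
ORDER BY month_start
```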

Findings: 

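Most CVEs in this data set were reserved and published in 2024 (35,595 of 38,727 reservations and 32,924 of 38,294 publications), with the remainder spilling into the neighboring years: 1,519 reservations fall in 2023, and 1,613 reservations and 5,370 publications fall in 2025. While most CVEs appear to be published fairly quickly (around 10 days after reservation), some are published much later, in some cases more than a year after reservation; the longest delay, CVE-2024-0028, was published 659 days (about 1.8 years) after it was reserved. Further analysis could examine whether particular vendors or CVSS severities correlate with these longer latencies.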
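
One way to start on the CVSS side of that question is to compare latency across severity buckets. A sketch, reusing the same bucket boundaries as the severity queries in this notebook and assuming `percentile_approx` and `count_if` are available (as they are in Databricks SQL):

```sql
-- Sketch: publication latency broken down by CVSS severity bucket.
SELECT
  CASE
    WHEN cvss_score >= 9 THEN 'Critical (9+)'
    WHEN cvss_score >= 7 THEN 'High (7 - 9)'
    WHEN cvss_score >= 4 THEN 'Medium (4 - 7)'
    WHEN cvss_score >= 0 THEN 'Low (0 - 4)'
    ELSE 'Unknown'
  END AS severity_bucket,
  COUNT(*) AS cve_count,
  PERCENTILE_APPROX(DATEDIFF(date_published, date_reserved), 0.5) AS median_latency_days,
  COUNT_IF(DATEDIFF(date_published, date_reserved) > 365) AS published_after_one_year
FROM workspace.cve_silver.core
WHERE date_reserved IS NOT NULL
  AND date_published IS NOT NULL
GROUP BY severity_bucket
ORDER BY median_latency_days DESC
```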

# **2) Risk Distribution Analysis**

In [0]:
-- 2) Risk Distribution Analysis
-- CVSS score bucketing and risk severity trends over time

SELECT
  YEAR(date_published) AS year,
  MONTH(date_published) AS month,
  CASE
    WHEN cvss_score >= 9 THEN 'Critical (9+)'
    WHEN cvss_score >= 7 THEN 'High (7 - 9)'
    WHEN cvss_score >= 4 THEN 'Medium (4 - 7)'
    WHEN cvss_score >= 0 THEN 'Low (0 - 4)'
    ELSE 'Unknown'
  END AS severity_bucket,
  COUNT(*) AS vulnerability_count
FROM workspace.cve_silver.core
WHERE date_published IS NOT NULL
GROUP BY year, month, severity_bucket
ORDER BY year, month, severity_bucket;



year,month,severity_bucket,vulnerability_count
2024,1,Critical (9+),48
2024,1,High (7 - 9),282
2024,1,Low (0 - 4),91
2024,1,Medium (4 - 7),445
2024,1,Unknown,268
2024,2,Critical (9+),100
2024,2,High (7 - 9),372
2024,2,Low (0 - 4),80
2024,2,Medium (4 - 7),657
2024,2,Unknown,560


Databricks visualization. Run in Databricks to view.

In [0]:
SELECT
  CASE
    WHEN cvss_score >= 9 THEN 'Critical (9+)'
    WHEN cvss_score >= 7 THEN 'High (7 - 9)'
    WHEN cvss_score >= 4 THEN 'Medium (4 - 7)'
    WHEN cvss_score >= 0 THEN 'Low (0 - 4)'
    ELSE 'Unknown'
  END AS severity_bucket,
  COUNT(*) AS vulnerability_count
FROM workspace.cve_silver.core
GROUP BY severity_bucket
ORDER BY severity_bucket;

severity_bucket,vulnerability_count
Critical (9+),2047
High (7 - 9),8398
Low (0 - 4),1159
Medium (4 - 7),14153
Unknown,12970


Databricks visualization. Run in Databricks to view.

Findings:

The prevalence of each CVSS severity does not appear to change over time. Across all CVEs, the split is roughly 33% Unknown, 3% Low, 37% Medium, 22% High, and 5% Critical, and individual months appear to follow a similar pattern. Apart from the Unknown bucket, the distribution almost forms a pyramid in which lower severities are more common than higher ones (Medium > High > Critical); the exception is Low, which makes up only a very small share of CVEs. One possible explanation is that many low-severity CVEs are never rated on the CVSS scale at all, which would inflate the Unknown category. Further research into how CVSS ratings are assigned is needed to confirm or rule out this idea.
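
To check the monthly percentages directly rather than by eye, each bucket's count can be divided by that month's total with a window function. A sketch against the same table and bucket definitions:

```sql
-- Sketch: each severity bucket as a percentage of that month's published CVEs.
WITH monthly AS (
  SELECT
    YEAR(date_published) AS year,
    MONTH(date_published) AS month,
    CASE
      WHEN cvss_score >= 9 THEN 'Critical (9+)'
      WHEN cvss_score >= 7 THEN 'High (7 - 9)'
      WHEN cvss_score >= 4 THEN 'Medium (4 - 7)'
      WHEN cvss_score >= 0 THEN 'Low (0 - 4)'
      ELSE 'Unknown'
    END AS severity_bucket,
    COUNT(*) AS vulnerability_count
  FROM workspace.cve_silver.core
  WHERE date_published IS NOT NULL
  GROUP BY year, month, severity_bucket
)
SELECT
  year,
  month,
  severity_bucket,
  vulnerability_count,
  -- denominator: all buckets in the same year and month
  ROUND(100.0 * vulnerability_count
        / SUM(vulnerability_count) OVER (PARTITION BY year, month), 1) AS pct_of_month
FROM monthly
ORDER BY year, month, severity_bucket
```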

# **3) Vendor Intelligence**

In [0]:
-- 3) Vendor Intelligence
-- Top 25 vendors by vulnerability count

SELECT
  vendor,
  COUNT(*) AS vulnerability_count
FROM workspace.cve_silver.affected
GROUP BY vendor
ORDER BY vulnerability_count DESC
LIMIT 25

vendor,vulnerability_count
Microsoft,13161
Unknown,8770
Linux,6152
"Brother Industries, Ltd",4427
Red Hat,3913
Siemens,2540
Apple,1692
Lenovo,929
Adobe,751
Autodesk,743


Databricks visualization. Run in Databricks to view.
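
`COUNT(*)` on `affected` counts rows, not CVEs. If the table holds several rows per CVE (for example, one per affected product or version, which is an assumption about the silver schema), a vendor's count can exceed its number of distinct CVEs. A sketch that reports both:

```sql
-- Sketch: affected rows versus distinct CVEs per vendor.
SELECT
  vendor,
  COUNT(*) AS affected_rows,
  COUNT(DISTINCT cve_id) AS distinct_cves
FROM workspace.cve_silver.affected
GROUP BY vendor
ORDER BY distinct_cves DESC
LIMIT 25
```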

In [0]:
-- Market concentration analysis: percentage of vulnerabilities attributed to top vendors

SELECT
  vendor,
  COUNT(*) AS vulnerability_count,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS vulnerability_pct
FROM workspace.cve_silver.affected
GROUP BY vendor
ORDER BY vulnerability_count DESC
LIMIT 25

vendor,vulnerability_count,vulnerability_pct
Microsoft,13161,17.53
Unknown,8770,11.68
Linux,6152,8.2
"Brother Industries, Ltd",4427,5.9
Red Hat,3913,5.21
Siemens,2540,3.38
Apple,1692,2.25
Lenovo,929,1.24
Adobe,751,1.0
Autodesk,743,0.99


Databricks visualization. Run in Databricks to view.
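
The concentration among the largest vendors can be summarized as a running total over the ranked list. A sketch using window functions; the `Unknown` vendor bucket is included here, and it could be dropped with a `WHERE vendor <> 'Unknown'` filter in the first CTE if it should not count as a vendor:

```sql
-- Sketch: cumulative share of affected rows held by the top N vendors.
WITH vendor_counts AS (
  SELECT vendor, COUNT(*) AS vulnerability_count
  FROM workspace.cve_silver.affected
  GROUP BY vendor
),
ranked AS (
  SELECT
    vendor,
    vulnerability_count,
    ROW_NUMBER() OVER (ORDER BY vulnerability_count DESC) AS vendor_rank,
    100.0 * vulnerability_count / SUM(vulnerability_count) OVER () AS vulnerability_pct
  FROM vendor_counts
)
SELECT
  vendor_rank,
  vendor,
  ROUND(vulnerability_pct, 2) AS vulnerability_pct,
  -- running total of the share from rank 1 down to this vendor
  ROUND(SUM(vulnerability_pct) OVER (
    ORDER BY vendor_rank
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS cumulative_pct
FROM ranked
ORDER BY vendor_rank
LIMIT 25
```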

In [0]:
-- Vendor-specific risk profiles: CVSS severity distribution and vulnerability counts per vendor
-- Limit to top 25 vendors
WITH top_vendors AS (
  SELECT
    a.vendor,
    COUNT(*) AS total_vulns
  FROM workspace.cve_silver.affected a
  JOIN workspace.cve_silver.core c
    ON a.cve_id = c.cve_id
  GROUP BY a.vendor
  ORDER BY total_vulns DESC
  LIMIT 25
)


SELECT
  a.vendor as vendor,
  COUNT(*) AS vulnerability_count,
  AVG(c.cvss_score) AS avg_cvss_score,
  CASE
    WHEN c.cvss_score >= 9 THEN 'Critical (9+)'
    WHEN c.cvss_score >= 7 THEN 'High (7 - 9)'
    WHEN c.cvss_score >= 4 THEN 'Medium (4 - 7)'
    WHEN c.cvss_score >= 0 THEN 'Low (0 - 4)'
    ELSE 'Unknown'
  END AS severity_bucket
FROM workspace.cve_silver.affected a
JOIN workspace.cve_silver.core c
  ON a.cve_id = c.cve_id
JOIN top_vendors v
  ON a.vendor = v.vendor
GROUP BY a.vendor, severity_bucket
ORDER BY vulnerability_count DESC, severity_bucket


vendor,vulnerability_count,avg_cvss_score,severity_bucket
Microsoft,9516,8.017969735183756,High (7 - 9)
Unknown,7578,,Unknown
Linux,6138,,Unknown
Microsoft,3375,6.105214814814762,Medium (4 - 7)
"Brother Industries, Ltd",2355,5.705732484076205,Medium (4 - 7)
Red Hat,2117,7.682239017477535,High (7 - 9)
Apple,1692,,Unknown
Red Hat,1660,5.686084337349303,Medium (4 - 7)
"Brother Industries, Ltd",1383,7.374837310195196,High (7 - 9)
Siemens,1338,5.38505231689092,Medium (4 - 7)


Databricks visualization. Run in Databricks to view.

Databricks visualization. Run in Databricks to view.
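
Several vendors (for example, Linux and Apple) appear in this output only in the `Unknown` bucket. A sketch that measures how many of each vendor's rows carry a CVSS score, relying on `COUNT(column)` skipping NULL values:

```sql
-- Sketch: share of each vendor's affected rows that have a CVSS score.
SELECT
  a.vendor,
  COUNT(*) AS affected_rows,
  COUNT(c.cvss_score) AS scored_rows,  -- NULL scores are not counted
  ROUND(100.0 * COUNT(c.cvss_score) / COUNT(*), 1) AS scored_pct
FROM workspace.cve_silver.affected a
JOIN workspace.cve_silver.core c
  ON a.cve_id = c.cve_id
GROUP BY a.vendor
ORDER BY affected_rows DESC
LIMIT 25
```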

In [0]:
-- Average CVSS score by vendor
-- Limit to the top 25 vendors by number of scored entries

SELECT
  a.vendor as vendor,
  COUNT(*) AS vulnerability_count,
  AVG(c.cvss_score) AS avg_cvss_score
FROM workspace.cve_silver.affected a
JOIN workspace.cve_silver.core c
  ON a.cve_id = c.cve_id
WHERE c.cvss_score IS NOT NULL
GROUP BY vendor
ORDER BY vulnerability_count DESC, avg_cvss_score DESC
LIMIT 25


vendor,vulnerability_count,avg_cvss_score
Microsoft,13160,7.559414893617999
"Brother Industries, Ltd",4427,6.864377682403008
Red Hat,3913,6.761717352415011
Siemens,2540,6.380708661417283
Unknown,1192,6.358976510067144
Lenovo,917,6.635005452562651
Adobe,751,6.174034620506047
Autodesk,743,7.796904441453672
D-Link,616,6.584253246753307
IBM,570,5.872105263157919


Databricks visualization. Run in Databricks to view.

Findings:

As expected, most published CVEs are associated with widely recognizable names in the technology industry, such as Microsoft, Apple, and Google. Interestingly, some of these companies (Apple, Gigabyte, Google, Linux, Mozilla, and NEC) have few or no CVEs with CVSS scores. It seems unlikely that none of their published CVEs could be rated, which suggests some other reason they are not scored, whether data privacy, maintaining a certain public image, or simply that the organization publishing these CVE records does not include CVSS metrics in them.

Another interesting finding is the prevalence of critical CVEs attributed to Brother Industries (15.6%, 589 CVEs). Both the percentage and the count are much higher than for the other vendors listed: the next highest percentage belongs to Cisco (5.51%, 20 CVEs) and the next highest count to Microsoft (2.02%, 266 CVEs). A cursory search shows that Brother Industries manufactures sewing machines and printers, which raises the question of why these products produce so many critical CVEs. One possibility is the security risk of wirelessly connected printers that handle sensitive data, but a closer analysis of these CVEs would be needed to tell.
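
The quoted percentages appear to use all of a vendor's rows as the denominator, scored or not (for Microsoft, 266 of 13,161 rows is 2.02%). A sketch that reproduces that calculation for every vendor, and also counts distinct critical CVEs to see whether a high row count comes from a few CVEs that affect many products. The `HAVING` threshold of 100 rows is an arbitrary choice to skip very small vendors.

```sql
-- Sketch: critical (CVSS >= 9) share per vendor, by rows and by distinct CVEs.
SELECT
  a.vendor,
  COUNT(*) AS affected_rows,
  COUNT_IF(c.cvss_score >= 9) AS critical_rows,  -- NULL scores are not counted
  ROUND(100.0 * COUNT_IF(c.cvss_score >= 9) / COUNT(*), 2) AS critical_pct,
  COUNT(DISTINCT CASE WHEN c.cvss_score >= 9 THEN a.cve_id END) AS critical_distinct_cves
FROM workspace.cve_silver.affected a
JOIN workspace.cve_silver.core c
  ON a.cve_id = c.cve_id
GROUP BY a.vendor
HAVING COUNT(*) >= 100  -- arbitrary cut-off for small vendors
ORDER BY critical_pct DESC
LIMIT 25
```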