### Abstract:
- The first systematic, longitudinal analysis of cryptographic libraries and the vulnerabilities they produce.
- 48.4% of vulnerabilities in libraries written in C and C++ are related to memory safety.
- 27.5% are related to cryptographic design and implementation issues.
- 19.4% are side-channel attacks.

### Introduction
 
 - Cryptographic libraries are responsible for securing all network communications, yet have produced notoriously severe vulnerabilities.
 - In 2014, the Heartbleed vulnerability in OpenSSL enabled attackers to read the contents of servers' private memory.
 - In June 2020, GnuTLS suffered a significant vulnerability allowing a remote attacker to passively decrypt network traffic.
 - A common aphorism in the security community is 'Cryptographic code is inherently difficult to secure due to its complexity'; another maxim is 'Complexity is the enemy of security'.
 - Many cryptographic libraries include code that is not purely cryptographic in nature, including often complex network protocols, data format handling (e.g. X.509 parsing), and system configuration code.
 - Novel cryptographic primitives, programming languages, and use cases such as cryptocurrencies and zero-knowledge proofs are emerging increasingly in industry.
 - An oft-cited strategy to improve security in cryptographic libraries and other systems is to write code in memory-safe languages such as Rust.
 - The study examines 37 commonly used cryptographic libraries and includes the 23 libraries that have had at least one vulnerability reported in the National Vulnerability Database (NVD) across the 18-year period from 2005 to 2022.
 - Special attention is paid to the relationship between software complexity and vulnerability frequency, recording the codebase size and cyclomatic complexity of each library.
 - Among the findings: while 27.5% of vulnerabilities in cryptographic software are issues directly related to the cryptographic protocols or implementation, 40.0% of errors across all libraries are related to memory management and a further 19.5% are side-channel attacks, suggesting that developers should focus their efforts on systems-level implementation issues.
 - In-depth sub-classification reveals that just 14 of 552, or 2.5%, of all vulnerabilities are due to broken protocols or ciphers.
 - Over 35% were located in implementations of the SSL/TLS protocols.
 - 27.9% arose from certificate parsing implementations.
 - The median exploitable lifetime of a vulnerability in a cryptographic library is 3.88 years.
 - An analysis of OpenSSL's major version releases shows that OpenSSL has an average defect density of 1 CVE per thousand lines of code.
 - The contributions of the study are as follows:
    - Compile and publish a dataset of all vulnerabilities in cryptographic libraries.
    - Conduct an in-depth review of each issue, extensively characterizing all vulnerabilities by type, feature location, lifetime, severity, and other characteristics.
    - Investigate complexity within the library codebases as a potential source of vulnerabilities, finding substantial variation in complexity among different cryptographic components.
    - Present two novel classification taxonomies specific to cryptographic software and provide guidelines to improve the NVD's data quality.
- Related work:
    - Relationship between complexity and security
    - Empirical vulnerability analysis
    - Role of human factors in software insecurity

### Methodology:

- Source code repository and vulnerability data are collected from 23 open source cryptographic libraries.
- A cryptographic library is defined as a general-purpose collection of implementations of cryptographic primitives and/or protocols. In particular, wrappers are excluded, as are libraries focused on comparatively niche primitives such as multi-party computation and zk-SNARKs.
- The methodology covers vulnerability characteristics as well as our data collection and cleansing process.
- Some of the dataset was obtained through web scraping; we also manually compiled and analysed our dataset due to inconsistencies and inaccuracies in the NVD and other sources.

- Systems analysed: open source libraries with sufficient CVE reporting (at least one reported CVE).
- Vulnerability data sources:
    - The NVD, managed by NIST. A vulnerability is defined as an entry in the CVE list maintained by MITRE.
    - To supplement and refine the data provided by the NVD, a manual review was conducted of individual project issue trackers, mailing lists, blogs, and other external references to obtain more granular information on CVEs, such as patch commit descriptions. In total, our dataset consists of n=552 CVEs in cryptographic libraries published by the NVD between 2005 and 2022 inclusive.
- Classifying Vulnerability Type:
    NVD assigns each vulnerability a Common Weakness Enumeration (CWE). The following issues were observed in the context of our study:
    1- Missing CWEs: 60 CVEs, or just over 10%, have no CWE label.
    2- Overly broad categorizations: broad and vague labels.
    3- Inconsistent labeling: e.g. an integer error causing a buffer overflow is categorized as a 'Numeric Issue' in one CVE and as a 'Buffer overflow' in another.
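
The first two labeling issues above can be screened for mechanically before manual review. The following is a minimal sketch, not the study's actual tooling: the record shape (`id`, `cwe`) and the choice of which CWE identifiers count as "too broad" (`BROAD_CWES`) are assumptions made for illustration, and the records are made up.

```python
from collections import Counter

# Assumption: labels treated as "too broad" for this sketch only.
BROAD_CWES = {"CWE-20", "CWE-119", "CWE-189", "CWE-310"}

def audit_cwe_label(cwe):
    """Return 'missing', 'broad', or 'specific' for one CVE's CWE label."""
    if not cwe:
        return "missing"
    if cwe in BROAD_CWES:
        return "broad"
    return "specific"

def audit(records):
    """Count label quality across records shaped like {'id': ..., 'cwe': ...}."""
    return Counter(audit_cwe_label(r.get("cwe")) for r in records)

# Made-up records, for illustration only.
records = [
    {"id": "CVE-A", "cwe": None},
    {"id": "CVE-B", "cwe": "CWE-189"},
    {"id": "CVE-C", "cwe": "CWE-125"},
    {"id": "CVE-D", "cwe": "CWE-787"},
]
counts = audit(records)
```

Inconsistent labeling (the third issue) is harder to detect automatically, since it requires comparing the described root cause across CVEs, which is why manual review remains necessary.
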

- Memory safety flag:
    - In some cases the root cause is not memory related, but the vulnerability was exacerbated or made exploitable by the use of a memory-unsafe language. An additional binary flag records whether the problem would have been mitigated and/or eliminated by memory safety.
- Calculating vulnerability lifetime:
Calculating a lower bound on this lifetime requires determining the release date of the first version affected and the release date of the patch.
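
As a rough illustration of the lifetime calculation, the sketch below takes the two release dates and returns the elapsed time in years, then computes a median over a few samples. The function name `lifetime_years`, the 365.25-day year, and the sample dates are all assumptions for this example, not values from the dataset.

```python
from datetime import date
from statistics import median

def lifetime_years(first_affected_release, patch_release):
    """Lower bound on exploitable lifetime, in years (365.25-day year assumed)."""
    if patch_release < first_affected_release:
        raise ValueError("patch release precedes first affected release")
    return (patch_release - first_affected_release).days / 365.25

# Hypothetical (first affected release, patch release) pairs.
samples = [
    (date(2012, 1, 1), date(2014, 1, 1)),
    (date(2010, 1, 1), date(2015, 1, 1)),
    (date(2018, 6, 1), date(2019, 6, 1)),
]
lifetimes = [lifetime_years(first, patch) for first, patch in samples]
median_lifetime = median(lifetimes)
```

The result is only a lower bound because users may keep running an affected version after the patch is released.
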
- Calculating vulnerability severity:
We use NVD CVSS scores to study vulnerability severity across systems. The majority of CVEs have only v2 scoring, while more recent vulnerabilities have only v3 scoring. We default to v2, and turn to v3 when v2 is not provided.
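
The v2-first fallback rule can be written as a small selection function. This is a sketch under assumed field names (`cvss_v2`, `cvss_v3`); real NVD records are structured differently and would need to be mapped onto this shape first.

```python
def severity_score(cve):
    """Return (score, version): CVSS v2 when present, otherwise v3, else (None, None)."""
    # Compare against None so that a legitimate score of 0.0 is not skipped.
    if cve.get("cvss_v2") is not None:
        return cve["cvss_v2"], "v2"
    if cve.get("cvss_v3") is not None:
        return cve["cvss_v3"], "v3"
    return None, None
```

Returning the version alongside the score keeps it visible which scale a given number came from, since v2 and v3 scores are not directly comparable.
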
- Complexity Metrics:
Total lines of code and cyclomatic complexity.
    - We use the command-line tool 'cloc' to count total lines of code for each language in our database.
    - We use a separate command-line tool, 'lizard', to calculate the cyclomatic complexity of each individual function.
Additionally, we calculate the average cyclomatic complexity number (CCN) for a given set of files by taking the average over all the functions, rather than calculating the CCN of each file and averaging the files together as lizard does.
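
The two averaging strategies can give quite different results when files contain different numbers of functions. The sketch below contrasts them on made-up per-function CCN values; it does not call lizard itself, and the file names and numbers are hypothetical.

```python
from statistics import mean

def ccn_over_functions(files):
    """Average CCN over every function in the set (the approach described above)."""
    all_ccns = [ccn for ccns in files.values() for ccn in ccns]
    return mean(all_ccns)

def ccn_over_files(files):
    """Average each file's CCN first, then average those per-file means."""
    # Files with no functions contribute nothing to the average.
    return mean(mean(ccns) for ccns in files.values() if ccns)

# Hypothetical per-function CCN values keyed by file name.
files = {
    "parser.c": [20, 10],                  # few, complex functions
    "util.c": [1, 1, 1, 1, 1, 1, 1, 1],    # many trivial functions
}
per_function = ccn_over_functions(files)   # (20 + 10 + 8 * 1) / 10
per_file = ccn_over_files(files)           # (15 + 1) / 2
```

Averaging over functions weights every function equally, so a file with a handful of complex functions does not dominate a set that is mostly made of small helpers.
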
- Limitations:
    - NVD vulnerability reporting: not all systems report vulnerabilities when discovered; therefore we avoid using CVE count as an absolute metric in the analysis.
    - Quality bias: CVE listings and/or project security advisories often fail to include sufficient detail.
    - Open source: limitations arise because proprietary and closed-source software may not show the same trends observed in open source projects.
    - Lack of language diversity: mostly C/C++
    - Non-technical factors

### Results

- Characteristics of vulnerabilities in cryptographic software:
We categorize vulnerabilities by type, broadly categorizing them as cryptographic and non-cryptographic in nature, and investigate their origin within the source code.
- Vulnerabilities by type:
General memory management issues comprise the largest individual category at 221 out of 552, or 40.0%, of CVEs. This includes 202 memory safety issues (out-of-bounds write, out-of-bounds read, incorrect calculation of buffer size, etc.) as well as 19 other memory-related issues such as infinite recursion and memory exhaustion.
Cryptographic issues comprise the second-largest individual category at 27.5%. We further subdivide this category. Various side-channel attacks, such as timing or memory cache attacks, further produce 9.5% of CVEs in cryptographic libraries. We list side-channel attacks separately from the general 'cryptographic issues' category since they exploit weaknesses in the physical hardware of a system or timing and cache-access data rather than direct flaws in the cryptographic implementations. A further 7.8% of vulnerabilities arise from numeric errors (i.e. errors in numerical calculation or conversion not specific to any one cipher or algorithm, such as carry-propagation errors or squaring very large numbers), and various system issues comprise 2.9%.
- Memory safety issues in C/C++ code:
Using the memory safety flag described earlier, we find that 248 out of 512 C/C++ CVEs, or 48.4% of the CVEs in C/C++ libraries, would have been either prevented or mitigated by using a memory-safe language.
- Cryptographic Vulnerabilities:
Our findings show that just 27.5% of CVEs in cryptographic software are directly related to cryptographic design and implementation. To verify that the trend is consistent across libraries, we calculate the ratio of cryptographic to non-cryptographic CVEs in the five cryptographic libraries with the largest numbers of CVEs, finding that the individual library percentages are consistent, falling in the range of 25-35%.
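
The per-library consistency check can be sketched as follows. Since the reported figures are percentages, this example assumes the quantity of interest is the cryptographic share of each library's total CVEs; the record fields (`library`, `is_crypto`) and the sample data are hypothetical.

```python
from collections import Counter

def crypto_share_top_libraries(cves, top_n=5):
    """Cryptographic share of CVEs for the top_n libraries by total CVE count."""
    totals = Counter(c["library"] for c in cves)
    crypto = Counter(c["library"] for c in cves if c["is_crypto"])
    return {lib: crypto[lib] / n for lib, n in totals.most_common(top_n)}

# Made-up records, for illustration only.
cves = [
    {"library": "libA", "is_crypto": True},
    {"library": "libA", "is_crypto": False},
    {"library": "libA", "is_crypto": False},
    {"library": "libA", "is_crypto": False},
    {"library": "libB", "is_crypto": True},
    {"library": "libB", "is_crypto": False},
    {"library": "libC", "is_crypto": False},
]
shares = crypto_share_top_libraries(cves, top_n=2)
```

Restricting the check to the libraries with the most CVEs avoids percentages computed from only a handful of reports.
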



