# 10. CODE DEFECTS

1. Types
2. ML Approaches
3. Exercise
4. References

# 1. Types

*Defect, bug, error, flaw, failure, fault, vulnerability, weakness, antipattern, smell...*

Basic expectations of code:
- works as intended: correctly, efficiently, securely
- easy to maintain: understand what is happening; make changes

Violations of these expectations are *defects* in the code:
1. If the code does not work as intended, it is an *error*.
2. If the code is difficult to maintain, it is an *anti-pattern*.

Errors come in different forms:
- bug: a violation of the intended functionality;
- vulnerability: a bug that an attacker can intentionally exploit, leading to a security issue.

#### Why is vulnerability detection important?

- software is an integral part of modern life
- approximately 2244 cyber attacks per day (more than 800K attacks per year), approximately one attack [every 39 seconds](https://svitla.com/blog/cybersecurity-threats)
- software vulnerabilities are one of the main entry points for cyber attacks
- modern tools [miss](https://doi.org/10.1145/3533767.3534380) from 47\% to 80\% of real vulnerabilities
- by the way, Kontur has a [Vulnerability Search Program](https://kontur.ru/bugbounty)

#### CWE vs CVE

CWE (Common Weakness Enumeration) is a list of types of software weaknesses.

[Top 25 Most Dangerous](https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html):

1. CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')
2. CWE-787: Out-of-bounds Write
3. CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')
4. CWE-352: Cross-Site Request Forgery (CSRF)
5. CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
6. CWE-125: Out-of-bounds Read
7. CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')
8. CWE-416: Use After Free
9. CWE-862: Missing Authorization
10. CWE-434: Unrestricted Upload of File with Dangerous Type
11. CWE-94: Improper Control of Generation of Code ('Code Injection')
12. CWE-20: Improper Input Validation
13. CWE-77: Improper Neutralization of Special Elements used in a Command ('Command Injection')
14. CWE-287: Improper Authentication
15. CWE-269: Improper Privilege Management
16. CWE-502: Deserialization of Untrusted Data
17. CWE-200: Exposure of Sensitive Information to an Unauthorized Actor
18. CWE-863: Incorrect Authorization
19. CWE-918: Server-Side Request Forgery (SSRF)
20. CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
21. CWE-476: NULL Pointer Dereference
22. CWE-798: Use of Hard-coded Credentials
23. CWE-190: Integer Overflow or Wraparound
24. CWE-400: Uncontrolled Resource Consumption
25. CWE-306: Missing Authentication for Critical Function
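
To make one of the entries concrete, here is a minimal Python sketch of CWE-89 (SQL injection) using the standard `sqlite3` module. The `users` table and all function names are hypothetical, made up for illustration. The vulnerable version builds the query by string concatenation, so the input `' OR '1'='1` turns the `WHERE` clause into a condition that is always true; the fixed version passes the input as a query parameter.

```python
import sqlite3
from typing import List, Tuple

def make_db() -> sqlite3.Connection:
    """In-memory demo database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn

def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    # CWE-89: the input is concatenated into the SQL text, so a quote
    # inside `name` changes the structure of the query itself.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    # Parameterized query: the driver passes `name` as a value, never as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()

if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(find_user_vulnerable(conn, payload))  # both rows are leaked
    print(find_user_safe(conn, payload))        # []
```

The same pattern (untrusted input concatenated into code or commands) underlies several other entries in the list, e.g., CWE-78 and CWE-94.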

CVE (Common Vulnerabilities and Exposures) is a list of specific, publicly known vulnerabilities and security flaws in concrete products, whereas CWE describes *types* of weaknesses.

> For example, CVE-2021-44228 (Log4Shell) is a remote code execution vulnerability in Apache Log4j.
> The `2021` part denotes the year in which the CVE identifier was assigned or in which the vulnerability was made public.
> The `44228` part is a sequence number that uniquely identifies the vulnerability within that year.
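
The structure of a CVE identifier can be checked mechanically. A minimal sketch, assuming the current CVE ID syntax (`CVE-`, a four-digit year, `-`, and a sequence number of at least four digits); the `parse_cve_id` helper and the `CveId` type are hypothetical. The sequence number is kept as a string so that leading zeros (as in `CVE-2014-0160`) are preserved.

```python
import re
from typing import NamedTuple

# CVE ID syntax: "CVE-", a four-digit year, "-", and a sequence number
# of four or more digits.
CVE_ID_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")

class CveId(NamedTuple):
    year: int       # year the ID was assigned or the vulnerability was made public
    sequence: str   # unique within the year; kept as str to preserve leading zeros

def parse_cve_id(text: str) -> CveId:
    """Split a CVE identifier into its year and sequence-number parts."""
    match = CVE_ID_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a CVE identifier: {text!r}")
    return CveId(year=int(match.group(1)), sequence=match.group(2))

print(parse_cve_id("CVE-2021-44228"))  # CveId(year=2021, sequence='44228')
```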

# 2. ML Approaches

## 2.0 Datasets

Requirements for a dataset:
1. high-quality (correct) labels
2. realistic code
3. sufficient size

Public datasets: [SARD 2018](https://samate.nist.gov/SARD/), [Draper](https://arxiv.org/abs/1807.04320), [Devign](https://arxiv.org/abs/1909.03496), [D2A](https://arxiv.org/abs/2102.07995), [ReVeal](https://arxiv.org/abs/2009.07235), [SecurityEval](https://doi.org/10.1145/3549035.3561184), [DiverseVul](https://arxiv.org/abs/2304.00409).

They differ in:
- programming language: C++, Java, etc.;
- origin: synthetic or real-world code;
- quality.

**WARNING**

Some datasets (especially synthetic ones) contain comments or variable names that explicitly indicate the type of vulnerability.
A model trained on such a dataset learns to rely on these hints instead of the code itself and is useless in practice, where such hints are absent.

![](res/10_cwe457_comment.png)
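
One mitigation is to remove comments before training. Below is a naive sketch for C/C++ sources (the language of Devign and of many SARD samples), written as a small state machine: it drops `//` and `/* */` comments, copies string and character literals verbatim, and keeps the newlines of block comments so that line numbers stay aligned with line-level labels. The function name is hypothetical; the sketch does not handle preprocessor subtleties such as line continuations, and it does nothing about telling identifiers (e.g., function names containing `bad` or `good`), which need separate renaming.

```python
def strip_c_comments(source: str) -> str:
    """Remove // and /* */ comments from C/C++ source code.

    String and character literals are copied unchanged, and newlines inside
    block comments are kept so line numbers do not shift.
    """
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # Line comment: skip up to (but not including) the newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif ch == "/" and nxt == "*":
            # Block comment: replace with a space (so tokens do not merge)
            # plus the newlines it spanned.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append(" " + "\n" * source.count("\n", i, end))
            i = end
        elif ch in "\"'":
            # String or char literal: copy verbatim, honoring backslash escapes.
            quote = ch
            out.append(quote)
            i += 1
            while i < n and source[i] != quote:
                step = 2 if source[i] == "\\" else 1
                out.append(source[i:i + step])
                i += step
            if i < n:
                out.append(quote)  # closing quote
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

sample = (
    "/* POTENTIAL FLAW: data is never initialized */\n"
    "int data; // FLAW\n"
    'printf("%d // not a comment\\n", data);\n'
)
print(strip_c_comments(sample))  # comments removed, string literal intact
```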

Dataset quality:
- 20%--71% of vulnerability labels in real-world datasets are [incorrect](https://arxiv.org/abs/2301.05456)
- 17%--99% of examples are [duplicated](https://arxiv.org/abs/2301.05456)
- when models trained on such datasets are applied in real-world scenarios, their quality [drops](https://ieeexplore.ieee.org/abstract/document/9448435) by more than 50%

![](res/10_quality.png)

![](res/10_attributes.png)

Source: [Croft et al. 2023](https://arxiv.org/abs/2301.05456)
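
Duplicates and conflicting labels can be partly detected with a simple normalization-and-hashing pass. The sketch below is an assumption, not the procedure used by Croft et al.: it treats two functions as duplicates if they are identical after collapsing whitespace, keeps one copy per group, and drops groups whose copies carry different labels. Samples are hypothetical `(code, label)` pairs with `1` meaning vulnerable.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (function source code, label: 1 = vulnerable, 0 = not)

def normalize(code: str) -> str:
    # Collapse runs of whitespace so that formatting-only copies match.
    return re.sub(r"\s+", " ", code).strip()

def deduplicate(samples: List[Sample]) -> Tuple[List[Sample], int]:
    """Keep one sample per normalized function and drop functions whose
    copies carry conflicting labels. Returns (kept samples, number of conflicts)."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for code, label in samples:
        key = hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()
        groups[key].append((code, label))
    kept: List[Sample] = []
    conflicts = 0
    for group in groups.values():
        if len({label for _, label in group}) > 1:
            conflicts += 1  # the same code is labelled both vulnerable and safe
            continue
        kept.append(group[0])
    return kept, conflicts
```

Stronger normalization (removing comments, renaming identifiers) finds more near-duplicates. Deduplication should happen before the train/test split; otherwise copies of training functions leak into the test set and inflate the scores.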

## 2.1 Encoders

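1. Take an encoder pre-trained on code (e.g., CodeBERT).
2. Fine-tune it on a vulnerability dataset, e.g., as a binary classifier (vulnerable / not vulnerable).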

#### LineVul: line-level localization of vulnerable code

In [Fu and Tantithamthavorn 2022](https://michaelfu1998-create.github.io/papers/linevul.pdf), the authors implement line-level vulnerability localization.

Given a function that LineVul predicts as vulnerable, the authors locate the vulnerable lines by leveraging the self-attention mechanism of the Transformer architecture.

The intuition is that the tokens that contribute most to the prediction are likely to be the vulnerable ones.

For each subword token, the self-attention scores from each of the $12$ Transformer encoder blocks are aggregated into a single token score. The token scores are then combined into line scores.

To do this, the authors split the function into lists of tokens, one list per line, at the newline character (`\n`). Each list of token scores is then aggregated into one attention score for the line, and the lines are ranked by score in descending order.

Let's summarize:

1. tokens that contribute most to the prediction are likely to be vulnerable tokens
2. for each subword token in the function, the self-attention scores from each of the 12 Transformer encoder blocks are aggregated into a token score
3. token scores are integrated into line scores (the function is split into lines at `\n`)
4. lines are ranked by score in descending order

![](res/10_linevul.png)
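
The localization procedure above can be sketched in plain Python. The assumptions here are ours, not taken from the LineVul code: `attentions[layer][query][key]` holds the attention matrices with heads already averaged, a token's score is the total attention it *receives* summed over layers and query positions, and tokens are already decoded to text so that a newline appears as `\n` inside a token. With a 12-layer encoder the outer list would have 12 entries; the sketch works with any number of layers.

```python
from typing import Dict, List, Sequence, Tuple

def token_scores(attentions: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """attentions[layer][query][key], heads assumed already averaged.
    Score of a token = attention it receives, summed over layers and queries."""
    scores = [0.0] * len(attentions[0])
    for layer in attentions:
        for row in layer:
            for key, weight in enumerate(row):
                scores[key] += weight
    return scores

def rank_lines(tokens: Sequence[str], scores: Sequence[float]) -> List[Tuple[int, float]]:
    """Sum token scores per line (a token containing a newline ends its line)
    and return (line number, score) pairs, highest score first."""
    line_scores: Dict[int, float] = {}
    line = 1
    for token, score in zip(tokens, scores):
        line_scores[line] = line_scores.get(line, 0.0) + score
        line += token.count("\n")
    return sorted(line_scores.items(), key=lambda item: item[1], reverse=True)

# Toy input: 4 decoded tokens on 2 lines, 2 layers of uniform attention.
tokens = ["x", " =", " 1;\n", "f(x);\n"]
uniform = [[[0.25] * 4 for _ in range(4)] for _ in range(2)]
print(rank_lines(tokens, token_scores(uniform)))  # [(1, 6.0), (2, 2.0)]
```

With uniform attention every token gets the same score, so a line's score simply grows with its number of tokens; real attention maps concentrate on some tokens, which is what makes the ranking informative.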

## 2.2 LLMs

[Prompts](https://arxiv.org/abs/2308.12697) for vulnerability detection using ChatGPT:
- basic prompting
- role-based basic prompt
- reverse-question prompt
- prompting with auxiliary information (Data Flow, API Calls)
- Chain-of-Thought prompting

Basic prompting:
> `Is the following program buggy? Please answer Yes or No. [CODE]`

Role-based basic prompt:
> `I want you to act as a vulnerability detection system. My first request is "Is the following program buggy?" Please answer Yes or No. [CODE]`

Reverse-question prompt:
> `I want you to act as a vulnerability detection system. My first request is "Is the following program correct?" Please answer Yes or No. [CODE]` (the question asks whether the program is *correct* instead of *buggy*)

Prompting with auxiliary information:
> `I want you to act as a vulnerability detection system. I will provide you with the original program and the data flow information, and you will act upon them. Is the following program buggy? [CODE]. [Data Flow description].`
>
> ![](res/10_aux.png)
>
> Data Flow description: "The data value of the variable $v_i$ at the $p_i$-th token comes from/is computed by the variable $v_j$ at the $p_j$-th token."

Chain-of-Thought prompting:
> `1. Please describe the intent of the given code. [CODE]`
>
> `2. I want you to act as a vulnerability detection system. Is the above program buggy? Please answer Yes or No.`
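
The prompt templates above can be collected into small Python helpers, e.g., for the exercise below. This is a sketch: the function names are hypothetical, the texts are copied from the templates above, and the call to the model itself is omitted because it depends on the chosen LLM's client library. Note the reverse-question prompt: a "Yes" there means the model considers the program *correct*, so the verdict has to be inverted.

```python
import re
from typing import List, Optional

ROLE = "I want you to act as a vulnerability detection system."

def basic_prompt(code: str) -> str:
    return f"Is the following program buggy? Please answer Yes or No. {code}"

def role_based_prompt(code: str) -> str:
    return (f'{ROLE} My first request is "Is the following program buggy?" '
            f"Please answer Yes or No. {code}")

def reverse_question_prompt(code: str) -> str:
    return (f'{ROLE} My first request is "Is the following program correct?" '
            f"Please answer Yes or No. {code}")

def data_flow_prompt(code: str, data_flow: str) -> str:
    return (f"{ROLE} I will provide you with the original program and the data flow "
            "information, and you will act upon them. "
            f"Is the following program buggy? {code}. {data_flow}")

def chain_of_thought_turns(code: str) -> List[str]:
    # Two user messages sent in the same conversation, one after the other.
    return [
        f"Please describe the intent of the given code. {code}",
        f"{ROLE} Is the above program buggy? Please answer Yes or No.",
    ]

def parse_verdict(answer: str, reverse_question: bool = False) -> Optional[bool]:
    """True = predicted vulnerable, False = predicted safe, None = no clear Yes/No.
    For the reverse question a "Yes" means "correct", so the verdict is inverted."""
    match = re.match(r"\W*(yes|no)\b", answer.lower())
    if match is None:
        return None
    said_yes = match.group(1) == "yes"
    return not said_yes if reverse_question else said_yes
```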

Drawbacks:

- LLMs require a lot of computational resources
- long inference time limits their use in practice
- some LLMs are closed (proprietary models available only through a vendor API)

## 2.3 Agents

See: [COLLABORATIVE AGENTS](09_collaborative_agents.ipynb)

#### [CodeAgent: Collaborative Agents for Software Engineering](https://arxiv.org/abs/2402.02172)

- A multi-agent-based system for code review.
- Website: https://code-agent-new.vercel.app/index.html
- Demo: https://code-agent-new.vercel.app/index.html#demo

Tasks:
- Semantic consistency detection between commit and commit message
- Vulnerability analysis
- Format consistency detection
- Code revision

![](res/09_codeagent_pipeline.png)

#### Results (detected vulnerabilities)

![](res/09_codeagent_vulnerabilities.png)

# 3. Exercise

Investigate the effectiveness of different prompting-based approaches for detecting CWE weaknesses.
1. Choose an LLM
2. Choose prompting-based approaches
3. Choose examples (e.g., https://samate.nist.gov/SARD/)
4. Conduct experiments
5. Draw conclusions
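
For steps 4 and 5, each prompting approach can be scored against the ground-truth labels of the chosen examples. A minimal sketch, assuming every model reply has already been mapped to `True` (vulnerable), `False` (not vulnerable), or `None` (no clear Yes/No answer); treating unparseable replies as "not vulnerable" is a choice made here, so the hypothetical `evaluate` helper also reports how many there were.

```python
from typing import Dict, Optional, Sequence

def evaluate(labels: Sequence[bool], predictions: Sequence[Optional[bool]]) -> Dict[str, float]:
    """Binary metrics with "vulnerable" (True) as the positive class.
    None predictions (no clear answer) are counted and treated as "not vulnerable"."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = fn = tn = unparsed = 0
    for truth, pred in zip(labels, predictions):
        if pred is None:
            unparsed += 1
            pred = False
        if pred and truth:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "unparsed": unparsed}
```

Precision and recall are worth reporting alongside accuracy: if most examples are vulnerable (or most are safe), accuracy alone can look good for a model that always gives the same answer.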

# 4. References

- https://kontur.ru/bugbounty
- https://svitla.com/blog/cybersecurity-threats
- https://doi.org/10.1145/3533767.3534380
- https://doi.org/10.1145/3549035.3561184
- https://samate.nist.gov/SARD/
- https://ieeexplore.ieee.org/abstract/document/9448435
- https://michaelfu1998-create.github.io/papers/linevul.pdf
- https://arxiv.org/abs/1807.04320
- https://arxiv.org/abs/1909.03496
- https://arxiv.org/abs/2009.07235
- https://arxiv.org/abs/2102.07995
- https://arxiv.org/abs/2301.05456
- https://arxiv.org/abs/2304.00409
- https://arxiv.org/abs/2308.12697
- https://arxiv.org/abs/2402.02172