Repository containing the dataset, algorithms, experiments and results of the thesis "Large Language Models for Verified Programs" (Report: thesis.pdf).
This project depends on Nagini. Follow the instructions here to set it up.
Note: Nagini requires Python 3.8, so set it up in a separate virtual environment.
Nagini can be run in server mode using this command:
python3 -m nagini_translation.main a --server
However, a code change in nagini_translation.main is required so that the server doesn't crash on exceptions [todo: provide link to modified nagini_translation/main.py here].
The server listens on "tcp://localhost:5555" by default, which is the address we use in the client (DEFAULT_CLIENT_SOCKET).
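For reference, here is a minimal client sketch, assuming a ZeroMQ REQ/REP exchange in which the client sends the path of the file to verify and the server replies with the verifier output as a string; the actual client code in this repository may differ.

```python
import zmq

# Minimal client sketch (assumption: ZeroMQ REQ/REP, file path in, output out).
# The address mirrors DEFAULT_CLIENT_SOCKET.
DEFAULT_CLIENT_SOCKET = "tcp://localhost:5555"

def verify_file(path: str) -> str:
    context = zmq.Context()
    socket = context.socket(zmq.REQ)
    socket.connect(DEFAULT_CLIENT_SOCKET)
    socket.send_string(path)      # request verification of this file
    return socket.recv_string()   # verification result / error messages
```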
Create a .env file at the root level with the following content; .gitignore ensures this file is not committed to version control.
GPT_SECRET_KEY=sk-<your_api_key>
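To load the key programmatically, a minimal sketch using python-dotenv looks like this (assuming python-dotenv is among the installed dependencies):

```python
import os
from dotenv import load_dotenv

load_dotenv()                            # reads .env at the repository root
api_key = os.environ["GPT_SECRET_KEY"]   # pass this key to the OpenAI client
```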
Create a virtual environment with Python 3.12 and install the project's dependencies:
virtualenv l4vp
source l4vp/bin/activate
cd <this_directory>
pip install -r requirements.txt
Run experiments in one of the Jupyter notebooks, e.g. final.ipynb. Make sure the Nagini server is running.
The root dataset, dataset/, described in Appendix A.1 of the thesis, consists of 50 methods across 3 predicates (list, lseg, tree) verified for memory safety. A config.json file (e.g. config.json) specifies the dataset completely: it includes the unverified program, the verification error from Nagini, the verified program, the system prompt, and the predicate definition / top-level declarations for the predicate. To improve performance on the first run, we cache the verification error from Nagini using the script provided in scripts.ipynb.
The data.py module serves as an interface for all dataset operations.
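As a quick way to inspect an entry without going through data.py, the sketch below simply loads a config.json and lists its fields; the path is hypothetical and the actual schema is defined by data.py.

```python
import json
from pathlib import Path

# Hypothetical path to one dataset entry; adjust to an actual entry in dataset/.
config_path = Path("dataset") / "list" / "append" / "config.json"
with config_path.open() as f:
    config = json.load(f)

# Lists whatever fields the entry defines (unverified program, verification
# error, verified program, system prompt, predicate definition, ...).
print(sorted(config.keys()))
```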
The run_example method in evaluation.py implements "Algorithm 1: run_example" from the thesis. It is implemented against the general Model interface, so it can be used generically with different models.
The model interface and the currently supported models are implemented in model.py. The supported models include OpenAI GPT-3.5, GPT-4, CodeLlama models hosted on TogetherAI, and our custom fine-tuned model hosted on Amazon SageMaker.
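The snippet below is an illustrative sketch of what such an interface can look like (an abstract generate method plus one OpenAI-backed implementation); the real class and method names in model.py may differ.

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """Illustrative interface; the real one in model.py may use different names."""

    @abstractmethod
    def generate(self, system_prompt: str, user_prompt: str) -> str:
        """Return the model's completion for the given prompts."""

class OpenAIModel(Model):
    def __init__(self, client, model_name: str = "gpt-4"):
        self.client = client          # an openai.OpenAI() client
        self.model_name = model_name

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        )
        return response.choices[0].message.content
```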
The notebook final.ipynb contains an implementation of "Algorithm 2: Incremental Verification" described in the thesis.
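As a rough illustration only (not the thesis algorithm itself), the general verify-and-repair pattern that such a notebook builds on looks like this; verify and the prompts are placeholders.

```python
# Schematic verify-and-repair loop (illustration only, not Algorithm 2 itself).
# `verify` is a placeholder that runs Nagini and returns (ok, error_message).
def repair_until_verified(model, program, verify, max_rounds=5):
    for _ in range(max_rounds):
        ok, error = verify(program)
        if ok:
            return program            # Nagini accepts the program
        # Feed the verification error back to the model and try again.
        program = model.generate(
            system_prompt="Fix the Nagini verification error in this program.",
            user_prompt=f"{program}\n\nVerification error:\n{error}",
        )
    return None                       # gave up after max_rounds attempts
```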
"Algorithm 3: Training dataset generation" is implemented in create_dataset.ipynb. This relies on AST analysis of the verified (or master) program which is implemented ast_util.py.
With the dataset generated (as a jsonl file), the notebook fine_tuning.ipynb contains the code to fine-tune the model on Amazon SageMaker. The training script to be run inside the container is in scripts/sagemaker_train.py.
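A rough sketch of launching such a training job with the SageMaker HuggingFace estimator is shown below; the instance type, framework versions, and hyperparameters are placeholders, and the actual configuration is in fine_tuning.ipynb.

```python
from sagemaker.huggingface import HuggingFace

# Placeholder configuration; the real versions, instance type, and
# hyperparameters are set in fine_tuning.ipynb.
estimator = HuggingFace(
    entry_point="sagemaker_train.py",
    source_dir="scripts",
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    role="<sagemaker_execution_role>",
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={"epochs": 3, "train_file": "train.jsonl"},
)
estimator.fit({"train": "s3://<bucket>/<prefix>/"})
```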
fine_tuning.ipynb also contains code to deploy the fine-tuned model (either synchronously using the Estimator object after training finishes, or from the model artifacts in S3). Once deployed, note the predictor endpoint name; the model can then be evaluated using the Evaluation class just like any other (OpenAI / TogetherAI) model. The notebook eval_finetuned.ipynb contains the code and evaluation results of our three fine-tuned models. A custom inference script is required (since we trained using LoRA, the adapter must be combined with the base model during initialization); it is in scripts_inference/inference.py.
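For orientation, a minimal sketch of such an inference entry point is shown below: it loads a base model, attaches the LoRA adapter, and merges the weights. The base model name and paths are placeholders; the actual logic is in scripts_inference/inference.py.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative SageMaker model_fn: load a base model, attach the LoRA adapter
# stored in model_dir, and merge the adapter weights into the base model.
# The base model name is a placeholder.
def model_fn(model_dir: str):
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    base = AutoModelForCausalLM.from_pretrained(
        "codellama/CodeLlama-7b-Python-hf",
        torch_dtype=torch.float16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, model_dir)
    model = model.merge_and_unload()
    return model, tokenizer
```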
The final/ directory contains transcripts of running evaluation with few-shot prompting, incremental verification (GPT-4) and our fine-tuned models.