Is Reuse All You Need? A Systematic Comparison of Regular Expression Composition Strategies

This repository contains the artifact for the paper "Is Reuse All You Need? A Systematic Comparison of Regular Expression Composition Strategies." The paper investigates whether regex composition tasks are unique enough to merit dedicated machinery, or whether reuse is sufficient.

Overview

Regular expressions (regexes) are prevalent in software engineering but are known to be difficult to compose correctly. This research systematically evaluates three major regex composition strategies:

Reuse-by-example: Our novel operationalization of regex reuse practices
Formal regex synthesis: Using algorithmic approaches to generate regexes
Generative AI: Using large language models (LLMs) to compose regexes
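
As a rough illustration of the reuse-by-example idea, the sketch below filters a pool of existing regexes down to those consistent with a set of positive and negative example strings. It is a minimal sketch and not the artifact's implementation: the function name `reuse_by_example`, the toy `database` list, and the use of full-string matching are all assumptions made for illustration.

```python
import re


def reuse_by_example(candidates, positives, negatives):
    """Return the candidate patterns that accept every positive example
    and reject every negative example (hypothetical helper)."""
    consistent = []
    for pattern in candidates:
        try:
            compiled = re.compile(pattern)
        except re.error:
            # Skip patterns that Python's regex engine cannot compile.
            continue
        accepts_all = all(compiled.fullmatch(s) for s in positives)
        rejects_all = not any(compiled.fullmatch(s) for s in negatives)
        if accepts_all and rejects_all:
            consistent.append(pattern)
    return consistent


# Toy stand-in for a database of previously written regexes.
database = [r"\d{3}-\d{4}", r"[a-z]+", r"\d+-\d+", r"(unclosed"]
matches = reuse_by_example(database, ["555-1234"], ["5551234", "abc"])
print(matches)
```

Here the two digit-based patterns survive, while the letters-only pattern and the malformed pattern are discarded. A real system would also need to rank the surviving candidates.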

We evaluated these strategies along several dimensions: accuracy, syntactic and semantic similarity, constraint balance, and computational efficiency.
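
To make the accuracy dimension concrete, here is one plausible way to score a candidate regex against labeled example strings. This is a sketch under stated assumptions, not the metric definition used in the paper: the function name `accuracy`, the use of full-string matching, and the equal weighting of positive and negative examples are all illustrative choices.

```python
import re


def accuracy(pattern, positives, negatives):
    """Fraction of examples classified correctly by the pattern:
    positives should match in full, negatives should not (assumed definition)."""
    compiled = re.compile(pattern)
    correct = sum(1 for s in positives if compiled.fullmatch(s))
    correct += sum(1 for s in negatives if not compiled.fullmatch(s))
    total = len(positives) + len(negatives)
    return correct / total if total else 0.0


# One positive is accepted, one positive is missed, one negative is rejected.
score = accuracy(r"[a-z]+@[a-z]+\.com", ["a@b.com", "x@y.org"], ["nope"])
print(round(score, 3))
```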

Repository Structure

data/: Contains all data used and produced in our experiments
- regex-composition-bench/: Our novel dataset of regex composition tasks mined from GitHub and RegExLib
- regex-reuse-database/: Production-ready regexes for the reuse-by-example approach
- generated-regexes/: Regexes generated by different strategies
- evaluation-results/: Evaluation results for each strategy
modules/: Contains the code for all components of our research
- extractor/: Code for extracting regexes from software repositories
- evaluator/: Code for reuse-by-example and its evaluation
- helpfulness_score/: Implementation of our novel "helpfulness" metric
- regex_semantic_sim/: Semantic similarity comparison between regexes
- regex_syntactic_sim/: Syntactic similarity comparison between regexes
- run_strategies/: Code to run different regex composition strategies, including the prompts for LLMs
- run_analysis/: Scripts for analyzing results
- make_plots/: Scripts for generating plots and visualizations

More detailed information about each component can be found in its respective directory.
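
For readers unfamiliar with the distinction between syntactic and semantic similarity, the sketch below shows one simple way each could be approximated. It does not reproduce the code in regex_syntactic_sim/ or regex_semantic_sim/; the function names, the character-level comparison via `difflib`, and the sample-based Jaccard overlap are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def syntactic_similarity(a, b):
    """Character-level similarity of the two pattern strings, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def semantic_similarity(a, b, samples):
    """Jaccard overlap of the sample strings each pattern fully matches.
    This only approximates language equivalence on the given samples."""
    ra, rb = re.compile(a), re.compile(b)
    matched_a = {s for s in samples if ra.fullmatch(s)}
    matched_b = {s for s in samples if rb.fullmatch(s)}
    union = matched_a | matched_b
    if not union:
        # Neither pattern matches any sample, so they agree on all of them.
        return 1.0
    return len(matched_a & matched_b) / len(union)


samples = ["1", "22", "a"]
# Written differently, but they accept the same samples.
print(syntactic_similarity(r"\d+", r"[0-9]+"))
print(semantic_similarity(r"\d+", r"[0-9]+", samples))
# Written similarly, but they accept different samples.
print(semantic_similarity(r"\d", r"\d+", samples))
```

The example illustrates why the two measures are reported separately: patterns that look different can behave identically, and patterns that look alike can accept different strings.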

friberk/regex-evaluation
