friberk/regex-evaluation

Is Reuse All You Need? A Systematic Comparison of Regular Expression Composition Strategies

This repository contains the artifact for the paper "Is Reuse All You Need? A Systematic Comparison of Regular Expression Composition Strategies." The paper investigates whether regex composition tasks are unique enough to merit dedicated machinery or whether reuse is sufficient.

Overview

Regular expressions (regexes) are prevalent in software engineering but are known to be difficult to compose correctly. This research systematically evaluates three major regex composition strategies:

  1. Reuse-by-example: Our novel operationalization of regex reuse practices
  2. Formal regex synthesis: Using algorithmic approaches to generate regexes
  3. Generative AI: Using large language models (LLMs) to compose regexes
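To make the first strategy concrete, here is a minimal sketch of what a reuse-by-example lookup could look like. It assumes that a task is specified by positive and negative example strings and that reuse means retrieving existing regexes consistent with those examples. The function name `reuse_by_example`, the full-match semantics, and the tiny in-memory database are all hypothetical illustrations, not the implementation found in this repository.

```python
import re
from typing import Iterable, List


def reuse_by_example(
    candidates: Iterable[str],
    positives: List[str],
    negatives: List[str],
) -> List[str]:
    """Return candidates that fully match every positive and no negative example.

    Assumption: full-string matching is the acceptance criterion.
    """
    consistent = []
    for pattern in candidates:
        try:
            compiled = re.compile(pattern)
        except re.error:
            continue  # skip candidates that Python's re engine cannot compile
        accepts_all = all(compiled.fullmatch(s) for s in positives)
        rejects_all = not any(compiled.fullmatch(s) for s in negatives)
        if accepts_all and rejects_all:
            consistent.append(pattern)
    return consistent


# Hypothetical mini-database and task, for illustration only.
database = [r"\d{3}-\d{4}", r"[a-z]+", r"\d+"]
matches = reuse_by_example(
    database, positives=["555-1234"], negatives=["5551234", "abc"]
)
print(matches)
```

A real reuse database would likely need indexing or ranking to scale beyond a linear scan; the sketch only shows the consistency check.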

We evaluated these strategies across multiple dimensions including accuracy, syntactic and semantic similarity, constraint balance, and computational efficiency.
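As an illustration of the accuracy dimension, the sketch below scores a candidate regex by the fraction of labeled example strings it classifies correctly. This is one plausible definition, assumed here for illustration; the helper name `example_accuracy` and the use of full-string matching are hypothetical and may differ from the metric used in the paper.

```python
import re
from typing import List


def example_accuracy(pattern: str, positives: List[str], negatives: List[str]) -> float:
    """Fraction of examples classified correctly by `pattern`.

    Assumption: a positive counts as correct when the regex fully matches it,
    and a negative counts as correct when the regex does not.
    """
    compiled = re.compile(pattern)
    correct = sum(1 for s in positives if compiled.fullmatch(s))
    correct += sum(1 for s in negatives if not compiled.fullmatch(s))
    total = len(positives) + len(negatives)
    return correct / total if total else 0.0


# Hypothetical task: the candidate misses one positive ("x"), so 3 of 4 are correct.
score = example_accuracy(r"\d+", positives=["12", "7", "x"], negatives=["abc"])
print(score)
```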

Repository Structure

  • data/: Contains all data used and produced in our experiments

    • regex-composition-bench/: Our novel dataset of regex composition tasks mined from GitHub and RegExLib
    • regex-reuse-database/: Production-ready regexes for the reuse-by-example approach
    • generated-regexes/: Regexes generated by different strategies
    • evaluation-results/: Evaluation results for each strategy
  • modules/: Contains the code for all components of our research

    • extractor/: Code for extracting regexes from software repositories
    • evaluator/: Code for reuse-by-example and its evaluation
    • helpfulness_score/: Implementation of our novel "helpfulness" metric
    • regex_semantic_sim/: Semantic similarity comparison between regexes
    • regex_syntactic_sim/: Syntactic similarity comparison between regexes
    • run_strategies/: Code to run different regex composition strategies, including the prompts for LLMs
    • run_analysis/: Scripts for analyzing results
    • make_plots/: Scripts for generating plots and visualizations

More detailed information about each component can be found in their respective directories.
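The two similarity modules can be pictured with a rough sketch. It assumes syntactic similarity compares the pattern strings themselves and semantic similarity compares matching behavior on a set of probe strings. Both functions below are hypothetical stand-ins built on the Python standard library, not the measures implemented in regex_syntactic_sim/ or regex_semantic_sim/.

```python
import re
from difflib import SequenceMatcher
from typing import List


def syntactic_similarity(a: str, b: str) -> float:
    """Character-level similarity of two pattern strings, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def semantic_similarity(a: str, b: str, probes: List[str]) -> float:
    """Fraction of probe strings on which both regexes give the same full-match verdict."""
    if not probes:
        return 0.0
    ra, rb = re.compile(a), re.compile(b)
    agree = sum(
        1 for s in probes if bool(ra.fullmatch(s)) == bool(rb.fullmatch(s))
    )
    return agree / len(probes)


# Two patterns that are written differently but agree on every probe below.
probes = ["1", "22", "a", ""]
print(syntactic_similarity(r"\d+", r"[0-9]+"))
print(semantic_similarity(r"\d+", r"[0-9]+", probes))
```

The pair above shows why the two dimensions are reported separately: patterns can differ in spelling while accepting the same strings. Probe-based agreement is only an approximation, since it depends on which strings are sampled.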
