ni/bib-dedup

bib-dedup

A small Python CLI to read multiple BibTeX (.bib) files, identify duplicate entries, and write one output BibTeX file that is the unique union of the inputs.

Install (editable)

python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .

Alternatively (one command):

make install-dev

Usage

bib-dedup input1.bib input2.bib -o merged.bib

By default, the output BibTeX wraps the title field with double braces (e.g., title = {{My Title}}) to preserve capitalization in downstream tools. To disable this behavior:

bib-dedup input1.bib input2.bib -o merged.bib --no-double-brace-titles
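To illustrate what the double-brace option does to a title, here is a minimal sketch. The function names (format_title_field, _is_fully_braced) are hypothetical and are not taken from the bib-dedup source; the sketch assumes a title that is already wrapped in one outer brace pair should not be wrapped again.

```python
def _is_fully_braced(text):
    """Return True if one outer {...} pair encloses the whole string."""
    if not (text.startswith("{") and text.endswith("}")):
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            # The opening brace closed before the end, e.g. "{A} and {B}".
            if depth == 0 and i != len(text) - 1:
                return False
    return depth == 0


def format_title_field(title, double_brace=True):
    """Serialize a title as a BibTeX field (hypothetical helper)."""
    inner = title.strip()
    if double_brace and not _is_fully_braced(inner):
        # The extra brace pair asks BibTeX styles to keep the capitalization.
        inner = "{" + inner + "}"
    return "title = {" + inner + "}"


print(format_title_field("My Title"))                      # title = {{My Title}}
print(format_title_field("My Title", double_brace=False))  # title = {My Title}
```

With double_brace=False the title gets only the single field delimiter, which corresponds to what --no-double-brace-titles is described as producing.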

To run the tool without activating the venv, call the installed script by its path:

.venv/bin/bib-dedup input1.bib input2.bib -o merged.bib

Optional JSON report:

bib-dedup input1.bib input2.bib -o merged.bib --report report.json

The report includes an excluded_entries list containing:

  • duplicate input entries that were dropped in favor of a single merged entry
  • any invalid/ignored records encountered while parsing
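The report can be inspected with a few lines of Python. This is a sketch under assumptions: it presumes the report is a JSON object with excluded_entries as a top-level key, and it makes no assumption about the fields inside each excluded entry. The helper names are hypothetical.

```python
import json
from pathlib import Path


def excluded_entries_from_text(report_text):
    """Return the excluded_entries list from report JSON text.

    Assumes a top-level JSON object; a missing key yields an empty list.
    """
    report = json.loads(report_text)
    return report.get("excluded_entries", [])


def load_excluded_entries(report_path):
    """Read a report file (e.g. report.json) and return its excluded entries."""
    text = Path(report_path).read_text(encoding="utf-8")
    return excluded_entries_from_text(text)
```

For example, len(load_excluded_entries("report.json")) gives the number of records that did not make it into merged.bib as separate entries.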

Dedup strategy

  • Primary: normalized DOI (if present)
  • Fallback: normalized title + year + first-author last name
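The two-level key above could be computed roughly as follows. This is an illustrative sketch, not the tool's actual code: the function names, the exact normalization rules (lowercasing, stripping accents and punctuation, removing a doi.org prefix), and the representation of an entry as a plain dict of strings are all assumptions.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def normalize_text(text):
    """Lowercase, drop accents, and collapse non-alphanumerics to spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def first_author_last_name(author_field):
    """Extract the first author's last name from a BibTeX author field."""
    first = re.split(r"\s+and\s+", author_field.strip(), maxsplit=1)[0]
    if "," in first:
        last = first.split(",", 1)[0]   # "Last, First" form
    else:
        parts = first.split()           # "First Last" form
        last = parts[-1] if parts else ""
    return normalize_text(last)


def dedup_key(entry):
    """Return the DOI key if a DOI is present, else the fallback key."""
    doi = normalize_doi(entry.get("doi", ""))
    if doi:
        return ("doi", doi)
    return (
        "title-year-author",
        normalize_text(entry.get("title", "")),
        entry.get("year", "").strip(),
        first_author_last_name(entry.get("author", "")),
    )
```

Entries that map to the same key would land in the same duplicate group. Tagging the key with its kind ("doi" or "title-year-author") keeps a DOI string from ever colliding with a fallback key.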

Within a duplicate group, entries are merged into one by taking the union of their fields. When two entries disagree on a field, the “better” value wins: non-empty over empty, longer over shorter, more specific over less specific.
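A minimal sketch of such a field-wise merge is shown below. It is an assumption-laden illustration rather than the tool's implementation: entries are plain dicts of strings, "better" is approximated by non-empty first and then by length (the source does not define "more specific"), and ties keep the value seen first. The names pick_better and merge_entries are hypothetical.

```python
def pick_better(current, candidate):
    """Prefer a non-empty value, then the longer one; ties keep current."""
    current, candidate = current.strip(), candidate.strip()
    if not current:
        return candidate
    if not candidate:
        return current
    return candidate if len(candidate) > len(current) else current


def merge_entries(entries):
    """Merge a duplicate group into one entry holding the union of fields."""
    merged = {}
    for entry in entries:
        for field, value in entry.items():
            merged[field] = pick_better(merged.get(field, ""), value)
    return merged


group = [
    {"title": "A Study", "pages": ""},
    {"title": "A Study of Things", "doi": "10.1/x"},
]
print(merge_entries(group))
```

Length is only a rough proxy for quality; a real implementation might treat some fields (for example year or DOI) with field-specific rules.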
