
DOC: Add bibliography support with BibTeX citations across documentation #1472

Merged
romanlutz merged 51 commits into microsoft:main from romanlutz:bibliography-support
Mar 16, 2026

@romanlutz
Contributor

This PR adds comprehensive bibliography and citation support to PyRIT's documentation using MyST's native BibTeX integration.

Changes

Bibliography infrastructure

  • doc/references.bib — curated BibTeX file with 65 entries covering all referenced papers, datasets, blog posts, and tools
  • doc/bibliography.md — bibliography page using MyST's {bibliography} directive with "Show All" expand button
  • doc/css/custom.css — removed CSS overrides that were hiding the bibliography expand button

Citations in Python docstrings (API reference)

  • Added [@bibtex_key] cite tags to docstrings across 35+ source files (datasets, attacks, converters)
  • Citations render as interactive tooltips in the API reference docs with author/title/year details
  • Removed redundant bare URLs where citations now provide the link
  • Fixed wrong arXiv ID for VLSU dataset (was 2501.01151, corrected to 2510.18214)
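
As a rough illustration of the docstring change, the sketch below shows a pandoc-style cite tag inside a docstring and a small helper that lists the keys it finds. The class name, the key `example2024benchmark`, and the helper `extract_cite_keys` are hypothetical and are not taken from the PyRIT source.

```python
import re

# Matches pandoc-style citation groups such as [@key] or [@key1; @key2].
CITE_GROUP = re.compile(r"\[(@[^\[\]]+)\]")


def extract_cite_keys(docstring: str) -> list[str]:
    """Return the BibTeX keys cited in a docstring, in order of first appearance."""
    keys: list[str] = []
    for group in CITE_GROUP.findall(docstring):
        for part in group.split(";"):
            part = part.strip()
            if part.startswith("@") and part[1:] not in keys:
                keys.append(part[1:])
    return keys


class ExampleDataset:
    """
    Loader for a hypothetical safety benchmark [@example2024benchmark].

    The cite tag stands in for a bare URL that would otherwise sit here.
    """
```

A helper like this could be used to check that every key cited in a docstring also exists in doc/references.bib, although the PR does not say whether such a check is part of the build.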

Citations in user guide & cookbooks

  • Added first-mention citations across architecture, attack, cookbook, converter, scenario, target, blog, and index pages
  • Updated both .py and matching .ipynb files for all changes

Datasets page enhancements

  • doc/code/datasets/1_loading_datasets — added inline citations for 31 datasets with BibTeX entries

API reference improvements

  • build_scripts/gen_api_md.py — added short_title frontmatter for cleaner sidebar navigation; simplified headings to show just class/function names
    (full signatures in code blocks below)

New BibTeX entries added

Aegis (Ghosh et al.), ALERT (Tedeschi et al.), CBT-Bench (Zhang et al.), garak (Derczynski et al.), HarmfulQA (Chu et al.), LLM-LAT (Sheshadri et al.),
MedSafetyBench (Han et al.), PromptIntel (Thomas Roccia), Red Team Social Bias (Simone Van Taylor), Transphobia Awareness (Scheuerman et al.), TDC23
(Mazeika et al.), VLSU (Palaskar et al.), ANSI Escape Sequences (Fredrik "STÖK" Alexandersson), CCP Sensitive Prompts (promptfoo)

romanlutz and others added 30 commits March 12, 2026 06:44
- Upgrade jupyter-book from 1.0.4 (Sphinx-based) to 2.1.2 (MyST engine)
- Replace Sphinx autodoc/autosummary/napoleon with griffe-based API doc generation
- Add scripts/pydoc2json.py and scripts/gen_api_md.py for API doc generation
- Add doc/api/ with generated API reference pages for all pyrit submodules
- Add doc/myst.yml unified config replacing _config.yml and _toc.yml
- Custom landing page with hero banner, key capabilities, installation cards
- Top-level navigation bar
- Update Makefile, GitHub Actions, ReadTheDocs, pre-commit hooks
- Remove old Sphinx-dependent build scripts and config files
- Add griffe to dev dependencies, remove sphinxcontrib-mermaid

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…down)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These pre-commit hooks are still needed for .ipynb files (strip kernelspec
metadata and sanitize user-specific paths).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ok.py)

build_scripts/validate_docs.py checks myst.yml TOC references exist and
detects orphaned doc files. Runs in <1s vs full build taking 18s+.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
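
The two checks described for validate_docs.py (TOC entries that point at missing files, and doc files that no TOC entry mentions) can be sketched as a set comparison. This is a minimal sketch, not the actual script: the function name `check_toc`, the suffix filter, and the assumption that the TOC has already been flattened to a list of relative paths are all illustrative.

```python
from pathlib import Path


def check_toc(doc_root: Path, toc_files: list[str], suffixes=(".md", ".ipynb")):
    """Return (missing, orphaned) relative paths for a flattened TOC listing.

    missing:  listed in the TOC but absent on disk
    orphaned: present on disk but not listed in the TOC
    """
    listed = set(toc_files)
    on_disk = {
        path.relative_to(doc_root).as_posix()
        for path in doc_root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    }
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

Because this only walks the file tree and compares sets, it avoids running the site build, which is consistent with the sub-second runtime mentioned above.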
…s to hyphens

JB2 URL slugging: 0_dataset → dataset, 1a_install_uv → a-install-uv,
0_prompt_targets → prompt-targets. Updated all site.nav URLs to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
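
The three slug examples above are consistent with a simple rule: drop the leading digits (and an underscore directly after them), then turn the remaining underscores into hyphens. The sketch below encodes only that inferred rule. `guess_slug` is a hypothetical helper, and the real Jupyter Book 2 slugging may differ for inputs not shown here.

```python
import re


def guess_slug(stem: str) -> str:
    """Approximate the URL slug for a file stem, inferred from three examples only."""
    # Drop leading digits plus one optional underscore, e.g. "0_" or the "1" in "1a_".
    without_prefix = re.sub(r"^\d+_?", "", stem)
    return without_prefix.replace("_", "-")
```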
…xists)

API docs are now auto-generated from source by griffe, which reads __all__
directly. The generation script itself ensures completeness.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace backslash path separators with forward slashes in doc/myst.yml
  for cross-platform compatibility (Windows + WSL/Linux)
- Replace snake emojis with sun emojis on landing page cards (conda -> uv)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- pydoc2json.py: pass include_submodules=True recursively so nested
  submodules (e.g. pyrit.executor.attack) are included in pyrit_all.json
- gen_api_md.py: split aggregate JSON into per-module files before
  generating markdown, so all API pages referenced by myst.yml exist
- generate_rss.py: update path from _build/html to _build/site for JB2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JB2 no longer produces standalone HTML blog pages. Rewrite
generate_rss.py to parse blog source markdown files directly
from doc/blog/ and output RSS to doc/_build/site/blog/rss.xml.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
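
Building a feed from the markdown sources rather than from rendered HTML could look roughly like the sketch below. It is not the rewritten generate_rss.py: the function name `build_rss`, the channel text, the link layout, and the choice to take the title from the first `# ` heading are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_rss(blog_dir: Path, site_url: str) -> str:
    """Build a minimal RSS 2.0 document from the markdown files in blog_dir."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Blog"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Blog posts"

    # Newest first, assuming file names begin with a sortable date.
    for path in sorted(blog_dir.glob("*.md"), reverse=True):
        title = path.stem  # fallback when the post has no top-level heading
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = f"{site_url.rstrip('/')}/blog/{path.stem}"

    return ET.tostring(rss, encoding="unicode")
```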
- Populate doc/references.bib with 21 BibTeX entries (15 papers + 6 blogs/reports)
  including Crescendo, PAIR, Spotlighting, and all previously referenced works
- Add bibliography.md with full author names, titles, years, venues, and links
- Bibliography is the single reference page (no duplicate per-page references)
- Notebooks keep existing hyperlinks without inline {cite} tags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- 2401.15817: Forrest McKee, David Noever (not Gong et al.)
- 2407.11969: Andriushchenko & Flammarion, 'Does Refusal Training
  in LLMs Generalize to the Past Tense?' (was misattributed)
- 2409.11445: Bethany et al. (correct author list)
- AI Red Team: 'Lessons From Red Teaming 100 Generative AI Products'
  with full 26 author list and arXiv/NeurIPS details

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comprehensive pass over all pyrit/ source files found references to
16 additional papers (mostly datasets and benchmarks used by PyRIT):
BeaverTails, Do Anything Now, Multilingual Alignment Prism,
HarmBench, JailbreakBench, Do-Not-Answer, Multilingual Vulnerabilities,
OR-Bench, PKU-SafeRLHF, ToxicChat, SOSBENCH, SORRY-Bench,
SimpleSafetyTests, SALAD-Bench, EquityMedQA, and Adam optimizer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Found 5 more dataset/benchmark papers referenced in source code:
DecodingTrust, WMDP, XSTest, MLCommons AILuminate.
Total bibliography: 42 entries across 4 sections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Final sweep found 4 more missing references:
- PyRIT paper itself (Lopez Munoz et al., 2024)
- DarkBench (Apart Research, 2025)
- Sneaky Bits blog post (Rehberger, 2025)
Total bibliography: 46 entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove section headers (Academic Papers, Datasets, etc.) and merge
all 46 entries into one alphabetically sorted list by first author.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Haider et al. 'Phi-3 Safety Post-Training' (2024)
- Shayegani et al. 'Computer-Use Agents Exhibit Blind Goal-Directedness' (2025)
- Bryan et al. 'Taxonomy of Failure Mode in Agentic AI Systems' (2025)
- Bullwinkel et al. 'Representation Engineering and Multi-Turn Jailbreaks' (2025)
- Jones et al. 'Security Vulnerabilities in Computer Use Agents' (2025)
- Russinovich et al. 'The Price of Intelligence' CACM (2025)
- Bullwinkel et al. 'The Trigger in the Haystack' (2026)
Total bibliography: 53 entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MyST auto-detects doi.org links and creates a duplicate References
section at the bottom of the page. Use plain text DOI instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These top-level modules were missing from the myst.yml navigation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
romanlutz and others added 21 commits March 13, 2026 15:46
Modules like pyrit.datasets that only re-export aliases were
showing empty pages. Now aliases are rendered in a Re-exports
section showing name → target path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts:
#	Makefile
#	build_scripts/gen_api_md.py
#	doc/myst.yml
The merge incorrectly reverted the artifact path from html back to site.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bibliography.md is the single source for all references. The .bib
file was not being consumed by any {cite} tags and duplicated all
entries. Remove it and the myst.yml bibliography config.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- references.bib is the single source of truth for all citations
- build_scripts/gen_bibliography.py parses .bib and generates
  doc/bibliography.md with proper formatting (sorted by author,
  commas between names, venue info, links)
- bibliography.md is now gitignored (regenerated during docs-build)
- Added gen_bibliography.py step to Makefile docs-build target
- Restored bibliography: references.bib config in myst.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
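
A generator of the kind described (parse the .bib file, sort by author, join names with commas, append links) can be sketched with two regular expressions. This is a simplified stand-in for build_scripts/gen_bibliography.py, not its actual code: it assumes `field = {value}` syntax without nested braces, "First Last" author names joined by " and ", and a closing brace on its own line. The function names are hypothetical.

```python
import re

# One entry: "@type{key," followed by fields, up to a closing brace on its own line.
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# One field: name = {value}, with no nested braces (a deliberate simplification).
FIELD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bib(text: str) -> list[dict[str, str]]:
    """Parse simple BibTeX text into a list of field dictionaries."""
    entries = []
    for _kind, key, body in ENTRY.findall(text):
        fields = {name.lower(): value.strip() for name, value in FIELD.findall(body)}
        fields["key"] = key
        entries.append(fields)
    return entries


def _first_author_surname(entry: dict[str, str]) -> str:
    first = entry.get("author", "").split(" and ")[0].strip()
    return first.split()[-1].lower() if first else ""


def render_markdown(entries: list[dict[str, str]]) -> str:
    """Render entries as a markdown list sorted by first author's surname."""
    lines = ["# Bibliography", ""]
    for entry in sorted(entries, key=_first_author_surname):
        authors = ", ".join(a.strip() for a in entry.get("author", "Unknown").split(" and "))
        line = f"- {authors}. *{entry.get('title', 'Untitled')}*"
        if "year" in entry:
            line += f" ({entry['year']})"
        if "url" in entry:
            line += f". <{entry['url']}>"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

A production script would more likely rely on a dedicated BibTeX parser, since real entries often contain nested braces and quoted values that these patterns do not handle.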
Instead of a custom script, bibliography.md simply cites all .bib
entries using pandoc syntax [@key1; @key2; ...]. MyST renders the
full formatted bibliography natively with no custom code needed.

references.bib remains the single source of truth. To add a new
reference, add it to .bib and its key to bibliography.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
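
Keeping the key list in bibliography.md in sync with references.bib by hand is easy to get wrong, so the `[@key1; @key2; ...]` line could be derived from the .bib file instead. The sketch below is one way to do that. `citation_line` is a hypothetical helper and is not part of the PR.

```python
import re

# Matches the start of a BibTeX entry such as "@article{some_key," and captures type and key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# BibTeX block types that are not bibliography entries.
NON_ENTRIES = {"comment", "string", "preamble"}


def citation_line(bib_text: str) -> str:
    """Return a pandoc-style citation group that cites every key in bib_text."""
    keys = sorted(
        key
        for kind, key in ENTRY_START.findall(bib_text)
        if kind.lower() not in NON_ENTRIES
    )
    return "[" + "; ".join(f"@{key}" for key in keys) + "]"
```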
Citations are needed to trigger MyST's reference rendering, but
the raw citation keys are not useful to readers. Wrap them in a
collapsed dropdown so only the formatted References section shows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MyST collapses the references list by default with a Show All button.
Override via CSS to show all entries expanded on the bibliography page.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MyST's native {bibliography} directive truncates to 15 entries with
a JS-powered 'Show All' button that can't be auto-expanded via CSS.
Revert to generating bibliography.md from references.bib as pure
markdown, which renders all 51 entries without truncation.

references.bib is the source of truth; bibliography.md is generated
during docs-build and gitignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added [@bibtex_key] citation tags alongside existing URLs in Python
docstrings across 25 files (datasets, attacks, converters). These
render as clickable bibliography citations in the MyST-generated API
reference docs. Also removed CSS overrides that were hiding the
bibliography Show All button.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New bibliography entries: CBT-Bench, HarmfulQA, VLSU, Transphobia
Awareness (Scheuerman et al.), and ANSI Escape Sequences (STÖK).

Also fixed wrong arXiv ID for VLSU dataset (was 2501.01151 which is a
physics paper, corrected to 2510.18214). Added cite tags to crescendo,
skeleton_key, mlcommons, and character_space_converter docstrings.

Updated bibliography.md citation list with all new keys.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added [@bibtex_key] citations to 8 user guide .py/.ipynb pairs:
crescendo, flip attack, many-shot jailbreak, skeleton key, TAP,
GCG auxiliary attacks, GPTFuzzer, and transparency attack.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced markdown [text](url) links with plain text where a [@cite]
tag is already present, avoiding two competing clickable elements.
Added website URL to crescendo bibtex entry note field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
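
The link cleanup described above can be approximated with a per-line rewrite: when a line already carries a cite tag, markdown links on that line are reduced to their link text. This is only a sketch of the idea. Working line by line is an assumption, and `unlink_cited_lines` and the key in the usage note are hypothetical.

```python
import re

# A markdown link [text](http...), excluding cite tags, whose text starts with "@".
MD_LINK = re.compile(r"\[([^\]@][^\]]*)\]\((https?://[^)\s]+)\)")
# A pandoc-style cite tag such as [@key] or [@key1; @key2].
CITE_TAG = re.compile(r"\[@[^\]]+\]")


def unlink_cited_lines(text: str) -> str:
    """Replace markdown links with plain text on lines that also contain a cite tag."""
    result = []
    for line in text.splitlines():
        if CITE_TAG.search(line):
            line = MD_LINK.sub(r"\1", line)
        result.append(line)
    return "\n".join(result)
```

For example, `See [Crescendo](https://example.org/c) [@example2024attack]` becomes `See Crescendo [@example2024attack]`, while a line with a link but no cite tag is left unchanged.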
…ich table

Added [@bibtex_key] citations on first mention of techniques/datasets
across architecture, attack, cookbook, converter, scenario, target,
blog, and index pages (both .py and .ipynb).

Extended doc/code/datasets/1_loading_datasets with a dataset overview
table that extracts name, description, source URL, and citation from
each provider's docstring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolved conflicts:
- build_scripts/gen_api_md.py: kept our short_title/heading changes,
  adopted main's backtick bases formatting
- doc/myst.yml: kept both pyrit_setup_initializers and pyrit_show_versions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removed the auto-generated table code cell. Instead, added inline
citations for all 21 datasets with bibtex entries in the introductory
markdown text. Kept the existing get_all_dataset_names() output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New entries: Aegis (Ghosh et al.), ALERT (Tedeschi et al.), garak
(Derczynski et al.), MedSafetyBench (Han et al.), LLM-LAT (Sheshadri
et al.), TDC23 (Mazeika et al.), CCP Sensitive Prompts (promptfoo),
PromptIntel (Thomas Roccia), Red Team Social Bias (Simone Van Taylor).

Added cite tags to all dataset docstrings and the loading datasets
page. Fixed missing aliases variable in gen_api_md.py from merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

@bashirpartovi bashirpartovi left a comment


Looks great

@romanlutz romanlutz merged commit 9614910 into microsoft:main Mar 16, 2026
38 checks passed
@romanlutz romanlutz deleted the bibliography-support branch March 16, 2026 22:41
riyosha pushed a commit to riyosha/PyRIT that referenced this pull request Mar 24, 2026
…ion (microsoft#1472)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants