DOC: Add bibliography support with BibTeX citations across documentation#1472
Merged
romanlutz merged 51 commits into microsoft:main on Mar 16, 2026
- Upgrade jupyter-book from 1.0.4 (Sphinx-based) to 2.1.2 (MyST engine)
- Replace Sphinx autodoc/autosummary/napoleon with griffe-based API doc generation
- Add scripts/pydoc2json.py and scripts/gen_api_md.py for API doc generation
- Add doc/api/ with generated API reference pages for all pyrit submodules
- Add doc/myst.yml unified config replacing _config.yml and _toc.yml
- Custom landing page with hero banner, key capabilities, installation cards
- Top-level navigation bar
- Update Makefile, GitHub Actions, ReadTheDocs, pre-commit hooks
- Remove old Sphinx-dependent build scripts and config files
- Add griffe to dev dependencies, remove sphinxcontrib-mermaid
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…down) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These pre-commit hooks are still needed for .ipynb files (strip kernelspec metadata and sanitize user-specific paths). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ok.py) build_scripts/validate_docs.py checks that the TOC references in myst.yml point to existing files and detects orphaned doc files. It runs in under 1 s, versus 18 s+ for a full build. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
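The sketch below shows the kind of check validate_docs.py performs. It is a minimal version built on a few assumptions. The TOC is taken to be already loaded from myst.yml (for example with PyYAML) as a list of MyST TOC entries that use file and children keys. The function names and the suffix list are hypothetical and do not come from the script itself.

```python
from pathlib import Path


def collect_toc_files(entries):
    """Yield every 'file' path in a MyST-style TOC, including nested children."""
    for entry in entries:
        if "file" in entry:
            yield entry["file"]
        # Sections can nest further entries under 'children'.
        yield from collect_toc_files(entry.get("children", []))


def validate_toc(toc, doc_root, suffixes=(".md", ".ipynb")):
    """Return (missing, orphaned) paths, both relative to doc_root."""
    root = Path(doc_root)
    referenced = {Path(p).as_posix() for p in collect_toc_files(toc)}
    # TOC entries whose target file does not exist on disk.
    missing = sorted(p for p in referenced if not (root / p).is_file())
    # Doc files on disk that no TOC entry references.
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    }
    orphaned = sorted(on_disk - referenced)
    return missing, orphaned
```

The check only touches the filesystem and never invokes the site build, which is consistent with the reported sub-second runtime.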
…s to hyphens JB2 URL slugging: 0_dataset → dataset, 1a_install_uv → a-install-uv, 0_prompt_targets → prompt-targets. Updated all site.nav URLs to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
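The slug rule can be approximated as follows. This is a hypothetical helper reverse-engineered only from the three examples in the commit above, not MyST's actual slugging code. It can be used to sanity-check site.nav URLs against file names.

```python
import re


def jb2_style_slug(stem: str) -> str:
    """Approximate the slug rule observed in this PR for a file stem (hypothetical)."""
    # Drop a leading numeric ordering prefix such as "0" or "1".
    slug = re.sub(r"^\d+", "", stem)
    # Drop separators left at the start, e.g. the "_" in "_dataset".
    slug = slug.lstrip("_-")
    # Underscores become hyphens; lowercase for URL safety.
    return slug.replace("_", "-").lower()
```

For example, jb2_style_slug("1a_install_uv") gives "a-install-uv", matching the mapping listed in the commit.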
…xists) API docs are now auto-generated from source by griffe, which reads __all__ directly. The generation script itself ensures completeness. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
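Reading __all__ directly, without importing the module, can be illustrated with the standard library's ast module. This is a stdlib sketch of the general idea only. It is not griffe's API, and read_dunder_all is a hypothetical name. It handles only the plain `__all__ = [...]` or `(...)` assignment form.

```python
import ast


def read_dunder_all(source: str) -> list:
    """Return the names listed in a module's __all__, without importing it."""
    tree = ast.parse(source)
    for node in tree.body:
        # Only handles a plain top-level assignment: __all__ = [...] or (...)
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    return list(ast.literal_eval(node.value))
    return []
```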
- Replace backslash path separators with forward slashes in doc/myst.yml for cross-platform compatibility (Windows + WSL/Linux)
- Replace snake emojis with sun emojis on landing page cards (conda -> uv)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- pydoc2json.py: pass include_submodules=True recursively so nested submodules (e.g. pyrit.executor.attack) are included in pyrit_all.json
- gen_api_md.py: split aggregate JSON into per-module files before generating markdown, so all API pages referenced by myst.yml exist
- generate_rss.py: update path from _build/html to _build/site for JB2
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
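The per-module split can be sketched as below. The schema here is hypothetical: each module is assumed to be a dict with a "path" key and nested modules under "submodules". The real pyrit_all.json written by pydoc2json.py may be shaped differently, and both function names are illustrative.

```python
import json
from pathlib import Path


def split_modules(module: dict) -> dict:
    """Flatten a nested module tree into {dotted_path: module_without_children}."""
    flat = {}
    stack = [module]
    while stack:
        current = stack.pop()
        # Store a shallow copy without nested modules so each page is self-contained.
        flat[current["path"]] = {k: v for k, v in current.items() if k != "submodules"}
        stack.extend(current.get("submodules", []))
    return flat


def write_module_files(aggregate_path: str, out_dir: str) -> list:
    """Write one JSON file per module, named after its dotted path."""
    data = json.loads(Path(aggregate_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path, payload in sorted(split_modules(data).items()):
        target = out / f"{path}.json"
        target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Each markdown page then only needs its own JSON file. This is why the split step guarantees that every module referenced in myst.yml gets a page.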
JB2 no longer produces standalone HTML blog pages. Rewrite generate_rss.py to parse blog source markdown files directly from doc/blog/ and output RSS to doc/_build/site/blog/rss.xml. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
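A rough sketch of building the feed from blog sources follows. It rests on several hypothetical choices: a naming convention where post files start with a YYYY_MM_DD date, the first "# " heading serving as the post title, and a simple site_url/blog/<stem> link scheme. None of these are confirmed by the PR. The caller would write the returned string to doc/_build/site/blog/rss.xml.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path

# Hypothetical convention: posts are named like 2025_01_15_some_title.md
DATE_PREFIX = re.compile(r"^(\d{4})_(\d{2})_(\d{2})")
HEADING = re.compile(r"^#\s+(.+)$", re.MULTILINE)


def build_rss(blog_dir: str, site_url: str, feed_title: str) -> str:
    """Build an RSS 2.0 feed from blog markdown sources, newest post first."""
    posts = []
    for md in Path(blog_dir).glob("*.md"):
        match = DATE_PREFIX.match(md.stem)
        if not match:
            continue  # skip index pages and other non-post files
        heading = HEADING.search(md.read_text(encoding="utf-8"))
        title = heading.group(1).strip() if heading else md.stem
        published = datetime(*map(int, match.groups()), tzinfo=timezone.utc)
        posts.append((published, title, md.stem))

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = f"{site_url}/blog"
    ET.SubElement(channel, "description").text = feed_title
    for published, title, stem in sorted(posts, reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # Hypothetical URL scheme; the real site slug may differ.
        ET.SubElement(item, "link").text = f"{site_url}/blog/{stem}"
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```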
- Populate doc/references.bib with 21 BibTeX entries (15 papers + 6 blogs/reports) including Crescendo, PAIR, Spotlighting, and all previously referenced works
- Add bibliography.md with full author names, titles, years, venues, and links
- Bibliography is the single reference page (no duplicate per-page references)
- Notebooks keep existing hyperlinks without inline {cite} tags
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- 2401.15817: Forrest McKee, David Noever (not Gong et al.)
- 2407.11969: Andriushchenko & Flammarion, 'Does Refusal Training in LLMs Generalize to the Past Tense?' (was misattributed)
- 2409.11445: Bethany et al. (correct author list)
- AI Red Team: 'Lessons From Red Teaming 100 Generative AI Products' with full 26 author list and arXiv/NeurIPS details
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comprehensive pass over all pyrit/ source files found references to 16 additional papers (mostly datasets and benchmarks used by PyRIT): BeaverTails, Do Anything Now, Multilingual Alignment Prism, HarmBench, JailbreakBench, Do-Not-Answer, Multilingual Vulnerabilities, OR-Bench, PKU-SafeRLHF, ToxicChat, SOSBENCH, SORRY-Bench, SimpleSafetyTests, SALAD-Bench, EquityMedQA, and Adam optimizer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Found 5 more dataset/benchmark papers referenced in source code: DecodingTrust, WMDP, XSTest, MLCommons AILuminate. Total bibliography: 42 entries across 4 sections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Final sweep found 4 more missing references:
- PyRIT paper itself (Lopez Munoz et al., 2024)
- DarkBench (Apart Research, 2025)
- Sneaky Bits blog post (Rehberger, 2025)
Total bibliography: 46 entries.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove section headers (Academic Papers, Datasets, etc.) and merge all 46 entries into one alphabetically sorted list by first author. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
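One way to derive the "first author" sort key from a BibTeX author field is sketched below. The helper names are hypothetical, and the parsing is deliberately simple. It handles the "Last, First" form, the "First Last" form, and fully braced corporate authors. It does not protect the word "and" inside braces.

```python
import re


def first_author_sort_key(author_field: str) -> str:
    """Lowercased surname of the first author in a BibTeX author field."""
    first = re.split(r"\s+and\s+", author_field.strip())[0].strip()
    if first.startswith("{") and first.endswith("}"):
        # Corporate author kept together by braces, e.g. {Apart Research}
        surname = first
    elif "," in first:
        # "Last, First" form; the surname may itself be braced
        surname = first.split(",", 1)[0]
    else:
        # "First Last" form
        surname = first.split()[-1] if first else ""
    return surname.replace("{", "").replace("}", "").strip().lower()


def sort_entries(entries: list) -> list:
    """Sort entry dicts by first-author surname, then year, then title."""
    return sorted(
        entries,
        key=lambda e: (first_author_sort_key(e.get("author", "")), e.get("year", ""), e.get("title", "")),
    )
```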
- Haider et al. 'Phi-3 Safety Post-Training' (2024)
- Shayegani et al. 'Computer-Use Agents Exhibit Blind Goal-Directedness' (2025)
- Bryan et al. 'Taxonomy of Failure Mode in Agentic AI Systems' (2025)
- Bullwinkel et al. 'Representation Engineering and Multi-Turn Jailbreaks' (2025)
- Jones et al. 'Security Vulnerabilities in Computer Use Agents' (2025)
- Russinovich et al. 'The Price of Intelligence' CACM (2025)
- Bullwinkel et al. 'The Trigger in the Haystack' (2026)
Total bibliography: 53 entries.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MyST auto-detects doi.org links and creates a duplicate References section at the bottom of the page. Use plain text DOI instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
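The rewrite from links to plain-text DOIs can be sketched with two regular expressions. This assumes, as the commit describes, that only doi.org URLs trigger MyST's auto-detection. The function name is hypothetical. Link text is kept, and trailing sentence punctuation stays outside the DOI.

```python
import re

# Markdown links such as [text](https://doi.org/10.xxxx/...)
DOI_LINK = re.compile(r"\[([^\]]*)\]\(https?://(?:dx\.)?doi\.org/([^)\s]+)\)")
# Bare https://doi.org/10.xxxx/... URLs
BARE_DOI = re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/[^\s)\]]+)")


def doi_links_to_text(markdown: str) -> str:
    """Replace doi.org links with plain 'DOI: <id>' text so nothing is auto-linked."""
    text = DOI_LINK.sub(lambda m: f"{m.group(1)} (DOI: {m.group(2)})", markdown)

    def bare(m):
        doi = m.group(1).rstrip(".,;")  # keep sentence punctuation outside the DOI
        return f"DOI: {doi}{m.group(1)[len(doi):]}"

    return BARE_DOI.sub(bare, text)
```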
These top-level modules were missing from the myst.yml navigation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Modules like pyrit.datasets that only re-export aliases were showing empty pages. Now aliases are rendered in a Re-exports section showing name → target path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
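Rendering the Re-exports section could look roughly like this. The sketch assumes the aliases have already been collected into a {public_name: target_path} dict; this PR page does not show how gen_api_md.py extracts them. The heading level and bullet layout are hypothetical.

```python
def render_reexports(aliases: dict) -> str:
    """Render a 'Re-exports' markdown section listing name → target path."""
    if not aliases:
        return ""  # no aliases, no section
    lines = ["## Re-exports", ""]
    for name, target in sorted(aliases.items()):
        lines.append(f"- `{name}` → `{target}`")
    return "\n".join(lines) + "\n"
```

Returning an empty string for modules without aliases means regular modules are unaffected.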
# Conflicts:
#   Makefile
#   build_scripts/gen_api_md.py
#   doc/myst.yml
The merge incorrectly reverted the artifact path from html back to site. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bibliography.md is the single source for all references. The .bib
file was not being consumed by any {cite} tags and duplicated all
entries. Remove it and the myst.yml bibliography config.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- references.bib is the single source of truth for all citations
- build_scripts/gen_bibliography.py parses .bib and generates doc/bibliography.md with proper formatting (sorted by author, commas between names, venue info, links)
- bibliography.md is now gitignored (regenerated during docs-build)
- Added gen_bibliography.py step to Makefile docs-build target
- Restored bibliography: references.bib config in myst.yml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
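The source of gen_bibliography.py is not shown on this page. As a rough, stdlib-only sketch of the parse-then-format step, the parser below handles only simple entries: braced, quoted, or bare values, with no @string macros and no # concatenation. The output line format is an assumption, all names are hypothetical, and sorting is omitted for brevity.

```python
import re

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_START = re.compile(r"\s*(\w+)\s*=\s*")


def _read_braced(text: str, start: int) -> tuple:
    """Return the contents of the balanced {...} group at text[start] and the index after it."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i], i + 1
    raise ValueError("unbalanced braces in .bib file")


def parse_bib(text: str) -> list:
    """Parse simple @type{key, field = {...} | "..." | bare, ...} entries."""
    entries = []
    for match in ENTRY_START.finditer(text):
        entry = {"type": match.group(1).lower(), "key": match.group(2)}
        pos = match.end()
        while (field := FIELD_START.match(text, pos)) is not None:
            name, pos = field.group(1).lower(), field.end()
            if text[pos] == "{":
                value, pos = _read_braced(text, pos)
            elif text[pos] == '"':
                end = text.index('"', pos + 1)
                value, pos = text[pos + 1 : end], end + 1
            else:
                bare = re.match(r"[^,}\s]+", text[pos:])
                value, pos = bare.group(0), pos + bare.end()
            entry[name] = " ".join(value.split())  # collapse line breaks
            comma = re.match(r"\s*,", text[pos:])
            if comma:
                pos += comma.end()
        entries.append(entry)
    return entries


def format_authors(author_field: str) -> str:
    """Turn 'Last, First and Last2, First2' into 'First Last, First2 Last2'."""
    names = []
    for raw in re.split(r"\s+and\s+", author_field):
        raw = raw.replace("{", "").replace("}", "").strip()
        if "," in raw:
            last, first = (part.strip() for part in raw.split(",", 1))
            raw = f"{first} {last}"
        names.append(raw)
    return ", ".join(n for n in names if n)


def to_markdown(entry: dict) -> str:
    """One bibliography line: authors (year). title. venue. link."""
    title = entry.get("title", "").replace("{", "").replace("}", "")
    venue = entry.get("journal") or entry.get("booktitle") or entry.get("publisher", "")
    line = f"- {format_authors(entry.get('author', ''))} ({entry.get('year', 'n.d.')}). **{title}**."
    if venue:
        line += f" *{venue}*."
    if "url" in entry:
        line += f" [{entry['url']}]({entry['url']})"
    return line
```

Emitting plain markdown lines like these renders every entry, which is the behavior a later commit on this PR returned to when the native directive truncated the list.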
Instead of a custom script, bibliography.md simply cites all .bib entries using pandoc syntax [@key1; @key2; ...]. MyST renders the full formatted bibliography natively with no custom code needed. references.bib remains the single source of truth. To add a new reference, add it to .bib and its key to bibliography.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Citations are needed to trigger MyST's reference rendering, but the raw citation keys are not useful to readers. Wrap them in a collapsed dropdown so only the formatted References section shows. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
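A sketch of generating such a page from the .bib keys follows. It collects every entry key into one pandoc-style group [@key1; @key2; ...] and places that group inside a MyST dropdown, which is collapsed by default. The function name, page title, and dropdown label are hypothetical.

```python
import re

BIB_KEY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
SKIP_TYPES = {"comment", "string", "preamble"}  # not citable entries


def bibliography_page(bib_text: str, title: str = "Bibliography") -> str:
    """Cite every .bib entry once, hidden in a collapsed dropdown."""
    keys = [key for kind, key in BIB_KEY.findall(bib_text) if kind.lower() not in SKIP_TYPES]
    # Pandoc-style citation group: [@key1; @key2; ...]
    citations = "[" + "; ".join(f"@{key}" for key in keys) + "]"
    return (
        f"# {title}\n\n"
        ":::{dropdown} Citation keys\n"
        f"{citations}\n"
        ":::\n"
    )
```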
MyST collapses the references list by default with a Show All button. Override via CSS to show all entries expanded on the bibliography page. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MyST's native {bibliography} directive truncates to 15 entries with
a JS-powered 'Show All' button that can't be auto-expanded via CSS.
Revert to generating bibliography.md from references.bib as pure
markdown, which renders all 51 entries without truncation.
references.bib is the source of truth; bibliography.md is generated
during docs-build and gitignored.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit cbcd326.
Added [@bibtex_key] citation tags alongside existing URLs in Python docstrings across 25 files (datasets, attacks, converters). These render as clickable bibliography citations in the MyST-generated API reference docs. Also removed CSS overrides that were hiding the bibliography Show All button. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
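Docstring citations like these can drift from references.bib. A small check, sketched below, could flag unknown keys. It is a hypothetical helper, not something this PR describes adding. It collects [@key] and [@a; @b] groups from module, class, and function docstrings via the standard ast module and compares them against a set of known .bib keys.

```python
import ast
import re

CITE_GROUP = re.compile(r"\[(@[^\]]+)\]")  # [@key] or [@a; @b]
CITE_KEY = re.compile(r"@([\w:.\-]+)")


def docstring_cite_keys(source: str) -> set:
    """Collect citation keys from module, class, and function docstrings."""
    keys = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                for group in CITE_GROUP.findall(doc):
                    keys.update(CITE_KEY.findall(group))
    return keys


def unknown_cite_keys(source: str, bib_keys) -> list:
    """Citation keys used in docstrings that are missing from the .bib file."""
    return sorted(docstring_cite_keys(source) - set(bib_keys))
```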
New bibliography entries: CBT-Bench, HarmfulQA, VLSU, Transphobia Awareness (Scheuerman et al.), and ANSI Escape Sequences (STÖK). Also fixed wrong arXiv ID for VLSU dataset (was 2501.01151 which is a physics paper, corrected to 2510.18214). Added cite tags to crescendo, skeleton_key, mlcommons, and character_space_converter docstrings. Updated bibliography.md citation list with all new keys. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added [@bibtex_key] citations to 8 user guide .py/.ipynb pairs: crescendo, flip attack, many-shot jailbreak, skeleton key, TAP, GCG auxiliary attacks, GPTFuzzer, and transparency attack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced markdown [text](url) links with plain text where a [@cite] tag is already present, avoiding two competing clickable elements. Added website URL to crescendo bibtex entry note field. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
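The substitution can be sketched as a line-level rewrite. The line-level scope is an assumption: on any line that already carries a [@...] citation, each inline [text](https://...) link is reduced to its text. Image links and cite tags are left alone. The function name is hypothetical.

```python
import re

HAS_CITE = re.compile(r"\[@[^\]]+\]")
# Inline links to http(s) URLs; excludes images (![...]) and cite tags ([@...]).
MD_LINK = re.compile(r"(?<!!)\[([^\]@][^\]]*)\]\((https?://[^)\s]+)\)")


def drop_links_next_to_citations(markdown: str) -> str:
    """Keep only the citation as the clickable element on lines that have one."""
    out = []
    for line in markdown.splitlines(keepends=True):
        if HAS_CITE.search(line):
            line = MD_LINK.sub(r"\1", line)  # keep link text, drop URL
        out.append(line)
    return "".join(out)
```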
…ich table Added [@bibtex_key] citations on first mention of techniques/datasets across architecture, attack, cookbook, converter, scenario, target, blog, and index pages (both .py and .ipynb). Extended doc/code/datasets/1_loading_datasets with a dataset overview table that extracts name, description, source URL, and citation from each provider's docstring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolved conflicts:
- build_scripts/gen_api_md.py: kept our short_title/heading changes, adopted main's backtick bases formatting
- doc/myst.yml: kept both pyrit_setup_initializers and pyrit_show_versions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removed the auto-generated table code cell. Instead, added inline citations for all 21 datasets with bibtex entries in the introductory markdown text. Kept the existing get_all_dataset_names() output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New entries: Aegis (Ghosh et al.), ALERT (Tedeschi et al.), garak (Derczynski et al.), MedSafetyBench (Han et al.), LLM-LAT (Sheshadri et al.), TDC23 (Mazeika et al.), CCP Sensitive Prompts (promptfoo), PromptIntel (Thomas Roccia), Red Team Social Bias (Simone Van Taylor). Added cite tags to all dataset docstrings and the loading datasets page. Fixed missing aliases variable in gen_api_md.py from merge. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
riyosha pushed a commit to riyosha/PyRIT that referenced this pull request on Mar 24, 2026: …ion (microsoft#1472) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR adds comprehensive bibliography and citation support to PyRIT's documentation using MyST's native BibTeX integration.
Changes
Bibliography infrastructure
- doc/references.bib: curated BibTeX file with 65 entries covering all referenced papers, datasets, blog posts, and tools
- doc/bibliography.md: bibliography page using MyST's {bibliography} directive with "Show All" expand button
- doc/css/custom.css: removed CSS overrides that were hiding the bibliography expand button
Citations in Python docstrings (API reference)
- [@bibtex_key] cite tags to docstrings across 35+ source files (datasets, attacks, converters)
Citations in user guide & cookbooks
- .py and matching .ipynb files for all changes
Datasets page enhancements
- doc/code/datasets/1_loading_datasets: added inline citations for 31 datasets with BibTeX entries
API reference improvements
- build_scripts/gen_api_md.py: added short_title frontmatter for cleaner sidebar navigation; simplified headings to show just class/function names (full signatures in code blocks below)
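The two API reference changes can be sketched as page fragments. The page gets frontmatter with a short_title for the sidebar, and each member gets a heading with just its name, followed by a fenced signature. Using the last dotted segment as the short title, the heading levels, and the helper names are all assumptions, not the script's actual output.

```python
FENCE = "`" * 3  # built indirectly so this snippet can sit inside a markdown fence


def api_page_header(module_path: str) -> str:
    """Frontmatter plus H1 for one module page; short_title keeps the sidebar compact."""
    short_title = module_path.rsplit(".", 1)[-1]
    return (
        "---\n"
        f"title: {module_path}\n"
        f"short_title: {short_title}\n"
        "---\n\n"
        f"# {module_path}\n"
    )


def api_member_section(name: str, signature: str) -> str:
    """Heading with just the member name; the full signature goes in a code block."""
    return f"## {name}\n\n{FENCE}python\n{signature}\n{FENCE}\n"
```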
New BibTeX entries added
Aegis (Ghosh et al.), ALERT (Tedeschi et al.), CBT-Bench (Zhang et al.), garak (Derczynski et al.), HarmfulQA (Chu et al.), LLM-LAT (Sheshadri et al.), MedSafetyBench (Han et al.), PromptIntel (Thomas Roccia), Red Team Social Bias (Simone Van Taylor), Transphobia Awareness (Scheuerman et al.), TDC23 (Mazeika et al.), VLSU (Palaskar et al.), ANSI Escape Sequences (Fredrik "STÖK" Alexandersson), CCP Sensitive Prompts (promptfoo)