
sheptools: Legal history utilities to manipulate downloaded Shepard's citation data

Last updated: March 27, 2017

NOTE: This toolkit was designed to be used with the downloaded output of "Lexis-Nexis Academic," which has been completely redesigned and rebranded as "Nexis Uni." The toolkit will not work with output from the new Nexis Uni site, and it appears that LexisNexis will phase out the old site around the end of calendar year 2017.


The sheptools collection of utilities was written by Eric Nystrom to make it possible to analyze how court cases have been used after their publication. The utilities rely on parsing information downloadable from the commercial "Shepard's Citations" service, which is owned by LexisNexis and is typically available to researchers whose institution subscribes to LexisNexis. (I am not affiliated in any way with LexisNexis or any other such service.)

The utilities are all licensed under the MIT License. In addition to this README, most of the scripts are excessively commented. If you find these useful, I'd be very interested to hear how you are using them (via email, GitHub, Twitter, website).


  • shep2csv: Parse citations from HTML-exported results from LexisNexis "Shepardize." (Perl)
  • shep-group: add a group column based on jurisdiction to the output of a Shepard's breakdown generated by shep2csv. Different jurisdiction maps can be specified; grouping maps for controlling circuit (MAP-control-circ.tsv) and for court level (MAP-court-levels.tsv) are provided. (Awk)
  • shep-group-state: evaluate jurisdiction and citation info to make groups by state. (Awk)
  • shep-group-pivot: a pivot-table-style yearly aggregator for output that has had a "group" added in the 10th column by shep-group or shep-group-state, pivoting on year information in column 8. (Awk)
  • shep-separate-citetypes: separate out combined citation-types in a single record into individual records, one citation-type for each. (Awk)
  • shep-separate-headnotes: separate out combined headnotes in a single record into individual records, skipping citations with no headnotes. (Awk)
  • shep-hn-pivot: a pivot-table-style yearly aggregator for headnote-separated output, to aggregate by year. (Awk)
  • shep-extract-headnotes: extract headnote text from HTML-exported case text from LexisNexis. (Perl)
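
As an illustration only of what a yearly pivot over grouped output involves, here is a minimal Python sketch. It is not the Awk code shipped with sheptools. It assumes TAB-separated input with a header row, the year in column 8, and the group in column 10, as described for shep-group-pivot above. The function names pivot_by_year and format_pivot are hypothetical, and the output layout (one row per year, one column per group) is an assumption that may differ from what the real tool prints.

```python
from collections import Counter


def pivot_by_year(tsv_text):
    """Count citing records per (group, year) in 10-column TAB-separated text.

    Assumes a header row, the year in column 8 and the group in column 10.
    Rows with fewer than 10 columns or a blank year are skipped.
    """
    counts = Counter()
    for line in tsv_text.splitlines()[1:]:  # [1:] skips the header row
        fields = line.split("\t")
        if len(fields) < 10 or not fields[7]:
            continue
        counts[(fields[9], fields[7])] += 1
    return counts


def format_pivot(counts):
    """Render counts as TAB-separated text: one row per year, one column per group."""
    groups = sorted({group for group, _ in counts})
    years = sorted({year for _, year in counts})
    lines = ["\t".join(["year"] + groups)]
    for year in years:
        cells = [str(counts[(group, year)]) for group in groups]
        lines.append("\t".join([year] + cells))
    return "\n".join(lines)
```

Given the text of a grouped file in a variable, print(format_pivot(pivot_by_year(text))) would produce a small table suitable for graphing; years with no citations in a group appear as 0.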


Download the scripts and place them somewhere in your $PATH, or call them with a complete path on the command line.

Dependencies: shep2csv and shep-extract-headnotes depend on Perl and the packages HTML::Strip, Text::Unidecode, and Text::Trim. These can be downloaded from CPAN or are provided in Debian by the libhtml-strip-perl, libtext-unidecode-perl, and libtext-trim-perl packages, respectively. The other tools need Awk -- they were developed using gawk but should work with other Awk implementations.

Typical usage

  1. Go to LexisNexis, find your case of interest (such as by using the "Look up a Legal Case" dropdown), then choose "Shepardize" from the "Next Steps" dropdown in the upper right, and click "Go." This brings you to the Shepard's report.
  2. Click the floppy disk icon in the upper right to begin downloading the report. This will pop up a new window. Choose "HTML" as the format and click the Download button. Once the report is ready, right-click on the link and choose "Save Link As..." to save the report to your local disk in HTML format.
  3. From a terminal, use shep2csv to rework the downloaded HTML file into a TAB-separated CSV file. It prints to standard output (the screen), so output may be saved using Unix-style redirection, or piped to other programs for further processing.
  4. Since Shepard's reports sometimes contain errors, you may wish to either save and manually edit the CSV output, or create an "errata" script (in Awk, for example) to remove or modify incorrect information.
  5. If desired, additional processing can be done -- for example, extracting then grouping based on headnotes or citation types, grouping citations by their jurisdiction or state, pivoting output based on groups to facilitate graphing, etc. Output can also be imported into a spreadsheet program or other statistical software. If the original case's text has also been downloaded, shep-extract-headnotes can be used to pull out the Headnote information in a more readable form.
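
The "errata" idea from step 4 could be sketched as follows. This is a hedged illustration in Python rather than Awk, and it is not part of sheptools. It assumes the TAB-separated output has a header row containing a "shepnum" column, and that records come from a single report so that shepnum alone identifies a record (combined reports would need to key on "cited_case" as well). The function name apply_errata and its parameters are hypothetical.

```python
def apply_errata(tsv_text, drop=(), fixes=None):
    """Return corrected TAB-separated text.

    drop  -- shepnum values (strings) of records to remove entirely
    fixes -- {shepnum: {column_name: new_value}} overrides for single fields
    """
    fixes = fixes or {}
    drop = set(drop)
    lines = tsv_text.splitlines()
    if not lines:
        return ""
    header = lines[0].split("\t")
    shep_idx = header.index("shepnum")
    out = [lines[0]]  # keep the header row unchanged
    for line in lines[1:]:
        fields = line.split("\t")
        # Pad short rows so every named column can be addressed.
        fields += [""] * (len(header) - len(fields))
        shepnum = fields[shep_idx]
        if shepnum in drop:
            continue
        for column, value in fixes.get(shepnum, {}).items():
            fields[header.index(column)] = value
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

For example, apply_errata(text, drop={"17"}, fixes={"42": {"year_num": "1925"}}) would remove record 17 and correct the year of record 42. Keeping corrections in a script like this, rather than editing the file by hand, means they can be reapplied to a freshly downloaded report, bearing in mind that shepnum values may shift as new citations are added.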

Output fields

By default, shep2csv provides nine columns of output in a TAB-separated format, with the first row of output containing a header with column names.

  1. "cited_case": The reference citation for the case being Shepardized. Within an individual report, this field is identical in every row, but it has been included to facilitate combining output from multiple reports if desired.

  2. "shepnum": The number given to this citation in the Shepardize report. These are unique within each Shepardize report, but since they are simply sequential, they might change from year to year as more cites are added to a case's Shepard's record. Because the number is unique within a report, it can be helpful for referring to an individual record when making your own corrections to the Shepard's data.

  3. "howused": A list of ways the case was used by the citing case (as determined by the experts at Shepard's), separated by commas. The terms have been lightly transformed from those used by Shepard's to save space and remove whitespace. Examples: Cited,Explained,DissentCite,Distinguished (etc.)

  4. "title": The title of the citing case.

  5. "reference": The reference to the citing case. Note: Shepard's frequently offers parallel citations; only the first one is reported here, and the other parallel citations are discarded.

  6. "jurisdiction": String representing the jurisdiction of the citing case, as represented by headers in the Shepardize report.

  7. "year_string": Year information, contained in parentheses, as provided in the Shepardize report. For many jurisdictions, this string contains useful information about the court that is not captured in the "jurisdiction" field, such as the state for a case from one of the federal circuits.

  8. "year_num": Four-digit year, as extracted from "year_string".

  9. "headnotes": A list of the headnote-abbreviations used by the citing case in the Shepardize report. Note that LexisNexis headnotes are numbered sequentially for each case (HN1, HN2, etc.) such that case A's HN3 bears no relation to case B's HN3. These headnote numbers are drawn from the cited case, not the citing case, and represent how the citing case used elements of the cited case.

  10. OPTIONAL: Several of the grouping programs above, such as shep-group and shep-group-state, create a tenth column. Other tools should attempt to detect and accommodate 10-column output where possible.
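
For readers who want to load this output into their own scripts, the column layout above could be read along these lines. This is a minimal Python sketch under stated assumptions, not part of sheptools: it names fields by position using the nine column names listed above, stores an optional tenth column under the key "group" (a name chosen here for illustration), and splits "howused" on commas as described. The function name parse_records is hypothetical.

```python
FIELDS = ["cited_case", "shepnum", "howused", "title", "reference",
          "jurisdiction", "year_string", "year_num", "headnotes"]


def parse_records(tsv_text):
    """Turn shep2csv-style TAB-separated text into a list of dicts.

    The first line is treated as a header and skipped. A tenth column,
    if present, is stored under the (illustrative) key "group".
    """
    records = []
    for line in tsv_text.splitlines()[1:]:
        if not line:
            continue
        values = line.split("\t")
        # Pad short rows so all nine documented fields are present.
        values += [""] * (len(FIELDS) - len(values))
        record = dict(zip(FIELDS + ["group"], values))
        # "howused" is a comma-separated list; blank means no terms.
        record["howused"] = [term for term in record["howused"].split(",") if term]
        # Blank or non-numeric years become None rather than raising.
        year = record["year_num"]
        record["year_num"] = int(year) if year.isdecimal() else None
        records.append(record)
    return records
```

Fields are split on TAB characters only, so quotation marks inside titles are left untouched. The "headnotes" field is kept as a plain string because its internal separator is not specified here.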


Shepardize reports sometimes contain incomplete information -- shep2csv tries to handle this as gracefully as possible, leaving fields blank when information is unknown, and copying the "reference" to the "title" field when "title" is otherwise blank.

There are likely bugs (please report them!), and formatting changes to the downloaded reports will probably break shep2csv.

These programs were developed on a Debian Linux system. Suggestions to enhance portability to other Unix-like systems would be welcomed.

"LexisNexis," "Headnote," "Shepard's," and "Shepardize" are undoubtedly trademarks of their respective owners, not me.