nasa/ROSES-Compliance-Checking-Tools

Background

This directory contains code for checking the compliance of proposals submitted to NASA ROSES calls.

Code author: Megan Ansdell @mansdell

Setup

Required Non-standard Packages

PyMuPDF: a package for extracting text from PDFs (confusingly, it is imported with import fitz)

Description

check_dapr_single.py

This code reads in an anonymized proposal for one of NASA's Dual-Anonymous Peer Review (DAPR) programs. It attempts to locate the references section (if not provided by the user) and then checks the reference format and searches for identifying team member information to help verify that the proposal is DAPR compliant.

The code requires two inputs (in this order) and can take two additional optional inputs:

  1. REQUIRED: Path to the anonymized proposal PDF. This is not the full proposal generated by NSPIRES.
  2. REQUIRED: Path to a file with the team member information (Team_Info_Path). There are two options for this:
    • CSV file with first names, last names, institutions, and cities of each team member (an example is provided in this repo) OR
    • The non-anonymized NSPIRES-generated proposal
  3. OPTIONAL: Start page of the references section in the PDF (otherwise the code will attempt to guess this)
  4. OPTIONAL: End page of the references section in the PDF (otherwise the code will attempt to guess this)
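
As a rough illustration of the input order described above, the positional arguments could be parsed as in the sketch below. This is not taken from check_dapr_single.py; the function name build_parser and the argument names anon_pdf, team_info_path, ref_start, and ref_end are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for two required paths and two optional page numbers.

    All names here are hypothetical; the actual script may read its
    inputs differently.
    """
    parser = argparse.ArgumentParser(
        description="Check an anonymized proposal for DAPR compliance."
    )
    parser.add_argument("anon_pdf", help="path to the anonymized proposal PDF")
    parser.add_argument(
        "team_info_path",
        help="path to the team info CSV or the full NSPIRES-generated PDF",
    )
    # nargs="?" makes a positional argument optional; None means "guess".
    parser.add_argument("ref_start", nargs="?", type=int, default=None,
                        help="start page of the references section")
    parser.add_argument("ref_end", nargs="?", type=int, default=None,
                        help="end page of the references section")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["./anonproposal.pdf", "./team_info.csv", "17", "21"]
)
print(args.anon_pdf, args.team_info_path, args.ref_start, args.ref_end)
```

When the page numbers are omitted, ref_start and ref_end stay None, which a script could treat as the signal to guess the references section itself.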

Example command line inputs with only required inputs (where you would replace the paths with your own):

    python check_dapr_single.py "./anonproposal.pdf" "./NSPIRES_Full_Proposal.pdf"
    python check_dapr_single.py "./anonproposal.pdf" "./team_info.csv"

Example command line with optional inputs for start and end pages of references section:

    python check_dapr_single.py "./anonproposal.pdf" "./NSPIRES_Full_Proposal.pdf" 17 21
    python check_dapr_single.py "./anonproposal.pdf" "./team_info.csv" 17 21

The code outputs the following:

  • Page ranges for the Scientific/Technical/Management (STM) and References sections

    • The code guesses these proposal sections by assuming the following order: STM, References, Other (e.g., budget).
    • The guesses are usually correct, but not always. This mainly matters when searching for team member names: if the code gets the references section wrong, it will probably incorrectly flag team member names as appearing in the main proposal text and/or miss DAPR violations in the budget section if the budget was improperly or incompletely redacted.
    • To avoid incorrect guesses, you can input the references start/end pages manually (see above).
  • Reference format

    • DAPR proposals are supposed to use bracketed number references, rather than "et al." references
    • The code reports the number of brackets and the number of "et al." usages found in the proposal (the former should be high, the latter should be zero)
  • Forbidden DAPR words

    • DAPR proposals shouldn't include any identifying team member information (names, institutions, cities, genders)
    • The code reports how many times such terms are found and the page numbers on which they are found
    • Note that if you use the NSPIRES option for inputting team member names, cities are not included as that info is not in the NSPIRES cover pages.
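
The reference-format and forbidden-word checks listed above can be approximated with regular expressions over the extracted page text. The following is a minimal sketch, not the repository's implementation: the function names, the patterns, and the sample text are all assumptions made for illustration, and the pages are assumed to have already been extracted into a list of strings.

```python
import re

# Hypothetical patterns: bracketed numeric citations such as [1] or [2, 3],
# and author-style "et al." citations.
BRACKET_REF = re.compile(r"\[\d+(?:\s*[,\-–]\s*\d+)*\]")
ET_AL = re.compile(r"\bet al\b\.?", re.IGNORECASE)


def count_reference_styles(text):
    """Return (number of bracketed references, number of 'et al.' usages)."""
    return len(BRACKET_REF.findall(text)), len(ET_AL.findall(text))


def find_forbidden_words(pages, words):
    """Map each word found to (total count, 1-based page numbers with hits).

    `pages` is a list of page texts; matching is case-insensitive and
    restricted to whole words.
    """
    hits = {}
    for word in words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        counts = [len(pattern.findall(text)) for text in pages]
        total = sum(counts)
        if total:
            hits[word] = (total, [i + 1 for i, c in enumerate(counts) if c])
    return hits


# Made-up sample text standing in for two extracted pages.
pages = [
    "We follow the method of [1] and [2, 3].",
    "Smith et al. observed this at Example University.",
]
n_brackets, n_et_al = count_reference_styles(" ".join(pages))
found = find_forbidden_words(pages, ["Smith", "Example University", "Jones"])
print(n_brackets, n_et_al, found)
```

Whole-word matching reduces false positives for short names, but common words that double as surnames will still be flagged, so hits need a manual look.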

check_dapr_multi.py

This is a version of check_dapr_single.py that can be used to check multiple proposals at a time.

The code requires two inputs (in this order):

  1. REQUIRED: Path to directory containing the anonymized proposal PDFs
  2. REQUIRED: Path to directory containing the full, non-anonymized NSPIRES-generated proposal PDFs

Note that the CSV option for the team member info is not available for this version.

Example command line inputs (where you would replace the paths with your own):

    python check_dapr_multi.py ./proposals_anon ./proposals_full

The output is the same as check_dapr_single.py, except that it first prints the name of the anonymized PDF being checked and the name of the non-anonymized PDF being used for the team info, so that you can make sure the correct files are being compared. If the number of anonymized PDFs does not equal the number of non-anonymized PDFs, the program will quit before doing anything further.
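
The count check and the printed file pairs described above might look like the sketch below. How check_dapr_multi.py actually matches files is not stated here, so pairing by sorted file name is an assumption, and pair_proposals is a hypothetical helper.

```python
import sys
from pathlib import Path


def pair_proposals(anon_dir, full_dir):
    """Pair anonymized and full PDFs, quitting if the counts differ.

    Assumption: files are matched by sorted file name order, so the two
    directories need consistent naming for the pairs to be correct.
    """
    anon = sorted(Path(anon_dir).glob("*.pdf"))
    full = sorted(Path(full_dir).glob("*.pdf"))
    if len(anon) != len(full):
        sys.exit(
            f"Found {len(anon)} anonymized and {len(full)} full PDFs; "
            "the counts must match."
        )
    pairs = list(zip(anon, full))
    for anon_pdf, full_pdf in pairs:
        # Print both names so the user can confirm the right files are compared.
        print(f"{anon_pdf.name}  <->  {full_pdf.name}")
    return pairs
```

Calling pair_proposals("./proposals_anon", "./proposals_full") would print one line per pair before any checks run.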

check_format_single.py

This code reads in a proposal (either the anonymized version or the full, non-anonymized NSPIRES-generated PDF), attempts to find the "Scientific / Technical / Management" section, and then checks ROSES formatting requirements (font size, lines-per-inch, characters-per-inch). Please make sure to read the ROSES solicitation and NASA Guidebook for Proposers carefully, as the formatting requirements may differ from those checked by this code.

The code requires one input: the path to the proposal PDF. Example command line input (where you would replace the path with your own):

    python check_format_single.py ./NSPIRES_Full_Proposal.pdf

The code outputs the following:

  • PI name and proposal number

    • These are taken from the cover page of the NSPIRES-formatted PDF
  • Font size

    • The median font size used in the proposal is calculated and output to the terminal
    • A histogram of the font sizes is saved to the current directory (the gray horizontal line indicates ~12-point font size)
  • Lines per inch (LPI) and characters per inch (CPI)

    • LPI is calculated per page; for pages with LPI > 5.5, the page number of the violation and the LPI value are provided.
    • CPI is calculated per line; the number of pages for which CPI > 16.0 is provided, along with snippets of the line text.
    • Note that PDF formats are weird and not inherently machine readable, so these calculations are not exact and results should be checked carefully. For this reason, the LPI and CPI limits used in the code are purposefully lenient compared to the current ROSES requirements, so the code will only report blatant violations (or weird PDF formats that could not be read properly).
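
To make the LPI and CPI quantities concrete, here is a small sketch of how they could be estimated from line positions and widths measured in PDF points (72 points per inch). It is only an illustration: the function names, the use of the median line spacing, and the sample numbers are assumptions. Only the 5.5 and 16.0 limits come from the description above.

```python
from statistics import median

POINTS_PER_INCH = 72.0
LPI_LIMIT = 5.5   # lenient limit quoted above
CPI_LIMIT = 16.0  # lenient limit quoted above


def lines_per_inch(line_positions_pt):
    """Estimate LPI from the vertical positions (in points) of a page's lines.

    Assumption: the median gap between consecutive lines is used, which
    makes the estimate robust to paragraph breaks and figures.
    """
    ys = sorted(line_positions_pt)
    gaps = [b - a for a, b in zip(ys, ys[1:]) if b > a]
    if not gaps:
        return None  # fewer than two distinct lines on the page
    return POINTS_PER_INCH / median(gaps)


def chars_per_inch(line_text, line_width_pt):
    """Estimate CPI for one line from its text and rendered width in points."""
    if line_width_pt <= 0:
        return None
    return len(line_text) / (line_width_pt / POINTS_PER_INCH)


# Made-up example: lines spaced 12 points apart give 6 lines per inch.
lpi = lines_per_inch([72, 84, 96, 108, 120])
cpi = chars_per_inch("x" * 90, 468.0)  # 90 characters across 6.5 inches
print(lpi, lpi > LPI_LIMIT)
print(cpi, cpi > CPI_LIMIT)
```

In this example the page would be flagged for LPI but not for CPI.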

Disclaimer

This is not an official NASA product.

Contact Megan Ansdell @mansdell with questions.
