Provide documentation and examples on the use of redactions in JSON #446

Closed
philipashlock opened this issue Feb 27, 2015 · 10 comments

Comments

@philipashlock
Contributor

The general guidance on redactions for federal agencies is as follows, but we need to provide examples of what this looks like as JSON.

| Redaction text | Brief FOIA exemption description |
| --- | --- |
| [[REDACTED-EX B3]] | Specifically exempted from disclosure by statute (other than FOIA), provided that such statute (A) requires that the matters be withheld from the public in such a manner as to leave no discretion on the issue, or (B) establishes particular criteria for withholding or refers to particular types of matters to be withheld. |
| [[REDACTED-EX B4]] | Trade secrets and commercial or financial information obtained from a person and privileged or confidential. |
| [[REDACTED-EX B5]] | Inter-agency or intra-agency memorandums or letters which would not be available by law to a party other than an agency in litigation with the agency. |
| [[REDACTED-EX B6]] | Personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy. |

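
As a rough illustration of how the redaction text in the table above could be recognized by a script, here is a minimal Python sketch. The regular expression assumes markers always have the shape `[[REDACTED-EX <code>]]`, and the short labels in `EXEMPTIONS` are abbreviated paraphrases of the table rows; both the names and the pattern are assumptions for illustration, not part of any official tooling.

```python
import re

# Exemption codes from the table above; labels are shortened paraphrases.
EXEMPTIONS = {
    "B3": "Exempted from disclosure by another statute",
    "B4": "Trade secrets and confidential commercial or financial information",
    "B5": "Privileged inter-agency or intra-agency memorandums or letters",
    "B6": "Personnel, medical, and similar files (personal privacy)",
}

# Assumed marker shape: [[REDACTED-EX <code>]]
MARKER = re.compile(r"\[\[REDACTED-EX (B\d+)\]\]")


def find_exemptions(text):
    """Return the exemption codes cited in a string, in order of appearance."""
    return MARKER.findall(text)


def describe(code):
    """Look up a short label for a code, with a fallback for unknown codes."""
    return EXEMPTIONS.get(code, "unknown exemption code")


codes = find_exemptions("U.S. Widget Statistics for [[REDACTED-EX B4]]")
for code in codes:
    print(code, "-", describe(code))
```
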
@philipashlock
Contributor Author

This has started to be implemented with GSA/project-open-data-dashboard#83, so that the example below (taken from the existing example) would pass schema validation.

Note that the particular exemption reason denoted by "B3" in the example used [[REDACTED-EX B3]] might not make sense in some of the places it's used. More generally, the places where the redactions are used in this example might not make sense given the descriptions used in the other fields. The example here is intended only to demonstrate what the redaction text would look like in the JSON syntax.

It's worth considering whether some fields might never need to be redacted, e.g., accessLevel, identifier, isPartOf, bureauCode, and programCode. With a traditional redacted paper document, I imagine the page numbers are never redacted, even if the full page is. Similarly, it seems necessary to retain the identifier even if everything else is redacted, so that you can at least distinguish between different redacted records.

{
    "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
    "@id": "http://www.agency.gov/data.json",
    "@type": "dcat:Catalog",    
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema", 
    "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
    "dataset": [
        {
            "@type": "dcat:Dataset",
            "accessLevel": "non-public", 
            "accrualPeriodicity": "R/P1Y", 
            "bureauCode": [
                "018:10"
            ],
            "conformsTo": "http://www.agency.gov/widget-taxonomy/",
            "contactPoint": {
                "@type": "vcard:Contact",
                "fn": "Jane Doe", 
                "hasEmail": "mailto:jane.doe@agency.gov"
            }, 
            "describedBy": "http://www.agency.gov/datasets/widgets-dictionary.html", 
            "dataQuality": true, 
            "description": "This dataset provides national statistics on the production of widgets for [[REDACTED-EX B4]]", 
            "distribution": [
                {
                    "@type": "dcat:Distribution",
                    "description": "[[REDACTED-EX B4]] widgets data as a CSV file", 
                    "downloadURL": "[[REDACTED-EX B4]]", 
                    "format": "CSV", 
                    "mediaType": "text/csv", 
                    "title": "[[REDACTED-EX B4]]-widgets.csv"
                }
            ], 
            "identifier": "https://metadata.agency.gov/10.7927/H4PZ56R2", 
            "issued": "2011-11-22", 
            "keyword": [
                "widget", 
                "manufacturing", 
                "factory"
            ], 
            "landingPage": "http://agency.gov/widgets/data", 
            "language": [
                "en-US"
            ], 
            "license": null, 
            "modified": "2011-11-19T12:00:00Z", 
            "primaryITInvestmentUII": "021-006227212", 
            "programCode": [
                "018:001"
            ], 
            "publisher": {
                "@type": "org:Organization",
                "name": "Widget Services", 
                "subOrganizationOf": {
                    "@type": "org:Organization",
                    "name": "Office of Widget Statistics"                    
                }
            }, 
            "references": [
                "https://agency.gov/docs/widgets-1.html", 
                "https://agency.gov/docs/widgets-2.html"
            ], 
            "rights": "This dataset cannot be made public because it includes trade secrets and commercial or financial information obtained from a person and is privileged or confidential.", 
            "spatial": "United States", 
            "systemOfRecords": "http://www.agency.gov/widgets/sorn/", 
            "temporal": "2009-09-01T12:00:00Z/2010-05-31T12:00:00Z", 
            "theme": [
                "manufacturing"
            ], 
            "title": "U.S. Widget Statistics for [[REDACTED-EX B4]]"
        }
    ]
}
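
To see at a glance which fields of a catalog like the one above carry redactions, one could walk the parsed JSON and report every string value containing a marker. The following is only a sketch: the function name `redacted_paths`, the path notation, and the trimmed-down `catalog` dictionary are assumptions made for illustration.

```python
import re

# Assumed marker shape: [[REDACTED-EX <code>]]
MARKER = re.compile(r"\[\[REDACTED-EX (B\d+)\]\]")


def redacted_paths(node, path=""):
    """Yield (path, codes) for every string value that contains a marker."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from redacted_paths(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from redacted_paths(value, f"{path}[{index}]")
    elif isinstance(node, str):
        codes = MARKER.findall(node)
        if codes:
            yield path, codes


# A trimmed-down stand-in for the example catalog above.
catalog = {
    "dataset": [
        {
            "identifier": "https://metadata.agency.gov/10.7927/H4PZ56R2",
            "title": "U.S. Widget Statistics for [[REDACTED-EX B4]]",
            "distribution": [{"downloadURL": "[[REDACTED-EX B4]]"}],
        }
    ]
}

found = list(redacted_paths(catalog))
for path, codes in found:
    print(path, codes)
```

A list like `found` could also be compared against fields an agency decides should never be redacted, such as `identifier`, if that suggestion were adopted.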

@rebeccawilliams
Contributor

I think all fields should be redacted with a presumption of openness. This is in line with the federal FOIA policy. An example that reflects this would be useful too.

@rebeccawilliams
Contributor

Including the presumption of openness language (above) and DOT's PDL as a best practice would be good additions to this guidance as well.

@jlberryhill
Contributor

Thanks, guys, and great example @philipashlock. I'd also note that certain parts of a field can be redacted rather than the whole field, if only certain words are subject to a FOIA exemption. I agree with @rebeccawilliams on the presumption of openness. I think agencies should not redact entire metadata records, and there may be some fields that would never make sense to redact.
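
A partial redaction of this kind can be sketched in a few lines of Python. The helper name `redact_phrase` and the phrase "Acme Corp" are hypothetical, chosen only to show a single phrase being replaced while the rest of the field stays readable.

```python
def redact_phrase(text, phrase, code):
    """Replace each occurrence of `phrase` with a redaction marker for `code`."""
    return text.replace(phrase, f"[[REDACTED-EX {code}]]")


# "Acme Corp" is a made-up sensitive phrase for this illustration.
original = "Acme Corp widgets data as a CSV file"
redacted = redact_phrase(original, "Acme Corp", "B4")
print(redacted)
```

Only the exempt words are replaced, which keeps the field as open as the exemption allows.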

@bpushed
Contributor

bpushed commented Mar 4, 2015

Greetings all -- As a foreign assistance agency, USAID is exempt from releasing data per the seven principled exceptions outlined in OMB 12-01 (see Attachment 1, page 4).

When we issued our open data policy, this is the guidance we provided to our staff for justifying exemptions. Our FOIA office agrees that these do not conflict with FOIA, but I wanted to flag this issue so that we can adopt an approach that keeps both documents in mind. Thanks.

@konklone
Contributor

konklone commented Mar 4, 2015

@bpushed I believe that still means USAID needs to express those exemptions in the form of redacted JSON, with individualized determinations for each field and catalog entry.

@bpushed
Contributor

bpushed commented Mar 5, 2015

Thanks. That is essentially our plan. For the Sunlight Foundation FOIA request, we were asked specifically to use FOIA exemptions but would plan to revert to OMB 12-01 moving forward.

@bbrotsos

We plan to apply redactions (if any) only in the PDL and to leave the EDI with the full description. This will become increasingly difficult to manage without additional metadata tags to automate generating the PDL from the EDI. However, if you add metadata tags for redaction, the simplicity of the POD Schema would be lost.

Is there an equivalent way to put inline tags on text in a JSON field, as in XML? For example:

    "description": "<Redacted type='exb4'>Non Public Title</> widgets data as a CSV file"

The only equivalent way I can think of to do this in json is:

    "description_redacted": "[[REDACTED-EX B4]] Non Public Title widgets data as a CSV file",
    "description": "Non Public Title widgets data as a CSV file"

This would needlessly complicate the schema. Could Agencies submit both the PDL and EDI redacted?
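
One way to avoid extra schema fields, sketched here purely as an assumption, is to keep the redaction rules in a separate structure outside the metadata and derive the PDL record from the EDI record. The function `build_pdl_record`, the shape of `rules`, and the sample record are all hypothetical; this is not necessarily the approach taken in any agency tooling.

```python
import copy


def build_pdl_record(edi_record, rules):
    """Return a redacted copy of an EDI record.

    `rules` maps a top-level field name to a list of (phrase, code) pairs.
    Only top-level string fields are handled in this sketch.
    """
    pdl_record = copy.deepcopy(edi_record)
    for field, replacements in rules.items():
        value = pdl_record.get(field)
        if not isinstance(value, str):
            continue  # skip missing or non-string fields
        for phrase, code in replacements:
            value = value.replace(phrase, f"[[REDACTED-EX {code}]]")
        pdl_record[field] = value
    return pdl_record


# Hypothetical EDI record and sidecar rules kept outside the schema.
edi = {
    "identifier": "example-1",
    "description": "Non Public Title widgets data as a CSV file",
}
rules = {"description": [("Non Public Title", "B4")]}

pdl = build_pdl_record(edi, rules)
print(pdl["description"])
```

Because the rules live beside the data rather than inside it, the EDI stays unredacted and the schema itself does not change.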

@rebeccawilliams
Contributor

@bbrotsos Following up on this thread -- PDLs @ /data.json should include non-public datasets including any required redactions. If redactions are present, an unredacted copy must also be submitted to OMB Max.

I think that was clear, but wanted to record that in this issue. Closing this issue as guidance is live: https://project-open-data.cio.gov/redactions/

New issues or pull requests to clarify that guidance are encouraged though.

@philipashlock
Contributor Author

@bbrotsos For what it's worth, this is what we're going to try for inventory.data.gov - GSA/enterprise-data-inventory#182 (comment)
