Define Dataset: State Incarceration (Crime and Justice) #38

Open
emily878 opened this Issue Mar 6, 2015 · 7 comments

Contributor

emily878 commented Mar 6, 2015

Define the essential substantive elements of the core State Incarceration dataset. What are the components that it must minimally include? Do we have a dataset that we could hold up as a model?

Contributor

emily878 commented Mar 13, 2015

Hey, @beccasjames - could you help us out with a list of minimal necessary data elements?

beccasjames commented Mar 20, 2015

For one particular state incarceration dataset, inmate population data, a core dataset would:

  • be regularly updated and archived, daily or weekly preferred
  • include the number of inmates in each facility
  • include the percentage of capacity at which each facility is operating
  • include the numerical and percent change in population from the same time period of the previous year
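
As a rough illustration of the last two elements, here is a minimal Python sketch of how percent of capacity and year-over-year change could be derived from raw counts. The field names and the sample figures are hypothetical, not taken from any state's report, and the sketch assumes "capacity" means a single design-capacity number per facility.

```python
from dataclasses import dataclass


@dataclass
class FacilityCount:
    """One facility's population for a reporting period (hypothetical fields)."""
    facility: str
    population: int             # inmates currently housed
    design_capacity: int        # assumed to be a single, nonzero number
    population_prior_year: int  # count for the same period one year earlier


def percent_of_capacity(row: FacilityCount) -> float:
    """Population as a percentage of design capacity."""
    return 100.0 * row.population / row.design_capacity


def year_over_year_change(row: FacilityCount) -> tuple[int, float]:
    """Return (numerical change, percent change) against the prior year."""
    delta = row.population - row.population_prior_year
    return delta, 100.0 * delta / row.population_prior_year


# Made-up numbers, for illustration only.
row = FacilityCount("Example State Prison", 3300, 3000, 3520)
print(percent_of_capacity(row))    # 110.0
print(year_over_year_change(row))  # (-220, -6.25)
```

If a publisher provides the raw counts, these derived figures can be recomputed by anyone, so the counts matter more than the percentages.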

A model (or "token") dataset can be found at the California Department of Corrections and Rehabilitation (CDCR). They produce weekly and monthly population reports for both inmate and parole populations, including an extensive archive: http://www.cdcr.ca.gov/Reports_Research/Offender_Information_Services_Branch/Population_Reports.html

Further, an ideal inmate population dataset would:

  • be in a machine-readable format (.csv)
  • include breakdown of how many inmates are in maximum, medium and minimum security units as well as how many are in solitary confinement

As of now, I have yet to identify a state that fulfills all of these requirements. If I find one, I will post an update.
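
To make the "ideal" version more concrete, here is one possible shape for a machine-readable file, sketched in Python with the standard `csv` module. The column names, the date, and all numbers are assumptions for illustration; nothing in this thread prescribes a schema, and whether solitary confinement is counted inside or alongside the security-level totals would need to be defined by the publisher.

```python
import csv
import io

# Hypothetical column names, one row per facility per reporting date.
FIELDS = [
    "report_date", "facility", "population", "design_capacity",
    "minimum_security", "medium_security", "maximum_security",
    "solitary_confinement",
]


def write_population_csv(rows: list[dict]) -> str:
    """Serialize facility rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample row, for illustration only.
sample = [{
    "report_date": "2015-03-18",
    "facility": "Example State Prison",
    "population": 3300,
    "design_capacity": 3000,
    "minimum_security": 900,
    "medium_security": 1600,
    "maximum_security": 800,
    "solitary_confinement": 120,  # assumed to overlap with the levels above
}]
csv_text = write_population_csv(sample)
print(csv_text)
```

Keeping one row per facility per report date means the archive requirement is met by appending rows rather than publishing a new file layout each week.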


Contributor

waldoj commented Mar 20, 2015

Is it desirable, or even possible, to have identifiable, per-prisoner granularity?

Contributor

emily878 commented Mar 20, 2015

Becca and I talked about that and I personally don't think we want that as our first cut at a dataset. It will increase the visibility of people's PII in a way that I think will be problematic for the project.

On Fri, Mar 20, 2015 at 4:16 PM, Waldo Jaquith wrote:

> Is it desirable, or even possible, to have identifiable, per-prisoner granularity?



Emily Shaw
National Policy Manager | Sunlight Foundation |
(o) 202-742-1520 x 282 | (c) 207-233-5684
@emilydshaw http://twitter.com/emilydshaw

beccasjames commented Mar 20, 2015

Echoing Emily here, the PII shared with inmate-level micro-data is potentially problematic. A few states actually do produce extensive, machine-readable datasets with inmate-level micro-data. If you're interested in what those look like, see examples below:


Contributor

waldoj commented Mar 20, 2015

Got it—thank you!

Contributor

waldoj commented Mar 20, 2015

That Nebraska data is the weirdest thing. It's an Excel spreadsheet with two worksheets—one with 60,000 records, one with a suspicion-inducing 65,535—that contain just one row, with one number in each row. I feel a bit like I just bought a hard drive at Best Buy, got it home, opened the box, and found only a brick inside.
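
One possible reason 65,535 looks suspicious: the legacy .xls format allows at most 65,536 rows per worksheet, so 65,535 records plus a single header row would exactly fill a sheet, which often signals a truncated export. That is only a guess about this file; the sketch below assumes one header row, and the function name is made up for illustration.

```python
XLS_MAX_ROWS = 2 ** 16  # 65,536 rows per worksheet in the legacy .xls format


def looks_truncated(record_count: int, header_rows: int = 1) -> bool:
    """True when records plus header rows exactly fill a legacy .xls sheet."""
    return record_count + header_rows == XLS_MAX_ROWS


print(looks_truncated(65_535))  # True
print(looks_truncated(60_000))  # False
```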
