conflation of USPS zip codes and US Census zip code tabulation areas (ZCTAs) #176

Closed
derekeder opened this issue Apr 29, 2014 · 9 comments

Comments

@derekeder
Contributor

commented Apr 29, 2014

@robparal pointed out that the zip codes he has been using to aggregate US Census data are different from the zip codes on the Health Atlas.

US Postal Service (USPS) Zip Codes

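The Health Atlas uses zip code boundaries from the City of Chicago data portal (https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-ZIP-Codes-KML/syyx-k68t), which are the US Postal Service (USPS) zip codes.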

screen shot 2014-04-29 at 9 57 21 am

US Census Zip Code Tabulation Areas (ZCTAs)

The US Census Bureau also has its own 'zip code-like' areas called Zip Code Tabulation Areas (ZCTAs). They are built from census blocks for the purpose of aggregating census demographics.
screen shot 2014-04-28 at 2 52 53 pm

We are likely using both

We need to do an audit of our zip code data, as I believe we are using these two zip code boundaries interchangeably. Here's my assessment:

  • Health Atlas zipcode boundaries: using USPS Zip Codes
  • CHITREC data: this comes from patient records, so the likely boundaries are what people use as their zip code address, which are the USPS Zip Codes
  • Chicago Dept of Public Health (chronic diseases): unknown, but likely US Census ZCTAs
  • Population demographics: very likely US Census ZCTAs, as it would be very difficult to roll up these demographics otherwise

Moving forward, we should pick one kind of zip code boundary and use it consistently. There are arguments for both.

  • When someone tells you their zip code, it is the USPS zip code. This is useful for data coming from patient records and businesses.
  • If we want to show demographic information, the US Census can only provide it for ZCTAs, as they are derived from census blocks and don't change over time as much as USPS zip codes.
@derekeder derekeder added the bug label Apr 29, 2014
@robparal


commented Apr 30, 2014

@derekeder @KylaWilliams Regarding the problem of two different types of
zip codes (one used for census and possibly CDPH data, the other for
CHITREC and the upcoming hospital discharge data): how about we leave
everything as is and continue to use the postal zip codes for mapping
purposes, but annotate census and other data along these lines:

"These data are for Zip Code Tabulation Areas (ZCTAs) which differ from
postal zip codes used for mailing. To see the difference between ZCTAs
and postal zip codes, click here for a pair of maps."


derekeder pushed a commit to datamade/chicago-atlas that referenced this issue May 1, 2014
@derekeder

Contributor Author

commented May 1, 2014

I extracted and uploaded a GeoJSON file for Chicago ZCTAs which we can link to in @robparal's proposed annotation

https://github.com/smartchicago/chicago-atlas/blob/master/db/import/chicago_zctas.geojson
screen shot 2014-05-01 at 12 15 17 pm
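
As a starting point for the audit, here is a minimal sketch of how one might list which codes appear in only one of the two boundary sets. The inline samples are made-up stand-ins for the real files, and the property names `ZIP` and `ZCTA5CE10` are assumptions; check the actual GeoJSON properties before relying on this.

```python
import json

# Made-up stand-ins for the two boundary files. In practice these strings
# would be replaced by json.load() calls on the USPS and ZCTA GeoJSON files.
# The property names "ZIP" and "ZCTA5CE10" are assumptions, and the codes
# "00001" and "00002" are placeholders, not real Chicago codes.
USPS_SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"ZIP": "60624"}, "geometry": null},
  {"type": "Feature", "properties": {"ZIP": "00002"}, "geometry": null}
]}
"""
ZCTA_SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"ZCTA5CE10": "60624"}, "geometry": null},
  {"type": "Feature", "properties": {"ZCTA5CE10": "00001"}, "geometry": null}
]}
"""


def codes(collection, prop):
    """Return the set of codes stored under `prop` in a FeatureCollection."""
    return {feature["properties"][prop] for feature in collection["features"]}


def compare_codes(usps_codes, zcta_codes):
    """Split the codes into USPS-only, ZCTA-only, and shared groups."""
    return {
        "usps_only": sorted(usps_codes - zcta_codes),
        "zcta_only": sorted(zcta_codes - usps_codes),
        "shared": sorted(usps_codes & zcta_codes),
    }


usps = codes(json.loads(USPS_SAMPLE), "ZIP")
zcta = codes(json.loads(ZCTA_SAMPLE), "ZCTA5CE10")
report = compare_codes(usps, zcta)
print(report)
```

Note that a shared code only means the same five digits exist in both sets; the boundaries behind that code can still differ.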

@derekeder

Contributor Author

commented May 1, 2014

Looked into the Chicago Dept of Public Health chronic diseases data. According to the dataset documentation, they are aggregated by US Postal Service zip code.

From https://data.cityofchicago.org/api/assets/4B3F1B0A-43FA-4BAF-9658-1DBDFB94E6F8:

Brief Description: This dataset contains the annual number of hospital discharges, crude
hospitalization rates with corresponding 95% confidence intervals, and age-adjusted hospitalization
rates (per 10,000 children and adults aged 5 to 64 years) with corresponding 95% confidence
intervals, for the years 2000 – 2011, by Chicago U.S. Postal Service ZIP code or ZIP code aggregate

With this, I believe the only data we are using that uses ZCTAs is the zip code population demographic data.

@JamyiaClark could you confirm this? If so, the rates calculated for Asthma Hospitalizations and Diabetes Hospitalizations take counts aggregated by USPS zip code and divide them by US Census ZCTA population data. This would lead to some inaccuracy, right?
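
To make the concern concrete, here is a rough sketch of how a mismatched denominator shifts a crude rate per 10,000. The discharge count and both population figures are hypothetical numbers chosen only for illustration; they are not taken from the CDPH dataset or the Census.

```python
def rate_per_10k(discharges, population):
    """Crude rate per 10,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return discharges / population * 10_000


def relative_error(discharges, zip_population, zcta_population):
    """Relative error from dividing by the ZCTA population when the
    discharges were counted within the USPS zip code boundary."""
    true_rate = rate_per_10k(discharges, zip_population)
    estimated_rate = rate_per_10k(discharges, zcta_population)
    return (estimated_rate - true_rate) / true_rate


# Hypothetical figures: the same discharges, two different denominators.
discharges = 120
zip_population = 40_000   # assumed population inside the USPS boundary
zcta_population = 38_000  # assumed population inside the ZCTA boundary

error = relative_error(discharges, zip_population, zcta_population)
print(f"relative error in the rate: {error:.1%}")
```

The count cancels out of the relative error, which reduces to `zip_population / zcta_population - 1`, so the size of the inaccuracy depends only on how far the two populations diverge.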

@robparal


commented May 1, 2014

That would lead to some inaccuracy. It might be acceptable with proper
annotation.


derekeder pushed a commit to datamade/chicago-atlas that referenced this issue May 2, 2014
@derekeder

Contributor Author

commented May 2, 2014

Added disclaimer to zip code detail page

http://chicago-atlas-staging.herokuapp.com/place/60624
screen shot 2014-05-01 at 11 28 05 pm

I'd like to wait to hear back from @JamyiaClark before closing this issue

@derekeder derekeder closed this May 6, 2014
@janahirschtick


commented May 19, 2015

Hi guys,
I wanted to chime in here since I have been facing this issue for some time and haven't found a good solution yet. I'm an epidemiologist and often need to calculate disease rates and hospitalizations by community (zip). As you mentioned above, data on hospitalizations is provided by USPS zip code. However, I need population data for the denominator to calculate a rate. I was using zip code population estimates from MCIC for a while but these data are now outdated and didn't include other demographic variables of interest for modeling. So, currently I'm planning to use census ZCTA data for the denominator.

I believe this is what CDPH does as well for their hospitalization rates, after making minor adjustments and aggregating the numerator data for a few small zip codes. I'm planning on making a note about this limitation but can't figure out any other way around it. If you have any suggestions please let me know!

Thanks,
Jana

@robparal


commented May 19, 2015

Hi Jana: An issue is the extent to which the Zip Code Tabulation Areas of
the Census Bureau differ from the postal zip codes. They do, but the
difference may be minor enough to not matter substantially, depending on
your tolerance for inexactitude. I once overlaid ZCTAs on postal zips
in Chicago and looked very carefully at them. My memory is that the postal
zips are drawn in a way that you could not recreate them with blocks; they
do not strictly follow block boundaries.

Rob


@janahirschtick


commented May 20, 2015

Thanks Rob. Yes, I know zips and ZCTAs don't align perfectly, but I haven't figured out a way to get around this. Perhaps I will try overlaying them as well to take a closer look so I can discuss it in my limitations.

Jana
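
For the overlay idea discussed above, one rough way to put a number on the mismatch is the share of a postal zip's area that also falls inside the same-numbered ZCTA. The sketch below estimates that share by grid sampling with a ray-casting point-in-polygon test. It assumes simple polygons in planar (projected) coordinates, and the two rectangles are toy shapes, not real boundaries; a GIS library would be the more usual tool for real data.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the simple polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlap_share(zip_polygon, zcta_polygon, steps=200):
    """Approximate fraction of the zip polygon's area covered by the ZCTA,
    using a steps x steps grid of sample points over the zip bounding box."""
    xs = [p[0] for p in zip_polygon]
    ys = [p[1] for p in zip_polygon]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    in_zip = 0
    in_both = 0
    for i in range(steps):
        x = min_x + (i + 0.5) * (max_x - min_x) / steps
        for j in range(steps):
            y = min_y + (j + 0.5) * (max_y - min_y) / steps
            if point_in_polygon(x, y, zip_polygon):
                in_zip += 1
                if point_in_polygon(x, y, zcta_polygon):
                    in_both += 1
    if in_zip == 0:
        raise ValueError("no sample points fell inside the zip polygon")
    return in_both / in_zip


# Toy shapes: a 4 x 4 "postal zip" and a "ZCTA" shifted one unit to the right.
toy_zip = [(0, 0), (4, 0), (4, 4), (0, 4)]
toy_zcta = [(1, 0), (5, 0), (5, 4), (1, 4)]

share = overlap_share(toy_zip, toy_zcta)
print(f"share of zip area inside the ZCTA: {share:.2f}")
```

Area overlap is only a proxy for the denominator problem, since population is not spread evenly within a zip code.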

@robparal


commented May 20, 2015

Good luck, Jana, and let us know if we might be able to help.
Rob

