
2017-2021 SVI for ZCTAs #12

Closed · usamabilal opened this issue Jan 12, 2023 · 12 comments
@usamabilal

  • years: 2017-2021
  • geo-unit: zcta
  • what areas you are interested in: Pennsylvania

Thanks!

@heli-xu (Owner) commented Feb 10, 2023

Hi Usama, thanks for your patience. I'm attaching a zip file with data, a report, and some documentation. Please refer to the README in the zip file for more detailed information. I'd appreciate any suggestions and feedback from you and your team. Thanks!

2017to2021_PA_zcta_SVI.zip

@usamabilal (Author)

Thanks so much, Heli!! I'll review and let you know how things go.

@usamabilal (Author)

After reviewing, this looks great. I really like the validation. I understand that part of the differences in the validation stem from potential differences in aggregation from CT to ZCTA.
Let me know if my understanding below is correct:

  • Heli's SVI (henceforth hSVI) downloads data (using tidycensus) directly at the ZCTA level (geography = "zcta") and then follows the same procedure as CDC's SVI (henceforth cSVI).
  • To validate, you first aggregate cSVI from CT to ZCTA. To do this, there would be two options (see the sketch after this list):
    1. sum the E_ variables and take the mean of the EP_ variables, then compute percentiles and follow the regular SVI calculation
    2. take the mean of the percentiles
  • In the section "Aggregating ct data to ZCTA level" I understood you are doing 1, but in the code for "Percentile ranking (“RPL_xx”) by theme" I see option 2. Which one is happening?
  • Then, once it is aggregated, you just compare hSVI to cSVI.
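A minimal sketch of the two options, assuming a tract-level cSVI table `ct_svi`, a tract-to-ZCTA crosswalk `xwalk`, and illustrative column names (`GEOID`, `ZCTA`, `E_POV`, `EP_POV`, `RPL_THEMES` follow the CDC naming pattern but are assumptions here, not the repo's actual code):

```r
library(dplyr)

# joined: one row per tract-ZCTA pair (a tract can map to several ZCTAs)
joined <- left_join(ct_svi, xwalk, by = "GEOID")

# Option 1: aggregate the raw inputs, then re-rank at the ZCTA level.
opt1 <- joined |>
  group_by(ZCTA) |>
  summarise(
    E_POV  = sum(E_POV, na.rm = TRUE),   # counts (E_): sum
    EP_POV = mean(EP_POV, na.rm = TRUE)  # percentages (EP_): mean
  ) |>
  # ...then recompute percentile ranks with the CDC-style formula
  # (rank - 1) / (n - 1):
  mutate(EPL_POV = (rank(EP_POV, ties.method = "min") - 1) / (n() - 1))

# Option 2: keep cSVI's tract-level percentile ranks and just average
# them within each ZCTA.
opt2 <- joined |>
  group_by(ZCTA) |>
  summarise(RPL_THEMES = mean(RPL_THEMES, na.rm = TRUE))
```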

A few notes (regardless of the 1 vs 2 thing above):

  • To me, the "real" validation is what you do in "SVI calculation and validation", which shows that "my code returns the same thing as CDC's (roughly, with very minor differences)".
  • The ZCTA vs CT validation, while nice, may be complicated to actually conduct properly. I say this because if one CT has 3 people and another CT has 1,000 people (and they are the only component CTs of a specific ZCTA), then a mean of EP_s would give the same weight to each one, while the second CT should have about 300 times the weight (see the weighted sketch after this list). Moreover, since CTs are not perfectly nested in ZCTAs (that is, a CT may be in more than one ZCTA), these weights would be ZCTA-specific.
  • In other words: I'd keep the validation to making sure that when you use this at the scales CDC has worked at (county and tract), the results are as expected. It'd be good to replicate this at the county level and compare hSVI with cSVI at the county level.
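On the weighting point, a hedged sketch of what a population-weighted aggregation could look like (reusing the hypothetical `joined` table from above and an assumed tract-population column `E_TOTPOP`; properly handling tracts that straddle ZCTAs would need crosswalk-specific overlap weights, which this ignores):

```r
# Population-weighted mean of an EP_ variable within each ZCTA, so a
# 1,000-person tract outweighs a 3-person tract by roughly 300 to 1.
opt1_weighted <- joined |>
  group_by(ZCTA) |>
  summarise(EP_POV = weighted.mean(EP_POV, w = E_TOTPOP, na.rm = TRUE))
```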

Thanks again!!

heli-xu referenced this issue in heli-xu/svi-calculation Feb 21, 2023
@heli-xu (Owner) commented Feb 21, 2023

Hi Usama, thanks for the feedback!

In the section "Aggregating ct data to ZCTA level" I understood you are doing 1, but in the code for "Percentile ranking (“RPL_xx”) by theme" I see option 2. Which one is happening?

You're right about how hSVI works, including the part where I used two aggregating methods. I summed the E_ variables and averaged the EP_ variables without further computing percentiles and SVI, and I also took the mean of the percentiles separately. The purpose was to look not only at the aggregated cSVI, but also at the individual variables in terms of their correlation with our calculation results. So by your standard, I was using option 2 for cSVI aggregation from CT to ZCTA, and additionally (part of) option 1 for variable aggregation from CT to ZCTA. I'd be happy to do option 1 for cSVI aggregation too if you'd like.

the ZCTA vs CT validation, while nice, may be complicated to actually conduct properly.

I completely agree with you about how tricky ZCTA vs CT validation can be, and the point about the ZCTA-specific weights makes a lot of sense. I got quite frustrated while trying to do the aggregation, but wanted to include the results and hear your thoughts.

It'd be good to replicate this at the county level and compare hSVI with cSVI at the county level

Here is a new report where I added the comparison between hSVI and cSVI at the county level (2018, 2020) and census tract level (2020).

Thanks again for your time and advice, and please let me know if you have other questions/suggestions.

@usamabilal (Author)

Thank you! I now get it: "method" 1 for comparing variables and "method" 2 for comparing the SVI itself. Part of the issue may be that an aggregation of percentiles is not comparable with aggregating variables and then creating percentiles. This is known as the STA vs ATS dilemma: summarize (aggregate) then analyze (compute percentiles) = STA, versus analyze (compute percentiles) then summarize (aggregate) = ATS. Your approach for validating the SVI is ATS (you first calculate percentiles and then aggregate by taking the mean of the percentiles).
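A toy illustration of the STA vs ATS gap, with made-up numbers (two ZCTAs, three tracts each); the two orders of operation can even reverse the ranking:

```r
df <- data.frame(
  zcta = c("A", "A", "A", "B", "B", "B"),
  ep   = c(10, 11, 100, 35, 36, 37)  # a hypothetical EP_ percentage
)

# ATS: analyze first (percentile-rank the tracts), then summarize (mean)
df$pctile <- (rank(df$ep, ties.method = "min") - 1) / (nrow(df) - 1)
ats <- tapply(df$pctile, df$zcta, mean)  # A = 0.40, B = 0.60

# STA: summarize first (mean EP_ per ZCTA), then analyze (rank)
sta_ep <- tapply(df$ep, df$zcta, mean)  # A = 40.3, B = 36.0
sta <- (rank(sta_ep, ties.method = "min") - 1) / (length(sta_ep) - 1)
# sta: A = 1, B = 0 -- ATS ranks B above A, while STA ranks A above B
```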

County-level validation looks great. I think CT (the usual acronym for tracts) and CTY (the usual acronym for counties) validation is all you need to ensure you are doing the right thing.

Now one last thing: I do observe a few very minor differences in both CT and CTY. What do you attribute them to?

@heli-xu (Owner) commented Feb 22, 2023

Good to know. Thank you very much! Indeed a dilemma...

For the minor differences, I think they may be due to the number of decimal places in the EP_ variables (percentages). The CDC version keeps one decimal place, whereas ours keeps more because I didn't specify it in the function (at the time I preferred to preserve as much information as possible). Here is a report with more details and some examples. I'd appreciate your insight, and we could adjust the function to make it more consistent with CDC's data if needed.
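A minimal sketch of the rounding fix being discussed (assuming a dplyr pipeline over a table `svi` with EP_ columns; the published CDC tables keep one decimal place for percentages):

```r
library(dplyr)

# Round every EP_ percentage to one decimal place before ranking, to
# match the precision of the published CDC tables.
svi <- svi |>
  mutate(across(starts_with("EP_"), ~ round(.x, 1)))
```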

Thanks again for your help!

@usamabilal (Author)

Great! It'd be good to try to "fully replicate" their approach by matching their number of decimals. Interesting that they don't include the caveat in the 2020 documentation...

@heli-xu (Owner) commented Feb 24, 2023

Sounds good! This is a report where I used the updated function (with matching decimal places) to reproduce CDC SVI. Thanks again for your input!

heli-xu referenced this issue in heli-xu/svi-calculation Feb 24, 2023
new get_svi() rounded EP_var and remove TOTPOP = 0 during ranking;
refer to previous two commits on this file (forgot to link issue there)
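A rough sketch of what that commit message describes (not the actual `get_svi()` internals; `svi_raw`, `E_TOTPOP`, and `EP_POV` are assumed names):

```r
library(dplyr)

svi_ranked <- svi_raw |>
  mutate(across(starts_with("EP_"), ~ round(.x, 1))) |>  # rounded EP_ vars
  filter(E_TOTPOP > 0) |>  # drop zero-population rows before ranking
  mutate(EPL_POV = (rank(EP_POV, ties.method = "min") - 1) / (n() - 1))
```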
@heli-xu (Owner) commented Feb 24, 2023

If this looks good to you, I'll redo the zcta SVI (2017-2021, PA) using the new function and send them again.

@usamabilal (Author)

Perfect!! Validation is 100% on point, so let's redo them. Thanks!

@heli-xu (Owner) commented Feb 28, 2023

Sounds great! I'm attaching a zip folder with 5 updated tables of zcta-level SVI and a folder of CDC SVI tables and documentation for your reference (same as previously uploaded). I'd appreciate any further questions/suggestions. If they look good to you, please feel free to close the issue. Thanks again for your help with improving the result!

pa_zcta_svi_2017to2021_updated.zip

@usamabilal (Author)

Thanks! All looks good, closing.

@heli-xu heli-xu transferred this issue from heli-xu/svi-calculation Aug 18, 2023