Skip to content
This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

percentage estimate set to 100 #15

Closed
SPTKL opened this issue Dec 18, 2019 · 2 comments
Closed

percentage estimate set to 100 #15

SPTKL opened this issue Dec 18, 2019 · 2 comments

Comments

@SPTKL
Copy link
Contributor

SPTKL commented Dec 18, 2019

With a as (
select variable, count(*) as count from pff_social."Y2013-2017-old" 
where p = 100
GROUP BY variable),
b as (select variable, count(*) as count from pff_social."Y2013-2017" 
where p = 100
GROUP BY variable)
SELECT a.variable, a.count as old_count, b.count as new_count 
FROM a,b
WHERE a.variable = b.variable;

From this query we can see that for some variables we have fairly different results for the number of records with 100 as percentage estimate.
image

Need to figure out if we need to set P=100 for certain variables by definition, if so, how to identify those variables

@SPTKL
Copy link
Contributor Author

SPTKL commented Dec 18, 2019

Also note that for many records, we have P > 100, if we use the current defined way to identify the total, e.g. CNI1864_1
for SI28

>>> lst = ['1400000US36085020700', '1400000US36085021300', '1400000US36085024700', '1400000US36085025100']
>>> df.loc[df.GEO_ID.isin(lst), ['DP02_0074E', 'DP02_0001E']]
938      3521.0      1787.0
940      1917.0      1012.0
941      4294.0      2133.0
954      3192.0      1566.0

it's clear that for each of these tracts, DP02_0074E > DP02_0001E, so do we have a special calculation rule here that we need to apply?

@EricaMaurer
Copy link

anything labeled a base in the data dictionary will be set to 100 in the percent column.

@SPTKL SPTKL closed this as completed Jan 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants