
Why are FLPDYR+h_seq+ffpos values not unique in cps.csv file? #1658

Closed · martinholmer opened this issue Nov 11, 2017 · 33 comments

@martinholmer (Collaborator)

Merged pull request #1635 added two new variables to the cps.csv file. Here is what was said in the taxdata repo about these two new variables:

This PR completes a request from @MattHJensen to add the household sequence and the unique family identifier variables, h_seq and ffpos, respectively, to the CPS file. They are named in accordance with the NBER documentation. Using the two in combination, you can identify individual families in the CPS.

For example, you may have a household with h_seq == 1 that contains two families. ffpos for the two families will be 1 and 2.

A corresponding Tax-Calculator PR [#1635] is forthcoming.

Combining the h_seq and ffpos values for each family will produce a unique identifier within a CPS sample. However, the cps.csv sample contains records from three different CPS samples (for different years). So, without knowing which of the three CPS samples each record is from, it is impossible to match a record in the cps.csv file with its complete information from a Census CPS file. Wasn't the whole idea behind adding these variables to let users exact-match records in the cps.csv file to extra information in the Census CPS files? If so, it would seem they cannot do that.

The following tabulations illustrate the problem:

iMac:tab mrh$ conda install -c ospc taxcalc
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /Users/mrh/anaconda2:

The following NEW packages will be INSTALLED:

    taxcalc: 0.13.1-py27_0 ospc

Proceed ([y]/n)? y

taxcalc-0.13.1 100% |################################| Time: 0:00:04   9.07 MB/s

iMac:tab mrh$ tc cps.csv 2014 --sqldb
You loaded data for 2014.
Tax-Calculator startup automatically extrapolated your data to 2014.

iMac:tab mrh$ ls -l
total 512232
-rw-r--r--  1 mrh  staff        205 Nov 11 09:06 cps-14-#-#-doc.text
-rw-r--r--  1 mrh  staff   17681490 Nov 11 09:06 cps-14-#-#.csv
-rw-r--r--  1 mrh  staff  244576256 Nov 11 09:06 cps-14-#-#.db

iMac:tab mrh$ sqlite3 cps-14-#-#.db 
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
sqlite> select count(*) from dump;
456465
sqlite> select min(h_seq),max(h_seq) from dump;
1|99461
sqlite> select min(ffpos),max(ffpos) from dump;
1|11
sqlite> select x,count(*)
          from (select h_seq*1000+ffpos as x from dump)
	  group by x having count(*)>1
	  order by x limit 9;
2001|3
2002|2
3001|2
5001|4
8001|2
9001|2
10001|3
11001|2
13001|2
sqlite> select id,RECID
          from (select h_seq*1000+ffpos as id, RECID from dump)
	  where id<10000 order by id;
1001|178879
1002|178880
2001|1
2001|178881
2001|178882
2002|2
2002|178883
3001|3
3001|292368
5001|4
5001|178884
5001|292369
5001|292370
6001|5
7001|292371
8001|6
8001|178885
9001|7
9001|292372
sqlite> .q

iMac:tab mrh$

So, for example, the three records with h_seq=2 and ffpos=1 have unique RECID values, but that doesn't help a user figure out which Census CPS file those three cps.csv records were drawn from.
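
For anyone who wants to skip the SQLite detour, the same duplicate check can be done directly in pandas; here is a minimal sketch that assumes cps.csv (with the h_seq and ffpos columns added by #1635) is in the current directory:

import pandas as pd

cps = pd.read_csv('cps.csv')

# count records per (h_seq, ffpos) pair; if the pair were a unique
# identifier, every count would be 1
counts = cps.groupby(['h_seq', 'ffpos']).size()
print(counts[counts > 1].head(9))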

@MattHJensen @Amy-Xu @andersonfrailey @hdoupe @GoFroggyRun

@martinholmer martinholmer changed the title CPS h_seq variable is not unique in cps.csv file CPS h_seq+ffpos values are not unique in cps.csv file Nov 11, 2017
@martinholmer (Collaborator, Author)

Another question about the new h_seq and ffpos variables in the cps.csv file:

Assuming that h_seq+ffpos is unique within each CPS sample, wouldn't a user expect there to be no more than three records with the same h_seq+ffpos value in the cps.csv file?

Maybe I'm confused, but that was what I was expecting. So the following tabulation made me wonder what's going on.

iMac:junk mrh$ sqlite3 cps-14-#-#.db 
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
sqlite> select id,count(*)
          from (select h_seq*1000+ffpos as id from dump)
	  group by id having count(*)>40
	  order by id;
6809001|45
8449001|45
14597001|46
16499001|45
23946001|46
24486001|47
35717001|46
37989001|45
41250001|45
57442001|60
77610001|45
93498001|46
95198001|45
97102001|45
97439001|47
sqlite> .q

iMac:junk mrh$ 

Why is it that there are 60 records in the cps.csv file that have h_seq=57442 and ffpos=1?
Am I doing the tabulations wrong?
Or is there a reason there can be sixty records with the same h_seq+ffpos value?

@MattHJensen @Amy-Xu @andersonfrailey @hdoupe @GoFroggyRun

@ernietedeschi (Contributor) commented Nov 13, 2017

Don't we know, based on FLPDYR in the original unaged cps.csv, which CPS sample each record is from? That is, it's from the FLPDYR + 1 ASEC?

To aid with CPS merging, I created a simple database with just RECID, h_seq, ffpos, and a new variable called yearcps equal to FLPDYR + 1.
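
A minimal pandas sketch of such a crosswalk (the yearcps name follows the comment above; the output filename is illustrative):

import pandas as pd

cps = pd.read_csv('cps.csv')

# the unaged input file still carries each record's original FLPDYR,
# so the source ASEC year is FLPDYR + 1
crosswalk = cps[['RECID', 'h_seq', 'ffpos']].copy()
crosswalk['yearcps'] = cps['FLPDYR'] + 1
crosswalk.to_csv('cps_crosswalk.csv', index=False)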

@martinholmer (Collaborator, Author)

@evtedeschi3 said:

Don't we know based on FLPDYR in the original unaged cps.csv which CPS sample each is from? That is, it's from the FLPDYR + 1 ASEC?

Yes, good point. But in any Tax-Calculator output file, all the filing units have the same FLPDYR value, which is set to the tax analysis year. This is why you did the following with the cps.csv input file:

To aid with CPS merging, I created a simple database with just RECID, h_seq, ffpos, and a new variable called yearcps equal to FLPDYR + 1.

But when I combine FLPDYR with h_seq and ffpos, things are better but still not good enough to exact-match other CPS variables to the cps.csv file. After converting the cps.csv file into an SQLite3 database, I get these results:

$ sqlite3 cps.db
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
sqlite> select count(*) from data;                                       
456465
sqlite> select id,count(*) from (select FLPDYR*1000000000+h_seq*1000+ffpos as id from data) group by id having count(*)>13 order by id limit 9;
2012000115001|15
2012000153001|16
2012000155001|15
2012000181001|15
2012000260001|15
2012000285001|15
2012000318001|15
2012000322001|15
2012000369001|15

Notice the limit 9; the number of FLPDYR/h_seq/ffpos values that are assigned to more than 13 filing units is far larger than 9.

Again, maybe I'm doing the calculations wrong, but it seems as if FLPDYR/h_seq/ffpos values are not unique.

Have you checked your simple database to see if the FLPDYR/h_seq/ffpos values are unique?
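
In pandas, the uniqueness of the FLPDYR/h_seq/ffpos triple can be checked on the columns directly, without the composite-id arithmetic (a sketch under the same assumptions as above):

import pandas as pd

cps = pd.read_csv('cps.csv')

# flag every record that shares its (FLPDYR, h_seq, ffpos) triple
# with at least one other record
dup_mask = cps.duplicated(subset=['FLPDYR', 'h_seq', 'ffpos'], keep=False)
print(dup_mask.sum(), 'of', len(cps), 'records have a non-unique triple')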

@andersonfrailey (Collaborator)

Thanks for pointing this out @martinholmer. @evtedeschi3 is correct that you would also need to account for the year of the CPS in the tabulations, which it appears you do in your most recent comment.

For my own clarification: when you converted cps.csv to an SQLite database and produced the numbers above, that is still the input file, correct?

It's possible that some families may have been split into multiple tax units (think dependent filers particularly), in which case there would be multiple tax units with identical h_seq, ffpos, and FLPDYR, though I wouldn't expect it to happen as frequently as it appears to based on your tabulations.

I will also read through the CPS documentation to see if there are any identification variables which would be better for what we're trying to use h_seq and ffpos for.

@Amy-Xu (Member) commented Nov 13, 2017

Just to add one quick thought -- imputations to remove top-coding could be a source for h_seq + ffpos duplicates.

@ernietedeschi (Contributor) commented Nov 13, 2017 via email

@ernietedeschi (Contributor) commented Nov 13, 2017 via email

@martinholmer martinholmer changed the title CPS h_seq+ffpos values are not unique in cps.csv file FLPDYR+h_seq+ffpos values are not unique in cps.csv file Nov 13, 2017
@martinholmer (Collaborator, Author)

@andersonfrailey, Here is how I converted cps.csv into cps.db:

import sqlite3
import pandas as pd

CSV_INPUT_FILENAME = 'cps.csv'
SQLDB_OUT_FILENAME = 'cps.db'

# read the whole CSV into a DataFrame, then write it out as a single
# SQLite table named 'data'
dframe = pd.read_csv(CSV_INPUT_FILENAME)
dbcon = sqlite3.connect(SQLDB_OUT_FILENAME)
dframe.to_sql('data', dbcon, if_exists='replace', index=False)
dbcon.close()

@martinholmer (Collaborator, Author)

Thanks for all the comments on #1658. I can see that, for several reasons, CPS families are split into separate tax filing units. But consider the following tabulation and my question below the tabulation results:

$ sqlite3 cps.db
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
sqlite> select count(*) from data;
456465
sqlite> select id,count(*) from (select FLPDYR*1000000000+h_seq*1000+ffpos as id from data) group by id having count(*)>30 order by id;
2012054455001|31
2012088531001|31
2013001435001|31
2013030759001|31
2013053994001|31
2013084195001|31
2013095063001|31
2014020301001|31
2014024797001|31
2014025287001|32
2014028517001|31
2014034002001|31
2014036978001|32
2014044160001|32
2014057442001|45
2014060481001|31
2014072395001|33
2014080568001|31
2014089284001|32
sqlite> .q
$ 

What kind of family is split into more than thirty filing units?

@MattHJensen @Amy-Xu @andersonfrailey @hdoupe @evtedeschi3

@ernietedeschi (Contributor)

So I've highlighted all 30 of the tax units created from one such family (this is extrapolated to 2026).

That's... a lot of very rich young people.

[screenshot: table of the 30 tax units created from one family]

@martinholmer (Collaborator, Author)

@evtedeschi3, Thanks for the extra information in issue #1658.
The thirty filing units you highlighted are all from the March 2013 CPS record with h_seq=14597 and ffpos=1 as can be seen in the following tabulation:

$ sqlite3 cps.db
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
sqlite> select count(*) from data;
456465
sqlite> select count(*) from data where RECID>=317095 and RECID<=317124; 
30
sqlite> select RECID,FLPDYR,h_seq,ffpos from data where RECID>=317095 and RECID<=317124;
317095|2012|14597|1
317096|2012|14597|1
317097|2012|14597|1
317098|2012|14597|1
317099|2012|14597|1
317100|2012|14597|1
317101|2012|14597|1
317102|2012|14597|1
317103|2012|14597|1
317104|2012|14597|1
317105|2012|14597|1
317106|2012|14597|1
317107|2012|14597|1
317108|2012|14597|1
317109|2012|14597|1
317110|2012|14597|1
317111|2012|14597|1
317112|2012|14597|1
317113|2012|14597|1
317114|2012|14597|1
317115|2012|14597|1
317116|2012|14597|1
317117|2012|14597|1
317118|2012|14597|1
317119|2012|14597|1
317120|2012|14597|1
317121|2012|14597|1
317122|2012|14597|1
317123|2012|14597|1
317124|2012|14597|1
sqlite> .q
$ 

The characteristics of this household, as you point out, are very unusual. First of all, there are 45 people (some single and some married) living at the same address, and they are all considered by Census to be in the same family. And, as you point out, most have very high incomes. So, is this a mistake in the preparation of the cps.csv file, or has Census actually sampled a rich commune? Or maybe there is a different explanation.

@MattHJensen @Amy-Xu @andersonfrailey @hdoupe

@ernietedeschi (Contributor) commented Nov 13, 2017 via email

@Amy-Xu (Member) commented Nov 13, 2017

This might relate to the top-coding imputation I mentioned earlier. The algorithm is documented like so:

[screenshot: excerpt from the technical documentation describing the top-coding imputation algorithm]

This top-coding imputation is implemented for records with high income.

Source: http://www.quantria.com/assets/img/TechnicalDocumentationV4-2.pdf

@andersonfrailey (Collaborator) commented Nov 13, 2017

@Amy-Xu beat me to it. We also have that noted in our documentation for the CPS file.

The one difference between what we do and what she posted is that we repeat the process 15 times rather than 10.

@ernietedeschi (Contributor) commented Nov 13, 2017 via email

@Amy-Xu (Member) commented Nov 13, 2017

@evtedeschi3 Why do you think we have families of 30+ people? Looking at this post by Martin and the 15 highlighted by Anderson, I'm guessing we most likely have a dozen high-income families of 3-4 people.

@martinholmer (Collaborator, Author)

@andersonfrailey pointed to the taxdata documentation of the cps.csv file which says this:

Top-Coding

Certain income items in the CPS are top-coded to protect the identity of the individuals surveyed. Each tax unit with an individual who has been top-coded is flagged for further processing to account for this.

For the records that have been flagged, the top-coded value is replaced with a random variable whose mean is the top-coded amount. This process is repeated 15 times per flagged record, resulting in the record being represented 15 times in the production file with a different value for the top-coded fields in each and their weight divided by 15.

Thanks for this helpful information. I still have two questions:

(1) The above documentation explains why there are duplicate cpsyear+h_seq+ffpos values in multiples of 15, 30, and 45.

What's the story for the many cps.csv filing units like these?

sqlite> select id,count(*) from (select FLPDYR*1000000000+h_seq*1000+ffpos as id from data) group by id having (count(*)>1 and count(*)!=15 and count(*)!=30 and count(*)!=45) order by id limit 9;
2012000005001|2
2012000010001|2
2012000017001|2
2012000021001|3
2012000031001|2
2012000059001|3
2012000067001|2
2012000077001|3
2012000079001|2

There's no top-coding, so what explains the fact that several filing units have the same cpsyear+h_seq+ffpos value?

(2) This top-coding documentation raises a completely different issue. A variable being top-coded means the actual value of that variable is larger than the top-coded amount, right? If so, then when the top-coded value is replaced with a random variable whose mean is the top-coded amount, many of the randomly assigned values will be less than the top-coded amount. That is an enormous bias, especially when the reform everybody is interested in right now contains major changes in the tax treatment of high-income filers.
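
The direction of that bias is easy to check numerically: for a lognormal random variable the median lies below the mean, so well over half of the draws land under the threshold. A toy illustration (the threshold and dispersion are made-up numbers, not taxdata's actual parameters):

import numpy as np

rng = np.random.default_rng(0)
topcode = 250_000.0  # hypothetical top-code threshold
sigma = 0.5          # hypothetical dispersion

# parameterize the lognormal so its mean equals the top-coded amount:
# E[X] = exp(mu + sigma**2/2)  =>  mu = log(mean) - sigma**2/2
mu = np.log(topcode) - sigma**2 / 2
draws = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

print('mean of draws:', round(draws.mean()))              # ~250,000
print('share below top-code:', (draws < topcode).mean())  # ~0.60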

@MattHJensen @Amy-Xu @hdoupe @codykallen @evtedeschi3

@ernietedeschi (Contributor)

@martinholmer wrote:

(2) This top-coding documentation raises a completely different issue. A variable being top-coded means the actual value of that variable is larger than the top-coded amount, right? If so, then when the top-coded value is replaced with a random variable whose mean is the top-coded amount, many of the randomly assigned values will be less than the top-coded amount. That is an enormous bias, especially when the reform everybody is interested in right now contains major changes in the tax treatment of high-income filers.

My impression is that the ASEC has not been strictly "top-coded" post-2011. Rather, the top-codes really act more as thresholds. Above them, values are swapped with other super-top-coded values to protect identities.

IPUMS has a very useful description of how this changed over time:

Starting in 2011, the Census Bureau shifted from the average replacement value system to a rank proximity swapping procedure. In this technique, all values greater than or equal to the income topcode are ranked from lowest to highest and systematically swapped with other values within a bounded interval. All swapped values are also rounded to two significant digits.
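
For intuition, here is a toy sketch of what a rank proximity swap might look like; the window size, the random choice of swap partner, and the rounding helper are guesses based only on the IPUMS description, not Census's actual procedure:

import numpy as np

def rank_proximity_swap(values, topcode, window=5, seed=0):
    # rank all values at or above the top-code from lowest to highest
    rng = np.random.default_rng(seed)
    v = np.asarray(values, dtype=float).copy()
    idx = np.where(v >= topcode)[0]
    ranked = idx[np.argsort(v[idx])]
    vals = v[ranked].copy()
    # swap each ranked value with another within a bounded rank interval
    for r in range(len(ranked)):
        partner = rng.integers(max(0, r - window),
                               min(len(ranked), r + window + 1))
        vals[r], vals[partner] = vals[partner], vals[r]
    # round each swapped value to two significant digits
    for i, x in zip(ranked, vals):
        digits = int(np.floor(np.log10(abs(x)))) if x else 0
        v[i] = round(x, -digits + 1)
    return v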

@martinholmer martinholmer added question and removed bug labels Nov 14, 2017
@martinholmer (Collaborator, Author)

Thanks, @Amy-Xu, for citing John's documentation of the CPS tax file. The discussion before the passage you quoted describes the Census CPS top-coding procedure, as @evtedeschi3 did in his helpful comment. That information has cleared up several misconceptions I had about how Census handles high income amounts.

But I have a couple of remaining questions about how the cps.csv file handles top-coded amounts.
Here is what John says:

Top coding in this way [by Census] means that aggregate control totals derived from the CPS are meaningful in that they represent an estimate of the population total. However, in order to make the data suitable for tax analysis, we remove the top coding for each record and create "replicate" tax units for each top coded record according to the following algorithm:

Question (1): Exactly why do we need to replace the top-coded amount with 15 replicates? Exactly why are the Census top-coded amounts not "suitable for tax analysis"?

John continues:

1. We search each of the income fields in the tax unit and identify those amounts that have been top coded, and the tax unit is "flagged" for subsequent processing to remove the top coding.
2. We replace each top coded amount with a lognormal random variable whose mean is the top coded amount and whose variance is initially set to equal the standard error of the estimate (SEE) of the regression equation. We then multiply the SEE by a factor to more closely match published SOI totals of the income distribution at the upper end of the income scale.
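
Read literally, the quoted algorithm amounts to something like the following sketch; the lognormal moment-matching and the treatment of the SEE as a standard deviation are my assumptions, not taxdata's actual code (s006 is Tax-Calculator's weight variable):

import numpy as np

N_REPLICATES = 15  # taxdata repeats the process 15 times, not the documented 10

def replicate_topcoded(record, topcoded_fields, see, rng):
    # expand one flagged tax unit (a dict of variable values) into
    # N_REPLICATES records, replacing each top-coded amount with a
    # lognormal draw whose mean is the top-coded amount and whose
    # standard deviation is see, and splitting the weight evenly
    replicates = []
    for _ in range(N_REPLICATES):
        rep = dict(record)
        for field in topcoded_fields:
            m = record[field]
            # match mean m and std dev see:
            # sigma^2 = log(1 + (see/m)^2), mu = log(m) - sigma^2/2
            sigma2 = np.log(1.0 + (see / m) ** 2)
            mu = np.log(m) - sigma2 / 2
            rep[field] = rng.lognormal(mu, np.sqrt(sigma2))
        rep['s006'] = record['s006'] / N_REPLICATES
        replicates.append(rep)
    return replicates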

Question (2): What "regression equation" is being referred to in the second item?
I looked but didn't find the one being referred to here. Did I miss it?

@MattHJensen @andersonfrailey

@martinholmer (Collaborator, Author)

Now that I understand that 15 (or 30 or 45) replicates have been used to replace a single CPS family with high income, the range of my questions is much narrower.

What's the story behind the many cps.csv filing units like these?

sqlite> select id,count(*) 
          from (select FLPDYR*1000000000+h_seq*1000+ffpos as id from data) 
          group by id having (count(*)>1 and count(*)<15) order by id
          limit 9;
2012000005001|2
2012000010001|2
2012000017001|2
2012000021001|3
2012000031001|2
2012000059001|3
2012000067001|2
2012000077001|3
2012000079001|2

There's no top-coding (because there are only two or three duplicates among the first nine of many), so what explains the fact that these cps.csv filing units have the same cpsyear+h_seq+ffpos value?
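
Tabulating the full distribution of records per triple makes the pattern easy to see, with the small counts (presumably multi-filing-unit families) alongside the 15/30/45 replicate counts:

import pandas as pd

cps = pd.read_csv('cps.csv')

# for each (FLPDYR, h_seq, ffpos) triple, count the records that share
# it, then tabulate how many triples have each count
counts = cps.groupby(['FLPDYR', 'h_seq', 'ffpos']).size()
print(counts.value_counts().sort_index())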

@Amy-Xu (Member) commented Nov 14, 2017

@martinholmer my intuition is that these 2-3 replicates are from families with more than one filing unit, but not necessarily high-income. Say a 20-year-old daughter has a part-time job and might just file a separate return from her parents, while all three of them are just 'normal'-income people.

@Amy-Xu (Member) commented Nov 14, 2017

@martinholmer also asked:

Question (2): What "regression equation" is being referred to in the second item?
I looked but didn't find the one being referred to here. Did I miss it?

My memory is a bit rusty, but I roughly remember that those averages are predictions from regressions of income on gender, race, and work experience. If you take a look at Chart 2 of the 2010 documentation, it gives the predictions by cell. And the standard error most likely refers to the error of the predictions.

If that's true, then it seems there's another problem here. Census only used this top-coding method for the CPS ASEC prior to 2011, which has nothing to do with our current file that includes the 2013-2015 CPS. The old top-coding method isn't good for tax analysis (I guess) because distributions are collapsed into averages. Restoring the distribution was a doable and sensible option. But now the top coding has been revised to a swapping method. I don't know whether restoring the distribution through the old method still makes sense.

@martinholmer (Collaborator, Author)

@Amy-Xu said in issue #1658:

my intuition is that these 2-3 replicates are from families with more than one filing unit, but not necessarily high-income. Say a 20-year-old daughter has a part-time job and might just file a separate return from her parents, while all three of them are just 'normal'-income people.

Thanks for the explanation, which seems perfectly sensible.

@martinholmer (Collaborator, Author)

@Amy-Xu said in issue #1658:

My memory is a bit rusty, but I roughly remember that those averages are predictions from regressions of income on gender, race and work experience. If you take a look at 2010 doc Chart 2, it gives the predictions by cell. And the standard error most likely refers to the error of predictions.

Chart 2 simply shows the average value above the top-coding threshold. There are no regression results in either Chart 1 or Chart 2, as you can see from this reproduction of those tables:

[screenshot: reproduction of Charts 1 and 2 from the 2010 documentation]

@martinholmer (Collaborator, Author)

@Amy-Xu said in issue #1658:

[I]t seems there's another problem we have here. Census only use this top coding method for CPS ASEC prior to 2011, which has nothing to do our current file that includes 2013-2015 CPS. The old top coding method isn't good for tax analysis (I guess) because distributions [above the top-coding thresholds] are collapsed into averages. Restoring the distribution was a doable and sensible option. But now the top coding has been revised to a swapping method. I don't know whether restoring the distribution through the old method still makes sense.

That's why I asked why all the replicates were being generated. Unless I'm missing something, there is no reason to go through all this replicates business for our more recent CPS files. And it's worse than just inefficient; it's wrong, because the regression imputation scheme used to construct the cps.csv file can often replace a top-coded value, which is by definition above the top-coding threshold, with a value that is below the top-coding threshold.

I'm starting to wonder if this is a contributing factor to the understatement of income tax revenue when using the cps.csv file as Tax-Calculator input.

@MattHJensen @andersonfrailey @hdoupe @codykallen @evtedeschi3

@MattHJensen (Contributor)

@martinholmer said:

I'm starting to wonder if this is a contributing factor to the understatement of income tax revenue when using the cps.csv file as Tax-Calculator input.

That seems possible. Should a new issue be opened to describe the change that should be made to the file prep?

@Amy-Xu (Member) commented Nov 15, 2017

Martin said:

there is no reason to go through all this replicates business for our more recent CPS files.

I agree that it seems we should turn off this imputation for top-coding removal, given that the top coding method has been updated in more recent CPS ASEC files.

@martinholmer (Collaborator, Author)

@MattHJensen said:

I'm starting to wonder if this is a contributing factor to the understatement of income tax revenue when using the cps.csv file as Tax-Calculator input.

That seems possible. Should a new issue be opened to describe the change that should be made to the file prep?

I'm not sure where, in the sequence of processing that turns the three raw CPS files into the cps.csv file, the 15 replicates replace a single top-coded observation. @andersonfrailey, where in the sequence of scripts you list in this README.md file are the top-coded replicates added? And more to the point, what are your views on the thoughts first expressed by @Amy-Xu that the whole top-coded replicates business is unnecessary for our three (post-2011) CPS files?

@martinholmer martinholmer changed the title FLPDYR+h_seq+ffpos values are not unique in cps.csv file Why are FLPDYR+h_seq+ffpos values not unique in cps.csv file Nov 16, 2017
@andersonfrailey (Collaborator)

@martinholmer asked:

where in the sequence of scripts you list in this README.md file are the top-coded replicates added?

This occurs in the TopCodingV1.sas script, after tax units have been created and adjustments are being made to the final file.

And

what are your views on the thoughts first expressed by @Amy-Xu that the whole top-coded replicates business is unnecessary for our three (post-2011) CPS files?

I talked with John about this a few months ago. His opinion was that there wasn't a huge need to revise/remove the top coding scripts, though he hadn't run the numbers to verify this.

I'm in favor of at least comparing the results that we would get from removing this part of the creation process. There are two problems preventing this from happening immediately, though. First, our SAS license has expired, and I just checked and the general AEI one has as well. Second, John hasn't sent me the script to recreate the weights file, and I haven't completed my work on a Python version, so we can't create a new weights file at this time.

@martinholmer martinholmer changed the title Why are FLPDYR+h_seq+ffpos values not unique in cps.csv file Why are FLPDYR+h_seq+ffpos values not unique in cps.csv file? Nov 16, 2017
@martinholmer (Collaborator, Author)

@andersonfrailey said almost three weeks ago in Tax-Calculator issue #1658:

I talked with John about this a few months ago. His opinion was that there wasn't a huge need to revise/remove the top coding scripts, though he hadn't run the numbers to verify this.

I'm in favor of at least comparing the results that we would get from removing this part of the creation process. There are two problems preventing this from happening immediately, though. First, our SAS license has expired, and I just checked and the general AEI one has as well. Second, John hasn't sent me the script to recreate the weights file, and I haven't completed my work on a Python version, so we can't create a new weights file at this time.

@andersonfrailey, Is this a formal issue in the taxdata repository?
If not, should you create one?
If so, should we close this issue (and have the taxdata issue point to it for the historical record)?

@martinholmer (Collaborator, Author)

Following up on my previous comment:

@andersonfrailey, Is the issue in Tax-Calculator #1658 covered in the PSLmodels/taxdata#125 issue?

@andersonfrailey (Collaborator)

@martinholmer asked:

Is the issue in Tax-Calculator #1658 covered in the PSLmodels/taxdata#125 issue?

In my opinion it is not. The two are related, and the issue in #1658 (removing/revamping our top coding routine) is dependent on TaxData issue #125, but it will not be completely addressed in that issue. I would prefer that a separate issue be opened when we begin work on reviewing our top-coding routine.

@martinholmer (Collaborator, Author)

Closing issue #1658 in favor of open taxdata issue 174.
Please continue the discussion of CPS top-coding in 174.
