get_imm_resid - data issues #2

Open
jpycroft opened this issue May 6, 2021 · 0 comments

@jpycroft
Owner

jpycroft commented May 6, 2021

@jdebacker @rickecon

Because several data issues arose while writing the get_imm_resid function in demographics.py, I'm opening this issue to keep a record of them and to let others comment on the decisions taken.

The net immigration rates, imm_rates, were produced in the following steps:

  1. Try to use Eurostat immigration rates directly. However, Eurostat no longer publishes immigration data by age for the UK (presumably a post-Brexit change). For future reference, the data for other EU countries are still available.

  2. Return to the OG-USA approach of backing out the immigration rates as residuals from the population totals, as done by get_imm_resid in demographics.py in OG-USA-Calibration. This works well for most ages, but not for:
    a. age 0, i.e. newborns (see point 3);
    b. the oldest ages, especially ages 90+ (see point 4).
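A minimal sketch of the residual idea in step 2, for readers unfamiliar with it: whatever change in an age group's population cannot be explained by survival (or, at age 0, by births) is treated as net immigration. This is not the OG-USA-Calibration code; the function names imm_resid_one_year and imm_resid, the exact denominators, and the assumption that array index equals age are all illustrative choices.

```python
import numpy as np


def imm_resid_one_year(pop_t, pop_t1, fert_rates, mort_rates):
    """Back out net immigration rates by age from two consecutive years.

    Hypothetical sketch: index s of every array is assumed to be age s.
    """
    pop_t = np.asarray(pop_t, dtype=float)
    pop_t1 = np.asarray(pop_t1, dtype=float)
    fert_rates = np.asarray(fert_rates, dtype=float)
    mort_rates = np.asarray(mort_rates, dtype=float)

    imm = np.empty_like(pop_t)
    # Age 0: births implied by applying fertility rates to the year-t population.
    newborns = np.dot(fert_rates, pop_t)
    imm[0] = (pop_t1[0] - (1.0 - mort_rates[0]) * newborns) / pop_t1[0]
    # Ages 1+: survivors aged s-1 in year t should be aged s in year t+1;
    # any gap between that and the observed count is attributed to migration.
    survivors = (1.0 - mort_rates[:-1]) * pop_t[:-1]
    imm[1:] = (pop_t1[1:] - survivors) / pop_t[:-1]
    return imm


def imm_resid(pops, fert_rates, mort_rates):
    """Average the one-year residual rates over consecutive pairs of years."""
    rates = [
        imm_resid_one_year(pops[i], pops[i + 1], fert_rates, mort_rates)
        for i in range(len(pops) - 1)
    ]
    return np.mean(rates, axis=0)
```

With populations for 2015 to 2018 passed as pops, this averages three one-year residuals. Because the age-0 rate depends on fert_rates, any mismatch between those rates and actual births lands entirely in imm[0], which is the problem described in point 3.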

[Chart for step 2: OG-UK_imm_rates-originalOG-USAmethodology_06may2021]

  3. Adjust newborn values:
    The OG-USA methodology applies fert_rates to the 2015, 2016 and 2017 populations to obtain the newborns in 2016, 2017 and 2018. The problem is that the 2018 fert_rates are much lower than the 2015 rates, so the calculated number of newborns in 2016 is more than 40,000 below the actual number. The standard get_imm_resid attributes this shortfall to net immigration of babies, giving a net immigration rate of 7% at age 0, whereas the actual rate is closer to 0.7%.
    Instead of approximating, I downloaded the actual numbers of newborns in 2015, 2016 and 2017 from Eurostat. These become the "newborn" array from which imm_rates[0] is calculated.
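The effect of step 3 can be sketched by computing the age-0 residual from a supplied newborn count rather than from fert_rates. The helper imm_rate_age0 and all figures below are hypothetical illustrations (not the Eurostat numbers); they only show how an undercount of births passes almost one-for-one into the age-0 immigration rate.

```python
import numpy as np


def imm_rate_age0(pop_age0_next, newborns, mort_rate0):
    """Net immigration rate at age 0 (hypothetical helper).

    pop_age0_next: observed age-0 population in year t+1
    newborns:      births in year t, either estimated or observed
    mort_rate0:    mortality rate at age 0
    """
    pop_age0_next = np.asarray(pop_age0_next, dtype=float)
    newborns = np.asarray(newborns, dtype=float)
    # Babies observed at age 0 minus surviving births = unexplained residual.
    residual = pop_age0_next - (1.0 - mort_rate0) * newborns
    return residual / pop_age0_next


# Made-up numbers: observed births vs. a fertility-based estimate that
# undercounts births by 40,000.
observed_births = np.array([780_000.0])
estimated_births = observed_births - 40_000.0
pop_age0_next = np.array([790_000.0])

print(imm_rate_age0(pop_age0_next, observed_births, 0.004))
print(imm_rate_age0(pop_age0_next, estimated_births, 0.004))
```

In this toy case the 40,000 shortfall raises the age-0 rate by about 0.996 × 40,000 / 790,000, roughly five percentage points, which is the same order as the gap between 7% and 0.7% reported above.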

[Chart for step 3: OG-UK_imm_rates-recalculatedNewborns_06may2021]

  4. The over-90s:
    The data for the over-90s are not consistent. Under the standard methodology, imm_rates for some ages over 90 rise above 5%, reaching 19% at age 99, which is vanishingly unlikely to be accurate. The overall population and mortality numbers are not consistent with each other (even when one downloads the full data for all years), and any errors in the data are amplified by the small denominators: for example, there are fewer than 10,000 people aged 99.
    To fix this, I have replaced the values for ages 90 and over with the average value for ages 80 to 89. This still allows for some continued migration of the over-90s, but because it uses the data for ages 80 to 89, the errors are smoothed out and the denominators in the calculation are much larger.
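The replacement in step 4 amounts to overwriting the tail of the age profile with a single mean. A minimal sketch, assuming imm_rates is a NumPy array whose index equals age (the model's actual age indexing may differ); the function name and parameter names are hypothetical.

```python
import numpy as np


def replace_oldest_with_mean(imm_rates, src_start=80, src_end=89, dst_start=90):
    """Set rates for ages >= dst_start to the mean over ages src_start..src_end.

    Assumes imm_rates[s] is the net immigration rate at age s.
    """
    rates = np.array(imm_rates, dtype=float)  # copy so the input is not modified
    # Inclusive age range src_start..src_end, hence the + 1 in the slice.
    rates[dst_start:] = rates[src_start : src_end + 1].mean()
    return rates
```

Using the ten-age average as the donor value is what gives the larger effective denominator: each of ages 80 to 89 is computed from a much bigger population than any single age over 90.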

[Chart for step 4: OG-UK_imm_rates-recalcNewborns-ave90plus_06may2021]

  5. Moving average smoothing:
    The above adjustments give much improved imm_rates. However, a number of spikes remain in the data, which are unlikely to carry real long-term information. Therefore, a simple three-year moving average is applied.
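A sketch of the smoothing in step 5. The issue does not say whether the window is centred or how the end points are handled; this version assumes a centred three-point window across adjacent ages, with the two end points averaged over the two values available rather than padded with zeros. The function name moving_average_3 is hypothetical.

```python
import numpy as np


def moving_average_3(x):
    """Centred three-point moving average without zero padding at the ends."""
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least three values")
    window = np.ones(3)
    # Sum of each point and its neighbours ("same" keeps the output length).
    sums = np.convolve(x, window, mode="same")
    # Number of real values in each window: 2 at the ends, 3 elsewhere.
    counts = np.convolve(np.ones_like(x), window, mode="same")
    return sums / counts
```

Dividing by the actual window count keeps the first and last ages from being pulled towards zero, which a plain convolution with weights of 1/3 would do.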

[Chart for step 5: OG-UK_imm_rates-recalcNewborns-ave90plus-smooth_06may2021]
