Data sources and pipeline for major versions > 1

  1. The original base data Excel workbook BASE data 2023.xlsx was prepared by Bernadette O'Hare using data from the WDI and UNU-WIDER datasets. Detailed notes on the data sources and preparation are in the info sheet of the workbook.

  2. The intermediate base data CSV file Base data 2023.csv was generated from the data sheet of Base data 2023.xlsx as follows:

  • exporting the data sheet to .csv format;
  • making the spacing in column heading names consistent;
  • renaming the column heading id to countryyearcode in the first column;
  • replacing all instances of #N/A, .. and #VALUE! with NaN using find and replace.
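
The cleaning steps listed above were done by hand with find and replace. As a rough illustration only, they could be scripted along the following lines. This is a minimal sketch, not the procedure actually used: the function names and the sample column names are hypothetical, "consistent spacing" is assumed to mean trimming and collapsing runs of whitespace, and the missing-value markers are matched against whole cells rather than substrings.

```python
import csv
import io
import re

# Markers that the pipeline replaces with NaN.
MISSING_MARKERS = {"#N/A", "..", "#VALUE!"}


def clean_header(name: str) -> str:
    # Assumption: "consistent spacing" = trim and collapse whitespace runs.
    return re.sub(r"\s+", " ", name).strip()


def clean_csv(raw_text: str) -> str:
    """Return cleaned CSV text: tidy headers, rename id, mark missing as NaN."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header = [clean_header(h) for h in rows[0]]
    # Rename the first column heading from id to countryyearcode.
    if header and header[0] == "id":
        header[0] = "countryyearcode"

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        # Whole-cell match only (an assumption; find and replace may differ).
        writer.writerow(
            ["NaN" if cell.strip() in MISSING_MARKERS else cell for cell in row]
        )
    return out.getvalue()


# Hypothetical sample standing in for the exported data sheet.
sample = "id,GRPC  value, School  pop\nAFG2000,#N/A,12\nAFG2001,..,#VALUE!\n"
cleaned = clean_csv(sample)
print(cleaned)
```

A scripted version has the advantage of being repeatable when the workbook is updated, at the cost of having to pin down exactly what the manual steps did.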
  3. The final base data CSV file Base data 2023 interpolated.csv, which drives the model, was generated from Base data 2023.csv by linear interpolation, implemented in this notebook, of undefined (i.e. null or NaN) values for GRPC and all school measures (in-school measures and school populations).

  4. Finally, the final CSV file is packaged as a string in the source.
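
To make the last two steps concrete, the sketch below parses a small CSV held as a string constant and linearly interpolates the NaN gaps in one column. It is an illustration under stated assumptions, not the notebook's implementation: `EMBEDDED_CSV`, `interpolate_gaps` and the `AAA` country code are hypothetical, only interior gaps are filled (leading and trailing NaN values are left as they are), and a real run would need to interpolate each country's series separately.

```python
import csv
import io
import math

# Hypothetical stand-in for CSV data packaged as a string in the source.
EMBEDDED_CSV = """countryyearcode,GRPC
AAA2000,100
AAA2001,NaN
AAA2002,NaN
AAA2003,130
"""


def interpolate_gaps(values):
    """Fill interior NaN runs linearly between the nearest defined values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Assumption: NaN values before the first or after the last defined
    # value are not extrapolated and remain NaN.
    return filled


rows = list(csv.DictReader(io.StringIO(EMBEDDED_CSV)))
grpc = [float(r["GRPC"]) for r in rows]  # float("NaN") parses to nan
result = interpolate_gaps(grpc)
print(result)
```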