Add chart revision suggester and enhance World Bank WDI bulk import #5

bnjmacdonald · 2021-06-10T03:39:53Z

Features and enhancements

Adds a ChartRevisionSuggester class for suggesting chart revisions after a dataset bulk import has been executed.
Refactors and enhances old World Bank World Development Indicator (WDI) code and adds functionality for suggesting chart revisions following the bulk dataset import.
Adds bulk import CSV files for World Bank World Development Indicator (WDI) 2021.05.25 version.

Breaking changes

standard_importer/import_dataset.py: now intended for use as an imported module, rather than as a standalone script to be executed from the command line. Also includes minor changes to expected column names in input CSV files.
.env.example: a USER_ID must now be specified in your .env file.
db.py: previous usage for connecting to MySQL:

from db import connection

New usage:

from db import get_connection
connection = get_connection()
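
A minimal sketch of why a connection factory is convenient, assuming `get_connection` simply returns a fresh connection on every call. The real `db.py` connects to MySQL and its signature may differ; `sqlite3` is used here only so the sketch runs anywhere, and the `path` parameter and `count_rows` helper are hypothetical.

```python
import sqlite3
from contextlib import closing


def get_connection(path=":memory:"):
    """Return a new connection; each caller owns and closes its own."""
    # sqlite3 is a stand-in (assumption); the real db.py targets MySQL.
    return sqlite3.connect(path)


def count_rows(values):
    """Hypothetical helper: open a connection, use it, and close it."""
    # closing() guarantees conn.close() runs when the block exits.
    with closing(get_connection()) as conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
        (n,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
        return n
```

Because every call hands back an independent connection, a module can open and close as many as it needs without sharing one import-time global.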

Removes config and output files such as "variable_replacements.json" and "charts_to_update.json" that contain hard-coded SQL db ids. The problem with hard-coding these db ids is that the files may have been constructed from a SQL db instance that was not up to date with the production db.

…dard_importer` + other refactoring

- Moves suggested revision upsert code to `standard_importer.chart_revision_suggester`.
- Refactors the `worldbank_wdi` folder to store all generated json/csv files (e.g. `variable_replacements.json`) in the `output` folder instead of the `config` folder, while storing manually constructed json/csv files (e.g. `standardized_entity_names.csv`) in the `config` folder.
- Refactors `db` to expose a `get_connection` method instead of an active SQL connection, so that it is easier to create and close multiple connections in a single module as needed.

larsyencken

Nice work! Didn't find any actual errors, but gave a bunch of stylistic feedback to suggest ways the Python and Pandas could be more idiomatic. Hope it's helpful!

standard_importer/chart_revision_suggester.py

worldbank_wdi/init_variables_to_clean.py

worldbank_wdi/match_variables.py

README.md

standard_importer/chart_revision_suggester.py

danielgavrilov · 2021-06-16T14:12:23Z

README.md

+1. Download the data.
+   - Example: [worldbank_wdi/download.py](worldbank_wdi/download.py).
+
+2. Specify which variables in the dataset are to be cleaned and imported into the database.


Sorry I haven't read this in detail, so totally uninformed question: Does this mean we are not importing all variables available in the WDI dataset? And if yes, is it possible to easily import others later?

Asking because as I understand, authors can decide to use any WDI variable at any point, it's likely not the case that the ones in the database are the only ones we want to use.

bnjmacdonald · 2021-06-17

That's correct: the variables stored in variables_to_clean.json at the end of step 2 are a subset of all variables in the dataset.

In the case of the WDI bulk import in this PR, I've written init_variables_to_clean.py (which constructs variables_to_clean.json) so as to only keep the WDI variables that have been used in at least 1 chart.

My thinking is that this is probably a good rule of thumb for effectively keeping db clutter to a minimum while still providing authors with >90% of the variables they are ever going to use. But more discussion is certainly needed here, and it would be easy enough to alter init_variables_to_clean.py to include more variables.
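
As a rough illustration of the "used in at least 1 chart" rule, here is a minimal sketch. It is not the actual code in init_variables_to_clean.py; the function name, the record fields, and the shape of the JSON payload are all assumptions, and the sample data is made up.

```python
import json


def select_variables_to_clean(variables, chart_variable_ids):
    """Keep only variables referenced by at least one chart."""
    used = set(chart_variable_ids)
    return [v for v in variables if v["id"] in used]


# Made-up sample data (assumption): real ids come from the database.
variables = [
    {"id": 1, "name": "variable A"},
    {"id": 2, "name": "variable B"},
    {"id": 3, "name": "variable C"},
]
# One entry per chart usage; a variable may appear in several charts.
chart_variable_ids = [1, 3, 3]

variables_to_clean = select_variables_to_clean(variables, chart_variable_ids)
# Hypothetical file layout for variables_to_clean.json.
payload = json.dumps({"variables": variables_to_clean}, indent=2)
```

Including more variables would then come down to loosening the filter, for example by adding a hand-maintained list of extra ids to `used`.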

bnjmacdonald · 2021-06-17T01:40:10Z

Thanks everyone! I'll make the requested changes before the end of the week.

bnjmacdonald added 14 commits June 9, 2021 20:03

wdi: add data cleaning script for 2021.03 dataset

cb456c1

Note: "wdi" = "World Development Indicators"

wdi: add chart update scripts for 2021.03 dataset

e227507

bugfix: fix issue with long dataset namespace

2de157f

Fixes a bug where the dataset namespace was constructed from the whole current working directory (e.g. "Users/.../importers/worldbank_wdi") instead of just "worldbank_wdi".
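
A minimal sketch of the idea behind this fix, assuming the namespace should be just the final component of the importer's directory. The `dataset_namespace` helper is hypothetical and is not the function used in the commit.

```python
from pathlib import Path


def dataset_namespace(directory):
    """Return only the last path component, e.g. 'worldbank_wdi'."""
    # resolve() makes the path absolute, so "." also yields a real name.
    return Path(directory).resolve().name
```

Calling it with something like `Path(__file__).parent` ties the namespace to the importer folder rather than to wherever the script happens to be launched from.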

fix(wdi): attempt xls download if csv download fails

4583a27

data(worldbank_wdi): update WDI dataset to version 2021.05.25

d54b583

fix(standard_importer): update delete_dataset to work with newly refactored code

90f0fb6

docs(readme): add more detail

ad42394

docs(standard_importer): minor doc improvements

231f639

fix(worldbank_wdi): fix bug with reading xlsx via pandas

bc7d17f

fix(worldbank_wdi): fix error with >1 old variable version

80b22b4

Fixes errors raised in `worldbank_wdi/init_variables.py` and `worldbank_wdi/match_variables.py` when there are multiple versions of an old variable.

fix(chart_revision_suggester): remove on duplicate key update

0699d38

Removes "ON DUPLICATE KEY UPDATE..." from ChartRevisionSuggester.upsert b/c it updates suggested chart revisions that may have already been approved/rejected, which is undesired behavior.
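
The commit message does not show the statement that replaced the upsert. One way to get "insert new suggestions, never touch existing ones" is sketched below, under loud assumptions: the table and column names are hypothetical, and `sqlite3` with `INSERT OR IGNORE` stands in for MySQL (whose rough counterpart is `INSERT IGNORE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema (assumption): one suggested revision per chart.
conn.execute(
    "CREATE TABLE suggested_chart_revisions ("
    " chart_id INTEGER PRIMARY KEY, config TEXT, status TEXT)"
)
# A suggestion that a reviewer has already approved.
conn.execute(
    "INSERT INTO suggested_chart_revisions VALUES (1, 'old config', 'approved')"
)

new_suggestions = [(1, "new config", "pending"), (2, "new config", "pending")]
# INSERT OR IGNORE skips rows whose key already exists instead of
# updating them, so the approved row for chart 1 is left untouched.
conn.executemany(
    "INSERT OR IGNORE INTO suggested_chart_revisions VALUES (?, ?, ?)",
    new_suggestions,
)
rows = conn.execute(
    "SELECT chart_id, config, status"
    " FROM suggested_chart_revisions ORDER BY chart_id"
).fetchall()
conn.close()
```

With an "ON DUPLICATE KEY UPDATE" clause, the row for chart 1 would instead have been reset to the new config and status, which is the behavior this commit removes.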

refactor(chart_revision_suggester): minor refactoring

0ecf4ff

larsyencken reviewed Jun 14, 2021

spoonerf reviewed Jun 14, 2021

bnjmacdonald mentioned this pull request Jun 14, 2021

Add first version of chart approval tool to admin UI owid/owid-grapher#937

Merged

spoonerf reviewed Jun 16, 2021

danielgavrilov reviewed Jun 16, 2021

bnjmacdonald added 4 commits June 18, 2021 16:20

style(chart_revision_suggester): add suggested changes

a349de4

style(chart_revision_suggester): autoformat with black

6f5e878

style(import_dataset): autoformat with black

14c28f7

style(worldbank_wdi): autoformat with black

b3102d1

bnjmacdonald merged commit 9a69724 into master Jun 18, 2021

bnjmacdonald deleted the feature/wdi-bulk-import branch June 18, 2021 19:29
