API Highways : "2011 PPP and $1.9/day poverty line"

10/4/18 - 14/4/18

This brief write-up is some simple feedback for "API Highways" (following a Data4DevFest conversation), plus a note of where I might go back and retry some (pipeline) things I failed at.

I wanted to see whether it was easy to use an API (instead of a csv or other file), whether the site would be a good source of datasets to practise on, etc. I chose the 1st dataset on the main page.

I wanted to use R for the analysis. As R code wasn't listed on the dataset's page, this would also be a first attempt to use Python and R in the same pipeline/notebook.

I began with the Extreme-Poverty.Rmd RMarkdown notebook, after installing the reticulate package in my Anaconda setup. Importing the data failed at first, but I solved that by running Sys.which("python") and then specifying use_python("/Users/markbeveridge/anaconda3/bin/python") instead.
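
The fix above comes down to telling reticulate which interpreter to use. As a hedged aside, Python can report the same information from its own side; this is a minimal sketch, and `python_paths` is a hypothetical helper name:

```python
import shutil
import sys


def python_paths():
    """Return (interpreter running this code, first 'python' found on PATH).

    The second value is roughly what R's Sys.which("python") reports. With
    several installs (e.g. a system Python plus Anaconda) the two can differ,
    which is when an explicit path for use_python() helps.
    """
    return sys.executable, shutil.which("python")


if __name__ == "__main__":
    running, on_path = python_paths()
    print("running:", running)
    print("on PATH:", on_path)  # None if no plain 'python' is on PATH
```

Running it from the Anaconda environment you want R to use gives a candidate path to pass to use_python().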

To get this far (while testing), I'd also created the Extreme-Poverty.py script and the Extreme-Poverty.ipynb Jupyter notebook, both of which imported the data successfully.
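
For comparison, a JSON API import in plain Python needs only the standard library. A minimal sketch, assuming the endpoint returns UTF-8 JSON; the URL is left as a parameter because the dataset's endpoint isn't given here, and `fetch_json` is a hypothetical helper name:

```python
import json
import urllib.request


def fetch_json(url, timeout=30):
    """Download `url` and parse the response body as JSON (stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # A data: URL stands in for the real (unspecified) API endpoint.
    print(fetch_json("data:application/json,[1,2,3]"))
```

In practice you would pass the JSON link from the dataset's page instead of the data: URL.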

Extreme-Poverty.Rmd still failed to pass variables to a 2nd Python chunk (which is what reticulate is supposed to enable). Extreme-Poverty_not-notebook.Rmd (an RMarkdown document rather than a notebook) also failed at this, and I haven't solved the issue yet.

I continued instead with Extreme-Poverty.ipynb. There I could pass variables between Python cells, but I failed to convert the JSON data into a dataframe for pandas (and then, hopefully, dplyr and ggplot2) to use. This might be due to the 'meta' data at the end ¹, but I haven't solved the issue yet.
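
If the 'meta' data is the obstacle, one approach is to pull out just the list of row records before building a dataframe. This is a sketch under an assumed response shape, `{"<some key>": [row, ...], "meta": {...}}`, which is not confirmed by the source; `records_without_meta` is a hypothetical name:

```python
import json


def records_without_meta(payload):
    """Return the single list of row dicts in `payload`, ignoring any 'meta' entry.

    ASSUMPTION (hypothetical shape): {"<some key>": [row, row, ...], "meta": {...}}.
    """
    lists = [v for k, v in payload.items() if k != "meta" and isinstance(v, list)]
    if len(lists) != 1:
        raise ValueError("expected exactly one list of records besides 'meta'")
    return lists[0]


if __name__ == "__main__":
    # Made-up payload, NOT the real API response.
    payload = json.loads('{"data": [{"entity": "X", "value_1": 40.1}], "meta": {"n": 1}}')
    print(records_without_meta(payload))
```

The returned list of dicts can then go straight into `pandas.DataFrame(records)`, which makes one column per key.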

I continued instead with the csv version of the dataset (poverty-190.csv), which doesn't have the 'meta' data, and with a new RMarkdown notebook (Extreme-Poverty_R-only.Rmd) using only R, not Python as well.

R doesn't like hyphens (-) in column names, so I changed those. I couldn't find a definition/dictionary ² for the 3 numerical fields (initially value-1, value-2, value-3), despite going back through links and documents. It's guessable in this case (% of population), but I still don't know why there are 3 separate fields ³, as for any given row they seem to have the same value (or value-3 is blank).
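
Both steps, renaming the hyphenated columns and checking whether the three value fields really agree, can be sketched in a few lines. This Python version is illustrative only (the write-up did this in R); the helper names are hypothetical and the sample rows are made up:

```python
import csv
import io


def read_clean(f):
    """Read CSV rows as dicts, swapping '-' for '_' in column names (R-friendly)."""
    reader = csv.DictReader(f)
    # Reading .fieldnames consumes the header row; reassigning renames the keys.
    reader.fieldnames = [name.replace("-", "_") for name in reader.fieldnames]
    return list(reader)


def value_fields_agree(row):
    """True if every non-blank value_1/value_2/value_3 in the row is numerically equal."""
    vals = {float(row[k]) for k in ("value_1", "value_2", "value_3") if row.get(k)}
    return len(vals) <= 1


if __name__ == "__main__":
    # Made-up sample rows, NOT the real poverty-190.csv contents.
    sample = io.StringIO("entity,year,value-1,value-2,value-3\nX,2013,40.1,40.1,\n")
    rows = read_clean(sample)
    print(list(rows[0]), all(value_fields_agree(r) for r in rows))
```

Comparing as floats means "12" and "12.0" count as the same value, and a blank value_3 is simply skipped.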

There weren't many fields to play with, given the effort. There was also an entity.csv file (which the API data didn't have) ⁴, which I JOINED in order to use its region field. Then I did a few quick visualisations, which I enjoyed. (They can be seen in the 'knitted' Extreme-Poverty_R-only.md, and a couple of them are on this page, below.)
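
The JOIN here is a left join: keep every poverty row and attach the matching entity's columns (such as region). A stdlib sketch, assuming both files share a key column; the key name "entity" and the helper name `left_join` are hypothetical:

```python
def left_join(left, right, key):
    """Left join two lists of dicts on `key`.

    Every left row is kept; each non-key column from `right` is added, as None
    when there is no match. On a column-name clash the right-hand column
    (possibly None) replaces the left one. If `right` repeats a key, the last
    row wins.
    """
    lookup = {r[key]: r for r in right}
    right_cols = sorted({c for r in right for c in r if c != key})
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        joined.append({**row, **{c: match.get(c) for c in right_cols}})
    return joined


if __name__ == "__main__":
    # Made-up rows, NOT the real files.
    poverty = [{"entity": "AO", "value_1": "30.1"}, {"entity": "ZZ", "value_1": "1.0"}]
    entities = [{"entity": "AO", "region": "south-of-sahara"}]
    print(left_join(poverty, entities, "entity"))
```

With dataframes the same idea is `poverty.merge(entities, on="entity", how="left")` in pandas, or `dplyr::left_join(poverty, entity, by = "entity")` in R.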

There would have been more scope for regional aggregations etc. if poverty-190.csv contained population numbers, rather than just fields calculated from them ⁵. The (apparent) original source does have them... but not an API :)
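
Why population matters: a regional rate is a population-weighted average of the country percentages, not their plain mean. A minimal sketch with hypothetical field names ('pct', 'population') and made-up numbers:

```python
def population_weighted_rate(rows):
    """Regional % below the line = sum(pct_i * pop_i) / sum(pop_i).

    `rows` are hypothetical dicts with 'pct' (percent below the line) and
    'population' (headcount) for each country in the region.
    """
    total = sum(r["population"] for r in rows)
    if total <= 0:
        raise ValueError("need a positive total population")
    return sum(r["pct"] * r["population"] for r in rows) / total


if __name__ == "__main__":
    # Made-up example: a small country at 50% and a larger one at 10%.
    region = [{"pct": 50.0, "population": 10}, {"pct": 10.0, "population": 30}]
    print(population_weighted_rate(region))
```

Here the weighted rate is (50 × 10 + 10 × 30) / 40 = 20%, whereas the plain mean of the two percentages would be 30%, so the percentages alone can't give a correct regional figure.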

% of population below the USD 1.90/day poverty line in 2013, for all 46 countries in the 'south-of-sahara' region (the 50% line was an arbitrary choice of mine) :

The 15 countries at 50% in 2013 (blue dots) : % of population for every year (trend) :
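
The two charts amount to a filter followed by a regroup: pick the region's countries at the threshold in 2013, then collect each one's values across all years. A Python sketch of that selection (the charts themselves were made in R); the field names, helper names and the "at or above" (>=) reading of the threshold are assumptions:

```python
def countries_at_or_above(rows, region, year, threshold=50.0):
    """Entities in `region` whose value in `year` is >= `threshold` (percent)."""
    return sorted(
        r["entity"]
        for r in rows
        if r["region"] == region
        and r["year"] == year
        and r["value"] is not None
        and r["value"] >= threshold
    )


def trends(rows, entities):
    """Map each selected entity to its (year, value) series, sorted by year."""
    series = {e: [] for e in entities}
    for r in rows:
        if r["entity"] in series and r["value"] is not None:
            series[r["entity"]].append((r["year"], r["value"]))
    return {e: sorted(points) for e, points in series.items()}


if __name__ == "__main__":
    # Made-up rows, NOT the real data.
    rows = [
        {"entity": "A", "region": "south-of-sahara", "year": 2013, "value": 60.0},
        {"entity": "A", "region": "south-of-sahara", "year": 2010, "value": 65.0},
        {"entity": "B", "region": "south-of-sahara", "year": 2013, "value": 20.0},
    ]
    selected = countries_at_or_above(rows, "south-of-sahara", 2013)
    print(selected, trends(rows, selected))
```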

PS. This 'continues', sort of, with a look at the 'same' dataset from a different supplier (again on "API Highways"), where I did succeed in using the API, and Python & R in the same notebook.

'Feedback' list / footnotes :

¹ "This might be due to the 'meta' data at the end"
² "Couldn't find a definition/dictionary"
³ "still don't know why there are 3 separate fields"
⁴ "entity.csv file (which the API data didn't have)"
⁵ "population numbers, rather than just fields calculated from them"

mbeveridge/SDG-API_Extreme-Poverty

Folders and files

Latest commit

History

Repository files navigation

API Highways : "2011 PPP and $1.9/day poverty line"

10/4/18 - 14/4/18

Footnotes

About

Resources

Stars

Watchers

Forks

Languages