<a href="https://colab.research.google.com/github/jhellingsdata/RADataHub/blob/main/misc/loops_apis.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Richard Davies** Data Science Masterclass - 2024

In this notebook we download a chart that we like and want to recreate, then loop over a list of countries to create multiple copies using different API-linked datasets.

<br>
<br>

### Preparatory Steps

There are a few add-ons to Python that we import to our session at the start. Run this to prepare your session for what follows.

In [1]:
# 1. PREPARATORY STEPS - ACCESS PACKAGES WE NEED

## // The "requests" package, for opening web sites and retrieving information:
import requests

## // The "json" package, for making JSON easier to read and for converting Python data (dictionaries) to JSON:
import json

## /// Altair. This is a way of visualising Vega-Lite charts in Colab.
## // Some packages need to be installed to the virtual machine before we can import them into our notebook.
## // We can do this with '%pip install' (or '!pip install').
%pip install altair
import altair as alt

Note: you may need to restart the kernel to use updated packages.


<br>
<br>

## Background: Lists

Lists are a simple datatype. These are written with comma separated values (items) between square brackets. Just like with numbers or strings, we can assign these to a variable using =.

In the code below we have a list of places. We define a variable "locations" and assign our list to this variable.

In [2]:
locations = ["London", "Cardiff", "Belfast"]   # Creating a list of locations

# We have a list of locations, let's print these out
print(locations)

['London', 'Cardiff', 'Belfast']


<br>
<br>

### Printing items from a list

If we want to retrieve individual items in the list, we use "indexing".

**Note:** One rule to remember is that indexing starts at 0. So the list above has positions 0, 1 and 2. Asking for position 3, which would seem to be Belfast (the third item), will throw an error.

In [3]:
print(locations[0])
print(locations[2])
print(locations[1])

London
Belfast
Cardiff
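
As a small sketch of the indexing rule, using the same `locations` list, the snippet below shows the last valid position, Python's negative indexing, and the error raised by an out-of-range position. The variable names `last_position` and `error_message` are just illustrative.

```python
locations = ["London", "Cardiff", "Belfast"]

# len() gives the number of items; the last valid position is len - 1
last_position = len(locations) - 1
print(last_position)             # 2
print(locations[last_position])  # Belfast

# Negative positions count back from the end of the list
print(locations[-1])             # Belfast

# Asking for position 3 raises an IndexError, which we catch here
try:
    print(locations[3])
except IndexError as err:
    error_message = str(err)
    print(f"IndexError: {error_message}")
```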


<br>
<br>

### Loops

Any time we have repetitive code like that above, we should consider a loop. This is not just to show off. Manually copying code like the above leads to errors, and it is time-consuming. Loops make you more accurate and more efficient.

With the "for" loop we can execute a set of statements, once for each item in a list.

In [4]:
## Here is our first loop:

locations = ["Darlington", "London", "Newport"]

for i in locations:
  print(i)

Darlington
London
Newport
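
If we also want the position of each item while looping, one option is Python's built-in `enumerate()`. This is a minimal sketch on the same list; the names `position`, `place` and `labelled` are illustrative.

```python
locations = ["Darlington", "London", "Newport"]

# enumerate() yields (position, item) pairs, starting at position 0
labelled = []
for position, place in enumerate(locations):
    labelled.append((position, place))
    print(position, place)
```

This prints `0 Darlington`, `1 London` and `2 Newport`, matching the indexing rule from the previous section.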


### Format strings in Python

To get the most out of loops, we will want to change strings in each iteration. To do this we can use something called F-strings. You can read about this [here](https://realpython.com/python-f-strings/).

In [5]:
best_team = 'Wales'

print(f"The best rugby team in the world is {best_team}")

The best rugby team in the world is Wales
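
The braces in an F-string can hold more than a variable name. The sketch below, with made-up values for `city` and `temperature`, shows an expression inside the braces and a format specifier for rounding a number.

```python
city = "London"
temperature = 5.256

# Any expression can go inside the braces
message = f"{city} has {len(city)} letters"
print(message)  # London has 6 letters

# A format specifier after a colon controls how numbers are shown
rounded = f"Temperature: {temperature:.1f}C"
print(rounded)  # Temperature: 5.3C
```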


<br>
<br>

### A loop with F-strings:
We next combine formatting with a loop, in this case to print out a list of claims about football teams.

In [6]:
# Define a list of team names.
teams = ['Man Utd', 'AC Milan', 'Barcelona', 'PSG', 'Bayern', 'River Plate']

# Finally, create a loop where we deal with the teams one by one.
for i in teams:
    print(f"The best team is {i}")

The best team is Man Utd
The best team is AC Milan
The best team is Barcelona
The best team is PSG
The best team is Bayern
The best team is River Plate


## Background: Dictionaries

Dictionaries are another built-in data type. Dictionaries are used to store data values in key:value pairs. They look a lot like the `JSON` structure we've worked with for Vega-lite specs.

In [7]:
## Create an example dictionary, using the dict() constructor:
x = dict(borough="X", city = "London", temperature = 5, country = "England")
print(x)
print('\n')

## This is the same as 
x = {
    "borough": "X",
    "city": "London",
    "temperature": 5,
    "country": "England"
}

{'borough': 'X', 'city': 'London', 'temperature': 5, 'country': 'England'}
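
Once a dictionary exists, we usually want to read values back out of it. The following is a minimal sketch using the same dictionary `x`; the key `"population"` is deliberately one that does not exist, to show what `.get()` does.

```python
x = {"borough": "X", "city": "London", "temperature": 5, "country": "England"}

# Look up a value by its key
print(x["city"])  # London

# .get() returns a default instead of raising KeyError when a key is missing
population = x.get("population", "unknown")
print(population)  # unknown

# .items() lets us loop over key/value pairs together
pairs = []
for key, value in x.items():
    pairs.append(f"{key} = {value}")
    print(f"{key} = {value}")
```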




<br>
<br>

### Loop + Dictionary

Now we can combine a loop and a dictionary to make multiple different, but similar, dictionaries.

*(To see why this is useful, recall from before that ALL we need to make a different chart is a different data source, and that this is just a value in a JSON object.)*


In [10]:

boroughs = ["Westminster", "Camden", "Southwark"]

## Now loop over the boroughs, printing each one.
for i in boroughs:
  print(i)


## Now loop over the boroughs, printing each one, calculating the length of its name, and printing this out
for i in boroughs:
  print(i)
  y = len(i)
  print(y)

## Now loop over the boroughs, inserting each one into our dictionary x and printing the result.
for i in boroughs:
  x['borough'] = i
  print(x)



Westminster
Camden
Southwark
Westminster
11
Camden
6
Southwark
9
{'borough': 'Westminster', 'city': 'London', 'temperature': 5, 'country': 'England'}
{'borough': 'Camden', 'city': 'London', 'temperature': 5, 'country': 'England'}
{'borough': 'Southwark', 'city': 'London', 'temperature': 5, 'country': 'England'}
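
Note that the last loop overwrites the same dictionary `x` on every pass, so only the final version survives. If we want to keep one dictionary per borough, one approach is to copy `x` inside the loop. This is a sketch under that assumption; `records` and `record` are illustrative names.

```python
x = {"borough": "X", "city": "London", "temperature": 5, "country": "England"}
boroughs = ["Westminster", "Camden", "Southwark"]

records = []
for i in boroughs:
    record = x.copy()        # a new dictionary with the same keys and values
    record["borough"] = i    # change only the copy
    records.append(record)

print(records[0]["borough"])  # Westminster
print(records[2]["borough"])  # Southwark
print(x["borough"])           # X (the original is untouched)
```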


# Building charts with loops

<br>
<br>

### Access a chart specification that I like

Suppose that you see a chart you like on the library page of our website: https://rdeconomist.github.io/library.

Here is a spec that we might want to use:
https://github.com/RDeconomist/RDeconomist.github.io/blob/main/charts/library/chartLine0.json.

Let's first get that onto our machines, and edit it.


In [11]:
# 2.  ACCESSING AND EXAMINING MY CHART SPEC:

## // Define my target URL (note that this is the RAW file)
url = "https://raw.githubusercontent.com/RDeconomist/RDeconomist.github.io/main/charts/library/chartLine0.json"

## // Get this
chartSpec = requests.get(url).json()

## // Now let's print it out, two different ways:

## // First, just the data (no formatting)
print(chartSpec)
print('\n')

## // Convert to json [using json.dumps()] then print with formatting
print(json.dumps(chartSpec, indent=4))

{'$schema': 'https://vega.github.io/schema/vega-lite/v5.json', 'data': {'url': 'https://raw.githubusercontent.com/RDeconomist/RDeconomist.github.io/main/charts/library/ukProdOutPerWork.csv'}, 'title': {'text': 'UK Productivity 1960-2023'}, 'width': 300, 'height': 300, 'mark': {'type': 'line', 'color': 'red'}, 'encoding': {'x': {'field': 'Year', 'type': 'temporal'}, 'y': {'field': 'outputPerWorker', 'type': 'quantitative'}}}


{
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {
        "url": "https://raw.githubusercontent.com/RDeconomist/RDeconomist.github.io/main/charts/library/ukProdOutPerWork.csv"
    },
    "title": {
        "text": "UK Productivity 1960-2023"
    },
    "width": 300,
    "height": 300,
    "mark": {
        "type": "line",
        "color": "red"
    },
    "encoding": {
        "x": {
            "field": "Year",
            "type": "temporal"
        },
        "y": {
            "field": "outputPerWorker",
            "type": "quantitative"
        }
    }
}

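The difference between the two printouts comes from `json.dumps()`, which turns a Python dictionary into a JSON-formatted string. The sketch below uses a small hand-written dictionary (not the downloaded spec) so that it runs without a network connection, and shows `json.loads()` going back the other way.

```python
import json

spec = {"width": 300, "mark": {"type": "line", "color": "red"}}

# json.dumps() turns a Python dictionary into a JSON-formatted string
as_text = json.dumps(spec, indent=4)
print(type(as_text).__name__)  # str

# json.loads() goes the other way: JSON text back to a dictionary
round_trip = json.loads(as_text)
print(round_trip == spec)  # True
```
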
<br>
<br>

### Editing the specification of a chart. Python Dictionaries

Next, note that I can edit parts of a chart spec in Python. Following the steps that we have taken above, the variable we have is a Python "dictionary". Once dictionaries have been created we can edit them as we please. You can read about Python dictionaries [here](https://www.w3schools.com/python/python_dictionaries.asp).

In [12]:
chartSpec['data']

{'url': 'https://raw.githubusercontent.com/RDeconomist/RDeconomist.github.io/main/charts/library/ukProdOutPerWork.csv'}

In [13]:
chartSpec['description'] = 'This is a line chart showing UK productivity'

In [14]:
## Print the width of the chart:
print(chartSpec["width"])

## Change the width of the chart to 1000
chartSpec["width"] = 1000

## Print the title of the chart:
print(chartSpec["title"]["text"])

## Change the title of the chart:
chartSpec["title"]["text"] = "I like Data"

## Print out our new Spec:
print(json.dumps(chartSpec, indent=2))

300
UK Productivity 1960-2023
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "https://raw.githubusercontent.com/RDeconomist/RDeconomist.github.io/main/charts/library/ukProdOutPerWork.csv"
  },
  "title": {
    "text": "I like Data"
  },
  "width": 1000,
  "height": 300,
  "mark": {
    "type": "line",
    "color": "red"
  },
  "encoding": {
    "x": {
      "field": "Year",
      "type": "temporal"
    },
    "y": {
      "field": "outputPerWorker",
      "type": "quantitative"
    }
  },
  "description": "This is a line chart showing UK productivity"
}
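
The same pattern works at any depth of nesting. As a hedged sketch, the snippet below uses a cut-down, hand-written spec with the same shape as the one above (the subtitle text is made up) to show editing a nested value, creating a new nested key, and checking whether a key exists.

```python
import json

# A cut-down, hypothetical spec with the same shape as the one above
spec = {
    "title": {"text": "UK Productivity 1960-2023"},
    "mark": {"type": "line", "color": "red"},
}

# Nested values are reached one key at a time
spec["mark"]["color"] = "blue"

# Assigning to a key that does not exist yet creates it
spec["title"]["subtitle"] = "Output per worker"

# 'in' checks whether a key is present before we rely on it
has_description = "description" in spec

print(json.dumps(spec, indent=2))
print(has_description)  # False
```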


<br>
<br>

### API + Loop + Dictionary

This combines three of the tools we have learned. In the code below there are two main steps.
1. Prepare for the loop by creating a kind of "shell" dictionary (a semi-complete chart spec) that needs some more information (the data source).
2. Loop over the list of countries, creating an API link, inserting this into a chart spec, and visualising the result.

In [16]:
### PREPARING OUR BASE SPEC

# // Get our base spec (as above)
url = "https://raw.githubusercontent.com/RDeconomist/RDeconomist.github.io/main/charts/library/chartLine0.json"
base_spec = requests.get(url).json()

# // Now since all of our work is going to be on unemployment, we need to change the base spec:
base_spec['title']['text'] = "Unemployment"
base_spec['encoding']['x']['field'] = 'date'
base_spec['encoding']['y']['field'] = 'value'
base_spec['data']['url'] = 'XYZ'

# // Print out our new Spec:
print(json.dumps(base_spec, indent=2))


{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "XYZ"
  },
  "title": {
    "text": "Unemployment"
  },
  "width": 300,
  "height": 300,
  "mark": {
    "type": "line",
    "color": "red"
  },
  "encoding": {
    "x": {
      "field": "date",
      "type": "temporal"
    },
    "y": {
      "field": "value",
      "type": "quantitative"
    }
  }
}


With that preparation in place we can run our loop:

In [17]:
### RUNNING OUR LOOP

# // Define our base url with the {} placeholder for the country code.
base_api = 'https://api.economicsobservatory.com/{}/unem?vega'

# // Create a list of countries we want to get data for:
countries = ['gbr', 'usa', 'can', 'egy']

for i in countries:
  
  ## Build the api that we want to use:
  apiToUse = base_api.format(i)
  # print(apiToUse)

  ## Now build the chart spec:
  base_spec['data']['url'] = apiToUse
  base_spec['title']['subtitle'] = i

  # /// Turn the spec into JSON
  specJSON = json.dumps(base_spec)

  # /// Turn the json into an Altair chart and display it:
  new_chart = alt.Chart.from_json(specJSON)
  new_chart.display()
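
The loop above reuses and overwrites a single `base_spec`, which is fine when each chart is displayed straight away. If we instead wanted to keep every spec (for example, to save them later), one option is to take a deep copy inside the loop. This is a sketch under that assumption: the cut-down `base_spec` below is a stand-in so the code runs without a download, and `specs` is an illustrative name.

```python
import copy

# A cut-down stand-in for base_spec, so this sketch runs without a download
base_spec = {
    "data": {"url": "XYZ"},
    "title": {"text": "Unemployment"},
}
base_api = "https://api.economicsobservatory.com/{}/unem?vega"
countries = ["gbr", "usa", "can", "egy"]

specs = {}
for i in countries:
    spec = copy.deepcopy(base_spec)   # independent copy, including nested dictionaries
    spec["data"]["url"] = base_api.format(i)
    spec["title"]["subtitle"] = i
    specs[i] = spec

print(specs["usa"]["data"]["url"])
print(base_spec["data"]["url"])  # XYZ (the base spec is unchanged)
```

A deep copy is used rather than `.copy()` because the spec contains nested dictionaries, and a shallow copy would still share those inner dictionaries with `base_spec`.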
