# Data Collection & Data Formats

Term 1 2019 - Instructor: Teerapong Leelanupab

Teaching Assistant: Suttida Satjasunsern
***

## Downloading Data
The built-in Python *urllib.request* module has functions which help in downloading content from HTTP URLs using minimal code.

In [16]:
import urllib.request
url = "https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/kmitl.txt"
response = urllib.request.urlopen(url)
text = response.read().decode()
# print(text)

In practice, we may often want to wrap code to fetch URLs in a try block, to handle the case where we cannot access the URL.

In [17]:
url = "https://somemissinglink.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/kmitl.txt"
try:
    response = urllib.request.urlopen(url)
    text = response.read().decode()
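except Exception:
    # catch Exception instead of using a bare except, so that Ctrl-C can still interrupt the cell
    print("Failed to retrieve %s" % url)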

Failed to retrieve https://somemissinglink.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/kmitl.txt
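
As a rough sketch of how the try block could be made more specific, the helper below catches the two exception types that *urllib* raises for failed requests and reports why the download failed. The function name `fetch_text` and the 10 second timeout are assumptions made for illustration; they are not part of the original notebook.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10):
    """Return the decoded body of url, or None if it cannot be retrieved."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # the server answered, but with an error status such as 404
        print("Server returned HTTP %d for %s" % (e.code, url))
    except urllib.error.URLError as e:
        # the server could not be reached at all (bad host name, no network, ...)
        print("Failed to reach %s (%s)" % (url, e.reason))
    return None
```

Calling `fetch_text(url)` with the missing link above would then return `None` instead of raising. `HTTPError` is listed first because it is a subclass of `URLError`.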


## Working with CSV

The CSV ("Comma Separated Values") file format is often used to exchange tabular data between different applications, like Excel. Essentially, a CSV file is a plain text file where the values on each line are separated by commas. Some files use a tab or a space as the separator instead.

We could download a CSV file using *urllib.request* and manually parse it...

In [18]:
# Download the CSV and store as a string
url = "https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/goal_scorers.csv"
response = urllib.request.urlopen(url)
raw_csv = response.read().decode()
# Parse each line
lines = raw_csv.split("\n")
for l in lines:
    l = l.strip()
    # skip empty lines, such as the one left over after the final newline
    if len(l) > 0:
        # split based on a comma separator
        parts = l.split(",")
        print(parts)

['Player', 'Team', 'Total Goals', 'Penalties', 'Home Goals', 'Away Goals']
['J Vardy', 'Leicester City', '19', '4', '11', '8']
['H Kane', 'Tottenham', '16', '4', '7', '9']
['R Lukaku', 'Everton', '16', '1', '8', '8']
['O Ighalo', 'Watford', '15', '0', '8', '7']
['S Aguero', 'Manchester City', '14', '1', '10', '4']
['R Mahrez', 'Leicester City', '14', '4', '4', '10']
['O Giroud', 'Arsenal', '12', '0', '4', '8']
['D Costa', 'Chelsea', '10', '0', '7', '3']
['J Defoe', 'Sunderland', '10', '0', '3', '7']
['G Wijnaldum', 'Newcastle Utd', '9', '0', '9', '0']
['T Deeney', 'Watford', '8', '5', '2', '6']
['R Barkley', 'Everton', '8', '2', '5', '3']
['A Ayew', 'Swansea City', '8', '0', '5', '3']
['G Sigurdsson', 'Swansea City', '7', '3', '2', '5']
['W Rooney', 'Manchester Utd', '7', '1', '3', '4']
['A Martial', 'Manchester Utd', '7', '0', '4', '3']
['D Alli', 'Tottenham', '7', '0', '1', '6']
['D Payet', 'West Ham Utd', '7', '0', '3', '4']
['M Arnautovic', 'Stoke City', '7', '2', '4', '3']
['Y Tou
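
Manual splitting works for this file, but it would break if a value itself contained a comma. The sketch below uses the built-in *csv* module, which understands quoted fields. The sample text is hypothetical (the team name with a comma was invented to show the problem).

```python
import csv
import io

# Hypothetical sample in which one quoted team name contains a comma
raw_csv = 'Player,Team,Total Goals\n"J Vardy","Leicester City, FC",19\nH Kane,Tottenham,16\n'

# Naive splitting cuts the quoted field into two pieces
naive = raw_csv.split("\n")[1].split(",")
print(len(naive), "fields from split()")

# csv.reader handles the quoting and returns three fields per row
rows = list(csv.reader(io.StringIO(raw_csv)))
for row in rows:
    print(row)
```

Note that `csv.reader` still returns every value as a string, so numeric columns need to be converted explicitly.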

But we can also use Pandas to directly download and parse CSV data for us, to create a Data Frame which is ready to analyse.

In [19]:
import pandas as pd
df = pd.read_csv("https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/goal_scorers.csv")
df

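| | Player | Team | Total Goals | Penalties | Home Goals | Away Goals |
|---|---|---|---|---|---|---|
| 0 | J Vardy | Leicester City | 19 | 4 | 11 | 8 |
| 1 | H Kane | Tottenham | 16 | 4 | 7 | 9 |
| 2 | R Lukaku | Everton | 16 | 1 | 8 | 8 |
| 3 | O Ighalo | Watford | 15 | 0 | 8 | 7 |
| 4 | S Aguero | Manchester City | 14 | 1 | 10 | 4 |
| 5 | R Mahrez | Leicester City | 14 | 4 | 4 | 10 |
| 6 | O Giroud | Arsenal | 12 | 0 | 4 | 8 |
| 7 | D Costa | Chelsea | 10 | 0 | 7 | 3 |
| 8 | J Defoe | Sunderland | 10 | 0 | 3 | 7 |
| 9 | G Wijnaldum | Newcastle Utd | 9 | 0 | 9 | 0 |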
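
The following sketch shows two things under simple assumptions: reading tab-separated data by passing `sep="\t"` to `read_csv`, and a first analysis step once the data is in a Data Frame. The inline sample reuses four rows from the goal scorers file, converted to tabs by hand for illustration.

```python
import io

import pandas as pd

# Four rows from the goal scorers data, written with tab separators
raw_tsv = (
    "Player\tTeam\tTotal Goals\n"
    "J Vardy\tLeicester City\t19\n"
    "H Kane\tTottenham\t16\n"
    "R Lukaku\tEverton\t16\n"
    "R Mahrez\tLeicester City\t14\n"
)

# sep tells read_csv which character separates the values
df = pd.read_csv(io.StringIO(raw_tsv), sep="\t")

# Total goals per team, highest first
goals_per_team = df.groupby("Team")["Total Goals"].sum().sort_values(ascending=False)
print(goals_per_team)
```

Unlike the manual parser, `read_csv` infers that `Total Goals` is numeric, so the sum works without any explicit conversion.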


## Working with JSON

[JSON](http://json.org/) is a lightweight format which is becoming increasingly popular for online data exchange. It was originally based on the JavaScript language and is (relatively) easy for humans to read and write.

The built-in module *json* provides an easy way to encode and decode data in JSON in Python.

In [20]:
import json

Let's try downloading and parsing a simple JSON file which contains information about a number of books, originally from librarything.com:

In [21]:
url = "https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/books.json"
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")

In [22]:
print(raw_json)
# this is the decoded text exactly as the server returned it

[{
	"book_id": "13585350",
	"title": "The World Treasury of Science Fiction",
	"ISBN": "",
	"year": 1989,
	"rating": 3,
	"language": "eng"
}, {
	"book_id": "124205572",
	"title": "The War of the Worlds",
	"ISBN": "1936594056",
	"year": 2013,
	"rating": 4,
	"language": "eng"
}, {
	"book_id": "127360065",
	"title": "Under the Dome: A Novel",
	"ISBN": "1439149038",
	"year": 2013,
	"rating": 2,
	"language": "eng"
}, {
	"book_id": "13908800",
	"title": "The Ultimate Hitchhiker's Guide to the Galaxy",
	"ISBN": "0345453743",
	"year": 2002,
	"rating": 5,
	"language": "eng"
}, {
	"book_id": "123734934",
	"title": "The Time Traveler's Wife",
	"ISBN": "1476764832",
	"year": 2014,
	"rating": 5,
	"language": "eng"
}, {
	"book_id": "13603020",
	"title": "Salem's Lot",
	"ISBN": "0451098277",
	"year": 1976,
	"rating": 3,
	"language": "eng"
}, {
	"book_id": "124173974",
	"title": "Republic",
	"ISBN": "039395501X",
	"year": 1985,
	"rating": 3,
	"language": "eng"
}, {
	"book_id": "123102859",
	"title": "

### Parsing JSON

We can now parse the JSON, converting it from a string into a useful Python data structure:

In [23]:
data = json.loads(raw_json)
print(data)

[{'book_id': '13585350', 'title': 'The World Treasury of Science Fiction', 'ISBN': '', 'year': 1989, 'rating': 3, 'language': 'eng'}, {'book_id': '124205572', 'title': 'The War of the Worlds', 'ISBN': '1936594056', 'year': 2013, 'rating': 4, 'language': 'eng'}, {'book_id': '127360065', 'title': 'Under the Dome: A Novel', 'ISBN': '1439149038', 'year': 2013, 'rating': 2, 'language': 'eng'}, {'book_id': '13908800', 'title': "The Ultimate Hitchhiker's Guide to the Galaxy", 'ISBN': '0345453743', 'year': 2002, 'rating': 5, 'language': 'eng'}, {'book_id': '123734934', 'title': "The Time Traveler's Wife", 'ISBN': '1476764832', 'year': 2014, 'rating': 5, 'language': 'eng'}, {'book_id': '13603020', 'title': "Salem's Lot", 'ISBN': '0451098277', 'year': 1976, 'rating': 3, 'language': 'eng'}, {'book_id': '124173974', 'title': 'Republic', 'ISBN': '039395501X', 'year': 1985, 'rating': 3, 'language': 'eng'}, {'book_id': '123102859', 'title': 'The Road', 'ISBN': '0307387895', 'year': 2006, 'rating': 5,
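
To illustrate both directions mentioned earlier (encoding and decoding), here is a small sketch that round-trips the first book record through `json.dumps` and `json.loads`. The `indent=2` argument is only there to make the encoded string easier to read.

```python
import json

book = {
    "book_id": "13585350",
    "title": "The World Treasury of Science Fiction",
    "ISBN": "",
    "year": 1989,
    "rating": 3,
    "language": "eng",
}

# Encode: Python dictionary -> JSON string
encoded = json.dumps(book, indent=2)
print(encoded)

# Decode: JSON string -> Python dictionary
decoded = json.loads(encoded)

# JSON numbers come back as int, JSON strings as str
print(type(decoded["year"]).__name__, type(decoded["book_id"]).__name__)
```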

We can now iterate through the books in the list and extract the relevant information that we require.

In [24]:
for book in data:
#     print( "%s = %d" % ( book["title"], book["year"] ) )
    print(book)

{'book_id': '13585350', 'title': 'The World Treasury of Science Fiction', 'ISBN': '', 'year': 1989, 'rating': 3, 'language': 'eng'}
{'book_id': '124205572', 'title': 'The War of the Worlds', 'ISBN': '1936594056', 'year': 2013, 'rating': 4, 'language': 'eng'}
{'book_id': '127360065', 'title': 'Under the Dome: A Novel', 'ISBN': '1439149038', 'year': 2013, 'rating': 2, 'language': 'eng'}
{'book_id': '13908800', 'title': "The Ultimate Hitchhiker's Guide to the Galaxy", 'ISBN': '0345453743', 'year': 2002, 'rating': 5, 'language': 'eng'}
{'book_id': '123734934', 'title': "The Time Traveler's Wife", 'ISBN': '1476764832', 'year': 2014, 'rating': 5, 'language': 'eng'}
{'book_id': '13603020', 'title': "Salem's Lot", 'ISBN': '0451098277', 'year': 1976, 'rating': 3, 'language': 'eng'}
{'book_id': '124173974', 'title': 'Republic', 'ISBN': '039395501X', 'year': 1985, 'rating': 3, 'language': 'eng'}
{'book_id': '123102859', 'title': 'The Road', 'ISBN': '0307387895', 'year': 2006, 'rating': 5, 'langua
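
Because the parsed data is an ordinary list of dictionaries, standard Python tools apply. As one possible illustration, the sketch below keeps only the books rated 4 or higher and orders them by year. The list is a four-book subset of the data above, and the threshold of 4 is an arbitrary choice.

```python
# Subset of the parsed book list shown above
data = [
    {"title": "The World Treasury of Science Fiction", "year": 1989, "rating": 3},
    {"title": "The War of the Worlds", "year": 2013, "rating": 4},
    {"title": "Under the Dome: A Novel", "year": 2013, "rating": 2},
    {"title": "The Ultimate Hitchhiker's Guide to the Galaxy", "year": 2002, "rating": 5},
]

# Keep the highly rated books, oldest first
top_rated = sorted(
    (book for book in data if book["rating"] >= 4),
    key=lambda book: book["year"],
)

for book in top_rated:
    print("%s (%d): rating %d" % (book["title"], book["year"], book["rating"]))
```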

We then use json_normalize in Pandas to create a Data Frame of semi-structured JSON data to make it ready to analyse.

In [25]:
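# json_normalize is a top-level function since pandas 1.0;
# importing it from pandas.io.json is deprecated
from pandas import json_normalize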

df = json_normalize(data)
df.head(5)

| | ISBN | book_id | language | rating | title | year |
|---|---|---|---|---|---|---|
| 0 | | 13585350 | eng | 3 | The World Treasury of Science Fiction | 1989 |
| 1 | 1936594056.0 | 124205572 | eng | 4 | The War of the Worlds | 2013 |
| 2 | 1439149038.0 | 127360065 | eng | 2 | Under the Dome: A Novel | 2013 |
| 3 | 345453743.0 | 13908800 | eng | 5 | The Ultimate Hitchhiker's Guide to the Galaxy | 2002 |
| 4 | 1476764832.0 | 123734934 | eng | 5 | The Time Traveler's Wife | 2014 |
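
The book records are flat, so the benefit of `json_normalize` is not obvious here. The sketch below uses records with a nested `publisher` object to show how nested keys become dotted column names. The `publisher` field and its values are invented for illustration, and the top-level `pd.json_normalize` call assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical records: the nested "publisher" object is not in books.json
records = [
    {"title": "The War of the Worlds", "year": 2013,
     "publisher": {"name": "Example Press", "city": "Bangkok"}},
    {"title": "Republic", "year": 1985,
     "publisher": {"name": "Sample House", "city": "London"}},
]

# Nested dictionaries are flattened into columns such as "publisher.name"
df = pd.json_normalize(records)
print(df.columns.tolist())
print(df)
```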


### Reading JSON directly with Pandas
Alternatively, we can also use Pandas to directly download and parse JSON data for us, to create a Data Frame which is ready to analyse.

In [26]:
import pandas as pd
link = "https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/books.json" 
df = pd.read_json( link, orient="records")
df.head(5)

| | ISBN | book_id | language | rating | title | year |
|---|---|---|---|---|---|---|
| 0 | | 13585350 | eng | 3 | The World Treasury of Science Fiction | 1989 |
| 1 | 1936594056.0 | 124205572 | eng | 4 | The War of the Worlds | 2013 |
| 2 | 1439149038.0 | 127360065 | eng | 2 | Under the Dome: A Novel | 2013 |
| 3 | 345453743.0 | 13908800 | eng | 5 | The Ultimate Hitchhiker's Guide to the Galaxy | 2002 |
| 4 | 1476764832.0 | 123734934 | eng | 5 | The Time Traveler's Wife | 2014 |


## Working with XML

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable. XML is a widely-adopted format. Python includes several built-in modules for parsing XML data.

The *xml.etree.ElementTree* module can be used to extract data from a simple XML file based on its tree structure. 

In [27]:
# download the content
url = "https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/books.xml"
response = urllib.request.urlopen(url)
raw_xml = response.read().decode()
print(raw_xml)

<?xml version="1.0" encoding="UTF-8"?>
<booklist>
   <book id="13585350">
      <title>The World Treasury of Science Fiction</title>
      <ISBN />
      <year>1989</year>
      <rating>3</rating>
      <language>eng</language>
   </book>
   <book id="124205572">
      <title>The War of the Worlds</title>
      <ISBN>1936594056</ISBN>
      <year>2013</year>
      <rating>4</rating>
      <language>eng</language>
   </book>
   <book id="127360065">
      <title>Under the Dome: A Novel</title>
      <ISBN>1439149038</ISBN>
      <year>2013</year>
      <rating>2</rating>
      <language>eng</language>
   </book>
   <book id="13908800">
      <title>The Ultimate Hitchhiker's Guide to the Galaxy</title>
      <ISBN>0345453743</ISBN>
      <year>2002</year>
      <rating>5</rating>
      <language>eng</language>
   </book>
   <book id="123734934">
      <title>The Time Traveler's Wife</title>
      <ISBN>1476764832</ISBN>
      <year>2014</year>
      <rating>5</rating>
      <language>eng

We can use the *xml.etree.ElementTree.fromstring()* function to parse content from a string containing XML data.

In [28]:
import xml.etree.ElementTree as et
xroot = et.fromstring(raw_xml)

An XML tree has a root node (i.e. the top level of the document), with child nodes at lower levels. We can iterate over these:

In [29]:
for child in xroot:
    # get the name of the tag, along with any XML attributes which the tag has
    print( child.tag, child.attrib )

book {'id': '13585350'}
book {'id': '124205572'}
book {'id': '127360065'}
book {'id': '13908800'}
book {'id': '123734934'}
book {'id': '13603020'}
book {'id': '124173974'}
book {'id': '123102859'}


We can also query to find tags with specific names, such as '<book>' and then in turn find child nodes of that tag with a specific name.

In [30]:
for book in xroot.findall("book"):
    # get the text inside a <title> tag, contained within a <book> tag
    title = book.find("title").text
    print(title)

The World Treasury of Science Fiction
The War of the Worlds
Under the Dome: A Novel
The Ultimate Hitchhiker's Guide to the Galaxy
The Time Traveler's Wife
Salem's Lot
Republic
The Road
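
Putting the pieces together, the sketch below turns each `<book>` element into a dictionary, reading the `id` attribute and the text of selected child tags. The XML is a two-book excerpt of the file above. It uses `findtext()`, which returns an empty string for an empty tag such as `<ISBN />`, and converts that to `None` as one possible way of marking a missing value.

```python
import xml.etree.ElementTree as et

# Two-book excerpt of books.xml
raw_xml = """<booklist>
   <book id="13585350">
      <title>The World Treasury of Science Fiction</title>
      <ISBN />
      <year>1989</year>
   </book>
   <book id="124205572">
      <title>The War of the Worlds</title>
      <ISBN>1936594056</ISBN>
      <year>2013</year>
   </book>
</booklist>"""

xroot = et.fromstring(raw_xml)

books = []
for book in xroot.findall("book"):
    books.append({
        "id": book.get("id"),                    # XML attribute of <book>
        "title": book.findtext("title"),         # text of the <title> child
        "ISBN": book.findtext("ISBN") or None,   # empty tag -> None
        "year": int(book.findtext("year")),      # element text is always a string
    })

for b in books:
    print(b)
```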


We can also load the parsed XML into a Pandas Data Frame, which is then ready to analyse.

In [31]:
df_cols = ["id", "title", "ISBN", "year", "rating", "language"]
rows = []

for node in xroot:
    # the id is an attribute of <book>; the other fields are child elements
    s_id = node.attrib.get("id")
    s_title = node.find("title").text
    s_isbn = node.find("ISBN").text  # None for an empty <ISBN /> tag
    s_year = node.find("year").text
    s_rating = node.find("rating").text
    s_language = node.find("language").text

    rows.append([s_id, s_title, s_isbn, s_year, s_rating, s_language])

# build the Data Frame in one step; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=df_cols)
df

| | id | title | ISBN | year | rating | language |
|---|---|---|---|---|---|---|
| 0 | 13585350 | The World Treasury of Science Fiction | | 1989 | 3 | eng |
| 1 | 124205572 | The War of the Worlds | 1936594056 | 2013 | 4 | eng |
| 2 | 127360065 | Under the Dome: A Novel | 1439149038 | 2013 | 2 | eng |
| 3 | 13908800 | The Ultimate Hitchhiker's Guide to the Galaxy | 0345453743 | 2002 | 5 | eng |
| 4 | 123734934 | The Time Traveler's Wife | 1476764832 | 2014 | 5 | eng |
| 5 | 13603020 | Salem's Lot | 0451098277 | 1976 | 3 | eng |
| 6 | 124173974 | Republic | 039395501X | 1985 | 3 | eng |
| 7 | 123102859 | The Road | 0307387895 | 2006 | 5 | eng |
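
One caveat is that every value read from XML is a string, so columns such as `year` and `rating` cannot be used in calculations until they are converted. The sketch below shows one way to do that with `astype(int)` on a three-book excerpt; which columns to convert is a choice made here for illustration.

```python
import xml.etree.ElementTree as et

import pandas as pd

# Three-book excerpt of books.xml, reduced to the fields used below
raw_xml = """<booklist>
   <book id="13585350"><title>The World Treasury of Science Fiction</title><year>1989</year><rating>3</rating></book>
   <book id="124205572"><title>The War of the Worlds</title><year>2013</year><rating>4</rating></book>
   <book id="13908800"><title>The Ultimate Hitchhiker's Guide to the Galaxy</title><year>2002</year><rating>5</rating></book>
</booklist>"""

rows = [
    {
        "id": node.get("id"),
        "title": node.findtext("title"),
        "year": node.findtext("year"),
        "rating": node.findtext("rating"),
    }
    for node in et.fromstring(raw_xml)
]
df = pd.DataFrame(rows)

# Convert the numeric columns from strings to integers
for col in ["year", "rating"]:
    df[col] = df[col].astype(int)

print(df.dtypes)
print("Mean rating:", df["rating"].mean())
```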


## Working with HTML

[HyperText Markup Language (HTML)](https://en.wikipedia.org/wiki/HTML) is the language in which web pages are written. HTML isn’t a programming language like Python; instead, it’s a markup language that tells a browser how to lay out content. HTML lets you do things similar to what you would do in a word processor like Microsoft Word: make text bold, create paragraphs, and so on. Because HTML isn’t a programming language, it isn’t nearly as complex as Python.

The built-in Python urllib.request module has functions which help in downloading content from HTTP URLs using minimal code:

In [32]:
import urllib.request
link = "https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/sample_web/sample.html" 
response = urllib.request.urlopen(link)
html = response.read().decode()

We can simply use a for loop to read the HTML line by line and see its structure.

In [33]:
lines = html.strip().split("\n")
for l in lines:
    print(l)

<html>
  <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title> KMITL Faculty/College</title>
  <link href="./css/sample.css" rel="stylesheet" type="text/css">
</head>
<body>
  <div class="topbar">
    <div id="top-logo2f">
      <img src="http://kmitl.ac.th/frontend/images/logo.png" alt="">
    </div>
    <div id="top-text2f">
      <h1>King Mongkut's Institute of Technology Ladkrabang</h1>
    </div>
  </div>

  <div id="main" style="margin: 20px;">
    <h1>Faculty/College in KMITL</h1>
    <div id="content">
        <h3><a href="http://www.ceir.kmitl.ac.th">College of Educational Innovation Research</a></h3>
        <h3><a href="http://www.music-engineering.kmitl.ac.th/">Institute of Music Science and Engineering</a></h3>
        <h3><a href="http://www.kosen.kmitl.ac.th/">KOSEN-KMITL</a></h3>
        <h3><a href="http://www.chumphon.kmitl.ac.th">KMITL Prince of Chumphon Campus </a></h3>
        <h3><a href="http://engineer.kmitl.ac.th/">Faculty of Engin

### The requests library

The first thing we’ll need to do to scrape a web page is to download the page. We can download pages using the Python [requests](https://2.python-requests.org//en/master/) library. The requests library will make a `GET` request to a web server, which will download the HTML contents of a given web page for us. There are several different types of requests we can make using `requests`, of which `GET` is just one. If you want to learn more, check out this [tutorial on using API requests in Python](https://www.dataquest.io/blog/python-api-tutorial/).

Let’s try downloading a simple sample website, [https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/sample_web/sample.html](https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/sample_web/sample.html). We’ll need to first download it using the requests.get method.

In [34]:
import requests
page = requests.get("https://www.it.kmitl.ac.th/~teerapong/resources/ds4biz/week4/sample_web/sample.html")
page

<Response [200]>

After running our request, we get a [Response](https://2.python-requests.org//en/master/user/quickstart/#response-content) object. This object has a `status_code` property, which indicates if the page was downloaded successfully:

In [35]:
page.status_code

200

A `status_code` of `200` means that the page downloaded successfully. We won’t fully dive into status codes here, but a status code starting with a `2` generally indicates success, and a code starting with a `4` or a `5` indicates an error.
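
The rule of thumb above can be written down as a small helper. This is only a sketch: `describe_status` is a hypothetical function, and it relies on the standard library's `http.HTTPStatus` for the official reason phrases rather than on anything in `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Classify an HTTP status code by its first digit (hypothetical helper)."""
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "informational or unknown"

    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # not one of the registered status codes
        phrase = "Unknown"
    return "%d %s (%s)" % (code, phrase, category)


for code in (200, 404, 500):
    print(describe_status(code))
```

When working with `requests` itself, calling `page.raise_for_status()` raises an exception for `4xx` and `5xx` responses, which is often simpler than checking the code by hand.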

We can print out the HTML content of the page using the `content` property, which holds the raw bytes of the response:

In [36]:
page.content

b'<html>\n  <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n  <title> KMITL Faculty/College</title>\n  <link href="./css/sample.css" rel="stylesheet" type="text/css">\n</head>\n<body>\n  <div class="topbar">\n    <div id="top-logo2f">\n      <img src="http://kmitl.ac.th/frontend/images/logo.png" alt="">\n    </div>\n    <div id="top-text2f">\n      <h1>King Mongkut\'s Institute of Technology Ladkrabang</h1>\n    </div>\n  </div>\n\n  <div id="main" style="margin: 20px;">\n    <h1>Faculty/College in KMITL</h1>\n    <div id="content">\n        <h3><a href="http://www.ceir.kmitl.ac.th">College of Educational Innovation Research</a></h3>\n        <h3><a href="http://www.music-engineering.kmitl.ac.th/">Institute of Music Science and Engineering</a></h3>\n        <h3><a href="http://www.kosen.kmitl.ac.th/">KOSEN-KMITL</a></h3>\n        <h3><a href="http://www.chumphon.kmitl.ac.th">KMITL Prince of Chumphon Campus </a></h3>\n        <h3><a href="http://engineer.kmitl

### Parsing a page with BeautifulSoup

As you can see above, we now have downloaded an HTML document.

We can use the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) library to parse this document and extract the `h3` tags. We first have to import the library and create an instance of the `BeautifulSoup` class to parse our document; `find_all` then returns every matching tag:

In [37]:
import bs4 
soup = bs4.BeautifulSoup(page.content, 'html.parser')

for match in soup.find_all("h3"):
    text = match
    print(text)

<h3><a href="http://www.ceir.kmitl.ac.th">College of Educational Innovation Research</a></h3>
<h3><a href="http://www.music-engineering.kmitl.ac.th/">Institute of Music Science and Engineering</a></h3>
<h3><a href="http://www.kosen.kmitl.ac.th/">KOSEN-KMITL</a></h3>
<h3><a href="http://www.chumphon.kmitl.ac.th">KMITL Prince of Chumphon Campus </a></h3>
<h3><a href="http://engineer.kmitl.ac.th/">Faculty of Engineering</a></h3>
<h3><a href="http://www.arch.kmitl.ac.th/">Faculty of Architecture</a></h3>
<h3><a href="http://www.science.kmitl.ac.th/main.php">Faculty of Science</a></h3>
<h3><a href="http://www.ietech.kmitl.ac.th">Faculty of Industrial Education and Technology</a></h3>
<h3><a href="http://www.agri.kmitl.ac.th/AgriTH/">Faculty of Agricultural Technology</a></h3>
<h3><a href="http://www.it.kmitl.ac.th/">Faculty of Information Technology</a></h3>
<h3><a href="http://www.agroind.kmitl.ac.th/th/home-agro">Faculty of Agro-Industry</a></h3>
<h3><a href="http://www.fam.kmitl.ac.th/">
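
The loop above prints each `h3` tag in full. To get just the link text and the target URL, one option is to combine `get_text()` with a lookup of the `href` attribute, as in the sketch below. It runs on a two-line excerpt of the sample page so that it does not depend on the download.

```python
import bs4

# Two-line excerpt of the sample page
html = """
<div id="content">
  <h3><a href="http://www.ceir.kmitl.ac.th">College of Educational Innovation Research</a></h3>
  <h3><a href="http://www.kosen.kmitl.ac.th/">KOSEN-KMITL</a></h3>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

links = []
for h3 in soup.find_all("h3"):
    a = h3.find("a")
    if a is not None:
        # get_text() drops the markup; get("href") reads the attribute
        links.append((a.get_text(strip=True), a.get("href")))

for name, href in links:
    print(name, "->", href)
```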

## Working with APIs

### Example - Wikipedia

As a simple example of using an online API, we will retrieve JSON data from the Wikipedia web API. The Wikipedia page for 'KMITL' is [here](https://en.wikipedia.org/wiki/King_Mongkut%27s_Institute_of_Technology_Ladkrabang). We can retrieve this data in a cleaner JSON format from the Wikipedia API endpoint (https://en.wikipedia.org/w/api.php).

In [38]:
title = "King_Mongkut%27s_Institute_of_Technology_Ladkrabang"
url = "https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=true&titles=" + title
print(url)

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=true&titles=King_Mongkut%27s_Institute_of_Technology_Ladkrabang
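
Instead of concatenating strings and escaping the title by hand, the query string can also be built with `urllib.parse.urlencode`, which percent-encodes special characters such as the apostrophe. This is a sketch of an alternative, using the same parameters as the URL above.

```python
from urllib.parse import urlencode

params = {
    "format": "json",
    "action": "query",
    "prop": "extracts",
    "exintro": "true",
    # plain apostrophe here; urlencode turns it into %27
    "titles": "King_Mongkut's_Institute_of_Technology_Ladkrabang",
}

url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```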


In [39]:
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")

Once we have downloaded the JSON data into a string, we parse it using the *loads()* function, which will convert it into an actual Python dictionary.

In [40]:
import json

In [41]:
data = json.loads(raw_json)
data

{'batchcomplete': '',
 'query': {'normalized': [{'from': "King_Mongkut's_Institute_of_Technology_Ladkrabang",
    'to': "King Mongkut's Institute of Technology Ladkrabang"}],
  'pages': {'1232312': {'pageid': 1232312,
    'ns': 0,
    'title': "King Mongkut's Institute of Technology Ladkrabang",
    'extract': "<p><b>King Mongkut's Institute of Technology Ladkrabang</b> (<b>KMITL</b> or <b>KMIT Ladkrabang</b> for short) is a research and educational institution in Thailand. It is in the city of It is approximately 30 km east of the center of Bangkok in Lat Krabang District, Bangkok, Thailand, and has seven faculties: engineering, architecture, agricultural technology, science, industrial education, agricultural industry, information technology, and liberal arts.\n</p>\n\n\n"}}}}

The response still needs to be inspected. Note that the results we want are in *data["query"]["pages"]*:

In [42]:
print(data["query"]["pages"])

{'1232312': {'pageid': 1232312, 'ns': 0, 'title': "King Mongkut's Institute of Technology Ladkrabang", 'extract': "<p><b>King Mongkut's Institute of Technology Ladkrabang</b> (<b>KMITL</b> or <b>KMIT Ladkrabang</b> for short) is a research and educational institution in Thailand. It is in the city of It is approximately 30 km east of the center of Bangkok in Lat Krabang District, Bangkok, Thailand, and has seven faculties: engineering, architecture, agricultural technology, science, industrial education, agricultural industry, information technology, and liberal arts.\n</p>\n\n\n"}}


In [43]:
result = data["query"]["pages"]["1232312"]
print(result["title"])
print(result["extract"])

King Mongkut's Institute of Technology Ladkrabang
<p><b>King Mongkut's Institute of Technology Ladkrabang</b> (<b>KMITL</b> or <b>KMIT Ladkrabang</b> for short) is a research and educational institution in Thailand. It is in the city of It is approximately 30 km east of the center of Bangkok in Lat Krabang District, Bangkok, Thailand, and has seven faculties: engineering, architecture, agricultural technology, science, industrial education, agricultural industry, information technology, and liberal arts.
</p>
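
The page ID `"1232312"` is specific to this article, so hard-coding it will not work for other titles. A more general sketch is to loop over whatever pages the API returned. The example also strips the HTML tags from the extract with a simple regular expression, which is a rough approach that is adequate for short snippets like this one. The dictionary is a shortened copy of the response above.

```python
import re

# Shortened copy of the parsed API response
data = {
    "query": {
        "pages": {
            "1232312": {
                "pageid": 1232312,
                "title": "King Mongkut's Institute of Technology Ladkrabang",
                "extract": "<p><b>King Mongkut's Institute of Technology Ladkrabang</b> "
                           "(<b>KMITL</b> or <b>KMIT Ladkrabang</b> for short) is a research "
                           "and educational institution in Thailand.</p>",
            }
        }
    }
}

# Iterate over the pages without knowing their IDs in advance
for page_id, page in data["query"]["pages"].items():
    # remove anything that looks like an HTML tag
    plain_text = re.sub(r"<[^>]+>", "", page["extract"]).strip()
    print(page_id, page["title"])
    print(plain_text)
```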





### Example - Currency Exchange Rates

In the next example, we will use the *Fixer.io* API to get currency exchange rate information: http://fixer.io

For API documentation: https://fixer.io/documentation

To retrieve the latest rates for all currencies relative to the Euro (EUR), we first define our access key and then request the following URL:

In [44]:
ACCESS_KEY = "0c9904dea3d2c46b78686bc16bbba722"

In [45]:
url = "http://data.fixer.io/api/latest?access_key=" + ACCESS_KEY
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")
print(raw_json)

{"success":true,"timestamp":1573386246,"base":"EUR","date":"2019-11-10","rates":{"AED":4.047045,"AFN":86.223727,"ALL":123.086065,"AMD":525.791674,"ANG":1.911029,"AOA":508.048637,"ARS":65.568799,"AUD":1.606612,"AWG":1.983411,"AZN":1.877596,"BAM":1.953322,"BBD":2.223994,"BDT":93.344588,"BGN":1.955838,"BHD":0.415387,"BIF":2061.645515,"BMD":1.101895,"BND":1.497187,"BOB":7.61675,"BRL":4.587744,"BSD":1.101461,"BTC":0.000124,"BTN":78.554208,"BWP":12.011802,"BYN":2.249481,"BYR":21597.141684,"BZD":2.220199,"CAD":1.457532,"CDF":1834.655547,"CHF":1.098445,"CLF":0.02991,"CLP":825.319741,"CNY":7.708902,"COP":3679.227351,"CRC":644.667506,"CUC":1.101895,"CUP":29.200217,"CVE":110.524361,"CZK":25.484522,"DJF":195.82921,"DKK":7.472556,"DOP":58.422905,"DZD":132.372794,"EGP":17.796,"ERN":16.528819,"ETB":32.950931,"EUR":1,"FJD":2.406583,"FKP":0.895709,"GBP":0.86254,"GEL":3.26204,"GGP":0.8625,"GHS":6.088014,"GIP":0.895709,"GMD":56.439491,"GNF":10181.510043,"GTQ":8.481216,"GYD":230.441976,"HKD":8.625087,"HNL

Next, we parse the JSON data:

In [46]:
data = json.loads(raw_json)
# List all the rates
data

{'success': True,
 'timestamp': 1573386246,
 'base': 'EUR',
 'date': '2019-11-10',
 'rates': {'AED': 4.047045,
  'AFN': 86.223727,
  'ALL': 123.086065,
  'AMD': 525.791674,
  'ANG': 1.911029,
  'AOA': 508.048637,
  'ARS': 65.568799,
  'AUD': 1.606612,
  'AWG': 1.983411,
  'AZN': 1.877596,
  'BAM': 1.953322,
  'BBD': 2.223994,
  'BDT': 93.344588,
  'BGN': 1.955838,
  'BHD': 0.415387,
  'BIF': 2061.645515,
  'BMD': 1.101895,
  'BND': 1.497187,
  'BOB': 7.61675,
  'BRL': 4.587744,
  'BSD': 1.101461,
  'BTC': 0.000124,
  'BTN': 78.554208,
  'BWP': 12.011802,
  'BYN': 2.249481,
  'BYR': 21597.141684,
  'BZD': 2.220199,
  'CAD': 1.457532,
  'CDF': 1834.655547,
  'CHF': 1.098445,
  'CLF': 0.02991,
  'CLP': 825.319741,
  'CNY': 7.708902,
  'COP': 3679.227351,
  'CRC': 644.667506,
  'CUC': 1.101895,
  'CUP': 29.200217,
  'CVE': 110.524361,
  'CZK': 25.484522,
  'DJF': 195.82921,
  'DKK': 7.472556,
  'DOP': 58.422905,
  'DZD': 132.372794,
  'EGP': 17.796,
  'ERN': 16.528819,
  'ETB': 32.950931,


In [47]:
# Get a specific rate
data["rates"]["CHF"]

1.098445
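
Since every rate is quoted against the same base (EUR), a rate between any two other currencies can be derived by dividing one rate by the other. The sketch below applies this to three of the rates shown above; `convert` is a hypothetical helper, not part of the Fixer.io API.

```python
# Three of the EUR-based rates from the response above
rates = {"EUR": 1, "USD": 1.101895, "CHF": 1.098445}


def convert(amount, source, target, rates):
    """Convert amount between two currencies quoted against the same base."""
    # amount / rates[source] gives the value in the base currency,
    # multiplying by rates[target] then gives the value in the target currency
    return amount * rates[target] / rates[source]


print(convert(100, "EUR", "USD", rates))
print(convert(100, "CHF", "USD", rates))
```

The results are only as current as the `timestamp` of the response from which the rates were taken.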

We can add the `symbols` parameter to the URL to restrict the response to specific currencies, such as US Dollars (USD). The base currency remains EUR:

In [48]:
url = "http://data.fixer.io/api/latest?access_key=" + ACCESS_KEY + "&symbols=USD"
print(url)
# Retrieve the JSON
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")
# Parse the JSON
data = json.loads(raw_json)
# Display the rates data for US dollars
data["rates"]

http://data.fixer.io/api/latest?access_key=0c9904dea3d2c46b78686bc16bbba722&symbols=USD


{'USD': 1.101895}

In [49]:
data

{'success': True,
 'timestamp': 1573386246,
 'base': 'EUR',
 'date': '2019-11-10',
 'rates': {'USD': 1.101895}}

In [50]:
df = json_normalize(data)
df

| | base | date | rates.USD | success | timestamp |
|---|---|---|---|---|---|
| 0 | EUR | 2019-11-10 | 1.101895 | True | 1573386246 |
