---
# INTERMEDIATE PYTHON PROGRAMMING
# CHAPTER 3 - Workflow, Open Data Landscape & Processing JSON Data
---


# WHAT IS DATA SCIENCE WORKFLOW?

A workflow is the preferred step-by-step procedure for completing a complex data science task.

![R Data Science Workflow](https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png)

Some common steps in data science include: **exploring data**; **wrangling data**; **visualizing data** and **modeling** (for machine learning).

These steps are NOT necessarily carried out in a linear order; they are usually iterative. Different companies or teams may also follow quite different steps to suit their own productivity and efficiency needs.


# COMMON DATA FORMATS

![](https://www.researchgate.net/publication/236860222/figure/fig4/AS:669293989593114@1536583530877/Unstructured-semi-structured-and-structured-data.png)

## Structured Data Formats
- **CSV (Comma-Separated Values)** – Stores tabular data in plain text, with values separated by commas.
- **Excel** – A commonly used data source in data science projects because of its accessibility, familiarity, and ability to handle structured tabular data.
- **SQL Databases** – Relational database storage where data is organized into tables with relationships.

## Semi-Structured Data Formats
- **JSON (JavaScript Object Notation)** – A lightweight, human-readable format often used in APIs and web applications.
- **XML (eXtensible Markup Language)** – A structured format used for storing and transporting data.
- **YAML (YAML Ain't Markup Language, originally Yet Another Markup Language)** – A human-friendly format commonly used for configuration files.


## Unstructured Formats
- **Binary Formats** – Includes images (PNG, JPEG), videos (MP4), and other non-text data types.
- **TXT Files** – Simple unformatted text files for storing raw data.

Each format has specific advantages. CSV (structured) is simple and widely supported, while JSON (semi-structured) is flexible and commonly used in web apps.
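To make the difference between a structured and a semi-structured format concrete, the sketch below parses the same small record from CSV text and from JSON text using only the Python standard library. The sample data and variable names are made up for illustration.

```python
import csv
import json
from io import StringIO

# Made-up sample data: one customer with two phone numbers.
csv_text = "name,age,tel\nPeter,18,111;222\n"
json_text = '{"name": "Peter", "age": 18, "tel": ["111", "222"]}'

# CSV: every row is flat and every value arrives as a string.
csv_rows = list(csv.DictReader(StringIO(csv_text)))
print(csv_rows[0]["age"])    # a string
print(csv_rows[0]["tel"])    # one string; splitting it is up to us

# JSON: numbers and nested lists keep their types and structure.
record = json.loads(json_text)
print(record["age"])         # an integer
print(record["tel"])         # a real list of two strings
```

Note that the CSV version has to squeeze both phone numbers into one cell, while the JSON version can hold them as a proper list.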

# OPEN DATA LANDSCAPE
- Open data refers to publicly available data that can be freely accessed, used, and shared by anyone, without restrictions or costly barriers.
- It is typically structured (or at least semi-structured) in a way that enables easy analysis.
- It is often released by governments, organizations, or researchers to promote transparency, innovation, and collaboration.
- Open data can cover various domains, including government statistics, scientific research, environmental data, and more.
- Examples of open data sources include open government data portals, weather datasets, and publicly accessible geospatial information.

# OPEN DATA SOURCES
- [Hong Kong Government Open Data](https://data.gov.hk/en/)
- [US Government Open Data](https://data.gov/)
- [UK Government Open Data](https://www.data.gov.uk/)
- [EU Open Data](https://data.europa.eu/en)
- [World Bank](https://data.worldbank.org/)
- [Kaggle](https://www.kaggle.com/datasets)
- [Harvard Dataverse](https://dataverse.harvard.edu/)

# READING LIVE CSV DATA DIRECTLY FROM THE WEB

In this section, we will write code to read **live** data (the latest version) from the Hong Kong Government Open Data portal.

Search for **ugc** (University Grants Committee) on the Government Open Data website.

[Click here to search dataset from ugc](https://data.gov.hk/en-datasets/search/ugc)
[![search ugc](./images/data-gov-hk-search-ugc.png)](https://data.gov.hk/en-datasets/search/ugc)

Import `requests`, `pandas`, and `StringIO`  
```
import requests
import pandas as pd
from io import StringIO
```

Declare the URL. URL stands for Uniform Resource Locator; in practice it is the address of a web resource   
```
url = 'https://res.data.gov.hk/api/get-download-file?name=https%3A%2F%2Fwww.ugcs.gov.hk%2Fdatagovhk%2FGraduates2(Eng).csv'
```

Create an HTTP request and store the response
```
response = requests.get(url)
```

Use the pandas `read_csv()` function to read `response.text`. Because `read_csv()` expects a file path, a URL, or a file-like object rather than the CSV text itself, we first wrap the text in `StringIO()`. Compared with downloading the file in advance, this approach always gets the latest copy of the data whenever we run the code.
```
ugra = pd.read_csv(StringIO(response.text))
```

Explore the DataFrame of university graduates using pandas attributes and functions
- `shape` attribute: `ugra.shape`
- `columns` attribute: `ugra.columns`
- `head()` function: `ugra.head()`
- `tail()` function: `ugra.tail()`
- `info()` function: `ugra.info()`
- `describe()` function: `ugra.describe()`
- `unique()` function: `ugra['Academic Year'].unique()`
- filtering rows using `[ ]` operator: `ugra[ugra['Academic Year']=='2023/24']`
- more complex filtering: `ugra[(ugra['Academic Year']=='2023/24') & (ugra['Level of Study']=='Taught Postgraduate')]`
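The same steps can be tried without a network connection. The sketch below is a minimal illustration that assumes a tiny, made-up table using some of the column names of the graduates dataset; the numbers are not real figures.

```python
import pandas as pd
from io import StringIO

# Made-up miniature of the graduates dataset (numbers are not real).
sample_csv = """Academic Year,Level of Study,Sex,Number of Graduates (Headcount)
2022/23,Undergraduate,Male,100
2022/23,Undergraduate,Female,120
2023/24,Undergraduate,Male,110
2023/24,Taught Postgraduate,Female,90
"""

# StringIO wraps the text so read_csv can treat it like a file.
sample = pd.read_csv(StringIO(sample_csv))

# Boolean mask: keep only the rows for one academic year.
latest = sample[sample["Academic Year"] == "2023/24"]

# Combine conditions with & and wrap each one in parentheses.
latest_ug = sample[(sample["Academic Year"] == "2023/24")
                   & (sample["Level of Study"] == "Undergraduate")]

# Total headcount per academic year.
totals = sample.groupby("Academic Year")["Number of Graduates (Headcount)"].sum()
print(totals)
```

The parentheses around each condition matter: `&` binds more tightly than `==`, so leaving them out leads to an error.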


In [44]:
import requests
import pandas as pd
from io import StringIO

In [46]:
url = "https://res.data.gov.hk/api/get-download-file?name=https%3A%2F%2Fwww.ugcs.gov.hk%2Fdatagovhk%2FGraduates2(Eng).csv"
url

'https://res.data.gov.hk/api/get-download-file?name=https%3A%2F%2Fwww.ugcs.gov.hk%2Fdatagovhk%2FGraduates2(Eng).csv'

In [48]:
response = requests.get(url)

In [49]:
ugra = pd.read_csv(StringIO(response.text))

In [50]:
ugra.head(2)

Unnamed: 0,Academic Year,Level of Study,Broad Academic Programme Category,Sex,Number of Graduates (Headcount)
0,2009/10,Sub-degree,Arts and Humanities,Male,158
1,2009/10,Sub-degree,Arts and Humanities,Female,475


In [54]:
ugra.shape

(803, 5)

In [56]:
ugra.columns

Index(['Academic Year', 'Level of Study', 'Broad Academic Programme Category',
       'Sex', 'Number of Graduates (Headcount)'],
      dtype='object')

In [58]:
ugra.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 803 entries, 0 to 802
Data columns (total 5 columns):
 #   Column                             Non-Null Count  Dtype 
---  ------                             --------------  ----- 
 0   Academic Year                      803 non-null    object
 1   Level of Study                     803 non-null    object
 2   Broad Academic Programme Category  803 non-null    object
 3   Sex                                803 non-null    object
 4   Number of Graduates (Headcount)    803 non-null    int64 
dtypes: int64(1), object(4)
memory usage: 31.5+ KB


In [60]:
ugra.describe()

Unnamed: 0,Number of Graduates (Headcount)
count,803.0
mean,508.03736
std,693.152892
min,1.0
25%,62.0
50%,160.0
75%,729.5
max,2878.0


In [62]:
ugra['Academic Year'].unique()

array(['2009/10', '2010/11', '2011/12', '2012/13', '2013/14', '2014/15',
       '2015/16', '2016/17', '2017/18', '2018/19', '2019/20', '2020/21',
       '2021/22', '2022/23', '2023/24'], dtype=object)

In [64]:
ugra['Level of Study'].unique()

array(['Sub-degree', 'Undergraduate', 'Taught Postgraduate',
       'Research Postgraduate'], dtype=object)

In [66]:
ugra['Broad Academic Programme Category'].unique()

array(['Arts and Humanities', 'Business and Management', 'Education',
       'Engineering and Technology', 'Medicine, Dentistry and Health',
       'Sciences', 'Social Sciences'], dtype=object)

In [68]:
ugra['Sex'].unique()

array(['Male', 'Female'], dtype=object)

In [72]:
ugra[ugra['Academic Year']=='2023/24']

Unnamed: 0,Academic Year,Level of Study,Broad Academic Programme Category,Sex,Number of Graduates (Headcount)
753,2023/24,Sub-degree,Arts and Humanities,Male,11
754,2023/24,Sub-degree,Arts and Humanities,Female,42
755,2023/24,Sub-degree,Education,Male,538
756,2023/24,Sub-degree,Education,Female,1191
757,2023/24,Sub-degree,Engineering and Technology,Male,65
758,2023/24,Sub-degree,Engineering and Technology,Female,13
759,2023/24,Sub-degree,Sciences,Male,63
760,2023/24,Sub-degree,Sciences,Female,89
761,2023/24,Sub-degree,Social Sciences,Male,6
762,2023/24,Sub-degree,Social Sciences,Female,16


In [74]:
ugra[(ugra['Academic Year']=='2023/24') & (ugra['Level of Study']=='Taught Postgraduate')]

Unnamed: 0,Academic Year,Level of Study,Broad Academic Programme Category,Sex,Number of Graduates (Headcount)
777,2023/24,Taught Postgraduate,Arts and Humanities,Male,19
778,2023/24,Taught Postgraduate,Arts and Humanities,Female,38
779,2023/24,Taught Postgraduate,Education,Male,350
780,2023/24,Taught Postgraduate,Education,Female,647
781,2023/24,Taught Postgraduate,Engineering and Technology,Male,63
782,2023/24,Taught Postgraduate,Engineering and Technology,Female,80
783,2023/24,Taught Postgraduate,"Medicine, Dentistry and Health",Male,22
784,2023/24,Taught Postgraduate,"Medicine, Dentistry and Health",Female,77
785,2023/24,Taught Postgraduate,Sciences,Male,13
786,2023/24,Taught Postgraduate,Sciences,Female,26


# INTRODUCTION TO JSON FORMAT
JSON (**J**ava**S**cript **O**bject **N**otation) is a lightweight data format used to store and exchange structured information in a human-readable way.  

It is widely used in web applications, APIs, and data storage because of its simplicity and compatibility with most programming languages.

## Basic JSON Structure
JSON is plain text, but it describes **complex** data. Unlike a regular Python variable that holds a single simple value (e.g. `age = 20`), a JSON document is usually NOT a single data point. Instead, it is a collection of data wrapped together, known as an **object** (which Python loads as a **dictionary**).

A JSON object describes data as `key`-`value` pairs and supports various data types such as **strings**, **numbers**, **arrays**, and **objects**. Yes, a JSON object can contain another JSON object, and therefore JSON is a **nested data structure**. This is why JSON is categorized as **semi-structured data**, while CSV and Excel files are usually **strictly two-dimensional** (**rows** with a fixed number of **columns** in each row, widely known as **tabular data**).

A `customer.json` JSON text file (you can find this file in our **data** sub-folder)
![JSON Example](./images/customer-json.png)

In Jupyter Lab, if you double-click the file to open it, you get a prettified preview like the one below.
![Prettified JSON](./images/json-jupyter-lab-preview.png)


If you wish to open it in source-code mode, **right-click** the JSON file and choose **Open with -> Editor**. Jupyter Lab then opens the file in a source-code editor like the one below.
![Raw JSON](./images/json-source.png)

**Reminder**: JSON files are tricky to edit.   

JSON has a very strict syntax: each symbol (or pair of symbols) such as `{ }`, `[ ]`, `,`, `:`, and `" "` is part of the syntax and has its own distinct meaning.  

They cannot be used interchangeably.
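How strict the syntax is can be checked with Python's `json` module, which raises `json.JSONDecodeError` when the text is not valid JSON. The broken samples below are made up for illustration; the exact error message may differ between Python versions.

```python
import json

valid = '{"firstName": "Peter", "age": 18}'
broken_samples = {
    "single quotes": "{'firstName': 'Peter'}",
    "trailing comma": '{"firstName": "Peter",}',
    "missing colon": '{"firstName" "Peter"}',
}

print(json.loads(valid))  # parses into a Python dict

errors = {}
for label, text in broken_samples.items():
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.msg describes the problem; exc.pos is the character offset.
        errors[label] = exc.msg
        print(f"{label}: {exc.msg} (position {exc.pos})")
```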

## JSON Object
An object is an **unordered** set of name/value pairs.  

An object begins with `{` (left brace) and ends with `}` (right brace).  

Each **name** is followed by `:` (colon, to separate the name from its value). 

Name/value pairs are separated by `,` (comma); there is no comma after the last pair. 

![JSON Object](https://www.json.org/img/object.png)

## JSON Array

An array is an **ordered** collection of values.  

An array begins with `[` (left bracket) and ends with `]` (right bracket). 

Values are separated by `,` comma.

An array can contain a collection of simple **values** or a collection of **objects**.

![JSON Array](https://www.json.org/img/array.png)

## JSON Value

A value can be a `string` (wrapped by a pair of **double quotes**, e.g.: `"string"`), or a `number`, or `true` or `false` or `null`, or an `object` or an `array`. 

These structures can be nested.

![JSON Value](https://www.json.org/img/value.png)
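When JSON text is parsed with `json.loads()`, each kind of JSON value becomes a Python type: string becomes `str`, number becomes `int` or `float`, `true`/`false` become `bool`, `null` becomes `None`, array becomes `list`, and object becomes `dict`. The short sketch below shows this on a made-up record.

```python
import json

text = """
{
    "name": "Peter",
    "age": 18,
    "height": 1.75,
    "member": true,
    "nickname": null,
    "tel": ["111", "222"],
    "address": {"district": "Southern District"}
}
"""

data = json.loads(text)

# Show which Python type each JSON value became.
for key, value in data.items():
    print(f"{key:10} -> {type(value).__name__}")
```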

## JSON Playground
An online place to practice and get a better understanding of JSON syntax. 
It checks your JSON continuously as you type, so if you miss even a single `comma` you will know immediately.

Click to open -> [JSON-LD Playground](https://json-ld.org/playground/)
![JSON Playground Picture](./images/json-playground.png)

# ACCESSING JSON DATA and NESTED DATA
In this section, we will load a local JSON file (you will find it at **.\data\customer.json**).

We need to import the `json` package to handle JSON data  
```
import json
```


Use the `open()` function to open the local file and `json.load()` to load the data as a Python `dict`. The loaded object is then accessible under the name `customer`.
```
with open('./data/customer.json', 'r') as file:
    customer = json.load(file)
customer
```
  
To prettify the object display with proper indentation/alignment
```
print(json.dumps(customer, indent=4))
```

Use the `[ ]` operator together with a key. The key is a string and therefore must be wrapped in `' '` (single quotes) or `" "` (double quotes).
```
customer['firstName']
```
**Reminder**:   
- You can use the `Tab` key to prompt/complete a long **key**
- keys are case-sensitive: `'firstName'` is not the same as `'firstname'`

## import `json` package/library

In [91]:
import json

## open file : `customer.json`

In [93]:
with open('./data/customer.json', 'r') as file:
    customer = json.load(file)
print(customer)

{'firstName': 'Peter', 'lastName': 'Pan', 'age': 18, 'tel': ['111', '222'], 'address': {'building': 'Cyberport 4', 'street': 'Information Cres', 'district': 'Southern District'}}


## Making reference to `customer` object

In [95]:
customer

{'firstName': 'Peter',
 'lastName': 'Pan',
 'age': 18,
 'tel': ['111', '222'],
 'address': {'building': 'Cyberport 4',
  'street': 'Information Cres',
  'district': 'Southern District'}}

---
**use `print()` function to display `customer` object**

In [97]:
print(customer)

{'firstName': 'Peter', 'lastName': 'Pan', 'age': 18, 'tel': ['111', '222'], 'address': {'building': 'Cyberport 4', 'street': 'Information Cres', 'district': 'Southern District'}}


---
**use `json.dumps()` function with `indent` parameter to print out nicely**

In [99]:
print(json.dumps(customer, indent=4))

{
    "firstName": "Peter",
    "lastName": "Pan",
    "age": 18,
    "tel": [
        "111",
        "222"
    ],
    "address": {
        "building": "Cyberport 4",
        "street": "Information Cres",
        "district": "Southern District"
    }
}


---
**check the type of `customer`**

In [101]:
type(customer)

dict

## accessing string attribute: using `firstName` key
**reminder**: key is case sensitive

In [103]:
customer["firstName"]

'Peter'

The cell below will produce a `KeyError`, which complains that the key is NOT found.

In [108]:
customer["firstname"] # key is case sensitive, this will produce an error

KeyError: 'firstname'
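When a key might be missing, there are a few common ways to avoid the crash. The sketch below uses a small stand-in dictionary named `sample_customer` (a hypothetical name, so the `customer` object loaded earlier is left untouched).

```python
# Stand-in for the dictionary loaded from customer.json.
sample_customer = {"firstName": "Peter", "lastName": "Pan", "age": 18}

# Option 1: test for the key before using it.
if "firstname" in sample_customer:
    print(sample_customer["firstname"])
else:
    print("No key named 'firstname'; available keys:", list(sample_customer))

# Option 2: get() returns a default instead of raising KeyError.
nickname = sample_customer.get("nickname", "(none)")
print(nickname)

# Option 3: catch the error when a missing key is truly exceptional.
try:
    sample_customer["firstname"]
except KeyError as exc:
    missing_key = exc.args[0]
    print(f"Missing key: {missing_key}")
```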

---
**check type using `type()` function**  
remember: in many cases, code crashes because of a type mismatch or type error

In [110]:
type(customer["firstName"])

str

---
**since `firstName` is a `str` type, you can use any function or method that works on strings**  
such as: `upper()`, `lower()`, `len()`

In [113]:
customer["firstName"].upper()

'PETER'

In [115]:
len(customer["firstName"])

5

## accessing number attribute: using `age` key

In [118]:
customer['age']

18

---
**check type**

In [121]:
type(customer['age'])

int

---
**since `age` is a number, you can do any operation or calculation that you can do with a number**  
e.g.: `+`, `-`, `*`, `/`, `%`, `>`, `<`, `>=`, `<=`, `==`, `!=`

In [124]:
customer['age'] + 10

28

In [126]:
customer['age'] >18

False

In [128]:
customer['age'] ==18

True

In [130]:
customer['age'] >=18

True

In [132]:
customer['age'] !=18

False

## accessing `list` attribute: using `tel` key

In [135]:
customer['tel']

['111', '222']

--- 
**check type**

In [138]:
type(customer['tel'])

list

---
**accessing an element in list**  
Use the `[ ]` operator together with an index number (NOT a string) to refer to the position of an element in the list  
Python uses 0-based indexing (the first element is at index `0`)  
**correct**: `customer['tel'][0]`  
**wrong**: `customer['tel']["0"]`  

In [141]:
customer['tel'][0]

'111'

In [143]:
customer['tel'][1]

'222'

The cell below will generate an error because `tel` is a list  
and therefore an index number (**NOT a string**) is expected to tell which tel number you want (0-based)

In [149]:
customer['tel']["0"]

TypeError: list indices must be integers or slices, not str

In [151]:
type(customer['tel'][0])

str

## accessing `object` attribute: using `address` key
Remember: 
- JSON is a nested data structure: we can have an object inside an object inside an object. You just need to use multiple levels of the `[ ]` operator together with the right `key` to access the data.
- for an object, the key is always a `string`
- As a beginner, access the data level by level; don't type a long multi-level statement unless you are confident

In [154]:
customer['address']

{'building': 'Cyberport 4',
 'street': 'Information Cres',
 'district': 'Southern District'}

In [156]:
print(json.dumps(customer['address'], indent=4))

{
    "building": "Cyberport 4",
    "street": "Information Cres",
    "district": "Southern District"
}


---
**check type**

In [159]:
type(customer['address'])

dict

In [161]:
customer['address']['building']

'Cyberport 4'
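The level-by-level approach and the chained approach give the same result. The sketch below shows both on a stand-in dictionary, plus a small helper named `dig` (a hypothetical function written for this example, not part of the `json` module) that returns a default value when any step of the path is missing.

```python
sample_customer = {
    "firstName": "Peter",
    "tel": ["111", "222"],
    "address": {"building": "Cyberport 4", "district": "Southern District"},
}

# Level by level: check each intermediate result before going deeper.
address = sample_customer["address"]      # a dict
building = address["building"]            # a str
print(building)

# The same lookup written as one chained expression.
print(sample_customer["address"]["building"])


def dig(data, *path, default=None):
    """Follow keys (dicts) or indexes (lists); return default if a step fails."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


print(dig(sample_customer, "address", "district"))
print(dig(sample_customer, "tel", 1))
print(dig(sample_customer, "address", "postcode", default="n/a"))
```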

---
**Use a formatted string (`f" "`) together with the `print()` function to display a complex sentence that includes values from the `customer` object**  
We use `{ }` to include a variable's value in a formatted string

In [164]:
print(f"The customer is {customer['firstName']} {customer['lastName']} and contact phone number is {customer['tel'][0]}.")

The customer is Peter Pan and contact phone number is 111.


# READING JSON DATA DIRECTLY FROM THE WEB
We usually read JSON data directly from the original web data source. Therefore, we need a library named `requests` to handle HTTP requests.

Below is a simple diagram that shows how the HTTP protocol works
![HTTP Protocol](https://miro.medium.com/v2/resize:fit:853/1*8-fT6K1o6nHiBRxKppcqOg.png)

## Import `requests` library

Import the `requests` library to handle HTTP (HyperText Transfer Protocol, the protocol of the web)
```
import requests
```

In [168]:
import requests

## Specify the url of your data source
- A URL (Uniform Resource Locator) is the web address of the target data source (open data or private data)  
- It's a publicly accessible web resource that DOESN'T return a regular webpage to view in a web browser; it returns data intended for application programming
- If it's open data, it requires NO authentication.  If it's private data, it usually requires registering an account in advance, and an API key is required to access the data 
- Returned data is mostly in JSON or XML format (choose JSON whenever you can, as JSON is more popular nowadays and is easier to process in code)
- JSON data (and XML data) are semi-structured (nested) rather than tabular, so they usually CANNOT be stored in a pandas `DataFrame` directly without being flattened first
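To illustrate why nested JSON does not fit a table directly, the sketch below flattens one nested record into a single level of keys, which is the shape a table row needs. The `flatten` function is a hypothetical helper written for this example; pandas offers `pd.json_normalize()` for a similar purpose.

```python
def flatten(record, parent_key="", sep="."):
    """Return a one-level dict whose keys are the dotted paths of nested values."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# A small record with nested objects (values are for illustration).
user = {
    "id": 1,
    "name": "Leanne Graham",
    "address": {"city": "Gwenborough", "geo": {"lat": "-37.3159", "lng": "81.1496"}},
}

flat_user = flatten(user)
print(flat_user)
```

Each flattened key, such as `address.geo.lat`, could then serve as one column name. Lists inside the record are left as they are in this sketch.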

Below is an example URL.  It's simply a string that you can copy and paste into a browser address bar, and the browser will show the data (usually in JSON).  You can install the **Firefox** browser to view JSON data in a much more organized format, as Firefox has a built-in JSON viewer.

```
url = "https://jsonplaceholder.typicode.com/users"
```

Click [here](https://www.mozilla.org/en-US/firefox/new/?xv=refresh-new&v=b) to download and install Firefox
![Firefox Logo](https://firefox-dev.tools/photon/images/product-identity-assets/firefox-dont-1.png)


In the above line of code, we use a variable named `url` to store the address so that we can refer to it in later code.

In [171]:
url = "https://jsonplaceholder.typicode.com/users"

## Fetch the url using `requests.get()` function and store it as response data named `response`
```
response = requests.get(url)
```

Parse the response body from JSON into Python objects by calling the response's `json()` method
```
json_data = response.json()
```

Display JSON content using `json.dumps()` together with `print()` function

In [174]:
response = requests.get(url)

In [176]:
type(response)

requests.models.Response

In [178]:
users = response.json()

In [180]:
print(users)

[{'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}}, {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}}, {'id': 3, 'name': 'Clementine Bauch', 'username': 'Samantha', 'email': 'Nathan@yesenia.net', 'address': {'street': 'Douglas Exten

In [182]:
print(json.dumps(users, indent=4))

[
    {
        "id": 1,
        "name": "Leanne Graham",
        "username": "Bret",
        "email": "Sincere@april.biz",
        "address": {
            "street": "Kulas Light",
            "suite": "Apt. 556",
            "city": "Gwenborough",
            "zipcode": "92998-3874",
            "geo": {
                "lat": "-37.3159",
                "lng": "81.1496"
            }
        },
        "phone": "1-770-736-8031 x56442",
        "website": "hildegard.org",
        "company": {
            "name": "Romaguera-Crona",
            "catchPhrase": "Multi-layered client-server neural-net",
            "bs": "harness real-time e-markets"
        }
    },
    {
        "id": 2,
        "name": "Ervin Howell",
        "username": "Antonette",
        "email": "Shanna@melissa.tv",
        "address": {
            "street": "Victor Plains",
            "suite": "Suite 879",
            "city": "Wisokyburgh",
            "zipcode": "90566-7771",
            "geo": {
              

In [184]:
users[0:2]

[{'id': 1,
  'name': 'Leanne Graham',
  'username': 'Bret',
  'email': 'Sincere@april.biz',
  'address': {'street': 'Kulas Light',
   'suite': 'Apt. 556',
   'city': 'Gwenborough',
   'zipcode': '92998-3874',
   'geo': {'lat': '-37.3159', 'lng': '81.1496'}},
  'phone': '1-770-736-8031 x56442',
  'website': 'hildegard.org',
  'company': {'name': 'Romaguera-Crona',
   'catchPhrase': 'Multi-layered client-server neural-net',
   'bs': 'harness real-time e-markets'}},
 {'id': 2,
  'name': 'Ervin Howell',
  'username': 'Antonette',
  'email': 'Shanna@melissa.tv',
  'address': {'street': 'Victor Plains',
   'suite': 'Suite 879',
   'city': 'Wisokyburgh',
   'zipcode': '90566-7771',
   'geo': {'lat': '-43.9509', 'lng': '-34.4618'}},
  'phone': '010-692-6593 x09125',
  'website': 'anastasia.net',
  'company': {'name': 'Deckow-Crist',
   'catchPhrase': 'Proactive didactic contingency',
   'bs': 'synergize scalable supply-chains'}}]

In [186]:
users[0]

{'id': 1,
 'name': 'Leanne Graham',
 'username': 'Bret',
 'email': 'Sincere@april.biz',
 'address': {'street': 'Kulas Light',
  'suite': 'Apt. 556',
  'city': 'Gwenborough',
  'zipcode': '92998-3874',
  'geo': {'lat': '-37.3159', 'lng': '81.1496'}},
 'phone': '1-770-736-8031 x56442',
 'website': 'hildegard.org',
 'company': {'name': 'Romaguera-Crona',
  'catchPhrase': 'Multi-layered client-server neural-net',
  'bs': 'harness real-time e-markets'}}

In [188]:
users[0]['name']

'Leanne Graham'

In [190]:
users[0]['email']

'Sincere@april.biz'

In [192]:
users[0]['company']

{'name': 'Romaguera-Crona',
 'catchPhrase': 'Multi-layered client-server neural-net',
 'bs': 'harness real-time e-markets'}

In [194]:
users[0]['company']['name']

'Romaguera-Crona'
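Because `users` is a list of dictionaries, a loop or a comprehension can pull the same field out of every record. The sketch below uses two made-up records with the same shape so that it runs without a network connection; `sample_users` is a hypothetical name.

```python
# Two made-up records with the same shape as the items in `users`.
sample_users = [
    {"name": "Alice Wong", "email": "alice@example.com",
     "address": {"city": "Hong Kong"}, "company": {"name": "Acme"}},
    {"name": "Bob Chan", "email": "bob@example.com",
     "address": {"city": "Kowloon"}, "company": {"name": "Globex"}},
]

# A for loop visits each dictionary in the list in order.
for user in sample_users:
    print(f"{user['name']} works at {user['company']['name']}")

# A list comprehension collects one field from every record.
emails = [user["email"] for user in sample_users]
print(emails)

# A dict comprehension builds a lookup table from name to city.
city_by_name = {user["name"]: user["address"]["city"] for user in sample_users}
print(city_by_name["Bob Chan"])
```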

# Commercial Data API
A Data **API** (**A**pplication **P**rogramming **I**nterface) allows applications to access, retrieve, and manipulate structured data from various sources, such as databases, cloud storage, or external data services. 

It acts as a bridge between users or applications and data repositories, enabling seamless interaction without requiring direct database access.

Key Features of a Data API
- **Standardized Access** – Provides a consistent way to query and retrieve data.
- **RESTful or GraphQL-Based** – Commonly uses REST or GraphQL protocols for efficient data exchange.
- **Authentication & Security** – Ensures data protection via authentication methods like API keys, OAuth, or tokens.
- **Data Filtering & Querying** – Supports filtering, pagination, and query customization.
- **Interoperability** – Allows different applications and platforms to integrate and interact with shared data.

## Some Commercial Data API
Commercial data APIs provide businesses and developers with access to structured or semi-structured, real-time or historical data for various industries.   
These APIs are often subscription-based or require payment for premium data access.

Some Examples In the Market
- [Alpha Vantage](https://www.alphavantage.co/) – Provides stock market data, financial indicators, and cryptocurrency trends.
- [Futu](https://openapi.futunn.com/futu-api-doc/en/intro/intro.html) - OpenAPI provides a wide variety of market data and trading services for programmatic (quantitative) trading.
- [OpenWeather](https://openweathermap.org/api) - fast and easy-to-work weather APIs
- [exchangerates](https://exchangeratesapi.io/) - Historical & Real-time Exchange Rates & Currency Conversion for Business


## Consuming Currency Exchange Rates API
In this section, we will write code to retrieve data from a currency exchange rate API.  

Go to [https://exchangeratesapi.io/](https://exchangeratesapi.io/) to create a free account and copy your personal **API Key** from your account settings for later use.  You won't be able to connect to the API without a valid API key.

Below is the API documentation page.  
[https://exchangeratesapi.io/documentation/](https://exchangeratesapi.io/documentation/)


In [200]:
import requests

In [202]:
ex_api_key = 'YOUR_API_KEY'  # replace with your personal API key

In [204]:
url = f"https://api.exchangeratesapi.io/v1/latest?access_key={ex_api_key}"
url

'https://api.exchangeratesapi.io/v1/latest?access_key=YOUR_API_KEY'

## Test your URL in a browser before you move on to writing request code

Sometimes you might mistakenly configure your API URL with invalid parameters.  Test the URL in a browser to confirm that it's correct.

In [207]:
url

'https://api.exchangeratesapi.io/v1/latest?access_key=YOUR_API_KEY'
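One way to reduce mistakes in the query string is to build it from a dictionary instead of typing it by hand. The sketch below uses `urllib.parse.urlencode` from the standard library. The key is a placeholder, and the `symbols` parameter is an assumption made for illustration; check the API documentation for the parameters your plan supports.

```python
from urllib.parse import urlencode

base_url = "https://api.exchangeratesapi.io/v1/latest"

# Placeholder key and an assumed optional filter parameter.
params = {
    "access_key": "YOUR_API_KEY",
    "symbols": "HKD,USD",   # assumption: check the API documentation
}

# urlencode joins the pairs with & and escapes special characters.
query_url = f"{base_url}?{urlencode(params)}"
print(query_url)
```

The `requests` library can do the same job itself: `requests.get(base_url, params=params)` builds the query string from the dictionary.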

In [209]:
ex_response = requests.get(url)

In [211]:
ex_data = ex_response.json()
ex_data

{'success': True,
 'timestamp': 1745763244,
 'base': 'EUR',
 'date': '2025-04-27',
 'rates': {'AED': 4.17462,
  'AFN': 81.26835,
  'ALL': 98.998633,
  'AMD': 443.632685,
  'ANG': 2.048405,
  'AOA': 1042.795414,
  'ARS': 1322.593932,
  'AUD': 1.773374,
  'AWG': 2.045802,
  'AZN': 1.936659,
  'BAM': 1.957362,
  'BBD': 2.294761,
  'BDT': 138.08625,
  'BGN': 1.955208,
  'BHD': 0.428349,
  'BIF': 3332.384619,
  'BMD': 1.136557,
  'BND': 1.49382,
  'BOB': 7.85337,
  'BRL': 6.467468,
  'BSD': 1.136522,
  'BTC': 1.196516e-05,
  'BTN': 97.016687,
  'BWP': 15.665706,
  'BYN': 3.719087,
  'BYR': 22276.513823,
  'BZD': 2.282952,
  'CAD': 1.57828,
  'CDF': 3269.874405,
  'CHF': 0.941226,
  'CLF': 0.027691,
  'CLP': 1062.624256,
  'CNY': 8.28289,
  'CNH': 8.284175,
  'COP': 4799.622655,
  'CRC': 575.266582,
  'CUC': 1.136557,
  'CUP': 30.118756,
  'CVE': 110.591405,
  'CZK': 24.979364,
  'DJF': 201.989327,
  'DKK': 7.466274,
  'DOP': 67.114127,
  'DZD': 150.469941,
  'EGP': 57.668094,
  'ERN': 17.04

## Retrieve Data

Use `[ ]` operator and correct nested key to retrieve HKD rate from the exchange dictionary `ex_data`.  
In this case, we need to use two keys in nested way and they are `rates` and `HKD`

In [214]:
hkd_rate = ex_data['rates']['HKD']
hkd_rate

8.818715
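Before using a rate, it is worth checking that the call succeeded and that the currency is present. The sketch below works on a trimmed copy of the response shown above. `get_rate` is a hypothetical helper, and reading `timestamp` as Unix time in seconds is an assumption, although it agrees with the `date` field.

```python
from datetime import datetime, timezone

# Trimmed copy of the response shown above (only two rates kept).
sample_ex_data = {
    "success": True,
    "timestamp": 1745763244,
    "base": "EUR",
    "date": "2025-04-27",
    "rates": {"AED": 4.17462, "HKD": 8.818715},
}


def get_rate(data, currency):
    """Return the rate for `currency`, or raise ValueError if it cannot be read."""
    if not data.get("success"):
        raise ValueError("API call did not succeed")
    rates = data.get("rates", {})
    if currency not in rates:
        raise ValueError(f"No rate available for {currency}")
    return rates[currency]


hkd = get_rate(sample_ex_data, "HKD")
print(f"1 {sample_ex_data['base']} = {hkd} HKD")

# Assumption: the timestamp counts seconds since 1970-01-01 UTC.
quoted_at = datetime.fromtimestamp(sample_ex_data["timestamp"], tz=timezone.utc)
print(quoted_at.isoformat())
```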

## Use the live exchange rate for sales amount calculation

Assume we have a list of retail prices, `product_retail_price_euro`, in euros (EUR), and we want to convert them to HKD so that users in Hong Kong have an instant reference while buying.  

First we need to import numpy
```
import numpy as np
```

In [217]:
import numpy as np

**Let's check numpy's version**  
`np.__version__`

In [220]:
np.__version__

'1.26.4'

In [222]:
product_retail_price_euro = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
product_retail_price_euro

array([100., 200., 300., 400., 500., 600.])

In [224]:
type(product_retail_price_euro)

numpy.ndarray

In [226]:
# store the converted prices under a new name so the EUR prices are kept
product_retail_price_hkd = product_retail_price_euro * hkd_rate
product_retail_price_hkd

array([ 881.8715, 1763.743 , 2645.6145, 3527.486 , 4409.3575, 5291.229 ])
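Prices are normally shown with two decimal places. The sketch below is a minimal illustration that assumes the rate retrieved earlier (a live rate will differ) and uses `np.round()` to round every converted price at once; the variable names are chosen for this example.

```python
import numpy as np

hkd_rate = 8.818715   # rate retrieved earlier; live values will differ
prices_eur = np.array([100.0, 200.0, 300.0])

# Multiplying an array by a scalar converts every element at once.
prices_hkd = np.round(prices_eur * hkd_rate, 2)

# Print the two price lists side by side.
for eur, hkd in zip(prices_eur, prices_hkd):
    print(f"EUR {eur:8.2f} = HKD {hkd:9.2f}")
```

Keeping the EUR prices and the HKD prices in separate arrays makes it easy to display both to the user.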

# EXERCISE 3: READING HONG KONG WEATHER FORECAST API 
In this exercise, we will explore the open weather API from the Hong Kong Observatory. This exercise is worth 10 marks.

Go to Moodle and download the exercise description to understand the requirements.

Follow the instructor's guidance to complete this exercise.  

Or, if you are confident enough, you can try to finish the exercise independently by referring to the syntax/code in this chapter's notebook.