<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item">
<li><span><a href="#1.-Introduction-to-Web-Scraping" data-toc-modified-id="1.-Introduction-to-Web-Scraping-1">1. Introduction to Web Scraping</a></span><ul class="toc-item">
<li><span><a href="#1.1-Example:-Getting-information-about-the-International-Space-Station-(ISS)-from-http://api.open-notify.org" data-toc-modified-id="1.2-Example:-Getting-information-about-the-International-Space-Station-(ISS)-from-http://api.open-notify.org-1.2">1.1 Example: Getting information about the International Space Station (ISS) from <a href="http://api.open-notify.org" target="_blank">http://api.open-notify.org</a></a></span></li>
<li><span><a href="#1.2-Example:-Getting-information-about-countries-using-Rest-Countries-API" data-toc-modified-id="1.2-Example:-Getting-information-about-countries-using-Rest-Countries-API-1.3">1.2 Example: Getting information about countries using Rest Countries API</a></span></li></ul>
</li></ul></div>

---
# 1. Introduction to Web Scraping
---

Web scraping is the process of extracting data from websites. It can be performed manually or automated using software to download and store the data in an accessible format. 

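In this notebook, we explore web data access in Python using the third-party **requests** package, which allows us to make HTTP requests. It is not part of the standard library and must be installed separately (for example with `pip install requests`).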

## 1.1 Example: Getting information about the International Space Station (ISS) from http://api.open-notify.org

In [None]:
# Imports
import requests

In [None]:
# Request the current location of the ISS using the requests.get() method
url = r'http://api.open-notify.org/iss-now.json'
r = requests.get(url)

In [None]:
# Check the status code of the response object
r.status_code
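
To make the status code easier to interpret, here is a minimal sketch that turns a numeric code into a short description using the standard library's `http.HTTPStatus`. The helper name `describe_status` is hypothetical and not part of **requests**; codes in the 200 to 299 range are treated as successful.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered HTTP status codes
        phrase = "Unknown status"
    outcome = "success" if 200 <= code < 300 else "not successful"
    return f"{code} {phrase} ({outcome})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (not successful)
```

In the notebook, `describe_status(r.status_code)` would describe the response received above.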

The response is in JSON format, which looks similar to a Python dictionary. It can be converted into an actual Python dictionary using the `.json()` method.

Note: Some websites may return the response in formats other than JSON (e.g. HTML or XML).
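
The difference between JSON text and a Python dictionary can be seen offline with the standard library's `json` module. This is a rough sketch: the string below only imitates the shape of the ISS response, and its values are made up.

```python
import json

# Illustrative JSON text in the same shape as the ISS response (values are made up)
raw_text = '{"message": "success", "timestamp": 0, "iss_position": {"latitude": "10.0", "longitude": "20.0"}}'

# Parsing the text gives a dictionary, which is roughly what r.json() does with r.text
parsed = json.loads(raw_text)

print(type(raw_text).__name__)  # str
print(type(parsed).__name__)    # dict
print(parsed["message"])        # success
```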

In [None]:
# Get the contents of the response as plain text
r.text

In [None]:
# Convert from json to a dictionary using .json() method
response = r.json()
response

Once converted into a dictionary, the response can be accessed using standard indexing techniques.

In [None]:
# Getting the current latitude
response['iss_position']['latitude']
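
The coordinates may arrive as strings rather than numbers, so it can help to convert them before doing any arithmetic. The sketch below assumes that shape and uses a made-up sample dictionary instead of a live response.

```python
# Illustrative dictionary in the same shape as the ISS response (values are made up)
sample = {
    "message": "success",
    "timestamp": 0,
    "iss_position": {"latitude": "10.5", "longitude": "-20.25"},
}

position = sample["iss_position"]

# Convert the coordinate strings into floats so they can be compared numerically
latitude = float(position["latitude"])
longitude = float(position["longitude"])

hemisphere = "north" if latitude >= 0 else "south"
print(f"ISS is at ({latitude}, {longitude}), {hemisphere} of the equator")
```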

The timestamp is an integer in [Unix time format](https://en.wikipedia.org/wiki/Unix_time): the number of seconds elapsed since 1 January 1970 (UTC). We can convert it into a readable date using Python's built-in **datetime** module.

In [None]:
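# Import the datetime class and the UTC timezone object
from datetime import datetime, timezone

In [None]:
# Convert the timestamp from Unix time to a readable, timezone-aware UTC datetime
# (datetime.utcfromtimestamp() is deprecated since Python 3.12)
print(datetime.fromtimestamp(response['timestamp'], tz=timezone.utc))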
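
To see the conversion in isolation, the following sketch uses a fixed sample timestamp (0, the Unix epoch itself) rather than a live response, and formats the result with `strftime`. The variable names are illustrative.

```python
from datetime import datetime, timezone

sample_timestamp = 0  # illustrative value: the start of Unix time

# Build a timezone-aware datetime in UTC from the number of seconds
utc_time = datetime.fromtimestamp(sample_timestamp, tz=timezone.utc)

print(utc_time.isoformat())                        # 1970-01-01T00:00:00+00:00
print(utc_time.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 1970-01-01 00:00:00 UTC

# Going back the other way recovers the original number of seconds
round_trip = int(utc_time.timestamp())
print(round_trip)                                  # 0
```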

### Concept Check <a class="tocSkip">

Print out a list of all people who are currently in space on the ISS. Use `http://api.open-notify.org/astros.json`



In [None]:
api_endpoint = r"http://api.open-notify.org/astros.json"
r = requests.get(api_endpoint)
print(f"GET request status code to {api_endpoint}: {r.status_code}")
response = r.json()
for astronaut in response['people']:
    print('name:', astronaut['name'])
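
If each entry in `people` also carries a `craft` field (an assumption about the response shape), the names can be grouped by spacecraft. The sketch below uses made-up sample data with placeholder names, not real crew members.

```python
from collections import defaultdict

# Made-up sample in the assumed shape of the astros response (placeholder names)
sample = {
    "message": "success",
    "number": 3,
    "people": [
        {"name": "Astronaut A", "craft": "ISS"},
        {"name": "Astronaut B", "craft": "ISS"},
        {"name": "Astronaut C", "craft": "Other Station"},
    ],
}

# Collect the names under the craft each person is on
people_by_craft = defaultdict(list)
for person in sample["people"]:
    people_by_craft[person["craft"]].append(person["name"])

for craft, names in people_by_craft.items():
    print(f"{craft}: {', '.join(names)}")
```

With this grouping, `people_by_craft["ISS"]` gives only the people on the ISS.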


## 1.2 Example: Getting information about countries using Rest Countries API
Rest Countries API: <https://restcountries.com>

In [None]:
# Getting the API url for a particular country
country = 'Japan'
url = rf'https://restcountries.com/v3.1/name/{country}'
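
Country names that contain spaces need to be percent-encoded to form a valid URL. **requests** usually takes care of this, but the step can be made visible with the standard library's `urllib.parse.quote`. This is a small sketch; the list of names is only an example.

```python
from urllib.parse import quote

base_url = "https://restcountries.com/v3.1/name"

# Build one URL per country, percent-encoding characters such as spaces
urls = [f"{base_url}/{quote(country)}" for country in ["Japan", "South Africa"]]

for url in urls:
    print(url)
# https://restcountries.com/v3.1/name/Japan
# https://restcountries.com/v3.1/name/South%20Africa
```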

In [None]:
# Do the request and check the status code
r = requests.get(url)
r.status_code

In [None]:
# Check the response
response = r.json()
print(response)

In [None]:
# Having a look at what is in the response object
# Note: The response is returned as a list of dictionaries. The above response contains only one dictionary in the list.
response[0].keys()

In [None]:
# Getting the currency name from the response
response[0]['currencies']['JPY']['name']
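
Hard-coding `'JPY'` only works when the currency code is known in advance. A more general sketch loops over whatever codes the `currencies` dictionary contains. The sample entry below is trimmed and written by hand to mirror the shape used above.

```python
# Hand-written excerpt in the shape of one Rest Countries entry (trimmed to the fields used)
sample_country = {
    "name": {"common": "Japan"},
    "currencies": {"JPY": {"name": "Japanese yen", "symbol": "¥"}},
}

# Loop over every currency code instead of assuming a particular one;
# .get() with a default avoids a KeyError if the field is missing
currency_names = []
for code, details in sample_country.get("currencies", {}).items():
    currency_names.append(details["name"])
    print(f"{code}: {details['name']}")  # JPY: Japanese yen
```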

### Concept Check  <a class="tocSkip">

1. Print out a list of all the capital cities in Europe that begin with the letter 'L'.
2. Print out a list of all the capital cities for countries that begin with the letter 'L'.

In [None]:
# Type your code here

# 1.
countries_endpoint = r'https://restcountries.com/v3.1/all'
r = requests.get(countries_endpoint)

In [None]:
print(f"GET request status code for {countries_endpoint}: {r.status_code}")
response = r.json()

In [None]:
# Explore which continent values appear in the data (each 'continents' entry is a list)
continent_list = []
for country in response:
    if country['continents'] not in continent_list:
        continent_list.append(country['continents'])
print(continent_list)

In [None]:
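# 2. Capital cities of countries whose common name begins with 'L'
for country in response:
    if country['name']['common'].startswith('L'):
        # 'capital' is a list and is missing for some entries
        for capital_city in country.get('capital', []):
            print(capital_city)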
In [None]:
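# 1. European capital cities that begin with 'L'
for country in response:
    # 'continents' is a list, so test membership rather than only the first entry
    if 'Europe' in country['continents']:
        # 'capital' is a list and is missing for some entries
        for capital_city in country.get('capital', []):
            if capital_city.startswith('L'):
                print(capital_city)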
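
The filtering logic can also be wrapped in a reusable function. The sketch below is one possible approach: the function name `capitals_starting_with` is hypothetical, and it runs on a small hand-written sample rather than the live response, so it works offline.

```python
def capitals_starting_with(countries, letter, continent=None):
    """Return sorted capital cities starting with `letter`, optionally limited to one continent."""
    matches = []
    for country in countries:
        # Skip countries outside the requested continent, if one was given
        if continent is not None and continent not in country.get("continents", []):
            continue
        # 'capital' is a list and may be missing altogether
        for capital in country.get("capital", []):
            if capital.startswith(letter):
                matches.append(capital)
    return sorted(matches)


# Hand-written sample in the assumed shape of the Rest Countries response
sample_countries = [
    {"name": {"common": "Portugal"}, "capital": ["Lisbon"], "continents": ["Europe"]},
    {"name": {"common": "Peru"}, "capital": ["Lima"], "continents": ["South America"]},
    {"name": {"common": "France"}, "capital": ["Paris"], "continents": ["Europe"]},
    {"name": {"common": "Antarctica"}, "continents": ["Antarctica"]},
]

print(capitals_starting_with(sample_countries, "L"))                      # ['Lima', 'Lisbon']
print(capitals_starting_with(sample_countries, "L", continent="Europe"))  # ['Lisbon']
```

Passing the real `response` list in place of `sample_countries` would apply the same filter to the full data set.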
