# Getting Data from the Web
The internet is full of useful (as well as useless) information, so it is often helpful to download data from it and process it locally.

There are different ways to get data from the web. The most common are:
- **Web Scraping**: download the HTML content of a page and then search through it to extract information
- **Using a Web API**: an API ([Application Programming Interface](https://en.wikipedia.org/wiki/API)) is code that is meant to be called from other code instead of being displayed visually to a user. We'll be looking at the most common type: the [REST API](https://en.wikipedia.org/wiki/Representational_state_transfer)

Web APIs often need authentication, but you can often get an API key after a quick free signup.
[Here's a non-exhaustive list](https://github.com/public-apis/public-apis) of open APIs.
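
As a minimal sketch of how a key is typically attached to a request, the snippet below appends it as a query parameter. The base URL, the `api_key` parameter name and the `EXAMPLE_API_KEY` environment variable are all hypothetical; each real API documents its own scheme, and some expect the key in a header instead.

```python
import os
from urllib.parse import urlencode


def build_url(base_url, api_key, **query):
    """Return base_url with the query parameters and the API key appended."""
    params = {**query, "api_key": api_key}  # "api_key" is an assumed parameter name
    return f"{base_url}?{urlencode(params)}"


# Reading the key from the environment keeps it out of the notebook itself.
api_key = os.environ.get("EXAMPLE_API_KEY", "demo-key")  # hypothetical variable name
url = build_url("https://api.example.com/v1/forecast", api_key, city="Glasgow")
print(url)
```

`urlencode` also takes care of escaping characters that are not allowed in a URL, such as spaces in a city name.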

## External Libraries
We'll be using these two external libraries:
- [requests](https://requests.readthedocs.io/en/master/): a more user-friendly alternative to the built-in library `urllib.request`
- [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/): an HTML parser, which is a type of software that builds a data structure from a given input (usually text).

which you can install by running the cell below

In [None]:
!pip install requests 
!pip install beautifulsoup4

In [None]:
# let's import from them to check they were installed correctly
import requests
from bs4 import BeautifulSoup

## Web Scraping

In [None]:
r = requests.get("https://www.metaweather.com/21125/")
r
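
Displaying `r` shows the HTTP status code of the response, for example `<Response [200]>`. As a small aside, the sketch below uses only the standard library to translate a status code into its standard phrase. `explain_status` is a hypothetical helper name; in the notebook you could pass it `r.status_code`.

```python
from http import HTTPStatus


def explain_status(status_code):
    """Return the code with its standard reason phrase, e.g. for r.status_code."""
    status = HTTPStatus(status_code)
    return f"{status.value} {status.phrase}"


print(explain_status(200))  # a successful request
print(explain_status(404))  # the requested page does not exist
```

requests can also raise an exception for error codes via `r.raise_for_status()`, which helps avoid parsing an error page by mistake.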

In [None]:
html_doc = r.text
html_doc[:200]

If you want, you can try rendering the HTML within the notebook by using:
```python
from IPython.display import HTML
HTML(r.text)
```

We could try to do string manipulation (or use [regular expressions](https://www.w3schools.com/python/python_regex.asp)), but using a pre-built parser is usually less painful :)
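
To illustrate why, here is a minimal sketch comparing a regular expression with Python's built-in `html.parser` module (the backend selected by the `'html.parser'` argument of BeautifulSoup). The two snippets are made up for illustration and are not copied from the MetaWeather page; `DateCollector` and `parse_dates` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Two invented snippets carrying the same information with different markup details.
snippet_a = '<div data-date="2021-03-01"><span>Light Rain</span></div>'
snippet_b = "<div class='day' data-date='2021-03-01'><span>Light Rain</span></div>"

# A regex written against snippet_a: it expects double quotes and data-date as the only attribute.
pattern = re.compile(r'<div data-date="([^"]+)">')
regex_a = pattern.findall(snippet_a)
regex_b = pattern.findall(snippet_b)  # finds nothing: quoting and attribute order changed


class DateCollector(HTMLParser):
    """Collect the data-date attribute of every div, whatever the quoting or attribute order."""

    def __init__(self):
        super().__init__()
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div" and "data-date" in attributes:
            self.dates.append(attributes["data-date"])


def parse_dates(html):
    collector = DateCollector()
    collector.feed(html)
    return collector.dates


print("regex: ", regex_a, regex_b)
print("parser:", parse_dates(snippet_a), parse_dates(snippet_b))
```

The regular expression silently misses the second snippet, while the parser extracts the date from both because it works on the document structure rather than on the exact characters.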

In [None]:
soup = BeautifulSoup(html_doc, 'html.parser')

In [None]:
weather_data = []

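# each forecast day is a <div> that carries a data-date attribute
for item in soup.find_all('div'):
    if item.has_attr('data-date'):
        weather_date = item['data-date']
        # the first <span> inside the div is used as the description
        weather_description = item.find("span").get_text()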
        
        weather_day_data = {
            "date": weather_date,
            "description": weather_description,
        }
        
        weather_data.append(weather_day_data)

In [None]:
weather_data

## Web API
APIs are usually much more stable and nicer to work with than scraped pages, but you need to read through some documentation to learn what you can do and which particular URLs (called "endpoints") you need to use.

In our case, the [MetaWeather API documentation](https://www.metaweather.com/api/) tells us that to get output similar to our scraped data, we need to use the endpoint `/api/location/(woeid)/`, where `woeid` is the identifier of the location we want the weather for.
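
A minimal sketch of how the endpoint pattern turns into a concrete URL. `BASE_URL` and `location_endpoint` are illustrative names, and 21125 is the location identifier already used in the scraping example.

```python
BASE_URL = "https://www.metaweather.com"


def location_endpoint(woeid):
    """Build the URL of the /api/location/(woeid)/ endpoint for one location."""
    return f"{BASE_URL}/api/location/{woeid}/"


print(location_endpoint(21125))
```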

In [None]:
r = requests.get("https://www.metaweather.com/api/location/21125?api=hjagr0r3hg03hrghg0g3rah0")
r

In [None]:
api_weather_data = r.json()

In [None]:
api_weather_data
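
The decoded JSON is a regular Python structure made of dictionaries and lists. The sketch below rebuilds the same `date`/`description` list as the scraped version from a made-up payload. The `consolidated_weather`, `applicable_date` and `weather_state_name` keys are the ones used in the solution at the end of this notebook, while the values are invented for illustration.

```python
import json

# Hypothetical, heavily shortened payload; a real response has many more fields.
sample_response = """
{
  "consolidated_weather": [
    {"applicable_date": "2021-03-01", "weather_state_name": "Light Rain"},
    {"applicable_date": "2021-03-02", "weather_state_name": "Heavy Cloud"}
  ]
}
"""

# r.json() performs this decoding step for a real response
sample_data = json.loads(sample_response)

# keep only the two fields we extracted when scraping
weather_from_api = [
    {"date": day["applicable_date"], "description": day["weather_state_name"]}
    for day in sample_data["consolidated_weather"]
]
print(weather_from_api)
```

No HTML parsing is needed here: the fields are addressed by name, which is a large part of why APIs tend to be more stable than scraping.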

# Pandas to the rescue

In [None]:
!pip install lxml

In [None]:
import pandas as pd

list_of_dfs = pd.read_html('https://en.wikipedia.org/wiki/S%26P_500')
first_df = list_of_dfs[1].iloc[:-4]  # drop the last four rows, which are statistics
df = first_df.set_index('Year')

change_values_as_strings = df['Change in Index'].str.replace('−','-').str.replace('%','')  # cleaning up weird characters
change_values_as_numbers = pd.to_numeric(change_values_as_strings)

change_values_as_numbers.plot(grid=True)
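
To see why the cleanup step is needed, here is a minimal pure-Python sketch of the same idea on two made-up values. The code above replaces `−`, the Unicode minus sign (U+2212), which Python's `float` does not accept, with an ASCII hyphen before converting. `parse_percentage` is a hypothetical helper, not part of pandas.

```python
def parse_percentage(text):
    """Convert a percentage string, possibly with a Unicode minus, to a float."""
    cleaned = text.replace("\u2212", "-").replace("%", "").strip()
    return float(cleaned)


samples = ["26.89%", "\u221238.49%"]  # invented values, not taken from the table
values = [parse_percentage(sample) for sample in samples]
print(values)

try:
    float("\u221238.49")  # Unicode minus left in place
except ValueError as error:
    print("float() rejects the Unicode minus:", error)
```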

# Homework

For a more flexible application we could get the `woeid` from a city name or lat/lon coordinates by using other provided endpoints: `/api/location/search/?query=(query)` and `/api/location/search/?lattlong=(latt),(long)`

Create two functions:
```python
def get_woeid_from_city_name(city_name):
    ...

def get_woeid_from_latlon(lat, lon):
    ...
```

which will return the `woeid` for the given input. Then try to combine this with the previous code to produce a function that gets the weather for the next days.

In [None]:
...

### Possible solution

In [None]:
def _get_woeid_from_city_name(city_name):
    # pass the query via `params` so that requests URL-encodes names with spaces or symbols
    json_data = requests.get(
        "https://www.metaweather.com/api/location/search/",
        params={"query": city_name},
    ).json()
    if not json_data:
        raise ValueError(f"No city found with name: {city_name}")
    first_result_woeid = json_data[0]['woeid']
    return first_result_woeid
    
def _get_woeid_from_latlon(lat, lon):
    json_data = requests.get(f"https://www.metaweather.com/api/location/search/?lattlong={lat},{lon}").json()
    if not json_data:
        raise ValueError(f"No location found with latitude {lat} and longitude {lon}")
    first_result_woeid = json_data[0]['woeid']
    return first_result_woeid

# which can be tested with
_get_woeid_from_city_name("Glasgow")
_get_woeid_from_latlon(55.864200, -4.251800)

def _get_weather_from_woeid(woeid):
    # return the whole payload; the forecast is under the 'consolidated_weather' key
    json_data = requests.get(f"https://www.metaweather.com/api/location/{woeid}").json()
    return json_data
    
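def print_weather(city_name=None, lat=None, lon=None):
    # accept either a city name or a (lat, lon) pair
    if city_name is not None:
        woeid = _get_woeid_from_city_name(city_name)
    elif lat is not None and lon is not None:
        woeid = _get_woeid_from_latlon(lat, lon)
    else:
        raise ValueError("Provide either city_name or both lat and lon")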
        
    weather_data = _get_weather_from_woeid(woeid)
    
    for i in weather_data['consolidated_weather']:
        print(i['applicable_date'], i['weather_state_name'])

# which can be tested with
print_weather("Glasgow")