<h1 style = "text-align: center; ">Descriptive Title</h1>
<h2 style = "text-align: center; ">ST445 - Managing and Visualizing Data</h2>
<h3 style = "text-align: center; ">Candidate IDs: 38682, XXXXX, YYYYY</h3>


### I. Notebook preparation (maybe this section is not needed)

Perhaps we should include something similar to the following text from "Example 2":

[[Before running this notebook, please make sure you have all necessary modules installed in your environment. Potentially less common modules used include:

- `google.cloud`
- `dotenv`
- `networkx`
- `geopandas`
- `praw`
- `transformers`
- `plotly.graph_objects`
- `ipywidgets`
- `folium`

As usual, they can be installed by running the command `pip install [module]` in the terminal.

Furthermore, please make sure your Python version is compatible with all the modules. While writing this, it became apparent there might be some compatibility issues with newer Python versions (especially 3.11 and newer). In case you run into any issues, it might be worth trying to run the code with an older version such as Python 3.9.]]
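
A quick environment check can save a failed run halfway through the notebook. The sketch below is an assumption rather than part of the original notebook: the hypothetical helper `missing_modules` uses `importlib.util.find_spec` to report which import names cannot be found, and `REQUIRED_MODULES` is an illustrative list covering only the packages imported so far.

```python
import importlib.util
import sys

# Hypothetical list: top-level import names for the packages this notebook
# imports so far; extend it with any modules added later.
REQUIRED_MODULES = ["requests", "bs4", "pandas", "lxml"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            # find_spec returns None if a top-level module is not installed
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised for dotted names (e.g. "google.cloud") whose parent package is absent
            missing.append(name)
    return missing


print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
if sys.version_info >= (3, 11):
    print("Note: some modules may have compatibility issues with Python 3.11 and newer.")

for name in missing_modules(REQUIRED_MODULES):
    # Import names can differ from pip package names (e.g. bs4 is installed as beautifulsoup4)
    print(f"Missing module: {name}")
```

Checking import names rather than pip package names keeps the check independent of how each package was installed (pip or conda).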

Our complete GitHub repository can be found at the following location: https://github.com/lse-st445/2024-project-data-knows-ball [[Should we put this in the title of our paper??]]

In [33]:
# Import relevant packages
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Install lxml (the HTML/XML parser used by BeautifulSoup below) with: conda install anaconda::lxml


### II. Introduction and data description

[[Describe our data sets and pose our research question]]

[[Maybe include data dictionaries of some sort similar to Table 1.3.1 and Table 1.3.2 in "Example 2"]]

### III. Data acquisition

#### III.i. Marketcheck UK API

In [5]:
# API 



#### III.ii. Web scraping the UK Office for National Statistics (ONS)

In [34]:
# Function for web scraping time-series tables from the UK Office for National Statistics (ONS)
def webscrape_ONS(url):
    '''
    This function web scrapes the time-series table at a given UK ONS URL and separates
    the data into distinct dataframes based on periodicity: year, quarter, or month.
    ----------
    Args:
        url: The UK ONS URL from which to web scrape the table
    ----------
    Returns:
        ons_year_df: Dataframe of UK ONS data at the yearly level
        ons_quarter_df: Dataframe of UK ONS data at the quarterly level
        ons_month_df: Dataframe of UK ONS data at the monthly level
    '''
    page = requests.get(url, timeout = 30)
    page.raise_for_status() # Stop early if the ONS page could not be retrieved
    soup = BeautifulSoup(page.content, "lxml")

    # Save the table headers to later set as column names for the dataframes
    table_headers = soup.find_all("th")
    table_headers = table_headers[0:2] # We only need the first two columns of data from the ONS
    table_headers = [t.get_text(strip = True) for t in table_headers] # Strip whitespace so the "Period" column name matches exactly

    ons_data = []

    # Identify and append all web-scraped rows of the ONS table to a list
    for i, row in enumerate(soup.find_all("tr")[2:]): # The first two rows of ONS tables are headers
        try:
            period, value = row.find_all("td")[0:2] # We only need the first two columns of data from the ONS
            ons_data.append([period.get_text(strip = True), value.get_text(strip = True)])
        except ValueError: # Raised when a row has fewer than two data cells
            print("Error parsing row #{}".format(i))

    ons_df = pd.DataFrame(ons_data, columns = table_headers)

    # Split the data into separate dataframes based on periodicity (year/quarter/month)
    ons_year_df = ons_df[ons_df["Period"].str.len() == 4].reset_index(drop = True) # Year periods will have 4 characters (e.g., "2020")
    ons_quarter_df = ons_df[ons_df["Period"].str.len() == 7].reset_index(drop = True) # Quarter periods will have 7 characters (e.g., "2020 Q1")
    ons_month_df = ons_df[ons_df["Period"].str.len() == 8].reset_index(drop = True) # Month periods will have 8 characters (e.g., "2020 JAN")
    
    # Ensure that all rows present in the original ONS table are present in the three dataframes split based on periodicity
    split_df_len = sum([len(ons_year_df), len(ons_quarter_df), len(ons_month_df)])
    orig_df_len = len(ons_data)
    assert split_df_len == orig_df_len, "ERROR: Not all rows from original ONS table present in corresponding year/quarter/month dataframes"

    return ons_year_df, ons_quarter_df, ons_month_df
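
The split in `webscrape_ONS` infers periodicity from string length (4, 7 and 8 characters). A slightly stricter variant, sketched here as hypothetical helpers `classify_period` and `split_by_periodicity`, matches each label against an explicit pattern instead, so an unexpected label is reported by name rather than only through the row-count assertion. The patterns assume ONS labels of the form "2020", "2020 Q1" and "2020 JAN", as described in the comments of the original function.

```python
import re
import pandas as pd

# Hypothetical patterns: assume ONS period labels look like
# "2020" (year), "2020 Q1" (quarter) and "2020 JAN" (month).
PERIOD_PATTERNS = {
    "year": re.compile(r"^\d{4}$"),
    "quarter": re.compile(r"^\d{4} Q[1-4]$"),
    "month": re.compile(r"^\d{4} [A-Z]{3}$"),
}


def classify_period(label):
    """Return 'year', 'quarter', 'month', or None if the label matches no pattern."""
    label = label.strip()
    for periodicity, pattern in PERIOD_PATTERNS.items():
        if pattern.match(label):
            return periodicity
    return None


def split_by_periodicity(df, period_col="Period"):
    """Split a DataFrame into (year, quarter, month) frames, failing on unknown labels."""
    kinds = df[period_col].map(classify_period)
    unmatched = df.loc[kinds.isna(), period_col].tolist()
    if unmatched:
        raise ValueError(f"Unrecognised period labels: {unmatched}")
    return tuple(
        df[kinds == kind].reset_index(drop=True) for kind in ("year", "quarter", "month")
    )


# Usage on a small hand-made frame (labels only, no real ONS values)
demo = pd.DataFrame({"Period": ["2020", "2020 Q1", "2020 JAN", "2020 FEB"]})
year_df, quarter_df, month_df = split_by_periodicity(demo)
print(len(year_df), len(quarter_df), len(month_df))  # 1 1 2
```

This could stand in for the three length-based filters (e.g. `split_by_periodicity(ons_df)` inside `webscrape_ONS`); unmatched labels then raise immediately instead of surfacing only as a row-count mismatch.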


In [35]:
# Web scrape UK unemployment and CPIH data tables from the ONS
url_uk_unemp = "https://www.ons.gov.uk/employmentandlabourmarket/peoplenotinwork/unemployment/timeseries/mgsx/lms"
url_uk_cpih = "https://www.ons.gov.uk/economy/inflationandpriceindices/timeseries/l55o/mm23"

uk_unemp_year_df, uk_unemp_quarter_df, uk_unemp_month_df = webscrape_ONS(url_uk_unemp)
uk_cpih_year_df, uk_cpih_quarter_df, uk_cpih_month_df = webscrape_ONS(url_uk_cpih)


### IV. Data preparation

In [7]:
# Clean and merge datasets
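
One possible way to fill in the cleaning-and-merging step is sketched below, assuming the monthly frames returned by `webscrape_ONS` hold period labels such as "2020 JAN" in their first column and the values as text in their second. The helpers `clean_ons_monthly` and `merge_monthly` and the output column names `unemployment` and `cpih` are hypothetical; the values in the usage example are made up for illustration and are not ONS figures.

```python
import pandas as pd


def clean_ons_monthly(df, value_name):
    """Hypothetical helper: turn a scraped ONS monthly frame into typed columns.

    Assumes column 0 holds labels like "2020 JAN" and column 1 holds the value
    as text; the column names themselves are not relied on.
    """
    period_col, value_col = df.columns[:2]
    # Title-case "JAN" -> "Jan" so the %b (abbreviated month name) directive matches
    dates = pd.to_datetime(df[period_col].str.strip().str.title(), format="%Y %b")
    return pd.DataFrame({
        "month": dates.dt.to_period("M"),
        # errors="coerce" turns non-numeric cells (e.g. blanks) into NaN
        value_name: pd.to_numeric(df[value_col], errors="coerce"),
    })


def merge_monthly(unemp_month_df, cpih_month_df):
    """Inner-join cleaned unemployment and CPIH series on calendar month."""
    unemp = clean_ons_monthly(unemp_month_df, "unemployment")
    cpih = clean_ons_monthly(cpih_month_df, "cpih")
    merged = unemp.merge(cpih, on="month", how="inner")
    return merged.sort_values("month").reset_index(drop=True)


# Made-up values for illustration only
unemp_demo = pd.DataFrame({"Period": ["2020 JAN", "2020 FEB"], "Value": ["4.0", "4.1"]})
cpih_demo = pd.DataFrame({"Period": ["2020 FEB", "2020 MAR"], "Value": ["1.7", "1.5"]})
merged = merge_monthly(unemp_demo, cpih_demo)
print(merged)
```

With the scraped data this would be called as `merge_monthly(uk_unemp_month_df, uk_cpih_month_df)`. The inner join keeps only months present in both series; an outer join would instead keep every month and leave gaps as NaN.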



### V. Visualizations

[[Description of what visualizations we decided to include and why]]

In [8]:
# Code for visualizations



[[Explanation/interpretation of what the visualizations are depicting]]

### VI. Data modeling

### VII. Conclusion