# Employer Project: the Bank of England


## Table of Contents

<h3>

[1. Introduction](#1.-Introduction)

[2. Goals](#2.-Goals)

[3. Data Collection & Preprocessing](#3.-Data-Collection-&-Preprocessing)

[4. Data Analysis](#4.-Data-Analysis)

[5. Appendix A: List of Files Needed for this Notebook](#Appendix-A:-List-of-Files-Needed-for-this-Notebook)

[6. Appendix B: List of Accompanying Notebooks](#Appendix-B:-List-of-Accompanying-Notebooks)

</h3>


## 1. Introduction

The client for this project is the Bank of England. Here is an excerpt from the project briefing, profiling the Bank:

> The Bank of England's mission is to promote the good of the people of the United Kingdom by maintaining monetary and financial stability. The Bank of England plays a multifaceted role in the national economy. Its primary objectives include maintaining price stability and supporting the government’s economic policies. To achieve this, the Bank has control over monetary policy instruments, primarily the setting of interest rates. By altering interest rates, the Bank can influence borrowing costs for businesses and individuals, which in turn affects spending, investment, and inflation. 

More information about the Bank of England can be found [here.](https://www.bankofengland.co.uk/about)


The scenario for this project, as outlined in the briefing, is as follows: 

> Part of the job of the Bank of England is to provide reassurance and stability to financial markets. One way this is achieved is through representatives of the Bank delivering speeches at various public events. As an organisation, the Bank of England is interested in how the trends in these speeches correlate with observed events and economic indicators, as well as how the sentiment of these speeches can be used to predict market behaviour. This analysis will inform our understanding of the impact of the Bank’s communications on the economy, as well as the predictive power of using this data set.

### 1.1 Business Questions

To provide insight into the above, the Bank of England's Data Strategy & Implementation Division needs answers to the following questions:

1. Has the sentiment of central bank speeches changed over time? If so, how has it changed?
<br>
<br>
2. How does the sentiment of the Bank of England’s speeches correlate with key events such as:
    * bank rate decisions (including direction/magnitude of the change)
    * publication of the Monetary Policy Report
    * publication of the Financial Stability Report/Review
    * any other events or trends that may be relevant or interesting?  
<br>
<br>
3. How does the sentiment of speeches correlate with key economic indicators of the UK, such as:
    * GDP growth
    * inflation
    * labour market statistics (e.g. unemployment and wages)
    * any other economic indicators that may be relevant or interesting.
<br>   
<br>
4. Do these speeches have any predictive power to assist in predicting market behaviour?
<br>
<br>
5.  Are there other insights or findings from the analysis that may be of interest to the organisation?
<br>
<br>
6.  What are the potential reasons for any of the correlations discovered above? How have you drawn these conclusions?


## 2. Goals


**1. To submit a 'Project Scope and Plan' by 18th March:**
* A 1,000-word overview and project plan covering project roles, roadmap, objectives, communications plans, work agreements, as well as a refined problem statement and draft project scope
<br>
<br>

**2. To submit an 'Initial Recommendation Pitch' by 15th April:**

* Presentation deck (pdf) covering background/context, summary of analysis and visualisation approach, data-informed recommendations and conclusion
* Presentation recording (5-10 mins) (MP4)
<br>
<br>

**3. To submit a 'Final Report and Presentation' by 22nd April:**

* PDF report (1,500 words, +/- 10%) describing the problem, approach and insights identified; recommendations; all aspects of presentation
* Code (file or link), submitted as private GitHub repo, Jupyter Notebook or RMarkdown (must be reproducible)
* Presentation deck (pdf): a summary of the process followed by the group, visualised data story justifying recommendations
* Live presentation (10-15 mins) (date TBC)
<br>
<br>

**4. To submit an 'Individual Reflection' by 22nd April:** 

* Pdf document (500 words, +/- 10%) covering reflections on how effectively the group worked together, what challenging situations we encountered and how we responded to them, what contributions we made as individuals and what was the most useful feedback we received that we can use for group projects moving forward

## 3. Data Collection & Preprocessing


### 3.1 Getting the Data


#### Data for Natural Language Processing

* `all_speeches.csv`: a [publicly available Kaggle dataset](https://www.kaggle.com/datasets/davidgauthier/central-bank-speeches/data) comprising a corpus of speeches from senior central bankers at various influential central banks. The corpus covers the period from 1997 to 2022 and was provided to the team as part of the project briefing
<br>
<br>
* `LSE_DA_BoE_Employer_project_Sentiment-labelled_wordlist.xlsx`: a list of words labelled with sentiment, provided along with the project brief
<br>
<br>
* `scraped_speeches.csv`: the most recent governor and deputy governor speeches, scraped by the project team from the [Bank of England web pages](https://www.bankofengland.co.uk/news/speeches). For the code used for scraping and preprocessing this dataset, please see the accompanying Notebook `scraping_speeches.ipynb`


#### Bank of England Data

* `mpcvoting.xlsx`: a record of voting decisions by the Bank of England in relation to Bank Rate (6th Jun 1997-1st Feb 2024), Stock of Government Bond Purchases (4th Aug 2016-21st Sep 2023), Stock of Corporate Bond Purchases (4th Aug 2016-3rd Feb 2022) and Asset Purchase Decisions (5th Mar 2009-4th Aug 2016). Each of the tabs was cleaned and preprocessed (see section 3.2.3 below)
<br>
<br>
* Links were also provided to publicly available Bank of England reports: the ['Monetary Policy Reports'](https://www.bankofengland.co.uk/monetary-policy-report/monetary-policy-report) and ['Financial Stability Reports'](https://www.bankofengland.co.uk/financial-stability-report/financial-stability-reports). Publication dates for these were scraped from the Bank of England web pages. The code for this is provided in separate Notebooks (`monetary_policy_reports_beautifulsoup.ipynb` and `financial_stability_reports_beautifulsoup.ipynb`)


#### Additional Office for National Statistics (ONS) Data

* Links to publicly-available data-sets related to [GDP Growth](https://www.ons.gov.uk/economy/grossdomesticproductgdp), [Inflation and Price Indices](https://www.ons.gov.uk/economy/inflationandpriceindices) and [Labour Market Statistics](https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/bulletins/uklabourmarket/previousReleases) were provided as part of the briefing. The team chose to focus on the following: 
  * NOMIS Economic Activity Data
  * ONS Vacancies Data
  * ONS GDP Data
  * ONS CPI and CPIH Monthly Indices
  * ONS Average Weekly Earnings Data

<div class="alert alert-block alert-info">
<b>@Alison:</b> Could you provide me with the links to where you downloaded each of these datasets from, please, and I will add them here? Thank you!
</div>
<br>

### 3.2 Preprocessing the Data

### 3.2.1 Preprocessing of Sentiment Analysis Data

The following steps were followed:

* `all_speeches.csv` and `scraped_speeches.csv` were combined into one dataset
* This dataset was cleaned
* The following sentiment analysis was carried out: 
  * Vader
  * TextBlob
  * Implementation of a lexicon-based classifier based on the provided Loughran-McDonald word list
  * RoBERTa model

The outputs from this analysis were combined into the dataset `uk_speeches_sentiments_processed_V3.0.csv`. This preprocessed dataset is presented below.
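
As a rough illustration of the lexicon-based step listed above, the sketch below counts tokens per sentiment category against a word list. The miniature `LEXICON` dictionary and the `tokenize` and `count_sentiment_tokens` helpers are hypothetical stand-ins written for this example; the team's actual implementation and the full provided word list are in the separate NLP Notebooks and may differ (for instance in how tokens are filtered).

```python
import re
from collections import Counter

# Hypothetical miniature lexicon (category -> lowercase words), for illustration only.
LEXICON = {
    "negative": {"indict", "abandon", "default"},
    "positive": {"best", "accomplish", "innovativeness"},
    "uncertainty": {"approximate", "almost", "contingency"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_sentiment_tokens(text, lexicon=LEXICON):
    """Return the total token count and the number of tokens in each category."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    result = {"num_tokens": len(tokens)}
    for category in lexicon:
        result[f"num_{category}"] = counts[category]
    return result


example = count_sentiment_tokens(
    "Banks almost default, yet the best firms accomplish growth."
)
print(example)
```

Dividing each category count by the token total would give per-category ratios of the kind stored in the `_ratio` columns.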
  
Please see separate Notebooks containing the code for the above: 

<div class="alert alert-block alert-success">
<b>Val:</b> Which of your notebooks do you think we should attach with the submission? I will add them to the list in Appendix B
</div>

The metadata for this dataframe is as follows:

| **Column** | **Description** |
|:--- |:--- |
| reference | Speech reference number |
| country | Country where speech was made |
| date | Date of speech |
| title | Title of speech |
| author | Who wrote the speech |
| is_gov | 1 = the person making the speech was a governor; 0 = was not a governor |
| text | The text of the speech |
| string_len | Number of characters in speech |
| formatted_text | Formatted text (check with Val how) |
| vader_neg | Vader negative sentiment score |
| vader_neu | Vader neutral sentiment score |
| vader_pos | Vader positive sentiment score |
| vader_compound | Vader compound sentiment score |
| textblob_polarity | TextBlob polarity score |
| textblob_subjectivity | TextBlob subjectivity score |
| lm_num_filtered_tokens | Number of filtered tokens for the LM classifier |
| lm_num_negative | The columns prefixed by 'lm_num' contain the count of tokens per sentiment in the speech. This is the LM negative sentiment score. Negative = words with bad connotations (e.g. "indict", "abandon", "default") |
| lm_negative_ratio | The columns suffixed by 'ratio' contain the count of tokens per sentiment in the speech divided by the total number of tokens in the speech. This is the LM negative ratio |
| lm_num_positive | LM positive sentiment score. Positive = words with good connotations (e.g. "best", "accomplish", "innovativeness") |
| lm_positive_ratio | LM positive ratio |
| lm_num_uncertainty | LM uncertainty score. Uncertainty = words indicating imprecision (e.g. "approximate", "almost", "contingency") |
| lm_uncertainty_ratio | LM uncertainty ratio |
| lm_num_litigious | LM litigious score. Litigious = litigation-related words (e.g. "claimant", "tort", "absolves") |
| lm_litigious_ratio | LM litigious ratio |
| lm_num_strong | LM strong sentiment score. Strong modal = words expressing certainty of an action (e.g. "always", "definitely", "never") |
| lm_strong_ratio | LM strong ratio |
| lm_num_weak | LM weak sentiment score. Weak modal = words expressing uncertainty of an action (e.g. "almost", "could", "might") |
| lm_weak_ratio | LM weak ratio |
| lm_num_constraining | LM constraining sentiment score. Constraining = words related to constraints (e.g. "required", "obligations", "commit") |
| lm_constraining_ratio | LM constraining ratio |
| lm_polarity | The lm_polarity column computes (lm_num_positive - lm_num_negative) / (lm_num_positive + lm_num_negative) |
| lm_subjectivity | The lm_subjectivity column computes (lm_num_positive + lm_num_negative) / lm_num_filtered_tokens |
| dovish-hawkish-polarity | The dovish-hawkish-polarity column computes per speech: (sum Dovish - sum Hawkish) / (sum Dovish + sum Hawkish) |
| dovish-hawkish-subjectivity | The dovish-hawkish-subjectivity column computes per speech: (sum Dovish + sum Hawkish) / (sum Dovish + sum Hawkish + sum Neutral) |
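
The polarity and subjectivity formulas in the table can be written out as a small sketch. The function names `lm_polarity` and `lm_subjectivity` and the choice to return `None` when a denominator is zero are assumptions made for this example; the preprocessing Notebooks may handle that case differently.

```python
def lm_polarity(num_positive, num_negative):
    """(positive - negative) / (positive + negative)."""
    total = num_positive + num_negative
    if total == 0:
        return None  # assumption: polarity is undefined without sentiment words
    return (num_positive - num_negative) / total


def lm_subjectivity(num_positive, num_negative, num_filtered_tokens):
    """(positive + negative) / number of filtered tokens."""
    if num_filtered_tokens == 0:
        return None  # assumption: subjectivity is undefined for an empty speech
    return (num_positive + num_negative) / num_filtered_tokens


# Illustrative counts only: 30 positive and 10 negative words in 400 filtered tokens.
polarity = lm_polarity(30, 10)
subjectivity = lm_subjectivity(30, 10, 400)
print(polarity, subjectivity)
```

The dovish-hawkish columns follow the same pattern, with the Dovish and Hawkish sums in place of the positive and negative counts.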


In [2]:
# Importing libraries that we will use

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sb  # used for the correlation heatmaps in section 3.2.2
from datetime import datetime

<div class="alert alert-block alert-danger">
<b>Please note:</b> the imported file below is still version 1.0; we need to replace this with version 3.0 when Val has finalised this (code will not currently run as intended)
</div>

In [None]:
uk_speeches=pd.read_csv('uk_speeches_sentiments_processed_V1.0.csv')

In [None]:
uk_speeches.head()

### 3.2.2 Selecting Which Sentiment Scores to Use to Answer Questions

We were interested in exploring which of the sentiment scores would be best for examining their relationship with the economic and financial indicators.

First, we computed a summary score per classifier: polarity multiplied by subjectivity.

For example, if a speech is for the most part neutral (80% neutral, 20% subjective), but the part that is subjective is 100% positive (or dovish), we have:

* polarity = 1 (because 100% of the subjective part is positive)
* subjectivity = 0.2 (only 20% of the speech is either positive or negative)
* summary = 1 multiplied by 0.2 = 0.2    
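
The worked example above can be checked with a couple of lines of Python. The helper name `summary_score` is hypothetical and is used only for this illustration.

```python
def summary_score(polarity, subjectivity):
    """Summary score: polarity weighted by how much of the speech is subjective."""
    return polarity * subjectivity


# The example from the text: fully positive polarity, 20% subjective.
mostly_neutral = summary_score(1.0, 0.2)

# A contrasting, made-up case: moderately negative polarity, 40% subjective.
mildly_negative = summary_score(-0.5, 0.4)

print(mostly_neutral, mildly_negative)
```

Because subjectivity lies between 0 and 1, the summary keeps the sign of the polarity and shrinks towards zero for largely neutral speeches.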

<div class="alert alert-block alert-danger">
<b>Please note:</b> this code is taken from Notebook 'Val_Exploratory_Data_Analysis_V1.0'. It won't work here yet, as that Notebook was based on 'uk_speeches_sentiments_processed_V3.0.csv', which I don't have yet
</div>



In [None]:
df_sp = pd.read_csv('uk_speeches_sentiments_processed_V3.0.csv')

In [None]:
# Summary score per classifier = polarity * subjectivity (vectorised column arithmetic)
df_sp['vader_summary'] = df_sp['vader_polarity'] * df_sp['vader_subjectivity']

In [None]:
df_sp['textblob_summary'] = df_sp['textblob_polarity'] * df_sp['textblob_subjectivity']

In [None]:
df_sp['dovish-hawkish_summary'] = df_sp['dovish-hawkish-polarity'] * df_sp['dovish-hawkish-subjectivity']

In [None]:
df_sp['lm_summary'] = df_sp['lm_polarity'] * df_sp['lm_subjectivity']

In [None]:
#check for null values:
df_sp.isna().sum()

Sense check correlations between summaries of the various classifiers

In [None]:
# retrieve list of column names with 'summary' in the name
col_summaries = [c for c in df_sp.columns if 'summary' in c]
print (col_summaries)
collist = col_summaries

In [None]:
df_sp_small = df_sp[collist]

In [None]:
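# plotting correlation heatmap (sb is the seaborn alias: import seaborn as sb)
dataplot = sb.heatmap(df_sp_small.corr(), cmap="YlGnBu", annot=True)

# displaying heatmap (plt is matplotlib.pyplot, imported at the top of the Notebook)
plt.show()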

In [None]:
# function to easily review correlations
def get_top_correlations_blog(df, threshold=0.4):
    """
    Print and return the variable pairs whose absolute correlation exceeds the threshold.

    df: the dataframe to get correlations from
    threshold: the minimum absolute correlation to include. For example, if this is 0.4, only pairs
        having a correlation coefficient greater than 0.4 or less than -0.4 will be included in the results.
    """
    orig_corr = df.corr()
    # Rank all pairs by absolute correlation, strongest first
    so = orig_corr.abs().unstack().sort_values(ascending=False)

    print("|    Variable 1    |    Variable 2    | Correlation Coefficient    |")
    print("|------------------|------------------|----------------------------|")

    pairs = set()
    rows = []
    # Series.iteritems() was removed in pandas 2.0; items() works in old and new versions
    for (var1, var2), value in so.items():
        # Exclude self-correlations and mirrored duplicates such as (A, B) and (B, A)
        if value > threshold and var1 != var2 and (var2, var1) not in pairs:
            coef = orig_corr.loc[var1, var2]
            print(f'|    {var1}    |    {var2}    |    {coef}    |')
            rows.append((var1, var2, coef))
            pairs.add((var1, var2))

    # Build the result in one go; this also works when no pair passes the threshold
    result = pd.DataFrame(rows, columns=['Variable 1', 'Variable 2', 'Correlation Coefficient'])
    return result.set_index(['Variable 1', 'Variable 2'])

In [None]:
get_top_correlations_blog(df_sp_small, 0.3)

In [None]:
for c in collist:
    plt.hist(df_sp[c])
    plt.title(c)
    plt.show()

Findings on summary scores comparisons of the various classifiers:
* Vader and LM are highly correlated
* Vader and TextBlob are somewhat correlated
* TextBlob and LM are somewhat correlated
* Hawkish-dovish is not correlated to any of the other classifiers
* Various distribution shapes for all classifiers

Sense check correlations between polarities of the various classifiers

In [None]:
# retrieve list of column names with 'polarity' in the name
col_polarities = [c for c in df_sp.columns if 'polarity' in c]
print (col_polarities)
collist = col_polarities

df_sp_small = df_sp[collist]

In [None]:
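# plotting correlation heatmap (sb is the seaborn alias: import seaborn as sb)
dataplot = sb.heatmap(df_sp_small.corr(), cmap="YlGnBu", annot=True)

# displaying heatmap (plt is matplotlib.pyplot, imported at the top of the Notebook)
plt.show()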

In [None]:
get_top_correlations_blog(df_sp_small, 0.3)

In [None]:
for c in collist:
    plt.hist(df_sp[c])
    plt.title(c)
    plt.show()

Findings on polarity scores comparisons of the various classifiers:
* Vader and LM are strongly correlated
* Vader and TextBlob are somewhat correlated
* TextBlob and LM are somewhat correlated
* Hawkish-dovish is not correlated to any other classifier
* Hawkish-dovish has an unusual distribution
* Modal value for Hawkish-dovish is 'just slightly dovish' (dovish = positive, hawkish = negative)
* TextBlob hardly has any negative sentiment

Sense check correlations between subjectivities of the various classifiers

In [None]:
# retrieve list of column names with 'subjectivity' in the name
col_subjectivities = [c for c in df_sp.columns if 'subjectivity' in c]
print (col_subjectivities)
collist = col_subjectivities

In [None]:
df_sp_small = df_sp[collist]

In [None]:
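# plotting correlation heatmap (sb is the seaborn alias: import seaborn as sb)
dataplot = sb.heatmap(df_sp_small.corr(), cmap="YlGnBu", annot=True)

# displaying heatmap (plt is matplotlib.pyplot, imported at the top of the Notebook)
plt.show()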

In [None]:
get_top_correlations_blog(df_sp_small, 0.3)

In [None]:
for c in collist:
    plt.hist(df_sp[c])
    plt.title(c)
    plt.show()

Findings on subjectivity scores comparisons of the various classifiers:
* Vader and LM are somewhat correlated
* Hawkish-dovish distribution is not normal
* Modal value for Hawkish-dovish is 'fairly neutral' (subjectivity close to 0)

### Conclusion
* Vader, LM and TextBlob classifier scores reflect the tone of the speech (irrespective of the monetary stance)
* Hawkish-dovish classifier scores reflect the monetary stance: from extremely dovish (1) to extremely hawkish (-1)

The team concluded that the following two scores should be used to correlate with economic indicators:
* lm_summary
* dovish-hawkish_summary

### 3.2.3 Preprocessing of Bank of England Data

In brief, the following steps were taken:

* The 'Bank Rates Decisions' tab from `mpcvoting.xlsx` was imported and cleaned. Voting intentions were calculated for each voting member, the ratio of 'Hawkish to Dovish' was calculated, and a further column was added calculating the strength of the decision
* The 'Stock of govt. bond purchases' tab from `mpcvoting.xlsx` was imported and cleaned, and voting intentions by date were retained
* The 'Stock of corp. bond purchases' tab was imported and cleaned, and voting intentions by date were retained
* Asset purchase decisions from the 'Asset Purchase Decisions' tab were imported and processed
* All of these dataframes were then combined into one dataframe, `MPC_Processed.csv`


<div class="alert alert-block alert-info">
<b>Alison:</b> Could you check the above (I'm not sure if I've summarised it very well)? Thanks!
</div>

Please see the Notebook `BoE_MPCVoting_Preprocessing.ipynb` for all of the code for this preprocessing. 

Here is the metadata for this dataframe:
<div class="alert alert-block alert-info">
<b>Alison:</b> There are a couple of gaps in the table below. Would you mind adding them? Thank you!
</div>

| **Column**             | **Description**                                                                                                                                |
|:--- |:--- |
| MPC_MeetingDate                | The meeting date  |
| MPC_PreviousRate | Rate set at previous meeting                                                                                                                 |
| MPC_RateDecided | Rate set at the meeting                 |
| MPC_RateDecision| Overall decision (as string): 'Increase', 'Stay' or 'Decrease' |
| MPC_RateChange| Computed column showing difference from previous rate set|
| MPC_VotedIncrease|Calculated column based on number of members who voted to increase the rate |
| MPC_VotedStay | Calculated column based on number of members who voted to keep the rate the same    |
|MPC_VotedDecrease  | Calculated column based on number of members who voted to decrease the rate|
| MPC_PropVotedIncRate | For each meeting, the proportion who voted to increase the rate. 1 = all voted increase, 0.5 = half voted increase, 0 = none voted increase |
| MPC_PropVotedDecRate | For each meeting, the proportion who voted to decrease the rate. 1 = all voted decrease, 0.5 = half voted decrease, 0 = none voted decrease |
| MPC_PropVotedStayRate | For each meeting, the proportion who voted to keep the rate the same ('stay'). 1 = all voted stay, 0.5 = half voted stay, 0 = none voted stay |
| MPC_DecisionStrength | Returns the proportion matching the decision taken; for example, if the decision is to increase, returns MPC_PropVotedIncRate |
| MPC_StockGovtBond| Description here |
| MPC_StockCorpBond| Description here |
| MPC_BondStock | Total Asset Purchases financed with central bank reserves (£bn)  |
| MPC_PreviousBondStock | Total Asset Purchases financed with central bank reserves at previous meeting (£bn) |
| MPC_BondStock_Change |  Difference in total from previous meeting (£bn)  |
| MPC_QEDec | Description here  |


In [3]:
mpc=pd.read_csv('MPC_Processed.csv')

In [6]:
mpc.head()

Unnamed: 0,MPC_MeetingDate,MPC_PreviousRate,MPC_RateDecided,MPC_RateDecision,MPC_RateChange,MPC_VotedIncrease,MPC_VotedStay,MPC_VotedDecrease,MPC_PropVotedIncRate,MPC_PropVotedDecRate,MPC_PropVotedStayRate,MPC_DecisionStrength,MPC_StockGovtBond,MPC_StockCorpBond,MPC_BondStock,MPC_PreviousBondStock,MPC_BondStock_Change,MPC_QEDec
0,1997-06-06,0.0625,0.065,Increase,0.0025,6.0,0.0,0.0,1.0,0.0,0.0,1.0,,,,,,
1,1997-07-10,0.065,0.0675,Increase,0.0025,6.0,0.0,0.0,1.0,0.0,0.0,1.0,,,,,,
2,1997-08-07,0.0675,0.07,Increase,0.0025,5.0,0.0,0.0,1.0,0.0,0.0,1.0,,,,,,
3,1997-09-11,0.07,0.07,Stay,0.0,0.0,7.0,0.0,0.0,0.0,1.0,1.0,,,,,,
4,1997-10-09,0.07,0.07,Stay,0.0,0.0,7.0,0.0,0.0,0.0,1.0,1.0,,,,,,


### 3.2.4 Preprocessing ONS and Other Economic Indicators 

In brief, the following steps were taken:

* The `NOMIS_Economic activity_Raw.xlsx` and `ONS_Vacancies_Raw.xlsx` datasets were cleaned and preprocessed. As both of these datasets were for quarterly reporting periods, they were combined by date into the dataframe `EcoQ_Processed.csv`, which is presented below
<br>
<br>
* The `ONS_GDP_Raw.xlsx`, `ONS_CPI and CPIH_Monthly indices_Raw.xlsx` and `ONS_Real AWE_Monthly_Raw.xlsx` datasets were cleaned and preprocessed. As these datasets were all reporting on monthly periods, they were combined into the dataframe `EcoM_Processed.csv`, which is presented below.

Please see the Notebook `Economic_indicators_Preprocessing.ipynb` for all of the code for this preprocessing.
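
As an illustration of combining two sources by date, the sketch below joins two small quarterly tables on their period column with pandas. The values are made up for the example and are not ONS or NOMIS figures, and the use of an outer join is an assumption; the preprocessing Notebook may use a different join type.

```python
import pandas as pd

# Hypothetical quarterly extracts; the numbers are illustrative only.
activity = pd.DataFrame({
    "3mths_ending": pd.to_datetime(["2020-03-31", "2020-06-30", "2020-09-30"]),
    "UnemploymentRate": [4.0, 4.1, 4.8],
})
vacancies = pd.DataFrame({
    "3mths_ending": pd.to_datetime(["2020-03-31", "2020-06-30", "2020-12-31"]),
    "Vacancies(000s)": [800, 350, 600],
})

# An outer join keeps periods that appear in only one source; the gaps become NaN.
eco_q = (
    activity.merge(vacancies, on="3mths_ending", how="outer")
    .sort_values("3mths_ending")
    .reset_index(drop=True)
)
print(eco_q)
```

An inner join (`how="inner"`) would instead keep only the periods present in both sources.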

Here is the metadata for these dataframes, first `EcoQ_Processed`:

<div class="alert alert-block alert-info">
<b>Alison:</b> Could you kindly add the Descriptions here and to the one below? Thank you!
</div>

| **Column**             | **Description**                                                                                                                                |
|:--- |:--- |
| 3mths_ending | Description here |
| UnemploymentRate | Description here                                                                                                                  |
| Vacancies(000s) | Description here                 |
| Unemployed(000s)| Description here |
| UnEmp/Vacancy| Description here    |


Next, `EcoM_Processed`:

| **Column**             | **Description**                                                                                                                                |
|:--- |:--- |
| MonthRefers | Description here |
| GDP | Description here                                                                                                                  |
| CPIH | Description here                 |
| CPI| Description here |
| AWE_Real_2015| Description here    |


In [None]:
EcoQ_Processed = pd.read_csv('EcoQ_Processed.csv')

In [None]:
EcoQ_Processed.head()

In [None]:
EcoM_Processed = pd.read_csv('EcoM_Processed.csv')

In [None]:
EcoM_Processed.head()

## 4. Data Analysis

In this section, we carry out the analysis to answer each of the business questions in turn, using the above joined datasets.

### 4.1 Analysis: Has the sentiment of central bank speeches changed over time? If so, how has it changed?


### 4.2 Analysis: How does the sentiment of the Bank of England’s speeches correlate with key events such as:
* bank rate decisions (including direction/magnitude of the change)
* publication of the Monetary Policy Report
* publication of the Financial Stability Report/Review
* any other events or trends that may be relevant or interesting?

### 4.3 How does the sentiment of speeches correlate with key economic indicators of the UK, such as:
* GDP growth
* inflation
* labour market statistics (e.g. unemployment and wages)
* any other economic indicators that may be relevant or interesting.

### 4.4 Do these speeches have any predictive power to assist in predicting market behaviour?

### 4.5 Are there other insights or findings from the analysis that may be of interest to the organisation?

### 4.6 What are the potential reasons for any of the correlations discovered above? How have you drawn these conclusions?

## Appendix A: List of Files Needed for this Notebook

These are the files that are needed to run this Notebook:

* `uk_speeches_sentiments_processed_V1.0.csv`
* `uk_speeches_sentiments_processed_V3.0.csv`
* `MPC_Processed.csv`
* `EcoQ_Processed.csv`
* `EcoM_Processed.csv`

## Appendix B: List of Accompanying Notebooks

* `scraping_speeches.ipynb`
* `monetary_policy_reports_beautifulsoup.ipynb`
* `financial_stability_reports_beautifulsoup.ipynb`
* `(Insert list of NLP preprocessing notebooks once confirmed by Val)`
* `BoE_MPCVoting_Preprocessing.ipynb`
* `Economic_indicators_Preprocessing.ipynb` 

(I will add a note explaining what each one is once we have a complete list)