# Bank Financial Data Project

> For this project, you will assume the role of a data engineer working for an international financial analysis company. Your company tracks stock prices, commodities, forex rates, and inflation rates. Your job is to extract financial data from various sources, such as websites, APIs, and files provided by financial analysis firms. After collecting the data, you extract the data of interest to your company and transform it according to the requirements given to you. Once the transformation is complete, you load the data into a database.


- Collect data using APIs.
- Collect data using web scraping.
- Download files to process.
- Read CSV and JSON file types.
- Extract data from the above file types.
- Transform data.
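
The file-based objectives in this list are not exercised in the cells that follow. As a minimal sketch, assuming small in-memory samples stand in for downloaded files (the sample contents and the names `csv_text` and `json_text` are hypothetical), pandas can read both formats:

```python
import io
import json

import pandas as pd

# Hypothetical sample data standing in for downloaded files.
csv_text = "Currency,Rate\nUSD,1.10\nGBP,0.88\n"
json_text = json.dumps(
    [{"Currency": "USD", "Rate": 1.10}, {"Currency": "GBP", "Rate": 0.88}]
)

# read_csv and read_json accept file paths as well as file-like objects.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

With real files, the `io.StringIO(...)` arguments would be replaced by the paths of the downloaded files.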

In [108]:
import pandas as pd
import html5lib
import requests
import time
from datetime import datetime
from bs4 import BeautifulSoup

# Extract Data from API

#### https://exchangeratesapi.io/

In [93]:
api_url = 'https://api.apilayer.com/exchangerates_data/latest?base=EUR&apikey=DS9yDpqy7PQxaJkjrhGqq7lrmMFg7Tm0'

In [103]:
api_response = requests.get(api_url)
print(api_response.status_code)

200
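
The request above embeds the API key directly in the URL. A hedged alternative, assuming the key is stored in an environment variable (the variable name `EXCHANGERATES_API_KEY` and the helper names `build_params` and `fetch_latest_rates` are hypothetical), passes the query parameters separately, sets a timeout, and fails early on HTTP errors:

```python
import os

import requests

API_ENDPOINT = "https://api.apilayer.com/exchangerates_data/latest"


def build_params(base, api_key):
    """Return the same query parameters that appear in the URL above."""
    return {"base": base, "apikey": api_key}


def fetch_latest_rates(base="EUR", timeout=10):
    """Fetch the latest rates; raises on a missing key or an HTTP error."""
    # Hypothetical variable name; set it before starting the notebook.
    api_key = os.environ["EXCHANGERATES_API_KEY"]
    response = requests.get(
        API_ENDPOINT, params=build_params(base, api_key), timeout=timeout
    )
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.json()


# Preview the URL that would be sent, without contacting the server.
prepared = requests.Request(
    "GET", API_ENDPOINT, params=build_params("EUR", "demo-key")
).prepare()
print(prepared.url)
```

Calling `fetch_latest_rates()` would then return the same kind of dictionary as `api_response.json()`, provided the environment variable holds a valid key.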


In [105]:
api_data = api_response.json()
api_data

{'success': True,
 'timestamp': 1682699463,
 'base': 'EUR',
 'date': '2023-04-28',
 'rates': {'AED': 4.051878,
  'AFN': 95.376684,
  'ALL': 111.384466,
  'AMD': 426.60424,
  'ANG': 1.988399,
  'AOA': 562.171874,
  'ARS': 245.59213,
  'AUD': 1.66813,
  'AWG': 1.988812,
  'AZN': 1.879757,
  'BAM': 1.964449,
  'BBD': 2.227679,
  'BDT': 117.119363,
  'BGN': 1.95693,
  'BHD': 0.41599,
  'BIF': 2301.614869,
  'BMD': 1.103363,
  'BND': 1.475141,
  'BOB': 7.623738,
  'BRL': 5.518578,
  'BSD': 1.103293,
  'BTC': 3.7816747e-05,
  'BTN': 90.252404,
  'BWP': 14.612667,
  'BYN': 2.784849,
  'BYR': 21625.911524,
  'BZD': 2.223862,
  'CAD': 1.495978,
  'CDF': 2355.679247,
  'CHF': 0.984189,
  'CLF': 0.032226,
  'CLP': 889.111711,
  'CNY': 7.626667,
  'COP': 5164.466277,
  'CRC': 595.133711,
  'CUC': 1.103363,
  'CUP': 29.239115,
  'CVE': 110.750082,
  'CZK': 23.508026,
  'DJF': 196.439325,
  'DKK': 7.454165,
  'DOP': 60.188125,
  'DZD': 149.464036,
  'EGP': 34.093028,
  'ERN': 16.550442,
  'ETB': 59.

In [106]:
df_api = pd.DataFrame(api_data)
df_api.head(10)

| | success | timestamp | base | date | rates |
|---|---|---|---|---|---|
| AED | True | 1682699463 | EUR | 2023-04-28 | 4.051878 |
| AFN | True | 1682699463 | EUR | 2023-04-28 | 95.376684 |
| ALL | True | 1682699463 | EUR | 2023-04-28 | 111.384466 |
| AMD | True | 1682699463 | EUR | 2023-04-28 | 426.60424 |
| ANG | True | 1682699463 | EUR | 2023-04-28 | 1.988399 |
| AOA | True | 1682699463 | EUR | 2023-04-28 | 562.171874 |
| ARS | True | 1682699463 | EUR | 2023-04-28 | 245.59213 |
| AUD | True | 1682699463 | EUR | 2023-04-28 | 1.66813 |
| AWG | True | 1682699463 | EUR | 2023-04-28 | 1.988812 |
| AZN | True | 1682699463 | EUR | 2023-04-28 | 1.879757 |
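
Passing the whole response to `pd.DataFrame` repeats `success`, `timestamp`, `base`, and `date` on every row. As a sketch of a leaner alternative, using a small subset copied from the response above (the names `sample_response`, `rates_df`, and `fetched_at` are illustrative), only the `rates` mapping becomes the table, and the Unix timestamp is converted to a readable UTC time:

```python
from datetime import datetime, timezone

import pandas as pd

# Subset of the API response shown earlier in this section.
sample_response = {
    "success": True,
    "timestamp": 1682699463,
    "base": "EUR",
    "date": "2023-04-28",
    "rates": {"AED": 4.051878, "AUD": 1.66813, "CAD": 1.495978},
}

# One row per currency, with the rate as the only value column.
rates_df = (
    pd.Series(sample_response["rates"], name="Rate")
    .rename_axis("Currency")
    .reset_index()
)

# The timestamp field is seconds since the Unix epoch.
fetched_at = datetime.fromtimestamp(sample_response["timestamp"], tz=timezone.utc)

print(rates_df)
print(fetched_at.isoformat())
```

Keeping the metadata (`base`, `date`, `fetched_at`) in separate variables avoids storing the same four values once per currency.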


# Extract Data via Web Scraping

In [111]:
bank_url = 'https://www.relbanks.com/worlds-top-banks/assets'
bank_data_response = requests.get(bank_url)

print(bank_data_response.status_code)

200


In [122]:
bank_tables = pd.read_html(bank_url)
bank_tables

[        0                                      1            2  \
 0    Rank                                   Bank      Country   
 1       1  Industrial & Commercial Bank of China        China   
 2       2           China Construction Bank Corp        China   
 3       3             Agricultural Bank of China        China   
 4       4                          Bank of China        China   
 ..    ...                                    ...          ...   
 116   116                    Qatar National Bank        Qatar   
 117   117                National Bank of Canada       Canada   
 118   118                         Suntrust Banks          USA   
 119   119                 Korea Development Bank  South Korea   
 120   120                                Belfius      Belgium   
 
                       3              4  
 0    Total assets, US$b  Balance sheet  
 1               4005.58     12/31/2017  
 2               3397.13     12/31/2017  
 3               3232.68     12/31/201

In [125]:
bank_df = pd.DataFrame(bank_tables[0])
bank_df.head()

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | Rank | Bank | Country | Total assets, US$b | Balance sheet |
| 1 | 1 | Industrial & Commercial Bank of China | China | 4005.58 | 12/31/2017 |
| 2 | 2 | China Construction Bank Corp | China | 3397.13 | 12/31/2017 |
| 3 | 3 | Agricultural Bank of China | China | 3232.68 | 12/31/2017 |
| 4 | 4 | Bank of China | China | 2989.16 | 12/31/2017 |


In [129]:
bank_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121 entries, 0 to 120
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       121 non-null    object
 1   1       121 non-null    object
 2   2       121 non-null    object
 3   3       121 non-null    object
 4   4       121 non-null    object
dtypes: object(5)
memory usage: 4.9+ KB


In [136]:
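# Rebuild bank_df from the scraped table first. If an earlier cell rebound it to None
# (for example, DataFrame.info() returns None), the assignment below raises
# AttributeError: 'NoneType' object has no attribute 'columns'.
bank_df = pd.DataFrame(bank_tables[0])
bank_df.columns = ['Rank', 'Bank', 'Country', 'Total_Assets', 'Balance_Sheet']
bank_df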

In [None]:
bank_df.columns
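
The scraped columns are all `object` dtype, so a transform step is still needed before the data can be analyzed or loaded. A minimal sketch, assuming the column names assigned above and a few rows copied from the scraped output (the names `sample` and `clean` are illustrative, and this is one possible approach rather than the notebook's own method):

```python
import pandas as pd

# Rows copied from the scraped table, stored as strings like the original.
sample = pd.DataFrame(
    {
        "Rank": ["1", "2", "3"],
        "Bank": [
            "Industrial & Commercial Bank of China",
            "China Construction Bank Corp",
            "Agricultural Bank of China",
        ],
        "Country": ["China", "China", "China"],
        "Total_Assets": ["4005.58", "3397.13", "3232.68"],
        "Balance_Sheet": ["12/31/2017", "12/31/2017", "12/31/2017"],
    }
)

# Cast the text columns to numeric and date types.
clean = sample.assign(
    Rank=pd.to_numeric(sample["Rank"]),
    Total_Assets=pd.to_numeric(sample["Total_Assets"]),
    Balance_Sheet=pd.to_datetime(sample["Balance_Sheet"], format="%m/%d/%Y"),
)

print(clean.dtypes)
```

On the full scraped frame, the header row stored at index 0 would need to be dropped before these casts, since its text values cannot be parsed as numbers or dates.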