In [2]:
"""

DATA ENGINEERING ETL PIPELINE - XETRA DATASET

1: Understanding source data.

Aim:
Write a production-ready ETL pipeline using Python and pandas.

Overview:
Xetra is a German stock exchange based in Frankfurt and operated by Deutsche Börse Group.
Data on daily trading activity was published in a public Amazon S3 bucket.
(Update: as of July 2022 the data is no longer available, so an archival S3 bucket is used instead.)

Task:
Use Jupyter Notebook as a prototyping tool to extract and transform source data.
Request and extract source data from cloud-based web services.
Become familiar with the boto3 and pandas package functions.

The steps to be performed are outlined below:

    1) Import the necessary libraries and functions for the project.
    2) Create variables to define the Amazon S3 resource we are going to call.
    3) Retrieve the trading data from the Amazon S3 bucket named 'xetra-1234'.
    4) Filter the bucket by date and store the objects within the selected period as a list.
    5) Retrieve the body of one object from the selected period and store it as a CSV object.
    6) Decode the CSV object as string data and read it into a pandas DataFrame.
    7) Display the DataFrame.
    
"""

In [3]:
import boto3  # AWS SDK for Python.
import pandas as pd  # Data analysis library.
from io import StringIO  # In-memory text buffer, used to pass CSV text to pandas.

In [4]:
s3 = boto3.resource('s3')  # Use the Amazon S3 resource interface.
bucket = s3.Bucket('xetra-1234')  # Reference the 'xetra-1234' data bucket.

In [5]:
bucket_obj = bucket.objects.filter(Prefix='2022-01-28/')  # Select objects whose keys start with the date prefix.
objects = [obj for obj in bucket_obj]  # Store the matching objects as a list (avoids shadowing the built-in `list`).
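
The cell above filters on a single date prefix. To cover a longer period, one option is to build one prefix per day and filter on each in turn. The sketch below is only an illustration: `date_prefixes` is a hypothetical helper, not part of boto3, and it assumes that object keys in the bucket start with `YYYY-MM-DD/`, as the key used in this notebook suggests.

```python
from datetime import date, timedelta


def date_prefixes(start, end):
    """Return one 'YYYY-MM-DD/' key prefix per day from start to end, inclusive.

    Hypothetical helper; assumes keys in the bucket begin with an ISO date.
    """
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError("end date must not be before start date")
    days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() + "/" for i in range(days + 1)]


prefixes = date_prefixes("2022-01-27", "2022-01-29")
print(prefixes)
```

Each prefix could then be passed to `bucket.objects.filter(Prefix=...)` in the same way as the single date above.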

In [6]:
# Retrieve the body of one object from the selected period and store it as a CSV object.
csv_obj = bucket.Object(key='2022-01-28/2022-01-28_BINS_XETR15.csv').get()['Body']  # Request the object by key and take its streaming body.
csv_obj = csv_obj.read().decode('utf-8')  # Read the bytes and decode them as UTF-8 text.

In [7]:
data = StringIO(csv_obj)  # Wrap the CSV text in an in-memory text buffer.
df = pd.read_csv(data, delimiter=',')  # Read the buffer into a pandas DataFrame.
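
The parsing step can be tried without any S3 access. The sketch below is a minimal illustration rather than the notebook's actual data flow: `sample_csv` is a hand-built string using two rows from the output shown in this notebook, reduced to five of the columns as a simplifying assumption. It shows that `pd.read_csv` reads from a `StringIO` buffer and keeps a quoted field containing a comma as one value.

```python
from io import StringIO

import pandas as pd

# Hand-built sample (assumption): two rows copied from the notebook output,
# with only five of the original columns kept. Nothing is fetched from S3.
sample_csv = (
    "ISIN,Mnemonic,SecurityDesc,StartPrice,TradedVolume\n"
    "CA0679011084,ABR,BARRICK GOLD CORP.,16.498,1000\n"
    'CH0038389992,BBZA,"BB BIOTECH NAM. SF 0,20",62.550,50\n'
)

# StringIO makes the text behave like a file, which read_csv accepts.
sample_df = pd.read_csv(StringIO(sample_csv))

print(sample_df.shape)
print(sample_df.loc[1, "SecurityDesc"])
```

The quoted description in the second row contains a comma, so it is a useful check that the default CSV quoting rules are sufficient for this data.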

In [8]:
df  # Display the DataFrame.

Unnamed: 0,ISIN,Mnemonic,SecurityDesc,SecurityType,Currency,SecurityID,Date,Time,StartPrice,MaxPrice,MinPrice,EndPrice,TradedVolume,NumberOfTrades
0,AT0000A0E9W5,SANT,S+T AG O.N.,Common stock,EUR,2504159,2022-01-28,15:00,16.000,16.000,15.990,15.990,371,3
1,CA0679011084,ABR,BARRICK GOLD CORP.,Common stock,EUR,2504196,2022-01-28,15:00,16.498,16.498,16.498,16.498,1000,4
2,CH0038389992,BBZA,"BB BIOTECH NAM. SF 0,20",Common stock,EUR,2504244,2022-01-28,15:00,62.550,62.550,62.550,62.550,50,1
3,CH0303692047,ED4,"EDAG ENGINEERING G.SF-,04",Common stock,EUR,2504254,2022-01-28,15:00,10.950,10.950,10.950,10.950,100,1
4,DE0005933931,EXS1,ISHS CORE DAX UC.ETF EOA,ETF,EUR,2504265,2022-01-28,15:00,128.960,128.960,128.940,128.940,202,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15552,DE000A3E5D56,FPE,FUCHS PETROLUB NA ST O.N.,Common stock,EUR,6699157,2022-01-28,15:59,29.780,29.800,29.780,29.800,69,3
15553,DE000A3E5D64,FPE3,FUCHS PETROLUB VZO NA ON,Common stock,EUR,6699158,2022-01-28,15:59,37.800,37.820,37.800,37.820,289,6
15554,DE000VTSC017,VTSC,VITESCO TECHS GRP NA O.N.,Common stock,EUR,6791383,2022-01-28,15:59,42.000,42.000,42.000,42.000,225,2
15555,DE000DTR0CK8,DTG,DAIMLER TRUCK HLDG JGE NA,Common stock,EUR,7126155,2022-01-28,15:59,31.640,31.645,31.625,31.625,1571,8
