# Xetra Data Snapshot

This notebook loads sample data from an S3 bucket into a Pandas DataFrame for a brief look at the data. Each object in the bucket covers one hour of a day, and the object's key contains the date and the hour. The data in each object is in CSV format, detailing the security name, security type, minimum and maximum price, etc.

---

## Imports


In [1]:
import boto3
import pandas as pd
from io import StringIO

## Getting S3 Data

In [2]:
bucket_name = 'deutsche-boerse-xetra-pds'
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)

In [3]:
sample_date = '2022-04-20'
sample_obj = bucket.objects.filter(Prefix=sample_date)
objects = list(sample_obj)

objects[:5]

[s3.ObjectSummary(bucket_name='deutsche-boerse-xetra-pds', key='2022-04-20/2022-04-20_BINS_XETR00.csv'),
 s3.ObjectSummary(bucket_name='deutsche-boerse-xetra-pds', key='2022-04-20/2022-04-20_BINS_XETR01.csv'),
 s3.ObjectSummary(bucket_name='deutsche-boerse-xetra-pds', key='2022-04-20/2022-04-20_BINS_XETR02.csv'),
 s3.ObjectSummary(bucket_name='deutsche-boerse-xetra-pds', key='2022-04-20/2022-04-20_BINS_XETR03.csv'),
 s3.ObjectSummary(bucket_name='deutsche-boerse-xetra-pds', key='2022-04-20/2022-04-20_BINS_XETR04.csv')]
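
Each key listed above follows the pattern `<date>/<date>_BINS_XETR<hour>.csv`. As a minimal sketch, the date and hour can be recovered from a key with plain string slicing. `parse_xetra_key` is a hypothetical helper that is not part of the notebook, and it assumes every key follows this exact pattern.

```python
from datetime import datetime


def parse_xetra_key(key: str) -> datetime:
    """Return the date and hour encoded in a key (hypothetical helper).

    Assumes the pattern '<YYYY-MM-DD>/<YYYY-MM-DD>_BINS_XETR<HH>.csv'.
    """
    filename = key.rsplit("/", 1)[-1]  # e.g. '2022-04-20_BINS_XETR10.csv'
    date_part = filename[:10]          # first ten characters hold the date
    hour_part = filename[-6:-4]        # two digits before '.csv' hold the hour
    return datetime.strptime(f"{date_part} {hour_part}", "%Y-%m-%d %H")


parsed = parse_xetra_key("2022-04-20/2022-04-20_BINS_XETR10.csv")
print(parsed)  # 2022-04-20 10:00:00
```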

In [4]:
# Cycle through the 24 hourly keys (00-23) to find objects that contain data
key_stem = '2022-04-20/2022-04-20_BINS_XETR'

for hour in range(24):
    key = f"{key_stem}{hour:02d}.csv"

    # Download the object and measure its length in characters
    obj = bucket.Object(key=key).get()['Body'].read().decode('utf-8')
    obj_len = len(obj)

    # Objects without trades appear to hold only the 136-character header line,
    # so list every object that is longer than that
    if obj_len > 136:
        print(f"{key} | {hour:02d}:00 | {obj_len}")


2022-04-20/2022-04-20_BINS_XETR07.csv | 07:00 | 1610454
2022-04-20/2022-04-20_BINS_XETR08.csv | 08:00 | 1171065
2022-04-20/2022-04-20_BINS_XETR09.csv | 09:00 | 1263221
2022-04-20/2022-04-20_BINS_XETR10.csv | 10:00 | 1096608
2022-04-20/2022-04-20_BINS_XETR11.csv | 11:00 | 1304595
2022-04-20/2022-04-20_BINS_XETR12.csv | 12:00 | 1022193
2022-04-20/2022-04-20_BINS_XETR13.csv | 13:00 | 1414215
2022-04-20/2022-04-20_BINS_XETR14.csv | 14:00 | 1497436
2022-04-20/2022-04-20_BINS_XETR15.csv | 15:00 | 1271399
2022-04-20/2022-04-20_BINS_XETR19.csv | 19:00 | 876
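
The check above downloads every object just to measure its length. A possible alternative, sketched here under assumptions, is to read the `size` attribute that boto3 exposes on each `s3.ObjectSummary`. Note that `size` is in bytes while the loop above counts decoded characters, so the two only match for ASCII-only content. `non_empty_keys` and `HEADER_ONLY_SIZE` are hypothetical names, and the stand-in objects below are illustrative rather than real bucket listings.

```python
from types import SimpleNamespace

# Assumption: a header-only object is 136 bytes, mirroring the threshold used above
HEADER_ONLY_SIZE = 136


def non_empty_keys(summaries, min_size=HEADER_ONLY_SIZE):
    """Return (key, size) pairs for objects larger than min_size bytes.

    `summaries` can be any iterable of objects with `.key` and `.size`,
    for example bucket.objects.filter(Prefix=sample_date).
    """
    return [(s.key, s.size) for s in summaries if s.size > min_size]


# Illustrative stand-ins for s3.ObjectSummary so the sketch runs without AWS access
stand_ins = [
    SimpleNamespace(key="2022-04-20/2022-04-20_BINS_XETR00.csv", size=136),
    SimpleNamespace(key="2022-04-20/2022-04-20_BINS_XETR10.csv", size=1096608),
]

for key, size in non_empty_keys(stand_ins):
    print(f"{key} | {size}")
```

With real data, `non_empty_keys(bucket.objects.filter(Prefix=sample_date))` would need only the listing request rather than one download per object.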


## Load Sample Data From S3 Bucket to Pandas DataFrame

In [5]:
# Select one of the non-empty objects (10:00) to inspect in Pandas
sample_csv = bucket.Object(key=key_stem + '10.csv').get().get('Body').read().decode('utf-8')
data = StringIO(sample_csv)
sample_df = pd.read_csv(data, delimiter=',')

sample_df

Unnamed: 0,ISIN,Mnemonic,SecurityDesc,SecurityType,Currency,SecurityID,Date,Time,StartPrice,MaxPrice,MinPrice,EndPrice,TradedVolume,NumberOfTrades
0,AT0000A0E9W5,SANT,S+T AG O.N.,Common stock,EUR,2504159,2022-04-20,10:00,16.710,16.710,16.71,16.71,729,3
1,AT0000969985,AUS,AT+S AUSTR.T.+SYSTEMT.,Common stock,EUR,2504191,2022-04-20,10:00,51.100,51.100,51.10,51.10,100,1
2,CA0679011084,ABR,BARRICK GOLD CORP.,Common stock,EUR,2504196,2022-04-20,10:00,22.920,22.930,22.92,22.93,411,4
3,LU0274211480,DBXD,XTR.DAX 1C,ETF,EUR,2504269,2022-04-20,10:00,136.800,136.800,136.80,136.80,51,1
4,DE000A0DJ6J9,S92,SMA SOLAR TECHNOL.AG,Common stock,EUR,2504287,2022-04-20,10:00,43.460,43.460,43.40,43.44,380,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10031,DE000A3CQ7F4,BIKE,BIKE24 HLDG O.N.,Common stock,EUR,6560091,2022-04-20,10:59,8.570,8.570,8.57,8.57,364,5
10032,FR0004056851,AYJ,"VALNEVA SE EO -,15",Common stock,EUR,6769323,2022-04-20,10:59,15.820,15.820,15.82,15.82,133,1
10033,AT0000BAWAG2,0B2,BAWAG GROUP AG,Common stock,EUR,7026001,2022-04-20,10:59,46.700,46.700,46.70,46.70,42,1
10034,DE000DTR0CK8,DTG,DAIMLER TRUCK HLDG JGE NA,Common stock,EUR,7126155,2022-04-20,10:59,25.325,25.325,25.31,25.31,1326,5
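
For a quick first look beyond the raw display, a few summary calls are often enough. The sketch below rebuilds the first five rows shown above as CSV text so it runs without S3 access, assuming the raw CSV has no index column. `head_text` and `head_df` are hypothetical names, and in the notebook the same calls could be applied to `sample_df` directly.

```python
from io import StringIO

import pandas as pd

# First five rows of the 10:00 object as displayed above (index column omitted)
head_text = """ISIN,Mnemonic,SecurityDesc,SecurityType,Currency,SecurityID,Date,Time,StartPrice,MaxPrice,MinPrice,EndPrice,TradedVolume,NumberOfTrades
AT0000A0E9W5,SANT,S+T AG O.N.,Common stock,EUR,2504159,2022-04-20,10:00,16.710,16.710,16.71,16.71,729,3
AT0000969985,AUS,AT+S AUSTR.T.+SYSTEMT.,Common stock,EUR,2504191,2022-04-20,10:00,51.100,51.100,51.10,51.10,100,1
CA0679011084,ABR,BARRICK GOLD CORP.,Common stock,EUR,2504196,2022-04-20,10:00,22.920,22.930,22.92,22.93,411,4
LU0274211480,DBXD,XTR.DAX 1C,ETF,EUR,2504269,2022-04-20,10:00,136.800,136.800,136.80,136.80,51,1
DE000A0DJ6J9,S92,SMA SOLAR TECHNOL.AG,Common stock,EUR,2504287,2022-04-20,10:00,43.460,43.460,43.40,43.44,380,5
"""

head_df = pd.read_csv(StringIO(head_text))

# Count rows per security type and total the traded volume
type_counts = head_df["SecurityType"].value_counts()
total_volume = head_df["TradedVolume"].sum()

print(head_df.dtypes)
print(type_counts)
print(total_volume)
```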
