# Daily Reporting Methods for Deutsche Boerse Xetra Dataset

A daily summary report is generated by extracting CSV data from the Deutsche Boerse public S3 bucket. The data is then wrangled, aggregated, and stored in a separate bucket in [Apache Parquet format](https://parquet.apache.org/).

Note: Data from the previous day is also extracted so that the report has enough data to work with. This covers the case where the report date falls on a weekend or a holiday, when the markets are closed.
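
The look-back described in the note can be sketched as a small date helper. This is only an illustration: the function name `date_prefixes`, the `lookback_days` parameter, and the assumption that object keys in the source bucket start with a `YYYY-MM-DD` date are not taken from this notebook.

```python
from datetime import datetime, timedelta


def date_prefixes(report_date, lookback_days=1):
    """Return 'YYYY-MM-DD' strings from the earliest look-back day up to report_date.

    Hypothetical helper: assumes object keys are prefixed with an ISO date.
    """
    end = datetime.strptime(report_date, "%Y-%m-%d").date()
    return [
        (end - timedelta(days=offset)).strftime("%Y-%m-%d")
        for offset in range(lookback_days, -1, -1)
    ]


# The report date plus the one day before it.
prefixes = date_prefixes("2022-01-31")
print(prefixes)  # ['2022-01-30', '2022-01-31']
```

Each returned string could then be passed as a key prefix when listing the objects to extract.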

---

## Imports

In [1]:
import boto3
import pandas as pd
from datetime import datetime, timedelta
from io import StringIO, BytesIO

## ETL Methods

In [2]:
# Adapter

def read_csv_to_df(bucket, key, decoding='utf-8', sep=','):
    """Reads data from a CSV S3 object to a Pandas dataframe."""

    csv_obj = bucket.Object(key=key).get().get('Body').read().decode(decoding)
    data = StringIO(csv_obj)
    df = pd.read_csv(data, delimiter=sep)
    return df


def write_df_to_s3(bucket, key, df, format='csv'):
    """Writes dataframe to a target S3 bucket."""

    out_buffer = BytesIO()

    if format == 'csv':
        df.to_csv(out_buffer, index=False)

    elif format == 'parquet':
        df.to_parquet(out_buffer, index=False)

    else:
        print(f"Error: {format} is not a valid format. It should be 'csv' or 'parquet'.")
        return False

    # BytesIO exposes getvalue(), and boto3 expects the capitalized Key argument.
    bucket.put_object(Body=out_buffer.getvalue(), Key=key)

    print(f"Successfully loaded {key} to the bucket.")
    return True


def list_files_by_prefix(bucket, prefix):
    """Lists the keys of all objects in the bucket that start with the given prefix."""

    files = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
    return files
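
To see how `list_files_by_prefix` behaves without touching S3, the sketch below runs it against an in-memory stand-in. `FakeBucket`, `FakeObjects`, and the sample keys are hypothetical and exist only for this illustration. They mimic just the `bucket.objects.filter(Prefix=...)` call that the function relies on, not the real boto3 resource.

```python
from types import SimpleNamespace


class FakeObjects:
    """Hypothetical stand-in for the `objects` collection of a boto3 Bucket."""

    def __init__(self, keys):
        self._keys = keys

    def filter(self, Prefix=""):
        # Return objects exposing a `.key` attribute, as the function expects.
        return [SimpleNamespace(key=k) for k in self._keys if k.startswith(Prefix)]


class FakeBucket:
    """Hypothetical stand-in for a boto3 Bucket, for local experiments only."""

    def __init__(self, keys):
        self.objects = FakeObjects(keys)


def list_files_by_prefix(bucket, prefix):
    """Lists the keys of all objects in the bucket that start with the given prefix."""
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]


# Made-up keys, assuming a date-prefixed layout.
bucket = FakeBucket([
    "2022-01-27/file_a.csv",
    "2022-01-28/file_b.csv",
    "2022-01-28/file_c.csv",
])
files = list_files_by_prefix(bucket, "2022-01-28")
print(files)  # ['2022-01-28/file_b.csv', '2022-01-28/file_c.csv']
```

With a real bucket, the same call would take a `boto3.resource('s3').Bucket(...)` object instead of the stand-in.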