## Monitor data normalization since reference date (EQUITIES) - Python

### Overview
The tick count indicator helps monitor data collection, normalization and storage. Combined with other monitoring metrics, it gives a detailed view of data completeness and storage quality.

### Inputs/outputs
The data normalization monitoring sample takes a list of instrument identifiers (equities) as input and returns a set of metrics such as:
* Total tick count for each instrument
* Total entries used to compute tick count based on the chosen time granularity
* First tick date
* Last tick date
* Missing days: expected daily entries (weekdays since the reference date) minus the entries actually returned
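
As a rough illustration of how these metrics relate to one another, the sketch below derives them from a plain list of `(day, tick_count)` pairs. The `summarize_entries` helper and the sample values are hypothetical and are not part of the Systemathics API; the missing-days figure follows the computation used later in this sample (expected entries minus returned entries).

```python
from datetime import date

def summarize_entries(entries, expected_entries):
    """Summarize (day, tick_count) pairs sorted by day (hypothetical helper)."""
    if not entries:
        return None
    return {
        "entries": len(entries),
        "total_ticks": sum(count for _, count in entries),
        "first_tick": entries[0][0],
        "last_tick": entries[-1][0],
        "missing_days": expected_entries - len(entries),
    }

# made-up sample: three daily entries where four were expected
sample = [(date(2020, 1, 2), 1200), (date(2020, 1, 3), 950), (date(2020, 1, 7), 1100)]
summary = summarize_entries(sample, expected_entries=4)
print(summary)
```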

### Services used
This sample uses *gRPC requests* to retrieve tick counts from the dedicated hosted service. The queried endpoint in this script is:
* *TopologiesService*: to retrieve topology entries (tick counts per period) directly from the server.

### Modules required
1. Systemathics packages:
    * *systemathics.apis.services.topology.v1*
    * *systemathics.apis.type.shared.v1*
    * *google.type*
2. Open source packages
    * *googleapis-common-protos*
    * *protobuf*
    * *grpcio*
    * *pandas*
    * *matplotlib* as the display package

***

# Run equities data normalization sample

### Step 1: Install packages

In [None]:
pip install googleapis-common-protos protobuf grpcio pandas matplotlib

In [None]:
pip install systemathics.apis --pre

In [None]:
import os
import grpc
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import google.type.date_pb2 as date
import systemathics.apis.type.shared.v1.level_pb2 as level
import systemathics.apis.type.shared.v1.identifier_pb2 as identifier
import systemathics.apis.services.topology.v1.topologies_pb2 as topologies
import systemathics.apis.services.topology.v1.topologies_pb2_grpc as topologies_service
import systemathics.apis.helpers.token_helpers as token_helpers
import systemathics.apis.helpers.channel_helpers as channel_helpers

### Step 2: Retrieve authentication token
The following code snippet sends an authentication request and prints the token to the console output. The token is needed to process the upcoming *gRPC queries*.

In [None]:
token = token_helpers.get_token()
display(token)

### Step 3: Create and process request
To request *TopologiesService*, we need to specify:
* Instrument identifier
* Time period selection: a reference date from which entries are kept, up to today
* Topology request parameters

#### 3.1 Reference date specification

In [None]:
# reference date (tick data availability)
reference_date = datetime(2020, 1, 1)

#### 3.2 Instrument selection

In [None]:
# set instrument identifiers: ticker + exchange + source
tickerexchange_array = [['AAPL', 'XNGS',564],
                        ['AMZN', 'XNGS',564],
                        ['MSFT', 'XNGS',564],
                        ['MSFT', 'BATS',729],
                        ['AAPL', 'BATS',729],
                        ['AMZN', 'BATS',729],
                        ['ASML', 'XAMS', 787],
                        ['ABI', 'XBRU',787],
                        ['UNA', 'XAMS',787],
                        ['RDSA', 'XAMS',787],
                        ['ARGX', 'XBRU',787],
                        ['MC', 'XPAR',787],
                        ['SAN', 'XPAR',787],
                        ['TTE', 'XPAR',787],
                        ['JMT', 'XLIS',787],
                        ['EDPR', 'XLIS',787],
                        ['EDP', 'XLIS',787],
                        ['KBC', 'XBRU',787],
                        ['EDP', 'CHIX',794],
                        ['ASML', 'CHIX',794],
                        ['ABI', 'CHIX',794],
                        ['KBC', 'CHIX',794],
                        ['ARGX', 'CHIX',794],
                        ['MC', 'CHIX',794],
                        ['SAN', 'CHIX',794],
                        ['JMT', 'CHIX',794],
                        ['EDPR', 'CHIX',794],
                       ]
length = len(tickerexchange_array)
# bar color used for each source
colors = {
    564: "green",
    729: "blue",
    787: "purple",
    794: "brown",
}
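
Because each row is a `[ticker, exchange, source]` triple, unpacking rows by name is less error-prone than indexing by position. The sketch below uses a shortened, illustrative copy of the list and counts instruments per source; `instruments` and `per_source` are names introduced here for illustration only.

```python
from collections import Counter

# shortened copy of the instrument list, for illustration only
instruments = [
    ['AAPL', 'XNGS', 564],
    ['AAPL', 'BATS', 729],
    ['ASML', 'XAMS', 787],
    ['ASML', 'CHIX', 794],
    ['MC', 'XPAR', 787],
]

# unpack each row by name and count instruments per source
per_source = Counter(source for _ticker, _exchange, source in instruments)
for source, count in sorted(per_source.items()):
    print(source, count)
```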

#### 3.3 Topology parameters

In [None]:
# set topology time granularity (daily, weekly...)
granularity = topologies.TOPOLOGY_GRANULARITY_DAILY

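# set level: Trades or Trades and Book
# note: this rebinds the name `level` from the imported module to the enum value,
# so re-running this cell requires re-running the imports first
level = level.LEVEL_TRADES_AND_BOOK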

#### 3.4 Request creation
The following code snippet creates the *gRPC client*, processes the request for each instrument and stores the entries recorded since the reference date:

In [None]:
# define method to handle topologies request creation for each instrument
def get_topologies_request(ticker, exchange, granularity, level):
    request = topologies.TopologiesRequest(identifier = identifier.Identifier(exchange = exchange, ticker = ticker),
                                           granularity = granularity,
                                           level = level)
    return request

In [None]:
# process all topologies requests
credentials = grpc.ssl_channel_credentials()
equities_responses =[]
today = datetime.today()
      
# iterate all instrument identifiers: exchange/ticker pairs
for i in range(length):
    try:
        # open a gRPC channel
        with channel_helpers.get_grpc_channel() as channel:

            # instantiate the topologies service
            ticker = tickerexchange_array[i][0]
            exchange = tickerexchange_array[i][1]
            request = get_topologies_request(ticker, exchange, granularity, level)
            service = topologies_service.TopologiesServiceStub(channel)

            # process the topologies request
            response = service.Topologies(request=request, metadata = [('authorization', token)])
            
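            # store responses starting on or after the reference date
            filtered_responses = []
            for elt in response.entries:
                if datetime(elt.begin.year, elt.begin.month, elt.begin.day) >= reference_date:
                    filtered_responses.append(elt)
            equities_responses.append(filtered_responses)
    except grpc.RpcError as e:
        display(e.code().name)
        display(e.details())
        # keep one (empty) slot per instrument so indexes stay aligned with tickerexchange_array
        equities_responses.append([])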
        
# get tick count data
print("Total asset requests: ", length)
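
The filtering step in the loop above is meant to keep entries from the reference date onward. The sketch below shows a full-date comparison on stand-in objects: `FakeDate` only mimics the `year`, `month` and `day` fields of `google.type.Date`, and `to_datetime` and `starts_on_or_after` are hypothetical helpers, not part of the API.

```python
from collections import namedtuple
from datetime import datetime

# stand-in for google.type.Date (only the fields used here)
FakeDate = namedtuple("FakeDate", ["year", "month", "day"])

def to_datetime(d):
    """Convert an object with year/month/day fields to a datetime."""
    return datetime(d.year, d.month, d.day)

def starts_on_or_after(begin, reference):
    """True when `begin` falls on or after the `reference` datetime."""
    return to_datetime(begin) >= reference

reference = datetime(2020, 6, 1)
begins = [FakeDate(2020, 5, 29), FakeDate(2020, 6, 1), FakeDate(2021, 1, 4)]
kept = [b for b in begins if starts_on_or_after(b, reference)]
print(kept)
```

A year-only comparison would also keep the 2020-05-29 entry; the difference only matters when the reference date is not January 1.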

### Step 4: Retrieve data
The first code snippet computes the expected number of daily entries (weekdays between the reference date and today):

In [None]:
# Find number of weekdays (expected entries); exchange holidays are not excluded
from datetime import timedelta

def get_expected_entry_numbers(start_date, end_date):
    if start_date > end_date:
        return 0
    current_date = start_date
    cpt = 0
    # count the weekdays after start_date, up to end_date
    while current_date < end_date:
        current_date = current_date + timedelta(days=1)
        if current_date.weekday() < 5:
            cpt += 1
    return cpt
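
To see what such a counter returns, the sketch below applies the same weekday rule to a short fixed window. `count_weekdays` is a hypothetical stand-alone variant that counts the weekdays after `start_date`, up to and including `end_date`. It ignores exchange holidays, so the result should be read as an upper bound on the number of trading days.

```python
from datetime import date, timedelta

def count_weekdays(start_date, end_date):
    """Count Monday-to-Friday days after start_date, up to and including end_date."""
    count = 0
    current = start_date
    while current < end_date:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            count += 1
    return count

# 2020-01-01 is a Wednesday; the following seven days include one weekend
week = count_weekdays(date(2020, 1, 1), date(2020, 1, 8))
print(week)
```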


In [None]:
from datetime import timedelta 
today = datetime.today()
expected_entries = get_expected_entry_numbers(reference_date, today)
print("Expected entries: {}".format(expected_entries))

The following code snippet exports the computed metrics to a *csv file*:

In [None]:
import csv

# process all topologies responses
filename = 'reference_equities_dashboard_{0:%Y%m%d}.csv'.format(today)

with open(filename, mode='w', newline='') as topologies_equities_file:
    topologies_equities_writer = csv.writer(topologies_equities_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # write header row
    topologies_equities_writer.writerow(['Ticker', 'Exchange', 'Source', 'Entries' ,'Total_ticks', 'First_tick', 'Last_tick', 'Missing_days'])
    
    # iterate all exchange/ticker pairs
    for i in range(length):

        response = equities_responses[i]
        # read the instrument identifier
        ticker = tickerexchange_array[i][0]
        exchange = tickerexchange_array[i][1]
        source = tickerexchange_array[i][2]

        # skip instruments without any entry since the reference date
        if not response:
            print("No entries for {0}-{1} ({2})".format(ticker, exchange, source))
            continue

        # compute the metrics
        entries_count = len(response)
        tick_counts = sum(entry.ticks_count for entry in response)
        first_date = datetime(response[0].begin.year, response[0].begin.month, response[0].begin.day)
        last_date = datetime(response[-1].end.year, response[-1].end.month, response[-1].end.day)
        missing_days = expected_entries - entries_count
        print("Total entries for {0}-{1} ({2}) \t: {3} \t| total ticks count: {4} | period: {5:%Y/%m/%d} - {6:%Y/%m/%d} \t| Missing days: {7}".format(ticker, exchange, source, entries_count, tick_counts, first_date, last_date, missing_days))
        topologies_equities_writer.writerow([ticker, exchange, source, entries_count, tick_counts, '{0:%Y/%m/%d}'.format(first_date), '{0:%Y/%m/%d}'.format(last_date), missing_days])
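
Once exported, the dashboard file can be read back to flag instruments with gaps. The sketch below does this with the standard `csv` module on an in-memory sample; the rows are made up for illustration and only the column names come from the export above.

```python
import csv
import io

# made-up rows using the header written by the export step
sample_csv = io.StringIO(
    "Ticker,Exchange,Source,Entries,Total_ticks,First_tick,Last_tick,Missing_days\n"
    "AAA,XXXX,1,250,1000000,2020/01/02,2020/12/31,0\n"
    "BBB,XXXX,1,240,800000,2020/01/02,2020/12/15,10\n"
)

# keep instruments whose Missing_days is strictly positive
flagged = [
    (row["Ticker"], row["Exchange"], int(row["Missing_days"]))
    for row in csv.DictReader(sample_csv)
    if int(row["Missing_days"]) > 0
]
print(flagged)
```

To read the real file instead, replace `sample_csv` with a file object opened on the exported filename.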

### Step 5: Visualize data

#### 5.1 Plot data normalization overview
The following code snippet plots the tick counts of every instrument in a single figure to give an overview:

In [None]:
num_rows = 7
num_cols = 4
fig,axs = plt.subplots(num_rows,num_cols, figsize=(30,10))
for i in range(length):
    ticker = tickerexchange_array[i][0]
    exchange = tickerexchange_array[i][1]
    source = tickerexchange_array[i][2]
    counts = [entry.ticks_count for entry in equities_responses[i]]
    dates = [datetime(year=entry.begin.year,day=entry.begin.day, month=entry.begin.month) for entry in equities_responses[i]]
    col = i//num_rows
    row = i%num_rows
    axs[row, col].bar(dates, counts, color = colors[source])
    axs[row, col].set_xlim([datetime(reference_date.year, reference_date.month, reference_date.day), datetime(today.year, today.month, today.day)])
    axs[row, col].set_title('{0}-{1} ({2})'.format(ticker, exchange, source))
    
# set the spacing between subplots
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.8)

# add figure title
plt.suptitle("Tick counts for all selected equities from {0:%Y/%m/%d} to {1:%Y/%m/%d}".format(reference_date, today), size=20)

# display the figure
plt.show()
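
The grid above is filled column by column: instrument `i` lands in row `i % num_rows` and column `i // num_rows`, so `num_rows * num_cols` must be at least the number of instruments. A minimal sketch of that mapping follows, with a hypothetical `grid_position` helper and the column count derived from the list size rather than hard-coded:

```python
import math

def grid_position(index, num_rows):
    """Return (row, col) for a grid filled column by column."""
    return index % num_rows, index // num_rows

num_instruments = 27   # size of the instrument list in this sample
num_rows = 7
num_cols = math.ceil(num_instruments / num_rows)  # smallest column count that fits

positions = [grid_position(i, num_rows) for i in range(num_instruments)]
print(num_cols, positions[0], positions[-1])
```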

#### 5.2 Plot data normalization details
The following code snippet plots the tick counts of each instrument in its own figure:

In [None]:
# One figure for each asset
for i in range(length):
    ticker = tickerexchange_array[i][0]
    exchange = tickerexchange_array[i][1]
    source = tickerexchange_array[i][2]
    counts = [entry.ticks_count for entry in equities_responses[i]]
    dates = [datetime(year=entry.begin.year,day=entry.begin.day, month=entry.begin.month) for entry in equities_responses[i]]
    
    # plot
    fig,ax = plt.subplots(1,1,figsize=(25,10))
    ax.bar(dates,counts, color = colors[source])
    ax.set_xlim([datetime(reference_date.year, reference_date.month, reference_date.day), datetime(today.year, today.month, today.day)])
    plt.xlabel("Date",size="20")
    plt.ylabel("Tick count",size="20")
    plt.title("Tick count for {0}-{1} | source: {2}".format(ticker,exchange, source),size="20")
    plt.show()