## Monitor data normalization (EQUITIES) - Python

### Overview
The tick count indicator makes it possible to monitor data collection, normalization and storage. Combined with other monitoring metrics, tick count is a rich monitoring tool for checking data completeness and storage quality.

### Inputs/outputs
The data normalization monitoring sample takes a list of equity instrument identifiers as input and returns a set of metrics such as:
* Total tick count for each instrument
* Number of entries used to compute the tick count, based on the chosen time granularity
* First tick date
* Last tick date
* Missing days: today minus the last tick date
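The metrics above can be sketched in plain Python. This assumes each topology entry exposes a begin date, an end date and a *ticks_count*, as the response entries used later in this notebook do. The `Entry` dataclass and the `tick_metrics` function are hypothetical helpers for illustration, and the tick counts are made-up values, not real data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """Hypothetical stand-in for one topology entry: a period and its tick count."""
    begin: date
    end: date
    ticks_count: int


def tick_metrics(entries, today):
    """Compute the monitoring metrics listed above for one instrument."""
    if not entries:
        return None  # no data at all for this instrument
    first_tick = entries[0].begin
    last_tick = entries[-1].end
    return {
        "total_ticks": sum(e.ticks_count for e in entries),
        "entries": len(entries),
        "first_tick": first_tick,
        "last_tick": last_tick,
        # missing days: today minus the last tick date
        "missing_days": (today - last_tick).days,
    }


# made-up daily entries, for illustration only
entries = [
    Entry(date(2021, 5, 24), date(2021, 5, 24), 1200),
    Entry(date(2021, 5, 25), date(2021, 5, 25), 1350),
    Entry(date(2021, 5, 26), date(2021, 5, 26), 980),
]
print(tick_metrics(entries, today=date(2021, 5, 31)))
```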

### Services used
This sample uses *gRPC requests* to retrieve tick counts from the dedicated hosted service. The endpoint queried in this script is:
* *TopologiesService*: to retrieve tick count topologies (tick counts per time period) directly from the server.

### Modules required
1. Systemathics packages:
    * *systemathics.apis.services.topology.v1*
    * *systemathics.apis.type.shared.v1*
2. Open source packages
    * *googleapis-common-protos* (provides *google.type*)
    * *protobuf*
    * *grpcio*
    * *pandas*
    * *matplotlib* as display package

***

# Run equities data normalization sample

### Step 1: Install packages

```
pip install googleapis-common-protos protobuf grpcio pandas matplotlib systemathics.apis
```


```python
import os
import grpc
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import google.type.date_pb2 as date
import systemathics.apis.type.shared.v1.level_pb2 as level
import systemathics.apis.type.shared.v1.identifier_pb2 as identifier
import systemathics.apis.services.topology.v1.topologies_pb2 as topologies
import systemathics.apis.services.topology.v1.topologies_pb2_grpc as topologies_service
```

### Step 2: Retrieve authentication token
The following code snippet reads the authentication token from the *AUTH0_TOKEN* environment variable, prefixes it with *Bearer* and displays it. The resulting value is sent as *authorization* metadata with the upcoming *gRPC queries*.

```python
# build the authorization header value from the AUTH0_TOKEN environment variable
token = f"Bearer {os.environ['AUTH0_TOKEN']}"
display(token)
```

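Displaying the full token exposes a credential in the notebook output. A possible alternative, sketched below under the assumption that the token still comes from *AUTH0_TOKEN*, is to fail early with a clear message when the variable is missing and to display only a shortened form. The helpers `bearer_token_from_env` and `masked` are hypothetical; they take the environment as a mapping so they can be tried without touching `os.environ`.

```python
import os


def bearer_token_from_env(environ=os.environ, var="AUTH0_TOKEN"):
    """Build the gRPC authorization value from an environment variable."""
    raw = environ.get(var)
    if not raw:
        # fail early with a clear message instead of a KeyError or a rejected call later
        raise RuntimeError(f"Environment variable {var} is not set")
    return f"Bearer {raw}"


def masked(token, keep=10):
    """Return a shortened version of the token that is safer to display."""
    return token if len(token) <= keep else token[:keep] + "..."


# usage with a dummy environment; in the notebook, call bearer_token_from_env() instead
example = bearer_token_from_env({"AUTH0_TOKEN": "abcdefghijklmnop"})
metadata = [("authorization", example)]  # same metadata shape as the Topologies calls
print(masked(example))
```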

### Step 3: Create and process request
To query *TopologiesService*, we need to specify:
* Instrument identifier
* Time period selection: select start and end dates
* Topology request parameters

#### 3.1 Instrument selection

```python
# set instrument identifiers: ticker + exchange + source
tickerexchange_array = [['AAPL', 'XNGS', 564],
                        ['MSFT', 'BATS', 729],
                        ['AMZN', 'XNGS', 564],
                        ['AAPL', 'BATS', 729],
                        ['MSFT', 'XNGS', 564],
                        ['AMZN', 'BATS', 729],
                        ['ASML', 'XAMS', 787],
                        ['ABI', 'XBRU', 787],
                        ['UNA', 'XAMS', 787],
                        ['RDSA', 'XAMS', 787],
                        ['ARGX', 'XBRU', 787],
                        ['MC', 'XPAR', 787],
                        ['SAN', 'XPAR', 787],
                        ['FP', 'XPAR', 787],
                        ['JMT', 'XLIS', 787],
                        ['EDPR', 'XLIS', 787],
                        ['EDP', 'CHIX', 794],
                        ['ASML', 'CHIX', 794],
                        ['ABI', 'CHIX', 794],
                        ['KBC', 'CHIX', 794],
                        ['KBC', 'XBRU', 787],
                        ['ARGX', 'CHIX', 794],
                        ['MC', 'CHIX', 794],
                        ['SAN', 'CHIX', 794],
                        ['JMT', 'CHIX', 794],
                        ['EDPR', 'CHIX', 794],
                        ['EDP', 'XLIS', 787]
                       ]
length = len(tickerexchange_array)
```
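Before sending one request per row, it can help to sanity-check the list. The sketch below uses the standard library `Counter` to count instruments per source and to detect duplicated ticker/exchange pairs. It runs on a small copy of a few rows so that it is self-contained; in the notebook, `tickerexchange_array` could be passed instead. `summarize_instruments` is a hypothetical helper name.

```python
from collections import Counter


def summarize_instruments(instruments):
    """Count instruments per source and list duplicated (ticker, exchange) pairs."""
    per_source = Counter(source for _ticker, _exchange, source in instruments)
    pairs = Counter((ticker, exchange) for ticker, exchange, _source in instruments)
    duplicates = [pair for pair, count in pairs.items() if count > 1]
    return per_source, duplicates


# small copy of a few rows from tickerexchange_array, so the sketch runs on its own
instruments = [
    ['AAPL', 'XNGS', 564],
    ['MSFT', 'BATS', 729],
    ['ASML', 'XAMS', 787],
    ['ASML', 'CHIX', 794],
    ['KBC', 'XBRU', 787],
]
per_source, duplicates = summarize_instruments(instruments)
print(dict(per_source))
print(duplicates)
```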

#### 3.2 Topology parameters

```python
# set topology time granularity (daily, weekly...)
granularity = topologies.TOPOLOGY_GRANULARITY_DAILY

# set level: Trades or Trades and Book
level = level.LEVEL_TRADES_AND_BOOK
```
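To illustrate what the granularity setting changes, the sketch below re-aggregates daily tick counts into ISO-week totals on the client side. This is only an approximation for intuition: how the service itself defines its weekly buckets is an assumption not covered by this sample. `weekly_totals` is a hypothetical helper and the counts are made-up values.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily_counts):
    """Sum (day, count) pairs into ISO weeks: {(iso_year, iso_week): total}."""
    totals = defaultdict(int)
    for day, count in daily_counts:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(iso_year, iso_week)] += count
    return dict(totals)


# made-up daily tick counts, for illustration only
daily = [
    (date(2021, 5, 27), 1000),  # Thursday, ISO week 21
    (date(2021, 5, 28), 1100),  # Friday, ISO week 21
    (date(2021, 5, 31), 900),   # Monday, ISO week 22
]
print(weekly_totals(daily))
```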

#### 3.3 Request creation
The following code snippets define a helper that builds a *TopologiesRequest* for each instrument, then create the *gRPC client* and process one request per instrument:

```python
# define method to handle topologies request creation for each instrument
def get_topologies_request(ticker, exchange, granularity, level):
    request = topologies.TopologiesRequest(identifier=identifier.Identifier(exchange=exchange, ticker=ticker),
                                           granularity=granularity,
                                           level=level)
    return request
```

```python
# process all topologies requests
credentials = grpc.ssl_channel_credentials()
equities_responses = []

# open a single secure channel and reuse it for every request
with grpc.secure_channel("apis.systemathics.cloud:443", credentials) as channel:

    # instantiate the topologies service
    service = topologies_service.TopologiesServiceStub(channel)

    # iterate all instrument identifiers: ticker/exchange pairs
    for ticker, exchange, _source in tickerexchange_array:
        request = get_topologies_request(ticker, exchange, granularity, level)

        # process the topologies request
        response = service.Topologies(request=request, metadata=[('authorization', token)])

        # store
        equities_responses.append(response)

# summary
print("Total asset requests: ", length)
```

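With one request per instrument, a single failure (for example an expired token, which the server rejects with *StatusCode.UNAUTHENTICATED*) stops the whole loop. One possible design, sketched below, records failures per instrument and keeps going. `fetch_all` and `fake_fetch` are hypothetical; in the notebook, `fetch` would wrap `service.Topologies(...)`, and catching `grpc.RpcError` and inspecting `error.code()` would be narrower than the generic `Exception` used here so the sketch runs without gRPC.

```python
def fetch_all(instruments, fetch):
    """Call fetch(ticker, exchange) for each instrument and keep going on errors.

    Returns (responses, failures): responses maps (ticker, exchange) to the result,
    failures maps (ticker, exchange) to the exception that was raised.
    """
    responses, failures = {}, {}
    for ticker, exchange, _source in instruments:
        try:
            responses[(ticker, exchange)] = fetch(ticker, exchange)
        except Exception as error:  # in the notebook: except grpc.RpcError
            failures[(ticker, exchange)] = error
    return responses, failures


def fake_fetch(ticker, exchange):
    """Stand-in for the Topologies call; fails for one venue to show the behaviour."""
    if exchange == "BATS":
        raise RuntimeError("UNAUTHENTICATED")
    return f"topology for {ticker}-{exchange}"


ok, failed = fetch_all([['AAPL', 'XNGS', 564], ['MSFT', 'BATS', 729]], fake_fetch)
print(sorted(ok), sorted(failed))
```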

### Step 4: Retrieve data
The following code snippet computes the metrics for each instrument, prints them and exports them to a *csv file*:

```python
import csv
from datetime import datetime

# process all topologies responses
today = datetime.today()
filename = 'topologies_equities_{0:%Y%m%d}.csv'.format(today)

with open(filename, mode='w', newline='') as topologies_equities_file:
    topologies_equities_writer = csv.writer(topologies_equities_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # write header row
    topologies_equities_writer.writerow(['Ticker', 'Exchange', 'Source', 'Entries', 'Total_ticks', 'First_tick', 'Last_tick', 'Missing_days'])

    # reference date (tick data availability)
    reference_date = datetime(2020, 1, 1)

    # iterate all ticker/exchange pairs together with their responses
    for (ticker, exchange, source), response in zip(tickerexchange_array, equities_responses):
        entries = response.entries
        if not entries:
            # skip instruments without any topology entry (no data available)
            print("No entries for {0}-{1} ({2})".format(ticker, exchange, source))
            continue

        entries_count = len(entries)
        tick_counts = sum(entry.ticks_count for entry in entries)
        first_date = datetime(entries[0].begin.year, entries[0].begin.month, entries[0].begin.day)
        last_date = datetime(entries[-1].end.year, entries[-1].end.month, entries[-1].end.day)
        missing_days = (today - last_date).days
        print("Total entries for {0}-{1} ({2}) \t: {3} \t| total ticks count: {4} \t| from {5:%Y/%m/%d} to {6:%Y/%m/%d} \t| Missing days: {7}".format(ticker, exchange, source, entries_count, tick_counts, first_date, last_date, missing_days))
        topologies_equities_writer.writerow([ticker, exchange, source, entries_count, tick_counts, '{0:%Y/%m/%d}'.format(first_date), '{0:%Y/%m/%d}'.format(last_date), missing_days])
```
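Once the csv file exists, it can be used to flag instruments whose data looks stale. The sketch below reads a file with the same header as the export and returns the rows whose *Missing_days* exceeds a threshold. `stale_instruments` is a hypothetical helper, the sample rows are made-up values, and the default threshold of 3 days is an arbitrary choice that tolerates a weekend. In the notebook it could be called as `stale_instruments(open(filename, newline=''))`.

```python
import csv
import io


def stale_instruments(csv_file, max_missing_days=3):
    """Return (ticker, exchange, missing_days) for rows above the threshold."""
    stale = []
    for row in csv.DictReader(csv_file):
        missing = int(row["Missing_days"])
        if missing > max_missing_days:
            stale.append((row["Ticker"], row["Exchange"], missing))
    return stale


# tiny in-memory csv with the same header as the exported file (made-up values)
sample = io.StringIO(
    "Ticker,Exchange,Source,Entries,Total_ticks,First_tick,Last_tick,Missing_days\n"
    "AAPL,XNGS,564,100,123456,2021/01/04,2021/05/28,2\n"
    "KBC,XBRU,787,80,65432,2021/01/04,2021/05/14,16\n"
)
print(stale_instruments(sample))
```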

### Step 5: Visualize data

#### 5.1 Plot data normalization overview
The following code snippet plots the tick counts of every instrument in a single figure to give an overview:

```python
# one subplot per instrument, filled column by column
num_rows = 7
num_cols = 4
fig, axs = plt.subplots(num_rows, num_cols, figsize=(30, 10))
for i in range(length):
    ticker, exchange, source = tickerexchange_array[i]
    entries = equities_responses[i].entries
    counts = [entry.ticks_count for entry in entries]
    dates = [datetime(entry.begin.year, entry.begin.month, entry.begin.day) for entry in entries]
    row = i % num_rows
    col = i // num_rows
    axs[row, col].bar(dates, counts)
    axs[row, col].set_title('{0}-{1} ({2})'.format(ticker, exchange, source))

# set the spacing between subplots
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.8)

# add a title for the whole figure
plt.suptitle("Tick counts for all selected equities", size="20")

# display the figure
plt.show()
```
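The overview hard-codes a 7 x 4 grid for the 27 instruments. The sketch below derives the number of columns from the instrument count and lists the leftover cells, which could then be hidden with matplotlib's `axs[row, col].set_visible(False)`. `grid_shape` and `grid_position` are hypothetical helpers; `grid_position` reproduces the column-major mapping used in the overview loop.

```python
import math


def grid_shape(n_items, num_rows):
    """Number of columns needed to fit n_items in a grid with num_rows rows."""
    return math.ceil(n_items / num_rows)


def grid_position(i, num_rows):
    """Column-major (row, col) position of item i."""
    return i % num_rows, i // num_rows


n_instruments = 27  # rows in tickerexchange_array
num_rows = 7
num_cols = grid_shape(n_instruments, num_rows)
unused = [grid_position(i, num_rows) for i in range(n_instruments, num_rows * num_cols)]
print(num_cols, unused)
```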

#### 5.2 Plot data normalization details
The following code snippet plots the tick counts of each instrument in a separate figure:

```python
# one figure for each asset
for i in range(length):
    ticker, exchange, source = tickerexchange_array[i]
    entries = equities_responses[i].entries
    counts = [entry.ticks_count for entry in entries]
    dates = [datetime(entry.begin.year, entry.begin.month, entry.begin.day) for entry in entries]

    # plot
    fig, ax = plt.subplots(1, 1, figsize=(25, 10))
    ax.bar(dates, counts)
    ax.set_xlabel("Date", size="20")
    ax.set_ylabel("Tick count", size="20")
    ax.set_title("Tick count for {0}-{1} | source: {2}".format(ticker, exchange, source), size="20")
    plt.show()
```
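The per-instrument bar charts make drops in tick count visible by eye. To flag them programmatically, one option (an assumption, not part of the original sample) is to compare each day against the median of the series and report days below a fraction of it. `low_count_days` is a hypothetical helper, the ratio of 0.5 is arbitrary, and the series is made up; in the notebook it could be fed `list(zip(dates, counts))`.

```python
from datetime import date
from statistics import median


def low_count_days(daily_counts, ratio=0.5):
    """Return the days whose tick count is below ratio * median of the series."""
    if not daily_counts:
        return []
    threshold = ratio * median(count for _day, count in daily_counts)
    return [day for day, count in daily_counts if count < threshold]


# made-up series: the third day looks like a partial collection
series = [
    (date(2021, 5, 24), 1200),
    (date(2021, 5, 25), 1300),
    (date(2021, 5, 26), 150),
    (date(2021, 5, 27), 1250),
]
print(low_count_days(series))
```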