## Monitor data normalization (FUTURES) - Python

### Overview
The tick count indicator makes it possible to monitor data collection, normalization and storage. Combined with other monitoring metrics, it helps verify that stored data is complete and of good quality.

### Inputs/outputs
The data normalization monitoring sample takes a list of instrument identifiers (futures) as input and returns a set of metrics such as:
* Total tick count for each instrument
* Total entries used to compute tick count based on the chosen time granularity
* First tick date
* Last tick date
* Missing days: today - last tick date
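
As an illustration of how these metrics relate to each other, here is a minimal sketch that computes them from a list of per-day entries. The `Entry` tuple and the `summarize` function are hypothetical stand-ins written for this example; only the attribute names `begin`, `end` and `ticks_count` mirror the fields read later in this notebook.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for one topology entry (one period of data)."""
    begin: date
    end: date
    ticks_count: int


def summarize(entries, today):
    """Return the monitoring metrics for one instrument, or None if there is no data."""
    if not entries:
        return None
    last_tick = entries[-1].end
    return {
        "entries": len(entries),
        "total_ticks": sum(e.ticks_count for e in entries),
        "first_tick": entries[0].begin,
        "last_tick": last_tick,
        # missing days: today - last tick date
        "missing_days": (today - last_tick).days,
    }


# made-up sample data, for illustration only
sample = [
    Entry(date(2022, 3, 1), date(2022, 3, 1), 1200),
    Entry(date(2022, 3, 2), date(2022, 3, 2), 950),
]
metrics = summarize(sample, today=date(2022, 3, 5))
print(metrics)
```

The sketch assumes the entries are already sorted by date, so the first and last elements give the first and last tick dates.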

### Services used
This sample uses *gRPC requests* to retrieve tick counts and static data objects from the dedicated hosted services. The endpoints queried in this script are:
* *StaticDataService*: to retrieve static data objects (here, futures) from the server
* *TopologiesService*: to retrieve tick count entries (topologies) from the server

### Modules required
1. Systemathics packages:
    * *systemathics.apis.services.static_data.v1*
    * *systemathics.apis.services.topology.v1*
    * *systemathics.apis.type.shared.v1*
    * *google.type*
2. Open source packages
    * *googleapis-common-protos*
    * *protobuf*
    * *grpcio*
    * *pandas*
    * *matplotlib* as the display package

***

# Run futures data normalization sample

### Step 1: Install packages

In [None]:
pip install googleapis-common-protos protobuf grpcio pandas matplotlib

In [None]:
pip install systemathics.apis --pre

In [None]:
import os
import grpc
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# note: `date` below is the protobuf date module, not datetime.date
import google.type.date_pb2 as date
import systemathics.apis.type.shared.v1.level_pb2 as level
import systemathics.apis.type.shared.v1.identifier_pb2 as identifier
import systemathics.apis.services.topology.v1.topologies_pb2 as topologies
import systemathics.apis.services.topology.v1.topologies_pb2_grpc as topologies_service
import systemathics.apis.services.static_data.v1.static_data_pb2 as static_data
import systemathics.apis.services.static_data.v1.static_data_pb2_grpc as static_data_service
import systemathics.apis.helpers.token_helpers as token_helpers
import systemathics.apis.helpers.channel_helpers as channel_helpers

### Step 2: Retrieve authentication token
The following code snippet sends an authentication request and prints the token to the console output. The token is required to process the upcoming *gRPC queries*.

In [None]:
token = token_helpers.get_token()
display(token)

### Step 3: Create and process request
To query *TopologiesService*, we need to specify:
* Instrument identifier
* Time period selection: select start and end dates
* Topology request parameters

#### 3.1 Instrument selection

In [None]:
# set instrument inputs: contract code + exchange + source
contractexchange_array = [['CC', 'IFUS',675],
                        ['KC', 'IFUS',675],
                        ['BRN', 'IFEU',756],
                        ['WBS', 'IFEU',756],
                        ['I', 'IFLL',890],
                        ['W', 'IFLX',890]]
length = len(contractexchange_array)
colors = {
  675: "green",
  756 : "blue",
  890: "red",
}

#### 3.2 Retrieve front future contract

The following code snippets:
* Retrieve the matching futures (all maturities) for a given future input contract code
* For each input contract code, select the front future among the previously returned futures (all maturities)

If no unexpired contract is found, we fall back to the most recent (expired) future contract.
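
The selection rule can be sketched in isolation, without any service call. The function below is a hypothetical helper written for this example, not part of the Systemathics API: it takes plain `datetime.date` maturities, returns the first one strictly after today, and otherwise falls back to the latest maturity.

```python
from datetime import date


def select_front(maturities, today):
    """Pick the first maturity strictly after today, else the latest one.

    Returns (maturity, is_front), or (None, False) for an empty list.
    """
    ordered = sorted(maturities)
    if not ordered:
        return None, False
    for maturity in ordered:
        if maturity > today:
            return maturity, True
    # every contract has expired: fall back to the most recent one
    return ordered[-1], False


# made-up maturities, for illustration only
maturities = [date(2022, 6, 15), date(2022, 3, 15), date(2022, 9, 15)]
front, is_front = select_front(maturities, today=date(2022, 4, 1))
print(front, is_front)
```

Returning the `is_front` flag alongside the maturity makes the fallback case explicit, so the caller can log it as the notebook does.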

In [None]:
# define method to handle static data request creation for each instrument
def get_staticdata_request(contract):
    data_request = static_data.StaticDataRequest( asset_type = static_data.AssetType.ASSET_TYPE_FUTURE)
    data_request.future_contract.value = contract
    data_request.count.value = 1000
    return data_request

In [None]:
all_futures=[]

# credentials
credentials = grpc.ssl_channel_credentials()

# iterate all future contracts
for i in range(length):
    contract = contractexchange_array[i][0]
    try:
        # open a gRPC channel
        with channel_helpers.get_grpc_channel() as channel:  
            # instantiate the static data request
            service = static_data_service.StaticDataServiceStub(channel)
            request = get_staticdata_request(contract)
            
            # process the static data request
            response = service.StaticData(request = request, metadata = [('authorization', token)])
            sorted_futures = sorted(response.futures, key=lambda x: (x.maturity.year, x.maturity.month))
            futures_count = len(sorted_futures)
            print('Found {0} futures for contract {1}: '.format(futures_count, contract))
            all_futures.append(sorted_futures)
    except grpc.RpcError as e:
        display(e.code().name)
        display(e.details())
        # later cells index all_futures by position, so stop here rather than continue misaligned
        raise

Keep only the **front** for each future contract:

In [None]:
today = datetime.today()
tickerexchange_array = []

# iterate all future contracts
for i in range(length):
    source = contractexchange_array[i][2] #keep source for later
    contract = contractexchange_array[i][0]
    found = False
    
    # iterate all future maturities
    for future in all_futures[i]:
        current_future = future
        maturity = datetime(current_future.maturity.year, current_future.maturity.month,current_future.maturity.day)
        ticker = current_future.identifier.ticker
        exchange = current_future.identifier.exchange
        
        # check if we reached the front
        if (maturity > today):
            tickerexchange_array.append([ticker, exchange, source])
            found = True
            break
    
    # if we didn't find any front for the current contract, we then choose the last contract
    if not found:
        tickerexchange_array.append([ticker, exchange, source])
        print('Could not find front for contract {0} - Selected last future with ticker {1}, maturity {2:%Y/%m/%d}'.format(contract, ticker, maturity))

In [None]:
# displaying our selected tickers
print(tickerexchange_array)

#### 3.3 Topology parameters

In [None]:
# set topology time granularity (daily, weekly...)
granularity = topologies.TOPOLOGY_GRANULARITY_DAILY

# set level: Trades or Trades and Book
my_level = level.LEVEL_TRADES_AND_BOOK

#### 3.4 Request creation
The following code snippets build the request, create the *gRPC client* and process the request for each instrument:

In [None]:
# define method to handle topologies request creation for each instrument
def get_topologies_request(ticker, exchange, granularity, level):
    request = topologies.TopologiesRequest(identifier = identifier.Identifier(exchange = exchange, ticker = ticker),
                                           granularity = granularity,
                                           level = level)
    return request

In [None]:
# process all topologies requests
credentials = grpc.ssl_channel_credentials()
futures_responses =[]
today = datetime.today()
      
# iterate all instrument identifiers: exchange/ticker pairs
for i in range(length):
    with channel_helpers.get_grpc_channel() as channel:  

        # instantiate the topologies service
        ticker = tickerexchange_array[i][0]
        exchange = tickerexchange_array[i][1]
        request = get_topologies_request(ticker, exchange, granularity, my_level)
        service = topologies_service.TopologiesServiceStub(channel)

        # process the topologies request
        response = service.Topologies(
            request=request, 
            metadata = [('authorization', token)]
        )
        
        #store
        futures_responses.append(response)
        
# get tick count data
print("Total asset requests: ", len(futures_responses))
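
The loop above stores every reply as is, while the following steps read `response.entries[0]`, which raises an `IndexError` when a reply has no entries. A minimal sketch of a check that could run before the export is shown below. The `split_empty` helper and the sample objects are hypothetical; the only assumption about a reply is that it exposes an `entries` sequence, as used elsewhere in this notebook.

```python
from types import SimpleNamespace


def split_empty(identifiers, responses):
    """Separate instruments with data from those whose reply has no entries."""
    with_data, empty = [], []
    for ident, response in zip(identifiers, responses):
        if len(response.entries) > 0:
            with_data.append((ident, response))
        else:
            empty.append(ident)
    return with_data, empty


# made-up identifiers and replies, for illustration only
identifiers = [("AAA", "XXXX"), ("BBB", "XXXX")]
responses = [
    SimpleNamespace(entries=[SimpleNamespace(ticks_count=10)]),
    SimpleNamespace(entries=[]),
]
with_data, empty = split_empty(identifiers, responses)
print("Instruments without data:", empty)
```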

### Step 4: Retrieve data
The following code snippet computes the metrics and exports them to a *csv file*:

In [None]:
import csv

# process all topologies responses
today = datetime.today()
filename = 'alltime_futures_dashboard_{0:%Y%m%d}.csv'.format(today)

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(filename, mode='w', newline='') as topologies_futures_file:
    topologies_futures_writer = csv.writer(topologies_futures_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # write header row
    topologies_futures_writer.writerow(['Ticker', 'Exchange', 'Source', 'Entries' ,'Total_ticks', 'First_tick', 'Last_tick', 'Missing_days'])

    # iterate all exchange/ticker pairs
    for i in range(length):

        response = futures_responses[i]
        # retrieve the instrument identifier
        ticker = tickerexchange_array[i][0]
        exchange = tickerexchange_array[i][1]

        entries_count = len(response.entries)
        tick_counts = sum([entry.ticks_count for entry in response.entries])
        first_date = datetime(response.entries[0].begin.year, response.entries[0].begin.month, response.entries[0].begin.day)
        last_date = datetime(response.entries[-1].end.year, response.entries[-1].end.month, response.entries[-1].end.day)
        missing_days = (today- last_date).days
        source = tickerexchange_array[i][2]
        print("Total entries for {0}-{1} ({2}) \t: {3} \t| total ticks count: {4} \t | b: {5:%Y/%m/%d} - {6:%Y/%m/%d} \t| Missing days: {7}".format(ticker, exchange, source ,entries_count, tick_counts, first_date, last_date,missing_days))
        topologies_futures_writer.writerow([ticker,exchange, source, entries_count, tick_counts, '{0:%Y/%m/%d}'.format(first_date), '{0:%Y/%m/%d}'.format(last_date), missing_days])
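
Once exported, the file can be read back to flag instruments whose data has stopped arriving. The sketch below uses the column names written above; the `stale_instruments` helper and the threshold of 3 missing days are assumptions chosen for the example, and the sample rows are made up.

```python
import csv
import io


def stale_instruments(csv_file, max_missing_days=3):
    """Return (ticker, exchange, missing_days) for rows above the threshold."""
    reader = csv.DictReader(csv_file)
    return [
        (row["Ticker"], row["Exchange"], int(row["Missing_days"]))
        for row in reader
        if int(row["Missing_days"]) > max_missing_days
    ]


# in-memory sample with the same header as the exported file
sample = io.StringIO(
    "Ticker,Exchange,Source,Entries,Total_ticks,First_tick,Last_tick,Missing_days\n"
    "AAA,XXXX,1,250,100000,2021/01/04,2022/03/04,1\n"
    "BBB,XXXX,1,240,90000,2021/01/04,2022/02/18,15\n"
)
stale = stale_instruments(sample)
print(stale)
```

To run it on the real export, pass a file opened with `open(filename, newline='')` instead of the in-memory sample. Note that a small number of missing days can simply reflect a weekend or an exchange holiday.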

### Step 5: Visualize data

#### 5.1 Plot data normalization overview
The following code snippet plots the tick counts of every instrument in a single figure to give an overview:

In [None]:
num_rows = 3
num_cols = 2
fig,axs = plt.subplots(num_rows,num_cols, figsize=(30,10))
for i in range(length):
    ticker = tickerexchange_array[i][0]
    exchange = tickerexchange_array[i][1]
    source = tickerexchange_array[i][2]
    counts = [entry.ticks_count for entry in futures_responses[i].entries]
    dates = [datetime(year=entry.begin.year,day=entry.begin.day, month=entry.begin.month) for entry in futures_responses[i].entries]
    col = i//num_rows
    row = i%num_rows
    axs[row, col].bar(dates, counts, color=colors[source])
    axs[row, col].set_title('{0}-{1} ({2})'.format(ticker, exchange, source))
    
# set the spacing between subplots
plt.subplots_adjust(left=0.2, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=1.2)

# add subtitle
plt.suptitle("Tick counts for all selected futures", size="20")

# display the figure
plt.show()
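
The subplot position is derived from the instrument index with `i % num_rows` and `i // num_rows`, which fills the grid column by column. The sketch below shows that mapping on its own; `grid_position` is a hypothetical helper written for this example.

```python
def grid_position(index, num_rows):
    """Column-major placement: fill a column top to bottom, then move to the next."""
    row = index % num_rows
    col = index // num_rows
    return row, col


# positions of 6 instruments in a grid with 3 rows (and therefore 2 columns)
positions = [grid_position(i, num_rows=3) for i in range(6)]
print(positions)
```

With more instruments than `num_rows * num_cols`, the computed column falls outside the grid, so the grid dimensions have to be adjusted to the number of instruments.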

#### 5.2 Plot data normalization details
The following code snippet plots the tick counts of each instrument in a separate figure:

In [None]:
# One figure for each asset
for i in range(length):
    ticker = tickerexchange_array[i][0]
    exchange = tickerexchange_array[i][1]
    source = tickerexchange_array[i][2]
    counts = [entry.ticks_count for entry in futures_responses[i].entries]
    dates = [datetime(year=entry.begin.year,day=entry.begin.day, month=entry.begin.month) for entry in futures_responses[i].entries]
    
    # plot
    fig,ax = plt.subplots(1,1,figsize=(25,10))
    ax.bar(dates,counts, color=colors[source])
    plt.xlabel("Date",size="20")
    plt.ylabel("Tick count",size="20")
    plt.title("Tick count for {0}-{1} | source: {2}".format(ticker,exchange, source),size="20")
    plt.show()