# CAC 40 correlation analysis - Python

### Overview
This notebook is the basis of the following blog post: https://ganymde.cloud/cac40-correlation.html

### Services used
This sample uses *gRPC requests* to retrieve index component reference data and tick bars from the hosted service. The endpoints queried in this script are:
* *StaticDataService*: to directly retrieve reference data objects from the server
* *TickBarService*: to retrieve bars from the server

### Modules required
1. Systemathics packages:
    * *systemathics.apis.services.static_data.v1*
    * *systemathics.apis.services.tick_analytics.v1*
2. Open source packages:
    * *googleapis-common-protos*
    * *protobuf*
    * *grpcio*
    * *pandas*
    * *matplotlib*
    
***

# Run CAC 40 correlation analysis

### Step 1: Install packages and import them

In [None]:
pip install googleapis-common-protos protobuf grpcio pandas systemathics.apis matplotlib

In [None]:
import os
import grpc
import pandas as pd
import google.protobuf as pb
from datetime import datetime
from datetime import timedelta
import google.type.date_pb2 as date
import google.type.timeofday_pb2 as timeofday
import google.type.dayofweek_pb2 as dayofweek
import google.protobuf.duration_pb2 as duration
import systemathics.apis.type.shared.v1.identifier_pb2 as identifier
import systemathics.apis.type.shared.v1.constraints_pb2 as constraints
import systemathics.apis.type.shared.v1.date_interval_pb2 as dateinterval
import systemathics.apis.type.shared.v1.time_interval_pb2 as timeinterval
import systemathics.apis.services.tick_analytics.v1.tick_bars_pb2 as tick_bars
import systemathics.apis.services.tick_analytics.v1.tick_bars_pb2_grpc as tick_bars_service
import systemathics.apis.services.static_data.v1.static_data_pb2 as static_data
import systemathics.apis.services.static_data.v1.static_data_pb2_grpc as static_data_service
import systemathics.apis.helpers.token_helpers as token_helpers
import systemathics.apis.helpers.channel_helpers as channel_helpers

### Step 2: Prepare API requests
The following code snippet retrieves the authentication token to be used in upcoming API requests:

In [None]:
token = token_helpers.get_token()
display(token)

### Step 3: Create and process request

The following code snippet selects the **index** by its *name/code*:

In [None]:
# set index
index = 'CAC 40'

The following code snippets generate the request, call the service and return the reply:

In [None]:
# generate static data request
request = static_data.StaticDataRequest( 
    asset_type = static_data.AssetType.ASSET_TYPE_EQUITY,
)

request.index.value = index # add index as per filter value
request.count.value = 1000 # by default the count is set to 100

In [None]:
try:
    # open a gRPC channel
    with channel_helpers.get_grpc_channel() as channel:  
        # instantiate the static data service
        service = static_data_service.StaticDataServiceStub(channel)
        
        # process the request
        response = service.StaticData(
            request = request, 
            metadata = [('authorization', token)]
        )
except grpc.RpcError as e:
    display(e.code().name)
    display(e.details())

In [None]:
display(len(response.equities))

### Step 4: Retrieve index components

In [None]:
# define a method to handle the equities response using a Pandas dataframe
def get_equities_dataframe(response):
    exchange = [equity.identifier.exchange for equity in response.equities]
    ticker = [equity.identifier.ticker for equity in response.equities]
    name = [equity.name for equity in response.equities]
    primary = [equity.primary for equity in response.equities]
    index = [equity.index for equity in response.equities]
    
    # Create pandas dataframe
    d = {'Index': index, 'Name': name, 'Ticker': ticker, 'Exchange': exchange, 'Primary':primary}
    df = pd.DataFrame(data=d)
    return df

In [None]:
# visualize request results
data = get_equities_dataframe(response)
display(data)
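
To see what `get_equities_dataframe` produces without calling the service, the sketch below runs the same flattening on stand-in objects built with `types.SimpleNamespace`. The helper name `equities_to_dataframe` and all field values are made up for illustration; only the attribute names (`identifier.exchange`, `identifier.ticker`, `name`, `primary`, `index`) come from the function above.

```python
from types import SimpleNamespace

import pandas as pd


def equities_to_dataframe(equities):
    """Flatten equity-like objects into one row per equity (illustrative helper)."""
    rows = [
        {
            'Index': e.index,
            'Name': e.name,
            'Ticker': e.identifier.ticker,
            'Exchange': e.identifier.exchange,
            'Primary': e.primary,
        }
        for e in equities
    ]
    return pd.DataFrame(rows, columns=['Index', 'Name', 'Ticker', 'Exchange', 'Primary'])


# hypothetical stand-ins for the equity messages of the reply (values are illustrative only)
fake_equities = [
    SimpleNamespace(identifier=SimpleNamespace(exchange='XPAR', ticker='AAA'),
                    name='Company A', primary='XPAR', index='CAC 40'),
    SimpleNamespace(identifier=SimpleNamespace(exchange='XAMS', ticker='BBB'),
                    name='Company B', primary='XAMS', index='CAC 40'),
]

sample = equities_to_dataframe(fake_equities)
print(sample)
```

Building the rows as dictionaries keeps each equity's fields together, which may be easier to extend than one list per column.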

The following code snippets keep the components listed on Euronext Paris (XPAR) or Euronext Amsterdam (XAMS) and export them to a *CSV file*:

In [None]:
data = data[ ((data['Exchange'] == "XPAR") | (data['Exchange'] == "XAMS")) & (data['Index'].str.contains('CAC 40'))]

In [None]:
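# create the export folder if it does not exist yet, then write the components
os.makedirs('Export', exist_ok=True)
data.to_csv('Export/CAC_Components.csv', index=False)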

In [None]:
data
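
The exchange and index filter used above can also be written with `Series.isin`, which scales more readily when further venues are added. The sketch below is a minimal illustration on made-up rows, not the service reply; the frame name `components` and its contents are assumptions.

```python
import pandas as pd

# made-up rows, for illustration only
components = pd.DataFrame({
    'Index': ['CAC 40', 'CAC 40', 'CAC 40|SBF 120', 'SBF 120'],
    'Name': ['Company A', 'Company A', 'Company B', 'Company C'],
    'Ticker': ['AAA', 'AAA', 'BBB', 'CCC'],
    'Exchange': ['XPAR', 'XLON', 'XAMS', 'XPAR'],
})

exchanges = ['XPAR', 'XAMS']

# keep rows listed on one of the wanted exchanges AND whose index field mentions CAC 40
mask = (
    components['Exchange'].isin(exchanges)
    & components['Index'].str.contains('CAC 40', regex=False)
)
filtered = components[mask].reset_index(drop=True)
print(filtered)
```

Passing `regex=False` treats the index name as a literal string, which avoids surprises if a name ever contains regular-expression metacharacters.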

### Step 5: Retrieve tick bars data

In [None]:
# set the bar duration
sampling = 5 * 60

# set the bar calculation field
field = tick_bars.BAR_PRICE_TRADE 

In [None]:
# create time intervals (we are using Google date format)
today = datetime.today()
start = today - timedelta(days=50)

date_interval = dateinterval.DateInterval(
    start_date = date.Date(year = start.year, month = start.month, day = start.day), 
    end_date = date.Date(year = today.year, month = today.month, day = today.day)
)

# build the market data request time interval (we are using Google time format)
# UTC time zone
time_interval = timeinterval.TimeInterval(
    start_time = timeofday.TimeOfDay(hours = 6, minutes = 0, seconds = 0), 
    end_time = timeofday.TimeOfDay(hours = 18, minutes = 0, seconds = 0)
)
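
The 50-day lookback above counts calendar days, so the window contains fewer trading sessions than its length suggests. The sketch below uses only the standard library and a fixed, hypothetical end date to show the year/month/day fields such a window yields and how many weekdays it covers; the helper name `lookback_window` is an assumption, and exchange holidays are not taken into account.

```python
import datetime as dt


def lookback_window(end, days):
    """Return (start, end) where start lies `days` calendar days before end."""
    return end - dt.timedelta(days=days), end


# fixed end date so the example is reproducible (hypothetical, not taken from the notebook)
start_day, end_day = lookback_window(dt.date(2024, 3, 15), 50)

# the three integer fields that a year/month/day date message would receive
print(start_day.year, start_day.month, start_day.day)
print(end_day.year, end_day.month, end_day.day)

# count Monday-to-Friday days in the window, both ends included
span = (end_day - start_day).days + 1
weekdays = sum(
    1 for i in range(span)
    if (start_day + dt.timedelta(days=i)).weekday() < 5
)
print(weekdays)
```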

In [None]:
# generate constraints based on the previous time selection
constraint = constraints.Constraints(
    date_intervals = [date_interval],
    time_intervals = [time_interval],
)

In [None]:
# generate tick bars request
def get_request(exchange, ticker):
    return tick_bars.TickBarsRequest(
                identifier = identifier.Identifier(exchange = exchange, ticker = ticker),
                constraints = constraint,
                sampling = duration.Duration(seconds = sampling),
                field = field)

In [None]:
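# build one (name, request) pair per component; `_` avoids overwriting the `index` variable set in Step 3
requests = [(row['Name'], get_request(row['Exchange'], row['Ticker'])) for _, row in data.iterrows()]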

In [None]:
try:
    with channel_helpers.get_grpc_channel() as channel:  
        
        # instantiate the tick bars service
        service = tick_bars_service.TickBarsServiceStub(channel)
        
        # process the tick bars request
        dataframe = pd.DataFrame({'Date': []})
        dataframe = dataframe.set_index('Date')
        metadata = [('authorization', token)]
        for name, request in requests:
            display(name)
            # collect the streamed bars for this instrument
            bars = list(service.TickBars(request=request, metadata=metadata))
            # note: fromtimestamp() converts to the local time zone of the machine running the notebook
            dates = [datetime.fromtimestamp(b.time_stamp.seconds) for b in bars]
            closes = [b.close for b in bars]
            df = pd.DataFrame(data={'Date': dates, name: closes})
            df = df.set_index('Date')
            if dataframe.size == 0:
                dataframe = df
            else:
                # inner join: keep only the timestamps present for every instrument so far
                dataframe = pd.merge(dataframe, df, on="Date")
except grpc.RpcError as e:
    display(e.code().name)
    display(e.details())
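
The loop above combines the per-instrument frames with `pd.merge(..., on="Date")`, which defaults to an inner join: a bar timestamp survives only if every instrument has a bar at that time. The sketch below illustrates the effect on two tiny made-up series and contrasts it with an outer join; the helper `to_frame` and all values are assumptions.

```python
import pandas as pd


def to_frame(name, stamps, closes):
    """Build a single-column frame of closes indexed by 'Date' (illustrative helper)."""
    df = pd.DataFrame({'Date': pd.to_datetime(stamps), name: closes})
    return df.set_index('Date')


# made-up 5-minute bars; the two series overlap on only two timestamps
a = to_frame('Company A',
             ['2024-01-02 09:00', '2024-01-02 09:05', '2024-01-02 09:10'],
             [10.0, 10.1, 10.2])
b = to_frame('Company B',
             ['2024-01-02 09:05', '2024-01-02 09:10', '2024-01-02 09:15'],
             [20.0, 19.9, 20.3])

# inner join (the default): only timestamps shared by both series remain
inner = pd.merge(a, b, on='Date')

# outer join: every timestamp is kept, with NaN where a series has no bar
outer = a.join(b, how='outer')

print(inner)
print(outer)
```

The inner join yields a frame without gaps, at the cost of dropping bars whenever a single illiquid instrument did not trade in an interval.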

In [None]:
dataframe

### Step 6: Compute correlation

In [None]:
corr = dataframe.corr()

In [None]:
corr
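
The cell above correlates price levels. Series that drift in the same direction can show a high level correlation even when their bar-to-bar moves are unrelated, so a common alternative is to correlate returns instead. The sketch below is not the notebook's method; it only contrasts the two on made-up prices, and the names `prices`, `level_corr` and `return_corr` are assumptions.

```python
import pandas as pd

# made-up closes: both series trend upwards, but their individual moves differ
prices = pd.DataFrame({
    'Company A': [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
    'Company B': [50.0, 50.6, 50.9, 51.7, 51.9, 52.8],
})

# Pearson correlation of the price levels (what DataFrame.corr() computes by default)
level_corr = prices.corr()

# Pearson correlation of simple bar-to-bar returns; the first row is NaN and is dropped
returns = prices.pct_change().dropna()
return_corr = returns.corr()

print(level_corr)
print(return_corr)
```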

In [None]:
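# create the export folder if it does not exist yet, then write the matrix
os.makedirs('Export', exist_ok=True)
corr.to_csv("Export/CAC_Correlation_Matrix.csv")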

### Step 7: Visualize data

In [None]:
import matplotlib.pyplot as plt
plt.pcolor(corr)
plt.show()
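
The `plt.pcolor(corr)` call above draws the matrix without axis labels or a color scale, which makes individual stocks hard to locate. The sketch below shows one possible way to add them; the function name `plot_corr`, the color map and the toy matrix are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_corr(matrix):
    """Draw a labelled heatmap of a square correlation matrix (illustrative helper)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # fix the color scale to the full correlation range so plots stay comparable
    mesh = ax.pcolormesh(matrix.values, vmin=-1, vmax=1, cmap='coolwarm')
    # put each tick at the center of its cell
    ticks = [i + 0.5 for i in range(len(matrix.columns))]
    ax.set_xticks(ticks)
    ax.set_xticklabels(matrix.columns, rotation=90)
    ax.set_yticks(ticks)
    ax.set_yticklabels(matrix.index)
    fig.colorbar(mesh, ax=ax, label='Pearson correlation')
    fig.tight_layout()
    return fig, ax


# made-up 3x3 correlation matrix, for illustration only
toy_corr = pd.DataFrame(
    [[1.0, 0.8, -0.2],
     [0.8, 1.0, 0.1],
     [-0.2, 0.1, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'],
)
fig, ax = plot_corr(toy_corr)
```

In a notebook the figure renders when the cell finishes; in a plain script, call `plt.show()` afterwards.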

In [None]:
dataframe.describe()

The following code snippet finds, for each stock, its most correlated counterpart and keeps the pairs whose correlation exceeds 0.90:

In [None]:
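# replace the self-correlations (exactly 1) with 0 so that idxmax() does not return the stock itself
corr = corr.replace(1, 0)
# for each stock, take the name and value of its highest remaining correlation
final = pd.DataFrame({"Stock": corr.index, "Closest correlated stock": corr.idxmax(), "Correlation value": corr.max()})
# keep only the strongly correlated pairs, strongest first
final = final[final["Correlation value"] > 0.90]
final.sort_values(by="Correlation value", ascending=False)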
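
Replacing the value 1 by 0 relies on the diagonal being exactly 1 in floating point, and it reports a 0 correlation for a stock whose correlations are all negative. A possible variant, sketched below on a made-up matrix, masks the diagonal by position with `numpy.fill_diagonal` and uses NaN, which `idxmax` and `max` skip by default. The function name `closest_pairs` and the toy values are assumptions.

```python
import numpy as np
import pandas as pd


def closest_pairs(matrix, threshold):
    """For each stock, return its most correlated other stock above `threshold`."""
    values = matrix.to_numpy(copy=True)
    # blank out self-correlations by position rather than by value
    np.fill_diagonal(values, np.nan)
    off_diag = pd.DataFrame(values, index=matrix.index, columns=matrix.columns)
    result = pd.DataFrame({
        'Stock': off_diag.columns,
        'Closest correlated stock': off_diag.idxmax().values,
        'Correlation value': off_diag.max().values,
    })
    result = result[result['Correlation value'] > threshold]
    return result.sort_values(by='Correlation value', ascending=False).reset_index(drop=True)


# made-up correlation matrix, for illustration only
toy = pd.DataFrame(
    [[1.0, 0.95, 0.20],
     [0.95, 1.0, 0.40],
     [0.20, 0.40, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'],
)
pairs = closest_pairs(toy, 0.90)
print(pairs)
```

Because the function works on a copy, the original correlation matrix keeps its diagonal and can still be exported or plotted afterwards.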