## Create latest reference data with trade history for demo testing




This notebook is an iteration of:

https://github.com/Ficc-ai/ficc/blob/dev/SQL_examples/Create_trade_history_with_reference_data.ipynb 

which aims to provide correct data for testing our demo server with a minimum of data exclusions. We created an alternate pipeline, rather than changing the current one, so as not to disturb the demo for our current users. Once we verify that the demo functions without the current exclusions, we expect that the exclusions in the main pipeline can be removed and that this notebook may become obsolete.

The CUSIPs handled in this notebook include the following: 

1. We handle regular interest payment frequencies (monthly, quarterly, annual, and semi-annual). 
2. We handle fixed coupons, deferred interest, OID, and zero coupons.  

The percentage of all CUSIPs that meet these criteria is calculated empirically at the end of the notebook.
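In the counting queries at the end of this notebook, the two criteria above appear as lists of numeric codes: `interest_payment_frequency` in 1, 2, 3, 5, 16 and `coupon_type` in 4, 8, 10, 17. The notebook does not state which code corresponds to which named frequency or coupon type, so the sketch below leaves the codes uninterpreted. The helper names `in_clause` and `handled_cusips_sql` are hypothetical, and combining both criteria in a single query is an assumption; the notebook counts them separately.

```python
# Codes copied from the counting queries at the end of this notebook.
# Their mapping to named frequencies / coupon types is not given here.
HANDLED_INTEREST_PAYMENT_FREQUENCIES = (1, 2, 3, 5, 16)
HANDLED_COUPON_TYPES = (4, 8, 10, 17)


def in_clause(column, codes):
    """Return a SQL predicate such as 'col IN (1, 2, 3)'."""
    return f"{column} IN ({', '.join(str(code) for code in codes)})"


def handled_cusips_sql(table="`reference_data_v1.reference_data_flat`"):
    """Build a query for current CUSIPs that satisfy both criteria.

    Hypothetical helper: applying both filters together is an assumption.
    """
    return (
        f"SELECT DISTINCT cusip FROM {table} "
        "WHERE ref_valid_to_date > CURRENT_TIMESTAMP() "
        f"AND {in_clause('interest_payment_frequency', HANDLED_INTEREST_PAYMENT_FREQUENCIES)} "
        f"AND {in_clause('coupon_type', HANDLED_COUPON_TYPES)}"
    )


print(handled_cusips_sql())
```

Keeping the code lists in one place means the filter and the coverage statistics cannot drift apart if a code is added later.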

In [None]:
import os
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "../creds.json"
bq_client = bigquery.Client()

project = "eng-reactor-287421"

In [None]:
from datetime import date


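def mkview(dataset, name, sql):
    """Create (or replace) a BigQuery view and return its fully qualified name."""
    view_id = f"{project}.{dataset}.{name}"
    # Drop any existing view of the same name so the definition is always fresh.
    bq_client.delete_table(view_id, not_found_ok=True)
    view = bigquery.Table(view_id)
    view.view_query = sql
    bq_client.create_table(view)
    return view_id


def sqltodf(sql, limit=""):
    """Run a query and return the result as a DataFrame.

    If limit is given, return that many rows sampled at random.
    """
    if limit != "":
        limit = f" ORDER BY RAND() LIMIT {limit}"
    bqr = bq_client.query(sql + limit).result()
    return bqr.to_dataframe()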

The following view builds the trade history for each CUSIP, keeping the 32 most recent trades. The view is joined to a table containing calculation dates for each trade. The value par_traded is assumed to be $5MM when the field par_traded is null and the is_trade_with_a_par_amount_over_5MM flag is true. The exclusions are as follows:

1) Trades with a par_traded under $10k, which we have found not to be useful for prediction.
2) Trades with no dollar_price. The query filters on dollar_price only, so a trade with a price but no yield is kept.
3) Trades with a transaction_type of "C".

Note that these are only restrictions for trade data; we would still handle these CUSIPs if they are present in the reference data.  
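As a minimal sketch, the per-trade rules applied by the view's CASE expression and WHERE clause can be written in plain Python. The function and constant names are hypothetical and the sample values are made up. The logic mirrors the SQL in the next cell, where the par filter is applied to the raw par_traded value, so a null par_traded passes.

```python
MIN_PAR_TRADED = 10_000         # trades below this par amount are excluded
ASSUMED_LARGE_PAR = 5_000_000   # value imputed when only the >5MM flag is known


def effective_par_traded(par_traded, over_5mm_flag):
    """Mirror the CASE expression: impute $5MM when par is null and the flag is true."""
    if par_traded is None and over_5mm_flag is True:
        return ASSUMED_LARGE_PAR
    return par_traded


def keep_trade(par_traded, dollar_price, transaction_type=None):
    """Mirror the trade-level WHERE conditions of the view."""
    if dollar_price is None:
        return False
    if par_traded is not None and par_traded < MIN_PAR_TRADED:
        return False
    if transaction_type == "C":
        return False
    return True


# Illustrative values only.
print(effective_par_traded(None, True))   # imputed par amount
print(keep_trade(5_000, 101.25))          # below the $10k floor
print(keep_trade(None, 101.25))           # null par passes the filter
```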

In [None]:
trade_history_groupby = mkview("auxiliary_views","trade_history_groupby",
f"""
    SELECT
      a.cusip,
      ARRAY_AGG( STRUCT(
      a.msrb_valid_from_date, 
      a.msrb_valid_to_date, 
      a.rtrs_control_number, 
      a.trade_datetime, 
      a.publish_datetime, 
      a.yield, 
      a.dollar_price,   
      CASE
          WHEN a.par_traded IS NULL AND is_trade_with_a_par_amount_over_5MM IS TRUE THEN 5000000
        ELSE
        a.par_traded
      END
        AS par_traded,
       trade_type, 
       is_non_transaction_based_compensation, 
       is_lop_or_takedown, 
       brokers_broker, 
       is_alternative_trading_system, 
       is_weighted_average_price, 
        CASE
          WHEN a.settlement_date IS NULL AND a.assumed_settlement_date IS NOT NULL  THEN a.assumed_settlement_date
        ELSE
        a.settlement_date
      END
        AS settlement_date,
       b.calc_date, 
       b.calc_date_selection AS calc_day_cat, 
       a.maturity_date, 
       next_call_date, 
       par_call_date, 
       refund_date,
       a.sequence_number,
       a.transaction_type)
    ORDER BY
      a.trade_datetime DESC
    LIMIT
      32) AS recent
    FROM
      `auxiliary_views.msrb_final` a
    LEFT JOIN
      (select distinct * from eng-reactor-287421.auxiliary_views.calculation_date_and_price) b
  ON
  a.rtrs_control_number = b.rtrs_control_number
  AND a.trade_datetime = b.trade_datetime
  AND a.publish_datetime = b.publish_datetime
  --AND a.msrb_valid_to_date = b.msrb_valid_to_date
  WHERE a.msrb_valid_to_date > CURRENT_DATETIME('America/New_York')
  and b.msrb_valid_to_date > CURRENT_DATETIME('America/New_York')
  AND a.dollar_price IS NOT NULL
  AND (a.par_traded IS NULL
    OR a.par_traded >= 10000)
  AND (a.transaction_type <> "C" or a.transaction_type is null)
    GROUP BY a.cusip 
""")

print(trade_history_groupby)
%time df = sqltodf(f"SELECT * FROM {trade_history_groupby}", 3)
df

The following view joins the reference data v1 to the trade history.
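Because the view uses a LEFT JOIN from the reference data to the trade history, a CUSIP with no qualifying trades still appears in the result, with its trade-history columns left null. The sketch below illustrates this with plain dictionaries; the CUSIP values and the helper name `left_join_on_cusip` are hypothetical.

```python
def left_join_on_cusip(reference_rows, trade_history_by_cusip):
    """Attach the 'recent' trade list to each reference row, or None if absent."""
    joined = []
    for ref in reference_rows:
        if ref["cusip"] is None:
            # Mirrors the filter: ref_data_v1.cusip IS NOT NULL
            continue
        recent = trade_history_by_cusip.get(ref["cusip"])
        joined.append({**ref, "recent": recent})
    return joined


# Made-up sample data for illustration.
reference_rows = [
    {"cusip": "AAA111", "coupon": 5.0},
    {"cusip": "BBB222", "coupon": 0.0},
    {"cusip": None, "coupon": 3.0},
]
trade_history_by_cusip = {"AAA111": [{"dollar_price": 101.25}]}

for row in left_join_on_cusip(reference_rows, trade_history_by_cusip):
    print(row["cusip"], row["recent"])
```

Here the second CUSIP is retained even though it has no trades, which is what keeps reference-only CUSIPs in the view.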

In [None]:
trade_history_latest_ref_data_minimal_exclusions= mkview("auxiliary_views","trade_history_latest_ref_data_minimal_exclusions",
f"""
   SELECT
  ref_data_v1.current_coupon_rate AS coupon,
  ref_data_v1.issue_key as series_id,
  CONCAT(IFNULL(organization_primary_name, ''), ' ', IFNULL(instrument_primary_name, ''), ' ', IFNULL(conduit_obligor_name, '')) AS security_description,
  ref_data_v1.cusip,
  ref_valid_from_date,
  ref_valid_to_date,
  incorporated_state_code,
  organization_primary_name,
  instrument_primary_name,
  issue_key,
  issue_text,
  conduit_obligor_name,
  is_called,
  is_callable,
  is_escrowed_or_pre_refunded,
  first_call_date,
  call_date_notice,
  callable_at_cav,
  par_price,
  call_defeased,
  call_timing,
  call_timing_in_part,
  extraordinary_make_whole_call,
  extraordinary_redemption,
  make_whole_call,
  next_call_date,
  next_call_price,
  call_redemption_id,
  first_optional_redemption_code,
  second_optional_redemption_code,
  third_optional_redemption_code,
  first_mandatory_redemption_code,
  second_mandatory_redemption_code,
  third_mandatory_redemption_code,
  par_call_date,
  par_call_price,
  maximum_call_notice_period,
  called_redemption_type,
  muni_issue_type,
  refund_date,
  refund_price,
  redemption_cav_flag,
  max_notification_days,
  min_notification_days,
  next_put_date,
  put_end_date,
  put_feature_price,
  put_frequency,
  put_start_date,
  put_type,
  maturity_date,
  sp_long,
  sp_stand_alone,
  sp_icr_school,
  sp_prelim_long,
  sp_outlook_long,
  sp_watch_long,
  sp_Short_Rating,
  sp_Credit_Watch_Short_Rating,
  sp_Recovery_Long_Rating,
  moodys_long,
  moodys_short,
  moodys_Issue_Long_Rating,
  moodys_Issue_Short_Rating,
  moodys_Credit_Watch_Long_Rating,
  moodys_Credit_Watch_Short_Rating,
  moodys_Enhanced_Long_Rating,
  moodys_Enhanced_Short_Rating,
  moodys_Credit_Watch_Long_Outlook_Rating,
  has_sink_schedule,
  next_sink_date,
  sink_indicator,
  sink_amount_type_text,
  sink_amount_type_type,
  sink_frequency,
  sink_defeased,
  additional_next_sink_date,
  sink_amount_type,
  additional_sink_frequency,
  min_amount_outstanding,
  max_amount_outstanding,
  default_exists,
  has_unexpired_lines_of_credit,
  years_to_loc_expiration,
  escrow_exists,
  escrow_obligation_percent,
  escrow_obligation_agent,
  escrow_obligation_type,
  child_linkage_exists,
  put_exists,
  floating_rate_exists,
  bond_insurance_exists,
  is_general_obligation,
  has_zero_coupons,
  delivery_date,
  issue_price,
  primary_market_settlement_date,
  issue_date,
  outstanding_indicator,
  federal_tax_status,
  maturity_amount,
  available_denom,
  denom_increment_amount,
  min_denom_amount,
  accrual_date,
  bond_insurance,
  coupon_type,
  current_coupon_rate,
  daycount_basis_type,
  debt_type,
  default_indicator,
  first_coupon_date,
  interest_payment_frequency,
  issue_amount,
  last_period_accrues_from_date,
  next_coupon_payment_date,
  odd_first_coupon_date,
  orig_principal_amount,
  original_yield,
  outstanding_amount,
  previous_coupon_payment_date,
  sale_type,
  settlement_type,
  additional_project_txt,
  asset_claim_code,
  additional_state_code,
  backed_underlying_security_id,
  bank_qualified,
  capital_type,
  conditional_call_date,
  conditional_call_price,
  designated_termination_date,
  DTCC_status,
  first_execution_date,
  formal_award_date,
  maturity_description_code,
  muni_security_type,
  mtg_insurance,
  orig_cusip_status,
  orig_instrument_enhancement_type,
  other_enhancement_type,
  other_enhancement_company,
  pac_bond_indicator,
  project_name,
  purpose_class,
  purpose_sub_class,
  refunding_issue_key,
  refunding_dated_date,
  sale_date,
  sec_regulation,
  secured,
  series_name,
  sink_fund_redemption_method,
  state_tax_status,
  tax_credit_frequency,
  tax_credit_percent,
  use_of_proceeds,
  use_of_proceeds_supplementary,
  rating_downgrade,
  rating_upgrade,
  rating_downgrade_to_junk,
  min_sp_rating_this_year,
  max_sp_rating_this_year,
  min_moodys_rating_this_year, 
  max_moodys_rating_this_year,
      latest.* EXCEPT(cusip)
    FROM
     `reference_data_v1.reference_data_flat` ref_data_v1
    LEFT JOIN 
   eng-reactor-287421.auxiliary_views.trade_history_groupby latest
    ON
      latest.cusip = ref_data_v1.cusip   
    WHERE
         ref_data_v1.cusip IS NOT NULL
      AND ref_data_v1.ref_valid_to_date > current_timestamp
""")

print(trade_history_latest_ref_data_minimal_exclusions)
%time df = sqltodf(f"SELECT * FROM {trade_history_latest_ref_data_minimal_exclusions}", 3)
df

In [None]:
query_prod_pipeline = ("SELECT distinct (cusip) FROM `eng-reactor-287421.auxiliary_views.trade_history_latest_ref_data_minimal_exclusions`")

query_reference_data_v1_flat = ("SELECT distinct (cusip) FROM `reference_data_v1.reference_data_flat` where ref_valid_to_date > current_timestamp")

query_int_pay = ("SELECT distinct (cusip) FROM `reference_data_v1.reference_data_flat` ref_data_v1 where ref_data_v1.ref_valid_to_date > current_timestamp AND ref_data_v1.interest_payment_frequency IN (1, 2, 3, 5, 16)")

query_coupon_type = ("SELECT DISTINCT (cusip) FROM `reference_data_v1.reference_data_flat` ref_data_v1 WHERE ref_data_v1.ref_valid_to_date > current_timestamp AND coupon_type IN (4, 8, 10, 17)")

def count_total_rows(query):
    """Run a query and return the number of rows in its result."""
    query_job = bq_client.query(query, location="US")
    results = query_job.result()
    return int(results.total_rows)

int_pay_total = count_total_rows(query_int_pay)
coup_total = count_total_rows(query_coupon_type)
prod_pipeline_total = count_total_rows(query_prod_pipeline)
ref_data_v1_flat_total = count_total_rows(query_reference_data_v1_flat)
# Share of current reference CUSIPs that fall outside each handled list.
coupon_exclusions = round((ref_data_v1_flat_total - coup_total) / ref_data_v1_flat_total * 100, 2)
int_pay_exclusions = round((ref_data_v1_flat_total - int_pay_total) / ref_data_v1_flat_total * 100, 2)
# Multiply before rounding to avoid floating-point residue in the printed value.
percent = round(prod_pipeline_total / ref_data_v1_flat_total * 100, 2)

In [None]:
print(f"Interest payment exclusions are {int_pay_exclusions}% of the total. Coupon exclusions amount to {coupon_exclusions}% of the total.")

In [None]:
print(f"The production pipeline contains {prod_pipeline_total} CUSIPs. Reference data has {ref_data_v1_flat_total} CUSIPs. Our coverage is {percent}% of Municipal CUSIPs.")

Interest payment exclusions are 0.98% of the total. Coupon exclusions amount to 5.13% of the total.

The production pipeline contains 1434343 CUSIPs. Reference data has 1044541 CUSIPs. Our coverage is 137.32% of Municipal CUSIPs.
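The percentages printed above follow from simple ratios of distinct-CUSIP counts. The sketch below reproduces the coverage figure from the two printed counts and adds a basic sanity check. The function names are hypothetical. The check rests on an assumption: if the numerator counts a subset of the CUSIPs in the denominator, coverage cannot exceed 100%, so a higher value suggests the two counts were taken from different populations or snapshots.

```python
def exclusion_percent(total, handled):
    """Percentage of the total that falls outside the handled set."""
    return round((total - handled) / total * 100, 2)


def coverage_percent(covered, total):
    """Percentage of the total that is covered."""
    return round(covered / total * 100, 2)


def is_plausible_coverage(pct):
    """A subset-to-superset ratio should lie between 0% and 100%."""
    return 0.0 <= pct <= 100.0


# Counts taken from the printed output above.
pct = coverage_percent(1434343, 1044541)
print(pct, is_plausible_coverage(pct))
```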