# Check Price for Files Priced Using Point-in-Time Pricing


This notebook checks prices for CUSIPs that traded, in the case where we priced them at quantities and sides other than the ones that actually traded. The pricing procedure is as follows: (1) get all of the trades that occurred on a specified date, and (2) use the archived model for that specific day to price these trades at specified quantities and trade types. See `point_in_time_pricing_actual_trades.py` for more details on this procedure. To evaluate our accuracy, we take each trade that occurred and compute the MAE against the closest prediction with the same trade direction and a similar trade size. We expect this MAE to be higher than it would be if each trade were compared to a prediction with exactly the same size and direction. The intent is simply to check whether the MAE is within the expected range, in this case 10-15 bps.

A secondary concern was to put files into a more user-friendly format by creating bid-ask spreads and by eliminating cases where we price some sizes and directions but not others.

In [None]:
import os

import pandas as pd

from google.cloud import bigquery

In [None]:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '../creds.json'
bq_client = bigquery.Client()

project = 'eng-reactor-287421'

We remove rows whose `ytw` is an error value (non-numeric or -1) and trades whose yield-to-worst date is fewer than 180 days after the trade, as these would skew the MAE numbers.

In [None]:
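def remove_error_rows(df):
    df = df.copy()    # avoid mutating the caller's DataFrame
    df['ytw'] = pd.to_numeric(df['ytw'], errors='coerce')    # non-numeric error values become NaN
    df['trade_datetime'] = pd.to_datetime(df['trade_datetime'])
    df = df[df['ytw'].notna() & (df['ytw'] != -1)]
    # Drop trades whose yield-to-worst date is fewer than 180 days after the trade
    days_to_ytw_date = (pd.to_datetime(df['yield_to_worst_date']) - df['trade_datetime']).dt.days
    df = df[days_to_ytw_date >= 180]
    return df.drop('yield_to_worst_date', axis=1)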
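As a quick sanity check of the filtering rules described above, the following is a minimal sketch on made-up rows. The CUSIPs, dates, and the `'CUSIP not found'` error string are hypothetical and chosen only to exercise each rule; the column names follow `remove_error_rows`.

```python
import pandas as pd

# Hypothetical rows: one clean, one with an error string, one with the -1
# sentinel, and one whose yield-to-worst date is too close to the trade.
toy = pd.DataFrame({
    'cusip': ['AAA', 'BBB', 'CCC', 'DDD'],
    'ytw': ['3.10', 'CUSIP not found', '-1', '2.95'],
    'trade_datetime': ['2024-03-01 10:00:00'] * 4,
    'yield_to_worst_date': ['2030-01-01', '2030-01-01', '2030-01-01', '2024-06-01'],
})

ytw = pd.to_numeric(toy['ytw'], errors='coerce')    # error strings become NaN
days_to_ytw_date = (
    pd.to_datetime(toy['yield_to_worst_date']) - pd.to_datetime(toy['trade_datetime'])
).dt.days

valid_yield = ytw.notna() & (ytw != -1)
far_from_maturity = days_to_ytw_date >= 180
kept = toy[valid_yield & far_from_maturity]
print(kept['cusip'].tolist())
```

Only the first row survives: the second fails numeric conversion, the third carries the -1 sentinel, and the fourth is about 91 days from its yield-to-worst date.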

We keep only CUSIPs that have all six prices: both directions at each of three sizes (assuming that the file was priced with three different quantities). We may need to do this because some error messages affect only a subset of the (quantity, trade type) pairs. For example, "CUSIP is maturing very soon or has already matured" depends on the calc date, which we learn only after pricing. We then create a bid-ask spread. These steps were intended to regularize the file for the end user, but may not be necessary. If we do use them, we may want to run them before we remove the error rows, above, as the error rows are important for the end user. We may also want to run only these two functions to create a cleaner file for the end user; the MAE testing should be distinct from cleaning the file.

In [None]:
def keep_only_cusips_with_all_six_quantities(df):
    df = df.copy()    # `df` may be a filtered slice; copy before adding columns
    df['cusip_trade_datetime'] = df['cusip'] + '_' + df['trade_datetime'].astype(str)
    # Number of rows sharing each (cusip, trade_datetime) key
    df['combination_counts'] = df['cusip_trade_datetime'].map(df['cusip_trade_datetime'].value_counts())
    return df[df['combination_counts'] == 6]
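To illustrate the counting step, here is a small sketch on made-up data. It assumes one CUSIP that was priced for all six (quantity, trade type) pairs and one that lost two rows to errors; the CUSIPs and the `row` column are hypothetical.

```python
import pandas as pd

# Hypothetical input: 'AAA' has all six rows, 'BBB' has only four.
rows = []
for cusip, n_rows in [('AAA', 6), ('BBB', 4)]:
    for i in range(n_rows):
        rows.append({'cusip': cusip, 'trade_datetime': '2024-03-01 10:00:00', 'row': i})
toy = pd.DataFrame(rows)

# Same key and counting approach as the function above
key = toy['cusip'] + '_' + toy['trade_datetime'].astype(str)
counts = key.map(key.value_counts())
complete = toy[counts == 6]
print(complete['cusip'].unique().tolist())
```

Note that the filter checks only the number of rows per key, not which quantities and trade types those rows contain.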

In [None]:
def create_bid_ask_spread(df):
    offered_df = df[df['trade_type'] == 'Offered Side'].copy()
    bid_df = df[df['trade_type'] == 'Bid Side'].copy()

    # Rename 'ytw' and 'price' columns in each DataFrame to reflect trade_type
    offered_df.rename(columns={'ytw': 'ytw_offered', 'price': 'price_offered'}, inplace=True)
    bid_df.rename(columns={'ytw': 'ytw_bid', 'price': 'price_bid'}, inplace=True)

    # Drop 'trade_type' from both DataFrames since it will be redundant post-merge
    offered_df.drop('trade_type', axis=1, inplace=True)
    bid_df.drop('trade_type', axis=1, inplace=True)

    assert len(bid_df) == len(offered_df)

    # Perform an outer merge to ensure all combinations are preserved
    combined_df = pd.merge(offered_df, bid_df, on=['cusip', 'quantity', 'trade_datetime', 'cusip_trade_datetime', 'combination_counts'], how='outer')
    assert len(combined_df) * 2 == len(df)
    combined_df['bid_ask_spread'] = combined_df['ytw_bid'] - combined_df['ytw_offered']
    new_column_order = [
        'cusip',
        'quantity',
        'ytw_offered',
        'price_offered',
        'ytw_bid',
        'price_bid',
        'bid_ask_spread',    # ytw_bid minus ytw_offered
        'trade_datetime',
        'cusip_trade_datetime',
    ]

    # Reorder the DataFrame according to the new column order
    combined_df = combined_df[new_column_order]
    return combined_df
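The reshaping from one row per trade type to one row per quantity can be seen on a toy example. This is a simplified sketch, not the function above: it merges on `cusip` and `quantity` only, and all values are made up.

```python
import pandas as pd

# Hypothetical predictions for one CUSIP at two quantities
toy = pd.DataFrame({
    'cusip': ['AAA'] * 4,
    'quantity': [100, 100, 1000, 1000],
    'trade_type': ['Bid Side', 'Offered Side', 'Bid Side', 'Offered Side'],
    'ytw': [3.20, 3.05, 3.15, 3.02],
})

bid = (toy[toy['trade_type'] == 'Bid Side']
       .rename(columns={'ytw': 'ytw_bid'})
       .drop(columns='trade_type'))
offered = (toy[toy['trade_type'] == 'Offered Side']
           .rename(columns={'ytw': 'ytw_offered'})
           .drop(columns='trade_type'))

# One row per (cusip, quantity) with both sides next to each other
wide = offered.merge(bid, on=['cusip', 'quantity'], how='outer')
wide['bid_ask_spread'] = wide['ytw_bid'] - wide['ytw_offered']
print(wide)
```

With these inputs the spread is about 0.15 at quantity 100 and about 0.13 at quantity 1000, in the same units as `ytw`.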

In [None]:
def format_file(df):
    df = keep_only_cusips_with_all_six_quantities(df)
    df = create_bid_ask_spread(df)
    return df

Create helper functions for loading the results of the SQL query to a dataframe and for setting the quantity correctly.

In [None]:
def sqltodf(sql, limit=''):
    # If `limit` is given, return a random sample of that many rows
    if limit != '':
        limit = f' ORDER BY RAND() LIMIT {limit}'
    bqr = bq_client.query(sql + limit).result()
    return bqr.to_dataframe()

In [None]:
def assign_quantity(par_traded, quantities):
    for quantity in quantities:
        if par_traded <= quantity:
            return quantity
    return quantities[-1]    # par_traded exceeds every quantity; fall back to the last one (the largest, if sorted)
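A short usage sketch of the bucketing rule follows. The function is copied from the cell above so the snippet runs on its own; the priced quantities and par amounts are hypothetical.

```python
def assign_quantity(par_traded, quantities):
    # Copy of the notebook helper: first quantity that is >= par_traded
    for quantity in quantities:
        if par_traded <= quantity:
            return quantity
    return quantities[-1]

# Hypothetical priced sizes, sorted ascending as in `get_mae`
quantities = [100_000, 500_000, 1_000_000]
examples = {par: assign_quantity(par, quantities)
            for par in (25_000, 100_000, 250_000, 5_000_000)}
print(examples)
# {25000: 100000, 100000: 100000, 250000: 500000, 5000000: 1000000}
```

The rule rounds up to the next priced size rather than picking the nearest one, so a trade of 110,000 would be compared with the 500,000 prediction, not the 100,000 one. This relies on `quantities` being sorted in ascending order.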

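Get the offer and bid MAE, in basis points, for a dataframe by comparing its predicted yields to the trades reported for the same day.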

In [None]:
def get_mae(df, filename):
    '''`filename` is used solely for printing.'''
    trade_date = df.iloc[0].trade_datetime.date()
    query = f'''SELECT * FROM auxiliary_views.msrb_final WHERE trade_date = "{trade_date}" and publish_date = "{trade_date}"'''
    trades = sqltodf(query)
    trades['cusip_trade_datetime'] = trades['cusip'] + '_' + trades['trade_datetime'].astype(str)
    quantities = sorted(df['quantity'].unique().tolist())
    trades['quantity'] = trades['par_traded'].apply(lambda x: assign_quantity(x, quantities))
    merged_df = trades.merge(df, on=['cusip_trade_datetime', 'quantity'], how='left')
    bid = merged_df[merged_df['trade_type'] == 'D'][['yield', 'ytw_bid', 'cusip_trade_datetime']]
    offer = merged_df[merged_df['trade_type'] == 'S'][['yield', 'ytw_offered', 'cusip_trade_datetime']]
    offer = offer.dropna(subset=['ytw_offered'])
    bid = bid.dropna(subset=['ytw_bid'])
    offer['diff'] = abs(offer['yield'] - offer['ytw_offered'])
    bid['diff'] = abs(bid['yield'] - bid['ytw_bid'])
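    # Sort by error, largest first, for manual inspection (this does not affect the mean)
    offer = offer.sort_values(by='diff', ascending=False)
    bid = bid.sort_values(by='diff', ascending=False)
    # Multiply by 100 to express the yield difference in basis points
    offer_mae = round((offer['diff'].mean() * 100), 3)
    bid_mae = round((bid['diff'].mean() * 100), 3)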
    print(f'''Offer MAE for {filename} is : {offer_mae} bps.''')
    print(f'''Bid MAE for {filename} is : {bid_mae} bps.''')
    return (offer_mae, bid_mae)
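The MAE arithmetic can be checked without BigQuery on a hand-made frame. This is a minimal sketch that assumes yields are quoted in percent (so that multiplying by 100 gives basis points) and reuses the `'D'`/`'S'` trade type split from `get_mae`; all numbers are made up.

```python
import pandas as pd

# Hypothetical traded yields next to the model's bid and offered yields
toy = pd.DataFrame({
    'trade_type': ['D', 'D', 'S', 'S'],
    'yield': [3.20, 3.10, 3.00, 2.90],
    'ytw_bid': [3.25, 3.00, 3.05, 2.95],
    'ytw_offered': [3.10, 2.95, 3.02, 2.80],
})

bid = toy[toy['trade_type'] == 'D']
offer = toy[toy['trade_type'] == 'S']

# Mean absolute yield error per side, converted to basis points
bid_mae_bps = round((bid['yield'] - bid['ytw_bid']).abs().mean() * 100, 3)
offer_mae_bps = round((offer['yield'] - offer['ytw_offered']).abs().mean() * 100, 3)
print(bid_mae_bps, offer_mae_bps)
```

Here the bid errors are 5 and 10 bps (mean 7.5) and the offer errors are 2 and 10 bps (mean 6.0), both below the 10-15 bps range we expect from real files.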

In [None]:
def check_file(df, filename):
    '''`filename` is used solely for printing.'''
    df = remove_error_rows(df)
    df = keep_only_cusips_with_all_six_quantities(df)
    df = create_bid_ask_spread(df)
    get_mae(df, filename)
    return filename

Specify the path to the directory that contains the CSV files.

In [None]:
directory_path = '/Users/user/desktop/BMO'

Loop through each file in `directory_path` and analyze the MAE.

In [None]:
# Loop through each file in the directory
for filename in os.listdir(directory_path):
    # Construct the full file path
    file_path = os.path.join(directory_path, filename)

    # Process only regular files with a .csv extension (this also skips subdirectories and .DS_Store)
    if os.path.isfile(file_path) and filename.endswith('.csv'):
        print(f'Processing {file_path}...')

        # Attempt to read the file with different encodings
        successful_read = False
        for encoding in ['utf-8', 'ISO-8859-1', 'windows-1252']:
            try:
                df = pd.read_csv(file_path, encoding=encoding)
                successful_read = True
                print(f'Successfully read {filename} with encoding {encoding}')
                # Optionally, display the first few rows or filename to track progress
                # print(df.head())  # Uncomment to see the first few rows of each file

                # Run the MAE check on the DataFrame (`check_file` is defined above)
                check_file(df, filename)
                break    # Exit the encoding loop on success
            except UnicodeDecodeError as e:
                print(f'Error reading {filename} with encoding {encoding}: {e}')
            except Exception as e:
                print(f'Unexpected error while reading or checking {filename}: {e}')
                break    # Exit the encoding loop on unexpected error

        if not successful_read:
            print(f'Failed to read {filename} with any of the attempted encodings.')
    else:
        print(f'Skipping {filename}...')
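One way to keep the encoding fallback separate from the MAE check is to move it into a small helper. The sketch below is one possible arrangement, not part of the original notebook: `read_csv_with_fallback` is a hypothetical name, and the temporary file exists only to exercise the fallback.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=('utf-8', 'ISO-8859-1', 'windows-1252')):
    '''Hypothetical helper: return (DataFrame, encoding) for the first encoding that decodes.'''
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError:
            continue    # try the next encoding
    raise ValueError(f'Could not decode {path} with any of {encodings}')


# Write a small file containing a byte that is not valid UTF-8
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.csv')
    with open(path, 'wb') as f:
        f.write('cusip,note\nAAA,caf\xe9\n'.encode('ISO-8859-1'))
    df, used = read_csv_with_fallback(path)

print(used)
```

Because ISO-8859-1 maps every byte to a character, it never raises `UnicodeDecodeError`, so an encoding listed after it (such as windows-1252) is reached only if the list is reordered. Keeping the read in its own function also means that an error raised inside `check_file` is not reported as a read error.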