# Data management in LUSID

In this training module we'll see how to use LUSID to perform the following task:

**<div align="center">As a data manager, I want to populate the LUSID instrument master with a mix of Equities, Bonds, Futures and FxForwards. The data is sourced from two systems, each with its own taxonomy and market identifiers. I want to see a consolidated view of each instrument in LUSID, but retain the original format and lineage of the data.</div>**

In [1]:
# Set up LUSID
import os
import pandas as pd
import json
import uuid
import matplotlib.pyplot as plt
from IPython.core.display import HTML
import logging
logging.basicConfig(level=logging.INFO)

import lusid as lu
import lusid.api as la
import lusid.models as lm

from lusid.utilities import ApiClientFactory
from lusidjam import RefreshingToken
from lusidtools.pandas_utils.lusid_pandas import lusid_response_to_data_frame
from lusidtools.jupyter_tools import StopExecution
from lusidtools.lpt.lpt import to_date

# Set pandas display options
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.options.display.float_format = "{:,.3f}".format
#display(HTML("<style>.container { width:90% !important; }</style>"))

# Authenticate to SDK
# Run the Notebook in Jupyterhub for your LUSID domain and authenticate automatically
secrets_path = os.getenv("FBN_SECRETS_PATH")
# Run the Notebook locally using a secrets file (see https://support.lusid.com/knowledgebase/article/KA-01663)
if secrets_path is None:
    secrets_path = os.path.join(os.path.dirname(os.getcwd()), "secrets.json")

api_factory = ApiClientFactory(
    token = RefreshingToken(), 
    api_secrets_filename = secrets_path,
    app_name = "LusidJupyterNotebook"
)

# Confirm success by printing SDK version
api_status = pd.DataFrame(api_factory.build(lu.ApplicationMetadataApi).get_lusid_versions().to_dict())
display(api_status)

Unnamed: 0,api_version,build_version,excel_version,links
0,v0,0.6.8714.0,0.5.2632,"{'relation': 'RequestLogs', 'href': 'http://ja..."


## 1. Examining the source files

In [2]:
# Read security information into pandas dataframe from VendorA
VendorA_df = pd.read_csv("VendorA.csv", keep_default_na = False)
VendorA_df

Unnamed: 0,instrument_name,internal_id,currency,isin,figi,country_issue,market_sector,coupon,maturity_date
0,BP PLC,imd_43535553,GBP,GB0007980591,BBG000C05BD1,United Kingdom,equity,,
1,UK Gilt 0.375 10/22/26,imd_34994599,GBP,GB00BNNGP668,BBG00ZF1T9P5,united_kingdom,bond,0.0375,2026-10-22
2,EURO-BUND FUTURE Jun22,imd_34588699,EUR,DE0009652644,BBG012FFPKW7,,future,,2022-06-30
3,EURUSD 6M FWD,imd_34876136,USD,,,United States,fxforward,,2022-07-01


In [3]:
# Read security information into pandas dataframe from VendorB
VendorB_df = pd.read_csv("VendorB.csv", keep_default_na = False)
VendorB_df

Unnamed: 0,instrument_name,internal_id,figi,ticker,currency,origin,market_sector,coupon,maturity_date
0,BP,imd_43535553,BBG000C05BD1,BP/LN,GBP,GB,equity,,
1,UKT 0 ⅜ 10/22/26,imd_34994599,BBG00ZF1T9P5,UKT 0.375 10/22/26,GBP,Great Britain,bond,0.0375,2026-10-22
2,Euro-Bund Futures (FGBL),imd_34588699,BBG012FFPKW7,RXM2,EUR,DE,future,,2022-06-30
3,EUR/USD 6M 20220101,imd_34876136,,,USD,usa,fxforward,,2022-07-01


## 2. Mastering instruments in LUSID

An instrument must have at least one unique identifier, such as a Figi or ClientInternal. We can map the 'figi' column to Figi and the 'internal_id' column to ClientInternal, to give two unique identifiers per instrument. Because the values in these columns are the same for both vendors, LUSID automatically merges equivalent records into the same instrument.

We can also map columns in either file to the Isin and Ticker non-unique identifiers for each instrument.

In addition, LUSID automatically generates a third unique identifier for each instrument: a LUID.
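Because merging relies on both vendors supplying the same unique identifier values, it can be worth checking this before upserting. The following is a minimal sketch in plain Python, not part of the LUSID SDK: the `compare_identifiers` helper is hypothetical, and the identifier values are copied by hand from the two vendor tables above.

```python
# internal_id -> figi, copied from the VendorA and VendorB tables above
vendor_a = {
    "imd_43535553": "BBG000C05BD1",
    "imd_34994599": "BBG00ZF1T9P5",
    "imd_34588699": "BBG012FFPKW7",
    "imd_34876136": "",  # the FxForward has no FIGI in either file
}
vendor_b = {
    "imd_43535553": "BBG000C05BD1",
    "imd_34994599": "BBG00ZF1T9P5",
    "imd_34588699": "BBG012FFPKW7",
    "imd_34876136": "",
}


def compare_identifiers(first, second):
    """Return (ids present in only one source, ids whose FIGIs disagree)."""
    only_one = sorted(set(first) ^ set(second))
    conflicts = sorted(
        key for key in set(first) & set(second) if first[key] != second[key]
    )
    return only_one, conflicts


only_one, conflicts = compare_identifiers(vendor_a, vendor_b)
print(f"Present in one file only: {only_one}")
print(f"Conflicting FIGIs: {conflicts}")
```

Both lists are empty for the sample files, which is why each pair of vendor records resolves to a single instrument.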

In [4]:
# Obtain the LUSID Instruments API
instruments_api = api_factory.build(la.InstrumentsApi)

# Capture the set of LUIDs automatically generated when instruments are mastered, for later use
instrument_luids = set()

# Create a convenience function to call for each vendor dataframe
def create_and_upsert_instruments(vendor_dataframe):

    # Create a dictionary of instrument definitions
    definitions = {}

    # Iterate over each row in the vendor dataframe
    for index, security in vendor_dataframe.iterrows():

        # Map possible identifier columns to case-sensitive LUSID identifier type names
        identifier_columns = [
            ("isin", "Isin"), 
            ("figi", "Figi"), 
            ("ticker", "Ticker"),
            ("internal_id", "ClientInternal")
        ]
        # Create instrument identifiers
        identifiers = {}
        for column, identifier_type in identifier_columns:
            # Use the column only if it exists and is not NaN or empty
            if (column in security) and pd.notna(security[column]) and (security[column] != ""):
                identifiers[identifier_type] = lm.InstrumentIdValue(
                    value = security[column]
                )

        # Model equities
        if security["market_sector"] == "equity":
            # Create definitions
            definitions[security["instrument_name"]] = lm.InstrumentDefinition(
                name = security["instrument_name"],
                identifiers = identifiers,
                definition = lm.Equity(
                    instrument_type = "Equity",
                    dom_ccy = security["currency"],
                    identifiers = {}
                )
            )
        # Model bonds
        elif security["market_sector"] == "bond":
            definitions[security["instrument_name"]] = lm.InstrumentDefinition(
                name = security["instrument_name"],
                identifiers = identifiers,
                definition = lm.Bond(
                    instrument_type = "Bond",
                    start_date = "2021-01-01",
                    maturity_date = security["maturity_date"],
                    dom_ccy = security["currency"],
                    flow_conventions = lm.FlowConventions(
                        currency = security["currency"],
                        payment_frequency = "6M",
                        day_count_convention = "ActualActual",
                        roll_convention = "NoAdjustment",
                        payment_calendars = [],
                        reset_calendars = [],
                        settle_days = 0,
                        reset_days = 0
                    ),
                    principal = 1,
                    coupon_rate = security["coupon"]
                )
            )
        # Model futures
        elif security["market_sector"] == "future":
            definitions[security["instrument_name"]] = lm.InstrumentDefinition(
                name = security["instrument_name"],
                identifiers = identifiers,
                definition = lm.Future(
                    instrument_type = "Future",
                    start_date = "2022-01-01",
                    maturity_date = security["maturity_date"],
                    identifiers = {},
                    contract_details = lm.FuturesContractDetails(
                        dom_ccy = security["currency"],
                        contract_code = "FGBL",
                        contract_month = "M",
                        contract_size = 100000,
                        convention = "ActualActual",
                        country = "DE",
                        description = security["instrument_name"],
                        exchange_code = "EUREX",
                        exchange_name = "Eurex",
                        ticker_step = 0.01,
                        unit_value = 10,
                    ),
                    contracts = 1,
                    underlying = lm.ExoticInstrument(
                        instrument_format = lm.InstrumentDefinitionFormat(
                            "custom", "custom", "0.0.0"
                        ),
                        content = "{}",
                        instrument_type = "ExoticInstrument",
                    )
                )
            )
        # Model FxForwards
        elif security["market_sector"] == "fxforward":
            definitions[security["instrument_name"]] = lm.InstrumentDefinition(
                name = security["instrument_name"],
                identifiers = identifiers,
                definition = lm.FxForward(
                    instrument_type = "FxForward",
                    start_date = "2022-01-01",
                    maturity_date = security["maturity_date"],
                    dom_ccy = "EUR",
                    dom_amount = 1000000,
                    fgn_ccy = security["currency"],
                    fgn_amount = -1215520.25
                )
            )
    
    # Upsert instruments to LUSID
    upsert_instruments_response = instruments_api.upsert_instruments(request_body=definitions)
    
    # Capture LUIDs
    for instr in list(upsert_instruments_response.values.values()):
        instrument_luids.add(instr.lusid_instrument_id)

    # Transform API response to a dataframe and show internally-generated unique LUID for each mastered instrument
    upsert_instruments_response_df = lusid_response_to_data_frame(list(upsert_instruments_response.values.values()))
    display(upsert_instruments_response_df[["name", "lusid_instrument_id"]])
    
    
# The first time we call this function, instruments are created
create_and_upsert_instruments(VendorA_df)
# The second time, instruments are updated (the name changes but the LUID stays the same)
create_and_upsert_instruments(VendorB_df)

Unnamed: 0,name,lusid_instrument_id
0,EURO-BUND FUTURE Jun22,LUID_00003D5Z
1,EURUSD 6M FWD,LUID_00003D61
2,BP PLC,LUID_00003D5D
3,UK Gilt 0.375 10/22/26,LUID_00003D65


Unnamed: 0,name,lusid_instrument_id
0,BP,LUID_00003D5D
1,UKT 0 ⅜ 10/22/26,LUID_00003D65
2,EUR/USD 6M 20220101,LUID_00003D61
3,Euro-Bund Futures (FGBL),LUID_00003D5Z


## 3. Adding properties to instruments

Instrument definitions do not have fields to store information about domicile.

We can add as many properties to instruments as we like, to store extra information. We'll add two properties to each instrument, to store the data from VendorA ('country_issue') and VendorB ('origin') separately and in its original form.

### 3.1 Creating property definitions

The first task is to create a property definition for each property.

In [5]:
# Obtain the LUSID Property Definition API
property_definition_api = api_factory.build(la.PropertyDefinitionsApi)

# Create a convenience function to call for each property definition
def create_and_upsert_property_definition(property_scope, property_code):
    
    # Create property definition specific to instruments, with a unique scope and code
    property_definition = lm.CreatePropertyDefinitionRequest(
        domain = "Instrument",
        scope = property_scope,
        code = property_code,
        display_name = property_code,
        data_type_id = lm.ResourceId(
            scope = "system",
            code = "string"
        )
    )
    
    # Upsert property definition to LUSID
    try:
        upsert_property_definition_response = property_definition_api.create_property_definition(
            create_property_definition_request = property_definition
        )
        print(f"Property definition created with the following key: {upsert_property_definition_response.key}")
    except lu.ApiException as e:
        if json.loads(e.body)["name"] == "PropertyAlreadyExists":
            logging.info(
                f"Property definition with the following key already exists: {property_definition.domain}/{property_definition.scope}/{property_definition.code}"
            )
        else:
            # Surface any other API error instead of silently ignoring it
            raise
    
# Create a property definition representing 'country_issue' from VendorA
create_and_upsert_property_definition("VendorA","country_issue")
# Create a property definition representing 'origin' from VendorB
create_and_upsert_property_definition("VendorB","origin")

INFO:root:Property definition with the following key already exists: Instrument/VendorA/country_issue
INFO:root:Property definition with the following key already exists: Instrument/VendorB/origin


### 3.2 Adding properties to instruments

We can now iterate over the dataframe from each vendor, adding properties with appropriate values to each instrument in turn.

In [6]:
# Create a convenience function to call for each vendor dataframe
def add_properties_to_instruments(vendor_dataframe, property_scope, property_code):   
    property_request = [
        lm.UpsertInstrumentPropertyRequest(
            identifier_type = "ClientInternal",
            identifier = security["internal_id"],
            properties = [
                lm.ModelProperty(
                    key = f"Instrument/{property_scope}/{property_code}",
                    value = lm.PropertyValue(
                        label_value = security[property_code]
                        )
                )
            ]
        )
        for index, security in vendor_dataframe.iterrows()
    ]   
    upsert_properties_response = instruments_api.upsert_instruments_properties(
        upsert_instrument_property_request = property_request
    )

    print(f"Properties from {property_scope} updated at {upsert_properties_response.as_at_date}")
    
    
# Add 'country_issue' properties from VendorA to instruments    
add_properties_to_instruments(VendorA_df, "VendorA", "country_issue")
# Add 'origin' properties from VendorB to instruments 
add_properties_to_instruments(VendorB_df, "VendorB", "origin")

Properties from VendorA updated at 2022-03-02 09:14:45.988961+00:00
Properties from VendorB updated at 2022-03-02 09:14:46.147846+00:00


### 3.3 Confirming properties have been added

We must explicitly ask LUSID to 'decorate' properties onto instruments when retrieving them, using each property's three-part key (domain, scope and code). For example, we can use the set of LUIDs captured earlier to specify the instruments to retrieve, and list the properties to decorate in the `property_keys` parameter.
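The key format used throughout this notebook is `domain/scope/code`, for example `Instrument/VendorA/country_issue`. As a minimal sketch, the two helpers below build and split such keys; `make_property_key` and `split_property_key` are hypothetical names for illustration and are not part of the LUSID SDK.

```python
def make_property_key(domain, scope, code):
    """Join domain, scope and code into a 'domain/scope/code' key."""
    for part in (domain, scope, code):
        # Each part must be non-empty and must not contain the separator
        if not part or "/" in part:
            raise ValueError(f"Invalid key part: {part!r}")
    return f"{domain}/{scope}/{code}"


def split_property_key(key):
    """Split a 'domain/scope/code' key into its three parts."""
    parts = key.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected domain/scope/code, got: {key!r}")
    return tuple(parts)


property_keys = [
    make_property_key("Instrument", "VendorA", "country_issue"),
    make_property_key("Instrument", "VendorB", "origin"),
]
print(property_keys)
```

Building keys in one place avoids typos when the same key is needed both for upserting values and for the `property_keys` parameter.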

In [7]:
# Retrieve our instruments, decorating particular properties onto response
get_instruments_response = instruments_api.get_instruments(
    identifier_type = "LusidInstrumentId", 
    request_body = list(instrument_luids),
    property_keys = ["Instrument/VendorA/country_issue", "Instrument/VendorB/origin"]
)

# Transform API response to a dataframe and show the properties for each instrument
get_instruments_response_df = lusid_response_to_data_frame(list(get_instruments_response.values.values()))
display(get_instruments_response_df[["name", 
                                     "properties.0.key", 
                                     "properties.0.value.label_value",
                                     "properties.1.key",
                                     "properties.1.value.label_value",
                                    ]])

Unnamed: 0,name,properties.0.key,properties.0.value.label_value,properties.1.key,properties.1.value.label_value
0,Euro-Bund Futures (FGBL),Instrument/VendorB/origin,DE,Instrument/VendorA/country_issue,
1,EUR/USD 6M 20220101,Instrument/VendorB/origin,usa,Instrument/VendorA/country_issue,United States
2,UKT 0 ⅜ 10/22/26,Instrument/VendorB/origin,Great Britain,Instrument/VendorA/country_issue,united_kingdom
3,BP,Instrument/VendorB/origin,GB,Instrument/VendorA/country_issue,United Kingdom


## 4. Creating a derived property

### 4.1 Specifying a derivation formula

We want our derived property to:

1. Merge the 'country_issue' and 'origin' properties to account for missing values (using the coalesce function), and then
2. Normalise the values to a standard set (using the map function).
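The two steps above can be emulated locally to check the expected results before creating anything in LUSID. This is a plain-Python sketch, not LUSID's derivation engine: the `coalesce` and `derive_domicile` functions are hypothetical, the mapping pairs are the ones used in this notebook's formula, and treating an empty string as a missing value is an assumption (it is consistent with the Euro-Bund future, which has no 'country_issue', resolving to Germany).

```python
def coalesce(*values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Vendor-specific spellings mapped to a standard set
COUNTRY_MAP = {
    "United Kingdom": "UK",
    "united_kingdom": "UK",
    "Great Britain": "UK",
    "GB": "UK",
    "DE": "Germany",
    "United States": "USA",
    "usa": "USA",
}


def derive_domicile(country_issue, origin):
    """Prefer VendorA's value, fall back to VendorB's, then normalise."""
    raw = coalesce(country_issue, origin, "Unknown")
    return COUNTRY_MAP.get(raw, "Unknown")


# (country_issue, origin) pairs taken from the two vendor files
samples = [
    ("United Kingdom", "GB"),
    ("united_kingdom", "Great Britain"),
    ("", "DE"),
    ("United States", "usa"),
]
for country_issue, origin in samples:
    print(f"{country_issue!r}, {origin!r} -> {derive_domicile(country_issue, origin)}")
```

Any value that is missing from the mapping falls through to 'Unknown', which makes unmapped vendor spellings easy to spot.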

In [8]:
# The coalesce function prefers values from the first property supplied ('country_issue'). Values from the second
# property ('origin') are used only if the first has no value. The map function then maps
# values to a standard set, for example 'DE' to 'Germany'
derived_property_formula = """map(coalesce(Properties[Instrument/VendorA/country_issue], 
                                           Properties[Instrument/VendorB/origin], 'Unknown'): 
                                                'United Kingdom'='UK', 'united_kingdom'='UK', 
                                                'Great Britain'='UK', 'GB'='UK', 'DE'='Germany',
                                                'United States'='USA', 'usa'='USA', default='Unknown')"""

### 4.2 Creating a derived property definition

In [9]:
derived_property_definition = lm.CreateDerivedPropertyDefinitionRequest(
    domain = "Instrument",
    scope = "Derived",
    code = "Domicile",
    display_name = "Domicile",
    data_type_id = lm.ResourceId(code="string", scope="system"),
    derivation_formula = derived_property_formula,
)

# Upsert derived property definition to LUSID
try:
    derived_property_definition_response = property_definition_api.create_derived_property_definition(
        create_derived_property_definition_request = derived_property_definition
    )
    print(f"Derived property definition created with the following key: {derived_property_definition_response.key}")
except lu.ApiException as e:
    if json.loads(e.body)["name"] == "PropertyAlreadyExists":
        logging.info(
            f"Derived property definition with the following key already exists:{derived_property_definition.domain}/{derived_property_definition.scope}/{derived_property_definition.code}"
        )
    else:
        # Surface any other API error instead of silently ignoring it
        raise

INFO:root:Derived property definition with the following key already exists:Instrument/Derived/Domicile


### 4.3 Confirming the derived property can be calculated

We don't have to explicitly add derived properties to instruments; rather, LUSID calculates values on-the-fly when retrieving instruments (see the `property_keys` parameter for the request to decorate the derived property onto retrieved instruments).

In [10]:
# Retrieve our instruments, decorating particular properties onto response
get_instruments_response = instruments_api.get_instruments(
    identifier_type = "LusidInstrumentId",
    request_body = list(instrument_luids),
    property_keys = ["Instrument/VendorA/country_issue", "Instrument/VendorB/origin", "Instrument/Derived/Domicile"]
)

# Transform API response to a dataframe and show the properties for each instrument
get_instruments_response_df = lusid_response_to_data_frame(list(get_instruments_response.values.values()))
display(get_instruments_response_df[["name", 
                                     "properties.0.key", 
                                     "properties.0.value.label_value",
                                     "properties.1.key",
                                     "properties.1.value.label_value",
                                     "properties.2.key",
                                     "properties.2.value.label_value",
                                    ]])

Unnamed: 0,name,properties.0.key,properties.0.value.label_value,properties.1.key,properties.1.value.label_value,properties.2.key,properties.2.value.label_value
0,UKT 0 ⅜ 10/22/26,Instrument/VendorA/country_issue,united_kingdom,Instrument/VendorB/origin,Great Britain,Instrument/Derived/Domicile,UK
1,Euro-Bund Futures (FGBL),Instrument/VendorA/country_issue,,Instrument/VendorB/origin,DE,Instrument/Derived/Domicile,Germany
2,BP,Instrument/VendorA/country_issue,United Kingdom,Instrument/VendorB/origin,GB,Instrument/Derived/Domicile,UK
3,EUR/USD 6M 20220101,Instrument/VendorA/country_issue,United States,Instrument/VendorB/origin,usa,Instrument/Derived/Domicile,USA
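
The two retrievals in this notebook list decorated properties in different orders: 'Instrument/VendorB/origin' comes first in section 3.3, whereas 'Instrument/VendorA/country_issue' comes first in section 4.3. Positional columns such as `properties.0.key` are therefore fragile. A minimal sketch of indexing by property key instead is shown below; it uses plain dictionaries as a stand-in for the SDK response objects (an assumption), and `properties_by_key` is a hypothetical helper.

```python
def properties_by_key(properties):
    """Index a list of decorated properties by property key."""
    return {prop["key"]: prop["label_value"] for prop in properties}


# The BP instrument as decorated in section 3.3 and in section 4.3
bp_first = [
    {"key": "Instrument/VendorB/origin", "label_value": "GB"},
    {"key": "Instrument/VendorA/country_issue", "label_value": "United Kingdom"},
]
bp_second = [
    {"key": "Instrument/VendorA/country_issue", "label_value": "United Kingdom"},
    {"key": "Instrument/VendorB/origin", "label_value": "GB"},
    {"key": "Instrument/Derived/Domicile", "label_value": "UK"},
]

first = properties_by_key(bp_first)
second = properties_by_key(bp_second)

# Lookups no longer depend on the order in which properties were returned
print(first["Instrument/VendorB/origin"], second["Instrument/VendorB/origin"])
print(second["Instrument/Derived/Domicile"])
```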
