# HAPI V2 Training
## Chapter 1 - Bulk
#### Table of Contents

- Section 1 - JSON and Python Recap (optional)
- Section 2 - Initialization
- Section 3 - Exploration (optional)
- Section 4 - Download Dataset
- Section 5 - SSE Notification

### Housekeeping

1) cell output scroll setup:
   * highlight all cells (Ctrl+A)
   * right-click on any and select "Enable Scrolling for Outputs"

### Section 1 - JSON and Python Recap

In [None]:
# JSON string
a = '{"name":"John Doe", "email":"jdoe1234@bloomberg.net", "uuid":"12345678"}'

# python dict
b =  {"name":"John Doe", "email":"jdoe1234@bloomberg.net", "uuid":"12345678"}

In [None]:
# accessing string: by position
print(a[9:17])

In [None]:
# accessing dictionary: by key
print(b['name'])

In [None]:
# convert a JSON string to python dictionary
import json
c = json.loads(a)
print(c==b)
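
The reverse direction uses `json.dumps`, which serializes a dictionary back into a JSON string. A minimal sketch reusing the sample record above (the names `record`, `text` and `restored` are illustrative only):

```python
import json

# Python dict, same sample record as above
record = {"name": "John Doe", "email": "jdoe1234@bloomberg.net", "uuid": "12345678"}

# dict -> JSON string; 'indent' only affects readability
text = json.dumps(record, indent=4)
print(text)

# JSON string -> dict; the round trip gives back an equal dictionary
restored = json.loads(text)
print(restored == record)
```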

- Sample JSON from HAPI field query result
- Convert it to python dictionary and extract information

In [None]:
import json
d = '''
{
    "@context": {
        "@base": "https://api.bloomberg.com/eap/catalogs/bbg/fields/idBbGlobal/",
        "@vocab": "https://api.bloomberg.com/eap/ontology#"
    },
    "@id": "",
    "@type": [
        "Collection",
        "BasicContainer",
        "Field"
    ],
    "Created": "Mon, 17 Aug 2009 00:00:00 GMT",
    "DL Category": "User Entered Information",
    "DL Commercial Model Category": "Open Source",
    "DL: Extended Bulk": true,
    "DL:Bulk": true,
    "Data License": true,
    "Description": "Financial Instrument Global Identifier",
    "Field Id": "ID135",
    "Field Type": "Character",
    "IRI": "https://api.bloomberg.com/eap/catalogs/bbg/fields/idBbGlobal/",
    "Is Abstract": false,
    "Loading Speed": "Hare",
    "Mnemonic": "ID_BB_GLOBAL",
    "Old Mnemonic": null,
    "Platform: Static": false,
    "Platform: Streaming": false,
    "Platform: Terminal Required": false,
    "Range": "FinancialInstrument",
    "SAPI New Security Setup": true,
    "Standard Decimal Places": null,
    "Standard Width": null,
    "SuperPropertyIRI": "https://api.bloomberg.com/eap/catalogs/bbg/fields/instrumentIdentifier/",
    "YK: Commodity": true,
    "YK: Corporate": true,
    "YK: Currency": true,
    "YK: Equity": true,
    "YK: Index": true,
    "YK: Money Market": true,
    "YK: Mortgage": true,
    "YK: Municipal": true,
    "YK: Preferred": true,
    "YK: US Government": true,
    "description": "Twelve character, alphanumeric identifier. The first 2 characters are upper-case consonants (including \\"Y\\"), the third character is the upper-case \\"G\\", characters 4 -11 are any upper-case consonant (including \\"Y\\") or integer between 0 and 9, and the last character is a check-digit. An identifier is assigned to instruments of all asset classes, is unique to an individual instrument and once issued will not change for an instrument. For Equity instruments, ID135 is assigned specifically at the exchange/trading venue level.",
    "identifier": "idBbGlobal",
    "rdf:langString": null,
    "title": "Financial Instrument Global Identifier",
    "xsd:fractionDigits": null,
    "xsd:length": 12,
    "xsd:maxExclusive": null,
    "xsd:maxInclusive": null,
    "xsd:maxLength": null,
    "xsd:minExclusive": null,
    "xsd:minInclusive": null,
    "xsd:minLength": null,
    "xsd:pattern": "((BBG)[BCDFGHJKLMNPQRSTVWXYZ\\\\d]{8}\\\\d)",
    "xsd:type": "xsd:token"
}
'''

# fld is a python dictionary, from which we can easily extract useful info:
fld = json.loads(d)

print("Clean Name :", fld['identifier'])
print("Mnemonic   :", fld['Mnemonic'])
print("Field Id   :", fld['Field Id'])
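
The `xsd:pattern` value in the same record can be used for a quick format check. The sketch below copies the pattern as it reads after `json.loads` and applies it with Python's `re` module; the helper name `looks_like_id_bb_global` and the sample strings are illustrative assumptions, and a pattern match does not verify the check digit.

```python
import re

# Pattern copied from the "xsd:pattern" value of the sample field above
pattern = r"((BBG)[BCDFGHJKLMNPQRSTVWXYZ\d]{8}\d)"

def looks_like_id_bb_global(value):
    """Return True when the whole string matches the field's pattern."""
    return re.fullmatch(pattern, value) is not None

# Sample strings for illustration only
print(looks_like_id_bb_global("BBG000BLNNH6"))  # well-formed: BBG + 8 characters + digit
print(looks_like_id_bb_global("BBG000BLNNHA"))  # last character is not a digit
print(looks_like_id_bb_global("BBG000"))        # too short
```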

### Section 2 - Initialization


To ensure that authentication is handled correctly, we recommend the "beap_auth.py" script provided by Bloomberg. It is available for download from the Customer Service Center (CSC):
https://service.blpprofessional.com/portal/downloadcenter
(Choose the Python sample code under HAPI section)

`pip install -r requirements.txt`


Clients also need to download a credential file from the Enterprise Console and whitelist their IP address. This will be covered in separate training sessions.

In [None]:
!pip install -r requirements.txt

#### 2.1 Initialize a 'Session'
The below code will do the following:
- import the necessary libraries to use such as beap_auth
- extract the credentials information from our credential.txt file
- start the session to be able to start interacting with HAPI

#### 2.2 Documentation

JWT Authentication

https://console.bloomberg.com/#/firm/OTAwMQ==/dev-console/docs?specUri=%2Fauthenticate%2Fv1%2Fdocumentation.json

Catalogs

https://service.blpprofessional.com/track_download/assets/HAPI/#tag/catalog

https://service.blpprofessional.com/track_download/assets/data-license/#2-5-catalogs

In [None]:
import requests
# Get ``Credentials`` and ``BEAPAdapter`` classes from beap_auth.py.
# JWT authentication tokens are constructed using the ``Credentials`` class.
# A token is injected into every request using the ``BEAPAdapter`` class.
from beap_lib.beap_auth import Credentials, BEAPAdapter

# Obtain credentials
# Replace the path with path to your credential file from console.bloomberg.com
CREDS = Credentials.from_file('Credentials/credential_Bulk.txt')

# Initialize the session
session = requests.Session()
session.mount('https://', BEAPAdapter(CREDS))


In [None]:
HOST = "https://api.bloomberg.com" # api.blpprofessional.com for China
CATALOG = '/eap/catalogs/bbg/' # always 'bbg' for BULK

base_url=HOST+CATALOG
print("Base URL:", base_url)

response = session.get(base_url)
response
#response.json()

### 2.3 Troubleshooting
#### Common Errors

- Expired credential
```python
{'errors': [{'title': 'unauthorized_client',
   'id': '269281e3-c4a5-4bb0-cb13-1614f8ae508a',
   'meta': {'server-time': 1639012888},
   'errorCode': 'unauthorized-client',
   'status': 401,
   'detail': 'Credential has expired.'}],
 'error_description': 'Credential has expired.',
 'error': 'unauthorized_client'}
```

- IP not whitelisted
```python
 {'errors': [{'title': 'unauthorized_client',
   'id': '43e63713-ebe5-49a1-c99a-cb9c004a8f20',
   'meta': {'server-time': 1639013210},
   'errorCode': 'unauthorized-client',
   'status': 401,
   'detail': 'Invalid IP, IP 10.144.58.197 not whitelisted'}],
 'error_description': 'Invalid IP, IP 10.144.58.197 not whitelisted',
 'error': 'unauthorized_client'}
```
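
Both error payloads share the same shape, so the useful parts can be pulled out with a few lines of code. The sketch below is one way to do that; `summarize_errors` is a hypothetical helper, and it only assumes the `errors`, `status` and `detail` keys visible in the examples above.

```python
def summarize_errors(payload):
    """Return one 'status: detail' line per entry in payload['errors']."""
    lines = []
    for err in payload.get('errors', []):
        lines.append("{s}: {d}".format(s=err.get('status'), d=err.get('detail')))
    return lines

# Payload copied from the expired-credential example above
sample = {'errors': [{'title': 'unauthorized_client',
   'id': '269281e3-c4a5-4bb0-cb13-1614f8ae508a',
   'meta': {'server-time': 1639012888},
   'errorCode': 'unauthorized-client',
   'status': 401,
   'detail': 'Credential has expired.'}],
 'error_description': 'Credential has expired.',
 'error': 'unauthorized_client'}

for line in summarize_errors(sample):
    print(line)
```

In the notebook, the same helper could be called on `response.json()` whenever `response.status_code` is not 200.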

#### Support Team:
You can search all HAPI requests on Humio:

https://humio.prod.bloomberg.com/bci/search

Search by request ID (preferred), URL path, client_id, DL account, etc.



TEAM:
https://cms.prod.bloomberg.com/team/pages/viewpage.action?pageId=1324615504

In [None]:
response.headers

In [None]:
print(response.headers['X-Request-ID'])

### Section 3 - Exploration

Now that we are set up, we can query various URLs to replicate the functionality of the BEAP website. The sample code below does just that, with URLs split between the discovery of fields and the discovery of files.

It is best to start with the available catalogs, which show the structure of the API.

#### 3.1 Explore the datasets
The request below lists the datasets in the catalog and prints their total count.

In [None]:
dataset_url = base_url+'datasets/'
print('GET URL:', dataset_url)
response = session.get(dataset_url)
response.json()
print("Total datasets:", response.json()['totalItems'])


#### 3.2 Show only my *subscribed* datasets (V2 new feature)
https://service.blpprofessional.com/track_download/assets/HAPI/#section/New-Features/Show-Subscribed-Datasets

In [None]:
url = dataset_url+'?subscribed=true'
print('GET URL:', url)
response = session.get(url)
print("Subscribed datasets:", response.json()['totalItems'])

#### 3.3 Search datasets
https://service.blpprofessional.com/track_download/assets/HAPI/#section/Getting-Started/Search

In [None]:
url = dataset_url + '?q=equity+asia'
print('GET URL:', url)
response = session.get(url)
response.json()
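
Search terms containing spaces or special characters have to be URL-encoded, as in `?q=equity+asia` above. A minimal sketch using the standard library's `urlencode`; the helper `build_query_url` and the variable `example_base` are illustrative names, not part of the Bloomberg sample code.

```python
from urllib.parse import urlencode

def build_query_url(base, **params):
    """Append URL-encoded query parameters to a base URL."""
    if not params:
        return base
    return base + '?' + urlencode(params)

example_base = 'https://api.bloomberg.com/eap/catalogs/bbg/datasets/'

# A space in the search term is encoded as '+'
print(build_query_url(example_base, q='equity asia'))

# Several parameters are joined with '&'
print(build_query_url(example_base, page=1, q='apptopia'))
```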

#### 3.4 Pagination
https://service.blpprofessional.com/track_download/assets/HAPI/#section/Getting-Started/Pagination

In [None]:
url = dataset_url + '?page=20'
print("GET URL:", url)
response = session.get(url)
#response.json()
print("Total pages:", response.json()['pageCount'])
print("Navigation:", json.dumps(response.json()['view'], indent=4))
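
To walk through every page, the `pageCount` value can drive a simple loop. The sketch below assumes that pages are numbered from 1, that a URL without `?page=` returns the first page, and that items are listed under a `contains` key (as in the snapshot listing in Section 4). `collect_all_pages` is a hypothetical helper, and the `fake_pages` dictionary is made-up data standing in for the API so the example runs offline.

```python
def collect_all_pages(get_json, url):
    """Fetch every page of a collection and return the combined 'contains' items.

    get_json is any callable that takes a URL and returns the parsed JSON,
    for example: lambda u: session.get(u).json()
    """
    first = get_json(url)
    items = list(first.get('contains', []))
    # Assumption: page 1 was just fetched, so continue from page 2
    for page in range(2, first['pageCount'] + 1):
        page_json = get_json('{u}?page={p}'.format(u=url, p=page))
        items.extend(page_json.get('contains', []))
    return items

# Made-up two-page collection used instead of a live request
fake_pages = {
    'datasets/': {'pageCount': 2, 'contains': [{'identifier': 'a'}]},
    'datasets/?page=2': {'pageCount': 2, 'contains': [{'identifier': 'b'}]},
}

items = collect_all_pages(fake_pages.__getitem__, 'datasets/')
print([item['identifier'] for item in items])
```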

#### 3.5 Explore fields

In [None]:
fields_url = base_url + 'fields/'
url = fields_url + 'pxLast/'
print("GET URL:", url)
response = session.get(url)
response.json()

<div class="alert alert-block alert-success">

#### Practice Time!
</div>

In [None]:
#Discovery of files
##########################
url1 = '/eap/catalogs/bbg/datasets/'
url2 = '/eap/catalogs/bbg/datasets/?page=1&q=apptopia'
url3 = '/eap/catalogs/bbg/datasets/?q=equity'
url4 = '/eap/catalogs/bbg/datasets/equityNamr/'
url5 = '/eap/catalogs/bbg/datasets/equityNamr/snapshots/'
url6 = '/eap/catalogs/bbg/datasets/equityNamr/snapshots/20200505/'
url7 = '/eap/catalogs/bbg/datasets/equityNamr/snapshots/20200505/distributions/'

#Discovery of fields
##########################
url8 = '/eap/catalogs/bbg/fields/?q=pxLastPostSession'
url9 = '/eap/catalogs/bbg/fields/pxLast/'
url10 = '/eap/catalogs/bbg/fields/?q=close+price'

The above are some sample URLs that can be queried. Any of the variables above can be used:

In [None]:
response = session.get(HOST+url10) # replace url10 with any of the URL suffixes above
response.json()

### Section 4 - Download Dataset

In [None]:
from beap_lib.beap_auth import download

download?

In [None]:
url = base_url + 'datasets/equityAsia2/snapshots/latest/distributions/equityAsia2.csv'

download(session, url, './equityAsia2.csv')

#### 4.1 Downloading a specific file from a given dataset (obsolete)

The code below downloads a given file from a specific URL and saves it locally.

In [None]:
from datetime import datetime

def download_distribution(session, url, output_file, chunk_size=8192,
                          stream=True, headers=None):
    """
    Download a single distribution and save it to 'output_file'.

    By default this function sends 'Accept-Encoding: Identity', i.e. it asks
    for the uncompressed file. Pass headers={'Accept-Encoding': 'gzip'} to
    request the gzipped file instead; the bytes are written as received
    (decode_content=False), without decompression.

    Set 'chunk_size' to a larger byte size to speed up the download process
    on larger downloads.
    """
    if headers is None:
        headers = {'Accept-Encoding': 'Identity'}
    print(datetime.now(), 'Start downloading:', url)
    response = session.get(url, stream=stream, headers=headers)
    with open(output_file, 'wb') as out_file:
        for chunk in response.raw.stream(chunk_size, decode_content=False):
            out_file.write(chunk)
    response.close()
    print(datetime.now(), 'File saved into:', output_file)
    return response

In [None]:
# Download a distribution to a specified location
# Replace below: 'equitynamr_distributions_path' with the distribution you want to
# download and 'output_file' with the location and filename to save it under


equitynamr_distributions_path = '/eap/catalogs/bbg/datasets/equityNamr/snapshots/latest/distributions/equityNamrSample.csv'
sample_url = HOST+equitynamr_distributions_path

output_file = "equityNamrSample.csv"
response = download_distribution(session, sample_url, output_file)
print("Status: {s}".format(s=response.status_code))
# .get() avoids a KeyError when the server omits one of these headers
print("Content-Encoding: {h}".format(h=response.headers.get('Content-Encoding')))
print("Content-Length: {b} bytes".format(b=response.headers.get('Content-Length')))
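
If a file is requested with `Accept-Encoding: gzip` and saved exactly as received, it has to be decompressed before it can be read as CSV. A minimal sketch with the standard library; `gunzip_file` is a hypothetical helper, and the small temporary file stands in for a real download.

```python
import gzip
import os
import shutil
import tempfile

def gunzip_file(gz_path, out_path):
    """Decompress a gzip file to out_path without loading it all into memory."""
    with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return out_path

# Create a tiny gzip file to stand in for a downloaded distribution
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, 'sample.csv.gz')
with gzip.open(gz_path, 'wb') as f:
    f.write(b'ID,PX_LAST\nA,1.0\n')

csv_path = gunzip_file(gz_path, os.path.join(tmp_dir, 'sample.csv'))
with open(csv_path, 'rb') as f:
    content = f.read()
print(content)
```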

#### 4.2 Explore all historical snapshots of a dataset

In [None]:
# Append '?page=N' to the URL to see a different page of results
equitynamr_path = '/eap/catalogs/bbg/datasets/equityNamr/snapshots/'
url = HOST+equitynamr_path
print("GET URL:", url)
response = session.get(url)

# Grab the total page count
page_count = response.json()['pageCount']
print("Page count:", page_count)

# Generate a list of snapshots
snapshots = [item['identifier'] for item in response.json()['contains']]
snapshots
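
Once the identifiers are in a list, picking the most recent snapshot is a one-liner. The sketch below assumes that snapshot identifiers are dates in `YYYYMMDD` form, as in the `20200505` example in Section 3; `latest_snapshot` and `sample_ids` are illustrative names with made-up values.

```python
from datetime import datetime

def latest_snapshot(identifiers):
    """Return the most recent identifier, assuming the YYYYMMDD format."""
    return max(identifiers, key=lambda s: datetime.strptime(s, '%Y%m%d'))

# Made-up identifiers for illustration
sample_ids = ['20200504', '20200505', '20191231']
print(latest_snapshot(sample_ids))
```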

### Section 5 - SSE Notification

#### 5.1 Listen on SSE endpoint
- https://api.bloomberg.com/eap/notifications/sse
- For demo purposes, all notifications are stored in the in-memory list 'sseList'
- https://service.blpprofessional.com/track_download/assets/data-license/#2-9-push-notifications

In [None]:
from datetime import datetime
from threading import Thread

from sseclient import SSEClient

sseThread = None

# Worker function: listens on the SSE endpoint and appends every new
# notification to the global list 'sseList'
def sse_run():
    global sseHB
    global sseList

    # Create an empty notification list and reset the heartbeat counter
    sseList = []
    sseHB = 0

    # Initiate SSE client
    sse_url = HOST+'/eap/notifications/sse'
    print(datetime.now(), 'Start listening for notifications:', sse_url)
    sse_client = SSEClient(sse_url, session)
    print(datetime.now(), 'SSE connected')

    while True:
        event = sse_client.read_event()
        if event.is_heartbeat():
            # A heartbeat is sent once every 10 seconds;
            # only count it, there is nothing to store
            sseHB += 1
        else:
            # If not a heartbeat, append the event to the notification list
            sseList.append(event.data)

# Create a worker thread and run it in the background
def start_sse():
    global sseThread
    if sseThread is not None:
        print('SSE thread already started')
        return
    sseThread = Thread(target=sse_run)
    sseThread.start()

start_sse()
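
The listener loop above runs until the kernel is stopped. One possible way to make such a worker stoppable is a `threading.Event` flag, sketched below with a fake event source so it runs without a connection. All names (`listen`, `fake_read_event`, `incoming`, `received`) are hypothetical and not part of the Bloomberg sample code.

```python
import queue
import threading
import time

def listen(read_event, notifications, stop_event):
    """Append events to 'notifications' until stop_event is set."""
    while not stop_event.is_set():
        event = read_event()
        if event is not None:  # None stands in for a heartbeat here
            notifications.append(event)

incoming = queue.Queue()

def fake_read_event():
    """Stand-in for a blocking read: waits briefly, returns None if idle."""
    try:
        return incoming.get(timeout=0.1)
    except queue.Empty:
        return None

received = []
stop = threading.Event()
worker = threading.Thread(target=listen, args=(fake_read_event, received, stop),
                          daemon=True)
worker.start()

incoming.put('first')
incoming.put('second')

# Wait (at most 2 seconds) until both events were picked up, then stop
deadline = time.time() + 2
while len(received) < 2 and time.time() < deadline:
    time.sleep(0.01)
stop.set()
worker.join(timeout=2)
print(received, worker.is_alive())
```

With a real blocking read, the flag is only checked after the next event arrives; since a heartbeat is sent every 10 seconds, the worker would stop within roughly that time.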

#### 5.2 Printing the notifications received so far
- Even if there are no notifications, a heartbeat is still sent every 10 seconds

In [None]:
print('Total notifications:', len(sseList), '\tTotal heartbeats:', sseHB)
print('The first 10 notifications:')
for i in range(min(len(sseList), 10)):
    sse=json.loads(sseList[i])
    print(i, sse['generated']['@id'])
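
Each notification is a JSON string, so the download URL and the digest can be extracted in one step. The sketch below relies only on the `generated`, `@id`, `digest` and `digestValue` keys used elsewhere in this section; `parse_notification` is a hypothetical helper and the sample payload is made up.

```python
import json

def parse_notification(raw):
    """Return (download URL, digest value or None) from one notification string."""
    event = json.loads(raw)
    generated = event['generated']
    digest = generated.get('digest', {}).get('digestValue')
    return generated['@id'], digest

# Made-up payload with the same keys as the notifications above
sample_event = json.dumps({
    'generated': {
        '@id': 'https://example.invalid/distributions/file.csv',
        'digest': {'digestValue': 'abc123'},
    }
})

event_url, event_digest = parse_notification(sample_event)
print(event_url)
print(event_digest)
```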

#### 5.3 Download file according to notification event

In [None]:
import os.path

def download_by_sse(sseInput):
    if os.path.isfile(sseInput):
        # load from file
        with open(sseInput, 'r') as f:
            sse = json.load(f)
    else:
        # load from string
        sse = json.loads(sseInput)
    url = sse['generated']['@id']

    download_distribution(session, url, 'output.dat')


download_by_sse('sample_sse_bulk.json')

#### 5.4 Validate file integrity by comparing digest value

In [None]:
import hashlib 

with open('sample_sse_bulk.json', 'r') as f:
    sse = json.load(f)
print("SSE digest :", sse['generated']['digest']['digestValue'])

with open('output.dat', 'rb') as f:
    fileDigest=hashlib.sha512(f.read()).hexdigest()
    print('File digest:', fileDigest)
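
For large files, the digest can be computed in chunks instead of reading the whole file into memory. A minimal sketch, assuming a hex-encoded SHA-512 digest as in the cell above; `file_sha512` is a hypothetical helper and the temporary file stands in for a downloaded distribution.

```python
import hashlib
import os
import tempfile

def file_sha512(path, chunk_size=1024 * 1024):
    """Return the hex SHA-512 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha512()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Small temporary file used instead of a real download
data = b'ID,PX_LAST\nA,1.0\n' * 1000
tmp_path = os.path.join(tempfile.mkdtemp(), 'output.dat')
with open(tmp_path, 'wb') as f:
    f.write(data)

chunked = file_sha512(tmp_path, chunk_size=4096)
print(chunked == hashlib.sha512(data).hexdigest())
```

The result can then be compared with `sse['generated']['digest']['digestValue']`; a mismatch suggests the download is incomplete or corrupted.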