<a href="https://colab.research.google.com/github/mohit-sentieo/sentieo-public-api-examples/blob/master/Sentieo_Search_Public_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Search API Public Demo 
In this Colab notebook, we show you how to use the Search API. First we cover the basic features. Then we provide specific examples for EDGAR filings and notes to demonstrate more advanced features. Finally, we demonstrate some common errors and how to handle them.

For each block of code, read the text above the block and then run the code. This first block of code sets up your notebook environment. Go ahead and run it.

In [0]:
!pip install ipywidgets 
!jupyter nbextension enable --py widgetsnbextension
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

### API CODE

This is the code that makes all the requests to the Sentieo API. It is structured as a class with one method per API call. It does not cover every API, only those required for this use case. Feel free to copy this class and the `SearchParams` model directly into your code, or to contribute here by adding more APIs.

Please fill in your own API key and email at the top, then run the block.

In [0]:
from typing import NamedTuple, List, Mapping
from datetime import date
import requests, json
from enum import Enum
import time

APIKEY = 'YJgvtmHEWY39vxITA2fDW8cdrH6s4Epz9bYvlACE'
BASE_URL = 'https://api.sentieo.com/v1'
EMAIL = 'your@email.com'
default_headers = {
                  'x-api-key': APIKEY
                }

subtypes_map = json.loads(requests.request('GET','https://raw.githubusercontent.com/mohit-sentieo/sentieo-public-api-examples/master/subtypes.json').text)

class DocTypeEnum(Enum):
  EDGARFILINGS = 'ef'
  PRESENTATIONS = 'ppt'
  PRESS_RELEASES = 'ni'
  NEWS = 'nw'
  SD = 'sd'
  GLOBAL_FILINGS = 'gbf'
  BROKER_RESEARCH = 'rr'
  TRANSCRIPTS = 'tt'
  NOTES = 'note'

  @classmethod
  def get_type_names(cls):
    return [ m.name for m in cls.__members__.values() ]
  
  def get_subtypes(self):  
    return [ v for va in subtypes_map[self.value].values() for v in va  ]
    


class SearchParams(NamedTuple):
  size:int = 20
  start:int = 0
  tickers:List[str] = []
  use_synonyms:bool = False
  query:str = ''
  date_from:date = None
  date_to:date = None
  doc_type:List[DocTypeEnum] = []
  doc_type_filters:Mapping[DocTypeEnum,Mapping[str,List[str]]] = {}
  sectors:List[str] = []
  subsectors:Mapping[str,List[str]] = {}


class SentieoAPI:
  def __init__(self, debug=True):
    self.debug = debug

  def throttled_request(self, *args, **kwargs):
    response = requests.request(*args, **kwargs)
    if response.status_code == 429:
      time.sleep(0.2)
      response = requests.request(*args, **kwargs)
    return response

  def _make_post_api_call(self, url, payload=None, headers=None):
    payload = {} if payload is None else payload
    headers = { 'Content-Type': 'application/json' } if headers is None else headers
    headers.update(default_headers)
    response = self.throttled_request("POST", BASE_URL + url, headers=headers, data = json.dumps(payload))
    if self.debug:
      print(url, response.status_code)
    return response.status_code, response.json()
  
  def _make_get_api_call(self, url, params=None, headers=None):
    params = {} if params is None else params
    headers = { } if headers is None else headers
    headers.update(default_headers)
    response = self.throttled_request("GET", BASE_URL + url, headers=headers, params = params)
    if self.debug:
      print(url, response.status_code)
    return response.status_code, response.text

  def fetch_docs_from_search(self, filters:SearchParams=None):
    # Fall back to the default search parameters when none are passed in
    filters = (filters if filters is not None else SearchParams())._asdict()
    if filters['date_from'] is not None:
      filters['date_range_from'] = filters['date_from'].strftime("%d-%b-%Y")
      del filters['date_from']
    
    if filters['date_to'] is not None:
      filters['date_range_to'] = filters['date_to'].strftime("%d-%b-%Y")
      del filters['date_to']

    doc_type_list = []
    for dt in filters['doc_type']:
      if dt in filters['doc_type_filters']:
        doc_type_list.append({'name': dt.value, **filters['doc_type_filters'][dt]  })
      else:
        doc_type_list.append(dt.value)

    filters['doc_type'] = doc_type_list
    del filters['doc_type_filters']

    sector_list = []
    for sc in filters['sectors']:
      if sc in filters['subsectors']:
        sector_list.append({'name': sc, 'subsectors': filters['subsectors'][sc]  })
      else:
        sector_list.append(sc)
    del filters['subsectors']
    filters['sectors'] =  sector_list

    if self.debug:
      print(filters)
    status, result = self._make_post_api_call('/documents/search', filters)
    return result
  
  def fetch_doc_content(self, doc_id):
    status, result = self._make_get_api_call('/documents/get', { 'id': doc_id })
    return result
  
  def fetch_doc_hits(self, doc_id, query, use_synonyms='false'):
    status, result = self._make_get_api_call('/documents/hits', {'query': query, 'doc_id': doc_id, 'use_synonyms': use_synonyms })
    return json.loads(result)

sAPI = SentieoAPI()
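
To see what `fetch_docs_from_search` sends in the `doc_type` field, here is a minimal standalone sketch of the same merging step. It uses plain strings in place of `DocTypeEnum` members, and `build_doc_type_list` is a hypothetical helper name used only for illustration; it is not part of the Sentieo API.

```python
from typing import Dict, List, Mapping, Union


def build_doc_type_list(
    doc_types: List[str],
    doc_type_filters: Mapping[str, Mapping[str, List[str]]],
) -> List[Union[str, Dict[str, object]]]:
    """Merge per-type filters into the doc_type list, mirroring the loop in the class above."""
    merged = []
    for dt in doc_types:
        if dt in doc_type_filters:
            # Types with filters become objects: {'name': <type>, <filter name>: [...]}
            merged.append({'name': dt, **doc_type_filters[dt]})
        else:
            # Types without filters stay as bare strings
            merged.append(dt)
    return merged


payload_doc_types = build_doc_type_list(
    ['ef', 'tt'],
    {'ef': {'subtypes': ['10-k', '10-q']}},
)
print(payload_doc_types)
```

A type that has filters is sent as an object, while a type without filters is sent as its short code.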

### UI CODE
This code is only for demo purposes. It handles the UI which is rendered below.

In a typical scenario, you will have your own UI built in JavaScript that sends requests to your server. Your server will then communicate with the Sentieo API. Since we cannot do that in a notebook, we have created a very simple UI in Python itself. You might want to look at the `submit_callback` function, which takes values from the UI and calls the Sentieo API class.

Run this code and the UI will render. You can then interact with the UI. To refresh the UI, run this code again.

In [0]:
tickers_ui = widgets.Text(
    value='msft, aapl',
    placeholder='Tickers separated by commas',
    description='Tickers',
    disabled=False
)
start_ui = widgets.BoundedIntText(
    value=10,
    min=0,
    max=10,
    step=1,
    description='Skip docs',
    disabled=False
)
size_ui = widgets.BoundedIntText(
    value=10,
    min=0,
    max=2000,
    step=1,
    description='Total Docs',
    disabled=False
)
synonyms_ui = widgets.Checkbox(
    value=False,
    description='USE SYNONYMS',
    disabled=False,
    indent=False
)
sort_ui = widgets.Dropdown(
    options=['filing_date:desc' ,'filing_date:asc', 'score:desc', 'score:asc'],
    value='filing_date:desc',
    description='Order By',
    disabled=False,
)
query_ui = widgets.Text(
    value='',
    placeholder='in:title sales',
    description='Query',
    disabled=False
)
doc_type_ui = widgets.SelectMultiple(
    options=DocTypeEnum.get_type_names(),
    #rows=10,
    description='document types',
    disabled=False
)
doc_subtype_ui = widgets.SelectMultiple(
    options=[],
    description='Subtypes',
    disabled=True
)
date_to_ui = widgets.DatePicker(
    description='End Date',
    disabled=False
)
date_from_ui = widgets.DatePicker(
    description='Start Date',
    disabled=False
)

submit_button = widgets.Button(
    description='Fetch Documents',
    disabled=False,
    button_style='info', # 'success', 'info', 'warning', 'danger' or ''
)

doc_id_ui = widgets.Text(
    value='',
    placeholder='4ejak2njfi',
    continuous_update=False,
    description='Enter a Doc Id to load',
    disabled=False,
    layout={"width": "300px", "margin": "50px"}
)


output_html = widgets.Output(layout={'border': '1px solid black'})

document_html = widgets.Output(layout={'border': '1px solid black'})
document_html.layout.height = '500px'

def doc_id_handler(change):
  if change['name'] == 'value' and len(change['new']) > 23 :
    document_html.clear_output(wait=True)
    doc_html = sAPI.fetch_doc_content(change['new'])
    doc_html = '<div style="background:white;color:black;">' + doc_html + '</div>'
    with document_html:
      display(widgets.HTML(value=doc_html.replace("class=\"t ", "class=\" ")))

def doc_subtype_handler(change):
  if change['name'] == 'value' and len(change['new']) > 0:
    doc_subtype_ui.options = [ st['label'] + ' : ' + str(st['value']) for st in DocTypeEnum[change['new'][0]].get_subtypes() ]
    doc_subtype_ui.description = change['new'][0]
    doc_subtype_ui.disabled = False
    

def build_form_ui():
  date_layout = widgets.HBox([date_from_ui, date_to_ui])
  text_layout = widgets.HBox([query_ui, tickers_ui])
  synonyms_layout = widgets.HBox([synonyms_ui, sort_ui])
  size_layout = widgets.HBox([start_ui, size_ui])
  doc_type_layout = widgets.HBox([doc_type_ui,doc_subtype_ui])
  doc_type_ui.observe(doc_subtype_handler)
  display(widgets.VBox([text_layout, date_layout, synonyms_layout, size_layout, doc_type_layout, submit_button]))

def doc_to_html(doc):
  return "<div style=\"border: 2px solid #888; padding: 5px; margin:5px;max-width: 300px\"><h4>{title}</h4><p>#{doc_id}</p><p> {ticker}, {country} </p> <p>{doc_subtype}, {doc_type}</p></div>".format(**doc)

def html_separator():
  return "<br><br>"

def enclose_doc_list(html):
  return '<div style="width:300px;height:600px;overflow:scroll;margin:100px;background:white;">' + html + '</div>'

def submit_callback(event):
  output_html.clear_output(wait=True)
  doc_type = [ DocTypeEnum[dt] for dt in doc_type_ui.value ] 
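  # Split on commas and drop surrounding whitespace and empty entries ('msft, aapl' -> ['msft', 'aapl'])
  tickers = [ t.strip() for t in tickers_ui.value.split(",") if t.strip() ]
  searchparam = SearchParams(tickers=tickers, query=query_ui.value, start=start_ui.value, size=size_ui.value, date_from=date_from_ui.value, date_to=date_to_ui.value, doc_type=doc_type, use_synonyms=synonyms_ui.value)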
  docs = sAPI.fetch_docs_from_search(searchparam)
  doc_list_html = html_separator().join([ doc_to_html(doc) for doc in docs['result']['docs'] ])
  final_html = enclose_doc_list(doc_list_html)
  html_widget = widgets.HTML(
      value=final_html
  )
  html_widget.layout.height = "400px"
  html_widget.layout.background = "white"
  with output_html:
    display(html_widget)
  if len(docs['result']['docs']) > 0:
    doc_id_ui.value = docs['result']['docs'][0]['doc_id']

submit_button.on_click(submit_callback)
build_form_ui()
display(doc_id_ui)
doc_id_ui.observe(doc_id_handler)
display(widgets.HBox([ output_html, document_html ]))

### Note Search & Edgar Filings
This section covers some advanced use cases for EDGAR filings and notes.
We show how to request particular EDGAR filings and how to filter by note authors, categories, and so on.

- How to search for specific document types
- How to search by Note Authors, Note Topics, Note Origin and Note Type
- How to search by Broker Source and Research Report Style and Reasons
- How to search for subsectors

In [2]:
def search_for_10k_and_10q():
  doc_type_filters = { DocTypeEnum.EDGARFILINGS: { 'subtypes': ['10-k', '10-q' ]} }
  params = SearchParams(doc_type=[ DocTypeEnum.EDGARFILINGS ], doc_type_filters=doc_type_filters,size=100)
  result = sAPI.fetch_docs_from_search(params)
  return result

def search_for_notes():
  doc_type_filters = { DocTypeEnum.NOTES: { 'subtypes': ['typed', 'attachment'], 'authors': [ 'mohit.kumar' ], 'categories': [ 'General' ], 'tags': [ 'cs', 'sales'] } }
  params = SearchParams(doc_type=[ DocTypeEnum.NOTES ], doc_type_filters=doc_type_filters)
  result = sAPI.fetch_docs_from_search(params)
  return result

def search_broker_research():
  doc_type_filters = { DocTypeEnum.BROKER_RESEARCH: { 'ctbids': [ 'se_11097', 'se_11835' ],'reasons': ['rr_reasons_3'], 'styles': [ 'mohit.kumar' ], 'categories': [ 'General' ], 'tags': [ 'cs', 'sales'] } }
  params = SearchParams(doc_type=[ DocTypeEnum.BROKER_RESEARCH ], doc_type_filters=doc_type_filters)
  result = sAPI.fetch_docs_from_search(params)
  return result

def search_by_subsectors():
  subsectors = { 'Health Care': ['Hospital'] }
  params = SearchParams(sectors=[ 'Health Care' ], subsectors=subsectors, tickers=['msft'])
  result = sAPI.fetch_docs_from_search(params)
  return result

kandqdocs = search_for_10k_and_10q()

OrderedDict([('size', 100), ('start', 0), ('tickers', []), ('use_synonyms', False), ('query', ''), ('date_from', None), ('date_to', None), ('doc_type', [{'name': 'ef', 'subtypes': ['10-k', '10-q']}]), ('sectors', [])])
/documents/search 200


In [3]:
kandqdocs

{'response': {'msg': ['Success'], 'status': True},
 'result': {'doc_subtype': ['10-k', '10-q'],
  'doc_type': ['ef'],
  'docs': [{'company_name': 'ALTAIR INTERNATIONAL CORP.',
    'country': 'United States',
    'country_code': 'us',
    'doc_id': '5ec806f10972a46f2445769b',
    'doc_subtype': '10-k',
    'doc_type': 'ef',
    'filing_date': '2020-05-22T17:07:12',
    'is_pdf_doc': False,
    'sectors': ['Other'],
    'size': 19,
    'subsectors': ['Other'],
    'ticker': 'atao',
    'tickers': ['atao'],
    'title': '10-K FY 2019'},
   {'company_name': 'QuickLogic Corporation',
    'country': 'United States',
    'country_code': 'us',
    'doc_id': '5ec8044a0972a46f2cc4f35b',
    'doc_subtype': '10-q',
    'doc_type': 'ef',
    'filing_date': '2020-05-22T16:55:28',
    'is_pdf_doc': False,
    'sectors': ['Information Technology'],
    'size': 39,
    'subsectors': ['Semiconductors'],
    'ticker': 'quik',
    'tickers': ['quik', 'qkl:gr'],
    'title': '10-Q FY20 Q1'},
   {'company_n
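
As a small sketch of how a response shaped like the one above might be consumed, the snippet below pulls out the document IDs and counts documents per subtype. The sample data is trimmed from the output above. `summarize_docs` is a hypothetical helper, and treating a falsy `response.status` as a failure is an assumption based on the `'status': True` field shown in the output.

```python
from collections import Counter

# Trimmed sample shaped like the search response printed above.
sample_response = {
    'response': {'msg': ['Success'], 'status': True},
    'result': {
        'docs': [
            {'doc_id': '5ec806f10972a46f2445769b', 'doc_subtype': '10-k',
             'ticker': 'atao', 'title': '10-K FY 2019'},
            {'doc_id': '5ec8044a0972a46f2cc4f35b', 'doc_subtype': '10-q',
             'ticker': 'quik', 'title': '10-Q FY20 Q1'},
        ],
    },
}


def summarize_docs(response):
    """Return (list of doc ids, Counter of docs per subtype) for a search response."""
    # Assumption: a falsy 'status' means the search did not succeed.
    if not response.get('response', {}).get('status'):
        raise ValueError('search failed: {}'.format(response.get('response')))
    docs = response['result']['docs']
    doc_ids = [d['doc_id'] for d in docs]
    per_subtype = Counter(d['doc_subtype'] for d in docs)
    return doc_ids, per_subtype


doc_ids, per_subtype = summarize_docs(sample_response)
print(doc_ids)
print(dict(per_subtype))
```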

### Parallel Execution
Here is a very simple multithreaded implementation. This is only for demonstration or simple use cases. Do not use this in production.

In [0]:
import requests
from threading import Thread
from typing import List, Mapping
import sys
import queue

class ThreadRunner:
    def __init__(self, fn_to_run, default_params):
        self.fn_to_run = fn_to_run
        self.default_params = default_params
        self.q = queue.Queue()
        self.results = []

    def doWork(self):
        while True:
            params = self.q.get()
            try:
                params = {**self.default_params, **params}
                res = self.fn_to_run(**params)
                self.results.append(res)
            finally:
                # Always mark the task done so q.join() cannot hang if fn_to_run raises
                self.q.task_done()

    def start(self, params_list:List[dict]) -> List[dict]:
        for i in range(len(params_list)):
            t = Thread(target=self.doWork)
            t.daemon = True
            t.start()
        try:
            for params in params_list:
                self.q.put(params)
            self.q.join()
            return self.results
        except KeyboardInterrupt:
            sys.exit(1)
    
    def reset(self):
        self.results = []

BATCH_SIZE = 5
def make_batches(lst):
    return [ lst[ i: i + BATCH_SIZE ] for i in range(0, len(lst), BATCH_SIZE) ]

doc_ids = [ { 'doc_id': d['doc_id'] }  for d in kandqdocs['result']['docs'] ]
hit_results = []
for batch in make_batches(doc_ids):
  thread_runner = ThreadRunner(fn_to_run=sAPI.fetch_doc_hits, default_params={'query': 'sales'})
  batch_results = thread_runner.start(batch)
  hit_results.append(batch_results)
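
As an alternative to the hand-rolled `ThreadRunner`, the same fan-out can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a different technique from the class above, shown under the assumption that results should come back in input order. `run_in_parallel` and `fake_fetch_doc_hits` are hypothetical names; the fake function stands in for `sAPI.fetch_doc_hits` so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(fn, params_list, default_params=None, max_workers=5):
    """Call fn with default_params merged with each params dict; results keep input order."""
    default_params = default_params or {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, **{**default_params, **p}) for p in params_list]
        # result() re-raises any exception that occurred in the worker thread
        return [f.result() for f in futures]


# Stand-in for sAPI.fetch_doc_hits (hypothetical, no network call).
def fake_fetch_doc_hits(doc_id, query, use_synonyms='false'):
    return {'doc_id': doc_id, 'query': query}


results = run_in_parallel(
    fake_fetch_doc_hits,
    [{'doc_id': 'a'}, {'doc_id': 'b'}, {'doc_id': 'c'}],
    default_params={'query': 'sales'},
)
print(results)
```

Here `max_workers` plays the role of `BATCH_SIZE`: it caps how many requests are in flight at once, and the pool's threads are cleaned up when the `with` block exits.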

### Get all Docs between two dates

Here is the code to get all documents between two dates. It splits the time interval into one-week windows and then fetches all the docs for each window. This is done to improve pagination performance.


In [0]:
from datetime import timedelta
def get_all_docs_in_date_range(start_date, end_date, doctypes):
  curr_date = start_date
  total_results = []
  while curr_date < end_date:
    s = curr_date
    e = curr_date + timedelta(days=7)
    e = e if e <= end_date else end_date
    weekly_result = []
    total = None  # unknown until the first page comes back
    start = 0 
    size = 2000
    while total is None or start < total:
      params = SearchParams(doc_type=doctypes , size=size, start=start, date_from=s, date_to=e)
      result = sAPI.fetch_docs_from_search(params)
      # extend (not append) so weekly_result stays a flat list of docs
      weekly_result.extend(result['result']['docs'])
      total = result['result']['total']
      start = start + size
    total_results = total_results + weekly_result
    curr_date = e
  return total_results
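
The week-splitting step described above can be looked at in isolation. The sketch below is a minimal generator, under the assumption that a final, shorter window should be clipped to the end date; `weekly_windows` is a hypothetical helper name and makes no API calls.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def weekly_windows(start_date: date, end_date: date, days: int = 7) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (window_start, window_end) pairs that cover start_date..end_date."""
    curr = start_date
    while curr < end_date:
        # Clip the last window so it never runs past end_date
        window_end = min(curr + timedelta(days=days), end_date)
        yield curr, window_end
        curr = window_end


windows = list(weekly_windows(date(2020, 5, 1), date(2020, 5, 20)))
for s, e in windows:
    print(s, e)
```

Adjacent windows share a boundary date. Whether the API treats the end of a date range as inclusive is not stated here, so if duplicates show up at the boundaries, de-duplicating by `doc_id` is one option.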
  

### Error Handling

Here we describe common errors and how to handle them. Note that the HTTP 429 (too many requests) response is already handled by the `throttled_request` method above, which waits briefly and retries once.
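
If a single retry is not enough for your workload, one possible extension is to retry several times with an increasing delay. The sketch below is an illustration only: `call_with_backoff`, its parameters, and `FakeResponse` are hypothetical names, and the retry counts and delays are arbitrary choices, not values recommended by the API.

```python
import time


def call_with_backoff(make_request, max_retries=4, base_delay=0.2, sleep=time.sleep):
    """Retry make_request() while it returns HTTP 429, doubling the delay each time.

    make_request is any zero-argument callable that returns an object with a
    status_code attribute (for example a requests.Response).
    """
    response = make_request()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        sleep(base_delay * (2 ** attempt))  # 0.2s, 0.4s, 0.8s, ...
        response = make_request()
    return response


# Stand-in response object so the sketch runs offline.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


_statuses = iter([429, 429, 200])
delays = []  # records the requested delays instead of actually sleeping
final = call_with_backoff(lambda: FakeResponse(next(_statuses)), sleep=delays.append)
print(final.status_code, delays)
```

With a real request, the callable could be something like `lambda: requests.request("GET", url, headers=headers)`.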

### Admin Key Functionality

Within an organization, an admin key can be used to make requests on behalf of any user of that organization. First we show how to use the admin key. Then we cover an edge case that comes up often: searching for all notes in your organization.