# Logging and Reporting

## Table of contents
* [Parameters](#params)
* [Imports and setup](#imports)
* [Try every server](#every-server)
* [Report](#report)

<a class="anchor" id="params"></a>
## Parameters
The first code cell must contain parameters with string values for compatibility with Times Square.

See: https://rsp.lsst.io/v/usdfdev/guides/times-square/index.html

In [1]:
#Parameters
env = 'tucson'  # usdf-dev, tucson, slac, summit
record_limit = '9999'
response_timeout = '3.05'  # seconds, how long to wait for connection
read_timeout = '20'  # seconds

<a class="anchor" id="imports"></a>
## Imports and General Setup

In [2]:
import requests
from collections import defaultdict
import pandas as pd
from pprint import pp

In [3]:
limit = int(record_limit)
timeout = (float(response_timeout), float(read_timeout))

# Env list comes from the drop-down menu at the top of:
# https://rsp.lsst.io/v/usdfdev/guides/times-square/
envs = dict(
    #rubin_usdf_dev = '',
    #data_lsst_cloud = '',
    #usdf = '',
    #base_data_facility = '',
    summit = 'https://summit-lsp.lsst.codes',
    usdf_dev = 'https://usdf-rsp-dev.slac.stanford.edu',
    #rubin_idf_int = '',
    tucson = 'https://tucson-teststand.lsst.codes',
)
envs

{'summit': 'https://summit-lsp.lsst.codes',
 'usdf_dev': 'https://usdf-rsp-dev.slac.stanford.edu',
 'tucson': 'https://tucson-teststand.lsst.codes'}

<a class="anchor" id="every-server"></a>
## Try to access every Server, every Log in our list
We call the combination of a specific Server and a specific Log a "service".
This is a first look, so we do not try to retrieve a useful list of records.
Instead, we save a few pieces of data from each service. A more tailored web-service call is needed to get useful records. For each service, we save:
1. The number of records retrieved
1. The list of fields found in a record (we assume all records from a service have the same fields)
1. An example of 1-2 records.
1. The [Facets](https://en.wikipedia.org/wiki/Faceted_search) of the service for all service fields that are not explicitly excluded.
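
As a minimal sketch of item 4, the facets of a service can be thought of as the set of distinct values seen for each field that is not excluded. The helper name `compute_facets` and the sample records below are hypothetical and made up for illustration; they are not part of the notebook. Unlike the notebook code, this version does not assume every record has the same fields, and a field whose values are all lists is omitted entirely.

```python
from collections import defaultdict


def compute_facets(records, ignore_fields=frozenset()):
    """Map each non-ignored field to the set of distinct values seen in records."""
    facets = defaultdict(set)
    for rec in records:
        for field, value in rec.items():
            if field in ignore_fields:
                continue
            if isinstance(value, list):
                # Lists are unhashable and cannot go into a set; skip them.
                continue
            facets[field].add(value)
    return dict(facets)


# Made-up records, for illustration only.
sample = [
    {'id': 'a1', 'instrument': 'LATISS', 'level': 20, 'tags': []},
    {'id': 'b2', 'instrument': 'LSSTComCam', 'level': 10, 'tags': []},
    {'id': 'c3', 'instrument': 'LATISS', 'level': 20, 'tags': []},
]
result = compute_facets(sample, ignore_fields={'id'})
print(result)
```

Here `id` is excluded explicitly and `tags` drops out because its values are lists, leaving only `instrument` and `level` as facets.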

In [4]:
verbose=False
fields = defaultdict(set) # fields[(env,log)] = {field1, field2, ...}
examples = defaultdict(list) # examples[(env,log)] = [rec1, rec2]
results = defaultdict(dict) # results[(env,log)] = dict(server,url, ok, numfields, numrecs)
facets = defaultdict(dict) # facets[(env,log)][field] = {value-1, value-2, ...}

# Limitation: the same ignore set is used for all logs.
ignore_fields = set(['tags', 'urls', 'message_text', 'id', 'date_added', 
                     'obs_id', 'day_obs', 'seq_num', 'parent_id', 'user_id',
                     'date_invalidated', 'date_begin', 'date_end',
                     'time_lost', # float
                     #'systems','subsystems','cscs',  # values are lists, special handling
                    ])
for env,server in envs.items():
    ok = True
    try:
        recs = None
        log = 'exposurelog'
        #!url = f'{server}/{log}/messages?is_human=either&is_valid=either&offset=0&{limit=}'
        url = f'{server}/{log}/messages?is_human=either&is_valid=either&{limit=}'
        print(f'\nAttempt to get logs from {url=}')
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        recs = response.json()
        flds = set(recs[0].keys())
        if verbose:
            print(f'Number of {log} records: {len(recs):,}')
            print(f'Got {log} fields: {flds}')
            print(f'Example record: {recs[0]}')    
        fields[(env,log)] = flds
        examples[(env,log)] = recs[:2]  

        facflds = flds - ignore_fields
        # Skip values that are lists rather than single values.
        # Lists that show up only occasionally in a field are probably a bug in the data.
        facets[(env,log)] = {fld: set([str(r[fld])
                                       for r in recs if not isinstance(r[fld], list)]) 
                             for fld in facflds}
    except Exception as err:
        ok = False
        print(f'ERROR getting {log} from {env=} using {url=}: {err=}')
    numf = len(flds) if ok else 0
    numr = len(recs) if ok else 0
    results[(env,log)] = dict(ok=ok, server=server, url=url,numfields=numf, numrecs=numr)

    print()
    ok = True  # reset so an exposurelog failure does not mark narrativelog as failed
    try:
        recs = None
        log = 'narrativelog'
        #! url = f'{server}/{log}/messages?is_human=either&is_valid=true&offset=0&{limit=}'
        url = f'{server}/{log}/messages?is_human=either&is_valid=either&{limit=}'
        print(f'\nAttempt to get logs from {url=}')
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        recs = response.json()
        flds = set(recs[0].keys())
        if verbose:
            print(f'Number of {log} records: {len(recs):,}')
            print(f'Got {log} fields: {flds}')
            print(f'Example record: {recs[0]}')
        fields[(env,log)] = flds    
        examples[(env,log)] = recs[:2] 

        facflds = flds - ignore_fields
        # Skip values that are lists rather than single values.
        # Lists that show up only occasionally in a field are probably a bug in the data.
        # Values are kept raw (no str()) so bad facet values such as {'None', None} stay visible.
        facets[(env,log)] = {fld: set([r[fld] 
                                       for r in recs if not isinstance(r[fld], list)]) 
                             for fld in facflds}
    except Exception as err:
        ok = False
        print(f'ERROR getting {log} from {env=} using {url=}: {err=}')
    numf = len(flds) if ok else 0
    numr = len(recs) if ok else 0
    results[(env,log)] = dict(ok=ok, server=server, url=url,numfields=numf, numrecs=numr)


Attempt to get logs from url='https://summit-lsp.lsst.codes/exposurelog/messages?is_human=either&is_valid=either&limit=9999'
ERROR getting exposurelog from env='summit' using url='https://summit-lsp.lsst.codes/exposurelog/messages?is_human=either&is_valid=either&limit=9999': err=ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='summit-lsp.lsst.codes', port=443): Max retries exceeded with url: /exposurelog/messages?is_human=either&is_valid=either&limit=9999 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fd257930af0>, 'Connection to summit-lsp.lsst.codes timed out. (connect timeout=3.05)'))"))


Attempt to get logs from url='https://summit-lsp.lsst.codes/narrativelog/messages?is_human=either&is_valid=either&limit=9999'
ERROR getting narrativelog from env='summit' using url='https://summit-lsp.lsst.codes/narrativelog/messages?is_human=either&is_valid=either&limit=9999': err=ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='summit-lsp.lsst.code
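
The two near-identical `try` blocks in the cell above could be folded into one helper that tracks success per service. The following is only a sketch under stated assumptions: `summarize_service` and the `fetch` callable are hypothetical names, and the fake fetchers stand in for real HTTP calls so the example runs without network access.

```python
def summarize_service(server, log, fetch, limit=9999):
    """Query one server/log pair and return a summary dict.

    `fetch` is a hypothetical callable: it takes a URL and returns a list
    of record dicts, or raises an exception on failure.
    """
    url = f'{server}/{log}/messages?is_human=either&is_valid=either&limit={limit}'
    # Start from the failure state; only overwrite it after a full success.
    summary = dict(ok=False, server=server, url=url, numfields=0, numrecs=0)
    try:
        recs = fetch(url)
        # recs[0] raises IndexError for an empty list, which counts as a failure.
        summary.update(ok=True, numfields=len(recs[0]), numrecs=len(recs))
    except Exception as err:
        print(f'ERROR getting {log} from {server}: {err!r}')
    return summary


# Stand-ins for real HTTP calls, for illustration only.
def fake_fetch_ok(url):
    return [{'id': '1', 'level': 10}, {'id': '2', 'level': 20}]


def fake_fetch_fail(url):
    raise TimeoutError('simulated connection timeout')


results = {}
for log, fetch in [('exposurelog', fake_fetch_fail), ('narrativelog', fake_fetch_ok)]:
    results[('example', log)] = summarize_service('https://example.invalid', log, fetch)
print(results)
```

In the notebook, `fetch` could wrap the existing calls: `requests.get(url, timeout=timeout)`, then `raise_for_status()`, then `json()`. Because each call builds its own summary, a failure on one log cannot leak into the result for the next one.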

<a class="anchor" id="report"></a>
## Report
This report is mainly useful for developers, less so for astronomers.

<a class="anchor" id="ok_table"></a>
### Success/Failure table

In [5]:
show_columns = ['ok', 'server', 'numfields', 'numrecs']
df = pd.DataFrame(data=dict(results)).T.loc[:,show_columns]
print(f'Got results from {df["ok"].values.sum()} of {len(df)} env/logs')
df

Got results from 4 of 6 env/logs


| env | log | ok | server | numfields | numrecs |
|---|---|---|---|---|---|
| summit | exposurelog | False | https://summit-lsp.lsst.codes | 0 | 0 |
| summit | narrativelog | False | https://summit-lsp.lsst.codes | 0 | 0 |
| usdf_dev | exposurelog | True | https://usdf-rsp-dev.slac.stanford.edu | 18 | 7895 |
| usdf_dev | narrativelog | True | https://usdf-rsp-dev.slac.stanford.edu | 24 | 7597 |
| tucson | exposurelog | True | https://tucson-teststand.lsst.codes | 18 | 20 |
| tucson | narrativelog | True | https://tucson-teststand.lsst.codes | 24 | 110 |


<a class="anchor" id="field_names"></a>
### Field Names

In [6]:
print('Field names for each Environment/Log source:')
for (env,log),flds in fields.items():
    field_names = ', '.join(flds)
    print(f'\n{env}/{log}: {field_names}')
#!dict(fields)

Field names for each Environment/Log source:

usdf_dev/exposurelog: parent_id, instrument, obs_id, user_id, site_id, user_agent, is_human, date_added, message_text, level, tags, day_obs, exposure_flag, is_valid, seq_num, urls, date_invalidated, id

usdf_dev/narrativelog: systems, date_end, message_text, primary_hardware_components, user_agent, site_id, is_human, cscs, category, is_valid, date_invalidated, time_lost_type, tags, urls, time_lost, components, date_begin, parent_id, subsystems, user_id, date_added, level, primary_software_components, id

tucson/exposurelog: parent_id, instrument, obs_id, user_id, site_id, user_agent, is_human, date_added, message_text, level, tags, day_obs, exposure_flag, is_valid, seq_num, urls, date_invalidated, id

tucson/narrativelog: systems, date_end, message_text, primary_hardware_components, user_agent, site_id, is_human, cscs, category, is_valid, date_invalidated, time_lost_type, tags, urls, time_lost, components, date_begin, parent_id, subsystems,

<a class="anchor" id="facets"></a>
### Facets

In [7]:
dict(facets)
for (env,log),flds in facets.items():
    print(f'{env}/{log}:')
    for fld,vals in flds.items():
        print(f'  {fld}: \t{vals}')

usdf_dev/exposurelog:
  instrument: 	{'LSSTComCam', 'LATISS'}
  user_agent: 	{'notebook:nublado', 'LOVE'}
  is_human: 	{'True'}
  site_id: 	{'summit'}
  level: 	{'20', '10'}
  exposure_flag: 	{'junk', 'none', 'questionable'}
  is_valid: 	{'False', 'True'}
usdf_dev/narrativelog:
  subsystems: 	{None}
  systems: 	{None}
  user_agent: 	{'LOVE', 'string'}
  is_human: 	{True}
  site_id: 	{'summit'}
  cscs: 	{None}
  time_lost_type: 	{'weather', 'fault', None}
  level: 	{0, 100}
  primary_software_components: 	{None}
  primary_hardware_components: 	{None}
  category: 	{'None', 'SCIENCE', 'ENG', None}
  is_valid: 	{False, True}
  components: 	{None}
tucson/exposurelog:
  instrument: 	{'LSSTComCam', 'LATISS'}
  user_agent: 	{'LOVE'}
  is_human: 	{'True'}
  site_id: 	{'tucson'}
  level: 	{'10'}
  exposure_flag: 	{'junk', 'none', 'questionable'}
  is_valid: 	{'False', 'True'}
tucson/narrativelog:
  subsystems: 	{None}
  systems: 	{None}
  user_agent: 	{'LOVE', 'string'}
  is_human: 	{False, True
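
The facet listing above includes `category: {'None', 'SCIENCE', 'ENG', None}`, where a real `None` sits next to the string `'None'`. A small check along the following lines could flag such fields automatically. This is a sketch only: `suspicious_facets` is a hypothetical helper, and the example input copies three fields from the usdf_dev/narrativelog output shown above. The check is only meaningful for facets built from raw values, since converting values with `str()` turns every `None` into `'None'`.

```python
def suspicious_facets(facets):
    """Return fields whose value sets contain both None and the string 'None'."""
    return {
        field: values
        for field, values in facets.items()
        if None in values and 'None' in values
    }


# Values copied from the usdf_dev/narrativelog facets printed above.
example_facets = {
    'category': {'None', 'SCIENCE', 'ENG', None},
    'time_lost_type': {'weather', 'fault', None},
    'site_id': {'summit'},
}
flagged = suspicious_facets(example_facets)
print(sorted(flagged))
```

Only `category` is flagged here, because `time_lost_type` contains `None` but not the string `'None'`.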

<a class="anchor" id="examples"></a>
### Example Records

In [8]:
for (env,log),recs in examples.items():
    print(f'\n{env=}, {log=}: ')
    print('  Example records: ')
    pp(recs)


env='usdf_dev', log='exposurelog': 
  Example records: 
[{'id': '0005ebfa-1832-4e53-bc7a-a69254deef88',
  'site_id': 'summit',
  'obs_id': 'AT_O_20230118_000113',
  'instrument': 'LATISS',
  'day_obs': 20230118,
  'seq_num': 113,
  'message_text': ' ',
  'level': 20,
  'tags': [],
  'urls': [],
  'user_id': 'dsanmartim',
  'user_agent': 'notebook:nublado',
  'is_human': True,
  'is_valid': False,
  'exposure_flag': 'junk',
  'date_added': '2023-01-19T07:26:28.887152',
  'date_invalidated': '2023-01-19T07:32:32.435376',
  'parent_id': '18345e26-3c8c-4472-864f-68257ced64f4'},
 {'id': '0007f3e6-ac70-4cf3-8edd-70d4a311aa4e',
  'site_id': 'summit',
  'obs_id': 'AT_O_20230118_000215',
  'instrument': 'LATISS',
  'day_obs': 20230118,
  'seq_num': 215,
  'message_text': '',
  'level': 20,
  'tags': [],
  'urls': [],
  'user_id': 'dsanmartim',
  'user_agent': 'notebook:nublado',
  'is_human': True,
  'is_valid': False,
  'exposure_flag': 'none',
  'date_added': '2023-01-19T07:00:11.295541',
  