<a href="https://colab.research.google.com/github/pradh/api-python/blob/svg/notebooks/Peer_SV_Finder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Use this notebook to list the peers of a given statistical variable (SV) along a specific constraint property. Peer SVs differ only in the value of that one constraint; everything else (`populationType`, `measuredProperty`, `statType`, etc.) is the same.
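
To make the idea of peers concrete, here is a minimal sketch that groups a few made-up variables by a key in which one constraint value is replaced with a wildcard. The IDs (`sv_a`, `sv_b`, `sv_c`), the value `Years65Onwards`, and the helpers `peer_key` and `group_peers` are hypothetical and exist only for illustration; the setup cell below works on the real CSV instead.

```python
from collections import defaultdict

# Hypothetical toy definitions: id -> (population_type, measured_prop, constraints)
TOY_SVS = {
    'sv_a': ('Person', 'count', {'age': 'Years16Onwards', 'commuteTime': 'MinuteUpto10'}),
    'sv_b': ('Person', 'count', {'age': 'Years16Onwards', 'commuteTime': 'Minute10To14'}),
    'sv_c': ('Person', 'count', {'age': 'Years65Onwards', 'commuteTime': 'MinuteUpto10'}),
}


def peer_key(sv_def, cprop):
    """Build a grouping key with the value of `cprop` replaced by '_'."""
    pop_type, mprop, constraints = sv_def
    parts = [pop_type, mprop]
    for prop in sorted(constraints):
        parts.append(prop)
        parts.append('_' if prop == cprop else constraints[prop])
    return ';'.join(parts)


def group_peers(svs, cprop):
    """Map each wildcard key to {sv_id: value of `cprop`}."""
    groups = defaultdict(dict)
    for sv_id, sv_def in svs.items():
        if cprop in sv_def[2]:
            groups[peer_key(sv_def, cprop)][sv_id] = sv_def[2][cprop]
    return dict(groups)


commute_peers = group_peers(TOY_SVS, 'commuteTime')
for key, members in commute_peers.items():
    print(key, members)
```

In this toy data `sv_a` and `sv_b` land in the same group because they differ only in `commuteTime`, while `sv_c` forms its own group because its `age` value also differs.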

In [13]:
# @title Setup (Run me first once)
import pandas as pd

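def _get_key(row, i):
  # Builds the peer-group key for `row`: the SV definition with the value of
  # the i-th constraint replaced by '_', so that SVs differing only in that
  # value share the same key.
  nc = int(row['num_constraints'])
  key_list = [
      row['population_type'], row['measured_prop'], row['stat_type'],
      row['measurement_qualifier'], row['measurement_denominator']
  ]
  for j in range(1, nc + 1):
    cp = 'p' + str(j)
    cv = 'v' + str(j)
    key_list.append(row[cp])
    if i == j:
      key_list.append('_')
    else:
      key_list.append(row[cv])
  # str() guards against non-string cells (e.g. numeric columns) in the CSV.
  return ';'.join(str(k) for k in key_list)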


def load_df():
  df = pd.read_csv('https://media.githubusercontent.com/media/pradh/website/nltopics1/tools/nl/analysis/statvar/data/sv_schemaful.csv',
                   low_memory=False)
  df = df.fillna('')
  df = df.set_index('id')
  return df


def load_map(df):
  # Maps each wildcard key to {SV DCID: value of the wildcarded constraint}.
  sv_map = {}
  for sv, row in df.iterrows():
    nc = int(row['num_constraints'])
    for i in range(1, nc + 1):
      key = _get_key(row, i)
      sv_map.setdefault(key, {})[sv] = row['v' + str(i)]
  return sv_map


def get_def(id):
  try:
    row = df.loc[id]
    nc = int(row['num_constraints'])
    res = []
    for i in range(1, nc + 1):
      cp = 'p' + str(i)
      cv = 'v' + str(i)
      res.append(f'{row[cp]} ({row[cv]})')
    return '\n'.join(res)
  except KeyError as e:
    print(f'ERROR: {e} not found!')
    return ''


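def get_peers(id, cprop):
  # Returns one 'DCID (value)' line per peer of `id` along `cprop`,
  # sorted by DCID. Returns '' if `id` or `cprop` is not found.
  try:
    row = df.loc[id]
  except KeyError as e:
    print(f'ERROR: {e} not found!')
    return ''
  nc = int(row['num_constraints'])
  idx = -1
  for i in range(1, nc + 1):
    cp = 'p' + str(i)
    if row[cp] == cprop:
      idx = i
      break
  if idx == -1:
    # `cprop` is not a constraint property of this SV.
    return ''
  key = _get_key(row, idx)
  return '\n'.join([f'{k} ({sv_map[key][k]})' for k in sorted(sv_map[key])])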


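def get_peer_svs(id, cprop):
  # Returns the peer DCIDs as a list (assumed intent): strips the
  # ' (value)' suffix from each line produced by get_peers.
  peers = get_peers(id, cprop)
  return [line.split(' ')[0] for line in peers.splitlines()]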

## MAIN ##
df = load_df()
sv_map = load_map(df)

In [16]:
# @title Enter a Variable DCID

STAT_VAR = 'dc/jk6hj15v39b0d' #@param {type:"string"}

print('CONSTRAINT PROPS (VALS)')
print(get_def(STAT_VAR))

CONSTRAINT PROPS (VALS)
age (Years16Onwards)
commuteTime (MinuteUpto10)
employment (USC_EmployedAndWorking)
employmentStatus (BLS_InLaborForce)
placeOfWork (OutsideOfHome)


In [17]:
# @title Enter the constraint property for peers

CONSTRAINT_PROP = 'commuteTime' #@param {type:"string"}

print('PEER STAT VARS (VALS)')
print(get_peers(STAT_VAR, CONSTRAINT_PROP))

PEER STAT VARS (VALS)
dc/0mk1gv9me1cn (Minute20To24)
dc/2r6wbwvbkrz23 (Minute60Onwards)
dc/ddyrfe05v5347 (Minute15To19)
dc/e0zlyvx7twnr2 (Minute30To34)
dc/hntm9bl6g1gn5 (Minute35To44)
dc/jk6hj15v39b0d (MinuteUpto10)
dc/qkm5qdj8rqf5h (Minute45To59)
dc/xejg6s76dxle2 (Minute25To29)
dc/z00r6r95rb066 (Minute10To14)
