<a href="https://colab.research.google.com/github/pradh/api-python/blob/svg/notebooks/Peer_SV_Finder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Use this notebook to list the peers of a given statistical variable (SV) along a specific constraint property. Peer SVs differ only in the value of that one constraint; everything else (`populationType`, `measuredProperty`, `statType`, the remaining constraints, etc.) is the same.
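
The peer relation described above can be sketched on a few toy records. This is a minimal illustration, not the notebook's implementation: the SV ids, property names, and the helpers `masked_key`, `build_peer_groups`, and `find_peers` are all hypothetical, and the key here leaves out the measurement qualifier and denominator that the setup cell also includes. The idea is to build a string key from the SV definition with one constraint value replaced by `_`; SVs that produce the same key are peers along that constraint.

```python
from collections import defaultdict

# Hypothetical toy SVs: id -> (populationType, measuredProperty, statType, constraints).
TOY_SVS = {
    'Toy_Female_Employed': ('Person', 'count', 'measuredValue',
                            {'gender': 'Female', 'employment': 'Employed'}),
    'Toy_Male_Employed': ('Person', 'count', 'measuredValue',
                          {'gender': 'Male', 'employment': 'Employed'}),
    'Toy_Female_Unemployed': ('Person', 'count', 'measuredValue',
                              {'gender': 'Female', 'employment': 'Unemployed'}),
    'Toy_Median_Age_Female': ('Person', 'age', 'medianValue',
                              {'gender': 'Female'}),
}


def masked_key(pop_type, mprop, stat_type, constraints, masked_prop):
    """Joins the SV definition into a string, hiding the value of `masked_prop`."""
    parts = [pop_type, mprop, stat_type]
    # Sort the constraint properties so the key does not depend on dict order.
    for prop in sorted(constraints):
        parts.append(prop)
        parts.append('_' if prop == masked_prop else constraints[prop])
    return ';'.join(parts)


def build_peer_groups(svs):
    """Maps each masked key to the set of SV ids that share it."""
    groups = defaultdict(set)
    for sv_id, (pop_type, mprop, stat_type, constraints) in svs.items():
        # One key per constraint: each SV joins one peer group per property.
        for prop in constraints:
            key = masked_key(pop_type, mprop, stat_type, constraints, prop)
            groups[key].add(sv_id)
    return groups


def find_peers(svs, sv_id, prop):
    """Returns the sorted peers of `sv_id` along `prop` (including itself)."""
    pop_type, mprop, stat_type, constraints = svs[sv_id]
    if prop not in constraints:
        return []
    key = masked_key(pop_type, mprop, stat_type, constraints, prop)
    return sorted(build_peer_groups(svs)[key])


print(find_peers(TOY_SVS, 'Toy_Female_Employed', 'gender'))
print(find_peers(TOY_SVS, 'Toy_Female_Employed', 'employment'))
```

Along `gender`, the toy SV `Toy_Female_Employed` is grouped with `Toy_Male_Employed`; along `employment`, it is grouped with `Toy_Female_Unemployed`. An SV is always listed among its own peers, which matches the sample output at the end of this notebook.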

In [90]:
# @title Setup (Run me first once)
import pandas as pd

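def _get_key(row, i):
  # Builds the peer-group key for `row`: all defining properties joined by ';',
  # with the value of the i-th (1-based) constraint replaced by '_'.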
  nc = int(row['num_constraints'])
  key_list = [
      row['population_type'], row['measured_prop'], row['stat_type'],
      row['measurement_qualifier'], row['measurement_denominator']
  ]
  for j in range(1, nc + 1):
    cp = 'p' + str(j)
    cv = 'v' + str(j)
    key_list.append(row[cp])
    if i == j:
      key_list.append('_')
    else:
      key_list.append(row[cv])
  return ';'.join(key_list)


def load_df():
  df = pd.read_csv('https://media.githubusercontent.com/media/pradh/website/nltopics1/tools/nl/analysis/statvar/data/sv_schemaful.csv',
                   low_memory=False)
  df = df.fillna('')
  df = df.set_index('id')
  return df


def load_map(df):
  sv_map = {}
  for sv, row in df.iterrows():
    nc = int(row['num_constraints'])
    for i in range(1, nc + 1):
      # Register the SV under one key per masked constraint.
      sv_map.setdefault(_get_key(row, i), set()).add(sv)
  return sv_map


def get_def(id):
  try:
    row = df.loc[id]
    nc = int(row['num_constraints'])
    res = []
    for i in range(1, nc + 1):
      cp = 'p' + str(i)
      cv = 'v' + str(i)
      res.append(f'{row[cp]} ({row[cv]})')
    return '\n'.join(res)
  except KeyError as e:
    print(f'ERROR: {e} not found!')
    return ''


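def get_peers(id, cprop):
  # Returns the newline-separated peers of SV `id` along constraint `cprop`.
  try:
    row = df.loc[id]
  except KeyError as e:
    print(f'ERROR: {e} not found!')
    return ''
  nc = int(row['num_constraints'])
  # Find the 1-based position of `cprop` among this SV's constraints.
  idx = -1
  for i in range(1, nc + 1):
    cp = 'p' + str(i)
    if row[cp] == cprop:
      idx = i
      break
  if idx == -1:
    print(f'ERROR: {cprop} is not a constraint property of {id}!')
    return ''
  key = _get_key(row, idx)
  return '\n'.join(sorted(sv_map[key]))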


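def get_peer_svs(id, cprop):
  # Assumed intent: return the peers as a list instead of a newline-joined string.
  peers = get_peers(id, cprop)
  return peers.split('\n') if peers else []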

## MAIN ##
df = load_df()
sv_map = load_map(df)

In [82]:
# @title Enter a Variable DCID

STAT_VAR = 'Count_Person_MarginalWorker_AgriculturalLabourers_Worked3To6Months' #@param {type:"string"}

print('CONSTRAINT_PROPS (VALS)')
print(get_def(STAT_VAR))

CONSTRAINT_PROPS (VALS)
workCategory (AgriculturalLabourers)
workPeriod (Month3To6)
workerClassification (MarginalWorker)
workerStatus (Worker)


In [87]:
# @title Enter the constraint property for peers

CONSTRAINT_PROP = 'workCategory' #@param {type:"string"}

print('PEERS')
print(get_peers(STAT_VAR, CONSTRAINT_PROP))

PEERS
Count_Person_MarginalWorker_AgriculturalLabourers_Worked3To6Months
Count_Person_MarginalWorker_Cultivators_Worked3To6Months
Count_Person_MarginalWorker_HouseholdIndustries_Worked3To6Months
Count_Person_MarginalWorker_OtherWorkers_Worked3To6Months
