# PLAsTiCC v2 taxonomy

_Alex Malz (GCCL@RUB)_ (add your name here)

The purpose of this notebook is to outline a bitmask schema for hierarchical classes of LSST alerts.
The bitmask corresponds to a "best" classification to be included in the alert.
Each digit in the bitmask, however, corresponds to a vector of classification probabilities, confidence flags, or scores that can be used to subsample the alert stream.
Persistent features for the subsampled objects could then be queried from a separate database and used for further selection.

```python
from treelib import Node, Tree
import string
```

## Housekeeping

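We need to think about how to sort through the classification information.
`directory` (each parent class mapped to its children and their digit positions) and `index` (each class mapped to its code) are very simplistic starting points.
It'll be easier to improve on them once we have a better idea of what subsampling operations we'll perform.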

```python
directory = {}
index = {}
```

### Generating the integer codes

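The idea is that every level of the tree corresponds to one digit in the bitmask.
The number of classes at each branching sets the base of that level's digit, so each child's position fits in a single character (up to 62 children per node with the `digs` alphabet below).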
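A minimal sketch of the digit-per-level idea, using hypothetical helpers `encode_path` and `decode_path` that are not part of this notebook. A class is described by its child positions from the root down (for example, Real is position 2 under Alert because "Alert/Other" takes position 0), and, following this notebook's convention, each level's digit is prepended, so the deepest level's digit comes first and the root's `0` comes last.

```python
import string

# Same 62-symbol alphabet as the notebook: one character per child position.
DIGITS = string.digits + string.ascii_letters

def encode_path(positions):
    """Encode child positions, listed from the root down, as a code string.

    Each level's digit is prepended, so the deepest level ends up first.
    """
    code = DIGITS[0]  # the root ("Alert") has code "0"
    for pos in positions:
        code = DIGITS[pos] + code
    return code

def decode_path(code):
    """Invert encode_path: return the child positions from the root down."""
    # Drop the trailing root digit, then read the remaining digits right to left.
    return [DIGITS.index(ch) for ch in reversed(code[:-1])]

# Real (2) -> Static (1) -> Non-Recurring (1) -> SN-like (1) -> Ia (1)
code = encode_path([2, 1, 1, 1, 1])
print(code)               # 111120
print(decode_path(code))  # [2, 1, 1, 1, 1]
```

The resulting `111120` agrees with the code the tree below assigns to Ia.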

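```python
# TODO Alex: please reverse the integer mask so there are no leading zeros
# map Real to 1 and Bogus to 0 - doesn't need a 0 for the Alert digit

# Digit alphabet: 0-9, then a-z, then A-Z, so bases up to 62 are supported.
digs = string.digits + string.ascii_letters

def int2base(x, base):
    """Return the integer x written in the given base using the `digs` alphabet."""
    if x == 0:
        return digs[0]
    if not 2 <= base <= len(digs):
        raise ValueError("base must be between 2 and %d" % len(digs))
    sign = -1 if x < 0 else 1
    x *= sign
    digits = []
    while x:
        # Integer division avoids the float rounding of int(x / base).
        x, remainder = divmod(x, base)
        digits.append(digs[remainder])
    if sign < 0:
        digits.append('-')
    digits.reverse()
    return ''.join(digits)
```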
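For checking round trips it may help to have an inverse of `int2base`. The sketch below, `base2int`, is a hypothetical helper that assumes the same `digs` alphabet; for bases up to 36 its result can be cross-checked against Python's built-in `int(s, base)`, which accepts the lowercase letters `digs` uses for 10 to 35. It rejects base 1, so the root's `int2base(0, 1)` code is not covered.

```python
import string

digs = string.digits + string.ascii_letters  # same alphabet as int2base

def base2int(s, base):
    """Parse a string written by int2base back into an integer (hypothetical helper)."""
    if not 2 <= base <= len(digs):
        raise ValueError("base must be between 2 and %d" % len(digs))
    sign = 1
    if s.startswith("-"):
        sign, s = -1, s[1:]
    value = 0
    for ch in s:
        d = digs.index(ch)  # raises ValueError for characters outside the alphabet
        if d >= base:
            raise ValueError("digit %r is out of range for base %d" % (ch, base))
        value = value * base + d
    return sign * value

print(base2int("ff", 16), int("ff", 16))  # 255 255
print(base2int("Z", 62))                  # 61
print(base2int("-10", 3))                 # -3
```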

## Building a phylogenetic tree

Given the hierarchical class relationships, make a tree diagram (and record some hopefully useful information).

```python
def branch(tree, parent, children, prepend=("Other",), append=None,
           directory=directory, index=index):
    """Attach `children` under `parent` in `tree` and record their codes.

    Each child's code is its position among its siblings, written as a single
    digit and prepended to the parent's code (so the deepest level comes first).
    """
    directory[parent] = {}
    if prepend is not None:
        # Catch-all classes such as "Real/Other" go before the named children.
        children = [parent + "/" + pre for pre in prepend] + list(children)
    if append is not None:
        children = list(children) + [parent + "/" + app for app in append]
    bigbase = len(children)
    for i, child in enumerate(children):
        directory[parent][child] = i
        index[child] = int2base(i, bigbase) + index[parent]
        tree.create_node(index[child] + " " + child, child, parent=parent)
    return bigbase, directory, index
```
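The TODO in the `int2base` cell could be handled by appending each child's digit instead of prepending it. Below is a minimal sketch using a hypothetical helper, `root_first_codes`, that is not part of this notebook. With no digit for the Alert root and Bogus/Real at positions 0/1, every Real code starts with 1, so codes have no leading zeros and, as long as each node has at most ten children, can be stored as ordinary integers.

```python
import string

digs = string.digits + string.ascii_letters

def root_first_codes(parent_code, children):
    """Hypothetical root-first variant of the code assignment in `branch`.

    The child's digit is appended rather than prepended, so codes read from
    the root down and the parent's code is a prefix of every child's code.
    """
    return {child: parent_code + digs[i] for i, child in enumerate(children)}

# Per the TODO: no digit for the Alert root, Bogus -> 0, Real -> 1.
top = root_first_codes("", ["Bogus", "Real"])
real = root_first_codes(top["Real"], ["Real/Other", "Static", "Moving"])
static = root_first_codes(real["Static"], ["Static/Other", "Non-Recurring", "Recurring"])
print(top)                       # {'Bogus': '0', 'Real': '1'}
print(static)                    # {'Static/Other': '110', 'Non-Recurring': '111', 'Recurring': '112'}
print(int(static["Recurring"]))  # 112
```

One consequence of this ordering is that selecting a subtree becomes a prefix test (`code.startswith(...)`) rather than a suffix test.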

It would be better to start with something like `directory` than to build it as we go along, but, hey, this is a hack.
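Along those lines, here is a minimal sketch of declaring the hierarchy up front as a nested dict and deriving `directory` and `index` from it, without treelib. `TAXONOMY` and `build_index` are hypothetical names, and `TAXONOMY` covers only part of the tree. The sketch mimics `branch` with the default "Other" prepend; since every child position is smaller than its sibling count, `int2base(i, n)` reduces to `digs[i]`, which the sketch uses directly.

```python
import string

digs = string.digits + string.ascii_letters

# Hypothetical declarative hierarchy (a subset of the notebook's tree).
# Leaves map to None.
TAXONOMY = {
    "Bogus": None,
    "Real": {
        "Static": {
            "Non-Recurring": {
                "SN-like": {"Ia": None, "Ib/c": None},
            },
        },
        "Moving": None,
    },
}

def build_index(taxonomy, root="Alert", other="Other"):
    """Assign leaf-first codes, mimicking `branch` with prepend=["Other"]."""
    index = {root: digs[0]}
    directory = {}

    def walk(parent, children):
        names = [parent + "/" + other] + list(children)
        directory[parent] = {name: i for i, name in enumerate(names)}
        for i, name in enumerate(names):
            # Prepend this level's digit to the parent's code.
            index[name] = digs[i] + index[parent]
            subtree = children.get(name)  # None for leaves and "Other" classes
            if subtree:
                walk(name, subtree)

    walk(root, taxonomy)
    return directory, index

directory, index = build_index(TAXONOMY)
print(index["Ia"], index["Moving"])  # 111120 220
```

The codes agree with those printed from `index` below for the classes this subset includes.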

```python
tree = Tree()

index["Alert"] = int2base(0, 1)
tree.create_node(index["Alert"] + " " + "Alert", "Alert")

branch(tree, "Alert", ["Bogus", "Real"])  # , prepend=["Unclassified"])

branch(tree, "Real", ["Static", "Moving"])  # , prepend=['Unclassified'])

branch(tree, "Static", ["Non-Recurring", "Recurring"])

branch(tree, "Recurring", ["Periodic", "Non-Periodic"])

branch(tree, "Periodic", ["Cepheid", "RR Lyrae", "Delta Scuti", "EB", "LPV/Mira"])

branch(tree, "Non-Periodic", ["AGN"])

branch(tree, "Non-Recurring", ["SN-like", "Fast", "Long"])

branch(tree, "SN-like", ["Ia", "Ib/c", "II", "Iax", "91bg"])

branch(tree, "Fast", ["KN", "M-dwarf Flare", "Dwarf Novae", "uLens"])

branch(tree, "Long", ["SLSN", "TDE", "ILOT", "CART", "PISN"])

tree.show()
```

```
0 Alert
├── 00 Alert/Other
├── 10 Bogus
└── 20 Real
    ├── 020 Real/Other
    ├── 120 Static
    │   ├── 0120 Static/Other
    │   ├── 1120 Non-Recurring
    │   │   ├── 01120 Non-Recurring/Other
    │   │   ├── 11120 SN-like
    │   │   │   ├── 011120 SN-like/Other
    │   │   │   ├── 111120 Ia
    │   │   │   ├── 211120 Ib/c
    │   │   │   ├── 311120 II
    │   │   │   ├── 411120 Iax
    │   │   │   └── 511120 91bg
    │   │   ├── 21120 Fast
    │   │   │   ├── 021120 Fast/Other
    │   │   │   ├── 121120 KN
    │   │   │   ├── 221120 M-dwarf Flare
    │   │   │   ├── 321120 Dwarf Novae
    │   │   │   └── 421120 uLens
    │   │   └── 31120 Long
    │   │       ├── 031120 Long/Other
    │   │       ├── 131120 SLSN
    │   │       ├── 231120 TDE
    │   │       ├── 331120 ILOT
    │   │       ├── 431120 CART
    │   │       └── 531120 PISN
    │   └── 2120 Recurring
    │       ├── 02120 Recurring/Other
    │       ├── 12120 Periodic
    │       │   ├── 012120 Periodic/Other
```

Yeah, not sure these are really useful. . .

```python
print(directory)
```

```
{'Alert': {'Alert/Other': 0, 'Bogus': 1, 'Real': 2}, 'Real': {'Real/Other': 0, 'Static': 1, 'Moving': 2}, 'Static': {'Static/Other': 0, 'Non-Recurring': 1, 'Recurring': 2}, 'Recurring': {'Recurring/Other': 0, 'Periodic': 1, 'Non-Periodic': 2}, 'Periodic': {'Periodic/Other': 0, 'Cepheid': 1, 'RR Lyrae': 2, 'Delta Scuti': 3, 'EB': 4, 'LPV/Mira': 5}, 'Non-Periodic': {'Non-Periodic/Other': 0, 'AGN': 1}, 'Non-Recurring': {'Non-Recurring/Other': 0, 'SN-like': 1, 'Fast': 2, 'Long': 3}, 'SN-like': {'SN-like/Other': 0, 'Ia': 1, 'Ib/c': 2, 'II': 3, 'Iax': 4, '91bg': 5}, 'Fast': {'Fast/Other': 0, 'KN': 1, 'M-dwarf Flare': 2, 'Dwarf Novae': 3, 'uLens': 4}, 'Long': {'Long/Other': 0, 'SLSN': 1, 'TDE': 2, 'ILOT': 3, 'CART': 4, 'PISN': 5}}
```


```python
print(index)
```

```
{'Alert': '0', 'Alert/Other': '00', 'Bogus': '10', 'Real': '20', 'Real/Other': '020', 'Static': '120', 'Moving': '220', 'Static/Other': '0120', 'Non-Recurring': '1120', 'Recurring': '2120', 'Recurring/Other': '02120', 'Periodic': '12120', 'Non-Periodic': '22120', 'Periodic/Other': '012120', 'Cepheid': '112120', 'RR Lyrae': '212120', 'Delta Scuti': '312120', 'EB': '412120', 'LPV/Mira': '512120', 'Non-Periodic/Other': '022120', 'AGN': '122120', 'Non-Recurring/Other': '01120', 'SN-like': '11120', 'Fast': '21120', 'Long': '31120', 'SN-like/Other': '011120', 'Ia': '111120', 'Ib/c': '211120', 'II': '311120', 'Iax': '411120', '91bg': '511120', 'Fast/Other': '021120', 'KN': '121120', 'M-dwarf Flare': '221120', 'Dwarf Novae': '321120', 'uLens': '421120', 'Long/Other': '031120', 'SLSN': '131120', 'TDE': '231120', 'ILOT': '331120', 'CART': '431120', 'PISN': '531120'}
```
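Because each child's code is its parent's code with one digit prepended, the code of every descendant of a class ends with that class's code, so subsampling a subtree reduces to a suffix test. The following is a minimal sketch using a hand-copied subset of `index` and a made-up stream of (alert id, code) pairs; the helper `in_subtree` is hypothetical.

```python
# A few entries copied from the `index` output above.
index = {
    "Real": "20", "Static": "120", "Non-Recurring": "1120",
    "SN-like": "11120", "Ia": "111120", "II": "311120",
    "Fast": "21120", "KN": "121120", "Periodic": "12120", "Cepheid": "112120",
}
code_to_class = {code: name for name, code in index.items()}

def in_subtree(code, ancestor):
    """True if `code` belongs to `ancestor` or one of its descendants.

    Codes are built by prepending digits, so every descendant's code ends
    with its ancestor's code.
    """
    return code.endswith(index[ancestor])

# Hypothetical alert stream: (alert id, best-classification code).
alerts = [(1, "111120"), (2, "121120"), (3, "112120"), (4, "311120")]

sn_like = [aid for aid, code in alerts if in_subtree(code, "SN-like")]
print(sn_like)                                # [1, 4]
print([code_to_class[c] for _, c in alerts])  # ['Ia', 'KN', 'Cepheid', 'II']
```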


## Building a structure for hierarchical classification

The whole point of this, for me, is for the classification to have corresponding posterior probabilities, or at least confidence flags or scores, because I'd want to use them to rapidly select follow-up candidates.
[This](https://community.lsst.org/t/projects-involving-irregularly-shaped-data/4466) looks potentially relevant.
I guess this structure could also be used to package additional features into an alert without bloating it too much.
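One way to attach probabilities to the codes is to give each internal node a vector of conditional probabilities over its children, ordered as in `directory`. This sketch assumes that layout (the vectors and the helper `best_path` are hypothetical). It derives a "best" classification by greedy descent, and the product of the chosen entries gives that class's probability by the chain rule.

```python
# A subset of the notebook's `directory` output (parent -> {child: position}).
directory = {
    "Alert": {"Alert/Other": 0, "Bogus": 1, "Real": 2},
    "Real": {"Real/Other": 0, "Static": 1, "Moving": 2},
    "Static": {"Static/Other": 0, "Non-Recurring": 1, "Recurring": 2},
}

# Hypothetical classifier output: for each internal node, the conditional
# probabilities of its children, ordered by child position.
probs = {
    "Alert": [0.05, 0.15, 0.80],
    "Real": [0.10, 0.70, 0.20],
    "Static": [0.20, 0.50, 0.30],
}

def best_path(node="Alert"):
    """Greedily follow the most probable child until no vector is available.

    Returns the visited class names and the product of the chosen conditional
    probabilities, which is the probability of the final class by the chain rule.
    """
    path, p = [node], 1.0
    while node in probs:
        vector = probs[node]
        # Children ordered by their digit positions, matching the vector order.
        children = sorted(directory[node], key=directory[node].get)
        i = max(range(len(vector)), key=vector.__getitem__)
        p *= vector[i]
        node = children[i]
        path.append(node)
    return path, p

path, p = best_path()
print(path)         # ['Alert', 'Real', 'Static', 'Non-Recurring']
print(round(p, 3))  # 0.28
```

Greedy descent is cheap, but it need not return the class with the highest overall probability. Finding that would mean multiplying down every path and taking the maximum.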