<h2>A JSON to Bytestream converter for Testing the MVTX AI Heavy Flavor Trigger Firmware Block </h2>
<p>nwuerfel@umich.edu - Noah Wuerfel /* ~AP AP AP~ goblu */</p>

<h3>Purpose and Motivation</h3>
<p>This notebook supports a LANL-led project to test the firmware implementation of an AI-based heavy flavor trigger for the inner tracking system of the sPHENIX experiment. Currently, the collaboration's simulation outputs a .json file containing hit information (pixel, chip, etc.), which our collaborators use to train the AI trigger with software tools. A concurrent effort is underway in the collaboration to use the hls4ml package [^1] to generate a firmware block implementation of the trained software AI engine. To test the firmware implementation of the trigger, the LANL group plans to use a pair of FPGA development boards (Xilinx KCU105 and VC709) to emulate the optical links and data transfer of the detector electronics. The Xilinx XDMA direct memory access IP has been used to establish virtual file references for the DDR3 memory on the boards. This software takes the output of our simulation in .json format and converts it to a bytestream, to be loaded on the KCU105 and sent via optical link to the VC709, which will house the AI trigger block implementation and from which final results will be read out over DMA.</p> 

<h3>An overview of the Hardware Setup</h3>
<p>Ultimately, this software emulates the front end electronics, converting pixel and chip hits to the bytestream those detectors would actually produce at runtime. The Monolithic Active Pixel Sensor Based Vertex Detector (MVTX) is composed of a number of "staves" which, in the inner layers, hold 9 detector chips called ALPIDEs, each a (512x1024) array of sensitive silicon pixel detectors. Fast tracking data from these detectors is used to generate physics triggers for detector readout. The ALPIDE chips are divided horizontally into 32 readout regions; each region contains 16 priority encoders, each of which reads out the 1024 pixels of one double column. Pixel data is arranged according to the ALPIDE data format described later in the notebook. Each of 8 staves sends its data on 3 parallel lines, 3 chips per line (for a total of 72 ALPIDEs), to each Front-End Link eXchange unit (FELIX), where we plan on sending a copy to the AI engine. The data protocol from Stave to FELIX / AI block is the "GBT format": sets of 80 bit data words defined by the sPHENIX MVTX data format. There are more in-depth descriptions of each data format near the respective classes in code. </p>
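
<p>As a quick consistency check on the geometry quoted above, the sketch below just multiplies out the numbers from the text. The constant names are illustrative and are not used by the converter.</p>

```python
# Geometry numbers quoted in the overview (names are illustrative only)
ROWS, COLS = 512, 1024
REGIONS = 32
ENCODERS_PER_REGION = 16
PIXELS_PER_ENCODER = 1024
STAVES_PER_FELIX = 8
CHIPS_PER_STAVE = 9

# The pixel matrix and the region/encoder hierarchy should describe the same pixel count
pixels_per_chip = ROWS * COLS
pixels_via_encoders = REGIONS * ENCODERS_PER_REGION * PIXELS_PER_ENCODER

# Columns covered by one region, and chips feeding one FELIX
cols_per_region = COLS // REGIONS
chips_per_felix = STAVES_PER_FELIX * CHIPS_PER_STAVE

print(pixels_per_chip, pixels_via_encoders)
print(cols_per_region, chips_per_felix)
```

<p>The two pixel counts agree, and 32 columns per region is the same as 16 double columns, one per priority encoder.</p>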

<h3>A List of Classes</h3>

- <b>Data Classes</b>
    - Hit
    - RawDataHeaderWord
    - StartPacketWord
    - EndPacketWord
    - ITSHeaderWord
    - TriggerDataHeader
    - TriggerDataTrailer
- <b>Classes Representing Electronics or Helpers</b>
    - StaveInfo
    - GBTLink
    - AlpideInfo
    - RegionInfo


<h3>Case and Code Standards</h3>
<p>Typically I try to maintain consistent code standards and prefer camelCase for my variable and function names, with ALLCAPS for preprocessor directives (or, in python, constant globals or class features). In this case I'm trying to balance my own coding preferences against names which match the documentation. As much as possible, I have tried to maintain the original casing of the various documented bytefields. For example, many classes have an "isEmpty" method but the valid alpide datawords have fields such as "encoder_id". Oh well...</p> 

[^1]: J. Duarte et al., “Fast inference of deep neural networks in FPGAs for particle physics”, JINST 13 P07027 (2018), arXiv:1804.06913. ; S. Summers et al., “Fast inference of boosted decision trees in FPGAs for particle physics”, arXiv:2002.02534 ; G. Di Guglielmo et al., “Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml”, arXiv:2003.06308 - https://fastmachinelearning.org/hls4ml/index.html

<h1>Helper Functions</h1>

<p>Here we have various helper functions.</p>

In [None]:
# returns index of the priority encoder (double column) inside its region, given a pixel column
def pEncoderID(pixel_z):
    return (int(pixel_z/2) % 16)

# returns readout region given a pixel column
def pEncoderRegion(pixel_z):
    return (int(pixel_z/32))

# checks even odd for number
def isEven(num):
    return num % 2 == 0
    
#return unique elements of array    
def unique(x):
    x = np.array(x)
    return np.unique(x)

# returns priority encoder address given pixel row and column
def pixelAddr(row, column):
    doubleColIdx = (column % 2)
    #C0
    if(doubleColIdx == 0):
        if(isEven(row)):
            return 2*row
        else:
            return 2*(row+1) -1
    #C1
    else:
        if(isEven(row)):
            return 2*row + 1
        else:
            return 2*row

# returns a bytearray from a binary string (requires byte alignment of arguments)
def toBytes(Data):
    assert((len(Data) % 8) == 0)
    ByteData = bytearray()
    # walk the string one 8-bit slice at a time
    for idx in range(0, len(Data), 8):
        ByteData.append(int(Data[idx: idx + 8], 2))
    return ByteData

<h1>Helper Functions Demos</h1>

<p>Here we have various helper functions demonstrated.</p>
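
<p>Before the demo cell, here is a small standalone sketch of how a pixel column splits into a readout region and a priority encoder. It restates the arithmetic of pEncoderRegion and pEncoderID in one hypothetical function (the name column_to_region_and_encoder is mine) so it can be run on its own.</p>

```python
def column_to_region_and_encoder(column):
    """Split an ALPIDE column index (0..1023) into (region, encoder)."""
    region = column // 32          # 32 columns per readout region
    encoder = (column // 2) % 16   # one encoder per double column, 16 per region
    return region, encoder

# Columns 0 and 1 share a double column; column 32 starts the next region
for col in (0, 1, 2, 31, 32, 70, 1023):
    print(col, column_to_region_and_encoder(col))
```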

In [None]:
# Demonstrates priority encoder mappings
columns = range(0,1024)
encoderIds = [pEncoderID(x) for x in columns]
for i in range(0,16):
    for j in range(0,2):
        print('{:0>4}'.format(pixelAddr(i,j)) , end=" ")
    print()
 
print()  
    
# Demonstrates toBytes:
bitstring = '11100000'
byteData = toBytes(bitstring)
print(bitstring)
print(byteData)

<h1>Hit format from JSON</h1>
<p>The Hit class is a container for the basic input from the .json simulation output. Information comes in the format of: Layer, Stave, Chip, Pixel_x and Pixel_z which are mapped from ALPIDE row and column in simulation as below:</p>

```
{
    unsigned int pixel_x = MvtxDefs::getRow(hitkey);
    unsigned int pixel_z = MvtxDefs::getCol(hitkey);
}
```
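
<p>For orientation, the sketch below parses a made-up single-event document with the same key names that the reader loop at the end of this notebook uses (Events, RawHit, MVTXHits, ID, ...). The values are invented for illustration, and the real simulation output may carry additional fields.</p>

```python
import json

# Hypothetical single-event document; key names follow this notebook's reader loop,
# the values are made up for illustration.
sample = json.loads("""
{"Events": [
  {"RawHit": {"MVTXHits": [
    {"ID": {"MVTXTrkID": 1, "Layer": 0, "Stave": 3, "Chip": 4, "Pixel_x": 10, "Pixel_z": 70}},
    {"ID": {"MVTXTrkID": 0, "Layer": 1, "Stave": 5, "Chip": 2, "Pixel_x": 7, "Pixel_z": 12}}
  ]}}
]}
""")

hits = []
for event in sample["Events"]:
    for entry in event["RawHit"]["MVTXHits"]:
        info = entry["ID"]
        # same selection as the converter's main loop: keep hits with a positive track ID
        if info["MVTXTrkID"] > 0:
            hits.append((info["Layer"], info["Stave"], info["Chip"],
                         info["Pixel_x"], info["Pixel_z"]))

print(hits)
```

<p>Only the first hit survives the track ID selection; Pixel_x is the ALPIDE row and Pixel_z the column.</p>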


In [None]:
class Hit:
    def __init__(self, TrkID, Layer, Stave, Chip, Pixel_x, Pixel_z):
        self.MVTXTrkID = TrkID
        self.Layer = Layer
        self.Stave = Stave
        self.Chip = Chip
        self.Pixel_x = Pixel_x
        self.Pixel_z = Pixel_z
        
    def Print(self):
        print("Hit Info-- Pixel_x: ", self.Pixel_x, " Pixel_z: ", self.Pixel_z, 
                  " pixelAddr: ", pixelAddr(self.Pixel_x,self.Pixel_z))

<h1>DATA GBT Words</h1>
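<p>Each class below assembles one of the fixed-format GBT control or header words (RDH, SOP, EOP, IHW, TDH, TDT) as a binary string of named fields and converts it to bytes with toBytes. Fields marked TODO are not yet provided by the simulation and hold placeholder values.</p>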
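
<p>Since these words are built by concatenating fixed-width bit fields, a width check is one way to catch a field whose size drifts from the 80 bit GBT word. The sketch below is not part of the converter: concat_fields and GBT_WORD_BITS are hypothetical names, and the start-of-packet field widths are the ones used in this notebook.</p>

```python
GBT_WORD_BITS = 80  # GBT word width stated in the hardware overview

def concat_fields(*fields):
    """Join binary-string fields and check they fill exactly one GBT word."""
    word = ''.join(fields)
    if len(word) != GBT_WORD_BITS:
        raise ValueError(f'expected {GBT_WORD_BITS} bits, got {len(word)}')
    return word

# Start-of-packet layout used in this notebook: 4 + 16 + 16 + 44 bits
sop = concat_fields(f'{1:04b}', f'{0:016b}', f'{0:016b}', f'{0:044b}')

# Convert the bit string to 10 bytes, most significant byte first
sop_bytes = int(sop, 2).to_bytes(GBT_WORD_BITS // 8, 'big')
print(' '.join(f'{b:02x}' for b in sop_bytes))
```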

In [None]:
# Defines the RDH 
class RawDataHeaderWord:
    
    # defined fields
    ReservedHead = f'{0:08b}'
    ReservedHeadLarge = f'{0:032b}'
    ReservedMiddle = f'{0:020b}'
    
    #TODO
    # partially defined for mvtx?
    DetectorField = f'{0:032b}'
    
    SourceID = f'{32:08b}'
    
    #TODO
    FEEID = f'{0:016b}'
    
    HeaderSize = f'{64:08b}'
    HeaderVersion = f'{8:08b}'
    
    #TODO
    #Trigger message BCO from GTM ?
    GTMBCO = f'{0:048b}'

    #TODO
    #Trigger message bunch crossing LHC clock from RU ?
    LHCBC = f'{0:012b}'
    
    #TODO
    # when 0x1 the packet is moved forward with priority?
    PriorityBit = f'{0:08b}'
    
    #TODO
    #Counter to keep track of CRU Data packet in same heartbeat
    PagesCounter = f'{0:016b}'
    
    #TODO
    #Trigger type bit set by Felix at HB trigger...
    TrgType= f'{0:032b}'
    
    def __init__(self, StopBit = 0, PagesCounter = 0):
        self.StopBit = f'{StopBit:08b}'
        self.PagesCounter = f'{PagesCounter:016b}'
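        self.GBT0 = (self.ReservedHead + self.DetectorField + self.SourceID + self.FEEID 
                     + self.HeaderSize + self.HeaderVersion)
        # NOTE: with the field widths above GBT0 is 80 bits, but GBT1 is 88 and GBT2 is 96,
        # so the RDH is 33 bytes rather than 3 x 10; widths to be checked against the RDH definition
        self.GBT1 = self.ReservedHead + self.GTMBCO + self.ReservedMiddle + self.LHCBC
        self.GBT2 = self.ReservedHeadLarge + self.PriorityBit + self.StopBit + self.PagesCounter + self.TrgType
        self.Data = self.GBT0 + self.GBT1 + self.GBT2 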
        self.ByteData = toBytes(self.Data)
                
    def Print(self):
        print (self.Data)
        
    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )
        
    

In [None]:
#Example RDH
a = RawDataHeaderWord()
b = RawDataHeaderWord(1,3)
a.PrintBytes()
b.PrintBytes()

In [None]:
class StartPacketWord:
    #DV is internal to CRU I think
    #DV = f'{0:01b}'
    ControlCode = f'{1:04b}'
    Length = f'{0:016b}'
    TTSBusy = f'{0:016b}'
    Reserved = f'{0:044b}'
    Data = ControlCode + Length + TTSBusy + Reserved
    
    def __init__(self):
        self.ByteData = toBytes(self.Data)
    
    def Print(self):
        print(self.Data)   
    
    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )

In [None]:
class EndPacketWord:
    #DV = f'{0:01b}'
    ControlCode = f'{2:04b}'
    Length = f'{0:016b}'
    Checksum = f'{0:032b}'
    EndFlag = f'{0:01b}'
    Reserved = f'{0:027b}'
    Data = ControlCode + Length + Checksum + EndFlag + Reserved
    
    def __init__(self):
        self.ByteData = toBytes(self.Data)  
        
    def Print(self):
        print(self.Data)
        
    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )

In [None]:
class ITSHeaderWord:
    ID = '11100000'
    Reserved = f'{0:044b}'
    
    #TODO assumed all lanes active now
    active_lanes = '1' * 28
    #active_lanes = f'{0:028b}'
    
    Data = ID + Reserved + active_lanes
    
    def __init__(self):
        self.ByteData = toBytes(self.Data)

    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )

In [None]:
class TriggerDataHeader:
    ID = '11101000'
    
    # TODO not in sim
    GTM_BCO = f'{0:040b}'
    
    Reserved_1 = f'{0:04b}'
    
    # TODO not in sim
    LHC_BC = f'{0:012b}'
    
    Reserved_2 = '0'
    
    # Continuation needs to be initialized
    
    # NoData needs to be initialized
    
    # Triggered mode - internal trigger not used
    InternalTrigger = '0'
    
    # TODO not in sim
    TriggerType = f'{0:012b}'
    
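    def __init__(self, Continuation = '0', NoData = '0'):
        # accept either ints or '0'/'1' strings and store each flag as a single bit character,
        # since the fields are concatenated as strings below
        self.Continuation = f'{int(Continuation):01b}'
        self.NoData = f'{int(NoData):01b}'
        self.ByteData = toBytes(self.ID + self.GTM_BCO + self.Reserved_1 + self.LHC_BC + 
                                self.Reserved_2 + self.Continuation + self.NoData 
                                    + self.InternalTrigger + self.TriggerType)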

    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )

In [None]:
class TriggerDataTrailer:
    ID = '11110000'
    Reserved_1 = f'{0:04b}'
    
    # TODO assume all lanes work fine
    lane_starts_violation = '0'
    
    Reserved_2 = '0'
    
    # TODO
    transmission_timeout = '0'
    
    # TODO 
    packet_done = '1'
    
    Reserved_3 = f'{0:08b}'
    
    # TODO assume all lanes work fine
    lane_status = f'{0:056b}'
    
    def __init__(self):
        self.ByteData = toBytes(self.ID + self.Reserved_1 + self.lane_starts_violation + self.Reserved_2 
                                    + self.transmission_timeout + self.packet_done 
                                        + self.Reserved_3 + self.lane_status)
        
    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )    

<h1>Stave and GBT Formatting</h1>
<p>Following are the StaveInfo and GBTLink classes, which package the data from the ALPIDEs on the front ends into the GBT format expected over the optical links at FELIX / AI Engine. The StaveInfo class simply packages the data on the ALPIDEs for the GBTLink class, which does the real work of formatting GBT words. Data packets on the GBT lines are formatted into pages of the following format: Start of Packet, Raw Data Header Word, ITS Header Word, Trigger Data Header, ITS Data, Trigger Data Trailer, End of Packet. The pages have a 512 word limit excluding the start and end words, and for now I also repeat the other header and trailer words in each new page if more pages are needed. In our case, the ITS data is the ALPIDE formatted data, described near the alpide class. Data from the ALPIDEs is split into 9 byte chunks and sent LSB first, along with an extra most significant byte called the RU_GBT_WORD_ID containing geographic information about the chip the data is generated from.</p>
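
<p>The chunking and paging described above can be sketched in a few lines. This is a standalone illustration, not the GBTLink implementation: the function names are hypothetical, the '001' + 5 bit chip id layout of the ID byte and the page budget (512 words minus 3 RDH words minus IHW, TDH and TDT) mirror what the GBTLink code below assumes, and that paging assumption is itself flagged as uncertain there.</p>

```python
import math

def mvtx_words(chip_id, alpide_bytes):
    """Split 9-byte-aligned ALPIDE data into 10-byte words: ID byte + reversed 9-byte chunk."""
    assert len(alpide_bytes) % 9 == 0
    id_byte = bytes([(0b001 << 5) | (chip_id & 0x1F)])   # '001' + 5-bit chip id
    return [id_byte + alpide_bytes[i:i + 9][::-1]
            for i in range(0, len(alpide_bytes), 9)]

def page_count(n_data_words, page_size=512, rdh_words=3, status_words=3):
    """Pages needed when each page also carries the RDH plus IHW, TDH and TDT."""
    return math.ceil(n_data_words / (page_size - rdh_words - status_words))

# Empty-chip frame for chip 4 (0xE4 0x00), padded with IDLE (0xFF) to 9 bytes
frame = bytes([0xE4, 0x00]) + bytes([0xFF] * 7)
words = mvtx_words(4, frame)
print(' '.join(f'{b:02x}' for b in words[0]))
print(page_count(506), page_count(507))
```

<p>With this budget a page holds at most 506 data words, so the 507th word forces a second page.</p>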

In [None]:
class StaveInfo:
    def __init__(self, Layer, Stave, AlpideList):
        self.Layer = Layer
        self.Stave = Stave
        self.AlpideList = AlpideList.copy()
        self.GBTLinks = []
        
        
    def formatData(self):
        # the alpide chips are split on 3 separate GBT lines coming from the Stave going to the RU
        # the chips are currently arranged on the lines in geographic order; 0-2, 3-5, 6-8
        
       #print("stave to link alpide lists...")
       # for alpide in self.AlpideList:
       #     print(alpide.Chip)
        self.GBTLinks.append(GBTLink(self.AlpideList[0:3]))
        self.GBTLinks.append(GBTLink(self.AlpideList[3:6]))
        self.GBTLinks.append(GBTLink(self.AlpideList[6:9]))
        for link in self.GBTLinks:
            link.formatData()
        
    def Print(self):
        print("|------- Stave Info --------")
        print( "Layer: ", self.Layer, " Stave: ", self.Stave, " Chip List... ")
        print()
        for Alpide in self.AlpideList:
            Alpide.Print()
    
    def PrintChips(self):
        print("|------- Stave ByteInfo --------")
        print( "Layer: ", self.Layer, " Stave: ", self.Stave, " Chip List... ")
        print()
        for Alpide in self.AlpideList:
            Alpide.PrintBytes()
            
    def PrintGBTLinks(self):
        print("|------ Stave GBTInfo --------")
        for idx, link in enumerate(self.GBTLinks):
            print("GBT LINK: " + str(idx) + " Bytes:")
            link.PrintBytes()
            
class ReadoutUnit:
    def __init__(self, Stave):
        self.Stave = Stave

In [None]:
class GBTLink:
    
    MAX_PAGE_SIZE = 512
    CHIP_PER_LINK = 3
    BIT_PER_GBT = 80
    BYTE_PER_GBT = 10
    GBT_IN_RDH = 3
    
    def __init__(self, AlpideList):
        self.AlpideList = AlpideList
        self.ByteData = bytearray([])
        
    def formatData(self):
        alpideBytes = []
        linkMVTXData = []
        totalDataWords = 0
        
        # first let's build ITS data words to get a count for paging later
        for alpide in self.AlpideList:
            
            localMVTXData = []
            # empty chips send the EMPTY CHIP TRAILER, not NOTHING...
            byteData = alpide.getFormattedData()
            
            #inner barrel ITS for MVTX is '001' + 5'b chipid 
            RU_GBT_WORD_ID = '001' + f'{alpide.Chip:05b}'
            RU_GBT_WORD_ID = toBytes(RU_GBT_WORD_ID)
            
            #break chip data into its data format RU_GBT_WORD + 9 AlpideBytes
            #note that the alpide data is already 9 byte aligned
            byteNo = len(byteData)
            assert((byteNo % 9) == 0)
            for i in range(0,byteNo,9):
                datawords = byteData[i:i+9]
                #byte data gets sent LSB first...
                datawords = datawords[::-1]
                mvtxWord = RU_GBT_WORD_ID + datawords
                localMVTXData.append(mvtxWord)
            linkMVTXData.append(localMVTXData)
            totalDataWords = totalDataWords + len(localMVTXData)
        
        # need to count pages to build packets.. Pagesize is 512
        # for inner barrel staves, the 3 chip links get RR into the CRU packets
        # page includes 3 RDH GBT words, plus status and data words
        # our data looks like: SOP, RDH, IHW, TDH, ITSDATA, TDT, EOP split as many packets as needed for paging
        
        #TODO Probably don't understand the paging properly wrt the TDT TDH IHW
        # not sure if I get one set of IHW TDH TDT per trigger or if we get a set per page...
        #determine num of pages... we have data + 3 RDH words per page + IHW,TDH,TDT per page
        numPages = math.ceil(totalDataWords/(self.MAX_PAGE_SIZE - self.GBT_IN_RDH - 3))
        
        #common data words that don't change...
        SOP = StartPacketWord()
        SOPBytes = SOP.ByteData
        EOP = EndPacketWord()
        EOPBytes = EOP.ByteData
        IHW = ITSHeaderWord()
        IHWBytes = IHW.ByteData
        
        # keep track of idx since we round robin our data out...
        # not guaranteed to have hits on all three of our chips
        chipIdx = [0]*self.CHIP_PER_LINK
        chipTotWords = []
        for i in range(0,self.CHIP_PER_LINK):
            chipTotWords.append(len(linkMVTXData[i]))
        # round-robin pointer: kept outside the loops so the rotation carries on
        # from the chip after the one read last, across words and pages
        # (resetting it for every word would drain chip 0 before touching the others)
        lastChip = 0
        
        # build packets...
        for i in range(0,numPages):
            dataPacket = bytearray([])
            
            # set stopbit if needed in header
            if(i == (numPages-1)):
                RDH = RawDataHeaderWord(1,i)
            else:
                RDH = RawDataHeaderWord(0,i)
            RDHBytes = RDH.ByteData
            
            # set continuation if we're in a new page
            # flags are passed as bit strings, which is what TriggerDataHeader concatenates
            if( i == 0 ):
                TDH = TriggerDataHeader()
            else:
                TDH = TriggerDataHeader('1','0')
            TDHBytes = TDH.ByteData
            
            TDT = TriggerDataTrailer()
            TDTBytes = TDT.ByteData
            
            dataPacket = dataPacket + SOPBytes + RDHBytes + IHWBytes + TDHBytes
            
            # RR data from the Alpides until we fill page
            # no random sampling to emulate time-ordering differences in RR;
            # the chips are simply read cyclically
            for j in range(0, (self.MAX_PAGE_SIZE - self.GBT_IN_RDH - 3)):
                
                # stop once no chip has data left to read...
                dataToRead = any(chipIdx[k] < chipTotWords[k] for k in range(0,len(linkMVTXData)))
                if (not dataToRead):
                    break
                
                # cyclically select the next chip that still has data left to read
                chipSelected = False
                while(not chipSelected):
                    chipToRead = (lastChip) % self.CHIP_PER_LINK
                    lastChip = lastChip + 1
                    # check last index of chip data to make sure we're in range
                    if (chipIdx[chipToRead] < chipTotWords[chipToRead]):
                        chipSelected = True
                        
                # take a word from cyclically selected chip and add to packet
                chipDataArray = linkMVTXData[chipToRead]
                ITSDataWordBytes = chipDataArray[chipIdx[chipToRead]]
                dataPacket = dataPacket + ITSDataWordBytes
                
                # increment the idx for that chip
                chipIdx[chipToRead] = chipIdx[chipToRead] + 1
                
            # add the TDT and EOP
            dataPacket = dataPacket + TDTBytes + EOPBytes
            
            # add page to data
            self.ByteData = self.ByteData + dataPacket
            
            
    def PrintBytes(self):
        print( ' '.join( '{:02x}'.format(x) for x in self.ByteData ) )
        print()

<h5>Alpide valid data words:</h5>

| Data Word   | Length (Bits) | Value (binary)                                  |
| :----------- | :------------- | :-----------------------------                   |
| CHIP HEADER        | 16             | 1010<chip_id[3:0]><BUNCH_COUNTER_FOR_FRAME[10:3]>|
| CHIP TRAILER        | 8             | 1011<readout_flags[3:0]>|
| CHIP EMPTY FRAME        | 16           | 1110<chip_id[3:0]><BUNCH_COUNTER_FOR_FRAME[10:3]>|
| REGION HEADER        | 8             | 110<region_id[4:0]>|
| IDLE        | 8             | 1111_1111                                       |
| Data Short  | 16            | 01<encoder_id[3:0]><addr[9:0]>                  |
| Data Long   | 24            | 00<encoder_id[3:0]><addr[9:0]>_0_<hit_map[6:0]> |

<h5> DATA SHORT </h5>
<p> encoder_id is the index of the priority encoder inside a region and addr is the pixel hit index from the pixel encoder. From part of the description in DATA LONG, it seems that the pixel with lowest address gets read out if there are multiple hits -- TODO CONFIRM</p>

<h5> DATA LONG </h5>
<p> Only used if clustering is enabled. encoder_id and addr are the same as in DATA SHORT and contain the geographical information of the first pixel (lowest address) of the cluster. hit_map[6:0] contains the cluster shape information as a bitmap: a bit in the hit_map is set for any active pixel among the 7 after (based on PE addr) the one in the addr[9:0] field.</p>
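
<p>To make the two layouts concrete, here is a standalone sketch that packs DATA SHORT and DATA LONG words as bit strings. The function names are hypothetical. The hit_map bit order (bit k set when the pixel at addr + 1 + k is hit, with bit 0 as the last character of the string) is the convention this notebook's AlpideInfo code ends up with, and should be treated as an assumption until checked against the ALPIDE manual.</p>

```python
def data_short(encoder_id, addr):
    """16-bit DATA SHORT: 01 + encoder_id[3:0] + addr[9:0]."""
    return f'01{encoder_id:04b}{addr:010b}'

def data_long(encoder_id, addr, following_addrs):
    """24-bit DATA LONG: 00 + encoder_id[3:0] + addr[9:0] + 0 + hit_map[6:0].

    Bit k of hit_map is set when the pixel at addr + 1 + k is also hit
    (assumed bit order, matching this notebook).
    """
    hit_map = 0
    for a in following_addrs:
        offset = a - addr - 1
        if 0 <= offset < 7:          # only the 7 addresses after addr fit in the map
            hit_map |= 1 << offset
    return f'00{encoder_id:04b}{addr:010b}0{hit_map:07b}'

# Encoder 3, first hit at address 20, neighbours at 21 and 23
short = data_short(3, 20)
long_ = data_long(3, 20, [21, 23])
print(short, f'{int(short, 2):04x}')
print(long_, f'{int(long_, 2):06x}')
```

<p>Note that a hit more than 7 addresses past the first one does not fit in the map and would need its own data word.</p>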

<h5> Row and Column mappings: </h5>

| Row | C0 | C1 |
| --- | --- | --- |
| 0 | 0 | 1 |
| 1 | 3 | 2 |
| 2 | 4 | 5 |
| 3 | 7 | 6 |
| ... | ... | ... |

<p>In C0 of double column, the even rows have address: 2*row, and the odd rows have address (row+1)*2 -1. In C1 of double column, the even rows have address 2*row+1 and the odd rows have 2*row </p>
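
<p>The four cases above collapse into a single expression: the address is 2*row plus one extra step whenever the column parity and row parity differ. The sketch below is a compact restatement (pixel_addr is a hypothetical name, equivalent to the pixelAddr helper) that reproduces the first rows of the mapping table.</p>

```python
def pixel_addr(row, column):
    """Priority-encoder address of a pixel inside its double column."""
    in_right_column = column % 2          # 0 -> C0, 1 -> C1
    # Even rows count C0 then C1, odd rows count C1 then C0
    return 2 * row + (in_right_column ^ (row % 2))

# Print (row, C0 address, C1 address) for the first four rows
for row in range(4):
    print(row, pixel_addr(row, 0), pixel_addr(row, 1))
```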

<h5> Clustering readout and DATA LONG </h5>

<h5> Data readout </h5>
<p> Region data frames are sent sequentially in ascending order - region header only comes from regions with pixelhit information </p>

In [None]:
class AlpideInfo:
    def __init__(self,Layer,Stave,Chip,HitList):
        self.Layer = Layer
        self.Stave = Stave
        self.Chip = Chip
        self.HitList = HitList.copy()
        self.regionData = []
        self.byteData = bytearray([])
        self.formatted = False
        
        # empty chip
        if (self.Layer == -1):
            # empty chip frame
            # for now there's no Bunch cross counter...
            self.byteData = '1110' + f'{self.Chip:04b}' + f'{0:08b}'
            self.byteData = toBytes(self.byteData)
            # pad with IDLE word to 9 byte alignment for later...
            self.byteData = self.byteData + bytearray([255]*7)
            self.formatted = True
        
    def getFormattedData(self):
        if (not self.formatted):
            self.formatData()
        return self.byteData
        
    def formatData(self):
        if (self.formatted):
            return
        
        localRegionData=[]
        PEHitList = []
        if(self.HitList):
            
            #format by region (32)
            for i in range(0,32):
                regionHits = [x for x in self.HitList if pEncoderRegion(x.Pixel_z) == i]
                regionHits = sorted(regionHits, key = lambda hit : hit.Pixel_z)
                data = []
                #format by priority Encoder (16 per region)
                for PEIdx in range(0,16):
                    PEHits = [x for x in regionHits if pEncoderID(x.Pixel_z) == PEIdx]
                    PEHits = sorted(PEHits, key = lambda hit: pixelAddr(hit.Pixel_x, hit.Pixel_z))
                    PEHitList.append(PEHits)
                    
                    # build data words... 
                    # note: if clustering is NOT enabled, we send a DATA_SHORT for each hit,
                    # not just the lowest pixel
                    # in the case that clustering IS enabled, but there's no cluster, we send only DATA_SHORT 
                    # rather than an empty bitmap
                    if PEHits:
                        # if clustering we only need lowest pix addr hit
                        if(EnableClustering):
                            lowPixAddrHit = PEHits[0]
                            #build clusterHitMap
                            #hit map takes the 7 hits after the lowest pixel index
                            lowPixIdx = pixelAddr(lowPixAddrHit.Pixel_x, lowPixAddrHit.Pixel_z)
                            encoderID = pEncoderID(lowPixAddrHit.Pixel_z)
                            addr = pixelAddr(lowPixAddrHit.Pixel_x, lowPixAddrHit.Pixel_z)
                            hitMap = 7*[0]
                            for idx, j in enumerate(range(lowPixIdx+1, lowPixIdx+8)):
                                #this has to happen only on the same priority encoder...
                                for hit in PEHits[1:]:
                                    if (pixelAddr(hit.Pixel_x, hit.Pixel_z) == j ):
                                        hitMap[idx] = 1
                                        break
                            #format hitmap as Little Endian 7 bit word
                            hitMap = int("".join(str(x) for x in hitMap),2)
                            hitMap = f'{hitMap:07b}'
                            hitMap = hitMap[::-1]
                            # note from Yasser: if hitmap == f'{0:07b}' we just write DataShort
                            # I handle that later in the code
                            DATA_SHORT = [encoderID, addr]
                            data = [DATA_SHORT, hitMap]

                            
                        # without clustering we read out every pixel hit as data_shorts
                        else:
                            print("noClustering")
                            for hit in PEHits:
                                encoderID = pEncoderID(hit.Pixel_z)
                                addr = pixelAddr(hit.Pixel_x, hit.Pixel_z)
                                DATA_SHORT = [encoderID, addr]
                                data.append(DATA_SHORT)
                                
                    # empty PE
                    else:
                        DATA_SHORT = []
                        if(EnableClustering):
                            data = [DATA_SHORT,'']
                    
                    localRegionData.append(data)
                    data = []
                
                # set chip regionData
                self.regionData.append(RegionInfo(PEHitList.copy(), localRegionData.copy(), i))
                localRegionData.clear()
                PEHitList.clear()
            
            # empty chips get empty chip header on init
            if self.isEmpty():
                # Chip is an int, so convert it before concatenating
                print("chip " + str(self.Chip) + " empty... no region readout...")
                return
            
            # now format the binary region data...
            # regions are read in parallel by round robin... 
            # ignore sim of time ordering, no shuffling...
            # random.shuffle(self.regionData)
            
            # Chip Data Header
            # TODO currently there isn't a bunch crossing counter in the simulation data....
            BCC = '00000000'
            chipDataHeader = '1010' + f'{self.Chip:04b}' + BCC
            chipDataHeaderBytes = bytearray([int(chipDataHeader[0:8],2),int(chipDataHeader[8:16],2)])
            self.byteData = self.byteData + chipDataHeaderBytes
            
            #Chip Region Data
            for region in self.regionData:
                if region.isEmpty():
                    continue
                regionBytes = region.getFormattedData()
                self.byteData = self.byteData + regionBytes
            
            # Chip Data Trailer
            # TODO readoutflags not currently in the simulation....
            readout_flags = '0000'
            chipDataTrailer = '1011' + readout_flags
            chipDataTrailerBytes = bytearray([int(chipDataTrailer,2)])
            self.byteData = self.byteData + chipDataTrailerBytes
            
            #align to 9 bytes for RU formatting- remainder data is IDLE word: FF
            # (no padding is added when the length is already a multiple of 9)
            remainder = (-len(self.byteData)) % 9
            self.byteData = self.byteData + bytearray([255]*remainder)
            self.formatted = True
            return

    def Print(self):
        if self.isEmpty():
            return
        print("|------- Chip Info -------")
        print("Chip: ", self.Chip, " Region Hit Lists...")
        print()
        for region in self.regionData:
            region.Print()
        print()
        
    def PrintBytes(self):
        if self.isEmpty():
            return
        print("|------- Chip: " , self.Chip , " ByteInfo -------")
        print( ' '.join( '{:02x}'.format(x) for x in self.getFormattedData() ) )
        print()

     
    def isEmpty(self):
        for region in self.regionData:
            if(not region.isEmpty()):
                return False
        return True
    
    ALPIDE_REGIONS = 32
    PENCODER_PER_REGION = 16

In [None]:
class RegionInfo:
    def __init__(self, PEHitList, Data, RegionNo):
        self.PEHitList = PEHitList
        self.Data = Data
        self.RegionNo = RegionNo
        self.byteData = bytearray([])
        self.formatted = False
        
    def getFormattedData(self):
        if not self.formatted:
            self.formatData()
        return self.byteData

    def formatData(self):
        # Readout PE in sequence....
        # Data is Region header + PE words if hits
        if self.isEmpty():
            return
        regionHeader = '110' + f'{self.RegionNo:05b}'
        tempBinData = regionHeader
        for PE , hits in enumerate(self.PEHitList):
            if hits:
                if(EnableClustering):
                    DATA_LONG = self.Data[PE]
                    DATA_SHORT = DATA_LONG[0]
                    hit_map = DATA_LONG[1]
                    encoder_id = int(DATA_SHORT[0])
                    addr = int(DATA_SHORT[1])
                    # zero suppress non-clusters...
                    if (hit_map == f'{0:07b}'):
                        #send data short - header is 01 not 00
                        binDataLong = '01' + f'{encoder_id:04b}' + f'{addr:010b}'
                    else:
                        binDataLong = '00' + f'{encoder_id:04b}' + f'{addr:010b}' + '0' + hit_map
                    tempBinData = tempBinData + binDataLong
                    
                #otherwise we have many DATA_SHORT to read...
                else:
                    DATA_SHORTS = self.Data[PE]
                    for DATA_SHORT in DATA_SHORTS:
                        encoder_id = int(DATA_SHORT[0])
                        addr = int(DATA_SHORT[1])
                        binDataShort = '01' + f'{encoder_id:04b}' + f'{addr:010b}'
                        tempBinData = tempBinData + binDataShort
                    
        # break bin data into bytes....
        self.byteData = toBytes(tempBinData)  
        self.formatted = True
                
    def Print(self):
        if self.isEmpty():
            return
        print("-------- Region: ", self.RegionNo, "--------")
        for PE, hits in enumerate(self.PEHitList):
            if hits:
                print("-------- Priority Encoder: " , PE , "--------")
                for hit in hits:
                    hit.Print()
                if(EnableClustering):
                    DATA_LONG = self.Data[PE]
                    DATA_SHORT = DATA_LONG[0]
                    ClusterMap = DATA_LONG[1]
                    print("ClusterMap: ", ClusterMap)
                    print("Data_SHORT: ", DATA_SHORT)
                else:
                    DATA_SHORTS = self.Data[PE]
                    for DATA_SHORT in DATA_SHORTS:
                        print("DATA_SHORT: ", DATA_SHORT)
                print()
                
    def PrintBytes(self):
        if self.isEmpty():
            return
        print("-------- Region: ", self.RegionNo, "--------")
        print( ' '.join( '{:02x}'.format(x) for x in self.getFormattedData() ) )
        print()
        
    def isEmpty(self):
        for PE in self.PEHitList:
            if len(PE) != 0:
                return False
        return True     

In [None]:
import json
import numpy as np
import math
import random

MVTXLayers = 3
ChipsPerStave = 9
EnableClustering = True

with open('Signal.json') as datafd:
    data = json.load(datafd)
# binary mode, since the GBT stream is raw bytes rather than text
outfd = open('Signal_gbt.bin','wb')

EventGBTinfo = []
AlpideData = []
StaveData = []
HitList = []
print("TotalEvents:", len(data['Events']))
print()

for evidx, ev in enumerate(data['Events']):
    
    print("--------------------")        
    print("--------------------")    
    print("Start of event: ", evidx)
    print("--------------------")
    print("--------------------")   
    print()
    
    EventGBTinfo.clear()
    AlpideData.clear()
    StaveData.clear()
    HitList.clear()
    RawHit = ev['RawHit']
    MVTXHits = RawHit['MVTXHits']
    
    # pull mvtx hit data info 
    for mvtxhit in MVTXHits:
        hitinfo = mvtxhit['ID']
        if(hitinfo['MVTXTrkID'] > 0):
            HitList.append(Hit(hitinfo['MVTXTrkID'],hitinfo['Layer'],hitinfo['Stave'],hitinfo['Chip'],
                                         hitinfo['Pixel_x'], hitinfo['Pixel_z']))
    
            
    # geographic grouping of Hits by Layer, Stave:
    for layer in range(0,MVTXLayers):
        layerHitList = [x for x in HitList if x.Layer == layer]
        stavesInLayer = unique([x.Stave for x in layerHitList])
        for stave in stavesInLayer:
            layerStaveHitList = [x for x in layerHitList if x.Stave == stave]
            # reduce data on ChipID 
            for ChipNo in range(0,ChipsPerStave):
                ChipHits = [x for x in layerStaveHitList if x.Chip == ChipNo]
                if ChipHits:
                    SampleHit = ChipHits[0]
                    AlpideData.append(AlpideInfo(SampleHit.Layer,SampleHit.Stave,SampleHit.Chip,
                                         ChipHits))
                #else add empty alpide chip - empty chips still read out empty chip header later on...
                else:
                    AlpideData.append(AlpideInfo(-1,-1,ChipNo,[]))
            StaveData.append(StaveInfo(layer,stave,AlpideData))
            AlpideData.clear()

            
    # format 9 alpide worth of data for each stave for sending to RU     
    for stave in StaveData:
        stave.formatData()
        stave.Print()
        print("--------------------------------")
        stave.PrintChips()
        stave.PrintGBTLinks()
        
    print("------------")        
    print("------------")    
    print("end of event")
    print("------------")
    print("------------")
    print()
    break
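
<p>The driver cell above opens Signal_gbt.bin but does not yet write the formatted bytes to it. A minimal sketch of that last step is below, assuming the file is simply the links' byte arrays written back to back in the order given. That layout is my assumption, write_links is a hypothetical helper, and the payloads are stand-ins for the GBTLink.ByteData arrays.</p>

```python
import os
import tempfile

def write_links(path, link_payloads):
    """Write each link's bytes to a binary file, back to back, in the order given."""
    with open(path, 'wb') as out:
        for payload in link_payloads:
            out.write(payload)

# Stand-in payloads; in the converter these would be the GBTLink.ByteData arrays
payloads = [bytearray(b'\x10' + bytes(9)), bytearray(b'\x20' + bytes(9))]

# Write to a scratch location so the demo does not touch real output files
demo_path = os.path.join(tempfile.gettempdir(), 'Signal_gbt_demo.bin')
write_links(demo_path, payloads)
print(os.path.getsize(demo_path))
```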