# Analyse problems with deliver_events_first

- See test set `test_bad_connections.py`/`run_bad_connections.py`
    - One population of 50 parrot neurons
    - All neurons receive an input spike from a spike generator at 0.1 ms
    - All neurons produce an output spike at 0.2 ms; the input spikes are transmitted locally from spike generator to parrot
    - Neurons are connected to other neurons in same population with delay 0.1 ms
    - Thus spikes fired by parrots at 0.2 ms evoke output from parrots at 0.3 ms; these spikes are transmitted via MPI
    - Sources are given as two arrays (good/bad) and connected one-to-one to targets 1..50, in-degree 1
        - For both good and bad sources, several sources have an out-degree > 1
    - Expected behavior when simulating for 0.3 ms: Each parrot neuron spikes once at 0.2 ms and once at 0.3 ms
    - Observed behavior
        - For the "good" sources, NEST behaves as expected
        - For the "bad" sources, 17 additional spikes are fired at 0.3 ms
- Analysis indicates that problems are related to communication of target information from post- to presynaptic side
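
As a point of reference for the checks below, the expected raster can be written down directly. This is a minimal sketch using plain `(time, sender)` tuples instead of the spike recorder DataFrames; the helper name `expected_spikes` is introduced here for illustration only.

```python
N_PARROTS = 50  # population size stated in the test description


def expected_spikes(n=N_PARROTS, times=(0.2, 0.3)):
    """Return (time, sender) pairs, sorted by time and then sender."""
    return [(t, nid) for t in times for nid in range(1, n + 1)]


spikes = expected_spikes()
print(len(spikes))       # total number of expected spikes
print(len(spikes) + 17)  # total observed in the bad two-rank case
```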

## Problem documentation

In [1]:
import pandas as pd
import numpy as np
import pickle

### Load source information

NB: Requires NEST in Python path.

In [2]:
from run_bad_connections import good_sources, bad_sources


              -- N E S T --
  Copyright (C) 2004 The NEST Initiative

 Version: def_debug@bbe555867
 Built: Feb 28 2023 00:47:45

 This program is provided AS IS and comes with
 NO WARRANTY. See the file LICENSE for details.

 Problems or suggestions?
   Visit https://www.nest-simulator.org

 Type 'nest.help()' to find out more about NEST.



In [4]:
print(good_sources)

[45, 50, 37, 13, 47, 29, 9, 46, 15, 10, 15, 38, 34, 29, 47, 45, 14, 23, 35, 1, 44, 3, 20, 46, 46, 13, 3, 49, 3, 48, 5, 9, 15, 28, 30, 25, 40, 30, 16, 3, 40, 40, 24, 40, 40, 17, 50, 32, 43, 42]


In [5]:
print(bad_sources)

[10, 49, 41, 41, 9, 44, 19, 46, 9, 25, 11, 33, 46, 37, 36, 27, 45, 29, 15, 27, 21, 50, 27, 38, 3, 5, 38, 3, 41, 49, 42, 37, 36, 45, 5, 3, 21, 29, 9, 30, 34, 40, 35, 44, 41, 48, 44, 27, 36, 47]


### Load spike output from test executions

- Data created by running
    ```
    mpirun -np [1|2] python run_bad_connections.py . True [good|bad]
    ```
  and moving output files manually to `[good|bad]_log` directories
- Files `1-0` and `2-[0|1]` contain pickled spike recorder output, named `<number of ranks>-<rank>`. These are the same files that `test_bad_connections.py` uses to transfer results between runner and checker.
- Code below combines data from ranks and sorts to allow comparison; this is the same check as performed by the test.

#### Good case

In [7]:
g1 = pickle.load(open('good_log/1-0', 'rb')).sort_values(['times', 'senders'], ignore_index=True)
g20 = pickle.load(open('good_log/2-0', 'rb'))
g21 = pickle.load(open('good_log/2-1', 'rb'))
g2 = pd.concat([g20, g21], ignore_index=True).sort_values(['times', 'senders'], ignore_index=True)

In [9]:
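# all(df) only iterates over the column labels, so reduce the element-wise comparison explicitly
bool((g1 == g2).all().all())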

True

#### Bad case

In [10]:
b1 = pickle.load(open('bad_log/1-0', 'rb')).sort_values(['times', 'senders'], ignore_index=True)
b20 = pickle.load(open('bad_log/2-0', 'rb'))
b21 = pickle.load(open('bad_log/2-1', 'rb'))
b2 = pd.concat([b20, b21], ignore_index=True).sort_values(['times', 'senders'], ignore_index=True)

- When running on a single rank, behavior is as expected and identical to the good case

In [11]:
# reduce over rows and columns; all(df) would only test the column labels
bool((g1 == b1).all().all())

True

- Spikes at 0.2 ms are correct on two MPI ranks as well (locally transmitted spikes)

In [12]:
# reduce over rows and columns; all(df) would only test the column labels
bool((b1.loc[b1.times==0.2] == b2.loc[b2.times==0.2]).all().all())

True

#### Problems in bad case on two ranks at 0.3 ms
- In the good case / on a single rank, each neuron fires a single spike at 0.3 ms

In [14]:
all(b1.loc[b1.times==0.3].senders == range(1, 51))

True

- For the bad case and two ranks, we have extra spikes at 0.3 ms

In [29]:
b2_03_s = b2.loc[b2.times==0.3].senders.values

In [30]:
len(b2_03_s)

67

- 17 extra spikes
- Confirm that all neurons have spiked

In [31]:
set(b2_03_s) == set(range(1, 51))

True

- Find neurons that have spiked multiple times

In [46]:
extra_spike_nrns = b2_03_s[:-1][(b2_03_s[1:] - b2_03_s[:-1]) == 0]
print(extra_spike_nrns)

[ 1  6  8 13 15 22 24 27 31 33 40 41 42 44 46 47 49]
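
The difference trick above relies on the senders being sorted and would list a neuron twice if it fired three times. A minimal order-independent sketch using `collections.Counter` is shown here; the helper name `repeated_senders` and the toy input are illustrative only.

```python
from collections import Counter


def repeated_senders(senders):
    """Return the sorted ids that occur more than once, each listed once."""
    counts = Counter(senders)
    return sorted(nid for nid, n in counts.items() if n > 1)


# Hypothetical toy input: neurons 1 and 3 spike twice, in arbitrary order
toy = [3, 1, 2, 1, 3, 4]
print(repeated_senders(toy))
```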


- The remaining neurons have no extra spikes

In [45]:
no_extra_spike_nrns = sorted(set(range(1, 51)) - set(extra_spike_nrns))
print(no_extra_spike_nrns)

[2, 3, 4, 5, 7, 9, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 30, 32, 34, 35, 36, 37, 38, 39, 43, 45, 48, 50]


- Obtain source-target pairs for connections with and without extra spikes
    - `-1` to translate from node id to array index

In [52]:
extra_spike_conns = sorted((bad_sources[tgt-1], tgt) for tgt in extra_spike_nrns)
print(extra_spike_conns)

[(10, 1), (30, 40), (34, 41), (36, 15), (36, 33), (36, 49), (38, 24), (38, 27), (40, 42), (42, 31), (44, 6), (44, 44), (44, 47), (46, 8), (46, 13), (48, 46), (50, 22)]


In [53]:
no_extra_spike_conns = sorted((bad_sources[tgt-1], tgt) for tgt in no_extra_spike_nrns)
print(no_extra_spike_conns)

[(3, 25), (3, 28), (3, 36), (5, 26), (5, 35), (9, 5), (9, 9), (9, 39), (11, 11), (15, 19), (19, 7), (21, 21), (21, 37), (25, 10), (27, 16), (27, 20), (27, 23), (27, 48), (29, 18), (29, 38), (33, 12), (35, 43), (37, 14), (37, 32), (41, 3), (41, 4), (41, 29), (41, 45), (45, 17), (45, 34), (47, 50), (49, 2), (49, 30)]


- Note that several sources occur several times
- Reduce to unique sources in both groups

In [54]:
extra_spike_sources = sorted(set(s for s, t in extra_spike_conns))
extra_spike_sources

[10, 30, 34, 36, 38, 40, 42, 44, 46, 48, 50]

In [55]:
no_extra_spike_sources = sorted(set(s for s, t in no_extra_spike_conns))
no_extra_spike_sources

[3, 5, 9, 11, 15, 19, 21, 25, 27, 29, 33, 35, 37, 41, 45, 47, 49]

### Conclusion

Sources lead to extra spikes if and only if they have even node ids, i.e., reside on rank 0.
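
This conclusion can be cross-checked directly against the connectivity. The sketch below copies `bad_sources` and the list of neurons with extra spikes from the outputs above and confirms that the targets of even-numbered sources are exactly the neurons with extra spikes.

```python
# Copied from the printed outputs above
bad_sources = [10, 49, 41, 41, 9, 44, 19, 46, 9, 25, 11, 33, 46, 37, 36, 27, 45,
               29, 15, 27, 21, 50, 27, 38, 3, 5, 38, 3, 41, 49, 42, 37, 36, 45,
               5, 3, 21, 29, 9, 30, 34, 40, 35, 44, 41, 48, 44, 27, 36, 47]
extra_spike_nrns = [1, 6, 8, 13, 15, 22, 24, 27, 31, 33, 40, 41, 42, 44, 46, 47, 49]

# Targets are node ids 1..50, connected one-to-one to the source list
even_src_targets = [tgt for tgt, src in enumerate(bad_sources, start=1)
                    if src % 2 == 0]

print(even_src_targets == extra_spike_nrns)
print(len(even_src_targets))  # number of connections with a source on rank 0
```

The count agrees with the 17 extra spikes, i.e., every connection with a source on rank 0 delivers its spike twice.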

## Trace process of building target tables

In [56]:
tgts = range(1, 51)
conns_bad = pd.DataFrame(zip(bad_sources, tgts), columns=['Src', 'Tgt'])
conns_good = pd.DataFrame(zip(good_sources, tgts), columns=['Src', 'Tgt'])

In [57]:
def add_vp(conns):
    """For each connection, add source and target virtual process, rank, and thread"""
    conns['Sv'] = conns.Src % 4
    conns['Tv'] = conns.Tgt % 4

    conns['Sr'] = conns.Sv % 2
    conns['Tr'] = conns.Tv % 2

    conns['St'] = (conns.Sv // 2) % 2
    conns['Tt'] = (conns.Tv // 2) % 2
    
    return conns

In [59]:
conns_bad = add_vp(conns_bad)
conns_good = add_vp(conns_good)
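
For a single node, the mapping used in `add_vp` can be sketched as a scalar function. It assumes, as the notebook does, round-robin distribution of nodes over 4 virtual processes on 2 ranks; the function name `rank_thread` is chosen here for illustration.

```python
def rank_thread(node_id, n_vp=4, n_mpi=2):
    """Return (rank, thread) for a node id under round-robin distribution."""
    vp = node_id % n_vp
    return vp % n_mpi, vp // n_mpi


# Source 44 and its targets, which are traced in detail further below
for nid in (44, 6, 47):
    print(nid, rank_thread(nid))
```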

### Number of connections by source rank and target rank

In [62]:
def s_t_rank_count(conns, num_ranks=2):
    print('Source  Target  Num of')
    print('  rank    rank   conns')
    print('----------------------')
    for sr in range(num_ranks):
        for tr in range(num_ranks):
            print(f'{sr:6d}{tr:8d}{sum((conns.Sr == sr) & (conns.Tr == tr)):8d}')

In [63]:
s_t_rank_count(conns_bad)

Source  Target  Num of
  rank    rank   conns
----------------------
     0       0       8
     0       1       9
     1       0      17
     1       1      16


In [64]:
s_t_rank_count(conns_good)

Source  Target  Num of
  rank    rank   conns
----------------------
     0       0      12
     0       1      12
     1       0      13
     1       1      13


### Class representing connection tables

In [306]:
class ConnTables:
    """
    Build the tables that the NEST kernel constructs while creating and exchanging connections.
    
    All tables in this class have an outer MPI-rank dimension which represents the MPI rank on
    which the corresponding table would be built. This dimension is not present in the corresponding
    NEST code.
    
    The syn_id dimension is not represented in this class.
    
    - source_table
        - The SourceTable built by repeated add_node() calls. It is always sorted here.
        - In NEST, entries only contain source ID and primary flag
        - Here we store source and target id and target thread for information purposes.
    - compressible_sources
        - Table as constructed by SourceTable::collect_compressible_sources()
        - One entry per unique source neuron on thread
        - Corresponds to SourceTable::compressible_sources_
        - Compare with dumped csrc entries
    - compressed_spike_data_map & compressed_spike_data
        - Tables as constructed by SourceTable::fill_compressed_spike_data()
        - ..._map maps source node IDs to entries in compressed_spike_data (csd)
        - csd[rank(not in NEST)][source index from csd_map][target_thread] contains
            local conn ID and target thread
        - Compare with dumped csdm and csd entries
    - target_send_buffers
        - as built by ConnectionManager::fill_target_buffer(), but without resizing/blocking
    """
    
    def __init__(self, conns, n_vp, n_mpi):
        """
        conns: (src, tgt) pairs
        n_vp: number of virtual processes
        n_mpi: number of mpi ranks
        """
        
        assert n_vp % n_mpi == 0
        
        self.conns = list(conns)
        self.n_vp = n_vp
        self.n_mpi = n_mpi
        self.n_thr = n_vp // n_mpi
        
        self._build_source_table()
        self._collect_compressible_sources_per_thread()
        self._compress_source_data_per_rank()
        self._build_send_buffers()
        self._build_target_table()
        
    def _get_rank_thread(self, nid):
        vp = nid % self.n_vp
        rk = vp % self.n_mpi
        tr = vp // self.n_mpi
        return rk, tr
    
    def _build_source_table(self):
        self.source_table = [[[] for _ in range(self.n_thr)] for _ in range(self.n_mpi)]
        for src, tgt in self.conns:
            s_rnk, s_thr = self._get_rank_thread(src)
            t_rnk, t_thr = self._get_rank_thread(tgt)
            self.source_table[t_rnk][t_thr].append((src, tgt, t_thr))

        for l in self.source_table:
            for ll in l:
                ll.sort()
            
    @staticmethod        
    def _compress_one_thread(targets_source_tab):
        """Helper for _collect_compressible_sources_per_thread()"""

        ctab = []
        lcid = -1
        last_gid = -1
        for src, tgt, tgt_thr in targets_source_tab:
            lcid += 1
            if src != last_gid:
                last_gid = src
                ctab.append((src, (tgt_thr, lcid)))
        return ctab

    def _collect_compressible_sources_per_thread(self):
        """
        Compress sources on each thread separately, so each src appears only once.
        """
        self.compressible_sources = [
            [self._compress_one_thread(thr_tab) for thr_tab in rank_tab] 
            for rank_tab in self.source_table]

    def _compress_source_data_per_rank(self):
        
        self.compressed_spike_data_map = []
        self.compressed_spike_data = []
        
        for comp_sources in self.compressible_sources:  # outer loop over MPI processes
            cmap = {}
            csd = []
            for thread_idx, comp_sources_on_thread in enumerate(comp_sources):
                for src, (tgt_thr, lcid) in comp_sources_on_thread:
                    assert thread_idx == tgt_thr   # consistency check
                    if src not in cmap:
                        cmap[src] = len(csd)
                        csd.append([[] for _ in range(self.n_thr)])
                    six = cmap[src]
                    csd[six][tgt_thr].append({'lcid': lcid, 'tgt_thr': tgt_thr})
            self.compressed_spike_data_map.append(cmap)
            self.compressed_spike_data.append(csd)
            
    def _build_send_buffers(self):
        """
        From csd/csdm, compose send buffers for presynaptic exchange.
        
        Logic:
         - Each rank has one send buffer for each other rank
         - Go through compressed_spike_data_map
         - For each source neuron, determine rank responsible for source neuron
         - Append csd_map entry to send buffer for that rank
         - After exchange, each rank then will have csd_map entries for all spikes it needs to send
        """
        
        self.target_send_buffers = []

        for csd_map in self.compressed_spike_data_map:  # outer loop over MPI processes
            send_buffers = [[] for _ in range(self.n_mpi)]  # one buffer for each rank to send to
            for src, tgt_data in csd_map.items():
                s_rnk, _ = self._get_rank_thread(src)
                send_buffers[s_rnk].append({'s': src, 'ci': tgt_data})
            self.target_send_buffers.append(send_buffers)
            
    def _build_target_table(self):
        """
        From target_send_buffers build target_table as done by EventDeliveryManager::distribute_target_data_buffers_().
        
        In the table here, we use the absolute node id, not the local id. Since we do not use the local id, we use
        a dict indexed by node id instead of a vector indexed by lid. Map elements are lists of Target entries.
        """
        
        self.target_table = []
        for rk in range(self.n_mpi):
            incoming = []
            for t_rk, in_buffer in enumerate(self.target_send_buffers):
                for entry in in_buffer[rk]:
                    entry['tgt_rk'] = t_rk
                    incoming.append(entry)
            # incoming now has all connection information sent to this rank
            tt = [{} for _ in range(self.n_thr)]
            for entry in incoming:
                s = entry['s']
                ci = entry['ci']
                _, s_thr = self._get_rank_thread(s)
                if s not in tt[s_thr]:
                    tt[s_thr][s] = []
                tt[s_thr][s].append((entry['tgt_rk'], ci))
            self.target_table.append(tt)
            
    def print_source_table(self):
        print('Source Table')
        for rk, rst in enumerate(self.source_table):
            for tr, trst in enumerate(rst):
                s, t, _ = zip(*trst)
                lbl = f'Rank {rk}, Thread {tr}'
                w_lbl = len(lbl)
                print(lbl, '  Idx:', end='')
                for v in range(len(s)): print(f'{v:4d}', end='')
                print()
                print(' '*w_lbl, '  Src:', end='')
                for v in s: print(f'{v:4d}', end='')
                print()
                print(' '*w_lbl, '  Tgt:', end='')
                for v in t: print(f'{v:4d}', end='')
                print()
                print()

    def print_compressible_sources(self):
        print('Compressible Sources')
        for rk, rst in enumerate(self.compressible_sources):
            for tr, trst in enumerate(rst):
                s, ix = zip(*trst)
                lbl = f'Rank {rk}, Thread {tr}'
                w_lbl = len(lbl)
                print(lbl, '       Src:', end='')
                for v in s: print(f'{v:4d}', end='')
                print()
                print(' '*w_lbl, '  1st LCID:', end='')
                for v in ix: print(f'{v[1]:4d}', end='')
                print()
                print()

    def print_compressed_spike_data_map(self):
        print('Compressed Spike Data Map')
        for rk, csdm in enumerate(self.compressed_spike_data_map):
            s, ix = zip(*sorted(csdm.items()))
            lbl = f'Rank {rk}'
            w_lbl = len(lbl)
            print(lbl, '     Src:', end='')
            for v in s: print(f'{v:4d}', end='')
            print()
            print(' '*w_lbl, ' CSD Idx:', end='')
            for v in ix: print(f'{v:4d}', end='')
            print()
            print()

    def print_compressed_spike_data(self):
        print('Compressed Spike Data')
        for rk, csd in enumerate(self.compressed_spike_data):
            n = len(csd)
            lbl = f'Rank {rk}'
            w_lbl = len(lbl)
            print(lbl, '   Idx:', end='')
            for v in range(n): print(f'{v:6d} |', end='')
            print()
            print(' '*w_lbl, ' TT LC:', end='')
            for v in csd: 
                if v[0]:
                    d = v[0][0]
                    print(f'{d["tgt_thr"]:3d}{d["lcid"]:3d} |', end='')
                elif v[1]:
                    d = v[1][0]
                    print(f'{d["tgt_thr"]:3d}{d["lcid"]:3d} |', end='')
            print()
            print(' '*w_lbl, ' TT LC:', end='')
            for v in csd: 
                if v[0] and v[1]:
                    d = v[1][0]
                    print(f'{d["tgt_thr"]:3d}{d["lcid"]:3d} |', end='')
                else:
                    print('       |', end='')
            print()
            print()
            
    def print_target_send_buffers(self):
        print('Target send buffers')
        for from_rk, rst in enumerate(self.target_send_buffers):
            for to_rk, trst in enumerate(rst):
                df = pd.DataFrame.from_records(trst)
                lbl = f'From rank {from_rk} To rank {to_rk}:'
                w_lbl = len(lbl)
                print(lbl, '     Src:', end='')
                for v in df.s: print(f'{v:4d}', end='')
                print()
                print(' '*w_lbl, ' CSD Idx:', end='')
                for v in df.ci: print(f'{v:4d}', end='')
                print()
                print()
                
    def print_target_send_buffers_sizes(self):
        print('Target send buffers sizes')
        for from_rk, rst in enumerate(self.target_send_buffers):
            for to_rk, trst in enumerate(rst):
                print(f'From rank {from_rk} To rank {to_rk}: {len(trst):3d}')
                
    def print_target_table(self):
        print('Target Table')
        for rk, ttr in enumerate(self.target_table):
            for thr, tt in enumerate(ttr):
                s, tl = zip(*sorted(tt.items()))
                lbl = f'Rank {rk}, Thread {thr}'
                w_lbl = len(lbl)
                print(lbl, '   Src:', end='')
                for v in s: print(f'{v:6d} |', end='')
                print()
                print(' '*w_lbl, ' TR CI:', end='')
                for v in tl:
                    assert 1 <= len(v) <= 2, "Print code cannot handle anything else"
                    print(f'{v[0][0]:3d}{v[0][1]:3d} |', end='')
                print()
                print(' '*w_lbl, ' TR CI:', end='')
                for v in tl:
                    if len(v) > 1:
                        print(f'{v[1][0]:3d}{v[1][1]:3d} |', end='')
                    else:
                        print('       |', end='')
                print()
                print()
                
    def print_connectivity(self):
        cm = {}
        for s, t in self.conns:
            if s not in cm:
                cm[s] = []
            cm[s].append(t)
        mx = max(len(tl) for tl in cm.values())
        s, tl = zip(*sorted(cm.items()))
        print('Src:', end='')
        for v in s: print(f'{v:3d} |', end='')
        print()
        for tr in range(mx):
            print('Tgt:', end='')
            for tll in tl:
                try:
                    print(f'{tll[tr]:3d} |', end='')
                except IndexError:
                    print('    |', end='')
            print()

                


## Trace spike transmission 

- Build full connectivity tables for bad connectivity

In [307]:
bad_ct = ConnTables(zip(bad_sources, tgts), 4, 2)

- Look first at overall connectivity.
- Since we know that spikes from even-numbered neurons cause problems, consider 44 as source. It has targets 6, 44, and 47.

In [308]:
bad_ct.print_connectivity()

Src:  3 |  5 |  9 | 10 | 11 | 15 | 19 | 21 | 25 | 27 | 29 | 30 | 33 | 34 | 35 | 36 | 37 | 38 | 40 | 41 | 42 | 44 | 45 | 46 | 47 | 48 | 49 | 50 |
Tgt: 25 | 26 |  5 |  1 | 11 | 19 |  7 | 21 | 10 | 16 | 18 | 40 | 12 | 41 | 43 | 15 | 14 | 24 | 42 |  3 | 31 |  6 | 17 |  8 | 50 | 46 |  2 | 22 |
Tgt: 28 | 35 |  9 |    |    |    |    | 37 |    | 20 | 38 |    |    |    |    | 33 | 32 | 27 |    |  4 |    | 44 | 34 | 13 |    |    | 30 |    |
Tgt: 36 |    | 39 |    |    |    |    |    |    | 23 |    |    |    |    |    | 49 |    |    |    | 29 |    | 47 |    |    |    |    |    |    |
Tgt:    |    |    |    |    |    |    |    |    | 48 |    |    |    |    |    |    |    |    |    | 45 |    |    |    |    |    |    |    |    |


- Transmission starts with the target table.
- We find source 44 in the target table for Rank 0, Thread 0.
- It has two entries: the first points to target rank 0, compressed spike data (csd) entry 7, the second to target rank 1, csd entry 17

In [309]:
bad_ct.print_target_table()

Target Table
Rank 0, Thread 0    Src:    36 |    40 |    44 |    48 |
                  TR CI:  1  5 |  0 12 |  0  7 |  0 15 |
                  TR CI:       |       |  1 17 |       |

Rank 0, Thread 1    Src:    10 |    30 |    34 |    38 |    42 |    46 |    50 |
                  TR CI:  1  2 |  0  2 |  1  4 |  0  5 |  1 16 |  0  8 |  0 17 |
                  TR CI:       |       |       |  1 15 |       |  1  8 |       |

Rank 1, Thread 0    Src:     5 |     9 |    21 |    25 |    29 |    33 |    37 |    41 |    45 |    49 |
                  TR CI:  0  9 |  1  1 |  1  3 |  0 10 |  0 11 |  0  3 |  0  4 |  0  6 |  0 13 |  0 16 |
                  TR CI:  1  9 |       |       |       |       |       |       |  1  6 |  1  7 |       |

Rank 1, Thread 1    Src:     3 |    11 |    15 |    19 |    27 |    35 |    47 |
                  TR CI:  0  0 |  1 10 |  1 11 |  1 12 |  0  1 |  1 14 |  0 14 |
                  TR CI:  1  0 |       |       |       |  1 13 |       |       |



In the compressed spike data, we find
- on rank 0, index 7, entries pointing to thread 0, LCID 10 and thread 1, LCID 6
- on rank 1, index 17, an entry pointing to thread 1, LCID 11

In [310]:
bad_ct.print_compressed_spike_data()

Compressed Spike Data
Rank 0    Idx:     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10 |    11 |    12 |    13 |    14 |    15 |    16 |    17 |
        TT LC:  0  0 |  0  2 |  0  5 |  0  6 |  0  7 |  0  8 |  0  9 |  0 10 |  0 11 |  1  0 |  1  1 |  1  2 |  1  5 |  1  7 |  1  8 |  1  9 |  1 10 |  1 12 |
        TT LC:       |       |       |       |  1  4 |       |       |  1  6 |       |       |       |       |       |       |       |       |       |       |

Rank 1    Idx:     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10 |    11 |    12 |    13 |    14 |    15 |    16 |    17 |
        TT LC:  0  0 |  0  1 |  0  3 |  0  4 |  0  6 |  0  7 |  0  9 |  0 11 |  0 12 |  1  0 |  1  2 |  1  3 |  1  4 |  1  5 |  1  6 |  1  8 |  1 10 |  1 11 |
        TT LC:       |  1  1 |       |       |       |  1  7 |  1  9 |       |       |       |       |       |       |       |       |       |       |       |



- On rank 0, thread 0, LCID 10, we find target 44
- On rank 0, thread 1, LCID 6, we find target 6
- On rank 1, thread 1, LCID 11, we find target 47

Thus, we have delivered to all targets of source 44.

We use the source table to look up targets here only for convenience; actual spike delivery is via the connection infrastructure.
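
The lookup chain for source 44 can be sketched with excerpts copied from the printed tables. The dict layouts below are chosen for readability and do not reflect NEST's actual data structures.

```python
# Target table entry for source 44 on rank 0, thread 0: (target rank, csd index)
target_table_entry = [(0, 7), (1, 17)]

# Compressed spike data excerpts: (rank, csd index) -> [(target thread, LCID)]
csd = {(0, 7): [(0, 10), (1, 6)],
       (1, 17): [(1, 11)]}

# Source table excerpts: (rank, thread, LCID) -> (source, target)
source_table = {(0, 0, 10): (44, 44),
                (0, 1, 6): (44, 6),
                (1, 1, 11): (44, 47)}


def targets_of(entries):
    """Follow target table -> compressed spike data -> source table."""
    found = []
    for rank, csd_idx in entries:
        for thread, lcid in csd[(rank, csd_idx)]:
            _, tgt = source_table[(rank, thread, lcid)]
            found.append(tgt)
    return sorted(found)


print(targets_of(target_table_entry))
```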

In [311]:
bad_ct.print_source_table()

Source Table
Rank 0, Thread 0   Idx:   0   1   2   3   4   5   6   7   8   9  10  11
                   Src:   3   3  27  27  27  30  33  37  38  41  44  46
                   Tgt:  28  36  16  20  48  40  12  32  24   4  44   8

Rank 0, Thread 1   Idx:   0   1   2   3   4   5   6   7   8   9  10  11  12
                   Src:   5  25  29  29  37  40  44  45  47  48  49  49  50
                   Tgt:  26  10  18  38  14  42   6  34  50  46   2  30  22

Rank 1, Thread 0   Idx:   0   1   2   3   4   5   6   7   8   9  10  11  12
                   Src:   3   9   9  10  21  21  34  36  36  41  41  45  46
                   Tgt:  25   5   9   1  21  37  41  33  49  29  45  17  13

Rank 1, Thread 1   Idx:   0   1   2   3   4   5   6   7   8   9  10  11
                   Src:   5   9  11  15  19  27  35  36  38  41  42  44
                   Tgt:  35  39  11  19   7  23  43  15  27   3  31  47



- For a different case, consider source 3 with targets 25, 28 and 36.
    - From the target table of rank 1/thread 1, we find entries for rank 0, csd index 0 and rank 1, csd index 0.
    - Both point to target thread 0, LCID 0
    - On rank 0/thread 0 we find target 28 at LCID 0. The next entry has source 3 as well, so we also deliver to 36
    - On rank 1/thread 0 we find target 25 at LCID 0.
    - We have found all targets, 25, 28 and 36.
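
Delivery to several targets of one source on the same thread can be sketched as a walk over consecutive source table entries. In NEST the walk is controlled by a "has more targets" marker in the connection objects; comparing source ids, as done here, is a stand-in for that marker, and `deliver_from` is a name introduced for this sketch.

```python
# Rank 0, thread 0 rows of the source table printed above
src = [3, 3, 27, 27, 27, 30, 33, 37, 38, 41, 44, 46]
tgt = [28, 36, 16, 20, 48, 40, 12, 32, 24, 4, 44, 8]


def deliver_from(first_lcid, src, tgt):
    """Collect targets from first_lcid onward while the source stays the same."""
    source = src[first_lcid]
    lcid = first_lcid
    delivered = []
    while lcid < len(src) and src[lcid] == source:
        delivered.append(tgt[lcid])
        lcid += 1
    return delivered


print(deliver_from(0, src, tgt))  # targets of source 3 on this thread
print(deliver_from(2, src, tgt))  # targets of source 27 on this thread
```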

### How are the tables built?

- While connections are created on the target process, source_table is built in parallel to connection infrastructure

In [312]:
bad_ct.print_source_table()

Source Table
Rank 0, Thread 0   Idx:   0   1   2   3   4   5   6   7   8   9  10  11
                   Src:   3   3  27  27  27  30  33  37  38  41  44  46
                   Tgt:  28  36  16  20  48  40  12  32  24   4  44   8

Rank 0, Thread 1   Idx:   0   1   2   3   4   5   6   7   8   9  10  11  12
                   Src:   5  25  29  29  37  40  44  45  47  48  49  49  50
                   Tgt:  26  10  18  38  14  42   6  34  50  46   2  30  22

Rank 1, Thread 0   Idx:   0   1   2   3   4   5   6   7   8   9  10  11  12
                   Src:   3   9   9  10  21  21  34  36  36  41  41  45  46
                   Tgt:  25   5   9   1  21  37  41  33  49  29  45  17  13

Rank 1, Thread 1   Idx:   0   1   2   3   4   5   6   7   8   9  10  11
                   Src:   5   9  11  15  19  27  35  36  38  41  42  44
                   Tgt:  35  39  11  19   7  23  43  15  27   3  31  47



- On every thread, we then check for repeated sources and create a map of sources to the first index in the source table for the given thread with a connection for a given source
- Not shown here is that a "has more targets" marker is set in connection objects
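
The per-thread step reduces to recording the first index at which each source appears in the sorted source list. A minimal sketch for rank 0, thread 0 follows; the function name is illustrative.

```python
def first_lcid_per_source(sources):
    """Map each source to the index of its first entry in a sorted source list."""
    first = {}
    for lcid, s in enumerate(sources):
        first.setdefault(s, lcid)  # keep only the first occurrence
    return first


# Rank 0, thread 0 sources from the source table above
src = [3, 3, 27, 27, 27, 30, 33, 37, 38, 41, 44, 46]
print(first_lcid_per_source(src))
```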

In [313]:
bad_ct.print_compressible_sources()

Compressible Sources
Rank 0, Thread 0        Src:   3  27  30  33  37  38  41  44  46
                   1st LCID:   0   2   5   6   7   8   9  10  11

Rank 0, Thread 1        Src:   5  25  29  37  40  44  45  47  48  49  50
                   1st LCID:   0   1   2   4   5   6   7   8   9  10  12

Rank 1, Thread 0        Src:   3   9  10  21  34  36  41  45  46
                   1st LCID:   0   1   3   4   6   7   9  11  12

Rank 1, Thread 1        Src:   5   9  11  15  19  27  35  36  38  41  42  44
                   1st LCID:   0   1   2   3   4   5   6   7   8   9  10  11



- On each rank, we now integrate the data from the compressible sources tables across the threads of that rank
- We end up with two data structures:
    - For each unique source node, Compressed Spike Data contains a list of entries pointing to the source table (and thus connection infrastructure) location of the first target of the source on any given thread.
    - Compressed Spike Data Map maps source node ids to indices in the Compressed Spike Data array
- See also lookup examples above
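
The merge across threads can be sketched for rank 0, using the compressible sources printed above. Variable names follow the `ConnTables` class; each csd entry is simplified here to a list of `(target thread, first LCID)` tuples.

```python
# Compressible sources on rank 0, per thread: (source, first LCID)
thread0 = [(3, 0), (27, 2), (30, 5), (33, 6), (37, 7), (38, 8), (41, 9),
           (44, 10), (46, 11)]
thread1 = [(5, 0), (25, 1), (29, 2), (37, 4), (40, 5), (44, 6), (45, 7),
           (47, 8), (48, 9), (49, 10), (50, 12)]

csd_map = {}  # source node id -> index into csd
csd = []      # csd[index] -> list of (target thread, first LCID)
for thr, comp in enumerate((thread0, thread1)):
    for s, lcid in comp:
        if s not in csd_map:  # first time this source is seen on this rank
            csd_map[s] = len(csd)
            csd.append([])
        csd[csd_map[s]].append((thr, lcid))

print(csd_map[44], csd[csd_map[44]])  # source with targets on both threads
print(csd_map[5], csd[csd_map[5]])    # source with targets on thread 1 only
```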

In [314]:
bad_ct.print_compressed_spike_data_map()

Compressed Spike Data Map
Rank 0      Src:   3   5  25  27  29  30  33  37  38  40  41  44  45  46  47  48  49  50
        CSD Idx:   0   9  10   1  11   2   3   4   5  12   6   7  13   8  14  15  16  17

Rank 1      Src:   3   5   9  10  11  15  19  21  27  34  35  36  38  41  42  44  45  46
        CSD Idx:   0   9   1   2  10  11  12   3  13   4  14   5  15   6  16  17   7   8



In [315]:
bad_ct.print_compressed_spike_data()

Compressed Spike Data
Rank 0    Idx:     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10 |    11 |    12 |    13 |    14 |    15 |    16 |    17 |
        TT LC:  0  0 |  0  2 |  0  5 |  0  6 |  0  7 |  0  8 |  0  9 |  0 10 |  0 11 |  1  0 |  1  1 |  1  2 |  1  5 |  1  7 |  1  8 |  1  9 |  1 10 |  1 12 |
        TT LC:       |       |       |       |  1  4 |       |       |  1  6 |       |       |       |       |       |       |       |       |       |       |

Rank 1    Idx:     0 |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10 |    11 |    12 |    13 |    14 |    15 |    16 |    17 |
        TT LC:  0  0 |  0  1 |  0  3 |  0  4 |  0  6 |  0  7 |  0  9 |  0 11 |  0 12 |  1  0 |  1  2 |  1  3 |  1  4 |  1  5 |  1  6 |  1  8 |  1 10 |  1 11 |
        TT LC:       |  1  1 |       |       |       |  1  7 |  1  9 |       |       |       |       |       |       |       |       |       |       |       |



- The data in the compressed spike data map is then transmitted, together with the source node id, to the presynaptic rank
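
Routing map entries to the presynaptic rank can be sketched for rank 0 as sender. The map is copied from the output above in insertion order; `source_rank` is a helper introduced here and assumes round-robin distribution over 4 virtual processes on 2 ranks.

```python
# Rank 0's compressed spike data map: source node id -> csd index
csd_map_rank0 = {3: 0, 27: 1, 30: 2, 33: 3, 37: 4, 38: 5, 41: 6, 44: 7, 46: 8,
                 5: 9, 25: 10, 29: 11, 40: 12, 45: 13, 47: 14, 48: 15, 49: 16,
                 50: 17}


def source_rank(node_id, n_vp=4, n_mpi=2):
    """Rank hosting a node under round-robin distribution."""
    return (node_id % n_vp) % n_mpi


send_buffers = [[] for _ in range(2)]  # one buffer per receiving rank
for s, idx in csd_map_rank0.items():
    send_buffers[source_rank(s)].append((s, idx))

print(send_buffers[0])  # entries rank 0 keeps for itself
print(send_buffers[1])  # entries rank 0 sends to rank 1
```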

In [316]:
bad_ct.print_target_send_buffers()

Target send buffers
From rank 0 To rank 0:      Src:  30  38  44  46  40  48  50
                        CSD Idx:   2   5   7   8  12  15  17

From rank 0 To rank 1:      Src:   3  27  33  37  41   5  25  29  45  47  49
                        CSD Idx:   0   1   3   4   6   9  10  11  13  14  16

From rank 1 To rank 0:      Src:  10  34  36  46  38  42  44
                        CSD Idx:   2   4   5   8  15  16  17

From rank 1 To rank 1:      Src:   3   9  21  41  45   5  11  15  19  27  35
                        CSD Idx:   0   1   3   6   7   9  10  11  12  13  14



- From this information, the presynaptic ranks can then build their target tables
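
On the receiving side, the target table groups incoming entries by source and tags each with the rank it came from. The sketch below does this for rank 0 with the buffers copied from the output above; the thread dimension of the real table is left out for brevity.

```python
# Buffers received by rank 0: (source, csd index on the sending rank)
from_rank0 = [(30, 2), (38, 5), (44, 7), (46, 8), (40, 12), (48, 15), (50, 17)]
from_rank1 = [(10, 2), (34, 4), (36, 5), (46, 8), (38, 15), (42, 16), (44, 17)]

target_table = {}  # source node id -> list of (target rank, csd index)
for tgt_rank, buf in enumerate((from_rank0, from_rank1)):
    for s, idx in buf:
        target_table.setdefault(s, []).append((tgt_rank, idx))

print(target_table[44])  # targets on both ranks
print(target_table[36])  # targets on rank 1 only
```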

In [317]:
bad_ct.print_target_table()

Target Table
Rank 0, Thread 0    Src:    36 |    40 |    44 |    48 |
                  TR CI:  1  5 |  0 12 |  0  7 |  0 15 |
                  TR CI:       |       |  1 17 |       |

Rank 0, Thread 1    Src:    10 |    30 |    34 |    38 |    42 |    46 |    50 |
                  TR CI:  1  2 |  0  2 |  1  4 |  0  5 |  1 16 |  0  8 |  0 17 |
                  TR CI:       |       |       |  1 15 |       |  1  8 |       |

Rank 1, Thread 0    Src:     5 |     9 |    21 |    25 |    29 |    33 |    37 |    41 |    45 |    49 |
                  TR CI:  0  9 |  1  1 |  1  3 |  0 10 |  0 11 |  0  3 |  0  4 |  0  6 |  0 13 |  0 16 |
                  TR CI:  1  9 |       |       |       |       |       |       |  1  6 |  1  7 |       |

Rank 1, Thread 1    Src:     3 |    11 |    15 |    19 |    27 |    35 |    47 |
                  TR CI:  0  0 |  1 10 |  1 11 |  1 12 |  0  1 |  1 14 |  0 14 |
                  TR CI:  1  0 |       |       |       |  1 13 |       |       |



### Debugging

- Comparison with debugging output indicates that all steps up to and including building the compressed spike data (map) work.
- Also filling of send buffers and reading them out works in principle.
- But for some reason a resizing of the target data transmission buffers happens, which causes double transmission of connections and thus the problem at hand.
- Note also that the heuristics for determining the initial target buffer size are definitely wrong for compressed spikes and dubious otherwise.
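
The effect of a double transmission can be illustrated with a toy model. This is not NEST code: it simply assumes that a resize leads to target table entries being stored twice, and counts how many spikes each target of source 44 then receives. A parrot neuron is assumed to repeat every spike it receives, so two received spikes would show up as two output spikes.

```python
# Target table entries for source 44: (target rank, csd index)
entries = [(0, 7), (1, 17)]

# Targets reached via each entry, taken from the lookup traced earlier
targets = {(0, 7): [44, 6], (1, 17): [47]}


def delivered_spikes(table_entries):
    """Count spikes arriving at each target for one spike of the source."""
    counts = {}
    for entry in table_entries:
        for t in targets[entry]:
            counts[t] = counts.get(t, 0) + 1
    return counts


print(delivered_spikes(entries))      # entries stored once
print(delivered_spikes(entries * 2))  # hypothetical: entries stored twice
```

Under this assumption, neurons 6, 44 and 47 each receive two spikes, which is consistent with the extra spikes observed for exactly these targets.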