## File format inconsistencies between `.bin` and `.pos` files with respect to position data

The file format specification from Axona (http://space-memory-navigation.org/DacqUSBFileFormats.pdf) states that the 20 bytes of position data in each packet header of the `.bin` file are formatted the same way as the samples in the `.pos` file. Specifically, the following should hold (quoted from the manual):

```
The format of the “.pos” file depends on the tracking mode. Each position sample is 20 bytes long, and consists of a 4-byte frame counter (incremented at around 50 Hz, according to the camera sync signal), and then 8 2-byte words. [...] In two-spot mode, they are big_spotx, big_spoty, little_spotx, little_spoty, number_of_pixels_in_big_spot, number_of_pixels_in_little_spot, total_tracked_pixels, and the 8th word is unused. Each word is MSB-first. If a position wasn't tracked (e.g., the light was obscured), then the values for x and y will both be 0x3ff (= 1023).
```
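To make the quoted layout concrete, here is a minimal sketch using Python's `struct` module that packs and decodes one two-spot sample as the manual describes it: a 4-byte frame counter followed by eight 2-byte words, all MSB first. The field names are made up for illustration, the words are treated as signed to match the `>i2` dtypes used in the cells below, and the example values are those of the `.pos` row with frame counter 150121 printed further down.

```python
import struct

# Hypothetical field names, in the two-spot order quoted from the manual
FIELDS = ("frame", "big_spotx", "big_spoty", "little_spotx", "little_spoty",
          "n_big_spot", "n_little_spot", "total_tracked", "unused")

# '>' = big endian (MSB first), 'i' = 4-byte counter, '8h' = eight 2-byte words
SAMPLE_FORMAT = ">i8h"
SAMPLE_SIZE = struct.calcsize(SAMPLE_FORMAT)  # 20 bytes, no padding with '>'


def parse_pos_sample(raw: bytes) -> dict:
    """Decode one 20-byte `.pos` sample into a field-name -> value dict."""
    return dict(zip(FIELDS, struct.unpack(SAMPLE_FORMAT, raw)))


# Pack one sample, then decode it again
sample = struct.pack(SAMPLE_FORMAT, 150121, 137, 42, 1023, 1023, 17, 0, 17, 0)
decoded = parse_pos_sample(sample)

print(SAMPLE_SIZE)       # 20
print(sample[:6].hex())  # 00024a690089 (counter 150121, then big_spotx = 137)
# The little spot was not tracked, so both of its coordinates are 0x3ff
print(decoded["little_spotx"] == 0x3FF, decoded["little_spoty"] == 0x3FF)  # True True
```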

1.) However, when reading the position data from the `.bin` file (the 'Axona Raw' data in https://drive.google.com/drive/u/0/folders/1QEwtyUNJKHzwVLt0WMUlhzLofQDPtGO4), the column order does not appear to match the description:
```
    X     Y    x     y      PX    px     PX+px        unused
[  55,   74, 1023, 1023,    0,   17,       0,            17],
[  56,   76, 1023, 1023,    0,   13,       0,            13],
...
```

Specifically, columns 1 to 4 look sensible, but columns 5 to 8 do not match the manual: columns 6 to 8 should be shifted one place to the left, and column 5 should be moved to column 8. For comparison, this is what the data looks like in the 'Axona_TINT_1ms' example data (https://drive.google.com/drive/u/0/folders/1cKXXuZR-cH49066ZB7xxF9__j2k3xZUu):
```
  X       Y     x      y    PX   px     PX+px        unused
[ 55     74   1023   1023   17   0        17             0]
[ 56     76   1023   1023   13   0        13             0]
...
```
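The reordering described above can be written as a fixed column permutation. A minimal NumPy sketch, applied to the two `.bin` rows of the first table (the name `BIN_TO_POS_ORDER` is only for illustration):

```python
import numpy as np

# `.bin` rows from the first table (X, Y, x, y, PX, px, PX+px, unused)
bin_rows = np.array([[55, 74, 1023, 1023, 0, 17, 0, 17],
                     [56, 76, 1023, 1023, 0, 13, 0, 13]])

# Keep columns 1-4, shift columns 6-8 one place to the left and move
# column 5 to the end (0-based indices)
BIN_TO_POS_ORDER = [0, 1, 2, 3, 5, 6, 7, 4]

pos_rows = bin_rows[:, BIN_TO_POS_ORDER]
print(pos_rows.tolist())
# [[55, 74, 1023, 1023, 17, 0, 17, 0], [56, 76, 1023, 1023, 13, 0, 13, 0]]
```

The result matches the TINT rows. Because columns 5 and 7 are zero in these rows, the data shown cannot distinguish this permutation from others that give the same result, for example swapping columns 5/6 and 7/8.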

2.) In addition, the data in the `.bin` file is little endian, whereas the data in the `.pos` file is big endian (most significant byte first, as described in the file format manual).
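At the byte level, the difference is only the order of the two bytes within each 16-bit word. A small `struct` sketch using X = 137, a value that appears in the outputs below:

```python
import struct

value = 137

little = struct.pack("<h", value)  # LSB first, as observed in the `.bin` file
big = struct.pack(">h", value)     # MSB first, as the manual specifies for `.pos`
print(little.hex(), big.hex())     # 8900 0089

# Decoding little-endian bytes as big endian swaps the two bytes:
# 0x8900 = 35072, which as a signed 16-bit value is 35072 - 65536 = -30464
misread = struct.unpack(">h", little)[0]
print(misread)                     # -30464
```

-30464 is indeed one of the values that shows up when the `.bin` file is read with big-endian dtypes at the end of this section.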

Below are a few code snippets that illustrate this behavior. Substitute the file names as appropriate.


```python
# Full path to the .bin file
bin_filename = '/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/new_session_data/06172021-HPC-B6-RAW/06172021-HPC-B6-RAW.bin'

# Full path to the .pos file
pos_file = '/mnt/d/freelance-work/catalyst-neuro/hussaini-lab-to-nwb/new_session_data/06172021-HPC-B6-UNIT/06172021-HPC-B6-UNIT/06172021-B6-HPC-UNIT.pos'
```

```python
# --- Read `.bin` file data with BinConverter tools and convert to `.pos` file ---

import os
import numpy as np

from BinConverter.core.readBin import get_bin_data, get_raw_pos
from BinConverter.core.CreatePos import create_pos


def get_header_bstring(file):
    """
    Scan file for the occurrence of 'data_start' and return the header,
    up to and including 'data_start', as a byte string

    Parameters
    ----------
    file (str or path): file to be loaded

    Returns
    -------
    bytes: header byte content
    """

    header = b''
    with open(file, 'rb') as f:
        for bin_line in f:
            if b'data_start' in bin_line:
                # Keep whatever precedes the marker on this line, then stop
                end = bin_line.index(b'data_start') + len(b'data_start')
                header += bin_line[:end]
                break
            else:
                header += bin_line
    return header


# Read bin file position data with BinConverter and show first 4 rows
raw_position = get_raw_pos(bin_filename)
print('position data read from `.bin` file with BinConverter tool:\n\n', raw_position[0:4, :].astype(int))


# Save to .pos file
pos_file_from_bin = bin_filename.replace('.bin', '.pos')
create_pos(pos_file_from_bin, raw_position)


# Read .pos file data and show first 4 rows
bytes_packet = 20  # bytes per position sample
footer_size = len('\r\ndata_end\r\n')
header_size = len(get_header_bstring(pos_file_from_bin))
num_bytes = os.path.getsize(pos_file_from_bin) - header_size - footer_size
num_packets = num_bytes // bytes_packet

# Set dtypes (big endian, i.e. MSB first, as described in the file format manual)
pos_dt = np.dtype([('t', ">i4"), ('X', ">i2"), ('Y', ">i2"), ('x', ">i2"), ('y', ">i2"),
                   ('PX', ">i2"), ('px', ">i2"), ('tot_px', ">i2"), ('unused', ">i2")])

# Read position data from the .pos file created from the .bin file
np_pos = np.memmap(
    filename=pos_file_from_bin,
    dtype=pos_dt,
    mode='r',
    offset=header_size,
    shape=(num_packets, ),
)
print('\n\n\nposition data read from `.pos` file with numpy (after converting from .bin with BinConverter):\n\n', np_pos[0:4,])
```

```
position data read from `.bin` file with BinConverter tool:

 [[150121    137     42   1023   1023      0     17      0     17]
 [150122    137     44   1023   1023      0     10      0     10]
 [150123    137     42   1023   1023      0     13      0     13]
 [150124    136     44   1023   1023      0     15      0     15]]



position data read from `.pos` file with numpy (after converting from .bin with BinConverter):

 [(150121, 137, 42, 1023, 1023, 0, 17, 0, 17)
 (150122, 137, 44, 1023, 1023, 0, 10, 0, 10)
 (150123, 137, 42, 1023, 1023, 0, 13, 0, 13)
 (150124, 136, 44, 1023, 1023, 0, 15, 0, 15)]
```
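The header and footer arithmetic in the cell above can also be folded into a single helper. This is a sketch under the same assumptions as that cell (the header ends with `data_start`, samples are 20 bytes, the file ends with `\r\ndata_end\r\n`); `read_pos` is a hypothetical name, not part of BinConverter. It locates the marker with `bytes.find` instead of iterating over lines and reads the whole file into memory, which is fine for `.pos` files but would call for `np.memmap` on much larger files. The demo writes a synthetic file with a made-up header.

```python
import os
import tempfile

import numpy as np

DATA_START = b"data_start"    # assumed last item of the header
FOOTER = b"\r\ndata_end\r\n"  # assumed trailing marker after the samples

# Big-endian 20-byte sample, as in the cell above
POS_DTYPE = np.dtype([("t", ">i4"), ("X", ">i2"), ("Y", ">i2"), ("x", ">i2"),
                      ("y", ">i2"), ("PX", ">i2"), ("px", ">i2"),
                      ("tot_px", ">i2"), ("unused", ">i2")])


def read_pos(path):
    """Return the position samples of a `.pos` file as a structured array."""
    with open(path, "rb") as f:
        content = f.read()
    start = content.find(DATA_START)  # first occurrence = end of header
    if start < 0:
        raise ValueError(f"no {DATA_START!r} marker in {path!r}")
    offset = start + len(DATA_START)
    n_samples = (len(content) - offset - len(FOOTER)) // POS_DTYPE.itemsize
    return np.frombuffer(content, dtype=POS_DTYPE, count=n_samples, offset=offset)


# Demo on a synthetic file: made-up header line, two samples, footer
rows = [(150121, 137, 42, 1023, 1023, 17, 0, 17, 0),
        (150122, 137, 44, 1023, 1023, 10, 0, 10, 0)]
body = np.array(rows, dtype=POS_DTYPE).tobytes()
with tempfile.NamedTemporaryFile(suffix=".pos", delete=False) as tmp:
    tmp.write(b"synthetic header line\r\n" + DATA_START + body + FOOTER)
samples = read_pos(tmp.name)
os.remove(tmp.name)

print(len(samples), samples["Y"].tolist(), samples["PX"].tolist())  # 2 [42, 44] [17, 10]
```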


Illustrate the column mismatch between `.bin` and `.pos` files (`.pos` files follow the description in the file format manual, `.bin` files do not):

```python
# --- Illustrate column mismatch between `.bin` and `.pos` files
#     (`.pos` files follow the description in the file format manual, `.bin` does not) ---
# Uses os, np and get_header_bstring() from the first cell.

# Position sample inside a `.bin` packet header, read as little endian
pos_dt_se = np.dtype([('t', "<i4"), ('X', "<i2"), ('Y', "<i2"), ('x', "<i2"), ('y', "<i2"),
                      ('PX', "<i2"), ('px', "<i2"), ('tot_px', "<i2"), ('unused', "<i2")])

# One 432-byte `.bin` packet: 4-byte ID, packet number, `di`, `si`,
# 20-byte position sample, 384 bytes of ephys data and a 16-byte trailer
bin_dt = np.dtype([('id', "S4"), ('packet', "<i4"), ('di', "<i2"), ('si', "<i2"),
                   ('pos', pos_dt_se),
                   ('ephys', np.byte, 384),
                   ('trailer', np.byte, 16)
])

np_bin = np.memmap(
    filename=bin_filename,
    dtype=bin_dt,
    mode='r',
    offset=0,
)

# Keep only packets with ID 'ADU2'. Index with the boolean array itself:
# wrapping it in a list raises an IndexError on recent NumPy versions.
pos_mask = np_bin['id'] == b'ADU2'

pos_data = np_bin['pos'][pos_mask]

print('Reading .bin position data as little endian with numpy:\n\n', pos_data[0:12, ])


# Read .pos file data and show first 6 rows
bytes_packet = 20  # bytes per position sample
footer_size = len('\r\ndata_end\r\n')
header_size = len(get_header_bstring(pos_file))
num_bytes = os.path.getsize(pos_file) - header_size - footer_size
num_packets = num_bytes // bytes_packet

# Set dtypes (big endian, i.e. MSB first, as described in the file format manual)
pos_dt = np.dtype([('t', ">i4"), ('X', ">i2"), ('Y', ">i2"), ('x', ">i2"), ('y', ">i2"),
                   ('PX', ">i2"), ('px', ">i2"), ('tot_px', ">i2"), ('unused', ">i2")])

# Read position data directly from the .pos file (no conversion)
np_pos = np.memmap(
    filename=pos_file,
    dtype=pos_dt,
    mode='r',
    offset=header_size,
    shape=(num_packets, ),
)
print('\n\nposition data read from `.pos` file with numpy:\n\n', np_pos[0:6,])
```

```
Reading .bin position data as little endian with numpy:

 [(125330, 56, 152, 1023, 1023, 0, 16, 0, 16)
 (150120, 44, 136, 1023, 1023, 0, 11, 0, 11)
 (150120, 44, 136, 1023, 1023, 0, 11, 0, 11)
 (150121, 42, 137, 1023, 1023, 0, 17, 0, 17)
 (150121, 42, 137, 1023, 1023, 0, 17, 0, 17)
 (150122, 44, 137, 1023, 1023, 0, 10, 0, 10)
 (150122, 44, 137, 1023, 1023, 0, 10, 0, 10)
 (150123, 42, 137, 1023, 1023, 0, 13, 0, 13)
 (150123, 42, 137, 1023, 1023, 0, 13, 0, 13)
 (150124, 44, 136, 1023, 1023, 0, 15, 0, 15)
 (150124, 44, 136, 1023, 1023, 0, 15, 0, 15)
 (150125, 43, 137, 1023, 1023, 0, 15, 0, 15)]


position data read from `.pos` file with numpy:

 [(150121, 137, 42, 1023, 1023, 17, 0, 17, 0)
 (150122, 137, 44, 1023, 1023, 10, 0, 10, 0)
 (150123, 137, 42, 1023, 1023, 13, 0, 13, 0)
 (150124, 136, 44, 1023, 1023, 15, 0, 15, 0)
 (150125, 137, 43, 1023, 1023, 15, 0, 15, 0)
 (150126, 138, 45, 1023, 1023, 16, 0, 16, 0)]
```
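Comparing the rows with frame counter 150121 in the two outputs above, the raw little-endian read also has X and Y in the opposite order to the `.pos` file (42, 137 vs 137, 42). For the rows shown, both differences disappear if the eight 16-bit words are taken in pairs and the two words of each pair are swapped; equivalently, each 4-byte group of the `.pos` sample appears byte-reversed in the `.bin` packet. This is only a hypothesis drawn from these few rows, not something the manual states. A quick NumPy check:

```python
import numpy as np

# Row with frame counter 150121: the words after the counter, as printed above
bin_words = np.array([42, 137, 1023, 1023, 0, 17, 0, 17])  # `.bin`, little-endian read
pos_words = np.array([137, 42, 1023, 1023, 17, 0, 17, 0])  # `.pos`, big-endian read

# Take the eight words in pairs and swap the two words of each pair
swapped = bin_words.reshape(-1, 2)[:, ::-1].ravel()
print(np.array_equal(swapped, pos_words))  # True

# Same idea at the byte level: reverse every 4-byte group of the
# big-endian `.pos` sample and decode the result as little endian
be_dt = np.dtype([("t", ">i4"), ("w", ">i2", (8,))])
le_dt = np.dtype([("t", "<i4"), ("w", "<i2", (8,))])
pos_bytes = np.array([(150121, pos_words)], dtype=be_dt).tobytes()
reversed_groups = np.frombuffer(pos_bytes, dtype=np.uint8).reshape(-1, 4)[:, ::-1].tobytes()
as_bin = np.frombuffer(reversed_groups, dtype=le_dt)[0]
print(int(as_bin["t"]), as_bin["w"].tolist())
# 150121 [42, 137, 1023, 1023, 0, 17, 0, 17]
```

The other rows printed above (frame counters 150122 to 150125) follow the same pattern.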


Illustrate the little endianness of the `.bin` file:

```python
# --- Illustrate little endianness of `.bin` file ---

# Read the packets with little-endian dtypes (as in the previous cell)
pos_dt_se = np.dtype([('t', "<i4"), ('X', "<i2"), ('Y', "<i2"), ('x', "<i2"), ('y', "<i2"),
                      ('PX', "<i2"), ('px', "<i2"), ('tot_px', "<i2"), ('unused', "<i2")])

bin_dt = np.dtype([('id', "S4"), ('packet', "<i4"), ('di', "<i2"), ('si', "<i2"),
                   ('pos', pos_dt_se),
                   ('ephys', np.byte, 384),
                   ('trailer', np.byte, 16)
])

np_bin = np.memmap(
    filename=bin_filename,
    dtype=bin_dt,
    mode='r',
    offset=0,
)

pos_mask = np_bin['id'] == b'ADU2'  # boolean array, not wrapped in a list

pos_data = np_bin['pos'][pos_mask]

print('Reading .bin position data as little endian with numpy:\n\n', pos_data[0:6, ])


# Read the same packets with big-endian (MSB-first) dtypes, i.e. the byte
# order the manual describes for the `.pos` file
pos_dt_msb_first = np.dtype([('t', ">i4"), ('X', ">i2"), ('Y', ">i2"), ('x', ">i2"), ('y', ">i2"),
                             ('PX', ">i2"), ('px', ">i2"), ('tot_px', ">i2"), ('unused', ">i2")])

bin_dt_msb_first = np.dtype([('id', "S4"), ('packet', ">i4"), ('di', ">i2"), ('si', ">i2"),
                             ('pos', pos_dt_msb_first),
                             ('ephys', np.byte, 384),
                             ('trailer', np.byte, 16)
])

np_bin_msb_first = np.memmap(
    filename=bin_filename,
    dtype=bin_dt_msb_first,
    mode='r',
    offset=0,
)

# 'id' is a byte string, so the mask computed above applies here as well
pos_data_msb_first = np_bin_msb_first['pos'][pos_mask]

print('\n\nReading .bin position data as big endian with numpy:\n\n', pos_data_msb_first[0:6, ])
```

```
Reading .bin position data as little endian with numpy:

 [(125330, 56, 152, 1023, 1023, 0, 16, 0, 16)
 (150120, 44, 136, 1023, 1023, 0, 11, 0, 11)
 (150120, 44, 136, 1023, 1023, 0, 11, 0, 11)
 (150121, 42, 137, 1023, 1023, 0, 17, 0, 17)
 (150121, 42, 137, 1023, 1023, 0, 17, 0, 17)
 (150122, 44, 137, 1023, 1023, 0, 10, 0, 10)]


Reading .bin position data as big endian with numpy:

 [(-1830223616, 14336, -26624, -253, -253, 0, 4096, 0, 4096)
 ( 1749680640, 11264, -30720, -253, -253, 0, 2816, 0, 2816)
 ( 1749680640, 11264, -30720, -253, -253, 0, 2816, 0, 2816)
 ( 1766457856, 10752, -30464, -253, -253, 0, 4352, 0, 4352)
 ( 1766457856, 10752, -30464, -253, -253, 0, 4352, 0, 4352)
 ( 1783235072, 11264, -30464, -253, -253, 0, 2560, 0, 2560)]
```
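The big-endian numbers above are the little-endian bytes reinterpreted, nothing more. The minimal sketch below reproduces the first big-endian row from the first little-endian row by viewing the same 20 bytes with the other byte order. By contrast, `astype` keeps the values and rewrites the bytes MSB first, which is presumably what writing `.bin` positions into a `.pos` file that follows the manual would require.

```python
import numpy as np

# Same field layout as pos_dt_se / pos_dt_msb_first above
FIELDS = ["t", "X", "Y", "x", "y", "PX", "px", "tot_px", "unused"]
le_dt = np.dtype([(f, "<i4" if f == "t" else "<i2") for f in FIELDS])
be_dt = np.dtype([(f, ">i4" if f == "t" else ">i2") for f in FIELDS])

# First row of the little-endian read above
row = np.array([(125330, 56, 152, 1023, 1023, 0, 16, 0, 16)], dtype=le_dt)

# view(): same bytes, other byte order -> the values printed above
misread = row.view(be_dt)[0].item()
print(misread)  # (-1830223616, 14336, -26624, -253, -253, 0, 4096, 0, 4096)

# astype(): same values, bytes rewritten MSB first
converted = row.astype(be_dt)
print(converted[0].item())            # (125330, 56, 152, 1023, 1023, 0, 16, 0, 16)
print(converted.tobytes()[:4].hex())  # 0001e992
```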
