# CSI Raw Data: File Size and Previewability Across Three Data Formats

Data source: the concatenation of all data under the `data` folder in WiMorse.

The fields currently recorded in the raw data are:

* timestamp_low
* rssi_a, rssi_b, rssi_c
* agc
* csi

## Data Formats

### CSV Format

In [1]:
import numpy as np
import pandas as pd

from Bfee import Bfee
from CSVConverter import CSVConverter

csi_all = Bfee.records_from_offline_file("./data/csi_all.dat", timeCount=True)
CSVConverter.dumpIntoCSV(csi_all, timeCount=True)

time costed during reading data: 38.426286697387695 s
time cost of CSV dump: 17.647857427597046 s


In [2]:
dataframe = pd.read_csv("./dump.csv")
dataframe.head()

| Unnamed: 0 | timestamp_low | csi[0 0 0] | csi[0 1 0] | csi[0 2 0] | csi[1 0 0] | csi[1 1 0] | csi[1 2 0] | csi[2 0 0] | csi[2 1 0] | csi[2 2 0] | ... | csi[28 0 0] | csi[28 1 0] | csi[28 2 0] | csi[29 0 0] | csi[29 1 0] | csi[29 2 0] | rssi_a | rssi_b | rssi_c | agc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 963642667 | 5+4j | -3+7j | -1+9j | 3-10j | 11-1j | 15-5j | -14+3j | -9-15j | -17-16j | ... | 14-5j | 15+11j | 19+7j | -6-6j | 2-10j | -1-11j | 38 | 39 | 41 | 27 |
| 1 | 963652668 | 5-3j | 4+5j | 7+4j | -7-6j | 4-10j | 2-14j | -3+12j | -15+1j | -18+7j | ... | 1-13j | 14-7j | 12-12j | -7+3j | -8-5j | -10-3j | 38 | 39 | 41 | 27 |
| 2 | 963662666 | -4+5j | -6-3j | -8-1j | 8+5j | -4+11j | 5 | 3-13j | 16-1j | 20-7j | ... | 13+6j | 2+18j | 8+17j | 2-8j | 9-4j | 8-7j | 38 | 39 | 41 | 27 |
| 3 | 963672669 | -5-3j | 2-7j | -1-9j | -1+10j | -11+2j | -13+7j | 13-5j | 10+14j | 18+13j | ... | -12+8j | -17-8j | -20-2j | 7+4j | 0 | 4+10j | 38 | 39 | 41 | 27 |
| 4 | 963682666 | -5-1j | 6 | -2-8j | 1+8j | -9+5j | -10+9j | 10-7j | 12+9j | 18+6j | ... | -8+10j | -16-3j | -18+3j | 7+2j | 3+9j | 6+8j | 38 | 39 | 41 | 27 |
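
The preview shows that each CSI entry is written as text such as `5+4j`, under a column whose name carries its three-part index (for example `csi[28 1 0]`). As a rough illustration of how such a row could be read back without pandas, here is a minimal sketch using only the standard library. The helper name `parse_csv_row` and the shortened sample line are assumptions made for this example; they are not part of `CSVConverter`.

```python
import csv
import io
import re

# Column names such as "csi[28 1 0]" carry the three-part CSI index.
CSI_COLUMN = re.compile(r"^csi\[(\d+) (\d+) (\d+)\]$")

def parse_csv_row(row):
    """Split one CSV row (a dict) into scalar fields and a CSI dict keyed by index."""
    scalars, csi = {}, {}
    for name, text in row.items():
        match = CSI_COLUMN.match(name)
        if match:
            index = tuple(int(group) for group in match.groups())
            csi[index] = complex(text)  # Python parses "5+4j" directly
        elif name:  # skip the unnamed index column, whose header is empty
            scalars[name] = int(text)
    return scalars, csi

# Shortened sample with only two CSI columns (illustrative, not the full file).
sample = "timestamp_low,csi[0 0 0],csi[0 1 0],rssi_a,agc\n963642667,5+4j,-3+7j,38,27\n"
scalars, csi = parse_csv_row(next(csv.DictReader(io.StringIO(sample))))
print(scalars["timestamp_low"], csi[(0, 1, 0)])
```

Keying the CSI values by index tuple keeps the sketch independent of column order in the file.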


### JSON Format

Format conversion for a single bfee is built into the `Bfee` class; converting a list of bfees uses the `dumpIntoJson` function below.

In [3]:
def dumpIntoJson(bfees, file_name="./dump.json", maxLimitN=0, timeCount=False):
    # bfees: list of bfee records
    # file_name: name of the output file
    # maxLimitN: maximum number of records to write; 0 means no limit
    # timeCount: whether to time the dump

    import os  # needed for the os.path.isfile check below
    from Bfee import Bfee
    assert len(bfees) >= 1
    assert isinstance(bfees[0], Bfee)
    assert maxLimitN >= 0
    assert not os.path.isfile(file_name), "File already exists"

    if timeCount:
        import time
        time_sta = time.time()

    maxLimitN = len(bfees) if maxLimitN == 0 else min(maxLimitN, len(bfees))
    counter = 0
    objStrtList = list()
    for bfee in bfees:
        if counter >= maxLimitN:
            break
        objStrtList.append(bfee.to_json())
        counter += 1

    objsStr = ','.join(objStrtList)
    with open(file_name, "w") as file:
        file.write('[')
        file.write(objsStr)
        file.write("]")

    if timeCount:
        time_end = time.time()
        print("time costed dumping data into json form:", time_end - time_sta, "s")

In [4]:
dumpIntoJson(csi_all, timeCount=True)

time costed dumping data into json form: 18.788880348205566 s


The first `bfee` object in the JSON array of that file is previewed below:

``` json
{
    "timestamp_low":963642667,
    "rssi_a":38,
    "rssi_b":39,
    "rssi_c":41,
    "agc":27,
    "csi":[
        [
            [
                "5+4j"
            ],
            [
                "-3+7j"
            ],
            [
                "-1+9j"
            ]
        ],
        [
            [
                "3-10j"
            ],
            [
                "11-1j"
            ],
            [
                "15-5j"
            ]
        ],
        [
            [
                "-14+3j"
            ],
            [
                "-9-15j"
            ],
            [
                "-17-16j"
            ]
        ],
        [
            [
                "6+15j"
            ],
            [
                "-13+12j"
            ],
            [
                "-13+20j"
            ]
        ],
        [
            [
                "14-13j"
            ],
            [
                "19+10j"
            ],
            [
                "28+5j"
            ]
        ],
        [
            [
                "-12-11j"
            ],
            [
                "7-17j"
            ],
            [
                "2-24j"
            ]
        ],
        [
            [
                "-10+14j"
            ],
            [
                "-19-4j"
            ],
            [
                "-26+2j"
            ]
        ],
        [
            [
                "13+7j"
            ],
            [
                "-1+17j"
            ],
            [
                "4+22j"
            ]
        ],
        [
            [
                "7-13j"
            ],
            [
                "16+2j"
            ],
            [
                "21-5j"
            ]
        ],
        [
            [
                "-11-8j"
            ],
            [
                "3-15j"
            ],
            [
                "-3-20j"
            ]
        ],
        [
            [
                "-8+12j"
            ],
            [
                "-16-3j"
            ],
            [
                "-21+3j"
            ]
        ],
        [
            [
                "12+8j"
            ],
            [
                "-3+16j"
            ],
            [
                "3+21j"
            ]
        ],
        [
            [
                "9-12j"
            ],
            [
                "16+4j"
            ],
            [
                "22-2j"
            ]
        ],
        [
            [
                "-11-9j"
            ],
            [
                "4-16j"
            ],
            [
                "-2-21j"
            ]
        ],
        [
            [
                "-10+11j"
            ],
            [
                "-15-5j"
            ],
            [
                "-21+1j"
            ]
        ],
        [
            [
                "-10+10j"
            ],
            [
                "-15-5j"
            ],
            [
                "-19+1j"
            ]
        ],
        [
            [
                "10+11j"
            ],
            [
                "-6+16j"
            ],
            [
                "1"
            ]
        ],
        [
            [
                "12-10j"
            ],
            [
                "16+8j"
            ],
            [
                "22+2j"
            ]
        ],
        [
            [
                "-10-12j"
            ],
            [
                "8-16j"
            ],
            [
                "1-22j"
            ]
        ],
        [
            [
                "-13+8j"
            ],
            [
                "-15-10j"
            ],
            [
                "-21-4j"
            ]
        ],
        [
            [
                "9+14j"
            ],
            [
                "-9+17j"
            ],
            [
                "-3+22j"
            ]
        ],
        [
            [
                "14-10j"
            ],
            [
                "18+10j"
            ],
            [
                "23+3j"
            ]
        ],
        [
            [
                "-11-17j"
            ],
            [
                "12-21j"
            ],
            [
                "4-27j"
            ]
        ],
        [
            [
                "-18+13j"
            ],
            [
                "-24-11j"
            ],
            [
                "-30-2j"
            ]
        ],
        [
            [
                "16+15j"
            ],
            [
                "-7+25j"
            ],
            [
                "3+29j"
            ]
        ],
        [
            [
                "13-20j"
            ],
            [
                "28+2j"
            ],
            [
                "31-9j"
            ]
        ],
        [
            [
                "-19-7j"
            ],
            [
                "-5-24j"
            ],
            [
                "-14-23j"
            ]
        ],
        [
            [
                "1+19j"
            ],
            [
                "-19+12j"
            ],
            [
                "-15+19j"
            ]
        ],
        [
            [
                "14-5j"
            ],
            [
                "15+11j"
            ],
            [
                "19+7j"
            ]
        ],
        [
            [
                "-6-6j"
            ],
            [
                "2-10j"
            ],
            [
                "-1-11j"
            ]
        ]
    ]
}
```
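
JSON has no complex-number type, which is why the csi values in the preview appear as strings. A minimal sketch of turning them back into Python complex numbers after loading is shown below. The function name `decode_csi` is hypothetical, and the sample record is shortened to the first two outer entries of the preview.

```python
import json

def decode_csi(nested):
    """Recursively convert nested lists of strings like "5+4j" into complex numbers."""
    if isinstance(nested, list):
        return [decode_csi(item) for item in nested]
    return complex(nested)

# Shortened record in the same shape as the preview (first two outer entries only).
text = (
    '{"timestamp_low": 963642667, "agc": 27, '
    '"csi": [[["5+4j"], ["-3+7j"], ["-1+9j"]], [["3-10j"], ["11-1j"], ["15-5j"]]]}'
)
record = json.loads(text)
record["csi"] = decode_csi(record["csi"])
print(record["csi"][1][2][0])
```

Because the recursion follows whatever nesting it finds, the same helper would apply to other array shapes as well.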

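### Binary Format

Assuming `Nrx` and `Ntx` are known (or already recorded in the documentation), the binary record format is as follows:

![Binary data format](./fig_数据存储格式/二进制数据格式.png)

In this layout, the csi entries are *signed* complex numbers, while all other fields are unsigned integers stored in **little-endian** order. The complex-number format is:

![Complex number binary format](./fig_数据存储格式/复数二进制格式.png)

The csi complex values are stored with the last dimension incrementing first. For a $30\times 3 \times 2$ csi array (WiMorse uses $30\times 3 \times 1$), the storage `index` order is: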

```
0 0 0
0 0 1
0 1 0
0 1 1
0 2 0
0 2 1
1 0 0
...
```
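
This ordering is the usual row-major one. As a small check, the sketch below enumerates the indices in that order and computes the position of any index in the stored sequence; the helper names `storage_order` and `flat_offset` are made up for this illustration.

```python
from itertools import product

def storage_order(shape):
    """Yield index tuples with the last dimension incrementing first (row-major)."""
    return product(*(range(n) for n in shape))

def flat_offset(index, shape):
    """Position of `index` within the stored sequence of complex values."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

order = list(storage_order((30, 3, 2)))
print(order[:7])
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 2, 0), (0, 2, 1), (1, 0, 0)]
```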

Format conversion for a single `bfee` is built into the `Bfee` class; converting a list of bfees uses the `dumpIntoBin` function below:

In [5]:
def dumpIntoBin(bfees, file_name="./dump.bin", maxLimitN=0, timeCount=False):
    # bfees: list of bfee records
    # file_name: name of the output file
    # maxLimitN: maximum number of records to write; 0 means no limit
    # timeCount: whether to time the dump

    import os  # needed for the os.path.isfile check below
    from Bfee import Bfee
    assert len(bfees) >= 1
    assert isinstance(bfees[0], Bfee)
    assert maxLimitN >= 0
    assert not os.path.isfile(file_name), "File already exists"

    if timeCount:
        import time
        time_sta = time.time()

    maxLimitN = len(bfees) if maxLimitN == 0 else min(maxLimitN, len(bfees))
    with open(file_name, "wb") as file:
        counter = 0
        for bfee in bfees:
            if counter >= maxLimitN:
                break
            file.write(bfee.to_simple_bytes())
            counter += 1

    if timeCount:
        time_end = time.time()
        print("time costed dumping data into binary form:", time_end - time_sta, "s")

In [6]:
dumpIntoBin(csi_all, timeCount=True)

time costed dumping data into binary form: 20.371546745300293 s


In [7]:
csi_all[0].csi[29,2,0].imag # used for verification

-11.0

The bfee corresponding to `csi_all[0]`, stored in this form, looks like this:

![](./fig_数据存储格式/二进制存储样例（一个bfee记录数据）.png)

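The total length is $4 + 1 + 1 + 1 + 1 + 30 \times 3 \times 1 \times 2 = 188$ bytes (4 for `timestamp_low`, 1 each for `rssi_a`, `rssi_b`, `rssi_c` and `agc`, and 2 per csi complex value). The correctness of the data has been verified.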
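
The 188-byte record length can be reproduced with Python's `struct` module. The sketch below is only an illustration under stated assumptions: the field order (`timestamp_low`, `rssi_a`, `rssi_b`, `rssi_c`, `agc`, then csi) and the real-before-imaginary byte order are guesses for this example, since the authoritative layout is the one in the figures and in `Bfee.to_simple_bytes`. The names `pack_record` and `unpack_record` are hypothetical.

```python
import struct

# Little-endian, no padding: uint32 timestamp, 3 x uint8 rssi, uint8 agc (8 bytes).
HEADER = struct.Struct("<IBBBB")

def pack_record(timestamp_low, rssi, agc, csi_flat):
    """Pack one record; csi_flat is the CSI already flattened in storage order."""
    body = bytearray()
    for value in csi_flat:
        # Assumed layout: signed 8-bit real part, then signed 8-bit imaginary part.
        body += struct.pack("<bb", int(value.real), int(value.imag))
    return HEADER.pack(timestamp_low, *rssi, agc) + bytes(body)

def unpack_record(data):
    """Inverse of pack_record under the same assumed layout."""
    timestamp_low, a, b, c, agc = HEADER.unpack_from(data, 0)
    count = len(data) - HEADER.size
    parts = struct.unpack_from("<%db" % count, data, HEADER.size)
    csi_flat = [complex(real, imag) for real, imag in zip(parts[0::2], parts[1::2])]
    return timestamp_low, (a, b, c), agc, csi_flat

# 30 * 3 * 1 = 90 complex values; the filler values are illustrative only.
csi_flat = [complex(5, 4), complex(-3, 7)] + [complex(-1, -11)] * 88
record = pack_record(963642667, (38, 39, 41), 27, csi_flat)
print(len(record))  # 188
```

Round-tripping a record through `unpack_record` is one simple way to confirm that a reader and a writer agree on the layout.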

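## Data Size Comparison

The figure below compares the sizes of the data recorded in the three formats and of the captured raw data, before and after zip compression: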

![](./fig_数据存储格式/三种格式的数据大小对比.png)

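Based on this figure, and weighing file size, read/write speed, and previewability, I think the existing *CSV* and *binary* formats are the more suitable choices for recording and publishing the data.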
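
For anyone wishing to repeat the size comparison, one possible way to measure a file's size after zip compression is sketched below using the standard `zipfile` module. The compression settings behind the figure are not stated, so DEFLATE is an assumption here, and `zipped_size` is a hypothetical helper name.

```python
import io
import zipfile

def zipped_size(data, name="dump.bin"):
    """Return the size in bytes of a zip archive holding `data` as a single member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return len(buffer.getvalue())

# Repetitive CSV-like text, used only to show the call; real dumps compress differently.
text_form = b"5+4j,-3+7j,-1+9j," * 1000
print(len(text_form), zipped_size(text_form))
```

In practice the bytes would come from reading `dump.csv`, `dump.json` and `dump.bin` in binary mode.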

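## Points to Improve in the Future

Binary I/O should in principle be faster than text, yet the binary dump (about 20.4 s) was slower than both the CSV dump (about 17.6 s) and the JSON dump (about 18.8 s). The format conversion probably accounts for much of that time; this part might be better written in C++.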