Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot correctly compress unsigned 64 bit integers #3

Closed
djordjijeK opened this issue Apr 10, 2021 · 6 comments
Closed

Cannot correctly compress unsigned 64 bit integers #3

djordjijeK opened this issue Apr 10, 2021 · 6 comments

Comments

@djordjijeK
Copy link

djordjijeK commented Apr 10, 2021

Hello @ldhulipala,

I am trying to insert the following sequence of elements into a C-Tree (source node 190):

0     2292492
1     5900230
2     7081380
3     9443342
4     45252689
5     55309280
6     72199199
7     167185196
8     173106711
9     190827105
10   274996453
11   279926447
12   282677159
13   397165590
14   484836423
15   636048859
16   681366938
17   752569505
18   777406108
19   794845798
20   835441400
21   948701687
22   971568942
23   1004827021
24   1014104131
25   1092434791
26   1099453513
27   1187698529
28   1252806354
29   1274847185
30   1316093405
31   1350415586
32   1448563966
33   1578075731
34   1657955540
35   1769296280
36   1854422159
37   1973136809
38   2013407205
39   2151104521
40   2153145709
41   2226707853
42   2331731050
43   2376952936
44   2395319823
45   2484324838
46   2515122883
47   2746598551
48   2800314886
49   2822266596
50   2830984936
51   2836947498
52   3110181467
53   3140257526
54   3213983282
55   3312923470
56   3402622332
57   3474041940
58   3516016149
59   3577954138
60   3609606505
61   3648402113
62   4093824822
63   4212399599
64   4240353940

In general, I need 64 bit integers to store my numbers. I enabled the VERTEXLONG flag, thus turning uintV into unsigned long integers. However, when I print the content of a C-Tree (for source 190) I get the following:

0 2292492
1 5900230
2 7081380
3 9443342
4 45252689
5 55309280
6 72199199
7 167185196
8 173106711
9 190827105
10 274996453
11 279926447
12 282677159
13 397165590
14 484836423
15 636048859
16 681366938
17 752569505
18 777406108
19 794845798
20 835441400
21 948701687
22 971568942
23 1004827021
24 1014104131
25 1092434791
26 1099453513
27 1187698529
28 1252806354
29 1274847185
30 1316093405
31 1350415586
32 1448563966
33 1578075731
34 1657955540
35 1769296280
36 1854422159
37 1973136809
38 2013407205
39 18446744071565688841
40 18446744071567730029
41 18446744071641292173
42 18446744071746315370
43 18446744071791537256
44 18446744071809904143
45 18446744071898909158
46 18446744071929707203
47 18446744072161182871
48 18446744072214899206
49 18446744072236850916
50 18446744072245569256
51 18446744072251531818
52 18446744072524765787
53 18446744072554841846
54 18446744072628567602
55 18446744072727507790
56 18446744072817206652
57 18446744072888626260
58 18446744072930600469
59 18446744072992538458
60 18446744073024190825
61 18446744073062986433
62 18446744073508409142
63 18446744073626983919
64 18446744073654938260

After diving deeper into the code I noticed that first 38 elements are stored in a plus of a tree while other elements are stored in only head of this tree and its corresponding list. For picking heads I am using 255 mask, meaning that all lists should have approx. 256 elements. As far as I can see, compression fails in this case for some reason, and the value is wrongly decoded.
Could you please point me to changes that I need to introduce in order to properly compress unsigned 64 bit numbers? Thanks in advance!

@djordjijeK
Copy link
Author

Hello again @ldhulipala,

I tried to debug my code and to see where the things go wrong. However, I still cannot figure out why is this happening. I sort the values, remove duplicates and make sure that all are unique before inserting them (just like the paper says). This is my latest example:

Source Vertex: 428
First head index: 39

Index: 0 C-Tree Value: 529205 Intended value: 529205
Index: 1 C-Tree Value: 531888 Intended value: 531888
Index: 2 C-Tree Value: 10068198 Intended value: 10068198
Index: 3 C-Tree Value: 46908941 Intended value: 46908941
Index: 4 C-Tree Value: 51409576 Intended value: 51409576
Index: 5 C-Tree Value: 51438163 Intended value: 51438163
Index: 6 C-Tree Value: 62805785 Intended value: 62805785
Index: 7 C-Tree Value: 69656145 Intended value: 69656145
Index: 8 C-Tree Value: 104960837 Intended value: 104960837
Index: 9 C-Tree Value: 108535170 Intended value: 108535170
Index: 10 C-Tree Value: 117051190 Intended value: 117051190
Index: 11 C-Tree Value: 129140942 Intended value: 129140942
Index: 12 C-Tree Value: 187827617 Intended value: 187827617
Index: 13 C-Tree Value: 203804477 Intended value: 203804477
Index: 14 C-Tree Value: 311699617 Intended value: 311699617
Index: 15 C-Tree Value: 315773479 Intended value: 315773479
Index: 16 C-Tree Value: 385258656 Intended value: 385258656
Index: 17 C-Tree Value: 396328733 Intended value: 396328733
Index: 18 C-Tree Value: 397723264 Intended value: 397723264
Index: 19 C-Tree Value: 472062957 Intended value: 472062957
Index: 20 C-Tree Value: 525739086 Intended value: 525739086
Index: 21 C-Tree Value: 642622948 Intended value: 642622948
Index: 22 C-Tree Value: 828691642 Intended value: 828691642
Index: 23 C-Tree Value: 866478256 Intended value: 866478256
Index: 24 C-Tree Value: 1054821135 Intended value: 1054821135
Index: 25 C-Tree Value: 1104299789 Intended value: 1104299789
Index: 26 C-Tree Value: 1133871201 Intended value: 1133871201
Index: 27 C-Tree Value: 1183842095 Intended value: 1183842095
Index: 28 C-Tree Value: 1324450858 Intended value: 1324450858
Index: 29 C-Tree Value: 1334660134 Intended value: 1334660134
Index: 30 C-Tree Value: 1406925354 Intended value: 1406925354
Index: 31 C-Tree Value: 1521702510 Intended value: 1521702510
Index: 32 C-Tree Value: 1624251477 Intended value: 1624251477
Index: 33 C-Tree Value: 1807015221 Intended value: 1807015221
Index: 34 C-Tree Value: 1847108893 Intended value: 1847108893
Index: 35 C-Tree Value: 1923261453 Intended value: 1923261453
Index: 36 C-Tree Value: 1988893221 Intended value: 1988893221
Index: 37 C-Tree Value: 2102681453 Intended value: 2102681453
Index: 38 C-Tree Value: 2138877776 Intended value: 2138877776
Index: 39 C-Tree Value: 2170441789 Intended value: 2170441789
Index: 40 C-Tree Value: 18446744071626123641 Intended value: 2211539321
Index: 41 C-Tree Value: 18446744071765185883 Intended value: 2350601563
Index: 42 C-Tree Value: 18446744071814409115 Intended value: 2399824795
Index: 43 C-Tree Value: 18446744071898909238 Intended value: 2484324918
Index: 44 C-Tree Value: 18446744071920788611 Intended value: 2506204291
Index: 45 C-Tree Value: 18446744072265610667 Intended value: 2851026347
Index: 46 C-Tree Value: 18446744072449382542 Intended value: 3034798222
Index: 47 C-Tree Value: 18446744072481307364 Intended value: 3066723044
Index: 48 C-Tree Value: 18446744072501498027 Intended value: 3086913707
Index: 49 C-Tree Value: 18446744072599042509 Intended value: 3184458189
Index: 50 C-Tree Value: 18446744072668830709 Intended value: 3254246389
Index: 51 C-Tree Value: 18446744072982854685 Intended value: 3568270365
Index: 52 C-Tree Value: 18446744073303728681 Intended value: 3889144361
Index: 53 C-Tree Value: 18446744073538150814 Intended value: 4123566494
Index: 54 C-Tree Value: 18446744073542647268 Intended value: 4128062948
Index: 55 C-Tree Value: 22179868 Intended value: 4317147164
Index: 56 C-Tree Value: 186799780 Intended value: 4481767076
Index: 57 C-Tree Value: 226250733 Intended value: 4521218029
Index: 58 C-Tree Value: 345640017 Intended value: 4640607313
Index: 59 C-Tree Value: 563680749 Intended value: 4858648045
Index: 60 C-Tree Value: 564098692 Intended value: 4859065988
Index: 61 C-Tree Value: 770059537 Intended value: 5065026833
Index: 62 C-Tree Value: 786584201 Intended value: 5081551497
Index: 63 C-Tree Value: 813708489 Intended value: 5108675785
Index: 64 C-Tree Value: 882842598 Intended value: 5177809894
Index: 65 C-Tree Value: 1046011889 Intended value: 5340979185
Index: 66 C-Tree Value: 1066640573 Intended value: 5361607869
Index: 67 C-Tree Value: 1132301926 Intended value: 5427269222
Index: 68 C-Tree Value: 1265121469 Intended value: 5560088765
Index: 69 C-Tree Value: 1350752177 Intended value: 5645719473
Index: 70 C-Tree Value: 1429165713 Intended value: 5724133009
Index: 71 C-Tree Value: 1961052030 Intended value: 6256019326

After plus is generated (correctly) the head and compression tree is not working. Either I am doing something wrong or there is a bug in the compression itself (reader/writer). This happens when I enable VERTEXLONG. However when I use 32 bits to store edges initially things work correctly but my use case needs 64 bits to work since some numbers are greater then 2^32 - 1.
I would appreciate if you could help me with this. Best!

@spapadias
Copy link

@ldhulipala Could you please kindly reply to our super important question, which is related to our project that utilizes Aspen?

We really necessitate running Aspen with 64-bit integers inside the C-trees. It would be really great and I would appreciate it if you could think of some potential causes of this issue such as problematic scenarios where the different encoding scheme that you couple with the variable-byte codes is not functioning properly for 64-bit numbers (btw for 32-bit numbers the same code operates smoothly)

I am waiting for your reply as an expert and creator of the Aspen system. Thanks in advance! :)

Best,
Makis

@ldhulipala
Copy link
Owner

Hi Makis,

Just FYI, we are working on a new release of Aspen which will fix this issue. The ETA is to make this available in this repo in mid-June (it is impossible to do it earlier due to a number of other commitments and deadlines in May and early June).

Thanks for your patience,
Laxman

@spapadias
Copy link

Hi @ldhulipala,

Thanks a lot for your reply. It is really nice that you intend to release a new version of Aspen. I really believe C-trees are very impactful man.

It would be really great if you plan to support even bigints in Aspen such as

  1. https://github.com/boostorg/multiprecision
  2. https://www.boost.org/doc/libs/1_58_0/libs/multiprecision/doc/html/index.html

It would be really super if you could support this in your release as I am working with really huge integers. I am aware of an unofficial extension of Microsoft's compiler but I am not sure this workaround with work smoothly. (https://stackoverflow.com/questions/16088282/is-there-a-128-bit-integer-in-gcc)

I definitely understand that you are a super busy man (I have checked your awesome publication record :-) and I am convinced about that haha).

I am really looking forward to your release!

Best and thanks,
Makis

@djordjijeK
Copy link
Author

Hello again @ldhulipala

We spent some time trying to make the compression work for 64-bit numbers. We noticed that in compression.h lines like this one: (b & 0x7f) << shiftAmount produce 32-bit numbers which cause the problems I explained in my previous posts. Furthermore, we changed compression_iter.h to support 64-bit numbers in offsets. For example:

  • Before
offset += sizeof(uint16_t) + sizeof(uint16_t) + 2*sizeof(uint32_t);

uint16_t* deg = (uint16_t*)node_ptr;
uint16_t* total_size = (uint16_t*)(node_ptr + sizeof(uint16_t));
uint32_t* ref_ct = (uint32_t*)(node_ptr + 2*sizeof(uint16_t));
uint32_t* last_ngh_ptr = (uint32_t*)(node_ptr + 2*sizeof(uint16_t) + sizeof(uint32_t));
  • Now:
offset += sizeof(uint16_t) + sizeof(uint16_t) + sizeof(uint32_t) + sizeof(uint64_t);

uint16_t* deg = (uint16_t*)node_ptr;
uint16_t* total_size = (uint16_t*)(node_ptr + sizeof(uint16_t));
uint32_t* ref_ct = (uint32_t*)(node_ptr + 2*sizeof(uint16_t));
uint64_t* last_ngh_ptr = (uint64_t*)(node_ptr + 2*sizeof(uint16_t) + sizeof(uint32_t));

and changed the rest of the file accordingly. The first results show that this works, however, I would like to know if there is anything else that we need to change in order to produce 64-bit number compression. This is really important for us since we have deadlines soon and we would be grateful if you could help us (and sorry to bother you again with this).

Thanks in advance!

@spapadias
Copy link

Hello @ldhulipala,

We know your schedule is full until the end of May. It would be awesome if you find our fixes correct and proper (since you know the codebase better than us) to confirm it is correct. To the best of our knowledge, C-trees are now populated properly now with 64-bit integers as well.

If I may remind you, Aspen fans i.e., us for sure :), would like to see Aspen v2 support bigints like 128-bit integers if you can plug in multiprecision boost lib relatively easy to your system.

We are waiting to use Aspen v2 man!

Best,
Makis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants