Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitfields #1616

Merged
merged 1 commit into from
Sep 13, 2017
Merged

Bitfields #1616

merged 1 commit into from
Sep 13, 2017

Conversation

stevelinton
Copy link
Contributor

This is work in progress on an implementation of fast access to bitfields within small integers.
It seems to work, but is not thoroughly tested. As currently committed it needs a CPU with BEXTR (Haswell or later). I'll set up autoconf for that, of course.

Performance seems pretty decent. My laptop does a loop with 200 million gets in about 1.8s and 200 million sets in about 2.5s.

Relates to issue #1611

@stevelinton stevelinton added the do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) label Aug 22, 2017
@codecov
Copy link

codecov bot commented Aug 23, 2017

Codecov Report

Merging #1616 into master will increase coverage by <.01%.
The diff coverage is 89.43%.

@@            Coverage Diff             @@
##           master    #1616      +/-   ##
==========================================
+ Coverage   64.44%   64.44%   +<.01%     
==========================================
  Files        1002     1002              
  Lines      326730   326852     +122     
  Branches    13202    13216      +14     
==========================================
+ Hits       210550   210656     +106     
- Misses     113321   113331      +10     
- Partials     2859     2865       +6
Impacted Files Coverage Δ
src/intfuncs.c 90.58% <89.43%> (-0.6%) ⬇️
src/hpc/threadapi.c 34.08% <0%> (-0.47%) ⬇️
src/funcs.c 70.7% <0%> (-0.29%) ⬇️
src/hpc/traverse.c 78.21% <0%> (-0.09%) ⬇️
src/gap.c 57.07% <0%> (+0.08%) ⬆️
src/stats.c 73.2% <0%> (+0.13%) ⬆️
src/hpc/thread.c 46.64% <0%> (+0.19%) ⬆️
src/objset.c 82.59% <0%> (+0.23%) ⬆️

@stevelinton stevelinton force-pushed the bitmasks branch 2 times, most recently from 0789834 to a280e5c Compare August 23, 2017 13:12
@stevelinton stevelinton changed the title WIP Bitmasks MaskSets Aug 23, 2017
@stevelinton stevelinton removed the do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) label Aug 23, 2017
@stevelinton
Copy link
Contributor Author

Now essentially complete and ready for review.

@Stefan-Kohl
Copy link
Member

How is it possible that merging this reduces global code coverage by about 14 percent? -- I would guess that adding such little feature shouldn't change code coverage in any significant way(?)

@olexandr-konovalov
Copy link
Member

@Stefan-Kohl https://codecov.io/gh/gap-system/gap/pull/1616 reports coverage at 64.01% - I think the comment above stays from some earlier test which failed (so less code covered) and then codecov failed to update that comment by some reason.

@fingolfin
Copy link
Member

The coverage data above is indeed meaningless, as some of the tests failed. Ultimately, this is still a bug in codecov (it should report something like "unable to generate coverage percentage" or so). But one we can easily recognize and live with.

@olexandr-konovalov
Copy link
Member

Yes, maskset.tst fails with

testing: /home/travis/build/gap-system/gap/tst/testinstall/maskset.tst
# line 7 of 41 (17%)gap: ./src/calls.h:233: SET_BODY_FUNC: Assertion `TNUM_OBJ(body) == T_BODY' failed.

lib/masksets.gd Outdated
## This function sets up the machinery for a set of bitfields of the given
## widths. The total of the widths must not exceed 60 bits on 64-bit architecture
## or 28 bits on a 32-bit architecture and the number of widths must not exceed
## 10 on a 64-bit aerchitecture or 6 on a 32-bit architecture. Also for performance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "aerchitecture"

lib/masksets.gd Outdated
## or 28 bits on a 32-bit architecture and the number of widths must not exceed
## 10 on a 64-bit aerchitecture or 6 on a 32-bit architecture. Also for performance
## reasons some checks that one might wish to do are ommitted. In particular,
## the builder and setter functions do not check if the value[s] passed to them are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uneven offset of the comment text

lib/masksets.gd Outdated
## the builder and setter functions do not check if the value[s] passed to them are
## negative or too large.
##
## You can tell which architecture you are running on by acccessing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"which architecture" -> "which type architecture" (or "kind" of whatever)

lib/masksets.gd Outdated
##
## <Description>
##
## This function sets up the machinery for a set of bitfields of the given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why not call this a "bitfield" then? The name is well-known, at least for people who know C/C++. Whereas "MaskSet" is not (and I don't think I would have guessed what it means here).

lib/masksets.gd Outdated
## one of the fields from an immediate integer</Item>
## <Mark><C>setters</C></Mark> <Item> a list of <M>n</M> functions of two arguments each of which returns the packed value
## in which one of the fields has been replaced by the given value
## note that this does NOT modify the immediate integer.</Item>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not immediately clear what "the" immediate integer here is. Perhaps say: "... a list of n functions, each taking two arguments, and ; the return value of the i-th setter is a new packed value, with the value of the i-th field set to ."

All in all, I think it would be a good idea to pick a specific set of terms for the various "parts" involved here, and stick to them. E.g. referring to the same thing sometimes as "immediate integer" and sometimes as "packed value" is confusing. I'd explain the connection once, then stick to one term.

src/intfuncs.c Outdated
static Obj doMaskGetter(Obj self, Obj data) {
UInt mask = NLOC_FUNC(self);
UInt start = get_start(mask);
UInt len = get_len(mask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funky formatting here.

Perhaps you might be willing to consider using clang-format?

lib/masksets.gd Outdated
## 10 on a 64-bit aerchitecture or 6 on a 32-bit architecture. Also for performance
## reasons some checks that one might wish to do are ommitted. In particular,
## the builder and setter functions do not check if the value[s] passed to them are
## negative or too large.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... what happens then? Is the value cut off? Are bits outside of the fields modified?

At the very least say: "If the caller does this, the result is completely undefined"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: I think this implicitly says: "the field values are non-negative values", but this is never said explicitly (or am I just overlooking it?)

src/intfuncs.c Outdated

/*
* For the builders we also repurpose BODY_FUNC to hold a GAP integer telling
* us how many fields there are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"also"? As in: something else was "also" repurposed? But what?

It would be really nice if there was a documentation comment before this code which gave at least a brief overview of what is stored where and why...

src/intfuncs.c Outdated
*/

#ifdef SYS_IS_64_BIT
#define LENSTART_WIDTH 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These days, we try to use enums for constants like this. They give better compiler diagnostics, and one is likely to shoot oneself into the foot, e.g. with #define VALUE 1+2 ... x = VALUE * 2;.

src/intfuncs.c Outdated
#else
#define LENSTART_WIDTH 8
#define WIDTH_WIDTH 5
#define MAX_FIELDS 6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there at most 6 fields? I have not yet tried to understand the code after this, but I just looked at it, and it seems quite complicated... But why is that necessary? In my imagination, it would be sufficient to do this: For each field, the getter/setter need exactly two things: the offset of the field, and a mask (or alternatively, the number of bits in the fields, but I think the mask is more efficient).

The mask takes up one Obj, the offset another (but the offset fits in a single byte, so it might be easy to stash it somewhere else). If we are really concerned about space usage, we can store an offset and field with in 6 bit each.

Assuming we stored the mask and offset, the generic setter andgetter function would look like this (in untested pseudo code, but I hope it's enough to get the basic idea across); note that this does not use INTOBJ_INT or INT_INTOBJ at, because if we setup the mask and offset suitably, they are not necessary.

Obj SetterFunc(Obj self, Obj packedValue, Obj fieldValue)
{
   GAP_ASSERT(IS_INTOBJ(packedValue));
   GAP_ASSERT(IS_INTOBJ(fieldValue));
   UInt mask = extractMaskFromFunc(self);
   UInt offset = extractOffsetFromFunc(self);
   UInt v = ((UInt)fieldValue ^ 1) << offset;
   UInt newVal = ((((UInt)packedValue) & ~mask) | v;
   return (Obj)newVal;
}
Obj GetterFunc(Obj self, Obj packedValue)
{
   GAP_ASSERT(IS_INTOBJ(packedValue));
   UInt mask = extractMaskFromFunc(self);
   UInt offset = extractOffsetFromFunc(self);
   UInt newVal = ((((UInt)packedValue) & mask) >> offset) | 1;
   return (Obj)newVal;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have plenty of fields to store things in: ENVI_FUNC, NLOC_FUNC, BODY_FUNC, FEXS_FUNC

@stevelinton stevelinton changed the title MaskSets Bitfields Aug 24, 2017
@stevelinton
Copy link
Contributor Author

Renamed to bitfields and improved comments and documentation. Still a few points to deal with.

@stevelinton
Copy link
Contributor Author

I think I've addressed everything from @fingolfin's verty helpful review except one point. This concerns how the getter and setter functions store and use the information that tells them about the specific bitfield they are getting or setting. I use a single word which stores the offset (low 8 bits) and length (rest) of the field. This is compatible with the BEXTR instruction, which we, or a very smart compiler could use for the getter.

@fingolfin suggests using two words (which we have available) one to store the offset and the other a mask to select out the desired bits. This makes the C getter and setter code a bit simpler, and probably also the generated assembler, but doesn't fit with BEXTR and does involve accessing one extra word of memory, although probably not a new cache line.

I'm not sure which approach is better.

@markuspf
Copy link
Member

Do you already have a variant of your AVL tree code that uses these bitfields (as a showcase if nothing else)?

It seems to be the time for me to do some nitpicking: I think every other place in the GAP kernel uses a capital D in the DoXOperation function names, and generally CamelCase seems to be used more often than snake_case (referring to get_start, get_len and friends).

@stevelinton
Copy link
Contributor Author

I have a variant of the union-find structure. Runs about as fast and uses half the memory Look at https://github.com/stevelinton/datastructures/tree/union-find (in gap/union-find2.gi and in src/uf.c in the functions with 2 in their names.

@stevelinton stevelinton force-pushed the bitmasks branch 4 times, most recently from ffecdb9 to 24b80a5 Compare August 29, 2017 10:05
@stevelinton
Copy link
Contributor Author

@markuspf I've now got my AVL tree code with threading etc. working as far as adding and iterating over the tree. I haven't attacked delete yet, or done the kernel accelerators. The main conclusion for the bitfields is that we really want support for Boolean fields -- that is one bit fields whose getters and setters deal with True and False instead of 1 and 0. Shouldn't make any performance difference worth the mention, and it will make a lot of code more readable.

@fingolfin
Copy link
Member

The test failure is due to the the manual example not being "clean"

@fingolfin
Copy link
Member

I have some doubts about focusing on BEXTR. That's on instruction on one architecture. And it may not be what on wants to do at some in the future. So tying the design of the code to that seems a bit short sighted; it's probably OK if there is a huge gain, but otherwise, I am highly sceptical.

I also don't understand why there is this limitation on 6 resp. 10 fields -- at least with the approach I suggested, there is absolutely no reason for that, so if you want to have 60 1-bit fields on 64bit archs, you can have them.

@fingolfin
Copy link
Member

Aha, I guess the limitation comes from the desire to have a "builder" function which creates a single "bitfield" given a sequence of field values. I somehow missed this when reviewing the previous version of this PR (or perhaps it was not yet documented?)

This all still feels like it is some highly specialized code, only of interest for relatively low-level work. Is it right really that useful beyond the datastructures package? Perhaps it should be in there?

Perhaps it is useful in general, I just have hard time seeing it; to me, it would be much more attractive if GAP had bit manipulation functions which allow users to XOR, OR, AND, NOT immediate ints (and perhaps do bit test/set/clear). Of course that may be somewhat slower than this highly dedicated code for your applications; but OTOH, it is also vastly more general, and easy to exaplain to most programmers.

@fingolfin
Copy link
Member

For the union find code, builders seem to be only used in constructors (so not time critical), so at least there, a more naive implementation which takes a list of widths (which can be of arbitrary width), and builds up everything from it should be fine. This could be implemented as follows: have a kernel function which takes a list of width, and list of field values (they should be of the same length), and uses them to build the desired immediate values.

Then, in MakeBitfield, create the actual builder on the GAP level, like this pseudo-code:

MakeBitfield := function(arg)
   local bf;
   bf := _kernel_MakeBitfield(arg);
   bf.builder := function(values...) return _kernel_build(bf.widths, values); end
   return bf;
end;

@stevelinton
Copy link
Contributor Author

OK. I'm persuaded to change the data in the function header to mask and offset, and yes, I find myself using builders much less often than I expected (in my avl tree code under development they are used once when a new node is added.

@stevelinton
Copy link
Contributor Author

On the question of generality. Adding all the necessary operations to the language would probably be better, but a lot more work. Also you'd end up wanting to write getter and setter functions for any non-trivial cases anyway and then you're paying GAP function calling overhead for something like

get_subtree_size := word -> word & 0xFFFFFFFFFFFFFF0 >> 4;

whereas with the code I have you essentially get that function directly and it runs is 10-20 ns.

I don't really know how wide an application it will have, but there are plenty of bits of code in the library that use whole words for things that only have a few possible values in large mathematical data structures.

@stevelinton
Copy link
Contributor Author

This version uses the mask and offset setup and gets rid of the limit on the number of fields.

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now -- but it better should be, as it feels I kind of bullied Steve into them :/. Anyway...

My remaining remarks all are unimportant quibbles about spaces and indentions, and could be ignored.

src/intfuncs.c Outdated
static inline void SET_OFFFSET_BITFIELD_FUNC(Obj func, UInt offset)
{
return SET_FEXS_FUNC(func, INTOBJ_INT(offset));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line?

src/intfuncs.c Outdated
{
UInt mask = MASK_BITFIELD_FUNC(self);
UInt offset = OFFSET_BITFIELD_FUNC(self);
if (!ARE_INTOBJS(data, val))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the differing indentation here and elsewhere?

(BTW, if you have it installed, you can use git clang-format to automatically format your commits)

src/intfuncs.c Outdated
static Obj FuncBUILD_BITFIELDS(Obj self, Obj args)
{
GAP_ASSERT(IS_PLIST(args));
GAP_ASSERT(LEN_PLIST(args) >= 1 && ELM_PLIST(args,1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space before && ?

src/intfuncs.c Outdated
UInt i;
for (i = nfields; i > 0; i--) {
GAP_ASSERT(ISB_LIST(widths, i));
Obj y = ELM_LIST(widths,i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert space after comma?

src/intfuncs.c Outdated
Obj y = ELM_LIST(widths,i);
GAP_ASSERT(IS_INTOBJ(y));
x <<= INT_INTOBJ(y);
GAP_ASSERT(ELM_PLIST(args, i+1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That GAP_ASSERT is somewhat redundant, as the IS_INTOBJ test later on will simply fail in this case. But of course it doesn't hurt to have here.

src/intfuncs.c Outdated
Obj bgetters = NEW_PLIST(T_PLIST + IMMUTABLE, nfields);
for (UInt i = 1; i <= nfields; i++) {
UInt mask = (1L << starts[i]) - (1L << starts[i - 1]);
Obj s = NewFunctionCT(T_FUNCTION, SIZE_FUNC, "<field setter>", 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also just NewFunctionC here and elsewhere (makes it slightly easier to read the code, IMO, but that's perhaps taste)

@@ -0,0 +1,47 @@
gap> START_TEST("bitfields.tst");

# Test correct behaviour for a variety of numbers of fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove extra space before "numbers"?

> Print("Bad return from getter",i," ",j," ",x,"\n");
> fi; od;
> if bf.booleanGetters[1](x) <> (vals[2] = 1) then
> Print("Bad return from Boolean Getter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is "Boolean Getter", below it is "boolean setter" -- why the differing case?

@fingolfin
Copy link
Member

Still would be kind of nice if @markuspf or @ChrisJefferson could have a quick look at this, too.

Bitfields provide a fast (by GAP standards) and tidy way of viewing an immediate integer
as a set of bitfields. Handy mainly for compact data structures.
Copy link
Member

@markuspf markuspf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only minor thing is of course now that one would want a file in dev/Updates to reflect this change, but maybe I should just hit "merge" for now to not drag this on.

@markuspf markuspf merged commit eebc9d6 into gap-system:master Sep 13, 2017
@olexandr-konovalov olexandr-konovalov added the release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes label Jan 22, 2018
@olexandr-konovalov olexandr-konovalov added release notes: added PRs introducing changes that have since been mentioned in the release notes and removed release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes labels Jan 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: added PRs introducing changes that have since been mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants