Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup headers #51

Open
fsaintjacques opened this issue Sep 2, 2016 · 25 comments
Open

Cleanup headers #51

fsaintjacques opened this issue Sep 2, 2016 · 25 comments
Labels

Comments

@fsaintjacques
Copy link
Member

fsaintjacques commented Sep 2, 2016

I'd give a little love to the headers:

  • separate concerns (basic operations, advanced operations, serializations) in multiple headers, such the user can only import what he needs.
  • hide definitions, does the user really need the full definition? We could probably hide in a forward declaration à la struct roaring_bitmap;
  • Remove unnecessary public that should really be private stuff.
@lemire
Copy link
Member

lemire commented Sep 2, 2016

Sure, but please wait a bit since I am doing some reengineering and if my work is fruitful, we could end up with a massive conflict from hell.

@fsaintjacques
Copy link
Member Author

I am aware of the nasty conflicts it'll ensue and will wait for your work.

@lemire
Copy link
Member

lemire commented Sep 2, 2016

I'll know soon whether my idea helps or not... I am doing as fast as I can but it involves lots of code, sadly.

@lemire
Copy link
Member

lemire commented Sep 2, 2016

Some things to take into account...

  1. We'd like to keep the unity build in working order.
  2. The various wrappers,
    https://github.com/Ezibenroc/PyRoaringBitMap
    https://github.com/saulius/croaring-rs
    https://github.com/RoaringBitmap/gocroaring

should be preserved in working order as well.

@lemire
Copy link
Member

lemire commented Sep 3, 2016

Ok. I am done. Go ahead!

@lemire
Copy link
Member

lemire commented Sep 5, 2016

Do you have any idea when you'd like to work on this? We have this other issue (#47) which would require touching quite a bit of code. I'd like to avoid conflicts... so if you want to clean the headers soon, then we will delay work on issue 47 till you are done, otherwise, we may proceed with issue 47 first.

@fsaintjacques
Copy link
Member Author

I'll try to tackle it today. I will do this in a separate branch. It is worth noting that it will probably break bindings?

@fsaintjacques
Copy link
Member Author

Should I take the opportunity to change the signature of some functions, e.g. const parameters usually preceed non-const void roaring_bitmap_andnot_inplace(roaring_bitmap_t *x1, const roaring_bitmap_t *x2); would become void roaring_bitmap_andnot_inplace(const roaring_bitmap_t *x1, roaring_bitmap_t *x2);.

@lemire
Copy link
Member

lemire commented Sep 6, 2016

It is worth noting that it will probably break bindings?

Hopefully, it won't break the unity build.

It is probably best to aim to break as few things as possible, as a general rule, right?

separate concerns (basic operations, advanced operations, serializations) in multiple headers, such the user can only import what he needs.

That's a good idea but this does not prevent you from having a roaring.h file that offers the same API as it does now, just have it include all the subheaders so that users who do not need separation of concern can maintain the current convenience of having a single include.

hide definitions, does the user really need the full definition? We could probably hide in a forward declaration à la struct roaring_bitmap;

The inline definitions are a matter of performance. We had major performance issues due to lack of inlining. I did not move the definitions into header files for amusement.

 Remove unnecessary public that should really be private stuff.

Moving headers away from view is great. We should ensure that it does not come at the expense of performance. Being able to inline definitions is really important in some instances.

Should I take the opportunity to change the signature of some functions, e.g. const parameters usually preceed non-const

I have to disagree on this point. When we do in-place operations, the first parameter is usually the mutable one.

Here is the equivalent using operators:

x &= y

See how the x occurs before the y? I am sure that there are languages that read backward, but we are in C.

(C does not have a "andnot" operator unlike, say, Go, but it is not important.)

Here is how you'd implement it in C++:

type operator&(type & lhs, const type& rhs) {
   return lhs &= rhs;
}

See how the const parameter appears last?

@lemire
Copy link
Member

lemire commented Sep 6, 2016

Another wrapper (C#): https://github.com/RogueException/CRoaring.Net

@fsaintjacques
Copy link
Member Author

Aren't inlining concerns fixed by proper LTO support? This small peculiarity forces some private header to be public.

I've come to understand that for a proper C library, parameter order should be: input, input-output, output. This is a matter of opinion.

I'm looking ahead, and this feels like wasted time. I'm just gonna drop this issue.

@lemire
Copy link
Member

lemire commented Sep 6, 2016

Aren't inlining concerns fixed by proper LTO support?

Good point. I think that this assumes that library users will rely on LTO. Otherwise, they will simply get subpar performance.

I've come to understand that for a proper C library, parameter order should be: input, input-output, output. This is a matter of opinion.

When you copy something you go

`destination = source;``

So it makes sense that we have

copy(destination, source);

Say you have a hash table and you insert something in it. Intuitively, you'd write

set[key] = value

And thus, I think you'd write
insert(hash_table *table, const int key, const int value )

I could be wrong, of course... that's just my intuition.

(You have just inspired a blog post.)

I'm looking ahead, and this feels like wasted time. I'm just gonna drop this issue.

AH!!!! I hope it is not because of the points I raised.

I will take note that we are delaying this issue.

@lemire
Copy link
Member

lemire commented Sep 6, 2016

Function signature: how do you order parameters? http://lemire.me/blog/2016/09/06/function-signature-how-do-you-order-parameters/

@lemire lemire added the later label Sep 6, 2016
@lemire
Copy link
Member

lemire commented Sep 9, 2016

I did some minor cleaning... in commit 4fda6ba but it lead to a significant performance degradation (yet I just renamed the defines). I think that my experience illustrates the need to be careful.

@lemire
Copy link
Member

lemire commented Sep 9, 2016

I will painfully undo the commit.

@lemire
Copy link
Member

lemire commented Sep 9, 2016

I have undone my cleaning work. I have read and re-read the cleaned code and I do not see where it could have changed the generated code, but obviously, it did degrade the binaries quite a bit.

@nkurz
Copy link
Collaborator

nkurz commented Sep 9, 2016

Which machine are you testing this on? I did upgrade the default compiler on Skylake (and maybe Haswell), and it's possible that might have effects on performance.

More generally, I think this is a reminder that performance in a high level language is fragile, and at the whim of the compiler. If we want robust performance, we probably need some way to lock things in once we have what we want.

--nate

@lemire
Copy link
Member

lemire commented Sep 9, 2016

@nkurz Nah. It is not a compiler issue.

With the magic of git, I can move back and forth in time and see exactly at which point the performance went down (using the same compiler and machine).

More generally, I think this is a reminder that performance in a high level language is fragile, and at the whim of the compiler. If we want robust performance, we probably need some way to lock things in once we have what we want.

My intention was just to rename the enums and the defines so that they are all prefixed by "ROARING_". This should not impact the generated binaries in the least. But it did something bad.

@lemire
Copy link
Member

lemire commented Sep 9, 2016

Here is the benchmark I use...

https://github.com/RoaringBitmap/CBitmapCompetition#results

(I published it so one could refer to it in the future.)

@lemire
Copy link
Member

lemire commented Sep 9, 2016

@nkurz There is one ill-effect from the compiler upgrade however... The sanitize mode is now broken. I doubt it is your doing per se, but I think it is related to the new compiler.

$ mkdir debug
dlemire@skylake:~/CVS/github/CRoaring$ cmake -DCMAKE_BUILD_TYPE=Debug -DSANITIZE=ON ..
CMake Error: The source directory "/home/dlemire/CVS/github" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
dlemire@skylake:~/CVS/github/CRoaring$ cd debug/
dlemire@skylake:~/CVS/github/CRoaring/debug$ cmake -DCMAKE_BUILD_TYPE=Debug -DSANITIZE=ON ..
-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- BENCHMARK_DATA_DIR: /home/dlemire/CVS/github/CRoaring/benchmarks/realdata/
-- TEST_DATA_DIR: /home/dlemire/CVS/github/CRoaring/tests/testdata/
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- CMAKE_BUILD_TYPE: Debug
-- DISABLE_AVX: OFF
-- BUILD_STATIC: OFF
-- BUILD_LTO: OFF
-- SANITIZE: ON
-- CMAKE_C_COMPILER: /usr/bin/cc
-- CMAKE_C_FLAGS: -std=c11 -fPIC -march=native  -Wall -Winline -Wshadow -Wextra -pedantic -fsanitize=address -fno-omit-frame-pointer -fsanitize=undefined
-- CMAKE_C_FLAGS_DEBUG: -ggdb
-- CMAKE_C_FLAGS_RELEASE: -O3
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dlemire/CVS/github/CRoaring/debug
dlemire@skylake:~/CVS/github/CRoaring/debug$ make
Scanning dependencies of target roaring
[  3%] Building C object src/CMakeFiles/roaring.dir/array_util.c.o
[  6%] Building C object src/CMakeFiles/roaring.dir/bitset_util.c.o
[  9%] Building C object src/CMakeFiles/roaring.dir/containers/array.c.o
[ 12%] Building C object src/CMakeFiles/roaring.dir/containers/bitset.c.o
[ 16%] Building C object src/CMakeFiles/roaring.dir/containers/containers.c.o
[ 19%] Building C object src/CMakeFiles/roaring.dir/containers/convert.c.o
[ 22%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_intersection.c.o
[ 25%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_union.c.o
[ 29%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_equal.c.o
[ 32%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_negation.c.o
[ 35%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_xor.c.o
[ 38%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_andnot.c.o
[ 41%] Building C object src/CMakeFiles/roaring.dir/containers/run.c.o
[ 45%] Building C object src/CMakeFiles/roaring.dir/roaring.c.o
[ 48%] Building C object src/CMakeFiles/roaring.dir/roaring_priority_queue.c.o
[ 51%] Building C object src/CMakeFiles/roaring.dir/roaring_array.c.o
Linking C shared library ../libroaring.so
/usr/bin/ld: unrecognized option '--push-state'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
make[2]: *** [libroaring.so] Error 1
make[1]: *** [src/CMakeFiles/roaring.dir/all] Error 2
make: *** [all] Error 2

@lemire
Copy link
Member

lemire commented Sep 9, 2016

@nkurz (Thankfully I can run the tests with the sanitizer on other machines so it is not bad.)

@nkurz
Copy link
Collaborator

nkurz commented Sep 10, 2016

With the magic of git, I can move back and forth in time and see exactly
at which point the performance went down (using the same compiler and
machine).

Yes, I didn't know if you'd tested the new version with the old compiler to be sure, and vice versa.

It is not a compiler issue.
...
My intention was just to rename the enums and the defines so that
they are all prefixed by "ROARING_". This should not impact the
generated binaries in the least. But it did something bad.

When you rename a variable and the performance drops, I tend to think of that as a compiler issue.

There is one ill-effect from the compiler upgrade however...
The sanitize mode is now broken. I doubt it is your doing per se,
but I think it is related to the new compiler.

Seems like Ubuntu's doing rather than mine, but I think I patched it with d5a081a. Separately, I noticed that the "make test" has failed tests on Skylake, but I presume you are already aware of this.

@lemire
Copy link
Member

lemire commented Sep 10, 2016

Yes, I was working on fixing a minor issue.

@andreigudkov
Copy link
Member

  1. Header files which are not part of public interface (cannot be reached from roaring.h) should be moved from include/ into src/ directory. Where possible, public interface should be reduced.
  2. Public interface should not depend on USEAVX and other similar definitions. For example, currently I see that run.h is referenced by roaring.h (roaring.h -> roaring_array.h -> containers.h -> run.h) and its body depends on USEAVX. Simple prerelase test: grep -R USEAVX include/ >/dev/null && echo "error".

@lemire
Copy link
Member

lemire commented Nov 9, 2018

@andreigudkov PR invited!!!

But care is needed because we have an outstanding unmerged PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants