tcmalloc forces to 16 byte alignment #433

Closed
alk opened this issue Aug 23, 2015 · 5 comments

@alk
Contributor

alk commented Aug 23, 2015

Originally reported on Google Code with ID 430

Hi,

We have implemented a private heap in our application
to reduce heap-management overhead to a minimum.
The problem is that we allocate a lot of 24-byte objects
and we do not want to waste memory.

We found that your gperftools package also avoids the 8-byte malloc
overhead, and we tested it as an alternative, but discovered
that it does not reduce memory usage as much as we expected,
because 'common.cc' contains the following:

    int AlignmentForSize(size_t size) {
      int alignment = kAlignment;
      if (size > kMaxSize) {
        // Cap alignment at kPageSize for large sizes.
        alignment = kPageSize;
      } else if (size >= 128) {
        // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
        alignment = (1 << LgFloor(size)) / 8;
      } else if (size >= 16) {
        // We need an alignment of at least 16 bytes to satisfy
        // requirements for some SSE types.
        alignment = 16;
      }
      // Maximum alignment allowed is page size alignment.
      if (alignment > kPageSize) {
        alignment = kPageSize;
      }
      CHECK_CONDITION(size < 16 || alignment >= 16);
      CHECK_CONDITION((alignment & (alignment - 1)) == 0);
      return alignment;
    }

This appears to force 16-byte alignment, which in our case means
each 24-byte allocation occupies 32 bytes internally (25% waste).

Are we correct that this 16-byte alignment is meant to improve
performance of SSE operations, or is there another reason as well?

We searched but could not find more information about why
this was added; all we found was:

http://code.google.com/p/gperftools/source/detail?spec=svn60&r=60

We implemented a change, based on gperftools-2.0, so that compiling
with '-DTCMALLOC_ALIGN_8BYTES' disables the 16-byte alignment
and uses 8-byte alignment instead.

Is this safe to do, or are there consequences that we missed?

Note that we have not measured any performance degradation from this
patch (perhaps because we run in 32-bit mode [gcc -m32]
on the x86_64 architecture), and that glibc malloc provides only
8-byte-aligned objects by default.

Would you be willing to integrate this patch?

Alternatively, would you consider providing an interface that lets
the caller specify alignment requirements explicitly?

The patch we applied to solve our problem is attached; it was created
against the latest gperftools release, 2.0.
(attachment: gperftools-2.0_8ByteAlignment.patch)


P.S. Most of our code is in Ada, so for us the ideal interface would match
what the compiler expects:

    procedure Allocate(
      Storage_Address : out Address;
      Size_In_Storage_Elements : in Storage_Elements.Storage_Count;
      Alignment : in Storage_Elements.Storage_Count) is abstract;
    procedure Deallocate(
      Storage_Address : in Address;
      Size_In_Storage_Elements : in Storage_Elements.Storage_Count;
      Alignment : in Storage_Elements.Storage_Count) is abstract;

This interface leaves the responsibility for determining size and alignment
requirements to the caller, for both allocation and deallocation
(because in many cases the size of the object is static and does not need to
be stored alongside the allocation).


Reported by koen.meersman on 2012-05-14 07:33:42


- _Attachment: [gperftools-2.0_8ByteAlignment.patch](https://storage.googleapis.com/google-code-attachments/gperftools/issue-430/comment-0/gperftools-2.0_8ByteAlignment.patch)_
@alk
Contributor Author

alk commented Aug 23, 2015

Looks good. It certainly makes sense to minimize internal fragmentation on systems that
don't need the additional alignment. I don't think we want to provide an API call
that allows user-specified alignment, though. The alignment is mandated by the target
platform, not decided on a case-by-case basis.

Reported by chappedm on 2012-05-15 13:33:07

@alk
Contributor Author

alk commented Aug 23, 2015

Reported by chappedm on 2012-05-15 13:33:50

  • Status changed: Accepted

@alk
Contributor Author

alk commented Aug 23, 2015

Do you plan to include this, in one form or another, in the next release?


>> I don't think that we want to provide an API call for allowing user specified
>> alignment though. The alignment is mandated by the target platform and
>> not on a case by case basis.

The statement above seems to be based on the assumption that we plan to give the responsibility
for alignment requirements to the programmer, which is not at all our intent. In our
case (Ada technology) the COMPILER has calculated the alignment requirements and has
generated a call to the allocator function specifying both size and alignment requirements.
For a different target platform it will generate (possibly) different values.

For comparison with C, imagine that one day gcc implements alignmentof(X),
just as it implements sizeof(X) today, so that in this hypothetical future a programmer
could call a function aligned_tcmalloc(sizeof(X), alignmentof(X)) and thereby avoid
aligning everything to 8 bytes.

This would be open to abuse: if a programmer hard-codes the second parameter, this would
create non-portable code, just like a programmer calling malloc with a hard-coded size
parameter. A fool-proof heap interface is not feasible (without paying for garbage
collection); there are many ways to shoot yourself in the foot. The programmer's mistakes
are a job for valgrind.

Size and alignment requirements are both "known" to gcc, regardless of the programming
language, as they are needed to properly allocate variables on the stack or components
in a struct. The only difference is that in C, size was made visible via the sizeof()
construct (as otherwise calling malloc would be a nightmare), while alignment was not
made visible (only because hiding it creates less of a nightmare: wasting up to 15 bytes
per object is not a catastrophe).

For a large mission-critical system, memory efficiency represents a significant development
cost, and wasted memory is itself a significant cost, so the fact that people created
wasteful designs in the past on the grounds that the waste was "not a catastrophe" is not
a reasonable justification for continuing the waste.

We hope the above is a better justification for adding alignment support to the heap
API.

Reported by koen.meersman on 2012-07-13 07:15:05

@alk
Contributor Author

alk commented Aug 23, 2015

I think your argument has many good points that would justify this API addition. It
should make the next release.

Reported by chappedm on 2012-07-24 16:04:29

@alk
Contributor Author

alk commented Aug 23, 2015

r175 | chappedm@gmail.com | 2012-11-04 13:15:11 -0500 (Sun, 04 Nov 2012) | 2 lines

issue-430: Introduces 8-byte alignment support for tcmalloc

Reported by chappedm on 2012-11-04 18:16:28

  • Status changed: Fixed

@alk alk closed this as completed Aug 23, 2015