Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object.h: migrate alloc_states to mem-pool #857

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adlternative
Copy link

@adlternative adlternative commented Jan 30, 2021

Notice that "mem-pool" api may have similar effort with alloc_state,
"parsed_object_pool" have five member with alloc_state type,
and "TODO" usage in "object.h":"migrate alloc_states to mem-pool?",
so let us change it to mem-pool version.

After I learned the role of the memory pool,I think in the future git may be
more inclined to use the memory pool instead of the old interface "alloc_state".

Thanks.

cc: Junio C Hamano gitster@pobox.com
cc: Stefan Beller sbeller@google.com

cc: René Scharfe l.s.r@web.de

@adlternative
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 30, 2021

Submitted as pull.857.git.1612011569489.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-857/adlternative/alloc_states_to_mem_pool-v1

To fetch this version to local tag pr-857/adlternative/alloc_states_to_mem_pool-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-857/adlternative/alloc_states_to_mem_pool-v1

@adlternative adlternative changed the title alloc.h|c: migrate alloc_states to mem-pool object.h: migrate alloc_states to mem-pool Feb 1, 2021
@adlternative
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 1, 2021

Submitted as pull.857.v2.git.1612175966786.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-857/adlternative/alloc_states_to_mem_pool-v2

To fetch this version to local tag pr-857/adlternative/alloc_states_to_mem_pool-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-857/adlternative/alloc_states_to_mem_pool-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 1, 2021

On the Git mailing list, René Scharfe wrote (reply to this):

Am 01.02.21 um 11:39 schrieb 阿德烈 via GitGitGadget:
> From: ZheNing Hu <adlternative@gmail.com>
>
> "alloc_state" may have similar effects with "mem_pool".
> Using the new memory pool API may be more beneficial
> to our memory management in the future.

Replacing the custom object allocator with mem-pool would allow reducing
the code size.  What other effects might it have?  Do you expect changes
in memory use and/or performance with the current code and your patch?

> functions "alloc_*_node" now change to "mem_pool_alloc_*_node".

Why rename these functions?  Do callers need to care about the
underlying allocator?  The function signatures stay the same.  In any
case, this renaming would be easier to review if it was moved to a
separate patch.

> At the same time ,I add the member `alloc_count` of
> struct mem_pool ,so that we can effective track
> node alloc count,and adapt to the original interface `alloc_report`.

This function has no callers.  Why not remove it (in a separate patch)?

> diff --git a/alloc.c b/alloc.c
> index 957a0af3626..951ef3e4ed7 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -71,30 +71,30 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  	return ret;
>  }

This keeps the now unused function alloc_node(), which breaks the build
with -Werror.

allocate_alloc_state() and clear_alloc_state() become unused as well,
but the compiler doesn't complain because those functions are
exported.  Nevertheless this patch should remove them, no?

> diff --git a/object.h b/object.h
> index 59daadce214..43031d8dc04 100644
> --- a/object.h
> +++ b/object.h
> @@ -10,11 +10,11 @@ struct parsed_object_pool {
>  	int nr_objs, obj_hash_size;
>
>  	/* TODO: migrate alloc_states to mem-pool? */

This comment becomes stale with this patch and should be removed at
the same time.

> -	struct alloc_state *blob_state;
> -	struct alloc_state *tree_state;
> -	struct alloc_state *commit_state;
> -	struct alloc_state *tag_state;
> -	struct alloc_state *object_state;
> +	struct mem_pool *blob_pool;
> +	struct mem_pool *tree_pool;
> +	struct mem_pool *commit_pool;
> +	struct mem_pool *tag_pool;
> +	struct mem_pool *object_pool;

Why have pointers here instead of the structs themselves?  It's not like
a struct parsed_object_pool is of much use without them, right?

The same question applies to the original code as well, of course.

René

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 1, 2021

User René Scharfe <l.s.r@web.de> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 1, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> "alloc_state" may have similar effects with "mem_pool".

What "similar effects" do you have in mind?  "mem_pool" may have
more than one "effects" to multiple things that are affected, but it
is unclear which effect that "mem_pool" exerts on what you are
referring to.

> Using the new memory pool API may be more beneficial
> to our memory management in the future.

Many things may or may not be "beneficial" in the future.  We do not
build things on a vague "hunch".

Are you seeking performance (e.g.  number of objects that can be
allocated per minute)?  Are you seeking better memory locality
(e.g. related objects are likely to be stored in the same page,
reducing number of page faults)?  Are you seeking reduced wasted
memory (e.g. custom allocator packs objects better than bog-standard
malloc(3))?  Are you seeking functionality (e.g. you have this and
that specific codepaths and usecase where you wish to be able to
release all the objects instantiated for a particular repository,
without having to go through the list of all objects, and use of
mempool is one way to allow us do so)?

It is not even clear in your problem description what kind of
benefit you are seeking, let alone how much quantitative improvement
you are getting with this change.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 2, 2021

On the Git mailing list, 胡哲宁 wrote (reply to this):

To Junio:
Thanks for checking.forget my unprofessional
description.Macroscopically speaking,
both alloc_state and mem_pool are doing one thing:Apply for a
large block of memory in advance,and when needed a dynamically
allocated memory ,we call the interface function to apply for memory,
This can reduce the overhead of calling malloc multiple times.And the
mem-pool or alloc_state will Automatic Expand capacity.

So that ,my this patch may have something not considered,
>     mem_pool_init(o->blob_pool,0);
may be a wrong way to init this mem-pool
because:
>void mem_pool_init(...)
>       ...
>       if (initial_size > 0)
>              mem_pool_alloc_block(pool, initial_size, NULL);
the first time calloc malloc may  decay to we first call "mem_pool_alloc_block",
I think this may not be great.

A little ashamed,I did not consider the optimization point of this
at the beginning,


Junio C Hamano <gitster@pobox.com> 于2021年2月2日周二 上午1:56写道:
>
> "阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > "alloc_state" may have similar effects with "mem_pool".
>
> What "similar effects" do you have in mind?  "mem_pool" may have
> more than one "effects" to multiple things that are affected, but it
> is unclear which effect that "mem_pool" exerts on what you are
> referring to.
>
> > Using the new memory pool API may be more beneficial
> > to our memory management in the future.
>
> Many things may or may not be "beneficial" in the future.  We do not
> build things on a vague "hunch".
>
Now,I make a rough comparison.
Situation : when we just use it to malloc little struct node
such as `object`,`blob` ,`tree`.
> Are you seeking performance (e.g.  number of objects that can be
> allocated per minute)?

1.  performance.
`mem_pool` api will allocate 2^20 byte everytime ,
`alloc_state` api will allocate 1024*nodesize byte and ALLOC_GROW everytime.
A repo like git may call malloc fewer times when using `mem_pool`,
while a small repo may not have this amount of objects. The number of
calling `malloc`
may be similar.
may be `mem_pool` win a little...
>Are you seeking better memory locality
> (e.g. related objects are likely to be stored in the same page,
> reducing number of page faults)?
2. page faults .
I might think they are similar at first.But now,I start to understand
what you mean:`alloc_state` more like an object pool,so that we could
go through the list of all objects.Therefore, mem_pool is not conducive
to continuous access to all objects.Because There may be fragments
in the memory And this must be a cross-page operation.
so `alloc_state` win.
>Are you seeking reduced wasted
> memory (e.g. custom allocator packs objects better than bog-standard
> malloc(3))?
3.Memory utilization.
`alloc_state`win.No doubt.
 Are you seeking functionality (e.g. you have this and
> that specific codepaths and usecase where you wish to be able to
> release all the objects instantiated for a particular repository,
> without having to go through the list of all objects, and use of
> mempool is one way to allow us do so)?
>
4.functionality
yeah,As mentioned above.Object pool will be better.
`alloc_state`win.
5.
Indeed, the object pool `alloc_state` may be better than the
memory pool `mem_pool`.
But We can assume that the original author’s intention may be to
The five alloc_states are merged together.
Because the original author said: "migrate alloc_states to mem-pool"
Or another advantage of using the memory pool is that it can dynamically
allocate a variety of different objects, I now think the original author has
this intention.So my patch code also needs some modifications.But at the
same time, it may not be good to count them separately if multiple objects
are allocated using the memory pool at the same time.
so 'mem_pool' win a little.
> It is not even clear in your problem description what kind of
> benefit you are seeking, let alone how much quantitative improvement
> you are getting with this change.
>
I don't know how to quantify them temporarily.
I may need the opinions of you and the original author before I can move on.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 2, 2021

On the Git mailing list, 胡哲宁 wrote (reply to this):

To René Scharfe:
Thanks for checking in.

René Scharfe <l.s.r@web.de> 于2021年2月2日周二 上午1:55写道:
>
> Am 01.02.21 um 11:39 schrieb 阿德烈 via GitGitGadget:
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > "alloc_state" may have similar effects with "mem_pool".
> > Using the new memory pool API may be more beneficial
> > to our memory management in the future.
>
> Replacing the custom object allocator with mem-pool would allow reducing
> the code size.  What other effects might it have?  Do you expect changes
> in memory use and/or performance with the current code and your patch?
>
> > functions "alloc_*_node" now change to "mem_pool_alloc_*_node".
>
> Why rename these functions?  Do callers need to care about the
> underlying allocator?  The function signatures stay the same.  In any
> case, this renaming would be easier to review if it was moved to a
> separate patch.
>
Truly,I will change it.
> > At the same time ,I add the member `alloc_count` of
> > struct mem_pool ,so that we can effective track
> > node alloc count,and adapt to the original interface `alloc_report`.
>
> This function has no callers.  Why not remove it (in a separate patch)?
>
Before I may have some confuse about choosing `alloc_state`or`mem_pool`,
so It has not been deleted yet.I remember that now.
> > diff --git a/alloc.c b/alloc.c
> > index 957a0af3626..951ef3e4ed7 100644
> > --- a/alloc.c
> > +++ b/alloc.c
> > @@ -71,30 +71,30 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
> >       return ret;
> >  }
>
> This keeps the now unused function alloc_node(), which breaks the build
> with -Werror.
>
> allocate_alloc_state() and clear_alloc_state() become unused as well,
> but the compiler doesn't complain because those functions are
> exported.  Nevertheless this patch should remove them, no?
>
> > diff --git a/object.h b/object.h
> > index 59daadce214..43031d8dc04 100644
> > --- a/object.h
> > +++ b/object.h
> > @@ -10,11 +10,11 @@ struct parsed_object_pool {
> >       int nr_objs, obj_hash_size;
> >
> >       /* TODO: migrate alloc_states to mem-pool? */
>
> This comment becomes stale with this patch and should be removed at
> the same time.
>
OK.
> > -     struct alloc_state *blob_state;
> > -     struct alloc_state *tree_state;
> > -     struct alloc_state *commit_state;
> > -     struct alloc_state *tag_state;
> > -     struct alloc_state *object_state;
> > +     struct mem_pool *blob_pool;
> > +     struct mem_pool *tree_pool;
> > +     struct mem_pool *commit_pool;
> > +     struct mem_pool *tag_pool;
> > +     struct mem_pool *object_pool;
>
> Why have pointers here instead of the structs themselves?  It's not like
> a struct parsed_object_pool is of much use without them, right?
>
> The same question applies to the original code as well, of course.
Here I may have some questions: why use `struct mem_pool` instead of
using `struct mem_pool *`?
I hope you can answer my doubts, thank you!
>
> René

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 2, 2021

On the Git mailing list, René Scharfe. wrote (reply to this):

Am 02.02.21 um 14:12 schrieb 胡哲宁:
> To René Scharfe:
>>> -     struct alloc_state *blob_state;
>>> -     struct alloc_state *tree_state;
>>> -     struct alloc_state *commit_state;
>>> -     struct alloc_state *tag_state;
>>> -     struct alloc_state *object_state;
>>> +     struct mem_pool *blob_pool;
>>> +     struct mem_pool *tree_pool;
>>> +     struct mem_pool *commit_pool;
>>> +     struct mem_pool *tag_pool;
>>> +     struct mem_pool *object_pool;
>>
>> Why have pointers here instead of the structs themselves?  It's not like
>> a struct parsed_object_pool is of much use without them, right?
>>
>> The same question applies to the original code as well, of course.
> Here I may have some questions: why use `struct mem_pool` instead of
> using `struct mem_pool *`?
> I hope you can answer my doubts, thank you!

If struct parsed_object_pool contains pointers to five instances of
struct alloc_state or struct mem_pool then you have to allocate and
eventually release those instances explicitly.  Your patch introduced
mem_pool_new() for the allocation part.

If the five instances are embedded in struct parsed_object_pool then
you don't need to do that.

The indirection added by allocating explicitly and using pointers
would be beneficial if some of five instances were optional, as you
could skip their allocation and save some memory -- but you need
them all to get a usable struct parsed_object_pool.

René

In "object.h",I use `mem_pool` to replace the five
"alloc_state *" which used to allocate memory for
different types of object nodes.Now that one `mem_pool`
unified management of the allocation and release of
the memory of multiple nodes.
The advantage of changing to a `mem_pool` is:
1.we can allocate more memory at a time than the original
`alloc_state`,thus reducing the number of system calls to malloc.
2.the user interface is take memory in a memory pool,We no longer
need to take from multiple pools as before.

At the same time, mem_pool may have its disadvantages:
1. There is memory fragmentation
2. It is not conducive to counting or traversing
different types of nodes.
This may require some refinement or disregard.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
@adlternative
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 4, 2021

Submitted as pull.857.v3.git.1612434793195.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-857/adlternative/alloc_states_to_mem_pool-v3

To fetch this version to local tag pr-857/adlternative/alloc_states_to_mem_pool-v3:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-857/adlternative/alloc_states_to_mem_pool-v3

@lyrieek
Copy link

lyrieek commented Feb 7, 2021

/allow

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 7, 2021

Error: User lyrieek is not permitted to use GitGitGadget

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants