Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-33597: Reduce PyGC_Head size #7043

Merged
merged 19 commits into from
Jul 10, 2018
Merged

bpo-33597: Reduce PyGC_Head size #7043

merged 19 commits into from
Jul 10, 2018

Conversation

methane
Copy link
Member

@methane methane commented May 22, 2018

g->gc.gc_prev = _PyGC_generation0->gc.gc_prev; \
g->gc.gc_prev->gc.gc_next = g; \
_PyGC_generation0->gc.gc_prev = g; \
_PyGCHead_SET_PREV(g, _PyGC_generation0->gc.gc_prev); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems _PyGC_generation0->gc.gc_prev can have the lowest bits set. This will trigger an assertion in _PyGCHead_SET_PREV().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Python objects use lower bits. Link header (PyGC_Head without Python object) must not have flags.
(Surely, I need to add more comments.)

g->gc.gc_next->gc.gc_prev = g->gc.gc_prev; \
assert(g->gc.gc_next != NULL); \
_PyGCHead_PREV(g)->gc.gc_next = g->gc.gc_next; \
_PyGCHead_SET_PREV(g->gc.gc_next, _PyGCHead_PREV(g)); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyGCHead_PREV(g) is called twice. Wouldn't be better to cache the result?

next = gc->gc.gc_next;
if (PyTuple_CheckExact(op)) {
_PyTuple_MaybeUntrack(op);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since gc->gc.gc_next->gc.gc_prev is not real prev pointer, we can't call _PyTuple_MaybeUntrack(op) here.
I moved it to split function. But it means this pull request adds one more link traversal.

By the way, current code is not idiomatic. While most tuples can be untracked, there are still many tuples
which arn't be untracked. For example, __mro__ or __bases__.
In final generation, we check all tuples contents. In applications which triggers final generation GC frequently,
and there are many tuples, it can be significant overhead.

Anyway, I need to find "tracked tuple heavy" application to benchmark before optimize here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If final generation GC is triggered frequently, it means our heuristic is inadequate. Full collections can be very expensive.

@@ -227,20 +267,38 @@ append_objects(PyObject *py_list, PyGC_Head *gc_list)
return 0;
}

#if GC_DEBUG
static void
validate_list(PyGC_Head *head, uintptr_t expected_mask)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou @serhiy-storchaka
How do you think this function? Should I remove these checks and GC_DEBUG flag?

This validation was useful while hacking, and I think it's good document about when MASK flags are set and cleared.
But I can replace them to just a line of comment, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this function. Have you tried to always enable it in debug mode, or is it too slow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enabled it while I'm debugging. It made test significantly slower.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The performance seems acceptable, and reducing the memory footprint can only be a good idea :-)

Py_FatalError("GC object already tracked"); \
_PyGCHead_SET_REFS(g, _PyGC_REFS_REACHABLE); \
assert((g->gc.gc_prev & 6) == 0); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

& 6? Why?

g->gc.gc_next->gc.gc_prev = g->gc.gc_prev; \
PyGC_Head *prev = _PyGCHead_PREV(g); \
assert(g->gc.gc_next != NULL); \
prev->gc.gc_next = g->gc.gc_next; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I wonder if the look would less weird if we also had _PyGCHead_NEXT and _PyGCHead_SET_NEXT macros.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skip using _PyGCHead_PREV and _PyGCHead_SET_PREV when flags must not be set. (i.e. list header) too.

g->gc.gc_next = NULL; \
g->gc.gc_prev &= _PyGC_PREV_MASK_FINALIZED; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the same as _PyGCHead_SET_PREV(g, NULL)? The latter seems clearer to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit 1 must be kept after untracking.
But bit 2 and 3 must be cleared when untracking, because this macro may be called while GC
(i.e. weakref callback or tp_finalize).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adding a macro setting the pointer and flags at once? _PyGCHead_SET_PREV_FLAGS(g->gc.gc_next, prev, flags).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's not useful. gc_prev is used as:

  • Set prev pointer, without touching any flags. (_PyTrash_deposit_object and _PyObject_GC_TRACK)
  • Set prev pointer to list header node. Flags are not used in list header. (_PyObject_GC_TRACK)
  • Clear prev pointer and flags, but keep only _PyGC_PREV_MASK_FINALIZED. (only here.)
  • Clear gc_prev entirely. = (uintptr_t)0.

union _gc_head *gc_prev;
Py_ssize_t gc_refs;
union _gc_head *gc_next; // NULL means the object is not tracked
uintptr_t gc_prev;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a small comment here explaining this field?

@methane
Copy link
Member Author

methane commented Jun 20, 2018

My current idea is change MASK_TENTATIVELY_UNREACHABLE back to negative constants.
Only two bits are used for flags instead.

This was impossible... Remaining ideas are:

  • a. Use tagged pointer only when #if SIZEOF_VOID_P > 4
  • b. Use lowest bit of gc_next for TENTATIVELY_UNREACHABLE

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here is a real review. When I wrote my first LGTM, I didn't read carefully the code, but the code also changed a lot since I reviewed it.

union _gc_head *gc_prev;
Py_ssize_t gc_refs;
union _gc_head *gc_next; // NULL means the object is not tracked
uintptr_t gc_prev; // Pointer to previous object in the list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I read gc_prev, I expect it to only contain a pointer to the previous entry, but it contains two information. Would it be possible to find a better name? (gc_prev might be fine) Maybe "gc_prev_flags"?

Maybe add "_" prefix to "explain" that the field must be accessed through macros? (ex: _gc_prev_flags)

If you rename the field, it makes sure that applications accessing (read or... write!) directly gc_prev will break which is a good thing IMHO.

Can you also please document which flags are stored in this field? Finalized, collecting and tentatively unreachable. Mention maybe that the two last are only used internally and so are not accessible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please document which flags are stored in this field? Finalized, collecting and tentatively unreachable. Mention maybe that the two last are only used internally and so are not accessible?

Finalized is documented below. And I don't want document other two bits in this file
because they are tightly coupled with GC algorithm. I added document in gcmodule.c instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed member name to _gc_prev and _gc_next.
I don't want 3rd party touches them directly.

#define _PyObject_GC_TRACK(o) do { \
PyGC_Head *g = _Py_AS_GC(o); \
if (_PyGCHead_REFS(g) != _PyGC_REFS_UNTRACKED) \
if (g->gc.gc_next != NULL) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: you may take this change as an opportunity to add { ... } (PEP 7) :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Py_ssize_t gc_refs;
union _gc_head *gc_next; // NULL means the object is not tracked
uintptr_t gc_prev; // Pointer to previous object in the list.
// Lowest three bits are used for flags.
} gc;
double dummy; /* force worst-case alignment */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I think that alignment using sizeof(double) comes the C standard. If I'm right, it might help to document it here :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is separated issue, see https://bugs.python.org/issue33589

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, it's not separated issue. Since I this pull request removes last Py_ssize_t member from gc, there are only two pointers (or integer having same size to pointer) in the struct.
It must be aligned 8byte on all 32bit and 64bit platforms. dummy can be removed!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed dummy, and stop using union too.

#define _PyGCHead_SET_PREV(g, p) do { \
assert(((uintptr_t)p & ~_PyGC_PREV_MASK) == 0); \
(g)->gc.gc_prev = ((g)->gc.gc_prev & ~_PyGC_PREV_MASK) \
| ((uintptr_t)(p)); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a macro for (g->gc.gc_prev & _PyGC_PREV_MASK_INTERNAL)? Maybe #define _PyGCHead_REFS() or #define _PyGCHead_FLAGS()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth enough...

g->gc.gc_next = NULL; \
g->gc.gc_prev &= _PyGC_PREV_MASK_FINALIZED; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adding a macro setting the pointer and flags at once? _PyGCHead_SET_PREV_FLAGS(g->gc.gc_next, prev, flags).

}

static inline void
gc_clear_masks(PyGC_Head *g)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't clear a mask but flags, no? Maybe: gc_clear_internal_flags()?

Maybe rename "internal flags" to "collection flags"? I don't know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I removed MASK_TENTATIVELY_UNREACHABLE from gc_prev, I renamed this to gc_clear_collecting.


/*** list functions ***/

static void
gc_list_init(PyGC_Head *list)
{
list->gc.gc_prev = list;
list->gc.gc_prev = (uintptr_t)list;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use a macro here since the field is special. _PyGCHead_SET_PREV_FLAGS(list, 0)?

Ditto below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code doesn't mean "clear flags". List header node (gc_list) doesn't have any flags.
So gc_prev is really pure pointer without any flags here.
I'll add one line comment here too.

}

static inline void
gc_set_refs(PyGC_Head *g, Py_ssize_t v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"v" should be "refs" no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

}

static inline void
gc_reset_refs(PyGC_Head *g, Py_ssize_t v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename "v" to "refs"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

|| gc_refs == GC_UNTRACKED);
}
else {
assert(gc_refs > 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why flags are no longer tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there are no GC_REACHABLE and GC_UNTRACKED now.
Instead, there are "early return" above.

+        if (gc->gc.gc_next == NULL || !gc_is_collecting(gc)) {
+            return 0

gc_refs == GC_REACHABLE meant the object is tracked but not in current gc_list. It is !gc_is_collecting(gc) now.

gc_refs == GC_UNTRACKED meant the object is not tracked. It is gc->gc.gc_next == NULL now.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@methane
Copy link
Member Author

methane commented Jun 22, 2018

Apart from that, I have another worrying.
Old 32bit Windows had /3GB option which make user space 3GB instead of 2GB.
Since each pointer has 4bytes, upper bound of refcnt can be 3 * 2^28, which is larger than 2^29.
It means refcnt << 3 is not safe on such platform.

I found /LARGEADDRESSAWARE linker flag is required to use 3GB user space on Windows.
Since it's not used when building Python, it's safe to trim 3 bits from refcnt.

Now I think keeping two implementation is not worth enough.

@pitrou
Copy link
Member

pitrou commented Jun 22, 2018

I found /LARGEADDRESSAWARE linker flag is required to use 3GB user space on Windows.
Since it's not used when building Python, it's safe to trim 3 bits from refcnt.

It's not only about Windows, other OSes can do it too (perhaps Linux does?).

@methane
Copy link
Member Author

methane commented Jun 22, 2018

You're right... Linux can use 3GB for user virtual address. And it's default for x86.

Then, (a) Use int64_t instead of intpr_t, or (b) keep two implementation...

I think I should try (b).

@methane methane closed this Jun 22, 2018
@methane methane reopened this Jun 22, 2018
@methane
Copy link
Member Author

methane commented Jun 28, 2018

I found UNREACHABLE flag is only needed in move_unreachable() essentially.
So I removed the flag from gc_prev, and use gc_next for it temporary.

Now we require 4bytes aligned pointer, and support 1G-1 refcnt on 32bit (4bytes for each pointer) platform.

Copy link
Member Author

@methane methane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

union _gc_head *gc_prev;
Py_ssize_t gc_refs;
union _gc_head *gc_next; // NULL means the object is not tracked
uintptr_t gc_prev; // Pointer to previous object in the list.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed member name to _gc_prev and _gc_next.
I don't want 3rd party touches them directly.

Py_ssize_t gc_refs;
union _gc_head *gc_next; // NULL means the object is not tracked
uintptr_t gc_prev; // Pointer to previous object in the list.
// Lowest three bits are used for flags.
} gc;
double dummy; /* force worst-case alignment */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed dummy, and stop using union too.

#define _PyGCHead_SET_PREV(g, p) do { \
assert(((uintptr_t)p & ~_PyGC_PREV_MASK) == 0); \
(g)->gc.gc_prev = ((g)->gc.gc_prev & ~_PyGC_PREV_MASK) \
| ((uintptr_t)(p)); \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth enough...

#define _PyObject_GC_TRACK(o) do { \
PyGC_Head *g = _Py_AS_GC(o); \
if (_PyGCHead_REFS(g) != _PyGC_REFS_UNTRACKED) \
if (g->gc.gc_next != NULL) \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

static inline void
gc_clear_masks(PyGC_Head *g)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I removed MASK_TENTATIVELY_UNREACHABLE from gc_prev, I renamed this to gc_clear_collecting.

}

static inline void
gc_set_refs(PyGC_Head *g, Py_ssize_t v)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

}

static inline void
gc_reset_refs(PyGC_Head *g, Py_ssize_t v)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

(visitproc)visit_reachable,
(void *)young);
gc_set_prev(gc, prev);
gc->gc.gc_prev &= ~MASK_COLLECTING;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment.

}
gc = next;
}
young->gc.gc_prev = (uintptr_t)prev;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

if (gc_get_refs(gc) != 0) {
ret = -1;
}
_PyGCHead_SET_PREV(gc, prev);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doubly-linked list is broken and restored in this function. So no need to add it in function description.
+ // Use gc_refs and break gc_prev again. comment is above in this function.

#define _PyGCHead_NEXT(g) ((PyGC_Head*)(g)->_gc_next)
#define _PyGCHead_SET_NEXT(g, p) ((g)->_gc_next = (uintptr_t)(p))

// Lowest two bits of _gc_prev is used for flags described below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below? Do you mean above?

Objects/object.c Outdated
@@ -2121,8 +2122,7 @@ _PyTrash_destroy_chain(void)
PyObject *op = _PyRuntime.gc.trash_delete_later;
destructor dealloc = Py_TYPE(op)->tp_dealloc;

_PyRuntime.gc.trash_delete_later =
(PyObject*) _Py_AS_GC(op)->gc.gc_prev;
_PyRuntime.gc.trash_delete_later = (PyObject*) _Py_AS_GC(op)->_gc_prev;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not _PyGCHead_PREV(op)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Objects/object.c Outdated
@@ -2159,8 +2159,7 @@ _PyTrash_thread_destroy_chain(void)
PyObject *op = tstate->trash_delete_later;
destructor dealloc = Py_TYPE(op)->tp_dealloc;

tstate->trash_delete_later =
(PyObject*) _Py_AS_GC(op)->gc.gc_prev;
tstate->trash_delete_later = (PyObject*) _Py_AS_GC(op)->_gc_prev;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

#define GC_NEXT _PyGCHead_NEXT
#define GC_PREV _PyGCHead_PREV

// Bit 0 of _gc_prev is used for _PyGC_PREV_MASK_FINALIZED in objimpl.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't seem to match the line being commented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

// move_legacy_finalizers() removes this flag instead.
// Between them, unreachable list is not normal list and we can not use
// most gc_list_* functions for it. We should manually tweaking unreachable
// list in these two functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand "We should manually tweaking unreachable list in these two functions".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed last sentence.

I meant we should manually set _gc_prev and _gc_next, like this:

+        // Manually unlink gc from unreachable list because
+        PyGC_Head *prev = GC_PREV(gc);
+        PyGC_Head *next = (PyGC_Head*)(gc->_gc_next & ~NEXT_MASK_UNREACHABLE);
+        assert(prev->_gc_next & NEXT_MASK_UNREACHABLE);
+        assert(next->_gc_next & NEXT_MASK_UNREACHABLE);
+        prev->_gc_next = gc->_gc_next;  // copy NEXT_MASK_UNREACHABLE

@methane methane closed this Jul 6, 2018
@methane methane reopened this Jul 6, 2018
@methane
Copy link
Member Author

methane commented Jul 7, 2018

If there are no more objections, I want to merge this in next week.
We can revert it or add switch to use legacy GC later when we need.

@methane methane merged commit 5ac9e6e into python:master Jul 10, 2018
@methane methane deleted the two-word-gc branch July 18, 2018 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants