
Caching of Request Ids for Request Tracker causes OOM on Controller #6664

Closed
pbelgundi opened this issue Mar 28, 2022 · 2 comments · Fixed by #6665

@pbelgundi
Contributor

**Describe the bug**
In the Controller logs I see the same request for getURI-test100sssFailScopeVwUZG/_MARKstream-325 being tagged many times, in fact hundreds of times. This indicates that the same request is repeated many times within a short interval, and new requests keep arriving before the earlier ones can be answered. All of these requests are for the MarkStream and are most likely invoked by the Watermarking service.

687968806628, 5300827836292785080, 8719272431282860608, -7538067772377576593, 887738531512122747, -6479707821409959817, -7673954003711956361, -7843287761868123805, -7368018701339472680, 4633224097332690045, -7323511335041751699, 7025463663885140196, -5137407570473596366, -5249123331997901278, 2642209689035550778, -4408328145795195295, 2443024723475605495, 4922849640639549426, 4149871622623371498, -6556286854987474107, -8274578765410444296, -7305435419343779418, 4412560459875680833, 2117622605376286706, 8799819366878030200, 9071801182383703500, 5543149021395808389, 563740400927151296, -6459931799334233492, -7152583425866031179, 5907518409908567831, -7550620186579641169, 4664011875410076541, 6455551794110768628, -5532940123484932986, 2421668931542069968, 4124085076871203520, 761039478478262128, 8100150582706962676, -8746781205249885846, 7380971824612816311, 1874158410868117662, -7432997760868954343, 5764325676132165602, -2119040546905532538, -1156473476463703281, 2051443891642291533, -5989036103817683186, -7528992278733835229, 5275212961539916192, -5335901208009968775, -3164550244836994606, 183899367826385619, -6759737619924278341, -3958280133830128233, -9180956035845235042, -584448265310558312, 7771379848917023199, 723919182223462394, 2125213954818853404, -208941099295492671, -1657968524778748169, -596851047937760993, 1861636478018420248, -569292677819944949, 4411664800405997803, 556035124227937037, 8374506671100529111, 2127796140468895589, 828191988336702881, -2038303052841360391, 6913042054615117778, -470008545262691197, 7122531318275976202, -7864567799234352513, -4659257734115101787, -2247637463236475659, 8611325933837361965, 1474903257530332227, -251979719258336101, -4315554996716763698, 1225129516433290349, 2565182981567994845, -8958944668473312830, -8784478068265608688, -5732731468483368200, 8319197205462409911, 4865690852754242062, 3649316445395674919, 5010956325791174194, 1463799615389479444, 1612337382065297291, 2060166507225367709, -469312041477224902, 
9171404536743417218, 6159497557697273517, 7969041521109184810, 4364156152602606756, 7980916550595288951, 5520820720651785245, 2270158010357242462, -3209162931053497351, -1613900409786374492, -2971163022066196925, -2915850470219518456, -2854301542137630362, -5913249372225019444, . . .-2414470442516095675]
request ids associated with same descriptor: getURI-test100sssFailScopeVwUZG/_MARKstream-325/0.#epoch.0. Propagating only first one: -2258451022683104641.

The Controller caches these request descriptors in memory, but there is no upper limit on how many request IDs can be stored against the same request descriptor, so in cases like this the list grows very long and consumes a lot of memory.
The heap dump confirms this: about 60-70% of the heap is filled with LocalCache entries and RequestTracker (see the attached file heap-RequestTracker).

The way we cache request IDs needs to be revisited, because in situations like this it clearly occupies a lot of heap memory unnecessarily.
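The failure mode can be modeled in a few lines. This is a minimal Java sketch under stated assumptions, not Pravega's code: a `LinkedHashMap` capped at a fixed number of descriptors stands in for the Controller's real cache, the class runs single-threaded for brevity, and `UnboundedRequestTracker`, `trackRequest`, `getRequestIdFor` and `storedIdsFor` are hypothetical names. It shows that capping the number of cache entries does not help when one descriptor keeps accumulating request IDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the behavior described in this issue, not Pravega's RequestTracker.
class UnboundedRequestTracker {
    private final Map<String, List<Long>> ongoingRequests;

    UnboundedRequestTracker(final int maxDescriptors) {
        // Access-ordered map that evicts the least recently used descriptor once
        // maxDescriptors is exceeded: the number of entries is bounded...
        this.ongoingRequests = new LinkedHashMap<String, List<Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Long>> eldest) {
                return size() > maxDescriptors;
            }
        };
    }

    // ...but each repeated request for the same descriptor appends one more id,
    // so the list inside a single entry grows without any limit.
    void trackRequest(String descriptor, long requestId) {
        ongoingRequests.computeIfAbsent(descriptor, d -> new ArrayList<>()).add(requestId);
    }

    // Only the first id is ever propagated; all later ids are stored but unused.
    Long getRequestIdFor(String descriptor) {
        List<Long> ids = ongoingRequests.get(descriptor);
        return (ids == null || ids.isEmpty()) ? null : ids.get(0);
    }

    int storedIdsFor(String descriptor) {
        List<Long> ids = ongoingRequests.get(descriptor);
        return ids == null ? 0 : ids.size();
    }
}

class UnboundedDemo {
    public static void main(String[] args) {
        UnboundedRequestTracker tracker = new UnboundedRequestTracker(1000);
        String descriptor = "getURI-test100sssFailScopeVwUZG/_MARKstream-325/0.#epoch.0";
        for (long id = 1; id <= 100_000; id++) {
            tracker.trackRequest(descriptor, id);
        }
        System.out.println(tracker.storedIdsFor(descriptor)); // prints 100000
        System.out.println(tracker.getRequestIdFor(descriptor)); // prints 1
    }
}
```

One cache entry ends up holding 100,000 boxed IDs even though the entry limit (1000 here) is never reached.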

**To Reproduce**
Restart the Segment Store repeatedly (~20 times).

@pbelgundi pbelgundi added kind/bug Correctness issue area/controller labels Mar 28, 2022
@RaulGracia
Contributor

Yes, this could be a problem. The RequestTracker sets a limit on the number of entries in the cache:

Each cache entry maps a request descriptor to a list of request IDs.
The problem may be that, in the case of a "collision" (i.e., a repeated request on the same resource), we decided to simply append the new request ID to the list of existing IDs and use only the first one:

There is no upper bound on that list of repeated request IDs. We could keep just the "last" repeated request ID, so each cache element holds only two items: the first request ID (the one we use for tracking) and the last one (from the most recent repeated request). I think that may be enough to keep behavior similar to the existing one while limiting the memory usage of that cache.
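A minimal Java sketch of the "first + last" idea, assuming a simplified tracker: `FirstLastRequestTracker`, `TrackedIds`, `trackRequest`, `getRequestIdFor` and `getLastRequestIdFor` are hypothetical names, a bounded `LinkedHashMap` stands in for the real cache, and `synchronized` methods are used for simplicity rather than reflecting how the Controller handles concurrency. Each descriptor maps to a fixed-size value, so repeated requests overwrite rather than append.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "first + last" proposal; names are illustrative only.
class FirstLastRequestTracker {
    // Fixed-size value per descriptor, no matter how many repeats arrive.
    private static final class TrackedIds {
        final long first; // id used for tracking and propagation
        long last;        // id of the most recent repeated request

        TrackedIds(long id) {
            this.first = id;
            this.last = id;
        }
    }

    private final Map<String, TrackedIds> ongoingRequests;

    FirstLastRequestTracker(final int maxDescriptors) {
        // Bounded, access-ordered map: the number of descriptors stays capped.
        this.ongoingRequests = new LinkedHashMap<String, TrackedIds>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, TrackedIds> eldest) {
                return size() > maxDescriptors;
            }
        };
    }

    synchronized void trackRequest(String descriptor, long requestId) {
        TrackedIds ids = ongoingRequests.get(descriptor);
        if (ids == null) {
            ongoingRequests.put(descriptor, new TrackedIds(requestId));
        } else {
            ids.last = requestId; // overwrite instead of appending
        }
    }

    synchronized Long getRequestIdFor(String descriptor) {
        TrackedIds ids = ongoingRequests.get(descriptor);
        return ids == null ? null : ids.first;
    }

    synchronized Long getLastRequestIdFor(String descriptor) {
        TrackedIds ids = ongoingRequests.get(descriptor);
        return ids == null ? null : ids.last;
    }
}
```

Memory per entry stays constant regardless of how many collisions occur; the trade-off is that intermediate request IDs are no longer available to log.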

@pbelgundi pbelgundi changed the title Caching of Request Ids for Request Tracker causes OOM Caching of Request Ids for Request Tracker causes OOM on Controller Mar 28, 2022
@RaulGracia RaulGracia self-assigned this Mar 28, 2022
@pbelgundi
Contributor Author

If the objective is to have a different request ID per client request, I don't see why the getRequestForTag() method always returns only the first request ID:

log.debug("{} request ids associated with same descriptor: {}. Propagating only first one: {}.",
                      descriptorIds, requestDescriptor, requestId);

And if the first request ID is the only ID used to track the request, the other IDs in the cache seem redundant.
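Following that reasoning, a tracker could store only the first ID. The Java sketch below assumes this design and is not the fix that was merged: `FirstOnlyRequestTracker`, `State`, `trackRequest`, `getRequestIdFor` and `repeatsFor` are hypothetical names, a bounded `LinkedHashMap` stands in for the real cache, and the repeat counter is an illustrative addition so a debug message could still report how many requests collided without keeping their IDs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: keep only the propagated (first) id plus a repeat counter.
class FirstOnlyRequestTracker {
    private static final class State {
        final long firstId; // the only id ever propagated
        long repeats;       // how many later requests reused the same descriptor

        State(long firstId) {
            this.firstId = firstId;
        }
    }

    private final Map<String, State> ongoingRequests;

    FirstOnlyRequestTracker(final int maxDescriptors) {
        // Bounded, access-ordered map: evicts the least recently used descriptor.
        this.ongoingRequests = new LinkedHashMap<String, State>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, State> eldest) {
                return size() > maxDescriptors;
            }
        };
    }

    synchronized void trackRequest(String descriptor, long requestId) {
        State state = ongoingRequests.get(descriptor);
        if (state == null) {
            ongoingRequests.put(descriptor, new State(requestId));
        } else {
            state.repeats++; // later ids are counted, not stored
        }
    }

    synchronized Long getRequestIdFor(String descriptor) {
        State state = ongoingRequests.get(descriptor);
        return state == null ? null : state.firstId;
    }

    synchronized long repeatsFor(String descriptor) {
        State state = ongoingRequests.get(descriptor);
        return state == null ? 0 : state.repeats;
    }
}
```

Compared with keeping the first and last IDs, this drops the last ID entirely and keeps only a count, which matches the observation that only the first ID is ever propagated.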
