Reduce the number of threads used in the brick process #475
I am also planning to resolve the same. We need to load the xlator only when the user enables it; otherwise we should not load it in the graph.
I think we should apply the same point of view we are using for the unified caching framework and the transaction framework. We should provide a framework able to manage a thread pool and delegate all/most of the asynchronous tasks to it instead of creating a single thread (or even a small thread pool in some cases) for each specific requirement. Moving to a centralized thread pool approach also has some benefits:
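To make the idea concrete, here is a minimal sketch of such a centralized pool in plain pthreads. All names (`gf_task_t`, `gf_pool_t`, `gf_pool_submit`) are hypothetical, not an existing gluster API, and pool initialization/worker spawning are omitted; each subsystem would call `gf_pool_submit()` instead of `pthread_create()`, so the process-wide thread count stays bounded by the pool size.

```c
/* Minimal sketch of a process-wide thread pool. All names here are
 * hypothetical, not the actual gluster API. */
#include <pthread.h>
#include <stdlib.h>

typedef struct gf_task {
    void (*fn)(void *);
    void *arg;
    struct gf_task *next;
} gf_task_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    gf_task_t *head, *tail;
    int shutdown;
} gf_pool_t;

/* Each pool worker blocks on the condition variable and drains the
 * shared queue; N workers replace one thread per subsystem. */
static void *gf_worker(void *data)
{
    gf_pool_t *pool = data;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->shutdown)
            pthread_cond_wait(&pool->cond, &pool->lock);
        if (pool->shutdown && !pool->head) {
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        gf_task_t *t = pool->head;
        pool->head = t->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);   /* run the task outside the lock */
        free(t);
    }
}

/* Subsystems call this instead of pthread_create(). */
int gf_pool_submit(gf_pool_t *pool, void (*fn)(void *), void *arg)
{
    gf_task_t *t = malloc(sizeof(*t));
    if (!t)
        return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->cond);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}
```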
Threads per brick instance: timer thread = 1. Credits: @mohit84
How is this data collected? I think we should be collecting this data with all the commonly used brick-side options enabled.
These threads are created by a brick process with default volume options. Some of the threads are shared, like the poller threads, which are common to all brick instances.
While the threads output did capture the threads created by changelog and bitrot when they are not enabled, it missed two more threads that get created when the feature is enabled: glusterclogro ----> (1 per brick) --> changelog_rollover
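Regarding the question above about how this data is collected: one straightforward way to reproduce such a per-brick thread listing (assuming Linux and access to /proc; this is an illustration, not necessarily how the numbers above were gathered) is to enumerate /proc/&lt;pid&gt;/task and read each task's comm file:

```c
/* threadlist.c - list the threads of a running process by name,
 * roughly: for t in /proc/<pid>/task/*; do cat $t/comm; done
 * Build: cc -o threadlist threadlist.c   Run: ./threadlist <pid> */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    char taskdir[64];
    snprintf(taskdir, sizeof(taskdir), "/proc/%s/task", argv[1]);

    DIR *dir = opendir(taskdir);
    if (!dir) {
        perror("opendir");
        return 1;
    }

    struct dirent *de;
    int count = 0;
    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] == '.')
            continue;                          /* skip "." and ".." */

        char commpath[128], name[64] = "?";
        snprintf(commpath, sizeof(commpath), "%s/%s/comm",
                 taskdir, de->d_name);

        FILE *f = fopen(commpath, "r");
        if (f) {
            if (fgets(name, sizeof(name), f))
                name[strcspn(name, "\n")] = '\0';  /* strip newline */
            fclose(f);
        }
        printf("tid %-8s %s\n", de->d_name, name);
        count++;
    }
    closedir(dir);

    printf("total: %d threads\n", count);
    return 0;
}
```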
The problems here are:
With a thread pool, we can reduce the thread count. We really need the global thread pool and the task-submission approach for gluster, so that all the gluster processes benefit from it. But the memory footprint may still be higher than our expectations for 100s of graphs. Also, graph activation and deactivation leak memory and have many corner cases to handle. Hence, I propose that for the brick multiplexing use case alone, we work with a single graph and modify posix to handle multiple bricks (a rough sketch of this idea follows below). This can work because we attach bricks of similar graphs in brick multiplexing. The cons of this approach:
Pros:
Thoughts?
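A very rough illustration of the "single graph, posix handles multiple bricks" idea above. All names here are hypothetical, not the actual proposal's code: posix would keep a table of attached bricks and resolve each fop against its brick's export root, so attach/detach becomes a table update instead of a graph change.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: one posix instance serving many bricks. */
typedef struct {
    char *brick_name;     /* e.g. "vol1-brick0" */
    char *export_root;    /* e.g. "/bricks/vol1/b0" */
} posix_brick_t;

typedef struct {
    posix_brick_t *bricks;   /* grows/shrinks on attach/detach */
    int nbricks;
} posix_multi_priv_t;

/* Each fop carries its brick identity; posix resolves the on-disk
 * root from the table instead of from a per-xlator option. */
const char *posix_resolve_root(posix_multi_priv_t *priv,
                               const char *brick_name)
{
    for (int i = 0; i < priv->nbricks; i++)
        if (strcmp(priv->bricks[i].brick_name, brick_name) == 0)
            return priv->bricks[i].export_root;
    return NULL;   /* brick not attached (e.g. detached mid-flight) */
}
```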
That's a good approach - practical and takes a lot less effort 👍
Thinking about it, this needs changes in the 'management' layer, but we can make sure all these volumes have their export directory mounted inside an exported path in glusterfs, and treat each different volume as a subdir mount. So, clients only get access to what they need, there won't be any self-heal daemon problems, and metrics too remain at directory level (at least df -h works per directory). A good direction to think about. @raghavendra-talur @JohnStrunk @ShyamsundarR this is similar to how we have subdir mounts, but to provide snapshots, would we have LVs exported as subdirs in a larger gluster export? A good topic to discuss in the new repo called GCS, IMO. This discussion takes this out of Gluster's scope altogether and makes it a provisioning and CSI goal.
Tagging @jdarcy as well on this, for insights. The commentary provided is not to dissuade the thought, but to voice possible issues that may need handling. Some of the concerns/issues to address are GFID namespace collisions across the various bricks and dealing with the root inode for each (which will collide). Others are features that operate at a volume level and how these can adapt (geo-rep (changelog), quota, index) to such a scheme.
I ask the above because the initial proposal deals with handling this at POSIX, while later we state that we will mount each volume export inside an exported path.
Generic thoughts on the existing model: Having a thread pool and a means of submitting jobs to it can help reduce thread counts and provides a better framework for future expansion as well (as discussed above). Adding sub-brick graphs takes away the constraint on xlator development of addressing the possibility of multiple bricks served by the same instance of the xlator. Brick attach/detach and all problems thereof are IMO only postponed when folding this into POSIX. For example, when a brick is detached, when are its inodes in the brick process evicted? Currently the entire inode table is evicted, and all inodes are released/forgotten when a brick is detached, thus providing an easier cleanup experience. I think we need to detail out the current approach, and in the meantime address the existing concerns as well.
A patch https://review.gluster.org/20761 has been posted that references this issue: "Posix: Instead of using one thread for janitor, use synctask". With brick mux, the number of threads increases as the number of bricks increases. Updates #475
A patch https://review.gluster.org/20859 has been posted that references this issue: "io-threads: Associate io-threads with glusterfs_ctx_t". Updates #475
I think the following suggestion brings up major issues, as highlighted by Shyam, and is not a good approach.
A patch https://review.gluster.org/21116 has been posted that references this issue: "Posix: Instead of using a thread for fsyncer, use synctask". With brick mux, the number of threads increases as the number of bricks increases; afr_changelog_fsync is the only code that currently uses batch-fsync. Updates #475
Should this be a feature at all? Or a bug, to be precise? If we need to continue here, @ShyamsundarR @xhernandez please confirm whether commit messages are sufficient for providing SpecApproved. I see that a user wouldn't even need to know about it, or see any changes in documentation for this issue, hence providing DocApproved!
I see these as major enhancements that are better documented than left in commit messages. In this case I would prefer that we add some documentation in tree (i.e. under developer-guide/features) that explains the various changes and the current model of reducing the thread counts.
A patch https://review.gluster.org/21688 has been posted that references this issue: "doc: Add brick mux thread reduction design doc in developer guide". Change-Id: I5309efb5826d4385eadea29c1cf973f504eef9e5
@ShyamsundarR, @amarts has submitted the design doc at https://review.gluster.org/#/c/glusterfs/+/21688
With brick mux, the number of threads increases as the number of bricks increases. As an initiative to reduce the number of threads in the brick mux scenario, this replaces the janitor thread with the synctask infra. Now close() and closedir() are handled by a separate janitor thread which is linked with glusterfs_ctx. Updates #475 Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73 Signed-off-by: Poornima G <pgurusid@redhat.com>
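For reference, the pattern this commit describes looks roughly like the sketch below. It assumes gluster's internal synctask API from syncop.h (synctask_new(env, fn, cbk, frame, opaque) scheduling the function on the shared sync environment); posix_janitor_do_cleanup is a hypothetical stand-in for the actual queue-draining code, so treat the whole thing as illustrative.

```c
/* Sketch only: assumes gluster internal headers (xlator.h, syncop.h). */

static int
posix_janitor_task(void *opaque)
{
    /* Runs on a shared synctask worker instead of a per-brick thread:
     * drain this brick's janitor queue (close() fds, closedir() dirs). */
    xlator_t *this = opaque;
    posix_janitor_do_cleanup(this);   /* hypothetical helper */
    return 0;
}

static int
posix_janitor_done(int ret, call_frame_t *frame, void *opaque)
{
    return 0;   /* nothing to do once the cleanup pass finishes */
}

static void
posix_schedule_janitor(xlator_t *this)
{
    /* this->ctx->env is the process-global sync environment, shared by
     * every brick in a multiplexed process -- no dedicated thread. */
    synctask_new(this->ctx->env, posix_janitor_task,
                 posix_janitor_done, NULL, this);
}
```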
A patch https://review.gluster.org/21896 has been posted that references this issue: "feature/changelog: Avoid thread creation if xlator is not enabled". Problem: threads are created for changelog even if the feature is not enabled. Solution: before thread creation, check the flag whether the feature is enabled. Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a
A patch https://review.gluster.org/21941 has been posted that references this issue: "mpx-restart-crash.t: add details on memory and threads consumed". Change-Id: Ie051ed246493ef290be1d781e4f326bc5384d4e5
A patch https://review.gluster.org/21943 has been posted that references this issue: "feature/bitrot: Avoid thread creation if xlator is not enabled". Problem: threads are created for bitrot-stub even if the feature is not enabled. Solution: before thread creation, check the flag whether the feature is enabled. Updates: #475
A patch https://review.gluster.org/21944 has been posted that references this issue: "Posix: Instead of using a thread for fsyncer, use synctask". With brick mux, the number of threads increases as the number of bricks increases; afr_changelog_fsync is the only code that currently uses batch-fsync. Updates #475
A patch https://review.gluster.org/21945 has been posted that references this issue: "io-threads: Associate io-threads with glusterfs_ctx_t". io-threads are currently associated with the xlator; every brick graph has its own instance. Note that this patch reduces the performance of the brick mux use case. Updates: #475
A patch https://review.gluster.org/21942 has been posted that references this issue: "feature/changelog: Avoid thread creation if xlator is not enabled". Problem: threads are created for changelog even if the feature is not enabled. Solution: before thread creation, check the flag whether the feature is enabled. Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a
A patch https://review.gluster.org/21958 has been posted that references this issue: "bitrot|changelog: Avoid thread creation if xlator is not enabled". Problem: threads are created for bitrot-stub and changelog even if the features are not enabled. Solution: before thread creation, check the flags whether the features are enabled. Change-Id: I944465079fcd038736eb021ad45a323cca5f8a64
Problem: threads are created for changelog for a volume even if the feature is not enabled. Solution: before thread creation, check the flag whether the feature is enabled. Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a Updates: gluster#475 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
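The pattern in these changelog/bitrot patches is simple but worth spelling out: gate every pthread_create() on the feature flag, so a disabled xlator contributes zero threads. A minimal sketch (field and function names are hypothetical, not the actual changelog code):

```c
#include <pthread.h>

typedef struct {
    int changelog_enabled;       /* set from the volume option */
    pthread_t rollover_thread;
    int rollover_running;
} changelog_priv_sketch_t;

extern void *changelog_rollover(void *arg);   /* existing worker fn */

int changelog_spawn_threads(changelog_priv_sketch_t *priv)
{
    if (!priv->changelog_enabled)
        return 0;                /* feature off: create no threads */

    if (pthread_create(&priv->rollover_thread, NULL,
                       changelog_rollover, priv) != 0)
        return -1;

    priv->rollover_running = 1;  /* remember, so cleanup can join it */
    return 0;
}
```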
Problem: threads are created for bitrot-stub for a volume even if the feature is not enabled. Solution: before thread creation, check the flag whether the feature is enabled. Updates: gluster#475 Change-Id: I2c6cc35bba142d4b418cc986ada588e558512c8e Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com>
With brick mux, the number of threads increases as the number of bricks increases. As an initiative to reduce the number of threads in the brick mux scenario, this replaces the fsyncer thread with the synctask infra. afr_changelog_fsync is the only code that currently uses batch-fsync. At posix_fsync, the fsync fop is handed over to the fsyncer thread if batch-fsync-mode is set (the default) and xdata contains batch-fsync. Even when batch_fsync_delay_usec is 0, the fsync is handed to the fsyncer thread, so the main thread is not blocked on fsync. With this patch that behaviour changes: if batch_fsync_delay_usec is 0, there is no need to hand over the fsync to a separate thread, as the fsync is executed in io-threads. If batch_fsync_delay_usec > 0, the batch fsync happens on a timely basis in a synctask thread. Updates gluster#475 Change-Id: I560ab1f3d690f353130cf46cbefec55fa8dfd3b4 Signed-off-by: Poornima G <pgurusid@redhat.com>
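The behavioural change above boils down to the dispatch decision sketched below (hypothetical names; fsync_queue_add stands in for the timed synctask path): a zero delay now means "run inline in io-threads", and only a positive delay buys batching.

```c
#include <unistd.h>

typedef struct {
    unsigned int batch_fsync_delay_usec;   /* volume option */
} posix_priv_sketch_t;

/* Hypothetical: enqueue for the timed synctask that flushes the
 * batch every batch_fsync_delay_usec microseconds. */
extern int fsync_queue_add(posix_priv_sketch_t *priv, int fd,
                           int datasync);

int posix_do_fsync_sketch(posix_priv_sketch_t *priv, int fd,
                          int datasync)
{
    if (priv->batch_fsync_delay_usec == 0) {
        /* No batching window: execute inline; io-threads already
         * keep the main event thread unblocked. */
        return datasync ? fdatasync(fd) : fsync(fd);
    }
    /* Batching requested: hand over to the timed batch path. */
    return fsync_queue_add(priv, fd, datasync);
}
```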
io-threads are currently associated with an xlator; every brick graph has one instance of io-threads. Every time a brick graph is attached in the brick mux use case, a new set of io-threads is spawned. At scale, this becomes an overhead. It is better to associate resources like io-threads with glusterfs_ctx, which is global to the process. Note that this patch reduces the performance of the brick mux use case, due to lock contention in the io-threads implementation. Subsequent patches by Mohit Agrawal will solve the lock contention. Updates: gluster#475 Change-Id: I131b9f7f8d95a58b1dc3667bf973a6e1afd9ce50 Signed-off-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Harpreet kaur <hlalwani@redhat.com>
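The essence of this change is moving a per-graph resource onto the process-global context. A sketch of that ownership model (structure and function names are illustrative, not gluster's actual ones): the first attach creates the shared pool, later attaches take a reference, and the last detach tears it down. The single ctx-level lock in this sketch is also where the lock contention mentioned above comes from.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    void *iot_pool;       /* the shared io-threads instance */
    int iot_refcount;
} glusterfs_ctx_sketch_t;

extern void *iot_pool_create(void);
extern void iot_pool_destroy(void *pool);

void *ctx_iot_get(glusterfs_ctx_sketch_t *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    if (ctx->iot_refcount++ == 0)
        ctx->iot_pool = iot_pool_create();   /* first brick attach */
    void *pool = ctx->iot_pool;
    pthread_mutex_unlock(&ctx->lock);
    return pool;
}

void ctx_iot_put(glusterfs_ctx_sketch_t *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    if (--ctx->iot_refcount == 0) {          /* last brick detach */
        iot_pool_destroy(ctx->iot_pool);
        ctx->iot_pool = NULL;
    }
    pthread_mutex_unlock(&ctx->lock);
}
```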
What is the completion status of this issue in release-6? The issue is still open; what is pending? Based on that we can decide what we need to put into the release notes for the release.
Problem: Changelog creates threads even if the changelog is not enabled. Background: The changelog xlator broadly does two things: 1. Journalling - consumers are geo-rep and glusterfind. 2. Event notification for registered events (open, release etc.) - consumers are bitrot and geo-rep. The existing option "changelog.changelog" controls journalling, and there is no option to control event notification, which is enabled by default. So when bitrot/geo-rep is not enabled on the volume, the threads and resources (rpc and rbuf) related to event notifications consume resources and CPU cycles unnecessarily. Solution: Have two different options, as below: 1. changelog-notification: event notifications. 2. changelog: journalling. This patch introduces the option "changelog-notification", which is not exposed to the user. When either bitrot or changelog (journalling) is enabled, it internally enables 'changelog-notification'. But once 'changelog-notification' is enabled, it will not be disabled for the lifetime of the brick process, even after bitrot and changelog are disabled. As of now, the rpc resource cleanup has a lot of races and is difficult to do cleanly; if allowed, it leads to memory leaks and crashes on enable/disable of bitrot or changelog (journal) in a loop. Hence, to be safer, the event notification is not disabled within the lifetime of the process once enabled. Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a Updates: #475 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
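The option split described in this commit reduces to a small piece of derivation logic, sketched below with illustrative names: notification is computed from the two consumers and, once set, deliberately never cleared.

```c
/* Sketch of the internal option derivation; names are illustrative. */
typedef struct {
    int journalling;    /* "changelog" option (user visible)          */
    int notification;   /* "changelog-notification" (internal, sticky) */
} changelog_opts_sketch_t;

void changelog_reconfigure_sketch(changelog_opts_sketch_t *opts,
                                  int changelog_on, int bitrot_on)
{
    opts->journalling = changelog_on;

    /* Either consumer turns notification on ... */
    if (changelog_on || bitrot_on)
        opts->notification = 1;

    /* ... but it is never turned back off for the life of the brick
     * process: tearing down the rpc/rbuf resources is racy, so the
     * notification threads stay once created. */
}
```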
Thank you for your contributions.
Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.
Problem: Changelog creates threads even if the changelog is not enabled. Background: The changelog xlator broadly does two things: 1. Journalling - consumers are geo-rep and glusterfind. 2. Event notification for registered events (open, release etc.) - consumers are bitrot and geo-rep. The existing option "changelog.changelog" controls journalling, and there is no option to control event notification, which is enabled by default. So when bitrot/geo-rep is not enabled on the volume, the threads and resources (rpc and rbuf) related to event notifications consume resources and CPU cycles unnecessarily. Solution: Have two different options, as below: 1. changelog-notification: event notifications. 2. changelog: journalling. This patch introduces the option "changelog-notification", which is not exposed to the user. When either bitrot or changelog (journalling) is enabled, it internally enables 'changelog-notification'. But once 'changelog-notification' is enabled, it will not be disabled for the lifetime of the brick process, even after bitrot and changelog are disabled. As of now, the rpc resource cleanup has a lot of races and is difficult to do cleanly; if allowed, it leads to memory leaks and crashes on enable/disable of bitrot or changelog (journal) in a loop. Hence, to be safer, the event notification is not disabled within the lifetime of the process once enabled. > Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a > Updates: gluster#475 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > (Cherry pick from commit 6de80bc) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/21896/) BUG: 1790336 Change-Id: Ifd00286e0966049e8eb9f21567fe407cf11bb02a Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/202778 Tested-by: Mohit Agrawal <moagrawa@redhat.com> Tested-by: RHGS Build Bot <nigelb@redhat.com> Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>
If I start just 1 fsd process, below is the summary of threads:
That is 20 different threads, of which only 3 would be required in the I/O path. So there is a lot of benefit if we move the other threads into a single pool (where it makes sense), and make use of a timer to trigger time-based validations (like glfs_posixjanitor etc.).