Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add usage of Threads to "xz"-command on archive-routine #1196

Closed
knumskull opened this issue Jan 29, 2018 · 5 comments
Closed

Add usage of Threads to "xz"-command on archive-routine #1196

knumskull opened this issue Jan 29, 2018 · 5 comments

Comments

@knumskull
Copy link

On systems, with big amount of logfiles, running sosreport will take a lot of time.
A lot of time is lost during the archiving-process. This could be improved by using threads for xz-command easily.

A possible way could be to use 20% of the available cores for the archiving process.

# diff ~/archive.py /usr/lib/python2.7/site-packages/sos/archive.py
28,29d27
< import multiprocessing
< import math
437,439d434
<             # use multiple threads if using xz
<             if cmd == "xz":
<                 cmd = "%s -T%d" % (cmd,math.ceil(multiprocessing.cpu_count()/5))
@TurboTurtle
Copy link
Member

Looks like xz-5.2 is what added threaded compression - which looks like it was in F24 and RHEL 7.3. I'm not sure what Debian/Ubuntu releases it ships in.

Thinking we should do this at the same time we address running plugins in parallel, @bmr-cymru @pmoravec

@pmoravec
Copy link
Contributor

pmoravec commented Jan 30, 2018

Running plugins in parallel and this improvement are imho independent on each other but both aiming the same direction - they can just share sosreport's setting of # of threads (that would make sense imho).

Why use math.ceil, if dividing returns integer value every time? What about:

  • passing new sosreport option threads via archive = self.archive.finalize(..) call
  • updating relevant code to:
    if cmd == "xz":
        if threads == None:   # or 0, simply cmdline option not set
            threads = multiprocessing.cpu_count()/5
            if threads == 0:
                threads = 1  # so we dont need to use math.ceil and import math
        cmd = "%s -T%d" % (cmd, threads)

@knumskull
Copy link
Author

Yes, makes much more sense. Forgot about python is doing the right math here.

@TurboTurtle
Copy link
Member

We'd also want to check to xz version - if a user has an old xz package without this check, we'll fail to create the archive every time.

@pmoravec
Copy link
Contributor

pmoravec commented Mar 3, 2018

Depends on #1199 , once that is merged, we can propose patch here like:

--- /usr/lib/python2.7/site-packages/sos/sosreport.py.orig	2018-03-03 14:50:14.271792916 +0100
+++ /usr/lib/python2.7/site-packages/sos/sosreport.py	2018-03-03 14:54:44.338248720 +0100
@@ -769,9 +769,9 @@ class SoSReport(object):
                                     self.policy.get_archive_name())
         if self.opts.compression_type == 'auto':
             auto_archive = self.policy.get_preferred_archive()
-            self.archive = auto_archive(archive_name, self.tmpdir)
+            self.archive = auto_archive(archive_name, self.tmpdir, self.get_commons())
         else:
-            self.archive = TarFileArchive(archive_name, self.tmpdir)
+            self.archive = TarFileArchive(archive_name, self.tmpdir, self.get_commons())
         self.archive.set_debug(True if self.opts.debug else False)
 
     def _make_archive_paths(self):
--- /usr/lib/python2.7/site-packages/sos/archive.py.orig	2018-03-03 14:51:07.360685937 +0100
+++ /usr/lib/python2.7/site-packages/sos/archive.py	2018-03-03 15:12:04.474193009 +0100
@@ -372,10 +372,11 @@ class TarFileArchive(FileCacheArchive):
     method = None
     _with_selinux_context = False
 
-    def __init__(self, name, tmpdir):
+    def __init__(self, name, tmpdir, commons):
         super(TarFileArchive, self).__init__(name, tmpdir)
         self._suffix = "tar"
         self._archive_name = os.path.join(tmpdir, self.name())
+        self._commons = commons
 
     def set_tarinfo_from_stat(self, tar_info, fstat, mode=None):
         tar_info.mtime = fstat.st_mtime
@@ -450,6 +451,18 @@ class TarFileArchive(FileCacheArchive):
             # use fast compression if using xz or bz2
             if cmd != "gzip":
                 cmd = "%s -1" % cmd
+            threads = 10 # load it from self._commons & params
+            if cmd.startswith("xz"):
+                try:
+                    xz_version = self._commons["policy"].package_manager.all_pkgs()['xz']["version"]
+                except Exception:
+                    xz_version = [0] # deal like xz version is really old
+                if xz_version >= [u'5',u'2']:
+                    if threads == None:   # or 0, simply cmdline option not set
+                        threads = multiprocessing.cpu_count()/5
+                        if threads == 0:
+                            threads = 1  # so we dont need to use math.ceil and import math
+                    cmd = "%s -T%d" % (cmd, threads)
             try:
                 r = sos_get_command_output("%s %s" % (cmd, self.name()),
                                            timeout=0)

pmoravec added a commit to pmoravec/sos that referenced this issue Jun 11, 2018
Resolves: sosreport#1196

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Jun 11, 2018
Resolves: sosreport#1196

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Jun 11, 2018
Resolves: sosreport#1196

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Jun 18, 2018
In some cases, we need to determine what package provides given
binary. This auxiliary function implements it - so far it works
for and will be used by archive methods.

Related to: sosreport#1196

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Jun 18, 2018
Moving also building of the command from Archive to Plugin class.

Resolves: sosreport#1196

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
bmr-cymru pushed a commit that referenced this issue Jun 18, 2018
In some cases, we need to determine what package provides given
binary. This auxiliary function implements it - so far it works
for and will be used by archive methods.

Related to: #1196

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants