Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use older filetype detection for linux distributions which do not have python3-magic version 0.4.20 #3021

Closed
nkshirsagar opened this issue Sep 6, 2022 · 3 comments

Comments

@nkshirsagar
Copy link
Contributor

nkshirsagar commented Sep 6, 2022

On sos 4.3, filetype detection does not use magic.detect_from_filename()

    def file_is_binary(self, fname):
        """Determine if the file is a binary file or not.


        :param fname:          Filename relative to the extracted archive root
        :type fname:           ``str``

        :returns:   ``True`` if file is binary, else ``False``
        :rtype:     ``bool``
        """
        with open(self.get_file_path(fname), 'tr') as tfile:
            try:
                # when opened as above (tr), reading binary content will raise
                # an exception
                tfile.read(1)
                return False
            except UnicodeDecodeError:
                return True

but it does on 4.4,

def file_is_binary(fname):
    """Helper to determine if a given file contains binary content or not.

    This is especially helpful for `sos clean`, which cannot obfuscate binary
    data and instead, by default, will remove binary files.

    :param fname:   The full path of the file to check binaryness of
    :type fname:    ``str``

    :returns:   True if binary, else False
    :rtype:     ``bool``
    """
    try:
        _ftup = magic.detect_from_filename(fname)

( due to 0689d8b )

Older linux distributions may not be having python3-magic version 0.4.20 which provides the detect_from_filename() method (introduced in 0.4.19 , thanks @arif-ali !)

So sos should detect if an older python3-magic version than 0.4.20 is installed and if so, use the older way to detect if a file is binary.

@TurboTurtle
Copy link
Member

TurboTurtle commented Sep 6, 2022

Copying from IRC -

python3-magic has the current dependency for 0.4.20 (edit: or newer) due to the use of a properly functioning detect_from_filename() method. 0.4.19 introduced this method and at the same time replacing the older methods. Meaning, moving forward our only option via this library is this function - the previous approaches are not even no longer maintained but are completely removed.

I'd first suggest seeing if 0.4.20 (or newer) can be pushed as an upgrade in the older distro releases. If not, then we could certainly accept a PR that implements fallback logic for the older versions of the magic module.

@nkshirsagar
Copy link
Contributor Author

nkshirsagar commented Sep 8, 2022

We have decided to patch downstream old distributions for Ubuntu like so, (basically reverting to older approach for binary file detection). This keeps file_is_binary() in utilities but changes its internal behavior to not use magic.detect_from_filename() but the older 4.3 approach.

--- sosreport-4.4.orig/requirements.txt
+++ sosreport-4.4/requirements.txt
@@ -2,5 +2,4 @@ pycodestyle>=2.4.0
 coverage>=4.0.3
 Sphinx>=1.3.5
 pexpect>=4.0.0
-python_magic>=0.4.20
 pyyaml
--- sosreport-4.4.orig/setup.py
+++ sosreport-4.4/setup.py
@@ -107,7 +107,7 @@ setup(
     ],
     cmdclass=cmdclass,
     command_options=command_options,
-    requires=['pexpect', 'python_magic', 'pyyaml']
+    requires=['pexpect', 'pyyaml']
     )
 
 
--- sosreport-4.4.orig/sos/utilities.py
+++ sosreport-4.4/sos/utilities.py
@@ -19,7 +19,6 @@ import tempfile
 import threading
 import time
 import io
-import magic
 
 from contextlib import closing
 from collections import deque
@@ -75,17 +74,14 @@ def file_is_binary(fname):
     :returns:   True if binary, else False
     :rtype:     ``bool``
     """
-    try:
-        _ftup = magic.detect_from_filename(fname)
-        _mimes = ['text/', 'inode/']
-        return (
-            _ftup.encoding == 'binary' and not
-            any(_ftup.mime_type.startswith(_mt) for _mt in _mimes)
-        )
-    except Exception:
-        # if for some reason this check fails, don't blindly remove all files
-        # but instead rely on other checks done by the component
-        return False
+    with open(fname, 'tr') as tfile:
+        try:
+            # when opened as above (tr), reading binary content will raise
+            # an exception
+            tfile.read(1)
+            return False
+        except UnicodeDecodeError:
+            return True
 
 
 def find(file_pattern, top_dir, max_depth=None, path_pattern=None):

@TurboTurtle
Copy link
Member

Ack, looks fine to me. Closing as resolved downstream.

pmoravec added a commit to pmoravec/sos that referenced this issue Sep 12, 2022
For compatibility reasons on some distros, sos should not have a hard
dependency on 'magic' python library. It should attempt to use it for
detection of binary file content, but should fall back to previous "read
the very first byte" method otherwise.

Resolves: sosreport#3025
Relates: sosreport#3021

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Sep 13, 2022
For compatibility reasons on some distros, sos should not have a hard
dependency on 'magic' python library. It should attempt to use it for
detection of binary file content, but should fall back to previous "read
the very first byte" method otherwise.

Resolves: sosreport#3025
Relates: sosreport#3021

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Sep 13, 2022
For compatibility reasons on some distros, sos should not have a hard
dependency on 'magic' python library. It should attempt to use it for
detection of binary file content, but should fall back to previous "read
the very first byte" method otherwise.

Resolves: sosreport#3025
Relates: sosreport#3021

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Sep 18, 2022
For compatibility reasons on some distros, sos should not have a hard
dependency on 'magic' python library. It should attempt to use it for
detection of binary file content, but should fall back to previous "read
the very first byte" method otherwise.

Resolves: sosreport#3025
Relates: sosreport#3021

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
TurboTurtle pushed a commit that referenced this issue Sep 21, 2022
For compatibility reasons on some distros, sos should not have a hard
dependency on 'magic' python library. It should attempt to use it for
detection of binary file content, but should fall back to previous "read
the very first byte" method otherwise.

Resolves: #3025
Relates: #3021

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants