Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance binary file detection by integrating MIME type analysis, resolves #975 #1003

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Abdul-Muqadim-Arbisoft
Copy link
Contributor

@Abdul-Muqadim-Arbisoft Abdul-Muqadim-Arbisoft commented Feb 19, 2024

This update significantly refines the is_binary_file function with a more nuanced detection mechanism. The function now starts by attempting to guess the MIME type of the file based on its path. If the MIME type explicitly indicates a non-text nature or belongs to a set of known binary MIME types, the file is considered binary. However, this version introduces a critical improvement: it recognizes specific MIME types and file extensions that, despite not starting with 'text/', are commonly associated with text content (e.g., 'application/xml', 'application/json'). These are explicitly checked and treated as text files.

In situations where the MIME type is inconclusive or suggests a binary nature — but the file extension is known to be associated with text content (such as '.html', '.xml', '.json', '.css', '.js') — the function overrides the MIME type indication and classifies the file as text. Conversely, if the file's extension is among those recognized as binary, it is deemed as such, making this approach highly effective for distinguishing between binary and text files with greater precision

@Abdul-Muqadim-Arbisoft Abdul-Muqadim-Arbisoft changed the title Enhance binary file detection by integrating MIME type analysis Enhance binary file detection by integrating MIME type analysis, resolves #975 Feb 19, 2024
@Abdul-Muqadim-Arbisoft Abdul-Muqadim-Arbisoft added the enhancement Enhancements will be processed by decreasing priority label Feb 19, 2024
tutor/env.py Outdated Show resolved Hide resolved
Copy link
Contributor

@DawoudSheraz DawoudSheraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add a changelog
  • Squash the commits

@Abdul-Muqadim-Arbisoft Abdul-Muqadim-Arbisoft requested review from regisb and removed request for regisb February 22, 2024 12:24
@Abdul-Muqadim-Arbisoft
Copy link
Contributor Author

@regisb can you squash the commits while merging the branch

@DawoudSheraz
Copy link
Contributor

@regisb A reminder for reviewing and squash-merging this PR, thanks.

Copy link
Contributor

@regisb regisb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken, this PR was designed to address issue #975 right? If yes, then we need to implement the filter that I described in this comment. This filter would then make use of the is_binary function to decide whether any given file should be rendered. Does that make sense? If not we should have a live discussion about this.

Also, please rebase and squash your changes on top of master.

@@ -0,0 +1 @@
- [Improvement] This is a non-breaking change. Enhanced is_binary_file function in env file for better binary file detection. (by @Abdul-Muqadim-Arbisoft)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changelog entry is not very informative. As a Tutor user or plugin developer, what change(s) should I expect? Will some files now be considered as text/binary, that previously were not? If there is no change, what's the purpose of this PR?

@regisb
Copy link
Contributor

regisb commented Apr 16, 2024

To clarify my thought above, here's what I have in mind:

@hooks.Filters.IS_FILE_RENDERED.add()
def _do_not_render_binary_files(result: bool, path: str) -> bool:
    if result and is_binary(path):
        result = False
    return result


def render_files():
    for path in files:
        if hooks.Filters.IS_FILE_RENDERED.apply(True, path):
            # render file
        else:
            # copy file

To understand the benefit of the above approach, consider the following scenarios:

  1. A certain plugin needs a certain html file to be copied, not rendered.
  2. Plugin 1 needs .zoopla1 files to be copied, not rendered. Same for plugin 2 with .zoopla2 files.
  3. The list of binary extensions changes over time.

Neither scenario can be addressed with simple configuration settings; in the scenarios 2 and 3, users have to run manual config save --append commands to keep up with the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancements will be processed by decreasing priority
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

None yet

3 participants