Option to display binary data #506

Closed
simonw opened this issue Jun 8, 2019 · 10 comments

@simonw
Owner

commented Jun 8, 2019

In #442 we suppressed rendering of binary data:

[Screenshot: table view of RKAlbumVersion_albumId_RidIndex in the many-photos-tables database, 36 rows]

It turns out there is one use case where displaying binary data is useful: poking around random SQLite databases you find in ~/Library, trying to figure out what they are for.

So, a mechanism for opting in to ugly display of binary data again would be useful.

@simonw

Owner Author

commented Jun 8, 2019

This could also be handled by a plugin.

@simonw

Owner Author

commented Jun 8, 2019

Possible plugin direction:

```python
import filetype  # https://pypi.org/project/filetype/
from datasette import hookimpl
from markupsafe import Markup, escape


@hookimpl(trylast=True)
def render_cell(value):
    if isinstance(value, bytes):
        info = repr(value)
        # May still want to truncate this on table view (but not on row page)
        guess = filetype.guess(value)
        if guess is not None:
            info = "Guess: mime={}, extension={}\n\n{}".format(
                guess.mime, guess.extension, info
            )
        # Escape the text, then return Markup so the <br> line breaks render as HTML
        return Markup(str(escape(info)).replace("\n", "<br>"))

    return None
```

@simonw

Owner Author

commented Jun 9, 2019

What other tricks could we use to make binary data more interesting to look at?

https://martin.varela.fi/2017/09/09/simple-binary-data-visualization/ has some really clever visualization tricks, though they're probably a bit much for this plugin. See also https://codisec.com/binary-visualization-explained/

https://github.com/tryexceptpass/perceptio has much simpler code for rendering a binary file as an image.
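The simplest version of the "binary as an image" idea maps each byte to one grayscale pixel. The sketch below does that and emits a binary PGM image using only the standard library. It is not perceptio's code; the name bytes_to_pgm and the default width of 64 pixels are assumptions for illustration.

```python
def bytes_to_pgm(data: bytes, width: int = 64) -> bytes:
    """Render each byte as one grayscale pixel (0 = black, 255 = white).

    Hypothetical helper: returns a binary PGM ("P5") image, width pixels wide.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = -(-len(data) // width)  # ceiling division
    # Pad the final row with zero bytes (black pixels) so the image is rectangular
    pixels = data + b"\x00" * (width * height - len(data))
    header = "P5\n{} {}\n255\n".format(width, height).encode("ascii")
    return header + pixels
```

Browsers don't display PGM directly, so a plugin would need to convert the result to PNG (for example with Pillow) before embedding it in a page.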

@simonw

Owner Author

commented Jun 9, 2019

Another cheap trick is an equivalent of the Unix strings command: https://stackoverflow.com/questions/6804582/extract-strings-from-a-binary-file-in-python
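A rough standard-library version of strings can be written as a single regular expression. This sketch assumes the common default of runs of at least four printable ASCII characters, and also accepts tab; extract_strings is a hypothetical name, and the output won't necessarily match any particular strings implementation exactly.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    # Bytes %-formatting builds the pattern, e.g. rb"[\t\x20-\x7e]{4,}"
    pattern = rb"[\t\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]
```

Runs shorter than min_len (like "ab" in b"\x02ab\x03") are dropped, which is what keeps the output from filling up with noise from random bytes.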

@simonw

Owner Author

commented Jun 9, 2019

This is quite nice:

```
$ od -c /tmp/Thumb64Segment_11.data | head -n 10
0000000   \0  \0   @  \0  \0  \0 005   5   X   T   S   F  \0  \0  \0 001
0000020   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0010000  025 030   . 377 026 032   . 377 027 033   - 377 031 035   0 377
0010020  032 036   1 377 036       5 377 037   !   8 377 036   "   : 377
0010040  036       8 377       $   : 377   !   &   ; 377   $   *   ? 377
0010060    '   -   ? 377   %   *   < 377   %   ,   > 377   -   3   E 377
0010100    6   ;   M 377   :   @   O 377   =   C   R 377   @   G   V 377
0010120    @   I   X 377   <   B   Q 377   8   @   N 377   8   @   P 377
0010140    :   C   T 377   ;   C   U 377   :   C   V 377   9   C   W 377
```

Here's a rough Python equivalent: http://code.activestate.com/recipes/579120-data_dumppy-like-the-unix-od-octal-dump-command/

@simonw

Owner Author

commented Jun 9, 2019

[Screenshot]

New idea: show essentially this but differentiate the escape sequences in some way. Maybe wrap them in <code> or put the non-escape sequences in bold?
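One way to try the "wrap the escape sequences in <code>" option: print runs of printable ASCII as plain, HTML-escaped text and render every other byte as a \xNN escape inside <code>. This is a sketch under that assumption; render_binary_html is a hypothetical name, and the choice of \xNN escapes (rather than od-style octal) is illustrative.

```python
import html


def render_binary_html(value: bytes) -> str:
    """Printable ASCII as escaped text; every other byte as <code>\\xNN</code>."""
    parts = []
    printable_run = []
    for byte in value:
        if 32 <= byte < 127:
            printable_run.append(chr(byte))
            continue
        # Flush any pending printable text before emitting an escape
        if printable_run:
            parts.append(html.escape("".join(printable_run)))
            printable_run = []
        parts.append("<code>\\x{:02x}</code>".format(byte))
    if printable_run:
        parts.append(html.escape("".join(printable_run)))
    return "".join(parts)
```

Inside a render_cell hook, the returned string would presumably need wrapping in markupsafe.Markup so Datasette treats it as HTML rather than escaping it again; CSS on the code elements could then grey them out.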

@simonw

Owner Author

commented Jun 9, 2019

I'm going to call this datasette-render-binary: https://github.com/simonw/datasette-render-binary

@simonw

Owner Author

commented Jun 9, 2019

Shipped 0.1 of the plugin! I'm pretty happy with this display format:

[Screenshot: the RKFaceCrop table in the many-photos-tables database, 58 rows, rendered by the plugin]

@simonw closed this on Jun 9, 2019

@Gagravarr


commented Jun 9, 2019

If you don't mind calling out to Java, then Apache Tika is able to tell you what a load of "binary stuff" is, plus render it to XHTML where possible.

There's a Python wrapper around the Apache Tika server, but for a more typical Datasette use case you'd probably just want to grab the Tika CLI jar and call it with --detect and/or --xhtml to process the unknown binary blob.

@simonw

Owner Author

commented Jun 11, 2019

Calling out to Tika does make me a little nervous, but that's why Datasette has plugins! A plugin that calls Tika (and caches the results) could be really interesting.
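A plugin along those lines might shell out to the Tika CLI jar with --detect and cache the answer keyed by a hash of the blob, so each distinct value is only sent to Java once. This is a minimal sketch assuming java is on the PATH and the jar lives at a hypothetical path tika-app.jar; tika_detect, TIKA_JAR and the injectable run parameter (handy for testing without Java) are illustrative names, not part of any existing plugin.

```python
import hashlib
import os
import subprocess
import tempfile

# Hypothetical location of the Tika CLI jar; adjust for your setup
TIKA_JAR = "tika-app.jar"

# In-memory cache: SHA-256 hex digest of the blob -> detected MIME type
_cache = {}


def tika_detect(blob: bytes, jar: str = TIKA_JAR, run=subprocess.run) -> str:
    """Return the MIME type Tika detects for blob, caching by content hash."""
    key = hashlib.sha256(blob).hexdigest()
    if key in _cache:
        return _cache[key]
    # Pass the blob to the CLI as a file path via a temporary file
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(blob)
        path = tmp.name
    try:
        result = run(
            ["java", "-jar", jar, "--detect", path],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.unlink(path)
    mime = result.stdout.strip()
    _cache[key] = mime
    return mime
```

Keying the cache on a digest rather than the bytes themselves avoids holding every large blob in memory; a real plugin might persist the cache or bound its size.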
