Export blobs confus(ed,ing) - of (at least) two minds #1438

Open
tlhackque opened this Issue Jun 20, 2018 · 17 comments

tlhackque commented Jun 20, 2018

This is almost a feature request, but I came down on the side of a UI bug...

The browser provides a "binary" view of blob data that is handy for humans.

Sometimes it's useful to extract the blob as binary for analysis (or formatting) by external tools.

The Export button in the "Edit Database Cell" pane is of two minds on this -- maybe more.

On one hand, it displays a text hexdump - which is handy for humans.

But Export provides a filename of ".txt", so one expects a human-readable form to be written.

Then again, the output isn't the text dump, but appears to be the raw data. I haven't tried a complex test case, but hopefully the output is written in binary ("wb") mode. (I encountered this with a record that has a few bare ^M characters, which seems to be enough for you to decide it's a blob.)

The following observations seem apropos:

  • When exporting a blob, the data should be written in binary mode.
  • The filename should not be associated with a text file -- .bin or .dat would be more appropriate.
  • It should be clear that what is exported is the data, not the hex dump that one sees.
  • It might be nice to have a choice between exporting the hex dump (as a text file) and the raw binary (suitably identified). Of course, given the binary, other tools can re-create a dump. But as you have one, might as well allow exporting it... Might be a use for the "Autoformat" button when viewing a blob.
  • The message "Binary data can't be viewed with the text editor" is misleading; the data can be viewed if you switch to Binary mode. The message should be something like "Binary data can't be viewed in this mode. Try selecting Binary mode."

As long as you're providing a hex dump, it would also be helpful to be able to select formatting as (16-bit, 32-bit or 64-bit) words. Saves manually grouping and flipping bytes on a little-endian machine.

Hope this is useful.

justinclift added the bug label Jun 20, 2018

Member

justinclift commented Jun 20, 2018

Ahhh, good point. We've had the export button for saving out data for ages. Probably from before we added the various view types, and we didn't update it to take them into account. 😄

mgrojo self-assigned this Jun 20, 2018

Contributor

mgrojo commented Jun 20, 2018

All this makes sense.

  • Setting an appropriate extension given the data type is straightforward.
  • Allowing a choice between hex dump and binary data would need more work, but I'll take a look.
  • The proposed message for the binary data is better than the current one, but the change would make the existing translations outdated. Given the amount of work translators already have for the next release, it might be a trifle, though. @justinclift, what do you think?

Contributor

mgrojo commented Jun 20, 2018

@tlhackque is the extension automatically added to the filename for you? In my case (Ubuntu 16.04), I have to add whatever extension I want myself, otherwise the file is created without an extension. Maybe it is different under Windows. In any case, it makes sense to set the appropriate filter for each case.

Author

tlhackque commented Jun 20, 2018

Under Windows, the Save As box comes up with "Save as type: Text files (*.txt)".

If I enter just a filename, .txt is appended.

If I enter an explicit extension, it is honored. But the view is of any .txt files in the save path, not .bin files. And it's suggesting that the content will be text.

The dialog box also allows "All files (*.*)" as a (file) browser filter, but that isn't terribly helpful.

BTW, I just extracted a blob of deflated (pure binary) data & recovered the contents correctly.

So - either you're writing both text and binary in binary mode (windoze cares & will turn LF into CR LF in text mode), or you already have some cleverness...
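
As a rough illustration of why the write mode matters, here is a minimal Python sketch (not the application's code; the blob value and file names are made up). Binary mode writes the bytes unchanged, while a text-mode write with Windows-style newline translation expands each LF to CR LF, so the file no longer matches the cell contents.

```python
import os
import tempfile

# Hypothetical blob containing a bare CR, a bare LF and a NUL byte.
blob = b"line1\rline2\nend\x00"

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "cell.bin")
    txt_path = os.path.join(tmp, "cell.txt")

    # Binary mode: bytes are written exactly as given.
    with open(bin_path, "wb") as f:
        f.write(blob)

    # Text mode with newline="\r\n" mimics the Windows translation
    # (LF -> CR LF) on any platform.
    with open(txt_path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(blob.decode("latin-1"))

    with open(bin_path, "rb") as f:
        binary_roundtrip = f.read()
    with open(txt_path, "rb") as f:
        text_roundtrip = f.read()

print(binary_roundtrip == blob)  # binary mode preserves the data
print(text_roundtrip == blob)    # text mode has altered it
```

A deflated blob that survives the round trip, as reported above, is consistent with the data being written in binary mode.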

My current project is Windoze based, so I can't say anything about what you do in the Unix environment.

FWIW, I'm using a nightly build from a few daze ago (I wanted the latest SQLite).

HTH.

Member

justinclift commented Jun 20, 2018

"Binary data can't be viewed in this mode. Try selecting Binary mode."

@mgrojo Agreed. That sounds better. Our translators will have a bunch of stuff to change anyway, and I'm pretty sure they'd prefer to have better source text in the UI too. 😄

For exporting a hex dump, should we use the extension .hex to hopefully make it incredibly hard to miss as to the output format in that instance? 😄

Author

tlhackque commented Jun 20, 2018

> For exporting a hex dump, should we use the extension .hex to hopefully make it incredibly hard to miss as to the output format in that instance? 😄

No, because .hex traditionally means a hex-encoded binary stream in Intel or Motorola format used for burning a (P)ROM, FLASH or PLD device, e.g. S-records.

.txt would be fine - note that it's also associated (all OS) with text editors that can read it.

(Well, except for the Notepad/wordpad mess on windoze - but I hear that's about to be fixed after a few decades...)

Suggest you make the Export dialog be something like

Title - Export "Blob" contents to file

[ .. folders, files, etc from the dialog box]

Save as [ Formatted text as shown (*.txt) ] [ Entry box ]
              Binary data (*.bin)                < -- Dropdown options
              Base64-encoded binary (*.b64) 

Or anything else that comes to mind as useful. I don't think you want a huge list, but I bet there are some common formats for blobs, like graphics (.jpg, .png, .gif), that might be worth considering - they don't cost anything except a menu entry. (For extra credit, you could even run the stream against one of the magic recognizers, e.g. the database used by the Unix 'file' command, and pick one automagically.) As long as it doesn't go into the dozens of entries in the pulldown...
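
The "magic recognizer" idea can be sketched in a few lines of Python. This is only an illustration: the signature table is a tiny hand-picked subset, and `SIGNATURES` and `guess_extension` are hypothetical names, not part of the application or of the `file` command.

```python
# Hypothetical signature table: (leading magic bytes, suggested extension).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),  # only two bytes, so this match is weak
]


def guess_extension(blob: bytes, default: str = ".bin") -> str:
    """Return an extension based on the blob's leading bytes, else `default`."""
    for magic, ext in SIGNATURES:
        if blob.startswith(magic):
            return ext
    return default


print(guess_extension(b"GIF89a" + bytes(10)))
print(guess_extension(b"\x00\x01\x02"))
```

Anything that matches no signature falls back to the generic binary extension, which keeps the pulldown short.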

mgrojo added a commit that referenced this issue Jun 20, 2018

Improve messages in incorrect editor modes for better feedback
The messages in the invalid modes for the current data type in "Edit
Database Cell" are improved for giving hint to the user about the correct
modes.

For this commit and 885f4f7 see issue
#1438

Contributor

mgrojo commented Jun 20, 2018

The first and third points are already addressed and will be available in the next nightly build. @tlhackque, can you please give it a try and confirm that the filters work for you as expected?

Author

tlhackque commented Jun 20, 2018

Sure -- let me know when it's available (I'm actually working on something else so looking at this is interrupt-driven), & I'll give it a whirl.

Thank you for taking this on so quickly and being so responsive.

In case my last (non-bulleted) note in .0 wasn't clear:

> As long as you're providing a hex dump, it would also be helpful to be able to select formatting as (16-bit, 32-bit or 64-bit) words. Saves manually grouping and flipping bytes on a little-endian machine.

What I mean is that the hex bytestrings in a dump (written low address to high left to right):

        00                                           0f
  0000: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f

as words is read (little endian) as:

  16-bit: 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e      (8 words)
  32-bit: 03020100 07060504 0b0a0908 0f0e0d0c          (4 words)
  64-bit: 0706050403020100 0f0e0d0c0b0a0908            (2 words)
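
The little-endian reading of a dump row can be reproduced with a short Python sketch (illustrative only; `le_words` is a hypothetical helper, not something the application provides). For the 16-byte row above it yields eight 16-bit, four 32-bit, or two 64-bit words.

```python
def le_words(data: bytes, word_bytes: int) -> list[str]:
    """Format `data` as little-endian words of `word_bytes` bytes each."""
    if len(data) % word_bytes:
        raise ValueError("data length must be a multiple of the word size")
    digits = word_bytes * 2
    return [
        format(int.from_bytes(data[i:i + word_bytes], "little"), f"0{digits}x")
        for i in range(0, len(data), word_bytes)
    ]


row = bytes(range(16))  # 00 01 02 ... 0f, low address first
print(" ".join(le_words(row, 2)))
print(" ".join(le_words(row, 4)))
print(" ".join(le_words(row, 8)))
```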

The easiest approach is to reverse the bytes in the listing, so the highest address in each row is on the left (and the address on the right). This makes decoding things of any size simply a matter of grouping:

   0f                                           00
   0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 :0000

So dump generators traditionally provide the option to view the data in either of these byte-swapped formats. Some also will insert spaces every 2, 4, or 8 bytes to show word boundaries. This tends to work even with a variety of elements packed into a structure, since they (hopefully) are usually naturally aligned for performance.
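
A minimal Python sketch of such a byte-swapped row, assuming the layout shown above (highest address on the left, row address on the right); `reversed_row` and its `group` parameter are hypothetical names chosen for illustration.

```python
def reversed_row(data: bytes, offset: int = 0, group: int = 1) -> str:
    """Render one dump row with the highest address on the left.

    `group` is the number of bytes per space-separated chunk (1, 2, 4 or 8).
    """
    hex_bytes = [f"{b:02x}" for b in reversed(data)]
    chunks = [
        "".join(hex_bytes[i:i + group]) for i in range(0, len(hex_bytes), group)
    ]
    return " ".join(chunks) + f" :{offset:04x}"


row = bytes(range(16))
print(reversed_row(row))           # byte-by-byte, right to left
print(reversed_row(row, group=4))  # spaces at 32-bit word boundaries
```

With the row reversed, each chunk already reads as a little-endian word, so no per-word byte flipping is needed.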

For big-endian (Network Order) data, the current left to right order works - grouping in word-sized chunks is the same as for little-endian.

So a simple toggle (big/little = left to right or right to left) would satisfy everyone. :-)

Well, to the extent that anyone is ever satisfied ;-0

Contributor

mgrojo commented Jun 23, 2018

@tlhackque Today's nightly build already contains the improvement. In fact, nightlies are built every day around 5 UTC (I think) with the latest content of the master branch in this repository.

Author

tlhackque commented Jun 24, 2018

@mgrojo Thanks for the work.

Sorry for the delay.

I got this morning's build (MSI, W64). Save as now prompts for .bin & the messages are changed.

Both look improved.

But - I noticed the Import button. And after clicking on it, I realized that things aren't quite right yet :-(

It prompts for .txt files, even in binary mode. It does offer .bin, but the default selection should match the window's mode. E.g., if the Mode is binary, .bin should be the default; if the mode is image, the default should be images - etc.

This is also true - and somewhat worse - on Export: the default selection for image mode is .bin. This is an improvement over TXT, but inappropriate for images. Worse is that in this case, the "Save as type" pulldown only includes "bin" and "All files". This doesn't only control the default name; it also controls the default view (of existing files that one might want to use/supersede/save).

So, what I think we want is:

  • Both import and export should select the default type based on the mode.
  • Export should offer all appropriate file types, based on the mode.
  • All files should be available (but not the default) for anyone with special needs.

I think the mapping looks like this (same for import and export):

 Text mode:   Offer "Text files (*.txt)" as default; JSON, XML and All files as options.
 Binary mode: Offer "Binary (.bin)" as default; Image files (.bmp - .xpm) and All files as options.
 Image mode:  Offer "Image files (.bmp .. .xpm)" as default; Binary and All files as options.
 JSON mode:   Offer "JSON files" as default; Text files and All files as options.
 XML mode:    Offer "XML files" as default; Text files and All files as options.

Note that the text modes don't offer binary; image mode is a special case of binary. JSON and XML don't overlap each other, but both are special cases of text files.
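
The proposed mapping could be expressed as a small lookup table. The Python sketch below is only an illustration of the proposal, not the application's implementation: the names are hypothetical, the image pattern list is abbreviated, and the `;;`-joined "Name (pattern)" form is borrowed from the import filter list quoted later in this thread.

```python
TEXT = "Text files (*.txt)"
JSON = "JSON files (*.json)"
XML = "XML files (*.xml)"
BINARY = "Binary files (*.bin)"
IMAGE = "Image files (*.bmp *.png *.jpg)"  # hypothetical, abbreviated pattern list
ALL = "All files (*)"

# The first entry of each list is the default selection for that mode.
FILTERS_BY_MODE = {
    "text": [TEXT, JSON, XML, ALL],
    "binary": [BINARY, IMAGE, ALL],
    "image": [IMAGE, BINARY, ALL],
    "json": [JSON, TEXT, ALL],
    "xml": [XML, TEXT, ALL],
}


def filters_for(mode: str) -> tuple[str, str]:
    """Return (filter string, default filter) for an editor mode."""
    entries = FILTERS_BY_MODE[mode]
    return ";;".join(entries), entries[0]


filter_string, default_filter = filters_for("binary")
print(filter_string)
print(default_filter)
```

"All files" is always present but never first, which matches the intent that it be available without being the default.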

Hope this makes sense.

I know I'm being somewhat picky - but these things do make a difference when you try to use the tool.

Thanks again.

Contributor

mgrojo commented Jun 25, 2018

> But - I noticed the Import button. And after clicking on it, I realized that things aren't quite right yet :-( It prompts for .txt files, even in binary mode. It does offer .bin, but the default selection should match the window's mode. E.g., if the Mode is binary, .bin should be the default; if the mode is image, the default should be images - etc.

The list of filters for the import button is always the same, independently of the selected mode or the detected data type already in the editor.

Text files (*.txt);;Image files (%1);;JSON files (*.json);;XML files (*.xml);;Binary files (*.bin);;All files (*)

The rationale is that importing can change the data type. You can import any kind of data, in any mode. But this could be made different: it could allow loading only data that matches the selected mode. I don't have a strong opinion about it.

> This is also true - and somewhat worse - on Export: the default selection for image mode is .bin. This is an improvement over TXT, but inappropriate for images. Worse is that in this case, the "Save as type" pulldown only includes "bin" and "All files". This doesn't only control the default name; it also controls the default view (of existing files that one might want to use/supersede/save).

The export filters should be chosen based on the data types that we are currently detecting, not on the current mode. So if the offered filter was *.bin, the data was detected as binary. Was it indeed an image in your test? Then it might be a bug.

The only current caveat of this approach is that we are not currently detecting the XML data. We should probably do it.

Author

tlhackque commented Jun 25, 2018

> The rationale is that importing can change the data type. You can import any kind of data, even in any mode.

You are correct, of course. My point is that changing the type of data in a field is unusual, even during development. I don't mean to disallow it, but I do think that the default should follow what is already in the cell. The idea (and the reason to always allow *.*) is that with sharp tools, people can certainly hurt themselves, but the browser shouldn't guide them to mistakes.

I don't remember for sure what was in the cell when I looked for export - I think it was NULL.

I didn't realize that you look at the actual data when selecting the output filters. That's probably better than following the editor mode. But I'm not sure exactly how that works, given SQLite's loose typing.

If I click on an integer cell, it's detected as text. Which is fuzzy. I think of Integer as a binary field - it's stored that way. But it is human-readable, and it's probably natural to edit it that way.

Speaking of which, if I click on a blob and use binary mode - the binary isn't editable. Which might be a nice thing to have. (e.g. click on the hex or text, change a byte and it updates the cell)

I don't know what you would export from a NULL cell - but it certainly isn't text. (I verified that last night's build definitely does suggest text for NULL.) I suppose it could be a zero-length binary file. Or you could just refuse to export - after all, NULL is nothing.

As you DO detect what's in a cell, why not change the edit mode to match when a cell is selected?
If I click on a JPEG cell (I have one), why not make the edit mode switch to image automagically?

Which led me to try the next "obvious" thing. I selected a row and clicked export. I got only one cell - the one I happened to have clicked last. That isn't unreasonable.

But there's no context menu choice to export an entire row, which seems odd.
(Of course when a row is a mixture of types, the question becomes how to export it. One answer is the escaped-literal format used in SQL dumps. That could be read back in with an insert.)

It might be useful to be able to select a few rows and export them in that way...

mgrojo added a commit that referenced this issue Sep 15, 2018

Allow exporting the textual representation of binary data
For binary data, the file save dialog allows to select text files. When
the user saves to a text file (*.txt) the visual representation of the
hex buffer is saved to the file (addresses, hexadecimal bytes and ASCII
view). In this dump, only US-ASCII seems to be considered printable, while
in screen, Latin-1 is also considered.

This was one of the enhancement suggestions in issue #1438.
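
For readers unfamiliar with this kind of dump, here is a minimal Python sketch of a text representation with addresses, hexadecimal bytes and an ASCII column. The exact column layout written by the application is not specified in this thread, so the format below is an assumption; only the "US-ASCII is printable" rule is taken from the commit message.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a text dump: address, hex bytes, and an ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Only printable US-ASCII (0x20-0x7e) is shown; anything else becomes '.'.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:04x}: {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


print(hex_dump(b"Hello, blob!\r\n\x00\xe9"))
```

The Latin-1 byte 0xe9 is rendered as '.', which mirrors the difference between the saved dump and the on-screen view noted above.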

Contributor

mgrojo commented Sep 28, 2018

> You are correct, of course. My point is that changing the type of data in a field is unusual, even during development. I don't mean to disallow it, but I do think that the default should follow what is already in the cell. The idea (and the reason to always allow *.*) is that with sharp tools, people can certainly hurt themselves, but the browser shouldn't guide them to mistakes.

This is improved now. The default filter in the Import dialog now follows the editor mode. Would you like to test it in the nightly?

> I don't remember for sure what was in the cell when I looked for export - I think it was NULL.

When the value is NULL, the export will have text and binary filters, but it will make an empty file in any case.

> I didn't realize that you look at the actual data when selecting the output filters. That's probably better than following the editor mode. But I'm not sure exactly how that works, given SQLite's loose typing.

The application has its own detection algorithms.

> If I click on an integer cell, it's detected as text. Which is fuzzy. I think of Integer as a binary field - it's stored that way. But it is human-readable, and it's probably natural to edit it that way.

There is a single data type detected for text and numeric data, since both are updated using the text editor. It would make sense to see the real binary representation of the integer, though. There is already an issue: #1416.

> Speaking of which, if I click on a blob and use binary mode - the binary isn't editable. Which might be a nice thing to have. (e.g. click on the hex or text, change a byte and it updates the cell)

This is only expected for read-only databases or views. Can you reproduce it in a read-write table?

> I don't know what you would export from a NULL cell - but it certainly isn't text. (I verified that last night's build definitely does suggest text for NULL.) I suppose it could be a zero-length binary file. Or you could just refuse to export - after all, NULL is nothing.

Yes, maybe we should disable the export in that case, but it is harmless. Maybe in the future.

> As you DO detect what's in a cell, why not change the edit mode to match when a cell is selected?
> If I click on a JPEG cell (I have one), why not make the edit mode switch to image automagically?

This was separated to a new issue (#1537) and it's already in the nightlies. Would you like to try it?

> Which led me to try the next "obvious" thing. I selected a row and clicked export. I got only one cell - the one I happened to have clicked last. That isn't unreasonable.

Yes, because the editor loads the last selected cell. It only understands about cells.

> But there's no context menu choice to export an entire row, which seems odd.
> (Of course when a row is a mixture of types, the question becomes how to export it. One answer is the escaped-literal format used in SQL dumps. That could be read back in with an insert.)
>
> It might be useful to be able to select a few rows and export them in that way...

You can copy the data and paste it in other applications. There is also an SQL version that copies the insert statement for those cells. You can now also print them to a PDF file or to a real printer (unless you have problems with it like those reported in #760).

mgrojo added a commit that referenced this issue Sep 29, 2018

Add "Hex dump files" as an export option when data source is hex editor
When the data source is the hex editor we are able to save whatever
data type as shown in the widget, so additionally to set the filters
according to the data type a "Hex dump files (*.txt)" filter option is
added.

If the user selects this option for saving, the hexadecimal dump of the
widget content is saved to the file. Note that the check must be performed
using the selected filter by the user and not the file ending, which would
be the same for text data exported as plain text.

The Null case is disregarded as it is useless for exporting.

See issues #1438 and #1485
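
The point about checking the selected filter rather than the file ending can be sketched as follows in Python. This is not the committed code: `export_payload` is a hypothetical helper, and the one-line dump is a simplified stand-in for the widget's real hex dump. The two filter labels are the ones named in this thread.

```python
HEX_DUMP_FILTER = "Hex dump files (*.txt)"
TEXT_FILTER = "Text files (*.txt)"


def export_payload(raw: bytes, selected_filter: str) -> bytes:
    """Choose what to write based on the filter the user picked.

    Both filters above end in .txt, so the file name alone cannot tell
    a hex dump apart from text data exported as plain text.
    """
    if selected_filter == HEX_DUMP_FILTER:
        # Simplified stand-in for the hex editor widget's dump.
        dump = " ".join(f"{b:02x}" for b in raw)
        return dump.encode("ascii")
    return raw


print(export_payload(b"\x01\xff", HEX_DUMP_FILTER))
print(export_payload(b"hi", TEXT_FILTER))
```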

Contributor

mgrojo commented Sep 29, 2018

@tlhackque, I've finally implemented the hex dump as an option, independent of the detected data type, when the editor mode is set to Binary.

Would you mind testing it with tomorrow's nightly and confirming whether it's working for you?

Contributor

mgrojo commented Sep 29, 2018

By the way, there is an option in the hex editor widget library for dynamically changing the number of columns. Instead of a horizontal scrollbar and a fixed number of columns, we can control the number of columns by adjusting the width of the panel. This seems more useful to me. What do others think? It's easy to activate it if we don't need an option for it.

Author

tlhackque commented Sep 29, 2018

I'll try to look at the new stuff in the next few days; it's a busy time.

With respect to dynamically changing the number of columns: I'm OK IF the result is an offset that increments by an even multiple of 8 for each row (though I prefer 10 hex, i.e. 16 decimal). Otherwise, one has to do too much math to find a byte (or offset). I wouldn't want to have to fuss with the panel width to get a convenient offset increment; the dump should snap to one.

E.g. if the panel width can fit 17-31 bytes/row, the dump should use 16. If it would fit 9-15, the dump should use 8.
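
One way to get this snapping behaviour is to round the number of bytes that fit down to a power of two, which agrees with both examples above. This is a Python sketch under that assumption; `snap_columns` is a hypothetical helper, not an option of the hex editor widget.

```python
def snap_columns(fit: int, minimum: int = 1) -> int:
    """Return the largest power of two that is <= `fit` (at least `minimum`)."""
    if fit < 1:
        return minimum
    return max(minimum, 1 << (fit.bit_length() - 1))


for fit in (9, 15, 17, 31, 40):
    print(fit, "->", snap_columns(fit))
```

With this rule the row offset always advances by 8, 16, 32 and so on, so locating a byte never requires odd arithmetic.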

Contributor

mgrojo commented Sep 29, 2018

No, it's dumber than that: it just puts as many columns as fit in the available width. Better not to enable it, then.
