Graphics rendering #33
I really love this idea. So to put this into my own words: Assuming I got it right, the first issue that comes to my mind would be the drawing at cursor position, some applications like ranger show some image preview like this. Next thought: Next: |
Yes, there will be the possibility to specify formats for image data, probably something like: And yes you can specify whole images at a time, they will simply span multiple cells, something like this:
Then the terminal will render the 500x300 pixels image starting at position (2, 3) from the top of the current cell. |
I have created a spec for the protocol here: https://github.com/kovidgoyal/kitty/blob/master/protocol-extensions.asciidoc#graphics-rendering |
What were the issues/limitations you saw with sixel? Implementing that protocol would allow you to take advantage of projects that already support sixel graphics (including a replacement for w3mimgdisplay, allowing images to work in w3m and ranger) rather than adding support in those applications afterwards. Would you consider adding sixel support in addition to this new graphics rendering protocol? |
IIRC (and please correct me if I am wrong)
|
As for adding sixel support -- I have nothing against it, but, practically speaking, since I don't have any need for it either, it will not get done unless someone else contributes. However, if and when I implement some variation of this protocol in kitty, adding support for additional legacy schemes should be fairly easy to do on top of that. Basically all that has to be done is implement the parsing for the escape codes and then map the result onto this scheme (which is a strict superset of all the legacy schemes). |
As an aside, the palette format and low color support of sixels might be a good thing dealing remotely if all someone wants is a preview of a file. A 3.1MB 2560x1600 jpeg encoded as a 15-bit sixel image can be around 11 MB, while with 256 colors only it is 4.5M, and base64 encoded it is 4.1MB. After a certain point, it may be better to get the original image from the source and then display it because of how much data you'd be dealing with. |
2 and 4 are designed precisely to workaround the fact that images are very large as raw data. And it is trivial to add an indexed, 8-bit format to the new protocol if desired. |
Extending that last point, if the goal is remote image display, what you've got is probably the right way to do it. Locally though, w3m on X (when you're not using the sixel support) is drawing directly onto the terminal window, not having images decoded and drawn by the terminal at all. That route certainly has better performance, but is limited to applications running locally and is not display server agnostic. Thanks for the discussion. I had only briefly looked at the sixel protocol before and now I know more about it and why you've determined it is not an ideal solution. |
Interesting, I did not realize one could get cell width and height, and not just the number of cells per row/number of cells per column, using termios. At least according to http://man7.org/linux/man-pages/man4/tty_ioctl.4.html the On some investigation -- libvte based terminals and konsole both return 0 for those values. xterm returns the window size. |
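For reference, a minimal Python sketch of the TIOCGWINSZ query discussed above. It assumes a POSIX system; the helper names `parse_winsize` and `query_winsize` are made up for illustration. As noted in the comment above, some terminals report 0 for the pixel fields, so the sketch treats 0 as "unknown" rather than dividing by it.

```python
import fcntl
import struct
import sys
import termios


def parse_winsize(raw: bytes) -> dict:
    """Unpack a struct winsize (four unsigned shorts) into a dict.

    The pixel fields may be 0 on terminals that do not fill them in,
    in which case the derived cell size is reported as 0 (unknown).
    """
    rows, cols, xpixel, ypixel = struct.unpack("HHHH", raw)
    cell_width = xpixel // cols if cols and xpixel else 0
    cell_height = ypixel // rows if rows and ypixel else 0
    return {
        "rows": rows,
        "cols": cols,
        "xpixel": xpixel,
        "ypixel": ypixel,
        "cell_width": cell_width,
        "cell_height": cell_height,
    }


def query_winsize(fd: int) -> dict:
    """Ask the tty behind fd for its size via the TIOCGWINSZ ioctl."""
    raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return parse_winsize(raw)


if __name__ == "__main__" and sys.stdout.isatty():
    print(query_winsize(sys.stdout.fileno()))
```

An application would typically fall back to another method (for example the CSI t queries mentioned later in this thread) when the pixel fields come back as 0.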
Is the spec implemented in master yet? I just tried it with no effect, trying to determine if user error. |
No not yet, I will close this bug when it is implemented. I'm rather busy with other things at the moment. |
OK, so some opinions about graphics rendering;
So now we have a simpler case where the client sends image data as base64 encoded PNG:s, binding them to image ids, and then uses that id to render the image on screen. Otherwise I think it's great. |
Yeah but on the other hand the savings from PNG are not that large for arbitrary images (i.e. images with lots of colors). And then you have baked in dependence on a particular image format forever in the protocol. I don't think that's a good idea, and if we were going to do that, then why not opt for a more modern lossless format with better compression? Shared memory is a performance optimization, particularly for displaying animated images. But it's optional, if you don't want to implement it, simply have your emulator reply that it does not support it when queried. |
Just a heads up, I am starting work on implementing this in kitty. As I implement it, there will probably be changes/additions to the protocol, informed by the implementation. One change, as requested by sasq64, is the addition of a z-index so that graphics can be rendered below or above text and also alpha blended with each other. |
I thought about this again and maybe you're right that png shouldn't be used for a generic protocol. Better to have optional support for generic zlib/deflate compression for any resource sent from the application. |
Is that really needed though? Doesn't ssh already compress data? Double compression would just waste CPU and probably yield slightly larger data sizes because of zlib headers. |
Compression is not on by default and not recommended unless in certain very low bandwidth situations afaik |
It's not on by default because it actually slows things down on fast networks, which is also true in the case of compression of image data. I can't think of a scenario where it makes sense to turn on compression for transmission of image data, but not for the ssh connection in general. |
Because you typically pre-compress resources for an application, so it will have no cost for the remote side, and decompression (on the local side) is much faster than compression. |
Certainly for application assets that is the case, but surely the main cost will be displaying actual image content not assets, which cannot be pre-compressed. Even in the case of application assets, it is unlikely that the assets will be both stored and displayed in the exact same resolution, so chances are they will need to be compressed at least once on the client. For example, the most common asset class --- application icons -- are typically stored in the application bundle in a single resolution and then resized on the fly depending on runtime conditions. I'm not familiar with how ssh does compression, but surely it only compresses data if there is a significant amount of it ready to send. At least, I have not noticed increased latency with compression over ssh on slow connections. But let's not get into the weeds about this. It is a minor thing that can always be added later, by simply adding a compression field to the escape code metadata. Something like c=gzip. Once we have a working viable implementation, we can do some benchmarking to see what the effects of compression actually are in actual usage scenarios and decide if it makes sense to add. |
Yes I think we imagine different scenarios of use. You mentioned that your primary goal was an image viewer which means large, uncompressed image data. I imagine it more for extending existing applications with graphics features, or porting graphics only applications to the terminal. Either way, there needs to be a way to ask the terminal about these features. I suggest extending the normal Device Attributes command (which already can report features such as sixel graphics) with attributes related to this feature. so sending "CSI c" should answer something like The main benefit of extending an existing command is that the application knows it is getting a known format answer back, even on non supporting terminals. Some suggested attributes:
|
There is no need for separate queries. The graphics protocol itself allows querying, see the q field in the command metadata. The client application simply sends dummy images with all the attributes it wants to test and the terminal replies with whether reading the images was successful or not. |
It might make sense to add a DECRQM code for testing for the existence of graphics support itself, but only if there are widespread client programs that actually have trouble interpreting APC codes. |
DECRQM seems to be to check if a certain mode is on or off, not for checking features... Regarding compression; How about allowing more formats in the "f=" key, ie a suffix for the type, where "z" mean z-compressed. I could also imagine other bit depths being supported besides 24 and 32 -- for mono colored decoration it would really make sense with 1-bit data for instance. Also compressed textures would speed up things a lot in certain scenarios. In short; basic formats are "32" and "24", but the terminal can optionally support things like "8", "32z" or "32PVRTC" ? |
Tried with lsix but only characters came out. |
Since lsix uses the inferior sixel protocol for images, that is hardly surprising. As I've said before, I have no interest in implementing an inferior imaging solution in kitty, but patches are welcome. |
@kovidgoyal Thanks for your swift reply - I'll try to convince the lsix maintainer to perhaps look into other options, perhaps there's an approach both projects can bridge to each other. |
Hello, I have a terminal widget library here that has support for images in both its terminal widget (parsing sixel) and its terminal and Swing backends (rendering to sixel or images). I am curious about adding Kitty image support to my terminal widget. Main question: Is the Kitty image protocol specification here considered complete? Is it at a particular version number now? Next issue: My terminal treats image data as just "image stuff in an otherwise normal text cell". I.e. I do not have image IDs, z order, or storage. This has the nice property that all other VT100/Xterm sequences behave as one would expect. One could for example display image, and then delete text cells in the middle of the image, and parts of the image would move over just as text would. But I am not keen on the idea of an application ordering its terminal to do image stacking, clipping, etc. -- there is very little chance xterm would ever support that based on how its sixel images work. How am I supposed to tell a Kitty-supporting application which subset of functions I do support? Or is it expected that one must support all of it? Just curious, thanks for your time. |
The protocol specification is complete, has not had any additions in years. The fact that when you delete text unrelated images get distorted is one ... You can support whatever subset of the full protocol makes sense for ... |
Looking at the specification, some comments: General:
Some other methods for row/column count: ioctl (or in my case 'stty size'), CSI 18 t, the underlying protocol window size option (e.g. telnet NAWS option, rlogin window size), or move the cursor to something very large and use DSR 6 (still used by a lot of things).
What should the terminal do when control data/payload don't work, i.e. when some other application uses APC for its own needs that happens to start with 'G'? Silently ignore it?
PNG is not a pixel data format, it is a file format. Why limit to PNG? Why not JPG, TIFF, BMP, etc? For that matter, why not AVI or MP4? What should the terminal do if it cannot understand the pixel format provided (e.g. 24-bit RGB but not 32-bit RGBA)? Not display an image at all? Not reserve an image ID? Tell the application somehow that the image did not display?
Could consider adding a link to the summary table here. Where does the text cursor position end up after an image is successfully displayed? Does it move at all? Does it end up on the row below, or column to the right of the image? Should the screen scroll if the image is too tall? What if the image is wider than the screen?
Which deflate-based program are you referring to? Can I just run the pixel data through gzip?
How does the temporary file get there in the first place? The application puts it there, then sends the sequence to display it? This seems a security risk: it wouldn't be hard to fool the terminal into deleting the wrong file via clever use of symlinks and relative paths inside a tmp directory.
What about Windows users? They have a different shared memory model. Should a Windows-based terminal silently ignore this image, or respond somehow that it could not display it?
This is screaming "security hole."
Why this particular maximum length? According to Dickey: "string parameters (such as setting the title on a window) do not have a predefined limit on their length."
After the chunked data has been received and reassembled, if the resulting image data is not a valid PNG (or future file format) or RGB/RGBA, what should the terminal do? Silently ignore the whole thing? Display a partial image, since it knows how many rows/columns the image was supposed to be shown in? Since each chunk is a whole APC sequence, what should happen when printable characters or other VT100/Xterm sequences come in between chunks? Should the image be displayed where the cursor was at the first chunk received, or the final chunk? Or should anything that comes in between chunks cause this image to be treated as corrupt/discarded?
What about localization? Is the error message expected to be in English only? What kind of scrubbing/sanitation must be performed on the error message (ASCII includes C0 control characters after all)?
What is the difference between a dummy image and a not-dummy image?
Repeating the question earlier: where does the cursor go after the image is displayed? Why bother drawing at the actual cursor at all? Add keys to pick the text row/column, and say that the cursor does not move: then there would be no further ambiguity.
What happens if they are not? Is the image not displayed at all? Why not allow negative offsets (imagine an application mouse-dragging a window pixel-by-pixel all over the screen)?
There are three scaling possibilities: stretch to fit X only and stretch/shrink Y, stretch to fit Y only and stretch/shrink X, and stretch/shrink both. How does the application select which option it wants?
What if the application wants the text background color to cover the image (no blending, no overlay: text is text)? Does it have to display a background-color image over that cell first, then text? If z is defined for images, why not text too? Add a sequence to select the text z order and current alpha blending value, then write text under or over existing.
How should the terminal tell the application that the image data was not actually deleted because it was used in the scrollback buffer?
How should the terminal tell the application that an image somewhere was deleted because it exceeded the quota? Someday we will all be using retina displays where 320MB will feel quaint.
Going back to the question on text completely covering images: this spec requires that covering an image with text means either making numerous calls to display pieces of an image around where the text will go, or having snippets of text background color as images to cover up the cells that the text will draw on. Not a problem for Jexer -- it already displays images as strips of text-cell-sized image pieces -- but could be a headache for less advanced systems.
Does clearing them on screen also delete them?
I don't see why clear screen is more important than erase line.
What about:
Summary thoughts:
|
On Tue, Aug 27, 2019 at 05:46:20AM -0700, Kevin Lamonte wrote:
> The protocol specification is complete, has not had any additions in years.
Looking at the specification, some comments:
General:
* It would be nice if the specification web page was available as raw text or markdown.
It is, look in the kitty repo.
* You should define your expectation for C0/C1 control characters received during an APC sequence. I think most people are now familiar with [Williams' state machine](https://vt100.net/emu/dec_ansi_parser), where C0/C1 for ESCAPE, CSI, OSC, DCS, SOS, PM, and APC will always go to another state, and SOS/PM/APC can still see the other C0/C1 without acting on them, but you never know.
This has nothing to do with this spec. C0 and C1 escape codes inside APC
codes are invalid and cause the entire APC code to not be parsed.
* Is there a required codepage for this protocol? Unicode already defines encoding as beneath the VT100 emulation layer, and your error message text says ASCII, but there might be other reasons to insist on e.g. UTF-8.
Look carefully and you will see that it uses only a-zA-Z0-9=/, so no. And
pretty much all modern terminals/terminal applications use UTF-8.
* How does the application determine the terminal's support for images without actually trying to display an image and waiting for timeout or seeing if the cursor moved or not? Why not have a DA flag for this, same as sixel does?
This is addressed in detail in the spec.
> In order to know what size of images to display and how to position them, the client must be able to get the window size in pixels and the number of cells per row and column. This can be done by using the TIOCGWINSZ ioctl. ... CSI 14 t
Some other methods for row/column count: ioctl (or in my case 'stty size'), CSI 18 t, the underlying protocol window size option (e.g. telnet NAWS option, rlogin window size), or move the cursor to something very large and use DSR 6 (still used by a lot of things).
Huh? The spec explicitly mentions the ioctl and CSI t. I have no
interest in specifying any other methods. If a client application wants
to use other methods, it is free to do so, the spec does not care.
> <ESC>_G<control data>;<payload><ESC>\
What should the terminal do when control data/payload don't work, i.e. when some other application uses APC for its own needs that happens to start with 'G'? Silently ignore it?
Yes, this is what terminals are supposed to do with all malformed escape
codes in general. Has nothing to do with this spec.
> The terminal emulator must understand pixel data in three formats, 24-bit RGB, 32-bit RGBA and PNG.
PNG is not a pixel data format, it is a file format. Why limit to PNG? Why not JPG, TIFF, BMP, etc? For that matter, why not AVI or MP4?
No, PNG is a pixel data format.
What should the terminal do if it cannot understand the pixel format provided (e.g. 24-bit RGB but not 32-bit RGBA)? Not display an image at all? Not reserve an image ID? Tell the application somehow that the image did not display?
If a terminal does not understand one of these extremely simple formats,
then it is not in spec compliance.
> Here the width and height are specified using the s and v keys respectively. Since f=24 there are three bytes per pixel and therefore the pixel data must be 3 * 10 * 20 = 600 bytes.
Could consider adding a link to the summary table here.
Where does the text cursor position end up after an image is successfully displayed? Does it move at all? Does it end up on the row below, or column to the right of the image? Should the screen scroll if the image is too tall? What if the image is wider than the screen?
The cursor position is not well defined, so applications should not rely
on it. They can simply reposition it wherever they like. And yes, screen
will scroll if image is too tall. If the image is wider than the screen
behavior is again undefined, so well designed applications which already
know the screen size, should not rely on it.
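As an illustration of the quoted example (f=24, s=10, v=20, hence 3 * 10 * 20 = 600 bytes), here is a rough Python sketch that only builds the escape code bytes. It assumes the payload is base64 encoded, as suggested elsewhere in this thread; `graphics_command` is a made-up helper name, not part of any library.

```python
import base64


def graphics_command(control: dict, payload: bytes = b"") -> bytes:
    """Build <ESC>_G<control data>;<payload><ESC>\\ as bytes.

    control is written as comma separated key=value pairs and the
    payload is base64 encoded (an assumption of this sketch).
    """
    keys = ",".join(f"{key}={value}" for key, value in control.items())
    encoded = base64.standard_b64encode(payload)
    return b"\x1b_G" + keys.encode("ascii") + b";" + encoded + b"\x1b\\"


width, height = 10, 20
# One solid red 24-bit RGB pixel repeated for every pixel in the image.
pixels = bytes([255, 0, 0]) * (width * height)
command = graphics_command({"f": 24, "s": width, "v": height}, pixels)
```

Writing `command` to a terminal that implements the protocol would transmit the image data; this sketch does not attempt to position or display it.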
> Currently, only zlib based deflate compression is supported, which is specified using o=z.
Which deflate-based program are you referring to? Can I just run the pixel data through gzip?
The one from zlib or if you like RFC 1951 which anyway defers to zlib.
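A small Python sketch of what this answer seems to imply for the gzip question. It assumes that "zlib based deflate" means the zlib-wrapped stream produced by zlib's own compress function; the thread does not spell that out. The gzip tool wraps the same deflate data in a different container, so its output is not byte-for-byte interchangeable.

```python
import gzip
import zlib

# 200 identical 24-bit RGB pixels: highly compressible sample data.
pixels = bytes([0, 128, 255]) * 200

# zlib container around deflate data (what this sketch assumes o=z expects).
zlib_stream = zlib.compress(pixels)

# gzip container around deflate data: different header and trailer.
gzip_stream = gzip.compress(pixels)

# The zlib stream round-trips with zlib alone.
restored = zlib.decompress(zlib_stream)

# The two containers start with different magic bytes.
zlib_header = zlib_stream[:1]   # 0x78 for the usual 32K window
gzip_header = gzip_stream[:2]   # 0x1f 0x8b
```

Under that assumption, an application would send `zlib_stream` (base64 encoded) together with o=z, and would not pipe the data through the gzip command line tool.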
> Transmission Medium
> t | A temporary file, the terminal emulator will delete the file after reading the pixel data. For security reasons the terminal emulator should only delete the file if it is in a known temporary directory, such as /tmp, /dev/shm, TMPDIR env var if present and any platform specific temporary directories.
How does the temporary file get there in the first place? The application puts it there, then sends the sequence to display it? This seems a security risk: it wouldn't be hard to fool the terminal into deleting the wrong file via clever use of symlinks and relative paths inside a tmp directory.
This is addressed in the spec, the terminal is only allowed to delete
files inside well-known temp directories. And obv it will realpath()
before deleting things. If it does not do that, that is a bug in its
implementation.
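A sketch of the realpath() check described here, in Python. The function name `safe_to_delete` and the idea of passing the allowed directories as a list are assumptions made for illustration; this is not kitty's actual code.

```python
import os


def safe_to_delete(path: str, temp_dirs: list) -> bool:
    """Return True only if path, after resolving symlinks and '..',
    lies strictly inside one of the allowed temporary directories."""
    real = os.path.realpath(path)
    for directory in temp_dirs:
        root = os.path.realpath(directory)
        try:
            inside = os.path.commonpath([real, root]) == root
        except ValueError:
            # Raised for example when the paths are on different drives.
            inside = False
        if inside and real != root:
            return True
    return False
```

A symlink placed in /tmp that points at a file elsewhere resolves to its target and is therefore rejected, as is a relative path that climbs out of the directory with `..`. The check and the later delete are still two separate steps, so a real implementation would also need to consider races between them.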
> s | A POSIX shared memory object. The terminal emulator will delete it after reading the pixel data
What about Windows users? They have a different shared memory model. Should a Windows-based terminal silently ignore this image, or respond somehow that it could not display it?
No, Windows supports named shared memory as well. However, I have not
looked into it, but since an application can only use shared memory if
running on the same machine as the terminal, an application running on
windows can simply use files.
> This tells the terminal emulator to read 80 bytes starting from the offset 10 inside the specified shared memory buffer.
This is screaming "security hole."
To you maybe. When I write software that receives data that specifies
that a sub-region of a buffer needs to be processed, I bounds check the
sub-region.
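The bounds check mentioned here could look roughly like this in Python. `read_subregion` is a hypothetical helper; the numbers in the usage lines are the ones from the quoted spec text (80 bytes starting at offset 10).

```python
def read_subregion(buffer: bytes, offset: int, size: int) -> bytes:
    """Return buffer[offset:offset + size], refusing any request that
    does not lie entirely inside the buffer."""
    if offset < 0 or size < 0 or offset + size > len(buffer):
        raise ValueError("requested region lies outside the buffer")
    return buffer[offset:offset + size]


shared = bytes(100)                       # stand-in for a shared memory buffer
region = read_subregion(shared, 10, 80)   # 80 bytes starting at offset 10
```

A request such as offset 10 and size 80 against a 50 byte buffer raises `ValueError` instead of reading past the end.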
> Since escape codes are of limited maximum length, the data will need to be chunked up for transfer.
Why this particular maximum length? According to [Dickey](https://unix.stackexchange.com/questions/264937/whats-the-maximum-length-for-a-multibyte-escape-sequence): "string parameters (such as setting the title on a window) do not have a predefined limit on their length."
The idea of infinite length escape sequences is absurd, Thomas Dickey
notwithstanding. Anybody that has ever written an escape code parser
will know that. Infinite length escape codes actually do scream security
holes, unlike offsets into buffers.
> The client then sends the graphics escape code as usual, with the addition of an m key that must have the value 1 for all but the last chunk, where it must be 0.
After the chunked data has been received and reassembled, if the resulting image data is not a valid PNG (or future file format) or RGB/RGBA, what should the terminal do? Silently ignore the whole thing? Display a partial image, since it knows how many rows/columns the image was supposed to be shown in?
It returns an error, as specified in the protocol.
Since each chunk is a whole APC sequence, what should happen when printable characters or other VT100/Xterm sequences come in between chunks? Should the image be displayed where the cursor was at the first chunk received, or the final chunk? Or should anything that comes in between chunks cause this image to be treated as corrupt/discarded?
No image is displayed until the sequence is complete. It is up to the
terminal emulator to implement whatever policy it likes on how long to
wait for partial sequences to be completed.
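To make the chunking rule concrete, here is a Python sketch that splits a payload into several escape codes with m=1 on every chunk except the last, which gets m=0. The 4096 byte chunk size is the figure mentioned elsewhere in this thread. Sending the full control data only with the first chunk is an assumption of this sketch, and `chunked_commands` is a made-up name.

```python
import base64


def chunked_commands(control: str, payload: bytes, chunk_size: int = 4096) -> list:
    """Split base64 encoded payload into graphics escape codes.

    chunk_size should be a multiple of 4 so that every chunk except
    possibly the last is itself complete base64.
    """
    data = base64.standard_b64encode(payload)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]
    commands = []
    for index, chunk in enumerate(chunks):
        more = 0 if index == len(chunks) - 1 else 1
        # Assumption: only the first chunk carries the full control data.
        keys = f"{control},m={more}" if index == 0 else f"m={more}"
        commands.append(b"\x1b_G" + keys.encode("ascii") + b";" + chunk + b"\x1b\\")
    return commands


payload = bytes(range(256)) * 40
commands = chunked_commands("f=24,s=10,v=20", payload)
```

The terminal side would concatenate the chunk payloads in order and only decode and display the image once the m=0 chunk has arrived, in line with the answer above.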
> to which the terminal emulator will reply (after trying to load the data):
> <ESC>_Gi=31;error message or OK<ESC>\
What about localization? Is the error message expected to be in English only? What kind of scrubbing/sanitation must be performed on the error message (ASCII includes C0 control characters after all)?
The OK message is the only defined thing. Anything else is an ERROR. I
am not going to list all the various possible errors in the spec.
Terminal developers can be as helpful or not with their error messages.
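On the application side, this suggests treating the reply as a boolean plus free-form text. A minimal Python sketch of such a parser follows; `parse_response` is a made-up name and the error text in the usage line is invented purely as sample input.

```python
import re

# <ESC>_G<control data>;<message><ESC>\
RESPONSE = re.compile(rb"\x1b_G([^;\x1b]*);(.*?)\x1b\\", re.DOTALL)


def parse_response(raw: bytes):
    """Extract the image id and status from a terminal reply.

    Only the exact message "OK" counts as success; anything else is
    treated as an error whose text is informational only.
    """
    match = RESPONSE.search(raw)
    if match is None:
        return None
    keys = {}
    for item in match.group(1).split(b","):
        if b"=" in item:
            key, value = item.split(b"=", 1)
            keys[key.decode("ascii")] = value.decode("ascii")
    message = match.group(2).decode("ascii", "replace")
    image_id = int(keys["i"]) if "i" in keys else None
    return {"id": image_id, "ok": message == "OK", "message": message}


good = parse_response(b"\x1b_Gi=31;OK\x1b\\")
bad = parse_response(b"\x1b_Gi=31;some error text\x1b\\")
```

Because the error text is not standardized, the sketch only branches on `ok` and leaves `message` for logging.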
> or if you are sending a dummy image and do not want it stored by the terminal emulator
What is the difference between a dummy image and a not-dummy image?
A name.
> then display it with a=p,i=10 which will display the previously transmitted image at the current cursor position.
Repeating the question earlier: where does the cursor go after the image is displayed?
Why bother drawing at the actual cursor at all? Add keys to pick the text row/column, and say that the cursor does not move: then there would be no further ambiguity.
You can say that the cursor does not move in either case, do not have to
use extra keys for it. Again, not defined, applications should not rely
on any particular cursor position.
> Note that the offsets must be smaller that the size of the cell.
What happens if they are not? Is the image not displayed at all? Why not allow negative offsets (imagine an application mouse-dragging a window pixel-by-pixel all over the screen)?
A negative offset is the same as positive offset in a prev cell. And if
they are not, the terminal is free to do whatever it wants, once again,
applications sending invalid things can have no expectations with regard
to the result.
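The equivalence stated here (a negative offset is the same as a positive offset in a previous cell) can be written down as simple arithmetic. This Python sketch is for the application side, which can normalize an arbitrary pixel offset before sending it; `normalize_offset` and the 10 pixel cell width are illustrative assumptions.

```python
def normalize_offset(cell: int, offset: int, cell_size: int):
    """Convert an arbitrary pixel offset relative to a cell into an
    equivalent (cell, offset) pair with 0 <= offset < cell_size."""
    shift, within = divmod(offset, cell_size)
    return cell + shift, within


# 3 pixels to the left of cell 5, with 10 pixel wide cells,
# is 7 pixels into cell 4.
left = normalize_offset(5, -3, 10)
# 12 pixels to the right of cell 5 is 2 pixels into cell 6.
right = normalize_offset(5, 12, 10)
```

An application dragging something pixel by pixel would call this for both axes and also check that the resulting cell index is still on screen.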
> The image will be scaled (enlarged/shrunk) as needed to fit the specified area.
There are three scaling possibilities: stretch to fit X only and stretch/shrink Y, stretch to fit Y only and stretch/shrink X, and stretch/shrink both. How does the application select which option it wants?
By resizing the image itself before sending it.
> You can specify z-index values using the z key. Negative z-index values mean that the images will be drawn under the text. This allows rendering of text on top of images.
What if the application wants the text background color to cover the image (no blending, no overlay: text is text)? Does it have to display a background-color image over that cell first, then text?
yes.
If z is defined for images, why not text too? Add a sequence to select the text z order and current alpha blending value, then write text under or over existing.
because that is needless complication.
> The uppercase variants will delete the image data as well, provided that the image is not referenced elsewhere, such as in the scrollback buffer.
How should the terminal tell the application that the image data was not actually deleted because it was used in the scrollback buffer?
Why does the application care? If it wants to explicitly manage images
it should use ids and delete using those ids.
> When adding a new image, if the total size exceeds the quota, the terminal emulator should delete older images to make space for the new one.
How should the terminal tell the application that an image somewhere was deleted because it exceeded the quota? Someday we will all be using retina displays where 320MB will feel quaint.
It should not. There is no guarantee that an application currently
running is the application that originally sent the image. If an
application wants to guarantee an image is displayed then it should use
the querying facilities for that purpose. And obv if 320MB is
insufficient, terminal emulators are free to raise that limit. It is a
floor not a ceiling.
> The other commands to erase text must have no effect on graphics. The dedicated delete graphics commands must be used for those.
Going back to the question on text completely covering images: this spec requires that covering an image with text means either making numerous calls to display pieces of an image around where the text will go, or having snippets of text background color as images to cover up the cells that the text will draw on. Not a problem for Jexer -- it already displays images as strips of text-cell-sized image pieces -- but could be a headache for less advanced systems.
What? If you want to place text on top of an image you simply send the
image with a negative z-index, that is all. If for some odd reason you
also want a block of solid color on which to write the text, you send
that block with a higher but still negative z-index. In actual fact, you
would use a semi-transparent block for best results, since opaque blocks
covering images look fairly ugly.
> When switching from the main screen to the alternate screen buffer (1049 private mode) all images in the alternate screen must be cleared, just as all text is cleared.
Does clearing them on screen also delete them?
See below.
> The clear screen escape code (usually <ESC>[2J) should also clear all images. This is so that the clear command works.
I don't see why clear screen is more important than erase line.
Because images can span more than a single line, commonly, while
spanning more than a single screen is rare. Besides which *users* can
trigger screen clears easily via the clear command. It would be
extremely surprising if clear cleared text but not images.
> Interaction with other terminal actions
What about:
* Reverse video (DECSCNM)? Should images be reverse-video too?
No.
* Double-width / double-height lines (DECWL, DECDWL, DECHDL). Should images also stretch like the text above them?
No.
* Copy-paste: should the terminal be able to copy images to the system clipboard? Should the application be able to request whatever is on the system clipboard to be displayed on screen if it is an image?
No.
* Image memory management: How can an application find out which image IDs are in use? And how much memory they consume?
It cannot. It simply uses its own ids. If they conflict with ids from a
previous application, they will overwrite.
Summary thoughts:
* This spec seems good for displaying thumbnails and supporting tiling window managers. The inability for text to fully cover images will make it less convenient for cascading / floating window managers. But the ability to draw offsets from the same image might make up for that inconvenience.
I have no idea what window management has to do with this spec?
* The focus here seems to be general-purpose mapping of rectangular text cell regions to 2-dimensional pixel-based file data. PNG and RGB/RGBA feel like too little gain for this, it should be both expanded to include animations (meaning time offsets and looping), and not married to PNG format (use something like generic mime-type discussed at https://gitlab.freedesktop.org/terminal-wg/specifications/issues/12 ).
One of the core goals of this spec, as mentioned in its motivations is
not to force terminals to support an arbitrary and ever growing set of
image formats. That is not going to change. And animations can be driven
purely by the application, there is no need to have the terminal support
it specially.
* There is not enough consideration for failure modes. Sixel is pretty simple: if it is malformed, nothing is displayed and the cursor does not move, otherwise the text cursor moves with the sixel "print head"; if artifacts are left on screen, so be it, the application has to figure it out. Through multiple sections the cursor is mentioned, but where the cursor ends up at the end of an image render is not defined.
The situation is much better here. If the image data/escape codes are invalid,
nothing is displayed and the cursor does not move. There is no need to
worry about artifacts or partial images or absurdities like deleting
unrelated text distorting displayed images.
--
_____________________________________
Dr. Kovid Goyal
https://www.kovidgoyal.net
https://calibre-ebook.com
_____________________________________
|
OK, I will. Is there more to the spec than the web page? You say in several other responses "it is in the spec", but I am not seeing it in the web page.
Looking at the spec, we are both incorrect. C0 and C1 are acceptable inside APC codes:
Same with PM and OSC. I am wrong in that C1 control codes are processed within APC/PM/OSC. You are wrong in that C0 (except SUB and ST) are valid data inside APC.
OK.
I did not find this spelled out in the web page, but I think I see what you are getting at now. More below.
Fine.
OK.
If you want to assert that all of this just means pixels, more power to you.
OK.
Regarding scrolling: you define the need to clip to scroll regions, why not also define the need to clip to screen size? Also one inconvenience with sixel support in xterm is that sixel images drawn on the bottom row causes xterm to scroll (even when the "print head" does not require it). Does this spec expect the same behavior?
What happens if the file in /tmp is not an image? Should the terminal's behavior be: don't display it, report error, and delete it anyway?
But what should the Windows-based terminal do right now if they are to be deemed "in compliance"? They don't have POSIX shared memory objects, so obviously cannot display the image. If they notify the application that the image failed to display, are they in compliance?
Good to know that your secure coding skills exceed those of most major vendors. But let's say that us less skilled developers decide not to support shared memory buffers because we are unsure our skills are up to it. Will a terminal that refuses to support shared memory buffers be deemed in compliance with the spec or not?
A limit clearly must exist. The question is why 4096? Did you do some kind of tests to come to this value?
The only error reporting I see in the web page is in the "Detecting available transmission mediums" section and "Display images on screen". These two error reports are a) different in structure, and b) not clear if they are intended to be general-purpose and apply to the rest of the protocol actions.
I guess I didn't make my question clear. If an "in compliance" terminal receives the following:
Will the image be displayed? If not, what is the kind of error to report to the application ("incomplete image", "out of sequence", "bad data", ...)? If yes, where on screen will it be?
That's fine, but since error messages have no structure or defined numeric codes to refer to it will be harder for applications to perform graceful fallback. Is graceful fallback something you want this standard to support? I could see applications trying several different strategies depending on the error they might get: if an image exceeds memory, try to downsample it or use a smaller amount of screen space; if shared memory won't work, try local disk, and if that fails resort to chunking the image data.
So do you really mean something like this at this part of the spec? "An application can determine image support from the terminal by attempting to load, but not store or display, an image using one of the transmission mediums defined above. To do this, it can use the query action, set a=q. Then the terminal emulator will try to load the image and respond with either OK or an error. If OK, then the application can assume that images of this format and medium are both storable and displayable. If error, the application can try different mediums until one comes back OK." If so, can 'a=q' be used to query the other capabilities of the terminal?
If applications can't deterministically know where the cursor position should be, then libraries such as ncurses might never be able to support this protocol. Dickey waited for a de facto standard before doing it, yet here we are now at ncurses 6.1 supporting 24-bit RGB. If this spec could be ironed out then it could be pretty cool for ncurses 8.x or whatever to be able to do images, and even optimize transparently to the application for local/remote.
Up top you say "invalid sequences should be ignored", yet here you say "invalid sequences can put whatever they want on the screen, or nothing". Which one do you want?
If the application has to scale the image, why does the spec say the terminal will scale it? Conversely, if the terminal is scaling the image, which (or both) axis is getting altered?
OK.
Seems a bit arbitrary where the "needless complication" line is, but OK.
It doesn't matter to you (the terminal) why the application cares. You have an error condition, with no defined error action. If the application requests the terminal do something, and the terminal doesn't do, shouldn't the application have some way of knowing that it needs to try something else?
So the terminal manages memory, applications will have no notification of memory actions, if they step on each other they won't know. But inside each application's screen they can get some guarantee that their bitmaps are or are not on the screen. OK.
What I'm getting at is this and this: text window borders overlapping images. (I am unaware of anyone pulling off the trick before Jexer, but would be excited to be corrected.) I am hoping that new multiplexers come up with this capability in the future.
But why only for a full screen clear? Why can't a program like tmux/screen clear say only the top half of a screen, clipping the image, because the bottom half is another terminal? Right now their only means of doing that is either drawing a bunch more background color images on top of what they need, or clearing/redrawing the bits of the image that should remain visible. Also: If the screen is cleared, are the images that were on screen deleted in the ID cache too (i.e the application has to reload them to display them again)?
Do you envision this spec being used by multiple different applications on the same physical screen, or is it targeted to applications that can assume they have the entire physical screen? Is it supposed to fit into the same ecosystem as tilix, terminator, and tmux/screen? Example: for tmux/screen, sixel is currently hacked by way of bypassing the window manager and sending the DCS sequence directly to the host terminal; if one had a tmux session with multiple terminals on screen, and ran neofetch or icat or something, and they all used ID 1, ID 2, etc. then they will step on each other because all of those APC sequences would be routed to the same shared terminal. If an application could guarantee they had unique IDs, then all of these terminals could look good. But if this is not a use case you care about, then OK.
Again, this. Terminal window managers like tmux/screen are often asked to support images because people like 'lsix' and such, but they haven't really jumped on it because breaking up images into text cell pieces and putting them back together again is hard, niche, and there aren't enough users asking for it. If this spec makes it easy to chop up images from multiple terminal windows and put them back onto a text-based tiling WM, it might make inroads that sixel hasn't gotten to. Up to you if that is something you are interested in.
OK.
I suppose animations should wait for a different spec. The CPU hit from taking each frame of a mpg, converting to PNG, putting in shared memory, and asking the terminal to display it; vs putting the avi/mp4 in shared memory and asking the terminal to animate it will be very different. But it is premature to optimize for a use case with no users yet.
OK, thank you for clarifying what happens on the error case (no cursor movement).
For a terminal multiplexer, what the terminal sees has little to do with the underlying images. Being able to treat cells as "parts of the image" and shuffle them around is very useful. Should ncurses (or termbox?) ever get image support, I think it likely that they would optimize the image cell updates just like the text cells. Things like: move down 3 rows, over 2, display 7 cells of image data, up 2 rows, 2 cells of text data, etc. So they would be overwriting image cells very frequently (which is hard to do nicely in this spec), and redisplaying small pieces of the image (which is a nice part of this spec). |
On Tue, Aug 27, 2019 at 09:29:18AM -0700, Kevin Lamonte wrote:
> On Tue, Aug 27, 2019 at 05:46:20AM -0700, Kevin Lamonte wrote: > The protocol specification is complete, has not had any additions in years. Looking at the specification, some comments: General: * It would be nice if the specification web page was available as raw text or markdown.
> It is, look in the kitty repo.
OK, I will. Is there more to the spec than the web page? You say in several other responses "it is in the spec", but I am not seeing it in the web page.
The web page is auto-generated from source.
> * You should define your expectation for C0/C1 control characters received during an APC sequence. I think most people are now familiar with [Williams' state machine](https://vt100.net/emu/dec_ansi_parser), where C0/C1 for ESCAPE, CSI, OSC, DCS, SOS, PM, and APC will always go to another state, and SOS/PM/APC can still see the other C0/C1 without acting on them, but you never know.
> This has nothing to do with this spec. C0 and C1 escape codes inside APC codes are invalid and cause the entire APC code to not be parsed.
Looking at the spec, we are both incorrect. [C0 and C1 are acceptable inside APC codes.](https://vt100.net/docs/vt510-rm/chapter4.html#S4.2):
Regardless, how to parse APC codes is not in scope for this spec.
> > The terminal emulator must understand pixel data in three formats, 24-bit RGB, 32-bit RGBA and PNG. PNG is not a pixel data format, it is a file format. Why limit to PNG? Why not JPG, TIFF, BMP, etc? For that matter, why not AVI or MP4?
> No PNG is a pixel data format.
If you want to assert that all of [this](https://www.w3.org/TR/PNG/) just means pixels, more power to you.
I don't see how a spec for losslessly storing pixel data, no matter how
complex it is, becomes not a spec for storing pixel data. And in case
you don't realize it, the reason this spec mandates PNG is mainly to have an
efficient format for indexed image data, instead of implementing our
own.
> What should the terminal do if it cannot understand the pixel format provided (e.g. 24-bit RGB but not 32-bit RGBA)? Not display an image at all? Not reserve an image ID? Tell the application somehow that the image did not display?
> If a terminal does not understand one of these extremely simple formats, then it is not in spec compliance.
OK.
> > Here the width and height are specified using the s and v keys respectively. Since f=24 there are three bytes per pixel and therefore the pixel data must be 3 * 10 * 20 = 600 bytes. Could consider adding a link to the summary table here. Where does the text cursor position end up after an image is successfully displayed? Does it move at all? Does it end up on the row below, or column to the right of the image? Should the screen scroll if the image is too tall? What if the image is wider than the screen?
> The cursor position is not well defined, so applications should not rely on it. They can simply reposition it wherever they like. And yes, screen will scroll if image is too tall. If the image is wider than the screen behavior is again undefined, so well designed applications which already know the screen size, should not rely on it.
Regarding scrolling: you define the need to clip to scroll regions, why not also define the need to clip to screen size?
I dont really see a need for it. Displaying images wider than the screen
in a terminal is never going to have good results since terminals dont
scroll horizontally. Applications should simply not do that. However, I
am open to amending the spec to mandate clipping in such a case.
Also one inconvenience with sixel support in xterm is that sixel images drawn on the bottom row causes xterm to scroll (even when the "print head" does not require it). Does this spec expect the same behavior?
No, scrolling should happen only if actually required, to display a new
row of the image.
> This is addressed in the spec, the terminal is only allowed to delete files inside well know temp directories. And obv it will realpath() before deleting things. If it does not do that, that is a bug in its implementation.
What happens if the file in /tmp is not an image? Should the terminal's behavior be: don't display it, report error, and delete it anyway?
If the terminal is unable to read an image from the file it should not
delete it, although I don't think the spec mandates this. I am fine with
adding that requirement to it.
> > s | A POSIX shared memory object. The terminal emulator will delete it after reading the pixel data What about Windows users? They have a different shared memory model. Should a Windows-based terminal silently ignore this image, or respond somehow that it could not display it?
> No windows supports named shared memory as well. However, I have not looked into it, but since an application can only use shared memory if running on the same machine as the terminal, an application running on windows can simply use files.
But what should the Windows-based terminal do right now if they are to be deemed "in compliance"? They don't have POSIX shared memory objects, so obviously cannot display the image. If they notify the application that the image failed to display, are they in compliance?
An application running on windows that requests to use POSIX shared
memory objects is not in compliance with the spec, so the terminal
emulator can do whatever it pleases.
> > This tells the terminal emulator to read 80 bytes starting from the offset 10 inside the specified shared memory buffer. This is screaming "security hole."
> To you maybe. When I write software that receives data that specifies that a sub-region of a buffer needs to be processed, I bounds check the sub-region.
Good to know that your secure coding skills exceed those of most major vendors.
You really want to claim that reading a sub-region from a buffer is
beyond the skillset of any halfway competent person? How then is such a
person going to write the code to parse escape codes? Or implement the
rest of the terminal's logic?
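
For illustration only, the bounds check being argued about can be sketched as below. The function name `read_subregion` is hypothetical and not taken from kitty or the spec; it just shows rejecting an offset/size pair (such as the "80 bytes starting from the offset 10" example) that does not fit inside the buffer.

```python
def read_subregion(buf: bytes, offset: int, size: int) -> bytes:
    """Return buf[offset:offset+size], refusing anything outside the buffer.

    Hypothetical helper: a sketch of the check, not kitty's implementation.
    """
    if offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    # Compare against the remaining length instead of computing offset + size,
    # which is the form that stays safe in languages with fixed-width integers.
    if offset > len(buf) or size > len(buf) - offset:
        raise ValueError("requested sub-region exceeds the buffer")
    return buf[offset:offset + size]


if __name__ == "__main__":
    shared = bytes(range(100))             # stand-in for a shared memory buffer
    print(len(read_subregion(shared, 10, 80)))
```

A request that fails the check would be treated like any other failure to load pixel data.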
But let's say that us less skilled developers decide not to support shared memory buffers because we are unsure our skills are up to it. Will a terminal that refuses to support shared memory buffers be deemed in compliance with the spec or not?
No. However, a well written application will always fallback to files,
and direct streaming of data, since there are no guarantees an application
will run on the same machine as the terminal emulator and so have access
to shared memory.
> > Since escape codes are of limited maximum length, the data will need to be chunked up for transfer. Why this particular maximum length? According to [Dickey](https://unix.stackexchange.com/questions/264937/whats-the-maximum-length-for-a-multibyte-escape-sequence): "string parameters (such as setting the title on a window) do not have a predefined limit on their length."
> The idea of infinite length escape sequences is absurd, thomas dickey notwithstanding. Anybody that has ever written an escape code parser will know that. Infinite length escape codes actually do scream security holes, unlike offsets into buffers.
A limit clearly must exist. The question is why 4096? Did you do some kind of tests to come to this value?
Why not? Do you have some reason to believe 4096 is not suitable? If you
do I am happy to bikeshed on a better limit. My choice comes from
writing an escape code parser that does not consume excessive amounts of
RAM and that does not allocate memory just to parse escape codes.
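
A rough sketch of what chunking against such a limit looks like from the client side. Assumptions not stated in this thread: the payload is base64 encoded, the 4096 limit applies to the encoded payload of each escape code, and the control keys only need to appear on the first chunk. The function name is hypothetical.

```python
import base64

CHUNK_SIZE = 4096  # the limit under discussion; a multiple of 4 keeps base64 chunks aligned


def chunked_graphics_codes(control: str, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split pixel data into a list of APC graphics escape codes.

    Every chunk except the last carries m=1; the last carries m=0.
    """
    payload = base64.standard_b64encode(data)  # assumption: base64 payload
    pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    codes = []
    for n, piece in enumerate(pieces):
        more = 0 if n == len(pieces) - 1 else 1
        # Assumption: full control data on the first chunk, only the m key afterwards.
        keys = f"{control},m={more}" if n == 0 else f"m={more}"
        codes.append(b"\x1b_G" + keys.encode("ascii") + b";" + piece + b"\x1b\\")
    return codes


if __name__ == "__main__":
    # 24-bit RGB, 10x20 pixels: 3 * 10 * 20 = 600 bytes, which fits in one chunk.
    for code in chunked_graphics_codes("f=24,s=10,v=20", bytes(600)):
        print(len(code))
```

With a fixed chunk size the parser on the terminal side can use a fixed buffer, which is the point being made above.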
> > The client then sends the graphics escape code as usual, with the addition of an m key that must have the value 1 for all but the last chunk, where it must be 0. After the chunked data has been received and reassembled, if the resulting image data is not a valid PNG (or future file format) or RGB/RGBA, what should the terminal do? Silently ignore the whole thing? Display a partial image, since it knows how many rows/columns the image was supposed to be shown in?
> It returns an error, as specified in the protocol.
The only error reporting I see in the web page is in the "Detecting available transmission mediums" section and "Display images on screen". These two error reports are a) different in structure, and b) not clear if they are intended to be general-purpose and apply to the rest of the protocol actions.
The spec states:
Since a client has no a-priori knowledge of whether it shares a filesystem/shared memory
with the terminal emulator, it can send an id with the control data, using the ``i`` key
(which can be an arbitrary positive integer up to 4294967295, it must not be zero).
If it does so, the terminal emulator will reply after trying to load the image, saying
whether loading was successful or not.
This applies to chunked images as well. After the sequence is complete,
as long as the client specifies an id, the terminal must reply with
either OK or an error if loading pixel data fails.
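
As a sketch of how a client might consume that reply, assuming it arrives in the `<ESC>_Gi=31;error message or OK<ESC>\` form quoted in this thread and that anything other than `OK` is an error. The function name and the returned tuple shape are hypothetical.

```python
import re

# Matches a reply of the form ESC _ G i=<id> ; <message> ESC \
_REPLY = re.compile(rb"\x1b_Gi=(\d+);(.*?)\x1b\\", re.DOTALL)


def parse_reply(raw: bytes):
    """Return (image_id, ok, message), or None if no reply is present in raw."""
    match = _REPLY.search(raw)
    if match is None:
        return None
    image_id = int(match.group(1))
    message = match.group(2).decode("utf-8", "replace")
    # Only the literal OK is defined; every other message is treated as an error.
    return image_id, message == "OK", message


if __name__ == "__main__":
    print(parse_reply(b"\x1b_Gi=31;OK\x1b\\"))
```

Since the error text has no defined structure, a client doing fallback can only branch on `ok` and try the next transmission medium.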
> Since each chunk is a whole APC sequence, what should happen when printable characters or other VT100/Xterm sequences come in between chunks? Should the image be displayed where the cursor was at the first chunk received, or the final chunk? Or should anything that comes in between chunks cause this image to be treated as corrupt/discarded?
> No image is displayed until the sequence is complete. It is up to the terminal emulator to implement whatever policy it likes on how long to wait for partial sequences to be completed.
I guess I didn't make my question clear. If an "in compliance" terminal receives the following:
1. CUP(0, 0).
2. Chunk 1.
3. Chunk 2.
4. CUP(10, 10).
5. Chunk 3, and final.
Will the image be displayed? If not, what is the kind of error to report to the application ("incomplete image", "out of sequence", "bad data", ...)? If yes, where on screen will it be?
Images are typically transmitted and displayed using separate escape
codes, and the display portion does not support chunking. The exception is
a=T mode. In this mode the relevant cursor position is the position when
the sequence is complete and validated. I have added a relevant section
to the spec for it.
> > to which the terminal emulator will reply (after trying to load the data): > <ESC>_Gi=31;error message or OK<ESC>\ What about localization? Is the error message expected to be in English only? What kind of scrubbing/sanitation must be performed on the error message (ASCII includes C0 control characters afterall)?
> The OK message is the only defined thing. Anything else is an ERROR. I am not going to list all the various possible errors in the spec. Terminal developers can be as helpful or not with their error messages.
That's fine, but since error messages have no structure or defined numeric codes to refer to it will be harder for applications to perform graceful fallback. Is graceful fallback something you want this standard to support? I could see applications trying several different strategies depending on the error they might get: if an image exceeds memory, try to downsample it or use a smaller amount of screen space; if shared memory won't work, try local disk, and if that fails resort to chunking the image data.
I am happy to add some structure for this, although I dont see any out
of memory conditions being possible unless the terminal emulator keeps
less memory available than a screenful of pixels or the application
tries to display an image that is much larger than a screen. Neither of
which are particularly likely or meaningful.
> > or if you are sending a dummy image and do not want it stored by the terminal emulator What is the difference between a dummy image and a not-dummy image?
> A name.
So do you really mean something like this at this part of the spec?
"An application can determine image support from the terminal by attempting to load, but not store or display, an image using one of the transmission mediums defined above. To do this, it can use the query action, set a=q. Then the terminal emulator will try to load the image and respond with either OK or an error. If OK, then the application can assume that images of this format and medium are both storable and displayable. If error, the application can try different mediums until one comes back OK."
If so, can 'a=q' be used to query the other capabilities of the terminal?
What other capabilities?
> > then display it with a=p,i=10 which will display the previously transmitted image at the current cursor position. Repeating the question earlier: where does the cursor go after the image is displayed? Why bother drawing at the actual cursor at all? Add keys to pick the text row/column, and say that the cursor does not move: then there would be no further ambiguity.
> You can say that the cursor does not move in either case, do not have to use extra keys for it. Again, not defined, applications should not rely on any particular cursor position.
If applications can't deterministically know where the cursor position should be, then libraries such as ncurses might never be able to support this protocol. Dickey waited for a de facto standard before doing it, yet here we are now at ncurses 6.1 supporting 24-bit RGB. If this spec could be ironed out then it could be pretty cool for ncurses 8.x or whatever to be able to do images, and even optimize transparently to the application for local/remote.
I am fine with defining this behavior, I dont actually see a point to it,
but if other people do, it is fine by me. Feel free to open a separate
issue to discuss it.
I dont really understand what is preventing ncurses from issuing say a
cursor store and pop operation around a graphics code to ensure the
cursor does not move, but whatever.
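
A minimal sketch of that store/pop idea, assuming the standard DECSC (`ESC 7`) and DECRC (`ESC 8`) sequences are what is meant by cursor store and pop. The function name is hypothetical; the `a=p,i=<id>` display code is the one quoted above.

```python
SAVE_CURSOR = b"\x1b7"     # DECSC: save cursor position
RESTORE_CURSOR = b"\x1b8"  # DECRC: restore cursor position


def display_without_moving_cursor(image_id: int) -> bytes:
    """Wrap a display-by-id graphics code so the cursor ends where it started."""
    display = f"\x1b_Ga=p,i={image_id}\x1b\\".encode("ascii")
    return SAVE_CURSOR + display + RESTORE_CURSOR


if __name__ == "__main__":
    print(display_without_moving_cursor(10))
```

A library would write these bytes to the tty in one go, so its own idea of the cursor position stays valid regardless of what the terminal does after drawing.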
> > Note that the offsets must be smaller that the size of the cell. What happens if they are not? Is the image not displayed at all? Why not allow negative offsets (imagine an application mouse-dragging a window pixel-by-pixel all over the screen)?
> A negative offset is the same as positive offset in a prev cell. And if they are not, the terminal is free to do whatever it wants, once again, applications sending invalid things can have no expectations with regard to the result.
Up top you say "invalid sequences should be ignored", yet here you say "invalid sequences can put whatever they want on the screen, or nothing". Which one do you want?
The two situations are entirely different. One is about invalid pixel
data. The other is about invalid positioning data. One will cause the
image to not be loaded/shown/an error code returned, the other is
undefined and terminal implementations can do whatever is most suitable
for them.
> > The image will be scaled (enlarged/shrunk) as needed to fit the specified area. There are three scaling possibilities: stretch to fix X only and stretch/shrink Y, stretch to fix Y only and stretch/shrink X, and stretch/shrink both. How does the application select which option it wants?
> By resizing the image itself before sending it.
If the application has to scale the image, why does the spec say the terminal will scale it? Conversely, if the terminal is scaling the image, which (or both) axis is getting altered?
> > You can specify z-index values using the z key. Negative z-index values mean that the images will be drawn under the text. This allows rendering of text on top of images. What if the application wants the text background color to cover the image (no blending, no overlay: text is text)? Does it have to display a background-color image over that cell first, then text?
> yes.
OK.
> If z is defined for images, why not text too? Add a sequence to select the text z order and current alpha blending value, then write text under or over existing.
> because that is needless complication.
Seems a bit arbitrary where the "needless complication" line is, but OK.
> > The uppercase variants will delete the image data as well, provided that the image is not referenced elsewhere, such as in the scrollback buffer. How should the terminal tell the application that the image data was not actually deleted because it was used in the scrollback buffer?
> Why does the application care? If it wants to explicitly manage images it should use ids and delete using those ids.
It doesn't matter to you (the terminal) why the application cares. You have an error condition, with no defined error action. If the application requests the terminal do something, and the terminal doesn't do, shouldn't the application have some way of knowing that it needs to try something else?
If the application cares it should send ids and use the query actions.
If it does not, the terminal manages it for the application.
> What? If you want to place text on top of an image you simply send the image with a negative z-index, that is all. If for some odd reason you also want a block of solid color on which to write the text, you send that block with a higher but still negative z-index. In actual fact, you would use a semi-transparent block for best results, since opaque blocks covering images look fairly ugly.
What I'm getting at is [this](https://jexer.sourceforge.io/screenshots/jexer_sixel_in_sixel.png) and [this](https://gitlab.com/klamonte/jexer/raw/master/screenshots/new_demo1.png?raw=true): text window borders overlapping images. (I am unaware of anyone pulling off the trick before Jexer, but would be excited to be corrected.) I am hoping that new multiplexers come up with this capability in the future.
Sorry, I still don't understand why that cannot be implemented by drawing
a block of color on top of the image. And with kitty's protocol you can
get much better results since your window borders can be
translucent/arbitrarily fancy.
> > The clear screen escape code (usually <ESC>[2J) should also clear all images. This is so that the clear command works. I don't see why clear screen is more important than erase line.
> Because images can span more than a single line, commonly, while spanning more than a single screen is rare. Besides which *users* can trigger screen clears easily via the clear command. It would be extremely surprising if clear cleared text but not images.
But why only for a full screen clear? Why can't a program like tmux/screen clear say only the top half of a screen, clipping the image, because the bottom half is another terminal? Right now their only means of doing that is either drawing a bunch more background color images on top of what they need, or clearing/redrawing the bits of the image that should remain visible.
Sorry I dont understand what you are asking for. If tmux has already
split the terminal into multiple panes then it simply rewrites the image
display escape codes accordingly to display only a cut off/sub region of
the image. This incidentally is one of the reasons it is necessary to
have support for subregions, despite the apparent extreme difficulty of
implementing it securely.
If it wants to create a new window, it means that the old window has to
be resized, at which point things will scroll and it is best off just
deleting all images and asking the application to redraw. There is a
reason tmux and its ilk are horrible hacks.
Also: If the screen is cleared, are the images that were on screen deleted in the ID cache too (i.e the application has to reload them to display them again)?
> * Image memory management: How can an application find out which image IDs are in use? And how much memory they consume?
> It cannot. It simply uses its own ids. If they conflict with ids from a previous application, they will overwrite.
Do you envision this spec being used by multiple different applications on the same physical screen, or is it targeted to applications that can assume they have the entire physical screen? Is it supposed to fit into the same ecosystem as tilix, terminator, and tmux/screen?
Only a single application can control a given tty at a time; having
multiple applications *simultaneously* write to the same tty is chaos
and will not work for pretty much anything, even printing out simple
text.
Example: for tmux/screen, sixel is currently hacked by way of bypassing the window manager and sending the DCS sequence directly to the host terminal; if one had a tmux session with multiple terminals on screen, and ran neofetch or icat or something, and they all used ID 1, ID 2, etc. then they will step on each other because all of those APC sequences would be routed to the same shared terminal. If an application could guarantee they had unique IDs, then all of these terminals could look good. But if this is not a use case you care about, then OK.
IMO terminal multiplexers are horrible hacks. However, coming to the
question of ID management for them, what they will have to do is rewrite
ids bi-directionally. So if application A in window 1 sends id 1 and
application B in window 2 also sends id 1, the multiplexer will rewrite
both ids using an internal id map and send unique ids to the terminal
emulator. Similarly, it will need to remap the ids in the responses from
the terminal emulator.
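
A sketch of such a bi-directional id map, assuming the multiplexer can tell which window each escape code came from. The class and method names are hypothetical and not part of tmux, screen or kitty.

```python
class IdMapper:
    """Maps (window, application id) pairs to unique terminal ids and back."""

    def __init__(self):
        self._to_terminal = {}  # (window, app_id) -> terminal_id
        self._to_app = {}       # terminal_id -> (window, app_id)
        self._next_id = 1       # ids must be positive and non-zero

    def outgoing(self, window: int, app_id: int) -> int:
        """Id to send to the terminal for an id seen from an application."""
        key = (window, app_id)
        if key not in self._to_terminal:
            self._to_terminal[key] = self._next_id
            self._to_app[self._next_id] = key
            self._next_id += 1
        return self._to_terminal[key]

    def incoming(self, terminal_id: int):
        """(window, app_id) to route a terminal response back to."""
        return self._to_app[terminal_id]


if __name__ == "__main__":
    ids = IdMapper()
    a = ids.outgoing(window=1, app_id=1)  # application A in window 1
    b = ids.outgoing(window=2, app_id=1)  # application B in window 2, same id
    print(a, b, ids.incoming(b))
```

Both applications keep using id 1 from their own point of view, while the terminal emulator only ever sees distinct ids.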
> Summary thoughts: * This spec seems good for displaying thumbnails and supporting tiling window managers. The inability for text to fully cover images will make it less convenient for cascading / floating window managers. But the ability to draw offsets from the same image might make up for that inconvenience.
> I have no idea what window management has to do with this spec?
Again, [this](https://gitlab.com/klamonte/jexer/raw/master/screenshots/new_demo1.png?raw=true). Terminal window managers like tmux/screen are often asked to support images because people like 'lsix' and such, but they haven't really jumped on it because breaking up images into text cell pieces and putting them back together again is hard, niche, and there aren't enough users asking for it. If this spec makes it easy to chop up images from multiple terminal windows and put them back onto a text-based tiling WM, it might make inroads that sixel hasn't gotten to.
Up to you if that is something you are interested in.
Not particularly, but if there is some reasonable addition to the spec
that makes this use case easier, I am open to discussing it.
> And animations can be driven purely by the application, there is no need to have the terminal support it specially.
I suppose animations should wait for a different spec. The CPU hit from taking each frame of a mpg, converting to PNG, putting in shared memory, and asking the terminal to display it; vs putting the avi/mp4 in shared memory and asking the terminal to animate it will be very different. But it is premature to optimize for a use case with no users yet.
Note that for video I envisage decoding to shared memory in RGB format
not PNG. Which a video player would do anyway. But yes, in general
video is a more complex problem that this spec does not really optimize
for.
> * There is not enough consideration for failure modes. Sixel is pretty simple: if it is malformed, nothing is displayed and the cursor does not move, otherwise the text cursor moves with the sixel "print head"; if artifacts are left on screen, so be it, the application has to figure it out. Through multiple sections the cursor is mentioned, but where the cursor ends up at the end of an image render is not defined.
> The situation is much better here. If the image data/escape codes is invalid, nothing is displayed and the cursor does not move.
OK, thank you for clarifying what happens on the error case (no cursor movement).
> There is no need to worry about artifacts or partial images or absurdities like deleting unrelated text distorting displayed images.
For a terminal multiplexer, what the terminal sees has little to do with the underlying images. Being able to treat cells as "parts of the image" and shuffle them around is very useful. Should ncurses (or termbox?) ever get image support, I think it likely that they would optimize the image cell updates just like the text cells. Things like: move down 3 rows, over 2, display 7 cells of image data, up 2 rows, 2 cells of text data, etc. So they would be overwriting image cells very frequently (which is hard to do nicely in this spec), and redisplaying small pieces of the image (which is a nice part of this spec).
I'm not sure I understand what you mean. An example would be helpful,
and as I said, if there is some image operation that is reasonable to
implement and helps with this use case, I am open to discussing it.
|
Oh and regarding scaling, if you specify an area for display and the image does not fit in it, the terminal emulator will scale the image to make it fit. There is no control over how the scaling is done, if the application needs this control, it should scale the image itself. |
I think our conversation has reached its end then. |
And when I get around to implementing #391 an obsolete as well as horrible hack |
to help reduce the likelihood of this happening, i start at a random ID each time my library is initialized. any attempt to discover what other IDs are in use seems fundamentally racy (in the same way that it's difficult for non-cooperating threads to discover what file descriptors are in use), and with a 24-bit space, collisions are relatively rare (assuming a reasonably-seeded PRNG) outside of pathological cases. IMHO, an ideal solution would be the ability to supply |
On Thu, May 13, 2021 at 08:23:31PM -0700, Nick Black wrote:
to help reduce the likelihood of this happening, i [start at a random ID](https://github.com/dankamongmen/notcurses/blob/master/src/lib/sprite.c) each time my library is initialized. any attempt to discover what other IDs are in use seems fundamentally racy (in the same way that it's difficult for non-cooperating threads to discover what file descriptors are in use), and with a 24-bit space, collisions are relatively rare (assuming a reasonably-seeded PRNG) outside of pathological cases. IMHO, an ideal solution would be the ability to supply `i=-1`, at which point kitty would assign you an id (and return that information). practically, i'm not sure i would bother using such functionality, due to my dislike of terminal results and user input being multiplexed onto the same channel [shrug].
There is already such a facility, https://sw.kovidgoyal.net/kitty/graphics-protocol.html#requesting-image-ids-from-the-terminal
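The random-start approach described in the quoted comment can be sketched roughly as follows. This is an illustration only, not the code from notcurses: the class name `ImageIdAllocator` is made up, the 24-bit space comes from the comment above, and treating ID 0 as reserved is an assumption.

```python
import secrets

ID_BITS = 24                 # size of the ID space mentioned in the comment
ID_SPACE = 1 << ID_BITS      # IDs are assumed to lie in 1 .. 2**24 - 1


class ImageIdAllocator:
    """Hypothetical allocator: random starting ID, then sequential with wrap."""

    def __init__(self, start=None):
        # Pick a random starting point unless one is supplied (useful in tests).
        # ID 0 is skipped on the assumption that it means "no ID".
        if start is None:
            start = 1 + secrets.randbelow(ID_SPACE - 1)
        if not 1 <= start < ID_SPACE:
            raise ValueError("start must be in 1 .. 2**24 - 1")
        self._next = start

    def allocate(self):
        """Return the next ID, wrapping back to 1 at the end of the space."""
        current = self._next
        self._next = current + 1 if current < ID_SPACE - 1 else 1
        return current


if __name__ == "__main__":
    ids = ImageIdAllocator()
    print(ids.allocate(), ids.allocate())
```

Two independent programs using this scheme only collide if their ranges of live IDs happen to overlap, which is why a larger ID space and a well-seeded random source both help.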
|
Thoughts on implementing support for raster graphics in kitty (and terminals more generally). Out of curiosity, I spent some time looking into the existing imaging solutions in terminals. I found:
My question is, why are we limiting ourselves to this image file display paradigm? Why not allow programs to render arbitrary pixel data in the terminal? The way I envision this working is:
An escape code that allows programs running in the terminal to query the terminal for the current character cell size in pixels (this is similar to how querying for cursor position works)
An escape code that allows the program running in the terminal to specify arbitrary pixel data to render at the current cursor position (in a single cell; think of it as sending a "graphical character" instead of a text character). The pixel data can be binary for maximum efficiency (taking care to escape the C0 control codes for maximum robustness).
With these two primitives, programs will be able to draw arbitrary graphics (including image files) in terminals. This, to me, seems like a more general, and powerful, abstraction to build rather than just the ability to send image files in a few formats.
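To illustrate how a whole image could be built from these two primitives, here is a rough sketch of the tiling step only: given a cell size (which would come from the proposed query) it lists the cell-sized rectangles that cover an image. The function name `cell_tiles` and the tuple layout are hypothetical, and no escape codes are emitted because their format is not defined here.

```python
def cell_tiles(img_w, img_h, cell_w, cell_h):
    """Yield (col, row, x, y, w, h) for each cell-sized tile covering the image.

    col/row are cell offsets from the starting cursor position; x/y/w/h give
    the pixel rectangle of the source image that belongs in that cell.
    Tiles on the right and bottom edges may be smaller than a full cell.
    """
    if min(img_w, img_h, cell_w, cell_h) <= 0:
        raise ValueError("all dimensions must be positive")
    for row, y in enumerate(range(0, img_h, cell_h)):
        for col, x in enumerate(range(0, img_w, cell_w)):
            yield col, row, x, y, min(cell_w, img_w - x), min(cell_h, img_h - y)


if __name__ == "__main__":
    # A 20x30 pixel image with 10x20 pixel cells needs a 2x2 grid of cells.
    for tile in cell_tiles(20, 30, 10, 20):
        print(tile)
```

A program would move the cursor to each (col, row) and send the matching pixel rectangle as one "graphical character".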
I am considering building this into kitty, so I thought that, before I do so, it would be good to get some more opinions on the subject. Maybe get a little consensus going. Note that once this is built, it is easy to support displaying image files on top of it, if needed.
A specification for this protocol is here: https://github.com/kovidgoyal/kitty/blob/gr/graphics-protocol.asciidoc
Progress on implementing the specification: