Image proposal #5055

Closed
1 task done
pngwn opened this issue Aug 1, 2023 · 6 comments · Fixed by #6169
Assignees
Labels
enhancement New feature or request 🖼️ image Image component

Comments

@pngwn
Member

pngwn commented Aug 1, 2023

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
We have many issues with the current Image component, some of which are difficult to resolve.

Describe the solution you'd like
A new Image component.

This issue is a continuation and refinement of #466. More specifically it is an actual proposal for a new Image component. This first iteration will aim for parity with the current component while addressing a few core limitations and bugs.

The main aim here is to expand the capabilities of the Image component, provide a flexible API to app authors, provide a simple and engaging UX for end users, and leave 'space' in both the API and GUI for expansion/modification, all without making things unnecessarily complex.

I will first list the features across both the python API and the GUI, then I'll drop down into the details.

Features fall roughly into one of three categories: Sources or Backgrounds, Transforms, and Drawing.

Backgrounds

Many use cases require a background image which can either exist on its own or act as a base image to be modified with other tools.

Proposal

  1. Constrain background images to a single image. Adding a new background image will replace the old one. We can consider this a 'base' layer.
  2. Support the following sources:
    • Image upload
    • Capture webcam
    • Paste from clipboard
    • Set background colour (colorpicker)

Python API

A list of sources to allow, in the form: bg_sources=['upload', 'webcam', 'paste', 'color'].

Where options need to be set, could we use a class instead?

from typing import Literal

class ImageSources:
  def __init__(
    self,
    upload: bool = True,
    webcam: bool = True,
    paste: bool = True,
    color: bool = True,
    color_mode: Literal['fixed', 'defaults'] = "defaults",
    # fixed == only the swatches can be selected
    # defaults == swatches are shown but colorpicker is available
    colors: list[str] | list[tuple[int, int, int, int]] | str = ["red", "green", "blue"],
    # support string colors (`"#000"`, `"#0000"`, `"black"`) or rgba (`(255, 255, 255, 1)`)
    # if a single color is provided then swatches are never shown and we jump straight to the colorpicker
    # if a single color is provided and `color_mode == "fixed"` then clicking the color icon will set the bg rather than opening the swatch panel / colorpicker
  ):
    ...
    

bg_sources = gr.imagetools.ImageSources(
  color_mode='fixed', 
  colors=[(255, 255, 255, 1), (255, 255, 255, 1), (255, 255, 255, 1)]
) 

gr.Image(bg_sources=bg_sources)

GUI

The GUI should display all allowed options.

  • Clicking the upload icon opens the file browser
  • Clicking the webcam icon opens the webcam capture screen
  • Clicking the paste icon will simply paste the image in
  • Clicking the color icon will do different things based on options set in python:
    • if color_mode="fixed" and colors is a single color, then set the bg to that color
    • if color_mode="fixed" show a swatch panel with the provided colors. Selecting a color sets the bg.
    • if color_mode="defaults" show a swatch panel with the provided colors and a colorpicker or + icon. Selecting a swatch color sets the bg. Selecting the + icon opens a colorpicker. 'Picking' a color sets the bg.
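To make the branching for the color icon concrete, here is a minimal sketch in Python. The function name color_icon_action and the action strings it returns are made up for illustration; only color_mode and colors come from the proposal. The single-color "defaults" case follows the comment in the class above (jump straight to the colorpicker).

```python
from __future__ import annotations

from typing import Literal, Sequence, Tuple, Union

Color = Union[str, Tuple[int, int, int, int]]


def color_icon_action(
    color_mode: Literal["fixed", "defaults"],
    colors: Union[Color, Sequence[Color]],
) -> str:
    """Return a hypothetical UI action name for a click on the color icon."""
    # A bare string or a single RGBA tuple counts as exactly one color.
    is_rgba = (
        isinstance(colors, tuple)
        and len(colors) == 4
        and all(isinstance(c, (int, float)) for c in colors)
    )
    if isinstance(colors, str) or is_rgba:
        colors = [colors]
    single = len(colors) == 1

    if color_mode == "fixed":
        # Only the provided colors are allowed, so one color needs no panel.
        return "set_background" if single else "show_swatches"
    # "defaults": swatches are suggestions and the colorpicker stays available.
    return "open_colorpicker" if single else "show_swatches_with_picker"
```

The real decision would live in the frontend; the point is only that the two options fully determine what a click does.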

Questions

  • Should we default to supporting all bg sources?
  • Should bg_color be a set of swatches with a + icon that jumps to a colour picker? (I'd like this personally, suggested API above)
    • Should those swatches be configurable?
    • Should we have a fixed mode where only those swatches are allowed and not arbitrary colours?

Transforms

After an image or sketch has been generated, it is sometimes necessary to transform the image.

Proposal

Support the following transform tools:

  • crop
  • resize (?)
  • rotate

Python API

Similar to the background source I think these should be settable as a list of tools: transforms=['crop', 'resize', 'rotate'] with options available as a class similar to above. We could maybe put the current shape kwarg into this which makes more sense because it is a crop/resize really:

class TransformOptions:
  def __init__(
    self,
    crop: bool = True,
    resize: bool = True,
    rotate: bool = True,
    crop_suggestions: list[str] | None = None,  # ['4:3', '16:9'] etc. useful? Might conflict with crop_size
    crop_size: tuple[int, int] | None = None,
  ):
    ...

transform = gr.imagetools.TransformOptions(
  crop_size=(500, 500)
)

gr.Image(transforms=transform)

GUI

The tools selected will be shown.

  • Selecting the crop icon will open a crop/zoom overlay.
  • I do not know what the resize would do, do we need this?
  • Selecting the rotate icon would open some kind of rotate wheel.
  • If crop_size has been set then the canvas and any background images will be resized to fit those constraints, users could modify the crop by selecting the crop tool which will be fixed to that ratio. The crop overlay could be moved and the image zoomed but the crop overlay would not be editable. This is basically 'shape' but with a GUI.
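As a rough illustration of the crop_size behaviour, the sketch below computes the initial crop overlay: the largest centered box with the aspect ratio of crop_size that fits inside the image. The function name fixed_ratio_crop and the centering choice are assumptions, not part of the proposal.

```python
from typing import Tuple


def fixed_ratio_crop(
    image_w: int, image_h: int, crop_size: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered box that
    has crop_size's aspect ratio and fits inside the image."""
    target_w, target_h = crop_size
    # Scale the target box up or down until it just fits in both dimensions.
    scale = min(image_w / target_w, image_h / target_h)
    box_w, box_h = round(target_w * scale), round(target_h * scale)
    left = (image_w - box_w) // 2
    top = (image_h - box_h) // 2
    return left, top, left + box_w, top + box_h


print(fixed_ratio_crop(1000, 500, (500, 500)))  # (250, 0, 750, 500)
```

The user could then move this box or zoom the image under it, but not change its ratio.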

Drawing

Drawing tools are essential for freeform sketching and masking. We should allow more flexibility + constraints to the python developer when creating their image components.

Proposal

  1. We should support multiple 'layers' of drawing with some simple, easy-to-follow heuristics about when a new layer should be created. The core use case is applying multiple masks to a single image.
    Each mask color should become a new layer. Reselecting a color that has previously been used will add to that existing layer.
  2. Introduce an eraser tool. Erasing is constrained to the currently selected mask.
  3. Brush sizes and colours should be more configurable and have a better UX.
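The layer heuristic in points 1 and 2 can be sketched in a few lines. This is only an illustration: LayerStack, draw and erase are hypothetical names, strokes are plain lists of points, and erasing is reduced to removing exact points.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]


@dataclass
class LayerStack:
    # One layer per brush color. Dicts keep insertion order, so layers are
    # ordered by when each color was first used.
    layers: Dict[str, List[List[Point]]] = field(default_factory=dict)

    def draw(self, color: str, stroke: List[Point]) -> None:
        # A new color creates a layer; a reused color extends its layer.
        self.layers.setdefault(color, []).append(stroke)

    def erase(self, color: str, point: Point) -> None:
        # Erasing only touches the currently selected layer (color).
        if color not in self.layers:
            return
        self.layers[color] = [
            [p for p in stroke if p != point] for stroke in self.layers[color]
        ]


stack = LayerStack()
stack.draw("red", [(0, 0), (1, 1)])
stack.draw("blue", [(5, 5)])
stack.draw("red", [(2, 2)])  # goes into the existing "red" layer
```

After these three calls there are two layers, red and blue, and the red layer holds two strokes.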

Python API

I think we should replace our existing brush_color and brush_radius options with something more flexible. Any options we have for brush, we probably need for eraser as well (other than color).

class Eraser:
  def __init__(
    self,
    sizes: list[int] = [5, 10, 20, 40, 70],
    default_size: int = 20,
    size_mode: Literal["fixed", "defaults"] = "defaults",
    antialias: bool = True,
  ):
    ...

class Brush:
  def __init__(
    self,
    sizes: list[int] = [5, 10, 20, 40, 70],
    default_size: int | None = None,
    size_mode: Literal["fixed", "defaults"] = "defaults",
    colors: list[str] | list[tuple[int, int, int, int]] | str = True,
    default_color: str | tuple[int, int, int, int] | None = None,
    color_mode: Literal["fixed", "defaults"] = "defaults",
    antialias: bool = True,
  ):
    ...

GUI

Pretty much as before. The UI will only show what has been passed into the python API. Brushes + eraser work similarly to the color swatches described above.

  • Selecting the drawing tool will show two additional icons: brush size and brush color.
  • Selecting the brush color icon shows swatches and a colorpicker depending on the options set.
  • Selecting the brush size icon will show preset sizes + a size picker depending on the options set. The 'size picker' would be a slider, similar to what we have today.
  • I think eraser should be its own tool separate from 'draw'. It would only have the brush size option from above.

Other things

  • fullscreen mode - this has been much requested and should be straightforward to implement. I imagine this would fill the screen and put the controls outside of the image, rather than on top of it. So a kind of modal.

Function signature

We have a few issues with the pre+postprocess signature for Image right now. The big one being how the shape changes depending on how the component is feeling on a given day. We also have users who want different things, some want separate layers, others want a composite image. I propose a fixed signature of this:

{
"bg": "image | color",
"layers": ["image", "image", "image"],
"composite": "image"
}

"image" is just whatever image formats we support/ want this to be. Color is a new thing I just invented because bgs can be simple.

  • bg: this is the background image or color as its own image
  • layers: these are the mask or sketch layers; this is a list because there could be many
  • composite: this is all of the layers squished into a single image
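For illustration, the fixed signature could be typed roughly like this. ImageValue, classify and count_masks are hypothetical names, and Any stands in for whichever image format ends up being supported.

```python
from __future__ import annotations

from typing import Any, List, TypedDict


class ImageValue(TypedDict):
    bg: Any            # background image, or a color
    layers: List[Any]  # one image per mask / sketch layer
    composite: Any     # all layers flattened into a single image


def classify(value: ImageValue) -> Any:
    # Users who only want the flattened result read one key.
    return value["composite"]


def count_masks(value: ImageValue) -> int:
    # Users who want separate layers get them without extra work.
    return len(value["layers"])
```

Because the shape never changes, both kinds of user can rely on the same three keys being present.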

Passing in masks

We can support passing in masks + bg images via the API (or returning them from an inference fn). I think we should support two ways of doing this. bg images should be settable to either an image or a color, just like the UI supports. Masks should be settable to either an image or 'mask data'.

I think the 'mask' data case is important because people may want to programmatically generate masks without generating an image. I propose a helper class for this:

class Line:
  def __init__(
    self,
    brush_size: int, # brush size is per line, not per mask
    points: list[tuple[int, int]] # x, y values - could also be a `Point` class but seemed overkill
  ):
    ...

class Mask:
  def __init__(
    self,
    lines: list[Line], # this could be a Dict
    color: str | tuple[int, int, int, int] = "#000", # masks are bound to a single color so this is per mask rather than per line
  ):
    ...
    
# you would probably create this in a loop or create it from data.    
layers = [
  Mask(lines = [Line(...), Line(...), Line(...)], color = "red",)
]


gr.Image(value={"bg": "red", "layers": layers})
# or
gr.Image(layers=layers, bg="red")
# or something

lmk!
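As an example of the 'create it from data' case, the sketch below builds mask layers from bounding boxes without ever rendering an image. Line and Mask are written as dataclasses here purely for brevity, and box_outline plus the detections data are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Box = Tuple[int, int, int, int]  # x0, y0, x1, y1


@dataclass
class Line:
    brush_size: int
    points: List[Tuple[int, int]]


@dataclass
class Mask:
    lines: List[Line]
    color: Union[str, Tuple[int, int, int, int]] = "#000"


def box_outline(x0: int, y0: int, x1: int, y1: int, brush_size: int = 10) -> Line:
    # Trace the four corners and return to the start to close the outline.
    return Line(brush_size, [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)])


# Hypothetical detector output: boxes grouped by the mask color they belong to.
detections: Dict[str, List[Box]] = {
    "red": [(10, 10, 50, 40)],
    "blue": [(0, 0, 5, 5), (20, 20, 30, 30)],
}

layers = [
    Mask(lines=[box_outline(*box) for box in boxes], color=color)
    for color, boxes in detections.items()
]
```

The resulting layers list is what would be passed as the "layers" value in the examples above.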

Layout

A few ideas around layouts below. This is what I was thinking in terms of keeping things simple + straightforward, but also (crucially) this kind of layout is pretty mobile friendly.

image layout

@pngwn
Member Author

pngwn commented Aug 1, 2023

@pngwn pngwn self-assigned this Aug 1, 2023
@pngwn pngwn added enhancement New feature or request 🖼️ image Image component labels Aug 1, 2023
@abidlabs
Member

abidlabs commented Aug 1, 2023

Great issue and mockups @pngwn! Sharing my thoughts on the various pieces of the proposal:

Backgrounds

Looks great. The paste icon will be very welcome!

Clicking the color icon will do different things based on options set in python

... all this seems kind of complicated, and I have not heard people specifically ask for a solid color as a background image before. Maybe we leave this out of v1?

The only part I'd push against is requiring a Python class to hold parameter options. I do see the point about providing additional flexibility down the line, but I strongly think that we should make the common use cases as simple as possible for users. One way to do that would be to accept either a list of sources, or an instance of ImageSources. For the vast majority of users (and particularly for users who are doing simple stuff with gradio), providing a list of sources will be much easier.
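Accepting either form could come down to one small normalisation step inside the component. A minimal sketch, assuming a trimmed-down ImageSources with only the four boolean flags; normalize_sources and VALID_SOURCES are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence, Union

VALID_SOURCES = ("upload", "webcam", "paste", "color")


@dataclass
class ImageSources:
    # Trimmed-down stand-in for the proposed options class.
    upload: bool = True
    webcam: bool = True
    paste: bool = True
    color: bool = True


def normalize_sources(
    sources: Union[Sequence[str], ImageSources, None],
) -> ImageSources:
    """Turn a list of source names (or None) into an ImageSources instance."""
    if sources is None:
        return ImageSources()  # default: every source enabled
    if isinstance(sources, ImageSources):
        return sources
    unknown = set(sources) - set(VALID_SOURCES)
    if unknown:
        raise ValueError(f"Unknown sources: {sorted(unknown)}")
    # Only the names that were listed are switched on.
    return ImageSources(**{name: name in sources for name in VALID_SOURCES})
```

With this, a plain list like ['upload', 'paste'] and a full ImageSources instance take the same code path after the first line of the component.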

Should we default to supporting all bg sources?

I think so, it could help people discover all the various sources.

Transforms

Again, I think it should be either a list of tools or an instance of the class. I personally find the parameter name tools more intuitive than transforms, but I'm okay with either.

I do not know what the resize would do, do we need this?

No I don't think so, we can just do crop and rotate.

We could maybe put the current shape kwarg into this which makes more sense because it is a crop/resize really:

I agree that the shape parameter is a bit confusing (particularly as we have a height and width as well). But shape feels quite different from the other tools. shape is set by the app developer to force a particular size and isn't a tool. Perhaps we can keep the name crop_size but keep it as a separate parameter from transforms. If an app developer sets a crop_size and enables the crop tool, then the crop tool will be fixed to that ratio.

If crop_size has been set then the canvas and any background images will be resized to fit those constraints, users could modify the crop by selecting the crop tool which will be fixed to that ratio. The crop overlay could be moved and the image zoomed but the crop overlay would not be editable. This is basically 'shape' but with a GUI.

Exactly, very cool. Since this is such a common use case, I would again suggest bringing crop_size out as a separate parameter, so that a user doesn't have to instantiate a class just to get a fixed crop_size.

Drawing

Proposed changes look quite nice. I'm personally more okay to have Brush and Eraser be classes here because the default values of Brush and Eraser are probably okay for the majority of users.

Each mask color should become a new layer. Reselecting a color that has previously been used will add to that existing layer.

This seems like a very strong assumption. What if users want to have a single color but have different layers, or have many colors in a single layer? Why not just let users create new layers by clicking a button, and then they can switch through the different layers as needed?

fullscreen mode - this has been much requested and should be straightforward to implement. I imagine this would fill the screen and put the controls outside of the image, rather than on top of it. So a kind of modal.

Nice

Function signature

Ok so I do like the proposed function signature, particularly as it allows easy access to both the individual layers as well as the composite image. However, this makes easy things (like image classification or image generation) much harder because now users have to deal with a dictionary inside their inference function. Which leads me to my main suggestion...


What if we separated Image into two classes: Image and ImageLayers?

The basic idea is that ImageLayers is what you've proposed above and Image is just like ImageLayers except without the ability to add multiple layers.

  • On the GUI side, this means that Image doesn't have the right 2 buttons for sketching / erasing.
  • On the function signature side, this means that Image doesn't return a dictionary of images, but rather a single image, which makes it very easy to work with in your inference function
  • The other advantage is that when people are creating custom components, they can clone Image instead of ImageLayers and have a much simpler codebase to work with.
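A rough sketch of how the two signatures would differ. The classes here are standalone stand-ins rather than real gradio components, and the payload shape is assumed to be the dictionary from the proposal.

```python
from __future__ import annotations

from typing import Any, Dict


class ImageLayers:
    """Stand-in for the full editor: the function receives every piece."""

    def preprocess(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {
            "bg": payload["bg"],
            "layers": payload["layers"],
            "composite": payload["composite"],
        }


class Image:
    """Stand-in for the simple component: the function receives one image."""

    def preprocess(self, payload: Dict[str, Any]) -> Any:
        # No layers to expose, so only the flattened image is handed over.
        return payload["composite"]


payload = {"bg": "bg.png", "layers": ["mask.png"], "composite": "flat.png"}
simple_input = Image().preprocess(payload)
layered_input = ImageLayers().preprocess(payload)
```

An image classification function would take simple_input directly, while a masking workflow would index into layered_input.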

Mockups

Look really nice!

@pngwn
Member Author

pngwn commented Aug 1, 2023

Just to clarify on the use of classes: I was suggesting a list of strings (literals), or the class when a user needs to set options for any of the individual tools. The exception is brush/eraser, just because there are so many possible options.

On bg colour, I don't feel very strongly but think it would be easy to add and seems like something people might want to do. I've seen @hysts generate solid bg images before and this would be easier.

On the layers, I think the auto layer thing would be way nicer when drawing masks. For other use cases it isn't as valuable. Maybe we could have a flag for it.

I'd be fine with a separate image component with a simpler signature.

@abidlabs
Member

abidlabs commented Aug 1, 2023

👍 looks like we're on the same page on all of these points

@gary149
Contributor

gary149 commented Aug 30, 2023

This looks really cool. If there's no tool activated (which is the default I guess) are there changes from the current image component? I'm asking because it would be great to have some options to size the image component inside apps.

@pngwn
Member Author

pngwn commented Oct 18, 2023

Sorry @gary149, I missed this. When you say 'size the image component inside apps', what do you mean exactly?
