
Investigate merging attention's saliency features for "smart" crop #295

Closed
rightaway opened this issue Nov 10, 2015 · 43 comments

@rightaway

Often when images are shrunk to generate thumbnails through resizing or cropping, the resulting image doesn't look very good just because of the content and the dimensions of the original image. But if there was a way to generate 'smart' thumbnails that's based on the content of the image, it would allow for much better thumbnails. For example, http://29a.ch/sandbox/2014/smartcrop/examples/testsuite.html.

There's a JS library that implements this https://github.com/jwagner/smartcrop.js, would it be possible to offer similar functionality in sharp?

@lovell
Owner

lovell commented Nov 10, 2015

Hello, might https://github.com/lovell/attention provide what you're looking for?

@rightaway
Author

Interesting!

So is the idea that the return values from attention.region, which are top/left/bottom/right, would be passed to sharp.extract? I imagine not because it doesn't pay attention to the width and height provided by sharp.resize. Is it better then to focus the image on the focal point provided by attention.point?

How would you use the x/y coordinates returned by attention.point in sharp? Ideally if sharp.crop optionally took an x/y coordinate instead of gravity, it could automatically center the image there while still respecting the other values passed to sharp such as the resize width and height.

So for example something like sharp.resize(300, 200).crop(125, 36) would offer up an image that's 300x200 centered at the point 125, 36 in the original image, which would be fantastic!

@lovell
Owner

lovell commented Nov 10, 2015

You've got the idea. It's quite experimental and the performance could probably be improved. I might consider merging some of the features of attention into sharp after #152 but for now you'll have to "do the math" yourself :)
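The "do the math" step can be sketched like this: centre a fixed-size crop on a focal point such as the { x, y } value attention.point returns, clamping so the rectangle stays inside the image. The cropAroundPoint helper is a hypothetical illustration, not part of sharp's or attention's API:

```javascript
// Centre a cropWidth x cropHeight rectangle on a focal point, keeping
// it within the image bounds. Illustrative helper only.
function cropAroundPoint(imageWidth, imageHeight, cropWidth, cropHeight, point) {
  // Ideal top-left corner places the focal point at the crop's centre.
  var left = Math.round(point.x - cropWidth / 2);
  var top = Math.round(point.y - cropHeight / 2);
  // Clamp so the crop never extends beyond the image edges.
  left = Math.max(0, Math.min(left, imageWidth - cropWidth));
  top = Math.max(0, Math.min(top, imageHeight - cropHeight));
  return { left: left, top: top, width: cropWidth, height: cropHeight };
}

// A focal point near the top-left corner gets clamped back into frame:
var region = cropAroundPoint(1000, 800, 300, 200, { x: 125, y: 36 });
```

The resulting rectangle could then be passed to sharp's extract operation (whose exact signature has varied between sharp versions).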

@lovell lovell changed the title Support for 'smart' cropping Investigate merging attention's saliency features for "smart" crop Nov 23, 2015
@homerjam

Would love to see sharp get some attention 😆

@vlapo
Contributor

vlapo commented Dec 15, 2015

+1

@homerjam

homerjam commented Feb 5, 2016

Hi @lovell, just wondering if there was any update on this - I'm about to start using attention in my image processing workflow unless there's an integration on the near horizon? Could I help at all (guessing it's beyond a simple PR though)?

@lovell
Owner

lovell commented Feb 5, 2016

@homerjam This is still planned but with nothing implemented yet. It'd be great to learn which features of attention you find useful to help prioritise what gets added to sharp.

@homerjam

homerjam commented Feb 5, 2016

Cool, well not to worry I will press on with attention as it stands.

First up I'm going to be using attention to find the focal point of an image (coming soon on karinatwiss.com). But I would also find the palette finder really useful - there are lots of cases where I'd like to match the dominant colours (or calculate complementary/opposite colours) of images to background colours.

@puzrin

puzrin commented Feb 5, 2016

Finding the focal point to generate thumbnails is enough for me. I'd guess that's the most in-demand use case.

@jwagner

jwagner commented Feb 11, 2016

Actually I have some integration of smartcrop with sharp. I'll probably release it together with the next release. :)

@lovell
Owner

lovell commented Feb 11, 2016

@jwagner 👍

@lovell
Owner

lovell commented Mar 5, 2016

Commit 2034efc on the needle branch adds an experimental implementation of the entropy-based method suggested by @jcupitt in https://github.com/lovell/attention/issues/8

Here's an example of how you might use this to generate auto-cropped 200px square thumbnails using Streams:

    var transformer = sharp().resize(200, 200).crop(sharp.strategy.entropy);
    readableStream.pipe(transformer).pipe(writableStream);

Feedback very much welcome.

@puzrin

puzrin commented Mar 5, 2016

One question. Do I understand right that the scale can vary? It selects a region with the requested width/height ratio, crops it, and scales it down to the exact size. Correct?

@lovell
Owner

lovell commented Mar 5, 2016

@puzrin The image is resized so at least one dimension is correct, then the edges of the remaining dimension are repeatedly cropped until it too is correct. (I've added this feature in such a way that, in the future, we could also use it to auto-extract a target width and height.)
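The repeated edge-cropping can be sketched with a toy one-dimensional version, in which columnScores stands in for a per-slice entropy measure. This illustrates the strategy only; it is not sharp's actual implementation:

```javascript
// Repeatedly discard whichever edge column scores lower until the
// remaining width matches the target. Scores stand in for entropy.
function trimToWidth(columnScores, targetWidth) {
  var left = 0;
  var right = columnScores.length; // exclusive
  while (right - left > targetWidth) {
    // Drop the edge column that carries less information.
    if (columnScores[left] < columnScores[right - 1]) {
      left += 1;
    } else {
      right -= 1;
    }
  }
  return { left: left, width: right - left };
}

// The low-scoring right-hand columns are trimmed away first:
var kept = trimToWidth([0, 1, 5, 5, 1, 0, 0], 3);
```

The same trimming would run along whichever axis is still too large after the initial resize.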

@puzrin

puzrin commented Mar 5, 2016

Got it. Probably I don't understand how the API was intended to be used. Let me describe my task:

  1. I have images of any size as input. Let's take 3000x1500 as an example.
  2. I want a 170x150 thumbnail as output, with "the most valuable content" in it.

It would be nice to have a simple call for that. I expect this use case to be the most in demand.

@lovell
Owner

lovell commented Mar 5, 2016

@puzrin My thinking here is that we can add further "strategies" more suited to the use case you describe. These might be things like "skin tones", "edges", "contrast" etc.

As with the approach used in attention, these techniques require training with salient region datasets to calculate suitable thresholds.

The initial entropy-based strategy is more about removing the least valuable edges rather than keeping the most valuable/salient regions - I'll try to make the docs clearer - thanks for the feedback!

@puzrin

puzrin commented Mar 5, 2016

Thanks for the explanation. After thinking a bit, a fuzzy edge cut will probably be enough for my needs.

@jcupitt
Contributor

jcupitt commented Mar 5, 2016

Yes, I like the "trim boring edges" strategy; it seems like a simple, reliable way to cut an image down that should need little training.

Most photos will not have a very small detail that you want to cut out. It must be much more common to just want to handle off-centre compositions automatically.

@lovell
Owner

lovell commented Mar 5, 2016

It looks like something in libgobject is causing hist_entropy to segfault with an "Access violation" on Windows. Here's the backtrace:

    msvcrt!strchr+0x2
    libgobject_2_0_0!g_param_spec_pool_lookup+0x8d
    libgobject_2_0_0!g_object_class_find_property+0x1a
    libvips_42!vips_object_get_argument+0x35
    libvips_42!vips_object_set_valist+0xb4
    libvips_42!vips_call_required_optional+0x1fe
    libvips_42!vips_call_split+0x92
    libvips_42!vips_log+0x32
    libvips_42!vips_hist_ismonotonic+0x30b
    libvips_42!vips_object_build+0x19
    libvips_42!vips_cache_operation_buildp+0x48
    libvips_cpp!vips::VImage::call_option_string+0x185
    libvips_cpp!vips::VImage::hist_entropy+0xf7
    sharp!sharp::EntropyCrop+0x11a

I'll investigate.

@lovell
Owner

lovell commented Mar 5, 2016

@jcupitt Should hist_entropy.c#L83 be

    vips_log( t[0], &t[1], NULL )

instead of

    vips_log( t[0], &t[1], 1.0 / sum, 0, NULL )

?

@jcupitt
Contributor

jcupitt commented Mar 5, 2016

Ooops, yes, looks like a copy-paste error.

jcupitt added a commit to libvips/libvips that referenced this issue Mar 6, 2016
there was a copy-paste error in the call to vips_log(), thanks Lovell

see lovell/sharp#295
@jcupitt
Contributor

jcupitt commented Mar 6, 2016

I fixed it in 8.2 and master, and added a test for it. Thanks for spotting the dumbness @lovell!

@lovell
Owner

lovell commented Mar 7, 2016

@jcupitt Fantastic, thank you.

@lovell
Owner

lovell commented Mar 22, 2016

The release of libvips v8.2.3 with the hist_entropy fix means this is now working on Windows too - https://ci.appveyor.com/project/lovell/sharp/build/372/job/3guwidhfb7hm0t3d

@calebshay

I'm looking for something similar enough that I didn't think I should make a new issue: trimming whitespace around an image. I've implemented it before using the vips Ruby bindings, but sharp doesn't expose the vips methods I would need. Ruby implementation below. Note that this implementation assumes that the image is already RGB(A), and would need more smarts to handle other colour spaces.

    def trim(img)
      alpha = nil
      # Remove the alpha channel, if there is one, as it breaks mask creation
      if img.bands == 4
        alpha = img.extract_band(3)
        img = img.extract_band(0, 3)
      end
      mask = img.less(240)
      columns, rows = mask.project
      left = columns.profile_h.min
      right = columns.x_size - columns.fliphor.profile_h.min
      top = rows.profile_v.min
      bottom = rows.y_size - rows.flipver.profile_v.min
      # Put the alpha channel back in, if it had one
      img = img.bandjoin(alpha.clip2fmt(img.band_fmt)) if alpha
      img = img.extract_area(left, top, right - left, bottom - top)
      img
    end

@lovell
Owner

lovell commented Mar 30, 2016

@calebshay This discussion is more about strategies for dealing with cropping-when-resizing. What you describe sounds like automated image extraction so feel free to create a new feature request for this.

(I see the possibility of combining the two approaches in one pipeline, e.g. extract non-whitespace then resize+crop using entropy.)

@lovell
Owner

lovell commented Apr 2, 2016

The entropy-based cropping strategy is in v0.14.0, now available via npm, thanks for all the comments and help here. I'm going to leave this task open to track further additions/improvements from attention and similar modules.

@lovell lovell removed this from the v0.14.0 milestone Apr 2, 2016
@puzrin

puzrin commented Apr 2, 2016

It seems to behave strangely. I've tried to create cropped thumbnails for images with a clear left focus and a clear right focus. Those are detected well by smartcrop.js, but not by sharp 0.14 (with the new crop param).

@puzrin

puzrin commented Apr 27, 2016

@lovell is the previous explanation clear enough, or should I provide more info?

I used this demo to compare result https://29a.ch/sandbox/2014/smartcrop/examples/testbed.html.

@lovell
Owner

lovell commented Apr 27, 2016

@puzrin Thanks, I think we've got plenty of info at the moment. When this is revisited we can look at training/thresholding algorithms with data sets such as this.

@puzrin

puzrin commented Apr 27, 2016

Glad to hear it. My test case is cropping a 4:3 ratio image to 170x150 pixels (downscale + cut the left & right sides a bit). Your link has at least one image (with focus on the left) that's good for checking the algorithm. It should cut such images from one side only.

@puzrin

puzrin commented Jun 4, 2016

@lovell do you have any estimates/priorities for revisiting smartcrop feature?

@lovell
Owner

lovell commented Jun 4, 2016

Hello @puzrin, I still plan to revisit this feature to add "trained" crop strategies. This and #236 seem to be the most popular/requested/useful new features, so I'll look at both of these over the next few months.

@jwagner

jwagner commented Jun 25, 2016

A little update on integrating smartcrop with sharp. I have released smartcrop 1.0 along with smartcrop-sharp now. It's not super efficient right now as the image needs to be decoded twice (once for smartcrop, once for operating on it with sharp). But in practice it works quite well. :)

@homerjam

Oooooh lovely @jwagner, thanks!

@lovell
Owner

lovell commented Oct 3, 2016

Update/teaser:

The following graph shows image count (y-axis) against % error (x-axis) for the existing entropy-based crop strategy (dark blue) vs the attention-based strategy (green). Closer to the origin is closer to the "ground truth" and therefore better, so the attention-based approach is the relative winner in terms of accuracy.

(The MSRA Salient Object Database image set B was used as the source of "ground truth".)

The attention-based strategy is currently ~50% faster than entropy, typically adding <50ms to processing time, but work continues to fine-tune both accuracy and performance.

@jcupitt
Contributor

jcupitt commented Oct 4, 2016

That's a fantastic graph Lovell! Very nice work. I should look at your attention crop code.

@lovell lovell added this to the v0.16.1 milestone Oct 11, 2016
@lovell
Owner

lovell commented Oct 11, 2016

The attention branch adds experimental support for a crop "strategy" based on a slightly modified+simplified version of the original logic in the attention module.

    sharp(input).resize(200, 200).crop(sharp.strategy.attention)...

@lovell
Owner

lovell commented Oct 12, 2016

Commit 18b9991 adds this to the master branch ready for inclusion in v0.16.1.

@rightaway
Author

@lovell What would be the difference between using the original attention package with sharp, vs using sharp.strategy.attention?

In layman's terms what's the difference between the entropy and attention based strategies?

@homerjam

Not sure this qualifies as layman's terms but here's a bit of an explanation...

@lovell
Owner

lovell commented Oct 12, 2016

sharp will be getting an updated+improved version of the focal-point logic from attention, made available via crop() when resizing to fixed dimensions, which I believe is its most common/popular use case.

entropy ranks regions based on their vips_hist_entropy value, or "which bit of the image has the most energy?"

attention converts image regions to the LAB and LCH colourspaces and generates 3 masks:

  1. luminance frequency: edge detection on the L channel via the Sobel operator, or "which bit of the image has the biggest change in brightness?"
  2. colour saturation: include only pixels from the C channel of LCH where the value is >~50%, or "which bit of the image has the most saturated colour?"
  3. skin tones: include only pixels where AB chroma is within a range trained with http://humanae.tumblr.com/ , or "which bit of the image contains humans?"

...then adds them together and finds the maximum value to rank regions.
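As a toy illustration of that final step, per-pixel masks can be summed and the position of the maximum taken as the most salient point. Hand-made arrays stand in for the real luminance/saturation/skin-tone masks here; this sketches the idea rather than sharp's code:

```javascript
// Sum per-pixel masks (flat row-major arrays) and return the
// coordinates of the maximum combined value.
function salientPoint(masks, width) {
  var combined = masks[0].map(function (_, i) {
    return masks.reduce(function (sum, mask) { return sum + mask[i]; }, 0);
  });
  var best = 0;
  for (var i = 1; i < combined.length; i++) {
    if (combined[i] > combined[best]) best = i;
  }
  return { x: best % width, y: Math.floor(best / width) };
}

// 2x2 example: two of the three masks agree on the pixel at (1, 0).
var point = salientPoint([
  [0, 1, 0, 0], // luminance edges
  [0, 1, 0, 0], // colour saturation
  [0, 0, 1, 0]  // skin tones
], 2);
```

The real strategy ranks candidate crop regions by these combined values rather than a single pixel, but the mask-sum-argmax shape is the same.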

@lovell
Owner

lovell commented Oct 13, 2016

v0.16.1 now available via npm. Thanks everyone for your help with this!
