Investigate merging attention's saliency features for "smart" crop #295

Closed
rightaway opened this Issue Nov 10, 2015 · 43 comments

8 participants
@rightaway commented Nov 10, 2015

Often when images are shrunk to generate thumbnails through resizing or cropping, the resulting image doesn't look very good simply because of the content and dimensions of the original image. But if there were a way to generate 'smart' thumbnails based on the content of the image, it would allow for much better thumbnails. For example, http://29a.ch/sandbox/2014/smartcrop/examples/testsuite.html.

There's a JS library that implements this (https://github.com/jwagner/smartcrop.js). Would it be possible to offer similar functionality in sharp?

@lovell (Owner) commented Nov 10, 2015

Hello, might https://github.com/lovell/attention provide what you're looking for?

@lovell lovell added the question label Nov 10, 2015

@rightaway commented Nov 10, 2015

Interesting!

So is the idea that the return values from attention.region, which are top/left/bottom/right, would be passed to sharp.extract? I imagine not because it doesn't pay attention to the width and height provided by sharp.resize. Is it better then to focus the image on the focal point provided by attention.point?

How would you use the x/y coordinates returned by attention.point in sharp? Ideally if sharp.crop optionally took an x/y coordinate instead of gravity, it could automatically center the image there while still respecting the other values passed to sharp such as the resize width and height.

So for example something like sharp.resize(300, 200).crop(125, 36) would offer up an image that's 300x200 centered at the point 125, 36 in the original image, which would be fantastic!

@lovell (Owner) commented Nov 10, 2015

You've got the idea. It's quite experimental and the performance could probably be improved. I might consider merging some of the features of attention into sharp after #152 but for now you'll have to "do the math" yourself :)
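To "do the math" in the meantime, the approach sketched above (use attention.point's x/y, centre a crop there, respect the target width/height) can be expressed as a small helper. This is an illustrative sketch, not part of sharp or attention: the function name and signature are assumptions, and the result is what you would pass to sharp's extract() before resizing.

```javascript
// Hypothetical helper: given the original image dimensions, a focal point
// (e.g. from attention.point), and a target thumbnail size, compute the
// largest crop with the target aspect ratio, centred on the focal point
// and clamped so it never falls outside the image.
function cropAroundPoint(imageWidth, imageHeight, focalX, focalY, targetWidth, targetHeight) {
  // Largest scale at which a targetWidth x targetHeight region still fits.
  const scale = Math.min(imageWidth / targetWidth, imageHeight / targetHeight);
  const cropWidth = Math.round(targetWidth * scale);
  const cropHeight = Math.round(targetHeight * scale);

  // Centre the crop on the focal point, then clamp to the image bounds.
  const left = Math.max(0, Math.min(Math.round(focalX - cropWidth / 2), imageWidth - cropWidth));
  const top = Math.max(0, Math.min(Math.round(focalY - cropHeight / 2), imageHeight - cropHeight));

  return { left, top, width: cropWidth, height: cropHeight };
}
```

The returned region could then feed a pipeline like sharp(input).extract(region).resize(targetWidth, targetHeight).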

@lovell lovell changed the title Support for 'smart' cropping Investigate merging attention's saliency features for "smart" crop Nov 23, 2015

@lovell lovell added enhancement and removed question labels Nov 23, 2015

@homerjam commented Dec 14, 2015

Would love to see sharp get some attention 😆

@vlapo (Contributor) commented Dec 15, 2015

+1

@homerjam commented Feb 5, 2016

Hi @lovell, just wondering if there was any update on this - I'm about to start using attention in my image processing workflow unless there's an integration on the near horizon? Could I help at all (guessing it's beyond a simple PR though)?

@lovell (Owner) commented Feb 5, 2016

@homerjam This is still planned but with nothing implemented yet. It'd be great to learn which features of attention you find useful to help prioritise what gets added to sharp.

@homerjam commented Feb 5, 2016

Cool, well not to worry I will press on with attention as it stands.

First up I'm going to be using attention to find the focal point of an image (coming soon on karinatwiss.com). But I'd also find the palette finder really useful - there are lots of cases where I'd like to match the dominant colours (or calculate complementary/opposite colours) of images to background colours.

@puzrin commented Feb 5, 2016

Focal point to generate thumbnails is enough for me. I guess that's the most demanded method.

@jwagner commented Feb 11, 2016

Actually I have some integration of smartcrop with sharp. I'll probably release it together with the next release. :)

@lovell (Owner) commented Feb 11, 2016

@jwagner 👍

@lovell (Owner) commented Mar 5, 2016

Commit 2034efc on the needle branch adds an experimental implementation of the entropy-based method suggested by @jcupitt in lovell/attention#8

Here's an example of how you might use this to generate auto-cropped 200px square thumbnails using Streams:

    var transformer = sharp().resize(200, 200).crop(sharp.strategy.entropy);
    readableStream.pipe(transformer).pipe(writableStream);

Feedback very much welcome.

@puzrin commented Mar 5, 2016

One question. Do I understand right that the scale can vary? It selects a region with the requested width/height ratio, crops it, and scales it down to the exact size. Correct?

@lovell (Owner) commented Mar 5, 2016

@puzrin The image is resized so at least one dimension is correct, then the edges of the remaining dimension are repeatedly cropped until it too is correct. (I've added this feature in such a way that, in the future, we could also use it to auto-extract a target width and height.)
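The "repeatedly cropped until it too is correct" step can be pictured with a small sketch. This is illustrative only, not sharp's actual implementation: it assumes you already have an "interest" score per column of the resized image (sharp uses entropy for this) and discards whichever outer edge scores lower until the target width is reached.

```javascript
// Illustrative edge-trimming loop: given per-column interest scores for an
// image already resized so its height is correct, repeatedly drop the less
// interesting outer column until the remaining width matches the target.
// (sharp scores regions by entropy; the scores here are just numbers.)
function trimToWidth(columnScores, targetWidth) {
  let left = 0;
  let right = columnScores.length; // exclusive upper bound

  while (right - left > targetWidth) {
    // Remove the lower-scoring edge; ties trim from the left.
    if (columnScores[left] <= columnScores[right - 1]) {
      left += 1;
    } else {
      right -= 1;
    }
  }
  return { left, width: right - left };
}
```

The same loop run over rows instead of columns handles the case where the width was the dimension that came out correct after the resize.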

@puzrin commented Mar 5, 2016

Got it. Probably I don't understand how the API was intended to be used. Let me describe my task:

  1. I have images of any size on input. Let's take 3000x1500 for example.
  2. I wish to get 170x150 thumbnail on output, with "the most valuable content in it".

It would be nice to have a simple call for that. I expect this use case to be the most demanded.

@lovell (Owner) commented Mar 5, 2016

@puzrin My thinking here is that we can add further "strategies" more suited to the use case you describe. These might be things like "skin tones", "edges", "contrast" etc.

As with the approach used in attention, these techniques require training with salient region datasets to calculate suitable thresholds.

The initial entropy-based strategy is more about removing the least valuable edges rather than keeping the most valuable/salient regions - I'll try to make the docs clearer - thanks for the feedback!

@puzrin commented Mar 5, 2016

Thanks for the explanation. After thinking a bit, a fuzzy edge cut will probably be enough for my needs.

@jcupitt commented Mar 5, 2016

Yes, I like the trim boring edges strategy, it seems like a simple, reliable way to cut an image down that should need little training.

Most photos will not have a very small detail that you want to cut out. It must be much more common to just want to handle off-centre compositions automatically.

@lovell (Owner) commented Mar 5, 2016

It looks like something in libgobject is causing hist_entropy to segfault with an "Access violation" on Windows. Here's the backtrace:

msvcrt!strchr+0x2
libgobject_2_0_0!g_param_spec_pool_lookup+0x8d
libgobject_2_0_0!g_object_class_find_property+0x1a
libvips_42!vips_object_get_argument+0x35
libvips_42!vips_object_set_valist+0xb4
libvips_42!vips_call_required_optional+0x1fe
libvips_42!vips_call_split+0x92
libvips_42!vips_log+0x32
libvips_42!vips_hist_ismonotonic+0x30b
libvips_42!vips_object_build+0x19
libvips_42!vips_cache_operation_buildp+0x48
libvips_cpp!vips::VImage::call_option_string+0x185
libvips_cpp!vips::VImage::hist_entropy+0xf7
sharp!sharp::EntropyCrop+0x11a

I'll investigate.

@lovell (Owner) commented Mar 5, 2016

@jcupitt Should hist_entropy.c#L83 be

vips_log( t[0], &t[1], NULL )

instead of

vips_log( t[0], &t[1], 1.0 / sum, 0, NULL )

?

@jcupitt commented Mar 5, 2016

Ooops, yes, looks like a copy-paste error.

jcupitt added a commit to libvips/libvips that referenced this issue Mar 6, 2016

fix hist_entropy
there was a copy-paste error in the call to vips_log(), thanks Lovell

see lovell/sharp#295
@jcupitt commented Mar 6, 2016

I fixed it in 8.2 and master, and added a test for it. Thanks for spotting the dumbness @lovell!

@lovell (Owner) commented Mar 7, 2016

@jcupitt Fantastic, thank you.

@lovell (Owner) commented Mar 22, 2016

The release of libvips v8.2.3 with the hist_entropy fix means this is now working on Windows too - https://ci.appveyor.com/project/lovell/sharp/build/372/job/3guwidhfb7hm0t3d

@calebshay commented Mar 30, 2016

I'm looking for something similar enough that I didn't think I should make a new issue: trimming whitespace around an image. I've implemented it before using the vips Ruby bindings, but sharp doesn't expose the vips methods I would need. Ruby implementation below. Note that this implementation assumes that the image is already RGB(A), and would need more smarts to handle other color spaces.

    def trim(img)
      alpha = nil
      # Remove the alpha channel, if there is one, as it breaks mask creation
      if img.bands == 4
        alpha = img.extract_band(3)
        img = img.extract_band(0, 3)
      end
      mask = img.less(240)
      columns, rows = mask.project
      left = columns.profile_h.min
      right = columns.x_size - columns.fliphor.profile_h.min
      top = rows.profile_v.min
      bottom = rows.y_size - rows.flipver.profile_v.min
      # Put the alpha channel back in, if it had one
      img = img.bandjoin(alpha.clip2fmt(img.band_fmt)) if alpha
      img = img.extract_area(left, top, right - left, bottom - top)
      img
    end

@lovell (Owner) commented Mar 30, 2016

@calebshay This discussion is more about strategies for dealing with cropping-when-resizing. What you describe sounds like automated image extraction so feel free to create a new feature request for this.

(I see the possibility of combining the two approaches in one pipeline, e.g. extract non-whitespace then resize+crop using entropy.)

@lovell (Owner) commented Apr 2, 2016

The entropy-based cropping strategy is in v0.14.0, now available via npm, thanks for all the comments and help here. I'm going to leave this task open to track further additions/improvements from attention and similar modules.

@lovell lovell removed this from the v0.14.0 milestone Apr 2, 2016

@puzrin commented Apr 2, 2016

It seems to work strangely. I've tried to create cropped thumbnails for images with a clear left focus and a clear right focus. Those are detected well by smartcrop.js, but not with sharp 0.14 (with the new crop param).

@puzrin commented Apr 27, 2016

@lovell is the previous explanation clear enough, or should I provide more info?

I used this demo to compare result https://29a.ch/sandbox/2014/smartcrop/examples/testbed.html.

@lovell (Owner) commented Apr 27, 2016

@puzrin Thanks, I think we've got plenty of info at the moment. When this is revisited we can look at training/thresholding algorithms with data sets such as this.

@puzrin commented Apr 27, 2016

Glad to know. My test case is cropping a 4:3 ratio image to 170×150 pixels (downscale + cut the left & right sides a bit). Your link has at least one image (with focus on the left) that's good for an algorithm check. It should cut such images from one side only.

@puzrin commented Jun 4, 2016

@lovell do you have any estimates/priorities for revisiting smartcrop feature?

@lovell (Owner) commented Jun 4, 2016

Hello @puzrin, I still plan to revisit this feature to add "trained" crop strategies. This and #236 seem to be the most popular/requested/useful new features, so I'll look at both of these over the next few months.

@jwagner commented Jun 25, 2016

A little update on integrating smartcrop with sharp. I have released smartcrop 1.0 along with smartcrop-sharp now. It's not super efficient right now as the image needs to be decoded twice (once for smartcrop, once for operating on it with sharp). But in practice it works quite well. :)

@homerjam commented Jun 25, 2016

Oooooh lovely @jwagner, thanks!

@lovell (Owner) commented Oct 3, 2016

Update/teaser:

The following graph shows image count (y-axis) against % error (x-axis) for the existing entropy-based crop strategy (dark blue) vs the attention-based strategy (green). Closer to the origin is closer to the "ground truth" and therefore better, so the attention-based approach is the relative winner in terms of accuracy.

(The MSRA Salient Object Database image set B was used as the source of "ground truth".)

The attention-based strategy is currently ~50% faster than entropy, typically adding <50ms to processing time, but work continues to fine-tune both accuracy and performance.

@jcupitt commented Oct 4, 2016

That's a fantastic graph Lovell! Very nice work. I should look at your attention crop code.

@lovell lovell added this to the v0.16.1 milestone Oct 11, 2016

@lovell (Owner) commented Oct 11, 2016

The attention branch adds experimental support for a crop "strategy" based on a slightly modified+simplified version of the original logic in the attention module.

    sharp(input).resize(200, 200).crop(sharp.strategy.attention)...

@lovell (Owner) commented Oct 12, 2016

Commit 18b9991 adds this to the master branch ready for inclusion in v0.16.1.

@rightaway commented Oct 12, 2016

@lovell What would be the difference between using the original attention package with sharp, vs using sharp.strategy.attention?

In layman's terms what's the difference between the entropy and attention based strategies?

@homerjam commented Oct 12, 2016

Not sure this qualifies as layman's terms but here's a bit of an explanation...

@lovell (Owner) commented Oct 12, 2016

sharp will be getting an updated+improved version of the focal-point logic from attention, made available via crop() when resizing to fixed dimensions, which I believe is its most common/popular use case.

entropy ranks regions based on their vips_hist_entropy value, or "which bit of the image has the most energy?"

attention converts image regions to the LAB and LCH colourspaces and generates 3 masks:

  1. luminance frequency: edge detection on the L channel via the Sobel operator, or "which bit of the image has the biggest change in brightness?"
  2. colour saturation: include only pixels from the C channel of LCH where the value is >~50%, or "which bit of the image has the most saturated colour?"
  3. skin tones: include only pixels where AB chroma is within a range trained with http://humanae.tumblr.com/ , or "which bit of the image contains humans?"

...then adds them together and finds the maximum value to rank regions.
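The "adds them together and finds the maximum" step can be sketched in a few lines. This is a rough illustration of the combination logic only, with made-up mask values; the actual mask generation (Sobel edges, LCH thresholding, skin-tone ranges) lives in sharp's native code.

```javascript
// Illustrative sketch: sum three per-pixel saliency masks into one, then
// locate the strongest pixel of the combined mask. Real masks come from
// the luminance/saturation/skin-tone analysis described above; the arrays
// here are just placeholder numbers.
function combineMasks(luminance, saturation, skin) {
  return luminance.map((v, i) => v + saturation[i] + skin[i]);
}

function argmax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i += 1) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}
```

The index of the maximum (converted back to x/y for a 2D mask) is what anchors the crop region.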

@lovell (Owner) commented Oct 13, 2016

v0.16.1 now available via npm. Thanks everyone for your help with this!
