Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added suport for AVIF to getimagesize(). #5127

Closed
wants to merge 3 commits into from

Conversation

y-hashida
Copy link

@y-hashida y-hashida commented Jan 29, 2020

AVIF is a compression format for images derived from the AV1 video format.
WebP's file reduction is -24.5%, whereas AVIF's file reduction is -61.1%, so AVIF format is enable highly efficient compression[1].

In this pull request, for introducing AVIF, We added AVIF support for getimagesize() and used image/avif as MIME type.

References
[1]:https://aomedia.org/wp-content/uploads/2019/11/CyrilConcolato_Netflix-AVIF-AOM-Research-Symposium-2019.pdf
[2]:https://aomediacodec.github.io/av1-avif
[3]:https://www.iso.org/standard/66067.html

@nikic
Copy link
Member

nikic commented Feb 4, 2020

Is there any publicly available spec for this? [2] doesn't seem to correlate with this code, the relevant parts seem to depend on HEIF instead.

@y-hashida
Copy link
Author

Thank you for replying!

The specification document is charged, but there is a reference implementation publicly.
The following is introduced.

@Girgias
Copy link
Member

Girgias commented May 8, 2020

Just to note, please rebase onto master instead of merging master into your feature branch.
At the same time you may want to 'fixup' (by using git rebase -i master) the commits after the initial one to condense them into one commit.

@y-hashida
Copy link
Author

y-hashida commented May 9, 2020

I'm sorry
I try to use rebase.

@y-hashida
Copy link
Author

I used rebase .
Is this all right?

@Girgias
Copy link
Member

Girgias commented May 9, 2020

I used rebase .
Is this all right?

Perfect :)

@andypost
Copy link
Contributor

andypost commented Mar 7, 2021

Libgd just added support for avif/heif via libavif and libheif in 2.3.2 release.

There's related feature request https://bugs.php.net/bug.php?id=78832

Looks both libraries needs dependencies evaluation but maybe it's enough to add IMAGE_AVIF first, like webP used to before

@morsssss
Copy link
Contributor

morsssss commented Mar 8, 2021

Hey, I'm excited to see this! I'm the one who wrote the PR for AVIF support in libavif. I was looking at adding this support for getimagesize() next, but I'm pretty thrilled to see the work's already been done :)

@y-hashida
Copy link
Author

@morsssss Thank you!!
If you don't mind, may I ask you to check this feature?

@morsssss
Copy link
Contributor

Hi @y-hashida !

I took a look here - but I admit that the details about the contents of the header file are out of my range of experience.

I wonder if @wantehchang would be familiar with this...

@morsssss
Copy link
Contributor

Or perhaps @joedrago ? Joe, if you've got a moment, could you help make sure this is being done properly?

@joedrago
Copy link

I'm no expert on PHP, but this implementation certainly makes a lot of assumptions.

It appears to be blindly looking for fourcc codes like pixi and ispe and assuming that the bytes after that are for the image item you're about to decode. This might work often, but there no guarantees at all this will always work, as there are many legal ways to package AVIFs that will have multiple "items", each with potentially their own pixi and ispe properties (such as thumbnails or auxiliary images), let alone false positives in AV1 bitstreams that might happen to match the 4-byte patterns here. To ensure that you're reading an actual pixi or ispe box that correlates to the primary item, you'd have to be willing to find the pitm box find out which item ID is the primary item, then find which item properties are associated to that item, etc. It is much more complicated than this.

Long story short, this is certainly "wrong", but I don't know what level of accuracy you are looking for here. For the most common single-image, color-item-happens-to-come-before-alpha-item, mdat-comes-after-meta-box, no-false-positives-in-the-AV1-payload AVIF, this will likely find the right numbers. It could occasionally be spectacularly wrong though, and I don't know if that is worth the extra effort here, or if that could simply be warned in the docs and people would have to protect themselves accordingly.

@morsssss
Copy link
Contributor

Thanks, @joedrago !

@y-hashida , we're looking for another way to do this...

@wantehchang
Copy link

If it is okay to use libavif, then the avifDecoderParse() function can be used to get the width, height, bit depth, and the number of channels from the AVIF image container.

@morsssss
Copy link
Contributor

If it is okay to use libavif, then the avifDecoderParse() function can be used to get the width, height, bit depth, and the number of channels from the AVIF image container.

It would be quite convenient to use libavif! I just fear that the PHP team won't to include that as an additional dependency - especially as none of the other supported image formats come with a similar requirement.

If we didn't have another solution, maybe a compromise would involve including the AVIF functionality only if libavif was found during build. I see that sort of mechanism for other image formats in the GD config here - but I don't know if that propagates into this part of the codebase.

@morsssss
Copy link
Contributor

@cmb69 , would you like to weigh in here?

@morsssss
Copy link
Contributor

Offline discussion with @joedrago indicates that it's pretty hard to get the image's characteristics consistently without using libavif. And no one loves the idea of conditionally including libavif.

However, it is not hard to confirm that the image type is actually AVIF. And, at least, that's all that CMSs like Joomla, WordPress, and Drupal use this function for.

@cmb69 proposed that we let getimagesize() simply get the image type for AVIF, reporting false or null for the dimensions etc. This seems quite doable, and we could always expand support in later versions. We'd just need to document the behavior.

@y-hashida , how do you feel about this? I wish I had a better solution - but this is a start! And, as @cmb69 points out, people with libavif would be easily able to get the image's dimensions once it was decoded.

@y-hashida
Copy link
Author

y-hashida commented Mar 16, 2021

Sorry for the late reply.

we let getimagesize() simply get the image type for AVIF,

people with libavif would be easily able to get the image's dimensions once it was decoded.

I see!
I think that's a very smart solution.

Let me confirm to make sure.
Is my understanding correct?

  • If people have no "libavif": getimagesize() returns NULL
  • if people have "libavif": getimagesize() returns image size using "libavif"

@y-hashida
Copy link
Author

I want to follow the latest version of php, so I will use git rebase -i master.

@morsssss
Copy link
Contributor

Actually... I believe the idea is to have getimagesize() return only the image type if it's an AVIF image - and nothing else. It would never use libavif. So, the $image_info array would contain an IMAGETYPE constant at index 2 - but nothing useful at indices 0 or 1.

@morsssss
Copy link
Contributor

@joedrago has a nice algorithm which can be borrowed from libavif. I could post it here, but I'd like to give him the chance to do so himself first if he wants to ☺️

@joedrago
Copy link

(Reworded and tweaked from a private email chain:)

Detecting if a file is self-identifying as AVIF is not too tough, due to various restrictions with BMFF's FileTypeBox. I have an obvious bias here, but this is a fairly simple C routine from libavif called avifPeekCompatibleFileType() that shouldn't be too hard to re-implement:

https://github.com/AOMediaCodec/libavif/blob/master/src/read.c#L2210

These two snippets are from the BMFF standard (ISO 14496-12), of which AVIF is built on top of:

image

image

The general idea here is that the first "box" inside of the file must be a FileTypeBox, which means you can simply read the box's header (4 bytes for size, 4 bytes for type), and if the type isn't ftyp, it is not an AVIF.

If you do see ftyp here though, this file is probably BMFF-derived (which could be any kind of modern video, HEIF, etc). So to really be sure that it is an AVIF, you must look inside of this box to see if any of the brands are the brand avif (libavif also looks for and allows avis). The format is simple enough (as the screencap above shows). You basically have a "major brand" which is really just which brand this file mainly considers itself to be, then an array of "compatible brands" which this file could also legally be parsed as.

To recap, you'd check to see from the first 8 bytes if the file even smells a little bit like BMFF, and if so, read in the rest of the box (based on the length supplied by the first 4 bytes) to see if avif (or avis) is either the major_brand or in the compatible_brands list. If the right brand is in there, you have a file that at least thinks it is an AVIF. 😄

@morsssss
Copy link
Contributor

btw @y-hashida I'm happy to take this on, but wanted to give you the chance to adjust your own PR first! I didn't want to try to take this over if you preferred to do it yourself.

@y-hashida
Copy link
Author

Thanks for your concern.
But I am currently in a particularly busy period at work, so I will not be able to start this work for at least the rest of March.
If you are in a hurry, I can have you take over, how about it?

@joedrago
Copy link

joedrago commented Jun 2, 2021

No, there isn't a limit; in fact, there are mechanisms (see the implementation of Box) to allow for a 64bit size!

That said, I think it is quite reasonable to put an artificial limit on your ftyp box size. I believe Chromium happens to cap it at ~144 bytes(?), which would allow for something like 32 compatible brands, which is plenty.

Very, very minor aside: I glanced at your code, and one minor note I'd make is changing the name of the helper function php_little2bigendian(), as it might be a little misleading. In the place it is being used, it is really converting network order (which is actually "big") to host order (which can be either, and this implementation works for both). The actual implementation is perfectly fine; I just think it might be better served to have a name like the canonical ntohl. The actual credit for that implementation is really Rob Pike's though, from a rant of his back in 2012 (attributed in libavif here!).

@morsssss
Copy link
Contributor

morsssss commented Jun 2, 2021

Thanks, @joedrago ! Now I see why that implementation is written in that way... making the change, and giving credit where credit is due :)

@morsssss
Copy link
Contributor

See #7091

@andypost
Copy link
Contributor

andypost commented Jul 21, 2021

It could be closed? The current work in #7253

Meantime used to try build --with-avif and libavif 0.9.2 on Alpinelinux and got in test

011+ Encoding AVIF with default quality: 
012+ Warning: imageavif(): avif error - Could not encode image: No codec available in /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/avif_decode_encode.php on line 22
013+ ok

at the build time

/builds/alpine/aports/community/php8/src/php-8.1.0beta1/ext/gd/libgd/gd_avif.c: In function 'readFromCtx':
/builds/alpine/aports/community/php8/src/php-8.1.0beta1/ext/gd/libgd/gd_avif.c:173:32: warning: comparison of unsigned expression in '< 0' is always false [-Wtype-limits]
  173 |  if (offset > LONG_MAX || size < 0)
      |                                ^

Also getting

TEST 2/2 [ext/gd/tests/imagecreatefromstring_avif.phpt]
========DIFF========
--
     int(250)
     int(375)
     Reading image with a compatible brand that's 'avif':
005+ 
006+ Warning: imagecreatefromstring(): avif error - Could not parse image: BMFF parsing failed in /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/imagecreatefromstring_avif.php on line 8
005- int(480)
006- int(270)
007+ 
008+ Warning: imagecreatefromstring(): Passed data is not in "AVIF" format in /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/imagecreatefromstring_avif.php on line 8
009+ 
010+ Warning: imagecreatefromstring(): Couldn't create GD Image Stream out of Data in /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/imagecreatefromstring_avif.php on line 8
011+ 
012+ Fatal error: Uncaught TypeError: imagesx(): Argument #1 ($image) must be of type GdImage, bool given in /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/imagecreatefromstring_avif.php:9
013+ Stack trace:
014+ #0 /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/imagecreatefromstring_avif.php(9): imagesx(false)
015+ #1 {main}
016+   thrown in /mnt/community/php8/src/php-8.1.0beta1/ext/gd/tests/imagecreatefromstring_avif.php on line 9
========DONE========
FAIL imagecreatefromstring() - AVIF format [ext/gd/tests/imagecreatefromstring_avif.phpt] 

@cmb69
Copy link
Member

cmb69 commented Jul 21, 2021

if (offset > LONG_MAX || size < 0)

looks wrong. Is it supposed to check offset < 0?

@nikic
Copy link
Member

nikic commented Jul 21, 2021

@cmb69 Both size < 0 and offset < 0 don't make sense, as they're both unsigned. We should probably cherry-pick libgd/libgd@f6a111c?

@cmb69
Copy link
Member

cmb69 commented Jul 21, 2021

@nikic, oh, right!

@morsssss
Copy link
Contributor

Yep, I'm pretty sure I was asked to remove checks for < 0 for at least one unsigned variable in libgd :)

@andypost, the test failure you just got was just recently fixed in 5889a4a .

@andypost andypost mentioned this pull request Jul 21, 2021
@andypost
Copy link
Contributor

@cmb69 Both size < 0 and offset < 0 don't make sense, as they're both unsigned. We should probably cherry-pick libgd/libgd@f6a111c?

Filed PR #7296

nikic pushed a commit that referenced this pull request Jul 22, 2021
@ophian
Copy link

ophian commented Sep 23, 2021

Upcoming PHP 8.1 is announced with full AVIF support.
But with PHP 8.1-RC2 PHP getimagesize() still fails with avif files. IMHO it is essential to have!

Do any of you know why this isn't supported yet?

@cmb69
Copy link
Member

cmb69 commented Sep 23, 2021

getimagesize() should recognize AVIF files. Can you please provide a failing AVIF?

@ophian
Copy link

ophian commented Sep 23, 2021

Sorry for being unclear. Essential to have are key 0 and key 1 for width and height. These are converted jpegs to avif format by ImageMagick and/or GD with the new PHP 8.1-RC2 avif methods.

Array
(
    [0] => 0
    [1] => 0
    [2] => 19
    [3] => width="0" height="0"
    [mime] => image/avif
)

BTW, do they have a [bits] key like webp files?

@morsssss
Copy link
Contributor

Hi @ophian . See this comment above... unfortunately it wasn't practical for getimagesize() to include the size of AVIF images - since it doesn't have access to libgd or libavif.

The good news is that a group of folks at Google who work with AVIF are working right now on a function that would in fact be able to detect image size. If they succeed, I'll be submitting a PR here :)

@andypost
Copy link
Contributor

@morsssss what's left here? this PR looks outdated

@cmb69
Copy link
Member

cmb69 commented Nov 22, 2021

Oh, we already merged PR #7091. Are there, maybe, some useful improvements in this PR? Otherwise the PR should be closed.

@morsssss
Copy link
Contributor

Indeed, this PR has been superseded by #7091 and others. It can be closed.

@cmb69
Copy link
Member

cmb69 commented Nov 22, 2021

Closing per @morsssss comment above. Thank you for the PR @y-hashida nonetheless!

@cmb69 cmb69 closed this Nov 22, 2021
@morsssss
Copy link
Contributor

Yes, many thanks to @y-hashida for starting this journey!

@y-hashida
Copy link
Author

I'm sorry I couldn't continue development to this PR.
And, thank you for taking over this PR!

@ophian
Copy link

ophian commented Nov 23, 2021

The good news is that a group of folks at Google who work with AVIF are working right now on a function that would in fact be able to detect image size. If they succeed, I'll be submitting a PR here :)

@morsssss
Reminder: Please do not forget this part! :-)
Since this still is essential to have when working with AVIF for resize, rotate, or scaling actions.

@morsssss
Copy link
Contributor

morsssss commented Nov 23, 2021

Agreed!

Please see work in progress here and here.

It will be a few hundred lines of code. This is a lot more than what's used for other image formats, but I hope the community will accept this much code for full AVIF support in PHP.

@morsssss
Copy link
Contributor

@cmb69 , the work in progress is on the order of 700-800 lines of code. It takes some work to find the size of an AVIF image reliably :)

Does it feel reasonable to add this to PHP? It's not much compared to the existing codebase, but it's a lot more than we have for other images.

@cmb69
Copy link
Member

cmb69 commented Nov 24, 2021

@morsssss, if that is a stand-alone library, we can bundle temporarily until it is available on the most relevant systems. We should support scenarios where the libraray is not available (falling back to the existing code). It's likely a good idea to discuss that on the internals mailing list, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants