Add support for png_text metadata, allow to customize metadata for other backends. #7349

Merged
merged 21 commits into from Dec 25, 2016

Conversation

Projects
6 participants
Contributor

Xarthisius commented Oct 25, 2016

PNG Specification allows for custom png_text structures as an image attributes. The keywords that are given in the PNG specs are:

    Title            Short (one line) title or
                     caption for image
    Author           Name of image's creator
    Description      Description of image (possibly long)
    Copyright        Copyright notice
    Creation Time    Time of original image creation
                     (usually RFC 1123 format, see below)
    Software         Software used to create the image
    Disclaimer       Legal disclaimer
    Warning          Warning of nature of content
    Source           Device used to create the image
    Comment          Miscellaneous comment; conversion
                     from other image format

This PR allows to pass a dictionary with a metadata that can be used to fill those key/value pairs in png writer. Structure of the data and its type is not validated.

Additionally this PR allows to customize "software/creator" values in other backends such as PDF and PS.

Member

QuLogic commented Oct 25, 2016

Instead of replacing the entire metadata, I'd have the argument update the existing dictionary that was originally generated in the code. That way, the user doesn't need to figure out the default strings (version info and such), but can still provide other metadata or override it if really wanted.

QuLogic added this to the 2.1 (next point release) milestone Oct 25, 2016

Owner

tacaswell commented Oct 25, 2016

I agree with @QuLogic that user supplied data should update the default data, not completely replace it.

@mdboom should check the c, but this looks reasonable to me otherwise.

This will need and entry in whats_new.

@tacaswell

Please update the existing meta-data rather than completely replacing it with the user supplied metadata.

@QuLogic

A few small issues.

lib/matplotlib/backends/backend_agg.py
@@ -552,8 +552,16 @@ def print_png(self, filename_or_obj, *args, **kwargs):
else:
close = False
+ version_str = 'matplotlib version ' + __version__ + \
+ ', http://matplotlib.org/'
+ metadata = {six.b('Software'): six.b(version_str)}
@QuLogic

QuLogic Oct 27, 2016

Member

This would be confusing for users, as they would probably provide regular strings (as that's what all the other ones accept.) If bytestrings are really required here, it should do the encoding after obtaining the user's metadata.

@QuLogic

QuLogic Oct 27, 2016

Member

You also don't need six.b because we don't need to support Python 2.5.

lib/matplotlib/backends/backend_pdf.py
@@ -469,10 +469,13 @@ def __init__(self, filename):
revision = ''
self.infoDict = {
- 'Creator': 'matplotlib %s, http://matplotlib.org' % __version__,
+ 'Creator': 'matplotlib ' + __version__ +
+ ', http://matplotlib.org',
@QuLogic

QuLogic Oct 27, 2016

Member

Unnecessary change, looks like.

src/_png.cpp
+#ifdef PNG_TEXT_SUPPORTED
+ // Save the metadata
+ if (metadata != NULL)
+ {
@QuLogic

QuLogic Oct 27, 2016

Member

Brace is on the wrong line (comparatively speaking.)

src/_png.cpp
+ // Save the metadata
+ if (metadata != NULL)
+ {
+ meta_size = PyDict_Size(metadata);
@QuLogic

QuLogic Oct 27, 2016

Member

Should be 4-space indent.

@#{review_dismissed_event.actor.login} tacaswell dismissed their review Oct 27, 2016

Concern has been addressed.

src/_png.cpp
- text[meta_pos].key = PyBytes_AsString(meta_key);
- text[meta_pos].text = PyBytes_AsString(meta_val);
+ if (PyUnicode_Check(meta_key)) {
+ PyObject *temp_key = PyUnicode_AsEncodedString(meta_key, "ASCII", "strict");
@QuLogic

QuLogic Oct 28, 2016

Member

I believe it's actually (some small subset of) latin1/iso-8859-1.

lib/matplotlib/backends/backend_pdf.py
@@ -2423,8 +2425,11 @@ def __init__(self, filename, keep_empty=True):
keep_empty: bool, optional
If set to False, then empty pdf files will be deleted automatically
when closed.
+ metadata: dictionary, optional
+ Information dictionary object (see PDF reference section 10.2.1
@NelleV

NelleV Oct 29, 2016

Contributor

I don't find this documentation particularly useful. You could you add at least some examples of metadata ?
Also, numpydoc format requires a space before the column : metadata : dictionary, …

@QuLogic

QuLogic Oct 29, 2016

Member

The original parameters all don't have a space before the colon, and rendered fine, so I don't think that's a requirement.

@NelleV

NelleV Oct 29, 2016

Contributor

It is. The current documentation is not rendered properly.

@QuLogic

QuLogic Oct 29, 2016

Member

Oh yes, I missed that the bolding is wrong.

lib/matplotlib/backends/backend_ps.py
@@ -1073,13 +1073,19 @@ def write(self, *kl, **kwargs):
self.figure.set_facecolor(origfacecolor)
self.figure.set_edgecolor(origedgecolor)
+ # check for custom metadata
+ metadata = kwargs.pop("metadata", None)
@tacaswell

tacaswell Oct 31, 2016

Owner

Mild preference for adding metadata=None to the signature.

@tacaswell

tacaswell Oct 31, 2016

Owner

This pop needs to be removed?

@tacaswell

tacaswell Oct 31, 2016

Owner

err, sorry, there were two of these and I grew confused reading the diffs.

Owner

tacaswell commented Oct 31, 2016

Sorry in advance if I am asking obvious questions, just trying to understand the libpng API.

From http://www.libpng.org/pub/png/libpng-manual.txt

    png_set_text(png_ptr, info_ptr, text_ptr, num_text);

    text_ptr       - array of png_text holding image
                     comments

    text_ptr[i].compression - type of compression used
                 on "text" PNG_TEXT_COMPRESSION_NONE
                           PNG_TEXT_COMPRESSION_zTXt
                           PNG_ITXT_COMPRESSION_NONE
                           PNG_ITXT_COMPRESSION_zTXt
    text_ptr[i].key   - keyword for comment.  Must contain
                 1-79 characters.
    text_ptr[i].text  - text comments for current
                         keyword.  Can be NULL or empty.
    text_ptr[i].text_length - length of text string,
                 after decompression, 0 for iTXt
    text_ptr[i].itxt_length - length of itxt string,
                 after decompression, 0 for tEXt/zTXt
    text_ptr[i].lang  - language of comment (NULL or
                         empty for unknown).
    text_ptr[i].translated_keyword  - keyword in UTF-8 (NULL
                         or empty for unknown).

    Note that the itxt_length, lang, and lang_key
    members of the text_ptr structure only exist when the
    library is built with iTXt chunk support.  Prior to
    libpng-1.4.0 the library was built by default without
    iTXt support. Also note that when iTXt is supported,
    they contain NULL pointers when the "compression"
    field contains PNG_TEXT_COMPRESSION_NONE or
    PNG_TEXT_COMPRESSION_zTXt.
  • Do we also need to set text_length?
  • From these docs it is not clear to me that setting the key to NULL is allowed.
  • Do we need to check the length of the key?

I can not quite sort out the difference between tEXt, TEXT, and iTXt in the docs. It seems we are using tEXt (due to the compression flag set) which seems like a good idea as the iTXt has a bunch of comments about bugs that result in non-readable pngs with some versions of libpng.

This is valuable work thank you again for taking this on!

Contributor

Xarthisius commented Oct 31, 2016

Do we also need to set text_length?

AFAICT, it's totally ignored. There's an explicit strlen in png_set_text that populates text_length in the struct.

From these docs it is not clear to me that setting the key to NULL is allowed.

I'm not sure how to interpret the standard either. In libpng there's explicit:

if (text_ptr[i].key == NULL)
    continue;

in the loop over text_ptr.

Do we need to check the length of the key?

It's checked internally. If the key exceeds 79 it is truncated and warning is issued:

libpng warning: keyword truncated
Owner

tacaswell commented Oct 31, 2016

It makes sense that it takes care of the length it's self (so we can not lie to it, that seems like an obvious attack vector).

I am a tad concerned about the NULL and that behavior may be a libpng implementation detail. Might be safer to go with 'BAD KEY N' or something like that? Keys can be repeated, but seems safer to make them unique (as coming from a python dictionary they will be unique on the way in).

Might be worth turning that warning into a python warning? I am 👍 with punting on that for this PR.

Contributor

Xarthisius commented Nov 15, 2016

ping, is there anything else left to do before this PR is merged?

Owner

efiring commented Nov 16, 2016

This looks like a good contribution, but it would be nice to have @mdboom look at the agg part and @jkseppan look at the pdf and pd parts.

As @tacaswell noted, it will need a whats-new entry.

I have a question about the API and behavior: it looks like the present scheme, in which metadata comes in as a dictionary, has the undesirable side effect of making the order in which the metadata appear in the file non-deterministic (and at the very least, not under the control of the user). This could be avoided by using an OrderedDict, or by using a sequence of (key, value) tuples. Correct?

Contributor

Xarthisius commented Nov 16, 2016

As @tacaswell noted, it will need a whats-new entry.

Sorry, it wasn't clear who's supposed to add it. I'll work on it.

I have a question about the API and behavior: it looks like the present scheme, in which metadata comes in as a dictionary, has the undesirable side effect of making the order in which the metadata appear in the file non-deterministic (and at the very least, not under the control of the user). This could be avoided by using an OrderedDict, or by using a sequence of (key, value) tuples. Correct?

In principle you are right. However, since there are no methods to read text value by its index why would it matter? I would say that random order is desirable side effect. It will prevent people from writing software that makes ill assumptions about metadata.

Owner

tacaswell commented Nov 16, 2016

On the bright side, with python 3.6 dictionary order is deterministic again!

Owner

efiring commented Nov 16, 2016

I know zilch about metadata in image files, but I am guessing that utilities dump the metadata in the order in which it is found, e.g. for display to the screen, in which case it would be nice to control the order--wouldn't it?

Regarding deterministic vs non-deterministic, see #6317 for a discussion.

codecov-io commented Nov 20, 2016 edited

Current coverage is 62.07% (diff: 81.81%)

Merging #7349 into master will decrease coverage by <.01%

@@             master      #7349   diff @@
==========================================
  Files           174        174          
  Lines         56007      56021    +14   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          34765      34773     +8   
- Misses        21242      21248     +6   
  Partials          0          0          

Powered by Codecov. Last update e73625e...935e02f

tacaswell closed this Nov 20, 2016

tacaswell reopened this Nov 20, 2016

Owner

tacaswell commented Nov 20, 2016

'power cycled' to restart against current master

NelleV changed the title from Add support for png_text metadata, allow to customize metadata for other backends. to [MRG+1] Add support for png_text metadata, allow to customize metadata for other backends. Dec 19, 2016

Contributor

NelleV commented Dec 19, 2016

Hi @Xarthisius
Do you mind rebasing? I think this is ready to be merged. Sorry it took so long.
Thanks,
N

Xarthisius added some commits Oct 25, 2016

@Xarthisius Xarthisius Allow to store metadata in png files bc6e367
@Xarthisius Xarthisius Ensure that metadata pointer is initially set to NULL 5200aff
@Xarthisius Xarthisius Allow to pass custom infoDict to images created with pdf backend 5a5bd96
@Xarthisius Xarthisius Allow to pass custom metadata to png images created with agg backend d8cac23
@Xarthisius Xarthisius Allow to pass custom Creator to images created with ps backend 8305868
@Xarthisius Xarthisius 'Software' starts with capital letter as per PNG specification 0d814e0
@Xarthisius Xarthisius fix pep8 issue 112e38a
@Xarthisius Xarthisius Drop debug statements 0f006e3
@Xarthisius Xarthisius Preserve the default values of the metadata. Allow user to update if …
…necessary
12b0120
@Xarthisius Xarthisius Revert accidental changes and fix indentation d75b068
@Xarthisius Xarthisius Handle unicode/bytes directly in the C extension 92bccd2
@Xarthisius Xarthisius Use 'latin_1' encoding instead of ascii e174284
@Xarthisius Xarthisius Fix numpydoc format issues af4c1d1
@Xarthisius Xarthisius Add example info dictionary 98b5587
@Xarthisius Xarthisius Add a description for the 'metadata' keyword to the docstring a2d2c37
@Xarthisius Xarthisius Explicitly define 'metadata' kwarg in '_print_figure' 06ad9fb
@Xarthisius Xarthisius Define 'metadata' as kwarg in _print_figure_tex for consistency 6916e77
@Xarthisius Xarthisius Use default value for invalid key instead of NULL 8bb0ae3
@Xarthisius Xarthisius Use OrderedDict for metadata 66f6e2d
@Xarthisius Xarthisius Add 'what is new' entry for metadata kwarg in matplotlib.pyplot.savefig
8897f0f
@Xarthisius Xarthisius Fix merge
935e02f

@tacaswell tacaswell merged commit ab98852 into matplotlib:master Dec 25, 2016

5 checks passed

codecov/patch 81.81% of diff hit (target 62.07%)
Details
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +19.74% compared to e73625e
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.001%) to 62.071%
Details
Owner

tacaswell commented Dec 25, 2016

@Xarthisius Thanks!

Sorry this took so long to get merged.

QuLogic changed the title from [MRG+1] Add support for png_text metadata, allow to customize metadata for other backends. to Add support for png_text metadata, allow to customize metadata for other backends. Dec 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment