Warning about invalid entry names #469

Merged
merged 3 commits into from Nov 20, 2015

Projects

None yet

6 participants

@olivierdalang

Hi !

When using QgsProject.instance().writeEntry(scope, key, value), it's really easy to make the whole project file unreadable.

This happens because scope and keys are used as XML elements names in the project file.

In case a plugin developer uses user input as the key, he may not notice that he's endangering the user's files.

The commit sanitize scope and keys to make sure the project's XML remains valid whatever scope/key is used. The regexp are done using this : http://www.w3.org/TR/REC-xml/#NT-NameStartChar

http://hub.qgis.org/issues/7371 can be closed upon merge...

Thanks !

Olivier

@mhugent mhugent was assigned Mar 28, 2013
@olivierdalang

Will this be merged for 2.0 ? (I'm currently working on a plugin that could rely on this)

@mhugent
Collaborator
mhugent commented Apr 13, 2013

Denis commited something similar in fe7f26f
That lets Qt remove the invalid characters from the xml names. Is this sufficient for your usage?

@olivierdalang

Hi ! Sorry for the delay answering to this.

I just tested fe7f26f and it does indeed solve the problem of having an invalid XML file.

But still some strange behaviour can occur with the writeEntry function :

QgsProject.instance().writeEntry('testingWriteEntry','abc','abc')  #works
QgsProject.instance().writeEntry('testingWriteEntry','123','123') #not written to file at all (since an XML tag can't start with a number)
QgsProject.instance().writeEntry('testingWriteEntry','def','def') #works too
QgsProject.instance().writeEntry('testingWriteEntry','def','def again') #works (overwrites def)
QgsProject.instance().writeEntry('testingWriteEntry','d e f','d e f') #bugs (append another def node, since spaces are dropped from xml tags)

This can be a problem, since a plugin developer may ignore that the entry keys are directly used in the XML file or ignore what characters are allowed as XML tags.

Also, in master, the entries are always stored in memory, but may not be written to file (or with an altered tag). So when testing, it can seem to work, but when saving and reloading the project, some entries can be lost or altered.
Those problems are hard to spot and hard to debug.

A scenario would be a developer using user input as a writeEntry key (for instance if one wants to store several user-named settings) : he may not think of sanitizing the input, and get those annoying bugs.

I don't know if this should be considered a bug or not. Maybe a notice in the documentation would be enough ? Or a message in the console when an illegal name is used ?

But if it's a bug, then my pull request still makes sense.

Since the sanitation is made in the makeKeyTokens method (which is used at read and write), the alteration of the name won't be as problematic as in master (where the tags name are altered at writing to file only).

Thinking of it, in my pull request, the problem is only partially solved : if a character is invalid, it replaces it with an underscore, which doesn't fix similar keys with invalid characters being different.

So this solves

QgsProject.instance().writeEntry('testingWriteEntry','123','123') #stored as _123

but not

QgsProject.instance().writeEntry('testingWriteEntry','a b c','abc') #stored as a_b_c
QgsProject.instance().writeEntry('testingWriteEntry','a<b_c','abc') #also stored as a_b_c (overwrite)

(so for that latter case, it may be a good idea to modify the pull request to "encode" illegal characters, rather than just replace them with an underscore)

What do you think ?

@olivierdalang

So what do you think ? is it worth working on that (resolving that latter case) ?
If so, does someone know what string encoding method could fit this need ?

@mhugent
Collaborator
mhugent commented May 13, 2013

Sorry for the long delay on my side. Maybe it is better to simply reject writing keys with invalid xml names (and say so in the API docs) rather than automatically fix it. Automatically fixing might surprise the user and lead to bugs which are hard to track down.

@m-kuhn
Member
m-kuhn commented Jul 3, 2013

I support @mhugent. I cannot see a good reason why arbitrary keys are allowed, but potential problems with colliding keys. Rejecting (with an error message) is more stable and sounds absolutely valid.

@olivierdalang olivierdalang and 1 other commented on an outdated diff Jan 18, 2014
src/core/qgsproject.cpp
@@ -64,7 +64,28 @@ QStringList makeKeyTokens_( QString const &scope, QString const &key )
// be sure to include the canonical root node
keyTokens.push_front( "properties" );
- return keyTokens;
+ //sanitize keys since an unvalid xml name will make the project file unreadable !
+ QStringList sanitizedKeyTokens = QStringList();
+ for (int i = 0; i < keyTokens.size(); ++i){
+ QString sanitizedKey = keyTokens.at(i);
+
+ //replace invalid chars in XML document ( http://www.w3.org/TR/REC-xml/#NT-NameChar )
+ //note : it seems this should also keep \x10000-\xEFFFF but then a lot of unwanted chars remain...
+ QString nameCharRegexp = QString( "[^:A-Z_a-z\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\x2FF\\x370-\\x37D\\x37F-\\x1FFF\\x200C-\\x200D\\x2070-\\x218F\\x2C00-\\x2FEF\\x3001-\\xD7FF\\xF900-\\xFDCF\\xFDF0-\\xFFFD\\-\\.0-9\\xB7\\x0300-\\x036F\\x203F-\\x2040]" );
+ QString nameStartCharRegexp = QString( "^[^:A-Z_a-z\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\x2FF\\x370-\\x37D\\x37F-\\x1FFF\\x200C-\\x200D\\x2070-\\x218F\\x2C00-\\x2FEF\\x3001-\\xD7FF\\xF900-\\xFDCF\\xFDF0-\\xFFFD]" );
+
+ if( sanitizedKey.contains( QRegExp(nameCharRegexp) ) || sanitizedKey.contains( QRegExp(nameStartCharRegexp) ) ){
+
+ //TODO :
+ // >>> DISPLAY SOME KIND OF MESSAGE <<<
@olivierdalang
olivierdalang Jan 18, 2014

Okay for the rejection + error message...

I made the changes and the pull request is almost ready excepted that I did not manage to display a message here...

I tried with QgsDebugMsg(), but the console didn't print anything...

Can someone help ?

@m-kuhn
m-kuhn Jan 19, 2014 QGIS member

Strange that it does not work for you. QgsDebugMsg() reliably sends all the output to stderr according to my experience.

However, I assume this message will be interesting also for people developing plugins?
You could use the QgsMessageLog which will also be available in release versions, so plugin devs using release versions also get the messages.

@olivierdalang

Ok it seems to work with QgsMessageLog !
So now, when an invalid key is used, it logs a message and outputs a debug message as well.

I also updated the docstrings to briefly explain the key's constraints (as well as removed some obsolete notes).

I'd say the pull request is ready to merge, but please review it with some attention since I'm still not very comfortable with C++.

@3nids
Collaborator
3nids commented Apr 28, 2014

@olivierdalang can you maybe rebase this, sorry for not reviewing this?

@olivierdalang olivierdalang check validity of keys used by writeEntry and readEntry
and print a message in the console in case of invalid key

update of docstrings (obsolete note removed, note added)
313fe6a
@olivierdalang

@3nids ok, done.

Now, a message is printed to the console in case the keys are not valid, but the behaviour is not changed : the key is still set and dropped on file save. So it's more a warning than a sanitation...

I also removed the note The key string <em>must</em> include '/'s. E.g., "/foo" not "foo". since it seem to be obsolete.

Again: I'd say the pull request is ready to merge, but please review it with some attention since I'm still not very comfortable with C++.

@olivierdalang olivierdalang changed the title from Sanitize entry names to avoid make project unreadable to Warning about invalid entry names Jun 1, 2014
@3nids
Collaborator
3nids commented Jun 3, 2014

@m-kuhn or @mhugent can one of you two review this?

@m-kuhn
Member
m-kuhn commented Jun 4, 2014

Hi Olivier,

would it be ok for you to write some small tests? I think it will be very straightforward for this (some keys that are rejected, some that are not). The regex looks quite cryptic to just review like that :-)

I also wonder if you think that this should be merged for 2.4 still? I am a bit afraid that there is a possibility to render working plugins unusable and the remaining time for bugfixing gets shorter. So maybe this should rather be treated as a feature for 2.6. What's your opinion, @olivierdalang @mhugent ?

@olivierdalang

Hi !

I'll try to write some tests in the following days (never did yet, but it's indeed probably a good opportunity to learn).

About the merge, it shouldn't make plugins unusable : the latest version of the PR only prints a message to the console, it doesn't reject the key (since they work, they are just not saved to file).

So, for the merge, it's up to you, but I think at least we should merge the update of the docstring to explain that the key must be valid xml tags, and not just any string.

The regexp is very cryptic (hehe, it's a regexp), but it simply searches for characters that are not in the allowed range defined on http://www.w3.org/TR/REC-xml/#NT-NameChar and http://www.w3.org/TR/REC-xml/#NT-NameStartChar (the latter being only for the first character).

@m-kuhn
Member
m-kuhn commented Nov 24, 2014

@olivierdalang
This is our oldest pull request, would be nice to merge it :)

Do you think you can handle writing a test for it?

@olivierdalang

@m-kuhn
Hehe it seems I should have had some time to work on those tests since the 4th of june...

If you want to merge soon, maybe the best would be to find someone to write these tests (I guess it's easy for someone who knows how it works), if not, I still have making those tests on my long term todo list (I should be able to do this before 2015).

@m-kuhn
Member
m-kuhn commented Nov 26, 2014

Yes, I think it should be trivial. Some known good and known bad entries can be written and the result of makeKeyTokens be checked.
Looking forward to this :)

@m-kuhn
Member
m-kuhn commented Nov 5, 2015

@olivierdalang Any chance to get this up and running? :-)

@SebDieBln SebDieBln referenced this pull request in olivierdalang/QGIS Nov 19, 2015
Merged

add a test for makeKeyTokens_() #1

@olivierdalang

The tests are there thanks to SebDieBln !

@m-kuhn
Member
m-kuhn commented Nov 19, 2015

Awesome, thank you guys!

Just waiting for the green light from travis.

@m-kuhn m-kuhn commented on the diff Nov 20, 2015
tests/src/python/test_qgsproject.py
+
+ # generate the characters that are allowed at the start of a token (and at every other position)
+ validStartChars = u":_";
+ charRanges = [
+ (ord(u'a'), ord(u'z')),
+ (ord(u'A'), ord(u'Z')),
+ (0x00F8, 0x02FF),
+ (0x0370, 0x037D),
+ (0x037F, 0x1FFF),
+ (0x200C, 0x200D),
+ (0x2070, 0x218F),
+ (0x2C00, 0x2FEF),
+ (0x3001, 0xD7FF),
+ (0xF900, 0xFDCF),
+ (0xFDF0, 0xFFFD),
+ #(0x10000, 0xEFFFF), while actually valid, these are not yet accepted by makeKeyTokens_()
@m-kuhn
m-kuhn Nov 20, 2015 QGIS member

Should that be considered before merging?

@SebDieBln
SebDieBln Nov 20, 2015

I'd say not. The characters 0x10000 ... 0xEFFFF are outside of the Basic Multilingual Plane and you need a wide build Python which is not installed in the current Travic CI setup (at least for the Mac):

And reading @olivierdalang's comment on this I think similar issues arise with the C++-files misinterpreting these characters.

The worst thing we will do is warn the plugin developer where there probably(!) is no reason to warn. Did anyone actually test a project file containing wide unicode characters?

@m-kuhn m-kuhn merged commit d795e50 into qgis:master Nov 20, 2015

1 check passed

Details continuous-integration/travis-ci/pr The Travis CI build passed
@m-kuhn
Member
m-kuhn commented Nov 20, 2015

Thank you for making it possible to handle the last outstanding 3-digit pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment