path invalid characters (on Jackrabbit) #32

dotZoki opened this Issue Nov 9, 2011 · 20 comments



4 participants

dotZoki commented Nov 9, 2011

Since Jackrabbit allows me to add files with an "&" character in the name, I guess Jackalope should also allow me to read those paths.

Exception Message: Path is not well-formed or contains invalid characters: /edu/jhu/pha/www/einstein/stuff/einstein&music.pdf

dbu commented Nov 9, 2011

oops, sorry about that. yes, we did some work on that but seem to have missed something. i will try to find some time in the next few days.

lsmith77 commented Nov 9, 2011

i haven't checked this, but does Jackrabbit allow umlauts?

dbu commented Nov 9, 2011

it allows pretty much everything other than :
but it encodes characters that are not valid in xml names, like spaces. and everything should be urlencoded, which is probably the problem here. or maybe it is something else.

lsmith77 commented Nov 9, 2011

ok. would be great to get this fixed then. i ran into this in a client project already but assumed it was a spec issue

dbu commented Nov 9, 2011

the spec basically leaves it to the implementation to decide what they want to restrict.

dbu commented Nov 13, 2011

this is probably related. adding it here and closing JACK-56 in the old tracker.

When jackalope reads a node with # in the name, the part from # on seems to be eaten away. missing escaping?
Jackrabbit can do it, so fixture loading for example works.

"testAddMixinOnNewNode with data set #1" becomes (note that the path ends with "...set")

PHPCR\RepositoryException: HTTP 403: Invalid path:/tests_write_nodetype/testAddMixinOnNewNode with data set
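
One plausible explanation, which is an assumption and not confirmed in this thread, is that an unescaped # in the request URL is treated as the start of a URI fragment, so the server never sees it. The Python sketch below (an illustration, not Jackalope's PHP code) shows the effect with a hypothetical base URL, and shows that percent-encoding the name keeps it in the path.

```python
from urllib.parse import quote, urlsplit

# Hypothetical server URL, used only for illustration.
BASE = "http://localhost:8080/server/tests"
NODE_NAME = "testAddMixinOnNewNode with data set #1"


def path_and_fragment(url: str) -> tuple[str, str]:
    """Return the (path, fragment) pair a URL parser extracts from url."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


# Unescaped: everything from '#' on becomes the fragment, not part of the path.
raw_path, raw_fragment = path_and_fragment(
    BASE + "/tests_write_nodetype/" + NODE_NAME
)

# Percent-encoded: '#' becomes %23 and stays inside the path.
enc_path, enc_fragment = path_and_fragment(
    BASE + "/tests_write_nodetype/" + quote(NODE_NAME, safe="")
)
```

With the unescaped name the parsed path ends at "data set " and the fragment is "1", which matches the truncated path in the error message above.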


dotZoki commented Nov 30, 2011

I'm trying to understand this :)

First, special chars must be escaped to fit the JCR path requirement: (* Any Unicode character except: '/', ':', '[', ']', '*', ''', '"', '|' or any whitespace character *).

A second escaping is then done to fit the transport.

I guess the first level of escaping should be done in PHPCR and the second in the client implementation.

I was looking at the Jackrabbit client of jackalope; it looks like all escaping (both levels) and path validity checking is done there, right?
UPDATE: jackalope escapes only for transport needs

UPDATE: Or it should be left to the user to use a util for escaping paths before using them for reading/writing, and jackalope should only check that they are valid and escape them correctly for transport
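
The split into two levels can be sketched as follows, in Python rather than PHP. This is only an illustration under stated assumptions: the function names are hypothetical, and the validity check takes the character rule quoted above literally, which is stricter than what Jackrabbit accepts in practice (names with spaces appear earlier in this thread).

```python
from urllib.parse import quote

# Characters excluded by the rule quoted above (whitespace is checked separately).
FORBIDDEN_IN_NAME = set("/:[]*'\"|")


def is_valid_name(name: str) -> bool:
    """Level 1 (hypothetical helper): check a single node name, do not modify it."""
    if not name:
        return False
    return not any(ch in FORBIDDEN_IN_NAME or ch.isspace() for ch in name)


def encode_path_for_transport(path: str) -> str:
    """Level 2 (hypothetical helper): percent-encode each segment, keep '/' separators."""
    return "/".join(quote(segment, safe="") for segment in path.split("/"))
```

Under these assumptions `einstein&music.pdf` passes the name check unchanged and is only altered at the transport level, where `&` becomes `%26`.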

dbu commented Nov 30, 2011

it would be awesome if you want to look into this! i have some ideas. poke me on irc if you have questions, or ask them here if i am not responding :-)

the jcr spec is here:
and specifically here

This definition of JCR name represents the least restrictive set of constraints permitted 
for the naming of items and other entities. A repository may further restrict the names of
entities to a subset of JCR names and in most cases is encouraged to do so. 

so basically, it tells us nothing much. i think what we should do in the phpcr-api write tests is have a test that loops through all possible character codes and creates and saves a node with that character in its name (concatenate some "a" to avoid issues with "."). then it tries to read that node with a new session.
adding the node or saving may throw ConstraintViolationException, which according to the spec would be valid too.
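
The shape of such a test can be sketched like this. It is a Python illustration, not the phpcr-api test itself: `LossyRepository` is a hypothetical in-memory stand-in whose read path drops everything from # on, and `ConstraintViolation` stands in for ConstraintViolationException.

```python
class ConstraintViolation(Exception):
    """Stand-in for PHPCR's ConstraintViolationException."""


class LossyRepository:
    """Hypothetical in-memory repository with a deliberately broken read path."""

    def __init__(self) -> None:
        self._names: set[str] = set()

    def add_node(self, name: str) -> None:
        # A repository may legitimately refuse some names.
        if ":" in name or "/" in name:
            raise ConstraintViolation(name)
        self._names.add(name)

    def node_exists(self, name: str) -> bool:
        # Simulated bug: the name is truncated at '#' before the lookup.
        return name.split("#", 1)[0] in self._names


def round_trip_failures(repo, code_points) -> list[int]:
    """Return the code points whose node could be written but not read back."""
    failures = []
    for code_point in code_points:
        name = "a" + chr(code_point) + "a"  # padding avoids "." and ".."
        try:
            repo.add_node(name)
        except ConstraintViolation:
            continue  # rejecting a name is allowed, so it is not a failure
        if not repo.node_exists(name):
            failures.append(code_point)
    return failures


failures = round_trip_failures(LossyRepository(), range(0x20, 0x7F))
```

Against this stand-in the only reported code point is 0x23 (#); rejected names such as those containing ":" are skipped rather than counted as failures.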

at the jackalope level, we need to fix the errors we find with that (that is, make sure values are properly encoded when writing and reading).

i think no escaping should be needed in the non-transport part. (or did you see the exportDocumentView code? that's something else again: converting names to valid xml element names...)

dotZoki commented Nov 30, 2011

The part where I said

Or it should be left to the user to use a util for escaping paths before using them for reading/writing, and jackalope should only check that they are valid and escape them correctly for transport

I see use of it at (and some code:

For transport escaping part code:

I was playing a bit with Client.php and got it to read a node that had & in the name, but then I broke some other stuff, fixed that, broke something else etc. :) I guess I made the client code do too much work :)

dbu commented Nov 30, 2011

i think the code that needs fixing is in Client::encodePathForDavex

i would recommend first writing the test to see what exactly fails.

dbu commented Dec 7, 2011

i identified the issue but don't have time to properly fix it now. if somebody feels like doing it, please do :-)


dbu commented Dec 28, 2011

@dotZoki will you follow up on this one again?

dotZoki commented Dec 29, 2011

I was working on it, but got a bit busy with other stuff for now.

If somebody else wants/has time to fix it, go for it :) otherwise I'll get to it when I get some time.

dbu commented Dec 29, 2011

would be cool if you get some time to finish it. i went through the tickets and was afraid i might have lost track of something you did and then somebody would re-do it...

whoever gets to work on it, please just put a quick heads-up into this ticket to avoid duplicated work ;-)


ping .. imho this is a very critical bug we need to fix ASAP

dotZoki commented Jan 24, 2012

Well, I can tell you what I know so far, in case somebody else has time to fix/finish this.

Jackrabbit uses its own hand-written function for escaping folder names. :: private static String escape(String string, char escape, boolean isPath)

This is how JavaScript escapes URIs with encodeURI().

PHP's rawurlencode doesn't work that way.

Char #128 is encoded by JS/Jackrabbit to %C2%80
and in PHP to %80
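
The two results differ in whether code point 128 is percent-encoded as its UTF-8 bytes or as one raw byte. The sketch below reproduces both in Python (an illustration, not the PHP or Java code discussed here; the function names are hypothetical). PHP's rawurlencode() works on bytes, so it yields %80 for the single byte 0x80 and %C2%80 when given the UTF-8 encoding of the same character.

```python
from urllib.parse import quote


def encode_as_utf8(code_point: int) -> str:
    """Percent-encode the UTF-8 bytes of a code point, as encodeURI() does."""
    return quote(chr(code_point), safe="")


def encode_as_single_byte(value: int) -> str:
    """Percent-encode one raw byte, as a byte-oriented encoder does."""
    return quote(bytes([value]), safe="")


utf8_form = encode_as_utf8(128)          # "%C2%80"
byte_form = encode_as_single_byte(128)   # "%80"
```

Seen this way, the mismatch is about which byte sequence reaches the encoder, so making sure names are UTF-8 before encoding may be enough to get matching output.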

Creating a function in PHP that does the same encoding/decoding as Jackrabbit creates new problems in Jackalope. Jackalope uses PHP's basename(), which is locale aware and thus returns the wrong basename for a path that contains "non-standard" chars.
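
One way around a locale-aware basename is to split on the path separator only and never interpret any other character. A minimal sketch of that idea, in Python and with a hypothetical helper name (it is not what Jackalope does):

```python
def last_segment(path: str) -> str:
    """Return the text after the final '/', ignoring one or more trailing slashes."""
    trimmed = path.rstrip("/")
    # rpartition returns (head, separator, tail); the tail is the last segment.
    return trimmed.rpartition("/")[2]


name = last_segment("/edu/jhu/тест-test")
```

Because only "/" is given any meaning, leading non-ASCII characters in the last segment are returned untouched.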

Escaping paths for davex is not problematic. Once the path is escaped for the needs of Jackrabbit, all that needs to be done is to call htmlspecialchars() on that value (so it doesn't break XML).
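
The order of the two steps can be sketched as below, in Python rather than PHP. The safe-character list approximates JavaScript's encodeURI() and is an assumption about the transport encoding, not Jackrabbit's actual escape function; `xml.sax.saxutils.escape` plays the role of htmlspecialchars() for `&`, `<` and `>`.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Characters encodeURI() leaves unescaped, besides letters and digits.
ENCODE_URI_SAFE = ";/?:@&=+$,#-_.!~*'()"


def encode_like_encode_uri(path: str) -> str:
    """Step 1: percent-encode for transport; '&' is left as-is, like encodeURI()."""
    return quote(path, safe=ENCODE_URI_SAFE)


def path_for_xml_body(path: str) -> str:
    """Step 2: escape the already encoded path so it can sit inside XML."""
    return escape(encode_like_encode_uri(path))


xml_safe = path_for_xml_body("/stuff/einstein&music file.pdf")
```

Here the space is handled by the transport step (`%20`) and the `&` by the XML step (`&amp;`), which is why the XML escaping has to come last.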

Thoughts? :)

edit: link to check encoding difference

dbu closed this Jun 26, 2013

ivan1986 commented Jul 3, 2013

With the new version I found a bug:
if a node name starts with non-ANSI characters (Russian in my example), the leading non-ANSI characters are removed from the node name.

for example:
"test-тест" is saved as "test-тест"
"тест-test" is saved as "-test"

In the old version Russian characters did not work at all; that is fixed now :)

No, it's a doctrine phpcr-orm bug

dbu commented Jul 3, 2013

@ivan1986 are you saying that rawurlencode eats away russian characters? this should not be the case afaik. can you please identify where exactly this happens, in the jackalope layer or the jackrabbit transport and open a new issue with your findings?

ivan1986 commented Jul 3, 2013

It's doctrine phpcr-orm.

Creating an issue in its tracker.

danrot pushed a commit to danrot/phpcr-api-tests that referenced this issue Jun 11, 2015
dbu: fix jackalope/jackalope#32 by properly urlencoding (7f812c1)