Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entities are not decoded #276

Open
PVince81 opened this issue Sep 30, 2021 · 6 comments
Open

Entities are not decoded #276

PVince81 opened this issue Sep 30, 2021 · 6 comments

Comments

@PVince81
Copy link

See https://github.com/perry-mitchell/webdav-client/blob/v4.7.0/source/tools/dav.ts#L80-L81

it seems they used to and the comment says it will be decoded "manually later" but I didn't find any location in the code where it's done

@perry-mitchell
Copy link
Owner

perry-mitchell commented Oct 10, 2021

Hi @PVince81 - I'm looking into this now. From a first glance it looks quite easy to solve - adding such a function to support safe decoding of all values early on would help:

export function decodeURLComponentSafely(value: string): string {
    if (!value) return value;
    try {
        return decodeURIComponent(value.replace(/%(?![0-9a-fA-F]+)/g, "%25"));
    } catch (err) {
        throw new Layerr(err, `Failed decoding value: ${value}`);
    }
}

And then I could add it to the parseXML call:

export function parseXML(xml: string): Promise<DAVResult> {
    return new Promise(resolve => {
        const result = xmlParser.parse(xml, {
            arrayMode: false,
            ignoreNameSpace: true,
            tagValueProcessor: val => decodeHTMLEntities(decodeURLComponentSafely(val))
        });
        resolve(normaliseResult(result));
    });
}

The problem is, what if the target file is literally named test%20file, which is a valid Linux filename. This is now decoded automatically returning an incorrect result set. I have everything working except this one facet.. Once I figure it out I'll push a fix.

In the meantime a sample response payload (XML) with the encoded entities would be helpful.

@PVince81
Copy link
Author

@perry-mitchell thanks a lot for looking into this

I think we should not mix up concepts.
This ticket is about decoding HTML/XML entities, not URL-decoding.

Just to answer your question about legit files called test%20file, I've attached an example Webdav response from a Nextcloud PROPFIND, see "d:href" tags:

<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns" xmlns:nc="http://nextcloud.org/ns">
  <d:response>
    <d:href>/remote.php/dav/files/admin/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:57 GMT</d:getlastmodified>
        <d:getetag>"6165285928a69"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>2</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>164</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%20regular%20space/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:14:25 GMT</d:getlastmodified>
        <d:getetag>"616527c16531d"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>113</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%22doublequote/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:57 GMT</d:getlastmodified>
        <d:getetag>"6165285926070"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>117</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%2520percent%2520twenty/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:14:00 GMT</d:getlastmodified>
        <d:getetag>"616527a88cdb6"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>112</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%27apostrophe/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:32 GMT</d:getlastmodified>
        <d:getetag>"6165284098733"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>114</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%3clowerthan/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:43 GMT</d:getlastmodified>
        <d:getetag>"6165284b826b5"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>115</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%3egreaterthan/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:50 GMT</d:getlastmodified>
        <d:getetag>"6165285200f71"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>116</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/welcome.txt</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:12:12 GMT</d:getlastmodified>
        <d:getetag>"cd3b89f9796ffb477bcf847e147cd154"</d:getetag>
        <d:getcontenttype>text/plain</d:getcontenttype>
        <d:resourcetype/>
        <oc:fileid>3</oc:fileid>
        <oc:permissions>RGDNVW</oc:permissions>
        <oc:size>164</oc:size>
        <d:getcontentlength>164</d:getcontentlength>
        <nc:has-preview>true</nc:has-preview>
        <nc:mount-type/>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">19</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:quota-available-bytes/>
        <nc:is-encrypted/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>

However I think the library should leave these as is.

And here's a Webdav response from retrieving comments which contain special characters which SabreDAV converted to entities while serializing to XML. See oc:message tags:

<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns" xmlns:nc="http://nextcloud.org/ns">
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/4</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>4</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:24:12 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>higher than &amp;gt;</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/3</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>3</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:24:07 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>lower than &amp;lt;</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/2</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>2</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:23:56 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>a single quote '</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/1</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>1</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:23:49 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>a double quote &quot;</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
</d:multistatus>

@PVince81
Copy link
Author

hmm weird, for &quot; I understand, but why would the lower character be encoded as &amp;lt instead of &lt;.

@Pytal I remember you spoke of double encoding, maybe it's a bug ?

@Pytal
Copy link
Contributor

Pytal commented Oct 12, 2021

hmm weird, for &quot; I understand, but why would the lower character be encoded as &amp;lt instead of &lt;.

@Pytal I remember you spoke of double encoding, maybe it's a bug ?

Yeah, discovered the strange occurrence while investigating nextcloud/server#28243, seems to be a bug 🐞

@perry-mitchell
Copy link
Owner

perry-mitchell commented Oct 20, 2021

Thanks @PVince81 - I'd missed your comment on the Nextcloud issue. So it seems that the issue stems from the XML parser I'm using. What would you recommend?

Changing parser is a big deal.. and finding one as small and efficient as this one would be tricky I suspect (though I haven't looked recently). Happy to consider alternatives.

EDIT: Thanks for the XML too, I'll have a play with the parser options to see if a combination of he or another tool can help.

@perry-mitchell
Copy link
Owner

I had a play and couldn't find any alternate configuration for the parser that seemed to work in every test case I have. I'd be happy if someone wanted to add a test case or MR that outlines a fix for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants