Probable security issue through use of URLs in "url-or-path" #650

iSnow opened this issue Nov 15, 2019 · 4 comments


@iSnow iSnow commented Nov 15, 2019

Similar to the attack vector described for paths here, I believe the use of URLs in Data Resources can lead to exfiltration of data about a network.


Consider the case of an API service that allows users to upload DataPackages and browse the contained resources' contents. For that, the service must download the contents from the URLs specified by the packages' Resources.

Attack scenario

The attacker would upload a DataPackage containing a resource whose URL points to a guessed IP address in one of the non-routed ranges, e.g.

  "path": "",
  "mediatype": "text/html"

and ask the service for the content of the Resource. The service would query that address to fetch the index HTML, and if the target is running an HTTP daemon, the API service would transfer the HTML back to the attacker. If the IP address is not used by a server, an error would be returned.

By slowly iterating over the whole 10.x.x.x, 192.168.x.x and other ranges that are typically used for LANs, the attacker could map out the LAN and glean additional information about the servers by analyzing the exfiltrated HTML.

I am not sure about the real-world impact of this - on the one hand, a service typically would be running on either a hosted server not on a company's LAN or even on a container on a hosted server, which would blunt the mapping attack somewhat. On the other hand, it would be a probable attack for rogue employees/visitors of a data-science company that hosts data-pipelines for their data scientists and has a segmented network as part of a defense in depth. That segmentation would be weakened by such an attack as each employee with HTTP access to such a service could map out the segment the service resides in.

Attack mitigation

I don't think complete mitigation is possible. I believe the use of uncontrolled URLs is a fundamental weakness that allows all kinds of attacks (eg. a datapackage with thousands of URLs linking to very big Resource payloads would create an effective denial of service attack against either the API service or even the site hosting the payload files).
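One partial defence against the oversized-payload case described above is to cap how many bytes the service reads per Resource. The following is only a sketch: `read_capped` and `PayloadTooLarge` are hypothetical names, not part of any Frictionless library, and the stream is assumed to be any file-like object with a `read` method.

```python
import io


class PayloadTooLarge(Exception):
    """Raised when a resource body exceeds the configured byte limit."""


def read_capped(stream, max_bytes, chunk_size=65536):
    """Read a binary stream fully, aborting once more than max_bytes arrive."""
    chunks = []
    total = 0
    while True:
        # Never request more than one byte past the limit.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise PayloadTooLarge(f"resource exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for a network response.
small = read_capped(io.BytesIO(b"a,b\n1,2\n"), max_bytes=1024)
```

A cap like this limits the cost per Resource only; a limit on the number of Resources per package would be needed as well, and neither protects the third-party site that hosts the payloads.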

Ideally, a same-origin restriction should apply to Resource URLs, to ensure that one site cannot be used to DoS another site by uploading packages to it. This still would not prevent user-uploaded DataPackages from exfiltrating data.
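A same-origin check could be sketched roughly as follows, assuming the service knows the URL the descriptor itself was loaded from. The function names are hypothetical, and the `example.com` / `example.org` URLs are placeholders.

```python
from urllib.parse import urlsplit

# Default ports, so that https://host and https://host:443 compare equal.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def is_same_origin(descriptor_url, resource_url):
    """True if the resource is served from the same origin as the descriptor."""
    return origin(descriptor_url) == origin(resource_url)


# Usage: a resource on another host is rejected.
ok = is_same_origin(
    "https://example.com/pkg/datapackage.json",
    "https://example.org/big-file.csv",
)
```

This does not apply to packages uploaded directly by a user, since those have no origin URL to compare against.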

Some incomplete protection could be achieved by blocking all Resource resolution for URLs pointing to IP numbers from the non-routable blocks.
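As a rough illustration of that incomplete protection, the check below uses Python's standard `ipaddress` module to reject URLs whose host is a literal IP address in a private, loopback, link-local or otherwise non-public range. `is_blocked_literal_ip` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_literal_ip(url):
    """True if the URL's host is a literal IP address in a non-public range."""
    host = urlsplit(url).hostname
    if host is None:
        return False  # not an absolute URL; path rules apply instead
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not a literal IP; DNS is NOT checked here
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


# Usage: literal addresses from the LAN ranges are refused.
blocked = is_blocked_literal_ip("http://10.0.0.1/")
```

The sketch is incomplete in the same way the idea is: a hostname that resolves to an internal address, or an HTTP redirect to one, passes this check, so a real implementation would also need to validate the address actually connected to.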

Action points

I welcome feedback on this; I may be overlooking some points. Also, a strong warning for implementors should be in the DataResource specs and an equally strong warning in the user docs.

Personally, I believe it should be part of the spec that an implementing library has a switch that enables users of that library to disallow Resource-addressing via URLs. Only self-contained packages would be parsed.

@roll roll added this to Specifications in Frictionless General Nov 18, 2019

@roll roll commented Nov 18, 2019

Great catch @iSnow !

@iSnow iSnow changed the title Proable security issue through use of URLs in "url-or-path" Probable security issue through use of URLs in "url-or-path" Nov 18, 2019
@iSnow iSnow mentioned this issue Nov 18, 2019

@lwinfree lwinfree commented Nov 25, 2019

Thanks @iSnow!
Hey @rufuspollock + @pwalsh Could y'all please take a look at this & comment on next steps? Thx!


@rufuspollock rufuspollock commented Nov 25, 2019

@iSnow this kind of attack has similarities to that discussed and addressed in discussion of unix relative paths ...

POSIX paths (unix-style with / as separator) are supported for referencing local files, with the security restraint that they MUST be relative siblings or children of the descriptor. Absolute paths (/) and relative parent paths (../) MUST NOT be used, and implementations SHOULD NOT support these path types.
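For illustration, the path rule quoted above could be enforced with a check along these lines. `is_safe_relative_path` is a hypothetical name, and the check is deliberately stricter than the rule: it rejects any `..` segment, even one that would stay inside the descriptor's directory.

```python
from pathlib import PurePosixPath


def is_safe_relative_path(path):
    """True if path is a relative POSIX path with no parent ('..') segments."""
    if not path:
        return False
    p = PurePosixPath(path)
    if p.is_absolute():
        return False  # absolute paths such as /etc/passwd
    return ".." not in p.parts  # parent references such as ../secret.csv


# Usage: siblings and children of the descriptor pass, everything else fails.
results = [is_safe_relative_path(p) for p in ("data/file.csv", "../x.csv", "/etc/passwd")]
```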

Obviously, for fully qualified URLs there is no simple way to exclude vulnerable paths. A basic starting point would be to disallow numeric IP addresses, localhost, etc., and/or, as you suggest, to limit to "self-contained" URLs only.

Obviously general points about security apply: run this code on a system with appropriate permissions and we could flag that.

I think some comments on this in the specs would definitely be valuable. If @iSnow you have suggestions feel free to make them or even open a PR.

Thanks again for flagging this.


@iSnow iSnow commented Nov 26, 2019

Thanks, @rufuspollock - I opened a PR as a first draft of a security abstract. It is not meant to be a canonical source, but I'd invite discussion of the best practices I outlined:

  • Implementers should provide the means for the library consumer to weigh security vs features by creating an extensible security interceptor that can allow or disallow all resource loading. Further, implementers should provide at least default implementations that allow all/block all URLs.

  • Users get a threat-breakdown and should choose their level of security.

I hope I got the threat-matrix right, but others should think this through; security topics are notoriously hard to nail down on the first try.
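The interceptor idea from the list above might look something like this in a Python implementation. This is a sketch only: the names (`ResourcePolicy`, `allow_all_urls`, `block_all_urls`, `filter_resources`) are hypothetical and do not come from the PR or from any existing library, and choosing the blocking policy as the default is an assumption.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlsplit

# A policy receives a resource location (URL or path) and returns True to allow it.
ResourcePolicy = Callable[[str], bool]


def allow_all_urls(location: str) -> bool:
    """Default policy 1: load every resource, local or remote."""
    return True


def block_all_urls(location: str) -> bool:
    """Default policy 2: allow only locations without a URL scheme."""
    return urlsplit(location).scheme == ""


def filter_resources(
    locations: Iterable[str], policy: ResourcePolicy = block_all_urls
) -> List[str]:
    """Return the locations the given policy permits the library to load."""
    return [loc for loc in locations if policy(loc)]


# Usage: with the blocking policy only the self-contained resource remains.
paths = ["data/a.csv", "https://example.com/b.csv"]
allowed = filter_resources(paths, block_all_urls)
```

Because a policy is just a callable, a library consumer could plug in a stricter or more permissive one of their own without changes to the library.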
