Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
Probable security issue through use of URLs in "url-or-path" #650
Similar to the attack vector described for paths here, I believe the use of URLs in Data Resources can lead to exfiltration of data about a network.
Consider the case of an API service that allows users to upload DataPackages and browse the contained resources' contents. For that, the service must download the contents from the URLs specified by the packages' Resources.
The attacker would upload a DataPackage containing a resource with URLs pointing to guessed IP numbers out of the non-routed ranges, eg.
and ask the service for the content of the Resource. The service would query 192.168.0.1 to fetch the index HTML, and if that server is running a HTTP-daemon, the API service would transfer the HTML back to the attacker. If the IP number is not used by a server, an error would be returned.
By slowly iterating the whole 10.x.x.x, 192.168.x.x and other ranges that are typically used for LAN's, the attacker could map out the LAN, and glean additional information about the servers by analyzing the exfiltrated HTML.
I am not sure about the real-world impact of this - on the one hand, a service typically would be running on either a hosted server not on a company's LAN or even on a container on a hosted server, which would blunt the mapping attack somewhat. On the other hand, it would be a probable attack for rogue employees/visitors of a data-science company that hosts data-pipelines for their data scientists and has a segmented network as part of a defense in depth. That segmentation would be weakened by such an attack as each employee with HTTP access to such a service could map out the segment the service resides in.
I don't think complete mitigation is possible. I believe the use of uncontrolled URLs is a fundamental weakness that allows all kinds of attacks (eg. a datapackage with thousands of URLs linking to very big Resource payloads would create an effective denial of service attack against either the API service or even the site hosting the payload files).
Ideally, a same-origin restriction should apply to Resource URLs to ensure site evilsite.com cannot DOS site filehoster.com via uploading packages to apiservice.com. This still would not prevent user-uploaded DataPackages to exfiltrate data.
Some incomplete protection could be achieved by blocking all Resource resolution for URLs pointing to IP numbers from the non-routable blocks.
I welcome feedback on this, maybe I am overlooking some points. Also, a strong warning for implementors should be in the DataResource specs and an equally strong warning in the user docs.
Personally, I believe it should be part of the spec that an implementing library has a switch that enables users of that library to disallow Resource-addressing via URLs. Only self-contained packages would be parsed.
@iSnow this kind of attack has similarities to that discussed and addressed in discussion of unix relative paths ...
Obviously, for fully qualified urls there is no simple way to exclude vulnerable paths. Basic starting point would be to disallow numeric IP address, localhost etc. And/or as you suggest to limit to "self-contained" URLs only.
Obviously general points about security apply: run this code on a system with appropriate permissions and we could flag that.
I think some comments on this in the specs would definitely be valuable. If @iSnow you have suggestions feel free to make them or even open a PR.
Thanks again for flagging this.
I hope I got the threat-matrix right but others should think this through, security topics are notoriously hard to nail down on the first try.