Project name sanitization #117
Comments
Percent-encoding the names should also prevent name collisions. One thing to note, however, before any development on this bug. Lastly, I don't think these are security vulnerabilities. I remember "scrapyd2" being referenced by one of the maintainers
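A minimal sketch of the percent-encoding idea, assuming Python's urllib.parse.quote with safe="" as the encoder. The helpers encode_project_name and decode_project_name are hypothetical, not Scrapyd functions. Because the mapping is reversible, names such as project! and project? stay distinct:

```python
from urllib.parse import quote, unquote


def encode_project_name(name: str) -> str:
    """Percent-encode every character outside the unreserved set.

    safe="" makes quote() encode "/" as well, so the result cannot
    contain a path separator. (Hypothetical helper.)
    """
    return quote(name, safe="")


def decode_project_name(encoded: str) -> str:
    """Invert encode_project_name(). (Hypothetical helper.)"""
    return unquote(encoded)


if __name__ == "__main__":
    for name in ["project!", "project?", "../etc/passwd", ".."]:
        print(f"{name!r:16} -> {encode_project_name(name)}")
```

Since "." is an unreserved character, a name such as .. passes through unchanged, so encoding alone does not rule out directory traversal.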
Hi @Digenis, I don't consider percent-encoding the right approach. When we insert the project name into the root page, we have to display it as it is (I would prefer an intermediary representation like …). I think they are vulnerabilities, since my scenario for the audit was someone using scrapyd with the spiders running in containers: the issues related to Python code execution are confined to the container, so he could use scrapyd "safely".
The base64 alphabet includes "/".
The example with the project name containing HTML
Edit:
diff --git a/scrapyd/website.py b/scrapyd/website.py
index fde1468..4bfa5c5 100644
--- a/scrapyd/website.py
+++ b/scrapyd/website.py
@@ -4,2 +4,4 @@ import socket
+from xml.sax.saxutils import escape as escape_html
+
from twisted.web import resource, static
@@ -67,3 +69,4 @@ class Home(resource.Resource):
vars = {
- 'projects': ', '.join(self.root.scheduler.list_projects()),
+ 'projects': ', '.join(
+ map(escape_html, self.root.scheduler.list_projects())),
}
@@ -122,5 +125,5 @@ class Jobs(resource.Resource):
s += "<tr>"
- s += "<td>%s</td>" % project
- s += "<td>%s</td>" % str(m['name'])
- s += "<td>%s</td>" % str(m['_job'])
+ s += "<td>%s</td>" % escape_html(project)
+ s += "<td>%s</td>" % escape_html(str(m['name']))
+ s += "<td>%s</td>" % escape_html(str(m['_job']))
s += "</tr>"
@@ -130,3 +133,3 @@ class Jobs(resource.Resource):
for a in ['project', 'spider', 'job', 'pid']:
- s += "<td>%s</td>" % getattr(p, a)
+ s += "<td>%s</td>" % escape_html(str(getattr(p, a)))
s += "<td>%s</td>" % (datetime.now() - p.start_time)
@@ -140,3 +143,3 @@ class Jobs(resource.Resource):
for a in ['project', 'spider', 'job']:
- s += "<td>%s</td>" % getattr(p, a)
+ s += "<td>%s</td>" % escape_html(getattr(p, a))
s += "<td></td>"
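The patch relies on xml.sax.saxutils.escape from the standard library. A short demonstration of what it does to a project name containing HTML, and of one caveat: it only accepts strings, so if a column holds a non-string value (an integer process ID, for instance) it needs str() before escaping. The sample name below is hypothetical:

```python
from xml.sax.saxutils import escape as escape_html

# Hypothetical project name submitted through the API.
name = "<script>alert(1)</script>"

# escape() replaces &, < and > with entities, so a browser renders the
# name as text inside a <td> instead of parsing it as markup.
escaped = escape_html(name)
print(escaped)

# escape() calls str.replace() internally, so it only accepts strings.
# An integer such as a process ID must be converted with str() first.
pid = 4242
try:
    escape_html(pid)
except AttributeError as exc:
    print("escape_html(int) fails:", exc)
print(escape_html(str(pid)))
```

escape() is enough for element content such as table cells; for attribute values, html.escape (which also escapes quotes by default) or xml.sax.saxutils.quoteattr is the safer choice.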
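On the base64 point in the comment above: with the standard alphabet, a name ending in "?" after a multiple-of-three prefix encodes to a string ending in "/", which a filesystem treats as a separator. The URL- and filename-safe alphabet swaps "+" and "/" for "-" and "_". The project name here is hypothetical:

```python
import base64

name = "project_?"  # hypothetical project name
raw = name.encode("utf-8")

# Standard alphabet: A-Z a-z 0-9 + /
standard = base64.b64encode(raw).decode("ascii")
# URL- and filename-safe alphabet: "-" and "_" replace "+" and "/"
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # contains "/", unusable as a single path component
print(urlsafe)   # same encoding with "_" in place of "/"
```

Both alphabets mix upper and lower case, so on a case-insensitive filesystem two distinct encodings can still map to the same file name.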
I forgot to consider case-insensitive filesystems.
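A case-insensitive filesystem (the default on Windows and on macOS) stores "Project" and "project" in the same directory, so even a lossless encoding can collide there. A minimal check under that assumption; case_collisions is a hypothetical helper, not part of Scrapyd:

```python
def case_collisions(existing: list[str], new_name: str) -> list[str]:
    """Return existing names that match new_name when case is ignored.

    str.casefold() approximates the case-insensitive comparison done by
    such filesystems; a given filesystem's folding rules may differ for
    some non-ASCII characters. (Hypothetical helper.)
    """
    key = new_name.casefold()
    return [name for name in existing if name != new_name and name.casefold() == key]


print(case_collisions(["Project", "other"], "project"))
```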
Since we are getting close to 1.2
Moving to 1.3.0 to at least add deprecation warnings for project names with illegal characters.
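A sketch of such a deprecation warning, assuming names are checked against an allow-list of letters, digits, "_" and "-". The thread does not define which characters are illegal, so LEGAL_NAME and check_project_name are hypothetical:

```python
import re
import warnings

# Hypothetical allow-list; the actual set of "illegal" characters is an assumption.
LEGAL_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_project_name(name: str) -> str:
    """Warn, rather than fail, when a project name falls outside LEGAL_NAME."""
    if not LEGAL_NAME.fullmatch(name):
        warnings.warn(
            f"Project name {name!r} contains characters that may be rejected "
            "in a future release",
            DeprecationWarning,
            stacklevel=2,
        )
    return name
```

Python hides DeprecationWarning by default outside __main__, so a long-running server may prefer to log the message or use FutureWarning, which is shown by default.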
I went through the blog post at #518. Note: anyone with access to Scrapyd's API can run arbitrary code on the server, via addversion.json and schedule.json. In any case, the issues are concentrated in:
For the first two, the concern is escaping the configured base directory (eggs_dir, logs_dir, items_dir) via the project, spider, version or job API parameters. For that, we can use the typical defense against directory traversal: resolve the paths, then check that the result shares a common prefix with the base directory. If the check fails, we raise an error, which Scrapyd renders as 200 OK with a JSON message. If a user currently has a project, spider, etc. whose name contains two consecutive dots ".." between path separators, they are already encountering bugs, so this change makes no difference to them. With this approach, a parameter containing "/" or "\" can still descend the directory tree, but that is not a security issue. I suspect it would cause bugs, but I don't think any users actually put path separators in parameters, so protecting users from those bugs is not a priority. I don't think any other characters would cause bugs. So we don't need to sanitize the project, spider, etc. parameters in the API (which would be backwards-incompatible), since eggstorage and environ will raise an error before anything insecure happens. Users might have configured their own filesystem-based eggstorage, so adding these checks to the webservice could add a layer of protection in case they didn't consider these issues. However, as I said at the start, Scrapyd is already a way to run arbitrary code, so this protection for user code is a bit paranoid. I fixed these issues in the commits mentioned in #518, so I'm closing this.
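The resolve-then-compare check described in this comment can be sketched in Python as follows. The names safe_path and DirectoryTraversalError are hypothetical, chosen for illustration; this is not necessarily how the commits referenced in #518 implement it:

```python
import os


class DirectoryTraversalError(ValueError):
    """Raised when a path built from API parameters leaves its base directory."""


def safe_path(basedir: str, *parts: str) -> str:
    """Join parts onto basedir and refuse any result outside basedir.

    Both paths are resolved with os.path.realpath(), which collapses ".."
    components and follows symlinks, before they are compared.
    """
    base = os.path.realpath(basedir)
    candidate = os.path.realpath(os.path.join(base, *parts))
    # commonpath() compares whole path components, so "/data/eggs2" is not
    # treated as being inside "/data/eggs" (a pitfall of str.startswith()).
    if os.path.commonpath([base, candidate]) != base:
        raise DirectoryTraversalError(f"{parts!r} escapes {base!r}")
    return candidate
```

As the comment notes, a part containing "/" still descends into a subdirectory: safe_path(base, "a/b") is accepted, and only results outside the base are rejected. Subclassing ValueError lets callers that already catch ValueError handle it; os.path.commonpath itself raises ValueError for paths on different Windows drives.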
As described here (points 1, 2, 3 and 5), there are some security issues related to the project name. I've thought of a fix that sanitizes the project name value using the same logic as for the variable version. It would have to apply to every method working with the project name in FilesystemEggStorage, to keep adding and then getting projects consistent. It has the side effect that two projects with uncommon characters, like project! and project?, will share the same project name, project_. Does someone see a better solution?
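The collision described in the issue can be reproduced with a short sketch. The pattern [^A-Za-z0-9_-] is an assumption about what "the same logic as for version" means, and sanitize is a hypothetical helper:

```python
import re


def sanitize(name: str) -> str:
    """Replace every character outside [A-Za-z0-9_-] with "_".

    The exact pattern is an assumption; it stands in for the
    version-style sanitization proposed in the issue.
    """
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)


names = ["project!", "project?", "project_"]
print({name: sanitize(name) for name in names})
```

A project legitimately named project_ collides with both. On the other hand, the substitution turns "/" and "." into "_", so ../x becomes ___x and cannot leave the storage directory.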