Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Cleanup file uploads after request #871

Closed
Qard opened this Issue · 15 comments

5 participants

@Qard

I have an express app using bodyParser to parse forms and handle uploads. That's all well and good, but when a file is uploaded formidable creates a temp file that never gets cleaned up. Sure, I can manually iterate the req.files object and remove the files myself when I'm done with them. But what if someone tries to upload a file to a route that I'm not expecting uploads on? The bodyParser middleware will happily accept and store it, whether I want it or not, and my app will be unaware this is happening. Someone could easily exploit that to fill up your disk and kill your server. My own server just got killed by 5GB of unexpected files for this very reason.

There should probably be a res.on('end') event attached to remove all the files listed in req.files after the current request completes, or something like that. Actually, this could be a case for after-response hooks.

I recognize that is somewhat opinionated behavior though. Some apps may be expecting those files to still be there after the request completes, such as video processing. Probably not a good idea to make a change like that in a minor revision...

Thoughts?

@tj
Owner
tj commented

there's some discussion here strongloop/express#1673
we could optionally clean up after, but you'll need some sort of reaper in any production scenario in case node goes down etc they'll still accumulate. As far as the size goes there's the limit option

@rmg

Even just a mention about this in the docs would go a long way.

@Siyfion

Yeah, I think it's pretty important that we get this issue mentioned in the docs, even if only as a warning (even if the following is implemented).

An option to autoremove or similar would also be a worthwhile addition in my mind; especially if it was on by default. I imagine there are a lot of servers out there with this as an exploitable issue; while this wouldn't cure the issue, it would certainly help.

@jasonhargrove

I like @Siyfion 's solution of taking control back from bodyParser. For all routes:

app.use(connect.json());
app.use(connect.urlencoded());

and for file handling manually using:

connect.multipart();

Src: strongloop/express#1673 (comment)

I'm also debating working with Formidable directly.

And for general cleanup @visionmedia 's reaper project sounds great > https://github.com/visionmedia/reap

@Qard

Yeah, that seems like a sufficient fix. There should definitely be some more explicit documentation of the security concerns of using bodyParser globally though.

@tj
Owner
tj commented

you generally do this sort of thing for any web server out there, tmpfiles can't just sit around forever :D

@Qard

Obviously you should cleanup temp files from routes you expect to get them from. I think that, because express tends to train you into using middleware in the configure block, it's a good idea to explicitly state when a middleware probably shouldn't be used that way.

Also, It seems weird that someone that only put in bodyParser() for basic forms, and not uploads, would need to run something like reap to empty a directory they'd never expect to be dumping things in. Sure, using bodyParser() is probably bad practice, but someone new to express won't know that unless you tell them.

What if req.files was a getter? When accessed, it returns the files object and set a flag that it has been accessed. After the request ends, if the property was never accessed but files were uploaded, delete them.

@jasonhargrove

Definitely, but the two issues are separate though. The concern is that bodyParser may be doing a little too much, a little too quietly.

@tj
Owner
tj commented

bodyParser is all three for legacy reasons, no changing that until the next major release. Maybe what we do is go ahead and add that cleanup option, default it to true, it'll break a few apps probably but hopefully they read changelogs. Either way if that fails, or if node crashes, you will need tmpfile gc.

@jasonhargrove

Sorry about the duplicate. Just wanted to get the keywords there and emphasize the risk.

I like how you've used console to warn about deprecated behaviors. That, some docs, and some auto cleanup may be enough to restore balance to the force.

@Siyfion

@visionmedia I think you're right; a cleanup option that's defaulted to true would probably be a decent step in the right direction.

@rmg

If you add cleanup with a default of false, it won't break anything and you can release it in a minor version or patch bump. Then on the the next API breaking release, you can change the default to true and group it with the other breaking changes.

@tj
Owner
tj commented

I'd be fine with doing a 3.0.0, but most people expect lots of changes from a major release

@Qard

You could just update the docs to warn against using bodyParser() globally for now. Then, whenever the next version is released, include something to cleanup old files.

I think making req.files a getter should be a safe solution. Just keep a flag of whether or not req.files has been accessed. If it hasn't be used by the time the response is sent, assume it is now safe to delete the files. You could possibly also include a timeout threshold to delay the cleanup by X milliseconds after the response is sent.

Those ideas are really more band-aid fixes, the real issue is more documentation related.

@tj
Owner
tj commented

https://groups.google.com/forum/#!topic/express-js/iP2VyhkypHo

for now I didn't go with a cleanup option, I think it would be difficult to get right without adding some arbitrary timeouts, in which case you're better off using reap anyway. The "defer" option will probably become the default in 3.0 when we can do a break

@tj tj closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.