Allow local hosting of files (rather than S3) #3

Closed
jpallen opened this Issue Feb 21, 2014 · 24 comments
@jpallen
jpallen commented Feb 21, 2014

All S3 logic is contained in app/coffee/s3wrapper.coffee, so it should be possible to swap this module out for another with the same API that reads and writes from disk instead.
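
Roughly, a disk-backed drop-in could look like the sketch below (the function names, signatures, and paths are illustrative, not the actual s3wrapper.coffee API):

    fs = require "fs"
    Path = require "path"

    # Illustrative sketch: mirror whatever API s3wrapper.coffee exposes,
    # but read and write under a local root directory instead of S3.
    ROOT = "/var/lib/sharelatex/files"   # made-up default location

    module.exports =
        sendFile: (bucket, key, sourcePath, callback) ->
            dest = Path.join(ROOT, bucket, key)
            read = fs.createReadStream(sourcePath)
            write = fs.createWriteStream(dest)
            write.on "error", callback
            write.on "finish", -> callback(null)
            read.pipe(write)

        getFileStream: (bucket, key, callback) ->
            callback null, fs.createReadStream(Path.join(ROOT, bucket, key))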

@jpallen jpallen added the enhancement label Feb 21, 2014
@adarshaj

👍

@henryoswald

There is this offering from Riak; I have never used it: http://basho.com/riak-cloud-storage/

Another option I know even less about is MongoDB's GridFS.

@thebookworm101

Another alternative storage solution would be Swift <-- I have minimal, untested knowledge from implementing similar stuff.

@coreyp1
coreyp1 commented Feb 21, 2014

I have had good success with DreamHost DreamObjects, and wrote the Drupal integration for it based on the Drupal Amazon S3 integration (no, I'm not affiliated with DreamHost at all). Since DreamHost is so much cheaper than Amazon, and it supports the S3 protocols (switching is almost as simple as changing the URL), it is much preferable for me to use DreamHost over Amazon.
http://www.dreamhost.com/cloud/dreamobjects/

This should be easy to implement if only buildDefaultOptions allowed the URI to be customized.
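
For example (a sketch, not the current code - the settings keys are illustrative, and endpoint is the option knox documents for pointing at non-Amazon hosts):

    knox = require "knox"
    settings = require "settings-sharelatex"   # the app's settings module

    # Sketch: read the host from settings instead of hard-coding Amazon.
    buildClient = (bucketName) ->
        knox.createClient
            key:      settings.s3.key
            secret:   settings.s3.secret
            bucket:   bucketName
            endpoint: settings.s3.endpoint ? "s3.amazonaws.com"   # illustrative key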

@erasche
erasche commented Feb 21, 2014

Instead of writing a custom storage backend that writes to disk, why not make the URL configurable and suggest users provide their own S3-compatible backend? After some digging, I've come up with a few:

@jpallen
jpallen commented Feb 21, 2014

I think the simplicity of local file storage makes it a good idea - no config needed, and no other services to add in.

However, it's also an easy change to make the Amazon URL configurable, so I think we should do both.

@jpallen jpallen referenced this issue in sharelatex/sharelatex Feb 22, 2014
Closed

file does not compile #27

@tiagoboldt

+1. GridFS from Mongo would solve this issue, and the project depends on Mongo already. Integrating it using https://github.com/aheckmann/gridfs-stream should be trivial.
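
For reference, gridfs-stream's basic API is roughly as follows (db here is assumed to be an already-open connection from the native MongoDB driver):

    mongo = require "mongodb"
    Grid = require "gridfs-stream"
    fs = require "fs"

    gfs = Grid(db, mongo)   # db: an open native-driver connection (assumed)

    # write a local file into GridFS under a key
    fs.createReadStream("/tmp/example.tex")
        .pipe(gfs.createWriteStream(filename: "some_key"))

    # read it back as a stream
    readStream = gfs.createReadStream(filename: "some_key")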

@tiagoboldt tiagoboldt referenced this issue in sharelatex/sharelatex Feb 24, 2014
Closed

Do not depend on S3 #47

@cwoac
cwoac commented Feb 24, 2014

Started having a look at implementing direct filesystem access (using the bucketName as a root path). Currently the S3 unit tests are what's tripping me up the most, since they check that the knox calls are correct.
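
(For context: the tests stub knox with sinon and assert on the calls it receives, so a disk backend can't satisfy them as written. Roughly along these lines - a sketch, not the actual test code:)

    sinon = require "sinon"
    knox  = require "knox"

    # Sketch: replace knox.createClient with a fake client,
    # then assert the wrapper made the expected calls on it.
    fakeClient =
        putFile: sinon.stub().callsArgWith(2, null, statusCode: 200)
    sinon.stub(knox, "createClient").returns(fakeClient)

    # ... exercise the wrapper under test, then e.g.:
    # fakeClient.putFile.calledWith(localPath, key).should.equal true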

@alfredosegundo

Same here @cwoac
The most difficult task for me is implementing the tests.

@henryoswald

@cwoac @alfredosegundo are you adding filesystem writing into the S3 wrapper? Ideally this will be a totally different file which just has functions with the same interface. We could then add another level of abstraction like a "filePersister" (for want of a better name). Then we can do something like the following:

settings = require("settings-sharelatex")  # the app's settings module

if settings.filePersistorType == "s3"
    persistor = require("./s3Wrapper")
else
    persistor = require("./localFileWrapper")

module.exports = persistor

opinions welcome!

@cwoac don't be afraid to add another value to settings, localFileStoragePath:"/tmp/was/a/bad/idea/"

@cwoac
cwoac commented Feb 25, 2014

@henryoswald I coded up a version that used that approach (a switch rather than a conditional, but hey), but I figured that was a bit nasty and have been trying to do it via a mixin instead. So you require fsWrapper and that sets its functions to those of the correct wrapper.

I've got that working, although the unit tests bypass it and sinon-wrap the backends directly. I attempted to create a testBackend, but can't see a decent way to override the real config setting from within the module. Actually, that gives me an idea...
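
The mixin version looks roughly like this (a sketch; the setting and file names are illustrative):

    # fsWrapper.coffee (sketch): pick a backend and copy its
    # functions onto this module's exports.
    settings = require "settings-sharelatex"   # assumed settings module

    backend = if settings.filestoreBackend == "s3"   # illustrative setting name
        require "./s3Wrapper"
    else
        require "./FSPersistorManager"

    module.exports[name] = fn for name, fn of backend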

@henryoswald

@cwoac if you want a hand, send me a reference to the code and I can comment away.

@jpallen
jpallen commented Feb 25, 2014

Also, as a side note, we try to design our code structures so that they are easy to unit test. It's one of the main factors in how ShareLaTeX is coded (all flat files, no classes, simple short functions, etc).

@jpallen jpallen referenced this issue in sharelatex/sharelatex Feb 25, 2014
Closed

Alternative to Amazon S3 #50

@cwoac
cwoac commented Feb 25, 2014

I believe I have the backend-switching support working (the one thing I lack is an actual S3 key setup to test with, but the mixin certainly works for test calls). I'll raise a pull request, as it is probably the easiest way to discuss the actual code.

Once we've got this sorted, I can start looking in earnest at the filesystem backend.

@cwoac
cwoac commented Feb 26, 2014

Right, so I've created a branch and built basic unit tests for FSPersistorManager here: 3efb4c8

Beyond the fact that the unit tests pass, I have done NO testing on this whatsoever, and I have no doubt that it will not work.

In particular:
1. I assume that keys can always be used as filenames (i.e. no subdirectories in the key names).
2. I've yet to look at who generates the keys, and how.
3. It uses the S3 settings for bucket names to give the directories where it will store data (I could move that into a setting and ignore the first parameter passed to all the functions, but this seemed nasty, plus it would remove the option to store user and template files in separate locations).
4. There is no checking that the target directory exists and is writable, etc. (a minimal version of that check is sketched after this list).
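
For point 4, the missing check would be something like this (a sketch; the helper name is made up):

    fs = require "fs"

    # create the storage directory if needed; treat "already exists" as success
    ensureDirExists = (dirPath, callback) ->
        fs.mkdir dirPath, (err) ->
            if err? and err.code != "EEXIST"
                return callback(err)
            callback(null)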

Having said all that, it probably isn't going to require that much work to get it working. I'll have a chance later (today, hopefully) to actually try integrating it into a docker setup I've built for it.

@jpallen
jpallen commented Feb 26, 2014

Great work @cwoac! I think it's safe to assume keys can be used as filenames, since they are just the 24-character ObjectIds used by MongoDB. Can @henryoswald confirm? This is mostly his domain atm.

I think using the S3 bucket name settings is a good idea. If we set these up with defaults then users running the FS backend won't even need to know they exist, and we can just have separate directories called something like 'user_files' and 'templates'.
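
i.e. something like this in the settings file (a sketch; the key names are illustrative, not final):

    module.exports =
        filestore:
            backend: "fs"   # or "s3"
            stores:
                user_files: "/var/lib/sharelatex/user_files"
                templates:  "/var/lib/sharelatex/templates"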

What sort of Docker setup have you got? It sounds interesting, if you're able to share.

@pablomendes

Just came here to +1 this. 👍 Will instructions on how to run be posted here? I'm eager to try it out.

@henryoswald

Yeah, the Mongo IDs are safe to use; it's also nice to use them for debugging.

@cwoac
cwoac commented Feb 27, 2014

I'm running Redis and MongoDB each in their own (rather simple) containers, mapping the db directories to a dir on the host and sharing the ports with the container that will run ShareLaTeX. I've not got that up and running yet; I dived straight into this instead.

Having taken a bit of a look at the master configuration setup now, I reckon I can split each module off into separately running Docker instances. I'm not 100% sure it would gain me much on deployment, but it would certainly help spin-up/spin-down for development.

If we make the settings universal, it's probably an idea to rename them away from s3. I'll shuffle them around to make them more sensible at some point.

@cwoac
cwoac commented Feb 27, 2014

So, it seems the keys can't be used directly as filenames - I'm not sure if it is bucketing them or if they just have a slash in the name.

[2014-02-27T18:27:43.302Z]  INFO: filestore/23013 on albion: reciving request to get file (key=530f2407e7ef165704000007/530f838b46d9a9e859000008, bucket=/tmp/storelatex)
[2014-02-27T18:27:43.303Z]  INFO: filestore/23013 on albion: getting file (bucket=/tmp/storelatex, key=530f2407e7ef165704000007/530f838b46d9a9e859000008, opts={})

@henryoswald

Yeah, keys will have a slash in them as they are project_id/file_id; you can do:

    key = key.replace(/\//g, "-")

@cwoac
cwoac commented Feb 27, 2014

Yeah, I've already pulled it out into a helper function (roughly as sketched below).
Shockingly, apart from that, it appears to work, at least on initial inspection. More testing is required before a pull request, though.
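
A sketch of such a helper (the actual one may differ):

    # map an S3-style key (project_id/file_id) to a flat, safe filename
    keyToFilename = (key) ->
        key.replace(/\//g, "-")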

@jpallen
jpallen commented Mar 5, 2014

This is now possible thanks to the great work by @cwoac. Local file storage is the default, so ShareLaTeX should just run out of the box with no external config settings needed.

@jpallen jpallen closed this Mar 5, 2014
@MaximeCheramy

I've just tested it; the config is easy and it works. Thanks!
