feat: implement localLRUFSLayer #126
Conversation
server/storage/fs_localCache.go
Outdated
}

if !fs.cache.SetWithTTL(name, &localCachedFSValue{name: name, modtime: time.Now()}, written, fs.TTL) {
	fs.remove(name)
I really don't like having to do this, but I felt worse reading the data just to get the size so I could set it in the cache before doing the copy. It's all very weird, but if we had the size as input I would feel much better, and to be honest, I think we do: all the consumers of this function are working with `Buffer` and `[]byte`, so they all have a size. I could leave this method just to future-proof the case of not having a size; we'll see.
Ok, the latest change adds a `WriteData` method which is complementary to `WriteFile`. Though since all of this is specific to esm.sh, I'm close to just replacing `WriteFile` with the `WriteData` signature.
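For reference, a minimal sketch of the two shapes being discussed; the names and signatures here are my assumptions for illustration, not necessarily the exact esm.sh interface:

```go
package storage

import "io"

// Sketch only: names and signatures are assumptions, not the actual esm.sh API.
//
// WriteFile takes an io.Reader, so the size is only known after the copy is
// done, which forces the cache cost to be set after the data is already on disk.
// WriteData takes the bytes directly, so len(data) is available up front and
// can be used as the cache cost before anything is written.
type fsLayerSketch interface {
	WriteFile(name string, content io.Reader) (written int64, err error)
	WriteData(name string, data []byte) error
}
```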
`WriteData` makes sense if we store esm.sh builds in shared space. The official esm.sh server uses an NFS disk as the storage, which can be extended dynamically and shared across multiple machines.
I'm still mentally going a little back and forth on just removing / replacing the current `WriteFile` signature with `WriteData`, though I'm probably overthinking it.
thanks @jimisaacs, super! I will take a look later.
btw, I just released v49, which includes your previous amazing work! thanks a lot 🙏
Just putting these thoughts down. I think the ideal here would be storage with a local cache backed by S3, plus a DB that is centralized and scalable, but I'm not sure I'm going to include all of that in this PR. I was just trying to stay focused on keeping the disk managed.
It would be great if you could implement the S3 FS layer, and it's ok for me to accept a large PR.
@ije ok, please have a look at this again when you get a chance.
@ije one thing I'm worried about here is whether the default should be LRU or not. I think it should be for the general open-source case, but not for the official server. I think it should remain on
yes, let's keep the
Done
@@ -0,0 +1,23 @@
package storage
I love this instead of passing options in `Open`, great job 👍
I saw the LRU item on the roadmap, and I was already implementing this for my use case, so I figured I'd attempt to share. It uses https://github.com/dgraph-io/ristretto as a caching layer built on top of the local file system.
Basically the implementation is a naive('ish), layered, hydrated approach. There's definitely room for improvement, but this seems to at least cap the local storage without much overhead.
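Roughly, each cached file becomes a ristretto entry whose cost is its size in bytes, so the cache's MaxCost acts as the cap on local disk usage. A minimal sketch of that idea, with the cap, key, and sizes assumed purely for illustration (the real configuration lives in the PR):

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	// Cost is measured in bytes on disk, so MaxCost caps total local storage.
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e6,     // keys to track access frequency for
		MaxCost:     1 << 30, // ~1 GiB of cached build output (assumed cap)
		BufferItems: 64,      // ristretto's recommended default
	})
	if err != nil {
		log.Fatal(err)
	}

	// After writing a build artifact to disk, record it with its size as the
	// cost. If ristretto rejects the entry, the file should be removed again
	// so the disk stays bounded (which is what the earlier diff hunk does).
	// Evicted entries also need their files deleted; ristretto exposes an
	// eviction hook for that, omitted here.
	written := int64(4096) // size of the file just written (assumed)
	if !cache.SetWithTTL("/builds/example.js", time.Now(), written, 24*time.Hour) {
		fmt.Println("entry rejected; remove the file from disk")
	}
	cache.Wait()

	if modtime, ok := cache.Get("/builds/example.js"); ok {
		fmt.Println("cached, modtime:", modtime)
	}
}
```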
Currently this is not trying to improve performance that much. I do have some thoughts there, like exploring a multi-layered cache with this approach, where the top layer is in-memory, the next layer is local files, and the next layer is a remote cache. If there is a cache miss on the top layer, it would check the local disk; if that's a miss it could go to the remote cache; and if that's a miss it would run the build. A hit in any lower layer would result in the value being saved to the higher layers.
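A rough sketch of that lookup order; the Layer interface and build hook are hypothetical names for illustration, not code from this PR:

```go
package cache

// Layer is a hypothetical read-through cache layer (in-memory, local disk, remote).
type Layer interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// readThrough checks each layer in order; on a hit in a lower layer (or after
// running the build), it hydrates every layer above before returning.
func readThrough(layers []Layer, key string, build func(string) ([]byte, error)) ([]byte, error) {
	for i, layer := range layers {
		if value, ok := layer.Get(key); ok {
			for j := 0; j < i; j++ {
				layers[j].Set(key, value) // save to the higher layers
			}
			return value, nil
		}
	}
	value, err := build(key) // every layer missed: run the build
	if err != nil {
		return nil, err
	}
	for _, layer := range layers {
		layer.Set(key, value)
	}
	return value, nil
}
```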
Even if you do not like this approach, I figured it would be good to iterate on this. There are soooo many competing golang virtual file system approaches, and even more caching approaches. I got lost for a little bit in all this until I settled on jumping in and just going this route.
Something you will notice is that I changed the public interfaces a bit, because I wanted a shared logger and the `isDev` flag value down in the cache layer.

Another big change is that I switched where `modtime` comes from. Instead of coming from `ReadFile`, it now comes from `Exists`. This is because the latter was already doing a stat, so the `modtime` is essentially free there, and this removes one extra stat from all the `ReadFile` calls, since everywhere `ReadFile` is called, it follows an `Exists` call. This came with the added advantage of working well with the cache layering approach.
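For clarity, a rough sketch of the reshuffled read path; the signatures are my paraphrase of the change, not necessarily the exact ones in the diff:

```go
package storage

import (
	"io"
	"time"
)

// Exists already has to stat the file, so it can return the modtime for free.
// ReadFile then only opens the file, with no second stat, since every caller
// checks Exists first.
type fsReadSketch interface {
	Exists(name string) (found bool, modtime time.Time, err error)
	ReadFile(name string) (content io.ReadSeekCloser, err error)
}
```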