
feat: implement localLRUFSLayer #126

Merged: 7 commits into esm-dev:master on Sep 13, 2021
Conversation

@jimisaacs (Collaborator) commented Sep 9, 2021

I saw the LRU item on the roadmap, and I was already implementing this for my use case, so I figured I'd attempt to share it. It uses https://github.com/dgraph-io/ristretto as a caching layer built on top of the local file system.

Basically the implementation is a naive(ish), layered, hydrated approach. There's definitely room for improvement, but it seems to at least cap the local storage without much overhead.
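For context, the capping works by treating the number of bytes written as the entry's cost in ristretto, so MaxCost bounds the total bytes kept locally. A minimal sketch of that idea (illustrative keys, sizes, and limits, not the exact code in this PR):

```go
package main

import (
	"fmt"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	// Illustrative only: the cost of each entry is the size of the file on
	// disk, so MaxCost effectively caps how much the local layer can hold.
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e6,     // number of keys to track access frequency for
		MaxCost:     1 << 30, // cap cached build output at ~1 GiB
		BufferItems: 64,      // ristretto's recommended default
	})
	if err != nil {
		panic(err)
	}

	name := "some/build/output.js" // hypothetical cache key
	var written int64 = 12345      // bytes just written to the local FS

	// Admission is best-effort: when the entry is not accepted, the caller
	// can remove the file it just wrote (see the review thread below).
	if !cache.SetWithTTL(name, name, written, 24*time.Hour) {
		fmt.Println("entry not admitted, removing", name)
	}
	cache.Wait() // Set is async; Wait flushes the internal buffers

	if _, ok := cache.Get(name); ok {
		fmt.Println("cached:", name)
	}
}
```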

Currently this isn't trying to improve performance all that much. I do have some thoughts there, like exploring a multi-layered cache with this approach, where the top layer is in-memory, the next layer is local files, and the next layer is a remote cache. On a cache miss in the top layer it would check the local disk; if that's a miss it could go to the remote cache; and if that's a miss it would run the build. A hit in any lower layer would result in the value being saved back into the layers above it.
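As a rough sketch of that layered lookup (the types and names here are hypothetical, just to show the hydration flow, not anything in this PR):

```go
package storage

// Hypothetical types for this sketch; not the PR's actual interfaces.
type byteCache interface {
	Get(name string) ([]byte, bool)
	Set(name string, data []byte)
}

type fsLayer interface {
	ReadFile(name string) ([]byte, error)
	WriteData(name string, data []byte) error
}

type layeredFS struct {
	memCache byteCache // top layer: in-memory
	diskFS   fsLayer   // next layer: local files
	remoteFS fsLayer   // next layer: remote cache
	runBuild func(name string) ([]byte, error)
}

// Get walks the layers top-down; a hit below the top hydrates the layers above it.
func (l *layeredFS) Get(name string) ([]byte, error) {
	if data, ok := l.memCache.Get(name); ok {
		return data, nil
	}
	if data, err := l.diskFS.ReadFile(name); err == nil {
		l.memCache.Set(name, data)
		return data, nil
	}
	if data, err := l.remoteFS.ReadFile(name); err == nil {
		l.diskFS.WriteData(name, data)
		l.memCache.Set(name, data)
		return data, nil
	}
	// miss everywhere: run the build and populate every layer
	data, err := l.runBuild(name)
	if err != nil {
		return nil, err
	}
	l.remoteFS.WriteData(name, data)
	l.diskFS.WriteData(name, data)
	l.memCache.Set(name, data)
	return data, nil
}
```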

Even if you don't like this approach, I figured it would be good to have something to iterate on. There are soooo many competing golang virtual file system approaches, and even more caching approaches. I got lost in all of it for a little bit until I settled on just jumping in and going this route.


Something you will notice is that I changed the public interfaces a bit, because I wanted the shared logger and the isDev flag value available down in the cache layer.

Another big change is that I switched where modtime comes from. Instead of coming from ReadFile it now comes from Exists. The latter was already doing a stat, so the modtime is essentially free there, and this removes an extra stat from every ReadFile call, since everywhere it is called it follows an Exists call. This came with the added advantage of working well with the cache layering approach.
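In other words, the shape of the interface change is roughly this (a sketch of the idea only; the exact signatures in the PR may differ):

```go
package storage

import (
	"io"
	"time"
)

// Rough shape of the change, not the exact signatures in the PR.
type FSLayer interface {
	// Exists already performs a stat, so modtime comes back for free here.
	Exists(name string) (found bool, modtime time.Time, err error)
	// ReadFile no longer needs an extra stat of its own.
	ReadFile(name string) (io.ReadSeekCloser, error)
	// WriteFile copies content from a reader (see the WriteData discussion below).
	WriteFile(name string, content io.Reader) (written int64, err error)
}
```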

// if the LRU cache does not admit the entry, remove the file that was just written
if !fs.cache.SetWithTTL(name, &localCachedFSValue{name: name, modtime: time.Now()}, written, fs.TTL) {
	fs.remove(name)
}
@jimisaacs (Collaborator, Author) commented:

I really don't like having to do this, but reading the data just to get the size, so that I could set it in the cache before doing the copy, felt worse. It's all a bit awkward, but if we had the size as an input I would feel much better, and to be honest, I think we do: all the consumers of this function are working with Buffer and []byte, so they all have a size. I could keep this method just to future-proof the case of not having a size; we'll see.

@jimisaacs (Collaborator, Author) commented:

OK, the latest change adds a WriteData method that is complementary to WriteFile. Though since all of this is specific to esm.sh, I'm close to just replacing WriteFile with the WriteData signature.
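To make the difference between the two write paths concrete, here's a rough sketch (illustrative code, not what's in the PR): with WriteData the size is known before anything touches the cache, while WriteFile only knows it after the copy.

```go
package storage

import (
	"io"
	"os"
	"path/filepath"
)

// Illustrative sketch of the two write paths; not the exact code in the PR.

// writeData receives the bytes directly, so the size (the LRU cost) is known
// up front and can be handed to the cache before anything is copied.
func writeData(root, name string, data []byte) error {
	p := filepath.Join(root, name)
	if err := os.MkdirAll(filepath.Dir(p), 0755); err != nil {
		return err
	}
	return os.WriteFile(p, data, 0644)
}

// writeFile copies from a reader; the size is only known after the copy, which
// is why the cache entry has to be set (or rolled back) afterwards.
func writeFile(root, name string, content io.Reader) (int64, error) {
	p := filepath.Join(root, name)
	if err := os.MkdirAll(filepath.Dir(p), 0755); err != nil {
		return 0, err
	}
	f, err := os.Create(p)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(f, content)
}
```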

@ije (Member) commented:

WriteData makes sense if we store esm.sh builds in shared space. The official esm.sh server uses an NFS disk as the storage, which can be extended dynamically and shared across multiple machines.

@jimisaacs (Collaborator, Author) commented:

I'm still mentally going a little back and forth on just removing / replacing the current WriteFile signature with WriteData, though I'm probably overthinking it.

@ije (Member) commented Sep 10, 2021

Thanks @jimisaacs, super! I will take a look later.

@ije (Member) commented Sep 10, 2021

BTW, I just released v49, which includes your previous amazing work! Thanks a lot 🙏

@jimisaacs (Collaborator, Author) commented:

Just putting these thoughts down: I think the ideal here would be storage with a local cache backed by S3, with a DB that is centralized and scalable, but I'm not sure I'm going to include all of that in this PR. I was just trying to stay focused on keeping the disk managed.

Review thread on server/storage/fs_local.go (outdated, resolved)
@ije (Member) commented Sep 10, 2021

> Just putting these thoughts down: I think the ideal here would be storage with a local cache backed by S3, with a DB that is centralized and scalable, but I'm not sure I'm going to include all of that in this PR. I was just trying to stay focused on keeping the disk managed.

It would be great if you could implement the S3 FS layer, and it's OK for me to accept a large PR.

@jimisaacs changed the title from "feat: implement localCacheFSLayer" to "feat: implement localLRUFSLayer" on Sep 12, 2021
@jimisaacs (Collaborator, Author) commented:

@ije ok, please have a look at this again when you get a chance

@jimisaacs (Collaborator, Author) commented:

@ije one thing I'm worried about here is whether the default should be LRU or not. I think it should be for the general open-source case, but not for the official server; there I think it should remain on the plain local FS, and you should decide how to handle a transition administratively. Let me know your thoughts.

@ije (Member) commented Sep 12, 2021

yes, let's keep the local fs!

@jimisaacs (Collaborator, Author) commented:

> yes, let's keep the local fs!

Done

@@ -0,0 +1,23 @@
package storage
@ije (Member) commented:

I love this instead of passing options in Open, great job 👍

@ije merged commit 55de0b1 into esm-dev:master on Sep 13, 2021