
Does anyone want/need an S3/Azure blob cache? #37

Open
lilith opened this issue Dec 6, 2020 · 16 comments

@lilith
Member

lilith commented Dec 6, 2020

No description provided.

@ajbeaven

ajbeaven commented Dec 8, 2020

This would be great, assuming latency is within an acceptable range. My Imageflow cache currently sits on disk in an Azure App Service, but as far as I know this sort of setup is not recommended by Microsoft.

@JayVDZ

JayVDZ commented Dec 8, 2020

Possibly. An Azure/AWS CDN would be my go-to for caching in the first instance, falling back to internal caching (i.e. disk caching). But since Azure doesn't recommend the use of local drives, as @ajbeaven says, it might make sense for Azure deployments to be able to store long-lived cached copies of resized images.

I wouldn't bother, though, if it was going to be time-consuming to develop or maintain.

@lilith
Member Author

lilith commented Dec 10, 2020

Since Azure and S3 offer object expiry, I could rely on developers to configure cache expiry and just implement cache hits and misses. That would be fairly straightforward to develop.
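
(For illustration only, not part of the plugin: a developer could configure that expiry with an S3 lifecycle rule via the AWS SDK for .NET. The bucket name, "cache/" prefix, and 90-day window below are placeholders, and s3Client is assumed to be an existing IAmazonS3.)

// Sketch: expire cached objects under the cache prefix after 90 days.
// Requires the Amazon.S3 and Amazon.S3.Model namespaces (AWSSDK.S3 package).
await s3Client.PutLifecycleConfigurationAsync(new PutLifecycleConfigurationRequest
{
    BucketName = "imageflow-s3-cache",
    Configuration = new LifecycleConfiguration
    {
        Rules = new List<LifecycleRule>
        {
            new LifecycleRule
            {
                Id = "expire-cached-images",
                Status = LifecycleRuleStatus.Enabled,
                Filter = new LifecycleFilter
                {
                    LifecycleFilterPredicate = new LifecyclePrefixPredicate { Prefix = "cache/" }
                },
                Expiration = new LifecycleRuleExpiration { Days = 90 }
            }
        }
    }
});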

@AlexMedia
Contributor

Yes please! I'm a strong proponent of this, as it would greatly reduce my bandwidth bill without me having to give much thought to local storage needs.

The storage fees per GB are incredibly low, so I'd probably just never have it remove old entries.

@AlexMedia
Contributor

I was a little bored, so I decided to have a go at this myself. I've taken some code from the DiskCache provider to generate the unique cache keys, and I then use the AWS SDK to get/put the objects in S3. This is all very rough around the edges (e.g. I haven't implemented any locking), which is why I'm sharing it as a gist rather than a pull request.

https://gist.github.com/AlexMedia/ccabfa4d766bc9991fad1f04af561584

Example usage (using AWSSDK.Extensions.NETCore.Setup):

var awsOptions = Configuration.GetAWSOptions();

services.AddImageflowS3Cache(() => awsOptions.CreateServiceClient<IAmazonS3>(),
    new S3CacheOptions
    {
        BucketName = "imageflow-s3-cache",
        Prefix = "cache"
    });
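
(For context, a minimal sketch of the get/put path such a cache might implement against the AWS SDK for .NET. The class and method names here are illustrative only, not the gist's API, and there is no locking or background upload.)

// Sketch only: a naive S3-backed cache get/put, assuming the AWSSDK.S3 package.
using System.IO;
using System.Net;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public class S3CacheSketch
{
    private readonly IAmazonS3 _s3;
    private readonly string _bucket;
    private readonly string _prefix;

    public S3CacheSketch(IAmazonS3 s3, string bucket, string prefix)
    {
        _s3 = s3;
        _bucket = bucket;
        _prefix = prefix;
    }

    // Returns the cached bytes, or null on a cache miss.
    public async Task<byte[]> TryGetAsync(string cacheKey)
    {
        try
        {
            using var response = await _s3.GetObjectAsync(_bucket, $"{_prefix}/{cacheKey}");
            using var ms = new MemoryStream();
            await response.ResponseStream.CopyToAsync(ms);
            return ms.ToArray();
        }
        catch (AmazonS3Exception e) when (e.StatusCode == HttpStatusCode.NotFound)
        {
            return null; // miss; the caller generates the image and calls PutAsync
        }
    }

    // Stores freshly generated image bytes under the cache key.
    public async Task PutAsync(string cacheKey, byte[] data, string contentType)
    {
        using var ms = new MemoryStream(data);
        await _s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = _bucket,
            Key = $"{_prefix}/{cacheKey}",
            InputStream = ms,
            ContentType = contentType
        });
    }
}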

@lilith
Member Author

lilith commented Mar 30, 2021

@AlexMedia That looks great!

A couple of tips:

  1. You can reuse the S3 client; it's thread-safe, and Amazon actually recommends reusing it. If you want, you can even rely on the S3 client registered in the DI container (see the sketch after this list).
  2. I don't think you really need locking, since it would at most be a throughput optimization.
  3. I saw you're storing, but not checking, the status code from GetObject. I haven't checked whether the AWS SDK already validates it.
  4. The big performance benefit on misses would be to return the result while uploading in the background.
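
Sketch of tip 1, assuming the AWSSDK.Extensions.NETCore.Setup package (how the cache then consumes the client is up to the implementation):

// AddAWSService<IAmazonS3>() registers a single thread-safe client as a singleton,
// so the cache can resolve and reuse it instead of constructing its own.
services.AddDefaultAWSOptions(Configuration.GetAWSOptions());
services.AddAWSService<IAmazonS3>();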

In HybridCache, I use a size-bounded collection for async uploads/writes and switch back to sync writes when the configured limit is hit, so there is at least thread-based backpressure.

https://github.com/imazen/imageflow-dotnet-server/blob/main/src/Imazen.HybridCache/AsyncWriteCollection.cs
https://github.com/imazen/imageflow-dotnet-server/blob/main/src/Imazen.HybridCache/AsyncWrite.cs

https://github.com/imazen/imageflow-dotnet-server/blob/main/src/Imazen.HybridCache/AsyncCache.cs#L276-L285
https://github.com/imazen/imageflow-dotnet-server/blob/main/src/Imazen.HybridCache/AsyncCache.cs#L418-L430

If you use the collection but no extra locking, you may have overlapping threads doing the same work, but at least you won't repeat work whose result is already being uploaded.
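
(To illustrate the backpressure idea above, a simplified sketch rather than the actual AsyncWriteCollection code: queue uploads in the background until a byte budget is exceeded, then fall back to awaiting the upload inline so the request thread itself applies backpressure.)

using System;
using System.Threading;
using System.Threading.Tasks;

public class BoundedAsyncWriteQueue
{
    private readonly long _maxQueuedBytes;
    private long _queuedBytes;

    public BoundedAsyncWriteQueue(long maxQueuedBytes) => _maxQueuedBytes = maxQueuedBytes;

    // Returns true if the upload was queued in the background,
    // false if it was performed synchronously because the queue was full.
    public async Task<bool> EnqueueOrWriteAsync(byte[] data, Func<byte[], Task> uploadAsync)
    {
        var newTotal = Interlocked.Add(ref _queuedBytes, data.Length);
        if (newTotal > _maxQueuedBytes)
        {
            // Over budget: release the reservation and upload inline (thread-based backpressure).
            Interlocked.Add(ref _queuedBytes, -data.Length);
            await uploadAsync(data);
            return false;
        }

        // Under budget: upload in the background, releasing the reservation when done.
        _ = Task.Run(async () =>
        {
            try { await uploadAsync(data); }
            finally { Interlocked.Add(ref _queuedBytes, -data.Length); }
        });
        return true;
    }
}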

@lilith
Member Author

lilith commented Mar 30, 2021

BTW, are you using this in production or have you done any latency testing?

@AlexMedia
Contributor

I wrote it as a quick-fix solution to keep bandwidth costs under control. I saw that a lot of the bandwidth from my S3-compatible storage account came from the bucket that holds my images, which drove up costs. By storing every resized image ever generated, I can keep those costs under control more easily.

I have it running in production (behind Cloudflare), but it's by no means a finished product. I appreciate your input; when I have time I'll take a look at whether I can optimise the code and maybe turn this into a pull request :)

@rudym

rudym commented Feb 9, 2022

Having the ability to use Azure Blob Storage as a cache space will be our main incentive to move from the current image resizer to Imageflow.

@lilith
Member Author

lilith commented May 1, 2022

@AlexMedia Have you done additional work on this, or do you have an updated gist?

@AlexMedia
Contributor

@lilith I'm afraid I haven't; I've moved on to other projects and this has kind of fallen by the wayside.

@keremdemirer

Does anyone have an update on this?

@lilith
Member Author

lilith commented Mar 30, 2023 via email

@keremdemirer

This one seems essential, especially for apps using large source files.

I used ImageResizer 3 and 4 with love a long time ago. I need to brush up my knowledge of the APIs, but I'd like to help if you can guide me.

@keremdemirer

@lilith What do you think about Azure file shares?

@lilith
Member Author

lilith commented Aug 9, 2023

@keremdemirer They might work with HybridCache as-is, if latency is low enough. Have you tried them?
