Add CDN field to the S3 configuration #28

Closed

ganigeorgiev opened this issue Jul 8, 2022 · Discussed in #26 · 14 comments

Comments

ganigeorgiev (Member) commented Jul 8, 2022

Add an optional CDN field to the S3 configuration so that the files API can redirect to it, avoiding the quotas and rate limits that some S3 providers apply.
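As a rough illustration of the request (the CDNURL field, paths, and handler below are hypothetical, not existing PocketBase names), the files endpoint could redirect to the CDN whenever a CDN base URL is configured:

```go
package main

import (
	"log"
	"net/http"
	"strings"
)

// Hypothetical S3 settings with the proposed optional CDN field.
type s3Settings struct {
	Bucket string
	CDNURL string // e.g. "https://mycdn.example.com"; empty means "serve directly"
}

func main() {
	settings := s3Settings{Bucket: "example-bucket", CDNURL: "https://mycdn.example.com"}

	// Stand-in for the files endpoint: /api/files/{collectionId}/{recordId}/{filename}
	http.HandleFunc("/api/files/", func(w http.ResponseWriter, r *http.Request) {
		if settings.CDNURL != "" {
			key := strings.TrimPrefix(r.URL.Path, "/api/files/")
			// Redirect to the CDN instead of streaming from S3,
			// avoiding per-request S3 API calls.
			http.Redirect(w, r, settings.CDNURL+"/"+key, http.StatusTemporaryRedirect)
			return
		}
		http.Error(w, "direct serving not implemented in this sketch", http.StatusNotImplemented)
	})

	log.Fatal(http.ListenAndServe(":8090", nil))
}
```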

@pataquets

As a workaround, here's an idea: put a reverse proxy in front that rewrites the returned URL to the CDN's.
If the object's CDN URL differs from the original in a predictable way and can be easily derived, it might be worth a try.
Or maybe some middleware like min.io can handle this.
Anyway, the feature would be nice to have.
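For anyone trying the reverse-proxy route, a minimal Go sketch of one interpretation of it: file requests get redirected to a CDN URL derived from the path, everything else is proxied to PocketBase. The addresses, CDN base, and path mapping are assumptions (query-string thumbs, for example, won't map this simply):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Assumptions: PocketBase on :8090, CDN mirrors the bucket layout.
	const cdnBase = "https://mycdn.example.com"
	target, _ := url.Parse("http://127.0.0.1:8090")
	proxy := httputil.NewSingleHostReverseProxy(target)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Send file downloads to the CDN when the URL can be derived
		// from the path; everything else goes to PocketBase.
		if key, ok := strings.CutPrefix(r.URL.Path, "/api/files/"); ok {
			http.Redirect(w, r, cdnBase+"/"+key, http.StatusTemporaryRedirect)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```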

dizys commented Nov 10, 2022

Or an optional URL rewrite rule config, e.g. https://mycdn.com/{BUCKET}/{KEY}
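A sketch of how such a rewrite rule could be expanded; the rule format is just the example from this comment, not an existing PocketBase option:

```go
package main

import (
	"fmt"
	"strings"
)

// expandRule fills the {BUCKET} and {KEY} placeholders of the
// suggested rewrite-rule config.
func expandRule(rule, bucket, key string) string {
	return strings.NewReplacer(
		"{BUCKET}", bucket,
		"{KEY}", key,
	).Replace(rule)
}

func main() {
	rule := "https://mycdn.com/{BUCKET}/{KEY}"
	fmt.Println(expandRule(rule, "example-bucket", "collectionId/recordId/file.png"))
	// Output: https://mycdn.com/example-bucket/collectionId/recordId/file.png
}
```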

ganigeorgiev (Member Author) commented Mar 31, 2023

This unfortunately ended up more troublesome than I initially anticipated.

I've locally implemented a slightly modified version of @dizys's PR#28, but I stumbled upon several issues.

Redirecting to a CDN endpoint requires the objects to be public, but unfortunately not all S3 providers offer options to globally control the visibility of an entire bucket or prefix. One workaround I found was to upload each individual blob with special write options specifying the public-read ACL, but even this is not guaranteed to be supported by every S3 provider (it will also conflict with the planned private/protected files support in #215, though I guess we can ignore that for this specific case).
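For reference, the per-object ACL workaround would look roughly like this with the AWS SDK for Go v2 (a sketch, not PocketBase's actual upload path; the bucket and key are placeholders, and some S3-compatible providers ignore or reject canned ACLs):

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	client := s3.NewFromConfig(cfg)

	// Upload a single object with a public-read canned ACL.
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("example-bucket"),                  // placeholder
		Key:    aws.String("collectionId/recordId/file.png"),  // placeholder
		Body:   strings.NewReader("example content"),
		ACL:    types.ObjectCannedACLPublicRead,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```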

Another, bigger, blocking issue I've stumbled upon is that the uploaded files may not always be immediately available through the CDN endpoint after upload, and we'll get a 40x response when retrieving the file or even when just sending a HEAD request to check for its existence.
I've tested it with AWS CloudFront and DigitalOcean Spaces CDN and both have the above issue. This isn't isolated to the main uploaded file; it also affects the lazy thumbs generation.
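For anyone experimenting with this, the kind of availability check that would be needed before redirecting looks roughly like the sketch below; the URL, retry count, and backoff values are arbitrary assumptions, and polling only papers over the propagation delay rather than making the behavior reliable:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForCDN sends HEAD requests until the object is served by the CDN
// or the retry budget is exhausted.
func waitForCDN(url string, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		resp, err := http.Head(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		time.Sleep(delay)
		delay *= 2 // simple exponential backoff
	}
	return fmt.Errorf("object not available at %s after %d attempts", url, attempts)
}

func main() {
	// Placeholder CDN URL.
	err := waitForCDN("https://mycdn.example.com/collectionId/recordId/file.png", 5, 200*time.Millisecond)
	fmt.Println(err)
}
```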

To summarize: I'm not sure implementing it as is would be a good idea, since it is not reliable enough and is not guaranteed to work with all S3-compatible vendors.

Depending on user demand, I may revisit it in the future, but for now I'll close the issue. The recommended workaround to reduce the number of S3 API calls is to enable caching of /api/files/* at the reverse proxy level.
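As one illustration of that workaround (not official guidance), a small Go reverse proxy that marks successful /api/files/* responses as cacheable so a CDN or caching proxy in front can serve repeats without hitting S3 again; the addresses and max-age are placeholders:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// PocketBase listening address is an assumption.
	target, _ := url.Parse("http://127.0.0.1:8090")
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Mark successful file responses as cacheable so a caching layer
	// in front can reuse them.
	proxy.ModifyResponse = func(resp *http.Response) error {
		if strings.HasPrefix(resp.Request.URL.Path, "/api/files/") && resp.StatusCode == http.StatusOK {
			resp.Header.Set("Cache-Control", "public, max-age=86400")
		}
		return nil
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```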

@danibjor

Files not being available is also an issue on local file storage.

Scenario I've tested locally: add a subscription on a collection, then upload a new record that contains a file to this collection. When the subscription event reaches the client, fetch the record again with expanded relations, then render it on screen. Most of the time the image returns 404, but a 200-300ms delay before fetching the record solves it in my case.

ganigeorgiev (Member Author) commented Mar 31, 2023

@danibjor Hm, this is because the record model is saved before uploading the files (the realtime events are triggered right after the record model save) to minimize the transaction locking times and to allow developers to specify a custom record id within an OnModelBeforeCreate hook.

I've added a note to the roadmap to consider refactoring the order of the record save operations.
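For context, the custom record id use case mentioned above looks roughly like this when extending PocketBase in Go. This is a sketch against the v0.14-era hook API and may differ between versions; the collection name and id value are placeholders:

```go
package main

import (
	"log"

	"github.com/pocketbase/pocketbase"
	"github.com/pocketbase/pocketbase/core"
	"github.com/pocketbase/pocketbase/models"
)

func main() {
	app := pocketbase.New()

	// Assign a custom id before the record model is persisted.
	// The model is saved (and the realtime event fired) before the
	// uploaded files are written to storage.
	app.OnModelBeforeCreate().Add(func(e *core.ModelEvent) error {
		if record, ok := e.Model.(*models.Record); ok && record.Collection().Name == "posts" {
			record.SetId("custom12345678a") // placeholder 15-char id
		}
		return nil
	})

	if err := app.Start(); err != nil {
		log.Fatal(err)
	}
}
```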

danibjor commented Mar 31, 2023

It's probably an edge case I've stumbled into, but something to be aware of. These are 5-10MB photos, so shuffling the bytes onto disk takes the time it takes. And having practically no latency between the client and the server doesn't help with this issue either.

@ganigeorgiev (Member Author)

@danibjor No, I don't think it is disk related. The issue you are experiencing happens only with the realtime events, because they are fired on a successful SaveRecord(), before the actual file upload. The problematic part of the code is https://github.com/pocketbase/pocketbase/blob/master/forms/record_upsert.go#L746-L772.

Tomorrow I'll investigate it in more detail and try to submit a fix.

@ganigeorgiev (Member Author)

@danibjor The issue with the realtime events firing before the file upload should be fixed in https://github.com/pocketbase/pocketbase/releases/tag/v0.14.1.

danibjor commented Apr 1, 2023

> @danibjor The issue with the realtime events firing before the file upload should be fixed in https://github.com/pocketbase/pocketbase/releases/tag/v0.14.1.

Did a quick test; it seems to have resolved the race conditions I saw earlier.

jgb-solutions commented Apr 29, 2023

Hey @ganigeorgiev, I'm trying to rewrite the URL based on the filename, but in my cloud provider I see the uploaded file has two parent folders: parent-with-id-I-dont-know/record-id/filename.mp3
[screenshot]

Any idea how to get this first folder name (the ID I don't know)? I thought it was the currently logged-in user's ID, but that's not the case. Note that in an earlier version that parent folder used to be named _pb_users_auth_.

Thanks!

@ganigeorgiev (Member Author)

@jgb-solutions The first "folder" in the object prefix is the collection id, and it will differ for each collection (_pb_users_auth_ is the id of the default users collection).
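In other words, the object key follows the pattern collectionId/recordId/filename, so a public URL can be derived along these lines (the CDN base and ids below are placeholders):

```go
package main

import "fmt"

func main() {
	const cdnBase = "https://mycdn.example.com" // placeholder

	collectionID := "_pb_users_auth_" // id of the default users collection
	recordID := "RECORD_ID"           // placeholder
	filename := "filename.mp3"

	// Object key layout: collectionId/recordId/filename
	url := fmt.Sprintf("%s/%s/%s/%s", cdnBase, collectionID, recordID, filename)
	fmt.Println(url)
}
```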

jgb-solutions commented Apr 29, 2023

@ganigeorgiev ok, thanks! I'm starting to learn this software in depth and integrate it into my stack. It's great! Keep improving it.

Quick question while we're at it: is it possible to bypass the server entirely and upload directly to the cloud? That's what I was thinking of doing for an existing app: upload to the cloud client-side and save the filename in PocketBase.
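The thread doesn't answer this, but the usual pattern for direct-to-bucket uploads is a presigned URL generated server-side, which the client then PUTs the file to before saving only the filename in the record. A minimal sketch with the AWS SDK for Go v2 (bucket, key, and expiry are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

	// The client PUTs the file straight to this URL, then stores only
	// the resulting object key/filename in the record.
	req, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("example-bucket"),       // placeholder
		Key:    aws.String("uploads/filename.mp3"), // placeholder
	}, s3.WithPresignExpires(15*time.Minute))
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(req.URL)
}
```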

gedw99 commented Jul 23, 2023

What about firing an event off S3 for any file CRUD? Then PocketBase knows when the file is there and can toggle the visibility.
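To make that concrete, a rough sketch of a webhook that accepts S3-style bucket notifications (the AWS/MinIO event JSON shape) and could then look up the matching record and toggle its visibility; the endpoint path and the "toggle" step are assumptions, nothing like this exists in PocketBase today:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Minimal subset of the S3 event notification payload.
type s3Event struct {
	Records []struct {
		EventName string `json:"eventName"`
		S3        struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

func main() {
	http.HandleFunc("/hooks/s3", func(w http.ResponseWriter, r *http.Request) {
		var evt s3Event
		if err := json.NewDecoder(r.Body).Decode(&evt); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		for _, rec := range evt.Records {
			// Here one could look up the record by the object key
			// (collectionId/recordId/filename) and flip a "visible" flag.
			log.Printf("%s: %s/%s", rec.EventName, rec.S3.Bucket.Name, rec.S3.Object.Key)
		}
		w.WriteHeader(http.StatusNoContent)
	})

	log.Fatal(http.ListenAndServe(":8081", nil))
}
```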

@jgb-solutions

@gedw99 can you elaborate on that?
