Feature Request: Cloud-based sitemap storage #84

Closed
vsychov opened this issue May 6, 2022 · 8 comments
Labels
enhancement New feature or request

@vsychov

vsychov commented May 6, 2022

Hello,

Feature request

Thanks for your great work on this beautiful plugin.

For now the plugin doesn't support cloud-based deployments very well, and it would be nice to have that. The main problems:

  • If multiple Strapi instances are running, there is no way to restrict sitemap generation to only one of them (e.g. a dedicated sitemap generator worker). This could be solved with a CLI command for generating sitemaps.
  • It would be nice to have automatic sitemap upload to cloud storage such as GCS or S3, with a way to configure it through the config.
  • There is no way to set the sitemap hostname in config and prevent users from changing it in the admin panel. It would also be nice to be able to override any setting in config.
@boazpoolman
Member

Hi @vsychov !

Thanks for the feature request.
I'm not too familiar with cloud-based deployments so I hope you don't mind me asking some follow up questions.

  • The CLI we can easily create. Seems like a good suggestion. Though can you better explain to me how you would use that? Because right now the sitemap is generated on save of the pages in Strapi with lifecycle methods.
  • What do you mean by sitemap upload? Having it stored somewhere else instead of in the Strapi public folder? Why would you want that? Why is that needed?
  • Why do you want to override the settings in config? Can't you use the config sync plugin to persist the settings? Or do you have a different use case?

@boazpoolman boazpoolman added enhancement New feature or request Needs investigation Further information is requested labels May 6, 2022
@vsychov
Author

vsychov commented May 6, 2022

Hi @boazpoolman,

Probably I need to explain my setup first; there is a graphic schema of it.
I'm running Strapi inside a Docker image in a k8s cluster, as a deployment.
I have multiple instances of the Strapi container and a LB that balances traffic between them. All data inside the Strapi container is immutable (temporary data is stored in ephemeral volumes, and images uploaded to the image library go to cloud file storage via strapi-provider-upload-google-cloud-storage).

Generating the sitemap in each container separately can cause problems (performance and race conditions).
Generating the sitemap on save can also cause performance issues when the DB holds a huge number of records and the sitemap is huge, but that is not a big issue if auto-generation is disabled and generation happens in the background through a CLI command.

So it is better to dedicate a special worker that is responsible for generating the sitemap and uploading it to some external storage (and to create a special rule for the sitemap on the LB).

Hope the answer above is enough to clarify cases 1 and 2.
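
The dedicated-worker idea can be sketched roughly as follows. This is only an illustration: the `SITEMAP_WORKER` environment variable and the function names are hypothetical and not part of the plugin. The assumption is that the variable is set to `true` on exactly one pod, and every other instance skips generation.

```typescript
// Hypothetical gate: only the instance flagged as the sitemap worker generates.
// SITEMAP_WORKER is an assumed variable name, not a plugin setting.
type WorkerEnv = Record<string, string | undefined>;

function shouldGenerateSitemap(env: WorkerEnv): boolean {
  return (env.SITEMAP_WORKER ?? "").trim().toLowerCase() === "true";
}

// Runs the caller-supplied generator only on the worker instance.
// Returns true when generation ran, false when it was skipped.
function runIfSitemapWorker(env: WorkerEnv, generate: () => void): boolean {
  if (!shouldGenerateSitemap(env)) {
    return false;
  }
  generate();
  return true;
}
```

In a k8s setup like the one described, the flag would be set on the single worker deployment, so the load-balanced instances never race each other.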

About case 3: the config sync plugin works great as long as you have the same config for multiple envs, but it gets more complicated when you need to specify a different hostname for each env. E.g. you can have envs for dev, test and production, each with a different hostname. That is easy to handle with env variables in config, but more complicated to handle when the hostname lives in the site settings.
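
To make case 3 concrete, here is a minimal sketch of resolving the hostname per environment from an env variable. The `SITEMAP_HOSTNAME` variable and `resolveSitemapHostname` are hypothetical names used for illustration; the plugin does not expose such a setting in this thread.

```typescript
// Hypothetical per-environment hostname lookup.
// SITEMAP_HOSTNAME is an assumed variable name, not a plugin option.
type HostEnv = Record<string, string | undefined>;

function resolveSitemapHostname(env: HostEnv, fallback: string): string {
  const raw = (env.SITEMAP_HOSTNAME ?? "").trim();
  const value = raw === "" ? fallback : raw;
  // Drop trailing slashes so paths can be appended consistently.
  return value.replace(/\/+$/, "");
}
```

Each env (dev, test, production) would then only differ in the value of that one variable, while the rest of the config stays shared.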

@boazpoolman
Member

boazpoolman commented May 9, 2022

@vsychov Thanks for the explanation. This makes it very clear.

1
I've created a new issue #87 for adding the CLI

EDIT:
The CLI is in beta. Can be installed like so: yarn add strapi-plugin-sitemap@beta.
See some docs here.

3
I've created an issue in the config-sync repo for environment specific config #55.
I'm probably not going to allow setting the hostname in config for this plugin.
But I don't think this should be too big of an issue, as you won't submit your staging/testing sitemap to Google, right?

Regarding 2
I understand the use case here, but I'm not sure how to implement this, as it should be upload provider agnostic and I'm not too familiar with external upload providers.
If you have an idea about how this should be implemented please leave your thoughts here :)

@vsychov
Author

vsychov commented May 9, 2022

Thanks @boazpoolman, 3 is definitely not a big issue.

Regarding 2:
Unfortunately Node.js still doesn't have many cloud-agnostic filesystem libs. But something like this nice lib could be used for saving the sitemap: https://github.com/tweedegolf/storage-abstraction/

There is a usage example article from the lib author: https://tweedegolf.nl/en/blog/38/clouds-without-shadows
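
As a rough sketch of what "provider agnostic" could look like, the sitemap writer can depend on a small storage contract instead of a concrete cloud SDK. The interface and class below are hypothetical; they are not the API of storage-abstraction or of Strapi's upload providers, and the in-memory class only stands in for a real GCS or S3 adapter.

```typescript
// Hypothetical storage contract; NOT the API of storage-abstraction or Strapi.
interface SitemapStorage {
  save(path: string, xml: string): Promise<void>;
  read(path: string): Promise<string | undefined>;
}

// In-memory stand-in; a GCS or S3 adapter would implement the same two methods.
class MemorySitemapStorage implements SitemapStorage {
  private files = new Map<string, string>();

  async save(path: string, xml: string): Promise<void> {
    this.files.set(path, xml);
  }

  async read(path: string): Promise<string | undefined> {
    return this.files.get(path);
  }

  // Synchronous helper for inspection in tests only.
  peek(path: string): string | undefined {
    return this.files.get(path);
  }
}

// The generator only knows the contract, never the concrete provider.
async function publishSitemap(storage: SitemapStorage, xml: string): Promise<void> {
  await storage.save("sitemap/index.xml", xml);
}
```

With this shape, switching from one cloud to another means swapping the adapter passed to `publishSitemap`, not changing the generation code.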

@boazpoolman
Member

I'll look into this!

@boazpoolman boazpoolman changed the title Feature Request: More support of cloud-based deployments Feature Request: More support of cloud-based sitemap storage May 9, 2022
@boazpoolman boazpoolman changed the title Feature Request: More support of cloud-based sitemap storage Feature Request: Cloud-based sitemap storage May 9, 2022
@boazpoolman boazpoolman removed the Needs investigation Further information is requested label May 9, 2022
@boazpoolman boazpoolman added this to the v2.1.0 milestone May 9, 2022
@mmondou

mmondou commented Jun 21, 2022

Hi! Thanks for the plugin, it works great!

To continue the discussion on "Regarding 2", an example of a cloud provider implementation is the AWS S3 Strapi plugin for upload management. This feature is very useful when you have multiple instances/pods of the same repository deployed and the user (or the crawler!) is redirected to one of them. We don't know which instance will serve the request, so if the sitemap is generated in the public folder of Strapi, we cannot know which instance/pod has the right one; it's the same for file uploads. The solution is to define a provider to upload the files to the cloud. Each instance can upload/replace the sitemap in the cloud. It doesn't matter which one does, because each instance reads the same database, and when the build comes, it's up to date and the newly generated sitemap will be uploaded to the cloud.

I'm not sure this is 100% clear, so feel free to ask if you have any questions.

@boazpoolman
Member

Hi @mmondou

Yep! I think it makes most sense to hook into the Strapi upload provider to store your sitemap file.
If you have a cloud based upload provider set in Strapi then this plugin will upload the sitemap file there.
Whether it's Google Cloud or AWS S3 doesn't really matter then.

Though I don't have any time planned to work on implementing this feature.
If you, or anyone, would like to dedicate some time to this I would be glad to review PRs and help with releasing it to NPM.

@boazpoolman
Member

This feature has been released with version 3.0.0 of the Sitemap plugin.

As of this new version, the plugin serves a "virtual sitemap". This is not an XML file stored in the public folder in your file system. Instead the XML gets saved to the database and an XML API endpoint is exposed to access the sitemap.
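
The general "virtual sitemap" idea can be sketched as below. This is not the plugin's actual implementation, which is not shown in this thread: a `Map` stands in for the database, and the function names, the `"default"` key and the response shape are all assumptions made for illustration.

```typescript
// Sketch of a "virtual sitemap": the XML lives in a store (a Map stands in
// for the database) and a handler returns it instead of reading from disk.
interface SitemapEntry {
  url: string;
  lastmod?: string;
}

function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries.map((e) => {
    const lastmod = e.lastmod ? `<lastmod>${escapeXml(e.lastmod)}</lastmod>` : "";
    return `<url><loc>${escapeXml(e.url)}</loc>${lastmod}</url>`;
  });
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls.join("")}</urlset>`
  );
}

interface SitemapResponse {
  status: number;
  contentType: string;
  body: string;
}

// Hypothetical endpoint handler: every instance reads the same shared store,
// so it does not matter which pod answers the request.
function handleSitemapRequest(store: Map<string, string>): SitemapResponse {
  const xml = store.get("default");
  if (xml === undefined) {
    return { status: 404, contentType: "text/plain", body: "Sitemap not generated yet" };
  }
  return { status: 200, contentType: "application/xml", body: xml };
}
```

Because the XML is read from shared storage at request time, this approach sidesteps the per-container public folder problem discussed earlier in the thread.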


3 participants