feat(sitemap): add ignorePatterns option #6979

Merged
merged 7 commits into from Apr 6, 2022

Conversation

ApsarasX
Contributor

Motivation

When generating a sitemap, I want to ignore some paths, such as those starting with /tags. This PR adds an option named ignorePatterns to plugin-sitemap. It accepts an array of regular expressions and lets users exclude paths that do not need to appear in the sitemap.

Have you read the Contributing Guidelines on pull requests?

Yes.

Test Plan

I've added several test cases in the following files.

  • packages/docusaurus-plugin-sitemap/src/__tests__/createSitemap.test.ts
  • packages/docusaurus-plugin-sitemap/src/__tests__/options.test.ts

Related PRs

No other PRs related to this request as far as I could find.

@facebook-github-bot
Contributor

Hi @ApsarasX!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@netlify

netlify bot commented Mar 24, 2022

[V2]

Built without sensitive environment variables

Name Link
🔨 Latest commit 188b76b
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/623ec159be30e600093d8892
😎 Deploy Preview https://deploy-preview-6979--docusaurus-2.netlify.app

@github-actions

github-actions bot commented Mar 24, 2022

⚡️ Lighthouse report for the changes in this PR:

Category Score
🟠 Performance 62
🟢 Accessibility 100
🟢 Best practices 92
🟢 SEO 100
🟢 PWA 90

Lighthouse ran on https://deploy-preview-6979--docusaurus-2.netlify.app/

@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Mar 24, 2022
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@@ -24,6 +25,9 @@ const PluginOptionSchema = Joi.object({
    .valid(...Object.values(EnumChangefreq))
    .default(DEFAULT_OPTIONS.changefreq),
  priority: Joi.number().min(0).max(1).default(DEFAULT_OPTIONS.priority),
  ignorePatterns: Joi.array()
    .items(Joi.object().instance(RegExp))
Collaborator

@Josh-Cena Josh-Cena Mar 24, 2022

Can we allow strings as well, which will be glob patterns like /tags/**? We have a utils.createMatcher for that.

Collaborator

@slorber slorber Mar 24, 2022

This runs on the Node side, so why not expose a callback as well (and thus rename the option to simply "ignore")?

A callback is always the most flexible option.

I'm not against supporting RegExp/globs too, but all of this can be implemented in userland with a callback, and this is quite a niche feature 🤷‍♂️

Collaborator

@Josh-Cena Josh-Cena Mar 24, 2022

I don't really like callback-based APIs unless we can't implement a good solution otherwise. Webpack loaders don't use callbacks everywhere only because they can. I do prefer serializable APIs if they provide the right level of abstraction. While ignorePatterns: ["/tags/**"] is probably as useful as ignore: (path) => path.startsWith("/tags/"), the former is better understood by everyone and more approachable for those not well-versed in JS.
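For readers comparing the two API shapes in this comment, here is a minimal sketch of both forms applied to the same routes. The `globToRegExp` helper is a hypothetical, deliberately tiny stand-in for a glob library (it is not `utils.createMatcher`), and the route list is made up for illustration. Real glob libraries may treat edge cases such as a bare `/tags` differently.

```typescript
type PathPredicate = (path: string) => boolean;

// Hypothetical stand-in for a glob library. Supports only `*` (within one
// path segment) and `**` (across segments); everything else is literal.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split(/(\*\*|\*)/) // keep the wildcards as separate tokens
    .map((token) => {
      if (token === '**') return '.*';
      if (token === '*') return '[^/]*';
      return token.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape literals
    })
    .join('');
  return new RegExp(`^${source}$`);
}

// Declarative form: a serializable list of globs, like ignorePatterns.
function fromPatterns(patterns: string[]): PathPredicate {
  const regexps = patterns.map(globToRegExp);
  return (path) => regexps.some((re) => re.test(path));
}

// Callback form: the user writes the predicate directly.
const ignoreCallback: PathPredicate = (path) => path.startsWith('/tags/');

const ignoreByPattern = fromPatterns(['/tags/**']);

// Illustrative routes (assumed, not taken from the PR).
const routes = ['/', '/docs/intro', '/tags/', '/tags/release', '/tagsfoo'];

const keptByPattern = routes.filter((route) => !ignoreByPattern(route));
const keptByCallback = routes.filter((route) => !ignoreCallback(route));

console.log(keptByPattern, keptByCallback);
```

On this route list both forms keep the same paths. The difference is that the pattern list can live in a plain config object, while the callback needs executable code.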

Collaborator

🤷‍♂️ I don't know, it's also more ambiguous because some edge cases might depend on the underlying glob library etc... A callback always behaves explicitly, other APIs are just shortcuts.

Not saying that we shouldn't provide shortcuts, but this use case seems niche enough that a lower-level but more flexible API might be enough.

Contributor Author

@ApsarasX ApsarasX Mar 24, 2022

I agree that glob patterns are better than regular expressions, and I've switched to utils.createMatcher instead of regular expression matching.

In addition, I don't quite agree with using a callback-based API, because it would make docusaurus.config.js look disorganized. And not all Docusaurus users are professional programmers.


  const sitemapStream = new SitemapStream({hostname});

  routesPaths
    .filter((route) => !route.endsWith('404.html'))
    .filter(
Collaborator

Should we apply trailingSlash before the filtering? I don't know what is best in this case 🤷‍♂️

Collaborator

A trailing slash should not be applied to 404s anyway, so I think this is fine. The odds that someone gets tricked by this seem low to me, and they can always check their glob.
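To make the ordering question concrete, here is a small sketch of how the result can differ depending on whether a trailing slash is added before or after the ignore filter. `addTrailingSlash` and `isIgnored` are hypothetical helpers written for this illustration (Docusaurus has its own trailing-slash logic), and the ignore rule is a plain predicate so the sketch needs no glob library.

```typescript
// Hypothetical helper: only mimics the idea of a trailingSlash setting.
function addTrailingSlash(path: string): string {
  return path.endsWith('/') ? path : `${path}/`;
}

// Hypothetical ignore rule, written by a user who expects a trailing slash.
const isIgnored = (path: string): boolean => path === '/tags/';

// Illustrative routes (assumed, not taken from the PR).
const routesPaths = ['/', '/docs', '/tags', '/404.html'];

// The 404 page is dropped first in both variants, as in the diff excerpt.
const withoutNotFound = routesPaths.filter(
  (route) => !route.endsWith('404.html'),
);

// Variant A: filter on the raw route, then add the trailing slash.
const filterThenSlash = withoutNotFound
  .filter((route) => !isIgnored(route))
  .map(addTrailingSlash);

// Variant B: add the trailing slash, then filter.
const slashThenFilter = withoutNotFound
  .map(addTrailingSlash)
  .filter((route) => !isIgnored(route));

console.log(filterThenSlash); // '/tags/' is still present here
console.log(slashThenFilter); // '/tags/' has been filtered out here
```

In variant A the rule never sees `/tags/`, so the route survives. A user who hits this can adjust the pattern to match the unslashed route, which is the "check their glob" escape hatch mentioned above.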

Collaborator

@slorber slorber left a comment

Added some comments

@Josh-Cena Josh-Cena added the pr: new feature This PR adds a new API or behavior. label Mar 24, 2022
@Josh-Cena
Collaborator

That looks rather nice, thanks!

I will do a few refactors, hold back while I commit 😄

@ApsarasX
Contributor Author

> That looks rather nice, thanks!
>
> I will do a few refactors, hold back while I commit 😄

In addition, I found a problem with utils.createMatcher: the matcher returned by utils.createMatcher([]) matches any string.

@Josh-Cena
Collaborator

Ah yes, I think matching none seems like better semantics. Will fix it in this commit.
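A possible explanation for the behavior reported above, sketched under an assumption the thread does not confirm: that the matcher is a single RegExp built by joining the pattern sources with `|`. With an empty list the joined source is the empty string, and an empty RegExp matches every input. The function names below are hypothetical, and the sources are plain regex strings rather than globs.

```typescript
type Matcher = (str: string) => boolean;

// Assumed shape of the matcher: one RegExp made by joining sources with "|".
function createMatcherNaive(sources: string[]): Matcher {
  // With sources = [], this is new RegExp(''), which matches any string.
  const regexp = new RegExp(sources.join('|'));
  return (str) => regexp.test(str);
}

// Guarded variant: an empty pattern list matches nothing.
function createMatcherGuarded(sources: string[]): Matcher {
  if (sources.length === 0) {
    return () => false;
  }
  return createMatcherNaive(sources);
}

const naiveEmpty = createMatcherNaive([]);
const guardedEmpty = createMatcherGuarded([]);
const guardedTags = createMatcherGuarded(['^/tags/']);

console.log(naiveEmpty('/docs/intro')); // true: everything would be ignored
console.log(guardedEmpty('/docs/intro')); // false: nothing is ignored
console.log(guardedTags('/tags/release')); // true
```

For an ignore list, "match nothing" is the safer default: an empty ignorePatterns array then leaves the sitemap unchanged instead of emptying it.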

@Josh-Cena Josh-Cena changed the title feat(plugin-sitemap): add ignorePatterns option feat(sitemap): add ignorePatterns option Mar 24, 2022
@ApsarasX
Contributor Author

Is there any problem with this PR?

@Josh-Cena
Collaborator

Nope, it looks good to me. But new public APIs have to go through @slorber, who unfortunately will be on holiday next. Sorry @ApsarasX you probably have to wait a bit longer than usual.

@Josh-Cena Josh-Cena added the status: awaiting review This PR is ready for review, will be merged after maintainers' approval label Mar 30, 2022
@slorber
Collaborator

slorber commented Apr 6, 2022

Thanks 👍

Not 100% convinced that ignorePatterns is future-proof (compared to just ignore) in case we want a callback later, but that wouldn't be a very complex breaking change for users if we change our mind so let's move on for now 👍

@slorber slorber merged commit 103ea04 into facebook:main Apr 6, 2022
@Josh-Cena Josh-Cena removed the status: awaiting review This PR is ready for review, will be merged after maintainers' approval label Apr 6, 2022
Labels
CLA Signed Signed Facebook CLA pr: new feature This PR adds a new API or behavior.
4 participants