docsite: allow content to be removed from the search index if it matc… #83

Merged
coury-clark merged 2 commits into main from docsite/unindex-content-files-by-pattern
Feb 8, 2022

@coury-clark coury-clark commented Feb 8, 2022

This PR allows us to remove certain content files from the docsite search index if their URL matches a configured pattern. Marketing has asked us to remove some docs from our search results, and this seemed like a relatively straightforward way to do it.

Example JSON config, using the new search.skipIndexURLPattern setting:

{
  "$comment": "This configures the documentation server used in local development only. See ./doc/dev/documentation.md#previewing-changes-locally.",
  "templates": "_resources/templates",
  "content": ".",
  "baseURLPath": "/",
  "rootURL": "https://docs.sourcegraph.com",
  "assets": "_resources/assets",
  "assetsBaseURLPath": "/assets/",
  "check": {
    "ignoreURLPattern": "(^https?://)|(^#)|(^mailto:(hi|support|security|feedback)@sourcegraph\\.com(\\?|$))|(^chrome://)|(^/@)"
  },
  "search": {
    "skipIndexURLPattern": ".*insights*."
  }
}

I was pretty split on how to implement this, and tried a few things:

  • At first I wanted to do something similar to the search engine <meta name="robots"...> HTML tag. Unfortunately, it seems that the HTML returned from the Markdown parser doesn't include the HTML defined in the template file (obvious in hindsight...), and changing that looked like a larger refactor than we want to invest in for this.
  • Next I tried introducing a special tag in the Markdown metadata that would remove the page from the index. I'm not sure whether I was just doing something wrong, but I couldn't get the template to render, so I moved on to the next simplest thing (a URL pattern), and here we are.

@coury-clark coury-clark self-assigned this Feb 8, 2022
@coury-clark coury-clark requested review from a team and CristinaBirkel and removed request for CristinaBirkel February 8, 2022 22:35
@coury-clark coury-clark marked this pull request as ready for review February 8, 2022 22:38
@coury-clark coury-clark merged commit d0aac74 into main Feb 8, 2022
@coury-clark coury-clark deleted the docsite/unindex-content-files-by-pattern branch February 8, 2022 22:38

@LawnGnome LawnGnome left a comment

This looks fine as it is, but which site is this for? For example, I believe handbook.sourcegraph.com doesn't use the built-in search anymore. (At the very least, it's fronted by Swiftype; I don't know whether it then falls back to the built-in search when building its corpus.)

@coury-clark

@LawnGnome This is for the docsite, docs.sourcegraph.com. For the Code Insights GA launch, we have been asked to unindex any relevant pages for marketing purposes.

@LawnGnome

Cool, 👍 then!
