docs: include Algolia DocSearch ADR and crawler config (#2991)
adamstankiewicz committed Jan 10, 2024
1 parent 6927788 commit 35bfa7d
Showing 3 changed files with 175 additions and 0 deletions.
5 changes: 5 additions & 0 deletions algolia-docsearch/README.md
@@ -0,0 +1,5 @@
# Paragon | Algolia DocSearch

This module contains the Algolia DocSearch crawler configuration (i.e., `crawler-config.js`) that controls the behavior of the Algolia site crawler responsible for indexing content from the Paragon documentation website into the Algolia index.

Any revision to the configuration in this module must also be applied to the crawler configuration in the Algolia DocSearch crawler editor (https://crawler.algolia.com/); changes made only in this repository do not affect the live search index.
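
The `pathsToMatch` list in `crawler-config.js` combines an include glob (`/**`) with `!`-prefixed exclusions. The crawler evaluates these patterns with its own glob matcher; the sketch below is a simplified, hypothetical model of that behavior (not Algolia's code) that can help when reasoning about which URLs end up in the `paragon-openedx` index. It assumes `**` matches across path segments, `*` matches within one segment, and a URL is indexed only if it matches an include pattern and no exclusion. `globToRegExp` and `isIndexed` are illustrative names, not crawler APIs.

```javascript
// Simplified, hypothetical model of `pathsToMatch` include/exclude globs.
// This is NOT Algolia's matcher: it only understands `**` (any characters,
// including "/") and `*` (any characters except "/").
function globToRegExp(glob) {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      source += ".*"; // `**` may cross path segments
      i++; // skip the second "*"
    } else if (ch === "*") {
      source += "[^/]*"; // `*` stays within a single path segment
    } else {
      // Escape regex metacharacters such as "." so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\\/]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// A URL is indexed if it matches at least one include pattern and no
// `!`-prefixed exclusion pattern.
function isIndexed(url, patterns) {
  const includes = patterns.filter((p) => !p.startsWith("!")).map(globToRegExp);
  const excludes = patterns
    .filter((p) => p.startsWith("!"))
    .map((p) => globToRegExp(p.slice(1)));
  return includes.some((re) => re.test(url)) && !excludes.some((re) => re.test(url));
}

// Patterns copied from the `paragon-openedx` action in crawler-config.js.
const pathsToMatch = [
  "https://paragon-openedx.netlify.app/**",
  "!https://paragon-openedx.netlify.app/insights/",
  "!https://paragon-openedx.netlify.app/status/",
  "!https://paragon-openedx.netlify.app/changelog/",
];

console.log(isIndexed("https://paragon-openedx.netlify.app/foundations/colors", pathsToMatch)); // true
console.log(isIndexed("https://paragon-openedx.netlify.app/insights/", pathsToMatch)); // false
```

Under this model, an exclusion without a trailing `**` (such as `!https://paragon-openedx.netlify.app/insights/`) removes only that exact URL, not pages nested beneath it; verify against the crawler editor before relying on that.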
136 changes: 136 additions & 0 deletions algolia-docsearch/crawler-config.js
@@ -0,0 +1,136 @@
/* eslint-disable */

// README: When updating the Algolia DocSearch crawler configuration here, it will also need to be updated
// in the Algolia DocSearch crawler editor (https://crawler.algolia.com/). Otherwise, changes to this persisted
// configuration will not actually apply to the Paragon documentation website as intended.

// Note: there are REDACTED Algolia `appId` and `apiKey` values below; these should not be committed to the repository
// but should be included in the crawler configuration in the Algolia DocSearch crawler editor.

new Crawler({
  rateLimit: 8,
  startUrls: ["https://paragon-openedx.netlify.app/"],
  renderJavaScript: false,
  sitemaps: [],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://paragon-openedx.netlify.app/**"],
  schedule: "every 1 day",
  actions: [
    {
      indexName: "paragon-openedx",
      // Match every page on the site, minus the insights, status, and changelog pages.
      pathsToMatch: [
        "https://paragon-openedx.netlify.app/**",
        "!https://paragon-openedx.netlify.app/insights/",
        "!https://paragon-openedx.netlify.app/status/",
        "!https://paragon-openedx.netlify.app/changelog/",
      ],
      recordExtractor: ({ helpers, url, $ }) => {
        // Use the first URL path segment as the top-level (lvl0) category,
        // e.g., "/foundations/colors" -> "Foundations"; the home page falls back to "Documentation".
        const category = url.pathname.split("/")[1] || "Documentation";
        return helpers.docsearch({
          recordProps: {
            // lvl1: ["header h1", "article h1", "main h1", "h1", "head > title"],
            lvl1: ["main h1"],
            lvl0: {
              selectors: "",
              defaultValue:
                category.charAt(0).toUpperCase() + category.slice(1),
            },
            lvl2: ["main h2"],
            lvl3: ["article h3", "main h3", "h3"],
            lvl4: ["article h4", "main h4", "h4"],
            lvl5: ["article h5", "main h5", "h5"],
            lvl6: ["article h6", "main h6", "h6"],
            content: ["article p, article li", "main p, main li", "p, li"],
          },
          aggregateContent: true,
        });
      },
    },
  ],
  initialIndexSettings: {
    "paragon-openedx": {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "hierarchy_camel", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "hierarchy_radio", "content"],
      searchableAttributes: [
        "unordered(hierarchy_radio_camel.lvl0)",
        "unordered(hierarchy_radio.lvl0)",
        "unordered(hierarchy_radio_camel.lvl1)",
        "unordered(hierarchy_radio.lvl1)",
        "unordered(hierarchy_radio_camel.lvl2)",
        "unordered(hierarchy_radio.lvl2)",
        "unordered(hierarchy_radio_camel.lvl3)",
        "unordered(hierarchy_radio.lvl3)",
        "unordered(hierarchy_radio_camel.lvl4)",
        "unordered(hierarchy_radio.lvl4)",
        "unordered(hierarchy_radio_camel.lvl5)",
        "unordered(hierarchy_radio.lvl5)",
        "unordered(hierarchy_radio_camel.lvl6)",
        "unordered(hierarchy_radio.lvl6)",
        "unordered(hierarchy_camel.lvl0)",
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy_camel.lvl1)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy_camel.lvl2)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy_camel.lvl3)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy_camel.lvl4)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy_camel.lvl5)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy_camel.lvl6)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
  appId: "", // REDACTED
  apiKey: "", // REDACTED
  extraUrls: [
    "https://paragon-openedx.netlify.app/foundations/colors",
    "https://paragon-openedx.netlify.app/foundations/elevation",
    "https://paragon-openedx.netlify.app/foundations/typography",
    "https://paragon-openedx.netlify.app/foundations/css-utilities",
    "https://paragon-openedx.netlify.app/foundations/responsive",
    "https://paragon-openedx.netlify.app/foundations/brand-icons",
    "https://paragon-openedx.netlify.app/guides/installation-and-usage",
    "https://paragon-openedx.netlify.app/tools/component-generator",
    "https://paragon-openedx.netlify.app/playground",
  ],
});
34 changes: 34 additions & 0 deletions docs/decisions/0020-algolia-docsearch.rst
@@ -0,0 +1,34 @@
20. Adopting and maintaining Algolia DocSearch
==============================================

Status
------

Accepted

Context
-------

The Paragon documentation website (https://paragon-openedx.netlify.app/), hosted on Netlify, is used by designers and engineers to understand and use the capabilities provided by the Paragon design system and React component library. Helping consumers of Paragon efficiently find the content they need is critical to making the design system easy to use and adopt.

Without formally supported search, Paragon consumers generally have to rely on native browser search (e.g., ``Cmd + F``), which only finds text on the page that is currently open. To make content easier to discover, we would like to support search functionality on the Paragon documentation website in a lightweight, low-maintenance way.

Decision
--------

We will adopt Algolia DocSearch (https://docsearch.algolia.com/), a free service that Algolia offers to open-source projects with documentation needs. Algolia DocSearch provides a configurable crawler that parses the Paragon documentation website on a regular schedule (currently once a day) and indexes its content in an Algolia index. Algolia DocSearch also provides a search UI widget that can be embedded in the Paragon documentation website to provide search functionality.
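
As an example of what the crawler's configurability looks like in practice, the ``recordExtractor`` in ``algolia-docsearch/crawler-config.js`` assigns each record a top-level (``lvl0``) category taken from the first path segment of the page URL, capitalized, and falls back to ``Documentation`` for the home page. The sketch below reproduces just that expression outside the crawler so it can be checked in isolation. The function name ``lvl0ForUrl`` is hypothetical, and the sketch uses the standard WHATWG ``URL`` class (a global in Node.js and browsers), assuming the crawler's ``url`` argument behaves like a ``URL`` object, as the config's use of ``url.pathname`` suggests.

```javascript
// Mirrors the lvl0 logic from the `recordExtractor` in crawler-config.js:
//   const category = url.pathname.split("/")[1] || "Documentation";
//   category.charAt(0).toUpperCase() + category.slice(1)
// `lvl0ForUrl` is a hypothetical helper name used only for illustration.
function lvl0ForUrl(href) {
  const url = new URL(href);
  // "/foundations/colors" -> ["", "foundations", "colors"]; "/" -> ["", ""]
  const category = url.pathname.split("/")[1] || "Documentation";
  return category.charAt(0).toUpperCase() + category.slice(1);
}

console.log(lvl0ForUrl("https://paragon-openedx.netlify.app/foundations/colors")); // "Foundations"
console.log(lvl0ForUrl("https://paragon-openedx.netlify.app/")); // "Documentation"
```

Because the category is the raw URL slug with only its first letter capitalized, a hyphenated segment keeps its hyphens; a hypothetical ``/design-tokens/`` path would yield ``Design-tokens``.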

While the Algolia DocSearch crawler is configurable, its configuration is stored only in Algolia's hosted crawler editor (https://crawler.algolia.com/), not in any repository, so it cannot be reviewed or versioned alongside Paragon's code. To mitigate this, we will persist a copy of the crawler configuration in the Paragon repository (``algolia-docsearch/crawler-config.js``) so that it can be easily referenced and updated as needed. Whenever the crawler configuration is updated in the Paragon repository, the same change must also be applied in the Algolia DocSearch crawler editor.
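
Because the committed copy stores ``appId`` and ``apiKey`` as empty strings followed by a ``// REDACTED`` comment, applying a repository change in the crawler editor means supplying those values by hand. A minimal sketch of that step, assuming the real values are provided out of band (for example, from environment variables of your choosing) and are never written back to the repository: the hypothetical ``fillCredentials`` helper substitutes them into the config text and fails loudly if a placeholder is missing. It is not part of the Paragon repository or of any Algolia tooling.

```javascript
// Hypothetical helper, not part of the Paragon repository. The committed
// crawler-config.js stores credentials as `appId: "", // REDACTED` and
// `apiKey: "", // REDACTED`; this fills them in so the text can be pasted
// into the Algolia crawler editor. Never commit the filled-in result.
function fillCredentials(configText, { appId, apiKey }) {
  const fill = (text, key, value) => {
    const placeholder = new RegExp(`\\b${key}: "", // REDACTED`);
    if (!placeholder.test(text)) {
      throw new Error(`No redacted ${key} placeholder found`);
    }
    // A replacer function avoids special "$" patterns in the value;
    // JSON.stringify emits a properly quoted JavaScript string literal.
    return text.replace(placeholder, () => `${key}: ${JSON.stringify(value)},`);
  };
  return fill(fill(configText, "appId", appId), "apiKey", apiKey);
}

const committed = [
  "new Crawler({",
  '  appId: "", // REDACTED',
  '  apiKey: "", // REDACTED',
  "});",
].join("\n");

// Placeholder values for illustration; in practice read them from a secure source.
console.log(fillCredentials(committed, { appId: "YOUR_APP_ID", apiKey: "YOUR_API_KEY" }));
```

Failing when a placeholder is missing guards against silently pasting a configuration whose credential lines were renamed or reformatted.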

Consequences
------------

* Because the Algolia DocSearch crawler configuration is persisted in the Paragon repository, it may be updated there without the same change being applied in the Algolia DocSearch crawler editor, leaving the two copies out of sync. In addition, the committed copy omits the Algolia ``appId`` and ``apiKey`` values, so it cannot be pasted into the crawler editor verbatim. To mitigate this, the process for updating the crawler configuration in both places is documented in ``algolia-docsearch/README.md`` and in the comments at the top of ``algolia-docsearch/crawler-config.js``.
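
A lightweight way to spot this kind of drift, sketched under stated assumptions: copy the configuration shown in the Algolia DocSearch crawler editor by hand (no Algolia API is assumed here), then compare it with ``algolia-docsearch/crawler-config.js`` after normalizing away differences that are expected, namely header comments above the ``new Crawler(`` call, full-line ``//`` comments, whitespace, and the ``appId``/``apiKey`` values that are redacted in the repository. ``normalizeConfig`` and ``configsInSync`` are hypothetical helpers, and the text-based normalization is a heuristic, not a JavaScript parser.

```javascript
// Hypothetical drift check between the repository copy of crawler-config.js
// and a configuration copied by hand out of the Algolia crawler editor.
function normalizeConfig(text) {
  const start = text.indexOf("new Crawler(");
  if (start === -1) throw new Error("No `new Crawler(` call found");
  return text
    .slice(start) // ignore header comments above the Crawler call
    // Mask credentials: redacted in the repo, real values in the editor.
    .replace(/\b(appId|apiKey):\s*"[^"]*",?[ \t]*(\/\/ REDACTED)?/g, '$1: "<redacted>",')
    .split("\n")
    .filter((line) => !line.trim().startsWith("//")) // drop full-line comments
    .join("\n")
    .replace(/\s+/g, " ") // ignore indentation and line-break differences
    .trim();
}

function configsInSync(repoText, editorText) {
  return normalizeConfig(repoText) === normalizeConfig(editorText);
}

const repoCopy = [
  "// README: keep in sync with the crawler editor.",
  "new Crawler({",
  '  schedule: "every 1 day",',
  '  appId: "", // REDACTED',
  '  apiKey: "", // REDACTED',
  "});",
].join("\n");

// Placeholder credentials standing in for the real values in the editor.
const editorCopy = [
  "new Crawler({",
  '    schedule: "every 1 day",',
  '    appId: "APP123",',
  '    apiKey: "KEY456",',
  "});",
].join("\n");

console.log(configsInSync(repoCopy, editorCopy)); // true
```

This still depends on someone copying the editor's configuration, so it complements the documented update process rather than replacing it.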

Resources
---------

* https://www.algolia.com/
* https://docsearch.algolia.com/
* https://crawler.algolia.com/
* https://dashboard.algolia.com/
