Algolia search does not pass query string correctly and returns no results #9532

Closed
6 of 7 tasks
timothymcmackin opened this issue Nov 10, 2023 · 9 comments
Labels
bug An error in the Docusaurus core causing instability or issues with its execution closed: working as intended This issue is intended behavior, there's no need to take any action.

Comments

@timothymcmackin

timothymcmackin commented Nov 10, 2023

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

I'm using the built-in Algolia search features of docusaurus in preset-classic. I know that my site is indexed and that my API key works because I can do a simple search from a curl request and get results. For example, this command searches for the keyword "contract" and returns many results from docs.tezos.com:

curl -X GET \
     -H "X-Algolia-API-Key: 57d6a376a3528866784a143809cc7427" \
     -H "X-Algolia-Application-Id: QRIAHGML9Q" \
    "https://QRIAHGML9Q-dsn.algolia.net/1/indexes/tezosdocs?query=contract&hitsPerPage=2&getRankingInfo=1"

When I open my site at docs.tezos.com and do a search for the same word, I get no results. In the Algolia dashboard, I see many results with an empty "query" field in the request, so that's probably why the searches return no results.
Screenshot 2023-11-10 at 1 19 16 PM

When I open the console and grab the URL of one of the search requests to Algolia, its form data looks like this:

{
  "requests": [
    {
      "query": "contract",
      "indexName": "tezosdocs",
      "params": "attributesToRetrieve=%5B%22hierarchy.lvl0%22%2C%22hierarchy.lvl1%22%2C%22hierarchy.lvl2%22%2C%22hierarchy.lvl3%22%2C%22hierarchy.lvl4%22%2C%22hierarchy.lvl5%22%2C%22hierarchy.lvl6%22%2C%22content%22%2C%22type%22%2C%22url%22%5D&attributesToSnippet=%5B%22hierarchy.lvl1%3A10%22%2C%22hierarchy.lvl2%3A10%22%2C%22hierarchy.lvl3%3A10%22%2C%22hierarchy.lvl4%3A10%22%2C%22hierarchy.lvl5%3A10%22%2C%22hierarchy.lvl6%3A10%22%2C%22content%3A10%22%5D&snippetEllipsisText=%E2%80%A6&highlightPreTag=%3Cmark%3E&highlightPostTag=%3C%2Fmark%3E&hitsPerPage=20&clickAnalytics=false&facetFilters=%5B%22language%3Aen%22%2C%5B%22docusaurus_tag%3Adefault%22%2C%22docusaurus_tag%3Adocs-default-current%22%5D%5D"
    }
  ]
}

I can copy this network request as a curl command from the console, run it in my terminal, and confirm that it gets no results.
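
To see which filters that request applies, the URL-encoded params string can be decoded. This is a minimal sketch, assuming Node.js or a browser console. `decodeFacetFilters` is a hypothetical helper name, and the string below is a shortened copy of the params value above with only its last three parameters kept:

```javascript
// Shortened copy of the "params" value from the request above
// (only the last three parameters are kept for readability).
const params =
  "hitsPerPage=20&clickAnalytics=false&facetFilters=%5B%22language%3Aen%22%2C%5B%22docusaurus_tag%3Adefault%22%2C%22docusaurus_tag%3Adocs-default-current%22%5D%5D";

// Hypothetical helper: pull facetFilters out of a URL-encoded params string
// and parse its JSON value. Returns null when the parameter is absent.
function decodeFacetFilters(paramString) {
  const raw = new URLSearchParams(paramString).get("facetFilters");
  return raw === null ? null : JSON.parse(raw);
}

const facetFilters = decodeFacetFilters(params);
console.log(JSON.stringify(facetFilters));
```

The decoded value shows that the request filters on the language and docusaurus_tag attributes in addition to the query text.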

Based on Algolia's API documentation for this API endpoint, the form data should look like this:

{
  "requests": [
    {
      "indexName": "tezosdocs",
      "params": "query=contract"
    }
  ]
}

If I replace the data in the network request with this JSON, the curl command returns results.

So it appears to me that docusaurus or the algolia search component (@docsearch/react) is sending a malformed query to algolia, causing it to return no results.

Reproducible demo

https://github.com/trilitech/tezos-developer-docs

Steps to reproduce

  1. Open docs.tezos.com in a web browser.
  2. Open the dev tools (option-command-i on mac or shift-control-i on windows).
  3. In the dev tools pane, go to the network tab and click Clear Network Log.
  4. On the site, click the search bar at the top right of the screen and type "contract" in the popup search window. Note that there are no search results.
  5. In the network tab, right-click the last network request and then click Copy > As Curl.
  6. Paste the command into a text editor. It should be going to the algolia.net domain.
  7. Run the command in the terminal and verify that it returned no search results because the nbHits field is 0 and the hits array is empty.
  8. In the text editor, replace the line that starts with --data-raw with this line:
  --data-raw '{"requests":[{"indexName":"tezosdocs","params":"query=contract"}]}' \
  9. Run the updated command in the terminal and see that there are many search results. I'm seeing the nbHits field say 356.
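
The comparison in the last steps can also be scripted. This is a minimal sketch that only builds the two requests and does not send them. The endpoint path `/1/indexes/*/queries` is an assumption based on Algolia's REST API (the header names are the ones from the curl command in the description), `buildMultiQueryRequest` is a hypothetical helper, and the API key is a placeholder:

```javascript
// Build (but do not send) a request for Algolia's multi-query endpoint.
// Assumption: the endpoint path below follows Algolia's REST API.
function buildMultiQueryRequest(appId, apiKey, indexName, params) {
  return {
    url: `https://${appId}-dsn.algolia.net/1/indexes/*/queries`,
    options: {
      method: "POST",
      headers: {
        "X-Algolia-Application-Id": appId,
        "X-Algolia-API-Key": apiKey,
      },
      body: JSON.stringify({ requests: [{ indexName, params }] }),
    },
  };
}

// Same body as the --data-raw line in step 8: query text only.
const plain = buildMultiQueryRequest(
  "QRIAHGML9Q", "YOUR_SEARCH_API_KEY", "tezosdocs", "query=contract"
);

// Same query plus the facet filters that the site's search box sends.
const siteFilters = [
  "language:en",
  ["docusaurus_tag:default", "docusaurus_tag:docs-default-current"],
];
const filtered = buildMultiQueryRequest(
  "QRIAHGML9Q", "YOUR_SEARCH_API_KEY", "tezosdocs",
  "query=contract&facetFilters=" + encodeURIComponent(JSON.stringify(siteFilters))
);

// To send one of them (needs a real search API key):
//   const res = await fetch(plain.url, plain.options);
//   const { results } = await res.json();
//   console.log(results[0].nbHits);
console.log(plain.options.body);
console.log(filtered.options.body);
```

Comparing nbHits for the two requests separates a problem with the query text from a problem with the filters.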

Expected behavior

Get search results

Actual behavior

No search results

Your environment

Self-service

  • I'd be willing to fix this bug myself.
@timothymcmackin timothymcmackin added the "bug" and "status: needs triage" labels Nov 10, 2023
@slorber
Collaborator

slorber commented Nov 11, 2023

So it appears to me that docusaurus or the algolia search component (@docsearch/react) is sending a malformed query to algolia, causing it to return no results.

Most likely Docusaurus sends the right query, but your index is not configured correctly according to our recommendations.

For our queries to work, the index must contain the fields we filter on, notably the "docusaurus_tag" field.

CleanShot 2023-11-11 at 17 11 38@2x
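
To illustrate why a missing field produces zero hits, here is a simplified model of how facetFilters select records. It is a sketch and not Algolia's implementation: it assumes that top-level entries are combined with AND, that entries of a nested array are combined with OR, and that each entry has the form `attribute:value`. The function names and the two sample records are hypothetical:

```javascript
// Check one "attribute:value" filter against one record.
function matchesOne(record, filter) {
  const sep = filter.indexOf(":");
  const attribute = filter.slice(0, sep);
  const value = filter.slice(sep + 1);
  const actual = record[attribute];
  // A record without the attribute can never match a filter on it.
  if (actual === undefined) return false;
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

// Top-level entries are ANDed; entries of a nested array are ORed.
function matchesFacetFilters(record, facetFilters) {
  return facetFilters.every((entry) =>
    Array.isArray(entry)
      ? entry.some((f) => matchesOne(record, f))
      : matchesOne(record, entry)
  );
}

// Filters as sent by the search box in the issue description.
const facetFilters = [
  "language:en",
  ["docusaurus_tag:default", "docusaurus_tag:docs-default-current"],
];

// Hypothetical records: one with both fields, one without docusaurus_tag.
const complete = { language: "en", docusaurus_tag: ["default", "docs-default-current"] };
const missingTag = { language: "en" };

console.log(matchesFacetFilters(complete, facetFilters));   // true
console.log(matchesFacetFilters(missingTag, facetFilters)); // false
```

Under this model, an index whose records lack docusaurus_tag returns nothing for any query text, which matches the symptom reported above.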


Please delete your index and recrawl your site with the recommended crawler configuration that we link to in our documentation. If it still does not work then we can re-open but we'll need you to provide your crawler config and screenshots of your Algolia index UI.

You can also reach out to the DocSearch support team through email or on their Discord.

@slorber slorber closed this as not planned Nov 11, 2023
@slorber slorber added the "closed: working as intended" label and removed the "status: needs triage" label Nov 11, 2023
@timothymcmackin
Author

Thanks for the info. I was able to get it working by creating a new crawler, re-indexing the site, and setting contextualSearch to False.

@slorber
Collaborator

slorber commented Nov 16, 2023

and setting contextualSearch to False.

Setting it to false might work but might also "hide" the problem. Turning this setting off disables the filtering on docusaurus_tag, so even a misconfigured index will return results. The problem remains that your index is probably misconfigured, and it's important to ensure that the docusaurus_tag field is correctly indexed.
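
For reference, a minimal sketch of where this setting lives, assuming the usual themeConfig.algolia block of docusaurus.config.js. The appId and indexName are the ones from this thread, the API key is a placeholder, and the rest of the site config is omitted:

```javascript
// docusaurus.config.js (excerpt): only the algolia block is shown.
const config = {
  themeConfig: {
    algolia: {
      appId: "QRIAHGML9Q",
      apiKey: "YOUR_SEARCH_API_KEY", // placeholder, use your search-only key
      indexName: "tezosdocs",
      // Keep contextual search on: queries are then filtered on
      // docusaurus_tag and language, so both must exist in the index.
      contextualSearch: true,
    },
  },
};

// Export it from docusaurus.config.js in whichever module style the site uses.
console.log(config.themeConfig.algolia.contextualSearch);
```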

@timothymcmackin
Author

I duplicated the crawler in the docusaurus documentation and made the small changes for my site:

new Crawler({
  appId: "QRIAHGML9Q",
  apiKey: "MY_API_KEY",
  rateLimit: 8,
  startUrls: ["https://docs.tezos.com"],
  sitemaps: ["https://docs.tezos.com/sitemap.xml"],
  saveBackup: true,
  ignoreQueryParams: ["source", "utm_*"],
  ignoreCanonicalTo: true,
  discoveryPatterns: ["https://docs.tezos.com/**"],
  actions: [
    {
      indexName: "tezosdocs",
      pathsToMatch: ["https://docs.tezos.com/**"],
      recordExtractor: ({ $, helpers }) => {
        // priority order: deepest active sub list header -> navbar active item -> 'Documentation'
        const lvl0 =
          $(
            ".menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active"
          )
            .last()
            .text() || "Documentation";

        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: lvl0,
            },
            lvl1: ["header h1", "article h1"],
            lvl2: "article h2",
            lvl3: "article h3",
            lvl4: "article h4",
            lvl5: "article h5, article td:first-child",
            lvl6: "article h6",
            content: "article p, article li, article td:last-child",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  initialIndexSettings: {
    "Tezos docs crawler": {
      attributesForFaceting: [
        "type",
        "lang",
        "language",
        "version",
        "docusaurus_tag",
      ],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
      separatorsToIndex: "_",
    },
  },
});

I re-indexed the site with this crawler and I can see that it is indexed on the docusaurus_tag tag:
Screenshot 2023-11-17 at 9 08 35 AM

However, I still see no results in my search when I turn contextualSearch off. What does my index need to look like to work correctly?

@timothymcmackin
Author

Per your comment here:
#6693 (comment)
I have also made docusaurus_tag searchable in Algolia:
Screenshot 2023-11-17 at 12 57 40 PM
But the search still returns no results in the popup/modal window when contextual search is on.

timothymcmackin added a commit to trilitech/tezos-developer-docs that referenced this issue Nov 17, 2023
@slorber
Collaborator

slorber commented Nov 18, 2023

Contextual search should rather be on, not off

I don't know the Algolia DocSearch details well enough to troubleshoot this on your site, but you can reach out to their support if needed. Cc @shortcuts

@FZambia

FZambia commented Jan 11, 2024

I had a similar problem and it looks like I solved it. Here are the steps that helped me get non-empty results with the Algolia search widget:

  1. Reconfigure the crawler at https://crawler.algolia.com/admin/crawlers/ to use the recommended config from https://docsearch.algolia.com/docs/templates/#docusaurus-v3-template
  2. In my case I dropped the index, but it's probably possible to just restart the crawler.
  3. Go to the index configuration, add docusaurus_tag and language to Attributes for faceting, and make them searchable. As far as I understand, all fields that are present in the query under facetFilters must be enabled here.

After that, my queries finally started to return results.
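
The third step can be checked mechanically. This is a minimal sketch, assuming you have copied the index's attributesForFaceting setting and the facetFilters from a search request into plain arrays. `missingFacetAttributes` and `stripModifier` are hypothetical helpers, and only the `searchable(...)` and `filterOnly(...)` modifiers are handled:

```javascript
// Strip an optional modifier, e.g. "searchable(language)" -> "language".
function stripModifier(entry) {
  const match = /^(?:searchable|filterOnly)\((.+)\)$/.exec(entry);
  return match ? match[1] : entry;
}

// List the attributes used in facetFilters that are not declared
// in the index's attributesForFaceting setting.
function missingFacetAttributes(attributesForFaceting, facetFilters) {
  const declared = new Set(attributesForFaceting.map(stripModifier));
  const used = facetFilters.flat().map((f) => f.slice(0, f.indexOf(":")));
  return [...new Set(used)].filter((attr) => !declared.has(attr));
}

// Filters taken from the request in the issue description.
const facetFilters = [
  "language:en",
  ["docusaurus_tag:default", "docusaurus_tag:docs-default-current"],
];

// Hypothetical index that declares language but not docusaurus_tag.
const missing = missingFacetAttributes(["type", "searchable(language)"], facetFilters);
console.log(missing); // contains only "docusaurus_tag"
```

An empty result means every filtered attribute is declared for faceting. It does not prove that the records themselves contain those attributes.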

@timothymcmackin
Author

This is what fixed it for me: I changed this code:

  initialIndexSettings: {
    "Tezos docs crawler": {

to:

  initialIndexSettings: {
    tezosdocs: {

Then I deleted the index and re-indexed.
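
This fix suggests that the keys of initialIndexSettings are expected to be index names, so settings stored under any other key never reach the index. A minimal sketch of a check for that mismatch follows. `unmatchedIndexSettings` is a hypothetical helper, and the two objects are trimmed-down stand-ins for the crawler config, not the real Crawler API:

```javascript
// Return the initialIndexSettings keys that do not match
// any indexName used in the crawler's actions.
function unmatchedIndexSettings(crawlerConfig) {
  const indexNames = new Set(crawlerConfig.actions.map((a) => a.indexName));
  return Object.keys(crawlerConfig.initialIndexSettings || {}).filter(
    (key) => !indexNames.has(key)
  );
}

// Trimmed-down version of the config before the fix.
const before = {
  actions: [{ indexName: "tezosdocs" }],
  initialIndexSettings: {
    "Tezos docs crawler": { attributesForFaceting: ["docusaurus_tag"] },
  },
};

// Same config after the fix: the key is now the index name.
const after = {
  actions: [{ indexName: "tezosdocs" }],
  initialIndexSettings: {
    tezosdocs: { attributesForFaceting: ["docusaurus_tag"] },
  },
};

console.log(unmatchedIndexSettings(before)); // one unmatched key: "Tezos docs crawler"
console.log(unmatchedIndexSettings(after));  // no unmatched keys
```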

@slorber
Collaborator

slorber commented Apr 5, 2024

EDIT: see Troubleshooting section added to our docs here:

https://docusaurus.io/docs/search#algolia-troubleshooting

No search result?

For anyone passing by, if you don't get any Algolia search results:

  • Make sure that your Algolia index has the fields shown in the screenshot below.
  • If you don't see these fields, you have an index configuration problem.
  • Check your crawler config, make sure it matches the recommended one, and then delete and recreate your index based on the fixed crawler config (Algolia team recommendation).

image

See also: #10007 (comment)
