Words of messages in some languages not found on search result #24765

Open
luixxiul opened this issue Mar 8, 2023 · 4 comments
Labels
I18n; O-Occasional (Affects or can be seen by some users regularly or most users rarely); S-Minor (Impairs non-critical functionality or suitable workarounds exist); T-Defect

Comments

luixxiul commented Mar 8, 2023

Steps to reproduce

Add the test below to timeline.spec.ts and run it.

it("should highlight search result words of various languages", () => {
    // "Test" in Arabic, Hebrew, and Hindi
    const stringAr = "اِمْتِحَان‎";
    const stringHe = "מִבְחָן";
    const stringHi = "आज़माइश";

    cy.visit("/#/room/" + roomId);

    // Wait until configuration is finished
    cy.contains(
        ".mx_RoomView_body .mx_GenericEventListSummary .mx_GenericEventListSummary_summary",
        "created and configured the room.",
    ).should("exist");

    // Arabic
    cy.sendEvent(roomId, null, "m.room.message" as EventType, {
        msgtype: "m.text" as MsgType,
        body: stringAr,
    });
    // Hebrew
    cy.sendEvent(roomId, null, "m.room.message" as EventType, {
        msgtype: "m.text" as MsgType,
        body: stringHe,
    });
    // Hindi
    cy.sendEvent(roomId, null, "m.room.message" as EventType, {
        msgtype: "m.text" as MsgType,
        body: stringHi,
    });

    // Ensure the last message was sent
    cy.get(".mx_EventTile_last .mx_EventTile_receiptSent").should("be.visible");

    cy.get(".mx_RoomHeader_searchButton").click();

    // Check stringAr is highlighted
    cy.get(".mx_SearchBar_input input").clear().invoke("val", stringAr).trigger("input");
    cy.get(".mx_SearchBar_input input").type("{enter}");
    cy.get(".mx_EventTile:not(.mx_EventTile_contextual) .mx_EventTile_searchHighlight").should("exist");

    // Check stringHe is highlighted
    cy.get(".mx_SearchBar_input input").clear().invoke("val", stringHe).trigger("input");
    cy.get(".mx_SearchBar_input input").type("{enter}");
    cy.get(".mx_EventTile:not(.mx_EventTile_contextual) .mx_EventTile_searchHighlight").should("exist");

    // Check stringHi is highlighted
    cy.get(".mx_SearchBar_input input").clear().invoke("val", stringHi).trigger("input");
    cy.get(".mx_SearchBar_input input").type("{enter}");
    cy.get(".mx_EventTile:not(.mx_EventTile_contextual) .mx_EventTile_searchHighlight").should("exist");
});

Outcome

What did you expect?

The test should run successfully.

What happened instead?

It fails because the search returns no results.

Note that a similar test for other non-European languages such as Chinese, Japanese, Korean, and Thai passes. The test for the Hebrew string without the vowel marks (מבחן) also passes.
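
Since the unpointed Hebrew string passes and the pointed one does not, it may help to check whether a search term contains Unicode combining marks. The sketch below is only a diagnostic aid: the helper names are hypothetical, and the Hebrew word is written out as escaped code points that are assumed to match the string used in the test.

```typescript
// Hypothetical helpers for inspecting a search term; not part of the test suite.

// Matches any Unicode combining mark (general categories Mn, Mc, Me).
// Built with the RegExp constructor so it does not depend on the compile target.
const COMBINING_MARK = new RegExp("\\p{M}", "gu");

// Count the combining marks in a string.
function countCombiningMarks(text: string): number {
    const matches = text.match(COMBINING_MARK);
    return matches === null ? 0 : matches.length;
}

// Remove every combining mark. Diagnostic only: in scripts such as Devanagari
// this also strips vowel signs and therefore changes the word.
function stripCombiningMarks(text: string): string {
    return text.replace(COMBINING_MARK, "");
}

// Assumed code points: mem, hiriq, bet, sheva, het, qamats, final nun.
const pointedHebrew = "\u05DE\u05B4\u05D1\u05B0\u05D7\u05B8\u05DF";
// The same word without the vowel marks.
const plainHebrew = "\u05DE\u05D1\u05D7\u05DF";

console.log(countCombiningMarks(pointedHebrew)); // 3
console.log(stripCombiningMarks(pointedHebrew) === plainHebrew); // true
```

Stripping marks is not proposed as a fix here. It only shows that the failing and passing Hebrew strings differ by combining marks alone.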

Operating system

Debian

Browser information

Electron (cypress)

URL for webapp

No response

Application version

develop branch

Homeserver

No response

Will you send logs?

No

luixxiul changed the title from "Words of messages in some languages on search result not highlighted" to "Words of messages in some languages not highlighted on search result" on Mar 8, 2023

luixxiul commented Mar 8, 2023

In fact, the issue is not really about highlighting words but about finding them.

luixxiul changed the title from "Words of messages in some languages not highlighted on search result" to "Words of messages in some languages not found on search result" on Mar 8, 2023
justjanne (Contributor) commented

This sounds like tokenization in search is broken?
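
To illustrate how tokenization could produce this symptom, here is a minimal sketch with two hypothetical tokenizers. Neither is taken from the client or the homeserver, and the Hebrew word is given as assumed code points. It only shows that a word pattern limited to letters and digits splits a pointed word at every combining mark, so the indexed tokens would no longer match the original word.

```typescript
// Hypothetical tokenizers for illustration only; not the actual search code.

// A "word" is a run of letters and digits. Combining marks end the run.
const LETTERS_ONLY = new RegExp("[\\p{L}\\p{N}]+", "gu");
// A "word" may also contain combining marks.
const LETTERS_AND_MARKS = new RegExp("[\\p{L}\\p{N}\\p{M}]+", "gu");

// Return every run matched by wordPattern (which must carry the "g" flag).
function tokenize(text: string, wordPattern: RegExp): string[] {
    const matches = text.match(wordPattern);
    return matches === null ? [] : matches.slice();
}

// Assumed code points: mem, hiriq, bet, sheva, het, qamats, final nun.
const pointedWord = "\u05DE\u05B4\u05D1\u05B0\u05D7\u05B8\u05DF";

// Split into four single-letter tokens, because each vowel mark ends a run.
console.log(tokenize(pointedWord, LETTERS_ONLY).length); // 4
// Kept as one token when marks are allowed inside a word.
console.log(tokenize(pointedWord, LETTERS_AND_MARKS).length); // 1
```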

justjanne added the S-Minor, O-Occasional, and I18n labels on Mar 8, 2023

luixxiul commented Mar 8, 2023

Though I have not yet found a way to reproduce it consistently (and I am not sure whether it is really related to this issue), it is almost impossible to search for Japanese words in a room that was created half a year ago and has been actively used since.


richvdh commented Mar 8, 2023

Assuming this is server-side search, it sounds more like a server-side issue than a client-side one.

matrix-org/synapse#901, possibly?
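
One way to check this would be to send the search to the homeserver directly, bypassing the client. The sketch below builds a request body for the Matrix client-server search endpoint (`POST /_matrix/client/v3/search`). The helper name, the room ID, and the choice of `keys` and `order_by` values are assumptions made for illustration.

```typescript
// Hypothetical helper for querying the homeserver directly; not client code.

interface RoomEventsSearch {
    search_term: string;
    keys: string[];
    filter: { rooms: string[] };
    order_by: "recent" | "rank";
}

interface SearchRequestBody {
    search_categories: { room_events: RoomEventsSearch };
}

// Path of the search endpoint in the Matrix client-server API.
const SEARCH_PATH = "/_matrix/client/v3/search";

// Build a body that searches message bodies in a single room.
function buildSearchBody(term: string, roomId: string): SearchRequestBody {
    return {
        search_categories: {
            room_events: {
                search_term: term,
                keys: ["content.body"],
                filter: { rooms: [roomId] },
                order_by: "recent",
            },
        },
    };
}

// Placeholder room ID; the term is the pointed Hebrew word as assumed code points.
const body = buildSearchBody("\u05DE\u05B4\u05D1\u05B0\u05D7\u05B8\u05DF", "!room:example.org");
console.log(SEARCH_PATH);
console.log(JSON.stringify(body, null, 2));
```

If posting this body with a valid access token returns no results for a message that is known to exist in the room, the problem would be on the server side, as suggested above.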
