Full-text tokenizer can't deal with apostrophe #4633

@Kubera2017

What version of Dgraph are you using?

1.1.1

Have you tried reproducing the issue with the latest release?

No

What is the hardware spec (RAM, OS)?

12GB, Ubuntu 19.04

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Set the schema:
    file_content: string @index(fulltext) @lang .
  2. Insert data:
    "file_content@en": "unrelated breaks GIT Buccal micron Standard burst College Overall absorptive paracellular measures advance contains mm protein’s chymosin beyond β-lactotensin permanent respective rigid-body apical corneum information murine medium After supported mCherry—a ZOT fluorometer immobilized fully"
  3. Run a full-text search for "protein".
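
The steps above can be sketched as request payloads. This is a minimal sketch, not taken from the report: the helper names (`build_mutation`, `build_query`, `punctuation_in`) are hypothetical, the text is a shortened excerpt of the string in step 2, and the use of `alloftext` is an assumption about how the search was issued. The last helper shows that the apostrophe in the data is the typographic U+2019 rather than the ASCII U+0027, which may matter for tokenization.

```python
import json
import unicodedata

# Schema from step 1 of the report.
SCHEMA = "file_content: string @index(fulltext) @lang ."

# Shortened excerpt of the step 2 text (assumption: the excerpt behaves like the full string).
TEXT = "contains mm protein’s chymosin beyond β-lactotensin"


def build_mutation(text):
    """Return a JSON set-mutation body with an English-tagged value."""
    return json.dumps({"set": [{"file_content@en": text}]}, ensure_ascii=False)


def build_query(term):
    """Return a DQL query that runs a full-text search for `term` (assumed: alloftext)."""
    return (
        "{\n"
        f'  q(func: alloftext(file_content@en, "{term}")) {{\n'
        "    uid\n"
        "    file_content@en\n"
        "  }\n"
        "}"
    )


def punctuation_in(word):
    """List the code point and Unicode name of each non-alphanumeric character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in word if not c.isalnum()]


if __name__ == "__main__":
    print(SCHEMA)
    print(build_mutation(TEXT))
    print(build_query("protein"))
    print(punctuation_in("protein’s"))
```

Sending these payloads to a running Dgraph instance is left out on purpose, so the sketch stays independent of any client library or endpoint.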

Expected behaviour and actual result.

The full-text search should find "protein" in "protein’s", but it does not. Neo4j's Lucene does.
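
To make the expected behaviour concrete, here is a toy analyzer that folds typographic apostrophes to ASCII and strips a trailing possessive. It is an illustration only, not Dgraph's or Lucene's implementation; the function name `tokens` and the set of folded characters are assumptions made for this sketch.

```python
import re

# Apostrophe variants folded to ASCII "'" (assumed set, chosen for illustration).
APOSTROPHES = "\u2019\u02bc\uff07"


def tokens(text):
    """Lowercase, fold apostrophes, split into words, and drop a trailing 's."""
    folded = text.lower()
    for ch in APOSTROPHES:
        folded = folded.replace(ch, "'")
    # A word is a run of letters/digits, optionally joined by single apostrophes.
    words = re.findall(r"[^\W_]+(?:'[^\W_]+)*", folded)
    return [w[:-2] if w.endswith("'s") else w for w in words]


if __name__ == "__main__":
    indexed = tokens("contains mm protein’s chymosin beyond")
    print(indexed)
    print("protein" in indexed)
```

With this kind of normalization, the token stored for "protein’s" is "protein", so a query for "protein" would match the document.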

Labels

area/indexes: Related to indexes (indices) and their usage.
area/querylang: Issues related to the query language specification and implementation.
kind/bug: Something is broken.
priority/P1: Serious issue that requires eventual attention (can wait a bit)
