
Working with an all-lowercase dataset #32

@ghost

Description

Thanks for the wonderful work here!

I have some text files and want to extract named entities (NEs) from them by running nerTagger.py. However, my files are entirely lowercase, and, unsurprisingly, I get no NE results at all.

For instance:

  • [Normal sentence]: I live in New York.
    Output:
...
"text": "I live in New York.",
"entities": [
    {
        "text": "New York",
        "class": "LOC",
        "score": 1.0,
        "beginOffset": 10,
        "endOffset": 17
    }
]
...
  • [Lowercase sentence]: i live in new york.
    Output:
...
"text": "i live in new york.",
"entities": []
...

Expected:

...
"text": "i live in new york.",
"entities": [
    {
        "text": "new york",
        "class": "LOC",
        "score": 1.0,
        "beginOffset": 10,
        "endOffset": 17
    }
]
...
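Lowercasing plain ASCII text does not change its length, so the expected entities keep exactly the same offsets as in the cased sentence. In the output above, endOffset appears to be inclusive (10 + 8 - 1 = 17 for the 8-character "New York"). One possible stopgap, assuming a correctly cased copy of each sentence can be obtained (for example from a separate truecasing step, which is not part of nerTagger.py), is to tag the cased copy and project the entities back onto the original lowercase text. A minimal sketch; `project_entities` is a hypothetical helper, and the entity dict layout is copied from the JSON shown above:

```python
from typing import Dict, List


def project_entities(lower_text: str, cased_text: str, entities: List[Dict]) -> List[Dict]:
    """Hypothetical helper (not part of nerTagger.py): reuse entities found in
    cased_text as entities of lower_text, relying on identical offsets."""
    # Offsets only transfer if the strings differ in letter case alone and have
    # the same length (some non-ASCII characters change length when lowercased).
    if cased_text.lower() != lower_text or len(cased_text) != len(lower_text):
        raise ValueError("texts must differ only in letter case")
    projected = []
    for ent in entities:
        begin, end = ent["beginOffset"], ent["endOffset"]
        # Assumption: endOffset is inclusive (index of the last character),
        # which matches 10 + 8 - 1 = 17 in the example output.
        projected.append({**ent, "text": lower_text[begin:end + 1]})
    return projected


if __name__ == "__main__":
    cased = "I live in New York."
    lower = "i live in new york."
    ents = [{"text": "New York", "class": "LOC", "score": 1.0,
             "beginOffset": 10, "endOffset": 17}]
    print(project_entities(lower, cased, ents))
```

This only shifts the problem to recasing, so it is a workaround rather than a substitute for a model that handles lowercase input directly.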

Would it make sense to develop a caseless (case-insensitive) NER model for input like this?
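For the caseless model itself, one common approach (not necessarily what this project would choose) is to augment the training set with lowercased copies of each sentence while keeping the labels unchanged, so the model cannot rely on capitalization alone. A minimal sketch, assuming the training data is held as a list of sentences, each a list of (token, BIO label) pairs; `add_lowercase_copies` is a hypothetical name and does not reflect the repository's actual data loaders:

```python
from typing import List, Tuple

# Assumed representation: one sentence = list of (token, BIO label) pairs.
Sentence = List[Tuple[str, str]]


def add_lowercase_copies(sentences: List[Sentence]) -> List[Sentence]:
    """Hypothetical helper: return the original sentences plus a lowercased
    copy of each one, with every label left as it was."""
    augmented: List[Sentence] = []
    for sent in sentences:
        augmented.append(sent)
        lowered = [(token.lower(), label) for token, label in sent]
        # Skip the copy when the sentence is already entirely lowercase.
        if lowered != sent:
            augmented.append(lowered)
    return augmented


if __name__ == "__main__":
    train = [[("I", "O"), ("live", "O"), ("in", "O"),
              ("New", "B-LOC"), ("York", "I-LOC"), (".", "O")]]
    for sent in add_lowercase_copies(train):
        print(sent)
```

Mixing cased and lowercased copies (rather than lowercasing everything) is meant to keep accuracy on normally cased text while adding robustness to lowercase input; the right mix would have to be checked on a held-out set.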
