Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & its conjugated forms) #33

makorin0315 · 2020-09-02T23:58:27Z

NOTE: this is a suggestion/request that came from Dr. Rei Noguchi @ Gunma University Hospital.

BACKGROUND

In iKnow, Negation expansion is normally done using the Path, which for non-Japanese languages follows the word order in the Sentence. Since we developed Entity Vector as a special-case Path for Japanese, the order of entities within the Path mostly differs from the order in which they appear within the Sentence. For this reason, we have not yet implemented Negation expansion beyond the boundaries of the entity that includes the Negation marker.

For example:

今週はレッスンはない。- There is no lesson this week.
Entity Vector - レッスンない今週
The two particles は are NonRelevant.

Because of the sentence structure, the word ない, which is the present form of the Adjectival Verb meaning "doesn't exist" and a Negation marker, does not expand beyond itself. This is a problem, since it's not possible to know "what" is being negated without reading the entire sentence.

SIMPLE EXPANSION EXPERIMENT

Dr. Noguchi used the current iKnow Python interface to experiment with his medical data, which often uses simple sentence structures that closely resemble the format: XXX は (or が) ない (or なかった - past form of the same Adjectival Verb, meaning "didn't exist").

XXXはない。
XXXがない。
XXXはなかった。
XXXがなかった。

EXPERIMENT:
In cases like the above, expand Negation to the left, to the Concept before the particle は or が, i.e., "XXX" in the examples above.

In addition, there are some sentences where XXX is replaced by "XXX1やXXX2", meaning "XXX1 and/or XXX2". In such cases, expand Negation to the left, all the way to the Concept before the particle や, i.e., "XXX1" (the first Concept).

His experiment suggested that, at least for his data, such an expansion is normally semantically correct and would give more meaningful results for his machine learning work, since it is clearer what exists and what doesn't exist. (For example: There was no fever vs. Patient had fever.)
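
The expansion rule described in this experiment could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: entities are modelled as hypothetical `(type, text)` tuples in sentence order rather than taken from the actual iknowpy output, `expand_negation_left` is a made-up helper name, and this is not Dr. Noguchi's code.

```python
# Illustrative sketch only. Entities are hypothetical (type, text) tuples in
# sentence order; this is not the iknowpy data model.
NEGATION_MARKERS = {"ない", "なかった"}  # present and past forms
SUBJECT_PARTICLES = {"は", "が"}
LISTING_PARTICLE = "や"


def expand_negation_left(entities):
    """Return (start, end) index pairs, one per Negation marker found."""
    spans = []
    for i, (etype, text) in enumerate(entities):
        if etype != "Concept" or text not in NEGATION_MARKERS:
            continue
        start = i  # by default the marker does not expand beyond itself
        # Pattern "XXX は/が ない": expand to the Concept before the particle.
        if (i >= 2 and entities[i - 1][1] in SUBJECT_PARTICLES
                and entities[i - 2][0] == "Concept"):
            start = i - 2
            # Pattern "XXX1 や XXX2": keep expanding to the first Concept.
            while (start >= 2 and entities[start - 1][1] == LISTING_PARTICLE
                   and entities[start - 2][0] == "Concept"):
                start -= 2
        spans.append((start, i))
    return spans


lesson = [("Concept", "今週"), ("NonRelevant", "は"),
          ("Concept", "レッスン"), ("NonRelevant", "は"), ("Concept", "ない")]
print(expand_negation_left(lesson))  # [(2, 4)]: レッスン は ない
```

With this rule, 今週 stays outside the span, and a sentence of the form "XXX1やXXX2はなかった" would be covered from XXX1 through なかった.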

INITIAL DISCUSSION

This approach only works when the sentence structure is as simple as above (in clinical or medical text). In more complex sentences, it's possible that XXX is part of a subordinate clause, in which case it would be more desirable to expand even further to the left.
However, we have heard from various customers over the years that it would be desirable to see the "link" between the Adjectival Verb and what is being modified. This is one such example. One idea was to enable Path (i.e., CRC-like Path) instead of Entity Vector and then make は and が PathRelevant, but it's not clear how much language model work would be involved after such a code change.
Better Negation expansion has been a longstanding task for Japanese. It may be a good idea to start small (such as in this suggestion), and improve further as we go.

TECHNICAL APPROACHES

There are two different ways Negation expansion can be implemented.

1. No change in Path mechanism, i.e., use Entity Vector
   - No technical work involved
   - In the language model, add Negation marker to XXX and particle, since NegStop/NegBegin will not do anything.
   - This approach does not really create a span, but rather 3 separate entities (Concept, NonRelevant, Concept) with Negation marker. => Is this acceptable for Dr. Noguchi? If so, is it a good approach in the long run? If not, we may need to make the entire thing a Concept. Is that acceptable...?
2. Add ability to select EV vs. CRC Path
   - Technical work is involved
   - In the language model, NegStop/NegBegin can be used, thus creating a "real" span.
   - This was initially suggested back in December, when we observed that certain types of medical/clinical notes use a more straightforward (CRC-like) sentence structure.
   - It may be a problem if a user wants Negation expansion but also wants to use EV...

The first approach is quicker, but may not be as useful longer-term. Any comment or additional consideration that I'm missing? @ISC-SDE @bdeboe @JosDenysGitHub @woodfinisc

JosDenysGitHub · 2020-09-03T08:27:04Z

In IRIS, Entity Vectors are emitted as sentence attributes (see the RAW data output), and no Path information is present. I have changed that in iKnow standalone, for simplicity, and used the Path output for emitting Entity Vectors in Japanese.
Since I thought they represent the same thing...
But internally they are separate, meaning we can emit both Path data and Entity Vector data, if that could help.
The current Path construction does not use the previous CRC-mechanism, but simply collects all entities except for NonRelevants (after introducing the PathRelevant type). That means one Path per sentence, whereas the CRC-mechanism can result in multiple Paths.
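
As a rough illustration of that Path construction (one Path per sentence, every entity except NonRelevants), a minimal sketch could look like the following. The `(type, text)` tuples and the `build_path` name are assumptions made for this example, not the engine's internal representation, and typing は as PathRelevant in the second case is only a hypothetical language model choice.

```python
# Illustrative sketch only: hypothetical (type, text) tuples in sentence
# order, not the engine's actual code or data structures.
def build_path(entities):
    """Collect every entity except NonRelevants, keeping sentence order."""
    return [text for etype, text in entities if etype != "NonRelevant"]


# Particles typed as NonRelevant are dropped from the Path.
plain = [("Concept", "今週"), ("NonRelevant", "は"),
         ("Concept", "レッスン"), ("NonRelevant", "は"), ("Concept", "ない")]
print(build_path(plain))  # ['今週', 'レッスン', 'ない']

# If the language model typed the particles as PathRelevant instead
# (an assumption for illustration), they would stay in the Path.
relevant = [("Concept", "今週"), ("PathRelevant", "は"),
            ("Concept", "レッスン"), ("PathRelevant", "は"), ("Concept", "ない")]
print(build_path(relevant))  # ['今週', 'は', 'レッスン', 'は', 'ない']
```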

It would not be that hard to generate Path data, and the corresponding path-expansion mechanism, together with Entity Vectors. The latter would become sentence attributes; the former would replace the current EVs.

This would mean an incompatible API change for Japanese of course.

@JosDenysGitHub

makorin0315 · 2020-09-03T13:15:38Z

Thanks, @JosDenysGitHub. That was my next question, i.e., is it possible to emit both EV and Path data, so it's great to hear that it is possible. EVs can still be used to calculate Proximity, correct?

JosDenysGitHub · 2020-09-03T13:38:56Z

EVs are the base for calculating proximity. I guess this should not change?

makorin0315 · 2020-09-03T16:38:43Z

Correct. Proximity calculation should stay as is.

bdeboe · 2020-09-03T18:32:04Z

Getting Japanese back in line with the other languages, so that it returns "regular" paths and emits EVs through a separate mechanism, sounds like the desirable long-term thing to do. Neither the standalone engine nor the IRIS integration itself would be much impacted, but applications built on top of IRIS that were expecting EVs from the PathAPI would get something different until they adapt to the new channel for EVs.

@JosDenysGitHub : how does the "sentence attribute" representation of an EV look?

makorin0315 · 2020-09-03T18:37:15Z

In the RAW output, the EV looks like:
<attr type="entity_vector" "レッスン" "ない" "今週">
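
For anyone who wants to read such a line back into a list, a small parsing sketch is shown below. It assumes the RAW attribute looks exactly like the single line quoted above (a `type="entity_vector"` field followed by quoted entity strings); the full RAW format is not shown in this thread, and `parse_entity_vector` is a hypothetical helper, not part of iKnow.

```python
import re

# Illustrative sketch only: the pattern is derived from the one RAW line
# quoted in this thread and may not cover the complete RAW format.
_EV_LINE = re.compile(r'\s*<attr\s+type="entity_vector"((?:\s+"[^"]*")*)\s*>')


def parse_entity_vector(raw_line):
    """Return the entity strings of an entity_vector line, or None."""
    match = _EV_LINE.match(raw_line)
    if match is None:
        return None  # not an entity_vector attribute line
    return re.findall(r'"([^"]*)"', match.group(1))


line = '<attr type="entity_vector" "レッスン" "ない" "今週">'
print(parse_entity_vector(line))  # ['レッスン', 'ない', '今週']
```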

makorin0315 · 2020-09-08T15:13:08Z

Adding @JosDenysGitHub as an assignee for required engine change - to be worked on after higher priority issues.

Rei-hub · 2020-09-11T11:15:30Z

Hi @makorin0315, I’m really sorry for the late posting, and appreciate your kind support.

I’m Rei Noguchi at Gunma University Hospital, and I am researching the analysis of mainly medical text with iKnow.
iKnow is a really powerful and useful tool, and I’ve found great value in the product concept.

As discussed above, I am eagerly looking forward to a “negation assignment” function (identification of the word modified by a negation), which would make iKnow more powerful.
Negations seem to be more common in medical text than in other domains (e.g. “NO fever”, “Pneumonia pattern was NOT observed on CT imaging”), and they are critical for correctly grasping the disease states of patients.

As introduced above, I have implemented a preliminary negation assignment algorithm using "iknowpy", as noted in the image attached.

This is still at a hypothetical level and is a very simple algorithm, but it superficially works well in my situation at this time.
Of course, actual negation assignment is technically very complicated, and in general there may be many cases for which my algorithm is unsuitable. But at least in Japanese medical text, most negations seem to fall into the following cases, where my algorithm can work.

Concept - Non-relevant - Concept (e.g. 発熱 - は - 無かった。: Fever was not observed.)
Concept - Concept (e.g. 発熱 - ありません。: Fever unobserved. In Japanese medical text, the non-relevant word between the concept and the negation, namely a postpositional particle in Japanese grammar, is often omitted.)
just one Concept (e.g. 発熱なし: No fever.)
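
To make the three cases above concrete, here is a minimal sketch of how the negated entity could be located once the negating entity is known. It is not the algorithm from the attached image (which is not reproduced in this thread); the `(type, text)` tuples and the `negated_target` helper are assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical (type, text) tuples in sentence
# order. neg_index is the position of the entity carrying the negation.
def negated_target(entities, neg_index):
    """Return the index of the entity that the negation applies to."""
    prev = entities[neg_index - 1] if neg_index >= 1 else None
    prev2 = entities[neg_index - 2] if neg_index >= 2 else None
    # Case 1: Concept - Non-relevant - Concept
    if prev and prev2 and prev[0] == "NonRelevant" and prev2[0] == "Concept":
        return neg_index - 2
    # Case 2: Concept - Concept (particle omitted)
    if prev and prev[0] == "Concept":
        return neg_index - 1
    # Case 3: a single Concept that already contains the negation
    return neg_index


case1 = [("Concept", "発熱"), ("NonRelevant", "は"), ("Concept", "無かった")]
case2 = [("Concept", "発熱"), ("Concept", "ありません")]
case3 = [("Concept", "発熱なし")]
print(negated_target(case1, 2), negated_target(case2, 1), negated_target(case3, 0))
# 0 0 0
```

In all three examples the target is the entity holding 発熱, which matches the intended reading "no fever".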

Based on the preliminary results, proposal #1 from @makorin0315, “3 entities (C-R-C) w/ Negation Marker”, seems able to cover the above-mentioned cases and would be acceptable for my situation.
Meanwhile, in other domains or in long sentences, there may be many cases that don’t fit the above logic.
For these cases, the “negation span” in proposal #2 will be useful and seems preferable for generalization.

I'm looking forward to an implementation; please let me know if you need help with validation or discussion.

makorin0315 · 2020-09-11T16:34:15Z

Thank you @Rei-hub for your comment & response. The team has decided that the second approach, i.e., attribute expansion by enabling use of the CRC Path, would be doable and better. For this approach, we first require some code changes by our developer, after which I can start making the linguistic updates. It may take some time, but we will keep you posted on our progress.

Rei-hub · 2020-09-14T01:51:07Z

Thank you @makorin0315 for your quick response; that is really good news for me and all users. The second approach sounds better and reasonable. It will be a major update and will need a lot of time, but I look forward to the release. I would be happy to help with anything I can. Thank you.

makorin0315 · 2021-04-27T18:18:38Z

Un-doing Close until unit test issue is resolved and fix is validated.

makorin0315 · 2021-04-28T17:47:39Z

With 2d9670b in place, it's now been confirmed that the issue has been resolved in the latest master branch (iknowpy 1.0.12).

@Rei-hub - the requested simple Negation expansion is now available for your evaluation.

Rei-hub · 2021-04-28T23:47:08Z

@makorin0315 and all
I'm really happy to hear the good news, and I appreciate your prompt response.
I'm now analyzing daily progress notes in electronic medical records for an upcoming conference in June,
so I will try and evaluate Negation expansion right away, and will report the situation and preliminary results to you soon.

makorin0315 self-assigned this Sep 2, 2020

makorin0315 assigned JosDenysGitHub Sep 8, 2020

makorin0315 changed the title ~~Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & it's conjugated forms)~~ Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & its conjugated forms) Oct 27, 2020

makorin0315 added the enhancement label Nov 6, 2020

makorin0315 added a commit that referenced this issue Apr 16, 2021

Additional linguistic update to resolve #33

ff84ea7

makorin0315 added a commit that referenced this issue Apr 21, 2021

Update to resolve #33 with new reference materials.

aca8f27

ISC-SDE pushed a commit that referenced this issue Apr 22, 2021

Emit Japanese PR entities and negation spans (#33)

83590a9

The language model now outputs PathRelevant entities and simple spans for Negation attributes.

ISC-SDE pushed a commit that referenced this issue Apr 22, 2021

Update Japanese language model to resolve #33

2217117

ISC-SDE pushed a commit that referenced this issue Apr 22, 2021

Update reference_materials to close #33

a455dc8

ISC-SDE pushed a commit that referenced this issue Apr 27, 2021

Emit Entity Vectors + Paths in Japanese (#33)

6d29cc7

Entity Vectors are emitted as sentence attributes. Paths are supported like in other languages.

ISC-SDE pushed a commit that referenced this issue Apr 27, 2021

Fix compilation problems for paths on Linux (#33)

a0c29e7

Paths with attributes are now emitted for Japanese, but there were compilation problems with Japanese paths on Linux.

ISC-SDE pushed a commit that referenced this issue Apr 27, 2021

Enable PathRelevants in Japanese (issue #33)

4f227a9

PathConstruction is changed to PR in metadata.csv.

ISC-SDE added a commit that referenced this issue Apr 27, 2021

Add genXML.py + style sheet to this branch (#33)

eaf3f94

GenXML.py generates XML output for language model development. It uses iKnowXML.xsl for visualisation.

ISC-SDE added a commit that referenced this issue Apr 27, 2021

Update genXML.py for Japanese (#33)

ef61119

The updated script emits both Entity Vectors and Paths for Japanese.

ISC-SDE pushed a commit that referenced this issue Apr 27, 2021

Emit Japanese PR entities and negation spans (#33)

7c1d29a

The language model now outputs PathRelevant entities and simple spans for Negation attributes.

ISC-SDE closed this as completed in 880a1a1 Apr 27, 2021

ISC-SDE pushed a commit that referenced this issue Apr 27, 2021

Update reference_materials to close #33

15be6b2

Update of ref_testing.py to comply with new output for Japanese. Update of raw output.

ISC-SDE added a commit that referenced this issue Apr 27, 2021

Fix #33 and fix #104

cfbbf78

Version 1.0.12 Japanese-specific changes See issues for more information.

makorin0315 reopened this Apr 27, 2021

makorin0315 closed this as completed Apr 28, 2021

Rei-hub mentioned this issue May 26, 2021

Japanese: Request for some improvements of entity extraction algorithm in terms of more accurate analysis of medical colloquial text #137

Closed

makorin0315 mentioned this issue May 27, 2021

Japanese medical (2 of 3): specific disambiguation for phrase はなし if followed by a space #140

Closed

makorin0315 mentioned this issue Feb 17, 2022

Japanese: extension of simple Negation expansion #221

Open
