Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & its conjugated forms) #33

Closed
makorin0315 opened this issue Sep 2, 2020 · 13 comments
Assignees
Labels
enhancement New feature or request

Comments

@makorin0315
Collaborator

NOTE: this is a suggestion/request that came from Dr. Rei Noguchi @ Gunma University Hospital.

BACKGROUND

In iKnow, Negation expansion is normally done using the Path, which for non-Japanese language is the word order in the Sentence. Since we developed Entity Vector as a special-case Path for Japanese, the order of entities within the Path is mostly different from how they appear within the Sentence. For this reason, we have not yet implemented Negation expansion beyond the boundaries of the entity that includes the Negation marker.

For example:

今週はレッスンはない - There is no lesson this week.
Entity Vector - レッスン ない 今週
The two は particles are NonRelevant.

Because of the sentence structure, the word ない, which is the present form of the Adjectival Verb meaning "doesn't exist" and a Negation marker, does not expand beyond itself. This is a problem, since it's not possible to know "what" is being negated without reading the entire sentence.

SIMPLE EXPANSION EXPERIMENT

Dr. Noguchi used the current iKnow Python interface to experiment with his medical data, which often uses simple sentence structures that closely resemble the format: XXX は (or が) ない (or なかった, the past form of the same Adjectival Verb, meaning "didn't exist").

  • XXXはない。
  • XXXがない。
  • XXXはなかった。
  • XXXがなかった。

EXPERIMENT:
In cases like the above, expand Negation to the left, to the Concept before the particle は or が, which in the examples above is "XXX".

In addition, there are some sentences where XXX is replaced by "XXX1やXXX2", meaning "XXX1 and/or XXX2". In such cases, expand Negation to the left, all the way to the Concept before the particle や, i.e., "XXX1" (the first Concept).

His experiment suggested that, at least for his data, such an expansion is normally semantically correct and would give more meaningful results for his machine learning work, since it is clearer what exists and what doesn't exist. (For example: "There was no fever" vs. "Patient had fever".)
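
The expansion rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this issue, not Dr. Noguchi's actual code and not the iKnow engine's logic. The `(entity_type, text)` tuple layout, the constant names and the function `expand_negation_left` are all hypothetical.

```python
# Hypothetical input: one sentence as a list of (entity_type, text) tuples
# in sentence order. This layout is an assumption for illustration only.
NEGATION_WORDS = {"ない", "なかった"}   # the two forms named in this issue
SUBJECT_PARTICLES = {"は", "が"}
LIST_PARTICLE = "や"


def expand_negation_left(entities):
    """Return the indices covered by the expanded Negation, or [] if no match."""
    for i, (_etype, text) in enumerate(entities):
        if text not in NEGATION_WORDS or i < 2:
            continue
        particle_text = entities[i - 1][1]
        concept_type = entities[i - 2][0]
        # Require the simple shape: Concept, は/が, negation word.
        if particle_text not in SUBJECT_PARTICLES or concept_type != "Concept":
            continue
        start = i - 2
        # Walk further left across "XXX1 や XXX2" chains.
        while (start >= 2
               and entities[start - 1][1] == LIST_PARTICLE
               and entities[start - 2][0] == "Concept"):
            start -= 2
        return list(range(start, i + 1))
    return []


simple = [("Concept", "発熱"), ("NonRelevant", "は"), ("Concept", "なかった")]
listed = [("Concept", "咳"), ("NonRelevant", "や"), ("Concept", "発熱"),
          ("NonRelevant", "は"), ("Concept", "ない")]
print(expand_negation_left(simple))   # [0, 1, 2]
print(expand_negation_left(listed))   # [0, 1, 2, 3, 4]
```

Anything that does not match the simple shape is left untouched, which mirrors the "start small" intent of this suggestion.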

INITIAL DISCUSSION

  • This approach only works when the sentence structure is as simple as above (in clinical or medical text). In more complex sentences, it's possible that XXX is part of a subordinate clause, in which case it would be more desirable to expand even further to the left.
  • However, we have heard from various customers through the years that it would be desirable to see the "link" between the Adjectival Verb and what is being modified. This is one such example. One idea was to enable Path (i.e., CRC-like Path) instead of Entity Vector and then make は and が PathRelevant, but it's not clear how much language model work is involved after such a code change.
  • Better Negation expansion has been a longstanding task for Japanese. It may be a good idea to start small (such as in this suggestion), and improve further as we go.

TECHNICAL APPROACHES

There are two different ways Negation expansion can be implemented.

  1. No change in Path mechanism, i.e., use Entity Vector
    • No technical work involved
    • In the language model, add Negation marker to XXX and particle, since NegStop/NegBegin will not do anything.
    • This approach is not really creating a span but rather 3 separate entities (Concept, NonRelevant, Concept) with Negation Marker. => Is this acceptable for Dr. Noguchi? If so, is it a good approach in the long run? If not, we may need to make the entire thing a Concept. Is that acceptable...?
  2. Add ability to select EV vs. CRC Path
    • technical work is involved
    • In the language model, NegStop/NegBegin can be used, thus creating "real" span.
    • This was initially suggested back in December, when we observed that certain types of medical/clinical notes use more straightforward (CRC-like) sentence structure.
    • It may be a problem if a user wants Negation expansion but also wants to use EV...
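
To make the difference between the two approaches concrete, here is a rough sketch of the two shapes of output. It is not iKnow's internal representation: the `Entity` class, the field names and both helper functions are assumptions made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class Entity:
    # Hypothetical entity record, not an iKnow type.
    text: str
    etype: str
    negated: bool = False


def mark_each_entity(entities, indices):
    """Approach 1: every covered entity carries its own Negation marker.
    No span object exists; consumers see separately marked entities."""
    covered = set(indices)
    return [replace(e, negated=True) if i in covered else e
            for i, e in enumerate(entities)]


def make_span(entities, start, stop):
    """Approach 2: one span record covering entities start..stop inclusive,
    which is roughly what NegBegin/NegStop would delimit on a Path."""
    return {
        "type": "negation",
        "start": start,
        "stop": stop,
        "text": "".join(e.text for e in entities[start:stop + 1]),
    }


sentence = [Entity("発熱", "Concept"),
            Entity("は", "NonRelevant"),
            Entity("なかった", "Concept")]

marked = mark_each_entity(sentence, [0, 1, 2])
span = make_span(sentence, 0, 2)
```

With the first shape a consumer has to regroup adjacent marked entities itself; with the second the grouping is explicit.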

The first approach is quicker, but may not be as useful longer-term. Any comment or additional consideration that I'm missing? @ISC-SDE @bdeboe @JosDenysGitHub @woodfinisc

@makorin0315 makorin0315 self-assigned this Sep 2, 2020
@JosDenysGitHub
Collaborator

In IRIS, Entity Vectors are emitted as sentence attributes (see the RAW data output), and no Path information is present. For simplicity, I changed that in standalone iKnow and used the Path output to emit Entity Vectors for Japanese,
since I thought they represented the same thing...
But internally they are separate, meaning we can emit both Path data and Entity Vector data, if that helps.
The current Path construction does not use the previous CRC mechanism; it simply collects all entities except NonRelevants (after the introduction of the PathRelevant type). That means one Path per sentence, whereas the CRC mechanism can result in multiple Paths.
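
As a rough illustration of the Path construction described above (collect everything except NonRelevants, one Path per sentence), a minimal sketch could look like this. It is not the engine code; the `(entity_type, text)` tuples and the function name `build_path` are assumptions.

```python
def build_path(entities):
    """Collect every entity except NonRelevant ones, in sentence order.

    `entities` is a hypothetical list of (entity_type, text) tuples.
    PathRelevant entities are kept, so a particle typed PathRelevant
    stays on the Path while a NonRelevant one is dropped.
    """
    return [text for etype, text in entities if etype != "NonRelevant"]


# The same sentence with the particle typed two different ways.
as_nonrelevant = [("Concept", "発熱"), ("NonRelevant", "は"), ("Concept", "ない")]
as_pathrelevant = [("Concept", "発熱"), ("PathRelevant", "は"), ("Concept", "ない")]

print(build_path(as_nonrelevant))    # ['発熱', 'ない']
print(build_path(as_pathrelevant))   # ['発熱', 'は', 'ない']
```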

It would not be that hard to generate Path data, and the corresponding Path expansion mechanism, together with Entity Vectors. The latter would become sentence attributes; the former would replace the current EVs.

This would mean an incompatible API change for Japanese of course.

@JosDenysGitHub

@makorin0315
Collaborator Author

Thanks, @JosDenysGitHub. That was my next question, i.e., is it possible to emit both EV and Path data, so it's great to hear that it is possible. EVs can still be used to calculate Proximity, correct?

@JosDenysGitHub
Collaborator

EVs are the basis for calculating Proximity. I guess this should not change?

@makorin0315
Collaborator Author

Correct. Proximity calculation should stay as is.

@bdeboe
Member

bdeboe commented Sep 3, 2020

Getting Japanese back in line with the other languages, returning "regular" Paths and emitting EVs through a separate mechanism, sounds like the desirable long-term thing to do. Neither the standalone engine nor the IRIS integration itself would be much impacted, but applications built on top of IRIS that expect EVs from the PathAPI will get something different until they adapt to the new channel for EVs.

@JosDenysGitHub : how does the "sentence attribute" representation of an EV look?

@makorin0315
Collaborator Author

In the RAW output, the EV looks like:
<attr type="entity_vector" "レッスン" "ない" "今週">
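
For anyone who wants to read EVs from that representation, a minimal parsing sketch follows. It assumes the RAW line always has exactly the shape shown above (an `attr` tag with `type="entity_vector"` followed by quoted entity texts); that is inferred from this single example only, and the function name `parse_entity_vector` is hypothetical.

```python
import re

# Matches the one line format shown in this comment; other RAW attribute
# layouts are not handled (assumption based on a single example).
_EV_LINE = re.compile(r'<attr type="entity_vector"((?:\s+"[^"]*")*)\s*>')


def parse_entity_vector(raw_line):
    """Return the entity texts of an entity_vector attr line, or None."""
    match = _EV_LINE.match(raw_line.strip())
    if match is None:
        return None
    return re.findall(r'"([^"]*)"', match.group(1))


print(parse_entity_vector('<attr type="entity_vector" "レッスン" "ない" "今週">'))
# ['レッスン', 'ない', '今週']
```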

@makorin0315
Collaborator Author

Adding @JosDenysGitHub as an assignee for required engine change - to be worked on after higher priority issues.

@Rei-hub

Rei-hub commented Sep 11, 2020

Hi @makorin0315, I’m really sorry for the late posting, and appreciate your kind support.

I’m Rei Noguchi at Gunma University Hospital, and I research the analysis of mainly medical text with iKnow.
iKnow is a really powerful and useful tool, and I’ve found great value in the product concept.

As discussed above, I am strongly hoping for a “negation assignment” function (identification of the word modified by a negation), which would make iKnow more powerful.
Negations seem to be more common in medical text than in other domains (e.g. “NO fever”, “Pneumonia pattern was NOT observed on CT imaging”), and they are critical for correctly grasping the disease states of patients.

As introduced above, I have implemented a preliminary negation assignment algorithm using "iknowpy", as noted in the attached image.
image

This is still at a hypothetical level and is a very simple algorithm, but on the surface it works well in my situation at this time.
Of course, technically, actual negation assignment is really complicated and there may be many cases in general where my algorithm is unsuitable, but at least in Japanese medical text, most negations seem to fall into the following cases, where my algorithm can work.

  1. Concept - Non-relevant - Concept (e.g., 発熱 - は - 無かった。: Fever was not observed.)
  2. Concept - Concept (e.g., 発熱 - ありません。: Fever unobserved. In Japanese medical text, the non-relevant word between the concept and the negation, namely the postpositional particle in Japanese grammar, is often omitted.)
  3. Just one Concept (e.g., 発熱なし: No fever.)
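
The three cases above could be distinguished along the following lines. This is only one reading of the list, not the algorithm from the attached image (which is not reproduced here). The type strings, the function name `match_negation_case` and the idea of passing in the index of the entity that carries the negation are all assumptions.

```python
def match_negation_case(types, neg_index):
    """Classify a negation by the entity types to its left.

    `types` is a hypothetical list of entity type strings in sentence order;
    `neg_index` is the index of the entity that carries the negation.
    Returns (case_number, index_of_negated_entity), or None if the entity
    at neg_index is not a Concept.
    """
    if types[neg_index] != "Concept":
        return None
    # Case 1: Concept - Non-relevant - Concept
    if (neg_index >= 2
            and types[neg_index - 1] == "NonRelevant"
            and types[neg_index - 2] == "Concept"):
        return 1, neg_index - 2
    # Case 2: Concept - Concept (particle omitted)
    if neg_index >= 1 and types[neg_index - 1] == "Concept":
        return 2, neg_index - 1
    # Case 3: the negation sits inside a single Concept
    return 3, neg_index


print(match_negation_case(["Concept", "NonRelevant", "Concept"], 2))  # (1, 0)
print(match_negation_case(["Concept", "Concept"], 1))                 # (2, 0)
print(match_negation_case(["Concept"], 0))                            # (3, 0)
```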

Based on the preliminary results, approach 1 proposed by @makorin0315, “3 entities (C-R-C) w/ Negation Marker”, seems able to cover the above-mentioned cases and is acceptable for my situation.
Meanwhile, for other domains or in long sentences, there may be many cases that don’t fit the above logic.
For these cases, the “negation span” of approach 2 will be useful and seems preferable for generalization.

I'm looking forward to an implementation, and please let me know if you need help with validation or discussion.

@makorin0315
Collaborator Author

Thank you @Rei-hub for your comment & response. The team has decided that the second approach, i.e., attribute expansion by enabling use of the CRC Path, would be doable and better. For this approach, we first require some code changes by our developer, after which I can start making the linguistic updates. It may take some time, but we will keep you posted on our progress.

@Rei-hub

Rei-hub commented Sep 14, 2020

Thank you @makorin0315 for your quick response; that is really good news for me and all users. The second approach sounds better and reasonable. It will be a major update and will need a lot of time, but I look forward to the release. I would be happy to help with anything I can. Thank you.

@makorin0315 makorin0315 changed the title from "Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & it's conjugated forms)" to "Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & its conjugated forms)" Oct 27, 2020
@makorin0315 makorin0315 added the enhancement New feature or request label Nov 6, 2020
ISC-SDE pushed a commit that referenced this issue Apr 22, 2021
The language model now outputs PathRelevant entities
and simple spans for Negation attributes.
ISC-SDE pushed a commit that referenced this issue Apr 22, 2021
ISC-SDE pushed a commit that referenced this issue Apr 27, 2021
Entity Vectors are emitted as sentence attributes.
Paths are supported like in other languages.
ISC-SDE pushed a commit that referenced this issue Apr 27, 2021
Paths with attributes are now emitted for Japanese,
but there were compilation problems with Japanese paths on Linux.
ISC-SDE pushed a commit that referenced this issue Apr 27, 2021
PathConstruction is changed to PR in metadata.csv.
ISC-SDE added a commit that referenced this issue Apr 27, 2021
GenXML.py generates XML output for language model development.
It uses iKnowXML.xsl for visualisation.
ISC-SDE added a commit that referenced this issue Apr 27, 2021
The updated script emits both Entity Vectors and Paths for Japanese.
ISC-SDE pushed a commit that referenced this issue Apr 27, 2021
The language model now outputs PathRelevant entities
and simple spans for Negation attributes.
ISC-SDE pushed a commit that referenced this issue Apr 27, 2021
Update of ref_testing.py to comply with new output for Japanese.
Update of raw output.
ISC-SDE added a commit that referenced this issue Apr 27, 2021
Version 1.0.12
Japanese-specific changes
See issues for more information.
@makorin0315 makorin0315 reopened this Apr 27, 2021
@makorin0315
Collaborator Author

Un-doing Close until unit test issue is resolved and fix is validated.

@makorin0315
Collaborator Author

With 2d9670b in place, it's now been confirmed that the issue has been resolved in the latest master branch (iknowpy 1.0.12).

@Rei-hub - the requested simple Negation expansion is now available for your evaluation.

@Rei-hub

Rei-hub commented Apr 28, 2021

@makorin0315 and all
I'm really happy to hear the good news, and appreciate your prompt response.
I'm now analyzing daily progress notes in electronic medical records for an upcoming conference in June,
so I will try out and evaluate Negation expansion right away, and will report the situation and preliminary results to you soon.
