creating transformer sentence tagging component #337
Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE)
python/TransformerTagging/plugin-files/descriptor/descriptor.json
line 26 at r1 (raw file):
"description": "Comma-separated list of property names indicating which properties in the feed-forward track or detection to consider translating. If the first property listed is present, then that property will be translated. If it's not, then the next property in the list is considered. At most, one property will be translated.",
"type": "STRING",
"defaultValue": "TEXT,TRANSCRIPT"
Add TRANSLATION.
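The selection rule in that description can be sketched in Python as below. This is a minimal sketch, not the component's code: it assumes the feed-forward properties are available as a plain dict, the helper name select_prop_to_process is hypothetical, and placing TRANSLATION at the end of the default list is an assumption about where it should go.

```python
from typing import Mapping, Optional, Tuple

# Assumed default order; TRANSLATION is appended per the review comment.
DEFAULT_PROPS_TO_PROCESS = "TEXT,TRANSCRIPT,TRANSLATION"


def select_prop_to_process(
    detection_props: Mapping[str, str],
    props_to_process: str = DEFAULT_PROPS_TO_PROCESS,
) -> Optional[Tuple[str, str]]:
    """Return (name, value) for the first listed property that is present.

    Only one property is ever selected, mirroring the descriptor text.
    Returns None when none of the listed properties are present.
    """
    for name in props_to_process.split(","):
        name = name.strip()
        if name in detection_props:
            return name, detection_props[name]
    return None
```

With both TRANSCRIPT and TRANSLATION present, the sketch picks TRANSCRIPT because it appears earlier in the list; list order, not property content, decides which one is processed.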
python/TransformerTagging/tests/test_transformer_tagging.py
line 198 at r1 (raw file):
result = comp.get_detections_from_image(job)
self.assertEqual(1, len(result))
Let's assert the standard things like:
self.assertEqual(SHORT_SAMPLE_TAGS, props["TAGS"])
self.assertEqual(SHORT_SAMPLE_TRIGGER_SENTENCES, props["TEXT TRAVEL TRIGGER SENTENCES"])
self.assertEqual(SHORT_SAMPLE_OFFSET, props["TEXT TRAVEL TRIGGER SENTENCES OFFSET"])
self.assertEqual(SHORT_SAMPLE_SCORE, props["TEXT TRAVEL TRIGGER SENTENCES SCORE"])
But, minimally, use a different TAG than what's used in the default corpus.
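The property names above follow a visible pattern, so a test for a custom tag could derive them instead of hard-coding TRAVEL. A minimal sketch, assuming the "TEXT <TAG> TRIGGER SENTENCES" pattern (plus OFFSET and SCORE suffixes) holds for every tag and that TEXT names the processed property; trigger_prop_keys and CUSTOM_TAG are hypothetical names.

```python
from typing import NamedTuple


class TriggerKeys(NamedTuple):
    """Output property names associated with one tag."""
    sentences: str
    offset: str
    score: str


def trigger_prop_keys(tag: str, prop_name: str = "TEXT") -> TriggerKeys:
    """Build the trigger-sentence property names for a tag.

    Assumes the "<PROP> <TAG> TRIGGER SENTENCES" naming seen in the
    review (e.g. "TEXT TRAVEL TRIGGER SENTENCES") applies to all tags.
    """
    base = f"{prop_name} {tag.upper()} TRIGGER SENTENCES"
    return TriggerKeys(sentences=base, offset=f"{base} OFFSET", score=f"{base} SCORE")


# Hypothetical use inside a unittest.TestCase method, with a tag
# (CUSTOM_TAG) that does not appear in the default corpus:
#   keys = trigger_prop_keys(CUSTOM_TAG)
#   self.assertEqual(SHORT_SAMPLE_TRIGGER_SENTENCES, props[keys.sentences])
#   self.assertEqual(SHORT_SAMPLE_OFFSET, props[keys.offset])
#   self.assertEqual(SHORT_SAMPLE_SCORE, props[keys.score])
```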
python/TransformerTagging/tests/config/transformer_text_tags_corpus.json
line 1 at r1 (raw file):
[
I don't think this file is necessary. The version in transformer_tagging_component/ should be enough.
python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py
line 119 at r1 (raw file):
properties=job_props, key='FEED_FORWARD_PROP_TO_PROCESS', default_value='TEXT,TRANSCRIPT',
Add TRANSLATION.
python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py
line 161 at r1 (raw file):
return
input_sentences = sent_tokenize(input_text)
How does this handle this kind of text?
This is a sentence
of a good dog.
Is that parsed out as one or two sentences?
Initially, I was thinking that we'd want that to be one sentence because that's how we handle single newlines for breaking up large text for Azure translation, but it may actually be more beneficial for sentence matching to err on the side of smaller rather than larger sentences. It needs to handle partial phrases anyway.
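If the component does end up erring toward smaller sentences, one option is to post-process the tokenizer output and split each sentence at newlines. A minimal sketch under that assumption; split_at_newlines is a hypothetical helper, and its input would be the list returned by NLTK's sent_tokenize.

```python
from typing import Iterable, List


def split_at_newlines(sentences: Iterable[str]) -> List[str]:
    """Split each tokenizer sentence at newlines, dropping blank pieces.

    Hypothetical post-processing step; the input would typically be
    the list returned by nltk.tokenize.sent_tokenize(input_text).
    """
    pieces: List[str] = []
    for sentence in sentences:
        for part in sentence.splitlines():
            part = part.strip()
            if part:
                pieces.append(part)
    return pieces
```

For the example above, the sketch turns the single sentence "This is a sentence\nof a good dog." into "This is a sentence" and "of a good dog.". The trade-off is more partial phrases, which, as noted, the matching has to tolerate anyway.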
python/TransformerTagging/transformer_tagging_component/transformer_text_tags_corpus.json
line 55 at r1 (raw file):
},
{
"text": "This sentence is financ.",
Check spelling: "financ"
Make same change to custom corpus.
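python/TransformerTagging/sample_transformer_tagger.py
line 35 at r1:
We've moved away from using "sample" files like this. It's no longer required since we can do stand-alone testing using the CLI Runner (once the component is Dockerized). If a dev wants to try something out they can modify a unit test.
python/TransformerTagging/tests/data/multiple_tags.txt
line 1 at r1:
This is unused.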
Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE)
python/TransformerTagging/sample_transformer_tagger.py
line 35 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
We've moved away from using "sample" files like this. It's no longer required since we can do stand-alone testing using the CLI Runner (once the component is Dockerized). If a dev wants to try something out they can modify a unit test.
I removed this in my commit.
python/TransformerTagging/tests/config/transformer_text_tags_corpus.json
line 1 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
I don't think this file is necessary. The version in transformer_tagging_component/ should be enough.
I removed this in my commit.
Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE and @jrobble)
python/TransformerTagging/plugin-files/descriptor/descriptor.json
line 26 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Add TRANSLATION.
added
python/TransformerTagging/tests/test_transformer_tagging.py
line 198 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Let's assert the standard things like:
self.assertEqual(SHORT_SAMPLE_TAGS, props["TAGS"])
self.assertEqual(SHORT_SAMPLE_TRIGGER_SENTENCES, props["TEXT TRAVEL TRIGGER SENTENCES"])
self.assertEqual(SHORT_SAMPLE_OFFSET, props["TEXT TRAVEL TRIGGER SENTENCES OFFSET"])
self.assertEqual(SHORT_SAMPLE_SCORE, props["TEXT TRAVEL TRIGGER SENTENCES SCORE"])
But, minimally, use a different TAG than what's used in the default corpus.
changed the custom corpus to have different tags and added checks for trigger sentences, offset, and score.
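Expected offset values for such checks have to be computed against the input text. A minimal sketch, assuming offsets are reported as inclusive "start-end" character ranges and that trigger sentences are listed in the order they occur; neither the format nor the helper name sentence_offsets is confirmed by this review.

```python
from typing import List


def sentence_offsets(text: str, sentences: List[str]) -> List[str]:
    """Return inclusive "start-end" character offsets for each sentence.

    Sentences are searched in order, so a repeated sentence maps to its
    next occurrence rather than always to the first one.
    """
    offsets: List[str] = []
    search_from = 0
    for sentence in sentences:
        start = text.find(sentence, search_from)
        if start < 0:
            raise ValueError(f"sentence not found in text: {sentence!r}")
        end = start + len(sentence) - 1  # inclusive end index
        offsets.append(f"{start}-{end}")
        search_from = end + 1
    return offsets
```

Searching from the end of the previous match keeps duplicate sentences from all reporting the same offset, which is a likely source of flaky expectations in tests.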
python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py
line 119 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Add TRANSLATION.
added
python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py
line 161 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
How does this handle this kind of text?
This is a sentence
of a good dog.
Is that parsed out as one or two sentences?
Initially, I was thinking that we'd want that to be one sentence because that's how we handle single newlines for breaking up large text for Azure translation, but it may actually be more beneficial for sentence matching to err on the side of smaller rather than larger sentences. It needs to handle partial phrases anyway.
NLTK treats it as one sentence.
python/TransformerTagging/transformer_tagging_component/transformer_text_tags_corpus.json
line 55 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Check spelling: "financ"
Make same change to custom corpus.
The original regex was financ to match both finance and financial. I've changed the spelling to finance and added an entry for financial.
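The difference between the two uses of "financ" is easy to see in Python. As a regex, the stem matches every word with that prefix, including ones that were probably never intended. A corpus entry, by contrast, is presumably compared as ordinary sentence text by the transformer model (an assumption), so a truncated word acts as a misspelling rather than a wildcard. A minimal sketch; the pattern and the helper name stem_matches are illustrative, not taken from the component.

```python
import re
from typing import Iterable, List

# Illustrative regex: the stem followed by any word characters.
FINANC_STEM = re.compile(r"financ\w*", re.IGNORECASE)


def stem_matches(words: Iterable[str]) -> List[str]:
    """Return the words that the "financ" stem pattern fully matches."""
    return [word for word in words if FINANC_STEM.fullmatch(word)]
```

Given ["finance", "financial", "financier", "fine"], the sketch keeps the first three, so listing finance and financial as separate corpus entries names the intended words explicitly instead of relying on the stem.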
python/TransformerTagging/tests/data/multiple_tags.txt
line 1 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
This is unused.
I was planning to add a test for multiple tags, but the custom confidence test already returns multiple tags.