Skip to content

Conversation

jrobble
Copy link
Member

@jrobble jrobble commented Sep 15, 2023

Copy link
Member Author

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE)


python/TransformerTagging/plugin-files/descriptor/descriptor.json line 26 at r1 (raw file):

          "description": "Comma-separated list of property names indicating which properties in the feed-forward track or detection to consider translating. If the first property listed is present, then that property will be translated. If it's not, then the next property in the list is considered. At most, one property will be translated.",
          "type": "STRING",
          "defaultValue": "TEXT,TRANSCRIPT"

Add TRANSLATION.


python/TransformerTagging/tests/test_transformer_tagging.py line 198 at r1 (raw file):

        result = comp.get_detections_from_image(job)

        self.assertEqual(1, len(result))

Let's assert the standard things like:

self.assertEqual(SHORT_SAMPLE_TAGS, props["TAGS"])
self.assertEqual(SHORT_SAMPLE_TRIGGER_SENTENCES, props["TEXT TRAVEL TRIGGER SENTENCES"])
self.assertEqual(SHORT_SAMPLE_OFFSET, props["TEXT TRAVEL TRIGGER SENTENCES OFFSET"])
self.assertEqual(SHORT_SAMPLE_SCORE, props["TEXT TRAVEL TRIGGER SENTENCES SCORE"])

But, minimally, use a different TAG than what's used in the default corpus.


python/TransformerTagging/tests/config/transformer_text_tags_corpus.json line 1 at r1 (raw file):

[

I don't think this file is necessary. The version in transformer_tagging_component/ should be enough.


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 119 at r1 (raw file):

                properties=job_props,
                key='FEED_FORWARD_PROP_TO_PROCESS',
                default_value='TEXT,TRANSCRIPT',

Add TRANSLATION.


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 161 at r1 (raw file):

            return

        input_sentences = sent_tokenize(input_text)

How does this handle this kind of text?

This is a sentence
of a good dog.

Is that parsed out as one or two sentences?

Initially, I was thinking that we'd want that to be one sentence because that's how we handle single newlines for breaking up large text for Azure translation, but it may actually be more beneficial for sentence matching to error on the side of smaller rather than larger sentences. It needs to handle partial phrases anyway.


python/TransformerTagging/transformer_tagging_component/transformer_text_tags_corpus.json line 55 at r1 (raw file):

  },
  {
    "text": "This sentence is financ.",

Check spelling: "financ"

Make same change to custom corpus.

@jrobble
Copy link
Member Author

jrobble commented Sep 16, 2023

python/TransformerTagging/sample_transformer_tagger.py line 35 at r1 (raw file):

def main():
    wrapper = TransformerWrapper({})
    detection_props = dict(TEXT="I also have a knife. I have a gun. I took a plane to Florida. I bought some cocaine. "

We've moved away from using "sample" files like this. It's no longer required since we can do stand-alone testing using the CLI Runner (once the component is Dockerized). If a dev wants to try something out they can modify a unit test.

@jrobble
Copy link
Member Author

jrobble commented Sep 16, 2023

python/TransformerTagging/tests/data/multiple_tags.txt line 1 at r1 (raw file):

Vehicles include wagons, bicycles, motor vehicles (motorcycles, cars, trucks, buses, mobility scooters for disabled people), railed vehicles (trains, trams), watercraft (ships, boats, underwater vehicles), amphibious vehicles (screw-propelled vehicles, hovercraft), aircraft (airplanes, helicopters, aerostats) and spacecraft. "An automated teller machine (ATM) is an electronic telecommunications device that enables customers of financial institutions to perform financial transactions, such as cash withdrawals, deposits, funds transfers, balance inquiries or account information inquiries, at any time and without the need for direct interaction with bank staff. The advent of widespread text-messaging has resulted in the cell phone novel, the first literary genre to emerge from the cellular age, via text messaging to a website that collects the novels as a whole. The sword developed from the knife or dagger. A passport holder is normally entitled to enter the country that issued the passport, though some people entitled to a passport may not be full citizens with right of abode (e.g. American nationals or British nationals). "A number of hotels and motels have entered the public consciousness through popular culture.

This is unused.

Copy link
Member Author

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE)


python/TransformerTagging/sample_transformer_tagger.py line 35 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

We've moved away from using "sample" files like this. It's no longer required since we can do stand-alone testing using the CLI Runner (once the component is Dockerized). If a dev wants to try something out they can modify a unit test.

I removed this in my commit.


python/TransformerTagging/tests/config/transformer_text_tags_corpus.json line 1 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

I don't think this file is necessary. The version in transformer_tagging_component/ should be enough.

I removed this in my commit.

Copy link
Contributor

@mcrensh mcrensh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE and @jrobble)


python/TransformerTagging/plugin-files/descriptor/descriptor.json line 26 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Add TRANSLATION.

added


python/TransformerTagging/tests/test_transformer_tagging.py line 198 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Let's assert the standard things like:

self.assertEqual(SHORT_SAMPLE_TAGS, props["TAGS"])
self.assertEqual(SHORT_SAMPLE_TRIGGER_SENTENCES, props["TEXT TRAVEL TRIGGER SENTENCES"])
self.assertEqual(SHORT_SAMPLE_OFFSET, props["TEXT TRAVEL TRIGGER SENTENCES OFFSET"])
self.assertEqual(SHORT_SAMPLE_SCORE, props["TEXT TRAVEL TRIGGER SENTENCES SCORE"])

But, minimally, use a different TAG than what's used in the default corpus.

changed the custom corpus to have different tags and added checks for trigger sentences, offset, and score.


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 119 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Add TRANSLATION.

added


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 161 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

How does this handle this kind of text?

This is a sentence
of a good dog.

Is that parsed out as one or two sentences?

Initially, I was thinking that we'd want that to be one sentence because that's how we handle single newlines for breaking up large text for Azure translation, but it may actually be more beneficial for sentence matching to error on the side of smaller rather than larger sentences. It needs to handle partial phrases anyway.

NLTK treats it as one sentence.


python/TransformerTagging/transformer_tagging_component/transformer_text_tags_corpus.json line 55 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Check spelling: "financ"

Make same change to custom corpus.

The original regex was financ to match both finance and financial. I've changed the spelling to finance and added an entry for financial.


python/TransformerTagging/tests/data/multiple_tags.txt line 1 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

This is unused.

was planning to add a test for multiple tags but the custom confidence test returns multiple tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants