creating transformer sentence tagging component #337

jrobble · 2023-09-15T20:50:36Z

Issues:

Create a TransformerTagging component openmpf#1692

This change is

jrobble

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE)

python/TransformerTagging/plugin-files/descriptor/descriptor.json line 26 at r1 (raw file):

          "description": "Comma-separated list of property names indicating which properties in the feed-forward track or detection to consider translating. If the first property listed is present, then that property will be translated. If it's not, then the next property in the list is considered. At most, one property will be translated.",
          "type": "STRING",
          "defaultValue": "TEXT,TRANSCRIPT"

Add TRANSLATION.

python/TransformerTagging/tests/test_transformer_tagging.py line 198 at r1 (raw file):

        result = comp.get_detections_from_image(job)

        self.assertEqual(1, len(result))

Let's assert the standard things like:

self.assertEqual(SHORT_SAMPLE_TAGS, props["TAGS"])
self.assertEqual(SHORT_SAMPLE_TRIGGER_SENTENCES, props["TEXT TRAVEL TRIGGER SENTENCES"])
self.assertEqual(SHORT_SAMPLE_OFFSET, props["TEXT TRAVEL TRIGGER SENTENCES OFFSET"])
self.assertEqual(SHORT_SAMPLE_SCORE, props["TEXT TRAVEL TRIGGER SENTENCES SCORE"])

But, minimally, use a different TAG than what's used in the default corpus.

python/TransformerTagging/tests/config/transformer_text_tags_corpus.json line 1 at r1 (raw file):

I don't think this file is necessary. The version in transformer_tagging_component/ should be enough.

python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 119 at r1 (raw file):

                properties=job_props,
                key='FEED_FORWARD_PROP_TO_PROCESS',
                default_value='TEXT,TRANSCRIPT',

Add TRANSLATION.

python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 161 at r1 (raw file):

            return

        input_sentences = sent_tokenize(input_text)

How does this handle this kind of text?

This is a sentence
of a good dog.

Is that parsed out as one or two sentences?

Initially, I was thinking that we'd want that to be one sentence because that's how we handle single newlines for breaking up large text for Azure translation, but it may actually be more beneficial for sentence matching to error on the side of smaller rather than larger sentences. It needs to handle partial phrases anyway.

python/TransformerTagging/transformer_tagging_component/transformer_text_tags_corpus.json line 55 at r1 (raw file):

  },
  {
    "text": "This sentence is financ.",

Check spelling: "financ"

Make same change to custom corpus.

jrobble · 2023-09-16T02:05:36Z

python/TransformerTagging/sample_transformer_tagger.py line 35 at r1 (raw file):

def main():
    wrapper = TransformerWrapper({})
    detection_props = dict(TEXT="I also have a knife. I have a gun. I took a plane to Florida. I bought some cocaine. "

We've moved away from using "sample" files like this. It's no longer required since we can do stand-alone testing using the CLI Runner (once the component is Dockerized). If a dev wants to try something out they can modify a unit test.

jrobble · 2023-09-16T02:49:52Z

python/TransformerTagging/tests/data/multiple_tags.txt line 1 at r1 (raw file):

Vehicles include wagons, bicycles, motor vehicles (motorcycles, cars, trucks, buses, mobility scooters for disabled people), railed vehicles (trains, trams), watercraft (ships, boats, underwater vehicles), amphibious vehicles (screw-propelled vehicles, hovercraft), aircraft (airplanes, helicopters, aerostats) and spacecraft. "An automated teller machine (ATM) is an electronic telecommunications device that enables customers of financial institutions to perform financial transactions, such as cash withdrawals, deposits, funds transfers, balance inquiries or account information inquiries, at any time and without the need for direct interaction with bank staff. The advent of widespread text-messaging has resulted in the cell phone novel, the first literary genre to emerge from the cellular age, via text messaging to a website that collects the novels as a whole. The sword developed from the knife or dagger. A passport holder is normally entitled to enter the country that issued the passport, though some people entitled to a passport may not be full citizens with right of abode (e.g. American nationals or British nationals). "A number of hotels and motels have entered the public consciousness through popular culture.

This is unused.

jrobble

Reviewable status: 0 of 12 files reviewed, 6 unresolved discussions (waiting on @hhuangMITRE)

python/TransformerTagging/sample_transformer_tagger.py line 35 at r1 (raw file):