Skip to content

Conversation

mcrensh
Copy link
Contributor

@mcrensh mcrensh commented Oct 6, 2022

@mcrensh mcrensh requested a review from hhuangMITRE October 6, 2022 18:36
@mcrensh mcrensh self-assigned this Oct 6, 2022
Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 11 files at r1.
Reviewable status: 9 of 11 files reviewed, 14 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/Dockerfile line 37 at r1 (raw file):

RUN argospm update
    

Could we add a comment here indicating that users can:

  • Download another supported model of interest by adding
    argospm install translate_<INPUT_ISO_LANG>_en.

  • Refer to README on currently supported languages.

I know the model can dynamically install additional languages, but there may be a case where we prefer to have them readily available.

Let's consider adding translate-zh_en (Chinese) to the Dockerfile as well, unless it takes up too much space.


python/ArgosTranslation/README.md line 13 at r1 (raw file):

- `FEED_FORWARD_PROP_TO_PROCESS`: Controls which properties of the feed-forward track or detection are considered for translation. This should be a comma-separated list of property names (default: `"TEXT,TRANSCRIPT"`). The first named property in the list that is present is translated. At most, one property will be translated.

- `LANGUAGE_FEED_FORWARD_PROP`: As above, a comma-separated list of property names (default: `"DECODED_LANGUAGE,LANGUAGE"`), which indicate which property of the feed-forward track or detection may be used to determine the source language of the text to be translated. This language is expected to be provided as an ISO 639-1 language code.

Can we also add support for ISO-639-2 as well?

If you need a reference, please check here for many of the ISO mappings.


python/ArgosTranslation/README.md line 27 at r1 (raw file):

Note: Argos underperforms when translating to and from Chinese

| ISO | Language         |

Hope it's not too much trouble but it would be nice to have ISO-639-2 supported also.

We've standardized the Tika component to produce ISO-639-2 outputs for another prototype component. I was also asked by Jeff to include that info in my component README as well.


python/ArgosTranslation/README.md line 31 at r1 (raw file):

| `ar` | Arabic           |
| `az` | Azerbaijani      |
| `zh` | Chinese          |

Quick question: Does it offer Chinese Simplified vs. Traditional or both?

Update: I checked the docs, only one Chinese option is offered so we can test for that later.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 46 at r1 (raw file):

    def get_detections_from_video(job: mpf.VideoJob) -> Sequence[mpf.VideoTrack]:
        logger.info(f'Received video job')

The following code block:

        try:
            if job.feed_forward_track is None:
                raise mpf.DetectionError.UNSUPPORTED_DATA_TYPE.exception(
                    f'Component can only process feed forward '
                    ' jobs, but no feed forward track provided. ')

            tw = TranslationWrapper(job.job_properties)
            tw.add_translations(job.feed_forward_track.detection_properties)

            for ff_location in job.feed_forward_track.frame_locations.values():
                tw.add_translations(ff_location.detection_properties)
            return [job.feed_forward_track]

        except Exception:
            logger.exception(
                f'Failed to complete job due to the following exception:')
            raise

Is very similar in 3 of the four get_detections, with a larger variation for the generic version.

I recommend creating a helper function to further reduce the amount of repeated code, with the input to the function being:

from_video: job.feed_forward_track and job.feed_forward_track.frame_locations.values()
from_image: job.feed_forward_location and [job.feed_forward_location]
from_audio: job.feed_forward_track and [job.feed_forward_track]

That way you can reuse the helper function from_video for the other two media types. Let me know if this is not clear. I used something similar to reduce the amount of code in TikaTextDetection after I was staring at it for a bit.

I feel that the generic could also be refactored but it does operate slightly differently from the other three so it's fine to keep that one unique.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 69 at r1 (raw file):

    def get_detections_from_image(job: mpf.ImageJob) -> Sequence[mpf.ImageLocation]:
        logger.info(f'Received image job')

Consider refactoring with the above comment.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 89 at r1 (raw file):

    def get_detections_from_audio(job: mpf.AudioJob) -> Sequence[mpf.AudioTrack]:
        logger.info(f'Received audio job')

Consider refactoring with the above comment.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 108 at r1 (raw file):

    @staticmethod
    def get_detections_from_generic(job: mpf.GenericJob) -> Sequence[mpf.GenericTrack]:
        if job.feed_forward_track:

With the above refactor changes we may be able to refactor this too. I'll leave that up to you, or for another reviewer to add their thoughts in.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 153 at r1 (raw file):

                properties=job_props,
                key='LANGUAGE_FEED_FORWARD_PROP',
                default_value='DECODED_LANGUAGE,LANGUAGE',

Can we update this list to also include ISO_LANGUAGE with support for ISO-639-2? This is to ensure compatibility with Tika Language Detection.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 172 at r1 (raw file):

            available_packages = package.get_available_packages()
        except Exception:
            logger.info("Downloading package index.")

Sorry to nitpick, but some log lines have punctuation, and others don't. I personally like to add . to let the debugger know where the single-statement ends.

Thinking about it some more, feel free to ignores this comment, unless another reviewer wants it as well. I don't think it matters since the logger would print out a new line anyways.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 176 at r1 (raw file):

            available_packages = package.get_available_packages()

        available_packages = [y.from_code for y in list(

See next comment. I assume this is fine as is.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 216 at r1 (raw file):

        if self._from_lang not in self.installed_lang_codes:
            logger.info(f"Language {self._from_lang} is not installed. Installing package.")

This block of code, was it adapted from this example?
https://github.com/argosopentech/argos-translate#python

It's fine to reuse, but out of fairness, we can add a line such as:

# Helper code (X lines) adapted from Argos Translate [https://github.com/argosopentech/argos-translate#python] for [PURPOSE]

Or (the below reads a bit easier):

# From Argos Translate for downloading language models.

There was a similar lambda function call above but I assume you wrote that one following this example. I feel that's fine but this one is clearly reused so we want to give them a little credit.

If there were several sections of adapted code/directly copied from their repo, we can add a general descriptor like adapted from ___ and ___.

It's also useful in case some of their functionalities change and we need to update.

So in summary if several sections were adapted, include a single reference at the top of this class, and a short indicator where sections are directly copied. If slightly changed for our purposes, I think a one general reference would be great.


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 32 at r1 (raw file):

          "description": "Comma-separated list of property names indicating which properties in the feed-forward track or detection determine the language from which to translate. If the first property listed is present, then that property will be used. If it's not, then the next property in the list is considered. If none are present, fall back to FROM_LANGUAGE.",
          "type": "STRING",
          "defaultValue": "DECODED_LANGUAGE,LANGUAGE"

Add ISO_LANGUAGE for our Tika component pipeline and support for ISO-639-2 if possible.


python/ArgosTranslation/tests/test_argos_translation.py line 44 at r1 (raw file):

SHORT_OUTPUT = "Where's the library?"
SHORT_OUTPUT_CHINESE = "Thanks."

Leaving a note for myself to test this out.

Also, add a quick note citing that this is text from our US Declaration of Independence, it's well written. :)


python/ArgosTranslation/tests/data/spanish_long.txt line 1 at r1 (raw file):

Sostenemos como evidentes estas verdades: que todos los hombres son creados iguales, que son dotados por su Creador de ciertos derechos inalienables, que entre éstos están la vida, la libertad y la búsqueda de la felicidad. Que para gara ntizar estos derechos se instituyen entre los hombres los gobiernos, que derivan sus poderes legítimos del consentimiento de los gobernados. Que cuando quiera que una forma de gobierno se haga destructora de estos principios, el pueblo tiene el derec ho a reformarla o abolirla e instituir un nuevo gobierno que se funde en dichos principios, y a organizar sus poderes en la forma que a su juicio ofrecerá las mayores probabilidades de alcanzar su seguridad y felicidad.

Oh I almost forgot: If you're cite text from a source, please also add it to the NOTICE file, thanks.

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 14 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 46 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

The following code block:

        try:
            if job.feed_forward_track is None:
                raise mpf.DetectionError.UNSUPPORTED_DATA_TYPE.exception(
                    f'Component can only process feed forward '
                    ' jobs, but no feed forward track provided. ')

            tw = TranslationWrapper(job.job_properties)
            tw.add_translations(job.feed_forward_track.detection_properties)

            for ff_location in job.feed_forward_track.frame_locations.values():
                tw.add_translations(ff_location.detection_properties)
            return [job.feed_forward_track]

        except Exception:
            logger.exception(
                f'Failed to complete job due to the following exception:')
            raise

Is very similar in 3 of the four get_detections, with a larger variation for the generic version.

I recommend creating a helper function to further reduce the amount of repeated code, with the input to the function being:

from_video: job.feed_forward_track and job.feed_forward_track.frame_locations.values()
from_image: job.feed_forward_location and [job.feed_forward_location]
from_audio: job.feed_forward_track and [job.feed_forward_track]

That way you can reuse the helper function from_video for the other two media types. Let me know if this is not clear. I used something similar to reduce the amount of code in TikaTextDetection after I was staring at it for a bit.

I feel that the generic could also be refactored but it does operate slightly differently from the other three so it's fine to keep that one unique.

By generic I meant generic get_detections, that one can be left as is.

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 13 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

        if self._from_lang not in self.installed_lang_codes:
            logger.info(f"Language {self._from_lang} is not installed. Installing package.")
            available_packages = package.get_available_packages()

This block of code, was it adapted from this example?
https://github.com/argosopentech/argos-translate#python

It's fine to reuse, but out of fairness, we can add a line such as:

Helper code (X lines) adapted from Argos Translate (https://github.com/argosopentech/argos-translate#python) for [PURPOSE]

Or (the below reads a bit easier):

From Argos Translate for downloading language models. (https://github.com/argosopentech/argos-translate#python)

There was a similar lambda function call above but I assume you wrote that one following this example. This section looks like it's been copied from their example so we want to add a quick cite here.

If there were several sections of adapted code/directly copied from their repo, we can add a general descriptor at the top of the class.

It's also useful in case some of their functionalities change and we need to update.

So in summary if several sections were adapted, include a single reference at the top of this class, and a short indicator where sections are directly copied to track potential changes.

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 13 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

        if self._from_lang not in self.installed_lang_codes:
            logger.info(f"Language {self._from_lang} is not installed. Installing package.")
            available_packages = package.get_available_packages()

This block of code, was it adapted from this example?
https://github.com/argosopentech/argos-translate#python

It's fine to reuse, but out of fairness, we can add a line such as:

Helper code (X lines) adapted from Argos Translate (https://github.com/argosopentech/argos-translate#python) for [PURPOSE]

Or :

From Argos Translate for downloading language models. (https://github.com/argosopentech/argos-translate#python)

There was a similar lambda function call above but I assume you wrote that one following this example. This section looks like it's been copied from their example so we want to add a quick cite here.

If there were several sections of adapted code/directly copied from their repo, we can add a general descriptor at the top of the class.

It's also useful in case some of their functionalities change and we need to update.

So in summary if several sections were adapted, include a single reference at the top of this class, and a short indicator where sections are directly copied to track potential changes.

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 13 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

This block of code, was it adapted from this example?
https://github.com/argosopentech/argos-translate#python

It's fine to reuse, but out of fairness, we can add a line such as:

Helper code (X lines) adapted from Argos Translate (https://github.com/argosopentech/argos-translate#python) for [PURPOSE]

Or :

From Argos Translate for downloading language models. (https://github.com/argosopentech/argos-translate#python)

There was a similar lambda function call above but I assume you wrote that one following this example. This section looks like it's been copied from their example so we want to add a quick cite here.

If there were several sections of adapted code/directly copied from their repo, we can add a general descriptor at the top of the class.

It's also useful in case some of their functionalities change and we need to update.

So in summary if several sections were adapted, include a single reference at the top of this class, and a short indicator where sections are directly copied to track potential changes.

I apologize for the other retracted versions, please ignore them and refer to this active comment.

Copy link
Contributor Author

@mcrensh mcrensh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 13 unresolved discussions (waiting on @hhuangMITRE)


python/ArgosTranslation/Dockerfile line 37 at r1 (raw file):

Download another supported model of interest by adding
Added a comment on how to download models in the Dockerfile. We had discussed having Chinese installed but due to its very bad performance on Chinese we chose to leave it out

@mcrensh
Copy link
Contributor Author

mcrensh commented Jan 5, 2023

python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 32 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Add ISO_LANGUAGE for our Tika component pipeline and support for ISO-639-2 if possible.

Added ISO_LANGAUGE to default values of LANGUAGE_FEED_FORWARD_PROP

Copy link
Contributor Author

@mcrensh mcrensh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 13 unresolved discussions (waiting on @hhuangMITRE and @mcrensh)


python/ArgosTranslation/README.md line 13 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Can we also add support for ISO-639-2 as well?

If you need a reference, please check here for many of the ISO mappings.

Added mapping for ISO-639-2


python/ArgosTranslation/README.md line 27 at r1 (raw file):

ISO-639-2
Added ISO-639-2 mappings


python/ArgosTranslation/README.md line 31 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Quick question: Does it offer Chinese Simplified vs. Traditional or both?

Update: I checked the docs, only one Chinese option is offered so we can test for that later.

I remember when I was investigating how bad it was at translating Chinese I was able to pass in both simplified and traditional, but I'm not sure if it is better at one vs the other since in general the translations were...not great.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 46 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

By generic I meant generic get_detections, that one can be left as is.

Refactored the get detections for video, audio, and image. Left generic alone. I ended up tweaking your suggestion a bit because there was issues passing job.feed_forward_track.frame_locations.values() when job.feed_foward_track is null, and I wanted to keep the null check in the method to avoid repeating it in all the methods.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 69 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Consider refactoring with the above comment.

Refactored


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 89 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Consider refactoring with the above comment.

Refactored


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 108 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

With the above refactor changes we may be able to refactor this too. I'll leave that up to you, or for another reviewer to add their thoughts in.

I've left generic as is


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 153 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Can we update this list to also include ISO_LANGUAGE with support for ISO-639-2? This is to ensure compatibility with Tika Language Detection.

The list of supported languages in the code is still ISO-639-1, but I've added mappings for ISO-639-2. I've also added the ISO-639-2 codes to the README


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 172 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Sorry to nitpick, but some log lines have punctuation, and others don't. I personally like to add . to let the debugger know where the single-statement ends.

Thinking about it some more, feel free to ignores this comment, unless another reviewer wants it as well. I don't think it matters since the logger would print out a new line anyways.

Added punctuation. I agree punctuation or not, it should be consistent.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 176 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

See next comment. I assume this is fine as is.

I didn't add a citation for this section, but I did for the section you commented more about.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 216 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

This block of code, was it adapted from this example?
https://github.com/argosopentech/argos-translate#python

It's fine to reuse, but out of fairness, we can add a line such as:

# Helper code (X lines) adapted from Argos Translate [https://github.com/argosopentech/argos-translate#python] for [PURPOSE]

Or (the below reads a bit easier):

# From Argos Translate for downloading language models.

There was a similar lambda function call above but I assume you wrote that one following this example. I feel that's fine but this one is clearly reused so we want to give them a little credit.

If there were several sections of adapted code/directly copied from their repo, we can add a general descriptor like adapted from ___ and ___.

It's also useful in case some of their functionalities change and we need to update.

So in summary if several sections were adapted, include a single reference at the top of this class, and a short indicator where sections are directly copied. If slightly changed for our purposes, I think a one general reference would be great.

I just used the example code so we don't need to add a general descriptor, I just added the second comment you suggested. I didn't see a need to cite since it's pretty barebones, I don't really see how I would write it differently except for the lamba function, but the logic would be the same in the end. It was an example demonstration how to use the API to download packages, which there isn't really any other way to go about doing from what I can see. If there was more involved in the logic I would've cited it, but I've now added the comment you suggested.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

This block of code, was it adapted from this example?
https://github.com/argosopentech/argos-translate#python

It's fine to reuse, but out of fairness, we can add a line such as:

Helper code (X lines) adapted from Argos Translate (https://github.com/argosopentech/argos-translate#python) for [PURPOSE]

Or (the below reads a bit easier):

From Argos Translate for downloading language models. (https://github.com/argosopentech/argos-translate#python)

There was a similar lambda function call above but I assume you wrote that one following this example. This section looks like it's been copied from their example so we want to add a quick cite here.

If there were several sections of adapted code/directly copied from their repo, we can add a general descriptor at the top of the class.

It's also useful in case some of their functionalities change and we need to update.

So in summary if several sections were adapted, include a single reference at the top of this class, and a short indicator where sections are directly copied to track potential changes.

See comment above.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

I apologize for the other retracted versions, please ignore them and refer to this active comment.

I just used the example code so we don't need to add a general descriptor, I added the second comment you suggested. I didn't see a need to cite since it's pretty barebones, I don't really see how I would write it differently except for the lamba function, but the logic would be the same in the end. It was an example demonstration how to use the API to download packages, which there isn't really any other way to go about doing from what I can see. If there was more involved in the logic I would've cited it, but I've now added the comment you suggested. The code I used is mainly necessary API calls, there's no way to avoid copying most of their example.


python/ArgosTranslation/tests/test_argos_translation.py line 44 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Leaving a note for myself to test this out.

Also, add a quick note citing that this is text from our US Declaration of Independence, it's well written. :)

Added a NOTICE, the English text is not word for word the same as the English version since it's translated output from Argos so I didn't add a comment. The NOTICE does cite the Declaration of Independence.


python/ArgosTranslation/tests/data/spanish_long.txt line 1 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Oh I almost forgot: If you're cite text from a source, please also add it to the NOTICE file, thanks.

Added a NOTICE

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Please review the requested changes and let me know if you have any questions.

Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/Dockerfile line 37 at r1 (raw file):

Previously, mcrensh wrote…

Download another supported model of interest by adding
Added a comment on how to download models in the Dockerfile. We had discussed having Chinese installed but due to its very bad performance on Chinese we chose to leave it out

Got it, thanks, in that case, I'm fine with leaving it out.


python/ArgosTranslation/README.md line 13 at r1 (raw file):

Previously, mcrensh wrote…

Added mapping for ISO-639-2

I appreciate it, adding support makes it easier as we have one (potentially more) components that will return language detections in that format. Also good to standardize as well.


python/ArgosTranslation/README.md line 27 at r1 (raw file):

Previously, mcrensh wrote…

ISO-639-2
Added ISO-639-2 mappings

Thanks, I appreciate it!


python/ArgosTranslation/README.md line 31 at r1 (raw file):

Previously, mcrensh wrote…

I remember when I was investigating how bad it was at translating Chinese I was able to pass in both simplified and traditional, but I'm not sure if it is better at one vs the other since in general the translations were...not great.

Ah, got it. Some other services we tested also showed ... not great Chinese -> English translations.


python/ArgosTranslation/README.md line 17 at r2 (raw file):

- `DEFAULT_SOURCE_LANGUAGE`: The default source language to use if none of the property names listed in `LANGUAGE_FEED_FORWARD_PROP` are present in a feed-forward track or detection. This language is used when running a generic job with a raw text file (hence no feed-forward tracks).

For consistency, just like Whisper, please include info on which properties are output by Argos Translate.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 46 at r1 (raw file):

Previously, mcrensh wrote…

Refactored the get detections for video, audio, and image. Left generic alone. I ended up tweaking your suggestion a bit because there was issues passing job.feed_forward_track.frame_locations.values() when job.feed_foward_track is null, and I wanted to keep the null check in the method to avoid repeating it in all the methods.

Gotcha, yep, that makes sense. Thanks for making the refactor!


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 153 at r1 (raw file):

Previously, mcrensh wrote…

The list of supported languages in the code is still ISO-639-1, but I've added mappings for ISO-639-2. I've also added the ISO-639-2 codes to the README

Thanks, again I appreciate the effort made to support the additional ISO format.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 172 at r1 (raw file):

Previously, mcrensh wrote…

Added punctuation. I agree punctuation or not, it should be consistent.

Thanks, and moving forward (if it's ok), I'm happy to make comment/typo changes as well. I would like to avoid nitpicking as often.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 176 at r1 (raw file):

Previously, mcrensh wrote…

I didn't add a citation for this section, but I did for the section you commented more about.

Yep, I agree if it's a barebones example, not much can be done to update, and paraphrasing code isn't really needed either.

Thinking back to what I mentioned, I think the proper course of action is to mention that we used the Argos library for this component in a LICENSE file. Regardless, we'll need to reference the licensing the Argos team is working with.

Please add a LICENSE file for this component, containing the OpenMPF license, and possibly the Argos license as well. In that LICENSE file, we can mention that the Argos library is used in argos_translation_component.py.

UPDATE: I see Argos has two open-source libraries available. I think either of them is fine, include a copy of that in our LICENSE file, thanks! If you need a sample, refer to the Tika/Tesseract or other open-source python component LICENSE files.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 216 at r1 (raw file):

Previously, mcrensh wrote…

I just used the example code so we don't need to add a general descriptor, I just added the second comment you suggested. I didn't see a need to cite since it's pretty barebones, I don't really see how I would write it differently except for the lamba function, but the logic would be the same in the end. It was an example demonstration how to use the API to download packages, which there isn't really any other way to go about doing from what I can see. If there was more involved in the logic I would've cited it, but I've now added the comment you suggested.

Got it, thanks! As mentioned above, I forgot that a LICENSE file would also properly attribute our usage of Argos Translate (library), and is also needed for our components. Please refer to the above comment. I'll mark these as accepted.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

Previously, mcrensh wrote…

I just used the example code so we don't need to add a general descriptor, I added the second comment you suggested. I didn't see a need to cite since it's pretty barebones, I don't really see how I would write it differently except for the lamba function, but the logic would be the same in the end. It was an example demonstration how to use the API to download packages, which there isn't really any other way to go about doing from what I can see. If there was more involved in the logic I would've cited it, but I've now added the comment you suggested. The code I used is mainly necessary API calls, there's no way to avoid copying most of their example.

Agreed. Please refer to above comment for adding in a LICENSE file. We will also need that in place when this lands.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 8 at r2 (raw file):

# 52.227-14, Alt. IV (DEC 2007).                                            #
#                                                                           #
# Copyright 2022 The MITRE Corporation. All Rights Reserved.                #

Update to 2023, or let me know if I can update it, thanks!


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 196 at r2 (raw file):

            logger.warning("No text to translate found in track.")
            return

From Brian's component:

        # Certain components that process videos and don't do tracking will create tracks with a
        # single detection. The track and detection will have the same set of properties, so
        # it is relatively likely we will run in to duplicate text.

Brian then uses a dictionary to hold past TEXT results and reuse them if the same tracks are encountered again during a job. I think for this code, we can include a similar functionality to save processing time:

  1. Store input text and associated translation settings (i.e. key = "FR->EN:<input_FR_text>") in a dictionary along with its processed translation.
  2. Retrieve processed translation from dict if we encounter it again.

I'd imagine it would also be useful for scenarios where we're processing many sentence fragments/tracks at once. It's likely we'd run into repeats.

For reference, please check the code here: https://github.com/openmpf/openmpf-components/blob/3723d2479d104bdd88960b5648155b2eb369d337/python/AzureTranslation/acs_translation_component/acs_translation_component.py
(lines 183, 236, 271).


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 208 at r2 (raw file):

                    self._from_lang = self.iso_map[lang]
                else:
                    raise mpf.DetectionError.DETECTION_FAILED.exception(

Not sure if we should return detection failed or SKIPPED TRANSLATION, so I'm just leaving a comment here to discuss later.


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 32 at r1 (raw file):

Previously, mcrensh wrote…

Added ISO_LANGAUGE to default values of LANGUAGE_FEED_FORWARD_PROP

Thanks!


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 3 at r2 (raw file):

{
  "componentName": "ArgosTranslation",
  "componentVersion": "7.0",

We'll likely need to update these numbers to match the latest version of OpenMPF.


python/ArgosTranslation/tests/test_argos_translation.py line 44 at r1 (raw file):

Previously, mcrensh wrote…

Added a NOTICE, the English text is not word for word the same as the English version since it's translated output from Argos so I didn't add a comment. The NOTICE does cite the Declaration of Independence.

Ah, that makes sense, somehow I auto-read the first line as "We hold these truths as self-evident" rather than "We hold as evident these truths".

Overall, reading this over more thoroughly and comparing the paraphrased version to our original document, I'm impressed with how well the Argos translation holds up, it's on-par with the other translations I've seen.

Copy link
Contributor Author

@mcrensh mcrensh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 13 files reviewed, 7 unresolved discussions (waiting on @hhuangMITRE and @mcrensh)


python/ArgosTranslation/README.md line 17 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

For consistency, just like Whisper, please include info on which properties are output by Argos Translate.

Added a section on output properties


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 46 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Gotcha, yep, that makes sense. Thanks for making the refactor!

Done.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 172 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Thanks, and moving forward (if it's ok), I'm happy to make comment/typo changes as well. I would like to avoid nitpicking as often.

Yep you can make those changes!


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 176 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Yep, I agree if it's a barebones example, not much can be done to update, and paraphrasing code isn't really needed either.

Thinking back to what I mentioned, I think the proper course of action is to mention that we used the Argos library for this component in a LICENSE file. Regardless, we'll need to reference the licensing the Argos team is working with.

Please add a LICENSE file for this component, containing the OpenMPF license, and possibly the Argos license as well. In that LICENSE file, we can mention that the Argos library is used in argos_translation_component.py.

UPDATE: I see Argos has two open-source libraries available. I think either of them is fine, include a copy of that in our LICENSE file, thanks! If you need a sample, refer to the Tika/Tesseract or other open-source python component LICENSE files.

Added LICENSE


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 216 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Got it, thanks! As mentioned above, I forgot that a LICENSE file would also properly attribute our usage of Argos Translate (library), and is also needed for our components. Please refer to the above comment. I'll mark these as accepted.

Added LICENSE


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 217 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Agreed. Please refer to above comment for adding in a LICENSE file. We will also need that in place when this lands.

Done.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 8 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Update to 2023, or let me know if I can update it, thanks!

Updated, feel free to update it next time!


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 196 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

From Brian's component:

        # Certain components that process videos and don't do tracking will create tracks with a
        # single detection. The track and detection will have the same set of properties, so
        # it is relatively likely we will run in to duplicate text.

Brian then uses a dictionary to hold past TEXT results and reuse them if the same tracks are encountered again during a job. I think for this code, we can include a similar functionality to save processing time:

  1. Store input text and associated translation settings (i.e. key = "FR->EN:<input_FR_text>") in a dictionary along with its processed translation.
  2. Retrieve processed translation from dict if we encounter it again.

I'd imagine it would also be useful for scenarios where we're processing many sentence fragments/tracks at once. It's likely we'd run into repeats.

For reference, please check the code here: https://github.com/openmpf/openmpf-components/blob/3723d2479d104bdd88960b5648155b2eb369d337/python/AzureTranslation/acs_translation_component/acs_translation_component.py
(lines 183, 236, 271).

Added caching of past results


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 3 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

We'll likely need to update these numbers to match the latest version of OpenMPF.

Done.

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @mcrensh)


python/ArgosTranslation/Dockerfile line 37 at r3 (raw file):

RUN argospm update

I'm running into Docker build issues for the latest version of this component. @mcrensh please confirm if you can build the latest images with tests passing. I will look into the issue on my end as well.


python/ArgosTranslation/tests/test_argos_translation.py line 215 at r3 (raw file):

                1: mpf.ImageLocation(0, 10, 10, 10, -1, dict(TRANSCRIPT=SPANISH_SHORT_SAMPLE, LANGUAGE='ES'))
            },
            dict(TEXT=SPANISH_SHORT_SAMPLE, LANGUAGE='ES'))

Consider switching one of the input language codes to ISO-639-2 to confirm that the iso mapper is working as expected: i.e. "ES" -> "ESP"

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed docker build issues. @mcrensh please review changes and let me know if you have any questions.

Reviewable status: 11 of 13 files reviewed, all discussions resolved (waiting on @mcrensh)


python/ArgosTranslation/Dockerfile line 37 at r3 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

I'm running into Docker build issues for the latest version of this component. @mcrensh please confirm if you can build the latest images with tests passing. I will look into the issue on my end as well.

Updated, fixed build issues.


python/ArgosTranslation/tests/test_argos_translation.py line 215 at r3 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Consider switching one of the input language codes to ISO-639-2 to confirm that the iso mapper is working as expected: i.e. "ES" -> "ESP"

Correction: "ES" -> "SPA".

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through, it doesn't seem that we've added/requested Brian or Jeff to look at this yet, so I've added them now.

Reviewable status: 11 of 13 files reviewed, 1 unresolved discussion (waiting on @brosenberg42, @jrobble, and @mcrensh)


python/ArgosTranslation/Dockerfile line 42 at r5 (raw file):

    argospm install translate-fr_en && \
    argospm install translate-ru_en && \
    argospm install translate-es_en

As Brian noted in the other channel, some environments can't allow for model downloads.

Although the zh_en model isn't great, I'd request that it be added anyway if we need to check back on it in the future.

Copy link
Contributor

@hhuangMITRE hhuangMITRE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 13 files reviewed, all discussions resolved (waiting on @brosenberg42 and @jrobble)


python/ArgosTranslation/Dockerfile line 42 at r5 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

As Brian noted in the other channel, some environments can't allow for model downloads.

Although the zh_en model isn't great, I'd request that it be added anyway if we need to check back on it in the future.

If there are other languages of interest that are supported, we should consider adding them also. Thanks!

Copy link
Member

@brosenberg42 brosenberg42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 13 files reviewed, 8 unresolved discussions (waiting on @hhuangMITRE, @jrobble, and @mcrensh)


python/ArgosTranslation/README.md line 23 at r6 (raw file):

All translations are either to English or from English. When trying to translate from one non-English language to another, Argos will automatically pivot between languages using the currently installed packages. For example, for Spanish->French Argos would pivot from Spanish->English to English->French. This is associated with a drop in accuracy and increase in runtime. 

Lanuage packages are downloaded dynamically as needed. In addition, when building a Docker image the Dockerfile pre-installs German, French, Russian, and Spanish.

Language is spelled wrong here.


python/ArgosTranslation/setup.cfg line 29 at r6 (raw file):

[metadata]
name = ArgosTranslation
version = 0.1

Change this to 7.2.


python/ArgosTranslation/setup.cfg line 35 at r6 (raw file):

install_requires =
    mpf_component_api>=7.1
    mpf_component_util>=7.1

Change these to 7.2


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 127 at r6 (raw file):

                properties=job_props,
                key='LANGUAGE_FEED_FORWARD_PROP',
                default_value='DECODED_LANGUAGE,LANGUAGE',

This should be: default_value='ISO_LANGUAGE,DECODED_LANGUAGE,LANGUAGE'


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 172 at r6 (raw file):

        }

        self._translation_cache: Dict[str, (str, str)] = {}

This should be: self._translation_cache: Dict[str, Tuple[str, str]] = {}


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 32 at r6 (raw file):

          "description": "Comma-separated list of property names indicating which properties in the feed-forward track or detection determine the language from which to translate. If the first property listed is present, then that property will be used. If it's not, then the next property in the list is considered. If none are present, fall back to FROM_LANGUAGE.",
          "type": "STRING",
          "defaultValue": "DECODED_LANGUAGE,LANGUAGE,ISO_LANGUAGE"

This is in the wrong order. It should be: ISO_LANGUAGE,DECODED_LANGUAGE,LANGUAGE


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 38 at r6 (raw file):

          "description": "Optional property that indicates the source language of the text.",
          "type": "STRING",
          "defaultValue": "es"

I don't think it makes sense to assume the input text is in Spanish. I think the default value should be "". The component should raise mpf.DetectionError.MISSING_PROPERTY.exception("<description>") when there isn't a match for LANGUAGE_FEED_FORWARD_PROP and DEFAULT_SOURCE_LANGUAGE is empty.


python/ArgosTranslation/tests/test_argos_translation.py line 235 at r6 (raw file):

        detection2 = result.frame_locations[1]
        self.assertEqual(SPANISH_SHORT_SAMPLE, detection2.detection_properties['TRANSCRIPT'])
        self.assertEqual(SHORT_OUTPUT, detection2.detection_properties['TRANSLATION'])

These tests don't currently run as part of the docker build. To get them to run, you need to add:

if __name__ == '__main__':
    unittest.main(verbosity=2)

After adding that and running the docker build, the unit tests fail. They also fail when I run them outside of docker.

Copy link
Contributor Author

@mcrensh mcrensh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 13 files reviewed, 8 unresolved discussions (waiting on @brosenberg42, @hhuangMITRE, and @jrobble)


python/ArgosTranslation/Dockerfile line 42 at r5 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

If there are other languages of interest that are supported, we should consider adding them also. Thanks!

I've added Arabic as a downloaded model


python/ArgosTranslation/README.md line 23 at r6 (raw file):

Previously, brosenberg42 wrote…

Language is spelled wrong here.

Done.


python/ArgosTranslation/setup.cfg line 29 at r6 (raw file):

Previously, brosenberg42 wrote…

Change this to 7.2.

Done.


python/ArgosTranslation/setup.cfg line 35 at r6 (raw file):

Previously, brosenberg42 wrote…

Change these to 7.2

Done.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 127 at r6 (raw file):

Previously, brosenberg42 wrote…

This should be: default_value='ISO_LANGUAGE,DECODED_LANGUAGE,LANGUAGE'

Done.


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 172 at r6 (raw file):

Previously, brosenberg42 wrote…

This should be: self._translation_cache: Dict[str, Tuple[str, str]] = {}

Done.


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 32 at r6 (raw file):

Previously, brosenberg42 wrote…

This is in the wrong order. It should be: ISO_LANGUAGE,DECODED_LANGUAGE,LANGUAGE

Done.


python/ArgosTranslation/plugin-files/descriptor/descriptor.json line 38 at r6 (raw file):

Previously, brosenberg42 wrote…

I don't think it makes sense to assume the input text is in Spanish. I think the default value should be "". The component should raise mpf.DetectionError.MISSING_PROPERTY.exception("<description>") when there isn't a match for LANGUAGE_FEED_FORWARD_PROP and DEFAULT_SOURCE_LANGUAGE is empty.

Added this behavior and added a test for it


python/ArgosTranslation/tests/test_argos_translation.py line 215 at r3 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Correction: "ES" -> "SPA".

Switched second one to SPA


python/ArgosTranslation/tests/test_argos_translation.py line 235 at r6 (raw file):

Previously, brosenberg42 wrote…

These tests don't currently run as part of the docker build. To get them to run, you need to add:

if __name__ == '__main__':
    unittest.main(verbosity=2)

After adding that and running the docker build, the unit tests fail. They also fail when I run them outside of docker.

Added. The Chinese model seems to have been updated, updated the expected translation output for Chinese translation

Copy link
Member

@brosenberg42 brosenberg42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a PR in the openmp-docker repo to add argos to docker-compose.components.yml.

Reviewable status: 7 of 13 files reviewed, 2 unresolved discussions (waiting on @hhuangMITRE, @jrobble, and @mcrensh)


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 227 at r8 (raw file):

            if self._from_lang == "":
                raise mpf.DetectionError.MISSING_PROPERTY.exception("LANGUAGE_FEED_FORWARD_PROP mismatch and no DEFAULT_SOURCE_LANGUAGE provided.")

Change this to:

raise mpf.DetectionError.MISSING_PROPERTY.exception(
    'None of the properties from "LANGUAGE_FEED_FORWARD_PROP" '
    f'({self._lang_prop_names}) were found in the feed forward track and no '
    '"DEFAULT_SOURCE_LANGUAGE" was provided.')

Copy link
Member

@brosenberg42 brosenberg42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 11 files at r1, 1 of 6 files at r2, 3 of 8 files at r3, 3 of 6 files at r7, 1 of 2 files at r8, 2 of 2 files at r9, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jrobble)


python/ArgosTranslation/argos_translation_component/argos_translation_component.py line 208 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Not sure if we should return detection failed or SKIPPED TRANSLATION, so I'm just leaving a comment here to discuss later.

SKIPPED TRANSLATION is used when the translation does not occur because the text is already in the target language. Raising the exception here is the correct behavior.

@brosenberg42 brosenberg42 self-requested a review September 1, 2023 13:00
Copy link
Member

@brosenberg42 brosenberg42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @hhuangMITRE from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jrobble)

@mcrensh mcrensh merged commit 77ae9c3 into develop Sep 1, 2023
@mcrensh mcrensh deleted the feature/argos-translate branch September 1, 2023 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants