Skip to content

Conversation

Chris7C
Copy link
Contributor

@Chris7C Chris7C commented May 7, 2024

Copy link
Member

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r1.
Reviewable status: 1 of 4 files reviewed, 4 unresolved discussions (waiting on @Chris7C)


python/TransformerTagging/README.md line 15 at r1 (raw file):

are called "trigger sentences". These sentences are grouped by "tag" based on which entry 
in the corpus they matched against.

Needs to be added to descriptor.json.


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 41 at r1 (raw file):

from nltk.tokenize.punkt import PunktSentenceTokenizer
import pandas as pd
import re

Not used.


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 152 at r1 (raw file):

            # split input sentence further on newline or carriage return if flag is set
            if (config.split_on_newline):
                for new_sentence in probe_str.splitlines(keepends=True):

Can you just set probe_list = probe_str.splitlines(keepends=True) ?


python/TransformerTagging/transformer_tagging_component/transformer_tagging_component.py line 157 at r1 (raw file):
I took a closer look at KeywordTagging and found how it trims the whitespace before and after the trigger word: https://github.com/openmpf/openmpf-components/blob/develop/cpp/KeywordTagging/KeywordTagging.cpp#L159

I think we should do that in TransformerTagging for consistency. I found this post that talks about counting the leading and trailing chars to strip by using lstrip() and rstrip(): https://stackoverflow.com/questions/52581881/keeping-track-of-number-of-characters-removed-in-strings-python-strip/52581987#52581987


Related, I think I mentioned that we should strip whitespace from the probe before attempting to get a match score. Consider this:

>>> test =  '     \n   B   '
>>> out = test.splitlines(keepends=True)
>>> out
['     \n', '   B   ']
>>> len(out[0].strip())
0

In this case ' \n' isn't even worth using since the stripped version has a length of 0.

Copy link
Member

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Chris7C)

Copy link
Member

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Chris7C)

@jrobble jrobble merged commit d7629cd into develop May 8, 2024
@jrobble jrobble deleted the feat/transformer-tagging-newlines branch May 8, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants