Skip to content

Deduplicate kbts#551

Merged
johnml1135 merged 2 commits intomainfrom
deduplicate_kbts
Dec 3, 2024
Merged

Deduplicate kbts#551
johnml1135 merged 2 commits intomainfrom
deduplicate_kbts

Conversation

@Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Dec 2, 2024

Fixes #538


This change is Reviewable

@johnml1135
Copy link
Collaborator

src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 105 at r1 (raw file):

                        .AlignRows(targetTermCorpora.ChooseFirst());
                    foreach (
                        ParallelTextRow row in parallelKeyTermsCorpus.DistinctBy(row => row.SourceText + row.TargetText)

Should this be a tuple rather than a concatenated string?

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 105 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Should this be a tuple rather than a concatenated string?

I went ahead and switched it. I'm not sure what's most efficient under the hood.

@johnml1135
Copy link
Collaborator

src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs line 105 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I went ahead and switched it. I'm not sure what's most efficient under the hood.

It was more about the off chance that there could be a conflict where there would be "hello world " + "day" and "hello " + "world day". Not likely, but the tuple solution appears cleaner to me.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 johnml1135 merged commit aa7f31f into main Dec 3, 2024
@johnml1135 johnml1135 deleted the deduplicate_kbts branch December 3, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicated Key Terms

2 participants