Skip to content

MODLD-586: LCCN Normalization | Auto-add white spaces#64

Merged
askhat-abishev merged 3 commits intomasterfrom
MODLD-586
Dec 5, 2024
Merged

MODLD-586: LCCN Normalization | Auto-add white spaces#64
askhat-abishev merged 3 commits intomasterfrom
MODLD-586

Conversation

@askhat-abishev
Copy link
Contributor

No description provided.

package org.folio.linked.data.preprocessing.lccn;

@FunctionalInterface
public interface LccnNormalizer<T> {
Copy link
Contributor

@pkjacob pkjacob Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to overcomplicate this by adding <T>? LCCN is always going to be a string. So, instead of T, we can hardcode the type as String right? Or do you foresee any need to support other datatypes for LCCN in future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought that in future we can normalize something other than just String. It should have been called just Normalizer but I forgot to rename it properly.


public LccnPatternValidator(SpecProvider specProvider, List<LccnNormalizer<String>> lccnNormalizers) {
this.specProvider = specProvider;
this.lccnNormalizer = lccnNormalizers.stream().reduce(LccnNormalizer.identity(), LccnNormalizer::andThen);
Copy link
Contributor

@pkjacob pkjacob Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @askhat-abishev,
Although this code will work, I think we can improve it. Currently, we first apply the StructureA normalizer and then pass the output to the StructureB normalizer (or vice versa). While this approach might work in this case, it isn’t logically correct. We should apply only one of the normalizers, determined by the pattern of the incoming LCCN.

So, I think a more clean structure for LccnNormalizer is as follows

public interface LccnNormalizer extends Predicate<String>, UnaryOperator<String> {
  // Return normalized LCCN value if it is valid, otherwise return empty Optional
  default Optional<String> normalize(String t) {
    if (this.test(t)) { // In `test` method, check if LCCN's pattern match corresponding regex
      return Optional.of(this.apply(t)); // Do actual normalization in `apply` method.
    }
    return Optional.empty();
  }
}

Then apply the normalization in LccnPatternValidator as follows

  private String normalize(String lccn) {
    return lccnNormalizers
      .stream()
      .flatMap(normalizer -> normalizer.normalize(lccn).stream())
      .findFirst()
      .orElse(lccn);
  }

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import java.util.regex.Pattern;
import org.folio.linked.data.preprocessing.lccn.LccnNormalizer;

public abstract class AbstractSpaceAdder implements LccnNormalizer<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor - AbstractLccnNormalizer is a better name I think.
Similarly, LccnNormalizerStructrueA and LccnNormalizerStructrueB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 5, 2024

@askhat-abishev askhat-abishev merged commit 4000b75 into master Dec 5, 2024
@askhat-abishev askhat-abishev deleted the MODLD-586 branch December 5, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants