MODLD-586: LCCN Normalization | Auto-add white spaces#64
askhat-abishev merged 3 commits into master
src/main/java/org/folio/linked/data/validation/dto/LccnPatternValidator.java
src/test/java/org/folio/linked/data/preprocessing/lccn/SpaceAdderStructureaTest.java
fd1a425 to 5954251
package org.folio.linked.data.preprocessing.lccn;

@FunctionalInterface
public interface LccnNormalizer<T> {
Do we need to overcomplicate this by adding <T>? An LCCN is always going to be a string, so instead of T we can hardcode the type as String, right? Or do you foresee a need to support other data types for LCCN in the future?
Yes, I thought that in the future we might normalize something other than just String. It should have been called simply Normalizer, but I forgot to rename it.
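For context, the constructor in this PR folds a list of normalizers with `identity()` and `andThen`, but the interface body is not shown in the thread. The following is a minimal sketch of what a generic interface of that shape could look like. The names `Normalizer`, `normalize`, and `NormalizerChain` are hypothetical and are not taken from the PR.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical generic normalizer; the PR's real interface body is not shown.
@FunctionalInterface
interface Normalizer<T> {
  T normalize(T value);

  // Run this normalizer first, then feed its output to `next`.
  default Normalizer<T> andThen(Normalizer<T> next) {
    Objects.requireNonNull(next);
    return value -> next.normalize(normalize(value));
  }

  // A normalizer that returns its input unchanged; the neutral element for reduce.
  static <T> Normalizer<T> identity() {
    return value -> value;
  }
}

final class NormalizerChain {
  private NormalizerChain() {
  }

  // Fold all normalizers into one, applied in list order.
  static <T> Normalizer<T> compose(List<Normalizer<T>> normalizers) {
    return normalizers.stream().reduce(Normalizer.identity(), Normalizer::andThen);
  }
}
```

With this shape, every normalizer in the list runs on the output of the previous one, so the order of the list matters and an empty list yields the identity.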
public LccnPatternValidator(SpecProvider specProvider, List<LccnNormalizer<String>> lccnNormalizers) {
  this.specProvider = specProvider;
  this.lccnNormalizer = lccnNormalizers.stream().reduce(LccnNormalizer.identity(), LccnNormalizer::andThen);
Hi @askhat-abishev,
Although this code works, I think we can improve it. Currently, we apply the StructureA normalizer first and then pass its output to the StructureB normalizer (or vice versa). That may happen to work in this case, but it isn't logically correct: we should apply only one of the normalizers, chosen by the pattern of the incoming LCCN.
So I think a cleaner structure for LccnNormalizer is as follows:
public interface LccnNormalizer extends Predicate<String>, UnaryOperator<String> {
  // Return the normalized LCCN value if it is valid; otherwise return an empty Optional
  default Optional<String> normalize(String t) {
    if (this.test(t)) { // `test` checks whether the LCCN matches the corresponding regex
      return Optional.of(this.apply(t)); // `apply` does the actual normalization
    }
    return Optional.empty();
  }
}
Then apply the normalization in LccnPatternValidator as follows:
private String normalize(String lccn) {
  return lccnNormalizers
    .stream()
    .flatMap(normalizer -> normalizer.normalize(lccn).stream())
    .findFirst()
    .orElse(lccn);
}
What do you think?
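A self-contained sketch of the first-match approach proposed above, for anyone who wants to try it outside the module. The two regexes and the padding rules are simplified assumptions made only for illustration (a letter prefix padded to three characters plus a trailing space for structure A, a prefix padded to two characters for structure B); they are not the patterns used in this PR. The class names `StructureANormalizer`, `StructureBNormalizer`, and `LccnNormalization` are hypothetical as well.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

interface LccnNormalizer extends Predicate<String>, UnaryOperator<String> {
  // Normalize only when this normalizer's pattern matches; otherwise return empty.
  default Optional<String> normalize(String lccn) {
    return test(lccn) ? Optional.of(apply(lccn)) : Optional.empty();
  }
}

// Assumed shape for illustration: 0-3 lowercase letters followed by 8 digits.
final class StructureANormalizer implements LccnNormalizer {
  private static final Pattern PATTERN = Pattern.compile("[a-z]{0,3}\\d{8}");

  @Override
  public boolean test(String lccn) {
    return PATTERN.matcher(lccn).matches();
  }

  @Override
  public String apply(String lccn) {
    int prefixLength = lccn.length() - 8;
    // Pad the prefix to 3 characters and append one trailing space.
    return lccn.substring(0, prefixLength) + " ".repeat(3 - prefixLength)
      + lccn.substring(prefixLength) + " ";
  }
}

// Assumed shape for illustration: 0-2 lowercase letters followed by 10 digits.
final class StructureBNormalizer implements LccnNormalizer {
  private static final Pattern PATTERN = Pattern.compile("[a-z]{0,2}\\d{10}");

  @Override
  public boolean test(String lccn) {
    return PATTERN.matcher(lccn).matches();
  }

  @Override
  public String apply(String lccn) {
    int prefixLength = lccn.length() - 10;
    // Pad the prefix to 2 characters.
    return lccn.substring(0, prefixLength) + " ".repeat(2 - prefixLength)
      + lccn.substring(prefixLength);
  }
}

final class LccnNormalization {
  private LccnNormalization() {
  }

  // Apply the first normalizer whose pattern matches; leave the value untouched otherwise.
  static String normalize(List<LccnNormalizer> normalizers, String lccn) {
    return normalizers.stream()
      .flatMap(normalizer -> normalizer.normalize(lccn).stream())
      .findFirst()
      .orElse(lccn);
  }
}
```

Because the two assumed patterns cannot both match the same input, at most one normalizer ever runs, and a value that matches neither is passed through unchanged for the validator to reject.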
import java.util.regex.Pattern;
import org.folio.linked.data.preprocessing.lccn.LccnNormalizer;

public abstract class AbstractSpaceAdder implements LccnNormalizer<String> {
Minor: I think AbstractLccnNormalizer is a better name.
Similarly, LccnNormalizerStructureA and LccnNormalizerStructureB for the subclasses.
e395f92 to b5bcf2b