Question re mixed alpha and numeric needles #9

HamptonNorth · 2022-10-27T11:10:59Z

I have the interLft option set to 2 (strict), so matching starts at the beginning of needles/words.

An example UK postcode (think ZIP code) is CW3 5BQ. uFuzzy treats 'CW3' as two needles, 'CW' and '3'. It fails to find the complete postcode string 'CW3 5BQ' and also fails to find '5bq'.

Is there any option to stop splitting mixed alpha and numeric words into multiple needles?

Test string:

[
  "Line with UK postcodes. A typical UK post code is CW3 5BQ. Some numbers 3 5 53 and 7. Some letters C, w, wW, W, WC and CW.  I should be able to match CW3 but not CW5 and I should be able to match 5BQ "
]

leeoniya · 2022-10-27T13:11:11Z

> Is there any option to stop splitting mixed alpha and numeric words into multiple needles

there is an undocumented option for how terms can be split, though setting it to null or '' likely wouldn't work. you can probably provide some non-regex punctuation char that never occurs in your terms, like ~, so the pattern never matches. i'll push a fix in a bit that allows it to be an empty string or null to skip splitting.

uFuzzy/dist/uFuzzy.d.ts

Line 68 in 07dcd4c

intraSplit?: PartialRegExp; // '[A-Za-z][0-9]|[0-9][A-Za-z]|[a-z][A-Z]'
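
To make the effect of that default pattern concrete, here is a minimal sketch of how such a boundary pattern could cut 'CW3' into 'CW' and '3'. The `splitAtBoundaries` helper is hypothetical and is not uFuzzy's actual implementation; only the pattern string is taken from the typings quoted above. It also shows why a pattern like '~' that never occurs in the term leaves it intact.

```javascript
// Default pattern as quoted from uFuzzy.d.ts above.
const DEFAULT_INTRA_SPLIT = '[A-Za-z][0-9]|[0-9][A-Za-z]|[a-z][A-Z]';

// Hypothetical helper (NOT part of uFuzzy): cut a term between the two
// characters of every boundary pair that the pattern matches.
function splitAtBoundaries(term, pattern) {
  // Assumption: an empty or null pattern means "do not split".
  if (pattern == null || pattern === '') return [term];

  // A lookahead keeps matches zero-width, so overlapping pairs are all seen.
  const re = new RegExp(`(?=(${pattern}))`, 'g');
  const parts = [];
  let start = 0;
  for (const m of term.matchAll(re)) {
    const cut = m.index + 1; // the boundary sits after the pair's first char
    parts.push(term.slice(start, cut));
    start = cut;
  }
  parts.push(term.slice(start));
  return parts;
}

console.log(splitAtBoundaries('CW3', DEFAULT_INTRA_SPLIT)); // [ 'CW', '3' ]
console.log(splitAtBoundaries('5BQ', DEFAULT_INTRA_SPLIT)); // [ '5', 'BQ' ]
console.log(splitAtBoundaries('CW3', '~'));                 // [ 'CW3' ]
```

Under this model, the '~' workaround and an empty pattern behave the same for postcode-like terms: neither finds a boundary, so the term stays whole.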

it's a good question whether setting interLft and/or interRgt to strict should automatically skip term splitting. i don't think so because the inter/intra terminology is relative to the supplied terms, so they would have to be intraLft and intraRgt.

side note:
in your example i don't think you need to set interLft to strict though. even if it internally splits the term into two, they can still be immediately adjacent in the match. the limit on that adjacency is interIns, so if you set that to 0, it should match cw3, even if internally it's represented as cw 3, though the splitting could have additional undesirable effects on rank order and match strictness of non-postal-code terms.
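
The adjacency argument in the side note can be illustrated with a simplified model. This is only a sketch, not how uFuzzy compiles its matcher: `partsMatch` and `maxGap` are hypothetical names, with `maxGap` standing in for the role interIns plays between internally split parts.

```javascript
// Escape regex metacharacters so parts are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical model (NOT uFuzzy internals): the parts must appear in order,
// with at most `maxGap` arbitrary characters between consecutive parts.
function partsMatch(haystack, parts, maxGap) {
  const gap = `.{0,${maxGap}}`;
  const re = new RegExp(parts.map(escapeRegExp).join(gap), 'i');
  return re.test(haystack);
}

const hay = 'A typical UK post code is CW3 5BQ.';

console.log(partsMatch(hay, ['cw', '3'], 0)); // true: 'CW' and '3' are adjacent
console.log(partsMatch(hay, ['cw', '5'], 0)); // false: no adjacent 'CW5'
console.log(partsMatch(hay, ['cw', '5'], 2)); // true: 'CW' + '3 ' + '5'
```

With a gap of 0 the split parts behave like the unsplit term for matching purposes, which is the point of the side note. Ranking effects of the split are outside what this model covers.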

leeoniya · 2022-10-27T13:57:27Z

f67efb2 should allow intraSplit to be '' or null to prevent term splitting.

it also adds a new intraBound option that's used for the "boosting" aspects of matching any terms as substrings at those case-change and alpha-num boundaries. this way it can be opted out of separately from splitting.

HamptonNorth · 2022-10-27T15:25:12Z

Added intraSplit to my options, set to ''

My UK postcode search all works - thank you
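
For later readers, the resulting options would look roughly like the fragment below. It is a sketch assembled from the values mentioned in this thread (interLft from the original question, intraSplit from the fix); the variable name `opts` is illustrative, and any other options in use are not shown here.

```javascript
// Options as described in this thread; requires a uFuzzy build that
// includes commit f67efb2 for the empty-string intraSplit to be honored.
const opts = {
  interLft: 2,    // strict left boundary, as in the original question
  intraSplit: '', // empty string (or null) disables term splitting
};

// Typical use would be to pass these to the constructor, e.g.:
// const uf = new uFuzzy(opts);
```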

leeoniya added the question label Oct 27, 2022

HamptonNorth closed this as completed Oct 27, 2022

leeoniya mentioned this issue Oct 31, 2022

Add a changelog #10

Closed
