Skip to content

Conversation

ghost
Copy link

@ghost ghost commented Apr 22, 2018

the first change prevents checking the null byte (which is not secure when it occurs inside the string and should be changed anyways).

the second change removes a tautologic check:
at line 1905 len1 is still always <= len2 because the same operations where done to both and at the begin len1 is also definetly <= len2.
Especially this means that if len1 > 0 also len2 > 0 is always true.

the first change prevents checking the null byte (which is not secure when it occurs inside the string and should be changed anyways).

the second change removes a tautologic check:
at line 1905 len1 is still always <= len2 because the same operations where done to both and at the begin len1 is also definetly <= len2.
Especially this means that if `len1 > 0` also `len2 > 0` is always true.
dec(len1)
dec(len2)
# strip common suffix:
while len1 > 0 and len2 > 0 and a[s+len1-1] == b[s+len2-1]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remove this check. Invariant or not, this check makes things easier to read / ensure it's correct / maintainable.

@data-man
Copy link
Contributor

Please, test this:

proc editDistanceNew*(a, b: string): int =
  ## Returns the edit distance between `a` and `b`.
  ##
  ## This uses the `Levenshtein`:idx: distance algorithm with only a linear
  ## memory overhead.  This implementation is highly optimized!
  var 
    aLen = a.len
    bLen = b.len
  if aLen > bLen:
    # make `b` the longer sequence
    return editDistanceNew(b, a)

  # skip common prefix:
  var s = 0
  while s < aLen and a[s] == b[s]:
    inc(s)
    dec(aLen)
    dec(bLen)

  # skip common suffix:
  while aLen > 0 and bLen > 0 and a[s+aLen-1] == b[s+bLen-1]:
    dec(aLen)
    dec(bLen)

  # trivial cases:
  if aLen == 0: return bLen
  if bLen == 0: return aLen

  # another special case:
  if aLen == 1:
    for j in s..s+bLen-1:
      if a[s] == b[j]: return bLen - 1

    return bLen

  inc(aLen)
  inc(bLen)
  var row: seq[int]
  newSeq(row, bLen)

  for i in 0..bLen - 1: row[i] = i

  for i in 1 .. aLen- 1:
    var char1 = a[s + i - 1]
    var prevCost = i - 1
    var newCost = i

    for j in 1 .. bLen - 1:
      var char2 = b[s + j - 1]

      if char1 == char2:
        newCost = prevCost
      else:
        newCost = min(newCost, min(prevCost, row[j])) + 1

      prevCost = row[j]
      row[j] = newCost

  result = row[bLen - 1]

@Araq
Copy link
Member

Araq commented May 6, 2018

This was done in the "access the null terminator is now error" commits.

@Araq Araq closed this May 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants