## Introduction to Minimum Edit Distance
Minimum Edit Distance measures how different two strings are: it is the smallest number of edits (insertions, deletions, or substitutions) needed to turn one string into the other. This is useful in applications like spell checkers, speech recognition, and DNA sequence analysis.

---

## Sentence Segmentation: Delimiters and Punctuation
When processing text, we need to break it into sentences. This is done using delimiters—special characters that indicate sentence boundaries.

### **Unambiguous Delimiters**
Some punctuation marks clearly indicate the end of a sentence:
- **Exclamation mark (!)**
- **Question mark (?)**

**For example:**
- "Where are you going?" (Ends with '?', so it's a sentence)
- "That's amazing!" (Ends with '!', so it's a sentence)

### **Ambiguous Delimiters**
The period ('.') is sometimes tricky because it can be part of abbreviations, names, or numbers.

**Example of ambiguity:**
- "Dr. Smith is here." ('Dr.' is an abbreviation, not the end of a sentence)
- "The U.S.A. is a country." (Periods appear within the name)

A system must distinguish between real sentence boundaries and abbreviations.
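
As a rough illustration of that distinction, here is a minimal rule-based sketch. The `ABBREVIATIONS` set and the `split_sentences` function are hypothetical names made up for this example, and the abbreviation list is far too small for real text. The sketch treats '!' and '?' as boundaries every time, and treats '.' as a boundary only when the token is not a known abbreviation.

```python
# Hypothetical abbreviation list for illustration; a real system needs a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "u.s.a."}

def split_sentences(text):
    """Split on '!' and '?' always, and on '.' unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(("!", "?")):
            is_boundary = True  # unambiguous delimiters
        elif token.endswith("."):
            is_boundary = token.lower() not in ABBREVIATIONS  # ambiguous delimiter
        else:
            is_boundary = False
        if is_boundary:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final delimiter
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Dr. Smith is here. The U.S.A. is a country. Where are you going?"))
# ['Dr. Smith is here.', 'The U.S.A. is a country.', 'Where are you going?']
```

One known weakness of this approach: an abbreviation that happens to end a sentence (for example "I live in the U.S.A.") will not be detected as a boundary.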

---

## Coreference Resolution
Coreference occurs when multiple expressions in a text refer to the same entity.

**Example:** "Stanford Arizona Cactus Garden" and "Stanford University Arizona Cactus Garden" might refer to the same place.

AI models need to understand that these names might represent the same entity in a given context.
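
To make the idea concrete, here is a deliberately simple sketch of a string-overlap heuristic. The function `may_corefer` is a hypothetical name for this example, and the rule it applies (one mention's words are all contained in the other) is an assumption chosen for illustration. Real coreference systems rely on context, not just word overlap.

```python
def may_corefer(mention_a, mention_b):
    """Return True if every word of one mention appears in the other (case-insensitive)."""
    words_a = set(mention_a.lower().split())
    words_b = set(mention_b.lower().split())
    return words_a <= words_b or words_b <= words_a

print(may_corefer("Stanford Arizona Cactus Garden",
                  "Stanford University Arizona Cactus Garden"))  # True
print(may_corefer("Stanford Arizona Cactus Garden",
                  "Arizona State University"))  # False
```

A heuristic like this only flags candidates; it says the names *might* match, which is all the example above claims.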

---

## Minimum Edit Distance and String Similarity
The Minimum Edit Distance compares two strings by finding the cheapest sequence of edits (insertions, deletions, substitutions) that transforms one string into the other.

### **Levenshtein Distance**
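A common way to measure Minimum Edit Distance is the Levenshtein Distance. The variant used here assigns these costs:
- Inserting a character costs **1**
- Deleting a character costs **1**
- Substituting one character for another costs **2**, the same as a deletion followed by an insertion (unless the characters are the same, in which case it costs **0**)

Note that the classic Levenshtein Distance charges **1** for a substitution; the cost of 2 is a widely used alternative weighting.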
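
The costs above can be written as a recurrence. The notation is an assumption introduced for this sketch: $x$ is the source string, $y$ is the target string, $D(i, j)$ is the distance between the first $i$ characters of $x$ and the first $j$ characters of $y$, and $c$ is the substitution cost.

$$
D(i, 0) = i, \qquad D(0, j) = j
$$

$$
D(i, j) = \min
\begin{cases}
D(i-1, j) + 1 & \text{(deletion)} \\
D(i, j-1) + 1 & \text{(insertion)} \\
D(i-1, j-1) + c(x_i, y_j) & \text{(substitution or match)}
\end{cases}
\qquad
c(x_i, y_j) =
\begin{cases}
0 & \text{if } x_i = y_j \\
2 & \text{otherwise}
\end{cases}
$$

The final answer is $D(m, n)$, where $m$ and $n$ are the lengths of the two strings.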

**Example: Changing 'INTENTION' to 'EXECUTION'**

We can transform "INTENTION" into "EXECUTION" using the following steps:

- Delete 'I' (cost **1**)
- Substitute 'N' → 'E' (cost **2**)
- Substitute 'T' → 'X' (cost **2**)
- Insert 'C' after the second 'E' (cost **1**)
- Substitute 'N' → 'U' (cost **2**)

The remaining letters ('E' and 'TION') already match and cost **0**.

**Total edit distance = 1 + 2 + 2 + 1 + 2 = 8**
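
A step list like this can also be recovered automatically. The sketch below fills the dynamic-programming table and then walks back from the final cell to collect one minimum-cost sequence of operations. The function name `edit_script` and its `sub_cost` parameter are hypothetical, introduced only for this example. Several different sequences can share the same minimum cost, so the one it prints is just one valid answer.

```python
def edit_script(source, target, sub_cost=2):
    """Return (distance, operations) for one minimum-cost way to turn source into target."""
    m, n = len(source), len(target)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete every character of the source prefix
    for j in range(n + 1):
        dp[0][j] = j  # insert every character of the target prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = 0 if source[i - 1] == target[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + diag)

    # Walk back from the bottom-right cell, recording which move produced each value.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = 0 if source[i - 1] == target[j - 1] else sub_cost
            if dp[i][j] == dp[i - 1][j - 1] + diag:
                if diag:  # matches cost nothing and are not listed
                    ops.append((f"substitute {source[i - 1]} -> {target[j - 1]}", sub_cost))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append((f"delete {source[i - 1]}", 1))
            i -= 1
        else:
            ops.append((f"insert {target[j - 1]}", 1))
            j -= 1
    ops.reverse()
    return dp[m][n], ops

distance, operations = edit_script("INTENTION", "EXECUTION")
print(distance)  # 8
for description, cost in operations:
    print(f"{description} (cost {cost})")
```

Whatever sequence is printed, the individual costs always add up to the reported distance.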

---

```python
# Example: Computing Minimum Edit Distance (Levenshtein Distance) in Python
import numpy as np

def min_edit_distance(str1, str2):
    m, n = len(str1), len(str2)
    # dp[i][j] = cost of turning the first i characters of str1 into the first j of str2
    dp = np.zeros((m + 1, n + 1), dtype=int)
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j  # Insert all j characters of str2
            elif j == 0:
                dp[i][j] = i  # Delete all i characters of str1
            elif str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]  # Characters match: no cost
            else:
                dp[i][j] = min(dp[i-1][j] + 1,    # Deletion
                               dp[i][j-1] + 1,    # Insertion
                               dp[i-1][j-1] + 2)  # Substitution
    return int(dp[m][n])  # Convert from a NumPy integer to a plain int

# Example usage:
print(min_edit_distance('INTENTION', 'EXECUTION'))  # Output: 8
```


## Applications of Minimum Edit Distance
- **Spell Checkers** – Suggesting corrections for misspelled words (e.g., "recieve" → "receive").
- **Speech Recognition** – Scoring a recognized transcript against a reference transcript, for example via word error rate.
- **DNA Sequence Analysis** – Identifying similarities in genetic sequences.
- **Plagiarism Detection** – Checking similarity between texts.
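
As a small illustration of the spell-checker case, the sketch below picks the closest word from a tiny word list. The `VOCABULARY` list and the `suggest` function are hypothetical and exist only for this example. The distance function uses the same costs as above (insertion 1, deletion 1, substitution 2) but keeps only two rows of the table at a time.

```python
def edit_distance(a, b, sub_cost=2):
    """Minimum edit distance using two rolling rows instead of a full table."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (0 if ch_a == ch_b else sub_cost)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, diag))
        prev = curr
    return prev[-1]

# Hypothetical mini-dictionary for illustration only.
VOCABULARY = ["recipe", "receive", "deceive", "believe"]

def suggest(word, vocabulary=VOCABULARY):
    """Return the vocabulary word with the smallest edit distance to `word`."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))

print(suggest("recieve"))  # receive
```

A practical spell checker would also weigh word frequency and keyboard layout, and would avoid comparing against every dictionary entry.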

---

## Conclusion
Minimum Edit Distance is a powerful tool in Natural Language Processing (NLP) and computational linguistics. By counting the insertions, deletions, and substitutions that separate two strings, systems can improve text processing tasks like spell-checking, speech-to-text evaluation, and more.

Which application do you find most useful? Let’s discuss! 🚀