# 🔤 Autocorrector Feature Using NLP in Python
This Jupyter Notebook demonstrates how to implement an autocorrector using Natural Language Processing (NLP).

## 📌 Overview
The notebook follows these key steps:
- **Data Preprocessing**: Tokenization, text cleaning, and frequency analysis.
- **Error Correction**: Implementing spelling correction techniques.
- **Evaluation**: Testing and improving the model.

## 🛠 Requirements
- Python 3
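- NLTK (used here for the WordNet lemmatizer); TextBlob or SymSpell offer ready-made spelling correction as alternatives
- Jupyter Notebook
- A plain-text corpus file named `final.txt` in the working directory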


## 📂 Importing Required Libraries
Let's start by importing the necessary libraries for NLP processing and autocorrection.

In [1]:
import nltk

## 📊 Data Preprocessing
In this section, we clean and preprocess the text data. This includes:
- Removing symbols and punctuation (only runs of word characters are kept)
- Converting text to lowercase
- Tokenization and frequency analysis
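A minimal sketch of these steps on a short inline string. The sample text is made up for illustration; the notebook itself reads its corpus from `final.txt`.

```python
import re
from collections import Counter

# Illustrative only: made-up sample text standing in for the corpus file
text = "The cat sat. THE cat, the mat!"

# Lowercase, then keep only runs of word characters (punctuation is dropped)
tokens = re.findall(r"\w+", text.lower())

# Frequency analysis: count every occurrence, not just the unique words
counts = Counter(tokens)
vocab = set(tokens)

print(tokens)               # ['the', 'cat', 'sat', 'the', 'cat', 'the', 'mat']
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]
```

Counting over `tokens` rather than `vocab` matters: a set holds each word once, so every count taken from it would be 1.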


In [2]:

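import re

# Read the corpus and lowercase it so that "The" and "the" count as one word
with open('final.txt', 'r', encoding="utf8") as f:
    file_name_data = f.read().lower()

# Tokenize: keep runs of word characters, dropping punctuation and whitespace
w = re.findall(r'\w+', file_name_data)

# Vocabulary of unique words (w keeps every occurrence, for counting)
main_set = set(w)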


## 🔍 Implementing the Autocorrection Algorithm
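Common approaches to detecting and correcting misspelled words include:
- **Edit Distance** (Levenshtein distance, or Damerau-Levenshtein distance when swapping two adjacent letters counts as a single edit)
- **Probability-based corrections** (using word frequencies)
- **Pre-trained models** for contextual spelling correction

This notebook combines the first two approaches: it generates every string within one or two edits of the input word, keeps those that appear in the corpus vocabulary, and ranks them by corpus frequency.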
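Edit distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. A minimal dynamic-programming sketch of plain Levenshtein distance; the function name `levenshtein` is illustrative, and unlike the `Switch_` step used later it does not count an adjacent swap as one edit:

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    # At the start of row i, prev[j] is the distance from source[:i-1] to target[:j]
    prev = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to the empty string
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(
                prev[j] + 1,         # delete s_ch
                curr[j - 1] + 1,     # insert t_ch
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("teh", "the"))         # 2 (a swap costs two substitutions here)
```

Rather than computing this distance against every vocabulary word, the notebook generates all strings within one or two edits of the input and intersects that set with the vocabulary, which is cheaper for short words and a large vocabulary.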

In [3]:

def counting_words(words):
    """Count how often each word occurs (pass the token list, not a set)."""
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count


## 🏆 Evaluating the Model
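The cells that follow compute word probabilities, define the four edit operations (delete, switch, replace, insert), and combine them in `get_corrections`. To test the autocorrector, the final cell prompts for a word and prints its suggested corrections; measuring accuracy would mean running it over sample misspellings and counting how often the top suggestion is the intended word. Further improvements can be made using deep learning-based language models such as BERT or GPT.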
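A minimal sketch of such an accuracy check. The function `top1_accuracy`, the stand-in corrector, and the test pairs are all hypothetical and not part of the notebook:

```python
from typing import Callable, Iterable, List, Tuple

def top1_accuracy(correct: Callable[[str], List[str]],
                  pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (typo, expected) pairs whose first suggestion is expected."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = 0
    for typo, expected in pairs:
        suggestions = correct(typo)  # assumed to be ranked, best first
        if suggestions and suggestions[0] == expected:
            hits += 1
    return hits / len(pairs)

# Hypothetical stand-in corrector with hard-coded, ranked suggestions
toy_suggestions = {"teh": ["the"], "recieve": ["relieve", "receive"]}
demo_pairs = [("teh", "the"), ("recieve", "receive"), ("wrod", "word")]

score = top1_accuracy(lambda typo: toy_suggestions.get(typo, []), demo_pairs)
print(f"top-1 accuracy: {score:.2f}")  # top-1 accuracy: 0.33
```

To score the notebook's corrector instead, `correct` could be a wrapper such as `lambda typo: [s for s, _ in get_corrections(typo, probs, main_set)]`, with pairs taken from a list of known misspellings (none is included here).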

In [4]:

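def prob_cal(word_count_dict):
    """Convert raw word counts into relative frequencies that sum to 1."""
    probs = {}
    m = sum(word_count_dict.values())  # total number of counted tokens
    for key in word_count_dict.keys():
        probs[key] = word_count_dict[key] / m
    return probs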
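
For reference, `prob_cal` computes each word's relative frequency. Writing $C(w)$ for the count of word $w$, $V$ for the vocabulary, and $M$ for the total number of counted tokens:

$$P(w) = \frac{C(w)}{M}, \qquad M = \sum_{v \in V} C(v)$$

With made-up counts `the`: 3, `cat`: 2, `sat`: 1, `mat`: 1, we get $M = 7$, $P(\text{the}) = 3/7 \approx 0.43$ and $P(\text{sat}) = 1/7 \approx 0.14$. If the counts are taken over the deduplicated vocabulary instead of the token list, every $C(w) = 1$ and each probability collapses to $1/|V|$, so the ranking no longer reflects how common a word is.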


In [5]:

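import nltk
from nltk.stem import WordNetLemmatizer

# Download the WordNet data the lemmatizer needs (reported as up-to-date if already present)
nltk.download("wordnet")
nltk.download("omw-1.4")

lemmatizer = WordNetLemmatizer()

def LemmWord(word):
    """Split `word` on whitespace and return the lemma of each token.

    Defined for completeness; the corrector below does not call it.
    """
    return [lemmatizer.lemmatize(wd) for wd in word.split()]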






In [6]:

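def DeleteLetter(word):
    """Return every string formed by deleting one letter from `word`."""
    delete_list = []
    split_list = []

    # Split at each position, e.g. 'cat' -> ('', 'cat'), ('c', 'at'), ('ca', 't')
    for i in range(len(word)):
        split_list.append((word[0:i], word[i:]))

    # Drop the first letter of the right-hand part
    for a, b in split_list:
        delete_list.append(a + b[1:])
    return delete_list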


In [7]:

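def Switch_(word):
    """Return every string formed by swapping two adjacent letters of `word`."""
    split_list = []

    for i in range(len(word)):
        split_list.append((word[0:i], word[i:]))

    # Swap the first two letters of the right-hand part (needs at least two)
    switch_l = [a + b[1] + b[0] + b[2:] for a, b in split_list if len(b) >= 2]
    return switch_l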


In [8]:
def Replace_(word):
    """Return every string formed by replacing one letter of `word` with a-z."""
    split_l = []

    for i in range(len(word)):
        split_l.append((word[0:i], word[i:]))
    alphs = 'abcdefghijklmnopqrstuvwxyz'
    # b is never empty here, and b[1:] is '' when b has one letter
    replace_list = [a + l + b[1:] for a, b in split_l for l in alphs]
    return replace_list


In [9]:
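def insert_(word):
    """Return every string formed by inserting one letter a-z into `word`."""
    split_l = []

    # len(word) + 1 splits, so a letter can also be appended at the end
    for i in range(len(word) + 1):
        split_l.append((word[0:i], word[i:]))

    alphs = 'abcdefghijklmnopqrstuvwxyz'
    insert_list = [a + l + b for a, b in split_l for l in alphs]
    return insert_list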
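
For a word of length n, these four generators return lists of predictable length before duplicates are removed: n deletions, n-1 switches, 26n replacements, and 26(n+1) insertions. A self-contained sketch that mirrors the same split-based construction and checks those counts; the function name `edit_lists` is hypothetical:

```python
import string

LETTERS = string.ascii_lowercase

def edit_lists(word: str) -> dict:
    """Raw (non-deduplicated) single-edit lists for `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return {
        "deletes": [a + b[1:] for a, b in splits if b],
        "transposes": [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1],
        "replaces": [a + c + b[1:] for a, b in splits if b for c in LETTERS],
        "inserts": [a + c + b for a, b in splits for c in LETTERS],
    }

lists = edit_lists("cat")
print({name: len(items) for name, items in lists.items()})
# {'deletes': 3, 'transposes': 2, 'replaces': 78, 'inserts': 104}
```

The lists contain duplicates (replacing a letter with itself reproduces the original word three times for `cat`), which is one reason the notebook collects them into a set in `colab_1`.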


In [10]:

def colab_1(word, allow_switches=True):
    """Return the set of all strings one edit away from `word`."""
    edits = set()
    edits.update(DeleteLetter(word))
    if allow_switches:
        edits.update(Switch_(word))
    edits.update(Replace_(word))
    edits.update(insert_(word))
    return edits

def colab_2(word, allow_switches=True):
    """Return the set of all strings reachable by applying colab_1 twice."""
    edits = set()
    edit_one = colab_1(word, allow_switches=allow_switches)
    for candidate in edit_one:
        if candidate:  # skip the empty string left by deleting a one-letter word
            edits.update(colab_1(candidate, allow_switches=allow_switches))
    return edits
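
`colab_2` applies `colab_1` to every single-edit candidate, so it reaches words up to two edits away. A self-contained check of that idea; the helpers `edits1`/`edits2` and the misspelling `korrectud` are illustrative, not from the notebook:

```python
import string

LETTERS = string.ascii_lowercase

def edits1(word: str) -> set:
    """All strings one delete, adjacent swap, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def edits2(word: str) -> set:
    """All strings reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

typo = "korrectud"  # two substitutions (k->c, u->e) away from "corrected"
print("corrected" in edits1(typo))  # False
print("corrected" in edits2(typo))  # True
```

The two-edit set grows roughly with the square of the one-edit set, which is why the notebook only falls back to `colab_2` when no known word is one edit away.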


In [11]:

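def get_corrections(word, probs, vocab, n=2):
    """Return up to n [word, probability] pairs, most probable first.

    Candidates are tried in order: the word itself if it is in the
    vocabulary, else known words one edit away, else two edits away.
    """
    if word in vocab:
        suggested_word = {word}
    else:
        suggested_word = (colab_1(word).intersection(vocab)
                          or colab_2(word).intersection(vocab))

    # Rank by corpus probability instead of arbitrary set order
    ranked = sorted(suggested_word, key=lambda s: probs[s], reverse=True)
    best_suggestion = [[s, probs[s]] for s in ranked[:n]]
    return best_suggestion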
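
The same candidate priority (known word, else known words one edit away, else two edits away) with ranking by probability can be sketched end to end on a tiny frequency table. The counts are made up, and `suggest` and `known` are hypothetical names, not the notebook's API:

```python
import string
from collections import Counter

LETTERS = string.ascii_lowercase

def edits1(word):
    """Strings one delete, adjacent swap, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return set(
        [a + b[1:] for a, b in splits if b]
        + [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        + [a + c + b[1:] for a, b in splits if b for c in LETTERS]
        + [a + c + b for a, b in splits for c in LETTERS]
    )

def suggest(word, counts, n=3):
    """Return up to n (word, probability) pairs, most probable first."""
    total = sum(counts.values())
    vocab = counts.keys()

    def known(words):
        return {w for w in words if w in vocab}

    one_away = edits1(word)
    candidates = (
        known([word])
        or known(one_away)
        or known(e2 for e1 in one_away for e2 in edits1(e1))
    )
    # Sorting by count gives the same order as sorting by probability
    ranked = sorted(candidates, key=lambda w: counts[w], reverse=True)
    return [(w, counts[w] / total) for w in ranked[:n]]

# Illustrative only: made-up frequencies standing in for corpus counts
counts = Counter({"the": 5, "hello": 3, "then": 2, "than": 1})
print([w for w, _ in suggest("thn", counts)])  # ['the', 'then', 'than']
print([w for w, _ in suggest("teh", counts)])  # ['the']
```

Returning an empty list when nothing within two edits is known leaves the decision (keep the word, flag it, or try a larger edit distance) to the caller.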


In [12]:

# Normalize the input like the corpus: strip surrounding spaces and lowercase
my_word = input("Enter a word: ").strip().lower()

# Count over the token list w; counting the set main_set would give every word a count of 1
word_count = counting_words(w)
probs = prob_cal(word_count)

# Print the suggested corrections
for suggestion, prob in get_corrections(my_word, probs, main_set, n=3):
    print(suggestion)


marry
marrie


## 📝 Conclusion & Future Enhancements
This project successfully demonstrates a basic NLP-based autocorrector. Future improvements can include:
- Enhancing accuracy with larger text corpora
- Implementing deep learning models for contextual corrections
- Integrating the feature into real-world applications like chatbots