In this study, we build a pipeline that automatically corrects errors in a text: missing or wrong letters, misspelled words, missing punctuation, and grammatical mistakes.



*   **Punctuation Correction:** To correct punctuation marks.
*   **Grammar Correction:** To correct grammatical mistakes.
*   **Spell Checker:** To correct misspelled words.



In [1]:
!pip install happytransformer 
!pip install textblob
!pip install sentencepiece
!pip install transformers



In [2]:
from happytransformer import HappyTextToText
from happytransformer import TTSettings
from nltk.tokenize import sent_tokenize, word_tokenize
from textblob import TextBlob
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
import re
import string


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\mirza\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\mirza\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


#  **PUNCTUATION CORRECTION**

*   Our goal here is to correct punctuation errors in the text.

### Process:

1.   To correct as much as possible in the input text, split the text into **word-level tokens** (tokenize).
2.   Load the tokenizer and the model, then build a `'ner'` (token classification) pipeline that labels each token with the punctuation mark that should follow it.
3.   Finally, append the predicted punctuation mark to each token and join the tokens to obtain the final version of the text.



In [3]:
tokenizer = AutoTokenizer.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large") # Tokenizer for the punctuation model

In [4]:
model = AutoModelForTokenClassification.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large") # Token classification model

- The tokenizer and the model above both come from Hugging Face (an open-source AI community).
- For more information about the model, see: https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large

In [5]:
# 1 and 2

raw_text = "He are moving here I am doing fine How is you? How is go your day? Matt like fish the collection of letters was original used by the ancient Romans"

punctation_model = pipeline('ner', model=model, tokenizer=tokenizer) # Build the token classification pipeline.
processed_text = punctation_model(raw_text) # Run the pipeline; returns one prediction dict per token.
processed_text

[{'entity': '0',
  'score': 0.99992776,
  'index': 1,
  'word': '▁He',
  'start': 0,
  'end': 2},
 {'entity': '0',
  'score': 0.9998388,
  'index': 2,
  'word': '▁are',
  'start': 2,
  'end': 6},
 {'entity': '0',
  'score': 0.9687802,
  'index': 3,
  'word': '▁moving',
  'start': 6,
  'end': 13},
 {'entity': '.',
  'score': 0.69415295,
  'index': 4,
  'word': '▁here',
  'start': 13,
  'end': 18},
 {'entity': '0',
  'score': 0.9999883,
  'index': 5,
  'word': '▁I',
  'start': 18,
  'end': 20},
 {'entity': '0',
  'score': 0.9999676,
  'index': 6,
  'word': '▁am',
  'start': 20,
  'end': 23},
 {'entity': '0',
  'score': 0.99926156,
  'index': 7,
  'word': '▁doing',
  'start': 23,
  'end': 29},
 {'entity': '.',
  'score': 0.8090562,
  'index': 8,
  'word': '▁fine',
  'start': 29,
  'end': 34},
 {'entity': '0',
  'score': 0.9999875,
  'index': 9,
  'word': '▁How',
  'start': 34,
  'end': 38},
 {'entity': '0',
  'score': 0.9999906,
  'index': 10,
  'word': '▁is',
  'start': 38,
  'end': 41},

In [6]:
#3

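ripe_text = '' # Accumulates the punctuated text built in the loop.

for token in processed_text:
  # Every token piece gets its own label, so a word split into several pieces can receive punctuation more than once.
  result = token['word'].replace('▁',' ') + token['entity'].replace('0','') # Turn the '▁' word marker into a space and append the predicted mark ('0' means none).
  ripe_text += result
  print(result)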

 He
 are
 moving
 here.
 I
 am
 doing
 fine.
 How
 is
 you?
??
 How
 is
 go
 your
 day
??
 Matt,
 like
 fish.
 the
 collection
 of
 letters
 was
 original
 used
 by
 the
 an
cient
 Roman.
s.


In [7]:
print(ripe_text)

 He are moving here. I am doing fine. How is you??? How is go your day?? Matt, like fish. the collection of letters was original used by the ancient Roman.s.
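The output above contains artifacts such as `you???` and `Roman.s.`. They appear because some words are split into several token pieces (and punctuation already present in the input becomes a piece of its own), while the loop appends a label after every piece. Below is a minimal sketch of one possible fix, assuming that a piece starting with `▁` begins a new word and that only the label of a word's last piece should be kept. `merge_pieces` is a hypothetical helper, and the sample predictions are hand-written in the pipeline's output format, not real model output.

```python
import string

def merge_pieces(predictions, no_label="0"):
    """Join token pieces into whole words and keep one label per word."""
    words = []  # each entry is [word_text, label_of_last_piece]
    for pred in predictions:
        piece, label = pred["word"], pred["entity"]
        if piece.startswith("▁") or not words:
            # A leading '▁' marks the start of a new word.
            words.append([piece.lstrip("▁"), label])
        else:
            # Continuation piece: extend the current word, keep the latest label.
            words[-1][0] += piece
            words[-1][1] = label
    out = []
    for word, label in words:
        if label != no_label:
            # Drop punctuation already present so the mark is not duplicated.
            word = word.rstrip(string.punctuation) + label
        out.append(word)
    return " ".join(out)

# Hand-written sample in the same shape as the pipeline output (assumption).
sample = [
    {"word": "▁How", "entity": "0"},
    {"word": "▁is", "entity": "0"},
    {"word": "▁you", "entity": "?"},
    {"word": "?", "entity": "?"},
    {"word": "▁the", "entity": "0"},
    {"word": "▁an", "entity": "0"},
    {"word": "cient", "entity": "0"},
    {"word": "▁Roman", "entity": "."},
    {"word": "s", "entity": "."},
]
print(merge_pieces(sample))
```

With the real pipeline, `merge_pieces(processed_text)` would take the place of the loop in cell `In [6]`; whether keeping the last piece's label is the best choice for this model is an assumption worth checking on more inputs.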


# **PUNCTUATION CORRECTION TEST**

In [8]:
raw_text = "while you made the first two of the contract payments as agreed you have refused to make the fincel $3 600 payment"

punctation_model = pipeline('ner', model=model, tokenizer=tokenizer) 
processed_text = punctation_model(raw_text) 

ripe_text = '' 

for i in processed_text:
  result = i['word'].replace('▁',' ') + i['entity'].replace('0','') 
  ripe_text += result
print(ripe_text)

 while you made the first two of the contract payments as agreed, you have refused to make the fincel $3 600 payment.


# **GRAMMAR CORRECTION**

* Our aim here is to correct grammatical mistakes in the text.

### Process:

1. To correct as much as possible in the input text, split the text into sentences (tokenize).
2. Generate a corrected version of each sentence in a `for` loop.
3. Join the corrected sentences back into a single text.

In [9]:
happy_tt = HappyTextToText("T5", "vennify/t5-base-grammar-correction") # Grammar correction model

02/20/2022 15:42:02 - INFO - happytransformer.happy_transformer -   Using model: cpu


In [10]:
raw_text = "He are moving here. I am doing fine. How is you?. How is go your they?. Matt like fish. the collection of letters was original used by the ancient Romans." # The text we will use for prediction
print(raw_text)

He are moving here. I am doing fine. How is you?. How is go your they?. Matt like fish. the collection of letters was original used by the ancient Romans.


In [11]:
#1

text_tokenize = sent_tokenize(raw_text) # Split the text into sentences (tokenize).
# Note: sent_tokenize relies on punctuation to find sentence boundaries.
# If punctuation marks are wrong or missing, the split becomes less accurate.

print(text_tokenize)

['He are moving here.', 'I am doing fine.', 'How is you?.', 'How is go your they?.', 'Matt like fish.', 'the collection of letters was original used by the ancient Romans.']
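To illustrate why sentence splitting depends on punctuation, here is a small sketch that uses a naive regular-expression splitter from the standard library as a stand-in for `sent_tokenize`. This is an assumption made to keep the example self-contained; NLTK's tokenizer is more sophisticated, but it relies on the same kind of boundary cues. `split_sentences` is a hypothetical helper.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

punctuated = "He are moving here. I am doing fine. How is you?"
unpunctuated = "He are moving here I am doing fine How is you"

print(split_sentences(punctuated))    # three separate sentences
print(split_sentences(unpunctuated))  # one long "sentence"
```

Without punctuation the whole text reaches the grammar model as a single sentence, which is why running the punctuation step first is likely to help.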


In [12]:
#2

args = TTSettings(num_beams=5, min_length=1) # Text generation settings: beam search with 5 beams.

processed_text = [] # Collects the corrected sentences produced in the loop.

for sentence in text_tokenize:
  modeled_text = happy_tt.generate_text(sentence, args=args) # Correct one sentence at a time.
  processed_text.append(modeled_text.text)
  print(modeled_text.text)


He is moving here.
I am doing fine.
How are you?
How is it going?
Matt likes fish.
The collection of letters was originally used by the ancient Romans.


In [13]:
#3

ripe_text = " ".join(processed_text) # Join the corrected sentences back into one text.
print(ripe_text)

He is moving here. I am doing fine. How are you? How is it going? Matt likes fish. The collection of letters was originally used by the ancient Romans.
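The per-sentence loop can be wrapped in a small function. The sketch below is a hedged illustration, not the notebook's method: `correct_sentences` and `fake_generate` are hypothetical names, and the model call is injected as a plain function so the example runs without downloading the model. It also assumes the `grammar: ` input prefix that the model card for `vennify/t5-base-grammar-correction` appears to use; the cells above omit it, so verify this against the model card before relying on it.

```python
PREFIX = "grammar: "  # assumed task prefix; check the model card

def correct_sentences(sentences, generate, prefix=PREFIX):
    """Apply `generate` to each prefixed sentence and rejoin the results."""
    corrected = [generate(prefix + s).strip() for s in sentences]
    return " ".join(corrected)

# Toy stand-in for the model call. With happytransformer this could be:
#   lambda s: happy_tt.generate_text(s, args=args).text
def fake_generate(prompt):
    fixes = {PREFIX + "Matt like fish.": "Matt likes fish."}
    text = prompt[len(PREFIX):] if prompt.startswith(PREFIX) else prompt
    return fixes.get(prompt, text)

print(correct_sentences(["I am doing fine.", "Matt like fish."], fake_generate))
```

Injecting the generator keeps the splitting and joining logic testable on its own, separate from the model.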


In [19]:
!pip install wrapt
!pip install anyio
!pip install gradio
import gradio as gr



# **GRAMMAR CORRECTION TEST**

In [14]:
raw_text = "I an Mirza, whose passport is number U12345. I would to litter travel your country between dates the 28.08.2022- - 05.09.2022 for sghtSeing and vacton."

text_tokenize = sent_tokenize(raw_text)

args = TTSettings(num_beams=5, min_length=1)

processed_text= []

for sentence in text_tokenize:
  modeled_text = happy_tt.generate_text(sentence, args=args)
  processed_text.append(modeled_text.text)

ripe_text = " ".join(processed_text)
print(ripe_text)

I am Mirza, whose passport number is U12345. I would travel your country between the 28.08.2022- - 05.09.2022 for sightseeing and vacation.


# **SPELL CHECKER**
* Our goal here is to correct spelling errors in the text.

### Process:

1. To correct as much as possible in the input text, remove the punctuation marks and split the text into words (tokenize).
2. Apply the spell-checking model to each word of the separated text.
3. Join the corrected words back together.



In [15]:
# 1 and 2

unpunc_model = re.compile("["+ re.escape(string.punctuation) + "]") # Regex that matches any single punctuation character.

unpunc_text = unpunc_model.sub("", raw_text) # Remove the punctuation marks from the input text.

tokenize_text = word_tokenize(unpunc_text) # Split the punctuation-free text into words.

In [16]:
tokenize_text

['I',
 'an',
 'Mirza',
 'whose',
 'passport',
 'is',
 'number',
 'U12345',
 'I',
 'would',
 'to',
 'litter',
 'travel',
 'your',
 'country',
 'between',
 'dates',
 'the',
 '28082022',
 '05092022',
 'for',
 'sghtSeing',
 'and',
 'vacton']
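As the token list above shows, deleting every punctuation character also merges dates such as `28.08.2022` into `28082022`. The sketch below reproduces that effect with the same regular expression and contrasts it with one possible alternative that only strips punctuation from the edges of each token. Both helper names are hypothetical, and `str.split` is used in place of `word_tokenize` (an assumption made to keep the example free of downloads).

```python
import re
import string

PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")

def strip_and_split(text):
    """Delete every ASCII punctuation character, then split on whitespace."""
    return PUNCT_RE.sub("", text).split()

def split_and_strip_edges(text):
    """Split on whitespace, then strip punctuation only at token edges."""
    tokens = (t.strip(string.punctuation) for t in text.split())
    return [t for t in tokens if t]  # drop tokens that were pure punctuation

sample = "dates the 28.08.2022- - 05.09.2022 for sghtSeing"
print(strip_and_split(sample))        # dates lose their separators
print(split_and_strip_edges(sample))  # dates keep their inner dots
```

Which variant is preferable depends on whether the later steps need the original dates and numbers intact.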

In [17]:
# 3

# Note: this new raw_text is not tokenized in this cell. The loop below still uses
# tokenize_text from In [15], so the printed result belongs to the previous text.
raw_text = "I am doig fine. How is you?. How is go your thy?. Matt like fsh. the colection of leters was original used by the ancient Romans."

processed_text = [] # Collects the corrected words produced in the loop.

for token in tokenize_text:
  modeled_text = TextBlob(token) # Wrap the word in a TextBlob object.
  correct_word = modeled_text.correct() # Apply TextBlob's spelling correction.
  processed_text.append(correct_word) # Store the corrected word.


ripe_text = " ".join(map(str, processed_text)) # Join the corrected words into one text.
print(ripe_text)

I an Erza whose passport is number U12345 I would to litter travel your country between dates the 28082022 05092022 for sghtSeing and action
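In the output above the name `Mirza` was changed to `Erza`, which suggests that proper nouns and identifiers should not be sent to the spell checker. A minimal sketch of such a guard follows, assuming that tokens containing digits or starting with an uppercase letter should be left untouched. `should_skip`, `correct_tokens`, and the toy corrector are hypothetical; the toy dictionary only imitates a corrector and does not reproduce TextBlob's behavior.

```python
def should_skip(token):
    """Heuristic: leave identifiers and likely proper nouns untouched."""
    has_digit = any(ch.isdigit() for ch in token)
    return has_digit or token[:1].isupper()

def correct_tokens(tokens, correct_word):
    """Correct each token unless the heuristic says to skip it."""
    return [t if should_skip(t) else correct_word(t) for t in tokens]

# Toy corrector. With TextBlob this could be:
#   lambda t: str(TextBlob(t).correct())
toy_dictionary = {"Mirza": "Erza", "vacton": "vacation"}

def toy_correct(token):
    return toy_dictionary.get(token, token)

tokens = ["Mirza", "U12345", "28082022", "for", "vacton"]
print(correct_tokens(tokens, toy_correct))
```

The trade-off is that a misspelled word at the start of a sentence is also skipped because it is capitalized, so this heuristic is a starting point rather than a complete solution.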


# **SPELL CHECKER TEST**

In [18]:
# Note: raw_text is tokenized here before it is reassigned below,
# so this cell spell-checks the text assigned in In [17].
unpunc_model = re.compile("["+ re.escape(string.punctuation) + "]") 

unpunc_text = unpunc_model.sub("", raw_text) 

tokenize_text = word_tokenize(unpunc_text)

raw_text = "while you made the first two of the contract payments as agreed you have refused to make the fincel $3 600 payment"

processed_text = [] 

for i in range(0, len(tokenize_text)):
  modeled_text= TextBlob(tokenize_text[i]) 
  correct_word= modeled_text.correct() 
  processed_text.append(correct_word) 


ripe_text= " ".join(map(str, processed_text)) 

print(ripe_text)

I am doing fine Now is you Now is go your thy Watt like fish the collection of letters was original used by the ancient Romans
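In the two spell-checker cells above, `raw_text` is tokenized before it is reassigned, so each printed result belongs to the text from the previous cell. One way to avoid this kind of stale state is to pass the text explicitly through functions. The sketch below shows the idea with a hypothetical `run_pipeline` helper and two toy stages; in the notebook, the stages would be functions wrapping the punctuation, grammar, and spelling steps, and their order is an assumption left to the reader.

```python
def run_pipeline(text, stages):
    """Feed `text` through each stage in order; every stage maps str -> str."""
    for stage in stages:
        text = stage(text)
    return text

# Toy stages used only to demonstrate the chaining.
def add_final_period(text):
    return text if text.endswith((".", "?", "!")) else text + "."

def capitalize_first(text):
    return text[:1].upper() + text[1:]

print(run_pipeline("matt likes fish", [add_final_period, capitalize_first]))
```

Because each stage receives its input as an argument, the result no longer depends on the order in which notebook cells were executed.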
