## **Installing Packages**

In [1]:
!pip install textblob
!pip install sentencepiece  
!pip install transformers
!pip install language-tool-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.97
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.23.1-py3-none-any.whl (5.3 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.1-cp37-cp37m-manylinux_2_17_x86_64.man

Restart the runtime after installing the packages.

## **Importing Packages**

In [46]:

import re
import nltk 
import spacy
import numpy as np
import pandas as pd
import seaborn as sn 
from textblob import Word
import matplotlib.pyplot as plt

from nltk.tag import pos_tag
from nltk.corpus import wordnet
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('stopwords')
nlp = spacy.load("en_core_web_sm")

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# **Reading Data From Google Drive**

In [3]:
from google.colab import drive
drive.mount('/content/drive')
Data_Essay_01 = pd.read_csv("/content/drive/MyDrive/IntelliTech-DataSet/EssaySet01.csv")
Data_Essay_01.head()

Mounted at /content/drive


Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score
0,1,"Dear local newspaper, I think effects computer...",4.0,4.0,8.0
1,2,"Dear @CAPS1 @CAPS2, I believe that using compu...",5.0,4.0,9.0
2,3,"Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl...",4.0,3.0,7.0
3,4,"Dear Local Newspaper, @CAPS1 I have found that...",5.0,5.0,10.0
4,5,"Dear @LOCATION1, I know having computers has a...",4.0,4.0,8.0


# **Feature Extraction**

## **Essay Pre Processing**

In [4]:
def Remove_NER(Essay):
  """
    Removes the anonymized named-entity placeholders from an essay,
    i.e. every token that starts with '@' (such as @CAPS1 or @LOCATION1)

    Args:
      Essay: Essay of each student 
    
    Returns: 
      String

  """
  return ' '.join (word for word in Essay.split(' ') if not word.startswith('@'))

def Remove_Punctuations(sentence):
  """
    Replaces every punctuation character in the text with a space
    Args:
      sentence: Essay of each student
    
    Returns: 
      String
  """
  # Raw string, so the backslash is a literal character rather than an escape
  punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
  newSentence = ""
  # Iterating over a string yields one character at a time
  for char in sentence:
      if (char in punctuations):
          newSentence = newSentence + " "
      else: 
          newSentence = newSentence + char
  return newSentence

def LowerCase_Words(Essay):
  """
    Removes digits and lower cases all the words in an essay

    Args:
      Essay: Essay of each student
    
    Returns: 
      String
  """
  return re.sub('[0-9]+','', Essay).lower() 

def Tokenize_Essay(Essay):
    """
      Removes punctuation, tokenizes the essay and joins the tokens
      back together with single spaces

      Args:
        Essay: Essay of each student
      
      Returns: 
        String
    """
    Preprocessed = Remove_Punctuations(Essay)
    return " ".join(word_tokenize(Preprocessed))

def Remove_White_Spaces(Essay):
  """
    Removes Extra White Spaces

    Args:
      Essay: Essay of each student
    
    Returns: 
      String
  """
  return " ".join(Essay.split())

Removing NER placeholders and punctuation, then tokenizing (lower casing is not applied in this cell)

In [5]:
Data_Essay_01['Preprocessed_Essay'] = Data_Essay_01['Essay'].apply(Remove_NER)
Data_Essay_01['Preprocessed_Essay'] = Data_Essay_01['Preprocessed_Essay'].apply(Tokenize_Essay)
Data_Essay_01.head()

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay
0,1,"Dear local newspaper, I think effects computer...",4.0,4.0,8.0,Dear local newspaper I think effects computers...
1,2,"Dear @CAPS1 @CAPS2, I believe that using compu...",5.0,4.0,9.0,Dear I believe that using computers will benef...
2,3,"Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl...",4.0,3.0,7.0,Dear More and more people use computers but no...
3,4,"Dear Local Newspaper, @CAPS1 I have found that...",5.0,5.0,10.0,Dear Local Newspaper I have found that many ex...
4,5,"Dear @LOCATION1, I know having computers has a...",4.0,4.0,8.0,Dear I know having computers has a positive ef...
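To make the two preprocessing steps concrete, the sketch below applies the same logic to one made-up sentence. It uses only the standard library: `strip_placeholders` and `strip_punctuation` are hypothetical stand-ins for `Remove_NER` and `Remove_Punctuations`, a whitespace split stands in for `word_tokenize`, and `string.punctuation` is a slightly larger character set than the list used in the notebook.

```python
import string

def strip_placeholders(essay):
    # Drop whitespace-separated tokens that start with '@' (e.g. "@CAPS1", "@CAPS2,")
    return " ".join(w for w in essay.split(" ") if not w.startswith("@"))

def strip_punctuation(essay):
    # Replace every ASCII punctuation character with a space
    table = str.maketrans({ch: " " for ch in string.punctuation})
    return essay.translate(table)

sample = "Dear @CAPS1 @CAPS2, I believe computers help us!"
# Collapse the leftover runs of spaces by splitting and re-joining
cleaned = " ".join(strip_punctuation(strip_placeholders(sample)).split())
print(cleaned)
```

Note that a placeholder with trailing punctuation (`@CAPS2,`) is removed together with its comma, which matches the second row of the table above.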


## **Basic Count Features**

#### 1. Counting Sentences per Essay

In [6]:
def Sentence_Count(Essay):
    """
      Counts sentences in an essay

      Args:
        Essay: Essay of each student 

      Returns: 
        int
    """
    sentence_no = nltk.sent_tokenize(Essay)
    return len(sentence_no)
  
Data_Essay_01['Sent_Count'] = Data_Essay_01['Essay'].apply(Sentence_Count)
Data_Essay_01

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay,Sent_Count
0,1,"Dear local newspaper, I think effects computer...",4.0,4.0,8.0,Dear local newspaper I think effects computers...,16
1,2,"Dear @CAPS1 @CAPS2, I believe that using compu...",5.0,4.0,9.0,Dear I believe that using computers will benef...,20
2,3,"Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl...",4.0,3.0,7.0,Dear More and more people use computers but no...,14
3,4,"Dear Local Newspaper, @CAPS1 I have found that...",5.0,5.0,10.0,Dear Local Newspaper I have found that many ex...,27
4,5,"Dear @LOCATION1, I know having computers has a...",4.0,4.0,8.0,Dear I know having computers has a positive ef...,30
...,...,...,...,...,...,...,...
1778,1783,"Dear @CAPS1, @CAPS2 several reasons on way I t...",4.0,4.0,8.0,Dear several reasons on way I that advances in...,21
1779,1784,Do a adults and kids spend to much time on the...,3.0,4.0,7.0,Do a adults and kids spend to much time on the...,18
1780,1785,My opinion is that people should have computer...,4.0,4.0,8.0,My opinion is that people should have computer...,18
1781,1786,"Dear readers, I think that its good and bad to...",1.0,1.0,2.0,Dear readers I think that its good and bad to ...,1


#### 2. Counting Words per Essay

**Observation:** These word counts are higher than the original counts because of NLTK tokenization: punctuation marks are treated as separate words.


In [7]:
def Word_Count(Essay):
  """
    Counts words in an essay

    Args:
      Essay: Essay of each student 
    
    Returns: 
      int
      
  """
  #cleaned_essay = re.sub('[^a-zA-Z]','',essay) 
  word_no = nltk.word_tokenize(Essay)
  return len(word_no)
 
Data_Essay_01['Word_Count'] = Data_Essay_01['Essay'].apply(Word_Count)
Data_Essay_01.sample()

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay,Sent_Count,Word_Count
630,633,"Dear local newspaper, Computers have a good ef...",4.0,4.0,8.0,Dear local newspaper Computers have a good eff...,22,436
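The observation above can be checked on a small example. The sketch below rests on an assumption: a regular expression that separates words from punctuation marks approximates `nltk.word_tokenize` (it does not reproduce NLTK's handling of contractions), and `count_tokens` is a hypothetical helper, not part of the notebook.

```python
import re

def count_tokens(essay):
    # Runs of letters, digits and apostrophes are words; any other
    # non-space character becomes a token of its own
    tokens = re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", essay)
    # Keep only tokens that contain at least one letter or digit
    words_only = [t for t in tokens if any(c.isalnum() for c in t)]
    return len(tokens), len(words_only)

with_punct, without_punct = count_tokens(
    "Dear editor, computers help us learn. Don't you agree?"
)
print(with_punct, without_punct)
```

Filtering out the punctuation tokens before calling `len` would bring `Word_Count` closer to a conventional word count.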


#### 3. Counting Characters per Essay

In [8]:
def Char_Count(Essay):
  """
    Counts characters in an essay

    Args:
      Essay: Essay of each student 
    
    Returns: 
      int
      
  """
  #cleaned_essay = re.sub('[^a-zA-Z]',' ',Essay) 
  # len() on a string counts every character, including spaces and punctuation
  return len(Essay)

Data_Essay_01['Char_Count'] = Data_Essay_01['Essay'].apply(Char_Count)
Data_Essay_01.sample()

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay,Sent_Count,Word_Count,Char_Count
186,187,"Dear,newspaper I think that people should have...",3.0,2.0,5.0,Dear newspaper I think that people should have...,3,116,582


#### 4. Average Word Length of Essay

In [9]:
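def Avg_Word_Count(Essay):
  """
    Calculates the average word length (characters per token) of an essay.
    Punctuation tokens produced by nltk.word_tokenize are included.

    Args:
      Essay: Essay of each student 
    
    Returns: 
      float
      
  """
  word_list = nltk.word_tokenize(Essay)
  if not word_list:
    return 0.0  # empty essay: avoid division by zero
  return sum(map(len, word_list)) / len(word_list)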

Data_Essay_01['Avg_Word_Count'] = Data_Essay_01['Essay'].apply(Avg_Word_Count)
Data_Essay_01.sample()

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay,Sent_Count,Word_Count,Char_Count,Avg_Word_Count
203,204,You want my opinion on the effects computers h...,5.0,5.0,10.0,You want my opinion on the effects computers h...,33,433,2078,3.951501
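Because the tokenizer emits punctuation marks as tokens, they enter the average as one-character words and pull it down. A minimal sketch of the effect, using a hand-written token list (an assumption, not actual NLTK output) and a hypothetical helper `average_length`:

```python
def average_length(tokens):
    # Mean number of characters per token; 0.0 for an empty list
    # avoids a ZeroDivisionError
    return sum(map(len, tokens)) / len(tokens) if tokens else 0.0

tokens = ["Dear", "editor", ",", "computers", "help", "."]
words = [t for t in tokens if t.isalpha()]  # drop the punctuation tokens

print(average_length(tokens))  # punctuation counted as one-character words
print(average_length(words))   # punctuation excluded
```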


## **Parts Of Speech Counts**

In [10]:
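def Pos_Tag_Count(Essay):
  """
    Counts Parts of Speech in an Essay using spaCy's coarse-grained tags

    Args:
      Essay: Essay of each student 
    
    Returns: 
      Tuple of six ints, in this order: verbs, nouns, adjectives,
      coordinating conjunctions (CCONJ), adverbs, pronouns (PRON)
      
  """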
  tagged_doc = nlp(Essay)

  adj_count=0
  verb_count=0
  noun_count=0
  pNoun_count=0
  adverb_count=0
  conj_count=0

  for token in tagged_doc:

    if(token.pos_ == 'ADJ'):
      adj_count+=1
    
    elif(token.pos_ =='NOUN'):
      noun_count+=1

    elif (token.pos_ =='PRON'):
      pNoun_count+=1

    elif (token.pos_ =='VERB'):
      verb_count+=1

    elif (token.pos_ =='ADV'):
      adverb_count+=1
    
    elif(token.pos_=='CCONJ'):
      conj_count+=1

  return verb_count,noun_count, adj_count, conj_count, adverb_count,pNoun_count

In [11]:
Data_Essay_01['Verb_Count'], Data_Essay_01['Noun_Count'], Data_Essay_01['Adj_Count'], Data_Essay_01['Conj_Count'], Data_Essay_01['Adverb_Count'], Data_Essay_01['pNoun_Count']=zip(*Data_Essay_01["Preprocessed_Essay"].map(Pos_Tag_Count))
Data_Essay_01.sample()

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay,Sent_Count,Word_Count,Char_Count,Avg_Word_Count,Verb_Count,Noun_Count,Adj_Count,Conj_Count,Adverb_Count,pNoun_Count
1445,1450,"Click click, another information page opens up...",5.0,5.0,10.0,Click click another information page opens up ...,21,433,2189,4.177829,37,91,39,17,22,33
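The `if`/`elif` chain above can also be expressed with a `Counter`. The sketch below is one possible alternative, not the notebook's implementation: `count_pos` and `TAG_ORDER` are hypothetical names, and the tag list is written by hand so the example runs without spaCy.

```python
from collections import Counter

# Coarse spaCy tags, in the order the notebook returns the counts
TAG_ORDER = ("VERB", "NOUN", "ADJ", "CCONJ", "ADV", "PRON")

def count_pos(tags):
    # Tally every tag once, then read out the six of interest
    # (a Counter returns 0 for tags that never occurred)
    counts = Counter(tags)
    return tuple(counts[tag] for tag in TAG_ORDER)

# Hand-written tags standing in for [token.pos_ for token in nlp(essay)]
example_tags = ["PRON", "VERB", "DET", "ADJ", "NOUN", "CCONJ", "PRON", "VERB", "ADV"]
print(count_pos(example_tags))
```

Keeping the tag order in one tuple makes it harder for the return order and the column names in the assignment cell to drift apart.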


# **Evaluating Writing Attributes**

## **Style**

### **Mechanics**

#### Counting Spelling Mistakes

In [12]:
def Check_Spelling(Sentence):
  """
    Counts the words whose first spelling suggestion differs from the word itself

    Args:
      Sentence: Preprocessed essay text
    
    Returns: 
      int
      
  """
  count = 0
  Sentence = word_tokenize(Sentence)
  for word in Sentence:
    word = Word(word)
  
    result = word.spellcheck()

    # result is a list of (suggestion, confidence) tuples
    # result [0][0] contains the suggested spelling
    # result [0][1] contains the confidence for the suggested spelling

    if word != result[0][0]:
      # print(f'Spelling of "{word}" is not correct!')
      # print(f'Correct spelling of "{word}": "{result[0][0]}" (with {result[0][1]} confidence).')
      count = count + 1

  return count

In [13]:
Data_Essay_01["Preprocessed_Essay"] = Data_Essay_01["Essay"].apply(Remove_NER)
Data_Essay_01["Preprocessed_Essay"] = Data_Essay_01["Preprocessed_Essay"].apply(Remove_Punctuations)

In [14]:
Data_Essay_01["Spelling_Mistakes_Count"]  = Data_Essay_01["Preprocessed_Essay"].map(Check_Spelling)

KeyboardInterrupt: ignored
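The cell above was interrupted before it finished, which suggests that checking every token of every essay is slow. One possible way to reduce the work, sketched here under assumptions, is to check each distinct word only once and weight the result by its frequency. `count_misspellings` is a hypothetical helper, `is_correct` is a hypothetical callable (in the notebook it could wrap `Word(word).spellcheck()`), and the toy dictionary is made up.

```python
from collections import Counter

def count_misspellings(tokens, is_correct):
    # Check each distinct token once, then weight by how often it occurs
    frequencies = Counter(tokens)
    return sum(n for word, n in frequencies.items() if not is_correct(word))

KNOWN_WORDS = {"computers", "keep", "us", "out", "of", "trouble"}  # toy dictionary
calls = []

def toy_is_correct(word):
    calls.append(word)  # record each call to show the checker runs once per distinct word
    return word.lower() in KNOWN_WORDS

tokens = "computers keep us out of troble troble".split()
mistakes = count_misspellings(tokens, toy_is_correct)
print(mistakes, len(calls))
```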

#### Checking Punctuation Mistakes **(Incomplete)**

In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification , pipeline

tokenizer = AutoTokenizer.from_pretrained('oliverguhr/fullstop-punctuation-multilang-large')

In [None]:
model = AutoModelForTokenClassification.from_pretrained('oliverguhr/fullstop-punctuation-multilang-large')

In [None]:
pun = pipeline('ner' , model = model , tokenizer = tokenizer)

#### Correcting Spelling Mistakes

In [48]:
def Correct_Spelling(Sentence):
  """
    Replaces misspelled words with their suggested spelling

    Args:
      Sentence: Essay text
    
    Returns: 
      String
      
  """
  Tokens = word_tokenize(Sentence)
  newTokens = []
  for word in Tokens:
    word = Word(word)
  
    result = word.spellcheck()

    # result is a list of (suggestion, confidence) tuples
    # result [0][0] contains the suggested spelling
    # result [0][1] contains the confidence for the suggested spelling
    
    if word != result[0][0]:
      # Replace only when the suggestion is confident, the word is unknown to WordNet,
      # and the token is neither a path-like string nor an '@' placeholder
      if(result[0][1] > 0.9 and not(wordnet.synsets(word)) and not("/" in word) and not("@" in word)):
        newTokens.append(result[0][0])
        print(word , result[0][0])
      else: 
        newTokens.append(word)
    else:
      newTokens.append(word)
  return ' '.join(newTokens)

In [58]:
text = Correct_Spelling(Remove_White_Spaces(Data_Essay_01["Essay"][1]))

coordibates coordinate
myspace space
coordibates coordinate
If Of
n't not
n't not
appling applying
n't not


In [59]:
# .loc assigns in place and avoids the SettingWithCopyWarning raised by chained indexing
Data_Essay_01.loc[1, "Essay"] = Correct_Spelling(Remove_White_Spaces(Data_Essay_01.loc[1, "Essay"]))

coordibates coordinate
myspace space
coordibates coordinate
If Of
n't not
n't not
appling applying
n't not


In [57]:
# This needs to be executed for complete essay01 ---> Ahsan
Data_Essay_01['Essay_NoSpellingMistakes'] = Data_Essay_01['Essay'].apply(Remove_White_Spaces)
Data_Essay_01['Essay_NoSpellingMistakes'] = Data_Essay_01['Essay_NoSpellingMistakes'].apply(Correct_Spelling)

troble trouble
buisness business
myspace space
If Of
isnt isn
forbidde forbidden
troble trouble
coordibates coordinate
myspace space
coordibates coordinate
If Of
n't not
n't not
appling applying
n't not
n't not
n't not
n't not
benifit benefit
studdies studies
reasearch research
advertisments advertisements
imposible impossible
commucated communicated
reasearch research
reasearch research
posible possible
amagine imagine
n't not
buissness business
catalouge catalogue
castomer customer
coustomers customers
resturant restaurant
convinences convinces
reaserch research
Computors Computers
computors computers
If Of
n't not
n't not
n't not
computors computers
If Of
convience convince
n't not
n't not
confrence conference
parttners partners
conviente convient
subssiquently subsequently
conviente convient
camputer computer
posibility possibility
everthing everything
wouldbe would
phisically physically
ardthritis arthritis
myspace space
n't not
embarrasing embarrassing
're are
n't not
n't not
han

KeyboardInterrupt: ignored

In [54]:
print(text)

Dear @ CAPS1 @ CAPS2 , I believe that using computers will benefit us in many ways like talking and becoming friends will others through websites like facebook and mysace . Using computers can help us find coordinate , locations , and able ourselfs to millions of information . Also computers will benefit us by helping with jobs as in planning a house plan and typing a @ NUM1 page report for one of our jobs in less than writing it . Now lets go into the wonder world of technology . Using a computer will help us in life by talking or making friends on line . Many people have space , facebooks , aim , these all benefit us by having conversations with one another . Many people believe computers are bad but how can you make friends if you can never talk to them ? I am very fortunate for having a computer that can help with not only school work but my social life and how I make friends . Computers help us with finding our locations , coordinate and millions of information online . Of we did 

Checking Punctuation Mistakes

In [None]:
output = pun(text)

In [None]:
new_string = ''

for n in output:
  result = n['word'].replace('▁' , ' ') + n['entity'].replace('0', '')
  new_string += result

new_string
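The loop above glues each sub-word token to its predicted punctuation label. The sketch below replays it on a hand-made `output` list. The dictionary keys `word` and `entity` are the ones the loop itself reads; the values are invented for illustration and are not real model output.

```python
# Invented stand-in for the list returned by pun(text)
output = [
    {"word": "▁computers", "entity": "0"},
    {"word": "▁help", "entity": "0"},
    {"word": "▁us", "entity": "."},
    {"word": "▁do", "entity": "0"},
    {"word": "▁you", "entity": "0"},
    {"word": "▁agree", "entity": "?"},
]

pieces = []
for item in output:
    # '▁' marks the start of a word; the loop treats the label '0'
    # as "no punctuation after this token"
    pieces.append(item["word"].replace("▁", " ") + item["entity"].replace("0", ""))

# join() avoids repeated string concatenation; strip() drops the leading space
new_string = "".join(pieces).strip()
print(new_string)
```

Comparing the restored punctuation with the punctuation in the student's essay is the step that remains to be built on top of this.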

#### Checking Capitalization Mistakes

In [None]:
import nltk
nltk.download('averaged_perceptron_tagger')

In [None]:
from nltk.tag import pos_tag
def Check_Captialization(Essay):
  """
    Counts capitalization mistakes in an essay

    Args:
    Essay: Essay of each student 

    Returns: 
    int

  """
  count = 0

  words = word_tokenize(Essay)

  # Checking Capital Letters in start of every sentence & start of every quote
  # (word_tokenize rewrites an opening double quote as ``)
  for i in range(len(words) - 1):
    
    if words[i] in ('.', '"', '``'):
        # Flag only a lowercase first letter; comparing with .title()
        # would also flag acronyms such as "USA"
        if words[i+1][:1].islower():
          count = count + 1

  # Checking if all proper nouns are capital or not
  tagged_sent = pos_tag(words)

  for word,pos in tagged_sent:
    if(pos == 'NNP'):
      if word[:1].islower():
        count = count + 1

  return count

In [None]:
Check_Captialization(Remove_NER(Data_Essay_01["Essay"][9]))

In [None]:
Remove_NER(Data_Essay_01["Essay"][12])

#### Checking Spelling Mistakes via Language Tool

In [42]:
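def Spelling_Error_Correct(essays):
    
    matches = tool.check(essays)
    # Keep every match that is not a grammar rule (spelling, typography, ...)
    is_bad_rule = lambda rule: rule.category == 'GRAMMAR'
    matches = [rule for rule in matches if not is_bad_rule(rule)]
    # print(matches[0].category)
    # correct() returns a new string, so its result must be returned;
    # the input string itself is never modified
    return language_tool_python.utils.correct(essays, matches)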

In [43]:
Data_Essay_01['essayWithoutSpellingMistakes'] = Data_Essay_01['Essay'].apply(Spelling_Error_Correct)

### **Grammar Error Detection**

In [16]:
# from nltk.translate.bleu_score import sentence_bleu
# reference = result.text.split()

# candidate = 'Dear local newspaper, @CAPS1 best friend, @LOCATION2, was once a nerd with no hand-eye coordination, @CAPS2, he started to use a computer and now he has better hand-eye coordination than me.'.split()
# print('BLEU score -> {}'.format(sentence_bleu(reference, candidate )))

In [17]:
df1 = Data_Essay_01[['Essay', 'Sent_Count']]
df1.head()

Unnamed: 0,Essay,Sent_Count
0,"Dear local newspaper, I think effects computer...",16
1,"Dear @CAPS1 @CAPS2, I believe that using compu...",20
2,"Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl...",14
3,"Dear Local Newspaper, @CAPS1 I have found that...",27
4,"Dear @LOCATION1, I know having computers has a...",30


In [18]:
# Grammar Error via CFG

# def grammar_error(essays_1,sent_count):
#     sentences = nltk.sent_tokenize(essays_1[1])
#     for sent in range(0,sent_count):
#        wrong =1
#        sent_split = sentences[sent].split()  
#        tagged = nltk.pos_tag(sent_split) 
#        tags = [x[1].lower() for x in tagged] 

#        try:
#         parser = nltk.RecursiveDescentParser(grammar)
        
#         for tree in parser.parse(tags):
#             s = tree
#             wrong =0
#             print("Correct Grammar!!!!")
#             print("*"*20)
        
#         if wrong ==1:
#             print("Wrong Grammar!!!")
#             print("*"*20)
    
#        except ValueError:
#         print("Sorry! Some words are not covered in the grammar yet :)")

    
# essays_1 = df1['Essay_Clean'].tolist()
# sent_count = df1['Clean_Sent_Count'].tolist()
# grammar_error(essays_1,sent_count[1])

In [15]:
import language_tool_python
tool = language_tool_python.LanguageTool('en-US')

Downloading LanguageTool 5.7: 100%|██████████| 225M/225M [00:04<00:00, 49.3MB/s]
INFO:language_tool_python.download_lt:Unzipping /tmp/tmpi8eljz8g.zip to /root/.cache/language_tool_python.
INFO:language_tool_python.download_lt:Downloaded https://www.languagetool.org/download/LanguageTool-5.7.zip to /root/.cache/language_tool_python.


In [19]:
# .copy() gives df1 its own data, so the assignment below does not raise SettingWithCopyWarning
df1 = Data_Essay_01[['Essay', 'Sent_Count']].copy()
df1['Essay'] = df1['Essay'].apply(Remove_White_Spaces)   # to avoid whitespace error
df1['Essay']

0       Dear local newspaper, I think effects computer...
1       Dear @CAPS1 @CAPS2, I believe that using compu...
2       Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl...
3       Dear Local Newspaper, @CAPS1 I have found that...
4       Dear @LOCATION1, I know having computers has a...
                              ...                        
1778    Dear @CAPS1, @CAPS2 several reasons on way I t...
1779    Do a adults and kids spend to much time on the...
1780    My opinion is that people should have computer...
1781    Dear readers, I think that its good and bad to...
1782    Dear - Local Newspaper I agree thats computers...
Name: Essay, Length: 1783, dtype: object

In [62]:
def Grammar_Errors(essays):
    """
      Counts the LanguageTool matches that belong to the GRAMMAR category

      Args:
        essays: Essay of each student

      Returns:
        int, list of rule IDs
    """
    matches = tool.check(essays)
    is_bad_rule = lambda rule: rule.category == 'GRAMMAR'
    matches = [rule for rule in matches if is_bad_rule(rule)]
    # print(matches[0].category)
    #language_tool_python.utils.correct(text, matches)   # to correct it
    errors = [match.ruleId for match in matches]  # or category of the error (Misc, Whitespace, Typography)
    return len(matches), errors

In [38]:
Grammar_Errors("Dear local newspaper, I think effects computers have on people are great learning skills/affects because they give us time to chat with friends/new people, helps us learn about the globe(astronomy) and keeps us out of troble! Thing about! Dont you think so? How would you feel if your teenager is always on the phone with friends! Do you ever time to chat with your friends or buisness partner about things. Well now - there's a new way to chat the computer, theirs plenty of sites on the internet to do so: @ORGANIZATION1, @ORGANIZATION2, @CAPS1, facebook, myspace ect. Just think now while your setting up meeting with your boss on the computer, your teenager is having fun on the phone not rushing to get off cause you want to use it. How did you learn about other countrys/states outside of yours? Well I have by computer/internet, it's a new way to learn about what going on in our time! You might think your child spends a lot of time on the computer, but ask them so question about the economy, sea floor spreading or even about the @DATE1's you'll be surprise at how much he/she knows. Believe it or not the computer is much interesting then in class all day reading out of books. If your child is home on your computer or at a local library, it's better than being out with friends being fresh, or being perpressured to doing something they know isnt right. You might not know where your child is, @CAPS2 forbidde in a hospital bed because of a drive-by. Rather than your child on the computer learning, chatting or just playing games, safe and sound in your home or community place. Now I hope you have reached a point to understand and agree with me, because computers can have great effects on you or child because it gives us time to chat with friends/new people, helps us learn about the globe and believe or not keeps us out of troble. Thank you for listening.")

GRAMMAR


(2, ['CAUSE_BECAUSE', 'BE_VBP_IN'])
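The filtering step inside `Grammar_Errors` can be tried without starting LanguageTool. The sketch below uses stub objects that carry only the two attributes the notebook reads from each match (`category` and `ruleId`). The stubs, the helper name `grammar_rule_ids`, and the mix of rules are assumptions made for illustration.

```python
from types import SimpleNamespace

def grammar_rule_ids(matches):
    # Keep only matches in the GRAMMAR category and report their rule IDs
    ids = [m.ruleId for m in matches if m.category == "GRAMMAR"]
    return len(ids), ids

# Stub matches: two grammar rules and one spelling rule
fake_matches = [
    SimpleNamespace(category="GRAMMAR", ruleId="CAUSE_BECAUSE"),
    SimpleNamespace(category="TYPOS", ruleId="MORFOLOGIK_RULE_EN_US"),
    SimpleNamespace(category="GRAMMAR", ruleId="BE_VBP_IN"),
]
print(grammar_rule_ids(fake_matches))
```

The spelling match is dropped, so spelling mistakes do not inflate the grammar count.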

In [72]:
text = Grammar_Errors(Correct_Spelling(Remove_White_Spaces(Data_Essay_01["Essay"][3])))
text

benifit benefit
studdies studies
reasearch research
advertisments advertisements
imposible impossible
commucated communicated
reasearch research
reasearch research
posible possible
amagine imagine
n't not
buissness business
catalouge catalogue
castomer customer
coustomers customers
resturant restaurant
convinences convinces
reaserch research


(0, [])

In [22]:
# Data_Essay_01['Grammar_Errors'], Data_Essay_01['Grammar_Error_List'] = zip(*df1_copy['Essay'].map(grammar_errors))
Data_Essay_01['Grammar_Errors'], Data_Essay_01['Grammar_Error_List'] = zip(*df1['Essay'].map(Grammar_Errors))

In [23]:
Data_Essay_01.head(20)

Unnamed: 0,ID,Essay,Rater_1 Score,Rater_2 Score,Total Score,Preprocessed_Essay,Sent_Count,Word_Count,Char_Count,Avg_Word_Count,Verb_Count,Noun_Count,Adj_Count,Conj_Count,Adverb_Count,pNoun_Count,Grammar_Errors,Grammar_Error_List
0,1,"Dear local newspaper, I think effects computer...",4.0,4.0,8.0,Dear local newspaper I think effects computer...,16,386,1875,3.984456,55,74,18,14,15,48,2,"[CAUSE_BECAUSE, BE_VBP_IN]"
1,2,"Dear @CAPS1 @CAPS2, I believe that using compu...",5.0,4.0,9.0,Dear I believe that using computers will benef...,20,464,2288,4.030172,71,97,19,18,19,49,4,"[ON_COMPOUNDS, NODT_DOZEN, YOU_HAV, DT_DT]"
2,3,"Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl...",4.0,3.0,7.0,Dear More and more people use computers but ...,14,313,1541,4.035144,42,69,17,16,11,25,5,"[THE_SUPERLATIVE, PHRASE_REPETITION, ITS_TO_IT..."
3,4,"Dear Local Newspaper, @CAPS1 I have found that...",5.0,5.0,10.0,Dear Local Newspaper I have found that many e...,27,611,3165,4.328969,71,126,39,17,21,33,0,[]
4,5,"Dear @LOCATION1, I know having computers has a...",4.0,4.0,8.0,Dear I know having computers has a positive ef...,30,517,2569,4.071567,61,107,30,15,34,41,1,[ARN_T]
5,6,"Dear @LOCATION1, I think that computers have a...",4.0,4.0,8.0,Dear I think that computers have a negative af...,15,274,1276,3.762774,42,37,9,9,18,41,3,"[BEEN_PART_AGREEMENT, PRP_PAST_PART, PRP_THE]"
6,7,Did you know that more and more people these d...,5.0,5.0,10.0,Did you know that more and more people these d...,30,580,2808,3.982759,69,115,23,16,34,73,1,[PROGRESSIVE_VERBS]
7,8,@PERCENT1 of people agree that computers make ...,5.0,5.0,10.0,of people agree that computers make life less ...,39,556,2724,4.03777,88,120,35,20,26,59,0,[]
8,9,"Dear reader, @ORGANIZATION1 has had a dramatic...",4.0,5.0,9.0,Dear reader has had a dramatic effect on huma...,35,512,2402,3.833984,70,95,31,22,30,56,4,"[DOSNT, SINGULAR_VERB_AFTER_THESE_OR_THOSE, SI..."
9,10,In the @LOCATION1 we have the technology of a ...,5.0,4.0,9.0,In the we have the technology of a computer S...,26,561,2632,3.798574,77,93,37,26,31,67,4,"[CYBER_COMPOUNDS, TOO_ADJECTIVE_TO, PRONOUN_NO..."


In [24]:
out = Data_Essay_01['Grammar_Error_List'].explode().value_counts()
out

HE_VERB_AGR             228
A_NNS                   139
ITS_TO_IT_S              96
MD_BASEFORM              91
BEEN_PART_AGREEMENT      86
                       ... 
SIMPLE_TO_USE_HYPHEN      1
GOOD_GOOF                 1
DO_YOU_FASCINATED         1
DOES_YOU                  1
SHUTDOWN                  1
Name: Grammar_Error_List, Length: 324, dtype: int64

In [25]:
out.to_csv('GrammarErrors.csv')
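For readers unfamiliar with `explode().value_counts()`, the following standard-library sketch performs the same tally on made-up data: flatten the per-essay lists, then count each rule ID. The rule lists are invented, and `Counter` stands in for the pandas calls.

```python
from collections import Counter

# Made-up per-essay rule lists standing in for the Grammar_Error_List column
error_lists = [
    ["CAUSE_BECAUSE", "BE_VBP_IN"],
    [],  # an essay without grammar errors contributes nothing
    ["A_NNS", "CAUSE_BECAUSE"],
]

# Flatten the lists (the role of explode) and tally them (the role of value_counts)
tally = Counter(rule for rules in error_lists for rule in rules)
print(tally.most_common())
```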

### **Lexical Sophistication**

## **Content**

### **Latent Semantic Analysis (LSA)**

Content analysis generally implies only a high-level semantic analysis of the essay and a comparison with the source text and with graded essays.
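This section has no implementation yet. As a rough sketch of what an LSA comparison might look like, the code below builds a term-document count matrix, reduces it with a truncated SVD, and compares documents by cosine similarity in the reduced space. Everything here is an assumption: the function name `lsa_similarity`, the whitespace tokenization, the choice of `k=2`, and the four toy documents (three "graded essays" and one candidate).

```python
import numpy as np

def lsa_similarity(documents, k=2):
    # Term-document count matrix: one row per word, one column per document
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for w in doc.lower().split():
            counts[index[w], j] += 1

    # Truncated SVD: keep the k largest singular values
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ vt[:k]).T  # shape: (n_documents, k)

    # Cosine similarity between every pair of reduced document vectors
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.where(norms == 0, 1, norms)
    return unit @ unit.T

docs = [
    "computers help people learn and chat with friends",   # on-topic essay
    "computers help students learn and research online",   # on-topic essay
    "the recipe needs flour sugar eggs and butter",         # off-topic text
    "people chat with friends and learn on computers",      # candidate essay
]
sims = lsa_similarity(docs)
print(sims[3, 0] > sims[3, 2])
```

In a real setting the count matrix would more likely be TF-IDF weighted and built from many graded essays, with the candidate's similarity to high-scoring essays used as a feature.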

## **Semantic**
Semantic metrics assess whether the content carries the correct meaning (connotation).

## **Semantic Coherence & Consistency**

## **Connectivity**