## **Installing Packages**

In [None]:
!pip install textblob
!pip install sentencepiece  
!pip install transformers
!pip install textstat
!pip install language-tool-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting language-tool-python
  Downloading language_tool_python-2.7.1-py3-none-any.whl (34 kB)
Installing collected packages: language-tool-python
Successfully installed language-tool-python-2.7.1


Restart the runtime after installing the packages.

## **Importing Packages**

In [None]:
import re
import nltk 
import spacy
import textstat
import numpy as np
import pandas as pd
import seaborn as sn 
from textblob import Word
import matplotlib.pyplot as plt
import language_tool_python

from nltk.tag import pos_tag
from nltk.corpus import wordnet
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('stopwords')
nlp = spacy.load("en_core_web_sm")
tool = language_tool_python.LanguageTool('en-US')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Downloading LanguageTool 5.7: 100%|██████████| 225M/225M [00:12<00:00, 17.8MB/s]
INFO:language_tool_python.download_lt:Unzipping /tmp/tmp9940ntp8.zip to /root/.cache/language_tool_python.
INFO:language_tool_python.download_lt:Downloaded https://www.languagetool.org/download/LanguageTool-5.7.zip to /root/.cache/language_tool_python.


# **Reading Data From Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive')
Data_Essay_01 = pd.read_csv("/content/drive/MyDrive/IntelliTech-DataSet/EssaySet01.csv")
Data_Essay_01.head()

Mounted at /content/drive


| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score |
|---|---|---|---|---|---|
| 0 | 1 | Dear local newspaper, I think effects computer... | 4.0 | 4.0 | 8.0 |
| 1 | 2 | Dear @CAPS1 @CAPS2, I believe that using compu... | 5.0 | 4.0 | 9.0 |
| 2 | 3 | Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl... | 4.0 | 3.0 | 7.0 |
| 3 | 4 | Dear Local Newspaper, @CAPS1 I have found that... | 5.0 | 5.0 | 10.0 |
| 4 | 5 | Dear @LOCATION1, I know having computers has a... | 4.0 | 4.0 | 8.0 |


# **Feature Extraction**

## **Essay Preprocessing**

In [None]:
def Remove_NER(Essay):
  """
    Removes anonymized named-entity placeholders (words starting with '@',
    such as @CAPS1 or @LOCATION1) from an essay

    Args:
      Essay: Essay of each student

    Returns:
      String
  """
  return ' '.join(word for word in Essay.split(' ') if not word.startswith('@'))

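# Characters replaced by a space; the raw string keeps the backslash literal
# (in a normal string '\,' is an invalid escape sequence and raises a warning)
PUNCTUATIONS = r'''!()-[]{};:"\,/'<>.?@#$%^&*_~'''
PUNCTUATION_TABLE = str.maketrans({character: " " for character in PUNCTUATIONS})

def Remove_Punctuations(sentence):
  """
    Replaces each punctuation character in the text with a space

    Args:
      sentence: Essay of each student

    Returns:
      String
  """
  # str.translate handles the whole string in one pass instead of
  # rebuilding it character by character
  return sentence.translate(PUNCTUATION_TABLE)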

def LowerCase_Words(Essay):
  """
    Removes digits and lower-cases all the words in an essay

    Args:
      Essay: Essay of each student

    Returns:
      String
  """
  return re.sub('[0-9]+','', Essay).lower() 

def Tokenize_Essay(Essay):
    """
      Replaces punctuation with spaces, tokenizes the essay with NLTK and
      joins the tokens with single spaces

      Args:
        Essay: Essay of each student

      Returns:
        String
    """
    Preprocessed = Remove_Punctuations(Essay)
    return " ".join(word_tokenize(Preprocessed))

def Remove_White_Spaces(Essay):
  """
    Removes Extra White Spaces

    Args:
      Essay: Essay of each student
    
    Returns: 
      String
  """
  return " ".join(Essay.split())

def Remove_Special_Characters(Essay):
  """
    Removes Special Characters from Essay

    Args:
      Essay: Essay of each student
    
    Returns: 
      String
  """
  new_text = re.sub(r"[^a-zA-Z0-9 ]", "", Essay)
  return new_text
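The cleaning helpers above are meant to be chained. As a minimal sketch, the code below reproduces placeholder removal, punctuation replacement and whitespace normalization with the standard library only (NLTK tokenization is left out), so the effect of each step on a sample sentence is easy to inspect. The helper name `clean_essay` is hypothetical and not part of the notebook.

```python
# Same character set as Remove_Punctuations above
PUNCTUATIONS = r'''!()-[]{};:"\,/'<>.?@#$%^&*_~'''
PUNCTUATION_TABLE = str.maketrans({ch: " " for ch in PUNCTUATIONS})

def clean_essay(essay: str) -> str:
    """Hypothetical helper: placeholders -> punctuation -> whitespace."""
    # 1. Drop anonymized placeholders such as @CAPS1 (words starting with '@')
    no_placeholders = " ".join(w for w in essay.split(" ") if not w.startswith("@"))
    # 2. Replace every punctuation character with a space
    no_punctuation = no_placeholders.translate(PUNCTUATION_TABLE)
    # 3. Collapse runs of whitespace into single spaces
    return " ".join(no_punctuation.split())

sample = "Dear @CAPS1 @CAPS2, I believe that using computers will benefit us!"
print(clean_essay(sample))  # Dear I believe that using computers will benefit us
```

Note that "@CAPS2," is dropped together with its comma because the whole whitespace-separated word starts with '@'.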

## **Basic Count Features**

This section will cover:


*   Counting Sentences per Essay
*   Counting Words per Essay
*   Counting Characters per Essay
*   Average Word Length per Essay
*   Counting Syllables


#### Counting Sentences per Essay

In [None]:
def Sentence_Count(Essay):
    """
      Counts sentences in an essay using NLTK's sentence tokenizer

      Args:
        Essay: Essay of each student

      Returns:
        int
    """
    sentence_no = nltk.sent_tokenize(Essay)
    return len(sentence_no)

In [None]:
Data_Essay_01['Sent_Count'] = Data_Essay_01['Essay'].apply(Sentence_Count)
Data_Essay_01.sample()

| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score | Sent_Count |
|---|---|---|---|---|---|---|
| 1053 | 1056 | Many people say that computer is help them in ... | 3.0 | 3.0 | 6.0 | 18 |


#### Counting Words per Essay

**Observation:** These word counts are higher than a whitespace-based count because NLTK tokenization treats punctuation marks as separate words.
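To see why punctuation inflates the count, a whitespace split can be compared with a tokenizer that emits punctuation as separate tokens. In this sketch the regular expression `\w+|[^\w\s]` is only a rough stand-in for `nltk.word_tokenize` (it splits contractions differently, for example), chosen so the example runs without NLTK data; both function names are hypothetical.

```python
import re

def whitespace_word_count(text: str) -> int:
    # Words are whatever lies between spaces; punctuation stays attached
    return len(text.split())

def punctuation_aware_token_count(text: str) -> int:
    # Runs of word characters, or single punctuation marks, become tokens
    return len(re.findall(r"\w+|[^\w\s]", text))

sentence = "Thing about! Dont you think so?"
print(whitespace_word_count(sentence))          # 6
print(punctuation_aware_token_count(sentence))  # 8: '!' and '?' are counted too
```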


In [None]:
def Word_Count(Essay):
  """
    Counts words in an essay

    Args:
      Essay: Essay of each student 
    
    Returns: 
      int  
  """ 
  word_no = nltk.word_tokenize(Essay)
  return len(word_no)

In [None]:
Data_Essay_01['Word_Count'] = Data_Essay_01['Essay'].apply(Word_Count)
Data_Essay_01.sample()

| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score | Sent_Count | Word_Count |
|---|---|---|---|---|---|---|---|
| 585 | 588 | Dear the newspaper, Computers are great resour... | 5.0 | 5.0 | 10.0 | 32 | 601 |


#### Counting Characters per Essay

In [None]:
def Char_Count(Essay):
  """
    Counts characters in an essay, including spaces and punctuation

    Args:
      Essay: Essay of each student

    Returns:
      int
  """
  return len(Essay)

In [None]:
Data_Essay_01['Char_Count'] = Data_Essay_01['Essay'].apply(Char_Count)
Data_Essay_01.sample()

| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score | Sent_Count | Word_Count | Char_Count |
|---|---|---|---|---|---|---|---|---|
| 317 | 319 | Computers definitely have a positive impact on... | 4.0 | 5.0 | 9.0 | 29 | 449 | 2171 |


#### Average Word Length of Essay

In [None]:
def Avg_Word_Count(Essay):
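  """
    Calculates the average word length (in characters) of an essay.
    NLTK tokens are used, so punctuation marks count as one-character words.

    Args:
      Essay: Essay of each student

    Returns:
      float
  """
  word_list = nltk.word_tokenize(Essay)
  if not word_list:
    return 0.0  # avoid dividing by zero for an empty essay
  return sum(map(len, word_list)) / len(word_list)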

In [None]:
Data_Essay_01['Avg_Word_Count'] = Data_Essay_01['Essay'].apply(Avg_Word_Count)
Data_Essay_01.sample()

| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score | Sent_Count | Word_Count | Char_Count | Avg_Word_Count |
|---|---|---|---|---|---|---|---|---|---|
| 32 | 33 | Dear, @ORGANIZATION1 I think the effects that ... | 3.0 | 3.0 | 6.0 | 12 | 191 | 917 | 3.931937 |


#### Counting Syllables

In [None]:
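# Set the language once; passing lang= to syllable_count is deprecated in textstat
textstat.set_lang('en_US')

def Syllable_Count(text):
  """
    Counts syllables in an essay using textstat

    Args:
      text: Essay of each student

    Returns:
      int
  """
  return textstat.syllable_count(text)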
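textstat's own syllable counting is not reproduced here. As a rough point of comparison only, the sketch below counts groups of consecutive vowels with one common adjustment for a silent final 'e'. It is a simplified heuristic (the function names are hypothetical) and it will disagree with textstat on many words, e.g. "whole".

```python
import re

def estimate_syllables(word: str) -> int:
    """Rough vowel-group syllable estimate (not textstat's algorithm)."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    # Each run of vowels (y counted as a vowel) is treated as one syllable
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent 'e' ("make") usually adds no syllable, but "-le" does ("table")
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def estimate_text_syllables(text: str) -> int:
    return sum(estimate_syllables(w) for w in text.split())

print(estimate_syllables("computer"))                      # 3
print(estimate_text_syllables("computers help people"))    # 6
```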

## **Parts Of Speech Counts**

This section will cover:


*   Counting Verbs per Essay
*   Counting Nouns per Essay
*   Counting Adjectives per Essay
*   Counting Pronouns per Essay
*   Counting Adverbs per Essay
*   Counting Coordinating Conjunctions per Essay

Removing named-entity placeholders and punctuation before POS tagging

In [None]:
Data_Essay_01['Preprocessed_Essay'] = Data_Essay_01['Essay'].apply(Remove_NER)
Data_Essay_01['Preprocessed_Essay'] = Data_Essay_01['Preprocessed_Essay'].apply(Tokenize_Essay)
Data_Essay_01.head()

| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score | Sent_Count | Word_Count | Char_Count | Avg_Word_Count | Preprocessed_Essay |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Dear local newspaper, I think effects computer... | 4.0 | 4.0 | 8.0 | 16 | 386 | 1875 | 3.984456 | Dear local newspaper I think effects computers... |
| 1 | 2 | Dear @CAPS1 @CAPS2, I believe that using compu... | 5.0 | 4.0 | 9.0 | 20 | 464 | 2288 | 4.030172 | Dear I believe that using computers will benef... |
| 2 | 3 | Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl... | 4.0 | 3.0 | 7.0 | 14 | 313 | 1541 | 4.035144 | Dear More and more people use computers but no... |
| 3 | 4 | Dear Local Newspaper, @CAPS1 I have found that... | 5.0 | 5.0 | 10.0 | 27 | 611 | 3165 | 4.328969 | Dear Local Newspaper I have found that many ex... |
| 4 | 5 | Dear @LOCATION1, I know having computers has a... | 4.0 | 4.0 | 8.0 | 30 | 517 | 2569 | 4.071567 | Dear I know having computers has a positive ef... |


In [None]:
def Pos_Tag_Count(Essay):
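  """
    Counts parts of speech in an essay using spaCy's coarse-grained tags

    Args:
      Essay: Essay of each student

    Returns:
      (verbs, nouns, adjectives, coordinating conjunctions, adverbs, pronouns),
      a tuple of six ints
  """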
  tagged_doc = nlp(Essay)

  adj_count=0
  verb_count=0
  noun_count=0
  pNoun_count=0
  adverb_count=0
  conj_count=0

  for token in tagged_doc:

    if(token.pos_ == 'ADJ'):
      adj_count+=1
    
    elif(token.pos_ =='NOUN'):
      noun_count+=1

    elif (token.pos_ =='PRON'):   # pronouns; proper nouns would be tagged 'PROPN'
      pNoun_count+=1

    elif (token.pos_ =='VERB'):
      verb_count+=1

    elif (token.pos_ =='ADV'):
      adverb_count+=1
    
    elif(token.pos_=='CCONJ'):
      conj_count+=1

  return verb_count,noun_count, adj_count, conj_count, adverb_count,pNoun_count
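The if/elif chain can also be written as a tally over the coarse tags. This sketch uses `collections.Counter` on a hand-written list of (token, tag) pairs in spaCy's coarse tag set so it runs without loading a model; in the notebook the pairs would come from `[(t.text, t.pos_) for t in nlp(Essay)]`. The function name `pos_counts` is hypothetical.

```python
from collections import Counter

# Coarse Universal POS tags, in the order Pos_Tag_Count returns them
TAGS_OF_INTEREST = ("VERB", "NOUN", "ADJ", "CCONJ", "ADV", "PRON")

def pos_counts(tagged_tokens):
    """Count selected coarse POS tags from (token, tag) pairs."""
    tally = Counter(tag for _, tag in tagged_tokens)
    # Counter returns 0 for tags that never occur
    return tuple(tally[tag] for tag in TAGS_OF_INTEREST)

tagged = [("I", "PRON"), ("think", "VERB"), ("computers", "NOUN"),
          ("are", "AUX"), ("very", "ADV"), ("useful", "ADJ"),
          ("and", "CCONJ"), ("fun", "ADJ")]
print(pos_counts(tagged))  # (1, 1, 2, 1, 1, 1)
```

Adding a new tag then only means extending `TAGS_OF_INTEREST`, instead of adding another counter variable and branch.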

In [None]:
Data_Essay_01['Verb_Count'], Data_Essay_01['Noun_Count'], Data_Essay_01['Adj_Count'], Data_Essay_01['Conj_Count'], Data_Essay_01['Adverb_Count'], Data_Essay_01['pNoun_Count']=zip(*Data_Essay_01["Preprocessed_Essay"].map(Pos_Tag_Count))
Data_Essay_01.sample()

| Unnamed: 0 | ID | Essay | Rater_1 Score | Rater_2 Score | Total Score | Sent_Count | Word_Count | Char_Count | Avg_Word_Count | Preprocessed_Essay | Verb_Count | Noun_Count | Adj_Count | Conj_Count | Adverb_Count | pNoun_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 916 | 919 | Dear @LOCATION1, I understand this focus situa... | 5.0 | 6.0 | 11.0 | 22 | 443 | 2139 | 3.934537 | Dear I understand this focus situation the eff... | 63 | 89 | 18 | 7 | 28 | 58 |


# **Evaluating Writing Attributes**

This section will cover:


*   Style
*   Content
*   Semantic
*   Semantic Coherence & Consistency 
*   Connectivity
*   Readability Scores


## **Style**

This section will cover:


*   Mechanics
*   Grammar
*   Lexical Sophistication



### **Mechanics**

This section will cover:


*   Counting Spelling Mistakes
*   Checking Punctuations
*   Counting Punctuations
*   Checking Capitalization



#### Counting Spelling Mistakes

In [None]:
def Check_Spelling(Sentence):
  """
    Counts likely spelling mistakes in an essay using TextBlob

    Args:
      Sentence: Essay of each student (placeholders and punctuation removed)

    Returns:
      int
  """
  count = 0
  Sentence = word_tokenize(Sentence)
  for word in Sentence:
    word = Word(word)

    result = word.spellcheck()

    # spellcheck() returns a list of (suggestion, confidence) tuples, best first:
    # result[0][0] is the most likely spelling, result[0][1] its confidence

    if word != result[0][0]:
      if(result[0][1] > 0.95 and not(wordnet.synsets(word)) and not("/" in word)):
        print(word , result[0][0])
        count = count + 1

  return count
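TextBlob's spelling corrector follows Peter Norvig's approach, and `Word.spellcheck()` returns candidates with a confidence that `Check_Spelling` thresholds. The toy version below illustrates the idea with a tiny hypothetical word-frequency table: generate every string one edit away, keep the known ones, and turn their frequencies into confidences. It is a simplified illustration, not TextBlob's code.

```python
from collections import Counter

# Hypothetical word frequencies; a real corrector uses counts from a large corpus
WORD_FREQ = Counter({"trouble": 40, "business": 30, "research": 25,
                     "possible": 20, "studies": 10, "double": 5})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def spellcheck(word):
    """Return [(candidate, confidence), ...] sorted by confidence, best first."""
    word = word.lower()
    if word in WORD_FREQ:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    if not candidates:
        return [(word, 0.0)]
    total = sum(WORD_FREQ[w] for w in candidates)
    return sorted(((w, WORD_FREQ[w] / total) for w in candidates),
                  key=lambda pair: pair[1], reverse=True)

print(spellcheck("troble"))  # [('trouble', 1.0)]
print([(w, round(c, 3)) for w, c in spellcheck("touble")])  # [('trouble', 0.889), ('double', 0.111)]
```

With two plausible candidates, the top confidence for "touble" stays below the 0.95 threshold used in `Check_Spelling`, so such a word would not be counted.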

In [None]:
Data_Essay_01["Preprocessed_Essay"] = Data_Essay_01["Essay"].apply(Remove_NER)
Data_Essay_01["Preprocessed_Essay"] = Data_Essay_01["Preprocessed_Essay"].apply(Remove_Punctuations)

In [None]:
Data_Essay_01["Spelling_Mistakes_Count"]  = Data_Essay_01["Preprocessed_Essay"].map(Check_Spelling)
Data_Essay_01.sample()

troble trouble
buisness business
myspace space
forbidde forbidden
troble trouble
coordibates coordinate
myspace space
coordibates coordinate
appling applying
benifit benefit
studdies studies
reasearch research
advertisments advertisements
imposible impossible
reasearch research
reasearch research
posible possible
amagine imagine
buissness business
catalouge catalogue
castomer customer
coustomers customers
resturant restaurant
convinences convinces
reaserch research
Computors Computers
computors computers
computors computers
convience convince
confrence conference
parttners partners
conviente convient
subssiquently subsequently
conviente convient
camputer computer
posibility possibility
everthing everything
wouldbe would
phisically physically
ardthritis arthritis
myspace space
embarrasing embarrassing
feild field
everyones everyone
Ther Her
atleast least
obease obese
problum problem
socioty society
obease obese
cumputers computers
amarica america
exersise exercise
computeres computers
m

KeyboardInterrupt: ignored

#### Correcting Spelling Mistakes

Correcting Spelling Mistakes via TextBlob

In [None]:
def Correct_Spelling(Sentence):
  """
    Corrects likely spelling mistakes in an essay using TextBlob

    Args:
      Sentence: Essay of each student

    Returns:
      String (the corrected essay, tokens joined with single spaces)
  """
  Tokens = word_tokenize(Sentence)
  newTokens = []
  for word in Tokens:
    word = Word(word)

    # Only purely alphabetic tokens are checked. This skips placeholder pieces
    # such as CAPS1, slash compounds such as skills/affects, and contraction
    # pieces like "n't" or "'re", which TextBlob would turn into "not" or "are"
    if not word.isalpha():
      newTokens.append(word)
      continue

    # spellcheck() returns a list of (suggestion, confidence) tuples, best first:
    # result[0][0] is the most likely spelling, result[0][1] its confidence
    result = word.spellcheck()

    if word != result[0][0] and result[0][1] > 0.9 and not wordnet.synsets(word):
      newTokens.append(result[0][0])
      print(word, result[0][0])
    else:
      newTokens.append(word)
  return ' '.join(newTokens)

In [None]:
Data_Essay_01['Essay_NoWhiteSpace'] = Data_Essay_01['Essay'].apply(Remove_White_Spaces)

In [None]:
Data_Essay_01["Essay_Corrected"]  = Data_Essay_01["Essay_NoWhiteSpace"].map(Correct_Spelling)

troble trouble
buisness business
myspace space
If Of
isnt isn
forbidde forbidden
troble trouble
coordibates coordinate
myspace space
coordibates coordinate
If Of
n't not
n't not
appling applying
n't not
n't not
n't not
n't not
benifit benefit
studdies studies
reasearch research
advertisments advertisements
imposible impossible
commucated communicated
reasearch research
reasearch research
posible possible
amagine imagine
n't not
buissness business
catalouge catalogue
castomer customer
coustomers customers
resturant restaurant
convinences convinces
reaserch research
Computors Computers
computors computers
If Of
n't not
n't not
n't not
computors computers
If Of
convience convince
n't not
n't not
confrence conference
parttners partners
conviente convient
subssiquently subsequently
conviente convient
camputer computer
posibility possibility
everthing everything
wouldbe would
phisically physically
ardthritis arthritis
myspace space
n't not
embarrasing embarrassing
're are
n't not
n't not
han

KeyboardInterrupt: ignored

Correcting Spelling Mistakes via LanguageTool

In [None]:
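def Spelling_Error_Correct(essays):
    """
      Applies LanguageTool's suggested corrections for every issue outside the
      GRAMMAR category (spelling, casing, punctuation, typography, ...)

      Args:
        essays: Essay of each student

      Returns:
        String (the corrected essay)
    """
    matches = tool.check(essays)
    # Keep every match except those in the GRAMMAR category
    matches = [match for match in matches if match.category != 'GRAMMAR']
    # correct() returns a new string; the input string is not modified in place
    return language_tool_python.utils.correct(essays, matches)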
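LanguageTool reports each issue as a match with a character offset, an error length and a list of suggested replacements. Applying corrections amounts to splicing the first suggestion of each match into the text, working from the end of the string backwards so that earlier offsets stay valid. The sketch below shows that idea with a hypothetical `SimpleMatch` tuple loosely modeled on `language_tool_python`'s match objects; it is an illustration, not the library's implementation, and it does not handle overlapping matches.

```python
from typing import List, NamedTuple

class SimpleMatch(NamedTuple):
    """Hypothetical stand-in for a LanguageTool match."""
    offset: int              # start index of the problem in the text
    error_length: int        # number of characters to replace
    replacements: List[str]  # suggestions, best first

def apply_corrections(text: str, matches: List[SimpleMatch]) -> str:
    # Apply from the rightmost match to the leftmost so offsets of matches
    # that have not been applied yet remain correct
    for match in sorted(matches, key=lambda m: m.offset, reverse=True):
        if not match.replacements:
            continue  # nothing suggested, leave the text as it is
        start, end = match.offset, match.offset + match.error_length
        text = text[:start] + match.replacements[0] + text[end:]
    return text

essay = "keeps us out of troble and buisness"
matches = [SimpleMatch(16, 6, ["trouble"]), SimpleMatch(27, 8, ["business"])]
print(apply_corrections(essay, matches))  # keeps us out of trouble and business
```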

In [None]:
Data_Essay_01['Essay_SpellingCorrected_LT'] = Data_Essay_01['Essay_NoWhiteSpace'].apply(Spelling_Error_Correct)

#### Checking Punctuation Mistakes **(Incomplete)**

In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification , pipeline

In [None]:
tokenizer = AutoTokenizer.from_pretrained('oliverguhr/fullstop-punctuation-multilang-large')
model = AutoModelForTokenClassification.from_pretrained('oliverguhr/fullstop-punctuation-multilang-large')
pun = pipeline('ner' , model = model , tokenizer = tokenizer)

In [None]:
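# `text` is not defined elsewhere in the notebook; as an example, use the first
# essay with placeholders and punctuation removed (the model restores punctuation)
text = Remove_White_Spaces(Remove_Punctuations(Remove_NER(Data_Essay_01['Essay'][0])))
tags = pun(text)

# Each token carries a predicted label: '0' means no punctuation, any other label
# is the mark to insert. A leading '▁' marks the start of a word, so it becomes a space.
# Long essays may exceed the model's maximum input length and need to be split first.
Updated_string = ''

for output in tags:
  result = output['word'].replace('▁', ' ') + output['entity'].replace('0', '')
  Updated_string += result

Updated_string.strip()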
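The loop above appends a label after every sub-word piece, so a word split into several pieces can get punctuation in its middle. One possible fix, sketched below on a hand-written hypothetical pipeline output, is to join the pieces into words first and keep only the label of each word's last piece. Which piece's label the model intends is an assumption to verify against its behavior; `restore_punctuation` is a hypothetical name.

```python
def restore_punctuation(token_outputs):
    """Join sub-word pieces into words and append each word's punctuation label.

    token_outputs: list of dicts with 'word' (a piece; '▁' marks the start of a
    new word) and 'entity' (a punctuation mark, or '0' for none).
    The label of a word's last piece is used (an assumption).
    """
    words, labels = [], []
    for output in token_outputs:
        piece = output["word"]
        if piece.startswith("▁") or not words:
            words.append(piece.lstrip("▁"))
            labels.append(output["entity"])
        else:
            # Continuation piece: extend the current word and overwrite its label
            words[-1] += piece
            labels[-1] = output["entity"]
    return " ".join(word + ("" if label == "0" else label)
                    for word, label in zip(words, labels))

# Hypothetical pipeline output for "dear local newspaper i think"
fake_output = [
    {"word": "▁dear", "entity": "0"},
    {"word": "▁local", "entity": "0"},
    {"word": "▁news", "entity": "0"},
    {"word": "paper", "entity": ","},
    {"word": "▁i", "entity": "0"},
    {"word": "▁think", "entity": "."},
]
print(restore_punctuation(fake_output))  # dear local newspaper, i think.
```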

#### Counting Number Of Punctuations

In [None]:
def Count_Punctuations(Essay):
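  """
    Counts full stops, exclamation marks, commas, question marks and
    standalone hyphens in an essay (a hyphen inside a word such as
    "hand-eye" stays part of the NLTK token and is not counted)

    Args:
      Essay: Essay of each student

    Returns:
      (full stops, exclamation marks, commas, question marks, hyphens),
      a tuple of five ints
  """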
  count_fullstops = 0
  count_exclamation = 0
  count_comma = 0
  count_hyphens = 0
  count_questionmark = 0

  tokens = word_tokenize(Essay)

  for word in tokens:
    if word == ".":
      count_fullstops += 1
    elif word == "!":
      count_exclamation += 1
    elif word == "?":
      count_questionmark += 1
    elif word == ",":
      count_comma += 1
    elif word == "-":
      count_hyphens += 1

  return count_fullstops , count_exclamation , count_comma , count_questionmark , count_hyphens

In [None]:
Data_Essay_01["Count_Fullstops"] , Data_Essay_01["Count_Exclamation"] , Data_Essay_01["Count_Comma"] , Data_Essay_01["Count_Questionmark"] , Data_Essay_01["Count_Hyphens"] = zip(*Data_Essay_01["Essay"].map(Count_Punctuations))
Data_Essay_01.sample()
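Counting tokens only finds standalone marks. Counting characters in the raw text is a quick cross-check; as a sketch, note that it also counts hyphens inside words such as "hand-eye", so hyphen numbers will differ from `Count_Punctuations`. The helper name is hypothetical.

```python
from collections import Counter

# Marks reported by Count_Punctuations, in the same order
MARKS = (".", "!", ",", "?", "-")

def count_punctuation_characters(essay: str):
    """Count each punctuation character directly in the raw text."""
    counts = Counter(essay)
    return tuple(counts[mark] for mark in MARKS)

sample = "Thing about! Dont you think so? Well now - there's hand-eye coordination, etc."
print(count_punctuation_characters(sample))  # (1, 1, 1, 1, 2)
```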

#### Checking Capitalization Mistakes

In [None]:
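def Check_Capitalization(Essay):
  """
    Counts capitalization errors in an essay (heuristic)

    A word counts as an error when:
      * it starts a sentence (after '.', '!' or '?') or a quotation but begins
        with a lowercase letter,
      * it appears mid-sentence and begins with an uppercase letter although it
        is neither the pronoun "I" nor tagged as a proper noun (NNP/NNPS), or
      * it is tagged as a proper noun but begins with a lowercase letter.
    NLTK's tagger uses capitalization as a cue, so lowercase proper nouns are
    often tagged as common nouns and slip through.

    Args:
      Essay: Essay of each student

    Returns:
      int
  """
  # Drop placeholders such as @CAPS1 first: word_tokenize would split them
  # into '@' and 'CAPS1', and the capitalized half would count as an error
  words = word_tokenize(Remove_NER(Essay))
  tagged_words = pos_tag(words)

  sentence_enders = {'.', '!', '?'}
  # word_tokenize rewrites an opening double quote as ``
  quote_openers = {'``', '"'}
  proper_noun_tags = {'NNP', 'NNPS'}

  count = 0
  at_start = True  # the first word of the essay starts a sentence
  for word, pos in tagged_words:
    if word in sentence_enders or word in quote_openers:
      at_start = True
      continue
    if not word[0].isalpha():
      # Commas, numbers, closing quotes and pieces such as 's are skipped
      continue
    if at_start:
      if word[0].islower():
        count = count + 1
      at_start = False
    elif word[0].isupper() and word != 'I' and pos not in proper_noun_tags:
      count = count + 1
    elif word[0].islower() and pos in proper_noun_tags:
      count = count + 1

  return count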

In [None]:
Data_Essay_01["Capitalization_Errors"] = Data_Essay_01["Essay"].apply(Check_Capitalization)
Data_Essay_01.sample()

### **Grammar Error Detection**

In [None]:
# from nltk.translate.bleu_score import sentence_bleu
# reference = result.text.split()

# candidate = 'Dear local newspaper, @CAPS1 best friend, @LOCATION2, was once a nerd with no hand-eye coordination, @CAPS2, he started to use a computer and now he has better hand-eye coordination than me.'.split()
# print('BLEU score -> {}'.format(sentence_bleu(reference, candidate )))

In [None]:
df1 = Data_Essay_01[['Essay_SpellingCorrected_LT', 'Sent_Count']].copy()   # a real copy, so adding columns to df1 raises no SettingWithCopyWarning
# df1['Essay_Spelling_Corrected_LT'] = df1['Essay_SpellingCorrected_LT'].apply(Remove_White_Spaces)   # to avoid whitespace error
df1.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


| Unnamed: 0 | Essay_SpellingCorrected_LT | Sent_Count | Essay_Spelling_Corrected_LT |
|---|---|---|---|
| 0 | Dear local newspaper, I think effects computer... | 16 | Dear local newspaper, I think effects computer... |
| 1 | Dear @CAPS1 @CAPS2, I believe that using compu... | 20 | Dear @CAPS1 @CAPS2, I believe that using compu... |
| 2 | Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl... | 14 | Dear, @CAPS1 @CAPS2 @CAPS3 More and more peopl... |
| 3 | Dear Local Newspaper, @CAPS1 I have found that... | 27 | Dear Local Newspaper, @CAPS1 I have found that... |
| 4 | Dear @LOCATION1, I know having computers has a... | 30 | Dear @LOCATION1, I know having computers has a... |


In [None]:
def Grammar_Errors(essays):
    """
      Finds LanguageTool matches in the GRAMMAR category

      Args:
        essays: Essay of each student

      Returns:
        (number of grammar errors, list of the matched rule IDs)
    """
    matches = tool.check(essays)
    matches = [match for match in matches if match.category == 'GRAMMAR']
    # ruleId names the specific rule (e.g. 'HE_VERB_AGR'); match.category gives the broader category
    errors = [match.ruleId for match in matches]
    return len(matches), errors

In [None]:
Grammar_Errors("Dear local newspaper, I think effects computers have on people are great learning skills/affects because they give us time to chat with friends/new people, helps us learn about the globe(astronomy) and keeps us out of troble! Thing about! Dont you think so? How would you feel if your teenager is always on the phone with friends! Do you ever time to chat with your friends or buisness partner about things. Well now - there's a new way to chat the computer, theirs plenty of sites on the internet to do so: @ORGANIZATION1, @ORGANIZATION2, @CAPS1, facebook, myspace ect. Just think now while your setting up meeting with your boss on the computer, your teenager is having fun on the phone not rushing to get off cause you want to use it. How did you learn about other countrys/states outside of yours? Well I have by computer/internet, it's a new way to learn about what going on in our time! You might think your child spends a lot of time on the computer, but ask them so question about the economy, sea floor spreading or even about the @DATE1's you'll be surprise at how much he/she knows. Believe it or not the computer is much interesting then in class all day reading out of books. If your child is home on your computer or at a local library, it's better than being out with friends being fresh, or being perpressured to doing something they know isnt right. You might not know where your child is, @CAPS2 forbidde in a hospital bed because of a drive-by. Rather than your child on the computer learning, chatting or just playing games, safe and sound in your home or community place. Now I hope you have reached a point to understand and agree with me, because computers can have great effects on you or child because it gives us time to chat with friends/new people, helps us learn about the globe and believe or not keeps us out of troble. Thank you for listening.")

(2, ['CAUSE_BECAUSE', 'BE_VBP_IN'])

In [None]:
Data_Essay_01['Grammar_Error_Count'], Data_Essay_01['Grammar_Error_List'] = zip(*Data_Essay_01['Essay_SpellingCorrected_LT'].map(Grammar_Errors))

In [None]:
Data_Essay_01.columns

Index(['ID', 'Essay', 'Rater_1 Score', 'Rater_2 Score', 'Total Score',
       'Sent_Count', 'Word_Count', 'Char_Count', 'Avg_Word_Count',
       'Preprocessed_Essay', 'Verb_Count', 'Noun_Count', 'Adj_Count',
       'Conj_Count', 'Adverb_Count', 'pNoun_Count', 'Essay_Corrected',
       'Essay_Correct_LT', 'Grammar_Errors', 'Grammar_Error_List',
       'Essay_SpellingCorrected_LT', 'Grammar_Error_Count',
       'Essay_NoWhiteSpace'],
      dtype='object')

In [None]:
out = Data_Essay_01['Grammar_Error_List'].explode().value_counts()
out

HE_VERB_AGR             228
A_NNS                   139
ITS_TO_IT_S              96
MD_BASEFORM              91
BEEN_PART_AGREEMENT      86
                       ... 
SIMPLE_TO_USE_HYPHEN      1
GOOD_GOOF                 1
DO_YOU_FASCINATED         1
DOES_YOU                  1
SHUTDOWN                  1
Name: Grammar_Error_List, Length: 324, dtype: int64
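`explode()` turns each list of rule IDs into one row per ID and `value_counts()` tallies them (the NaN that `explode()` produces for an empty list is dropped by default). The same aggregation in plain Python, on a few hypothetical per-essay lists, is sketched below.

```python
from collections import Counter
from itertools import chain

# Hypothetical Grammar_Error_List values for three essays
grammar_error_lists = [
    ["CAUSE_BECAUSE", "BE_VBP_IN"],
    ["HE_VERB_AGR", "A_NNS", "HE_VERB_AGR"],
    [],  # an essay without grammar matches contributes nothing
]

# Flatten all lists and count how often each rule fired across the essay set
rule_counts = Counter(chain.from_iterable(grammar_error_lists))
print(rule_counts.most_common(2))  # [('HE_VERB_AGR', 2), ('CAUSE_BECAUSE', 1)]
```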

In [None]:
out.to_csv('GrammarErrors.csv')

In [None]:
features = Data_Essay_01[['Sent_Count', 'Word_Count', 'Char_Count', 'Avg_Word_Count','Verb_Count', 'Noun_Count', 'Adj_Count',
       'Conj_Count', 'Adverb_Count', 'pNoun_Count', 'Grammar_Error_Count', 'Grammar_Error_List']]
features.to_csv("EssaySet01_Features.csv")     

### **Lexical Sophistication**

In [None]:
!pip install taaled
# Resources for lexical sophistication:
# https://eli-data-mining-group.github.io/Pitt-ELI-Corpus/publications/Naismith_2019.pdf
# https://pypi.org/project/taaled/
# https://github.com/LCR-ADS-Lab/pylats
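taaled computes lexical diversity indices such as the moving-average type-token ratio (MATTR). As a sketch of what such an index measures, not of taaled's API, the code below computes the plain type-token ratio and MATTR, which averages the type-token ratio over every window of a fixed number of consecutive tokens and is therefore less sensitive to essay length. The default window of 50 tokens is an assumption here, and tokens should be lower-cased beforehand.

```python
def type_token_ratio(tokens):
    """Unique tokens divided by total tokens (0.0 for an empty list)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mattr(tokens, window=50):
    """Moving-average type-token ratio over windows of `window` tokens.

    Falls back to the plain TTR when the text is shorter than one window.
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [type_token_ratio(tokens[i:i + window])
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

tokens = "the computer helps the student and the computer helps the teacher".split()
print(round(type_token_ratio(tokens), 3))  # 0.545 (6 types / 11 tokens)
print(mattr(tokens, window=4))
```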

## **Content**

This section will cover:


*   Latent Semantic Analysis (LSA)


### **Latent Semantic Analysis (LSA)**

Content analysis typically means a high-level semantic analysis in which each essay is compared with the source text and with essays that have already been graded.
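LSA can be sketched as: build a term-document matrix, keep the top k singular vectors of its SVD, and compare documents by cosine similarity in that reduced space. The sketch below uses raw term counts and NumPy only; a real pipeline would typically use TF-IDF weighting and many more documents, k = 2 is purely illustrative, and the graded essays and the new essay are hypothetical.

```python
import numpy as np

def term_document_matrix(documents):
    """Rows are vocabulary terms, columns are documents, cells are raw counts."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = np.zeros((len(vocabulary), len(documents)))
    for j, tokens in enumerate(tokenized):
        for word in tokens:
            matrix[index[word], j] += 1
    return matrix

def lsa_document_vectors(matrix, k):
    """Coordinates of each document in the top-k latent dimensions (rows of V_k S_k)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

graded = [
    "computers help students learn about the world",
    "computers let people chat with friends online",
    "people should exercise and spend time outside",
]
new_essay = "students learn about the world with computers"

vectors = lsa_document_vectors(term_document_matrix(graded + [new_essay]), k=2)
similarities = [cosine(vectors[-1], v) for v in vectors[:-1]]
print([round(s, 3) for s in similarities])  # the first graded essay scores highest
```

The similarities to graded essays could then be used, for example, to weight their scores when estimating a score for the new essay.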

## **Semantic**
Semantic metrics assess whether the meaning an essay conveys is correct.

## **Semantic Coherence & Consistency**

## **Connectivity**

## **Readability Scores**

This section will cover:
1.   Flesch Reading Ease
2.   Flesch-Kincaid Grade Level
3.   Gunning Fog Index
4.   Dale-Chall Readability Formula
5.   Shannon Entropy
6.   Simpson's Index

In [None]:
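def Flesch_Reading_Ease_Score(scount, NoOfsentences, total_Words):
  """
    Flesch Reading Ease
      = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    Args:
      scount: number of syllables
      NoOfsentences: number of sentences
      total_Words: number of words

    Returns:
      float, or a pandas Series when Series are passed
  """
  return 206.835 - 1.015 * (total_Words / NoOfsentences) - 84.6 * (scount / total_Words)

In [None]:
# The formula needs syllable counts, which are not stored in a column earlier on
Data_Essay_01['Syllable_Count'] = Data_Essay_01['Essay'].apply(Syllable_Count)

# Column-wise arithmetic scores every essay at once and creates the new column.
# Word_Count holds NLTK token counts, so punctuation marks are counted as words here.
Data_Essay_01['Flesch_Reading_Score'] = Flesch_Reading_Ease_Score(
    Data_Essay_01['Syllable_Count'], Data_Essay_01['Sent_Count'], Data_Essay_01['Word_Count'])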
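Several of the remaining scores in the list can be computed from the same kind of counts. The sketch below implements the standard Flesch-Kincaid Grade Level and Gunning Fog formulas, plus Shannon entropy and Simpson's index over an essay's word-frequency distribution. Counting "complex words" (usually words with three or more syllables) is left to the caller, Dale-Chall is omitted because it needs a list of familiar words, and Simpson's index uses the without-replacement form; all function names are hypothetical.

```python
import math
from collections import Counter

def flesch_kincaid_grade(syllables, sentences, words):
    # 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(complex_words, sentences, words):
    # complex_words: number of words with three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def shannon_entropy(tokens):
    """Entropy (bits) of the word-frequency distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def simpsons_index(tokens):
    """Probability that two words drawn without replacement are the same word."""
    counts = Counter(tokens)
    total = len(tokens)
    if total < 2:
        return 0.0
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))

tokens = "the cat and the dog".split()
print(round(shannon_entropy(tokens), 3))  # 1.922
print(simpsons_index(tokens))             # 0.1
```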