# **Introduction**

This project provides a starting point for building your own auto-grading system. It aims to cut the tedious hours spent correcting students' answers on tests and exams. The module compares each student answer with the corresponding correct answer using TF-IDF weighting and cosine similarity, and reports the final grade for a test within seconds.

## **Installing necessary Packages**



In [None]:
!pip install nltk 



In [None]:
!pip install gensim



## **Importing necessary Packages**

In [None]:
import nltk
import gensim
import numpy as np
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data required by newer NLTK releases

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
from nltk.tokenize import word_tokenize, sent_tokenize

**Tokenization** is a necessary first step in many NLP tasks (word counting, corpus generation, spell checking, etc.).

The method **word_tokenize()** is used to split a sentence into words, as shown in the example below. Similarly, **sent_tokenize()** splits a text containing two or more sentences into individual sentences.

In [None]:
data = "i am an indian"
print(word_tokenize(data))

['i', 'am', 'an', 'indian']


## **Sample Text Files**

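**Note:** These files are created only to simplify testing. Users are expected to supply their own text files for the **Correct Answers** and the **Students' Answers**. Each sentence is treated as one answer, and answers are matched by position, so both files must list the answers in the same question order.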

In [None]:
#creating txt sample for correct answer

with open('predicted_answers.txt', 'w') as writefile:
  writefile.write("Saturn is yellow planet.")
  writefile.write("\nMars is the second smallest planet in our Solar system.")
  writefile.write("\nSaturn is the sixth planet from the Sun.")

In [None]:
#creating txt Sample 1 for students' answer

with open('student_answers.txt', 'w') as writefile:
  writefile.write("Mars is the fourth planet in our solar system.")
  writefile.write("\nIt is second-smallest planet in the Solar System after Mercury.")
  writefile.write("\nSaturn is yellow planet.") 

In [None]:
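#creating txt Sample 2 for students' answer
# Note: this overwrites Sample 1. Run this cell instead of the previous one to grade
# Sample 2; the outputs shown below were produced with Sample 1.

with open('student_answers.txt', 'w') as writefile: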
  writefile.write("Mars is the second smallest planet in our Solar system.")
  writefile.write("\nSaturn is the sixth planet from the Sun.")
  writefile.write("\nSaturn is yellow planet.")

## **Tokenization of student answers**

**Step 1:** Splitting the student's answer file into individual answers using sent_tokenize(); each sentence is treated as one answer.

In [None]:
ans_doc= []

with open('student_answers.txt') as f:
  tokens= sent_tokenize(f.read())
  for line in tokens:
    ans_doc.append(line)

In [None]:
print(ans_doc)
print("\nTotal number of answers: ", len(ans_doc))

['Mars is the fourth planet in our solar system.', 'It is second-smallest planet in the Solar System after Mercury.', 'Saturn is yellow planet.']

Total number of answers:  3


**Step 2:** Tokenizing each answer into lowercase words; these token lists are used to build the dictionary in Step 3.

In [None]:
ans_dict = [[w.lower() for w in word_tokenize(text)] 
            for text in ans_doc]

In [None]:
print(ans_dict)

[['mars', 'is', 'the', 'fourth', 'planet', 'in', 'our', 'solar', 'system', '.'], ['it', 'is', 'second-smallest', 'planet', 'in', 'the', 'solar', 'system', 'after', 'mercury', '.'], ['saturn', 'is', 'yellow', 'planet', '.']]


**Step 3:** Giving a unique ID to each word with gensim's `Dictionary`

In [None]:
ans_id = gensim.corpora.Dictionary(ans_dict)
print(ans_id.token2id)

{'.': 0, 'fourth': 1, 'in': 2, 'is': 3, 'mars': 4, 'our': 5, 'planet': 6, 'solar': 7, 'system': 8, 'the': 9, 'after': 10, 'it': 11, 'mercury': 12, 'second-smallest': 13, 'saturn': 14, 'yellow': 15}
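The ID assignment above can be reproduced in plain Python. This is a minimal sketch that assumes gensim assigns IDs the way the output suggests: answers are processed in order, and each answer's previously unseen words receive the next free IDs in sorted order. `build_token2id` is a hypothetical helper, not part of gensim.

```python
def build_token2id(docs):
    """Hypothetical helper: give each unseen token the next free integer ID.

    New tokens of each document are added in sorted order, which matches
    the mapping printed by gensim.corpora.Dictionary above.
    """
    token2id = {}
    for doc in docs:
        for token in sorted(set(doc).difference(token2id)):
            token2id[token] = len(token2id)
    return token2id


ans_dict = [
    ['mars', 'is', 'the', 'fourth', 'planet', 'in', 'our', 'solar', 'system', '.'],
    ['it', 'is', 'second-smallest', 'planet', 'in', 'the', 'solar', 'system', 'after', 'mercury', '.'],
    ['saturn', 'is', 'yellow', 'planet', '.'],
]
print(build_token2id(ans_dict))  # same mapping as the gensim output above
```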


**Step 4:** Creating a bag of words.
`doc2bow()` converts each tokenized answer into a list of `(word_id, count)` pairs, i.e. how often each dictionary word occurs in that answer.

In [None]:
ans_corpus= [ans_id.doc2bow(a) for a in ans_dict]
print(ans_corpus)

[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)], [(0, 1), (2, 1), (3, 1), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1)], [(0, 1), (3, 1), (6, 1), (14, 1), (15, 1)]]
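The bag-of-words step can be sketched with `collections.Counter`: count each token, look up its ID, and return `(id, count)` pairs sorted by ID. Tokens missing from the dictionary are skipped, which matters later when the correct answers are converted with the student dictionary. `doc2bow_sketch` is a hypothetical stand-in for `Dictionary.doc2bow()`, and the `token2id` literal is copied from the Step 3 output.

```python
from collections import Counter

# Mapping copied from the Step 3 output
token2id = {'.': 0, 'fourth': 1, 'in': 2, 'is': 3, 'mars': 4, 'our': 5,
            'planet': 6, 'solar': 7, 'system': 8, 'the': 9, 'after': 10,
            'it': 11, 'mercury': 12, 'second-smallest': 13, 'saturn': 14,
            'yellow': 15}


def doc2bow_sketch(tokens, token2id):
    """Hypothetical stand-in for doc2bow(): (id, count) pairs sorted by id.

    Tokens that are not in token2id are ignored.
    """
    counts = Counter(t for t in tokens if t in token2id)
    return sorted((token2id[t], n) for t, n in counts.items())


print(doc2bow_sketch(['saturn', 'is', 'yellow', 'planet', '.'], token2id))
# [(0, 1), (3, 1), (6, 1), (14, 1), (15, 1)]
print(doc2bow_sketch(['saturn', 'is', 'the', 'sixth', 'planet', 'from', 'the', 'sun', '.'], token2id))
# [(0, 1), (3, 1), (6, 1), (9, 2), (14, 1)]  ('sixth', 'from', 'sun' are dropped)
```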


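**Step 5:** TF-IDF (term frequency-inverse document frequency).
Assigns a weight to each word: the weight grows with how often the word occurs in an answer and shrinks the more answers contain it. Words that occur in every answer (here `.`, `is` and `planet`) get a weight of zero and are omitted from the output.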

In [None]:
tf_idf1 = gensim.models.TfidfModel(ans_corpus)

for doc in tf_idf1[ans_corpus]:
    print([[ans_id[id], np.around(freq, decimals=2)] for id, freq in doc])

[['fourth', 0.53], ['in', 0.2], ['mars', 0.53], ['our', 0.53], ['solar', 0.2], ['system', 0.2], ['the', 0.2]]
[['in', 0.17], ['solar', 0.17], ['system', 0.17], ['the', 0.17], ['after', 0.47], ['it', 0.47], ['mercury', 0.47], ['second-smallest', 0.47]]
[['saturn', 0.71], ['yellow', 0.71]]
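The weights above can be reproduced by hand. This sketch assumes gensim's default `TfidfModel` settings, which, as far as we can tell, multiply each word's raw count by log2(N / df) (N answers, df answers containing the word) and then scale each answer's vector to unit length; the resulting numbers match the output above. A word found in all three answers gets log2(3 / 3) = 0, which is why `.`, `is` and `planet` disappear.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sketch of TF-IDF: raw count * log2(N / df), L2-normalised, zero weights dropped."""
    n_docs = len(docs)
    # df: number of documents that contain each word
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        weights = {w: tf * math.log2(n_docs / df[w]) for w, tf in Counter(doc).items()}
        weights = {w: x for w, x in weights.items() if x != 0.0}
        norm = math.sqrt(sum(x * x for x in weights.values()))
        vectors.append({w: x / norm for w, x in weights.items()} if norm else {})
    return vectors


ans_dict = [
    ['mars', 'is', 'the', 'fourth', 'planet', 'in', 'our', 'solar', 'system', '.'],
    ['it', 'is', 'second-smallest', 'planet', 'in', 'the', 'solar', 'system', 'after', 'mercury', '.'],
    ['saturn', 'is', 'yellow', 'planet', '.'],
]
for vec in tfidf_vectors(ans_dict):
    # Same rounded weights as above, listed in token order rather than ID order
    print({w: round(x, 2) for w, x in vec.items()})
```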


## **Sentence Tokenization of Predicted Answers (correct)**

**Step 1:** Splitting the correct (predicted) answers into individual answers, one per question, using sent_tokenize().

In [None]:
pred_doc= []

with open('predicted_answers.txt') as f:
  tokens= sent_tokenize(f.read())
  for line in tokens:
    pred_doc.append(line)

In [None]:
print(pred_doc)
print("\nTotal number of answers: ", len(pred_doc))

['Saturn is yellow planet.', 'Mars is the second smallest planet in our Solar system.', 'Saturn is the sixth planet from the Sun.']

Total number of answers:  3


## **Checking Similarity**

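**Creating the Similarity object:** the index stores the TF-IDF vectors of the student answers; querying it with a TF-IDF vector returns that vector's cosine similarity to each student answer.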

In [None]:
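# 'sample_data/' is the prefix for the index shard files gensim writes to disk
# (that folder exists by default on Google Colab; use any writable path elsewhere)
sims = gensim.similarities.Similarity('sample_data/', tf_idf1[ans_corpus], num_features=len(ans_id))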

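**Similarity Algorithm:** each correct answer is tokenized, converted to a bag of words with the dictionary built from the *student* answers, weighted with the same TF-IDF model, and compared against every student answer. Words that never occur in the student answers have no ID and are silently dropped, e.g. `second` and `smallest` in the second answer below.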

In [None]:
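arr = []
for line in pred_doc:
    # Tokenize the correct answer and map it into the student-answer dictionary
    pred = [w.lower() for w in word_tokenize(line)]
    pred_corpus = ans_id.doc2bow(pred)  # words missing from ans_id are ignored
    print("Tokenized words for predicted ans:\n", pred)
    print("Bag of Words with frequency:\n", pred_corpus)
    # Weight with the TF-IDF model and query the index once
    pred_tfidf = tf_idf1[pred_corpus]
    sim = sims[pred_tfidf]  # cosine similarity to each student answer
    print("Similarity:", sim, "\n")
    arr.append(sim)
print(arr)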


Tokenized words for predicted ans:
 ['saturn', 'is', 'yellow', 'planet', '.']
Bag of Words with frequency:
 [(0, 1), (3, 1), (6, 1), (14, 1), (15, 1)]
Similarity: [0.         0.         0.99999994] 

Tokenized words for predicted ans:
 ['mars', 'is', 'the', 'second', 'smallest', 'planet', 'in', 'our', 'solar', 'system', '.']
Bag of Words with frequency:
 [(0, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
Similarity: [0.8472902  0.16020904 0.        ] 

Tokenized words for predicted ans:
 ['saturn', 'is', 'the', 'sixth', 'planet', 'from', 'the', 'sun', '.']
Bag of Words with frequency:
 [(0, 1), (3, 1), (6, 1), (9, 2), (14, 1)]
Similarity: [0.11641413 0.10281226 0.56890744] 

[array([0.        , 0.        , 0.99999994], dtype=float32), array([0.8472902 , 0.16020904, 0.        ], dtype=float32), array([0.11641413, 0.10281226, 0.56890744], dtype=float32)]
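Each similarity value is a cosine similarity between TF-IDF vectors. Because the vectors are scaled to unit length, the cosine reduces to a dot product over the words two answers share. The sketch below recomputes the second row of the matrix above in plain Python; it reuses the TF-IDF assumptions from Step 5, learns the IDF values from the student answers only (as `tf_idf1` does), and drops query words the student answers never use. The helper names `tfidf` and `cosine` are hypothetical.

```python
import math
from collections import Counter

student = [
    ['mars', 'is', 'the', 'fourth', 'planet', 'in', 'our', 'solar', 'system', '.'],
    ['it', 'is', 'second-smallest', 'planet', 'in', 'the', 'solar', 'system', 'after', 'mercury', '.'],
    ['saturn', 'is', 'yellow', 'planet', '.'],
]
# IDF learned from the student answers only: log2(N / df)
df = Counter(w for doc in student for w in set(doc))
idf = {w: math.log2(len(student) / n) for w, n in df.items()}


def tfidf(tokens):
    """Unit-length TF-IDF vector; unknown words and zero-weight words are dropped."""
    weights = {w: tf * idf[w] for w, tf in Counter(tokens).items() if idf.get(w)}
    norm = math.sqrt(sum(x * x for x in weights.values()))
    return {w: x / norm for w, x in weights.items()} if norm else {}


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())


index = [tfidf(doc) for doc in student]
query = ['mars', 'is', 'the', 'second', 'smallest', 'planet', 'in', 'our', 'solar', 'system', '.']
row = [cosine(tfidf(query), doc_vec) for doc_vec in index]
print([round(s, 4) for s in row])
# [0.8473, 0.1602, 0.0]
```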



## **Final Grade**

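**Finding the Average Similarity of answers:** only the diagonal entries `arr[i][i]` are averaged, i.e. each correct answer is compared with the student answer in the same position. This is why the student's third answer, which matches the first correct answer almost exactly (similarity 0.99999994), contributes nothing to the grade.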

In [None]:
# Sum the diagonal: similarity of correct answer i to student answer i
temp = 0
for i in range(len(arr)):
  temp += arr[i][i]

avg_similarity = temp / len(arr)
print("Average Similarity= ", avg_similarity)

Average Similarity=  0.24303882817427316
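Written as a formula, with $s_{ij}$ the similarity between correct answer $i$ and student answer $j$ (`arr[i-1][j-1]` in the code) and $n$ the number of answers, the average above works out as:

$$
\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_{ii} = \frac{0 + 0.16020904 + 0.56890744}{3} = \frac{0.72911648}{3} \approx 0.2430
$$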


**Similarity in terms of percentage:**

In [None]:
perc_similarity= round(avg_similarity * 100)
print(perc_similarity,"%")

24 %


**Evaluation of Final Score:**

In [None]:
max_marks= int(input("Enter the maximum attainable marks on this test: "))

Enter the maximum attainable marks on this test: 50


In [None]:
student_score = (perc_similarity/100) * max_marks
print("The final score is ", student_score, "/", max_marks)

The final score is  12.0 / 50
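The grading steps above can be wrapped into a single function. This is a hypothetical helper (`final_score` is not part of the original notebook) that averages the diagonal of the similarity rows, rounds to a whole percentage as the notebook does, and scales by the maximum marks. The similarity values are copied from the output of the Similarity Algorithm cell.

```python
def final_score(sim_rows, max_marks):
    """Hypothetical helper: diagonal-average similarity -> rounded % -> marks."""
    n = len(sim_rows)
    # sim_rows[i][i]: correct answer i compared with student answer i
    avg_similarity = sum(sim_rows[i][i] for i in range(n)) / n
    perc_similarity = round(avg_similarity * 100)
    return (perc_similarity / 100) * max_marks


arr = [
    [0.0, 0.0, 0.99999994],
    [0.8472902, 0.16020904, 0.0],
    [0.11641413, 0.10281226, 0.56890744],
]
print(final_score(arr, 50))  # 12.0, matching the notebook output above
```

Passing `max_marks` explicitly keeps the score consistent with whatever maximum the user enters, instead of a fixed value.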


## **CSV for taking input answers**
An additional section that provides a way to collect answers as input and store them in a CSV file.

In [None]:
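# Creating the CSV file with a header row
import csv

# newline='' is what the csv module expects for files it writes;
# the default dialect matches the writer used in the next cell
with open('answers.csv', 'w', newline='') as csvfile:
    filewriter = csv.writer(csvfile)
    filewriter.writerow(['Answer'])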

In [None]:
# Taking input from the user for an answer
ans1 = input()
with open('answers.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    # input() reads a single line, so each run of this cell appends one answer as a new row
    writer.writerow(ans1.splitlines())

Hi, this is the first sample answer


In [None]:
# Displaying the contents of the answers.csv file
with open('answers.csv', 'r') as f:
    contents = f.read()
print(contents)

Answer
"Hi, this is the first sample answer"
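To feed answers stored this way into the grader, they can be read back with `csv.reader`. The sketch below assumes the layout produced above: a header row `Answer`, then one row per input with the answer in the first cell. It writes a throwaway copy in a temporary directory so it can run on its own; `load_answers` is a hypothetical helper name.

```python
import csv
import os
import tempfile


def load_answers(path):
    """Hypothetical helper: first cell of every non-empty row after the header."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the 'Answer' header row
        return [row[0] for row in reader if row]


# Build a small file with the same layout as answers.csv above
path = os.path.join(tempfile.mkdtemp(), 'answers.csv')
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Answer'])
    writer.writerow(['Hi, this is the first sample answer'])
    writer.writerow(['Mars is the fourth planet in our solar system.'])

print(load_answers(path))
```

The comma in the first answer makes the writer quote that cell, and `csv.reader` removes the quotes again, so the answer text comes back unchanged.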



# **Future Work**

The proposed module can be a useful aid in education, and it leaves considerable scope for further development, as discussed in this section.

---


Here are some points that can be considered to further optimize this project:


1. **Prediction of correct answers:** The proposed algorithm requires the user to provide a file containing the correct answers. Instead, the algorithm could predict the correct answers itself and then evaluate the results, further automating the entire process.

2. **Convert handwritten answers to text:** Ideally, answers handwritten by students would be converted to text, for example with handwriting recognition (OCR), and then evaluated with the NLP techniques described above.

3. **Designing an Interactive UI:** A web or mobile app that lets students take tests and receive instant results and performance feedback.



