sparecoder/PDF2Exam

 
 


PDF2Exam - PDF Examination Question Extraction Tool

Features

PDF2Exam is a tool for extracting cropped images from PDF exams. It can crop:

  • Question titles and question content
  • Answer option titles and answer option content
  • Correct answers
  • Explanations

Setup

Install Python 3.
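Install fitz. The fitz module is normally provided by the PyMuPDF package, so if the command below does not give you a working "import fitz", try "pip3 install PyMuPDF" instead.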

pip3 install fitz

Install all requirements.

pip3 install -r requirements.txt 

How to run

See the video on how to run the code.

Demo Video

Run the command below.

python app.py

Open "http://localhost:5000/" in a browser. Click "Choose file", select a PDF, and then click "Submit". PDF2Exam saves a copy of your file in the uploads folder.

Upload image

After the file has been processed, the cropped images are displayed as shown below.

Display image

Demo

Uncropped full question

Here is an image of a question before cropping. The result contains cropped images of the question's title, the question's content, and each answer option's title and content.

Uncropped full question image

Cropped Question's Content

The question's title is removed and only the question's content is kept.

Cropped Question

Cropped Question's Title

Only the question title word ("Câu", "Cau", "Bài", "Question") and the question number (1, 2, 3) are included.

Question Title

Cropped Answer Option's Title and Content

Answer option's title | Answer option's content
Cropped Title A       | Cropped Content A
Cropped Title B       | Cropped Content B
Cropped Title C       | Cropped Content C
Cropped Title D       | Cropped Content D

Cropped Explanation

The explanation image is cropped from the question's title to the end of that question's explanation.

Explain

Understanding the returned value

At the end of the extract_pdf function, a JSON object is returned. It contains the questions, answer options, question titles, correct answers, and explanations.

{
   "questions":data_question[0],
   "answers":data_question[1],
   "titles":data_question[2],
   "correct_options":correct_answers,
   "explains":coor_explains_result
}
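The five sections are keyed by the same question names ("question_1", "question_2", ...), so it can be convenient to regroup them per question. The following is a minimal sketch, assuming the returned JSON has already been parsed into a Python dict (for example with json.loads). The function name group_by_question is hypothetical and not part of PDF2Exam.

```python
# Hypothetical helper, not part of PDF2Exam: regroup the result per question.
SECTIONS = ("questions", "answers", "titles", "correct_options", "explains")


def group_by_question(result: dict) -> dict:
    """Return {question_key: {section_name: value}} for every section present."""
    grouped = {}
    for section in SECTIONS:
        # A section may be missing or empty, so fall back to an empty dict.
        for question_key, value in result.get(section, {}).items():
            grouped.setdefault(question_key, {})[section] = value
    return grouped


if __name__ == "__main__":
    sample = {
        "questions": {"question_1": [{"page": 0, "coor": []}]},
        "correct_options": {"question_1": "D"},
    }
    print(group_by_question(sample)["question_1"]["correct_options"])
```

Grouping this way keeps everything about one question in a single place, which is easier to iterate over than five parallel dictionaries.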

Question

Each question's value is a list of entries. Each entry contains the page the question is on and a "coor" list holding the question's coordinates (x0, y0, x1, y1) followed by a base64 image.

{
   "question_1":[
      {
         "page":0,
         "coor":[
            36.0,
            338.5330810546875,
            557.1121826171875,
            369.5158386230469,
            "data:<Base 64 image>"
         ]
      }
   ]
}
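To use a question entry, the coordinates and the image have to be separated again. The sketch below shows one way to do that. It assumes the image string follows the standard data URI layout "data:<mime type>;base64,<payload>", which the example above only shows as a placeholder. The names QuestionCrop, decode_data_uri, and parse_question are hypothetical.

```python
import base64
from typing import NamedTuple


class QuestionCrop(NamedTuple):
    page: int
    box: tuple          # (x0, y0, x1, y1)
    image_bytes: bytes  # decoded image data


def decode_data_uri(uri: str) -> bytes:
    """Decode a data URI, assuming the 'data:<mime>;base64,<payload>' layout."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


def parse_question(entries: list) -> list:
    """Turn the list stored under one question key into QuestionCrop records."""
    crops = []
    for entry in entries:
        # The first four items are coordinates, the fifth is the image string.
        x0, y0, x1, y1, uri = entry["coor"]
        crops.append(QuestionCrop(entry["page"], (x0, y0, x1, y1), decode_data_uri(uri)))
    return crops
```

The decoded bytes can then be written to a file opened in binary mode, using whatever image extension the data URI header indicates.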

Answer Options

Each answer option is returned as a list. The example below contains only one option. The option's list holds, in order, the option title's coordinates (x0, y0, x1, y1) and base64 image, the option content's coordinates (x0, y0, x1, y1) and base64 image, and the page the option is on.

{
   "question_1":{
      "options":[
         [
            54.0,
            381.00201416015625,
            68.66400146484375,
            397.62200927734375,
            "data:<Base 64 image>",
            68.66400146484375,
            369.5158386230469,
            189.02000427246094,
            409.63446044921875,
            "data:<Base 64 image>",
            0
         ]
      ]
   }
}
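Because each option is a flat list of 11 items, positional indexing is easy to get wrong. The sketch below unpacks one option into named fields, following the order described above. The Option class and parse_option function are hypothetical names, not part of PDF2Exam.

```python
from dataclasses import dataclass


@dataclass
class Option:
    title_box: tuple     # (x0, y0, x1, y1) of the option's title
    title_image: str     # base64 image string of the title
    content_box: tuple   # (x0, y0, x1, y1) of the option's content
    content_image: str   # base64 image string of the content
    page: int            # page the option is on


def parse_option(raw: list) -> Option:
    """Unpack one flat option list into an Option record."""
    if len(raw) != 11:
        raise ValueError(f"expected 11 items, got {len(raw)}")
    return Option(
        title_box=tuple(raw[0:4]),
        title_image=raw[4],
        content_box=tuple(raw[5:9]),
        content_image=raw[9],
        page=raw[10],
    )
```

For example, parse_option(result["answers"]["question_1"]["options"][0]).content_box gives the coordinates of the first option's content, where result is the parsed return value.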

Question's title

Question titles are returned in a dictionary. Each value contains the question title's coordinates (x0, y0, x1, y1) and a base64 image.

{
   "question_1":[
      36.0,
      341.0419921875,
      70.33200073242188,
      357.6620178222656,
      "data:<Base 64 image>"
   ]
}

Correct Answer

Correct answers are returned in a dictionary. The key is the question and the value is the correct answer to that question.

{
   "question_1":"D",
   "question_2":"C",
   "question_3":"B"
}
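The correct answer letter can be combined with the answer options to find the matching option. The sketch below assumes that the options of each question are stored in the order A, B, C, D, which this README does not state explicitly. The function name correct_option is hypothetical.

```python
from string import ascii_uppercase


def correct_option(answers: dict, correct_options: dict, question: str) -> list:
    """Return the option list matching the correct answer letter of a question.

    Assumes options are stored in alphabetical order (A first).
    """
    letter = correct_options[question].strip().upper()
    if len(letter) != 1 or letter not in ascii_uppercase:
        raise ValueError(f"unexpected answer label: {letter!r}")
    index = ascii_uppercase.index(letter)  # "A" -> 0, "B" -> 1, ...
    options = answers[question]["options"]
    if index >= len(options):
        raise IndexError(f"{question} has no option {letter}")
    return options[index]
```

If the assumption about option order does not hold for a given PDF, the lookup returns the wrong option, so it is worth checking the result against the cropped title image.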

Explanation

Explanations are returned in a dictionary. The key is the question and the value contains the coordinates (x0, y0, x1, y1) and a base64 image.

{
   "question_1":[
      106.22000122070312,
      44.600006103515625,
      457.7799987792969,
      372.468994140625,
      "data:<Base 64 image>"
   ]
}
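The coordinates describe a rectangle, so the size of a cropped region follows directly from them. The sketch below computes width and height from a value such as the one above. It assumes (x0, y0) is the top-left corner and (x1, y1) the bottom-right corner, and the function name box_size is hypothetical. The same helper works for question titles, which use the same layout.

```python
def box_size(value: list) -> tuple:
    """Return (width, height) of a [x0, y0, x1, y1, image] value.

    Assumes (x0, y0) is the top-left and (x1, y1) the bottom-right corner.
    """
    x0, y0, x1, y1 = value[:4]  # the fifth item is the base64 image string
    return (x1 - x0, y1 - y0)


if __name__ == "__main__":
    explanation = [
        106.22000122070312,
        44.600006103515625,
        457.7799987792969,
        372.468994140625,
        "data:<Base 64 image>",
    ]
    width, height = box_size(explanation)
    print(f"{width:.2f} x {height:.2f}")
```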
