PDF2Exam - PDF Examination Question Extraction Tool

Features

PDF2Exam is a convenient tool to extract images in pdf tests. PDF2Exam can cut:

Question's title and question's content
Answer options's title and Answer options's content
Correct answers
Explanation

Setup

Install Python 3.
Install fitz.

pip3 install fitz

Install all requirements.

pip3 install -r requirements.txt

How to run

See the video on how to run the code.

Run the command below.

python app.py

Go to "http://localhost:5000/". Click "Choose file" and then hit "Submit." PDF2Exam will make a copy of your file in the folder uploads.

After the file finishes processing, you can view the cropped images like below.

Demo

Uncropped full question

Here is image of a question before cutting. The result will have cropped images of question's title, question's content, answer option's title and answer option's content.

Cropped Question's Content

Question's title will be deleted. Question's content are kept.

Cropped Question's Title

Only question title word ("Câu", "Cau", "Bài", "Question") and question number (1,2,3) are included.

Cropped answer Option's Title and Content

Answer Option's title	Answer Option's content

Cropped Explanation

The explanation image will be cropped from the title of the question to the end of that question's explanation.

Understanding returned value

At the end of function extract_pdf, the json is returned. The returned values includes information about questions, answers, question's title, and correct answers.

{
   "questions":"data_question"[0],
   "answers":"data_question"[1],
   "titles":"data_question"[2],
   "correct_options":"correct_answers",
   "explains":"coor_explains_result"
}

Question

The value of each question contains the page the question is on and the question' coordinates (x0, y0,x1,y1) and base 64 image.

{
   "question_1":[
      {
         "page":0,
         "coor":[
            36.0,
            338.5330810546875,
            557.1121826171875,
            369.5158386230469,
            "data:<Base 64 image>"
         ]
      }
   ]
}

Answer Options

The answer options will have each option as a list. For the example below, there is only one options. The option's list includes option's title's coordinates (x0,y0,x1,y1) and base 64 image, option's content's coordinates (x0,y0,x1,y1) and base 64 image, and the page the option is on.

{
   "question_1":{
      "options":[
         [
            54.0,
            381.00201416015625,
            68.66400146484375,
            397.62200927734375,
            "data:<Base 64 image>",
            68.66400146484375,
            369.5158386230469,
            189.02000427246094,
            409.63446044921875,
            "data:<Base 64 image>",
            0
         ]
      ]
   }
}

Question's title

Question's title will be returned in a dictionary. The value contains question's title's coordinates (x0,y0,x1,y1) and a base 64 image.

{
   "question_1":[
      36.0,
      341.0419921875,
      70.33200073242188,
      357.6620178222656,
      "data:<Base 64 image>"
   ]
}

Correct Answer

Correct Answers will be returned in a dictionary. The key is the question and the value is the correct answer to that question

{
   "question_1":"D",
   "question_2":"C",
   "question_3":"B"
}

Explanation

Explanation will be returned in a dictionary. The key is the question and the value is the coordinates (x0,y0,x1,y1) and a base 64 image.

{
   "question_1":[
      106.22000122070312,
      44.600006103515625,
      457.7799987792969,
      372.468994140625,
      "data:<Base 64 image>"
   ]
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
extractPDF		extractPDF
images		images
pdf-example		pdf-example
static		static
templates		templates
uploads		uploads
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PDF2Exam - PDF Examination Question Extraction Tool

Table of Contents

Features

Setup

How to run

Demo

Uncropped full question

Cropped Question's Content

Cropped Question's Title

Cropped answer Option's Title and Content

Cropped Explanation

Understanding returned value

Question

Answer Options

Question's title

Correct Answer

Explanation

About

Uh oh!

Releases

Packages

Languages

License

sparecoder/PDF2Exam

Folders and files

Latest commit

History

Repository files navigation

PDF2Exam - PDF Examination Question Extraction Tool

Table of Contents

Features

Setup

How to run

Demo

Uncropped full question

Cropped Question's Content

Cropped Question's Title

Cropped answer Option's Title and Content

Cropped Explanation

Understanding returned value

Question

Answer Options

Question's title

Correct Answer

Explanation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages