Sparktech-Hackathon-Textract

Extract information from PDFs: turn unstructured data into structured data. http://www.sparktech.ro/textract/

Dependencies: Python 2.7.x. Libraries: sklearn.svm (scikit-learn), glob (Python standard library), numpy.

How to run the project: run the "main.py" Python script (`python main.py`). The script outputs the tables found in the test PDF files.

Project files:

- .gitignore
- LICENSE
- README.md
- calculate_distance_to_neighbours.py
- calculate_mean_distance.py
- calculate_row_length.py
- extractXML.py
- get_label_for_row.py
- get_training_and_test_sets.py
- has_bold_word.py
- main.py
- read_pdf_from_json.py
- sort_row_by_x.py
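The file names suggest that each text row of a PDF is described by a few layout features (word order, row length, spacing between words, bold text) before an SVM from sklearn.svm labels it as table or non-table. The sketch below is a hypothetical illustration of such row features, not the repository's code: the `Word` record (text, x position, width, bold flag) and the exact definitions of each feature are assumptions. It runs on both Python 2.7 and Python 3.

```python
from collections import namedtuple

# Hypothetical word record; the real input format used by the project is not documented here.
Word = namedtuple("Word", ["text", "x", "width", "bold"])


def sort_row_by_x(row):
    """Return the words of a row ordered left to right by their x position."""
    return sorted(row, key=lambda w: w.x)


def calculate_row_length(row):
    """Number of words in the row (assumed definition)."""
    return len(row)


def calculate_mean_distance(row):
    """Mean horizontal gap between consecutive words; 0.0 for rows with fewer than 2 words."""
    words = sort_row_by_x(row)
    if len(words) < 2:
        return 0.0
    # Gap = start of the next word minus the end of the current word.
    gaps = [b.x - (a.x + a.width) for a, b in zip(words, words[1:])]
    return float(sum(gaps)) / len(gaps)  # float() keeps true division on Python 2.7


def has_bold_word(row):
    """1 if any word in the row is bold, else 0."""
    return int(any(w.bold for w in row))


def row_features(row):
    """Feature vector for one row, of the kind that could be fed to an sklearn.svm classifier."""
    return [calculate_row_length(row), calculate_mean_distance(row), has_bold_word(row)]


if __name__ == "__main__":
    # Table-like row: several widely spaced cells, one of them bold.
    table_row = [Word("2019", 300, 30, False), Word("Revenue", 50, 60, True), Word("1,200", 200, 40, False)]
    # Prose-like row: tightly spaced words, no bold text.
    prose_row = [Word("The", 50, 25, False), Word("company", 80, 60, False)]
    print(row_features(table_row))  # [3, 75.0, 1]
    print(row_features(prose_row))  # [2, 5.0, 0]
```

Wide gaps between words tend to separate table rows from prose rows, which is why spacing features are a natural input for a linear or kernel SVM; the actual feature set used by the project may differ.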

Repository: mihaighidoveanu/Sparktech-Hackathon-Textract