fomightez/tapi_2021_ocr

Images to Text: A Gentle Introduction to Optical Character Recognition with PyTesseract

Description

A 2021 Text Analysis Pedagogy Institute course.

Instructor: Hannah Jacobs

This course will introduce the concept of “Optical Character Recognition” (OCR), various tools available for performing OCR, and important considerations for successfully OCRing digitized text. Using Tesseract in Python, we’ll walk through the entire process using a variety of examples to show the range of challenges scholars can face when performing OCR. By the end of the course, participants should be able to use the course’s Jupyter Notebooks to perform OCR on their own; they should be able to identify possible technical challenges presented by specific texts and propose potential solutions; and they should be able to assess the degree of accuracy they have achieved in performing OCR.
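As a rough illustration of the last goal above, assessing OCR accuracy, the sketch below computes a character error rate (CER): the edit distance between a hand-checked reference transcription and the OCR output, divided by the length of the reference. This is one common way to measure accuracy and is not necessarily the method used in the course notebooks. The function names `levenshtein`, `character_error_rate`, and `ocr_image` are hypothetical helpers written for this example; `ocr_image` assumes that `pytesseract`, Pillow, and the Tesseract engine are installed, and it is only needed when reading a real image.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn one string into the other.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance divided by reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


def ocr_image(path: str, lang: str = "eng") -> str:
    """Hypothetical helper: run Tesseract on one image file.

    Assumes pytesseract, Pillow, and the Tesseract engine are installed.
    Imports are local so the functions above work without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), lang=lang)


if __name__ == "__main__":
    # Made-up strings for illustration; in practice ocr_output would come
    # from ocr_image("page.png") and reference from a manual transcription.
    reference = "The quick brown fox"
    ocr_output = "The qu1ck brovvn fox"
    print(f"CER: {character_error_rate(reference, ocr_output):.3f}")
```

A lower CER means the OCR output is closer to the reference. Because the score depends on the quality of the reference transcription, it is usually computed on a small, carefully checked sample of pages rather than on a whole corpus.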

Land Acknowledgment

These materials were prepared and are presented on the ancestral homelands of the Haliwa-Saponi, Sappony, and Occaneechi Band of the Saponi nations, whose lands are now known as Durham, North Carolina. This acknowledgment reminds us of the significance of place even in a virtual space, and of our ongoing need to build a more inclusive and equitable society.

Learn more about land acknowledgments. Learn about the Occaneechi Band of the Saponi Nation Homeland Preservation project.

Lessons

Binder

License

These materials are licensed under a Creative Commons BY license. You are free to share and adapt the materials for your own teaching so long as credit is given to the creators, the material is labeled with a CC BY License, and you indicate if changes were made.

Citation

Use the following text, replacing the bracketed phrases with the details of the specific lesson:

This lesson is based on [Lesson name] and [repository link] from the 2021 Text Analysis Pedagogy Institute, CC BY, [Instructor-First-Name] [Instructor-Last-Name].
