# Environment Setup

If you do not have these modules installed, you will need to install them at the command line using a BASH shell, Terminal, or the Anaconda Command Prompt.

- `tesseract`
- `pytesseract`
- `opencv`
- `pillow`
- `pdf2image`

## tesseract

"Tesseract is an open source text recognition (OCR) Engine, available under the Apache 2.0 license. It can be used directly, or (for programmers) using an API to extract printed text from images. It supports a wide variety of languages."

You will need to install Tesseract through the command line or by downloading the executable file.
- INSTALL: ["Introduction," Tesseract documentation](https://tesseract-ocr.github.io/tessdoc/Installation.html)

NOTE: Windows users may need to go through some additional steps with security permissions and environment variables to be able to use `tesseract`.
- Bharath Sivakumar, "[Installing and Using Tesseract 4 on Windows 10](https://medium.com/quantrium-tech/installing-and-using-tesseract-4-on-windows-10-4f7930313f82)" *Quantrium* (8 July 2020)

## pytesseract

"Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and “read” the text embedded in images.

`python-tesseract` is a wrapper for [Google’s Tesseract-OCR Engine](https://github.com/tesseract-ocr/tesseract). It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file."

You can install `pytesseract` using `pip` or `conda` install methods.
- INSTALL: [`pytesseract` documentation, PyPi](https://pypi.org/project/pytesseract/)

## opencv

"OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products."

You can install the Python wrapper for OpenCV (`opencv-python`) using `pip` or `conda` install methods.
- INSTALL: [`opencv-python` documentation, PyPi](https://pypi.org/project/opencv-python/)

## Pillow

"Pillow is the friendly PIL fork by Alex Clark and Contributors. PIL is the Python Imaging Library by Fredrik Lundh and Contributors.

The Python Imaging Library adds image processing capabilities to your Python interpreter.

This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities."

You can install the Pillow PIL fork using the `pip` install method.
- INSTALL: ["Installation," Pillow documentation](https://pillow.readthedocs.io/en/latest/installation.html)

## pdf2image

"A python (3.6+) module that wraps pdftoppm and pdftocairo to convert PDF to a PIL Image object"

NOTE: Windows users will have to download and install `poppler` separately from `pdf2image`.

You can install `pdf2image` using the `pip` install method.

Installation documentation:
- [`pdf2image`, PyPi](https://pypi.org/project/pdf2image/)
- [`pdf2image`, GitHub](https://github.com/Belval/pdf2image)

### poppler troubleshooting for Windows users

[`python-poppler`, PyPi](https://pypi.org/project/python-poppler/)

Matthew Earl Miller, "[Poppler On Windows](https://towardsdatascience.com/poppler-on-windows-179af0e50150)" *Towards Data Science* (9 January 2020)

## Putting It All Together

To recap:
1. Install tesseract
2. Install Python packages
3. IF NEEDED: Survive Windows troubleshooting *adventures*...
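
Before moving on, it can help to confirm that everything in the recap is actually visible to Python. The following is a minimal sketch, not part of the original workflow: the function name `check_environment` and the `REQUIRED_MODULES` list are hypothetical, and the check only reports whether each package can be found, not whether it works.

```python
import importlib.util
import shutil

# import names for the Python packages listed above
# (opencv-python is imported as cv2, Pillow as PIL)
REQUIRED_MODULES = ["pytesseract", "cv2", "PIL", "pdf2image"]

def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True (found) or False."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # tesseract itself is a command-line program, not a Python module
    status["tesseract (executable)"] = shutil.which("tesseract") is not None
    return status

for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

On Windows, `tesseract` may be installed but not on the `PATH`, in which case this check reports it as missing even though assigning `tesseract_cmd` directly would still work.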

# Load Modules

In [None]:
# import modules
import pytesseract
from PIL import Image
import sys
from pdf2image import convert_from_path
import cv2 as opencv
import os
import io

# Things Windows Users Might Have to Do...

In [None]:
# assign tesseract to path
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

In [None]:
# assign poppler path
poppler_test = r'C:\Users\katie\Downloads\poppler-0.68.0_x86\poppler-0.68.0\bin'
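
Because the poppler path is only needed on Windows, one option is to build the extra argument conditionally so the same notebook runs on Mac and Linux. This is a sketch under assumptions: the helper name `poppler_kwargs` is hypothetical and the path shown is a placeholder. It relies on `convert_from_path` accepting a `poppler_path` keyword, as used elsewhere in this notebook.

```python
import platform

def poppler_kwargs(windows_poppler_bin, system=None):
    """Build keyword arguments for pdf2image.convert_from_path.

    On Windows, pass the poppler bin folder explicitly; elsewhere return
    no extra arguments so pdf2image looks for poppler on the PATH.
    """
    system = system or platform.system()
    if system == "Windows":
        return {"poppler_path": windows_poppler_bin}
    return {}

kwargs = poppler_kwargs(r"C:\path\to\poppler\bin")  # placeholder path
print(kwargs)
# later: pages = convert_from_path(pdf, 500, **kwargs)
```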

# Load Single PDF and Convert to Images

The following code shows a workflow for converting a single PDF into a series of PNG images to prepare for OCR.

In [None]:
# file path for single PDF
pdf = r'Football-1990s.pdf'

# save all PDF pages to variable; Mac/Linux users may not need poppler_path parameter
pages = convert_from_path(pdf, 500, poppler_path = poppler_test)

# for loop that saves each page of PDF as image file
for i, image in enumerate(pages):
    fname = 'image' +str(i)+'.png'
    image.save(fname, "PNG")
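
One thing to watch with names like `image0.png`, `image1.png`, ... `image10.png` is that a plain alphabetical sort puts `image10.png` before `image2.png`. A possible workaround, sketched here with a hypothetical helper `page_filename`, is to zero-pad the page number when saving.

```python
def page_filename(index, total, prefix="image", ext=".png"):
    """Return a zero-padded file name so alphabetical order matches page order."""
    # width of the largest page index, e.g. 2 digits for a 12-page PDF
    width = len(str(max(total - 1, 0)))
    return f"{prefix}{index:0{width}d}{ext}"

# file names for a hypothetical 12-page PDF
names = [page_filename(i, 12) for i in range(12)]
print(names[:3], names[-1])
```

In the loop above this would replace the `fname` line with `fname = page_filename(i, len(pages))`.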

# Run OCR on Single Image

The following code shows a workflow for running OCR on a single image using a combination of Tesseract and OpenCV.

In [None]:
# load single image using opencv
img = opencv.imread(r'image1.png')

# extract text from single image using pytesseract
text = pytesseract.image_to_string(img)

# print text from single image
print(text)

In [None]:
# clean text by rejoining words that were hyphenated across line breaks
text = text.replace('-\n', '')

# print cleaned text
print(text)
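
The cleaning step above uses plain string replacement, which removes every hyphen followed by a line break. If a stricter rule is wanted, a regular expression can limit the join to cases where the hyphen sits between a word character and a lowercase letter. This is a sketch only; the function name `join_hyphenated` and the sample string are made up for illustration.

```python
import re

def join_hyphenated(text):
    """Rejoin words split by a hyphen at the end of a line."""
    # a word character, a hyphen, a newline, then a lowercase letter
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", text)

sample = "The foot-\nball team won.\nScore: 7 -\n3"
print(join_hyphenated(sample))
```

Neither approach can tell a line-break hyphen from a genuine compound such as `well-known` that happens to wrap, so some manual review of the output is still likely to be needed.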

# Write OCR Output to `txt` File

The following code shows a workflow for writing the OCR output for a single image to a `txt` file.

In [None]:
# create blank output text file
output_file = "sample_output_file.txt"

# open output file in append mode
f = open(output_file, 'a')

# append text variable contents to txt file
f.write(text)

# close txt file
f.close()
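
The same write can be expressed with a `with` block, which closes the file automatically even if an error occurs partway through. The sketch below uses a temporary folder and a stand-in string so it runs without any OCR output; passing `encoding="utf-8"` is an assumption that helps with non-ASCII characters in OCR text.

```python
import os
import tempfile

text = "Sample OCR output\n"  # stand-in for the OCR result

# write to a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    output_file = os.path.join(tmp_dir, "sample_output_file.txt")
    # the file is closed automatically when the with block ends
    with open(output_file, "a", encoding="utf-8") as f:
        f.write(text)
    # read the file back to confirm what was written
    with open(output_file, encoding="utf-8") as f:
        saved = f.read()

print(saved)
```

Note that append mode (`'a'`) adds to an existing file, so re-running the cell duplicates the text; write mode (`'w'`) would overwrite instead.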

# Looping Through Multiple Images

The following code shows a workflow for appending the output from multiple images to a single `txt` file. This workflow suits multi-page PDFs that were split into one image per page but need to be combined into a single `txt` file for later text analysis.

Test workflow with list

In [None]:
# create empty list
test_list = []

# for loop that iterates over png files and appends ocr output to list
for filename in os.listdir():
    if filename.endswith(".png"):
        img = opencv.imread(filename)
        text = pytesseract.image_to_string(img)
        text = text.replace('-\n', '')
        test_list.append(text)

# show list
print(test_list)   

Test workflow with txt file

In [None]:
# create output text file
sample = "sample.txt"

# open in append mode
f = open(sample, 'a')

# for loop that iterates over png files and appends ocr output to file
for filename in os.listdir():
    if filename.endswith(".png"):
        img = Image.open(filename)
        text = pytesseract.image_to_string(img)
        text = text.replace('-\n', '')
        f.write(text)

# close file
f.close()  
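
`os.listdir()` returns names in arbitrary order, so pages may be appended out of sequence. The sketch below shows one way to sort file names by their embedded page number; the helper name `natural_key` is hypothetical.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so 'image2' sorts before 'image10'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", filename)]

filenames = ["image10.png", "image2.png", "image0.png", "image1.png"]
print(sorted(filenames))                   # alphabetical: image10 comes before image2
print(sorted(filenames, key=natural_key))  # page order
```

In the loops above, this would mean iterating over `sorted(os.listdir(), key=natural_key)` instead of `os.listdir()`.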

# Looping Through Multiple PDFs

The following code shows a workflow for converting multiple PDFs to image files using a for loop and subdirectories, running OCR on the images, and appending the OCR output from multiple images to a single `txt` file for each original PDF.

This workflow suits multi-page PDFs that were split into one image per page but need to be combined into a single `txt` file for later text analysis.

## For loops that create subdirectories and convert PDFs to images in each folder

In [None]:
# get list of PDF files for creating subdirectories

# import modules
import os

# set path
path = r"C:\Users\katie\jupyter-notebooks\archives\scholastic_football_review"

# create empty list
file_names = []

# for loop that gets file name and appends to list
for x in os.listdir(path):
    if x.endswith(".pdf"):
        file_names.append(os.path.splitext(x)[0])

# show list of file names
file_names

In [None]:
# create empty list for subdirectory paths
subdirectory_list = []

# parent directory that holds the PDFs
parent_dir = r"C:\Users\katie\jupyter-notebooks\archives\scholastic_football_review"

# create one subdirectory per PDF; exist_ok=True skips folders that already exist
for file in file_names:
    path = os.path.join(parent_dir, file)
    subdirectory_list.append(path)
    os.makedirs(path, exist_ok=True)

# show list of subdirectories
print(subdirectory_list)
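
The two steps above (listing the PDFs, then creating a folder for each) can also be combined using `pathlib`. This is an alternative sketch rather than the notebook's method: the function name `make_pdf_subdirectories` is hypothetical, and the demo runs in a temporary folder with empty stand-in files.

```python
import tempfile
from pathlib import Path

def make_pdf_subdirectories(parent_dir):
    """Create one folder per PDF in parent_dir and return {stem: folder path}."""
    parent = Path(parent_dir)
    folders = {}
    for pdf in sorted(parent.glob("*.pdf")):
        folder = parent / pdf.stem
        folder.mkdir(exist_ok=True)  # no error if the folder already exists
        folders[pdf.stem] = str(folder)
    return folders

# demo with empty stand-in files in a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("Football-1901s.pdf", "Football-1902s.pdf", "notes.txt"):
        (Path(tmp_dir) / name).touch()
    folders = make_pdf_subdirectories(tmp_dir)
    created = sorted(p.name for p in Path(tmp_dir).iterdir() if p.is_dir())

print(created)
```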

In [None]:
# for loop that converts PDF to image files and saves images to respective subdirectory
for file in subdirectory_list:
    pdf = file + ".pdf"
    pages = convert_from_path(pdf, 500, poppler_path = poppler_test)
    for i, image in enumerate(pages):
        fname = "image" + str(i) + ".png"
        fpath = os.path.join(file, fname)
        image.save(fpath, "PNG")

## Testing on single subdirectory: workflow that runs OCR on subdirectory contents and appends output to txt file in parent directory

In [None]:
# select first subdirectory
test = subdirectory_list[0]

# set subdirectory as path
path = os.path.normpath(test)

In [None]:
# test that os.listdir() is working correctly
for filename in os.listdir(path):
    print(filename)
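
Note that `os.listdir(path)` returns bare file names, not full paths. Opening one of those names directly looks for the file in the current working directory, so each name needs to be joined back onto the folder first. A small sketch using a temporary folder and an empty stand-in file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # empty stand-in for a page image
    open(os.path.join(folder, "image0.png"), "w").close()

    names = os.listdir(folder)  # bare names only
    full_paths = [os.path.join(folder, name) for name in names]
    path_exists = os.path.exists(full_paths[0])

print(names)
print(path_exists)
```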

In [None]:
# test using empty list and pytesseract image method

# empty list
test_list = []

# for loop that takes single subdirectory images and appends OCR output to list
for filename in os.listdir(path):
    if filename.endswith(".png"):
        # os.listdir returns bare file names, so join each one to the subdirectory path
        text = pytesseract.image_to_string(Image.open(os.path.join(path, filename)))
        text = text.replace('-\n', '')
        test_list.append(text)

# show list
print(test_list)

In [None]:
# test using empty list and opencv imread method

# empty list
test_list = []

# for loop that takes single subdirectory images and appends OCR output to list
for filename in os.listdir(path):
    if filename.endswith(".png"):
        # os.listdir returns bare file names, so join each one to the subdirectory path
        img = opencv.imread(os.path.join(path, filename))
        text = pytesseract.image_to_string(img)
        text = text.replace('-\n', '')
        test_list.append(text)

# show list
print(test_list)

In [None]:
# test with txt file and pytesseract image method

football = "pyt_test.txt"

f = open(football, 'a')

# for loop that takes single subdirectory images and appends OCR output to txt file
for filename in os.listdir(path):
    if filename.endswith(".png"):
        # os.listdir returns bare file names, so join each one to the subdirectory path
        text = pytesseract.image_to_string(Image.open(os.path.join(path, filename)))
        text = text.replace('-\n', '')
        f.write(text)

# close file
f.close()

In [None]:
# test with txt file and opencv imread method

football = "opencv_test.txt"

f = open(football, 'a')

# for loop that takes single subdirectory images and appends OCR output to txt file
for filename in os.listdir(path):
    if filename.endswith(".png"):
        # os.listdir returns bare file names, so join each one to the subdirectory path
        img = opencv.imread(os.path.join(path, filename))
        text = pytesseract.image_to_string(img)
        text = text.replace('-\n', '')
        f.write(text)

# close file
f.close()

## Iteration: Run OCR on contents of each subdirectory and append to txt file in parent directory

We can start this workflow by creating a dictionary from the `file_names` and `subdirectory_list` lists.

The key-value pairs in this dictionary connect the file name without its extension (e.g. `Football-1901s`) for each PDF with the subdirectory path associated with that file name (e.g. `C:\Users\katie\jupyter-notebooks\archives\scholastic_football_review\Football-1901s`).

This lets us create `.txt` files with the same file name as the PDF and also access the images in each subdirectory.
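
Since every dictionary entry is processed the same way, a single loop over the key-value pairs can stand in for one branch per file name. The following is a sketch under assumptions, not the notebook's own code: `ocr_folders_to_text` and `fake_ocr` are hypothetical names, and the OCR step is passed in as a function so the example runs without Tesseract installed.

```python
import os
import tempfile

def ocr_folders_to_text(folders, ocr_image, output_dir):
    """Write one txt file per entry in folders ({name: folder path})."""
    written = []
    for name, folder in folders.items():
        out_path = os.path.join(output_dir, name + ".txt")
        # plain alphabetical order; page numbers may need a numeric sort
        pages = sorted(f for f in os.listdir(folder) if f.endswith(".png"))
        with open(out_path, "a", encoding="utf-8") as out:
            for page in pages:
                text = ocr_image(os.path.join(folder, page))
                out.write(text.replace("-\n", ""))
        written.append(out_path)
    return written

def fake_ocr(image_path):
    """Stand-in for the real OCR call; returns a line naming the image."""
    return "text from " + os.path.basename(image_path) + "\n"

# demo with empty stand-in images in a temporary parent folder
with tempfile.TemporaryDirectory() as parent:
    folders = {}
    for name in ("Football-1901s", "Football-1902s"):
        folder = os.path.join(parent, name)
        os.makedirs(folder)
        for i in range(2):
            open(os.path.join(folder, f"image{i}.png"), "w").close()
        folders[name] = folder
    written = ocr_folders_to_text(folders, fake_ocr, parent)
    with open(written[0], encoding="utf-8") as f:
        first_output = f.read()

print(first_output)
```

With the real libraries, the `ocr_image` argument could be something like `lambda p: pytesseract.image_to_string(Image.open(p))`. Because the files are opened in append mode, re-running the loop adds the text a second time.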

In [None]:
# create dictionary from file names and directories

scholastic_dict = {file_names[i]: subdirectory_list[i] for i in range(len(file_names))}

scholastic_dict

In [None]:
# using pytesseract image method
path = r"C:\Users\katie\jupyter-notebooks\archives\scholastic_football_review\\"

for key, value in scholastic_dict.items():
    if key == "Football-1901s":
        football = key + ".txt"
        f = open(football, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1902s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1903s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1904s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1905s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1906s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1907s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1908s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1909s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1910":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1910c":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1910s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1911s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1912s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1913s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1914s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1915s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1916s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1917s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1918s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1919":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1919s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1920":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1920s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1921":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1921s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1922s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1924":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1925":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1926":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1927":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1928":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1929":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1930":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1930s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1931":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1932":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1935s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1939s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1940s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1941":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1942s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1943s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1944s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1945s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1946s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1947s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1948s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1949s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1950s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1951s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1952s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1953s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1954s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1955s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1956s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1957s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1958s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1959s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1960s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1961s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1962s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1963s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1964s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1965s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1966s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1967s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1968s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1969s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1970s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1971s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1972s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1973s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1974s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1975s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1976s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1977s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1978s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1979s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1980":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1981":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1982":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1983":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1983s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1984":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1984s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1985":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1985s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1986":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1986s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1987s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1988":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1988s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1989s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1990s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1991s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1992s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1993s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1994s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1995s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1996s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1997s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1998s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1999s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2000s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2001s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2002s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2003s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2004s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2005s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2006s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2007s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2008s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2010s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
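
Every branch in the cell above runs the same steps: list the files in the folder, OCR each page, join words hyphenated across line breaks, and append the result to `<key>.txt`. A minimal sketch of the same idea as a single loop follows. It assumes that every entry in `scholastic_dict` should be processed identically, which the cell above does not guarantee, since it only handles the keys it names. The helper names `dehyphenate`, `ocr_folder`, `ocr_all`, and `pil_ocr` are hypothetical and not part of the original notebook. The sketch also sorts the file names so pages are written in a predictable order, which the original cell does not do.

```python
import os
from typing import Callable, Dict


def dehyphenate(text: str) -> str:
    """Join words split by a hyphen at a line break, as the cell above does."""
    return text.replace('-\n', '')


def ocr_folder(folder: str, out_path: str, ocr: Callable[[str], str]) -> int:
    """Append de-hyphenated OCR text for every file in `folder` to `out_path`.

    Returns the number of files processed.
    """
    count = 0
    # The `with` block closes the output file even if OCR fails on a page.
    with open(out_path, 'a', encoding='utf-8') as f:
        # Sorting is an addition here; os.listdir returns names in arbitrary order.
        for name in sorted(os.listdir(folder)):
            page = os.path.normpath(os.path.join(folder, name))
            f.write(dehyphenate(ocr(page)))
            count += 1
    return count


def ocr_all(folders: Dict[str, str], ocr: Callable[[str], str]) -> None:
    """Run ocr_folder for every key/folder pair, writing to '<key>.txt'."""
    for key, folder in folders.items():
        ocr_folder(folder, key + ".txt", ocr)


def pil_ocr(path: str) -> str:
    """OCR one image with Pillow and pytesseract, as in the cell above."""
    # Imported lazily so the helpers above can be used without Tesseract installed.
    import pytesseract
    from PIL import Image
    with Image.open(path) as img:
        return pytesseract.image_to_string(img)
```

With `scholastic_dict` defined as earlier in the notebook, `ocr_all(scholastic_dict, pil_ocr)` would stand in for the whole `if`/`elif` chain. Passing the OCR function as an argument also means a different image reader can be swapped in without repeating the loop.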

In [None]:
# using opencv imread method
path = r"C:\Users\katie\jupyter-notebooks\archives\scholastic_football_review\\"

for key, value in scholastic_dict.items():
    if key == "Football-1901s":
        football = key + ".txt"
        f = open(football, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1902s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1903s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1904s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1905s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1906s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1907s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1908s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1909s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1910":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1910c":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1910s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1911s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1912s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1913s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1914s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1915s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1916s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1917s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1918s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1919":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1919s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1920":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1920s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1921":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1921s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1922s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1924":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1925":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1926":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1927":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1928":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1929":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1930":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1930s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1931":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1932":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1935s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1939s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1940s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1941":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1942s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1943s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1944s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1945s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1946s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1947s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1948s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1949s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1950s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1951s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1952s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1953s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1954s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1955s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1956s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1957s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1958s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1959s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1960s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1961s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1962s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1963s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1964s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1965s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1966s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1967s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1968s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1969s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1970s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1971s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1972s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1973s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1974s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1975s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1976s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1977s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1978s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1979s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1980":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1981":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1982":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1983":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1983s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1984":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1984s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1985":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1985s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1986":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1986s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1987s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1988":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1988s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1989s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1990s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1991s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1992s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1993s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1994s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1995s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1996s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1997s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1998s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-1999s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2000s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2001s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2002s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2003s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2004s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2005s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2006s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2007s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2008s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == "Football-2010s":
        test_list = []
        football = key + ".txt"
        f = open(football, 'a')
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            img = opencv.imread(file)
            text = pytesseract.image_to_string(img)
            text = text.replace('-\n', '')
            f.write(text)
        f.close()   

An alternate approach for the `if` and `elif` statements would be to use the index values for the `file_names` list.

For example, rather than `if key == "Football-1901s"` and `elif key == "Football-1902s"`, you could use `if key == file_names[0]` and `elif key == file_names[1]`.
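
Because every branch above runs the same steps and only the key changes, the chain could also be collapsed into a single loop over the dictionary. The block below is a minimal sketch of that idea, not the notebook's own method. `ocr_folders_to_text`, `ocr_func`, and `out_dir` are hypothetical names introduced here for illustration. The OCR step is passed in as a function so the loop can be tried out without Tesseract installed.

```python
import os

def ocr_folders_to_text(folder_dict, ocr_func, out_dir="."):
    """Write one .txt file per key by running ocr_func on every file in its folder.

    folder_dict maps an output name (e.g. "Football-1974s") to a folder of page images.
    ocr_func is any callable that takes an image path and returns a string.
    """
    written = []
    for key, folder in folder_dict.items():
        out_path = os.path.join(out_dir, key + ".txt")
        # 'a' appends, as in the cells above; the with block closes the file
        with open(out_path, "a", encoding="utf-8") as f:
            # sorted() makes the order deterministic; note that it is
            # alphabetical, so "image10.png" sorts before "image2.png"
            for name in sorted(os.listdir(folder)):
                page = os.path.normpath(os.path.join(folder, name))
                text = ocr_func(page)
                # rejoin words that were hyphenated across line breaks
                f.write(text.replace("-\n", ""))
        written.append(out_path)
    return written
```

In this notebook, the call would pass the dictionary of names and folders plus a function such as `lambda p: pytesseract.image_to_string(opencv.imread(p))`. Adding a new review then means adding a dictionary entry rather than another `elif` branch.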

# Next Steps

This sample code outlines an OCR workflow for issues of *The Scholastic* football review, downloaded with the file naming conventions outlined in the provided sample Python scripts. The code assumes all review PDFs are in a single directory. It creates sub-directories (or sub-folders) for each review, converts the PDFs in the main directory to individual image files in each sub-directory, then runs OCR on those images to create a `.txt` file for each review PDF in the main directory.

You will need to modify this workflow for the specific publication or source materials you are working with.

For example, let's say you want to run this workflow on *The Observer* newspaper issues, published weekly during the academic year.

NOTE: The sample code below shows a modified workflow that jumps straight into iteration. You could modify the sample code from earlier in the notebook to test on a single issue/PDF.

## For loops that create subdirectories and convert PDF to images in each folder

In [None]:
# get list of PDF files for creating subdirectories

# set path
path = r"C:\Users\katie\jupyter-notebooks\archives\observer"

# create empty list
observer_file_names = []

# for loop that gets file name and appends to list
for x in os.listdir(path):
    if x.endswith(".pdf"):
        observer_file_names.append(os.path.splitext(x)[0])

# show list of file names
observer_file_names

In [None]:
# create empty list for subdirectory paths
observer_subdirectory_list = []

# create subdirectories
for file in observer_file_names:
    directory = file
    parent_dir = r"C:\Users\katie\jupyter-notebooks\archives\observer"
    mode = 0o666
    path = os.path.join(parent_dir, directory)
    observer_subdirectory_list.append(str(path))
    try:
        os.makedirs(path, mode)
    except FileExistsError:
        # subdirectory already exists; move on to the next file
        continue
        
# show list of subdirectories
observer_subdirectory_list

In [None]:
# for loop that converts PDF to image files and saves images to respective subdirectory
for file in observer_subdirectory_list:
    pdf = file + ".pdf"
    pages = convert_from_path(pdf, 500, poppler_path = poppler_test)
    for i, image in enumerate(pages):
        fname = "image" + str(i) + '.png'
        fpath = os.path.join(file, fname)
        image.save(fpath, "PNG")

This workflow takes the image files from each issue and writes/appends the OCR output to a single `.txt` file for that issue.

You would need to modify this code to get a different output structure (e.g., a single `.txt` file for each academic-year volume, or a single `.txt` file for a span of years or a decade).

In [None]:
# create dictionary from file names and directories

observer_dict = {observer_file_names[i]: observer_subdirectory_list[i] for i in range(len(observer_file_names))}

observer_dict

In [None]:
# using pytesseract image method
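# a raw string cannot end in a single backslash, so the trailing separator is added separately
path = r"C:\Users\katie\jupyter-notebooks\archives\observer" + "\\"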

for key, value in observer_dict.items():
    if key == observer_file_names[0]:
        observer = path + key + ".txt"
        f = open(observer, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == observer_file_names[1]:
        observer = path + key + ".txt"
        f = open(observer, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == observer_file_names[2]:
        observer = path + key + ".txt"
        f = open(observer, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == observer_file_names[3]:
        observer = path + key + ".txt"
        f = open(observer, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == observer_file_names[4]:
        observer = path + key + ".txt"
        f = open(observer, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
    elif key == observer_file_names[5]:
        observer = path + key + ".txt"
        f = open(observer, 'a')
        test_list = []
        for file in os.listdir(value):
            test_list.append(os.path.normpath(os.path.join(value, file)))
        for file in test_list:
            text = str(((pytesseract.image_to_string(Image.open(file)))))
            text = text.replace('-\n', '')
            f.write(text)
        f.close()
        # etc., with subsequent elif statements for the remaining observer_file_names values
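
To get a different output structure from one `.txt` file per issue, one option is to group the issue folders first and then OCR each group into a single file. The block below is a sketch of the grouping step only. It assumes, hypothetically, that each issue name begins with a four-digit year (for example `1985-09-12`), which may not match your file names. `group_folders_by_year` and the example paths are made up for illustration. Grouping by calendar year is a simplification; an academic-year volume would need a rule that also looks at the month.

```python
from collections import defaultdict

def group_folders_by_year(issue_dict):
    """Group issue folders by the first four characters of the issue name.

    Assumes (hypothetically) that each key starts with a four-digit year,
    e.g. "1985-09-12"; adjust the slice to match your own file names.
    """
    groups = defaultdict(list)
    # sort by issue name so folders within a year stay in date order
    for name, folder in sorted(issue_dict.items()):
        groups[name[:4]].append(folder)
    return dict(groups)

# hypothetical issue names and folders, for illustration only
example = {
    "1985-09-12": "observer/1985-09-12",
    "1985-09-19": "observer/1985-09-19",
    "1986-01-23": "observer/1986-01-23",
}
print(group_folders_by_year(example))
```

Each key in the result (here `1985` and `1986`) could then name one output file, with the OCR text from every folder in that group appended to it.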

## OCR Next Steps and Additional Resources

There are a number of options and next steps for refining or further building out an OCR workflow.

**More documentation on tools covered in this notebook**:
- Tesseract
  * ["Tesseract documentation"](https://tesseract-ocr.github.io/tessdoc/)
    * ["Improving the quality of the output"](https://tesseract-ocr.github.io/tessdoc/ImproveQuality)
  * University of Illinois Scholarly Commons, "[Introduction to OCR and Searchable PDFs: Tesseract](https://guides.library.illinois.edu/c.php?g=347520&p=4116757)" *University of Illinois Library*
  * NYU Libraries Scholarly Communications and Information Policy Department, "[Tesseract OCR Software Tutorial](https://guides.nyu.edu/tesseract/home)" *New York University Libraries*
    * The NYU guide covers how to convert PDF or PNG files to high-resolution TIFF images, improve image quality with the command-line tool ImageMagick, and optimize the command-line version of Tesseract
  * Andrew Akhlaghi, "OCR and Machine Translation," *The Programming Historian* 10 (2021), https://doi.org/10.46430/phen0091.
    * Tutorial that covers an ImageMagick-to-Tesseract workflow
  * Moritz Mähr, "Working with batches of PDF files," *The Programming Historian* 9 (2020), https://doi.org/10.46430/phen0088.
    * Tutorial that covers using `poppler` to extract images from PDF and preliminary topic modeling
- OpenCV
  * ["OpenCV documentation"](https://opencv.org/)
    * `OpenCV` has a number of more advanced features and functions, including rotation/deskewing, removing shadows, object detection, etc. OpenCV's [Python Tutorials](https://docs.opencv.org/4.5.3/d6/d00/tutorial_py_root.html) are a good place to start.

**Other tools/packages that can get you started with more advanced work**:
- [Leptonica](http://www.leptonica.org/): variety of resources related to image processing/analysis (affine transformations, binary/grayscale morphology, pixelwise masking/blending, etc)
- [ImageMagick](https://imagemagick.org/index.php): program that uses multi-threaded processing for large-scale image processing workflows (color management, image features, morphology, noise reduction, etc)
- [ScanTailor](https://github.com/4lex4/scantailor-advanced): designed for post-processing scanned pages (before OCR) to improve OCR results (margins, picture zones, dewarping, etc)
- [unpaper](https://github.com/unpaper/unpaper): another tool designed for post-processing scanned material (before OCR) to improve OCR results (dark edges, margins, deskewing, etc)
- [PRLib](https://github.com/leha-bot/PRLib): pre-recognition library (before OCR) designed to improve OCR results (binarization, deskew, noise, etc)

### Adobe Acrobat Pro

Depending on the number of documents you're working with and the quality of OCR output your project needs, Adobe Acrobat Pro will likely get you where you need to go without a custom workflow.
- NOTE: This is a different program than the Adobe Acrobat Reader DC free program you may have on your computer.

Notre Dame students have access to the Adobe Pro software title through the Adobe Creative Cloud Desktop Application, available free through OIT.
- NOTE: The full Adobe Creative Cloud suite is available on any [OIT lab computer](https://nd.service-now.com/kb_view.do?sysparm_article=KB0013524) and can be accessed through the OIT general purpose [Virtual Computer Lab](https://inside.nd.edu/task/all/virtual-computer-lab)

To get Adobe Pro on your own computer:
- Download the program: [OIT, Adobe Creative Cloud Desktop Application](https://oit.nd.edu/services/software/software-downloads/adobe-creative-cloud-desktop-application/)
- Request a license: OIT, "[Request a license for Adobe Creative Cloud](https://nd.service-now.com/kb_view.do?sysparm_article=KB0017960)," *ND Service Now*

Once you have Adobe Pro on your own computer, you can use the expanded features to run an OCR workflow on a single PDF or multiple PDFs. 

Adobe Pro also allows you to export the contents of a single PDF or multiple PDFs to plain-text (`.txt`) files.

To run OCR within Adobe Pro:
- University of Illinois Scholarly Commons, "[Introduction to OCR and Searchable PDFs: Adobe Acrobat Pro](https://guides.library.illinois.edu/c.php?g=347520&p=4116755)" *University of Illinois Library*
- Rowan University Library Digital Scholarship Center, "[Acrobat Pro: Optical Character Recognition for Research, Learning, and Accessibility](https://youtu.be/HxSYtQdaAp0)" *YouTube video* (20 August 2020)
- Pixascene, "[Perform an OCR on a PDF document using Adobe Acrobat Pro](https://youtu.be/zZT34zmc0kw)" *YouTube video* (15 June 2020)
- Tulane University Libraries, "[Digitize Your Sources: OCR and Adobe Acrobat Pro](https://libguides.tulane.edu/diy_digital/acrobat)" *Tulane University Library Guide*

To export a single PDF as TXT from Adobe Pro (using "Export"):
- Adobe Acrobat User Guide, "[Convert or export PDFs to other file formats](https://helpx.adobe.com/acrobat/using/exporting-pdfs-file-formats.html)" *Adobe* (26 August 2021)
 
To export multiple PDFs as TXT files from Adobe Pro (using "Action Wizard"):
1. `View -> Tools -> Action Wizard -> Create New Action`
2. `Choose 'Save & Export' -> Save -> add to right-hand pane`
3. At `right-hand pane -> choose folder` and click `Specify Settings` to change export format to `TXT`
- Source: StackOverflow, "[How to convert batch pdf files to text using Adobe Acrobat Pro?](https://stackoverflow.com/questions/25212228/how-to-convert-batch-i-e-huge-pdf-files-to-text-using-adobe-acrobat-pro)" *StackOverflow* (2015)
- For more on Action Wizard: Adobe Acrobat User Guide, "[Action Wizard (Acrobat Pro)](https://helpx.adobe.com/acrobat/using/action-wizard-acrobat-pro.html#about_action_wizards)" *Adobe* (2 June 2020)