# Installing the packages
The gapfinder package depends on a number of other Python packages, and we need to install those before we can use the software. I'm assuming you are running this notebook in VS Code, because that's how I set it up. If not, I'm sure you can figure things out.

Run the following code block. A prompt may pop up asking which Python environment you want to use. If there is an option for "venv", select it, then run the block. It may take up to two minutes.

If VS Code also tells you that it needs to install extensions, do it!
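Before installing, it can help to confirm which interpreter the notebook is actually using, so the packages end up in the venv rather than the base Python. The sketch below is a minimal check, assuming a standard `venv`/`virtualenv` environment (conda environments are not detected this way). Inside a virtual environment `sys.prefix` points at the environment folder, while `sys.base_prefix` still points at the Python installation it was created from. The helper name `in_virtualenv` is hypothetical and not part of gapfinder.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True if this interpreter runs inside a venv/virtualenv."""
    # In a venv, sys.prefix is the venv folder; sys.base_prefix is the base install.
    return sys.prefix != sys.base_prefix


print("Python executable:", sys.executable)
if in_virtualenv():
    print("OK: running inside a virtual environment at", sys.prefix)
else:
    print("Warning: not in a virtual environment; packages will go into the base Python.")
```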

In [1]:
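# %pip installs into the environment the notebook kernel is running in
# (!python.exe runs whichever python.exe the shell finds first on PATH)
%pip install --upgrade pip
%pip install -r requirements.txt
# Install gapfinder itself in editable mode so code changes apply without reinstalling
%pip install -e .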

Obtaining file:///D:/repos/gapfinder_images
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: gapfinder
  Building editable for gapfinder (pyproject.toml): started
  Building editable for gapfinder (pyproject.toml): finished with status 'done'
  Created wheel for gapfinder: filename=gapfinder-0.1-0.editable-py3-none-any.whl size=2686 sha256=ce6f35bbddfc468e75a532d0bbf098671ec02a8cb5a6bed8d4414694a420b181
  Stored in directory: C:\Users\magnus.wood\AppData\Local\Temp\pip-ephem-wheel-cache-jhla0rq8\wheels\a9\66
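Once the cell finishes, one way to confirm the installs landed in the environment the notebook is using is to ask `importlib.metadata` for their versions. This is a sketch: the helper `installed_versions` is hypothetical, and the list of names is an assumption based on what later cells import (`pytesseract`, Pillow's `PIL`) plus the `gapfinder` wheel built above. The full dependency list lives in `requirements.txt`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each distribution name to its version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


# gapfinder is this project; pytesseract and Pillow are used later in this notebook
for name, ver in installed_versions(["gapfinder", "pytesseract", "Pillow"]).items():
    print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```

If something shows as missing, restarting the kernel and re-running the install cell is a reasonable first step.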

# Testing out the installation

Now that everything is installed, we need to make sure it works. Run the following code blocks, which test the code's basic functionality. If something is broken, we want to find out here, before we get too far!

In [2]:
# Create the bare minimum directory structure
import os

# Directories the rest of the workflow expects
images_dir = "images"
output_dir = "output"
metadata_dir = "metadata"

for directory in (images_dir, output_dir, metadata_dir):
    # exist_ok=True means re-running this cell won't raise an error
    os.makedirs(directory, exist_ok=True)
    # Confirm it exists and is a directory (not a file with the same name)
    assert os.path.isdir(directory), f"Could not create directory: {directory}"


In [3]:
import os
import glob

# Make sure there are .tif files somewhere in the images directory, e.g.
# images\tif_images\Wild Type - Dark Adapted\2022-1-10\Block 1\12 WT Dark 29k.tif
tif_files = glob.glob(os.path.join("images", "**", "*.tif"), recursive=True)

assert len(tif_files) > 0, "No .tif files found under the images directory; copy your images there and re-run"
print(f"Found {len(tif_files)} .tif files")
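If the check passes but you want to be sure the images are where you expect, a per-folder count can catch misplaced files. This is a minimal sketch using the same recursive `glob` pattern as the cell above; the helper name `count_tifs_by_folder` is hypothetical. Files sitting directly in `images` are reported under `.`.

```python
import glob
import os
from collections import Counter


def count_tifs_by_folder(root: str = "images") -> Counter:
    """Hypothetical helper: count .tif files under `root`, grouped by containing folder."""
    pattern = os.path.join(root, "**", "*.tif")
    tif_paths = glob.glob(pattern, recursive=True)
    # Key each file by its parent folder, relative to root, for readable output
    return Counter(os.path.relpath(os.path.dirname(p), root) for p in tif_paths)


for folder, n in sorted(count_tifs_by_folder("images").items()):
    print(f"{n:4d}  {folder}")
```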

# Tesseract

This package depends on Tesseract, an OCR (optical character recognition) program that extracts text from images. gapfinder talks to it through the pytesseract library. If the Tesseract program is not installed on this computer, or its folder has not been added to your system PATH variable, pytesseract will not be able to find and use it.
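A quick way to see whether this notebook's kernel can find Tesseract is `shutil.which`, which searches PATH the same way a shell would. This is a sketch, not how pytesseract itself resolves the program. The fallback path is the lab-machine location given in the steps below and is an assumption for other computers; the helper `find_tesseract` is hypothetical. If Tesseract is only found at the fallback, that path can be assigned to `pytesseract.pytesseract.tesseract_cmd`, as the commented line in the test cell further down shows.

```python
import os
import shutil
from typing import Optional

# Assumed install location (the lab machine's); change it if yours differs
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(default: str = DEFAULT_TESSERACT) -> Optional[str]:
    """Hypothetical helper: return the tesseract executable to use, or None."""
    on_path = shutil.which("tesseract")  # searches PATH (and PATHEXT on Windows)
    if on_path:
        return on_path
    # Not on PATH: fall back to the usual install location, if the file exists
    if os.path.isfile(default):
        return default
    return None


location = find_tesseract()
print(location or "tesseract not found on PATH or at the default location")
```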

The error will look like this:
`TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.`

To fix this, you'll need to either:
   - download and build it from source, or
   - install it with the packaged installer from the [UB-Mannheim GitHub page](https://github.com/UB-Mannheim/tesseract/wiki)

To add a program to the PATH on a Windows machine, follow these steps:

1. Find the path to the program: locate the directory that contains the program's executable file. On the lab machine it should be `C:\Program Files\Tesseract-OCR`.
2. Open Environment Variables:
   - Press Win + X and select System.
   - Click Advanced system settings.
   - In the System Properties window, click the Environment Variables button.
3. Edit the PATH variable:
   - In the Environment Variables window, find the `Path` variable in the System variables section and select it.
   - Click Edit.
4. Add the new path:
   - In the Edit Environment Variable window, click New.
   - Add the directory that contains the executable (e.g. `C:\Program Files\Tesseract-OCR`).
   - Click OK to close all windows.
5. Verify:
   - Open a new Command Prompt window. The working directory doesn't matter.
   - Type `tesseract --version` and press Enter.
   - You should see a printout of the program's version information.
   - Restart VS Code so the notebook kernel picks up the updated PATH; programs that were already open keep the old value.
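The same `tesseract --version` check can be run from inside the notebook, which confirms that the kernel (not just a fresh Command Prompt) sees the updated PATH. This is a minimal sketch using `subprocess`; the helper `tesseract_version` is hypothetical. Because some Tesseract versions print their version text to stderr instead of stdout, it reads whichever stream has output.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Hypothetical helper: first line of `tesseract --version`, or None if not found."""
    exe = shutil.which(cmd)
    if exe is None:
        return None  # not on this kernel's PATH (and not a valid full path)
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some versions print the version info to stderr rather than stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version_line = tesseract_version()
print(version_line or "tesseract not found on this kernel's PATH")
```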



In [4]:
# Test out pytesseract
import pytesseract
from PIL import Image  # "import PIL" alone doesn't guarantee PIL.Image is loaded

# If it still can't find tesseract, you can specify the executable directly:
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Fail fast with TesseractNotFoundError if the tesseract program can't be found
print(pytesseract.get_tesseract_version())

# Open the test image: gapfinder/images/ocrtest.png
test_image = Image.open("gapfinder/images/ocrtest.png")
text_ocr = pytesseract.image_to_string(test_image)

# Strip any non-alphanumeric characters (spaces, newlines, stray punctuation)
text_ocr = ''.join(ch for ch in text_ocr if ch.isalnum())

# Test that the OCR worked
assert text_ocr == "Test", f"OCR did not work, got: {text_ocr!r}"
