# Fingerprint Left or Right Hand Prediction

## About the data
Sokoto Coventry Fingerprint Dataset (SOCOFing) is a biometric fingerprint database designed for academic research purposes. SOCOFing is made up of 6,000 fingerprint images from 600 African subjects and contains unique attributes such as labels for gender, hand and finger name as well as synthetically altered versions with three different levels of alteration for obliteration, central rotation, and z-cut. For a complete formal description and usage policy please refer to the following paper: https://arxiv.org/abs/1807.10609.

## About the notebook
This notebook demonstrates the steps from data ingestion to model saving needed to produce a sufficiently accurate model that predicts whether a fingerprint comes from a left or right hand. Coupled with other models that accurately predict finger and gender, such a model is valuable when matching against other identifiable information.

1. *Data Ingestion* [from object storage](#working-with-s3-buckets)
1. *Dataset preparation* (infer labels, splitting, augmenting, optimizing)
1. *Model Development* from scratch and *Training Strategies* (one device, mirrored, multi-worker mirrored)
1. *Model Performance* Hyperparameter Tuning strategies (RandomSearch, Hyperband, BayesianOptimization, Sklearn)
1. *Model Serialization* to object storage
1. *Prediction Sampling*

### Notebook Tested Requirements

|Notebook origin|Notebook Customization|Instance Type|Kernel|TensorFlow|Runtime|
|:-------|:-------|:-------|:-------|:-------|:-------|
|SageMaker Notebook Instances|[from GitHub](https://github.com/redhat-na-ssa/demo-rosa-sagemaker/blob/main/sagemaker/lifecycle-from-github.sh)|ml.m5.4xlarge (vCPU: 16, RAM: 64 GiB)|conda_tensorflow2_p310|2.11|~120 minutes|
|SageMaker Notebook Instances|[from GitHub](https://github.com/redhat-na-ssa/demo-rosa-sagemaker/blob/main/sagemaker/lifecycle-from-github.sh)|ml.p3.8xlarge (vCPU: 32, RAM: 244 GiB)|conda_tensorflow2_p310|2.11|~15 minutes|

# Setup

If this is your first time running the notebook, you may need to restart the kernel after the TensorFlow upgrade.

In [1]:
# source the setup Bash script to run specific configuration tasks
! source ../setup.sh && setup_dataset && install_requirements


You can run individual functions!

example:
  setup_demo

Pulling dataset from https://github.com/redhat-na-ssa/datasci-fingerprint-data.git...
exists
tar: Ignoring unknown extended header keyword 'SCHILY.fflags'


In [2]:
# Install packages and frameworks

# uncomment below if using a notebook with a sagemaker notebook instance lifecycle config
#! pip install -U pip --quiet
#! pip install -r ../requirements.txt --quiet

import tensorflow as tf
import os

# if the logger level is not set, the debugging message "Cleanup called..." gets displayed
# the below code suppresses the "Cleanup called..." output
tf.get_logger().setLevel('INFO')

# expecting 2.11
# if 2.7, then logging errors will show "Cleanup called..."
print(tf.__version__)

2023-05-23 15:41:17.333121: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-23 15:41:17.594857: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-05-23 15:41:20.574571: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-05-23 15:41:20.574656: W tensorflow/

2.11.1
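
The "expecting 2.11" comment above can be turned into an explicit check. The following is a minimal sketch, not part of the original notebook; the helper name `is_expected_version` is hypothetical, and it only compares the major and minor parts of the version string.

```python
def is_expected_version(version, expected=(2, 11)):
    """Return True when the major.minor part of `version` equals `expected`.

    Hypothetical helper for illustration; `expected` defaults to the
    TensorFlow release this notebook was tested with (2.11).
    """
    parts = version.split(".")
    return (int(parts[0]), int(parts[1])) == expected
```

In the notebook it could be called as `is_expected_version(tf.__version__)` right after the import, for example to print a reminder to restart the kernel when it returns `False`.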


In [3]:
# the scratch directory is listed in .gitignore to ensure it is not committed to git
%env SCRATCH=../scratch
! [ -e "${SCRATCH}" ] || mkdir -p "${SCRATCH}"

scratch_path = os.environ.get('SCRATCH', './scratch')

env: SCRATCH=../scratch


In [8]:
! mkdir -p "${SCRATCH}"/{real,tune,train,tf_datasets,train_lr/{left,right}}
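
The `mkdir` command above relies on shell brace expansion. If the notebook's shell does not expand braces, a Python equivalent may be useful. This is a minimal sketch under that assumption; the helper name `make_scratch_tree` and the `SUBDIRS` constant are hypothetical, and the directory names are copied from the command above.

```python
from pathlib import Path

# Subdirectories taken from the mkdir command above
SUBDIRS = ["real", "tune", "train", "tf_datasets", "train_lr/left", "train_lr/right"]


def make_scratch_tree(root):
    """Create the scratch subdirectories under `root`.

    Safe to call repeatedly: existing directories are left untouched,
    mirroring the behavior of `mkdir -p`.
    """
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

With the variable defined earlier in the notebook, the call would be `make_scratch_tree(scratch_path)`.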

# Data Preparation

## Decompress the data for training

Let's check for an existing S3 bucket that holds the training data.
If the data is not in S3, we will fall back to other options, such as downloading the dataset from Git.

In [5]:
# check for existing s3 bucket
! echo S3_BUCKET_DATA=$(aws s3 ls 2>/dev/null | cut -c21- | grep sagemaker-fingerprint-data) > .env

# kludge: loadenv from .env
from dotenv import load_dotenv
load_dotenv()

# if the bucket exists, download the objects from s3
! [ -n "${S3_BUCKET_DATA}" ] && \
  aws s3 sync "s3://${S3_BUCKET_DATA}/train/left" "${SCRATCH}/train/left" --quiet && \
  aws s3 sync "s3://${S3_BUCKET_DATA}/train/right" "${SCRATCH}/train/right" --quiet && \
  aws s3 sync "s3://${S3_BUCKET_DATA}/real" "${SCRATCH}/real" --quiet
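
The bucket lookup in this cell works because each line of `aws s3 ls` output starts with a 20-character timestamp (`YYYY-MM-DD HH:MM:SS` plus a space), so `cut -c21-` keeps only the bucket name. The same parsing can be sketched in Python. The helper name `find_bucket` is hypothetical, and unlike the shell pipeline it returns only the first match, which avoids writing several names into `.env` if more than one bucket matches.

```python
def find_bucket(listing, needle="sagemaker-fingerprint-data"):
    """Return the first bucket name in `listing` that contains `needle`.

    `listing` is the text printed by `aws s3 ls`; an empty string is
    returned when no bucket matches.
    """
    for line in listing.splitlines():
        # Drop the 20-character timestamp prefix, same as `cut -c21-`
        name = line[20:].strip()
        if needle in name:
            return name
    return ""
```

The `listing` argument could come from `subprocess.run(["aws", "s3", "ls"], capture_output=True, text=True).stdout`, assuming the AWS CLI is installed and configured.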

In [6]:
# kludge: download dataset from git
# (the clone fails harmlessly if ${SCRATCH}/.raw already exists)
! git clone https://github.com/redhat-na-ssa/demo-datasci-fingerprint-data.git "${SCRATCH}/.raw"

# extract the archives only if the training data is not already in place
! [ ! -d "${SCRATCH}/train/left" ] && \
  tar -Jxf "${SCRATCH}/.raw/left.tar.xz" -C "${SCRATCH}/train/" && \
  tar -Jxf "${SCRATCH}/.raw/right.tar.xz" -C "${SCRATCH}/train/" && \
  tar -Jxf "${SCRATCH}/.raw/real.tar.xz" -C "${SCRATCH}"

fatal: destination path '../scratch/.raw' already exists and is not an empty directory.
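
The guarded extraction in this cell can also be written with Python's standard `tarfile` module. This is a minimal sketch, not the notebook's own method; the helper name `extract_if_missing` is hypothetical. It differs from the shell version in one respect: the shell guard checks only `train/left` before extracting all three archives, while this helper checks one marker directory per archive. It also assumes the archives come from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_if_missing(archive, dest, marker):
    """Extract a .tar.xz `archive` into `dest` unless `dest/marker` exists.

    Returns True when the archive was extracted and False when the
    marker directory was already present and extraction was skipped.
    """
    dest = Path(dest)
    if (dest / marker).is_dir():
        return False
    dest.mkdir(parents=True, exist_ok=True)
    # "r:xz" opens an LZMA-compressed tar file, like `tar -Jxf`
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(dest)
    return True
```

For the left-hand archive, the call would look like `extract_if_missing(f"{scratch_path}/.raw/left.tar.xz", f"{scratch_path}/train", "left")`, assuming the archive unpacks into a top-level `left` directory.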
