# Camera-Trap-Data-Pipeline

Pre-configured scripts for easily running the different parts of the pipeline. See https://github.com/marco-willi/camera-trap-data-pipeline for more details.

The following code cells can be run by selecting a cell (via mouse or arrow keys) and then pressing CTRL+Enter.

In [None]:
# Load important modules
import os
import pandas as pd

## Parameters 

Here we select / define the parameters of the current run: choose the appropriate cell and execute it, or modify it. For a new season, create a new cell using the 'Insert' menu at the top.

In [None]:
###################################
# Template
####################################

SITE=''

SEASON=''

PROJECT_ID=''

WORKFLOW_ID=''

WORKFLOW_VERSION_MIN=''

ATTRIBUTION="''"

LICENSE="''"

In [None]:
###################################
# Grumeti
####################################

SITE='GRU'

SEASON='GRU_S1'

PROJECT_ID='5115'

WORKFLOW_ID='4979'

WORKFLOW_VERSION_MIN='275'

ATTRIBUTION="'University of Minnesota Lion Center + Snapshot Safari + Singita Grumeti + Tanzania'"

LICENSE="'Snapshot Safari + Singita Grumeti'"

In [None]:
###################################
# RUA
####################################

SITE='RUA'

SEASON='RUA_S1'

PROJECT_ID='5155'

WORKFLOW_ID='4889'

WORKFLOW_VERSION_MIN='797'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# Mountain Zebra
####################################

SITE='MTZ'

SEASON='MTZ_S1'

PROJECT_ID='5124'

WORKFLOW_ID='8814'

WORKFLOW_VERSION_MIN='247'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# Karoo
####################################

SITE='KAR'

SEASON='KAR_S1'

PROJECT_ID='7679'

WORKFLOW_ID='8789'

WORKFLOW_VERSION_MIN='237.7'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# Karoo TEST
####################################

SITE='KAR_TEST'

SEASON='KAR_S1'

PROJECT_ID='7679'

WORKFLOW_ID='8789'

WORKFLOW_VERSION_MIN='237.7'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# KRU 
####################################

SITE='KRU'

SEASON='KRU_S1'

PROJECT_ID=''

WORKFLOW_ID=''

WORKFLOW_VERSION_MIN=''

ATTRIBUTION=''

LICENSE=''

### Verify Parameters

Let's check the parameters.

In [None]:
print("Selected: site: {} \nseason: {} \nproject_id: {} \nworkflow_id: {} \nworkflow_version_min: {} \nattribution: {} \nlicense: {}".format(
    SITE, SEASON, PROJECT_ID, WORKFLOW_ID, WORKFLOW_VERSION_MIN, ATTRIBUTION, LICENSE))
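Empty values are easy to overlook in the printout above. A minimal sketch of a stricter check, using a hypothetical helper `missing_parameters` (not part of the pipeline), that lists every parameter that is still empty. It treats `"''"` as empty too, because ATTRIBUTION and LICENSE carry an extra pair of quotes for the shell.

```python
def missing_parameters(params):
    """Return the names of parameters whose value is empty.

    Both "" and "''" count as empty, because ATTRIBUTION and LICENSE
    are wrapped in an extra pair of quotes for the shell.
    (Hypothetical helper for illustration.)
    """
    return [name for name, value in params.items() if value in ("", "''")]


# Example with made-up values; in the notebook, pass the variables defined above
example = {"SITE": "KRU", "SEASON": "KRU_S1", "PROJECT_ID": "", "LICENSE": "''"}
print(missing_parameters(example))  # prints ['PROJECT_ID', 'LICENSE']
```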

## Pre-Processing New Seasons

The following scripts pre-process new season data, from checking the input structure to producing a cleaned season captures file.

### Check Input Structure

The following script checks the input / directory structure of a new season. The files should be organized according to:
/home/packerc/shared/albums/{SITE}/{SEASON}/camera/roll/image1.JPG
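To illustrate what this layout means, here is a minimal sketch (not the actual `check_input_structure` code) that splits an image path into camera, roll and file name, and rejects paths that are not exactly two directory levels below the season root. The helper name and the example camera / roll names (`B03`, `B03_R1`) are hypothetical.

```python
from pathlib import PurePosixPath


def parse_image_path(season_root, image_path):
    """Split an image path into (camera, roll, filename).

    Assumes the layout albums/{SITE}/{SEASON}/camera/roll/image.JPG, i.e.
    exactly two directory levels below the season root. Returns None if
    the path does not follow that layout. (Hypothetical helper.)
    """
    try:
        relative = PurePosixPath(image_path).relative_to(PurePosixPath(season_root))
    except ValueError:
        return None  # the path is not below the season root at all
    if len(relative.parts) != 3:
        return None  # too shallow (no roll directory) or too deep
    camera, roll, filename = relative.parts
    return camera, roll, filename


root = "/home/packerc/shared/albums/GRU/GRU_S1/"
print(parse_image_path(root, root + "B03/B03_R1/IMAG0001.JPG"))
# prints ('B03', 'B03_R1', 'IMAG0001.JPG')
print(parse_image_path(root, root + "B03/IMAG0001.JPG"))  # prints None
```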

In [None]:
# Check Input Structure
!python3 -m pre_processing.check_input_structure \
--root_dir /home/packerc/shared/albums/{SITE}/{SEASON}/ \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_check_input_structure

### Check for Duplicate Images

The following script checks for duplicates and, if any are found, prints them to the terminal and to a log file. It does not delete or alter anything: if duplicates are found, they have to be removed manually.
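One common way to detect duplicates is to hash each file's content and group files with identical hashes. The sketch below assumes "duplicate" means byte-identical files; the actual `check_for_duplicates` script may use a different criterion. Like the script, it only reports and never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root_dir, extension=".jpg"):
    """Group byte-identical images below root_dir by their MD5 hash.

    Returns a list of lists; each inner list holds the sorted paths of
    files with identical content. Nothing is deleted or modified.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # Case-insensitive extension check (.JPG and .jpg both match)
            if not name.lower().endswith(extension):
                continue
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as f:
                # Read in 1 MB chunks so large images need not fit in memory
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


# Usage (path follows the layout above):
# for group in find_duplicates("/home/packerc/shared/albums/GRU/GRU_S1/"):
#     print(group)
```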

In [None]:
# check for duplicates - this can take a while for large batches >> 100k images
!python3 -m pre_processing.check_for_duplicates \
--root_dir /home/packerc/shared/albums/{SITE}/{SEASON}/ \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_check_for_duplicates

### Create Image Inventory

Creates a CSV file listing all images along with some basic information.

In [None]:
# Create Image Inventory
!python3 -m pre_processing.create_image_inventory \
--root_dir /home/packerc/shared/albums/{SITE}/{SEASON}/ \
--output_csv /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory_basic.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_create_image_inventory

### Basic Inventory Checks

The following script performs basic checks on each image: it checks for all_black / all_white images and for corrupted files. This script takes a relatively long time to run; for large seasons (>> 100k images) it is better to use the 'qsub' version.
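A minimal sketch of how an all-black / all-white check can work on grayscale pixel values. The thresholds (`dark_max`, `bright_min`, `share`) and the helper name are hypothetical and not the values used by `basic_inventory_checks`.

```python
def brightness_flags(pixels, dark_max=10, bright_min=245, share=0.99):
    """Flag an image as all black or all white from its grayscale pixels.

    pixels: iterable of grayscale values in 0..255.
    An image counts as all black if at least `share` of its pixels are
    <= dark_max, and as all white if at least `share` are >= bright_min.
    All thresholds are hypothetical.
    """
    values = list(pixels)
    if not values:
        raise ValueError("image has no pixels")
    n = len(values)
    dark = sum(1 for v in values if v <= dark_max) / n
    bright = sum(1 for v in values if v >= bright_min) / n
    return {"all_black": dark >= share, "all_white": bright >= share}


print(brightness_flags([0] * 100))  # prints {'all_black': True, 'all_white': False}
```

Assuming Pillow is installed, the pixel values for a real image could come from `Image.open(path).convert("L").getdata()`; a corrupted file would typically fail already when it is opened or decoded.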

In [None]:
# Perform basic Checks
!python3 -m pre_processing.basic_inventory_checks \
--inventory /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory_basic.csv \
--output_csv /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory.csv \
--n_processes 16 \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_basic_inventory_checks

In [None]:
# Perform basic Checks (QSUB Version for larger seasons)
!qsub -v SITE={SITE},SEASON={SEASON} basic_inventory_checks.pbs

If we use the qsub version of the script, we need to wait until it has completed. We can check the status of the job using the following command: 'Q' indicates the job is in the queue and has not yet started, 'R' means the job is running, and 'C' means the job has finished.
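If we want to check the status programmatically, a minimal sketch could parse the `qstat` output and translate the state codes. It assumes the default Torque/PBS layout (a header line, a dashed separator line, and the job state in the second-to-last column); other schedulers or `qstat` options print different layouts. The job IDs in the example are made up.

```python
STATES = {"Q": "queued", "R": "running", "C": "completed"}


def job_states(qstat_output):
    """Map job IDs to readable states from default `qstat` output.

    Assumes the Torque/PBS layout where the state (S) is the
    second-to-last column. Unknown codes are returned unchanged.
    """
    states = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header ("Job ID ...") and the dashed separator
        if len(fields) < 3 or fields[0] == "Job" or set(fields[0]) == {"-"}:
            continue
        code = fields[-2]
        states[fields[0]] = STATES.get(code, code)
    return states


sample = """Job ID           Name          User     Time Use S Queue
---------------- ------------- -------- -------- - -----
1234.server      basic_checks  packerc  00:10:02 R small
1235.server      other_job     packerc         0 Q small
"""
print(job_states(sample))  # prints {'1234.server': 'running', '1235.server': 'queued'}

# On the cluster the text could come from:
# subprocess.run(["qstat"], capture_output=True, text=True).stdout
```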

In [None]:
!qstat

Once the job is running, we can check its progress in the log file using this command:

In [None]:
!tail /home/packerc/shared/season_captures/{SITE}/log_files/{SEASON}_basic_inventory_checks.log

### Extract EXIF Data

The following script extracts EXIF (meta-data) from each image using a special, external program (exiftool).

In [None]:
# Extract EXIF data
!python3 -m pre_processing.extract_exif_data \
--inventory /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory.csv \
--update_inventory \
--output_csv /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_exif_data.csv \
--exiftool_path /home/packerc/shared/programs/Image-ExifTool-11.31/exiftool \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_extract_exif_data

### Group Images into Captures

The following script groups the images into capture events based on their timestamps. The configuration file (config/cfg.yaml) defines how many seconds apart images can be and still be considered part of the same capture, as well as the maximum number of images per capture.
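The grouping rule described above can be sketched as follows. This is an illustration, not the code of `group_inventory_into_captures`: the parameter names `max_delta_seconds` and `max_images` and their defaults are hypothetical stand-ins for the values in config/cfg.yaml, and the gap is measured from the previous image rather than from the first image of the capture.

```python
from datetime import datetime, timedelta


def group_into_captures(images, max_delta_seconds=5, max_images=3):
    """Assign capture numbers to the images of one camera roll.

    images: list of (filename, datetime) tuples.
    A new capture starts when the gap to the previous image exceeds
    max_delta_seconds, or when the current capture already holds
    max_images images. Returns (filename, capture_number) in time order.
    """
    result = []
    capture, size, previous = 0, 0, None
    for filename, taken in sorted(images, key=lambda item: item[1]):
        gap_too_large = (
            previous is not None
            and taken - previous > timedelta(seconds=max_delta_seconds)
        )
        if previous is None or gap_too_large or size >= max_images:
            capture += 1
            size = 0
        size += 1
        previous = taken
        result.append((filename, capture))
    return result


t0 = datetime(2018, 1, 1, 12, 0, 0)
images = [("a", t0), ("b", t0 + timedelta(seconds=2)), ("c", t0 + timedelta(seconds=60))]
print(group_into_captures(images))  # prints [('a', 1), ('b', 1), ('c', 2)]
```

Images should be grouped per camera and roll before calling this, because timestamps from different cameras are unrelated.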

In [None]:
# Group Images into Captures
!python3 -m pre_processing.group_inventory_into_captures \
--inventory /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory.csv \
--output_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--no_older_than_year 2017 \
--no_newer_than_year 2019 \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_group_inventory_into_captures

### Rename the Images

The following script renames the images according to SEASON_SITE_ROLL_IMAG0001.JPG.
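A minimal sketch of building a name that follows this pattern. It assumes SITE in the pattern refers to the camera location, ROLL to the roll identifier, and the number to a 1-based, zero-padded image index; the helper and the example values are hypothetical.

```python
def new_image_name(season, camera_site, roll, index, extension="JPG"):
    """Build a file name following SEASON_SITE_ROLL_IMAG0001.JPG.

    The image index is zero-padded to four digits. (Hypothetical helper.)
    """
    return "{}_{}_{}_IMAG{:04d}.{}".format(season, camera_site, roll, index, extension)


print(new_image_name("GRU_S1", "B03", "R1", 1))  # prints GRU_S1_B03_R1_IMAG0001.JPG
```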

In [None]:
# Rename all images
!python3 -m pre_processing.rename_images \
--inventory /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_rename_images

### Generate Action List

The following script generates an 'action list', a CSV file that contains recommended actions for each image based on potential issues detected by the previous scripts. Additional actions can be added to it manually. See https://github.com/marco-willi/camera-trap-data-pipeline/blob/master/docs/pre_processing.md for more details.

In [None]:
# Generate Action List
!python3 -m pre_processing.create_action_list \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--action_list_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_action_list.csv \
--plot_timelines \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_create_action_list

### Generate Actions

After creating and updating the action list, the following script generates 'action items' by unpacking the 'action list'. It writes a CSV with one line per action, which can be checked for correctness.
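"Unpacking" can be pictured as expanding each range-based entry of the action list into one action per affected image. The sketch below uses a hypothetical layout (keys `camera`, `roll`, `first_capture`, `last_capture`, `action`); the real action list columns are described in the pre-processing docs linked above.

```python
def unpack_action_list(action_list, images):
    """Expand range-based actions into one action per image.

    action_list: dicts with keys "camera", "roll", "first_capture",
        "last_capture" and "action" (hypothetical layout).
    images: dicts with keys "image", "camera", "roll" and "capture".
    Returns dicts with keys "image" and "action", one per affected image.
    """
    actions = []
    for entry in action_list:
        for img in images:
            if (img["camera"] == entry["camera"]
                    and img["roll"] == entry["roll"]
                    and entry["first_capture"] <= img["capture"] <= entry["last_capture"]):
                actions.append({"image": img["image"], "action": entry["action"]})
    return actions
```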

In [None]:
# Generate Actions
!python3 -m pre_processing.generate_actions \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--action_list /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_action_list.csv \
--actions_to_perform_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_actions_to_perform.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_generate_actions

### Apply Actions

The following script applies the actions generated from the action list and modifies {SEASON}_captures.csv accordingly.
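A minimal sketch of applying per-image actions to capture records. The action names `delete` and `timechange` and the `shift_seconds` key are hypothetical. In this sketch `delete` only flags a record; removing flagged records is left to a separate step, similar to the Update Captures step further down.

```python
from datetime import timedelta


def apply_actions(captures, actions):
    """Apply per-image actions to capture records and return new records.

    captures: dict mapping image name -> record dict with a "datetime" key.
    actions: dicts with "image" and "action" keys; "timechange" actions
        also need a "shift_seconds" key. The input is not modified.
    """
    updated = {name: dict(record) for name, record in captures.items()}
    for act in actions:
        record = updated[act["image"]]
        if act["action"] == "delete":
            record["delete"] = True  # flag only; removal happens later
        elif act["action"] == "timechange":
            record["datetime"] += timedelta(seconds=act["shift_seconds"])
        else:
            raise ValueError("unknown action: {}".format(act["action"]))
    return updated
```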

In [None]:
# Apply Actions
!python3 -m pre_processing.apply_actions \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--actions_to_perform /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_actions_to_perform.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_apply_actions

### Update Captures

The following script generates an updated captures file and removes certain records (for example deleted images).

In [None]:
# Update Captures
!python3 -m pre_processing.update_captures \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--captures_updated /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures_updated.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_update_captures

In [None]:
# Create Cleaned File
!cp /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures_updated.csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv

In [None]:
# (Optional) If issues persist, create a new action list from the updated captures and repeat the steps above
!python3 -m pre_processing.create_action_list \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures_updated.csv \
--action_list_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_action_list2.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--log_filename {SEASON}_create_action_list

## Upload Data to Zooniverse

The following code cells upload data to Zooniverse.

In [None]:
# Generate Manifest
!python3 -m zooniverse_uploads.generate_manifest \
--captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--output_manifest_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--images_root_path /home/packerc/shared/albums/{SITE}/ \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--manifest_id {SEASON} \
--attribution {ATTRIBUTION} \
--license {LICENSE}

In [None]:
# (Optional) - Generate machine learning file
!python3 -m zooniverse_uploads.create_machine_learning_file \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json

In [None]:
# Generate predictions: this step needs to be run in a separate terminal

In [None]:
# Add model predictions to Manifest
!python3 -m zooniverse_uploads.add_predictions_to_manifest \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json

In [None]:
# Split / Batch Manifest
!python3 -m zooniverse_uploads.split_manifest_into_batches \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--max_batch_size 20000
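The cell above splits the manifest into batches of at most 20000 entries (`--max_batch_size`). A minimal sketch of that batching, assuming the manifest is a JSON object keyed by capture ID; the actual manifest layout and how the script names its batch files are not shown here.

```python
def split_manifest(manifest, max_batch_size=20000):
    """Split a manifest dict into consecutive batches of at most max_batch_size entries."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be positive")
    items = list(manifest.items())  # dicts keep insertion order
    return [dict(items[i:i + max_batch_size]) for i in range(0, len(items), max_batch_size)]


# Usage with a manifest file (path as in the cell above):
# import json
# with open(path) as f:
#     batches = split_manifest(json.load(f))
```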

In [None]:
# Upload Manifest
!python3 -m zooniverse_uploads.upload_manifest \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--project_id {PROJECT_ID} \
--password_file ~/keys/passwords.ini \
--image_root_path /home/packerc/shared/albums/{SITE}/

In [None]:
# In case of a failed upload, set SUBJECT_SET_ID to the ID of the existing subject set
SUBJECT_SET_ID=''

In [None]:
# In case of an error when uploading the Manifest
!python3 -m zooniverse_uploads.upload_manifest \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--project_id {PROJECT_ID} \
--subject_set_id {SUBJECT_SET_ID} \
--image_root_path /home/packerc/shared/albums/{SITE}/ \
--password_file ~/keys/passwords.ini

## Download Zooniverse Data

We can download Zooniverse data through Python. However, we first need to request the "Classification" and "Subject" exports on Zooniverse and wait for the emails indicating that the exports are ready.

In [None]:
# Download Classifications
!python3 -m zooniverse_exports.get_zooniverse_export \
--password_file ~/keys/passwords.ini \
--project_id {PROJECT_ID} \
--output_file /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_classifications.csv \
--export_type classifications \
--log_dir /home/packerc/shared/zooniverse/Exports/{SITE}/

In [None]:
# Get Zooniverse Subject Data
!python3 -m zooniverse_exports.get_zooniverse_export \
--password_file ~/keys/passwords.ini \
--project_id {PROJECT_ID} \
--output_file /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects.csv \
--export_type subjects \
--log_dir /home/packerc/shared/zooniverse/Exports/{SITE}/

In [None]:
# (Optional) Get Workflow ID / Workflow Version
!python3 -m zooniverse_exports.extract_annotations \
--classification_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_classifications.csv \
--output_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_annotations.csv

In [None]:
# Extract Annotations
!python3 -m zooniverse_exports.extract_annotations \
--classification_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_classifications.csv \
--output_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_annotations.csv \
--workflow_id {WORKFLOW_ID} \
--workflow_version_min {WORKFLOW_VERSION_MIN}
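Workflow versions such as '237.7' look like decimals but behave like two separate numbers, so comparing them as floats can go wrong ('237.10' would be smaller than '237.7'). A minimal sketch of the workflow filter that compares versions as integer tuples; it uses the `workflow_id` and `workflow_version` columns of a Zooniverse classification export, but whether `extract_annotations` compares versions this way is an assumption.

```python
def parse_version(version):
    """Turn a version string such as '237.7' into the tuple (237, 7)."""
    return tuple(int(part) for part in str(version).split("."))


def filter_classifications(rows, workflow_id, workflow_version_min):
    """Keep rows of one workflow whose version is at least the minimum.

    rows: dicts with "workflow_id" and "workflow_version" keys.
    """
    minimum = parse_version(workflow_version_min)
    return [
        row for row in rows
        if str(row["workflow_id"]) == str(workflow_id)
        and parse_version(row["workflow_version"]) >= minimum
    ]


print(parse_version("237.10") >= parse_version("237.7"))  # prints True
```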

In [None]:
# Extract Subject Data
!python3 -m zooniverse_exports.extract_subjects \
--subject_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects.csv \
--output_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects_extracted.csv

## Aggregate Annotations

The following code aggregates the annotations using the plurality algorithm.
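The core idea of a plurality vote is that the answer given by the most volunteers wins. A minimal single-species sketch; the script's algorithm is likely more involved (for example, a capture can contain more than one species). Ties go to the label that appears first in the input, because `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter


def plurality_vote(species_votes):
    """Return (species, vote_share) for the most frequent answer.

    species_votes: one species label per volunteer for a single subject.
    """
    if not species_votes:
        raise ValueError("no votes")
    label, count = Counter(species_votes).most_common(1)[0]
    return label, count / len(species_votes)


print(plurality_vote(["zebra", "zebra", "zebra", "lion"]))  # prints ('zebra', 0.75)
```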

In [None]:
# Aggregate annotations (plurality algorithm)
!python3 -m aggregations.aggregate_annotations_plurality \
--annotations /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_annotations.csv \
--output_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality.csv \
--log_dir /home/packerc/shared/zooniverse/Aggregations/{SITE}/

In [None]:
# Now we add subject information to the aggregations
!python3 -m zooniverse_exports.merge_csvs \
--base_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality.csv \
--to_add_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects_extracted.csv \
--output_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_aggregated_plurality.csv \
--key subject_id

## Create Reports

The following code cells create several reports by joining subject / season data to the aggregated annotations.

In [None]:
# Create report
!python3 -m reporting.create_zooniverse_report \
--season_captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--aggregated_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_aggregated_plurality.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_all.csv \
--default_season_id {SEASON} \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/

# Create statistics file
!python3 -m reporting.create_report_stats \
--report_path /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_all.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_all_stats.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/

In [None]:
# Create species consensus only report
!python3 -m reporting.create_zooniverse_report \
--season_captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--aggregated_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_aggregated_plurality.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/ \
--default_season_id {SEASON} \
--exclude_blanks \
--exclude_humans \
--exclude_non_consensus \
--exclude_captures_without_data


# Create statistics file
!python3 -m reporting.create_report_stats \
--report_path /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species_stats.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/


# Create a small sample report
!python3 -m reporting.sample_report \
--report_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species_samples.csv \
--sample_size 300 \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/
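The species reports differ only in which `--exclude_*` flags are passed. A minimal sketch of what three of these flags could mean for the report rows; the keys `species` and `is_consensus` and the labels `blank` / `human` are hypothetical, and `--exclude_captures_without_data` is left out.

```python
def filter_report(rows, exclude_blanks=False, exclude_humans=False,
                  exclude_non_consensus=False):
    """Drop report rows according to the given flags.

    rows: dicts with hypothetical keys "species" and "is_consensus".
    """
    kept = []
    for row in rows:
        if exclude_blanks and row["species"] == "blank":
            continue
        if exclude_humans and row["species"] == "human":
            continue
        if exclude_non_consensus and not row["is_consensus"]:
            continue
        kept.append(row)
    return kept
```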

In [None]:
# Create species report (including captures without consensus)
!python3 -m reporting.create_zooniverse_report \
--season_captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--aggregated_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_aggregated_plurality.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report.csv \
--default_season_id {SEASON} \
--exclude_blanks \
--exclude_humans \
--exclude_captures_without_data


# Create statistics file
!python3 -m reporting.create_report_stats \
--report_path /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_stats.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/


# Create a small sample report
!python3 -m reporting.sample_report \
--report_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_samples.csv \
--sample_size 300 \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/