import subprocess

# Research

[Exiftool](https://exiftool.org/geotag.html) already has some sophisticated functionality
for geotagging photos. Notes:

- Supports some GPS log formats "out of the box", including GPX, NMEA, and KML, but it's also
  possible to define a custom CSV file (with specific header names). See the exiftool [CSV
  format](https://exiftool.org/geotag.html#CSVFormat) docs: "Required columns are GPSDateTime (or
  GPSDateStamp and GPSTimeStamp), GPSLatitude and GPSLongitude. All other columns are
  optional, and unrecognized columns are ignored."
- The geotagging requires images that are already correctly timestamped (in EXIF
  metadata). Positions for a given image time are then linearly interpolated from the
  time-position pairs in the CSV file. In my case (extracting images from video) I may
  have to write the time to the EXIF metadata manually first. There are many
  time-related tags. "When extracting from file, timestamps are taken from the first
  available of the following tags: Image timestamp: SubSecDateTimeOriginal,
  SubSecCreateDate, SubSecModifyDate, DateTimeOriginal, CreateDate, ModifyDate,
  FileModifyDate".
- Setting e.g. SubSecDateTimeOriginal is probably best done with
  exiftool.ExifToolHelper.set_tags() from the pyexiftool package (see
  [docs](https://sylikc.github.io/pyexiftool/examples.html#setting-tags)).
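A minimal sketch of the preparatory steps in the notes above, assuming positions are already available as (timezone-aware time, latitude, longitude) tuples. It builds CSV text with the required exiftool column names, formats a timestamp for SubSecDateTimeOriginal, writes that tag through pyexiftool, and builds the exiftool geotag command. The helper names (`exiftool_csv_text`, `subsec_datetime`, `tag_image`, `geotag_command`) are hypothetical, and the exact date formats exiftool accepts in the CSV should be checked against the CSV format docs.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterable


def exiftool_csv_text(fixes: Iterable[tuple[datetime, float, float]]) -> str:
    """Return CSV text using exiftool's geotag column names.

    Hypothetical helper; each fix is (timezone-aware datetime, latitude, longitude)
    with coordinates in signed decimal degrees.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["GPSDateTime", "GPSLatitude", "GPSLongitude"])
    for time, lat, lon in fixes:
        # Normalize to UTC and write "YYYY:mm:dd HH:MM:SS.sss" with a trailing "Z"
        # (assumed to be accepted by exiftool; check the CSV format docs).
        utc = time.astimezone(timezone.utc)
        stamp = utc.strftime("%Y:%m:%d %H:%M:%S.%f")[:-3] + "Z"
        writer.writerow([stamp, f"{lat:.7f}", f"{lon:.7f}"])
    return buf.getvalue()


def subsec_datetime(time: datetime) -> str:
    """Format an aware datetime as 'YYYY:mm:dd HH:MM:SS.sss+HH:MM'."""
    offset = time.utcoffset()
    if offset is None:
        raise ValueError("time must be timezone-aware")
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    millis = time.microsecond // 1000
    return f"{time:%Y:%m:%d %H:%M:%S}.{millis:03d}{sign}{hours:02d}:{minutes:02d}"


def tag_image(path: str, time: datetime) -> None:
    """Write SubSecDateTimeOriginal to one image (needs pyexiftool and exiftool installed).

    Assumption: writing this composite tag updates DateTimeOriginal, SubSecTimeOriginal
    and OffsetTimeOriginal; verify this with your exiftool version.
    """
    import exiftool  # the pyexiftool package is imported as "exiftool"

    with exiftool.ExifToolHelper() as et:
        et.set_tags(
            path,
            tags={"SubSecDateTimeOriginal": subsec_datetime(time)},
            params=["-overwrite_original"],  # skip the "_original" backup copies
        )


def geotag_command(csv_path: str, image_dir: str) -> list[str]:
    """Build an exiftool command that geotags all images in image_dir from csv_path.

    exiftool picks each image's time itself, using the tag priority quoted above.
    """
    return ["exiftool", "-geotag", csv_path, image_dir]
```

With these, one possible sequence is to tag each extracted frame with `tag_image()`, write the output of `exiftool_csv_text()` to e.g. `track.csv`, and then run `subprocess.run(geotag_command("track.csv", "frames"), check=True)`.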

# Review of code for extracting images from video

I'm considering building a separate Python package for extracting images from video
transects. The code developed so far is based on several different data sources:

Image data:
- Video from GoPro
- "Time lapse" images from GoPro

GPS tracks:
- Trimble Catalyst + GPS logger -> CSV (timestamps in local time?), GPX, NMEA(?)
- Otter -> NMEA (original), CSV (processed)
- "Skippo" GPS track from mobile phone (Smøla) (?)
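Since some of these logs may store local time, while exiftool and the video timestamps need a common time basis, a small normalization step could convert naive local timestamps to UTC. A minimal sketch using the standard-library `zoneinfo` module; the function name and the default zone `Europe/Oslo` are assumptions, and whether each logger actually records local time still needs checking.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc(naive: datetime, tz_name: str = "Europe/Oslo") -> datetime:
    """Interpret a naive wall-clock timestamp as local time in tz_name and return UTC.

    Hypothetical helper; "Europe/Oslo" is only an assumed default.
    Requires the IANA time zone database (system tzdata or the tzdata package).
    """
    if naive.tzinfo is not None:
        raise ValueError("expected a naive datetime without tzinfo")
    # replace() attaches the zone without shifting the clock; astimezone() converts.
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)
```

During the autumn switch from summer time, one hour of wall-clock times occurs twice; `zoneinfo` resolves such times to the first occurrence (`fold=0`) unless `fold=1` is set, so a log covering that hour is ambiguous without extra information.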

The general workflow is as follows:
- List / search for GPS track CSV file
- List / search for video files
- Use get_video_data() to get video data organized as a Pandas dataframe. The function accepts time zone information.
- Use xxx_csv_to_geodataframe() (where "xxx" represents variations: "otter", "track",
  "yx") to read all CSV positions and return them as a geodataframe. The positions can be
  very densely sampled (tens of samples per second for Otter) or more sparsely sampled.
- Reduce the number of data points by spatial filtering using filter_gdf_on_distance()
- Call the cryptically named prepare_gdf_with_video_data(), which does many things that
  could probably be split into multiple functions: excluding points outside the video
  time window, and adding the video file and the relative time within the video for each
  point.
- Finally, use extract_images_from_video() to both extract images from video and save
  the corresponding positions in a GeoPackage file. **This function could be changed to
  extract the image and save the position data as an EXIF tag instead**.
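One way to split up prepare_gdf_with_video_data() is into small, separately testable functions. The sketch below assumes each video can be described by its file path, start time and duration (similar information to what get_video_data() collects), finds the video (if any) that covers a given GPS time together with the offset into it, and builds a command for grabbing that frame. Using ffmpeg for frame extraction is an assumption; extract_images_from_video() may use a different backend. All names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Video:
    """Hypothetical record for one video file and its recording window."""
    path: str
    start: datetime  # recording start, on the same time basis as the GPS data
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def locate_in_videos(
    time: datetime, videos: Sequence[Video]
) -> Optional[tuple[Video, timedelta]]:
    """Return (video, offset into video) for the video covering `time`, or None."""
    for video in videos:
        if video.start <= time < video.end:
            return video, time - video.start
    return None  # the point lies outside every video window and can be dropped


def ffmpeg_frame_command(video: Video, offset: timedelta, out_path: str) -> list[str]:
    """Build an ffmpeg command that saves the single frame at `offset` to out_path."""
    return [
        "ffmpeg",
        "-y",  # overwrite out_path if it already exists
        "-ss", f"{offset.total_seconds():.3f}",  # placed before -i: seek in the input
        "-i", video.path,
        "-frames:v", "1",  # write exactly one video frame
        out_path,
    ]
```

Each matching point could then be handled with `subprocess.run(ffmpeg_frame_command(video, offset, path), check=True)`. Returning `None` for points outside all videos turns the "exclude points outside video window" step into a plain filter instead of a dataframe mutation.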

The code works, but "smells" a bit (according to my new preferences, at least). 

The geodataframe format was chosen because it seemed like a good match for reading a
CSV file with positions and writing a GPKG file with points. However, a workflow where
the dataframe is continuously expanded and modified seems a bit clunky. Or does it just
feel that way because I'm not used to it?

# Pros / cons of switching to geotagged photos
Pros:
- Each photo is a self-contained "unit". No need to distribute separate file with
  positions.
- Import of geotagged photos supported by GIS software (but not e.g. Google Earth).

Cons:
- More work!
- Users end up with something similar to the geopackage file anyway, so not much is
  gained in the end.

# Code refactoring alternatives
- Introduce typing
- Split into smaller functions with single responsibilities
- Use OOP? Only if there's a reason to.
- Try to generalize data formats
- Consider alternatives to Pandas data structures
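As an illustration of several of these points at once (typing, single-responsibility functions, a source-independent data format, and a non-Pandas data structure), the sketch below defines a minimal `Position` record that each track reader (Otter, Trimble, Skippo) could convert to, plus a standalone distance filter. It is a hypothetical stand-in for filter_gdf_on_distance(): the original's exact filtering rule isn't described in these notes, so this version simply keeps a point when it is at least `min_distance_m` from the last kept point, using the haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate for short-distance filtering


@dataclass(frozen=True)
class Position:
    """Hypothetical common format for positions from any GPS source."""
    time: datetime  # preferably timezone-aware UTC
    lat: float      # decimal degrees
    lon: float      # decimal degrees


def haversine_m(a: Position, b: Position) -> float:
    """Great-circle distance in metres between two positions."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def filter_on_distance(positions: Iterable[Position], min_distance_m: float) -> Iterator[Position]:
    """Yield the first position, then each one at least min_distance_m from the last yielded."""
    last = None
    for pos in positions:
        if last is None or haversine_m(last, pos) >= min_distance_m:
            yield pos
            last = pos
```

Because `filter_on_distance()` is a generator over immutable records, each step returns new data instead of expanding a shared dataframe, and a GeoDataFrame can still be built once at the end for writing the GPKG file.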