# mp_genjpg: Multi-processor ARW to JPG converter
C. W. Wright [Github.com/lidar532](https://github.com/lidar532)

mp_genjpg walks a directory tree containing
[raw Sony ARW photo files](https://en.wikipedia.org/wiki/Raw_image_format#ARW) and generates a
Linux bash script that uses all CPU cores to convert them to jpg or another
photo format. After gathering the list of raw photos, it splits the list into roughly one block
per CPU core on your computer and writes each block to a list file of input and output
file names. The generated bash script runs several copies of GraphicsMagick in the background (one per list file),
each configured to convert its raw files to jpg (or other) output format.
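
The splitting step can be sketched in plain Python as follows. This is a minimal illustration of the arithmetic, not the notebook's own code, and the function name `split_into_blocks` is hypothetical. Because the block size comes from an integer division, a file count that is not a multiple of the core count leaves a remainder, which ends up in one extra short block.

```python
def split_into_blocks(items, cores):
    """Split items into consecutive blocks of len(items) // cores entries.

    Any remainder ends up in one extra, shorter block at the end.
    """
    step = max(1, len(items) // cores)   # never let the step drop to zero
    return [items[i:i + step] for i in range(0, len(items), step)]


# Example: 17103 file names on a 12-core machine.
files = [f"photo-{n:05d}.arw" for n in range(17103)]
blocks = split_into_blocks(files, 12)
print(len(blocks), len(blocks[0]), len(blocks[-1]))
# prints: 13 1425 3
```

With these numbers, twelve blocks hold 1425 files each and a thirteenth block holds the remaining 3, so thirteen converter processes would be started on a 12-core machine.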

The program runs in a Jupyter Python notebook on Windows 10 and generates
a bash script suitable for execution under [Windows-10 WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install-win10).

## Usage:
1. Configure the `User Inputs` variables to set the jpg quality and the mission, image, output, and script directories.
1. Run the code cell.
1. Start a WSL Linux terminal in your selected script directory.
1. Execute the generated bash script. Example: `bash generate-jpgs.bash` (_it will generally run for hours_).
1. Once the script completes, check that the output directory has the correct file count.
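
The file-count check in the last step could be automated with a short script such as the one below. It is only a sketch: the helper names `count_files` and `counts_match` are hypothetical, and it assumes the output files all carry a `.jpg` extension.

```python
import fnmatch
import os


def count_files(top, mask):
    """Count files under top (recursively) whose names match mask, ignoring case."""
    total = 0
    for _root, _dirs, files in os.walk(top):
        total += sum(1 for name in files
                     if fnmatch.fnmatch(name.lower(), mask.lower()))
    return total


def counts_match(raw_dir, jpg_dir):
    """Return True when the raw tree and the jpg directory hold the same number of photos."""
    return count_files(raw_dir, "*.arw") == count_files(jpg_dir, "*.jpg")
```

Because the output directory is flat, two raw files with the same name in different subdirectories would map to the same jpg name, and a mismatch in the counts is one way to notice that.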

## User Inputs:
The user changes the following variables in the program as required.
```python
    jpg_quality = 50
    mask = '*.arw'
    bash_script_fn = 'generate-jpgs.bash'
    mission_dir    = '/mnt/j/data/2020-0208-NC-outer-banks'
    script_dir     = 'j:/data/2020-0208-NC-outer-banks/etc'
    idir           = 'j:/data/2020-0208-NC-outer-banks/raw/'
    odir           = f'{mission_dir}/field-jpg{jpg_quality}/'
```

* ***jpg_quality***: Jpeg quality, 0-100. Higher values give better quality and larger files.
* ***mask***:  The file mask to use when walking through the input directory.  Example: `mask = '*.arw'` for all Sony ARW
files.
* ***bash_script_fn***:  The name of the generated bash script file.
* ***mission_dir***:  The full path directory of the mission.
* ***script_dir***:   The directory where the bash script will be saved.
* ***idir***:         The input directory where the input photos are stored. The directory can have any number of
subdirectories and the program will walk the directory tree extracting the photo file names found in each subdirectory.
* ***odir***:         The directory to store the resulting converted photo files.
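
The inputs mix two path styles: `script_dir` and `idir` are Windows paths because the notebook reads and writes them from Windows, while `mission_dir` and `odir` are WSL paths because they are used by the generated script. WSL exposes Windows drives under `/mnt/<drive letter>`, so the mapping between the two can be sketched as below. The function name `to_wsl_path` is hypothetical, and lowercasing the drive letter is an assumption of this sketch.

```python
def to_wsl_path(path):
    """Map a Windows path such as 'j:/data' or 'J:\\data' to '/mnt/j/data'."""
    path = path.replace("\\", "/")
    if len(path) > 1 and path[1] == ":":
        # Assumption: the drive is mounted under its lowercase letter.
        return f"/mnt/{path[0].lower()}{path[2:]}"
    return path   # already a Linux-style path


print(to_wsl_path("j:/data/2020-0208-NC-outer-banks/raw/"))
# prints: /mnt/j/data/2020-0208-NC-outer-banks/raw/
```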

## Output:
1. A Linux bash script file that generates the Jpegs and a file of EXIF data extracted from the raw photo files.
1. A 1-deep directory populated with Jpeg photos derived from the input raw photo files.
1. A set of list files (`allfiles.txt` and one `f-N.txt` per block) pairing each raw input file name with its resulting Jpeg file name.
1. A single EXIF file containing the EXIF info needed to sync each photo with PPK GPS.

### Result EXIF File
* The EXIF file is comma (',') delimited and contains the following columns and column headers:
    - ***filename*** _(Example: 2020-0209-182755-DSC00793-17103-N7251F.ARW)_
    - ***date_time*** _(Example: 2020:02:09 18:27:46)_  The date and the time are separated by a space character.
    - ***iso***
    - ***shutterspeed***
    - ***imagecount*** Total images captured by this camera as of this photo, i.e. the total number of camera shutter actuations.
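
Reading such a file back might look like the sketch below. It assumes the five-column, comma-delimited layout listed above. The file name and date_time come from the examples above, while the iso, shutterspeed, and imagecount values are made-up placeholders.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample row; only filename and date_time are taken from the examples above.
sample = io.StringIO(
    "filename,date_time,iso,shutterspeed,imagecount\n"
    "2020-0209-182755-DSC00793-17103-N7251F.ARW,2020:02:09 18:27:46,100,1/1000,12345\n"
)

rows = list(csv.DictReader(sample))

# EXIF timestamps use ':' inside the date and a space between date and time.
taken = datetime.strptime(rows[0]["date_time"], "%Y:%m:%d %H:%M:%S")
print(rows[0]["filename"], taken.isoformat())
```

For a real file, replace the `io.StringIO` object with `open(...)` on the EXIF file.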

## Requirements:
* Installed [Windows-10 WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
* Jupyter Notebooks for Python.

## Other required tools:
* [GraphicsMagick](http://www.graphicsmagick.org/) (Linux version). A Swiss Army knife for image processing. Note that it uses [DCRAW](https://www.dechifro.org/dcraw/) to read
[raw camera photo files](https://en.wikipedia.org/wiki/Raw_image_format#ARW).
* [xargs](http://man7.org/linux/man-pages/man1/xargs.1.html).  Built into WSL Linux. Feeds command-line parameters from a file to the GraphicsMagick program.
* [Exiftool for Linux](https://exiftool.org/).  Used to extract EXIF data from the raw photos.
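
To make the role of `xargs -n 2` concrete, the sketch below approximates in Python how it turns a list file into GraphicsMagick command lines: it reads whitespace-separated tokens and appends them, two at a time, to the base command. The helper name `xargs_commands` and the two file pairs are hypothetical, and `shlex` only approximates the quoting rules of `xargs`.

```python
import shlex


def xargs_commands(list_text, base_cmd, n=2):
    """Group whitespace-separated tokens n at a time and append them to base_cmd."""
    tokens = shlex.split(list_text)
    return [base_cmd + tokens[i:i + n] for i in range(0, len(tokens), n)]


# Hypothetical two-line list file: one "input output" pair per line.
listing = "/mnt/j/raw/a.arw /mnt/j/jpg/a.jpg\n/mnt/j/raw/b.arw /mnt/j/jpg/b.jpg\n"
base = ["gm", "convert", "-format", "jpeg", "-quality", "50"]
commands = xargs_commands(listing, base)
for cmd in commands:
    print(" ".join(cmd))
```

Since tokens are split on whitespace, a file name containing a space would be broken into two arguments, so this scheme relies on space-free paths.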

## Exiftool

Extract the EXIF info for every file listed in `allfiles.txt` and redirect it from stdout to a text file:
~~~
awk '{print $1}' allfiles.txt | xargs -n 10000 exiftool -T -fast -filename -datetimeoriginal -iso -shutterspeed -imagecount  > 2020-0208-exif.txt
~~~



import numpy as np
import os
import fnmatch

####################################################################
# User set Data directories, mask, settings, etc.
jpg_quality = 50
mask = '*.arw'
bash_script_fn = 'generate-jpgs.bash'
mission_dir    = '/mnt/j/data/2020-0208-NC-outer-banks'
script_dir     = 'j:/data/2020-0208-NC-outer-banks/etcX'
idir           = 'j:/data/2020-0208-NC-outer-banks/raw/'
odir           = f'{mission_dir}/field-jpg{jpg_quality}/'
####  END OF User Settings...
####################################################################

# Convert windows file name to WSL linux file name.  
# changes drive letter like "c:" to "/mnt/c/" and changes all \ to /
def mkufn(fn):
    if fn[1] == ':':
        ufn = f"/mnt/{fn[0]}{fn.split(':')[1]}".replace('\\', '/')
    else:
        ufn = fn.replace('\\', '/')
    return ufn

# Convert a WSL linux filename from /mnt/c/filename to Windows 10 c:/filename
def mkwfn(fn):
    if fn[0:5] == '/mnt/':
        # fn.split('/') -> ['', 'mnt', 'c', ...]; keep everything after the
        # drive letter, including the final path component.
        r = f"{fn[5]}:/" + '/'.join(fn.split('/')[3:])
    else:
        r = fn
    return r

threads = os.cpu_count()				# Determine the number of CPU cores 

allfiles = []
tc = 0
####################################################################
# Read into 'allfiles' a list of files in 'idir' that match 'mask'.
# The matching filenames are converted from windows to linux format
# as they are stored in 'allfiles'
####################################################################
for root, dirs, files in os.walk(idir):
    for item in fnmatch.filter(files, mask):
        tc = tc + 1
        idfn = mkufn(f'{root}/{item}')       # Windows path -> WSL path
        # Output name: input base name with a .jpg extension, flattened into 'odir'.
        odfn = odir + os.path.splitext(os.path.basename(idfn))[0] + '.jpg'
        allfiles.append(f'{idfn} {odfn} ')
# Save 'allfiles' in 'allfiles.txt' mostly for debugging purposes.
with open(f'{script_dir}/allfiles.txt', 'w') as f:
    for n in allfiles:
        f.write(str(f'{n}\n'))
        
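step = max(1, tc // threads)         # files per block; at least 1 so np.arange gets a nonzero step
blocksz = step*threads
starts = np.arange(0, tc, step)      # start index of each block
stops  = starts[1:]                  # each block stops where the next one starts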
blocks = []

hs = f'\
# Generate jpgs with multiple gm.  W. Wright as of 2020-0220\n\
#########################################################################\n\
#   Mission Directory: {mission_dir}\n\
#          Script_dir: {script_dir}\n\
#         bash Script: {bash_script_fn}\n\
#           Input dir: {idir}\n\
#          Output dir: {odir}\n\
# CPU Cores (threads):{threads:2d}  Jpeg Quality:{jpg_quality:2d}   Blocksize:{blocksz:3d}\n\
#    Total File Count:{tc:4d}        Step: {step:4d} \n\
#########################################################################\n\
\n'

print(hs)						# Display settings & stats on the console

####################################################################
# Generate a Linux Bash script file containing the commands
# to execute graphicsmagick commands to convert an input file list
# into a new directory of converted images.
####################################################################
bash_script = open( f'{script_dir}/{bash_script_fn}', 'w')
bash_script.write(hs)

for i in range(len(stops)):
    bk = allfiles[starts[i]:stops[i]]    # Extract a slice for each core
    blocks.append(bk)                    # Append bk to blocks.

blocks.append(allfiles[starts[-1]:tc])

# Each block contains a list of files to pass to a thread for processing
# Linux xargs used. See: http://man7.org/linux/man-pages/man1/xargs.1.html
for core_number, block in enumerate(blocks):
    fn = f'{script_dir}/f-{core_number}.txt'
    # One background pipeline per block: xargs hands gm one input/output pair at a time.
    bash_script.write(f'( cat {mkufn(fn)} | xargs -t -n 2 gm convert -format jpeg -quality {jpg_quality} ) &\n')
    with open(fn, 'w') as f:
        for n in block:
            f.write(f'{n}\n')    # one "input output" pair per line for xargs

exif_fn = '2020-0208-exif.txt'
bash_script.write(f'echo \'filename,date_time,iso,shutterspeed,imagecount\' > {exif_fn}\n' )
bash_script.write("( awk '{print $1}' allfiles.txt\
 | xargs -n 100 exiftool -csv -fast \
 -filename \
 -datetimeoriginal \
 -iso -shutterspeed \
 -imagecount  >> "+f'{exif_fn} ) &\n')
bash_script.write('echo \'Conversion script written\'\n')
bash_script.close()
  
  

# Generate jpgs with multiple gm.  W. Wright as of 2020-0220
#########################################################################
#   Mission Directory: /mnt/j/data/2020-0208-NC-outer-banks
#          Script_dir: j:/data/2020-0208-NC-outer-banks/etcX
#         bash Script: generate-jpgs.bash
#           Input dir: j:/data/2020-0208-NC-outer-banks/raw/
#          Output dir: /mnt/j/data/2020-0208-NC-outer-banks/field-jpg50/
# CPU Cores (threads):12  Jpeg Quality:50   Blocksize:17100
#    Total File Count:17103        Step: 1425 
#########################################################################


