# mp_genjpg: Multi-processor ARW to JPG converter

as of 2020-0225 1517

C. W. Wright [Github.com/lidar532](https://github.com/lidar532)

mp_genjpg walks a directory containing 
[raw Sony ARW photo files](https://en.wikipedia.org/wiki/Raw_image_format#ARW) and generates a
Linux bash script that uses all CPU cores to convert them to jpg or other
photo formats.  After gathering the list of raw photos, it divides the list by the number
of CPU cores on your computer and writes, for each core, a list file of input and output
file names.  The generated bash script runs several copies of GraphicsMagick in parallel (one per core), 
each configured to convert its share of the raw files to jpg (or other) output format.
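
The splitting step can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: `split_for_cores` is a hypothetical helper, the file names are made up, and it spreads any remainder across the first chunks, which may differ from how the notebook slices its list.

```python
import os


def split_for_cores(items, cores=None):
    """Split items into at most `cores` contiguous chunks of near-equal size."""
    cores = cores or os.cpu_count() or 1
    base, extra = divmod(len(items), cores)
    chunks, start = [], 0
    for i in range(cores):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than cores: stop once everything is assigned
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical file names, for illustration only.
photos = [f"DSC{n:05d}.ARW" for n in range(10)]
for core, chunk in enumerate(split_for_cores(photos, cores=4)):
    print(core, chunk)
```

Each chunk would then be written to its own list file and handed to one background GraphicsMagick job.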

The program runs in a Jupyter Python Notebook, either directly on Windows 10 or inside a Windows 10 WSL (Linux) subsystem, and generates
a bash script suitable for execution on a [Windows-10 WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install-win10). 

## Usage:
1. Configure the `User Inputs` variables to set the jpg quality and the mission, image, output, and script directories.
1. Run the code cell.
1. Start a WSL Linux terminal in your selected script directory.
1. Execute the generated bash script.  Example: `bash generate-jpgs.bash`. (_It will generally run for hours._)
1. Once the script completes, check that the output directory has the correct file count.
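
The final file-count check can be done by hand, or with a small sketch like the one below. It assumes one jpg is expected per raw file; `count_matching` and `counts_agree` are hypothetical helpers, not part of mp_genjpg.

```python
import fnmatch
import os


def count_matching(top, mask):
    """Count files under `top` (recursively) whose names match `mask`, ignoring case."""
    total = 0
    for _root, _dirs, files in os.walk(top):
        total += sum(1 for name in files if fnmatch.fnmatch(name.lower(), mask.lower()))
    return total


def counts_agree(raw_dir, jpg_dir, raw_mask="*.arw", jpg_mask="*.jpg"):
    """Return (raw_count, jpg_count, True if the two counts are equal)."""
    raw = count_matching(raw_dir, raw_mask)
    jpg = count_matching(jpg_dir, jpg_mask)
    return raw, jpg, raw == jpg
```

Call `counts_agree(raw_dir, jpg_dir)` with the paths written the way the Python session sees them (Windows style on Windows, `/mnt/...` style inside WSL).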

## User Inputs:
The user changes the following variables in the program as required.  ***Important note: This program is 
unusual because it runs on Windows but generates a script that must run on Linux. It is important
that the script_dir and idir variables are set using the Windows drive:filepath convention, and the mission_dir and odir
variables use the Linux filepath convention.***
```python
    jpg_quality    = 50
    mask           = '*.arw'
    bash_script_fn = 'generate-jpgs.bash'
    exif_fn        = '2020-0208-exif.txt'
    mission_dir    = '/mnt/j/data/2020-0208-NC-outer-banks'
    script_dir     = 'j:/data/2020-0208-NC-outer-banks/etc'
    idir           = 'j:/data/2020-0208-NC-outer-banks/raw/'
    odir           = f'{mission_dir}/field-jpg{jpg_quality}/'
```

* ***jpg_quality***: 0-100 for jpeg quality vs file size
* ***mask***:  The file mask to use when walking through the input directory.  Example: `mask = '*.arw'` for all Sony ARW
files.
* ***bash_script_fn***:  The name of the generated bash script file.
* ***exif_fn***: The name of the EXIF output file.
* ***mission_dir***:  The full path directory of the mission.
* ***script_dir***:   The directory where the bash script will be saved.
* ***idir***:         The input directory where the input photos are stored. The directory can have any number of
subdirectories and the program will walk the directory tree extracting the photo file names found in each subdirectory.
* ***odir***:         The directory to store the resulting converted photo files.
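
Because mixing the two path conventions is an easy mistake, a quick sanity check before running the cell may help. The sketch below is only an illustration: `check_path_conventions` is a hypothetical helper, not part of mp_genjpg, and it only inspects how each path starts.

```python
import re

# A Windows path is assumed to start with a drive letter, a colon, and a slash.
_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def check_path_conventions(windows_paths, linux_paths):
    """Return a list of human-readable problems; an empty list means all paths look right."""
    problems = []
    for name, value in windows_paths.items():
        if not _DRIVE.match(value):
            problems.append(f"{name} should be a Windows drive:path, got {value!r}")
    for name, value in linux_paths.items():
        if not value.startswith("/"):
            problems.append(f"{name} should be a Linux path, got {value!r}")
    return problems


problems = check_path_conventions(
    windows_paths={
        "script_dir": "j:/data/2020-0208-NC-outer-banks/etc",
        "idir": "j:/data/2020-0208-NC-outer-banks/raw/",
    },
    linux_paths={
        "mission_dir": "/mnt/j/data/2020-0208-NC-outer-banks",
        "odir": "/mnt/j/data/2020-0208-NC-outer-banks/field-jpg50/",
    },
)
print(problems or "path conventions look consistent")
```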

## Output:
1. A Linux bash script file that generates the Jpegs and a file of EXIF data extracted from the raw photo files.
1. A 1-deep directory populated with Jpeg photos derived from the input raw photo files.
1. A set of files listing the raw input file names and the resulting Jpeg file names.
1. A single EXIF file containing the necessary EXIF info to sync each photo with PPK GPS.

### Result EXIF File
* The EXIF file contains the following columns and column headers:
    - ***filename*** _(Example: 2020-0209-182755-DSC00793-17103-N7251F.ARW)_
    - ***date_time*** _(Example: 2020:02:09 18:27:46)_  Note: The columns are ',' delimited; within date_time, the date and the time are separated by a space character.
    - ***iso***
    - ***shutterspeed***
    - ***imagecount*** Total images captured by this camera as of this photo.  Indicates total camera shutter actuations.
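
A reader for this file might look like the sketch below. It assumes the columns appear in the order listed above and tolerates a header row if one is present; `read_exif_rows` is a hypothetical helper, and the iso, shutterspeed, and imagecount values in the sample line are made up.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["filename", "date_time", "iso", "shutterspeed", "imagecount"]


def read_exif_rows(lines):
    """Yield one dict per photo, with date_time parsed into a datetime object."""
    for row in csv.DictReader(lines, fieldnames=COLUMNS):
        if row["filename"] == "filename":
            continue  # skip a header row if the file has one
        row["date_time"] = datetime.strptime(row["date_time"], "%Y:%m:%d %H:%M:%S")
        yield row


# Sample line for illustration; only the filename and date_time come from the example above.
sample = io.StringIO(
    "2020-0209-182755-DSC00793-17103-N7251F.ARW,2020:02:09 18:27:46,100,1/2000,17103\n"
)
for photo in read_exif_rows(sample):
    print(photo["filename"], photo["date_time"].isoformat())
```

If the file was written without the imagecount column, `csv.DictReader` fills that key with `None`.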

## Requirements:
* Installed [Windows-10 WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
* [Jupyter Notebooks](https://jupyter.org/) for Python. 

## Other required tools:
* [GraphicsMagick](http://www.graphicsmagick.org/). (Linux version). A Swiss Army knife for image processing. Note that it uses [DCRAW](https://www.dechifro.org/dcraw/) to read 
[raw camera photo files](https://en.wikipedia.org/wiki/Raw_image_format#ARW).
* [xargs](http://man7.org/linux/man-pages/man1/xargs.1.html).  Built in to WSL Linux. Feeds command-line parameters from a file to the GraphicsMagick program.
* [Sed](https://en.wikipedia.org/wiki/Sed).  The stream editor, a Unix utility that parses and transforms text.
* [Tr.](https://en.wikipedia.org/wiki/Tr_(Unix)) Takes two sets of characters (generally of the same length), and replaces occurrences of the characters in the first set with the corresponding elements from the second set.
* [Exiftool for Linux.](https://exiftool.org/)  Used to extract EXIF data from the raw photos.  Exiftool options and examples [can be found here](https://exiftool.org/exiftool_pod.html#RENAMING-EXAMPLES)
* [Adobe Digital Negative Converter](https://helpx.adobe.com/photoshop/using/adobe-dng-converter.html) (*recommended*).  Can be used to convert ARW files to DNG.  It is multithreaded and will do the conversion significantly faster than other methods.
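
To make the role of `xargs` concrete, the sketch below emulates in Python how `xargs -n 2` groups the whitespace-separated tokens of a list file into pairs and appends each pair to a command. It is illustrative only: `pair_commands` is a hypothetical helper, the file names are made up, and real `xargs` also handles quoting, which this ignores.

```python
def pair_commands(list_text, command):
    """Group whitespace-separated tokens in pairs, producing one command line per pair."""
    tokens = list_text.split()
    return [
        f"{command} {' '.join(tokens[i:i + 2])}"
        for i in range(0, len(tokens), 2)
    ]


# Hypothetical per-core list: an input path, then an output path, on each line.
listing = (
    "/mnt/j/raw/DSC00001.ARW /mnt/j/jpg/DSC00001.jpg \n"
    "/mnt/j/raw/DSC00002.ARW /mnt/j/jpg/DSC00002.jpg \n"
)
for line in pair_commands(listing, "gm convert -format jpeg -quality 50"):
    print(line)
```

Because the pairing relies on whitespace, a path containing a space would be split into separate tokens and throw off the input/output pairing.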

## Exiftool

Extract EXIF info and write to stdout:

With the total shutter count (`-imagecount`):
~~~
exiftool   -T  -ext arw -filename -datetimeoriginal  -iso -shutterspeed  -imagecount  -r ../raw/ | tr '\t' ',' | sed 's/\.ARW/.jpg/g'
~~~

Without the shutter count:
~~~
exiftool   -T  -ext arw -filename -datetimeoriginal  -iso -shutterspeed  -r ../raw/ | tr '\t' ',' | sed 's/\.ARW/.jpg/g'
~~~
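
For readers less familiar with `tr` and `sed`, the post-processing those two stages apply to each line of Exiftool's tab-separated output can be sketched in Python. `exif_line_to_csv` is a hypothetical helper, and the iso and shutterspeed values in the sample row are made up.

```python
def exif_line_to_csv(line):
    # Same effect as the tr stage: every tab becomes a comma.
    line = line.replace("\t", ",")
    # Same effect as the sed stage: the .ARW extension becomes .jpg.
    return line.replace(".ARW", ".jpg")


# Sample row for illustration only.
raw = "2020-0209-182755-DSC00793-17103-N7251F.ARW\t2020:02:09 18:27:46\t100\t1/2000"
print(exif_line_to_csv(raw))
```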




In [28]:
# Convert a Windows file name to a WSL Linux file name.
# Changes a drive letter like "c:" to "/mnt/c" and changes all \ to /
# From code-fragments gist
def mkufn(fn):
    if len(fn) > 1 and fn[1] == ':':
        # WSL mounts Windows drives under lower-case letters, e.g. /mnt/c
        ufn = f"/mnt/{fn[0].lower()}{fn[2:]}".replace('\\', '/')
    else:
        ufn = fn.replace('\\', '/')
    return ufn

# mkwfn(fn):  Convert a WSL Linux /mnt/<drive>/path/filename to a Windows drive:/path/filename
# From code-fragments gist
def mkwfn(fn):
    if fn[0:5] == '/mnt/' and len(fn) > 5:
        # Example: '/mnt/i/a/b' becomes 'i:/a/b'
        r = f"{fn[5]}:/{'/'.join(fn.split('/')[3:])}"
    else:
        r = fn
    return r

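# Imports live at module level so the helper functions also work
# when this file is loaded as a library.
import fnmatch
import getpass
import multiprocessing
import os
import platform
from datetime import datetime

import numpy as np


def my_platform():
    # Return the OS name (e.g. 'Windows' or 'Linux') and the CPU core count.
    s = platform.system()
    c = multiprocessing.cpu_count()
    return s, c


if __name__ == '__main__':
    uname = platform.uname()
    today = datetime.utcnow()
    username = getpass.getuser()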


    ####################################################################
    # User set Data directories, mask, settings, etc.
    user_notes  = '2020-Feb Outer Banks flights. N7251F.  '
    jpg_quality = 50
    mask = '*.arw'
    bash_script_fn = 'generate-jpgs.bash'
    exif_fn        = '2020-0208-exif.txt'
    mission_dir    = f'/mnt/i/2020-0208-NC'
    script_dir     = f'{mission_dir}/07_etc/'
    idir           = f'{mission_dir}/03_Photos/raw/'
    odir           = f'{mission_dir}/03_Photos/field-jpg{jpg_quality}/'
    ####  END OF User Settings...
    ####################################################################

    
    # Make sure all the files names conform to the host platform
    if my_platform()[0] == 'Windows':
        bash_script_fn = mkwfn( bash_script_fn )
        exif_fn        = mkwfn( exif_fn )        
        mission_dir    = mkwfn( mission_dir )
        script_dir     = mkwfn( script_dir )
        idir           = mkwfn( idir )        
        odir           = mkwfn( odir )
    else:
        bash_script_fn = mkufn( bash_script_fn )
        exif_fn        = mkufn( exif_fn )        
        mission_dir    = mkufn( mission_dir )
        script_dir     = mkufn( script_dir )
        idir           = mkufn( idir )  
        odir           = mkufn( odir )     
        
    # Fail early if any required directory is missing.
    for d in (idir, odir, mission_dir, script_dir):
        if not os.path.isdir(d):
            raise ValueError(f'Directory does not exist: {d}')
    

    threads = os.cpu_count()				# Determine the number of CPU cores 

    allfiles = []
    tc = 0
    ####################################################################
    # Read into 'allfiles' a list of files in 'idir' that match 'mask'.
    # The matching filenames are converted from windows to linux format
    # as they are stored in 'allfiles'
    ####################################################################
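    for root, _dirs, files in os.walk(idir):
        for name in files:
            # Match case-insensitively, but keep the file name's original case
            # because Linux paths are case-sensitive.
            if not fnmatch.fnmatch(name.lower(), mask.lower()):
                continue
            tc = tc + 1
            # mkufn() handles both Windows (drive:) and Linux style paths,
            # so files are collected whether this runs on Windows or in WSL.
            idfn = mkufn(os.path.join(root, name))
            odfn = f'{mkufn(odir)}{os.path.splitext(name)[0]}.jpg'
            allfiles.append(f'{idfn} {odfn} ')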

    # Save 'allfiles' in 'allfiles.txt' mostly for debugging purposes.
    with open(f'{script_dir}/allfiles.txt', 'w') as f:
        for n in allfiles:
            f.write(f'{n}\n')
    if tc == 0:
        raise ValueError(f'No files matching {mask} found in: {idir}')
    # Files per core; at least 1 so np.arange never gets a zero step.
    step = max(1, tc // threads)
    blocksz = step*threads
    starts = np.arange(0, tc, step)
    stops  = starts[1:]
    blocks = []

    #print(f'# Generated: {today:%Y-%m-%d at %H:%M:%S Zulu}' )
    #print(f'#      User: {os.getlogin()}  OS:{uname[0]}-{uname[2]}  System:{uname[1]} CPU:{uname[4]}')
    hs = f'\
    # Generate jpgs with multiple gm.  W. Wright as of 2020-0225 1517\n\
    #########################################################################\n\
    #          User Notes: {user_notes} \n\
    #           Generated: {today:%Y-%m-%d at %H:%M:%S Zulu}  by: {username} \n\
    #                  OS: {uname[0]}-{uname[2]}  System:{uname[1]} CPU:{uname[4]}  \n\
    #   Mission Directory: {mission_dir}\n\
    #          Script_dir: {script_dir}\n\
    #         bash Script: {bash_script_fn}\n\
    #           Input dir: {idir}\n\
    #          Output dir: {odir}\n\
    # CPU Cores (threads):{threads:2d}  Jpeg Quality:{jpg_quality:2d}   Blocksize:{blocksz:3d}\n\
    #    Total File Count:{tc:4d}        Step: {step:4d} \n\
    #\n\
    # This program can be found at:\n\
    # https://github.com/lidar532/ppkgeotag/blob/2020-0222-dev/mp_genjpg.ipynb\n\
    #########################################################################\n\
    \n\
    #  Next, Login to your Windows WSL (Linux) and:\n\
    #  cd to: {mkufn(script_dir)} and then run bash {bash_script_fn} \n\
    \n'

    print(hs)						# Display settings & stats on the console

    ####################################################################
    # Generate a Linux Bash script file containing the commands
    # to execute graphicsmagick commands to convert an input file list
    # into a new directory of converted images.
    ####################################################################
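    # Force LF line endings so bash accepts the script even when it is generated on Windows.
    bash_script = open( f'{script_dir}/{bash_script_fn}', 'w', newline='\n')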
    bash_script.write(hs)

    for i in range(len(stops)):
        bk = allfiles[starts[i]:stops[i]]    # Extract a slice for each core
        blocks.append(bk)                    # Append bk to blocks.

    # The final block runs from the last start index to the end of the list.
    blocks.append(allfiles[starts[-1]:tc])

    # Each block contains a list of files to pass to one background job for processing.
    # Linux xargs used. See: http://man7.org/linux/man-pages/man1/xargs.1.html
    for core_number, block in enumerate(blocks):
        fn = f'{script_dir}/f{core_number}.txt'
        # xargs -n 2 passes one input/output file-name pair to each gm convert call.
        bash_script.write(f'( cat {mkufn(fn)} | xargs -t -n 2 gm convert -format jpeg -quality {jpg_quality} ) &\n')
        # LF line endings keep stray carriage returns out of the xargs input.
        with open(fn, 'w', newline='\n') as f:
            for n in block:
                f.write(f'{n}\n')    # one "input output" pair per line

    ##  bash_script.write(f'echo \'filename,date_time,iso,imagecount\' > {exif_fn}\n' )

    # The dot in the sed pattern is escaped so that it matches a literal '.'.
    bash_script.write(
        f"exiftool -T -ext arw -filename -datetimeoriginal  -iso -shutterspeed "
        f"-r {mkufn(idir)} | tr '\\t' ',' | sed 's/\\.ARW/.jpg/g' > {exif_fn}  & \n")
    bash_script.write('echo \'Conversion script written\'\n')
    bash_script.close()
else:
    print('mp_genjpg loaded as library.')

    # Generate jpgs with multiple gm.  W. Wright as of 2020-0225 1517
    #########################################################################
    #          User Notes: 2020-Feb Outer Banks flights. N7251F.   
    #           Generated: 2020-02-27 at 19:38:47 Zulu  by: wright 
    #                  OS: Linux-4.4.0-18362-Microsoft  System:LLT-WW CPU:x86_64  
    #   Mission Directory: /mnt/i/2020-0208-NC
    #          Script_dir: /mnt/i/2020-0208-NC/07_etc/
    #         bash Script: generate-jpgs.bash
    #           Input dir: /mnt/i/2020-0208-NC/03_Photos/raw/
    #          Output dir: /mnt/i/2020-0208-NC/03_Photos/field-jpg50/
    # CPU Cores (threads): 4  Jpeg Quality:50   Blocksize:17100
    #    Total File Count:17103        Step: 4275 
    #
    # This program can be found at:
    # https://github.com/lidar532/ppkgeotag/blob/2020-0222-dev/mp_genjpg.ipynb
    #########################################################################
    
    #  Next, Login to your Windows