# Organizing images and annotations
## How not to do it
One of the most time-consuming aspects of machine learning is assembling training data that have a good balance between object categories and as much diversity as possible in image backgrounds.  **This notebook is really a master class in how _not_ to do it, because I failed to insist on a high enough level of organization early in the project. I am including it as a cautionary tale.** 

I started with: 
- An annotation format (Pascal-VOC) that requires an exact 1:1 match between images and annotations; 
- A very complicated folder structure for images, which mixed images that had been analyzed by a human with images that had never been examined;
- A bunch of zip files with annotations whose filepaths didn't exactly match the image filepaths; and 
- Annotations from a program called CVAT that didn't produce output for empty images (therefore, it was challenging to figure out which had been examined).

It was a ghastly nightmare to straighten it all out.

## How to do it better
To make the AI side of the project go smoothly, it will greatly help to insist on the following:
1. **Unique names for images**.  This is absolutely worth doing, even if it takes time to set up.
2. **A simple folder structure**.  Regardless of whatever complexity you may have in the initial stages of data collection, by the time you get to input for an AI model, you want _only two folders_ : 1) images, and 2) matching annotations.  There should be no extraneous files in either folder.
3. **An annotation program whose output makes it possible to tell whether a file has been checked, even if there was nothing in it.** If you are working with Pascal-VOC annotation, then you want it to provide an XML annotation file for every image, _including_ 'empty' images.
4. **Come to agreement soon on object categories**.  You probably won't have enough images to train the model on some categories.  It may make sense to lump or drop those categories, and it saves time to experiment and come to agreement on a final set before getting serious about training.  Adding or subtracting categories requires re-training the model head (at a minimum) so it is expensive to do it too late.  In addition, the raw annotations will undoubtedly include many that don't fall into your permitted categories.  Those can be handled during the dataloading process, but it is a good idea to anticipate and discuss how to handle them in advance.
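
Points 1 and 2 above are easy to check automatically. Here is a minimal sketch, assuming images are `.jpg` and annotations are `.xml`; the function name `check_dataset` and its arguments are hypothetical and not part of this project's code.

```python
from collections import Counter
from pathlib import Path


def check_dataset(image_dir, annotation_dir, image_ext=".jpg", ann_ext=".xml"):
    """Report duplicate image names and unmatched files (searches recursively)."""
    image_stems = [p.stem for p in Path(image_dir).rglob(f"*{image_ext}")]
    ann_stems = [p.stem for p in Path(annotation_dir).rglob(f"*{ann_ext}")]
    # A stem that occurs more than once anywhere under image_dir is not unique
    duplicates = sorted(s for s, n in Counter(image_stems).items() if n > 1)
    images_without_xml = sorted(set(image_stems) - set(ann_stems))
    xml_without_image = sorted(set(ann_stems) - set(image_stems))
    return duplicates, images_without_xml, xml_without_image
```

If all three returned lists are empty, the two folders are a clean 1:1 match with unique names.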

## Goal of this notebook

1. To assemble a set of full-size images with:
    - A good balance between classes (to the extent possible), including 'empty' images
    - A diversity of backgrounds
    - Clean input labels, converted to a consistent set of labels from a restricted set of categories
2. To ensure that all images (including empty ones) have annotation files in PVOC format with the correct categories;
3. To ensure that all filenames are unique;
4. To create an exactly matching (1:1) set of image files and annotations, with the same names but different extensions.
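
As an illustration of the label clean-up goal, here is a minimal sketch of mapping raw labels onto a restricted set of categories. The labels and categories in `LABEL_MAP` are hypothetical placeholders, not the ones used in this project, and `normalize_label` is an assumed helper name.

```python
# Hypothetical raw labels and permitted categories; substitute your own.
LABEL_MAP = {
    "elephant": "elephant",
    "elephants": "elephant",
    "zebra": "zebra",
    "cow": "livestock",
    "goat": "livestock",
}


def normalize_label(raw_label, label_map=LABEL_MAP):
    """Return the permitted category for a raw label, or None if it should be dropped."""
    return label_map.get(raw_label.strip().lower())


raw = ["Elephant", " zebra", "goat", "vehicle"]
# Labels that are not in the map (here 'vehicle') are dropped
kept = [c for c in map(normalize_label, raw) if c is not None]
```

Keeping the mapping in one dictionary makes it easy to lump or drop categories later without touching the raw annotations.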


### Directory structure (because I'll forget)

The general structure is this:
<pre>
Original images (as received) are in:
/cdata/tanzania/annotated_images/TA25 + others (RR17, RR19, SL25, MXJ-2019)
            \              \        \- _data/RKE 
             \              \          
              \              \- AIAIA Original annotation zipfiles (all dirs)
               \                    \- pvoc Pascal-VOC format zipfiles
                \ - /temp               
                      |-/annotations -- Annotations (unzipped)  
                      |-/tiled_annotations -- tiled annotations (flat)  
                      |-/tiled_images -- tiled images (flat)  
                      |-/temp --junk (used for moving unzipped files) 
</pre>
The annotation subdirectory names (`annotations/subname`) match the image subdirectories to which they are related, but the image subdirectories may be further divided into additional subdirectories.

In [None]:
!hostname

In [1]:
import sys
from pathlib import Path
sys.path.insert(1,str(Path.cwd().parent)) #A kludge for finding my own packages

In [3]:
from pathlib import Path
#import xml.etree.ElementTree as ET
from lxml import etree as et
#from pascal_voc_writer import Writer
from PIL import Image
from torchvision.transforms.functional import pad as tvpad
from torch import randperm
import random
import pdb
import re #regex
import subprocess
import pandas as pd
import zipfile
import shutil
import os
import pyvips

In [5]:
from trident_project.dev_packages.image_bbox_tiler import image_bbox_slicer as ibs
from trident_project.dev_packages.pascal_voc_writer.pascal_voc_writer import Writer
from trident_project.dev_packages import jp_utilities

## Copy files from Amazon AWS onto a Microsoft Azure VM

We had a lot of data (full-sized images and annotations) in an Amazon AWS S3 storage 'bucket' that we wanted to transfer to a Microsoft Azure virtual machine (VM). The Azure migration tools assume that you want the data in a 'file store' or 'blob store', but we wanted to put the data on a solid-state drive (SSD) attached to our Azure VM. AWS access control is complicated, but we eventually found that the following method was fast and worked smoothly:

1. Set up a [named profile](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html) on AWS;
2. Install the [aws-cli](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) (the AWS command-line client) on the Azure VM;
```bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
```
3. Configure the AWS profile on the Azure VM (replace `<...>` with your data):
```bash
aws configure --profile <profile_name>
    AWS Access Key ID [None]: <access_key>
    AWS Secret Access Key [None]: <secret_key>
    Default region name [None]: <AWS bucket region> #e.g. us-east-1
    Default output format [None]:
```
4. Copy each of the directories you want, using the profile.  On the Azure VM:
```bash
cd <destination_image_directory>  
aws s3 --profile <profile_name> sync s3://<bucket_name>/<folder>/ . #don't forget the dot
```
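
If there are many folders to copy, the same command can be generated from Python. This is only a sketch: the bucket name, folder names, destination paths, and profile name are hypothetical, and `build_sync_command` is an assumed helper. It builds the argument list but does not run anything.

```python
import shlex


def build_sync_command(bucket, folder, destination, profile):
    """Build the argument list for `aws s3 sync` from one bucket folder to a local directory."""
    source = f"s3://{bucket}/{folder.strip('/')}/"
    return ["aws", "s3", "sync", source, str(destination), "--profile", profile]


# Hypothetical bucket, folders, destination, and profile name
for folder in ["TA25", "RR19"]:
    cmd = build_sync_command("my-bucket", folder, f"/cdata/images/{folder}", "my_profile")
    print(shlex.join(cmd))  # to execute, pass cmd to subprocess.run(cmd, check=True)
```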

## Extract and divide annotations
Now it gets ugly.  The annotations have to be divided up because a single annotation zipfile can include annotations for images from more than one subdirectory.

Steps:
1. Figure out what subdirectories are represented in each annotation zipfile
2. Extract the zipfile and divide the annotation files into subdirectories based on the folder in each filename
3. Write missing annotation files for 'empty' images (see below for details)
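
To make steps 1 and 2 concrete, here is a small self-contained sketch that groups the members of a zipfile by the directory in their filenames. The zipfile is built in memory and its member names are hypothetical; `group_members_by_directory` is an assumed helper, not one of the functions defined in this notebook.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Build a small in-memory zipfile with hypothetical member names
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Annotations/TA25/dirA/img_0001.xml", "<annotation/>")
    zf.writestr("Annotations/TA25/dirB/img_0002.xml", "<annotation/>")
    zf.writestr("ImageSets/Main/default.txt", "TA25/dirA/img_0001\n")


def group_members_by_directory(zfile):
    """Map each directory found in a zipfile to the member files it contains."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zfile) as zf:
        for name in zf.namelist():
            if name.endswith("/"):  # skip explicit directory entries
                continue
            groups[str(PurePosixPath(name).parent)].append(name)
    return dict(groups)


groups = group_members_by_directory(buffer)
```

The keys of `groups` are the subdirectories represented in the zipfile; the values are the files that belong in each one.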

In [None]:
#Set main paths
imagepath = Path('/cdata/tanzania/annotated_images') #Parent folder for fullsize images
annotation_source_path = Path('/cdata/tanzania/annotated_images/AIAIA/pvoc') #Where original zipped annotation files are
annotationpath = Path('/cdata/tanzania/temp/annotations') #Destination parent folder for unzipped annotation files

In [7]:
#Get Pascal-VOC zipped annotation files from annotation_source_path
annotation_files = [str(x) for x in annotation_source_path.iterdir() if str(x).find('pvoc') > 0 and x.suffix == '.zip'] #keep only .zip files with 'pvoc' in the path; convert Paths to strings
annotation_files

['/cdata/tanzania/annotated_images/AIAIA/pvoc/TA25-RKE-20191128A-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019Db-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019b-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/TA25-RKE-20191201-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019Fa-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/SL25-CFA-TNP_2013105-12-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/SL25-ZGF_2013105-12-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019Ea-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019H-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/RR19-5EL-20180930A-1008A-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019Eb-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019Da-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019G-pvoc.zip',
 '/cdata/tanzania/annotated_images/AIAIA/pvoc/MXJ2019a-pvoc.zip',
 '/cdata/t

### Some utilities for finding subdirectories in annotation files

In [6]:
def get_zipfile_directories(zfile):
    """
    Returns a list of subdirectories found in the filenames in a zipfile; uses ZipFile.    
    """
    #Get the filenames inside the zipfile
    zpf = zipfile.ZipFile(zfile)
    fnames = zpf.namelist()
    #Get the directory of every file; and from there a list of unique directories
    fdirs = list(set([str(Path(f).parent) for f in fnames]))
    return (fdirs)

In [7]:
def get_zipfile_annotation_dirs(zfile,remove_list):
    """Returns the cleaned subdirectories under 'Annotations' found in a zipfile (empty list if none)."""
    fdirs = get_zipfile_directories(zfile)
    #filter out only those subdirectories that start with 'Annotations'
    anndirs = [f for f in fdirs if f.split('/',1)[0] =='Annotations']
    #Clean the directory names by removing some roots
    subdirs = [clean_filepath(d,remove_list) for d in anndirs]
    return subdirs

In [8]:
#Create a flat list of all subdirectories found in all annotation files
def get_subdirs(annotation_files,remove_list):
    subdirs = []
    if not isinstance(annotation_files,list):
        return get_zipfile_annotation_dirs(annotation_files,remove_list)
    else:        
        for f in annotation_files:
            d = get_zipfile_annotation_dirs(f,remove_list)
            subdirs = subdirs + d
        return subdirs

In [9]:
#Removes elements from a filepath iteratively, only if they are in the first position.
#So to remove /Annotations or /Annotations/CVAT, remove_list should be ["Annotations","CVAT"] in that order.
def clean_filepath(fpath,remove_list):
    if isinstance(remove_list,str):
        remove_list = [remove_list] #list('CVAT') would split the string into characters
    p = Path(fpath)
    for item in remove_list:
        if len(p.parts) > 0:
            if (p.parts[0]==item):
                p = Path(*p.parts[1:])
    return str(p)

In [None]:
#Check out the subdirectories in this batch of annotation files
rlist = ['Annotations','CVAT']
#rlist = []
get_subdirs(annotation_files,rlist)

### Clear the temporary directory and unzip annotations to it

In [11]:
def clear_directory(dir):
    """
    Clear a directory (erase both files and folders but leave the directory itself)
    """
    for filename in os.listdir(dir):
        file_path = os.path.join(dir, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
        except Exception as e:
            print('Failed to delete %s. Reason: %s' % (file_path, e))



In [12]:
#Unzips a single annotation zipfile into a temporary directory, complete with relative filepaths
#First we clear the temporary directory (delete files and folders)
def unzip_ann_file(ann_file,tempdir):
    """Unzip an annotation file into a temporary directory, complete with relative filepaths"""
    print("Clearing directory ",tempdir)
    clear_directory(tempdir)
        
    #Unzip the archive files into the temporary directory
    #Warning: the -o flag can’t have spaces between it and the filename, and can only have ONE space before it
    outdir = '-o' + str(tempdir) 
    #Extract the files complete with relative filepaths (more stable than other methods tried)
    subprocess.call(["7z","x",ann_file,outdir]) #ann_file,outdir
    print("7z x ",ann_file,outdir)
    

### Copy existing annotation files to the annotationpath directory
Slightly complicated because of paths relative to different root directories.
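
The path juggling can be illustrated in isolation. Below is a minimal sketch that maps an extracted file under the temporary directory to its destination under the annotation directory, stripping unwanted leading folders along the way. The helper name `rebase_annotation_path` and the example file path are hypothetical.

```python
from pathlib import PurePosixPath


def rebase_annotation_path(extracted_file, tempdir, annotationpath, remove_list):
    """Map a file under tempdir to its destination under annotationpath."""
    parts = list(PurePosixPath(extracted_file).relative_to(tempdir).parts)
    # Strip unwanted leading folders, in order, only if they are in first position
    for item in remove_list:
        if parts and parts[0] == item:
            parts = parts[1:]
    return PurePosixPath(annotationpath).joinpath(*parts)


dest = rebase_annotation_path(
    "/cdata/tanzania/temp/temp/Annotations/TA25/_data/TWR/somefile.xml",
    "/cdata/tanzania/temp/temp",
    "/cdata/tanzania/temp/annotations",
    ["Annotations", "CVAT"],
)
print(dest)  # /cdata/tanzania/temp/annotations/TA25/_data/TWR/somefile.xml
```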

In [13]:
#Copy existing annotations from the temp directory (tempdir = Path('/cdata/tanzania/temp/temp'))
#to the annotations directory (annotationpath).  Annotations are unpacked from zipfiles with file structure like this:
#tempdir/Annotations/TA25/_data/TWR/etc../somefile.xml.
def copy_existing_annotations(tempdir,annotationpath,remove_list):
    fullpaths = list(tempdir.rglob("*.xml"))
    written = 0
    for fp in fullpaths:
        fullpath = str(fp)
        relpath = Path(fp).relative_to(tempdir) #output is full path but maybe with undesirable root
        cleanpath = clean_filepath(relpath,remove_list) #remove unwanted root
        to_path = str(Path(annotationpath)/cleanpath)
        #Fix xml elements <filename>, <path>, and <folder> (get correct values from cleanpath).  Overwrites fp in place.
        fix_annotation_paths(fullpath,cleanpath) 
        shutil.copy(fullpath, to_path)
        written += 1
    print('Copied ',written,' files from temp to annotationpath')


In [14]:
def fix_annotation_paths(file_to_fix,desired_path):
    """Fixes filename, path, folder in the 'file_to_fix' XML file and overwrites it. Adds <path> element if missing.
    From 'desired_path', extracts the filename (including extension) and the terminal folder.    
    """    
    #breakpoint()
    filepath = str(Path(desired_path).with_suffix('.jpg'))
    filename = str(Path(filepath).name)
    folder = str(Path(filepath).parent)
    parser = et.XMLParser(remove_blank_text=True) #required to get nice output with prettyprint
    tree = et.parse(file_to_fix,parser)
    root = tree.getroot()
    root.find('filename').text = filename
    root.find('folder').text = folder
    if(root.find('path') is not None):
        root.find('path').text = filepath
    else:
        path = et.Element('path') #create a new path element
        path.text = filepath 
        root.insert(1,path)
    tree.write(file_to_fix, pretty_print=True) 

In [77]:
#test
fix_annotation_paths('/cdata/tanzania/test2/junk/RR17-TWR-R-2014-11-13_AM_0005.xml','RR17/RR17-SGR-TWR_201/RR17-TWR-R-2014-11-13_AM_0005.xml')


## Fill in missing annotation files for 'empty' images
For each subdirectory represented in an annotation zipfile:
1. Get list of _all_ image filepaths in the associated subdirectory (recursive)
2. Get list of filepaths for _annotated_ images (ones that have associated annotation files)
3. Compare the lists to identify which images are missing annotation files
4. Use pascal_voc_writer to create an xml annotation file with no bounding box for each missing file
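
For reference, an 'empty' annotation is simply a Pascal-VOC file with no `<object>` elements. The sketch below builds one with the standard library rather than pascal_voc_writer, so the exact set of elements may differ from what that package writes. The function name, image name, and dimensions are hypothetical.

```python
import xml.etree.ElementTree as ET


def empty_voc_annotation(folder, filename, path, width, height, depth=3):
    """Build a Pascal-VOC style annotation tree with no <object> elements."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    ET.SubElement(root, "path").text = path
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)
    return ET.ElementTree(root)


# Hypothetical image name and dimensions
tree = empty_voc_annotation("TA25/dirA", "img_0003.jpg", "TA25/dirA/img_0003.jpg", 6000, 4000)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
```

Calling `tree.write(some_path)` would save the file next to the other annotations.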

In [15]:
#1. Get list of all files processed from /ImageSets/Main/default.txt
#These are full filepaths with unwanted root parts removed (as defined by remove_list)
def get_image_filepaths(tempdir,remove_list):
    with open(Path(tempdir)/"ImageSets/Main/default.txt") as file:
        lines = [line.strip() for line in file]
    flist = [clean_filepath(l,remove_list) for l in lines]
    return flist

In [16]:
#2. Get list of filenames for images that have associated annotation files
#We read this by searching the tempdir structure; then clean up a bit to match image names
def get_annotated_images(tempdir,remove_list):
    fullpaths = list(tempdir.rglob("*.xml"))
    relpaths = [str(Path(fp).relative_to(tempdir)) for fp in fullpaths]
    nosuffix = [str(Path(f).with_suffix('')) for f in relpaths] #Get rid of suffix
    ann_imgs = [clean_filepath(fp,remove_list) for fp in nosuffix] #Drop root directories (Annotations/CVAT)
    return ann_imgs

In [17]:
#3. Calculate the missing annotations by subtracting annotations from total files
#We're using full filepaths but dropping the file extensions
def calculate_missing_annotations(images,annotated_images):
    missing_ann = list(set(images) - set(annotated_images))
    assert (len(missing_ann)==(len(images) - len(annotated_images))), 'images - annotated_images != missing_ann.  Do filepaths match?'
    return(missing_ann)

In [18]:
#4. Use pascal_voc_writer to create a new empty annotation file for each image in a list.
# missing_ann is expected to be a list of full filepaths without extensions.  We also assume that
# unwanted root directories have been removed.
def write_empty_annotations(imagepath,missing_ann,an_dest):  
    nc = 0
    badlist = []
    if(len(missing_ann) > 0):
        for filestem in missing_ann:
            image = (Path(imagepath)/filestem).with_suffix('.jpg')
            relpath = Path(filestem).with_suffix('.jpg')
            filename = str(image.name)
            folder = str(image.parent)            
            try:
                im = Image.open(image)
            except FileNotFoundError:
                #print('Could not create xml file: image not found: ',image)
                bad_xml_file = (Path(an_dest)/filestem).with_suffix('.xml')
                badlist.append(str(bad_xml_file))
                continue
            else:
                width, height = im.size
                writer = Writer(filename, width, height) # Writer(path, width, height)
                writer.changePath(relpath)
                writer.changeFolder(folder)
                outfile = str((Path(an_dest)/filestem).with_suffix('.xml'))
                writer.save(outfile)
                nc +=1
    print("   Created " + str(nc) + " new annotation files; ")
    if (len(badlist) > 0):
        print("Warning: No annotation file written for ",len(badlist)," xml files with no match to images.")
    return(badlist)

In [19]:
#Calculate which images are missing annotations in a directory, and write 'empty' annotations for them.
def add_missing_annotations(tempdir,imagedir,ann_dest,remove_list):
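    """
    Calculates which images in a directory are missing annotations and writes 'empty' annotation files for them.
    Parameters:
        tempdir: temporary directory into which the annotation zipfile was extracted
        imagedir: parent folder of the full-size images
        ann_dest: destination parent folder for annotation files
        remove_list: leading path parts to strip (e.g. ['Annotations','CVAT'])
    Returns a list of annotation filepaths for which no image was found.
    """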
    print('Annotation directory: ',ann_dest)
    imgs = get_image_filepaths(tempdir,remove_list)
    print('   Total images: ', len(imgs))
    annotated_images = get_annotated_images(tempdir,remove_list)
    print('   Annotated images: ',len(annotated_images))
    missing_ann = calculate_missing_annotations(imgs,annotated_images)
    print('   Missing annotations: ',len(missing_ann))
    badlist = write_empty_annotations(imagedir,missing_ann,ann_dest)
    return(badlist)

## Process annotations

In [20]:
#Wrap it all into one function
def process_annotations(annotation_files,tempdir,imagedir,ann_dest,remove_list):
    """Unzip each annotation zipfile, copy its annotations to ann_dest, and write 'empty' annotations.
    Returns a list of annotation filepaths for which no associated image could be found."""
    #Create subfolders of ann_dest (e.g. /cdata/tanzania/temp/annotations) to put annotations in.
    #The list is created directly from the annotation files.  Will not overwrite existing directories/files.
    subdirs = get_subdirs(annotation_files,remove_list)
    for sub in subdirs:
        p = Path(ann_dest)/sub #use the ann_dest argument rather than the global annotationpath
        p.mkdir(parents=True, exist_ok=True)

    #Add missing annotations.  No annotation will be created if the associated image file is missing
    badlist = [] #List of filenames for annotation files where no associated image can be found
    for ann_file in annotation_files:
        unzip_ann_file(ann_file,tempdir)
        copy_existing_annotations(tempdir,ann_dest,remove_list)
        blist = add_missing_annotations(tempdir,imagedir,ann_dest,remove_list)
        badlist = badlist + blist
    return badlist

In [27]:
#CALL: process_annotations()
tempdir = Path('/cdata/tanzania/temp/temp')
rlist = ['Annotations','CVAT']
badlist = process_annotations(annotation_files,tempdir,imagepath,annotationpath,rlist)

Clearing directory  /cdata/tanzania/temp/temp
7z x  /cdata/tanzania/annotated_images/AIAIA/pvoc/TA25-RKE-20191128A-pvoc.zip -o/cdata/tanzania/temp/temp
Copied  93  files from temp to annotationpath
Annotation directory:  /cdata/tanzania/temp/annotations
   Total images:  141
   Annotated images:  93
   Missing annotations:  48
   Created 48 new annotation files; 
Clearing directory  /cdata/tanzania/temp/temp
7z x  /cdata/tanzania/annotated_images/AIAIA/pvoc/TA25-RKE-20191201-pvoc.zip -o/cdata/tanzania/temp/temp
Copied  85  files from temp to annotationpath
Annotation directory:  /cdata/tanzania/temp/annotations
   Total images:  132
   Annotated images:  85
   Missing annotations:  47
   Created 47 new annotation files; 
Clearing directory  /cdata/tanzania/temp/temp
7z x  /cdata/tanzania/annotated_images/AIAIA/pvoc/SL25-CFA-TNP_2013105-12-pvoc.zip -o/cdata/tanzania/temp/temp
Copied  384  files from temp to annotationpath
Annotation directory:  /cdata/tanzania/temp/annotations
   Total 

In [None]:
# Exception: Each image in `/cdata/tanzania/annotated_images/RR19/RR19-5EL-20180920A-28A` must have its 
# corresponding XML file in `/cdata/tanzania/temp/annotations/RR19/RR19-5EL-20180920A-28A` with the same file name.

## Manually delete some problematic nonmatching files

In [22]:
def find_nonmatching_files(imagepath,annotationpath,folder):
    #Get image files
    impth = imagepath/folder
    imlist = list(impth.glob('*.jpg'))
    imfiles = [str(Path(im).relative_to(imagepath).with_suffix('')) for im in imlist]
    #Get annotation files that should match
    anpth = Path(annotationpath)/folder
    anlist = list(anpth.glob('*.xml'))
    anfiles = [str(Path(af).relative_to(annotationpath).with_suffix('')) for af in anlist]
    imset, anset = set(imfiles), set(anfiles) #sets make the membership tests fast
    image_hasno_xml = [f for f in imfiles if f not in anset]
    xml_hasno_image = [f for f in anfiles if f not in imset]
    return (image_hasno_xml,xml_hasno_image)
    

In [28]:
len(badlist)

140

In [44]:
#Call it

bad_folders =   ['RR19/RR19-CFA-20180921A-1009A',
                 'RR19/RR19-5EL-20180920A-28A', 
                 'RR19/RR19-5EL-20180930A-1008A']

for folder in bad_folders:
    (image_hasno_xml,xml_hasno_image) = find_nonmatching_files(imagepath,annotationpath,folder)
    print(len(image_hasno_xml),image_hasno_xml,len(xml_hasno_image),xml_hasno_image)

0 [] 27 ['RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180927A_RSOL_2688', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181008A_RSOR_0014', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180924A_RSOL_2540', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOL_2442', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181007A_RSOL_2919', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180925A_RSOR_0047', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180925A_RSOR_0037', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOR_0020', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180927A_RSOL_2684', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180921A_RSO-R_0008', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180928A_RSOR_0014', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181005A_RSOR_0004', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181005A_RSOL_2879', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181009A_RSOR_0007', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOL_2848', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOL_2422'

In [45]:
xml_hasno_image = ['RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180927A_RSOL_2688', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181008A_RSOR_0014', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180924A_RSOL_2540', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOL_2442', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181007A_RSOL_2919', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180925A_RSOR_0047', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180925A_RSOR_0037', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOR_0020', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180927A_RSOL_2684', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180921A_RSO-R_0008', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180928A_RSOR_0014', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181005A_RSOR_0004', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181005A_RSOL_2879', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181009A_RSOR_0007', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOL_2848', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOL_2422', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOL_2815', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181008A_RSOL_2970', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181008A_RSOL_2926', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOL_2808', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOL_2428', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOR_0009', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180924A_RSOL_2548', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181008A_RSOL_2939', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180929A_RSOR_0004', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180921A_RSO-R_0003', 'RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180928A_RSOL_2717']
for f in xml_hasno_image:
    xmlpath = annotationpath/Path(f).with_suffix('.xml')
    xmlpath.unlink()
    print('Deleted ',xmlpath)

Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180927A_RSOL_2688.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181008A_RSOR_0014.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180924A_RSOL_2540.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180922A_RSOL_2442.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181007A_RSOL_2919.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180925A_RSOR_0047.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180925A_RSOR_0037.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20181003A_RSOR_0020.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA-20180921A-1009A/RR19_CFA_20180927A_RSOL_2684.xml
Deleted  /cdata/tanzania/temp/annotations/RR19/RR19-CFA