# Making a script to organize high-throughput imaging data

- Generates folders for each drug and dose
- Renames .nd2 files to remove spaces and commas
- Moves the .nd2 files into the organized drug/dose folders

By Prech Uapinyoying

12/20/2018

In [1]:
import shutil
import os # for dealing with operating system commands like paths to files
#import sys
import re
import itertools

# Step 1. Generate test files

Before you even start, you need to create some test files to play with. Use the included `create_orig_files.sh` script.

Run it in the directory you want with `bash create_orig_files.sh`.

---

# Step 2. Get the current directory path and a list of all the files in it

In [2]:
# Get the current working directory
cwd = os.getcwd()
cwd

In [3]:
# I generated the test files into a folder called 'test_files', so let's create that path
# os.path.join will intelligently combine path names with the correct slash for any operating system
filesDir = os.path.join(cwd, 'test_files') 
filesDir

In [4]:
# Get a list of files in that test_files directory
fileList = os.listdir(filesDir)
fileList[0:5] # take a look at first 5

['grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2',
 'grn_d6_24hr_posttx_WellF03_Channel405 nm,488 nm,561 nm,640 nm_Seq0141.nd2',
 'grn_d6_24hr_posttx_WellA22_Channel405 nm,488 nm,561 nm,640 nm_Seq0021.nd2',
 'grn_d6_24hr_posttx_WellJ20_Channel405 nm,488 nm,561 nm,640 nm_Seq0220.nd2',
 'grn_d6_24hr_posttx_WellI01_Channel405 nm,488 nm,561 nm,640 nm_Seq0192.nd2']

In [5]:
# We need to check that the filename ends with '.nd2'; we can do this with os.path.splitext
filename, file_extension = os.path.splitext(
    'grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2')

print(f'The extension is: "{file_extension}"')

The extension is: ".nd2"


In [6]:
# Make it into a function so it can be reused on any folder

def get_nd2_fileList(filesDir):
    
    fileList = []
    
    # Keep only the files in filesDir whose extension is '.nd2'
    allFiles = os.listdir(filesDir)
    for file in allFiles:
        filename, file_extension = os.path.splitext(file)
        if file_extension == '.nd2':
            fileList.append(file)
    
    return fileList
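
The exact comparison against `'.nd2'` would skip a file saved with an upper-case extension such as `.ND2`. If that could happen with your acquisition software (an assumption, the test files here are all lower case), a case-insensitive variant might look like the sketch below. The name `get_nd2_fileList_anycase` is hypothetical, and a temporary directory stands in for `test_files`.

```python
import os
import tempfile

def get_nd2_fileList_anycase(filesDir):
    """Return names in filesDir whose extension is .nd2, ignoring case."""
    return sorted(
        name for name in os.listdir(filesDir)
        if os.path.splitext(name)[1].lower() == '.nd2'
    )

# Try it on a throwaway directory so no real data is touched
with tempfile.TemporaryDirectory() as tmpDir:
    for name in ['a_WellA01.nd2', 'b_WellA02.ND2', 'notes.txt']:
        open(os.path.join(tmpDir, name), 'w').close()
    found = get_nd2_fileList_anycase(tmpDir)

print(found)
```

Sorting the result is optional; it just makes the order reproducible, since `os.listdir` returns names in arbitrary order.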

---


# Step 3. Automating the generation of new drug/dose directories

Start by making lists of drugs and doses for generating drug directories and dose subdirectories

In [7]:
drugDirNames = ['drug1','drug2','drug3','drug4','drug5','drug6',
                'drug7','drug8','drug9','drug10','drug11', 'drug12']

doseDirNames = ['dose1','dose2','dose3','dose4','dose5','dose6','dose7','dose8']

print(drugDirNames)
print(doseDirNames)

['drug1', 'drug2', 'drug3', 'drug4', 'drug5', 'drug6', 'drug7', 'drug8', 'drug9', 'drug10', 'drug11', 'drug12']
['dose1', 'dose2', 'dose3', 'dose4', 'dose5', 'dose6', 'dose7', 'dose8']
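
Since the names follow a simple numbering scheme, the same two lists can also be generated rather than typed out. A small sketch using list comprehensions:

```python
# Build 'drug1'..'drug12' and 'dose1'..'dose8' without typing each name
drugDirNames = [f'drug{i}' for i in range(1, 13)]
doseDirNames = [f'dose{i}' for i in range(1, 9)]

print(drugDirNames)
print(doseDirNames)
```

This only helps while the folders keep generic names; with real drug names or concentrations you would go back to writing the lists by hand.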


Let's add the full path to the first drug folder and dose folder using `os.path.join()`. This function is particularly nice because it will intelligently format the path (e.g. slashes) for the operating system you are on.

In [8]:
os.path.join(filesDir, drugDirNames[0], doseDirNames[0])

'/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose1'

Now try it with a for loop that will automate the path generation for all drug/dose paths and put it into a list

In [9]:
fullDrugDirPaths = [] # make an empty list to put the generated directory paths into

# For each drug directory in the list, join the paths and add it to the empty list
for drugDir in drugDirNames:
    for doseDir in doseDirNames:
        currDir = os.path.join(filesDir, drugDir, doseDir)
        fullDrugDirPaths.append(currDir)
    
fullDrugDirPaths[0:16] # peek at first two drugs and their 8 doses

['/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose1',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose2',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose3',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose4',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose5',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose6',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose7',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose8',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug2/dose1',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug2/dose2',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug2/dose3',
 '/Users/uapinyoyingp

Turn it into a function

In [10]:
def gen_dir_paths(drugDirNames, doseDirNames, filesDir):
    fullDrugDirPaths = [] # make an empty list to put the generated directory paths into

    # For each drug directory in the list, join the paths and add it to the empty list
    for drugDir in drugDirNames:
        for doseDir in doseDirNames:
            currDir = os.path.join(filesDir, drugDir, doseDir)
            fullDrugDirPaths.append(currDir)
            
    return fullDrugDirPaths
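
The two nested loops pair every drug with every dose, which is what `itertools.product` does. As an alternative sketch (the name `gen_dir_paths_product` is hypothetical), the same list can be built in one expression:

```python
import itertools
import os

def gen_dir_paths_product(drugDirNames, doseDirNames, filesDir):
    """Same output order as the nested loops: doses vary fastest."""
    return [os.path.join(filesDir, drug, dose)
            for drug, dose in itertools.product(drugDirNames, doseDirNames)]

paths = gen_dir_paths_product(['drug1', 'drug2'], ['dose1', 'dose2'], 'test_files')
print(paths)
```

Whether this reads better than the explicit loops is a matter of taste; the result is the same.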

Let's work out the logic for creating a single new directory. We can specify that we only want to make the new directory if it doesn't exist yet, but it's not important. If you run this, you may need to rerun the `create_orig_files.sh` script to restore the original state of `test_files`.

In [11]:
# if not os.path.exists(fullDrugDirPaths[1]):
#     os.makedirs(fullDrugDirPaths[1])
#     print('I just created this folder: {}'.format(fullDrugDirPaths[1]))
# else:
#     print('Folder already exists, man!')

Here I turn the example into a function that will create all folders in the `fullDrugDirPaths` list. With `exist_ok=True`, `os.makedirs` silently skips folders that already exist, so rerunning is harmless. I also wrap the call in a `try` statement to catch `FileExistsError`, which is still raised if something that is not a directory already occupies the path (or for any existing folder if you switch to `exist_ok=False`). This way the user gets readable feedback instead of an ugly traceback.

In [12]:
def make_dirs(fullDrugDirPaths):
    for path in fullDrugDirPaths:
        try:
            # change to 'exist_ok=False' to report folders that already exist (nothing is overwritten either way)
            os.makedirs(path, exist_ok=True) 
            #print(f'Created directory: {path}')
        except FileExistsError:
            print(f'Error: {path} exists dude. Have you already run this script before?')
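
To see when `FileExistsError` actually shows up, here is a small experiment in a temporary directory (so it does not touch `test_files`). The variable names are only for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpDir:
    target = os.path.join(tmpDir, 'drug1', 'dose1')

    os.makedirs(target, exist_ok=True)   # creates drug1/ and drug1/dose1/
    os.makedirs(target, exist_ok=True)   # second call is a silent no-op

    try:
        os.makedirs(target, exist_ok=False)  # now the existing folder is an error
        raisedOnExisting = False
    except FileExistsError:
        raisedOnExisting = True

    # A regular file sitting where a folder should go fails even with exist_ok=True
    blocker = os.path.join(tmpDir, 'drug2')
    open(blocker, 'w').close()
    try:
        os.makedirs(blocker, exist_ok=True)
        raisedOnFile = False
    except FileExistsError:
        raisedOnFile = True

print(raisedOnExisting, raisedOnFile)
```

Both flags come out `True`, so the `except` branch in `make_dirs` is mostly a safety net unless `exist_ok` is switched off.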

Now to actually make all the directories using our new function

In [13]:
make_dirs(fullDrugDirPaths)

# Check if the folders are there by walking through the directory and selecting only subfolders
makeDirsOutput = next(os.walk(filesDir))[1]
print(makeDirsOutput)

['drug12', 'drug8', 'drug6', 'drug1', 'drug7', 'drug9', 'drug11', 'drug10', 'drug2', 'drug5', 'drug4', 'drug3']


---

# Step 4. Figure out how to rename files to remove spaces and commas 

First, let's try to use regular expressions to determine which well each file is from based on its name

In [14]:
# Example of a full file name with spaces and commas
test = 'grn_d6_24hr_posttx_WellA01_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2'

# Create a regular expression that captures all the relevant groups using parentheses ()
# The dot before the extension is escaped (\.) so it only matches a literal '.'
REGEXP = r'(\w+)_Well(\w+)_Channel(\d+) nm,(\d+) nm,(\d+) nm,(\d+) nm_(\w+)\.(nd2)'

# Use the re.search() function to look for a match and capture all the groups specified in REGEXP
r = re.search(REGEXP,test)

# Here I use the captured groups and rebuild the name in the format I want, without the spaces and commas
# This (f'{}') is called an 'f-string'; it's a concise and very useful way to build strings
replacement = f'{r.group(1)}_Well{r.group(2)}_Channel_{r.group(3)}nm_{r.group(4)}nm_{r.group(5)}nm_{r.group(6)}nm_{r.group(7)}.{r.group(8)}'

print(f'Original(test):\t{test}') # The original file name in our test case (spaces, commas and all)
print(f'Replacement:\t{replacement}') # Check the replacement string if it removed all our spaces and commas

Original(test):	grn_d6_24hr_posttx_WellA01_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2
Replacement:	grn_d6_24hr_posttx_WellA01_Channel_405nm_488nm_561nm_640nm_Seq0024.nd2
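
One thing to watch for: `re.search` returns `None` when a name does not fit the pattern, and calling `.group()` on `None` raises an `AttributeError`. The sketch below guards against that and uses named groups so the replacement is easier to read. `NAME_PATTERN` and `clean_nd2_name` are hypothetical names, and the pattern assumes exactly four channels as in the test files.

```python
import re

NAME_PATTERN = re.compile(
    r'(?P<prefix>\w+)_Well(?P<well>\w+)_Channel'
    r'(?P<c1>\d+) nm,(?P<c2>\d+) nm,(?P<c3>\d+) nm,(?P<c4>\d+) nm'
    r'_(?P<seq>\w+)\.(?P<ext>nd2)'
)

def clean_nd2_name(filename):
    """Return the cleaned name, or None if filename does not fit the pattern."""
    m = NAME_PATTERN.fullmatch(filename)
    if m is None:
        return None
    channels = '_'.join(f'{m[c]}nm' for c in ('c1', 'c2', 'c3', 'c4'))
    return f"{m['prefix']}_Well{m['well']}_Channel_{channels}_{m['seq']}.{m['ext']}"

test = 'grn_d6_24hr_posttx_WellA01_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2'
print(clean_nd2_name(test))
print(clean_nd2_name('notes.txt'))
```

The first call prints the same cleaned name as above; the second prints `None` instead of crashing, which lets a loop skip stray files.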


Quick test if renaming works using `test` and `replacement`. If you do this, you will need to delete the `test_files` folder and rerun the `create_orig_files.sh` script.

In [15]:
#shutil.move(os.path.join(filesDir,test),os.path.join(filesDir,replacement))

Larger test to rename all the files using a for loop. Again, if you do this you will need to recreate the files using 'create_orig_files.sh'

In [16]:
# for file in fileList:
#     r = re.search(REGEXP,file)
#     replacement = f'{r.group(1)}_Well{r.group(2)}_Channel_{r.group(3)}nm_\
# {r.group(4)}nm_{r.group(5)}nm_{r.group(6)}nm_{r.group(7)}.{r.group(8)}'
    
#     sourcePath = os.path.join(filesDir,file)
#     replacementPath = os.path.join(filesDir, replacement)
    
#     shutil.move(sourcePath,replacementPath)

Let's reserve the actual renaming for later down the line. I will try to combine renaming and moving into a single step, but first I'll turn the name changing portion into a function that we can reuse later.

In [17]:
def get_new_filenames(fileList):
    oldNewFilenameList = [] # initiate an empty list for housing old and new file names
    
    for oldFilename in fileList:
        r = re.search(REGEXP,oldFilename)
        newFilename = f'{r.group(1)}_Well{r.group(2)}_Channel_{r.group(3)}nm_{r.group(4)}nm_{r.group(5)}nm_{r.group(6)}nm_{r.group(7)}.{r.group(8)}'
        
        # file names in a list of tuples (old name, new name)
        oldNewFilenameList.append((oldFilename, newFilename))

    return oldNewFilenameList

Now use the new function to generate the old and new file name list

In [18]:
oldNewFilenameList = get_new_filenames(fileList)

# Print out the first 5 entries
for (oldName, newName) in oldNewFilenameList[0:5]:
    print(f'old name: {oldName}')
    print(f'new name: {newName}\n')

old name: grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2
new name: grn_d6_24hr_posttx_WellB24_Channel_405nm_488nm_561nm_640nm_Seq0024.nd2

old name: grn_d6_24hr_posttx_WellF03_Channel405 nm,488 nm,561 nm,640 nm_Seq0141.nd2
new name: grn_d6_24hr_posttx_WellF03_Channel_405nm_488nm_561nm_640nm_Seq0141.nd2

old name: grn_d6_24hr_posttx_WellA22_Channel405 nm,488 nm,561 nm,640 nm_Seq0021.nd2
new name: grn_d6_24hr_posttx_WellA22_Channel_405nm_488nm_561nm_640nm_Seq0021.nd2

old name: grn_d6_24hr_posttx_WellJ20_Channel405 nm,488 nm,561 nm,640 nm_Seq0220.nd2
new name: grn_d6_24hr_posttx_WellJ20_Channel_405nm_488nm_561nm_640nm_Seq0220.nd2

old name: grn_d6_24hr_posttx_WellI01_Channel405 nm,488 nm,561 nm,640 nm_Seq0192.nd2
new name: grn_d6_24hr_posttx_WellI01_Channel_405nm_488nm_561nm_640nm_Seq0192.nd2



---


# Step 5. Matching specific files to specific directories.

Now we need to figure out how to match up the file with its destination if we want to move the correct files to their new directories. We can do this by using the captured well name within the file.

To arrange the wells for each drug, we will make a dictionary for each drug with the dose as the key and list of wells as the values.

In [19]:
drug1Dict = {'dose1': ['A01','A03','A05', 'A07'],
             'dose2': ['C01','C03','C05', 'C07'],
             'dose3': ['E01','E03','E05', 'E07'],
             'dose4': ['G01','G03','G05', 'G07'],
             'dose5': ['I01','I03','I05', 'I07'],
             'dose6': ['K01','K03','K05', 'K07'],
             'dose7': ['M01','M03','M05', 'M07'],
             'dose8': ['O01','O03','O05', 'O07']}

We can use the dictionary's built-in `.items()` method to loop through key (dose), value (wells) pairs

In [20]:
for dose, wells in drug1Dict.items():
    print(f'{dose} --> {wells}')

dose1 --> ['A01', 'A03', 'A05', 'A07']
dose2 --> ['C01', 'C03', 'C05', 'C07']
dose3 --> ['E01', 'E03', 'E05', 'E07']
dose4 --> ['G01', 'G03', 'G05', 'G07']
dose5 --> ['I01', 'I03', 'I05', 'I07']
dose6 --> ['K01', 'K03', 'K05', 'K07']
dose7 --> ['M01', 'M03', 'M05', 'M07']
dose8 --> ['O01', 'O03', 'O05', 'O07']


Let's try to figure out the logic for determining what file goes to what folder

In [21]:
for file in fileList:
    r = re.search(REGEXP,file)
    for dose, wells in drug1Dict.items():
        if r.group(2) in wells:
            print(f'{file} - belongs in --> {dose}')

grn_d6_24hr_posttx_WellI01_Channel405 nm,488 nm,561 nm,640 nm_Seq0192.nd2 - belongs in --> dose5
grn_d6_24hr_posttx_WellA01_Channel405 nm,488 nm,561 nm,640 nm_Seq0000.nd2 - belongs in --> dose1
grn_d6_24hr_posttx_WellK03_Channel405 nm,488 nm,561 nm,640 nm_Seq0242.nd2 - belongs in --> dose6
grn_d6_24hr_posttx_WellM03_Channel405 nm,488 nm,561 nm,640 nm_Seq0290.nd2 - belongs in --> dose7
grn_d6_24hr_posttx_WellE03_Channel405 nm,488 nm,561 nm,640 nm_Seq0098.nd2 - belongs in --> dose3
grn_d6_24hr_posttx_WellC07_Channel405 nm,488 nm,561 nm,640 nm_Seq0054.nd2 - belongs in --> dose2
grn_d6_24hr_posttx_WellI03_Channel405 nm,488 nm,561 nm,640 nm_Seq0194.nd2 - belongs in --> dose5
grn_d6_24hr_posttx_WellC01_Channel405 nm,488 nm,561 nm,640 nm_Seq0048.nd2 - belongs in --> dose2
grn_d6_24hr_posttx_WellO01_Channel405 nm,488 nm,561 nm,640 nm_Seq0336.nd2 - belongs in --> dose8
grn_d6_24hr_posttx_WellA03_Channel405 nm,488 nm,561 nm,640 nm_Seq0002.nd2 - belongs in --> dose1
grn_d6_24hr_posttx_WellK01_Cha

Let's take it to the next step and make a nested dictionary of all the drugs and doses on a 384-well plate, in the form `{ drug1: {dose1: wells} }`. This dictionary is going to be huge; I used find and replace to create it.

In [22]:
drugDict = {
'drug1': {
'dose1': ['A01','A03','A05', 'A07'],    'dose2': ['C01','C03','C05', 'C07'], 
'dose3': ['E01','E03','E05', 'E07'],    'dose4': ['G01','G03','G05', 'G07'],
'dose5': ['I01','I03','I05', 'I07'],    'dose6': ['K01','K03','K05', 'K07'],
'dose7': ['M01','M03','M05', 'M07'],    'dose8': ['O01','O03','O05', 'O07']},
            
'drug2': {
'dose1': ['A02','A04','A06', 'A08'],    'dose2': ['C02','C04','C06', 'C08'], 
'dose3': ['E02','E04','E06', 'E08'],    'dose4': ['G02','G04','G06', 'G08'],
'dose5': ['I02','I04','I06', 'I08'],    'dose6': ['K02','K04','K06', 'K08'],
'dose7': ['M02','M04','M06', 'M08'],    'dose8': ['O02','O04','O06', 'O08']},
            
'drug3': {
'dose1': ['A09','A11','A13', 'A15'],    'dose2': ['C09','C11','C13', 'C15'],
'dose3': ['E09','E11','E13', 'E15'],    'dose4': ['G09','G11','G13', 'G15'],
'dose5': ['I09','I11','I13', 'I15'],    'dose6': ['K09','K11','K13', 'K15'],
'dose7': ['M09','M11','M13', 'M15'],    'dose8': ['O09','O11','O13', 'O15']},
            
'drug4': {
'dose1': ['A10','A12','A14', 'A16'],    'dose2': ['C10','C12','C14', 'C16'],
'dose3': ['E10','E12','E14', 'E16'],    'dose4': ['G10','G12','G14', 'G16'],
'dose5': ['I10','I12','I14', 'I16'],    'dose6': ['K10','K12','K14', 'K16'],
'dose7': ['M10','M12','M14', 'M16'],    'dose8': ['O10','O12','O14', 'O16']},
            
'drug5': {
'dose1': ['A17','A19','A21', 'A23'],    'dose2': ['C17','C19','C21', 'C23'],
'dose3': ['E17','E19','E21', 'E23'],    'dose4': ['G17','G19','G21', 'G23'],
'dose5': ['I17','I19','I21', 'I23'],    'dose6': ['K17','K19','K21', 'K23'],
'dose7': ['M17','M19','M21', 'M23'],    'dose8': ['O17','O19','O21', 'O23']},

'drug6': {
'dose1': ['A18','A20','A22', 'A24'],    'dose2': ['C18','C20','C22', 'C24'],
'dose3': ['E18','E20','E22', 'E24'],    'dose4': ['G18','G20','G22', 'G24'],
'dose5': ['I18','I20','I22', 'I24'],    'dose6': ['K18','K20','K22', 'K24'],
'dose7': ['M18','M20','M22', 'M24'],    'dose8': ['O18','O20','O22', 'O24']},

'drug7': {
'dose1': ['B01','B03','B05', 'B07'],    'dose2': ['D01','D03','D05', 'D07'],
'dose3': ['F01','F03','F05', 'F07'],    'dose4': ['H01','H03','H05', 'H07'],
'dose5': ['J01','J03','J05', 'J07'],    'dose6': ['L01','L03','L05', 'L07'],
'dose7': ['N01','N03','N05', 'N07'],    'dose8': ['P01','P03','P05', 'P07']},
            
'drug8': {
'dose1': ['B02','B04','B06', 'B08'],    'dose2': ['D02','D04','D06', 'D08'],
'dose3': ['F02','F04','F06', 'F08'],    'dose4': ['H02','H04','H06', 'H08'],
'dose5': ['J02','J04','J06', 'J08'],    'dose6': ['L02','L04','L06', 'L08'],
'dose7': ['N02','N04','N06', 'N08'],    'dose8': ['P02','P04','P06', 'P08']},

'drug9': {
'dose1': ['B09','B11','B13', 'B15'],    'dose2': ['D09','D11','D13', 'D15'],
'dose3': ['F09','F11','F13', 'F15'],    'dose4': ['H09','H11','H13', 'H15'],
'dose5': ['J09','J11','J13', 'J15'],    'dose6': ['L09','L11','L13', 'L15'],
'dose7': ['N09','N11','N13', 'N15'],    'dose8': ['P09','P11','P13', 'P15']},
            
'drug10':{
'dose1': ['B10','B12','B14', 'B16'],    'dose2': ['D10','D12','D14', 'D16'],
'dose3': ['F10','F12','F14', 'F16'],    'dose4': ['H10','H12','H14', 'H16'],
'dose5': ['J10','J12','J14', 'J16'],    'dose6': ['L10','L12','L14', 'L16'],
'dose7': ['N10','N12','N14', 'N16'],    'dose8': ['P10','P12','P14', 'P16']},
            
'drug11':{
'dose1': ['B17','B19','B21', 'B23'],    'dose2': ['D17','D19','D21', 'D23'],
'dose3': ['F17','F19','F21', 'F23'],    'dose4': ['H17','H19','H21', 'H23'],
'dose5': ['J17','J19','J21', 'J23'],    'dose6': ['L17','L19','L21', 'L23'],
'dose7': ['N17','N19','N21', 'N23'],    'dose8': ['P17','P19','P21', 'P23']},
 
 'drug12':{
'dose1': ['B18','B20','B22', 'B24'],    'dose2': ['D18','D20','D22', 'D24'],
'dose3': ['F18','F20','F22', 'F24'],    'dose4': ['H18','H20','H22', 'H24'],
'dose5': ['J18','J20','J22', 'J24'],    'dose6': ['L18','L20','L22', 'L24'],
'dose7': ['N18','N20','N22', 'N24'],    'dose8': ['P18','P20','P22', 'P24']}
}
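
The layout in `drugDict` is regular enough that it could also be generated. The sketch below assumes the pattern read off the dictionary above: drugs 1-6 sit on rows A, C, ..., O and drugs 7-12 on rows B, D, ..., P; columns come in blocks of eight, with the odd columns of a block going to one drug and the even columns to the next. `build_drug_dict` is a hypothetical helper, not part of the original script.

```python
import string

def build_drug_dict():
    """Rebuild the 384-well layout from the interleaving pattern in drugDict."""
    rows = string.ascii_uppercase[:16]          # 'A'..'P'
    layout = {}
    for d in range(12):                         # drug1..drug12
        rowOffset = d // 6                      # 0 -> rows A,C,E..., 1 -> rows B,D,F...
        block, colOffset = divmod(d % 6, 2)     # 8-column block, odd or even columns
        firstCol = 8 * block + 1 + colOffset
        cols = [firstCol + 2 * k for k in range(4)]   # 4 replicate wells per dose
        layout[f'drug{d + 1}'] = {
            f'dose{j + 1}': [f'{rows[2 * j + rowOffset]}{c:02d}' for c in cols]
            for j in range(8)                   # dose1..dose8
        }
    return layout

generated = build_drug_dict()
print(generated['drug1']['dose1'])
print(generated['drug12']['dose8'])
```

If `drugDict` is defined in the same session, `generated == drugDict` should be a quick way to confirm that the hand-made dictionary has no find-and-replace slips. A different plate layout would need different arithmetic.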

Print out the first two drugs and their doses in order. 

Note: `itertools.islice(iterable, stop)` is a nice way to slice an iterable such as `drugDict.items()`, which does not support the slice syntax we would use on a list, e.g. `list[0:2]`.

In [23]:
for drug, doseList in itertools.islice(drugDict.items(), 2):
    for dose, wells in doseList.items():
        print(f'{drug} --> {dose} --> {wells}')

drug1 --> dose1 --> ['A01', 'A03', 'A05', 'A07']
drug1 --> dose2 --> ['C01', 'C03', 'C05', 'C07']
drug1 --> dose3 --> ['E01', 'E03', 'E05', 'E07']
drug1 --> dose4 --> ['G01', 'G03', 'G05', 'G07']
drug1 --> dose5 --> ['I01', 'I03', 'I05', 'I07']
drug1 --> dose6 --> ['K01', 'K03', 'K05', 'K07']
drug1 --> dose7 --> ['M01', 'M03', 'M05', 'M07']
drug1 --> dose8 --> ['O01', 'O03', 'O05', 'O07']
drug2 --> dose1 --> ['A02', 'A04', 'A06', 'A08']
drug2 --> dose2 --> ['C02', 'C04', 'C06', 'C08']
drug2 --> dose3 --> ['E02', 'E04', 'E06', 'E08']
drug2 --> dose4 --> ['G02', 'G04', 'G06', 'G08']
drug2 --> dose5 --> ['I02', 'I04', 'I06', 'I08']
drug2 --> dose6 --> ['K02', 'K04', 'K06', 'K08']
drug2 --> dose7 --> ['M02', 'M04', 'M06', 'M08']
drug2 --> dose8 --> ['O02', 'O04', 'O06', 'O08']
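
To make the slicing point concrete, here is a tiny stand-alone example with a toy dictionary (the `doses` variable is just for illustration). It shows the error you get from slicing a dict view directly, and two ways around it.

```python
import itertools

doses = {'dose1': ['A01'], 'dose2': ['C01'], 'dose3': ['E01']}

try:
    doses.items()[0:2]            # dict views do not support indexing or slicing
except TypeError as err:
    print(f'TypeError: {err}')

firstTwo = list(itertools.islice(doses.items(), 2))   # lazily takes the first two pairs
alsoFirstTwo = list(doses.items())[0:2]               # copies every pair, then slices
print(firstTwo)
```

Both approaches give the same pairs; `islice` just avoids building the full list first, which hardly matters at this size.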


Now try matching the file with the drug and dose using the new nested dictionary data structure. I'll only check against the first drug in the dictionary, because the full output is too large!

In [24]:
for file in fileList:
    r = re.search(REGEXP,file)
    for drug, doseList in itertools.islice(drugDict.items(),1):
        for dose, wells in doseList.items():
            if r.group(2) in wells:
                print(file,'- belongs to --> ', r.group(2), drug, dose)

grn_d6_24hr_posttx_WellI01_Channel405 nm,488 nm,561 nm,640 nm_Seq0192.nd2 - belongs to -->  I01 drug1 dose5
grn_d6_24hr_posttx_WellA01_Channel405 nm,488 nm,561 nm,640 nm_Seq0000.nd2 - belongs to -->  A01 drug1 dose1
grn_d6_24hr_posttx_WellK03_Channel405 nm,488 nm,561 nm,640 nm_Seq0242.nd2 - belongs to -->  K03 drug1 dose6
grn_d6_24hr_posttx_WellM03_Channel405 nm,488 nm,561 nm,640 nm_Seq0290.nd2 - belongs to -->  M03 drug1 dose7
grn_d6_24hr_posttx_WellE03_Channel405 nm,488 nm,561 nm,640 nm_Seq0098.nd2 - belongs to -->  E03 drug1 dose3
grn_d6_24hr_posttx_WellC07_Channel405 nm,488 nm,561 nm,640 nm_Seq0054.nd2 - belongs to -->  C07 drug1 dose2
grn_d6_24hr_posttx_WellI03_Channel405 nm,488 nm,561 nm,640 nm_Seq0194.nd2 - belongs to -->  I03 drug1 dose5
grn_d6_24hr_posttx_WellC01_Channel405 nm,488 nm,561 nm,640 nm_Seq0048.nd2 - belongs to -->  C01 drug1 dose2
grn_d6_24hr_posttx_WellO01_Channel405 nm,488 nm,561 nm,640 nm_Seq0336.nd2 - belongs to -->  O01 drug1 dose8
grn_d6_24hr_posttx_WellA03_C

Let's build upon this more to create the file path of where the file should be moved to

Recall from above that:
`filesDir = '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files'`

In [25]:
fileDestList = []

for file in fileList: # original file names in a list
    r = re.search(REGEXP,file)
    for drug, doseDict in drugDict.items():
        for dose, wells in doseDict.items():
            if r.group(2) in wells:
                destPath = os.path.join(filesDir, drug, dose) # destination path
                
                # I am going to pair the file and destination path into a tuple using the extra parentheses ()
                fileDestList.append((file, destPath))

Here are the first 5 entries of `fileDestList`.  I am unpacking the tuple using `(file, dest)`

In [26]:
for (file, dest) in fileDestList[0:5]:
    print(f'Original File:\t"{file}"\nDestination:\t"{dest}"\n')  

Original File:	"grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2"
Destination:	"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug12/dose1"

Original File:	"grn_d6_24hr_posttx_WellF03_Channel405 nm,488 nm,561 nm,640 nm_Seq0141.nd2"
Destination:	"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug7/dose3"

Original File:	"grn_d6_24hr_posttx_WellA22_Channel405 nm,488 nm,561 nm,640 nm_Seq0021.nd2"
Destination:	"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug6/dose1"

Original File:	"grn_d6_24hr_posttx_WellJ20_Channel405 nm,488 nm,561 nm,640 nm_Seq0220.nd2"
Destination:	"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug12/dose5"

Original File:	"grn_d6_24hr_posttx_WellI01_Channel405 nm,488 nm,561 nm,640 nm_Seq0192.nd2"
Destination:	"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug1/dose5"



Turn it all into a reusable function

In [27]:
def gen_fileDestList(filesDir, fileList, drugDict):
    fileDestList = []

    for file in fileList: # original file names in a list
        r = re.search(REGEXP,file)
        for drug, doseDict in drugDict.items():
            for dose, wells in doseDict.items():
                if r.group(2) in wells:
                    destPath = os.path.join(filesDir, drug, dose) # destination path
                    fileDestList.append((file, destPath))
    
    return fileDestList
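
The function above scans every drug and dose for every file. That is fine for one plate, but an alternative is to invert the dictionary once so that each well points straight to its `(drug, dose)` pair. A sketch, with `build_well_lookup` as a hypothetical helper and a toy layout standing in for `drugDict`:

```python
def build_well_lookup(drugDict):
    """Map each well name to its (drug, dose) pair."""
    wellLookup = {}
    for drug, doseDict in drugDict.items():
        for dose, wells in doseDict.items():
            for well in wells:
                if well in wellLookup:
                    raise ValueError(f'{well} is assigned twice')
                wellLookup[well] = (drug, dose)
    return wellLookup

# Two-drug toy layout; the real drugDict has the same shape
toyDict = {'drug1': {'dose1': ['A01', 'A03'], 'dose2': ['C01', 'C03']},
           'drug2': {'dose1': ['A02', 'A04']}}
wellLookup = build_well_lookup(toyDict)

print(wellLookup['C03'])
print(wellLookup.get('Z99'))
```

The duplicate check is a side benefit: a well listed under two doses by mistake is reported when the lookup is built rather than producing two destinations for one file.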

---

# Step 6. Putting it all together

Next we want to take the list of tuples pairing each file with its destination path and use it to move and rename our files at the same time, combining what we learned earlier. The first thing I am going to try is pairing up the two lists `fileDestList` and `oldNewFilenameList` to create the final `moveList`. I am sure there is a more efficient way to do this in one go rather than with this extra step.

- `fileDestList` has the original file name paired with the destination folder

In [28]:
fileDestList[0]

('grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2',
 '/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug12/dose1')

- `oldNewFilenameList` also has the original file name, paired with the new name

In [29]:
oldNewFilenameList[0]

('grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2',
 'grn_d6_24hr_posttx_WellB24_Channel_405nm_488nm_561nm_640nm_Seq0024.nd2')

Work out the logic for generating `moveList`. We check whether `file` from `fileDestList` is the same as `oldFile` from `oldNewFilenameList`. If so, we join the destination folder `destPath` with the new name `newFile` to get the full path of the renamed file.

In [30]:
moveList = [] # a list of tuples with (originalFullFilePath, renamedNewFilePath)

for (file, destPath) in fileDestList:
    for (oldFile, newFile) in oldNewFilenameList:
        if file == oldFile:
            oldFilePath = os.path.join(filesDir, oldFile)
            newFilePath = os.path.join(destPath, newFile)
            
            # Again, add it as a tuple
            moveList.append((oldFilePath, newFilePath))
            
for (source, destination) in moveList[0:2]: # peek at the first two entries
    print(f'Source path:\n"{source}"\n\nDestination path:\n"{destination}"\n\n')

Source path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2"

Destination path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug12/dose1/grn_d6_24hr_posttx_WellB24_Channel_405nm_488nm_561nm_640nm_Seq0024.nd2"


Source path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/grn_d6_24hr_posttx_WellF03_Channel405 nm,488 nm,561 nm,640 nm_Seq0141.nd2"

Destination path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug7/dose3/grn_d6_24hr_posttx_WellF03_Channel_405nm_488nm_561nm_640nm_Seq0141.nd2"




Make it into a reusable function

In [31]:
def gen_moveList(fileDestList, oldNewFilenameList, filesDir):
    moveList = [] # a list of tuples with (originalFullFilePath, renamedNewFilePath)

    for (file, destPath) in fileDestList:
        for (oldFile, newFile) in oldNewFilenameList:
            if file == oldFile:
                # the source files live directly in filesDir
                oldFilePath = os.path.join(filesDir, oldFile)
                newFilePath = os.path.join(destPath, newFile)

                moveList.append((oldFilePath, newFilePath))

    return moveList

Use the new function to generate the final `moveList`

In [32]:
moveList = gen_moveList(fileDestList, oldNewFilenameList, filesDir)

for (source, destination) in moveList[0:2]: # peek at the first two entries
    print(f'Source path:\n"{source}"\n\nDestination path:\n"{destination}"\n\n')

Source path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/grn_d6_24hr_posttx_WellB24_Channel405 nm,488 nm,561 nm,640 nm_Seq0024.nd2"

Destination path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug12/dose1/grn_d6_24hr_posttx_WellB24_Channel_405nm_488nm_561nm_640nm_Seq0024.nd2"


Source path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/grn_d6_24hr_posttx_WellF03_Channel405 nm,488 nm,561 nm,640 nm_Seq0141.nd2"

Destination path:
"/Users/uapinyoyingpb/misc-scripts/python_scripts/organize_nd2/test_files/drug7/dose3/grn_d6_24hr_posttx_WellF03_Channel_405nm_488nm_561nm_640nm_Seq0141.nd2"
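
As for the "more efficient way" mentioned at the start of this step: the nested loop compares every entry of one list with every entry of the other. One option is to turn `oldNewFilenameList` into a dictionary so each new name is a single lookup. This is only a sketch with a hypothetical function name, `gen_moveList_lookup`, and toy data:

```python
import os

def gen_moveList_lookup(fileDestList, oldNewFilenameList, filesDir):
    """Pair each source path with its renamed destination path."""
    newNameFor = dict(oldNewFilenameList)   # old name -> new name
    return [(os.path.join(filesDir, oldName),
             os.path.join(destPath, newNameFor[oldName]))
            for oldName, destPath in fileDestList
            if oldName in newNameFor]

toyDest = [('a b.nd2', os.path.join('test_files', 'drug1', 'dose1'))]
toyNames = [('a b.nd2', 'a_b.nd2'), ('c d.nd2', 'c_d.nd2')]
toyMoves = gen_moveList_lookup(toyDest, toyNames, 'test_files')
print(toyMoves)
```

One difference from the nested loop: if the same old name appeared twice in `oldNewFilenameList`, the dictionary keeps only the last entry, whereas the loop would emit a pair for each.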




The last piece is the move itself. I'll make this into a simple function

In [33]:
def move_files(moveList):
    for (source, destination) in moveList:
        shutil.move(source, destination)
    print('*Complete*')
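
Since moving files is the one step that cannot be easily undone, a more cautious variant may be worth having. The sketch below (hypothetical name `move_files_checked`, with a `dry_run` flag that is my own addition) skips missing sources and destinations that already exist, and does nothing on disk unless `dry_run=False`. It is demonstrated in a temporary directory.

```python
import os
import shutil
import tempfile

def move_files_checked(moveList, dry_run=True):
    """Move files, skipping missing sources and existing destinations."""
    moved, skipped = [], []
    for source, destination in moveList:
        if not os.path.isfile(source) or os.path.exists(destination):
            skipped.append(source)
            continue
        if not dry_run:
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            shutil.move(source, destination)
        moved.append(source)
    return moved, skipped

with tempfile.TemporaryDirectory() as tmpDir:
    src = os.path.join(tmpDir, 'old name.nd2')
    open(src, 'w').close()
    dst = os.path.join(tmpDir, 'drug1', 'dose1', 'old_name.nd2')
    missing = os.path.join(tmpDir, 'missing.nd2')
    toyMoves = [(src, dst), (missing, os.path.join(tmpDir, 'x.nd2'))]

    planned, _ = move_files_checked(toyMoves)                  # dry run: nothing moves
    stillThere = os.path.isfile(src)
    moved, skipped = move_files_checked(toyMoves, dry_run=False)
    arrived = os.path.isfile(dst) and not os.path.exists(src)

print(len(planned), stillThere, len(moved), len(skipped), arrived)
```

Running once with the default `dry_run=True` and checking the returned lists gives a chance to catch surprises before any file is touched.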

Finally all we have to do now is to move and rename the files!

In [34]:
move_files(moveList)

*Complete*
