## Data organizer

This notebook takes a user-specified folder of OPT data containing separate fluorescence and transmission files, organizes the files into stacks, and writes each stack out as TIFF and HDF5.

In [4]:
# USE THIS TO INSTALL MISSING PACKAGES
import sys
!conda install --yes --prefix {sys.prefix} pandas

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\lchen\AppData\Local\Continuum\anaconda3

  added / updated specs:
    - pandas


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.6.16          |           py37_1         156 KB
    conda-4.7.10               |           py37_0         3.0 MB
    pandas-0.25.0              |   py37ha925a31_0         9.8 MB
    ------------------------------------------------------------
                                           Total:        13.0 MB

The following NEW packages will be INSTALLED:

  pandas             pkgs/main/win-64::pandas-0.25.0-py37ha925a31_0

The following packages will be UPDATED:

  certifi                                  2019.6.16-py37_0 --> 2019.6.16-py37_1
  conda                                      

In [6]:
import os
import pandas as pd

import numpy as np
import h5py

### User input
Set the data path and the indicator strings that LabVIEW inserts into the file names.

In [9]:
given_path = r'D:\VHIR_july\AP2'  # raw string, so the backslashes need no escaping

fluor_indicator = 'fluor'  # string shared by the names of all fluorescence files
bkgd_indicator = 'bkgd'  # string indicating the flat-field transmission background image
dark_indicator = 'dark'  # string indicating the dark-field transmission background image

# Per the LabVIEW naming convention, the user-chosen dataset name (usually the filter
# or staining) sits between these two markers in each file name
filter_start = 'T0_'
filter_end = '_View'

### Stand-alone code

Reads in and identifies files.

In [10]:
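# Walk the folder tree and collect all files, skipping the 'input' output folder
# so that stacks written by an earlier run are not read back in as raw data
output_dir = os.path.join(given_path, 'input')
file_set = []
for path, dirs, files in os.walk(given_path):
    dirs[:] = [d for d in dirs if os.path.join(path, d) != output_dir]
    for f in files:
        file_set.append(os.path.join(path, f))

# Keep only TIFF files (match on the extension, not anywhere in the path)
tiff_pd = pd.DataFrame({"FileName": [s for s in file_set if s.lower().endswith(('.tif', '.tiff'))]})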

# Identify fluorescence vs transmission
tiff_pd['image_type']= ['fluor' if fluor_indicator in s else 'trans' for s in tiff_pd.FileName]

# Identify transmission bright-field and dark-field data
tiff_pd.loc[(tiff_pd['image_type']=='trans') & 
            (tiff_pd['FileName'].str.contains(bkgd_indicator)), 'image_type'] = 'trans_bkgd'
tiff_pd.loc[(tiff_pd['image_type']=='trans') & 
            (tiff_pd['FileName'].str.contains(dark_indicator)), 'image_type'] = 'trans_dark'

# Get filter name: the text between filter_start and filter_end
# (assumes both markers occur in every fluorescence file name)
tiff_pd['filter_name'] = [s[s.find(filter_start)+len(filter_start):s.find(filter_end)] for s in tiff_pd.FileName]
tiff_pd.loc[tiff_pd['image_type'] == 'trans', 'filter_name'] = 'trans'
tiff_pd.loc[tiff_pd['image_type'] == 'trans_bkgd', 'filter_name'] = 'trans_bkgd'
tiff_pd.loc[tiff_pd['image_type'] == 'trans_dark', 'filter_name'] = 'trans_dark'
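
The filter name is taken as the text between `filter_start` and `filter_end`. With `str.find`, a file name that lacks either marker returns -1 and the slice silently yields an unrelated substring. A minimal sketch of a stricter extraction follows, assuming the two markers appear in that order; the helper name `extract_filter_name` and the sample file names are hypothetical, made up for illustration only.

```python
import re

def extract_filter_name(file_name, start="T0_", end="_View"):
    """Return the text between `start` and `end`, or None if either marker is missing."""
    # Escape the markers so they are matched literally, then capture
    # the shortest run of characters between them.
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    match = re.search(pattern, file_name)
    return match.group(1) if match else None

# Hypothetical file names, for illustration only
print(extract_filter_name(r"D:\data\fluor_T0_GFP_View0001.tif"))  # GFP
print(extract_filter_name(r"D:\data\input\trans.tif"))            # None
```

Returning `None` makes unparseable names easy to spot, for example by filtering the table on missing `filter_name` values before building stacks.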

Creates the input directory and saves HDF5 files and TIFF stacks for the reconstruction algorithm.

*Future work: check pixel modes in the files to ensure that the transmission, background, and fluorescence files were labelled properly by the user. If there is an anomaly, the program should not create the input stacks and should ask the user to create them manually.*

In [11]:
# Create the input directory for the reconstruction algorithm
dirInput = 'input'

try:
    # Creates directory
    os.mkdir(os.path.join(given_path, dirInput))
    print("Directory " , dirInput ,  " created ") 
except FileExistsError:
    print("Directory " , dirInput ,  " already exists")
    


Directory  input  already exists


In [12]:
# Output data
import matplotlib.pyplot as plt
from tifffile import imwrite  # imwrite replaces the older imsave name

input_dir = os.path.join(given_path, 'input')
filter_set = sorted(set(tiff_pd['filter_name']))

for filt in filter_set:
    # Creates stack based on filter/image-type
    # (np.array needs every image in the group to have the same shape)
    stack_of_names = tiff_pd.loc[tiff_pd['filter_name'] == filt, 'FileName']
    stack = np.array([plt.imread(c) for c in stack_of_names])

    # Saves tiff stacks
    imwrite(os.path.join(input_dir, filt + '.tif'), stack)

    # Saves stacks in separate hdf5
    file_output_h5 = os.path.join(input_dir, filt + '.h5')
    with h5py.File(file_output_h5, 'w') as indiv:
        indiv['data'] = stack

# Disabled alternative: save all stacks to a single hdf5 with tree structure.
# To enable, delete any stale all_data.h5 once before the loop, then run the
# following inside the loop (HDF5 group paths use forward slashes):
#     full_output_h5 = os.path.join(input_dir, 'all_data.h5')
#     with h5py.File(full_output_h5, 'a') as full:
#         group = 'trans' if 'trans' in filt else 'fluor'
#         full.create_dataset(group + '/' + filt, data=stack)

print('Complete.')

KeyError: 'O'
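
A `KeyError: 'O'` at this step is consistent with `stack` having NumPy object dtype (kind `'O'`), which older NumPy versions produce when the images in one group do not all share the same shape, for example when a stack written by an earlier run is read back in alongside the raw frames. This diagnosis is an assumption, since the traceback is truncated. A minimal sketch of a shape check that could run before stacking follows; the helper names and the sample shapes are hypothetical.

```python
from collections import defaultdict

def group_by_shape(shapes_by_file):
    """Map each distinct image shape to the list of files that have it."""
    groups = defaultdict(list)
    for file_name, shape in shapes_by_file.items():
        groups[tuple(shape)].append(file_name)
    return dict(groups)

def is_stackable(shapes_by_file):
    """True when at least one file is present and all files share one shape."""
    return len(group_by_shape(shapes_by_file)) == 1

# Hypothetical shapes, for illustration only
shapes = {
    "trans_View0001.tif": (1024, 1024),
    "trans_View0002.tif": (1024, 1024),
    "input/trans.tif": (400, 1024, 1024),
}
print(is_stackable(shapes))  # False
for shape, names in group_by_shape(shapes).items():
    print(shape, names)
```

In the notebook, the mapping could be built per filter with `{c: plt.imread(c).shape for c in stack_of_names}`, and a group that is not stackable could be reported instead of being passed to the TIFF and HDF5 writers.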