### Analyze fluorescein images
#### Implementation notes
- If MultiTracker fails to run, check whether it conflicts with OpenCV2. If it does, edit your `.bashrc` file to switch the `PATH` from the ROS installation to the conda installation while running this script only, then run `source ~/.bashrc` to apply the change.

#### List of tasks accomplished in this Jupyter Notebook:
- Examine experimental image to see which encoding best captures differences in color
- Bin experimental photographs into 1mm x 1mm segments. This is done to normalize between photos that are a different number of pixels wide. Each bin is represented by the mean saturation value (S) of all pixels within that 1mm x 1mm segment. Save these reduced data files into a new folder. 
- Subtract the Saturation value in the blank image from the corresponding experiment image for each experiment series. This is done to correct for potential differences in lighting between photographs or experiments. 
- Similarly, subtract the Saturation value in the blank image from the corresponding experiment image for each standardization image. 
- Combine the data from all 1mm x 1mm cells for all standardization images into a single file
- Create a master dataframe of statistics for the standardization dataset
- Create a threshold of concentration for the experimental dataset to use as the 100% dye value 
- Create a linear interpolation between color (HSV saturation) and concentration using the reference dataset
- Use the interpolation between color and concentration to map each experimental saturation value to concentration
- Average the concentration calculations for every time unit (1min, 2min, etc) across all 10 experiments with larvae 
- Average the concentration calculations for every time unit across all 10 experiments without larvae 
- Create a master dataframe containing the concentration in each bin across all 15 minutes
- Create a file containing distances of all bins from the odor source for experiments with larvae (for use in computational modeling)
- Create files to analyze differences in diffusion between experiments with and without larvae
- Fit an exponential curve to the distance and concentration dataset for modeling purposes. 

In [None]:
import numpy as np
import pandas as pd
import cv2, glob
import matplotlib.pyplot as plt
import scipy.interpolate

- Examine all experimental images to see which encoding best captures differences in color

*Result:* The saturation (S) channel from HSV had the largest range of pixel values (maximum minus minimum), so it captured the greatest variation in color. This channel was therefore used in all subsequent analyses to represent color as a single value. 

In [None]:
names = glob.glob("./data/fluorescein/experiments/*-experiment*/crop_experiments/*.jpg")
print(len(names), "files to include in analysis dataset")

color_dict = {"filename":[],"R":[],"G":[],"B":[],"H":[],"S":[],"V":[]}
labels = ["R", "G", "B", "H", "S", "V"]
    
for name in names:
    test_image = cv2.imread(name)
    b = test_image[:,:,0]
    g = test_image[:,:,1]
    r = test_image[:,:,2]

    hsv = cv2.cvtColor(test_image, cv2.COLOR_BGR2HSV)
    h = hsv[:,:,0]
    s = hsv[:,:,1]
    v = hsv[:,:,2]
    
    colors = [r, g, b, h, s, v]
    color_dict["filename"].append(name.split("/")[-1].split("\\")[-1])
    
    for label, color in zip(labels, colors):
        # Range of pixel values (max minus min) in this channel
        colrange = int(color.max()) - int(color.min())
        color_dict[label].append(colrange)

# Save the results in a CSV file 
df = pd.DataFrame.from_dict(color_dict)
df.to_csv("./data/fluorescein/color_spaces_variance_results.csv", index=False)
display(df.head())
print("--- All files analyzed ---")
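
As a rough illustration of why saturation separates dyed from undyed regions, the sketch below computes HSV saturation for three hypothetical pixel colors. It uses Python's standard `colorsys` module rather than OpenCV, and scales the result to 0-255 to mirror OpenCV's 8-bit S channel. The pixel values and the helper name `saturation_8bit` are made up for illustration.

```python
import colorsys

def saturation_8bit(r, g, b):
    """Return HSV saturation scaled to 0-255 for 8-bit RGB inputs."""
    _, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(s * 255)

# Hypothetical pixel colors (R, G, B), chosen only for illustration
background = (200, 200, 200)   # neutral gray, no color
faint_dye = (180, 220, 160)    # slightly green
strong_dye = (60, 220, 20)     # strongly green

for label, rgb in [("background", background),
                   ("faint dye", faint_dye),
                   ("strong dye", strong_dye)]:
    print(label, saturation_8bit(*rgb))
```

All three colors share the same maximum channel value, so their V values are identical, while S grows as the color moves away from gray.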

- Bin experimental photographs into 1mm x 1mm segments. This is done to normalize between photos that are a different number of pixels wide. Each bin is represented by the mean saturation value (S) of all pixels within that 1mm x 1mm segment. Save these reduced data files into a new folder. 

In [None]:
def binned_means(array, xbins=80, ybins=30):
    ''' Return the mean value of all numbers 
        binned into the specified number of bins '''
    height, width = array.shape
    xbin_indices = np.linspace(0, width, xbins+1)
    ybin_indices = np.linspace(0, height, ybins+1)
    xbin_indices = [int(x) for x in xbin_indices]
    ybin_indices = [int(x) for x in ybin_indices]
    count = 0
    means = []
    for y in range(len(ybin_indices)-1):
        temp_row = []
        for x in range(len(xbin_indices)-1):
            yindex = ybin_indices[y]
            n_yindex = ybin_indices[y+1]
            xindex = xbin_indices[x]
            n_xindex = xbin_indices[x+1]
            temp_bin = array[yindex:n_yindex, xindex:n_xindex]
            count += temp_bin.size
            temp_row.append(np.mean(temp_bin))
        means.append(temp_row)
        
    # Check that all values in image were sampled in mean calculation
    assert count == array.size 
    
    # Check that array is the right size
    assert len(means) == ybins
    assert len(means[0]) == xbins
    
    return means

names = glob.glob("./data/fluorescein/experiments/*/crop_*/*.jpg")

for name in names: 
    image = cv2.imread(name)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    s = hsv[:,:,1]
    binned = binned_means(s, xbins=80, ybins=30)
    bin_df = pd.DataFrame(binned)

    name_id = name.split("/")[-1].split("\\")[-1].split(".jpg")[0]
    name_type = name_id.split("-")[-1].split(".jpg")[0]
    name_top = name_id.split("-")[0]+"-experiment-"+name_id.split("-")[1]
    savename = "./data/fluorescein/experiments/"+name_top+"/1mm_csv_"+name_type+"s/"+name_id+".csv"

    bin_df.to_csv(savename, index=None, header=False)

print("--- All files analyzed ---")
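
The integer bin edges used by `binned_means` come from truncating evenly spaced floating-point positions, so bins can differ in width by one pixel when the image size is not a multiple of the bin count. A minimal sketch of that behavior, using a hypothetical 10-pixel row split into 4 bins (the helper name `bin_edges` is not part of the notebook):

```python
import numpy as np

def bin_edges(length, nbins):
    """Integer edges for nbins bins spanning 0..length (truncated np.linspace)."""
    return [int(edge) for edge in np.linspace(0, length, nbins + 1)]

row = np.arange(10, dtype=float)   # hypothetical pixel values 0..9
edges = bin_edges(len(row), 4)

# Width and mean of each bin, using consecutive edge pairs as slice bounds
widths = [stop - start for start, stop in zip(edges[:-1], edges[1:])]
means = [float(row[start:stop].mean()) for start, stop in zip(edges[:-1], edges[1:])]

print(edges)   # [0, 2, 5, 7, 10]
print(widths)  # [2, 3, 2, 3]
print(means)   # [0.5, 3.0, 5.5, 8.0]
```

Every pixel is still assigned to exactly one bin, which is what the `count == array.size` assertion in `binned_means` checks.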

- Subtract the Saturation value in the blank image from the corresponding experiment image for each experiment series. This is done to correct for potential differences in lighting between photographs or experiments. 

In [None]:
names = glob.glob("./data/fluorescein/experiments/*experiment*/1mm_csv_experiments/*.csv")

for name in names: 
    try:
        name_id = name.split("/")[-1].split("\\")[-1].split(".csv")[0]
        name_num = name_id.split("-cropped")[0]
        name_type = name_id.split("-")[-1].split(".csv")[0]
        name_top = name_id.split("-")[0]+"-experiment-"+name_id.split("-")[1]
        
        std_name = "./data/fluorescein/experiments/"+name_top+"/1mm_csv_standards/"+name_num+"-cropped-standard.csv"
        savename = "./data/fluorescein/experiments/"+name_top+"/1mm_csv_differences/"+name_num+"-difference.csv"
        
        exp_df = pd.read_csv(name, header=None)
        std_df = pd.read_csv(std_name, header=None)
        exp = np.array(exp_df.values)
        std = np.array(std_df.values)

        # Check that the two dataframes are the same shape
        assert exp_df.shape == std_df.shape
        diff = exp-std

        # Check that the output is the same size as the input
        diff_df = pd.DataFrame(diff)
        assert diff_df.shape == exp_df.shape

        diff_df.to_csv(savename, index=None, header=False)
    except Exception as err:
        # Report which file could not be processed and why
        print("Failed:", name, "-", err)

print("--- All files analyzed ---")

- Similarly, subtract the Saturation value in the blank image from the corresponding experiment image for each standardization image. 

In [None]:
names = glob.glob("./data/fluorescein/standards/1mm_csv_experiments/*.csv")

for name in names: 
    try:
        name_id = name.split("/")[-1].split("\\")[-1].split("_experiment.csv")[0]
        std_name = "./data/fluorescein/standards/1mm_csv_standards/"+name_id+"_standard.csv"
        savename = "./data/fluorescein/standards/1mm_csv_differences/"+name_id+"_difference.csv"
        
        exp_df = pd.read_csv(name, header=None)
        std_df = pd.read_csv(std_name, header=None)
        exp = np.array(exp_df.values)
        std = np.array(std_df.values)

        # Check that the two dataframes are the same shape
        assert exp_df.shape == std_df.shape
        diff = exp-std

        # Check that the output is the same size as the input
        diff_df = pd.DataFrame(diff)
        assert diff_df.shape == exp_df.shape

        diff_df.to_csv(savename, index=None, header=False)
    except Exception as err:
        # Report which file could not be processed and why
        print("Failed:", name, "-", err)

print("--- All files analyzed ---")

- Combine the data from all 1mm x 1mm cells for all standardization images into a single file

In [None]:
concentrations = [0.1, 0.5, 0, 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, 100]
df = pd.DataFrame()

for c in concentrations:
    label = str(c).replace(".", "_")
    fname = "./data/fluorescein/standards/1mm_csv_differences/"+label+"_difference.csv"
    temp = pd.read_csv(fname, header=None)
    flat = temp.values.flatten()
    df[c] = flat
    
display(df.describe())
df.to_csv("./data/fluorescein/standardization_saturation_results.csv", index=False)

- Create a master dataframe of statistics for the standardization dataset

In [None]:
df = pd.read_csv("./data/fluorescein/standardization_saturation_results.csv")
stats_df = df.std(axis=0).reset_index(name='std')
stats_df.columns = ["concentration", "std_dev"]
stats_df["mean"] = df.mean(axis=0).values
stats_df["std_err"] = df.sem(axis=0).values
stats_df.to_csv("./data/fluorescein/standardization_saturation_stats.csv", index=False)
stats_df.head()
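
The three statistics saved above are related: the standard error is the sample standard deviation divided by the square root of the number of samples. The sketch below checks this with NumPy on made-up saturation values. `ddof=1` is used because pandas' `std` and `sem` default to the sample (n-1) estimate.

```python
import numpy as np

# Hypothetical saturation differences for one standard (not real data)
values = np.array([10.0, 12.0, 11.0, 13.0, 14.0])

mean = values.mean()
std_dev = values.std(ddof=1)               # sample standard deviation
std_err = std_dev / np.sqrt(len(values))   # standard error of the mean

print(mean, std_dev, std_err)
```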

- Create a threshold of concentration for the experimental dataset to use as the 100% dye value in mapping concentration. This is done because the amount of fluorescein observable at the maximum concentration is unknown. 

In [None]:
def get_interp(vals, interp, xmin, xmax):
    ''' Return the predicted concentration values (0-100) based on color (S) '''
    # Set interpolation value to be within range of standardization dataset
    vals = [min(xmax, val) for val in vals]
    vals = [max(xmin, val) for val in vals]
    
    # Interpolate concentration from the clamped saturation values
    guess = interp(vals)
    return(guess)

fnames = glob.glob("./data/fluorescein/experiments/*experiment*/1mm_csv_differences/*-00-00-difference.csv")
df = pd.read_csv("./data/fluorescein/standardization_saturation_stats.csv")
xmin, xmax = df["mean"].min(), df["mean"].max()
interp = scipy.interpolate.interp1d(df["mean"], df["concentration"])
values = []

for name in fnames:
    temp = pd.read_csv(name, header=None)
    temp_s = temp.apply(get_interp, args=(interp, xmin, xmax), axis=1)
    v = np.array(temp_s.values.tolist()).flatten()
    values.append(v)
    
# Flatten into one 1-D array (suitable for a matplotlib histogram)
values = np.array(values).flatten()
value_df = pd.DataFrame({"values_0min":values})
value_df.to_csv("./data/fluorescein/0min_concentration_values_cutoff_calculation.csv", index=False)

# Make sure that all values are represented
assert values.shape[0] == len(fnames)*80*30
cutoff_percent = 0.95

value_sort = sorted(values)
cutoff_index = int(len(value_sort)*cutoff_percent)
print(value_sort[cutoff_index], "concentration (AU) to use to threshold images")

save_df = pd.DataFrame({"cutoff_concentration":[value_sort[cutoff_index]], "n_bin_samples":[len(value_sort)],
                        "max_data":[max(values)], "min_data":[min(values)], "cutoff_percent":[cutoff_percent]})
save_df.to_csv("./data/fluorescein/cutoff_concentration.csv", index=False)
display(save_df.head())
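
The cutoff is the value found 95% of the way through the sorted data, so the few bins above it are all treated as fully saturated later on. A minimal sketch with made-up values (the numbers 0 to 99 stand in for interpolated concentrations):

```python
# Hypothetical concentration values, one per bin (not real data)
values = [float(v) for v in range(100)]
cutoff_percent = 0.95

value_sort = sorted(values)
cutoff_index = int(len(value_sort) * cutoff_percent)
cutoff = value_sort[cutoff_index]

# Bins above the cutoff would later be clamped to 100% dye
n_above = sum(v > cutoff for v in values)
print(cutoff, n_above)
```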

- Create a linear interpolation between color (HSV saturation) and concentration using the reference dataset
- Use the interpolation between color and concentration to map each experimental saturation value to concentration

In [None]:
def get_interp_bounded(vals, interp, xmin, xmax, ymin, ymax):
    ''' Return the predicted concentration values (0-100) based on color (S) '''
    # Set interpolation value to be within range of standardization dataset
    vals = [min(xmax, val) for val in vals]
    vals = [max(xmin, val) for val in vals]
    
    # Clamp interpolated concentrations to the range [ymin, ymax]
    guess = interp(vals)
    guess = [min(ymax, y) for y in guess]
    guess = [max(ymin, y) for y in guess]
    
    # Set interpolated value to be percentage of range in ymin and ymax
    data_range = ymax-ymin
    low_threshold = [y-ymin for y in guess]
    percentages = [100*y/data_range for y in low_threshold]
    
    return(percentages)

df = pd.read_csv("./data/fluorescein/standardization_saturation_stats.csv")
xmin, xmax = df["mean"].min(), df["mean"].max()
interp = scipy.interpolate.interp1d(df["mean"], df["concentration"])
y_df = pd.read_csv("./data/fluorescein/cutoff_concentration.csv")
ymin = y_df["min_data"].values[0]
ymax = y_df["cutoff_concentration"].values[0]
print("Y-min:",ymin, "Y-max:",ymax, "X-min:",xmin, "X-max:",xmax)

names = glob.glob("./data/fluorescein/experiments/*experiment*/1mm_csv_differences/*.csv")
for name in names:
    temp = pd.read_csv(name, header=None)
    temp_s = temp.apply(get_interp_bounded, args=(interp, xmin, xmax, ymin, ymax), axis=0)
    
    name_date = name.split("/")[-1].split("\\")[-1].split("-")[0]
    name_exp = name.split("/")[-1].split("\\")[-1].split("-")[1]
    name_file = name.split("/")[-1].split("\\")[-1].split("-difference")[0]

    savename = "./data/fluorescein/experiments/"+name_date+"-experiment-"\
            +name_exp+"/1mm_csv_concentrations/"+name_file+"_concentrations.csv"
    temp_s.to_csv(savename, index=False, header=False)
    
print("--- All files analyzed ---")
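
The clamp, interpolate, and rescale logic in `get_interp_bounded` can be sketched compactly with `np.interp`, which is swapped in here for `scipy.interpolate.interp1d`. `np.interp` holds the end values for inputs outside the calibrated range, so no explicit x clamp is needed, but it assumes the calibration saturations are sorted in increasing order. The calibration points and the function name `saturation_to_percent` are hypothetical.

```python
import numpy as np

# Hypothetical calibration: mean saturation difference -> concentration
sat_means = np.array([0.0, 10.0, 30.0, 60.0])   # must be increasing
concs = np.array([0.0, 5.0, 25.0, 100.0])

def saturation_to_percent(sat, ymin, ymax):
    """Map saturation to percent of the [ymin, ymax] concentration range."""
    conc = np.interp(sat, sat_means, concs)   # linear interpolation, ends held
    conc = np.clip(conc, ymin, ymax)          # clamp to the chosen range
    return 100.0 * (conc - ymin) / (ymax - ymin)

sample = np.array([-5.0, 20.0, 45.0, 80.0])
print(saturation_to_percent(sample, ymin=0.0, ymax=50.0))
```

With these made-up numbers, a saturation of 20 interpolates to a concentration of 15, which is 30% of the 0 to 50 range. Saturations of 45 and above exceed the cutoff and map to 100%.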

- Average the concentration calculations for every time unit (1min, 2min, etc) across all 10 experiments with larvae 
- Average the concentration calculations for every time unit across all 10 experiments without larvae 

In [None]:
df = pd.read_csv("./data/fluorescein/files_larvae.csv")
larvae = df["larvae_experiments"].values
no_larvae = df["no_larvae_experiments"].values

times = ['00-00', '01-00', '02-00', '03-00', '04-00', '05-00', 
         '06-00', '07-00', '08-00', '09-00', '10-00', '11-00',
         '12-00', '13-00', '14-00', '15-00', 'ss-ss']

animal_dict = {"no_larvae":no_larvae, "larvae":larvae}

for label, animals in animal_dict.items():
    for time in times: 
        timedf = np.zeros((30, 80))
        timename = "./data/fluorescein/experiments/"+label+"_averages_1mm/"+time+"_averages.csv"
        for animal in animals:
            a0, a1 = animal.split("-")[0], animal.split("-")[-1]
            readname = "./data/fluorescein/experiments/"+animal+"/1mm_csv_concentrations/"+ \
                       a0+"-"+a1+"-"+time+"_concentrations.csv"
            df = pd.read_csv(readname, header=None)
            exp = np.array(df.values)

            # Check that this experiment has the expected 30 x 80 bin layout
            assert exp.shape == timedf.shape
            timedf = timedf + exp
        
        timedf = timedf / len(animals)
        timedf = pd.DataFrame(timedf)
        timedf.to_csv(timename, index=False, header=False)
        
print("--- All files analyzed ---")

- Create a master dataframe containing the concentration in each bin across all 15 minutes

In [None]:
time_dict = {0:'00-00', 120:'01-00', 240:'02-00', 360:'03-00', 480:'04-00', 600:'05-00', 
             720:'06-00', 840:'07-00', 960:'08-00', 1080:'09-00', 1200:'10-00', 1320:'11-00',
             1440:'12-00', 1560:'13-00', 1680:'14-00', 1800:'15-00'}

bin_ids = np.arange(0, 80*30, 1)
bin_ids = ["bin_"+str(int(x)) for x in bin_ids]
bin_df = pd.DataFrame(index=time_dict.keys(), columns=bin_ids)
bin_df.index.name = "frames"

for larva in ["larvae", "no_larvae"]:
    for time, name in time_dict.items():
        name = './data/fluorescein/experiments/'+larva+'_averages_1mm/'+name+'_averages.csv'
        # Round values to 2 decimal points so file is not huge
        df = pd.read_csv(name, header=None).round(2)
        array = np.array(df.values)
        rows, columns = array.shape[0], array.shape[1]
        for row in range(rows):
            for column in range(columns):
                ID = 'bin_' + str(row*80 + column)
                bin_df.loc[time, ID] = array[row][column]

    display(bin_df.head())
    bin_df.to_csv("./data/fluorescein/bin_concentration_by_time_"+larva+".csv")
print("--- All files analyzed ---")

- Create a file containing distances of all bins from the odor source for experiments with larvae (for use in computational modeling)

In [None]:
df = pd.read_csv("./data/fluorescein/files_larvae.csv")
larvae = df["larvae_experiments"].values
print(len(larvae) == 10) # There should be 10 experiments with larvae

pairs = []
for larva in larvae[0:1]:
    a0, a1 = larva.split("-")[0], larva.split("-")[-1]
    name = "./data/fluorescein/experiments/"+larva+"/1mm_csv_concentrations/"+ \
                       a0+"-"+a1+"-00-00_concentrations.csv"
    
    df = pd.read_csv(name, header=None)
    array = np.array(df.values)
    rows, columns = array.shape[0], array.shape[1]
    for row in range(rows):
        for column in range(columns):
            # Distance in bins (1 bin = 1 mm) from bin (0, 0)
            distance = np.hypot(row, column)
            value = array[row][column]
            pairs.append([distance, value])
            
pair_df = pd.DataFrame(pairs, columns = ["distance_mm", "concentration"])
pair_df.to_csv("./data/fluorescein/distance_concentration_map.csv", index=None)
display(pair_df.tail())
print("--- All files analyzed ---")
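
The same distances can be computed for every bin at once. This sketch assumes, as the loop above does, that bins are 1 mm apart and that distances are measured from bin (0, 0). `np.indices` builds the row and column index grids.

```python
import numpy as np

rows, columns = 30, 80                     # bin grid used throughout this notebook
row_idx, col_idx = np.indices((rows, columns))
distance_mm = np.hypot(row_idx, col_idx)   # Euclidean distance from bin (0, 0)

print(distance_mm.shape)
print(distance_mm[3, 4])                   # a 3-4-5 right triangle
print(distance_mm.flatten()[:3])           # row-major order, same as the nested loop
```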

- Create files to analyze differences in diffusion between experiments with and without larvae

In [None]:
df = pd.read_csv("./data/fluorescein/files_larvae.csv")
larvae = df["larvae_experiments"].values
no_larvae = df["no_larvae_experiments"].values

times = ['00-00', '01-00', '02-00', '03-00', '04-00', '05-00', 
         '06-00', '07-00', '08-00', '09-00', '10-00', '11-00',
         '12-00', '13-00', '14-00', '15-00']

animal_dict = {"no_larvae":no_larvae, "larvae":larvae}
timevals = []

for label, animals in animal_dict.items():
    for time in times: 
        for animal in animals:
            a0, a1 = animal.split("-")[0], animal.split("-")[-1]
            readname = "./data/fluorescein/experiments/"+animal+"/1mm_csv_concentrations/"+ \
                       a0+"-"+a1+"-"+time+"_concentrations.csv"
            df = pd.read_csv(readname, header=None)
            array = np.array(df.values)
            vals = array.flatten()
            perc = sum(i > 50 for i in vals)/len(vals)
            timevals.append([time, perc, label])
            
timeval_df = pd.DataFrame(timevals, columns = ["time", "perc_over_50", "larva_presence"])
timeval_df["time"] = timeval_df["time"].str.replace("-00", "").astype(float)
timeval_df.to_csv("./data/fluorescein/larvae_no_larvae_comparison.csv", index=None)
display(timeval_df.tail())
print(len(timeval_df)/16 == 20) # Should be 20 experiments - 10 larvae and 10 without larvae
print("--- All files analyzed ---")

- Fit an exponential curve to the distance and concentration dataset for modeling purposes. 

In [None]:
df = pd.read_csv("./data/fluorescein/distance_concentration_map.csv")

df = df[df["distance_mm"] > 0]
df = df[df['concentration'] > 0]
x = df["distance_mm"]
y = df["concentration"]

# Log-linear fit: ln(concentration) = a*distance + b
a, b = np.polyfit(x, np.log(y), 1)
print("A:", a, "B:", b)

proc_50 = (np.log(50)-b)/a
print("Distance where concentration is 50%:", proc_50)

df["ln_conc"] = np.log(df["concentration"]) # In numpy log is natural log
df.to_csv("./data/fluorescein/distance_concentration_map_fitted.csv", index=False)
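
To see what the fit recovers, the sketch below applies the same log-linear `np.polyfit` to synthetic, noise-free data generated from a known exponential. The amplitude of 80 and decay rate of 0.05 per mm are both invented for illustration.

```python
import numpy as np

# Synthetic data from a known curve: concentration = 80 * exp(-0.05 * distance)
x = np.linspace(1, 60, 60)
y = 80.0 * np.exp(-0.05 * x)

# A straight-line fit to ln(y) gives slope a and intercept b
a, b = np.polyfit(x, np.log(y), 1)
amplitude = np.exp(b)

# Solve 50 = amplitude * exp(a * d) for the distance d
dist_50 = (np.log(50) - b) / a
print(a, amplitude, dist_50)
```

With real data the scatter around the line matters, since a fit in log space weights low concentrations more heavily than a fit in linear space would.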