Streamlining DNA foci data wrangling

In [None]:
# Standard imports
import os  # os module used to rename files
import datetime as dt  # used to time the renaming step

import numpy as np
import pandas as pd
import scipy.stats as st
import cv2
import matplotlib.pyplot as plt
from matplotlib import pyplot
import seaborn as sns

# Render figures inline (set only one %matplotlib backend per notebook)
%matplotlib inline
sns.set_palette("muted")  # use seaborn's "muted" colour palette

Confirm list of files to be renamed.

In [None]:
# os.listdir(src) lists the contents of src; here src is the Foci folder
# containing the files to be renamed
files = os.listdir('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_0hr_102819')
files

Rename all files in the directory. When images are taken on the microscope, they are saved with the microscope's own identifying information in the file names. The following code gives each file in the directory its own identifier: a sequence number plus the condition, time point and experiment date. Meaningful tags on the image files make the later analysis easier, because those names end up in the final .csv file. This is done for each folder in the experiment, in this case 18 folders. Since I do not want to accidentally mislabel any images, I want to check that each one was labeled correctly, so I list the directory before and after renaming. This might seem cumbersome, but renaming them all manually could take at least half a day. I added a datetime stamp to show how long the renaming step takes. Experiments don't vary widely in their format, only in treatment, so I can copy the notebooks and keep them as "templates" for renaming the files. All I have to do is point to the right directory and perhaps change some labels. Easy!
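Because the renaming cell is copied for every folder, one option is to wrap it in a single function that previews the new names before touching anything, which keeps the before/after check while avoiding the copy-paste. This is a minimal sketch, not the notebook's method: `rename_images` is a hypothetical helper, it uses the same `"00" + number` naming scheme as the cells below, and it assumes every regular file in the folder is an image to rename.

```python
import os


def rename_images(folder, label, dry_run=True):
    """Rename every file in `folder` to '00<n>_<label><ext>' (hypothetical helper).

    Files are numbered 1, 2, 3, ... in sorted order of their original names.
    With dry_run=True nothing is renamed; the planned (old, new) pairs are
    only returned so they can be checked first.
    """
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))  # skip sub-folders
    )
    plan = []
    for count, old in enumerate(names, start=1):
        ext = os.path.splitext(old)[1]  # keep the original file extension
        new = "00" + str(count) + "_" + label + ext
        # Refuse to overwrite a different file that already has the target name
        if new != old and new in names:
            raise FileExistsError(f"{new} already exists in {folder}")
        plan.append((old, new))

    if not dry_run:
        for old, new in plan:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
    return plan
```

For example, `rename_images('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_30min_102819', 'NT_30min_102819')` would only return the planned names; passing `dry_run=False` as well would perform the renaming.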


In [None]:
# Time the renaming step
tic = dt.datetime.now()
print(tic)

# Python program to rename all file names in your directory
# https://www.geeksforgeeks.org/rename-all-file-names-in-your-directory-using-python/
os.chdir('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_0hr_102819')
print(os.getcwd())

# Number the files 1, 2, 3, ... to keep them in order. sorted() makes the
# numbering follow the original file names, because os.listdir() does not
# guarantee any particular order.
for count, f in enumerate(sorted(os.listdir()), start=1):
    f_ext = os.path.splitext(f)[1]  # keep the original file extension
    new_name = "00" + str(count) + "_NT_0hr_102819" + f_ext
    os.rename(f, new_name)

# Stop the timer once the whole loop has finished
toc = dt.datetime.now()
eltime = toc - tic
print('Elapsed time: {}'.format(eltime))

In [None]:
os.listdir()

In [None]:
files = os.listdir('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_30min_102819')
files

In [None]:

os.chdir('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_30min_102819')
print(os.getcwd())

# Same renaming step as for the 0 hr folder, for the 30 min time point.
# sorted() keeps the numbering in the order of the original file names.
for count, f in enumerate(sorted(os.listdir()), start=1):
    f_ext = os.path.splitext(f)[1]  # keep the original file extension
    new_name = "00" + str(count) + "_NT_30min_102819" + f_ext
    os.rename(f, new_name)


In [None]:
os.listdir()

In [None]:
files = os.listdir('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_1hr_102819')
files

In [None]:
os.chdir('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_1hr_102819')
print(os.getcwd())

# Same renaming step as for the 0 hr folder, for the 1 hr time point.
# sorted() keeps the numbering in the order of the original file names.
for count, f in enumerate(sorted(os.listdir()), start=1):
    f_ext = os.path.splitext(f)[1]  # keep the original file extension
    new_name = "00" + str(count) + "_NT_1hr_102819" + f_ext
    os.rename(f, new_name)


In [None]:
os.listdir()

The following code moves the source folder, with all of its files, to a new destination folder. I have not yet figured out how to move individual files, so this is still a work in progress! Ideally I want to move all the blue images (prefixes 001, 004, 007, 0010, 0013) into a folder called dapi, all the green images (002, 005, 008, 0011, 0014) into a folder called H2AX, and all the red images (003, 006, 009, 0012, 0015) into a folder called 53BP1. For now I am happy that the files moved at all: I can use the code to move them all into one folder and then sort each color by hand.
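The prefixes listed above follow a repeating blue, green, red cycle, so the channel can be computed from the image number with a remainder of 3, and `shutil.move` can then move one file at a time instead of the whole folder. This is a sketch under that assumption: it expects file names produced by the renaming cells (number, underscore, label), and `channel_for` and `sort_into_channels` are hypothetical helper names.

```python
import os
import shutil

# Assumed channel cycle from the text: 1, 4, 7, ... are blue (dapi),
# 2, 5, 8, ... are green (H2AX), 3, 6, 9, ... are red (53BP1).
CHANNELS = ("dapi", "H2AX", "53BP1")


def channel_for(filename):
    """Return the channel folder for a renamed file such as '0010_NT_0hr_102819.tif'."""
    number = int(filename.split("_")[0])  # '0010' -> 10
    return CHANNELS[(number - 1) % 3]


def sort_into_channels(source, dest_root):
    """Move each file in `source` into dest_root/dapi, dest_root/H2AX or dest_root/53BP1.

    Returns a dict mapping each file name to its new path.
    """
    moved = {}
    for name in sorted(os.listdir(source)):
        src_path = os.path.join(source, name)
        if not os.path.isfile(src_path):
            continue  # skip sub-folders
        channel_dir = os.path.join(dest_root, channel_for(name))
        os.makedirs(channel_dir, exist_ok=True)  # create dapi/H2AX/53BP1 if needed
        moved[name] = shutil.move(src_path, os.path.join(channel_dir, name))
    return moved
```

Calling `sort_into_channels('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_0hr_102819', 'C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci')` would fill Foci/dapi, Foci/H2AX and Foci/53BP1. Because the renamed files carry the condition and time point in their names, images from different folders should not collide in the shared channel folders.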

In [None]:
# importing shutil module
import shutil

# Source path: the folder of renamed images
source = 'C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/NT_0hr_102819'

# Destination path. In this example the destination folder does not exist yet,
# so shutil.move() moves the whole source folder and renames it to dapi.
# (If dapi already existed, the source folder would be moved inside it instead.)
destination = 'C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Foci/dapi'

# Move the source folder (and everything in it) to the destination
dest = shutil.move(source, destination)

# Print the path of the newly created folder dapi
print("Destination path:", dest)

After image processing (identifying objects, i.e. nuclei, in the dapi images and counting the foci in the green (H2AX) and red (53BP1) channels), the data still need to be analyzed. The output from JQuantPro can be saved in Excel as a .csv file.
Here is what the file looks like.

In [None]:
df = pd.read_csv('C:/Users/tximeleta/Desktop/coding/BIOS6644/Data/Final_project_dataset_102819exp.csv', sep=',')
df.head()


In [None]:
#change column labels to something easier to call
df.columns = ['label','condition',  'IR_hours', 'file', 'parameters', 'N_of_cells', 'Avg_cell_area', 'min_Foci_N', 'max_foci_N', 'avg_foci_N', 'StDev_Foci_N']
df.head()


Let's compute some descriptive statistics on the replicates, grouped by condition and hours post IR.

In [None]:
grouped_data = df.groupby(['condition', 'IR_hours'])
df_desc = grouped_data['avg_foci_N'].describe()
df_desc

I would like to plot the above data as a grouped bar graph with standard deviation error bars; however, I could not find code to plot the descriptive summary table above. So let's try something else.
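One way to get that grouped bar graph is to reshape the `mean` and `std` columns of `df_desc` with `unstack()`, so that each condition becomes a column, and then let pandas draw the bars with `yerr`. This is a minimal sketch, assuming `df_desc` keeps the (`condition`, `IR_hours`) index produced by the `groupby` above; `plot_grouped_bars` is a hypothetical helper name, and the error bars show +/- 1 SD of the per-image averages.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_grouped_bars(desc, ax=None):
    """Grouped bar graph of the 'mean' column of a groupby(...).describe() table,
    with +/- 1 standard deviation (the 'std' column) as error bars.

    Expects `desc` to be indexed by ('condition', 'IR_hours'), like df_desc.
    """
    means = desc["mean"].unstack("condition")  # rows: IR_hours, columns: condition
    sds = desc["std"].unstack("condition")     # same shape, used for error bars
    ax = means.plot.bar(yerr=sds, capsize=4, rot=0, ax=ax)
    ax.set_xlabel("Hr post IR (1Gy)", fontweight="bold")
    ax.set_ylabel("Average Foci per cell", fontweight="bold")
    return ax
```

In the notebook this would be called as `plot_grouped_bars(df_desc)`, giving one group of bars per time point with one bar per condition.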

In [None]:
#Let's pull out just the control condition numbers and plot just to make sure the experiment worked.
df_control= df[0:30]
df_control.head()

In [None]:
#https://python-graph-gallery.com/104-seaborn-themes/
# Set the style before creating the figure so it applies to these axes
sns.set_style("dark")
fig, ax = pyplot.subplots(figsize=(15, 7))
sns.boxplot(x=df_control['IR_hours'], y=df_control['avg_foci_N'], ax=ax)
ax.set_title("102819: H2AX foci count in control cells", fontweight='bold')
ax.set_ylabel('Average Foci per cell', fontweight='bold')
ax.set_xlabel('Hr post IR (1Gy)', fontweight='bold')

That looks great, but what we are really testing is how the dasatinib- and imatinib-treated cells differ, so I usually plot the data as a grouped bar graph.
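As an alternative to slicing rows 0-30, 30-60 and 60-90 and drawing three separate figures, seaborn's `boxplot` can split each time point by the `condition` column with `hue=`, which puts the control and the two drug treatments side by side in one figure. This is a sketch, assuming the `condition` column holds one label per treatment; `plot_conditions` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_conditions(data, ax=None):
    """Box plots of avg_foci_N per IR time point, with one box per condition (hue)."""
    sns.set_style("dark")  # set before creating the axes so the style applies
    if ax is None:
        fig, ax = plt.subplots(figsize=(15, 7))
    sns.boxplot(data=data, x="IR_hours", y="avg_foci_N", hue="condition", ax=ax)
    ax.set_title("102819: H2AX foci count by condition", fontweight="bold")
    ax.set_ylabel("Average Foci per cell", fontweight="bold")
    ax.set_xlabel("Hr post IR (1Gy)", fontweight="bold")
    return ax
```

Calling `plot_conditions(df)` would use the full table, so it does not depend on the rows of each condition staying in a fixed position in the .csv file.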

In [None]:
#Let's pull out just the dasatinib condition numbers and plot just to make sure the experiment worked.
df_dasatinib= df[30:60]
df_dasatinib.head()

In [None]:
#https://python-graph-gallery.com/104-seaborn-themes/
# Set the style before creating the figure so it applies to these axes
sns.set_style("dark")
fig, ax = pyplot.subplots(figsize=(15, 7))
sns.boxplot(x=df_dasatinib['IR_hours'], y=df_dasatinib['avg_foci_N'], ax=ax)
ax.set_title("102819: H2AX foci count in dasatinib cells", fontweight='bold')
ax.set_ylabel('Average Foci per cell', fontweight='bold')
ax.set_xlabel('Hr post IR (1Gy)', fontweight='bold')

In [None]:
# Let's pull out just the imatinib condition numbers and plot them.
df_imatinib= df[60:90]
df_imatinib.head()

In [None]:
#https://python-graph-gallery.com/104-seaborn-themes/
# Set the style before creating the figure so it applies to these axes
sns.set_style("dark")
fig, ax = pyplot.subplots(figsize=(15, 7))
sns.boxplot(x=df_imatinib['IR_hours'], y=df_imatinib['avg_foci_N'], ax=ax)
ax.set_title("102819: H2AX foci count in imatinib cells", fontweight='bold')
ax.set_ylabel('Average Foci per cell', fontweight='bold')
ax.set_xlabel('Hr post IR (1Gy)', fontweight='bold')
