# HCR Thresholding Notebook 

<br>

# Index of Data Structures in Notebook
   ### Does not necessarily include every object created, only the useful ones
    
    
## DataFrames
#### intDF
    Derived from the full cp output DataFrame (intensityDF)
    Contains the relevant and essential measurements for entire data set 

#### manualDFs (within manual_dict)
    manual counts by image, probe, and counter
    
#### IIIDF
    dataframe determining the integrated intensity cutoff on an image x counter x probe basis 

#### filteredDF
    contains the number of positive cells for each image based on the median integrated intensity threshold value for a given thresholding method (NumD6 or NumD8)

## Dictionaries

#### projdict
    Dictionary of Mouse IDs (Keys) and associated Projection Types (Values)    
    
#### manual_dict
    dictionary of dataframes of manual counts. Keys indicate which probe and which "stage", i.e. testing or training, 
    the counts are used for 

#### man_image_dict
    dictionary of lists of the specific images in each manual_dict dataframe. Keys are the same as in manual_dict, with "_images" appended
    
#### III_mean_dict
    Dictionary of probe x thresholding methods (Keys) and their associated mean III (Values)

#### III_med_dict
    Dictionary of probe x thresholding methods (Keys) and their associated median III (Values)
    
## Outputted Thresholded CSVs

#### First group of DFs include just probe 1 positive, probe 2 positive, and double positive. Double positives are counted in probe 1 and 2 positive (and are therefore counted twice).

#### Chosen_IIIDF
	Filters data after threshold is chosen to only include information relevant to NumD6 or NumD8. Shows chosen threshold cells that are positive for one probe or double positive 
	
#### Animal_IIIDF
	Same data included in Chosen IIIDF, but grouped by animal ID with summed total GFP, probe positive, and proportion positive for each of: probe1, probe2, double positive

#### APbin_DF
	Same data included in Chosen IIIDF, but groups overlapping AP bins(9) for each animal / injection / probe. Shows total GFP, probe positive, and proportion positive per bin.


#### DF2s have the same data as previous DF, but double positive cells are removed from the probe 1 positive and probe 2 positive counts. Double negative cells are also added to the counts.

#### Chosen_IIIDF2
	Filters data from Chosen IIIDF to remove double positive cells from probe 1 and probe 2 positive counts. Adds in double negative count. 
	
#### Animal_IIIDF2
	Same data included in Chosen IIIDF, but grouped by animal ID with summed total GFP, probe positive, and proportion positive for each of: probe1, probe2, double positive as well as proportion negative (still under proportion positive column).

#### APbin_DF2
	Same data included in Chosen IIIDF, but groups overlapping AP bins(9) for each animal / injection / probe. Shows total GFP, probe positive, and proportion positive/negative per bin.
____________________________

# Flow

1. The imageDF is used to create numnamedict so that image filenames can be added to the Cell Level DF from CellProfiler

2. The intensityDF is the full Cell Level DF from CellProfiler.

3. tempimageDF is generated and holds image information in different columns ['FileName_RAW', 'Projection Type', 'Mouse ID'] 

4. The intDF is a cell level DF that inherits the identifying information (projectiontype, mouse, filename) from tempimageDF and the relevant measurements (area normalized and non-normalized) for each cell in each image from the intensityDF

5. The cellDF inherits everything from intDF, but renames several columns (Animal ID, Area, probe1_II_NumD6, etc.)

6. Chosen_IIIDF inherits data about II from III_med_dict and image info from cellDF

7. APbin_DF and Animal_IIIDF inherit and reorganize all data from Chosen_IIIDF
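
Steps 1 and 2 of the flow can be illustrated on toy data. The sketch below is not the notebook's own cell: the two small tables and their filenames are made up, and `Series.map` is used in place of the row-by-row loop that appears later in the notebook.

```python
import pandas as pd

# Toy stand-ins for the CellProfiler image-level and cell-level tables.
imageDF = pd.DataFrame({
    "ImageNumber": [1, 2],
    "FileName_RAW": ["img_a.tif", "img_b.tif"],  # hypothetical filenames
})
intensityDF = pd.DataFrame({
    "ImageNumber": [1, 1, 2],
    "AreaShape_Area": [100.0, 200.0, 50.0],
})

# Step 1: build the ImageNumber -> filename lookup.
numnamedict = dict(zip(imageDF["ImageNumber"], imageDF["FileName_RAW"]))

# Step 2: attach the filename to every cell row in one vectorized call.
intensityDF["FileName_RAW"] = intensityDF["ImageNumber"].map(numnamedict)

print(intensityDF["FileName_RAW"].tolist())
```

Any `ImageNumber` missing from the lookup would come back as NaN rather than raising, so a quick `isna().any()` check after the map is a cheap safeguard.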



# Importing Packages and Data 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import scipy.stats as stats
import statsmodels.api as sm
import os
from zipfile import ZipFile
import tabulate as tabulate
from pandas.plotting import table
#import dataframe_image as dfi

## Assign Probe IDs

In [None]:
Probe1 = "Slc30a3"
Probe2 = "Spp1"

Person = "LM"

## Define Probe Color Dict

In [None]:
Double = Probe1+Probe2
Neg = "Double Negative"

probecolordict = {'Kcng2': 'magenta',
                  'Spp1' : 'cyan',
                  'Pax6' : 'cyan',
                  'Nos1' : 'red',
                  'Slc30a3' : 'magenta',
                  'Slc30a3Spp1': 'darkblue',
                  'Nos1Spp1': 'darkblue',
                  'Kcng2Pax6': 'darkblue',
                  'Double Negative': 'gray'}

injcolordict = {'SC': 'limegreen',
                'PRN': 'darkviolet',
                'MRN': 'limegreen',
                'Gi': 'darkviolet',
                'dmPAG': 'orange',
                'POm': 'skyblue'}

#for barplots in combo bar-swarm plots
probecolordict2 = {'Kcng2': 'magenta',
                  'Spp1' : 'cyan',
                  'Pax6' : 'cyan',
                  'Nos1' : 'red',
                  'Slc30a3' : 'magenta',
                  'Slc30a3Spp1': 'steelblue',
                  'Nos1Spp1': 'steelblue',
                  'Kcng2Pax6': 'steelblue',
                  'Double Negative': 'lightgray'}

## Mouse ID to Projection Target Dictionary (keys = Mouse ID, values = Projection Target)

In [None]:
projdict = {'4-5-4f': 'SC' ,
            '4-5-5': 'SC', 
            '7309-5': 'dmPAG',
            'B1-4a-1R': 'dmPAG',
            '7376-1': 'PRN', 
            '7376-2': 'PRN',
            '1907-2': 'PRN',
            '1907-3': 'PRN',
            'WT1-5b-1R': 'PRN',
            'WT1-5b-none': 'PRN',
            '1907-4': 'SC',
            '1974-8': 'SC',
            'WT1-5a-none': 'SC',
            'WT2-9-1R': 'SC',
            ##New cases imaged on FV3000 below:
            'JAX-10F-1R': 'dmPAG',
            'JAX-10F-NE': 'dmPAG',
            'JAX-14F-1R': 'dmPAG',
            'JAX-MM-NE': 'dmPAG',
            'JAX-12F-1L': 'Gi',
            'JAX-12F-1R': 'Gi',
            'JAX-14M-1R': 'Gi',
            'JAX-15M-1L': 'Gi',
            'JAX-12M-1L': 'MRN',
            'JAX-12M-1R': 'MRN',
            'JAX-12M-NE': 'MRN',
            'JAX-14M-1L': 'MRN',
            'JAX-10M-1R': 'POm',
            'JAX-10M-NE': 'POm',
            'JAX-13F-1R': 'POm',
            'JAX-13M-1R': 'POm'}

#JAX-15M-1R is the same as 13M-1R

## Provide filepath to Anterior-Posterior Position File to Create an Image x AP Dictionary

In [None]:
if Probe1 == "Slc30a3" and Person == "LM":
    AP_DF = pd.read_excel("/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/AP_positions_SlcSpp1_UpdatedNames_LM.xlsx")
elif Probe1 == "Slc30a3" and Person == "NR":
    AP_DF = pd.read_excel("/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Slc30a3Cy5_Spp1Cy3_RvGFP/AP_positions_Slc30a3Spp1_LMNameEdit.xlsx")
elif Probe1 == "Kcng2":
    AP_DF = pd.read_excel("/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Kcng2Cy5_Pax6Cy3_RvGFP/AP_positions_Kcng2Pax6_new.xlsx")

AP_dict = dict(zip(AP_DF["FileName_RAW"].to_list(),AP_DF["AP_Position"].to_list()))

## Provide CellProfiler Output Filepaths

In [None]:
if Probe1 == "Kcng2":
    CP_Cell_level_filepath= '/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Kcng2_Pax6/CellProfilerOutput/Kcng2Pax6_10-17-23FilteredAllRV.csv'
elif Probe1 == "Slc30a3" and Person == "LM":
    CP_Cell_level_filepath = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/CellProfilerOutput/2024_01_24/Slc30a3Spp1_2024_01_24FilteredAllRV.csv"
elif Probe1 == "Slc30a3" and Person == "NR":
    CP_Cell_level_filepath = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Slc30a3Cy5_Spp1Cy3_RvGFP/CellProfilerOutput/12-8-22/Slc30a3_Spp1_12-8-22FilteredAllGFP.csv"
elif Probe1 == "Nos1":
    CP_Cell_level_filepath = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Nos1Cy5_Spp1Cy3_RvGFP/CellProfilerOutput/1-12-23/higher_NumDs/Nos1_Spp1_1-12-23_higher_NumDsFilteredAllGFP.csv"

if CP_Cell_level_filepath.find("FilteredAllRV") != -1:
    CP_Image_level_filepath = CP_Cell_level_filepath[0:CP_Cell_level_filepath.find("FilteredAllRV.csv")] +"Image.csv"
elif CP_Cell_level_filepath.find("FilteredAllGFP") != -1:
    CP_Image_level_filepath = CP_Cell_level_filepath[0:CP_Cell_level_filepath.find("FilteredAllGFP.csv")] +"Image.csv"
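
The image-level path is derived from the cell-level path by swapping a known suffix for `Image.csv`. A minimal sketch of the same idea as a helper follows; the function name `image_level_path` and the example path are hypothetical, and unlike the cell above it raises an error when neither suffix is present instead of leaving the variable undefined.

```python
def image_level_path(cell_level_path: str) -> str:
    """Swap a recognized cell-level suffix for 'Image.csv'."""
    for suffix in ("FilteredAllRV.csv", "FilteredAllGFP.csv"):
        if cell_level_path.endswith(suffix):
            return cell_level_path[: -len(suffix)] + "Image.csv"
    raise ValueError(f"Unrecognized cell-level filename: {cell_level_path}")


example = "/data/run1/Probes_2024FilteredAllRV.csv"  # hypothetical path
print(image_level_path(example))
```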

## Specify Path For Saving Thresholding Related Figures

In [None]:
image_save_path = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/Python_Analysis/figures/test_sliding_Scale"

## Specify Path For Saving CSV Output 
(Which Will Serve As Input to The Plotting Notebook)

In [None]:
csv_save_path = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/Python_Analysis/Thresholded_CSVS"

## Provide Filepaths for Manual Counts

In [None]:
#get list of files 

if Probe1 == "Kcng2" and Person == "LM":
    Probe1_train = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Kcng2_Pax6/ManualCounts/Kcng2_manual"
    Probe1_test = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Kcng2_Pax6/ManualCounts/Kcng2_manual/Testing"
    Probe2_train = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Kcng2_Pax6/ManualCounts/Pax6_manual"
    Probe2_test = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Kcng2_Pax6/ManualCounts/Pax6_manual/Testing"
elif Probe1 == "Kcng2" and Person == "NR":
    Probe1_train = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Kcng2Cy5_Pax6Cy3_RvGFP/RyanNita_CellPoseManualCounts/ManualKcng2ROIs"
    Probe1_test = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Kcng2Cy5_Pax6Cy3_RvGFP/RyanNita_CellPoseManualCounts/ManualKcng2ROIs/Testing_Set"
    Probe2_train = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Kcng2Cy5_Pax6Cy3_RvGFP/RyanNita_CellPoseManualCounts/ManualPax6ROIs/wThresholding"
    Probe2_test = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Kcng2Cy5_Pax6Cy3_RvGFP/RyanNita_CellPoseManualCounts/ManualPax6ROIs/wThresholding/Testing_Set"
elif Probe1== "Slc30a3" and Person == "LM":
    Probe1_train = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/ManualCounts/Slc30a3_manual"
    Probe1_test = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/ManualCounts/Slc30a3_manual/Testing"
    Probe2_train ="/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/ManualCounts/Spp1_manual"
    Probe2_test = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Slc30a3_Spp1/ManualCounts/Spp1_manual/Testing"
elif Probe1== "Slc30a3" and Person == "NR":
    Probe1_train = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Slc30a3Cy5_Spp1Cy3_RvGFP/Manual_Counts/ManualSlc30a3ROIs/Training_Set"
    Probe1_test = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Slc30a3Cy5_Spp1Cy3_RvGFP/Manual_Counts/ManualSlc30a3ROIs/Testing_Set"
    Probe2_train ="/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Slc30a3Cy5_Spp1Cy3_RvGFP/Manual_Counts/ManualSpp1ROIs/Training_Set"
    Probe2_test = "/mnt/share/RKDATA/ConfocalData/SDC/HCR/RV-dG-GFP_40X_TrueBlack/Data/Slc30a3Cy5_Spp1Cy3_RvGFP/Manual_Counts/ManualSpp1ROIs/Testing_Set"
elif Probe1 == "Nos1":
    Probe1_train = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Nos1_Spp1/ManualCounts/Nos1_manual"
    Probe1_test = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Nos1_Spp1/ManualCounts/Nos1_manual/Testing"
    Probe2_train ="/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Nos1_Spp1/ManualCounts/Spp1_manual"
    Probe2_test = "/mnt/share/RKDATA/ConfocalData/FV3000/HCR/HCR_FINAL/Analysis/Nos1_Spp1/ManualCounts/Spp1_manual/Testing"

file_path_list = [Probe1_train, Probe2_train, Probe1_test, Probe2_test]

## Option to use cell area as a normalizing factor

In [None]:
#We ARE using area normalization as of 102323
normalize = True

if normalize == True:
    norm_save = "Area_normalized"
elif normalize == False:
    norm_save= "not_normalized" 

## Create DataFrames from the Image Level and Cell Level .csv output from Cell Profiler

In [None]:
intensityDF = pd.read_csv(CP_Cell_level_filepath)
imageDF = pd.read_csv(CP_Image_level_filepath)
numnamedict = dict(zip(imageDF['ImageNumber'].tolist(), imageDF['FileName_RAW'].tolist()))

filelist = []
for i in range(len(intensityDF)):
    filelist.append(numnamedict[intensityDF.iloc[i]['ImageNumber']])

intensityDF['FileName_RAW'] = filelist

#copy the two identifying columns directly instead of assigning into an empty DataFrame
tempimageDF = imageDF[['ImageNumber','FileName_RAW']].copy()

In [None]:
tempimageDF

In [None]:
imageDF

# Reformat FV3000 filenames to match the requirements of SDC processing scripts

In [None]:
FV3000_SDC_dict = dict(zip(tempimageDF['FileName_RAW'].tolist(),tempimageDF['FileName_RAW'].tolist()))

In [None]:
FV3000_SDC_dict

In [None]:
#Extract unique/useful part of the FileName_RAW strings
imagelist = []

for i in range(len(tempimageDF)):
    name = (tempimageDF.iloc[i]['FileName_RAW'])
    imagelist.append(name)


In [None]:
tempimageDF.loc[:,'FileName_RAW'] =  imagelist

In [None]:
#Add the Mouse ID associated with each image to tempimageDF (it is carried over to intDF later)
mouselist = []

for i in range(len(imagelist)):
    x = imagelist[i]
    y = x[4: x.find('-S')]
    mouselist.append(y)
    
tempimageDF.insert(0, 'Mouse ID', mouselist)

In [None]:
#Use the projdict and the MouseID portion of the FileName to create a Projection Type Column
projectlist = []

for i in range(len(tempimageDF)):
    if tempimageDF.loc[i,'Mouse ID'] == '':
        projectlist.append('NaN')
    else:
        projection = projdict[tempimageDF.loc[i,'Mouse ID']]
        projectlist.append(projection)

tempimageDF.insert(0,'Projection Type',projectlist)
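
The two cells above slice the Mouse ID out of each filename (characters after a 4-character prefix, up to the first `-S`) and then look it up in `projdict`. The sketch below shows that parsing on a single made-up filename. The helper name, the filename, and the one-entry `projdict_demo` are assumptions for illustration; the real filename layout is whatever the notebook's data uses.

```python
def mouse_id_from_filename(name: str) -> str:
    """Return the text between a 4-character prefix and the first '-S'."""
    end = name.find("-S")
    if end == -1:
        # No slice marker found: return an empty ID rather than a bad slice.
        return ""
    return name[4:end]


projdict_demo = {"JAX-10F-1R": "dmPAG"}  # one entry copied from projdict

name = "HCR_JAX-10F-1R-S2_001.tif"  # hypothetical filename
mouse = mouse_id_from_filename(name)
projection = projdict_demo.get(mouse, "NaN")
print(mouse, projection)
```

Using `dict.get` with a default is one way to avoid a `KeyError` when a filename carries an ID that was never added to `projdict`; whether silently labeling it `'NaN'` is preferable to failing loudly depends on how the downstream cells treat that value.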

In [None]:
#Trim the data down to the columns related to the  Intensity and Area
cols = [x for x in intensityDF.columns.tolist() if ((x.find('Integrated') != -1) & 
                                                    (x.find('Edge') == -1)) or 
                                                    (x.find('Area') != -1)]

cols.append('ImageNumber')

In [None]:
#copy the selected measurement columns directly instead of assigning into an empty DataFrame
intDF = intensityDF[cols].copy()

FileNameList = []
ProjList = []
MouseList = []

for i in range(len(intDF)):
    img = intDF.loc[i,'ImageNumber']
    FileNameList.append(tempimageDF[tempimageDF['ImageNumber'] == img]['FileName_RAW'].iloc[0])
    ProjList.append(tempimageDF[tempimageDF['ImageNumber'] == img]['Projection Type'].iloc[0])
    MouseList.append(tempimageDF[tempimageDF['ImageNumber'] == img]['Mouse ID'].iloc[0])

intDF.insert(0, 'Projection Type', ProjList)
intDF.insert(1,'Mouse ID', MouseList)
intDF.insert(2, 'FileName_RAW', FileNameList)

keep_col = ["Projection Type","Mouse ID","FileName_RAW","AreaShape_Area",
             "Intensity_IntegratedIntensity_Final_Threshold_C1_NumD6",
             'Intensity_IntegratedIntensity_Final_Threshold_C1_NumD8',
             'Intensity_IntegratedIntensity_Final_Threshold_C2_NumD6',
             'Intensity_IntegratedIntensity_Final_Threshold_C2_NumD8']
drop_col = []
for col in intDF.columns:
    if col in keep_col:
        pass
    else:
        drop_col.append(col)

intDF.drop(drop_col, axis = 1, inplace= True)

In [None]:
intDF
intDF.loc[:,Probe1+"_NumD6_norm"] = intDF["Intensity_IntegratedIntensity_Final_Threshold_C1_NumD6"]/intDF["AreaShape_Area"]
intDF.loc[:, Probe1+"_NumD8_norm"] = intDF["Intensity_IntegratedIntensity_Final_Threshold_C1_NumD8"]/intDF["AreaShape_Area"]
intDF.loc[:, Probe2+"_NumD6_norm"] = intDF["Intensity_IntegratedIntensity_Final_Threshold_C2_NumD6"]/intDF["AreaShape_Area"]
intDF.loc[:, Probe2+"_NumD8_norm"] = intDF["Intensity_IntegratedIntensity_Final_Threshold_C2_NumD8"]/intDF["AreaShape_Area"]
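
Each `_norm` column is simply the integrated intensity divided by the cell's area. A toy illustration with two made-up cells (the values are not from the data set):

```python
import pandas as pd

col = "Intensity_IntegratedIntensity_Final_Threshold_C1_NumD6"
toy = pd.DataFrame({
    "AreaShape_Area": [100.0, 50.0],  # made-up areas in pixels
    col: [2.0, 0.5],                  # made-up integrated intensities
})

probe = "Slc30a3"
# Integrated intensity per unit area for each cell.
toy[probe + "_NumD6_norm"] = toy[col] / toy["AreaShape_Area"]
print(toy[probe + "_NumD6_norm"].tolist())
```

A cell with an area of zero would produce `inf` or NaN here, so it may be worth confirming that the CellProfiler filtering step never emits such rows.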

In [None]:
intDF

In [None]:
FV3000_SDC_AP_dict = {}

for key in AP_dict.keys():
    if type(key) == str:
        print(key)
        
        if key in FV3000_SDC_dict.keys():
            FV3000_SDC_AP_dict[FV3000_SDC_dict[key]] = AP_dict[key]

In [None]:
cellDF = intDF.rename(columns = {"Intensity_IntegratedIntensity_Final_Threshold_C1_NumD6" : Probe1+"_II_NumD6",
                                "Intensity_IntegratedIntensity_Final_Threshold_C1_NumD8": Probe1+"_II_NumD8",
                                "Intensity_IntegratedIntensity_Final_Threshold_C2_NumD6": Probe2+"_II_NumD6",
                                "Intensity_IntegratedIntensity_Final_Threshold_C2_NumD8": Probe2+"_II_NumD8",
                                "Mouse ID": "Animal ID",
                                "AreaShape_Area":"Area"})
cellDF.name = "cellDF"

#adding in bregma position of images

AP_list = []

for i in range(len(intDF)):
    AP_list.append(FV3000_SDC_AP_dict[intDF.iloc[i]["FileName_RAW"]])
    
intDF["AP_position"] = AP_list


# Import Manual Counts from ZIP files

In [None]:
print(file_path_list)

In [None]:
#getting manual counts from # of ROIs in ZIP files for each image, and keeping the probes in separate DFs

manual_df_list = []


for file_path in file_path_list:
    counter_list = []
    image_list= []
    probe_list = []
    count_list = []
    mouse_list = []
    projection_list = []
    
    filelist = os.listdir(file_path)
    for file in filelist:
        key = file
        
        if Person == "LM":
            probe = file_path[file_path.find('ManualCounts/')+13:file_path.find('_manual')]
        elif Person == "NR":
            probe = file_path[file_path.find('Counts/Manual')+13:file_path.find('ROIs')]
        
              
        temp_name = ""    
        if file.find("-Kcng2Counts") != -1:
            temp_name = file[0:-19]
        elif file.find("-Kcng2counts") != -1:
            temp_name = file[0:-19]
        elif file.find("-Pax6Counts") != -1:
            temp_name = file[0:-18]
        elif file.find("-Pax6counts") != -1:
            temp_name = file[0:-18]
        elif file.find("-Slc30a3Counts") != -1:
            temp_name = file[0:-21]
        elif file.find("-Spp1counts") != -1:
            temp_name = file[0:-18]
        elif file.find("-Nos1counts") != -1:
            temp_name = file[0:-18]
        
        image = temp_name
        
        if file.find("-Kcng2Counts") != -1:
            image = file[0:-19]
        elif file.find("-Kcng2counts") != -1:
            image = file[0:-19]
        elif file_path.find("ManualKcng2") != -1:
            image = file[0:-7]
        elif file.find("-Pax6Counts") != -1:
            image = file[0:-18]
        elif file.find("-Pax6counts") != -1:
            image = file[0:-18]
        elif file_path.find("ManualPax6") != -1:
            image = file[0:-7]
        elif file.find("-Slc30a3Counts") != -1:
            image = file[0:-21]
        elif file.find("_Slc30a3_") != -1:
            image = file[0:-15]
        elif file.find("-Spp1Counts") != -1:
            image = file[0:-18]
        elif file.find("_Spp1_") != -1:
            image = file[0:-12]
        elif file.find("-Nos1Counts") != -1:
            image = file[0:-18]
        elif file.find("_Nos1_") != -1:
            image = file[0:-12]
        
        if file.find(".zip") != -1:
            final_path = file_path+"/"+file
            with ZipFile(final_path) as archive:
                count = len(archive.infolist())
            #counter = file[file.find("ounts-") +6 : file.find(".zip")] #use for NR data
            counter = file[-6 : file.find(".zip")] #use for LM data
            mouse = key[4:key.find("-S")] 
            image_list.append(image + ".tif")
            probe_list.append(probe)
            counter_list.append(counter)
            count_list.append(count)
            mouse_list.append(mouse)
            projection_list.append(projdict[mouse])

    manualDF = pd.DataFrame()
    manualDF["Probe"] = probe_list
    manualDF["Number of Std Devs"] = 0
    manualDF["Projection Type"] = projection_list
    manualDF["Count_Method"] = counter_list
    manualDF["Mouse ID"] = mouse_list
    manualDF["Count_GFPcells"] = 0
    manualDF["ImageNumber"] = 0
    manualDF["FileName_RAW"] = image_list
    manualDF["Number Probe Positive"] = count_list
    
    manual_df_list.append(manualDF)

manual_dict = dict(zip(["Probe1_train", "Probe2_train", "Probe1_test","Probe2_test"],manual_df_list))
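
The manual count for an image is taken to be the number of entries in its ROI ZIP file. The sketch below demonstrates that counting step on an in-memory archive so it runs without any files on disk; the entry names are placeholders, and skipping directory entries is an extra precaution that the cell above does not take.

```python
import io
from zipfile import ZipFile

# Build a small in-memory archive with three placeholder ROI entries.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    for i in range(3):
        archive.writestr(f"roi_{i}.roi", b"")

# Reopen it for reading and count the entries, as the notebook does per image.
buffer.seek(0)
with ZipFile(buffer) as archive:
    # Count only real files, ignoring any directory entries.
    count = sum(1 for info in archive.infolist() if not info.is_dir())

print(count)
```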

In [None]:
#inspect manual_dict to ensure the data is correct

manual_dict['Probe2_train']

In [None]:
#getting total number of GFP cells to add into manualDFs

for manualDF in manual_dict.values():
    
    gfp_list = []
    for image in manualDF["FileName_RAW"]:
        tempDF = cellDF[cellDF["FileName_RAW"] == image]
        gfp = len(tempDF)
        gfp_list.append(gfp)
    

    manualDF["Count_GFPcells"] = gfp_list

In [None]:
manualDF

In [None]:

#Make a list of the images in each manualDF
man_image_list = []

for manualDF in manual_dict.values():
    man_images = manualDF["FileName_RAW"].unique().tolist()
    man_image_list.append(man_images)
    
    
#check that all of the manually counted images were run in the pipeline
#(imagelist holds every FileName_RAW from the CellProfiler image table)
for man_images in man_image_list:
    for x in man_images:
        if x not in imagelist:
            print(x)

man_image_dict = dict(zip(["Probe1_train_images", "Probe2_train_images", "Probe1_test_images", "Probe2_test_images"],
                         man_image_list))



In [None]:
manualDF

In [None]:
#RK Description of this cell on 1-15-22
#Identify the IntegratedIntensity Threshold (III, integrated intensity intersection)
#that would produce the same number of cells as the manual counts for each thresholding method, image, and manual counter combination

IIIDF_list= []

if normalize == True:
    key_phrase= "_norm"
elif normalize == False:
    key_phrase = '_II_'

#only going through the training manual counts to determine the integrated intensity intersection

for key in ["Probe1_train", "Probe2_train"]:
    tempIIIDF = pd.DataFrame()
    imagelist = []
    probelist = []
    methodlist = []
    counterlist = []
    intersectionlist = []
    
    manualDF = manual_dict[key]
    man_images = man_image_dict[key+"_images"]
    probe = manualDF["Probe"][0]
   
    for m in cellDF.columns:
        proceed = 0
        if m.find(key_phrase) != -1:
           
            if m.find(Probe1) != -1 and probe == Probe1:
                proceed = 1
            elif m.find(Probe2) != -1 and probe == Probe2 :
                proceed = 1
        
        if proceed >= 1:
            for person in manualDF["Count_Method"].unique():
                for image in manualDF[manualDF["Count_Method"] == person]["FileName_RAW"]:
                    manny = manualDF[(manualDF['FileName_RAW'] == image)&(manualDF["Count_Method"] == person)]
                    val = manny['Number Probe Positive'].values[0] ###LM_edit: I switched .loc to values 
                    x = cellDF[cellDF['FileName_RAW'] == image]
                    y = x.sort_values(by = m, ascending = False).reset_index()
                    III = (float(y.iloc[val][m]) + float(y.iloc[val-1][m])) / 2

                    imagelist.append(image)
                    probelist.append(probe)
                    methodlist.append(m)
                    counterlist.append(person)
                    intersectionlist.append(III)
                    
                    




    tempIIIDF['Image'] = imagelist
    tempIIIDF['Probe'] = probelist
    tempIIIDF['Method'] = methodlist
    tempIIIDF['Counter'] = counterlist
    tempIIIDF['Intersection'] = intersectionlist
    IIIDF_list.append(tempIIIDF)

#agDF= pd.concat(ag)
#agDF

#combining the training data from the two probe combos into one DF at this point, 
#though easy enough to change the following code to have separate DFs for the different probes
IIIDF = pd.concat(IIIDF_list)
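
The intersection for one image is the midpoint between the intensity of the last cell that would be counted positive and the first that would not, after sorting cells from brightest to dimmest. A minimal pure-Python sketch of that rule follows. The function name and toy values are hypothetical, and the bounds check is an addition: with the indexing used above, a manual count of 0 would wrap around to the dimmest cell and a count equal to the number of cells would raise an `IndexError`.

```python
def intensity_intersection(values, manual_count):
    """Midpoint between the last 'positive' and first 'negative' cell."""
    ranked = sorted(values, reverse=True)
    if not 0 < manual_count < len(ranked):
        raise ValueError("manual_count must be between 1 and len(values) - 1")
    return (ranked[manual_count] + ranked[manual_count - 1]) / 2


cell_values = [9.0, 1.0, 5.0, 3.0]  # made-up per-cell intensities
threshold = intensity_intersection(cell_values, manual_count=2)

# Applying the threshold recovers the manual count on this image.
positives = sum(v >= threshold for v in cell_values)
print(threshold, positives)
```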

In [None]:
#Generate dictionaries of thresholding method and the corresponding integrated intensity intersection mean or 
#integrated intensity intersection median

IIImean = IIIDF.groupby(['Method', "Image"]).mean(numeric_only=True).groupby(["Method"]).mean(numeric_only=True).reset_index()
IIImed = IIIDF.groupby(['Method', "Image"]).mean(numeric_only=True).groupby(["Method"]).median(numeric_only=True).reset_index()

probes_mean = []
probes_med = []

for m in range(len(IIImean)):
    x = (IIImean.iloc[m]["Method"], IIImean.iloc[m]["Intersection"])
    probes_mean.append(x)
    
for m in range(len(IIImed)):
    y = (IIImed.iloc[m]["Method"], IIImed.iloc[m]["Intersection"])
    probes_med.append(y)
    
III_mean_dict = dict(probes_mean)
III_med_dict = dict(probes_med)
III_med_dict
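
The cell above aggregates in two stages: counters are first averaged within each image, and the per-image values are then summarized per method. The toy example below shows why the order matters, using made-up method names and intersections; it selects the `Intersection` column explicitly rather than relying on `numeric_only=True`.

```python
import pandas as pd

toy_IIIDF = pd.DataFrame({
    "Method": ["A_NumD6_norm"] * 4 + ["A_NumD8_norm"] * 2,  # hypothetical
    "Image": ["img1", "img1", "img2", "img3", "img1", "img2"],
    "Counter": ["c1", "c2", "c1", "c1", "c1", "c1"],
    "Intersection": [1.0, 3.0, 4.0, 10.0, 2.0, 8.0],
})

# Stage 1: average across counters so each image contributes one value.
per_image = toy_IIIDF.groupby(["Method", "Image"])["Intersection"].mean()

# Stage 2: take the median across images for each method.
toy_med_dict = per_image.groupby(level="Method").median().to_dict()
print(toy_med_dict)
```

Averaging within an image first keeps an image that was counted by two people from carrying twice the weight of an image counted once.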

## Produce a dataframe that contains only the cells that meet the manually validated, normalized integrated intensity threshold

In [None]:
#Generate a DF that contains the Percentage of Probe+ Cells for each image, based on the probeXmethod appropriate 
#integrated intensity cutoff value from the IIIDF

filteredDF = pd.DataFrame()

dfimagelist = []
dfmethodlist = []
dfcells = []
gfplist = [] 
threshlist = []
relthreshlist = []
probelist = []
stage_list = []

for key in manual_dict.keys():
    
    manualDF = manual_dict[key]
    probee = manualDF["Probe"][0]
    man_images = man_image_dict[key + "_images"]
    methodlist = IIIDF[IIIDF['Probe'] == probee]['Method'].unique().tolist()
    
    for meth in methodlist:

            median = float(III_med_dict[meth])
            
            for image in man_images:
                if key.find("train") != -1:
                    stage = "train"
                elif key.find("test") !=-1:
                    stage = "test"
               
                x = cellDF[cellDF['FileName_RAW'] == image]
                gfp = len(x)
                numcells = len(x[x[meth] >= median])
                dfimagelist.append(image)
                dfmethodlist.append(meth)
                dfcells.append(numcells)
                gfplist.append(gfp)
                threshlist.append(median)
                probelist.append(probee)
                stage_list.append(stage)

filteredDF['FileName_RAW'] = dfimagelist
filteredDF['Probe'] = probelist
filteredDF["Stage"] = stage_list
filteredDF['Method'] = dfmethodlist
filteredDF['Threshold'] = threshlist
filteredDF['Probe+ Cells'] = dfcells
filteredDF['Number GFP Cells'] = gfplist
filteredDF['Percent Positive'] = filteredDF['Probe+ Cells'] / filteredDF['Number GFP Cells']

#Extract the NumD label from each method name into its own column so facetgrids and lmplots can be split by NumD

if normalize == True:
    new = filteredDF['Method'].str.rsplit("_", n = 2, expand = True) 
    filteredDF['NumD'] = new[1]
    
elif normalize == False:
    new = filteredDF['Method'].str.rsplit("_", n = 3, expand = True) 
    filteredDF['NumD'] = new[2]
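
The two branches above differ only because the method names have different shapes: `<probe>_NumDx_norm` when normalized and `<probe>_II_NumDx` otherwise. The same extraction on plain strings, as a sketch with a hypothetical helper name:

```python
def numd_from_method(method: str, normalize: bool) -> str:
    """Pull the 'NumDx' token out of a method column name."""
    if normalize:
        # e.g. 'Slc30a3_NumD8_norm' -> ['Slc30a3', 'NumD8', 'norm']
        return method.rsplit("_", 2)[1]
    # e.g. 'Spp1_II_NumD6' -> ['Spp1', 'II', 'NumD6']
    return method.rsplit("_", 3)[2]


print(numd_from_method("Slc30a3_NumD8_norm", normalize=True))
print(numd_from_method("Spp1_II_NumD6", normalize=False))
```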

In [None]:
#### This can show you the full filtered DF
pd.set_option("display.max_rows", None, "display.max_columns", None)
filteredDF

In [None]:
#generating DF of manual counts to append to filteredDF
temp = pd.concat(manual_dict.values()).groupby(['FileName_RAW','Probe']).mean(numeric_only=True).reset_index() 
man_ave = pd.DataFrame()

#Specifically Extracting only the columns of interest from the manualDFs
man_ave['FileName_RAW'] = temp["FileName_RAW"] 
man_ave['Probe'] = temp['Probe'] 
man_ave['True Threshold'] = 'Manual' 
man_ave['Relative Threshold'] = 'Manual' 
man_ave['Probe+ Cells'] = temp["Number Probe Positive"]
man_ave['Number GFP Cells']= temp['Count_GFPcells'] 
man_ave['Percent Positive'] = temp["Number Probe Positive"]/temp['Count_GFPcells'] 

In [None]:
#Create a Manual Percent Positive Column for the filteredDF
perc_list_man = []
ag = []

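for i in range(len(filteredDF)):
    x = man_ave[(man_ave['FileName_RAW'] == filteredDF.iloc[i]['FileName_RAW']) &
                (man_ave['Probe'] == filteredDF.iloc[i]['Probe'])]['Percent Positive']

    if len(x) > 0:
        #take the first match explicitly; calling float() on a Series is deprecated in recent pandas
        perc_list_man.append(float(x.iloc[0]))
    else:
        #use a real NaN so the column stays numeric for the regression plots
        perc_list_man.append(np.nan)
        ag.append(x)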
        
filteredDF['Manual Percent Positive'] = perc_list_man


## Graphing Correlation between manual and automated colocalization

In [None]:
sns.set_theme(style = 'white', font_scale=1.5)

In [None]:
PROBE = Probe1
Title = "Correlation between manual and automated counts\ntraining vs test"
image_name = "man_vs_auto_lmplot-"+PROBE

g = sns.lmplot(data = filteredDF[filteredDF['Probe'] == PROBE],
                  x = 'Manual Percent Positive', 
                  y = 'Percent Positive',
                  row = "Stage",
                  col = 'NumD',
                  line_kws = {'color': probecolordict[PROBE]},
                  scatter_kws =  {'color': probecolordict[PROBE]})


def annotate(data, **kws):
    line = stats.linregress(data['Manual Percent Positive'], data['Percent Positive'])
    ax = plt.gca()

    ax.text(.05, .8, 'R2={:.2}, p={:.2}\n f(x)={:.2}x+{:.2}'.format(line.rvalue**2, 
                                                            line.pvalue, line.slope, line.intercept),
            transform=ax.transAxes)
    
g.set(ylim = (0,1), xlim = (0,1))
g.map_dataframe(annotate)

g.set_xlabels('Manual ' + PROBE + '+ Percent Positive')

g.set_ylabels('Cell Profiler\n' + PROBE + '+ Percent Positive')

g.set_titles(col_template="{col_name}" ,row_template="{row_name}")

g.tight_layout()
g.fig.suptitle(Title+"\n"+"Normalized by area = " + str(normalize))
g.fig.subplots_adjust(top=0.9) 
g.fig.set_size_inches(15, 15)
#plt.savefig(image_save_path+"/"+"NumD6vNumD8/"+image_name+".svg", format= "svg", bbox_inches = 'tight' )
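
The annotation on each panel reports the R2, slope, and intercept of an ordinary least-squares line through the manual and automated proportions. The sketch below computes those three quantities in plain Python as a stand-in for `stats.linregress`, so the arithmetic is visible; it does not compute the p-value, and the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept, and R^2 for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


manual = [0.1, 0.2, 0.3, 0.4]   # made-up manual proportions
auto = [0.15, 0.2, 0.4, 0.45]   # made-up automated proportions
slope, intercept, r_squared = fit_line(manual, auto)
print(f"R2={r_squared:.2}, f(x)={slope:.2}x+{intercept:.2}")
```

A slope near 1 and an intercept near 0 indicate that the automated threshold tracks the manual counts without a systematic offset, which is a stricter criterion than a high R2 alone.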

In [None]:
plt.figure(dpi =300, tight_layout=True)
PROBE = Probe1
SET = 'test'
METHOD = 'NumD8'

if SET == 'train':
    Name = 'TRAIN'
    TitleLabel = 'Training'
    
if SET == 'test':
    Name = 'TEST'
    TitleLabel = 'Test'
    
    

image_name = "ChosenMethod_man_vs_auto_lmplot" + Name + "_" + PROBE

g = sns.lmplot(data = filteredDF[(filteredDF['Probe'] == PROBE) &
                                 (filteredDF['Stage'] == SET) &
                                 (filteredDF['NumD'] == METHOD)],
                  x = 'Manual Percent Positive', 
                  y = 'Percent Positive',
                  line_kws = {'color': probecolordict[PROBE]},
                  scatter_kws =  {'color': probecolordict[PROBE]})


def annotate(data, **kws):
    line = stats.linregress(data['Manual Percent Positive'], data['Percent Positive'])
    ax = plt.gca()

    ax.text(.05, .8, 'R2={:.2}, p={:.2}\n f(x)={:.2}x+{:.2}'.format(line.rvalue**2, 
                                                            line.pvalue, line.slope, line.intercept),
            transform=ax.transAxes)
    
g.set(ylim = (0,1), xlim = (0,1))
g.map_dataframe(annotate)

plt.title("Correlation Between Manual and\nAutomated Quantification on " + TitleLabel + " Images",pad=20)
g.set_xlabels('Manual Quantification\nProportion ' + PROBE + ' Positive')
g.set_ylabels('Cell Profiler\nProportion ' + PROBE + ' Positive')

#g.set_titles(col_template="{col_name}" ,row_template = "{row_name}")
###plt.savefig(image_save_path + "/NumD6vNumD8/" + image_name + ".svg", format= "svg", bbox_inches = 'tight')

In [None]:
PROBE = Probe2
Title = "Correlation between manual and automated counts\ntraining vs test"
image_name = "man_vs_auto_lmplot-"+PROBE
g = sns.lmplot(data = filteredDF[filteredDF['Probe'] == PROBE],
                  x = 'Manual Percent Positive', 
                  y = 'Percent Positive',
                  row = "Stage",
                   col = 'NumD',
                  line_kws = {'color': probecolordict[PROBE]},
                  scatter_kws =  {'color': probecolordict[PROBE]})


def annotate(data, **kws):
    line = stats.linregress(data['Manual Percent Positive'], data['Percent Positive'])
    ax = plt.gca()

    ax.text(.05, .8, 'R2={:.2}, p={:.2}\n f(x)={:.2}x+{:.2}'.format(line.rvalue**2, 
                                                            line.pvalue, line.slope, line.intercept),
            transform=ax.transAxes)
    
g.set(ylim = (0,1), xlim = (0,1))
g.map_dataframe(annotate)
g.set_xlabels('Manual ' + PROBE + '+ Percent Positive')
g.set_ylabels('Cell Profiler\n' + PROBE + '+ Percent Positive')
g.set_titles(col_template="{col_name}" ,row_template="{row_name}")
g.tight_layout()
g.fig.suptitle(Title+"\n"+"Normalized by area = " + str(normalize))
g.fig.subplots_adjust(top=0.9) 
g.fig.set_size_inches(15, 15)
###plt.savefig(image_save_path+"/"+"NumD6vNumD8/"+image_name+".svg", format= "svg", bbox_inches = 'tight' )

# Pixel Thresholding
## Selection of pixel thresholding method for each probe based on R^2 value
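As a rough illustration of the selection rule named in this heading, the sketch below scores each candidate method by the R² between manual and automated proportions and keeps the highest one. It assumes the method with the largest R² is preferred; `r_squared`, `manual`, `automated`, and every value in it are hypothetical, and it uses plain Python rather than `stats.linregress`. For a simple linear regression, the squared Pearson correlation is the same quantity the plots above report as `rvalue**2`.

```python
# Hypothetical helper, not part of the notebook: squared Pearson correlation.
def r_squared(xs, ys):
    """Return R^2 between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Made-up proportions for four images (illustrative values only).
manual = [0.10, 0.30, 0.50, 0.70]
automated = {
    "NumD6": [0.12, 0.28, 0.52, 0.69],
    "NumD8": [0.30, 0.20, 0.60, 0.40],
}

# Score every method, then keep the one that tracks the manual counts best.
scores = {method: r_squared(manual, auto) for method, auto in automated.items()}
best_method = max(scores, key=scores.get)
print(scores, best_method)
```

The cell below hard-codes the outcome of this comparison per probe and per counter, so the sketch only shows how such a choice could be derived.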

In [None]:
#add in the Person qualifier to make sure I'm using the correct thresholding for NR data vs LM data
if Probe1 == "Kcng2":
    P1NumD="_NumD6"
    P2NumD="_NumD8" ##RK Changed on 10-23-23
elif Probe1 == "Slc30a3" and Person == "NR":
    P1NumD="_NumD6"
    P2NumD="_NumD8"
elif Probe1 == "Slc30a3" and Person == "LM":
    P1NumD="_NumD8"
    P2NumD="_NumD6"
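elif Probe1 == "Nos1":
    P1NumD="_NumD8"
    P2NumD="_NumD8"
else:
    # Fail early instead of hitting a NameError (or reusing stale values) further down
    raise ValueError("No thresholding method defined for Probe1 = " + str(Probe1) + ", Person = " + str(Person))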
    
if normalize == False:
    probe1_method = Probe1 +"_II"+P1NumD
    probe2_method = Probe2 +"_II"+P2NumD
elif normalize == True:
    probe1_method = Probe1+P1NumD+"_norm"
    probe2_method =Probe2+P2NumD+"_norm"

## Sliding Scale for Cutoffs
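The idea behind the sliding scale can be sketched without pandas: shift the chosen cutoff by a fixed set of offsets and recount the positive cells at each shifted cutoff. The function name `sliding_proportions` and the intensity values are hypothetical; the offsets mirror the non-normalized `range_list` used in the cell below.

```python
# Hypothetical helper, not part of the notebook.
def sliding_proportions(intensities, base_cutoff, offsets):
    """Return {offset: proportion of cells with intensity >= base_cutoff + offset}."""
    total = len(intensities)
    result = {}
    for offset in offsets:
        cutoff = base_cutoff + offset
        num_pos = sum(1 for value in intensities if value >= cutoff)
        result[offset] = num_pos / total
    return result

# Made-up integrated intensities for the cells of one image.
cells = [2.0, 5.0, 9.0, 14.0, 30.0]
props = sliding_proportions(cells, base_cutoff=10.0, offsets=[-8, -4, 0, 4, 8, 16, 32])
print(props)
```

A proportion that stays nearly flat across neighbouring offsets suggests the result is not very sensitive to the exact cutoff.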

In [None]:
# Build a DataFrame of percent positive per image at each cutoff offset around the chosen II threshold

imagelist = []
probelist = []
probe_pos = []
animal_list= []
inj_list = []
per_pos = []
all_GFP = []
method_list=[]
relativity_list = []
cutoff_list = []
sliding_IIDF = pd.DataFrame()

if normalize == False:
    range_list = [-8,-4,0,4,8,16,32]
elif normalize == True:
    range_list= [-0.008, -0.004, 0, 0.004, 0.008, 0.016, 0.032] #(np.array(range(-1,30,3))*(1/1000)).tolist()
II_range = dict(zip(map(str, range_list), range_list))  # str(range_list) would zip over the characters of the list's repr

for method in probe1_method,probe2_method: #3/3/23 changed from "for method in filteredDF" as to only include chosen method
    if method.find(Probe1)!=-1:
        probe= Probe1
    elif method.find(Probe2)!=-1:
        probe = Probe2
    else:
        print(method)
        break
    #for m in cellDF.columns:      
     #   if m == method:
    for image in cellDF["FileName_RAW"].unique():
        animal = image[4:image.find("-S")]
        injection = projdict[animal]
        x = cellDF[cellDF["FileName_RAW"] == image]
        x = x.sort_values(method, ascending = False).reset_index(drop = True)  # assign the result; sort_values does not sort in place
        for a in range_list:
            cutoff = float(III_med_dict[method])+a 
            # Count cells at or above the cutoff with one vectorized comparison
            num_pos = int((x[method].astype(float) >= cutoff).sum())

            GFP = len(x)
            percent_pos = num_pos/GFP
            imagelist.append(image)
            probelist.append(probe)
            probe_pos.append(num_pos)
            animal_list.append(animal)
            inj_list.append(injection)
            per_pos.append(percent_pos)
            all_GFP.append(GFP)
            method_list.append(method)
            relativity_list.append(a)
            cutoff_list.append(cutoff)
                        
                        
sliding_IIDF['FileName_RAW'] = imagelist
sliding_IIDF['Animal ID'] = animal_list
sliding_IIDF['Inj Site'] = inj_list
sliding_IIDF['Probe'] = probelist
sliding_IIDF['Method'] = method_list
sliding_IIDF["Cutoff"] = cutoff_list
sliding_IIDF["Relative Threshold"] = relativity_list
sliding_IIDF['Total GFP'] = all_GFP
sliding_IIDF['Pos Cells'] = probe_pos 
sliding_IIDF['Percent Positive'] = per_pos



if normalize == True:
    new = sliding_IIDF['Method'].str.rsplit("_", n = 2, expand = True) 
    sliding_IIDF['NumD'] = new[1]
elif normalize == False:
    new = sliding_IIDF['Method'].str.rsplit("_", n = 3, expand = True) 
    sliding_IIDF['NumD'] = new[2]
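
The two `rsplit` branches above depend on where the `NumD` token sits in the method name. A possible alternative, sketched here under the assumption that the token always starts with `NumD`, is to search the underscore-separated parts directly. `numd_from_method` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical helper, not part of the notebook.
def numd_from_method(method):
    """Return the 'NumD...' token of a method name, or None if there is none."""
    for part in method.split("_"):
        if part.startswith("NumD"):
            return part
    return None

# Method names in the two formats built earlier, plus the 'Double' label.
examples = ["Kcng2_II_NumD6", "Nos1_NumD8_norm", "Double"]
parsed = [numd_from_method(name) for name in examples]
print(parsed)
```

With pandas this could be applied through `sliding_IIDF['Method'].map(numd_from_method)`, which would not need to branch on `normalize`.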

In [None]:
animalslidingIIDF = sliding_IIDF.groupby(['Animal ID', 'Inj Site', 'Cutoff', 'Relative Threshold', 'Probe', 'Method'])[['Total GFP', 'Pos Cells']].sum()

animalslidingIIDF.insert(0,'Proportion Positive', animalslidingIIDF['Pos Cells'] / animalslidingIIDF['Total GFP'])
                         
    
animalslidingIIDF.reset_index(inplace=True)
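
The animal-level proportion above is computed from summed counts rather than by averaging the per-image proportions, so images with more GFP cells carry more weight. The made-up numbers below show how far the two can differ.

```python
# Made-up (total GFP cells, positive cells) pairs for two images of one animal.
images = [(10, 5), (90, 9)]

# Pooled proportion: sum the counts first, then divide (what the groupby above does).
total_gfp = sum(total for total, _ in images)
total_pos = sum(pos for _, pos in images)
pooled = total_pos / total_gfp

# Unweighted alternative: average the per-image proportions.
mean_of_proportions = sum(pos / total for total, pos in images) / len(images)

print(pooled, mean_of_proportions)
```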


In [None]:
animalslidingIIDF

In [None]:
animalslidingIIDF[(animalslidingIIDF["Relative Threshold"] == range_list[0]) & (animalslidingIIDF["Method"] == probe2_method)].value_counts("Proportion Positive")  # range_list[0] is the lowest offset (-0.008 when normalized)

In [None]:
sliding_IIDF[(sliding_IIDF["Relative Threshold"] == range_list[0]) & (sliding_IIDF["NumD"] == "NumD8")].value_counts("Percent Positive")  # range_list[0] is the lowest offset (-0.008 when normalized)

## Graphing Sliding Threshold Cutoff vs Percent Pos

In [None]:
sns.set_theme(style = 'white', font_scale=0.8)

In [None]:
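plt.rcParams['figure.dpi'] = 300  # catplot opens its own figure, so plt.figure(dpi=300) would only leave an empty extra figure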
probe = Probe1
if probe == Probe1:
    method = probe1_method
elif probe == Probe2:
    method = probe2_method


Title = "Percent Pos for "+probe+" based on Sliding Threshold\nImage Level"
image_name = "sliding-threshold_"+probe
g = sns.catplot(data=sliding_IIDF[(sliding_IIDF["Method"] == probe1_method) 
                                       | (sliding_IIDF["Method"] == probe2_method)] ,
            x= "Relative Threshold", y= "Percent Positive",
           hue = "Probe", palette=probecolordict,
                kind = "point", units = "FileName_RAW", legend_out= False)

g.set(ylim= (0,1.1))
g.set_xlabels("Threshold Relative to Optimized\n"+method)
#g.set_titles(col_template="{col_name}" ,row_template="{row_name}")
plt.title(Title, pad =20)#+"\nNormalized by area = " + str(normalize)).set_size(20)

plt.legend(title= 'Marker Gene', bbox_to_anchor=(1.02, 0.7),  borderaxespad=0.1)  # hue is "Probe" here, not injection site
###plt.savefig(image_save_path+"/"+image_name+".svg", format= "svg", bbox_inches = 'tight' )

In [None]:
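plt.rcParams['figure.dpi'] = 400  # catplot opens its own figure, so plt.figure(dpi=400) would only leave an empty extra figure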
probe = Probe1
if probe == Probe1:
    method = probe1_method
elif probe == Probe2:
    method = probe2_method

Title = "Proportion Retrogradely Labeled ZI->PRN Neurons\nExpressing Marker Genes Defined on Sliding Threshold\nAnimal Level"

image_name = "sliding-threshold_AnimalLevel_BothProbes"  # both probes are plotted; a distinct name avoids overwriting the per-probe animal-level figure

g = sns.catplot(data= animalslidingIIDF[(animalslidingIIDF["Method"] == probe1_method) 
                                       | (animalslidingIIDF["Method"] == probe2_method)] ,
                x= "Relative Threshold", 
                y= "Proportion Positive",
                hue = "Probe", 
                #hue_order=["PRN","SC"], 
                palette=probecolordict,
                kind = "point",
                scale =1.5,
                #units = "Animal ID",
                errorbar = 'se',
                n_boot = None, 
                legend_out=False,
                )


g.set(ylim= (0,1.1))
g.set_ylabels("Proportion " + probe + " Positive")
g.set_xlabels("Threshold Relative to Minimal " + probe + " Expression Threshold\nNormalized By Cell Area")
plt.title(Title)#+"\nNormalized by area = " + str(normalize),).set_size(20)
g.fig.set_size_inches(15, 9)
plt.legend(title= 'Marker Gene', bbox_to_anchor=(1.02, 0.7), loc = [1.1,0.7],  borderaxespad=0.1,)

###plt.savefig(image_save_path+"/"+image_name+".svg", format= "svg",  bbox_inches = 'tight')
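
The `errorbar = 'se'` option in the plot above draws the standard error of the mean around each point. The sketch below shows that calculation for one threshold, assuming the usual sample standard deviation (n - 1 in the denominator); the proportions are made up and stand in for one value per animal.

```python
import math
import statistics

# Made-up animal-level proportions at a single relative threshold.
proportions = [0.2, 0.4, 0.6]

mean = statistics.mean(proportions)
# Standard error of the mean: sample standard deviation divided by sqrt(n).
standard_error = statistics.stdev(proportions) / math.sqrt(len(proportions))

print(mean, standard_error)
```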

In [None]:
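plt.rcParams['figure.dpi'] = 300  # catplot opens its own figure, so plt.figure(dpi=300) would only leave an empty extra figure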
probe = Probe1
if probe == Probe1:
    method = probe1_method
elif probe == Probe2:
    method = probe2_method

Title = "Percent Pos for "+probe+" based on Sliding Threshold\nImage Level"
image_name = "sliding-threshold_ImageLevel_"+probe
g = sns.catplot(data=sliding_IIDF[(sliding_IIDF["Probe"] == probe) & (sliding_IIDF["Method"] == method)],
            x= "Relative Threshold", y= "Percent Positive",
            hue = "Inj Site", 
                #hue_order=["PRN","SC"], 
                palette=injcolordict,
                kind = "point", units = "FileName_RAW", legend_out = False,
               scale=1.5,
               markers = 's')

g.set(ylim= (0,1.1))

g.set_xlabels("Threshold Relative to Minimal " + probe + " Expression Threshold\nNormalized By Cell Area")
#g.set_titles(col_template="{col_name}" ,row_template="{row_name}")
plt.title(Title, pad = 20)#+"\nNormalized by area = " + str(normalize)).set_size(20)
g.fig.set_size_inches(15, 9)
plt.legend(title= 'ZI Projection Type', bbox_to_anchor=(1.02, 0.7), loc = [1.1,0.7],  borderaxespad=0.1)

###plt.savefig(image_save_path+"/"+image_name+".svg", format= "svg",  bbox_inches = 'tight')

In [None]:
#set sliding threshold range from (-4,60)*(1/1000)

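plt.rcParams['figure.dpi'] = 300  # catplot opens its own figure, so plt.figure(dpi=300) would only leave an empty extra figure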
probe = Probe1
if probe == Probe1:
    method = probe1_method
elif probe == Probe2:
    method = probe2_method

Title = "Percent Pos for "+probe+" based on Sliding Threshold\nAnimal Level"
image_name = "sliding-threshold_AnimalLevel_"+probe
g = sns.catplot(data=animalslidingIIDF[(animalslidingIIDF["Probe"] == probe) & (animalslidingIIDF["Method"] == method)],
            x= "Relative Threshold", y= "Proportion Positive",
            hue = "Inj Site", 
                #hue_order=["PRN","SC"], 
                palette=injcolordict,
                kind = "point", units = "Animal ID", legend_out = False,
               scale=1.5,
               markers = 's')

g.set(ylim= (0,1.1))

g.set_xlabels("Threshold Relative to Minimal " + probe + " Expression Threshold\nNormalized By Cell Area")
#g.set_titles(col_template="{col_name}" ,row_template="{row_name}")
plt.title(Title, pad = 20)#+"\nNormalized by area = " + str(normalize)).set_size(20)
g.fig.set_size_inches(15, 9)
plt.legend(title= 'ZI Projection Type', bbox_to_anchor=(1.02, 0.7), loc = [1.1,0.7],  borderaxespad=0.1)
plt.xticks(rotation = 90)

###plt.savefig(image_save_path+"/"+image_name+".svg", format= "svg",  bbox_inches = 'tight')

In [None]:
# set sliding threshold range from (-1,30,3))*(1/100)

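plt.rcParams['figure.dpi'] = 300  # catplot opens its own figure, so plt.figure(dpi=300) would only leave an empty extra figure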
probe = Probe2
if probe == Probe1:
    method = probe1_method
elif probe == Probe2:
    method = probe2_method

Title = "Percent Pos for "+probe+" based on Sliding Threshold\nAnimal Level"
image_name = "sliding-threshold_AnimalLevel_"+probe
g = sns.catplot(data=animalslidingIIDF[(animalslidingIIDF["Probe"] == probe) & (animalslidingIIDF["Method"] == method)],
            x= "Relative Threshold", y= "Proportion Positive",
            hue = "Inj Site", 
                #hue_order=["PRN","SC"], 
                palette=injcolordict,
                kind = "point", units = "Animal ID", legend_out = False,
               scale=1.5,
               markers = 's')

g.set(ylim= (0,1.1))

g.set_xlabels("Threshold Relative to Minimal " + probe + " Expression Threshold\nNormalized By Cell Area")
#g.set_titles(col_template="{col_name}" ,row_template="{row_name}")
plt.title(Title, pad = 20)#+"\nNormalized by area = " + str(normalize)).set_size(20)
g.fig.set_size_inches(15, 9)
plt.legend(title= 'ZI Projection Type', bbox_to_anchor=(1.02, 0.7), loc = [1.1,0.7],  borderaxespad=0.1)

###plt.savefig(image_save_path+"/"+image_name+".svg", format= "svg",  bbox_inches = 'tight')

# Cell Thresholding
## Filtering Full Dataset
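The filtering in this section amounts to three counts per image: cells at or above the probe 1 cutoff, cells at or above the probe 2 cutoff, and cells at or above both. A minimal sketch with hypothetical names (`count_positive`, `cells`) and made-up intensities follows. As in the cell below, double positives are also included in each single-probe count.

```python
# Hypothetical helper, not part of the notebook.
def count_positive(cells, cutoff1, cutoff2):
    """cells: list of (probe1_intensity, probe2_intensity) tuples for one image."""
    probe1_pos = sum(1 for p1, _ in cells if p1 >= cutoff1)
    probe2_pos = sum(1 for _, p2 in cells if p2 >= cutoff2)
    double_pos = sum(1 for p1, p2 in cells if p1 >= cutoff1 and p2 >= cutoff2)
    return probe1_pos, probe2_pos, double_pos

# Made-up intensities for four GFP cells.
cells = [(12.0, 3.0), (15.0, 9.0), (4.0, 8.0), (2.0, 1.0)]
counts = count_positive(cells, cutoff1=10.0, cutoff2=8.0)
print(counts)
```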

In [None]:
# Build a DataFrame of proportion positive per image using the chosen II cutoff for each probe, plus double positives

imagelist = []
probelist = []
probe_pos = []
animal_list= []
inj_list = []
per_pos = []
all_GFP = []
method_list=[]

chosen_IIIDF = pd.DataFrame()

for method in [probe1_method, probe2_method, 'Double']:
    
    if method.find(Probe1) != -1:
        probe= Probe1
        
    elif method.find(Probe2) != -1:
        probe = Probe2
        
    elif method.find('Double') != -1:
        probe = Probe1 + Probe2
        
    else:
        print(method)
        break
        
    #for m in cellDF.columns:      
     #   if m == method:
        
    for image in cellDF["FileName_RAW"].unique():
        
        animal = image[4:image.find("-S")] ##LM_EDIT!
        injection = projdict[animal]
        x = cellDF[cellDF["FileName_RAW"] == image]
        
        if method.find('Double') == -1:
            cutoff = float(III_med_dict[method]) #getting the median integrated intensity value for given method
            
        if method.find('Double') >= 0:
            cutoff1 = float(III_med_dict[probe1_method])
            cutoff2 = float(III_med_dict[probe2_method])

        # Count positive cells with vectorized comparisons instead of a per-row loop
        if method.find('Double') == -1:
            num_pos = int((x[method].astype(float) >= cutoff).sum())
        else:
            # Double positive: at or above both probes' cutoffs
            num_pos = int(((x[probe1_method].astype(float) >= cutoff1) &
                           (x[probe2_method].astype(float) >= cutoff2)).sum())

        GFP = len(x)
        percent_pos = num_pos/GFP
        imagelist.append(image)
        probelist.append(probe)
        probe_pos.append(num_pos)
        animal_list.append(animal)
        inj_list.append(injection)
        per_pos.append(percent_pos)
        all_GFP.append(GFP)
        method_list.append(method)
                        
                        
chosen_IIIDF['FileName_RAW'] = imagelist
chosen_IIIDF['Animal ID'] = animal_list
chosen_IIIDF['Inj Site'] = inj_list
chosen_IIIDF['Probe'] = probelist
chosen_IIIDF['Method'] = method_list
chosen_IIIDF['Total GFP'] = all_GFP
chosen_IIIDF['Pos Cells'] = probe_pos 
chosen_IIIDF['Proportion Positive'] = per_pos


if normalize == True:
    
    new = chosen_IIIDF['Method'].str.rsplit("_", n = 2, expand = True) 
    chosen_IIIDF['NumD'] = new[1]
    
elif normalize == False:
    
    new = chosen_IIIDF['Method'].str.rsplit("_", n = 3, expand = True) 
    chosen_IIIDF['NumD'] = new[2]

#adding bregma position
AP_list = []

for i in range(len(chosen_IIIDF)):     
    AP_list.append(AP_dict[chosen_IIIDF.iloc[i]["FileName_RAW"]])    
    
chosen_IIIDF["AP_position"] = AP_list         

In [None]:
chosen_IIIDF

In [None]:
# Binning AP coordinates into position bins (nominally 9 and 6; two extra positions extend these to 10 and 7 bins) to allow for coarser analysis by position

AP_order = chosen_IIIDF["AP_position"].unique().tolist()
AP_order.sort(reverse= True)


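# zip() stops at the shorter input, so more than 20 unique AP positions would be left without a bin
assert len(AP_order) <= 20, "More unique AP positions than bin labels; extend the bin lists"
AP_9_bindict= dict(zip(AP_order,[1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10])) ##added the two 10s d/t larger range
AP_6_bindict = dict(zip(AP_order, [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7])) ##added the two 7s d/t larger range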

AP_9bin = []
AP_6bin = []



for i in range(len(chosen_IIIDF)):
    AP_9bin.append(AP_9_bindict[chosen_IIIDF["AP_position"].iloc[i]])
    AP_6bin.append(AP_6_bindict[chosen_IIIDF["AP_position"].iloc[i]])

chosen_IIIDF["AP_bin(9)"] = AP_9bin
chosen_IIIDF["AP_bin(6)"] = AP_6bin
#chosen_IIIDF.to_csv(csv_save_path + '/chosen_IIIDF.csv', sep=',', index = False)
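
The hard-coded bin lists above follow a regular pattern: two consecutive AP positions per bin for the finer binning and three per bin for the coarser one. As a sketch, the same labels could be generated from the sorted positions with integer division, which would keep working if the number of positions changes. `make_bins` and the bregma values are hypothetical.

```python
# Hypothetical helper, not part of the notebook.
def make_bins(sorted_positions, positions_per_bin):
    """Map each position to a 1-based bin holding `positions_per_bin` consecutive positions."""
    return {pos: i // positions_per_bin + 1 for i, pos in enumerate(sorted_positions)}

# Made-up AP positions, already sorted from anterior to posterior (descending).
positions = [round(-1.0 - 0.1 * k, 1) for k in range(20)]

bins_by_2 = make_bins(positions, 2)  # same labels as the first hard-coded list
bins_by_3 = make_bins(positions, 3)  # same labels as the second hard-coded list
print(list(bins_by_2.values()))
print(list(bins_by_3.values()))
```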

In [None]:
#results summed by animal

animal_IIIDF = chosen_IIIDF.groupby(['Animal ID', 'Inj Site', 'Probe', 'Method']).sum(numeric_only=True).reset_index()
animal_IIIDF.drop(["AP_position","AP_bin(9)","AP_bin(6)"], axis=1, inplace= True)
animal_IIIDF["Proportion Positive"] = animal_IIIDF["Pos Cells"] / animal_IIIDF["Total GFP"]
animal_IIIDF
#animal_IIIDF.to_csv(csv_save_path + '/animal_IIIDF.csv', sep=',', index = False)

In [None]:
#results summed by animal x AP bin based on 9 bins

APbin_DF = chosen_IIIDF.groupby(['Animal ID', 'Inj Site', 'Probe', 'Method',"AP_bin(9)"]).sum(numeric_only=True).reset_index()
APbin_DF.drop(["AP_position","AP_bin(6)"], axis=1, inplace= True)
APbin_DF["Proportion Positive"] = APbin_DF["Pos Cells"] / APbin_DF["Total GFP"]
APbin_DF
#APbin_DF.to_csv(csv_save_path + '/APbin_DF.csv', sep=',', index = False)

In [None]:
#refine chosen_IIIDF to remove double pos cells from single pos counts and also include double neg cells

df_list = []

for image in chosen_IIIDF["FileName_RAW"].unique():
    x = chosen_IIIDF[chosen_IIIDF["FileName_RAW"] == image].copy()
    for method in probe1_method,probe2_method:
        singlepos = x[x["Method"] == method]["Pos Cells"].iloc[0] - x[x["Method"] == "Double"]["Pos Cells"].iloc[0]
        x.loc[x["Method"] == method,"Pos Cells"] = singlepos
        
    y = x[x["Method"] == "Double"].copy()
    
    y["Method"] = "Neg"
    y["Probe"] = "Double Negative"
    y["Pos Cells"] = x["Total GFP"].iloc[0] - x["Pos Cells"].sum()
    
    df_list.append(x)
    df_list.append(y)


chosen_IIIDF2 = pd.concat(df_list)
chosen_IIIDF2["Proportion Positive"] = chosen_IIIDF2["Pos Cells"] / chosen_IIIDF2["Total GFP"]

chosen_IIIDF2.rename(columns={"Pos Cells": "Cells"}, inplace = True)
#chosen_IIIDF2.to_csv(csv_save_path + '/chosen_IIIDF2.csv', sep=',', index = False)
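
The refinement above is plain count arithmetic: subtract the double positives from each single-probe count, then treat whatever remains of the GFP total as double negative. The sketch below restates it for one image, with a hypothetical function name and made-up counts.

```python
# Hypothetical helper, not part of the notebook.
def split_counts(total_gfp, probe1_pos, probe2_pos, double_pos):
    """Split overlapping counts into four mutually exclusive categories."""
    single1 = probe1_pos - double_pos  # probe 1 only
    single2 = probe2_pos - double_pos  # probe 2 only
    negative = total_gfp - (single1 + single2 + double_pos)
    return {"probe1_only": single1, "probe2_only": single2,
            "double": double_pos, "negative": negative}

# Made-up counts for one image.
parts = split_counts(total_gfp=100, probe1_pos=40, probe2_pos=30, double_pos=10)
print(parts)
```

Because the four categories are mutually exclusive, they sum to the GFP total, which makes a convenient sanity check on `chosen_IIIDF2`.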

In [None]:
animal_IIIDF2 = chosen_IIIDF2.groupby(['Animal ID', 'Inj Site', 'Probe', 'Method']).sum(numeric_only=True).reset_index()
animal_IIIDF2["Proportion Positive"] = animal_IIIDF2["Cells"] / animal_IIIDF2["Total GFP"]
animal_IIIDF2.drop(["AP_position","AP_bin(9)","AP_bin(6)"], axis=1, inplace= True)
#animal_IIIDF2.to_csv(csv_save_path + '/animal_IIIDF2.csv', sep=',', index = False)

In [None]:
APbin_DF2 = chosen_IIIDF2.groupby(['Animal ID', 'Inj Site', 'Probe', 'Method',"AP_bin(9)"]).sum(numeric_only=True).reset_index()
APbin_DF2["Proportion Positive"] = APbin_DF2["Cells"] / APbin_DF2["Total GFP"]
APbin_DF2.drop(["AP_position","AP_bin(6)"], axis=1, inplace= True)
APbin_DF2
#APbin_DF2.to_csv(csv_save_path + '/APbin_DF2.csv', sep=',', index = False)

In [None]:
# Display the full filtered DataFrame without truncating rows or columns
pd.set_option("display.max_rows", None, "display.max_columns", None)
chosen_IIIDF

In [None]:
pd.set_option("display.max_rows", None, "display.max_columns", None)
animal_IIIDF

In [None]:
pd.set_option("display.max_rows", None, "display.max_columns", None)
APbin_DF