# ExtraPy

This Python script takes the raw output file from the NucJ ImageJ macro and converts it into the final cell counts for each well.

This script makes two assumptions: <br>
- The raw data was produced with NucJ.ijm. Raw input from a different source may not be processed correctly. <br>
- All images have exactly the same area, so a single scaling factor can be applied to every well. <br>
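
The first assumption can be checked before any processing is done. The sketch below is not part of the original script: `check_nucj_columns` is a hypothetical helper, and it assumes that a NucJ results file always contains the `Slice` and `Count` columns that the later cells rely on.

```python
import pandas as pd

# Columns the later cells rely on (assumption: NucJ.ijm always writes them)
REQUIRED_COLUMNS = ("Slice", "Count")

def check_nucj_columns(df):
    """Raise a ValueError if a required column is missing (hypothetical helper)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Raw data is missing expected columns: {missing}")
    return df

# Small stand-in for a NucJ results table
example_df = pd.DataFrame({"Slice": ["well1", "well2"], "Count": [365, 319]})
checked_df = check_nucj_columns(example_df)
```

Calling such a check right after reading the raw file would make a wrong input fail early with a readable message instead of a `KeyError` further down.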

In [121]:
# Importing packages for conversion of raw data input file into processed csv file.
import datetime
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

## Script Configuration

This script requires a few user inputs to function properly. These parameters are: <br>
- The name of your raw CSV file <br>
- The desired name of your output file <br>
- The area of the images you used <br>
- The seeding area of the plate you used <br>

Default values are provided for outfile, ImageArea, and SeedArea. Typing 'yes' when prompted makes the script use these defaults. You can edit the defaults in the code to match the values your lab usually uses.
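
If you prefer to run the script without prompts, the defaults could be collected in one place. This is only a sketch: `default_config` is a hypothetical helper that is not in the original script, and it simply reuses the default values and the date-based file name from the configuration cell.

```python
import datetime

def default_config(today=None):
    """Return the notebook's default parameters as a dict (hypothetical helper)."""
    today = today or datetime.datetime.now()
    return {
        # Date-stamped output name, same pattern as the configuration cell
        "outfile": today.strftime("%m_%d_%y") + "_CellCounts.csv",
        # Default image area and seeding area from the configuration cell
        "ImageArea": 56196.769,
        "SeedArea": 10600000,
    }

# Fixed date used only so the example is reproducible
config = default_config(datetime.datetime(2021, 7, 8))
print(config["outfile"])  # prints 07_08_21_CellCounts.csv
```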

In [122]:
# Initial user input for configuration
csvfile = str(input('Please input the name of your csvfile: '))
default = str(input("Use default parameters? Type 'no' or 'yes' : "))
if default == 'yes':
    now = datetime.datetime.now()
    # Default value for outfile, change below if you want to have the default name be different
    # WARNING if you have another file with the same name in the directory as your outfile, the outfile will overwrite it
    outfile = now.strftime("%m_%d_%y") + '_CellCounts.csv'
    
    # Default value for Image Area in square microns. RECOMMENDED: change this to a common value used by your lab.
    ImageArea = 56196.769 # Change this value if you wish to have a different default Image Area
 
    # Default Seeding Area for a 96 well Seahorse plate, in the same units as ImageArea
    SeedArea = 10600000 # Change this value if you wish to have a different default Seeding Area
    
else:
    outfile = input('Please input the name of your outfile: ')
    ImageArea = float(input('Please input the area of the images you used: ')) # assumes consistent image area
    SeedArea = float(input('Please input the size of your seeding area: '))
    
print(csvfile, outfile, ImageArea, SeedArea)

Please input the name of your csvfile: Results.csv
Use default parameters? Type 'no' or 'yes' : yes
Results.csv 07_08_21_CellCounts.csv 56196.769 10600000


In [123]:
# Reading the raw output from the macro
RawData_df = pd.read_csv(csvfile)
print(RawData_df)

                              Slice  Count  Total Area  Average Size  %Area
0  30k.well1.3_hek_after_4-26-18_ND    365    19315.88         52.92  34.37
1  30k.well2.3_hek_after_4-26-18_ND    319    19907.08         62.40  35.42
2  30k.well3.1_hek_after_4-26-18_ND    395    19697.16         49.87  35.05


In [124]:
# Converting to desired format: keep only the 'Slice' and 'Count' columns.
# .copy() makes CellCounts_df independent of RawData_df.
CellCounts_df = RawData_df[['Slice', 'Count']].copy()

# Printing to make sure
print(CellCounts_df)

                              Slice  Count
0  30k.well1.3_hek_after_4-26-18_ND    365
1  30k.well2.3_hek_after_4-26-18_ND    319
2  30k.well3.1_hek_after_4-26-18_ND    395


In [125]:
# Averaging the counts of consecutive images that belong to the same well.
# Rows are treated as the same well when their 'Slice' names match exactly.
# Start with the first row; counts are stored as floats so averages fit the column.
FinalCount_df = CellCounts_df.iloc[[0]].reset_index(drop=True).astype({'Count': float})

# Initializing variables to keep track of the well we are on and how many images we have looked at
ImageCount = 1
WellPos = 0

for x in range(1, CellCounts_df['Slice'].size):

    # If we are still on the same well just add the value to count and increment the ImageCount
    if CellCounts_df.loc[x, 'Slice'] == FinalCount_df.loc[WellPos, 'Slice']:
        FinalCount_df.loc[WellPos, 'Count'] += CellCounts_df.loc[x, 'Count']
        ImageCount += 1

    # If they were not the same it means we have moved on to a new well
    else:
        # Making the final count of that well the average of all the counts
        FinalCount_df.loc[WellPos, 'Count'] /= ImageCount
        # Appending the first row of the new well (DataFrame.append was removed in pandas 2.0)
        new_row = CellCounts_df.loc[[x]].astype({'Count': float})
        FinalCount_df = pd.concat([FinalCount_df, new_row], ignore_index=True)
        # Resetting the ImageCount in case an inconsistent number were used and incrementing the well position
        ImageCount = 1
        WellPos += 1

# Applying the division for the last well
FinalCount_df.loc[WellPos, 'Count'] /= ImageCount

print(FinalCount_df)

                              Slice  Count
0  30k.well1.3_hek_after_4-26-18_ND  365.0
1  30k.well2.3_hek_after_4-26-18_ND  319.0
2  30k.well3.1_hek_after_4-26-18_ND  395.0
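
The loop above averages runs of consecutive rows that share a 'Slice' name. The same idea can be sketched with a pandas `groupby`. This is an alternative formulation, not the original script's method; the `counts` table and the `run_id` name are made up for illustration.

```python
import pandas as pd

# Illustrative data: wellA appears in two separate runs
counts = pd.DataFrame({
    "Slice": ["wellA", "wellA", "wellB", "wellA"],
    "Count": [10, 20, 30, 50],
})

# A new run starts whenever the Slice name differs from the previous row
run_id = (counts["Slice"] != counts["Slice"].shift()).cumsum().rename("run")

# Average the counts within each run, keeping the run's Slice name
averaged = (
    counts.groupby(run_id)
    .agg(Slice=("Slice", "first"), Count=("Count", "mean"))
    .reset_index(drop=True)
)
print(averaged)
```

Grouping on `run_id` rather than on 'Slice' directly keeps the loop's behavior: a name that reappears later in the file starts a new row instead of being merged with the earlier one.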


In [126]:
# Scaling the average count per image up to the whole seeding area
scaling_factor = SeedArea / ImageArea
FinalCount_df['Count'] = (FinalCount_df['Count'] * scaling_factor).round(2)

In [127]:
print(FinalCount_df)

                              Slice     Count
0  30k.well1.3_hek_after_4-26-18_ND  68847.37
1  30k.well2.3_hek_after_4-26-18_ND  60170.72
2  30k.well3.1_hek_after_4-26-18_ND  74506.06
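
As a worked example of the scaling step, the snippet below repeats the arithmetic for the first well by hand. It uses the default ImageArea and SeedArea from the configuration cell and the count of 365 from the example data; `mean_count` and `estimated_cells` are names introduced here for illustration.

```python
ImageArea = 56196.769  # area of one image (default from the configuration cell)
SeedArea = 10600000    # seeding area of one well (default from the configuration cell)
mean_count = 365       # average count per image for the first well in the example

# Each image covers ImageArea / SeedArea of the well, so scale by the inverse
scaling_factor = SeedArea / ImageArea
estimated_cells = round(mean_count * scaling_factor, 2)
print(scaling_factor, estimated_cells)
```

With these inputs the scaling factor is roughly 188.6, and the estimate agrees with the 68847.37 printed for the first well above.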


In [128]:
# Export data to outfile
FinalCount_df.to_csv(outfile)
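
Note that `to_csv` also writes the DataFrame index as an unnamed first column by default. If you only want the 'Slice' and 'Count' columns in the file, passing `index=False` is one option. The small table below is made up for illustration, and the CSV text is returned as a string instead of being written to disk.

```python
import pandas as pd

df = pd.DataFrame({"Slice": ["wellA"], "Count": [15.0]})

# Default behaviour: the index is written as an unnamed first column
with_index = df.to_csv()

# index=False leaves the index out of the file
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header with a leading empty column name
print(without_index.splitlines()[0])  # header with only the data columns
```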