## Analysing GFP Expression
This notebook automates the processing of the `.csv` files produced by the `ImageJ` macros. 
It requires both the source images (raw GFP images) and the `.csv` files derived from the macro. Note that you must specify the `strain` of *C. elegans* at the beginning.

The outputs of this program are `.csv` files:
- `controls.csv`
- `treatment.csv`

They contain the `Mean` intensity, the `Area`, and other `ImageJ` parameters for each neuronal type (see the `to_melted_dataframe` function). These `.csv` files are used for further statistical analysis in `R`. 

### Specify the strain of *C. elegans*
Select below the strain of *C. elegans* to be analyzed. 
1. **LX929**    : GFP expressed in *cholinergic* neurons

2. **BZ555**    : GFP expressed in *dopaminergic* neurons

3. **EG1285**   : GFP expressed in *GABAergic* neurons

4. **OH441**    : GFP expressed *pan-neuronally*

In [1]:
## 1. LX929 : GFP expressed in cholinergic neurons
strain = 'LX929'
img_type = 'png'


## 2. BZ555 : GFP expressed in dopaminergic neurons
# strain = 'BZ555'
# img_type = 'tif'


## 3. EG1285 : GFP expressed in GABAergic neurons
# strain = 'EG1285'
# img_type = 'tif'


## 4. OH441 : GFP expressed pan-neuronally
# strain = 'OH441'
# img_type = 'tif'


gradient_scheme = ["#2effa4", "#38cfb2", "#41a0bf", "#4a71cc"] ## all
print(strain)


LX929
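
As an alternative to commenting and uncommenting blocks, the strain settings could be kept in a lookup table. The sketch below is only an illustration: the `STRAINS` dictionary and the `select_strain` helper are hypothetical names that do not exist in this notebook, and the image types are the ones listed in the cell above.

```python
## Hypothetical lookup table: strain -> neuron class and raw image extension
STRAINS = {
    'LX929':  {'neurons': 'cholinergic',  'img_type': 'png'},
    'BZ555':  {'neurons': 'dopaminergic', 'img_type': 'tif'},
    'EG1285': {'neurons': 'GABAergic',    'img_type': 'tif'},
    'OH441':  {'neurons': 'pan-neuronal', 'img_type': 'tif'},
}

def select_strain(name):
    """Return (strain, img_type) for a known strain, or raise a clear error."""
    if name not in STRAINS:
        raise KeyError("Unknown strain '{}'; expected one of {}".format(name, sorted(STRAINS)))
    return name, STRAINS[name]['img_type']

strain, img_type = select_strain('LX929')
print(strain, img_type)
```

With this layout a misspelled strain name fails immediately instead of producing an empty file list later on.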



#### Import and prepare files

In [2]:
import os
import glob
import numpy as np
import pandas as pd

## Folder names and location
infolder = 'output'
imgsrc = 'img_src'

## List of working files
csvfiles = np.sort([f for f in glob.glob(os.path.join(strain, infolder, "*.csv"))])
imgfiles = np.sort([f for f in glob.glob(os.path.join(strain, imgsrc, "*"))])

print(csvfiles[0], len(csvfiles))
print(imgfiles[0], len(imgfiles))

print(csvfiles[1])




LX929/output/dmso1 05-19-2022 Results.csv 129
LX929/img_src/dmso1 05-19-2022.png 129
LX929/output/dmso1 t3 05-19-2022 Results.csv
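
The file names printed above appear to encode the group, the replicate number, an optional time point, and the acquisition date. A minimal sketch of how such a name could be split with a regular expression follows. The pattern is an assumption inferred from the two names shown here, and `NAME_RE` and `parse_results_name` are hypothetical helpers, not part of the notebook.

```python
import os
import re

## Assumed pattern: "<group><replicate> [t<N>] <MM-DD-YYYY> Results.csv"
## The replicate is taken to be a single digit, as in 'dmso1'.
NAME_RE = re.compile(
    r'^(?P<group>[a-z0-9]+?)(?P<rep>\d) '
    r'(?:(?P<time>t\d+) )?'
    r'(?P<date>\d{2}-\d{2}-\d{4}) Results\.csv$'
)

def parse_results_name(path):
    """Split a results file name into group, replicate, time point and date."""
    match = NAME_RE.match(os.path.basename(path))
    if match is None:
        raise ValueError("Unexpected file name: " + path)
    info = match.groupdict()
    ## A name without a time tag is treated as t0
    info['time'] = info['time'] or 't0'
    return info

print(parse_results_name('LX929/output/dmso1 t3 05-19-2022 Results.csv'))
print(parse_results_name('LX929/output/dmso1 05-19-2022 Results.csv'))
```

Anchoring the pattern to the whole file name avoids accidental substring matches, at the cost of rejecting names that deviate from the assumed layout.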


In [3]:
## Define the control and treatment groups (only the dictionary keys are used)
control = {'DMSO':None, 'OP50':None, 'Metab':None, 'EGCG':None}
treatment = {'fwl':None, 'fwm':None, 'fwh':None,
            'lwl':None, 'lwm':None, 'lwh':None,
            'fll':None, 'flm':None, 'flh':None,
            'lll':None, 'llm':None, 'llh':None}
            
print(control, [*control.keys()])
print(treatment)

{'DMSO': None, 'OP50': None, 'Metab': None, 'EGCG': None} ['DMSO', 'OP50', 'Metab', 'EGCG']
{'fwl': None, 'fwm': None, 'fwh': None, 'lwl': None, 'lwm': None, 'lwh': None, 'fll': None, 'flm': None, 'flh': None, 'lll': None, 'llm': None, 'llh': None}


## Controls and Treatment
We prepare a dataframe for both the controls and the treatments. The function `to_melted_dataframe` transforms the raw `imgfiles` and `csvfiles` lists into a dataframe that holds the `Mean`, `Area`, `RawIntDen`, group label (`control`), `time` point, and replicate (`rep`) for each row of the `ImageJ` results. We then compute the mean and the standard error at each time point, and visualize the results using `ggplot`. 

In [4]:
## Define function for controls and treatment
import cv2

def to_melted_dataframe(type, cols = ["control", "time", "RawIntDen"], csvfiles = csvfiles, imgfiles = imgfiles, img_type="png", save=True, filepath = 'controls.csv'):
    '''
    Returns a melted dataframe with the selected columns from csvfiles for the given type, i.e. control or treatment.
    Inputs:
            type        : control or treatment dictionary (only the keys are used)
            cols        : initial column names of the output dataframe
            csvfiles    : list of csv files
            imgfiles    : list of images 
            img_type    : file extension of the raw images e.g. png, tif
            save        : if True (default), write the dataframe to filepath
            filepath    : full path (.csv) where the dataframe is saved
    Output:
            sgdf        : Dataframe with cols of the type i.e. control/treatment
    '''

    sgdf = pd.DataFrame(columns=cols)
    for ctrlname in type:
        ctrls = [f for f in csvfiles if ctrlname.lower() in f]
        for f in ctrls:
            ## Time point from the file name; short names without a tag are t0
            pt = next((tag for tag in ('t0', 't3', 't6', 't9') if tag in f), None)
            if pt is None:
                if len(os.path.basename(f).split(" ")) < 4:
                    pt = 't0'
                else:
                    raise ValueError("Cannot determine the time point of " + f)

            df = pd.read_csv(f)
            ## Extract image
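            imgf = os.path.basename(f).replace(" Results.csv", "." + img_type)

            ## Filtering
            matches = [p for p in imgfiles if imgf.lower() in p]
            if not matches:
                raise FileNotFoundError("No image matching " + imgf)
            img = cv2.imread(matches[0])
            if img is None:
                raise OSError("Could not read image " + matches[0])
            G = img[:,:,1] ## Green channel (index 1 in OpenCV's BGR order)
            G = G[G>0] ## Remove zeroes first
            mean_intensity = np.mean(G) ## Take mean
            r = os.path.basename(f).split(" ")[0][-1] ## Replicate number, e.g. 'dmso1' -> '1'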
    
            ## Filter mean > mean_intensity and area > 1
            sdf = df[(df.Mean > mean_intensity)]
            sdf = sdf[(sdf.Area > 1)]

            ## Sanity check if this part here gets filtered
            # if len(df)!=len(sdf):
            #     print("Filtered: ", imgf, len(df), len(sdf))       
             
            sgdf = pd.concat([sgdf, pd.DataFrame({
                                        "Mean": sdf.Mean,
                                        "Area": sdf.Area,
                                        "RawIntDen":sdf.RawIntDen, 
                                        "control":[ctrlname]*len(sdf.RawIntDen),
                                        "time": [pt]*len(sdf.RawIntDen),
                                        "rep": r
                                        })], ignore_index=True)

    sgdf.RawIntDen = sgdf.RawIntDen.astype('float') 
    if save:
        sgdf.to_csv(filepath)
    return sgdf
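
The filtering step inside `to_melted_dataframe` can be looked at in isolation: the threshold is the mean of the non-zero green-channel pixels, and only rows with `Mean` above that threshold and `Area` above 1 are kept. The sketch below reproduces that logic on a synthetic image array and made-up measurements, so it runs without any image files. `green_threshold` and `filter_rois` are hypothetical helper names, not functions from this notebook.

```python
import numpy as np
import pandas as pd

def green_threshold(img):
    """Mean of the non-zero green-channel pixels of an H x W x 3 image array."""
    green = img[:, :, 1]        ## channel 1 is green in both RGB and BGR order
    green = green[green > 0]    ## ignore black background pixels
    if green.size == 0:
        raise ValueError("Image has no non-zero green pixels")
    return float(green.mean())

def filter_rois(df, threshold, min_area=1):
    """Keep rows whose Mean exceeds the threshold and whose Area exceeds min_area."""
    return df[(df['Mean'] > threshold) & (df['Area'] > min_area)]

## Synthetic 2 x 2 image and made-up measurements, for illustration only
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0, 1] = 100
img[0, 1, 1] = 200
rois = pd.DataFrame({'Mean': [120.0, 180.0, 250.0], 'Area': [4.0, 1.0, 2.0]})

threshold = green_threshold(img)
print(threshold)
print(filter_rois(rois, threshold))
```

Separating the threshold from the row filter makes each part easy to check on small inputs before running the full set of images.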

## 1. Controls

In [5]:
import os

## Apply the function to the control
ctrl_filepath = os.path.join(strain, 'controls-2.csv')
sgdf = to_melted_dataframe(control, csvfiles = csvfiles, imgfiles = imgfiles, img_type=img_type, save=True, filepath = ctrl_filepath)
print("Control: ", sgdf.control.unique())
sgdf.head(10)


Control:  ['DMSO' 'OP50' 'Metab' 'EGCG']


Unnamed: 0,control,time,RawIntDen,Mean,Area,rep
0,DMSO,t0,1244.0,207.333,6.0,1
1,DMSO,t0,1268.0,211.333,6.0,1
2,DMSO,t0,1566.0,195.75,8.0,1
3,DMSO,t0,508.0,254.0,2.0,1
4,DMSO,t0,550.0,137.5,4.0,1
5,DMSO,t0,348.0,174.0,2.0,1
6,DMSO,t0,954.0,238.5,4.0,1
7,DMSO,t0,488.0,244.0,2.0,1
8,DMSO,t0,508.0,254.0,2.0,1
9,DMSO,t0,504.0,252.0,2.0,1


## 2. Treatment

In [6]:
import os

## Apply the function to the treatment
tx_filepath = os.path.join(strain, 'treatment-2.csv')
tgdf = to_melted_dataframe(treatment, csvfiles = csvfiles, imgfiles = imgfiles, img_type=img_type, save=True, filepath = tx_filepath)
print("Treatment: ", tgdf.control.unique())
tgdf.head(10)


Treatment:  ['fwl' 'fwm' 'fwh' 'lwl' 'lwm' 'lwh' 'fll' 'flm' 'flh' 'lll' 'llm' 'llh']


Unnamed: 0,control,time,RawIntDen,Mean,Area,rep
0,fwl,t0,1127.0,225.4,5.0,1
1,fwl,t0,414.0,207.0,2.0,1
2,fwl,t0,508.0,254.0,2.0,1
3,fwl,t0,222.0,111.0,2.0,1
4,fwl,t0,1162.0,193.667,6.0,1
5,fwl,t0,244.0,122.0,2.0,1
6,fwl,t0,2847.0,167.471,17.0,1
7,fwl,t0,570.0,142.5,4.0,1
8,fwl,t0,455.0,151.667,3.0,1
9,fwl,t0,290.0,145.0,2.0,1
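
The summary step mentioned earlier (mean and standard error at each time point) could be sketched with a `pandas` group-by, as below. This is an illustration under assumptions: the values are made up, `RawIntDen` is assumed to be the column of interest, and the plotting step is not shown.

```python
import pandas as pd

## Made-up measurements in the same layout as the melted dataframe (illustration only)
toy = pd.DataFrame({
    'control':   ['DMSO', 'DMSO', 'DMSO', 'DMSO', 'OP50', 'OP50'],
    'time':      ['t0', 't0', 't3', 't3', 't0', 't0'],
    'RawIntDen': [100.0, 200.0, 300.0, 500.0, 50.0, 150.0],
})

## Mean, standard error of the mean, and sample count per group and time point
summary = (toy.groupby(['control', 'time'])['RawIntDen']
              .agg(mean='mean', sem='sem', n='count')
              .reset_index())
print(summary)
```

The same call should work on `sgdf` or `tgdf`, since both carry the `control`, `time`, and `RawIntDen` columns.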
