Taking the output from DEG-SEQ2, the file "5LS_L2L3Combined.csv" contains the expression counts for the 5 life stages we are interested in:
Embryo, L1 larva, Dauer larva, L2L3 larva and Adult. Let's take a peek at that data.

In [1]:
import csv
import os

# User-configurable variables
number_of_lines_to_print = 10
expressionCountFile = os.path.join(os.getcwd(), 'csvs/5LS_L2L3Combined.csv')

# Print the first few rows of the file (the header row counts as one line).
# expressionCountFile is already an absolute path, so it is opened directly.
with open(expressionCountFile, newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for line_number, row in enumerate(csv_reader, start=1):
        print(row)
        if line_number >= number_of_lines_to_print:
            break

['WBID', 'elongating embryo Ce', 'L1 larva Ce', 'dauer larva Ce', 'adult Ce', 'L2L3_larva']
['WBGene00000001', '4208', '12140', '5547', '2246', '2369']
['WBGene00000002', '12554', '7828', '831', '280', '2591']
['WBGene00000003', '7180', '11253', '570', '212', '2466']
['WBGene00000004', '33305', '26947', '3212', '576', '5391']
['WBGene00000005', '595', '132', '37', '281', '1410']
['WBGene00000006', '425', '12243', '3146', '228', '2446']
['WBGene00000007', '36', '314', '129', '197', '1719']
['WBGene00000008', '0', '19', '663', '19', '182']
['WBGene00000009', '71', '416', '193', '20', '64']


Now we need to determine which genes we consider to be life stage biased. A gene must fulfill the following criteria to be considered biased toward a life stage:

<ol start="1">   
  <li>This gene has the highest expression in that life stage</li>
  <li>This gene's expression in that life stage is at least 2-fold higher than its highest expression in any other life stage</li>
  <li>At least one life stage has a count above the 10th percentile of counts across all life stages (that is, higher than the lowest 10% of counts). <br>
      <font size="2">*This ensures we don't include genes whose high fold difference comes from unbalanced, very low expression counts. For example, a gene with a count of 1 in one life stage and 0 in all the others is uniformly lowly expressed, yet the fold-difference criterion alone would give it a fold difference of infinity. By setting a lower-bound filter, we exclude these extremely low counts, which are prone to sequencing uncertainty. </font> </li>
  
</ol>
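To make the three criteria concrete, here is a minimal pandas sketch of how they could be applied. It is not the code of the `LifeStageBiased` module used in the next cell. The function name `life_stage_biased` and its parameters are hypothetical, and criterion 3 is read here as "the gene's highest count must exceed the 10% quantile of all counts pooled across genes and life stages", which is an assumption. The three demo rows are taken from the file preview above.

```python
import numpy as np
import pandas as pd


def life_stage_biased(counts, min_fold=2.0, low_quantile=0.1):
    """Hypothetical helper: keep genes that pass the three criteria.

    counts: DataFrame indexed by gene ID, one numeric column per life stage.
    """
    values = counts.to_numpy(dtype=float)

    # Criterion 3 (assumed reading): lower bound taken from all counts pooled.
    floor = np.quantile(values, low_quantile)

    # Sort each gene's counts so the last two entries are max and runner-up.
    ordered = np.sort(values, axis=1)
    top, second = ordered[:, -1], ordered[:, -2]

    # Criterion 2: fold difference of the top stage over the second highest.
    # Division by zero gives inf (or nan when both are 0) instead of raising.
    with np.errstate(divide="ignore", invalid="ignore"):
        fold = top / second

    result = pd.DataFrame(
        {
            # Criterion 1: the life stage with the highest expression.
            "LS": counts.columns.to_numpy()[values.argmax(axis=1)],
            "LS_EXP": top,
            "SecondMax": second,
            "FoldDiff": fold,
        },
        index=counts.index,
    )
    keep = (fold >= min_fold) & (top > floor)
    return result[keep]


# Three rows copied from the preview of 5LS_L2L3Combined.csv above.
counts = pd.DataFrame(
    [
        [4208, 12140, 5547, 2246, 2369],
        [12554, 7828, 831, 280, 2591],
        [0, 19, 663, 19, 182],
    ],
    index=["WBGene00000001", "WBGene00000002", "WBGene00000008"],
    columns=["elongating embryo Ce", "L1 larva Ce", "dauer larva Ce",
             "adult Ce", "L2L3_larva"],
)
biased = life_stage_biased(counts)
print(biased)
```

With these rows, WBGene00000001 passes as L1 biased (12140 against a runner-up of 5547, about 2.19-fold) and WBGene00000008 passes as dauer biased (663 against 182, about 3.64-fold). WBGene00000002 is dropped because its top count of 12554 is only about 1.6 times its second-highest count of 7828.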


Let's process the expression file using the criteria above:

In [2]:
from Code import LifeStageBiased as LSB
#Specify input and output
LSB.inputFile= expressionCountFile
outputFilePath=os.path.join(os.getcwd(),'csvs/LSB.csv')
LSB.outputFile= outputFilePath
LSB.cutLowPercentile=0.1
LSB.fixedCutValue=0 #This overrides the percentile cut value; setting it to 0 disables it
LSB.main()


Now the genes that fit our criteria should be in the *outputFilePath* we set earlier. Let's take a look at that:

In [3]:
import pandas as pd
data = pd.read_csv(outputFilePath)

print(data.columns)

# Sort so the genes with the largest fold difference come first
data.sort_values(by=['FoldDiff'], ascending=False)





Index(['GeneID', 'LS', 'LS_EXP', 'SecondMax', 'RestMean', 'FoldDiff'], dtype='object')


              GeneID                    LS   LS_EXP  SecondMax  RestMean  FoldDiff
5139  WBGene00013184           L1 larva Ce      7.0        0.0      0.00       inf
9886  WBGene00219446        dauer larva Ce    924.0        0.0      0.00       inf
8234  WBGene00021108        dauer larva Ce      9.0        0.0      0.00       inf
4800  WBGene00012380           L1 larva Ce     21.0        0.0      0.00       inf
3771  WBGene00009934           L1 larva Ce      6.0        0.0      0.00       inf
 ...             ...                   ...      ...        ...       ...       ...
3890  WBGene00010192           L1 larva Ce   5765.0     2880.0   1131.50  2.001736
4693  WBGene00012132  elongating embryo Ce   1285.0      642.0    246.75  2.001558
3564  WBGene00009477           L1 larva Ce  29770.0    14874.0  11794.75  2.001479
7656  WBGene00019778           L1 larva Ce   7765.0     3880.0   2639.25  2.001289
