# Introduction to Pandas

In this notebook I briefly show how to do simple data analysis in a Python notebook using Pandas and, to a lesser extent, NumPy.

[Pandas](http://pandas.pydata.org/) is a Python data analysis library that offers many conveniences for the procedures commonly needed during data analysis. Pandas itself is largely built on [NumPy](http://www.numpy.org/), which is used in Python for more complex numerical computations (matrix calculations and so on). I will not cover NumPy in much detail in this notebook.

To use Pandas it has to be imported. On import, Python loads the `pandas` and `numpy` libraries into memory so that they can be used.

In [1]:
import numpy as np    #Numpy import
import pandas as pd   #Pandase import

# to display plots later on, this line has to be entered...
%matplotlib notebook

## DataFrame

`DataFrame` is one of the core Pandas data structures. In principle you can think of it as an Excel table.

A DataFrame can be created with the Pandas constructor `pd.DataFrame` by passing it a *dictionary*.

In a Pandas `DataFrame` the data is arranged in columns and indexed by row. If no index is specified, it is simply the row number, counting from zero.

In [2]:
test_dict = {
    'samplename': ['Sample1A' ,'Sample1B' ,'Sample1C',
                   'Sample2A' ,'Sample2B' ,'Sample2C',
                   'Sample3A' ,'Sample3B' ,'Sample3C',
                   'Sample4A' ,'Sample4B' ,'Sample4C',],
    'konstrukt' : ['plasmiid1','plasmiid1','plasmiid1',
                   'plasmiid1','plasmiid1','plasmiid1',
                   'plasmiid2','plasmiid2','plasmiid2',
                   'plasmiid2','plasmiid2','plasmiid2'],
    't88tlus' :   ['dmso','dmso','dmso',
                   'aine','aine','aine',
                   'dmso','dmso','dmso',
                   'aine','aine','aine'],
    'luc2p' : [ 450, 543, 243,
                456, 644, 453,
               1234,1342,1532,
               4500,6453,4800]}

In [3]:
df = pd.DataFrame(test_dict)
df

Unnamed: 0,konstrukt,luc2p,samplename,t88tlus
0,plasmiid1,450,Sample1A,dmso
1,plasmiid1,543,Sample1B,dmso
2,plasmiid1,243,Sample1C,dmso
3,plasmiid1,456,Sample2A,aine
4,plasmiid1,644,Sample2B,aine
5,plasmiid1,453,Sample2C,aine
6,plasmiid2,1234,Sample3A,dmso
7,plasmiid2,1342,Sample3B,dmso
8,plasmiid2,1532,Sample3C,dmso
9,plasmiid2,4500,Sample4A,aine


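When creating a `DataFrame` you can also specify the order of the columns. Otherwise the order depends on the Pandas version: older versions sort the dictionary keys alphabetically (as in the output above), while newer versions keep the order in which the keys were inserted.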

In [4]:
df = pd.DataFrame(test_dict,
                  columns = ['samplename','konstrukt','t88tlus','luc2p'])
df

Unnamed: 0,samplename,konstrukt,t88tlus,luc2p
0,Sample1A,plasmiid1,dmso,450
1,Sample1B,plasmiid1,dmso,543
2,Sample1C,plasmiid1,dmso,243
3,Sample2A,plasmiid1,aine,456
4,Sample2B,plasmiid1,aine,644
5,Sample2C,plasmiid1,aine,453
6,Sample3A,plasmiid2,dmso,1234
7,Sample3B,plasmiid2,dmso,1342
8,Sample3C,plasmiid2,dmso,1532
9,Sample4A,plasmiid2,aine,4500


In a `DataFrame` the rows are arranged by the `index`. If it is not specified, it is simply the sequence number. It is often better, though, to label the rows with something that is meaningful to us, for example the sample names:

In [5]:
df = pd.DataFrame(test_dict,
                  columns = ['konstrukt','t88tlus','luc2p'],
                  index = test_dict['samplename'])
df

Unnamed: 0,konstrukt,t88tlus,luc2p
Sample1A,plasmiid1,dmso,450
Sample1B,plasmiid1,dmso,543
Sample1C,plasmiid1,dmso,243
Sample2A,plasmiid1,aine,456
Sample2B,plasmiid1,aine,644
Sample2C,plasmiid1,aine,453
Sample3A,plasmiid2,dmso,1234
Sample3B,plasmiid2,dmso,1342
Sample3C,plasmiid2,dmso,1532
Sample4A,plasmiid2,aine,4500


An index can also be set on an existing DataFrame with the `set_index('column')` method. It returns a new `DataFrame` in which the chosen column has become the index.

In [6]:
df = pd.DataFrame(test_dict,
                  columns = ['samplename','konstrukt','t88tlus','luc2p'])

df.set_index('samplename')

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample1A,plasmiid1,dmso,450
Sample1B,plasmiid1,dmso,543
Sample1C,plasmiid1,dmso,243
Sample2A,plasmiid1,aine,456
Sample2B,plasmiid1,aine,644
Sample2C,plasmiid1,aine,453
Sample3A,plasmiid2,dmso,1234
Sample3B,plasmiid2,dmso,1342
Sample3C,plasmiid2,dmso,1532
Sample4A,plasmiid2,aine,4500
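
Because `set_index` returns a new `DataFrame`, the result has to be assigned to a name if it is needed later. A minimal sketch on a two-row subset of the table above (the name `df_indexed` is only chosen for this illustration):

```python
import pandas as pd

# Two rows taken from the example table, just for illustration.
df = pd.DataFrame({
    "samplename": ["Sample1A", "Sample1B"],
    "luc2p": [450, 543],
})

# set_index returns a new DataFrame; df itself keeps its numeric index.
df_indexed = df.set_index("samplename")

print(list(df.index))          # the original index is untouched
print(list(df_indexed.index))  # the sample names are now the index
```

In the cell above the result of `set_index` is only displayed, so `df` still has its default numeric index afterwards.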


## Loading data from files

Data can also be loaded into a `DataFrame` from files. The list of supported file types is long and can be found in the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/io.html). The files we encounter most often are Excel (`.xlsx`) and *comma separated values* (`.csv`) files. Pandas can load both without trouble.

To load an Excel file, use the Pandas `read_excel` function:

In [7]:
df = pd.read_excel("example_data/02-exp1.xlsx")
df

Unnamed: 0,samplename,konstrukt,t88tlus,luc2p
0,Sample1A,plasmiid1,dmso,450
1,Sample1B,plasmiid1,dmso,543
2,Sample1C,plasmiid1,dmso,243
3,Sample2A,plasmiid1,aine,456
4,Sample2B,plasmiid1,aine,644
5,Sample2C,plasmiid1,aine,453
6,Sample3A,plasmiid2,dmso,1234
7,Sample3B,plasmiid2,dmso,1342
8,Sample3C,plasmiid2,dmso,1532
9,Sample4A,plasmiid2,aine,4500


Given just a file name, pandas loads only the first worksheet. To get the others, pass the sheet name to `read_excel` as well:

In [8]:
df = pd.read_excel("example_data/02-exp1.xlsx", sheet_name="Sheet2")
df

Unnamed: 0,samplename,konstrukt,t88tlus,luc2p
0,Sample5A,plasmiid1,dmso,576
1,Sample5B,plasmiid1,dmso,565
2,Sample5C,plasmiid1,dmso,676
3,Sample6A,plasmiid1,aine,908
4,Sample6B,plasmiid1,aine,890
5,Sample6C,plasmiid1,aine,890
6,Sample7A,plasmiid2,dmso,4324
7,Sample7B,plasmiid2,dmso,2421
8,Sample7C,plasmiid2,dmso,1242
9,Sample8A,plasmiid2,aine,6785


When loading data you can also specify which column Pandas should use as the `index`. This is done with the `index_col` argument. Column numbering starts from zero.

In [9]:
df = pd.read_excel("example_data/02-exp1.xlsx", sheet_name="Sheet2", index_col=0)
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample5A,plasmiid1,dmso,576
Sample5B,plasmiid1,dmso,565
Sample5C,plasmiid1,dmso,676
Sample6A,plasmiid1,aine,908
Sample6B,plasmiid1,aine,890
Sample6C,plasmiid1,aine,890
Sample7A,plasmiid2,dmso,4324
Sample7B,plasmiid2,dmso,2421
Sample7C,plasmiid2,dmso,1242
Sample8A,plasmiid2,aine,6785
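
The `.csv` files mentioned at the start of this section are loaded in much the same way with `pd.read_csv`. The sketch below reads from an in-memory string instead of a real file so that it runs on its own; the text in `csv_text` is an assumed stand-in for a file on disk.

```python
import io
import pandas as pd

# Stand-in for a .csv file on disk; with a real file, pass its path instead.
csv_text = """samplename,konstrukt,t88tlus,luc2p
Sample1A,plasmiid1,dmso,450
Sample1B,plasmiid1,dmso,543
"""

# index_col works the same way as in read_excel.
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_csv)
```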


## Extracting data from a `DataFrame`

Slices of a `DataFrame` can be extracted with the `.loc` and `.iloc` indexers. The first selects by row and column labels. The second selects by position, so to speak by coordinates. For the examples below I load the data from the first sheet of the Excel file:

In [10]:
df = pd.read_excel("example_data/02-exp1.xlsx", index_col=0)
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample1A,plasmiid1,dmso,450
Sample1B,plasmiid1,dmso,543
Sample1C,plasmiid1,dmso,243
Sample2A,plasmiid1,aine,456
Sample2B,plasmiid1,aine,644
Sample2C,plasmiid1,aine,453
Sample3A,plasmiid2,dmso,1234
Sample3B,plasmiid2,dmso,1342
Sample3C,plasmiid2,dmso,1532
Sample4A,plasmiid2,aine,4500


### `.loc[ [rows], [columns] ]`

To use it, pass Python *list*s with the labels of the rows and columns you want:
```
df.loc[ [... list of row index labels ...] , [list of column names] ]
```

In [11]:
df.loc[['Sample1A', 'Sample2A'],['konstrukt','luc2p']]

Unnamed: 0_level_0,konstrukt,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1
Sample1A,plasmiid1,450
Sample2A,plasmiid1,456


Replacing one or the other with a colon `:` gives all columns for the chosen rows or, correspondingly,
all rows for the chosen columns.

In [12]:
df.loc[['Sample1A', 'Sample2A'],:]

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample1A,plasmiid1,dmso,450
Sample2A,plasmiid1,aine,456


In [13]:
df.loc[:,['konstrukt','luc2p']]

Unnamed: 0_level_0,konstrukt,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1
Sample1A,plasmiid1,450
Sample1B,plasmiid1,543
Sample1C,plasmiid1,243
Sample2A,plasmiid1,456
Sample2B,plasmiid1,644
Sample2C,plasmiid1,453
Sample3A,plasmiid2,1234
Sample3B,plasmiid2,1342
Sample3C,plasmiid2,1532
Sample4A,plasmiid2,4500
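
Besides lists, `.loc` also accepts label slices and single labels. A small sketch on a four-row subset of the table (the variable names are only for this illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"konstrukt": ["plasmiid1"] * 4, "luc2p": [450, 543, 243, 456]},
    index=["Sample1A", "Sample1B", "Sample1C", "Sample2A"],
)

# A label slice with .loc includes BOTH end points.
first_three = df.loc["Sample1A":"Sample1C", ["luc2p"]]

# A single row label together with a single column name returns one value.
one_value = df.loc["Sample1B", "luc2p"]

print(first_three)
print(one_value)
```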


### `.iloc[ [ row numbers ], [ column numbers ] ]`

Another way to access the data in rows and columns is the `.iloc` indexer.
It accesses the data by the integer positions of the rows and columns, counting from zero.

In [14]:
df.iloc[[0, 1],[0,1]]

Unnamed: 0_level_0,konstrukt,t88tlus
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1
Sample1A,plasmiid1,dmso
Sample1B,plasmiid1,dmso


In [15]:
df.iloc[[0, 1],:]

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample1A,plasmiid1,dmso,450
Sample1B,plasmiid1,dmso,543


In [16]:
df.iloc[:,[0, 1]]

Unnamed: 0_level_0,konstrukt,t88tlus
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1
Sample1A,plasmiid1,dmso
Sample1B,plasmiid1,dmso
Sample1C,plasmiid1,dmso
Sample2A,plasmiid1,aine
Sample2B,plasmiid1,aine
Sample2C,plasmiid1,aine
Sample3A,plasmiid2,dmso
Sample3B,plasmiid2,dmso
Sample3C,plasmiid2,dmso
Sample4A,plasmiid2,aine
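
`.iloc` follows the usual Python slicing rules, which differ from the label slices of `.loc`. A short sketch, again on a made-up four-row subset:

```python
import pandas as pd

df = pd.DataFrame(
    {"konstrukt": ["plasmiid1"] * 4, "luc2p": [450, 543, 243, 456]},
    index=["Sample1A", "Sample1B", "Sample1C", "Sample2A"],
)

# A positional slice excludes the end position, like ordinary Python slices.
top_two = df.iloc[0:2, :]

# The same rows selected with an explicit list of positions.
same_rows = df.iloc[[0, 1], :]

# Negative positions count from the end.
last_row = df.iloc[-1, :]

print(top_two)
print(last_row["luc2p"])
```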


Data in existing columns can also be modified through `iloc` and `loc`.

In [17]:
df_muudetud = df.copy()
df_muudetud.loc[["Sample1A"],["luc2p"]] = 10000000
df_muudetud

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample1A,plasmiid1,dmso,10000000
Sample1B,plasmiid1,dmso,543
Sample1C,plasmiid1,dmso,243
Sample2A,plasmiid1,aine,456
Sample2B,plasmiid1,aine,644
Sample2C,plasmiid1,aine,453
Sample3A,plasmiid2,dmso,1234
Sample3B,plasmiid2,dmso,1342
Sample3C,plasmiid2,dmso,1532
Sample4A,plasmiid2,aine,4500


## Doing calculations with a `DataFrame`

Calculations with a `DataFrame` are done column by column. For simpler calculations you do not need `.loc` or `.iloc` to access a column's data; you can simply use the column name as the index.

As an example I again use the data from the first sheet of the Excel file:

In [18]:
df_algne = pd.read_excel("example_data/02-exp1.xlsx", index_col=0)
df_algne

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Sample1A,plasmiid1,dmso,450
Sample1B,plasmiid1,dmso,543
Sample1C,plasmiid1,dmso,243
Sample2A,plasmiid1,aine,456
Sample2B,plasmiid1,aine,644
Sample2C,plasmiid1,aine,453
Sample3A,plasmiid2,dmso,1234
Sample3B,plasmiid2,dmso,1342
Sample3C,plasmiid2,dmso,1532
Sample4A,plasmiid2,aine,4500


This way the standard arithmetic operations can be done on columns: multiplication, division, etc.

In [19]:
# make a copy of the loaded Excel table just for experimenting
df = df_algne.copy()

In [20]:
df["luc2p-10x"] = df["luc2p"]*10
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,luc2p-10x
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Sample1A,plasmiid1,dmso,450,4500
Sample1B,plasmiid1,dmso,543,5430
Sample1C,plasmiid1,dmso,243,2430
Sample2A,plasmiid1,aine,456,4560
Sample2B,plasmiid1,aine,644,6440
Sample2C,plasmiid1,aine,453,4530
Sample3A,plasmiid2,dmso,1234,12340
Sample3B,plasmiid2,dmso,1342,13420
Sample3C,plasmiid2,dmso,1532,15320
Sample4A,plasmiid2,aine,4500,45000


Data from other columns can be used in calculations as well:

In [21]:
# note: despite the column name, this is a difference, not a ratio
df["luc2p-ratio"] = df["luc2p"] - df["luc2p-10x"]
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,luc2p-10x,luc2p-ratio
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Sample1A,plasmiid1,dmso,450,4500,-4050
Sample1B,plasmiid1,dmso,543,5430,-4887
Sample1C,plasmiid1,dmso,243,2430,-2187
Sample2A,plasmiid1,aine,456,4560,-4104
Sample2B,plasmiid1,aine,644,6440,-5796
Sample2C,plasmiid1,aine,453,4530,-4077
Sample3A,plasmiid2,dmso,1234,12340,-11106
Sample3B,plasmiid2,dmso,1342,13420,-12078
Sample3C,plasmiid2,dmso,1532,15320,-13788
Sample4A,plasmiid2,aine,4500,45000,-40500


**Note/Warning** - When doing calculations like this, always store the result in a new column. Do not use the result column in the calculation itself and do not overwrite the values of an existing column, because then the result depends on how many times the code has been run!

In [22]:
df_vale = df.copy()

In [23]:
df_vale["luc2p"] = df_vale["luc2p"]*10
df_vale

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,luc2p-10x,luc2p-ratio
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Sample1A,plasmiid1,dmso,4500,4500,-4050
Sample1B,plasmiid1,dmso,5430,5430,-4887
Sample1C,plasmiid1,dmso,2430,2430,-2187
Sample2A,plasmiid1,aine,4560,4560,-4104
Sample2B,plasmiid1,aine,6440,6440,-5796
Sample2C,plasmiid1,aine,4530,4530,-4077
Sample3A,plasmiid2,dmso,12340,12340,-11106
Sample3B,plasmiid2,dmso,13420,13420,-12078
Sample3C,plasmiid2,dmso,15320,15320,-13788
Sample4A,plasmiid2,aine,45000,45000,-40500
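
The effect described in the warning can be reproduced in a few lines. This is only a sketch: the two helper functions are hypothetical and stand in for running a notebook cell more than once.

```python
import pandas as pd

df = pd.DataFrame({"luc2p": [450, 543, 243]})

def scale_in_place(frame):
    """Overwrites the source column: every call changes the data again."""
    frame["luc2p"] = frame["luc2p"] * 10

def scale_to_new_column(frame):
    """Writes to a new column: repeated calls give the same result."""
    frame["luc2p-10x"] = frame["luc2p"] * 10

safe = df.copy()
scale_to_new_column(safe)
scale_to_new_column(safe)   # "running the cell" a second time changes nothing

unsafe = df.copy()
scale_in_place(unsafe)
scale_in_place(unsafe)      # the second run multiplies by 10 again

print(safe["luc2p-10x"].tolist())   # [4500, 5430, 2430]
print(unsafe["luc2p"].tolist())     # [45000, 54300, 24300]
```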


NumPy functions can also be used for calculations on columns, for example:

  * `np.log`
  * `np.min`
  * `np.max`
  * `np.exp`
  * `np.power`
  * `np.mean`
  * `np.std`

In [24]:
# take a fresh copy of the original `DataFrame`
df = df_algne.copy()

In [25]:
df["log-luc2p"] = np.log( df["luc2p"] )
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Sample1A,plasmiid1,dmso,450,6.109248
Sample1B,plasmiid1,dmso,543,6.297109
Sample1C,plasmiid1,dmso,243,5.493061
Sample2A,plasmiid1,aine,456,6.122493
Sample2B,plasmiid1,aine,644,6.467699
Sample2C,plasmiid1,aine,453,6.115892
Sample3A,plasmiid2,dmso,1234,7.118016
Sample3B,plasmiid2,dmso,1342,7.201916
Sample3C,plasmiid2,dmso,1532,7.334329
Sample4A,plasmiid2,aine,4500,8.411833


With `np.min` and `np.max` the minimum or maximum of a whole column can be used in calculations. Passing a whole Pandas DataFrame to these functions returns the smallest (or largest) value of each column:

In [26]:
np.min(df)

konstrukt    plasmiid1
t88tlus           aine
luc2p              243
log-luc2p      5.49306
dtype: object

Given a single column, they return the smallest, largest, etc. value of that column:

In [27]:
np.min(df["luc2p"]), np.max(df["luc2p"]), np.mean(df["luc2p"])

(243, 6453, 1887.5)

In [28]:
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Sample1A,plasmiid1,dmso,450,6.109248
Sample1B,plasmiid1,dmso,543,6.297109
Sample1C,plasmiid1,dmso,243,5.493061
Sample2A,plasmiid1,aine,456,6.122493
Sample2B,plasmiid1,aine,644,6.467699
Sample2C,plasmiid1,aine,453,6.115892
Sample3A,plasmiid2,dmso,1234,7.118016
Sample3B,plasmiid2,dmso,1342,7.201916
Sample3C,plasmiid2,dmso,1532,7.334329
Sample4A,plasmiid2,aine,4500,8.411833


Mean-centering, i.e. subtracting the column mean from every value:

In [29]:
df["luc2p-meancenter"] = df["luc2p"] - np.mean(df["luc2p"])
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p,luc2p-meancenter
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Sample1A,plasmiid1,dmso,450,6.109248,-1437.5
Sample1B,plasmiid1,dmso,543,6.297109,-1344.5
Sample1C,plasmiid1,dmso,243,5.493061,-1644.5
Sample2A,plasmiid1,aine,456,6.122493,-1431.5
Sample2B,plasmiid1,aine,644,6.467699,-1243.5
Sample2C,plasmiid1,aine,453,6.115892,-1434.5
Sample3A,plasmiid2,dmso,1234,7.118016,-653.5
Sample3B,plasmiid2,dmso,1342,7.201916,-545.5
Sample3C,plasmiid2,dmso,1532,7.334329,-355.5
Sample4A,plasmiid2,aine,4500,8.411833,2612.5


For example, the data can be rescaled to the range 0 to 1, so-called *min-max* scaling:

\begin{equation}
X_{minmax} = \frac{X - X_{min}}{X_{max} - X_{min}}
\end{equation}


In [30]:
df["luc2p-minmax"] = ( df["luc2p"] - np.min(df["luc2p"]) ) / ( np.max(df["luc2p"]) - np.min(df["luc2p"]) )
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Sample1A,plasmiid1,dmso,450,6.109248,-1437.5,0.033333
Sample1B,plasmiid1,dmso,543,6.297109,-1344.5,0.048309
Sample1C,plasmiid1,dmso,243,5.493061,-1644.5,0.0
Sample2A,plasmiid1,aine,456,6.122493,-1431.5,0.0343
Sample2B,plasmiid1,aine,644,6.467699,-1243.5,0.064573
Sample2C,plasmiid1,aine,453,6.115892,-1434.5,0.033816
Sample3A,plasmiid2,dmso,1234,7.118016,-653.5,0.159581
Sample3B,plasmiid2,dmso,1342,7.201916,-545.5,0.176973
Sample3C,plasmiid2,dmso,1532,7.334329,-355.5,0.207568
Sample4A,plasmiid2,aine,4500,8.411833,2612.5,0.685507


To make this scaling more convenient to use on other columns as well, we can write a function for it:

In [31]:
def minmax_scale(x):
    """Scale the data to the range 0 to 1."""
    return ( x - np.min(x) ) / ( np.max(x) - np.min(x) )

In [32]:
df["minmax-norm-func"] = minmax_scale(df["luc2p"])
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Sample1A,plasmiid1,dmso,450,6.109248,-1437.5,0.033333,0.033333
Sample1B,plasmiid1,dmso,543,6.297109,-1344.5,0.048309,0.048309
Sample1C,plasmiid1,dmso,243,5.493061,-1644.5,0.0,0.0
Sample2A,plasmiid1,aine,456,6.122493,-1431.5,0.0343,0.0343
Sample2B,plasmiid1,aine,644,6.467699,-1243.5,0.064573,0.064573
Sample2C,plasmiid1,aine,453,6.115892,-1434.5,0.033816,0.033816
Sample3A,plasmiid2,dmso,1234,7.118016,-653.5,0.159581,0.159581
Sample3B,plasmiid2,dmso,1342,7.201916,-545.5,0.176973,0.176973
Sample3C,plasmiid2,dmso,1532,7.334329,-355.5,0.207568,0.207568
Sample4A,plasmiid2,aine,4500,8.411833,2612.5,0.685507,0.685507


In [33]:
df["minmax-norm-log"] = minmax_scale(df["log-luc2p"])
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func,minmax-norm-log
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Sample1A,plasmiid1,dmso,450,6.109248,-1437.5,0.033333,0.033333,0.187905
Sample1B,plasmiid1,dmso,543,6.297109,-1344.5,0.048309,0.048309,0.245193
Sample1C,plasmiid1,dmso,243,5.493061,-1644.5,0.0,0.0,0.0
Sample2A,plasmiid1,aine,456,6.122493,-1431.5,0.0343,0.0343,0.191944
Sample2B,plasmiid1,aine,644,6.467699,-1243.5,0.064573,0.064573,0.297214
Sample2C,plasmiid1,aine,453,6.115892,-1434.5,0.033816,0.033816,0.189931
Sample3A,plasmiid2,dmso,1234,7.118016,-653.5,0.159581,0.159581,0.495528
Sample3B,plasmiid2,dmso,1342,7.201916,-545.5,0.176973,0.176973,0.521113
Sample3C,plasmiid2,dmso,1532,7.334329,-355.5,0.207568,0.207568,0.561492
Sample4A,plasmiid2,aine,4500,8.411833,2612.5,0.685507,0.685507,0.890076
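
Instead of calling the function once per column, it can also be handed to `DataFrame.apply`, which calls it on every selected column in turn. A minimal sketch with four of the `luc2p` values from the table; the name `scaled` is only chosen for this illustration.

```python
import numpy as np
import pandas as pd

def minmax_scale(x):
    """Scale the values of x to the range 0 to 1."""
    return (x - np.min(x)) / (np.max(x) - np.min(x))

df = pd.DataFrame({"luc2p": [450, 543, 243, 4500]})
df["log-luc2p"] = np.log(df["luc2p"])

# DataFrame.apply calls the function once for each column.
scaled = df[["luc2p", "log-luc2p"]].apply(minmax_scale)
print(scaled)
```

Note that the function divides by zero if all values in a column are equal, so a constant column would need separate handling.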


## Calculations on groups


To do calculations on groups in pandas, use the `.groupby` method. On the resulting groups you can run simple calculations with the built-in group functions `.mean()`, `.std()`, `.min()`, `.max()` and `.count()`. All of these return a Pandas `DataFrame` again.

### `.mean()`, `.std()`, `.min()`, `.max()`, `.count()`

In [34]:
# the current DataFrame with the calculations
df

Unnamed: 0_level_0,konstrukt,t88tlus,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func,minmax-norm-log
samplename,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Sample1A,plasmiid1,dmso,450,6.109248,-1437.5,0.033333,0.033333,0.187905
Sample1B,plasmiid1,dmso,543,6.297109,-1344.5,0.048309,0.048309,0.245193
Sample1C,plasmiid1,dmso,243,5.493061,-1644.5,0.0,0.0,0.0
Sample2A,plasmiid1,aine,456,6.122493,-1431.5,0.0343,0.0343,0.191944
Sample2B,plasmiid1,aine,644,6.467699,-1243.5,0.064573,0.064573,0.297214
Sample2C,plasmiid1,aine,453,6.115892,-1434.5,0.033816,0.033816,0.189931
Sample3A,plasmiid2,dmso,1234,7.118016,-653.5,0.159581,0.159581,0.495528
Sample3B,plasmiid2,dmso,1342,7.201916,-545.5,0.176973,0.176973,0.521113
Sample3C,plasmiid2,dmso,1532,7.334329,-355.5,0.207568,0.207568,0.561492
Sample4A,plasmiid2,aine,4500,8.411833,2612.5,0.685507,0.685507,0.890076


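These functions do the calculation on each numeric column separately:

In [35]:
# numeric_only=True skips the text column t88tlus (needed in pandas 2.0 and later)
df.groupby(['konstrukt']).mean(numeric_only=True)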

Unnamed: 0_level_0,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func,minmax-norm-log
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
plasmiid1,464.833333,6.100917,-1422.666667,0.035722,0.035722,0.185365
plasmiid2,3310.166667,7.885794,1422.666667,0.493908,0.493908,0.729661


If we are only interested in the means of a single column, it can be selected before grouping:

In [36]:
df[["konstrukt","luc2p"]].groupby("konstrukt").mean()

Unnamed: 0_level_0,luc2p
konstrukt,Unnamed: 1_level_1
plasmiid1,464.833333
plasmiid2,3310.166667
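
The column can also be selected on the grouped object itself, which avoids having to keep the grouping column in the selection. A small sketch with four values from the table (one sample per construct and treatment):

```python
import pandas as pd

df = pd.DataFrame({
    "konstrukt": ["plasmiid1", "plasmiid1", "plasmiid2", "plasmiid2"],
    "t88tlus": ["dmso", "aine", "dmso", "aine"],
    "luc2p": [450, 456, 1234, 4500],
})

# Selecting the column after groupby returns a Series of group means.
means = df.groupby("konstrukt")["luc2p"].mean()
print(means)
```

Here the result is a `Series` indexed by `konstrukt`; selecting with a list, `[["luc2p"]]`, would give a one-column `DataFrame` instead.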


Grouping can also be done on several levels. For example, group by construct and treatment and then compute the means of the luc2P signal:

In [37]:
df.groupby(['konstrukt','t88tlus']).mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func,minmax-norm-log
konstrukt,t88tlus,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
plasmiid1,aine,517.666667,6.235361,-1369.833333,0.04423,0.04423,0.226363
plasmiid1,dmso,412.0,5.966473,-1475.5,0.027214,0.027214,0.144366
plasmiid2,aine,5251.0,8.553501,3363.5,0.806441,0.806441,0.933278
plasmiid2,dmso,1369.333333,7.218087,-518.166667,0.181374,0.181374,0.526045


Another way to get at the data of interest is to extract it afterwards from the resulting DataFrame:

In [38]:
df_means = df.groupby(['konstrukt','t88tlus']).mean()
df_means

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func,minmax-norm-log
konstrukt,t88tlus,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
plasmiid1,aine,517.666667,6.235361,-1369.833333,0.04423,0.04423,0.226363
plasmiid1,dmso,412.0,5.966473,-1475.5,0.027214,0.027214,0.144366
plasmiid2,aine,5251.0,8.553501,3363.5,0.806441,0.806441,0.933278
plasmiid2,dmso,1369.333333,7.218087,-518.166667,0.181374,0.181374,0.526045


Using plain column indexing:

In [39]:
df_means[ ["luc2p"] ]

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p
konstrukt,t88tlus,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


Or through the `.loc` indexer:

In [40]:
df_means.loc[:,["luc2p"]]

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p
konstrukt,t88tlus,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


The resulting `DataFrame` has a multi-level index. In some cases this is useful, but to get rid of it you can use the `.reset_index()` method:

In [41]:
df_means.reset_index()

Unnamed: 0,konstrukt,t88tlus,luc2p,log-luc2p,luc2p-meancenter,luc2p-minmax,minmax-norm-func,minmax-norm-log
0,plasmiid1,aine,517.666667,6.235361,-1369.833333,0.04423,0.04423,0.226363
1,plasmiid1,dmso,412.0,5.966473,-1475.5,0.027214,0.027214,0.144366
2,plasmiid2,aine,5251.0,8.553501,3363.5,0.806441,0.806441,0.933278
3,plasmiid2,dmso,1369.333333,7.218087,-518.166667,0.181374,0.181374,0.526045
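
One case where the multi-level index is handy is looking up a single group. A sketch, assuming the twelve `luc2p` values from the dictionary at the start of the notebook; with `.loc` a tuple addresses one (construct, treatment) combination:

```python
import pandas as pd

df = pd.DataFrame({
    "konstrukt": ["plasmiid1"] * 6 + ["plasmiid2"] * 6,
    "t88tlus": (["dmso"] * 3 + ["aine"] * 3) * 2,
    "luc2p": [450, 543, 243, 456, 644, 453,
              1234, 1342, 1532, 4500, 6453, 4800],
})

df_means = df.groupby(["konstrukt", "t88tlus"]).mean()

# A tuple of labels addresses one row of the multi-level index.
value = df_means.loc[("plasmiid1", "dmso"), "luc2p"]

# A single outer label selects all rows of that construct.
sub = df_means.loc["plasmiid1"]

print(value)
print(sub)
```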


### `.agg` - aggregate

Another way to do calculations on groups is the groups' `.agg` function.

`.agg` behaves slightly differently depending on what it is given.

Given a list of NumPy functions, it computes the result of each function within every group:

In [42]:
df.groupby(['konstrukt', 't88tlus']).agg( [np.mean, np.std] )

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p,luc2p,log-luc2p,log-luc2p,luc2p-meancenter,luc2p-meancenter,luc2p-minmax,luc2p-minmax,minmax-norm-func,minmax-norm-func,minmax-norm-log,minmax-norm-log
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std,mean,std,mean,std,mean,std,mean,std,mean,std
konstrukt,t88tlus,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
plasmiid1,aine,517.666667,109.418158,6.235361,0.201237,-1369.833333,109.418158,0.04423,0.01762,0.04423,0.01762,0.226363,0.061367
plasmiid1,dmso,412.0,153.567575,5.966473,0.420609,-1475.5,153.567575,0.027214,0.024729,0.027214,0.024729,0.144366,0.128264
plasmiid2,aine,5251.0,1051.714315,8.553501,0.192214,3363.5,1051.714315,0.806441,0.169358,0.806441,0.169358,0.933278,0.058615
plasmiid2,dmso,1369.333333,150.868596,7.218087,0.109059,-518.166667,150.868596,0.181374,0.024294,0.181374,0.024294,0.526045,0.033258


Given a dictionary of functions, the dictionary keys name the columns that the functions are applied to. The returned `DataFrame` contains only the columns for which calculations were requested:

In [43]:
df.groupby(['konstrukt', 't88tlus']).agg( {'luc2p':np.mean} )

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p
konstrukt,t88tlus,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


Several functions per column can be given in the dictionary as well:

In [44]:
df.groupby(['konstrukt', 't88tlus']).agg({'luc2p':[np.mean, np.std]})

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p,luc2p
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std
konstrukt,t88tlus,Unnamed: 2_level_2,Unnamed: 3_level_2
plasmiid1,aine,517.666667,109.418158
plasmiid1,dmso,412.0,153.567575
plasmiid2,aine,5251.0,1051.714315
plasmiid2,dmso,1369.333333,150.868596


The same goes for functions you have written yourself.

In [45]:
def tobe_funktsioon(xgroup):
    """A function that simply returns zero for every group."""
    return 0

In [46]:
df.groupby(['konstrukt', 't88tlus']).agg({'luc2p':[np.mean, np.std, tobe_funktsioon]})

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p,luc2p,luc2p
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std,tobe_funktsioon
konstrukt,t88tlus,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
plasmiid1,aine,517.666667,109.418158,0
plasmiid1,dmso,412.0,153.567575,0
plasmiid2,aine,5251.0,1051.714315,0
plasmiid2,dmso,1369.333333,150.868596,0
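
A custom function becomes more useful when it actually looks at the values of the group, which `.agg` passes in as a `Series`. A minimal sketch with a hypothetical helper `value_range` and six of the dmso values from the table:

```python
import pandas as pd

def value_range(values):
    """Hypothetical helper: largest minus smallest value within a group."""
    return values.max() - values.min()

df = pd.DataFrame({
    "konstrukt": ["plasmiid1"] * 3 + ["plasmiid2"] * 3,
    "luc2p": [450, 543, 243, 1234, 1342, 1532],
})

# The built-in mean is named by a string, the custom function is passed directly.
result = df.groupby(["konstrukt"]).agg({"luc2p": ["mean", value_range]})
print(result)
```

As in the output above, the result column is named after the function.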


## Example - normalising against the control


One useful thing about column calculations is that Pandas matches up values between columns according to the row index. By manipulating the index, this lets us normalise data against the control.

First we compute the within-experiment means:

In [47]:
df_mean = df.groupby(['konstrukt', 't88tlus']).agg( {'luc2p':np.mean} )
df_mean

Unnamed: 0_level_0,Unnamed: 1_level_0,luc2p
konstrukt,t88tlus,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


Since the table of means has a multi-level index (konstrukt and t88tlus), we get rid of it for now:

In [48]:
df_mean = df_mean.reset_index()
df_mean

Unnamed: 0,konstrukt,t88tlus,luc2p
0,plasmiid1,aine,517.666667
1,plasmiid1,dmso,412.0
2,plasmiid2,aine,5251.0
3,plasmiid2,dmso,1369.333333


Since we want to normalise within each construct, the index has to be built on the construct:

In [49]:
df_allmeans = df_mean.set_index('konstrukt')
df_allmeans

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


From the original table of means we take only the control values: first we set the index to "t88tlus", then select only the control (i.e. "dmso") rows, then remove the index and finally set the index to "konstrukt" again.


In [50]:
df_mean.set_index('t88tlus')

Unnamed: 0_level_0,konstrukt,luc2p
t88tlus,Unnamed: 1_level_1,Unnamed: 2_level_1
aine,plasmiid1,517.666667
dmso,plasmiid1,412.0
aine,plasmiid2,5251.0
dmso,plasmiid2,1369.333333


In [51]:
df_mean.set_index('t88tlus').loc[["dmso"],:]

Unnamed: 0_level_0,konstrukt,luc2p
t88tlus,Unnamed: 1_level_1,Unnamed: 2_level_1
dmso,plasmiid1,412.0
dmso,plasmiid2,1369.333333


In [52]:
df_mean.set_index('t88tlus').loc[["dmso"],:].reset_index()

Unnamed: 0,t88tlus,konstrukt,luc2p
0,dmso,plasmiid1,412.0
1,dmso,plasmiid2,1369.333333


In [53]:
df_mean.set_index('t88tlus').loc[["dmso"],:].reset_index().set_index('konstrukt')

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,dmso,412.0
plasmiid2,dmso,1369.333333


We store the DataFrame with the normalisation values separately:

In [54]:
df_control = df_mean.set_index('t88tlus').loc[["dmso"],:].reset_index().set_index('konstrukt')
df_control

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,dmso,412.0
plasmiid2,dmso,1369.333333


Now we have two `DataFrame`s:
  * `df_allmeans` - contains all the means of our experiment
  * `df_control` - contains only the control (i.e. "dmso") means


In [55]:
df_allmeans

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


In [56]:
df_control

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,dmso,412.0
plasmiid2,dmso,1369.333333


Since calculations are aligned on the index, we can now normalise the values of each index group separately:

In [57]:
df_allmeans["ctr-norm"] = df_allmeans["luc2p"] / df_control["luc2p"]

In [58]:
df_allmeans

Unnamed: 0_level_0,t88tlus,luc2p,ctr-norm
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
plasmiid1,aine,517.666667,1.256472
plasmiid1,dmso,412.0,1.0
plasmiid2,aine,5251.0,3.834713
plasmiid2,dmso,1369.333333,1.0
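
The index alignment used in this example can be seen in isolation on small `Series` objects. The numbers below are made up for the illustration; only the index labels follow the notebook.

```python
import pandas as pd

# Same labels in a different order: values are matched by label, not by position.
a = pd.Series([10.0, 200.0], index=["plasmiid1", "plasmiid2"])
b = pd.Series([100.0, 5.0], index=["plasmiid2", "plasmiid1"])
ratio = a / b   # plasmiid1: 10 / 5, plasmiid2: 200 / 100

# Repeated labels on the left are each divided by the matching control value.
means = pd.Series([600.0, 400.0, 3000.0, 1000.0],
                  index=["plasmiid1", "plasmiid1", "plasmiid2", "plasmiid2"])
control = pd.Series([400.0, 1000.0], index=["plasmiid1", "plasmiid2"])
fold = means / control

print(ratio)
print(fold)
```

This is why `df_control` needs exactly one row per construct: every row of `df_allmeans` is then divided by the control mean of its own construct.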


## Saving data to Excel

Once we have the `DataFrame` we are interested in, we can save it to Excel with the `.to_excel` method:

https://pandas.pydata.org/pandas-docs/stable/io.html#io-excel-writer


In [59]:
df_allmeans.to_excel("example_data/single-exp-means.xlsx", sheet_name="means")

To save several `DataFrame`s into one file, write each of them to its own sheet using the Pandas `ExcelWriter` class:

In [60]:
with pd.ExcelWriter("example_data/manysheets.xlsx") as writer:
    df_mean.to_excel(writer, sheet_name="means")
    df_allmeans.to_excel(writer, sheet_name="allmeans")

## Analysing the whole experiment series

After running several experiments, their analysis can be combined in several ways. The simplest is to compute the within-experiment means and normalisation, as above, separately on the `DataFrame` of each experiment, then combine the results and run a similar analysis (together with a t-test) on the combined `DataFrame`.

In [61]:
df_katse1 = pd.read_excel("example_data/02-exp1.xlsx", sheet_name="Sheet1")
df_katse2 = pd.read_excel("example_data/02-exp1.xlsx", sheet_name="Sheet2")
df_katse3 = pd.read_excel("example_data/02-exp1.xlsx", sheet_name="Sheet3")

In [62]:
df_katse1_means = df_katse1.groupby(['konstrukt','t88tlus']).mean(numeric_only=True).reset_index().set_index('konstrukt')
df_katse1_means

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,aine,517.666667
plasmiid1,dmso,412.0
plasmiid2,aine,5251.0
plasmiid2,dmso,1369.333333


In [63]:
df_katse1_control = df_katse1.groupby(['konstrukt','t88tlus']).mean(numeric_only=True).reset_index().set_index('t88tlus').loc["dmso",:].reset_index().set_index("konstrukt")
df_katse1_control

Unnamed: 0_level_0,t88tlus,luc2p
konstrukt,Unnamed: 1_level_1,Unnamed: 2_level_1
plasmiid1,dmso,412.0
plasmiid2,dmso,1369.333333


In [64]:
df_katse1_means["fold"] = df_katse1_means["luc2p"] / df_katse1_control["luc2p"]

In [65]:
df_katse1_means = df_katse1_means.reset_index()
df_katse1_means

Unnamed: 0,konstrukt,t88tlus,luc2p,fold
0,plasmiid1,aine,517.666667,1.256472
1,plasmiid1,dmso,412.0,1.0
2,plasmiid2,aine,5251.0,3.834713
3,plasmiid2,dmso,1369.333333,1.0


The calculations for the other experiments are done in the same way:

In [66]:
df_katse2_means = df_katse2.groupby(['konstrukt','t88tlus']).mean(numeric_only=True).reset_index().set_index('konstrukt')
df_katse2_control = df_katse2.groupby(['konstrukt','t88tlus']).mean(numeric_only=True).reset_index().set_index('t88tlus').loc["dmso",:].reset_index().set_index("konstrukt")
df_katse2_means["fold"] = df_katse2_means["luc2p"] / df_katse2_control["luc2p"]
df_katse2_means = df_katse2_means.reset_index()

In [67]:
df_katse3_means = df_katse3.groupby(['konstrukt','t88tlus']).mean(numeric_only=True).reset_index().set_index('konstrukt')
df_katse3_control = df_katse3.groupby(['konstrukt','t88tlus']).mean(numeric_only=True).reset_index().set_index('t88tlus').loc["dmso",:].reset_index().set_index("konstrukt")
df_katse3_means["fold"] = df_katse3_means["luc2p"] / df_katse3_control["luc2p"]
df_katse3_means = df_katse3_means.reset_index()
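
Since the same steps are repeated for every experiment, they could also be wrapped in a function. The sketch below is one possible way to do it and not the code used in this notebook: the name `fold_vs_control` is hypothetical, the data is made up, and the control values are looked up with `Series.map` instead of the index juggling shown above.

```python
import pandas as pd

def fold_vs_control(df_exp, control="dmso"):
    """Hypothetical helper: group means plus fold change against the control."""
    means = (df_exp.groupby(["konstrukt", "t88tlus"])
                   .mean(numeric_only=True)
                   .reset_index())
    # Control mean of every construct, indexed by construct name.
    ctrl = means[means["t88tlus"] == control].set_index("konstrukt")["luc2p"]
    # map() looks up the control mean that belongs to each row's construct.
    means["fold"] = means["luc2p"] / means["konstrukt"].map(ctrl)
    return means

# Made-up experiment with two replicates per group.
df_exp = pd.DataFrame({
    "samplename": ["S1A", "S1B", "S2A", "S2B", "S3A", "S3B", "S4A", "S4B"],
    "konstrukt": ["plasmiid1"] * 4 + ["plasmiid2"] * 4,
    "t88tlus": ["dmso", "dmso", "aine", "aine"] * 2,
    "luc2p": [100, 300, 400, 600, 1000, 1000, 3000, 5000],
})

result = fold_vs_control(df_exp)
print(result)
```

The function could then be called once for each of `df_katse1`, `df_katse2` and `df_katse3`.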

Now we have three separate tables of within-experiment normalised data. They can be joined with the pandas `concat()` function. The tables are placed in a *list* which is passed to `concat` as the argument:

In [68]:
df_koos = pd.concat( [ df_katse1_means, df_katse2_means, df_katse3_means ] )
df_koos

Unnamed: 0,konstrukt,t88tlus,luc2p,fold
0,plasmiid1,aine,517.666667,1.256472
1,plasmiid1,dmso,412.0,1.0
2,plasmiid2,aine,5251.0,3.834713
3,plasmiid2,dmso,1369.333333,1.0
0,plasmiid1,aine,896.0,1.479362
1,plasmiid1,dmso,605.666667,1.0
2,plasmiid2,aine,6082.666667,2.284713
3,plasmiid2,dmso,2662.333333,1.0
0,plasmiid1,aine,1329.0,1.11431
1,plasmiid1,dmso,1192.666667,1.0
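
In the output above the row numbers 0 to 3 repeat for every experiment, because `concat` keeps the original index of each table. Two optional arguments change this, sketched here on two small made-up tables (the labels `katse1` and `katse2` are assumptions for the illustration):

```python
import pandas as pd

exp1 = pd.DataFrame({"konstrukt": ["plasmiid1", "plasmiid2"], "fold": [1.2, 3.8]})
exp2 = pd.DataFrame({"konstrukt": ["plasmiid1", "plasmiid2"], "fold": [1.5, 2.3]})

# ignore_index=True renumbers the rows from 0 to n-1.
renumbered = pd.concat([exp1, exp2], ignore_index=True)

# keys= adds an outer index level that records which table a row came from.
labelled = pd.concat([exp1, exp2], keys=["katse1", "katse2"])

print(renumbered)
print(labelled)
```

For the `groupby` that follows, the repeated row numbers do no harm, since the grouping uses the columns and not the index.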


To compute the means of the folds across experiments we can use group by again:

In [69]:
df_koos.groupby(['konstrukt','t88tlus']).agg({"fold":[np.mean, np.std]})

Unnamed: 0_level_0,Unnamed: 1_level_0,fold,fold
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std
konstrukt,t88tlus,Unnamed: 2_level_2,Unnamed: 3_level_2
plasmiid1,aine,1.283381,0.184008
plasmiid1,dmso,1.0,0.0
plasmiid2,aine,2.649877,1.050962
plasmiid2,dmso,1.0,0.0


To run a t-test on the computed folds we use the t-test functions from SciPy's `stats` module: `ttest_ind` for two independent samples and `ttest_rel` for paired samples.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind

In [76]:
from scipy.stats import ttest_ind, ttest_rel

A t-test is computed by passing the two groups to be compared as parameters. The p-value is stored in the returned object:

In [77]:
test = ttest_rel([1.0, 1.0, 1.0 ],[10.0, 10.0, 100])
test.pvalue

0.32324703181604053
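
The paired test `ttest_rel` and the independent test `ttest_ind` generally give different p-values for the same numbers. A small sketch comparing the two on the values used above (the variable names are only for this illustration):

```python
from scipy.stats import ttest_ind, ttest_rel

control = [1.0, 1.0, 1.0]
treated = [10.0, 10.0, 100.0]

# Unpaired test: the two groups are treated as independent samples.
independent = ttest_ind(control, treated)

# Paired test: the values are compared pairwise, position by position.
paired = ttest_rel(control, treated)

print(independent.statistic, independent.pvalue)
print(paired.statistic, paired.pvalue)
```

For these particular numbers the t statistic is the same in both tests, but the paired test has fewer degrees of freedom (2 instead of 4) and therefore a larger p-value. Which test is appropriate depends on whether the measurements really come in pairs.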

To run the t-test on Pandas groups we can write a function for it:

In [78]:
def calc_p_value(x):
    # values of the control group
    ctr_group = x.size * [1.0]
    pval = ttest_ind(ctr_group, x)
    return pval.pvalue

In [79]:
df_koos.groupby(['konstrukt','t88tlus']).agg({"fold":[np.mean, np.std, calc_p_value]})


Unnamed: 0_level_0,Unnamed: 1_level_0,fold,fold,fold
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,std,calc_p_value
konstrukt,t88tlus,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
plasmiid1,aine,1.283381,0.184008,0.055954
plasmiid1,dmso,1.0,0.0,
plasmiid2,aine,2.649877,1.050962,0.053036
plasmiid2,dmso,1.0,0.0,
