# Introduction
This Jupyter notebook computes a statistical test (Chi2) and general information about the data.

### Statistical analysis
1. Number of mice/experiments in the database
2. Proportion of survivor mice (Alive) and non-survivor mice (Dead) per infection
3. Ratio of alive and dead mice as a function of the applied sacrifice threshold
4. Statistical test between those proportions (Chi2)

### Supplementary information
1. Number of mice with `max_loss_weight_percentage` below 0.7 (i.e. more than 30% weight loss)
2. Number of mice per ethical authorization
3. Average weight loss for dead or alive mice

## Import packages and load data

In [63]:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from scipy.stats import chisquare
import ast

In [64]:
df = pd.read_excel("./data/df_for_analysis.xlsx",index_col=0)

## General information and statistical analysis

Select the data you want to analyze between two dates

In [65]:
start_date = datetime(2013,1,1)
end_date = datetime.now()

#Select data between two dates, here everything is used (from 2013)
mask_date = (df['Date'] > start_date) & (df['Date'] <= end_date)
df = df.loc[mask_date]

Display the main dataframe

In [66]:
df

Unnamed: 0,Mouse_ID,ID_Experiment,Cage,Strain,Date,Experiment,Group,Group_info,H0,Pre_traitment,...,survival_0.07,time_0.06,survival_0.06,time_0.05,survival_0.05,time_original,survival_original,max_loss_weight_percentage,exp,sub_exp
0,TRO-05432,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,9.0,1,0.629181,1,A
1,TRO-05433,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,9.0,1,0.660748,1,A
2,TRO-05434,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,2.5,1,2.5,1,9.0,1,0.639184,1,A
3,TRO-05435,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,6.0,1,0.664051,1,A
4,TRO-05456,ID_001,B,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,7.0,1,0.707420,1,A
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2352,TRO-028337,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1,1.5,1,1.5,1,5.0,1,0.761733,3,no
2353,TRO-028338,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1,2.5,1,2.5,1,4.0,1,0.865900,3,no
2354,TRO-028339,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1,5.5,1,5.5,1,6.0,1,0.926829,3,no
2355,TRO-028342,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,0,8.0,0,8.0,0,8.0,0,0.996350,3,no


Note: the 'survival_original' column contains the death information from the raw data (no artificial threshold applied).

In [67]:
df['survival_original'] = df['survival_original'].replace({1:'Dead',0:'Alive'})

Compute:
1. number of mice per infection
2. number of experiments per infection
3. dead and alive mice per infection

In [68]:
#RESULT: N_Experiment; N_Mice
group_by_infection = df.groupby(['Infection'])
n_unique = group_by_infection.nunique()
n_unique_infos = n_unique.loc[:,:'ID_Experiment']
n_unique_infos = n_unique_infos.rename(columns={'ID_Experiment':'N_Experiment','Mouse_ID':'N_Mice'})

#RESULT: Alive; Dead
dead_alive = group_by_infection['survival_original'].value_counts().sort_index(ascending=False).unstack()

Concatenate and display the results

In [69]:
result = pd.concat([n_unique_infos,dead_alive],axis=1)
result

Infection,N_Mice,N_Experiment,Alive,Dead
C. albicans,252,6,164,88
H1N1,336,19,203,133
Listeria,1048,39,555,493
S. pneumoniae,721,32,328,393


Compute the same information for all mice (regardless of infection)

In [70]:
# Sum each column (returns a Series)
total_sum = result.sum()

# Create a DataFrame from the sum with the index name 'Total'
total_df = pd.DataFrame(total_sum).T
total_df.index = ['Total']
total_df.index.name = 'Infection'
total_df

Infection,N_Mice,N_Experiment,Alive,Dead
Total,2357,96,1250,1107


In [71]:
general_info = pd.concat([result,total_df],axis=0)

Compute the mortality and display the full information

In [72]:
general_info['mortality'] = round(general_info['Dead']/(general_info['Alive']+general_info['Dead'])*100,1)
general_info

Infection,N_Mice,N_Experiment,Alive,Dead,mortality
C. albicans,252,6,164,88,34.9
H1N1,336,19,203,133,39.6
Listeria,1048,39,555,493,47.0
S. pneumoniae,721,32,328,393,54.5
Total,2357,96,1250,1107,47.0
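
As a quick sanity check, the mortality column can be recomputed by hand from the Alive and Dead counts shown above. The sketch below is only illustrative: the dictionary `counts` and the helper `mortality_percent` are hypothetical names, not part of the notebook. It applies the same formula, Dead / (Alive + Dead) * 100, rounded to one decimal.

```python
# (Alive, Dead) counts copied from the table above; 'counts' is an illustrative name.
counts = {
    "C. albicans": (164, 88),
    "H1N1": (203, 133),
    "Listeria": (555, 493),
    "S. pneumoniae": (328, 393),
}


def mortality_percent(alive, dead):
    """Percentage of dead mice among all mice, rounded to one decimal."""
    return round(dead / (alive + dead) * 100, 1)


mortality = {name: mortality_percent(a, d) for name, (a, d) in counts.items()}

# The 'Total' row pools all mice before dividing; it is not the mean of the rows.
total_alive = sum(a for a, _ in counts.values())
total_dead = sum(d for _, d in counts.values())
mortality["Total"] = mortality_percent(total_alive, total_dead)

for name, value in mortality.items():
    print(f"{name}: {value}%")
```

Note that the 'Total' mortality is weighted by group size, which is why it sits close to the Listeria value, the largest group.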


## Ratio of alive and dead mice as a function of the applied sacrifice threshold

Keep only the relevant data, i.e. the thresholds of interest:
1. Raw data
2. 30% maximum weight loss
3. 25% maximum weight loss
4. 20% maximum weight loss
5. 15% maximum weight loss
6. 10% maximum weight loss


In [73]:
data = df.loc[:,['Infection','survival_original','survival_0.3','survival_0.25','survival_0.2','survival_0.15','survival_0.1']]
data = data.replace({0:'Alive',1:'Dead'})
data

Unnamed: 0,Infection,survival_original,survival_0.3,survival_0.25,survival_0.2,survival_0.15,survival_0.1
0,C. albicans,Dead,Dead,Dead,Dead,Dead,Dead
1,C. albicans,Dead,Dead,Dead,Dead,Dead,Dead
2,C. albicans,Dead,Dead,Dead,Dead,Dead,Dead
3,C. albicans,Dead,Dead,Dead,Dead,Dead,Dead
4,C. albicans,Dead,Dead,Dead,Dead,Dead,Dead
...,...,...,...,...,...,...,...
2352,S. pneumoniae,Dead,Dead,Dead,Dead,Dead,Dead
2353,S. pneumoniae,Dead,Dead,Dead,Dead,Dead,Dead
2354,S. pneumoniae,Dead,Dead,Dead,Dead,Dead,Dead
2355,S. pneumoniae,Alive,Alive,Alive,Alive,Alive,Alive


Transform the dataframe to tidy data and group per infection

In [74]:
survival = data.melt(id_vars=["Infection"],value_name="survival",var_name="threshold")
survival = survival.groupby('Infection').value_counts().reset_index(name="number_of_mice")
survival

Unnamed: 0,Infection,threshold,survival,number_of_mice
0,C. albicans,survival_0.1,Dead,220
1,C. albicans,survival_0.15,Dead,187
2,C. albicans,survival_original,Alive,164
3,C. albicans,survival_0.2,Dead,149
4,C. albicans,survival_0.3,Alive,148
5,C. albicans,survival_0.25,Alive,131
6,C. albicans,survival_0.25,Dead,121
7,C. albicans,survival_0.3,Dead,104
8,C. albicans,survival_0.2,Alive,103
9,C. albicans,survival_original,Dead,88
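
The wide-to-long reshaping used here may be easier to follow on a toy table. The sketch below uses made-up values and illustrative names (`wide`, `long`, `counts`), and it swaps in `groupby(...).size()` as an explicit alternative to `value_counts()`; it is not the notebook's own code.

```python
import pandas as pd

# Toy wide table: one row per mouse, one column per threshold (made-up values).
wide = pd.DataFrame({
    "Infection": ["H1N1", "H1N1", "Listeria"],
    "survival_original": ["Alive", "Dead", "Alive"],
    "survival_0.3": ["Dead", "Dead", "Alive"],
})

# Wide -> long: one row per (mouse, threshold) pair.
long = wide.melt(id_vars=["Infection"], var_name="threshold", value_name="survival")

# Count mice for each observed (Infection, threshold, survival) combination.
counts = (
    long.groupby(["Infection", "threshold", "survival"])
    .size()
    .reset_index(name="number_of_mice")
)
print(counts)
```

Passing `name=` to `reset_index` names the count column directly, so the result does not depend on how a given pandas version labels it.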


Compute the ratio and the supplementary deaths

In [75]:
# Pivot so that 'Alive' and 'Dead' become columns, then compute the percentage of dead mice
statistics = survival.pivot(index=['Infection',"threshold"], columns='survival', values='number_of_mice')
statistics['Ratio'] = round(statistics['Dead'] / (statistics['Dead'] + statistics['Alive'])*100,1)
statistics = statistics.reset_index(level=1)
statistics['supplementary_death'] = statistics.groupby('Infection').apply(lambda x: x['Dead'] - x[x['threshold']=="survival_original"]['Dead']).values
statistics = statistics.reset_index()
statistics

survival,Infection,threshold,Alive,Dead,Ratio,supplementary_death
0,C. albicans,survival_0.1,32,220,87.3,132
1,C. albicans,survival_0.15,65,187,74.2,99
2,C. albicans,survival_0.2,103,149,59.1,61
3,C. albicans,survival_0.25,131,121,48.0,33
4,C. albicans,survival_0.3,148,104,41.3,16
5,C. albicans,survival_original,164,88,34.9,0
6,H1N1,survival_0.1,53,283,84.2,150
7,H1N1,survival_0.15,87,249,74.1,116
8,H1N1,survival_0.2,127,209,62.2,76
9,H1N1,survival_0.25,158,178,53.0,45
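
The supplementary deaths are the deaths under a given threshold minus the deaths in the raw data for the same infection. A possible alternative to the `groupby(...).apply(...)` expression is to look up the baseline per infection with `map`, which does not rely on row order. The sketch below uses a few counts copied from the tables above; the names `stats` and `baseline_dead` are illustrative.

```python
import pandas as pd

# A few [Dead] counts copied from the tables above.
stats = pd.DataFrame({
    "Infection": ["C. albicans", "C. albicans", "H1N1", "H1N1"],
    "threshold": ["survival_0.1", "survival_original", "survival_0.1", "survival_original"],
    "Dead": [220, 88, 283, 133],
})

# Baseline deaths per infection, taken from the raw data (no threshold applied).
baseline_dead = (
    stats.loc[stats["threshold"] == "survival_original"]
    .set_index("Infection")["Dead"]
)

# Deaths added by each threshold relative to the baseline of the same infection.
stats["supplementary_death"] = stats["Dead"] - stats["Infection"].map(baseline_dead)
print(stats)
```

This assumes exactly one `survival_original` row per infection, as is the case in the table above.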


### Chi2 tests

In [76]:
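def chi_square_in_lambda_function(x):
    # Reference counts [Dead, Alive] from the raw data (no threshold applied)
    observed = x[x['threshold']=='survival_original'][['Dead','Alive']].values.tolist()[0]
    result = []
    for index, rows in x.iterrows():
        # Counts [Dead, Alive] under the current threshold, passed as expected frequencies
        data = rows[['Dead','Alive']].astype(float)
        chi2, p = chisquare(observed,data)
        result += [p]  # only the p-value is kept
    return pd.Series(result,x.index.values)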

In [77]:
chi_result = statistics.groupby("Infection").apply(lambda x: chi_square_in_lambda_function(x))
statistics["chi2"] = chi_result.values
statistics


survival,Infection,threshold,Alive,Dead,Ratio,supplementary_death,chi2
0,C. albicans,survival_0.1,32,220,87.3,132,1.172261e-137
1,C. albicans,survival_0.15,65,187,74.2,99,4.191188e-46
2,C. albicans,survival_0.2,103,149,59.1,61,5.426556e-15
3,C. albicans,survival_0.25,131,121,48.0,33,3.170145e-05
4,C. albicans,survival_0.3,148,104,41.3,16,0.04063269
5,C. albicans,survival_original,164,88,34.9,0,1.0
6,H1N1,survival_0.1,53,283,84.2,150,1.259931e-111
7,H1N1,survival_0.15,87,249,74.1,116,2.630289e-47
8,H1N1,survival_0.2,127,209,62.2,76,1.222074e-17
9,H1N1,survival_0.25,158,178,53.0,45,8.715478e-07
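
Note that the `chi2` column stores the p-value returned by `chisquare`, not the test statistic. To make the computation concrete, the sketch below redoes one row by hand using only the standard library. It assumes exactly two categories (one degree of freedom), in which case the chi-square survival function reduces to `erfc(sqrt(statistic / 2))`. The function name `chi_square_p_two_categories` is hypothetical.

```python
import math


def chi_square_p_two_categories(observed, expected):
    """Goodness-of-fit chi-square for exactly two categories (1 degree of freedom)."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the survival function has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# C. albicans counts as [Dead, Alive], copied from the table above.
original = [88, 164]       # survival_original, used as observed frequencies
threshold_30 = [104, 148]  # survival_0.3, used as expected frequencies

statistic, p_value = chi_square_p_two_categories(original, threshold_30)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

The p-value obtained this way should agree with the `survival_0.3` row for C. albicans. Both count vectors sum to the same number of mice (252), which a goodness-of-fit test requires.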


### Rearrange the dataframe

In [78]:
statistics.pivot_table(values=["Alive","Dead","Ratio","supplementary_death","chi2"],index=["threshold","Infection"])

threshold,Infection,Alive,Dead,Ratio,chi2,supplementary_death
survival_0.1,C. albicans,32,220,87.3,1.172261e-137,132
survival_0.1,H1N1,53,283,84.2,1.259931e-111,150
survival_0.1,Listeria,272,776,74.0,1.786476e-88,283
survival_0.1,S. pneumoniae,275,446,61.9,4.832398e-05,53
survival_0.15,C. albicans,65,187,74.2,4.191188e-46,99
survival_0.15,H1N1,87,249,74.1,2.630289e-47,116
survival_0.15,Listeria,371,677,64.6,1.40803e-32,184
survival_0.15,S. pneumoniae,306,415,57.6,0.09737833,22
survival_0.2,C. albicans,103,149,59.1,5.426556e-15,61
survival_0.2,H1N1,127,209,62.2,1.222074e-17,76


## Supplementary information
### Mice under 30% THR

In [79]:
df

Unnamed: 0,Mouse_ID,ID_Experiment,Cage,Strain,Date,Experiment,Group,Group_info,H0,Pre_traitment,...,survival_0.07,time_0.06,survival_0.06,time_0.05,survival_0.05,time_original,survival_original,max_loss_weight_percentage,exp,sub_exp
0,TRO-05432,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,9.0,Dead,0.629181,1,A
1,TRO-05433,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,9.0,Dead,0.660748,1,A
2,TRO-05434,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,2.5,1,2.5,1,9.0,Dead,0.639184,1,A
3,TRO-05435,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,6.0,Dead,0.664051,1,A
4,TRO-05456,ID_001,B,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1,1.5,1,1.5,1,7.0,Dead,0.707420,1,A
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2352,TRO-028337,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1,1.5,1,1.5,1,5.0,Dead,0.761733,3,no
2353,TRO-028338,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1,2.5,1,2.5,1,4.0,Dead,0.865900,3,no
2354,TRO-028339,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1,5.5,1,5.5,1,6.0,Dead,0.926829,3,no
2355,TRO-028342,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,0,8.0,0,8.0,0,8.0,Alive,0.996350,3,no


In [80]:
df_under_30 = df[df['max_loss_weight_percentage']<0.7]
df_under_30.groupby('Infection')['Mouse_ID'].count()

Infection
C. albicans      64
H1N1             68
Listeria          8
S. pneumoniae     2
Name: Mouse_ID, dtype: int64

In [81]:
len(df_under_30)

142

In [82]:
df_under_30_survivor = df_under_30[df_under_30['survival_original']=='Alive']
df_under_30_survivor.groupby('Infection')['Mouse_ID'].count()

Infection
C. albicans      16
H1N1             17
Listeria          5
S. pneumoniae     1
Name: Mouse_ID, dtype: int64

In [83]:
len(df_under_30_survivor)

39

In [84]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice
df_under_30 = df_under_30.assign(cohort=df_under_30['Date'].apply(lambda x: 'cohort_1' if x < datetime(2018,5,1) else 'cohort_2'))
df_under_30.groupby(['Infection','cohort'])['Mouse_ID'].count()


Infection      cohort  
C. albicans    cohort_1    64
H1N1           cohort_1    12
               cohort_2    56
Listeria       cohort_1     4
               cohort_2     4
S. pneumoniae  cohort_1     1
               cohort_2     1
Name: Mouse_ID, dtype: int64

### Number of mice per ethical authorization (included in analysis)

In [85]:
df_auth1 = df[df['Date'] < datetime(2018,5,1)] # 2 were already excluded from the data
df_auth2 = df[df['Date']>= datetime(2018,5,1)] # 41 were already excluded from the data
df_auth2.head()

Unnamed: 0,Mouse_ID,ID_Experiment,Cage,Strain,Date,Experiment,Group,Group_info,H0,Pre_traitment,...,survival_0.07,time_0.06,survival_0.06,time_0.05,survival_0.05,time_original,survival_original,max_loss_weight_percentage,exp,sub_exp
777,TRO-18099,ID_029,ETRO-00940,C57BL/6J,2018-05-28,Listeria/Clodronate/Training,1,Zymosam + PBS liposome + Listeria,-1,training/zymosan,...,0,6.5,1,2.5,1,8.0,Alive,0.935644,1,no
778,TRO-18100,ID_029,ETRO-00940,C57BL/6J,2018-05-28,Listeria/Clodronate/Training,1,Zymosam + PBS liposome + Listeria,-1,training/zymosan,...,1,2.5,1,2.5,1,8.0,Alive,0.876289,1,no
779,TRO-18101,ID_029,ETRO-00940,C57BL/6J,2018-05-28,Listeria/Clodronate/Training,1,Zymosam + PBS liposome + Listeria,-1,training/zymosan,...,0,8.0,0,8.0,0,8.0,Alive,0.973404,1,no
780,TRO-18102,ID_029,ETRO-00940,C57BL/6J,2018-05-28,Listeria/Clodronate/Training,1,Zymosam + PBS liposome + Listeria,-1,training/zymosan,...,1,2.5,1,2.5,1,8.0,Alive,0.893023,1,no
781,TRO-18103,ID_029,ETRO-00940,C57BL/6J,2018-05-28,Listeria/Clodronate/Training,1,Zymosam + PBS liposome + Listeria,-1,training/zymosan,...,0,6.5,1,2.5,1,8.0,Alive,0.935135,1,no


#### Authorization 1, from 2013-2018

In [86]:
df_auth1.groupby(['Infection'])['Mouse_ID'].count()

Infection
C. albicans      252
H1N1              63
Listeria         263
S. pneumoniae    211
Name: Mouse_ID, dtype: int64

In [87]:
#total
df_auth1.groupby(['Infection'])['Mouse_ID'].count().sum()

789

#### Authorization 2, from 2018-2023

In [88]:
df_auth2.groupby(['Infection'])['Mouse_ID'].count()

Infection
H1N1             273
Listeria         785
S. pneumoniae    510
Name: Mouse_ID, dtype: int64

In [89]:
#total
df_auth2.groupby(['Infection'])['Mouse_ID'].count().sum()

1568

### Average maximum weight loss for non-survivor mice

In [90]:
df_dead = df[df['survival_original']=='Dead']
df_dead.groupby('Infection').max_loss_weight_percentage.mean()*100

Infection
C. albicans      72.624314
H1N1             75.015886
Listeria         81.643642
S. pneumoniae    84.984221
Name: max_loss_weight_percentage, dtype: float64

In [91]:
df_dead.max_loss_weight_percentage.mean()*100

81.31940006860864

### Average maximum weight loss for survivor mice

In [92]:
df_alive = df[df['survival_original']=='Alive']
df_alive.groupby('Infection').max_loss_weight_percentage.mean()*100

Infection
C. albicans      82.209562
H1N1             83.462556
Listeria         89.000203
S. pneumoniae    93.997456
Name: max_loss_weight_percentage, dtype: float64

In [93]:
df_alive.max_loss_weight_percentage.mean()*100

88.52123632035364

### Score vs Weight loss

In [94]:
import ast
import numpy as np

In [95]:
df['Scores'] = df['Scores'].apply(lambda x: ast.literal_eval(x))

In [96]:
df['max_score'] = df['Scores'].apply(lambda x : max(x) if x  else np.nan)

In [97]:
def map_percentage_to_score(percentage, infection_type):
    if infection_type == 'acute':
        if percentage > 1:
            return 0
        elif 0.9 <= percentage <= 1:
            return 1
        elif 0.8 <= percentage < 0.9:
            return 2
        else:
            return 3
    elif infection_type == 'chronic':
        if percentage > 1:
            return 0
        elif 0.9 <= percentage <= 1:
            return 1
        elif 0.7 <= percentage < 0.9:
            return 2
        else:
            return 3
    else:
        return None  # Handle unexpected infection_type values
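
The two branches of `map_percentage_to_score` differ only in the lower bound of score 2 (0.8 for acute, 0.7 for chronic infections). The sketch below is a condensed restatement under that reading, with a few example calls; `weight_score` is a hypothetical name used only for illustration, not a replacement for the function above.

```python
def weight_score(percentage, infection_type):
    """Condensed restatement of the weight-score mapping (illustrative helper)."""
    # Lower bound of score 2 depends on the infection type.
    lower_bound = {"acute": 0.8, "chronic": 0.7}.get(infection_type)
    if lower_bound is None:
        return None  # unexpected infection_type
    if percentage > 1:
        return 0     # weight above the reference value
    if percentage >= 0.9:
        return 1
    if percentage >= lower_bound:
        return 2
    return 3


# The same remaining weight can map to different scores depending on the infection type.
print(weight_score(0.75, "acute"))    # below the acute bound of 0.8
print(weight_score(0.75, "chronic"))  # above the chronic bound of 0.7
```

Keeping the bounds in a dictionary makes the only difference between the two infection types visible in one place.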

In [98]:
df['infection_type'] = df['Infection'].apply(lambda x : "acute" if x =='Listeria' else "chronic")

In [99]:
df['max_weight_score'] = df.apply(lambda x: map_percentage_to_score(x.max_loss_weight_percentage,x.infection_type),axis=1)


In [100]:
df_score = df[df['max_score'].notna()]
df_score

Unnamed: 0,Mouse_ID,ID_Experiment,Cage,Strain,Date,Experiment,Group,Group_info,H0,Pre_traitment,...,time_0.05,survival_0.05,time_original,survival_original,max_loss_weight_percentage,exp,sub_exp,max_score,infection_type,max_weight_score
858,TRO-18673,ID_032,ETRO-01107,C57BL/6J Sirt2 WT,2018-10-16,Listeria/SIRT5/Sub-lethal,0,WT,0,no,...,7.0,0,7.0,Alive,0.959821,0,no,1.0,acute,1
859,TRO-18674,ID_032,ETRO-01107,C57BL/6J Sirt2 WT,2018-10-16,Listeria/SIRT5/Sub-lethal,0,WT,0,no,...,2.5,1,7.0,Alive,0.945455,0,no,1.0,acute,1
860,TRO-18675,ID_032,ETRO-01108,C57BL/6J Sirt2 WT,2018-10-16,Listeria/SIRT5/Sub-lethal,0,WT,0,no,...,7.0,0,7.0,Alive,0.950980,0,no,1.0,acute,1
861,TRO-18676,ID_032,ETRO-01108,C57BL/6J Sirt2 WT,2018-10-16,Listeria/SIRT5/Sub-lethal,0,WT,0,no,...,3.5,1,7.0,Alive,0.919048,0,no,1.0,acute,1
862,TRO-18677,ID_032,ETRO-01108,C57BL/6J Sirt2 WT,2018-10-16,Listeria/SIRT5/Sub-lethal,0,WT,0,no,...,7.0,0,7.0,Alive,0.955000,0,no,1.0,acute,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2352,TRO-028337,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1.5,1,5.0,Dead,0.761733,3,no,6.0,chronic,2
2353,TRO-028338,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,2.5,1,4.0,Dead,0.865900,3,no,3.0,chronic,2
2354,TRO-028339,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,5.5,1,6.0,Dead,0.926829,3,no,6.0,chronic,1
2355,TRO-028342,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,8.0,0,8.0,Alive,0.996350,3,no,1.0,chronic,1


In [101]:
def condition_to_sac(x):
    health_status = x.max_score - x.max_weight_score
    weight_score = x.max_weight_score
    score = x.max_score
    #if health_status >= 6 and weight_score < 3:
        #return 'sac_by_health_status_only'
    #elif health_status >= 6 and weight_score >= 3:
        #return 'sac_by_health_status_and_weight'
    if score >= 6 and weight_score >= 0:
        return 'sac_by_combine_effect'
    elif score < 6 and weight_score >= 3:
        return 'sac_by_weight_only'
    elif x.max_score <6 and x.max_weight_score <3 and x.survival_original == 'Dead':
        return 'found_dead'
    else:
        return 'not_sac'

In [102]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice
df_score = df_score.assign(sacrifice=df_score.apply(lambda x: condition_to_sac(x),axis=1))


In [103]:
df_score_dead = df_score[df_score['survival_original']=='Dead']

In [104]:
df_score.survival_original.value_counts()

Dead     744
Alive    741
Name: survival_original, dtype: int64

In [106]:
df_score_dead_counts = df_score_dead.groupby('Infection')['sacrifice'].value_counts() #.to_excel("./death_type.xlsx")
# Name the counts and the death-type index level explicitly before moving 'Infection' to a column
df_score_dead_counts = df_score_dead_counts.rename('counts').rename_axis(index={'sacrifice':'type_of_death'})
df_score_dead_counts = df_score_dead_counts.reset_index(level=0)
df_score_dead_counts.to_excel("./results/supplementary/death_type.xlsx")

### Expected mortality

In [59]:
df

Unnamed: 0,Mouse_ID,ID_Experiment,Cage,Strain,Date,Experiment,Group,Group_info,H0,Pre_traitment,...,time_0.05,survival_0.05,time_original,survival_original,max_loss_weight_percentage,exp,sub_exp,max_score,infection_type,max_weight_score
0,TRO-05432,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1.5,1,9.0,Dead,0.629181,1,A,,chronic,3
1,TRO-05433,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1.5,1,9.0,Dead,0.660748,1,A,,chronic,3
2,TRO-05434,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,2.5,1,9.0,Dead,0.639184,1,A,,chronic,3
3,TRO-05435,ID_001,A,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1.5,1,6.0,Dead,0.664051,1,A,,chronic,3
4,TRO-05456,ID_001,B,BALB/cByJ,2014-06-05,Candida/Propionate,1A,Propionate / 2*10^5,1,propionate,...,1.5,1,7.0,Dead,0.707420,1,A,,chronic,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2352,TRO-028337,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,1.5,1,5.0,Dead,0.761733,3,no,6.0,chronic,2
2353,TRO-028338,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,2.5,1,4.0,Dead,0.865900,3,no,3.0,chronic,2
2354,TRO-028339,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,5.5,1,6.0,Dead,0.926829,3,no,6.0,chronic,1
2355,TRO-028342,ID_096,ETRO-01911,C57BL/6J,2023-03-03,Pneumococcus/Training/Cross-fostering/male,3,D. Zy-Zy,1,training/cross-fostering,...,8.0,0,8.0,Alive,0.996350,3,no,1.0,chronic,1


In [60]:
df_control = df[df['exp']==0]
mortality = df_control.groupby(['Infection'])['survival_original'].value_counts()

In [61]:
mortality

Infection      survival_original
C. albicans    Alive                 49
               Dead                  33
H1N1           Dead                  52
               Alive                 49
Listeria       Dead                 165
               Alive                 78
S. pneumoniae  Dead                 138
               Alive                 55
Name: survival_original, dtype: int64

In [62]:
ratio = mortality[:,'Dead']/(mortality[:,'Alive'] + mortality[:,'Dead'])*100
ratio

Infection
C. albicans      40.243902
H1N1             51.485149
Listeria         67.901235
S. pneumoniae    71.502591
Name: survival_original, dtype: float64
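
The expected mortality is the share of dead mice among the control mice (`exp == 0`) of each infection. As an illustration, the same percentages can be obtained by unstacking the counts into one column per outcome. The sketch below rebuilds the counts shown above by hand; the names `mortality_counts`, `table` and `ratio_check` are illustrative.

```python
import pandas as pd

# Control-group counts copied from the output above, ordered Alive then Dead.
mortality_counts = pd.Series(
    [49, 33, 49, 52, 78, 165, 55, 138],
    index=pd.MultiIndex.from_product(
        [["C. albicans", "H1N1", "Listeria", "S. pneumoniae"], ["Alive", "Dead"]],
        names=["Infection", "survival_original"],
    ),
)

# One row per infection, one column per outcome; a missing outcome counts as 0.
table = mortality_counts.unstack("survival_original", fill_value=0)

# Percentage of dead mice among the control mice of each infection.
ratio_check = table["Dead"] / table.sum(axis=1) * 100
print(ratio_check.round(1))
```

Using `fill_value=0` keeps the computation defined for an infection in which no control mouse died, where indexing the stacked Series by `'Dead'` would leave that infection out.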