### Data:

The file `cell-count.csv` contains cell counts for several immune cell populations in each patient sample. There are five populations: `b_cell`, `cd8_t_cell`, `cd4_t_cell`, `nk_cell`, and `monocyte`. Each row in the file corresponds to one biological sample.

### Task 1:
Write a Python program to **convert the cell counts in cell-count.csv to the relative frequency (in percentage) of the total cell count for each sample**. The total cell count of a sample is the sum of the cells in its five populations. Please return an output file in CSV format with one line per sample and population, giving the cell count and relative frequency of that population in that sample. The output file should have the following columns:

- **sample:** the sample ID, as in the `sample` column of cell-count.csv
- **total_count:** total cell count of the sample
- **population:** name of the immune cell population (e.g. b_cell, cd8_t_cell, etc.)
- **count:** cell count
- **percentage:** relative frequency in percentage
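The column list above describes a wide-to-long reshape: each sample row becomes five rows, one per population. One way to express it, offered as an alternative sketch rather than the loop-based approach used in the notebook, is `DataFrame.melt`. The helper name `to_relative_frequency` and the toy frame are hypothetical and exist only for illustration:

```python
import pandas as pd

POPULATIONS = ["b_cell", "cd8_t_cell", "cd4_t_cell", "nk_cell", "monocyte"]


def to_relative_frequency(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape wide per-sample counts into one row per (sample, population)."""
    wide = df[["sample"] + POPULATIONS].copy()
    # Total cell count per sample is the row-wise sum over the five populations
    wide["total_count"] = wide[POPULATIONS].sum(axis=1)
    # melt emits all samples for the first population, then the next, and so on
    long = wide.melt(
        id_vars=["sample", "total_count"],
        value_vars=POPULATIONS,
        var_name="population",
        value_name="count",
    )
    # Relative frequency as a percentage of the sample's total
    long["percentage"] = long["count"] / long["total_count"] * 100
    return long[["sample", "total_count", "population", "count", "percentage"]]


# Hypothetical toy input (not taken from cell-count.csv); both totals are 100
toy = pd.DataFrame({
    "sample": ["x1", "x2"],
    "b_cell": [10, 20],
    "cd8_t_cell": [20, 20],
    "cd4_t_cell": [30, 20],
    "nk_cell": [15, 20],
    "monocyte": [25, 20],
})
result = to_relative_frequency(toy)
print(result)
```

Because `melt` keeps the population-major row order, the result has the same layout as the notebook's loop output.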

In [1]:
# Importing required libraries
import pandas as pd

In [2]:
# Importing data
try:
    # Load the data
    data = pd.read_csv('../data/cell-count.csv')
except FileNotFoundError as e:
    # Report and re-raise: every later cell depends on `data`
    print(f"An error occurred: {e}")
    raise

# Display the first few rows of the data frame
print("Successfully loaded data:")
print(data.head())

# Calculate the total number of samples (one row per sample)
total_samples = data.shape[0]
print(f"Total number of samples: {total_samples}")

Successfully loaded data:
  project subject condition  age sex treatment response sample sample_type  \
0    prj1    sbj1  melanoma   70   F       tr1        y     s1        PBMC   
1    prj1    sbj1  melanoma   70   F       tr1        y     s2        PBMC   
2    prj1    sbj1  melanoma   70   F       tr1        y     s3        PBMC   
3    prj1    sbj2   healthy   65   F      none      NaN     s4        PBMC   
4    prj1    sbj3  melanoma   75   M       tr1        n     s5        PBMC   

   time_from_treatment_start  b_cell  cd8_t_cell  cd4_t_cell  nk_cell  \
0                        0.0   36000       24000       42000     6000   
1                        7.0   30000       22000       40000     2000   
2                       14.0   35000       26250       37500    10000   
3                        NaN   27900       17100       18000     4500   
4                        0.0   60000       30000       37500     4500   

   monocyte  
0     12000  
1      6000  
2     16250  
3     22500  
4     18000  

In [3]:
populations = ["b_cell", "cd8_t_cell", "cd4_t_cell", "nk_cell", "monocyte"]
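Every later cell indexes `data` by these names, so a typo or a renamed column in a future version of the file would only surface as a `KeyError` inside the loop. A small up-front check can make that failure explicit. This is a sketch; `missing_population_columns` is a hypothetical helper and the empty toy frame stands in for the real data:

```python
import pandas as pd

populations = ["b_cell", "cd8_t_cell", "cd4_t_cell", "nk_cell", "monocyte"]


def missing_population_columns(df: pd.DataFrame, expected: list) -> list:
    """Return the expected population columns absent from df, in the given order."""
    return [col for col in expected if col not in df.columns]


# Hypothetical frame that lacks the monocyte column
toy = pd.DataFrame(columns=["sample", "b_cell", "cd8_t_cell", "cd4_t_cell", "nk_cell"])
print(missing_population_columns(toy, populations))  # ['monocyte']
```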

In [4]:
# Calculate total count of each sample
data['total_count'] = data[populations].sum(axis=1)
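`DataFrame.sum(axis=1)` skips missing values by default (`skipna=True`), so a sample with a blank population count would silently get a smaller total and inflated percentages, and an all-zero sample would produce `0/0 = NaN` percentages. A minimal pre-check sketch, assuming such rows should be flagged rather than processed; `check_counts` and the toy frame are hypothetical:

```python
import pandas as pd

POPULATIONS = ["b_cell", "cd8_t_cell", "cd4_t_cell", "nk_cell", "monocyte"]


def check_counts(df: pd.DataFrame) -> list:
    """Return human-readable problems found in the population count columns."""
    problems = []
    counts = df[POPULATIONS]

    # Rows with any missing count would otherwise get a silently smaller total
    missing = counts.isna().any(axis=1)
    if missing.any():
        problems.append(f"missing counts in samples: {df.loc[missing, 'sample'].tolist()}")

    # Cell counts should never be negative (NaN compares as False here)
    negative = (counts < 0).any(axis=1)
    if negative.any():
        problems.append(f"negative counts in samples: {df.loc[negative, 'sample'].tolist()}")

    # A zero total would make every percentage 0/0, i.e. NaN
    zero_total = counts.sum(axis=1) == 0
    if zero_total.any():
        problems.append(f"zero total count in samples: {df.loc[zero_total, 'sample'].tolist()}")

    return problems


# Hypothetical toy input: 'a' is clean, 'b' has a missing b_cell count, 'c' is all zeros
toy = pd.DataFrame({
    "sample": ["a", "b", "c"],
    "b_cell": [10, None, 0],
    "cd8_t_cell": [10, 5, 0],
    "cd4_t_cell": [10, 5, 0],
    "nk_cell": [10, 5, 0],
    "monocyte": [10, 5, 0],
})

print(toy[POPULATIONS].sum(axis=1).tolist())  # NaN skipped: [50.0, 20.0, 0.0]
# min_count makes the sum NaN unless all five counts are present
strict_total = toy[POPULATIONS].sum(axis=1, min_count=len(POPULATIONS))
print(check_counts(toy))
```

Using `min_count` instead of the default keeps an incomplete sample visible as a `NaN` total rather than hiding it.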

In [5]:
# Collect one long-format frame per population and concatenate once at the end
frames = []

In [6]:
# Loop through each population to build its rows and calculate percentages
for population in populations:
    temp_df = pd.DataFrame({
        'sample': data['sample'],
        'total_count': data['total_count'],
        'population': population,
        'count': data[population],
        'percentage': data[population] / data['total_count'] * 100,
    })
    frames.append(temp_df)


In [7]:
# Concatenate all populations and reset the index before saving as CSV
output = pd.concat(frames, ignore_index=True)
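Because each sample's five percentages should add up to 100, a quick sanity check on the long table is cheap. A minimal sketch; `percentages_sum_to_100` is a hypothetical helper, and the example rows rebuild sample s1 from the counts shown in the `data.head()` preview:

```python
import numpy as np
import pandas as pd


def percentages_sum_to_100(long_df: pd.DataFrame, tol: float = 1e-6) -> bool:
    """True if every sample's population percentages add up to 100 within tol."""
    per_sample = long_df.groupby("sample")["percentage"].sum()
    return bool(np.isclose(per_sample, 100.0, atol=tol).all())


# Long-format rows for sample s1 (counts from the preview of cell-count.csv)
example = pd.DataFrame({
    "sample": ["s1"] * 5,
    "total_count": [120000] * 5,
    "population": ["b_cell", "cd8_t_cell", "cd4_t_cell", "nk_cell", "monocyte"],
    "count": [36000, 24000, 42000, 6000, 12000],
})
example["percentage"] = example["count"] / example["total_count"] * 100

print(example["percentage"].round(6).tolist())  # [30.0, 20.0, 35.0, 5.0, 10.0]
print(percentages_sum_to_100(example))          # True
```

A tolerance is used instead of exact equality because floating-point division can leave tiny rounding residue in the sums.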

In [8]:
# Save the processed data to a new CSV file
output.to_csv("../data/updated-cell-count.csv", index=False)
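`to_csv` writes floats at full precision by default. If a fixed number of decimals is preferred in the CSV, the `float_format` parameter controls it. In this sketch the two-row frame is a hypothetical stand-in reusing the s1 and s2 values from the preview, and an in-memory buffer replaces the real output path:

```python
import io

import pandas as pd

output = pd.DataFrame({
    "sample": ["s1", "s2"],
    "total_count": [120000, 100000],
    "population": ["b_cell", "b_cell"],
    "count": [36000, 30000],
    "percentage": [30.0, 30.0],
})

buffer = io.StringIO()
# float_format only affects float columns; integer counts are written unchanged
output.to_csv(buffer, index=False, float_format="%.2f")
csv_text = buffer.getvalue()
print(csv_text)
```

Rounding only at write time keeps full precision in the in-memory `output` for any further analysis.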

In [9]:
# Print head for preview
print(output.head())

  sample  total_count population  count  percentage
0     s1       120000     b_cell  36000        30.0
1     s2       100000     b_cell  30000        30.0
2     s3       125000     b_cell  35000        28.0
3     s4        90000     b_cell  27900        31.0
4     s5       150000     b_cell  60000        40.0
