
# Transcriptomic Descriptor Analysis

This notebook computes per-gene statistical descriptors from a transcriptomic expression table to characterize expression patterns across samples.

## Objectives:
- Calculate per-gene descriptors: mean, standard deviation (SD), coefficient of variation (CV), skewness, kurtosis, and range.
- Summarize the distribution and variability of gene expression data.


In [None]:

# Import required libraries
import pandas as pd
from scipy.stats import skew, kurtosis

# Load the dataset
file_path = '41598_2022_12463_MOESM1_ESM.csv'  # Adjust the path if necessary
data = pd.read_csv(file_path)

# Display the first few rows of the dataset
data.head()


## Calculation of Transcriptomic Descriptors

In [None]:

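# Take every column from the fourth onward as sample data.
# This assumes the first three columns hold gene annotations.
expression_data = data.iloc[:, 3:]

# Compute the mean and SD once so they can be reused for the CV
mean_expr = expression_data.mean(axis=1)
sd_expr = expression_data.std(axis=1)  # sample SD (pandas default, ddof=1)

# Calculate metrics
results = pd.DataFrame({
    'Gene': data['Gene_name'],
    'Mean': mean_expr,
    'SD': sd_expr,
    # Mask zero means so the CV is NaN rather than infinite for those genes
    'CV': sd_expr / mean_expr.where(mean_expr != 0),
    'Skewness': expression_data.apply(skew, axis=1),
    # scipy.stats.kurtosis returns excess (Fisher) kurtosis by default
    'Kurtosis': expression_data.apply(kurtosis, axis=1),
    'Range': expression_data.max(axis=1) - expression_data.min(axis=1)
})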

# Display the results
results.head()
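
Before running the calculation on the full table, it can help to check the descriptors on a few rows whose values are easy to verify by hand. The sketch below is illustrative only: the helper `describe_genes`, the toy gene names, and the `Gene_id` and `Biotype` columns are hypothetical, and it assumes the same layout as above (three annotation columns followed by numeric sample columns).

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis


def describe_genes(df, gene_col="Gene_name", first_sample_col=3):
    """Compute per-gene descriptors (hypothetical helper, not part of the dataset)."""
    expr = df.iloc[:, first_sample_col:]
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1)  # sample SD, ddof=1
    return pd.DataFrame({
        "Gene": df[gene_col],
        "Mean": mean,
        "SD": sd,
        # Replace zero means with NaN so the CV is undefined instead of infinite
        "CV": sd / mean.replace(0, np.nan),
        "Skewness": expr.apply(skew, axis=1),
        "Kurtosis": expr.apply(kurtosis, axis=1),
        "Range": expr.max(axis=1) - expr.min(axis=1),
    })


# Toy table: three annotation columns, then four sample columns
toy = pd.DataFrame({
    "Gene_id": ["g1", "g2", "g3"],
    "Gene_name": ["GENE_A", "GENE_B", "GENE_C"],
    "Biotype": ["protein_coding"] * 3,
    "S1": [1.0, 0.0, 2.0],
    "S2": [2.0, 0.0, 2.0],
    "S3": [3.0, 0.0, 8.0],
    "S4": [4.0, 0.0, 2.0],
})

toy_results = describe_genes(toy)
```

Here `GENE_A` is evenly spread, `GENE_B` is unexpressed in every sample (its CV is undefined, and skewness and kurtosis are not meaningful for a constant row), and `GENE_C` has a single high sample that pulls its skewness positive.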


## Visualization of Results

In [None]:

import matplotlib.pyplot as plt

# Plot distribution of mean expression
plt.figure(figsize=(10, 6))
plt.hist(results['Mean'], bins=30, alpha=0.7, label='Mean Expression')
plt.xlabel('Expression Level')
plt.ylabel('Frequency')
plt.title('Distribution of Mean Expression')
plt.legend()
plt.show()

# Plot Coefficient of Variation vs. Mean
plt.figure(figsize=(10, 6))
plt.scatter(results['Mean'], results['CV'], alpha=0.6, label='CV vs Mean')
plt.xlabel('Mean Expression')
plt.ylabel('Coefficient of Variation')
plt.title('CV vs Mean Expression')
plt.legend()
plt.show()
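
Mean expression often spans several orders of magnitude, in which case a linear x-axis compresses most genes near the origin. As an optional variant of the scatter plot above, the following sketch draws CV against mean on a logarithmic x-axis. The function name `plot_cv_vs_mean_log` and the `demo` values are hypothetical; genes with a non-positive mean or a missing CV are dropped because they cannot be placed on a log axis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_cv_vs_mean_log(results):
    """Scatter CV against mean expression on a log-scaled x-axis (illustrative helper)."""
    # Keep only genes that can be drawn: positive mean and a defined CV
    subset = results[(results["Mean"] > 0) & results["CV"].notna()]
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.scatter(subset["Mean"], subset["CV"], alpha=0.6)
    ax.set_xscale("log")
    ax.set_xlabel("Mean Expression (log scale)")
    ax.set_ylabel("Coefficient of Variation")
    ax.set_title("CV vs Mean Expression")
    return fig, ax


# Made-up values purely to exercise the function
demo = pd.DataFrame({
    "Mean": [0.0, 1.0, 10.0, 100.0],
    "CV": [np.nan, 1.2, 0.6, 0.3],
})
fig, ax = plot_cv_vs_mean_log(demo)
```

In the notebook, the same function could be called with the `results` table computed earlier.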



## Conclusions

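- The mean expression level summarizes each gene's overall transcript abundance across the samples.
- A high coefficient of variation (CV) indicates large variability relative to the mean. Because the CV becomes unstable as the mean approaches zero, values for lowly expressed genes should be interpreted with caution.
- Skewness identifies genes with asymmetric expression distributions, and kurtosis identifies genes with heavy-tailed distributions, for example those driven by a few extreme values.
- The range highlights genes with the largest gap between minimum and maximum expression; it is sensitive to a single outlier sample.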
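
To make the interpretation of skewness and kurtosis concrete, the short sketch below compares a symmetric vector with one that has a single high value. The two arrays are made-up illustrations, not values from the dataset. It also shows that `scipy.stats.kurtosis` reports excess kurtosis by default, which is 3 lower than the Pearson definition obtained with `fisher=False`.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical expression profiles for two genes across five samples
symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
right_tailed = np.array([1.0, 1.0, 1.0, 1.0, 10.0])

# Skewness is zero for the symmetric profile and positive for the right-tailed one
skew_symmetric = skew(symmetric)
skew_right = skew(right_tailed)

# Excess (Fisher) kurtosis is the default; fisher=False gives Pearson kurtosis
excess_kurt = kurtosis(right_tailed)
pearson_kurt = kurtosis(right_tailed, fisher=False)

print(f"skewness: symmetric={skew_symmetric:.2f}, right-tailed={skew_right:.2f}")
print(f"kurtosis: excess={excess_kurt:.2f}, Pearson={pearson_kurt:.2f}")
```

A gene whose skewness is strongly positive is typically one that is low in most samples and high in a few, which is a common pattern worth inspecting sample by sample.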

These metrics provide insights into gene behavior across different conditions and can guide further biological investigations.
