# Analysis

**Hypothesis**: In severe COVID-19, the relationship between interferon response (IFN1) and antigen presentation (HLA1) in CD16 Monocytes is altered compared to healthy individuals, potentially reflecting dysregulated interferon signaling and antigen presentation mechanisms specific to this cell type.

In [None]:
import scanpy as sc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings

# Set up visualization defaults for better plots
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
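# scanpy figure defaults are applied via set_figure_params rather than by assigning
# attributes on sc.settings; the figure size itself is set through rcParams below
sc.settings.set_figure_params(dpi=100, facecolor='white')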
warnings.filterwarnings('ignore')

# Set Matplotlib and Seaborn styles for better visualization
plt.rcParams['figure.figsize'] = (10, 8)
plt.rcParams['savefig.dpi'] = 150
sns.set_style('whitegrid')
sns.set_context('notebook', font_scale=1.2)

# Load data
print("Loading data...")
adata = sc.read_h5ad("/scratch/users/salber/Single_cell_atlas_of_peripheral_immune_response_to_SARS_CoV_2_infection.h5ad")
print(f"Data loaded: {adata.shape[0]} cells and {adata.shape[1]} genes")


# Analysis Plan

## Steps:
- Filter the AnnData object to retain only CD16 Monocyte cells using the 'cell_type_coarse' annotation.
- Split the CD16 Monocyte subset into two groups based on the 'Status' column: COVID and Healthy, and check that each group has a sufficient number of cells for reliable statistics.
- Compute the Pearson correlation coefficient between IFN1 and HLA1 expression values within each group, including simple error handling if a group is too small.
- Add regression scatter plots with clear axis labels and correlation annotations on each plot.
- In a subsequent step, apply Fisher's z-transformation to test whether the two correlations differ significantly between groups, and print the test statistic and p-value.
- Report the computed correlation coefficients and p-values to support or reject the hypothesis.
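
The Fisher z-comparison named in the steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the function name `compare_correlations` and its minimum-size check are assumptions made for this example. It also treats cells as independent observations, which is only approximately true when many cells come from the same donor, so the resulting p-value should be read with caution.

```python
import math
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test for two independent Pearson correlations.

    Hypothetical helper: r1, r2 are the per-group correlations and
    n1, n2 the number of cells in each group.
    """
    if min(n1, n2) < 4:
        raise ValueError("each group needs at least 4 observations")
    if max(abs(r1), abs(r2)) >= 1:
        raise ValueError("correlations must lie strictly between -1 and 1")
    # Fisher transformation: atanh(r) is approximately normal with variance 1 / (n - 3)
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se
    p_value = 2.0 * norm.sf(abs(z_stat))  # two-sided p-value
    return z_stat, p_value
```

In the notebook this would be called with the per-group correlations and cell counts, for example `compare_correlations(corr_covid, len(covid_data), corr_healthy, len(healthy_data))`.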


## The code subsets the AnnData object to isolate CD16 Monocytes and splits them by COVID-19 status, checks if each group has a minimum sample size, calculates Pearson correlation coefficients between IFN1 and HLA1, and produces side-by-side scatter plots with regression lines, clear axis labels, and annotations of the correlation coefficients for clarity.

In [None]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt
import seaborn as sns

# Subset the AnnData object to get only CD16 Monocytes
cd16_mask = adata.obs['cell_type_coarse'] == 'CD16 Monocyte'
adata_cd16 = adata[cd16_mask].copy()

# Split the data into COVID and Healthy groups based on the 'Status' column
covid_mask = adata_cd16.obs['Status'] == 'COVID'
healthy_mask = adata_cd16.obs['Status'] == 'Healthy'

# Extract IFN1 and HLA1 values for both groups from the observation dataframe
covid_data = adata_cd16.obs.loc[covid_mask, ['IFN1', 'HLA1']]
healthy_data = adata_cd16.obs.loc[healthy_mask, ['IFN1', 'HLA1']]

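# Require a minimum number of cells per group before computing correlations
MIN_CELLS = 30  # assumed threshold, not taken from the dataset; adjust as needed
for name, df in [('COVID', covid_data), ('Healthy', healthy_data)]:
    if len(df) < MIN_CELLS:
        raise ValueError(f'{name} group has only {len(df)} CD16 Monocytes; need at least {MIN_CELLS}')

# Calculate Pearson correlation coefficients for each group
corr_covid, pval_covid = pearsonr(covid_data['IFN1'], covid_data['HLA1'])
corr_healthy, pval_healthy = pearsonr(healthy_data['IFN1'], healthy_data['HLA1'])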

# Print the correlation coefficients and p-values
print('COVID group: Pearson r = {:.3f}, p-value = {:.3g}'.format(corr_covid, pval_covid))
print('Healthy group: Pearson r = {:.3f}, p-value = {:.3g}'.format(corr_healthy, pval_healthy))

# Visualize the scatter plots for both groups side-by-side
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

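sns.regplot(x='IFN1', y='HLA1', data=covid_data, ax=axes[0], scatter_kws={'s': 10}, line_kws={'color': 'red'})
axes[0].set_title('CD16 Monocytes (COVID)')
axes[0].annotate(f'r = {corr_covid:.3f}', xy=(0.05, 0.95), xycoords='axes fraction', va='top')

sns.regplot(x='IFN1', y='HLA1', data=healthy_data, ax=axes[1], scatter_kws={'s': 10}, line_kws={'color': 'blue'})
axes[1].set_title('CD16 Monocytes (Healthy)')
axes[1].annotate(f'r = {corr_healthy:.3f}', xy=(0.05, 0.95), xycoords='axes fraction', va='top')

# Label the axes explicitly on both panels
for ax in axes:
    ax.set_xlabel('IFN1 (interferon response)')
    ax.set_ylabel('HLA1 (antigen presentation)')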

plt.tight_layout()
plt.show()