Patient types:

- `convalescent patients (CPs)`

> To study the adaptive immune response to SARS-CoV-2, we recruited 34 CPs. According to the classification developed by the U.S. National Institutes of Health, the patients were categorized as having asymptomatic (n = 2), mild (n = 20), or moderate to severe (n = 12) disease.
>
> Peripheral blood was collected between days 17 and 49 (median day 34) after the onset of symptoms or a positive PCR test result.

- `14 healthy volunteers recruited during the COVID-19 pandemic (HD(CoV)) with no symptoms and negative PCR test results`. However, could they have had past exposure?
   - Later the authors say `All tested HD(CoV) sera lacked antibodies against SARS-CoV-2 antigens`
   - and `IgGs from the HD(CoV) and HD(BB) groups showed no reactivity to the S protein of SARS-CoV-2 or its receptor-binding domain (RBD)`
   - However, they also note: **`This might indicate that some HD(CoV) patients were exposed to the virus but rapidly cleared it via T cells without developing a humoral response.`**
- `10 samples of peripheral blood mononuclear cells (PBMCs) from biobanked healthy hematopoietic stem cell donors (HD(BB)), which were cryopreserved no later than September 2019`. The HD(BB) samples are not included in the download. Why?
- `10 serum samples from healthy blood donors that were cryopreserved no later than 2017 (HD(S)).` These were not sequenced.

In [1]:
import pandas as pd
from malid import config

In [2]:
base_dir = config.paths.external_raw_data / "Shomuradova"

In [3]:
# specimen labels seen in the AIRR repertoire list
metadata = pd.read_csv(
    base_dir / "Shomuradova_ir_2022-09-24_0105_632e57c42f04a.tsv", sep="\t"
)
metadata = (
    metadata[
        [
            "repertoire_id",
            "subject_id",
            "sample_id",
            "cell_subset",
            "cell_phenotype",
            "medical_history",
            "disease_stage",
            "age_min",
            "sex",
        ]
    ]
    .rename(columns={"age_min": "age"})
    .assign(
        sex=metadata["sex"].replace({"male": "M", "female": "F"}),
        ethnicity_condensed="Caucasian",
    )
)
metadata

Unnamed: 0,repertoire_id,subject_id,sample_id,cell_subset,cell_phenotype,medical_history,disease_stage,age,sex,ethnicity_condensed
0,5f07aa8739579433171763b2,1437,p1437_CD4ifny,CD4-positive helper T cell,CD4+ IFNγ+,COVID-19 Mild,Convalescent,28,M,Caucasian
1,5f07aa8839579433171763b3,1437,p1437_CD8ifny,"CD8-positive, alpha-beta T cell",CD8+ IFNγ+,COVID-19 Mild,Convalescent,28,M,Caucasian
2,5f07aa8839579433171763b4,1437,p1437_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian
3,5f07aa8839579433171763b5,1445,p1445_CD4ifny,CD4-positive helper T cell,CD4+ IFNγ+,COVID-19 Mild,Convalescent,32,M,Caucasian
4,5f07aa8939579433171763b6,1445,p1445_CD8ifny,"CD8-positive, alpha-beta T cell",CD8+ IFNγ+,COVID-19 Mild,Convalescent,32,M,Caucasian
...,...,...,...,...,...,...,...,...,...,...
111,6047f712136a6d9249829490,1448,p1448_exp_YLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: YLQPRTFLL-,COVID-19 Moderate/severe,Convalescent,37,M,Caucasian
112,6047f712136a6d9249829492,1484,p1484_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,,,59,F,Caucasian
113,6047f713136a6d9249829493,1484,p1484_exp_RLQ_pos_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV+,,,59,F,Caucasian
114,6047f713136a6d9249829495,1495,p1495_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,COVID-19 Moderate/severe,Convalescent,41,M,Caucasian


In [4]:
# specimen labels seen in the actual AIRR sequences export
specimen_labels = pd.read_csv(base_dir / "specimen_labels.txt", header=None)
specimen_labels

Unnamed: 0,0
0,5f07aa8739579433171763b2
1,5f07aa8839579433171763b3
2,5f07aa8839579433171763b4
3,5f07aa8839579433171763b5
4,5f07aa8939579433171763b6
...,...
111,6047f712136a6d9249829490
112,6047f712136a6d9249829492
113,6047f713136a6d9249829493
114,6047f713136a6d9249829495
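
The two tables above are loaded but never compared directly. A minimal sketch of such a comparison follows, using toy stand-ins with made-up IDs (`metadata_toy` and `specimen_labels_toy` are hypothetical names, not part of the notebook). It assumes the repertoire IDs in both files are read as strings.

```python
import pandas as pd

# Toy stand-ins for the repertoire list and the sequence export (IDs are made up).
metadata_toy = pd.DataFrame({"repertoire_id": ["a1", "a2", "a3"]})
specimen_labels_toy = pd.DataFrame({0: ["a1", "a2", "a4"]})

in_metadata = set(metadata_toy["repertoire_id"])
in_export = set(specimen_labels_toy[0])

# IDs listed in the repertoire metadata but missing from the sequence export
missing_from_export = sorted(in_metadata - in_export)
# IDs present in the sequence export but absent from the repertoire metadata
missing_from_metadata = sorted(in_export - in_metadata)

print("missing from export:", missing_from_export)
print("missing from metadata:", missing_from_metadata)
```

On the real tables, two empty lists would indicate that the repertoire list and the sequence export describe the same specimens.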


In [5]:
# patient diagnosis/type extracted from paper
# we want to ignore HD(CoV) because they may have some Covid exposure. see notes above.
patient_status = pd.read_csv(base_dir / "Shomuradova_patient_metadata.csv")
patient_status

Unnamed: 0,patient,type,disease,severity
0,p1426,CP,Covid19,mild
1,p1428,CP,Covid19,mild
2,p1434,CP,Covid19,mild
3,p1435,CP,Covid19,mild
4,p1436,CP,Covid19,mild
5,p1437,CP,Covid19,mild
6,p1445,CP,Covid19,mild
7,p1446,CP,Covid19,mild
8,p1447,CP,Covid19,mild
9,p1448,CP,Covid19,moderate/severe


In [6]:
metadata_annot = pd.merge(
    metadata,
    patient_status.assign(
        # raw string avoids an invalid "\d" escape; expand=False returns a Series
        subject_id=patient_status["patient"].str.extract(r"(\d+)", expand=False).astype(int)
    ),
    how="left",
    on="subject_id",
    validate="m:1",
)
metadata_annot

Unnamed: 0,repertoire_id,subject_id,sample_id,cell_subset,cell_phenotype,medical_history,disease_stage,age,sex,ethnicity_condensed,patient,type,disease,severity
0,5f07aa8739579433171763b2,1437,p1437_CD4ifny,CD4-positive helper T cell,CD4+ IFNγ+,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild
1,5f07aa8839579433171763b3,1437,p1437_CD8ifny,"CD8-positive, alpha-beta T cell",CD8+ IFNγ+,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild
2,5f07aa8839579433171763b4,1437,p1437_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild
3,5f07aa8839579433171763b5,1445,p1445_CD4ifny,CD4-positive helper T cell,CD4+ IFNγ+,COVID-19 Mild,Convalescent,32,M,Caucasian,p1445,CP,Covid19,mild
4,5f07aa8939579433171763b6,1445,p1445_CD8ifny,"CD8-positive, alpha-beta T cell",CD8+ IFNγ+,COVID-19 Mild,Convalescent,32,M,Caucasian,p1445,CP,Covid19,mild
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111,6047f712136a6d9249829490,1448,p1448_exp_YLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: YLQPRTFLL-,COVID-19 Moderate/severe,Convalescent,37,M,Caucasian,p1448,CP,Covid19,moderate/severe
112,6047f712136a6d9249829492,1484,p1484_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,,,59,F,Caucasian,p1484,CP,Covid19,asymptomatic
113,6047f713136a6d9249829493,1484,p1484_exp_RLQ_pos_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV+,,,59,F,Caucasian,p1484,CP,Covid19,asymptomatic
114,6047f713136a6d9249829495,1495,p1495_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,COVID-19 Moderate/severe,Convalescent,41,M,Caucasian,p1495,CP,Covid19,moderate/severe


In [7]:
metadata_annot["type"].isna().value_counts()

False    116
Name: type, dtype: int64
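
The `isna()` count above shows that every repertoire found a patient record. If some had not, it would help to see which subjects were unmatched. One way to sketch this, on toy data with a made-up subject `9999`, is the `indicator=True` option of `pd.merge`; the `*_toy` names are hypothetical.

```python
import pandas as pd

# Toy repertoire metadata; subject 9999 is made up and has no patient record.
metadata_toy = pd.DataFrame(
    {
        "subject_id": [1437, 1437, 9999],
        "sample_id": ["p1437_PBMC", "p1437_CD4ifny", "p9999_PBMC"],
    }
)
patient_status_toy = pd.DataFrame({"patient": ["p1437", "p1445"], "type": ["CP", "CP"]})

# Derive the numeric subject ID from labels such as "p1437".
patient_status_toy = patient_status_toy.assign(
    subject_id=patient_status_toy["patient"].str.extract(r"(\d+)", expand=False).astype(int)
)

merged = pd.merge(
    metadata_toy,
    patient_status_toy,
    how="left",
    on="subject_id",
    validate="m:1",  # many repertoires per patient, one patient row per subject
    indicator=True,  # adds a "_merge" column describing where each row came from
)

# Rows tagged "left_only" had no matching patient record.
unmatched = merged.loc[merged["_merge"] == "left_only", "subject_id"].tolist()
print(unmatched)
```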

In [8]:
metadata_annot["type"].value_counts()

CP         113
HD(CoV)      3
Name: type, dtype: int64

In [9]:
metadata_annot.groupby("type")["subject_id"].nunique()

type
CP         21
HD(CoV)     1
Name: subject_id, dtype: int64

In [10]:
metadata_annot.groupby("disease")["subject_id"].nunique()

disease
Covid19    21
Name: subject_id, dtype: int64

In [11]:
# HD(BB) is not available in the download. Keep Covid19 repertoires only (this also drops HD(CoV)).
metadata_filtered = (
    metadata_annot[metadata_annot["disease"] == "Covid19"]
    .rename(columns={"repertoire_id": "specimen_label", "patient": "participant_label"})
    .assign(study_name="Shomuradova")
)
metadata_filtered["disease_subtype"] = (
    metadata_filtered["disease"] + " - " + metadata_filtered["severity"]
)
metadata_filtered

Unnamed: 0,specimen_label,subject_id,sample_id,cell_subset,cell_phenotype,medical_history,disease_stage,age,sex,ethnicity_condensed,participant_label,type,disease,severity,study_name,disease_subtype
0,5f07aa8739579433171763b2,1437,p1437_CD4ifny,CD4-positive helper T cell,CD4+ IFNγ+,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild,Shomuradova,Covid19 - mild
1,5f07aa8839579433171763b3,1437,p1437_CD8ifny,"CD8-positive, alpha-beta T cell",CD8+ IFNγ+,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild,Shomuradova,Covid19 - mild
2,5f07aa8839579433171763b4,1437,p1437_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild,Shomuradova,Covid19 - mild
3,5f07aa8839579433171763b5,1445,p1445_CD4ifny,CD4-positive helper T cell,CD4+ IFNγ+,COVID-19 Mild,Convalescent,32,M,Caucasian,p1445,CP,Covid19,mild,Shomuradova,Covid19 - mild
4,5f07aa8939579433171763b6,1445,p1445_CD8ifny,"CD8-positive, alpha-beta T cell",CD8+ IFNγ+,COVID-19 Mild,Convalescent,32,M,Caucasian,p1445,CP,Covid19,mild,Shomuradova,Covid19 - mild
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111,6047f712136a6d9249829490,1448,p1448_exp_YLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: YLQPRTFLL-,COVID-19 Moderate/severe,Convalescent,37,M,Caucasian,p1448,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe
112,6047f712136a6d9249829492,1484,p1484_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,,,59,F,Caucasian,p1484,CP,Covid19,asymptomatic,Shomuradova,Covid19 - asymptomatic
113,6047f713136a6d9249829493,1484,p1484_exp_RLQ_pos_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV+,,,59,F,Caucasian,p1484,CP,Covid19,asymptomatic,Shomuradova,Covid19 - asymptomatic
114,6047f713136a6d9249829495,1495,p1495_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,COVID-19 Moderate/severe,Convalescent,41,M,Caucasian,p1495,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe


In [12]:
# if disease_stage is blank (with one space), it means asymptomatic
metadata_filtered[
    metadata_filtered["disease_stage"]
    .mask(metadata_filtered["disease_stage"].str.strip() == "")
    .isna()
]

Unnamed: 0,specimen_label,subject_id,sample_id,cell_subset,cell_phenotype,medical_history,disease_stage,age,sex,ethnicity_condensed,participant_label,type,disease,severity,study_name,disease_subtype
112,6047f712136a6d9249829492,1484,p1484_exp_RLQ_neg_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV-,,,59,F,Caucasian,p1484,CP,Covid19,asymptomatic,Shomuradova,Covid19 - asymptomatic
113,6047f713136a6d9249829493,1484,p1484_exp_RLQ_pos_beta,"CD8-positive, alpha-beta T cell",CD3+ CD8+ epitope: RLQSLQTYV+,,,59,F,Caucasian,p1484,CP,Covid19,asymptomatic,Shomuradova,Covid19 - asymptomatic


In [13]:
# confirm our annotations
metadata_filtered[["medical_history", "severity"]].drop_duplicates()

Unnamed: 0,medical_history,severity
0,COVID-19 Mild,mild
9,COVID-19 Asymptomatic,asymptomatic
32,COVID-19 Moderate/Severe,moderate/severe
67,COVID-19 Moderate/severe,moderate/severe
112,,asymptomatic
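
The table above is checked by eye. The same comparison could be automated. The sketch below uses toy rows that mirror the label variants shown above. It assumes a blank `medical_history` means asymptomatic, which is an interpretation suggested by the output rather than something the repository states.

```python
import pandas as pd

# Toy rows mirroring the label variants in the output above.
toy = pd.DataFrame(
    {
        "medical_history": [
            "COVID-19 Mild",
            "COVID-19 Moderate/Severe",
            "COVID-19 Moderate/severe",
            " ",
            "COVID-19 Asymptomatic",
        ],
        "severity": [
            "mild",
            "moderate/severe",
            "moderate/severe",
            "asymptomatic",
            "asymptomatic",
        ],
    }
)

# Drop the "COVID-19" prefix, trim whitespace, and lowercase.
# Assumption: a blank entry is treated as asymptomatic.
derived = (
    toy["medical_history"]
    .str.replace("COVID-19", "", regex=False)
    .str.strip()
    .str.lower()
    .replace("", "asymptomatic")
)

# Rows where the repository label disagrees with the paper-derived severity.
mismatches = toy[derived != toy["severity"]]
print(mismatches)
```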


In [14]:
# extract sample type
# "We analyzed the TCR repertoires of fluorescence-activated cell sorting (FACS)-sorted IFNγ-secreting CD8+/CD4+ cells and MHC-tetramer-positive populations as well as the total fraction of PBMCs by Illumina high-throughput sequencing"
metadata_filtered["sample_type"] = (
    metadata_filtered["sample_id"]
    .str.split("_")
    .str[1:]
    .str.join(" ")
)
metadata_filtered[["cell_phenotype", "cell_subset", "sample_type"]].drop_duplicates()

Unnamed: 0,cell_phenotype,cell_subset,sample_type
0,CD4+ IFNγ+,CD4-positive helper T cell,CD4ifny
1,CD8+ IFNγ+,"CD8-positive, alpha-beta T cell",CD8ifny
2,,,PBMC
9,CD4+ IFNγ+,CD4-positive helper T cell,CD8ifny
10,CD8+ IFNγ+,"CD8-positive, alpha-beta T cell",PBMC
11,,,CD4ifny
15,CD3+ CD8+ epitope: KIADYNYKL-,"CD8-positive, alpha-beta T cell",exp KIA neg beta
16,CD3+ CD8+ epitope: KIADYNYKL+,"CD8-positive, alpha-beta T cell",exp KIA pos beta
17,CD3+ CD8+ epitope: LITGRLQSL-,"CD8-positive, alpha-beta T cell",exp LIT neg beta
18,CD3+ CD8+ epitope: LITGRLQSL+,"CD8-positive, alpha-beta T cell",exp LIT pos beta


In [15]:
metadata_filtered.groupby(["cell_phenotype", "cell_subset", "sample_type"]).size()

cell_phenotype                 cell_subset                      sample_type         
                                                                CD4ifny                  1
                                                                PBMC                    17
CD3+ CD8+ epitope: ALNTLVKQL+  CD8-positive, alpha-beta T cell  exp ALN pos beta         1
CD3+ CD8+ epitope: ALNTLVKQL-  CD8-positive, alpha-beta T cell  exp ALN neg beta         1
CD3+ CD8+ epitope: FIAGLIAIV+  CD8-positive, alpha-beta T cell  FIA pos beta             1
CD3+ CD8+ epitope: FIAGLIAIV-  CD8-positive, alpha-beta T cell  FIA neg beta             1
CD3+ CD8+ epitope: KIADYNYKL+  CD8-positive, alpha-beta T cell  exp KIA pos beta         2
CD3+ CD8+ epitope: KIADYNYKL-  CD8-positive, alpha-beta T cell  exp KIA neg beta         2
CD3+ CD8+ epitope: LITGRLQSL+  CD8-positive, alpha-beta T cell  exp LIT pos beta         2
CD3+ CD8+ epitope: LITGRLQSL-  CD8-positive, alpha-beta T cell  exp LIT KLP neg beta     1
     

In [16]:
# use total PBMC samples only. cell_phenotype and cell_subset must be blank
metadata_filtered = metadata_filtered[
    (metadata_filtered["sample_type"] == "PBMC")
    & (
        metadata_filtered["cell_phenotype"]
        .mask(metadata_filtered["cell_phenotype"].str.strip() == "")
        .isna()
    )
    & (
        metadata_filtered["cell_subset"]
        .mask(metadata_filtered["cell_subset"].str.strip() == "")
        .isna()
    )
]
metadata_filtered

Unnamed: 0,specimen_label,subject_id,sample_id,cell_subset,cell_phenotype,medical_history,disease_stage,age,sex,ethnicity_condensed,participant_label,type,disease,severity,study_name,disease_subtype,sample_type
2,5f07aa8839579433171763b4,1437,p1437_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
5,5f07aa8939579433171763b7,1445,p1445_PBMC,,,COVID-19 Mild,Convalescent,32,M,Caucasian,p1445,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
8,5f07aa8a39579433171763ba,1473,p1473_PBMC,,,COVID-19 Mild,Convalescent,31,F,Caucasian,p1473,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
14,5f07aa8c39579433171763c0,1489,p1489_PBMC,,,COVID-19 Mild,Convalescent,27,M,Caucasian,p1489,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
66,6047f702136a6d924982945c,1434,p1434_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian,p1434,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
69,6047f703136a6d924982945f,1448,p1448_PBMC,,,COVID-19 Moderate/severe,Convalescent,37,M,Caucasian,p1448,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
72,6047f704136a6d9249829462,1449,p1449_PBMC,,,COVID-19 Mild,Convalescent,34,F,Caucasian,p1449,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
75,6047f704136a6d9249829465,1465,p1465_PBMC,,,COVID-19 Moderate/severe,Convalescent,19,M,Caucasian,p1465,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
81,6047f706136a6d924982946b,1480,p1480_PBMC,,,COVID-19 Moderate/severe,Convalescent,29,M,Caucasian,p1480,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
84,6047f707136a6d924982946e,1481,p1481_PBMC,,,COVID-19 Moderate/severe,Convalescent,30,F,Caucasian,p1481,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
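
The "missing or whitespace-only" test is written out several times in this notebook. It could be factored into a small helper, sketched here under the hypothetical name `is_blank`, which is not part of the original code.

```python
import pandas as pd


def is_blank(series: pd.Series) -> pd.Series:
    """Return True where a value is missing or contains only whitespace."""
    # Same idiom as in the notebook: turn whitespace-only strings into NaN,
    # then flag every NaN.
    return series.mask(series.str.strip() == "").isna()


# Toy column with a real value, a single space, a missing value, and an empty string.
toy_phenotype = pd.Series(["CD4+ IFNγ+", " ", None, ""])
print(is_blank(toy_phenotype).tolist())
```

With such a helper, the PBMC filter could be expressed as `is_blank(df["cell_phenotype"]) & is_blank(df["cell_subset"])`.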


In [17]:
# one sample left per patient

In [18]:
metadata_filtered["type"].value_counts()

CP    17
Name: type, dtype: int64

In [19]:
metadata_filtered["disease"].value_counts()

Covid19    17
Name: disease, dtype: int64

In [20]:
metadata_filtered.groupby("type")["subject_id"].nunique()

type
CP    17
Name: subject_id, dtype: int64

In [21]:
metadata_filtered.groupby("disease")["subject_id"].nunique()

disease
Covid19    17
Name: subject_id, dtype: int64

In [22]:
assert (metadata_filtered.groupby("subject_id").size() == 1).all()
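
If the assertion above ever failed, it would not say which subjects were duplicated. A possible way to list them, shown on toy data with a made-up replicate sample:

```python
import pandas as pd

# Toy data; "p1445_PBMC_rep2" is a made-up replicate used to trigger the check.
toy = pd.DataFrame(
    {
        "subject_id": [1437, 1445, 1445],
        "sample_id": ["p1437_PBMC", "p1445_PBMC", "p1445_PBMC_rep2"],
    }
)

# keep=False flags every row of a duplicated subject, not just the later ones.
dupes = toy[toy["subject_id"].duplicated(keep=False)]
print(dupes)
```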

In [23]:
# filter by subtype? not for now.

In [24]:
metadata_filtered["disease_subtype"].value_counts()

Covid19 - mild               10
Covid19 - moderate/severe     7
Name: disease_subtype, dtype: int64

In [25]:
metadata_filtered

Unnamed: 0,specimen_label,subject_id,sample_id,cell_subset,cell_phenotype,medical_history,disease_stage,age,sex,ethnicity_condensed,participant_label,type,disease,severity,study_name,disease_subtype,sample_type
2,5f07aa8839579433171763b4,1437,p1437_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian,p1437,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
5,5f07aa8939579433171763b7,1445,p1445_PBMC,,,COVID-19 Mild,Convalescent,32,M,Caucasian,p1445,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
8,5f07aa8a39579433171763ba,1473,p1473_PBMC,,,COVID-19 Mild,Convalescent,31,F,Caucasian,p1473,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
14,5f07aa8c39579433171763c0,1489,p1489_PBMC,,,COVID-19 Mild,Convalescent,27,M,Caucasian,p1489,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
66,6047f702136a6d924982945c,1434,p1434_PBMC,,,COVID-19 Mild,Convalescent,28,M,Caucasian,p1434,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
69,6047f703136a6d924982945f,1448,p1448_PBMC,,,COVID-19 Moderate/severe,Convalescent,37,M,Caucasian,p1448,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
72,6047f704136a6d9249829462,1449,p1449_PBMC,,,COVID-19 Mild,Convalescent,34,F,Caucasian,p1449,CP,Covid19,mild,Shomuradova,Covid19 - mild,PBMC
75,6047f704136a6d9249829465,1465,p1465_PBMC,,,COVID-19 Moderate/severe,Convalescent,19,M,Caucasian,p1465,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
81,6047f706136a6d924982946b,1480,p1480_PBMC,,,COVID-19 Moderate/severe,Convalescent,29,M,Caucasian,p1480,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC
84,6047f707136a6d924982946e,1481,p1481_PBMC,,,COVID-19 Moderate/severe,Convalescent,30,F,Caucasian,p1481,CP,Covid19,moderate/severe,Shomuradova,Covid19 - moderate/severe,PBMC


In [26]:
# export
metadata_export = metadata_filtered[
    [
        "specimen_label",
        "participant_label",
        "disease",
        "study_name",
        "disease_subtype",
        "age",
        "sex",
        "ethnicity_condensed",
    ]
]
metadata_export

Unnamed: 0,specimen_label,participant_label,disease,study_name,disease_subtype,age,sex,ethnicity_condensed
2,5f07aa8839579433171763b4,p1437,Covid19,Shomuradova,Covid19 - mild,28,M,Caucasian
5,5f07aa8939579433171763b7,p1445,Covid19,Shomuradova,Covid19 - mild,32,M,Caucasian
8,5f07aa8a39579433171763ba,p1473,Covid19,Shomuradova,Covid19 - mild,31,F,Caucasian
14,5f07aa8c39579433171763c0,p1489,Covid19,Shomuradova,Covid19 - mild,27,M,Caucasian
66,6047f702136a6d924982945c,p1434,Covid19,Shomuradova,Covid19 - mild,28,M,Caucasian
69,6047f703136a6d924982945f,p1448,Covid19,Shomuradova,Covid19 - moderate/severe,37,M,Caucasian
72,6047f704136a6d9249829462,p1449,Covid19,Shomuradova,Covid19 - mild,34,F,Caucasian
75,6047f704136a6d9249829465,p1465,Covid19,Shomuradova,Covid19 - moderate/severe,19,M,Caucasian
81,6047f706136a6d924982946b,p1480,Covid19,Shomuradova,Covid19 - moderate/severe,29,M,Caucasian
84,6047f707136a6d924982946e,p1481,Covid19,Shomuradova,Covid19 - moderate/severe,30,F,Caucasian


In [27]:
metadata_export.to_csv(
    config.paths.metadata_dir
    / "generated.external_cohorts.covid_tcr_shomuradova.participant_metadata.tsv",
    sep="\t",
    index=False,
)
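
A quick round-trip check can confirm that an export like this reads back with the expected columns. The sketch below writes a toy frame to an in-memory buffer rather than to the real metadata path; `export_toy` is a hypothetical stand-in with one row copied from the table above.

```python
import io

import pandas as pd

# Toy stand-in for the exported table (one row taken from the output above).
export_toy = pd.DataFrame(
    {
        "specimen_label": ["5f07aa8839579433171763b4"],
        "participant_label": ["p1437"],
        "disease": ["Covid19"],
        "age": [28],
    }
)

# Write tab-separated text without the index, then read it back.
buffer = io.StringIO()
export_toy.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep="\t")

print(list(reloaded.columns))
```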