In [26]:
import pandas as pd
from sklearn.preprocessing import StandardScaler


# Step 1: Data Generation and Preprocessing
## 1.1 Data Generation
### Obtain Gene Expression Data:

- Source gene expression data from TCGA (The Cancer Genome Atlas) and PCAWG (Pan-Cancer Analysis of Whole Genomes).
- Normalize the data and screen out samples that have undergone prior therapy to ensure consistency and reliability.

### Generate TIDE Results:

- Use the TIDE (Tumor Immune Dysfunction and Exclusion) method to generate prediction scores.
- TIDE provides CTL (Cytotoxic T Lymphocyte) level, Dysfunction score, and Exclusion score for each sample.

## 1.2 Data Preprocessing
### Align TIDE Results with Gene Expression Data:

- Align the TIDE results with miRNA expression data based on sample IDs to ensure each sample has corresponding miRNA expression and TIDE scores.
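The alignment step can be sketched as an intersection on sample IDs. This is a minimal illustration, not the notebook's actual pipeline: the frames `mirna` and `tide_scores`, the sample names `S1` to `S3`, and all values are made-up placeholders.

```python
import pandas as pd

# Hypothetical toy inputs: miRNA expression (miRNAs x samples)
# and TIDE results (samples x score columns). All values are invented.
mirna = pd.DataFrame(
    {"S1": [1.0, 2.0], "S2": [0.5, 1.5], "S3": [3.0, 0.0]},
    index=["miR-a", "miR-b"],
)
tide_scores = pd.DataFrame({"TIDE": [0.4, -0.7]}, index=["S2", "S3"])

# Keep only samples present in both tables.
shared = mirna.columns.intersection(tide_scores.index)

# Features as samples x miRNAs, target as one TIDE score per sample,
# both ordered by the same sample IDs.
X = mirna[shared].T
y = tide_scores.loc[shared, "TIDE"]
```

Samples that appear in only one table (here `S1`) are dropped, so every remaining row of `X` has a matching entry in `y`.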

### Split Data:

- Divide the data into two groups: training/testing data and external validation data.
- Training/Testing Data: TCGA, 19 tumor types.
- External Validation Data: TCGA, 12 tumor types from PCAWG.


For this project we will use the GDC TCGA Breast Cancer (BRCA) cohort dataset.

## Reading the gene expression dataframe

- Ensembl_ID: Unique identifier for each gene, provided by Ensembl.
- Sample IDs: Each subsequent column represents a different sample, identified by its TCGA (The Cancer Genome Atlas) code. For example, TCGA-E9-A1NI-01A, TCGA-A1-A0SP-01A, etc.
- Gene Entries: Each row represents a different gene, identified by its Ensembl ID. The expression levels for that gene are listed for each sample across the columns.
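The sample columns follow the TCGA barcode scheme, in which the fourth dash-separated field begins with a two-digit sample-type code (01 to 09 for tumor, 10 to 19 for normal, 20 to 29 for control samples). A minimal sketch of how tumor and normal columns could be told apart is shown here; `sample_type` is a hypothetical helper name, not part of any library.

```python
def sample_type(barcode: str) -> str:
    """Classify a TCGA barcode such as 'TCGA-BH-A0DT-11A' by its sample-type code."""
    # The fourth field looks like '01A': two digits for the sample type, then a vial letter.
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "control"


# Barcodes taken from the column header of the expression table.
columns = ["TCGA-E9-A1NI-01A", "TCGA-BH-A0DT-11A", "TCGA-A7-A13F-11A"]
tumor_columns = [c for c in columns if sample_type(c) == "tumor"]
```

With this rule, the `-11A` columns in the table would be treated as normal tissue and could be filtered out before modeling.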

In [27]:
expression_data = pd.read_csv("/content/TCGA-BRCA.htseq_counts.tsv",sep="\t",on_bad_lines='skip')
expression_data

Unnamed: 0,Ensembl_ID,TCGA-E9-A1NI-01A,TCGA-A1-A0SP-01A,TCGA-BH-A201-01A,TCGA-E2-A14T-01A,TCGA-AC-A8OS-01A,TCGA-A8-A09K-01A,TCGA-OL-A5RY-01A,TCGA-E9-A24A-01A,TCGA-E2-A1LS-01A,...,TCGA-BH-A0DT-11A,TCGA-BH-A1EV-01A,TCGA-AR-A1AY-01A,TCGA-B6-A409-01A,TCGA-A8-A09W-01A,TCGA-EW-A1P3-01A,TCGA-A7-A13F-11A,TCGA-A2-A0T6-01A,TCGA-B6-A0RN-01A,TCGA-BH-A203-01A
0,ENSG00000000003.13,8.787903,12.064743,11.801304,10.723661,11.040290,10.771489,11.139551,10.337622,12.717462,...,12.378566,10.688250,11.690435,13.150699,10.623881,10.429407,11.678160,11.845098,11.272630,10.865733
1,ENSG00000000005.5,0.000000,2.807355,4.954196,6.658211,6.357552,2.807355,5.672425,2.807355,2.807355,...,7.011227,0.000000,3.584963,6.129283,3.906891,5.209453,11.076816,6.303781,1.584963,5.954196
2,ENSG00000000419.11,11.054604,11.292897,11.314017,11.214926,10.375039,10.496854,10.839991,11.372321,11.139551,...,10.949827,11.018200,12.171177,13.513604,10.878817,10.264443,10.339850,10.768184,10.447083,11.433064
3,ENSG00000000457.12,10.246741,9.905387,11.117643,12.093748,10.696098,11.532843,9.992938,11.583083,12.091435,...,10.754888,11.181152,11.136991,10.614710,11.276706,10.369597,10.203348,11.501837,11.363040,10.713387
4,ENSG00000000460.15,8.965784,10.053926,9.957102,9.503826,8.546894,8.797662,8.727920,9.754888,9.016808,...,8.791163,9.548822,11.692616,10.384784,10.432542,9.052568,8.118941,9.609179,9.136991,9.927778
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4183,ENSG00000113441.14,11.874597,11.383164,12.697185,13.497602,12.314583,12.244364,11.115694,13.274524,10.585901,...,12.709299,12.845882,11.523562,10.938844,12.951649,11.381543,12.614480,13.509899,12.183015,12.131535
4184,ENSG00000113448.15,8.276124,7.900867,9.700440,8.971544,9.556506,6.614710,9.154818,10.512740,9.658211,...,9.509775,9.469642,8.689998,8.071462,9.076816,7.539159,9.348728,10.962173,8.758223,8.562242
4185,ENSG00000113456.17,10.359750,11.898223,11.233020,11.601307,10.412570,10.963619,10.524542,11.066089,9.098032,...,10.973697,11.300353,11.775199,11.864960,10.316282,9.987264,9.943980,11.547859,10.792790,11.893681
4186,ENSG00000113460.11,10.809768,12.677940,11.410981,12.012275,9.997179,10.952741,10.638436,11.028597,10.462502,...,10.874213,10.659996,12.179909,13.065079,10.228819,10.453271,10.087463,10.951285,10.825754,12.274378


In [28]:
expression_data.head()

Unnamed: 0,Ensembl_ID,TCGA-E9-A1NI-01A,TCGA-A1-A0SP-01A,TCGA-BH-A201-01A,TCGA-E2-A14T-01A,TCGA-AC-A8OS-01A,TCGA-A8-A09K-01A,TCGA-OL-A5RY-01A,TCGA-E9-A24A-01A,TCGA-E2-A1LS-01A,...,TCGA-BH-A0DT-11A,TCGA-BH-A1EV-01A,TCGA-AR-A1AY-01A,TCGA-B6-A409-01A,TCGA-A8-A09W-01A,TCGA-EW-A1P3-01A,TCGA-A7-A13F-11A,TCGA-A2-A0T6-01A,TCGA-B6-A0RN-01A,TCGA-BH-A203-01A
0,ENSG00000000003.13,8.787903,12.064743,11.801304,10.723661,11.04029,10.771489,11.139551,10.337622,12.717462,...,12.378566,10.68825,11.690435,13.150699,10.623881,10.429407,11.67816,11.845098,11.27263,10.865733
1,ENSG00000000005.5,0.0,2.807355,4.954196,6.658211,6.357552,2.807355,5.672425,2.807355,2.807355,...,7.011227,0.0,3.584963,6.129283,3.906891,5.209453,11.076816,6.303781,1.584963,5.954196
2,ENSG00000000419.11,11.054604,11.292897,11.314017,11.214926,10.375039,10.496854,10.839991,11.372321,11.139551,...,10.949827,11.0182,12.171177,13.513604,10.878817,10.264443,10.33985,10.768184,10.447083,11.433064
3,ENSG00000000457.12,10.246741,9.905387,11.117643,12.093748,10.696098,11.532843,9.992938,11.583083,12.091435,...,10.754888,11.181152,11.136991,10.61471,11.276706,10.369597,10.203348,11.501837,11.36304,10.713387
4,ENSG00000000460.15,8.965784,10.053926,9.957102,9.503826,8.546894,8.797662,8.72792,9.754888,9.016808,...,8.791163,9.548822,11.692616,10.384784,10.432542,9.052568,8.118941,9.609179,9.136991,9.927778


In [29]:
expression_data.shape

(4188, 1218)

In [30]:
expression_data.isnull().sum()

Ensembl_ID          0
TCGA-E9-A1NI-01A    0
TCGA-A1-A0SP-01A    0
TCGA-BH-A201-01A    0
TCGA-E2-A14T-01A    0
                   ..
TCGA-EW-A1P3-01A    1
TCGA-A7-A13F-11A    1
TCGA-A2-A0T6-01A    1
TCGA-B6-A0RN-01A    1
TCGA-BH-A203-01A    1
Length: 1218, dtype: int64

In [31]:
(expression_data.dtypes)


Ensembl_ID           object
TCGA-E9-A1NI-01A    float64
TCGA-A1-A0SP-01A    float64
TCGA-BH-A201-01A    float64
TCGA-E2-A14T-01A    float64
                     ...   
TCGA-EW-A1P3-01A    float64
TCGA-A7-A13F-11A    float64
TCGA-A2-A0T6-01A    float64
TCGA-B6-A0RN-01A    float64
TCGA-BH-A203-01A    float64
Length: 1218, dtype: object

In [32]:
expression_data = expression_data.dropna()

In [33]:
expression_data.isnull().sum()

Ensembl_ID          0
TCGA-E9-A1NI-01A    0
TCGA-A1-A0SP-01A    0
TCGA-BH-A201-01A    0
TCGA-E2-A14T-01A    0
                   ..
TCGA-EW-A1P3-01A    0
TCGA-A7-A13F-11A    0
TCGA-A2-A0T6-01A    0
TCGA-B6-A0RN-01A    0
TCGA-BH-A203-01A    0
Length: 1218, dtype: int64

In [34]:
expression_data.shape

(4187, 1218)
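
Comparing the two shapes, `dropna()` removed exactly one row. Before dropping, it can be useful to look at which row is incomplete. The sketch below uses a made-up toy frame (`toy`, with placeholder IDs and values) rather than the real file.

```python
import pandas as pd

# Toy frame with invented values; the final row is missing its last field.
toy = pd.DataFrame(
    {
        "Ensembl_ID": ["ENSG_A", "ENSG_B", "ENSG_C"],
        "S1": [1.0, 2.0, 3.0],
        "S2": [4.0, 5.0, None],
    }
)

# Rows that contain at least one missing value.
rows_with_nan = toy[toy.isnull().any(axis=1)]

# Number of missing values in each row.
n_missing_per_row = toy.isnull().sum(axis=1)
```

Applied to `expression_data`, the same two expressions would show whether the missing values all sit in a single row, for example a truncated final line.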

In [35]:
expression_data

Unnamed: 0,Ensembl_ID,TCGA-E9-A1NI-01A,TCGA-A1-A0SP-01A,TCGA-BH-A201-01A,TCGA-E2-A14T-01A,TCGA-AC-A8OS-01A,TCGA-A8-A09K-01A,TCGA-OL-A5RY-01A,TCGA-E9-A24A-01A,TCGA-E2-A1LS-01A,...,TCGA-BH-A0DT-11A,TCGA-BH-A1EV-01A,TCGA-AR-A1AY-01A,TCGA-B6-A409-01A,TCGA-A8-A09W-01A,TCGA-EW-A1P3-01A,TCGA-A7-A13F-11A,TCGA-A2-A0T6-01A,TCGA-B6-A0RN-01A,TCGA-BH-A203-01A
0,ENSG00000000003.13,8.787903,12.064743,11.801304,10.723661,11.040290,10.771489,11.139551,10.337622,12.717462,...,12.378566,10.688250,11.690435,13.150699,10.623881,10.429407,11.678160,11.845098,11.272630,10.865733
1,ENSG00000000005.5,0.000000,2.807355,4.954196,6.658211,6.357552,2.807355,5.672425,2.807355,2.807355,...,7.011227,0.000000,3.584963,6.129283,3.906891,5.209453,11.076816,6.303781,1.584963,5.954196
2,ENSG00000000419.11,11.054604,11.292897,11.314017,11.214926,10.375039,10.496854,10.839991,11.372321,11.139551,...,10.949827,11.018200,12.171177,13.513604,10.878817,10.264443,10.339850,10.768184,10.447083,11.433064
3,ENSG00000000457.12,10.246741,9.905387,11.117643,12.093748,10.696098,11.532843,9.992938,11.583083,12.091435,...,10.754888,11.181152,11.136991,10.614710,11.276706,10.369597,10.203348,11.501837,11.363040,10.713387
4,ENSG00000000460.15,8.965784,10.053926,9.957102,9.503826,8.546894,8.797662,8.727920,9.754888,9.016808,...,8.791163,9.548822,11.692616,10.384784,10.432542,9.052568,8.118941,9.609179,9.136991,9.927778
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4182,ENSG00000113430.8,2.584963,6.845490,7.651052,5.000000,7.169925,1.584963,6.942515,3.000000,1.000000,...,8.882643,3.906891,11.245553,6.906891,2.321928,6.000000,7.643856,3.807355,5.357552,9.884171
4183,ENSG00000113441.14,11.874597,11.383164,12.697185,13.497602,12.314583,12.244364,11.115694,13.274524,10.585901,...,12.709299,12.845882,11.523562,10.938844,12.951649,11.381543,12.614480,13.509899,12.183015,12.131535
4184,ENSG00000113448.15,8.276124,7.900867,9.700440,8.971544,9.556506,6.614710,9.154818,10.512740,9.658211,...,9.509775,9.469642,8.689998,8.071462,9.076816,7.539159,9.348728,10.962173,8.758223,8.562242
4185,ENSG00000113456.17,10.359750,11.898223,11.233020,11.601307,10.412570,10.963619,10.524542,11.066089,9.098032,...,10.973697,11.300353,11.775199,11.864960,10.316282,9.987264,9.943980,11.547859,10.792790,11.893681


In [36]:
expression_data.describe()


Unnamed: 0,TCGA-E9-A1NI-01A,TCGA-A1-A0SP-01A,TCGA-BH-A201-01A,TCGA-E2-A14T-01A,TCGA-AC-A8OS-01A,TCGA-A8-A09K-01A,TCGA-OL-A5RY-01A,TCGA-E9-A24A-01A,TCGA-E2-A1LS-01A,TCGA-E9-A1RB-01A,...,TCGA-BH-A0DT-11A,TCGA-BH-A1EV-01A,TCGA-AR-A1AY-01A,TCGA-B6-A409-01A,TCGA-A8-A09W-01A,TCGA-EW-A1P3-01A,TCGA-A7-A13F-11A,TCGA-A2-A0T6-01A,TCGA-B6-A0RN-01A,TCGA-BH-A203-01A
count,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,...,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0,4187.0
mean,9.08305,9.692134,9.893116,9.971461,9.497832,9.138921,9.279112,9.699673,8.85347,9.93567,...,9.751829,9.479251,9.556586,9.640426,9.038742,9.090279,9.42558,9.853628,9.164182,9.650759
std,3.550153,3.539231,3.517139,3.647938,3.351477,3.7837,3.339217,3.742886,3.755599,3.774636,...,3.403941,3.631519,3.650342,3.638427,3.793627,3.405331,3.450078,3.575071,3.696664,3.577989
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,7.70044,8.409391,8.638435,8.53527,8.290011,7.467606,8.149738,8.283086,7.076816,8.442943,...,8.523557,8.044394,7.997177,8.169925,7.475733,7.829719,8.184867,8.552668,7.50382,8.238405
50%,10.087463,10.681238,10.866506,10.956376,10.41996,10.286558,10.202124,10.814582,9.865733,11.051209,...,10.719389,10.481799,10.638436,10.675957,10.142107,10.04576,10.384784,10.867279,10.291171,10.62022
75%,11.451469,12.015937,12.224755,12.412967,11.682995,11.760713,11.396605,12.227164,11.473706,12.445791,...,11.991522,11.875749,12.010004,12.061202,11.629357,11.337901,11.655754,12.2172,11.650151,12.019417
max,18.723908,19.418256,19.733926,18.45204,19.746504,16.956592,18.308272,19.578509,19.324963,18.784047,...,18.24026,18.612095,18.399862,19.037928,17.786397,18.110636,18.415065,20.146744,20.017795,20.074719


Scaling the data

In [37]:
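# Keep the gene identifiers aside; only the numeric sample columns are scaled.
ensembl_ids = expression_data[['Ensembl_ID']]
data_to_scale = expression_data.drop(columns=['Ensembl_ID'])

# StandardScaler standardizes each column, so every sample (column) is scaled
# to zero mean and unit variance across genes (rows).
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data_to_scale)

# Rebuild a DataFrame with the original labels and reattach the gene identifiers.
scaled_data_df = pd.DataFrame(scaled_data, columns=data_to_scale.columns, index=data_to_scale.index)
scaled_data_df = pd.concat([ensembl_ids, scaled_data_df], axis=1)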

In [39]:
scaled_data_df

Unnamed: 0,Ensembl_ID,TCGA-E9-A1NI-01A,TCGA-A1-A0SP-01A,TCGA-BH-A201-01A,TCGA-E2-A14T-01A,TCGA-AC-A8OS-01A,TCGA-A8-A09K-01A,TCGA-OL-A5RY-01A,TCGA-E9-A24A-01A,TCGA-E2-A1LS-01A,...,TCGA-BH-A0DT-11A,TCGA-BH-A1EV-01A,TCGA-AR-A1AY-01A,TCGA-B6-A409-01A,TCGA-A8-A09W-01A,TCGA-EW-A1P3-01A,TCGA-A7-A13F-11A,TCGA-A2-A0T6-01A,TCGA-B6-A0RN-01A,TCGA-BH-A203-01A
0,ENSG00000000003.13,-0.083146,0.670454,0.542605,0.206223,0.460287,0.431526,0.557215,0.170463,1.028985,...,0.771767,0.332958,0.584631,0.964893,0.417893,0.393292,0.652985,0.557110,0.570433,0.339610
1,ENSG00000000005.5,-2.558801,-1.945508,-1.404411,-0.908361,-0.937096,-1.673579,-1.080229,-1.841665,-1.610086,...,-0.805222,-2.610584,-1.636104,-0.965132,-1.352918,-1.139769,0.478665,-0.993063,-2.050531,-1.033263
2,ENSG00000000419.11,0.555410,0.452345,0.404042,0.340909,0.261769,0.358933,0.467494,0.446941,0.608786,...,0.351986,0.423826,0.716345,1.064647,0.485102,0.344843,0.265031,0.255845,0.347084,0.498190
3,ENSG00000000457.12,0.327825,0.060261,0.348202,0.581847,0.357576,0.632769,0.213796,0.503257,0.862273,...,0.294711,0.468703,0.432999,0.267808,0.589998,0.375726,0.225462,0.461084,0.594893,0.297026
4,ENSG00000000460.15,-0.033035,0.102235,0.018195,-0.128207,-0.283771,-0.090203,-0.165086,0.014754,0.043497,...,-0.282255,0.019160,0.585229,0.204607,0.367449,-0.011075,-0.378773,-0.068384,-0.007356,0.077432
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4182,ENSG00000113430.8,-1.830587,-0.804408,-0.637544,-1.362977,-0.694674,-1.996686,-0.699828,-1.790189,-2.091386,...,-0.255377,-1.534627,0.462743,-0.751386,-1.770764,-0.907591,-0.516492,-1.691434,-1.029870,0.065243
4183,ENSG00000113441.14,0.786412,0.477853,0.797354,0.966728,0.840551,0.820841,0.550069,0.955219,0.461348,...,0.868941,0.927170,0.538911,0.356905,1.031565,0.672927,0.924408,1.022835,0.816734,0.693427
4184,ENSG00000113448.15,-0.227320,-0.506178,-0.054789,-0.274137,0.017509,-0.667207,-0.037227,0.217256,0.214304,...,-0.071118,-0.002646,-0.237428,-0.431272,0.010037,-0.455552,-0.022278,0.310113,-0.109831,-0.304262
4185,ENSG00000113456.17,0.359661,0.623399,0.381010,0.446839,0.272968,0.482310,0.373015,0.365114,0.065127,...,0.359000,0.501531,0.607855,0.611473,0.336800,0.263438,0.150275,0.473958,0.440614,0.626942


## Generate TIDE data

In [40]:
!pip install git+https://github.com/jingxinfu/TIDEpy.git

Collecting git+https://github.com/jingxinfu/TIDEpy.git
  Cloning https://github.com/jingxinfu/TIDEpy.git to /tmp/pip-req-build-jyrjxu4o
  Running command git clone --filter=blob:none --quiet https://github.com/jingxinfu/TIDEpy.git /tmp/pip-req-build-jyrjxu4o
  Resolved https://github.com/jingxinfu/TIDEpy.git to commit 1132179a401522d9ba4d18e5e06fb1f49fb03612
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: tidepy
  Building wheel for tidepy (setup.py) ... done
  Created wheel for tidepy: filename=tidepy-1.3.8-py3-none-any.whl size=1880266 sha256=e7236e93c15814008527e0fc4627c8237559337f2e550e9eab3c82b81b11a3a6
  Stored in directory: /tmp/pip-ephem-wheel-cache-86do9jao/wheels/1f/b3/2b/1345f358f26f833fa6c81ccbf048ea034d90f6cc4e090c259a
Successfully built tidepy
Installing collected packages: tidepy
Successfully installed tidepy-1.3.8


In [41]:
from tidepy.pred import TIDE


In [56]:
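# Coerce every column to numeric; the string Ensembl_ID column becomes all NaN and is dropped in the next cell.
df = scaled_data_df.apply(pd.to_numeric, errors='coerce')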


In [58]:
df = df.drop(columns=['Ensembl_ID'])

In [59]:
df.head()

Unnamed: 0,TCGA-E9-A1NI-01A,TCGA-A1-A0SP-01A,TCGA-BH-A201-01A,TCGA-E2-A14T-01A,TCGA-AC-A8OS-01A,TCGA-A8-A09K-01A,TCGA-OL-A5RY-01A,TCGA-E9-A24A-01A,TCGA-E2-A1LS-01A,TCGA-E9-A1RB-01A,...,TCGA-BH-A0DT-11A,TCGA-BH-A1EV-01A,TCGA-AR-A1AY-01A,TCGA-B6-A409-01A,TCGA-A8-A09W-01A,TCGA-EW-A1P3-01A,TCGA-A7-A13F-11A,TCGA-A2-A0T6-01A,TCGA-B6-A0RN-01A,TCGA-BH-A203-01A
0,-0.083146,0.670454,0.542605,0.206223,0.460287,0.431526,0.557215,0.170463,1.028985,0.338144,...,0.771767,0.332958,0.584631,0.964893,0.417893,0.393292,0.652985,0.55711,0.570433,0.33961
1,-2.558801,-1.945508,-1.404411,-0.908361,-0.937096,-1.673579,-1.080229,-1.841665,-1.610086,-2.017321,...,-0.805222,-2.610584,-1.636104,-0.965132,-1.352918,-1.139769,0.478665,-0.993063,-2.050531,-1.033263
2,0.55541,0.452345,0.404042,0.340909,0.261769,0.358933,0.467494,0.446941,0.608786,0.579162,...,0.351986,0.423826,0.716345,1.064647,0.485102,0.344843,0.265031,0.255845,0.347084,0.49819
3,0.327825,0.060261,0.348202,0.581847,0.357576,0.632769,0.213796,0.503257,0.862273,0.382378,...,0.294711,0.468703,0.432999,0.267808,0.589998,0.375726,0.225462,0.461084,0.594893,0.297026
4,-0.033035,0.102235,0.018195,-0.128207,-0.283771,-0.090203,-0.165086,0.014754,0.043497,0.34785,...,-0.282255,0.01916,0.585229,0.204607,0.367449,-0.011075,-0.378773,-0.068384,-0.007356,0.077432


In [63]:
tide = TIDE(df, cancer="Other", pretreat=False, vthres=0.)


[WARN] Missing Gene:29126for signature CD274
[WARN] Missing Gene:6772,4283for signature IFNG
[WARN] 86.1% MSI signature genes are missing on input expression profile.
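
The warnings above report signature genes that could not be found in the input; the numeric IDs look like Entrez gene IDs. One thing worth checking, as an assumption rather than a confirmed diagnosis, is whether the rows of `df` still carry gene identifiers: after `Ensembl_ID` is dropped, rows are labeled only by integer position. A minimal sketch of keeping the identifiers as the row index follows, using a made-up two-gene frame (`toy`).

```python
import pandas as pd

# Toy stand-in for scaled_data_df; the values are invented.
toy = pd.DataFrame(
    {
        "Ensembl_ID": ["ENSG00000000003.13", "ENSG00000000005.5"],
        "S1": [0.1, -0.2],
        "S2": [0.3, 0.4],
    }
)

# Move the identifier column into the row index instead of dropping it,
# so each row remains labeled by its gene.
indexed = toy.set_index("Ensembl_ID")
```

If the predictor matches genes by gene symbol or Entrez ID, the Ensembl IDs would still need to be converted before the signatures can be matched.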


## Materials and methods
### Data collection and preprocessing for gene expression profiles
To derive predictive immune response outcomes using the TIDE algorithm, bulk RNA-seq data were acquired from The Cancer Genome Atlas (TCGA) UCSC Xena browser (GDC repository) (https://gdc.xenahubs.net) [49]. A total of 21 tumor types in TCGA cohorts were gathered, and the same tumor types are available on the TIDE web browser (http://tide.dfci.harvard.edu/) [8] (Table 1). Subsequently, gene expression data were used to predict the tumor immune response using the TIDE web browser. Gene expression values for all samples were normalized by subtracting the average log2(FPKM + 1) value from each gene expression value [8]. Concurrently, the Ensembl ID for each gene was converted into a gene symbol using the R package "org.Hs.eg.db" (version 3.16.0). Genes with duplicate symbols were collapsed into a single entry by averaging their expression values.

Furthermore, a stringent filtering process was applied to the experiments, ensuring the inclusion of only TCGA samples lacking any prior treatment history, as it was not definitively confirmed whether immunotherapy had been administered to the samples. In addition, the study was limited to solid tumors; therefore, cases with acute myeloid leukemia were excluded. Finally, 8,037 samples harboring 35,096 genes across 20 tumor types were included.
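
The preprocessing described here (ID conversion, averaging duplicate symbols, and subtracting each gene's average) can be sketched in pandas. This is an illustration under stated assumptions, not the paper's R workflow: the dictionary `id_to_symbol`, the placeholder symbols `GENE_A` and `GENE_B`, the sample names, and all values are made up.

```python
import pandas as pd

# Toy log2-scale expression matrix (genes x samples); the values are invented.
expr = pd.DataFrame(
    {"S1": [2.0, 4.0, 6.0], "S2": [4.0, 6.0, 8.0]},
    index=["ENSG00000000003.13", "ENSG00000000005.5", "ENSG00000000419.11"],
)

# Hypothetical ID-to-symbol map; two IDs share a symbol to show the averaging step.
id_to_symbol = {
    "ENSG00000000003": "GENE_A",
    "ENSG00000000005": "GENE_A",
    "ENSG00000000419": "GENE_B",
}

# 1. Strip the version suffix (".13") and relabel rows by symbol.
stripped = expr.index.str.split(".").str[0]
renamed = expr.copy()
renamed.index = stripped.map(id_to_symbol)

# 2. Average rows that share a symbol.
by_symbol = renamed.groupby(level=0).mean()

# 3. Subtract each gene's mean across samples.
centered = by_symbol.sub(by_symbol.mean(axis=1), axis=0)
```

Note that this centers each gene across samples (row-wise), which differs from standardizing each sample column across genes. IDs absent from the map would become missing labels and fall out of the `groupby`, so they should be counted before this step.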
### ICB response prediction based on TIDE
ICB response prediction was performed based on the TIDE method using TCGA gene expression data (Fig. 1A). These outcomes were conveyed through CTL level, dysfunction, exclusion values, and TIDE score. The CTL level was represented as either “True” or “False”, indicating high or low CTL levels, respectively. The TIDE scores were influenced by dysfunction and exclusion values. Specifically, when the CTL level was “True”, the dysfunction score was adopted as the TIDE score; conversely, if the CTL was “False”, the exclusion score was taken as the TIDE score. A sample with a positive TIDE score indicated that it was a non-responder, whereas a sample with a negative TIDE score was a responder.
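
The selection rule in this paragraph can be written as a short function. This is a sketch of the rule as described in the text, not of the TIDE implementation itself; `tide_score` and `predicted_label` are hypothetical names, and treating a score of exactly zero as a responder is an assumption, since the text only covers positive and negative scores.

```python
def tide_score(ctl_high: bool, dysfunction: float, exclusion: float) -> float:
    """Pick the dysfunction score when CTL is high, otherwise the exclusion score."""
    return dysfunction if ctl_high else exclusion


def predicted_label(score: float) -> str:
    """Positive score: non-responder; negative score: responder.

    Assumption: a score of exactly 0 is labeled 'responder' (not specified in the text).
    """
    return "non-responder" if score > 0 else "responder"


# Example with invented scores: high CTL, so the dysfunction score is used.
score = tide_score(ctl_high=True, dysfunction=0.8, exclusion=-1.2)
label = predicted_label(score)
```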
### MicroRNA expression data
miRNA expression quantification (stem loop) data were also downloaded from the TCGA UCSC Xena browser (GDC repository) (https://gdc.xenahubs.net) [49]. This dataset comprised 1,881 miRNA expression values (log2(RPM + 1)) per sample, encompassing 20 tumor types, identical to the TCGA gene expression data (Table 1). Normal samples were excluded from miRNA expression data. The GBM tumor type was excluded because the GBM included only five normal samples. In total, the dataset comprised 7,721 samples from 19 tumor types (Fig. 1B). For independent validation purposes, a validation dataset with 12 tumor types distinct from the 21 types available in the TIDE browser was used (Table S10). The validation dataset encompassed 1,947 samples (Fig. 1B,C).