# Generate Protein Embeddings

As an example, here's how to process the frog (*Xenopus tropicalis*) reference proteome with ESM-1b.

Proteome FASTA files are available from Ensembl: https://uswest.ensembl.org/info/data/ftp/index.html

We use a pretrained Transformer model from https://github.com/facebookresearch/esm. These models were trained on hundreds of millions of protein sequences from across the tree of life.

**NOTE:** These protein embedding scripts require an older version of the ESM repo. **Clone the ESM repo**, then check out commit
[`839c5b82c6cd9e18baa7a88dcbed3bd4b6d48e47`](https://github.com/facebookresearch/esm/commit/839c5b82c6cd9e18baa7a88dcbed3bd4b6d48e47) (for example with `git checkout 839c5b82c6cd9e18baa7a88dcbed3bd4b6d48e47`) before running the steps below.

## Step 1: Download reference proteome

In [2]:
!mkdir -p data

In [34]:
import os
NAME = "Xenopus_tropicalis.UCB_Xtro_10.0.pep.all" # CHANGE THIS TO THE NAME OF THE REFERENCE PROTEOME YOU WANT
DATA_PATH = os.path.join(os.path.abspath(os.getcwd()), "data") # PATH TO DATA DIRECTORY (YOU CAN USE THE ONE IN THIS DIRECTORY)
ESM_PATH = "/lfs/local/0/yanay/esm/" # CHANGE THIS TO THE PATH WHERE YOU CLONED THE ESM REPO
TORCH_HOME = "/dfs/project/cross-species/yanay/torch_home" # CHANGE THIS TO THE DIRECTORY WHERE THE PRETRAINED MODEL WEIGHTS SHOULD BE CACHED
DEVICE = 6 # GPU NUMBER, CHANGE THIS

# URL OF THE ENSEMBL PROTEOME FASTA, CHANGE THIS
FASTA_URL = "https://ftp.ensembl.org/pub/release-108/fasta/xenopus_tropicalis/pep/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.fa.gz"
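
`NAME` has to match the file name at the end of `FASTA_URL`, since the later cells build every path from it. A minimal sketch of one way to keep the two in sync is to derive the name from the URL. The helper `proteome_name_from_url` is a hypothetical name introduced here, and it assumes the URL ends in `.fa.gz`, as the Ensembl peptide files used in this example do.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def proteome_name_from_url(url, suffix=".fa.gz"):
    """Return the file name at the end of `url` with `suffix` stripped.

    Hypothetical helper: assumes the URL points at a gzipped FASTA file.
    """
    filename = PurePosixPath(urlparse(url).path).name
    if not filename.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}, got {filename!r}")
    return filename[: -len(suffix)]


FASTA_URL = (
    "https://ftp.ensembl.org/pub/release-108/fasta/xenopus_tropicalis/pep/"
    "Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.fa.gz"
)
derived_name = proteome_name_from_url(FASTA_URL)
print(derived_name)
```

With this in place, `NAME = proteome_name_from_url(FASTA_URL)` would remove one of the values that has to be edited by hand when switching species.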

In [39]:
!wget {FASTA_URL} \
        -O data/{NAME}.fa.gz

--2022-11-14 15:30:40--  https://ftp.ensembl.org/pub/release-108/fasta/xenopus_tropicalis/pep/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.fa.gz
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.139
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11316909 (11M) [application/x-gzip]
Saving to: ‘data/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.fa.gz’


2022-11-14 15:31:06 (429 KB/s) - ‘data/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.fa.gz’ saved [11316909/11316909]

FINISHED --2022-11-14 15:31:06--
Total wall clock time: 26s
Downloaded: 1 files, 11M in 26s (429 KB/s)


In [40]:
!gunzip data/{NAME}.fa.gz

gzip: data/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.fa already exists; do you wish to overwrite (y or n)? ^C
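
The `gunzip` prompt above appears when the cell is rerun and the decompressed file already exists. A possible alternative, sketched here with only the Python standard library, is to decompress inside the notebook and skip the work when the output is already present. The function name `gunzip_if_needed` and the demo file are illustrative assumptions. Unlike `gunzip`, this version leaves the `.gz` file in place.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_if_needed(gz_path, overwrite=False):
    """Decompress `gz_path` next to itself and return the output path.

    If the output already exists and `overwrite` is False, nothing is written.
    """
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    if out_path.exists() and not overwrite:
        return out_path
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# Small self-contained demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_gz = Path(tmp) / "demo.fa.gz"
    with gzip.open(demo_gz, "wt") as fh:
        fh.write(">seq1\nMKT\n")
    fasta_path = gunzip_if_needed(demo_gz)
    demo_name = fasta_path.name
    demo_text = fasta_path.read_text()
    # A second call finds the output and returns without prompting or rewriting.
    second_path = gunzip_if_needed(demo_gz)
```

In this notebook the call would look like `gunzip_if_needed(f"data/{NAME}.fa.gz")`.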


## Step 2: Clean Fasta

In [45]:
!python clean_fasta.py \
--data_path=./data/{NAME}.fa \
--save_path=./data/{NAME}.clean.fa

Number of original sequences = 49,792
100%|█████████████████████████████████| 49792/49792 [00:00<00:00, 146933.93it/s]
Number of cleaned sequences = 49,792
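
The two counts reported above can be checked independently by counting FASTA header lines. The sketch below only counts lines that start with `>`; it makes no assumption about what `clean_fasta.py` does internally, and the function name `count_fasta_records` and the demo file are introduced here for illustration.

```python
import tempfile
from pathlib import Path


def count_fasta_records(path):
    """Count records in a FASTA file by counting header lines (those starting with '>')."""
    with open(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


# Small self-contained demo on a throwaway file with two records.
with tempfile.TemporaryDirectory() as tmp:
    demo_fasta = Path(tmp) / "demo.fa"
    demo_fasta.write_text(">seq1 first protein\nMKTAYIAK\nQRQISFVK\n>seq2\nMSHHWGYG\n")
    n_records = count_fasta_records(demo_fasta)

print(n_records)
```

Running `count_fasta_records(f"data/{NAME}.fa")` and `count_fasta_records(f"data/{NAME}.clean.fa")` should agree with the numbers printed by the cleaning script.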


## Step 3: Run ESM

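The log printed by `extract.py` in this step shows batches that start large (73 sequences) and shrink as the run proceeds. That pattern is what one would expect if sequences are sorted by length and grouped under a fixed token budget per batch, so short proteins are packed many to a batch and long ones only a few. The sketch below illustrates that idea; the function name, the budget used in the demo, and the toy lengths are illustrative assumptions, not code taken from the ESM repo.

```python
def batch_by_token_budget(lengths, toks_per_batch=4096, extra_toks_per_seq=1):
    """Group sequence indices so that (longest sequence) x (batch size) stays within a budget.

    Sequences are visited from shortest to longest. Each sequence is charged
    `extra_toks_per_seq` extra tokens (for example, a start token).
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, max_len = [], [], 0
    for i in order:
        size = lengths[i] + extra_toks_per_seq
        # Padded cost if this sequence joined the current batch.
        if current and max(size, max_len) * (len(current) + 1) > toks_per_batch:
            batches.append(current)
            current, max_len = [], 0
        max_len = max(max_len, size)
        current.append(i)
    if current:
        batches.append(current)
    return batches


# Toy example: three short, two medium, and two long sequences.
lengths = [50, 55, 60, 400, 450, 1000, 1020]
batches = batch_by_token_budget(lengths, toks_per_batch=1100)
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [3, 2, 1, 1]
```

Under this scheme the number of sequences per batch falls as sequence length rises, which matches the shape of the progress log below.
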
In [47]:
# THE MODELS ARE VERY LARGE AND TAKE A WHILE TO RUN

!export TORCH_HOME={TORCH_HOME}; cd {ESM_PATH}/scripts/; \
CUDA_VISIBLE_DEVICES={DEVICE} python extract.py esm1b_t33_650M_UR50S \
{DATA_PATH}/{NAME}.clean.fa \
{DATA_PATH}/{NAME}.clean.fa_esm1b \
--include mean  --truncate

Transferred model to GPU
Read /dfs/project/cross-species/yanay/code/SPEAR/protein_embeddings/data/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.clean.fa with 49792 sequences
Processing 1 of 8762 batches (73 sequences)
Processing 2 of 8762 batches (61 sequences)
Processing 3 of 8762 batches (57 sequences)
Processing 4 of 8762 batches (53 sequences)
Processing 5 of 8762 batches (51 sequences)
Processing 6 of 8762 batches (50 sequences)
Processing 7 of 8762 batches (48 sequences)
Processing 8 of 8762 batches (47 sequences)
Processing 9 of 8762 batches (46 sequences)
Processing 10 of 8762 batches (45 sequences)
Processing 11 of 8762 batches (44 sequences)
Processing 12 of 8762 batches (43 sequences)
Processing 13 of 8762 batches (42 sequences)
Processing 14 of 8762 batches (41 sequences)
Processing 15 of 8762 batches (41 sequences)
Processing 16 of 8762 batches (40 sequences)
Processing 17 of 8762 batches (40 sequences)
Processing 18 of 8762 batches (39 sequences)
Processing 19 of 8762 batches 

Processing 178 of 8762 batches (21 sequences)
Processing 179 of 8762 batches (21 sequences)
Processing 180 of 8762 batches (21 sequences)
Processing 181 of 8762 batches (21 sequences)
Processing 182 of 8762 batches (21 sequences)
Processing 183 of 8762 batches (21 sequences)
Processing 184 of 8762 batches (21 sequences)
Processing 185 of 8762 batches (21 sequences)
Processing 186 of 8762 batches (21 sequences)
Processing 187 of 8762 batches (21 sequences)
Processing 188 of 8762 batches (21 sequences)
Processing 189 of 8762 batches (21 sequences)
Processing 190 of 8762 batches (20 sequences)
Processing 191 of 8762 batches (20 sequences)
Processing 192 of 8762 batches (20 sequences)
Processing 193 of 8762 batches (20 sequences)
Processing 194 of 8762 batches (20 sequences)
Processing 195 of 8762 batches (20 sequences)
Processing 196 of 8762 batches (20 sequences)
Processing 197 of 8762 batches (20 sequences)
Processing 198 of 8762 batches (20 sequences)
Processing 199 of 8762 batches (20

Processing 357 of 8762 batches (16 sequences)
Processing 358 of 8762 batches (16 sequences)
Processing 359 of 8762 batches (16 sequences)
Processing 360 of 8762 batches (16 sequences)
Processing 361 of 8762 batches (16 sequences)
Processing 362 of 8762 batches (16 sequences)
Processing 363 of 8762 batches (16 sequences)
Processing 364 of 8762 batches (16 sequences)
Processing 365 of 8762 batches (16 sequences)
Processing 366 of 8762 batches (16 sequences)
Processing 367 of 8762 batches (16 sequences)
Processing 368 of 8762 batches (16 sequences)
Processing 369 of 8762 batches (16 sequences)
Processing 370 of 8762 batches (16 sequences)
Processing 371 of 8762 batches (16 sequences)
Processing 372 of 8762 batches (16 sequences)
Processing 373 of 8762 batches (16 sequences)
Processing 374 of 8762 batches (16 sequences)
Processing 375 of 8762 batches (16 sequences)
Processing 376 of 8762 batches (16 sequences)
Processing 377 of 8762 batches (16 sequences)
Processing 378 of 8762 batches (16

Processing 536 of 8762 batches (14 sequences)
Processing 537 of 8762 batches (14 sequences)
Processing 538 of 8762 batches (14 sequences)
Processing 539 of 8762 batches (14 sequences)
Processing 540 of 8762 batches (14 sequences)
Processing 541 of 8762 batches (14 sequences)
Processing 542 of 8762 batches (14 sequences)
Processing 543 of 8762 batches (14 sequences)
Processing 544 of 8762 batches (14 sequences)
Processing 545 of 8762 batches (14 sequences)
Processing 546 of 8762 batches (14 sequences)
Processing 547 of 8762 batches (14 sequences)
Processing 548 of 8762 batches (14 sequences)
Processing 549 of 8762 batches (14 sequences)
Processing 550 of 8762 batches (14 sequences)
Processing 551 of 8762 batches (14 sequences)
Processing 552 of 8762 batches (14 sequences)
Processing 553 of 8762 batches (14 sequences)
Processing 554 of 8762 batches (14 sequences)
Processing 555 of 8762 batches (14 sequences)
Processing 556 of 8762 batches (14 sequences)
Processing 557 of 8762 batches (14

Processing 715 of 8762 batches (12 sequences)
Processing 716 of 8762 batches (12 sequences)
Processing 717 of 8762 batches (12 sequences)
Processing 718 of 8762 batches (12 sequences)
Processing 719 of 8762 batches (12 sequences)
Processing 720 of 8762 batches (12 sequences)
Processing 721 of 8762 batches (12 sequences)
Processing 722 of 8762 batches (12 sequences)
Processing 723 of 8762 batches (12 sequences)
Processing 724 of 8762 batches (12 sequences)
Processing 725 of 8762 batches (12 sequences)
Processing 726 of 8762 batches (12 sequences)
Processing 727 of 8762 batches (12 sequences)
Processing 728 of 8762 batches (12 sequences)
Processing 729 of 8762 batches (12 sequences)
Processing 730 of 8762 batches (12 sequences)
Processing 731 of 8762 batches (12 sequences)
Processing 732 of 8762 batches (12 sequences)
Processing 733 of 8762 batches (12 sequences)
Processing 734 of 8762 batches (12 sequences)
Processing 735 of 8762 batches (12 sequences)
Processing 736 of 8762 batches (12

Processing 894 of 8762 batches (12 sequences)
Processing 895 of 8762 batches (12 sequences)
Processing 896 of 8762 batches (12 sequences)
Processing 897 of 8762 batches (12 sequences)
Processing 898 of 8762 batches (12 sequences)
Processing 899 of 8762 batches (12 sequences)
Processing 900 of 8762 batches (12 sequences)
Processing 901 of 8762 batches (12 sequences)
Processing 902 of 8762 batches (12 sequences)
Processing 903 of 8762 batches (12 sequences)
Processing 904 of 8762 batches (12 sequences)
Processing 905 of 8762 batches (12 sequences)
Processing 906 of 8762 batches (12 sequences)
Processing 907 of 8762 batches (12 sequences)
Processing 908 of 8762 batches (11 sequences)
Processing 909 of 8762 batches (11 sequences)
Processing 910 of 8762 batches (11 sequences)
Processing 911 of 8762 batches (11 sequences)
Processing 912 of 8762 batches (11 sequences)
Processing 913 of 8762 batches (11 sequences)
Processing 914 of 8762 batches (11 sequences)
Processing 915 of 8762 batches (11

Processing 1071 of 8762 batches (11 sequences)
Processing 1072 of 8762 batches (11 sequences)
Processing 1073 of 8762 batches (11 sequences)
Processing 1074 of 8762 batches (11 sequences)
Processing 1075 of 8762 batches (11 sequences)
Processing 1076 of 8762 batches (11 sequences)
Processing 1077 of 8762 batches (11 sequences)
Processing 1078 of 8762 batches (11 sequences)
Processing 1079 of 8762 batches (11 sequences)
Processing 1080 of 8762 batches (11 sequences)
Processing 1081 of 8762 batches (11 sequences)
Processing 1082 of 8762 batches (11 sequences)
Processing 1083 of 8762 batches (11 sequences)
Processing 1084 of 8762 batches (11 sequences)
Processing 1085 of 8762 batches (11 sequences)
Processing 1086 of 8762 batches (11 sequences)
Processing 1087 of 8762 batches (11 sequences)
Processing 1088 of 8762 batches (11 sequences)
Processing 1089 of 8762 batches (11 sequences)
Processing 1090 of 8762 batches (11 sequences)
Processing 1091 of 8762 batches (11 sequences)
Processing 10

Processing 1246 of 8762 batches (10 sequences)
Processing 1247 of 8762 batches (10 sequences)
Processing 1248 of 8762 batches (10 sequences)
Processing 1249 of 8762 batches (10 sequences)
Processing 1250 of 8762 batches (10 sequences)
Processing 1251 of 8762 batches (10 sequences)
Processing 1252 of 8762 batches (10 sequences)
Processing 1253 of 8762 batches (10 sequences)
Processing 1254 of 8762 batches (10 sequences)
Processing 1255 of 8762 batches (10 sequences)
Processing 1256 of 8762 batches (10 sequences)
Processing 1257 of 8762 batches (10 sequences)
Processing 1258 of 8762 batches (10 sequences)
Processing 1259 of 8762 batches (10 sequences)
Processing 1260 of 8762 batches (10 sequences)
Processing 1261 of 8762 batches (10 sequences)
Processing 1262 of 8762 batches (10 sequences)
Processing 1263 of 8762 batches (10 sequences)
Processing 1264 of 8762 batches (10 sequences)
Processing 1265 of 8762 batches (10 sequences)
Processing 1266 of 8762 batches (10 sequences)
Processing 12

Processing 1422 of 8762 batches (9 sequences)
Processing 1423 of 8762 batches (9 sequences)
Processing 1424 of 8762 batches (9 sequences)
Processing 1425 of 8762 batches (9 sequences)
Processing 1426 of 8762 batches (9 sequences)
Processing 1427 of 8762 batches (9 sequences)
Processing 1428 of 8762 batches (9 sequences)
Processing 1429 of 8762 batches (9 sequences)
Processing 1430 of 8762 batches (9 sequences)
Processing 1431 of 8762 batches (9 sequences)
Processing 1432 of 8762 batches (9 sequences)
Processing 1433 of 8762 batches (9 sequences)
Processing 1434 of 8762 batches (9 sequences)
Processing 1435 of 8762 batches (9 sequences)
Processing 1436 of 8762 batches (9 sequences)
Processing 1437 of 8762 batches (9 sequences)
Processing 1438 of 8762 batches (9 sequences)
Processing 1439 of 8762 batches (9 sequences)
Processing 1440 of 8762 batches (9 sequences)
Processing 1441 of 8762 batches (9 sequences)
Processing 1442 of 8762 batches (9 sequences)
Processing 1443 of 8762 batches (9

Processing 1601 of 8762 batches (9 sequences)
Processing 1602 of 8762 batches (9 sequences)
Processing 1603 of 8762 batches (9 sequences)
Processing 1604 of 8762 batches (9 sequences)
Processing 1605 of 8762 batches (9 sequences)
Processing 1606 of 8762 batches (9 sequences)
Processing 1607 of 8762 batches (9 sequences)
Processing 1608 of 8762 batches (9 sequences)
Processing 1609 of 8762 batches (9 sequences)
Processing 1610 of 8762 batches (9 sequences)
Processing 1611 of 8762 batches (9 sequences)
Processing 1612 of 8762 batches (9 sequences)
Processing 1613 of 8762 batches (9 sequences)
Processing 1614 of 8762 batches (9 sequences)
Processing 1615 of 8762 batches (9 sequences)
Processing 1616 of 8762 batches (9 sequences)
Processing 1617 of 8762 batches (9 sequences)
Processing 1618 of 8762 batches (9 sequences)
Processing 1619 of 8762 batches (9 sequences)
Processing 1620 of 8762 batches (9 sequences)
Processing 1621 of 8762 batches (9 sequences)
Processing 1622 of 8762 batches (9

Processing 1780 of 8762 batches (8 sequences)
Processing 1781 of 8762 batches (8 sequences)
Processing 1782 of 8762 batches (8 sequences)
Processing 1783 of 8762 batches (8 sequences)
Processing 1784 of 8762 batches (8 sequences)
Processing 1785 of 8762 batches (8 sequences)
Processing 1786 of 8762 batches (8 sequences)
Processing 1787 of 8762 batches (8 sequences)
Processing 1788 of 8762 batches (8 sequences)
Processing 1789 of 8762 batches (8 sequences)
Processing 1790 of 8762 batches (8 sequences)
Processing 1791 of 8762 batches (8 sequences)
Processing 1792 of 8762 batches (8 sequences)
Processing 1793 of 8762 batches (8 sequences)
Processing 1794 of 8762 batches (8 sequences)
Processing 1795 of 8762 batches (8 sequences)
Processing 1796 of 8762 batches (8 sequences)
Processing 1797 of 8762 batches (8 sequences)
Processing 1798 of 8762 batches (8 sequences)
Processing 1799 of 8762 batches (8 sequences)
Processing 1800 of 8762 batches (8 sequences)
Processing 1801 of 8762 batches (8

Processing 1959 of 8762 batches (8 sequences)
Processing 1960 of 8762 batches (8 sequences)
Processing 1961 of 8762 batches (8 sequences)
Processing 1962 of 8762 batches (8 sequences)
Processing 1963 of 8762 batches (8 sequences)
Processing 1964 of 8762 batches (8 sequences)
Processing 1965 of 8762 batches (8 sequences)
Processing 1966 of 8762 batches (8 sequences)
Processing 1967 of 8762 batches (8 sequences)
Processing 1968 of 8762 batches (8 sequences)
Processing 1969 of 8762 batches (8 sequences)
Processing 1970 of 8762 batches (8 sequences)
Processing 1971 of 8762 batches (8 sequences)
Processing 1972 of 8762 batches (8 sequences)
Processing 1973 of 8762 batches (8 sequences)
Processing 1974 of 8762 batches (8 sequences)
Processing 1975 of 8762 batches (8 sequences)
Processing 1976 of 8762 batches (8 sequences)
Processing 1977 of 8762 batches (8 sequences)
Processing 1978 of 8762 batches (8 sequences)
Processing 1979 of 8762 batches (8 sequences)
Processing 1980 of 8762 batches (8

Processing 2138 of 8762 batches (8 sequences)
Processing 2139 of 8762 batches (8 sequences)
Processing 2140 of 8762 batches (8 sequences)
Processing 2141 of 8762 batches (8 sequences)
Processing 2142 of 8762 batches (8 sequences)
Processing 2143 of 8762 batches (8 sequences)
Processing 2144 of 8762 batches (8 sequences)
Processing 2145 of 8762 batches (8 sequences)
Processing 2146 of 8762 batches (8 sequences)
Processing 2147 of 8762 batches (8 sequences)
Processing 2148 of 8762 batches (8 sequences)
Processing 2149 of 8762 batches (8 sequences)
Processing 2150 of 8762 batches (8 sequences)
Processing 2151 of 8762 batches (8 sequences)
Processing 2152 of 8762 batches (8 sequences)
Processing 2153 of 8762 batches (8 sequences)
Processing 2154 of 8762 batches (8 sequences)
Processing 2155 of 8762 batches (8 sequences)
Processing 2156 of 8762 batches (8 sequences)
Processing 2157 of 8762 batches (8 sequences)
Processing 2158 of 8762 batches (8 sequences)
Processing 2159 of 8762 batches (8

Processing 2317 of 8762 batches (7 sequences)
Processing 2318 of 8762 batches (7 sequences)
Processing 2319 of 8762 batches (7 sequences)
Processing 2320 of 8762 batches (7 sequences)
Processing 2321 of 8762 batches (7 sequences)
Processing 2322 of 8762 batches (7 sequences)
Processing 2323 of 8762 batches (7 sequences)
Processing 2324 of 8762 batches (7 sequences)
Processing 2325 of 8762 batches (7 sequences)
Processing 2326 of 8762 batches (7 sequences)
Processing 2327 of 8762 batches (7 sequences)
Processing 2328 of 8762 batches (7 sequences)
Processing 2329 of 8762 batches (7 sequences)
Processing 2330 of 8762 batches (7 sequences)
Processing 2331 of 8762 batches (7 sequences)
Processing 2332 of 8762 batches (7 sequences)
Processing 2333 of 8762 batches (7 sequences)
Processing 2334 of 8762 batches (7 sequences)
Processing 2335 of 8762 batches (7 sequences)
Processing 2336 of 8762 batches (7 sequences)
Processing 2337 of 8762 batches (7 sequences)
Processing 2338 of 8762 batches (7

Processing 2496 of 8762 batches (7 sequences)
Processing 2497 of 8762 batches (7 sequences)
Processing 2498 of 8762 batches (7 sequences)
Processing 2499 of 8762 batches (7 sequences)
Processing 2500 of 8762 batches (7 sequences)
Processing 2501 of 8762 batches (7 sequences)
Processing 2502 of 8762 batches (7 sequences)
Processing 2503 of 8762 batches (7 sequences)
Processing 2504 of 8762 batches (7 sequences)
Processing 2505 of 8762 batches (7 sequences)
Processing 2506 of 8762 batches (7 sequences)
Processing 2507 of 8762 batches (7 sequences)
Processing 2508 of 8762 batches (7 sequences)
Processing 2509 of 8762 batches (7 sequences)
Processing 2510 of 8762 batches (7 sequences)
Processing 2511 of 8762 batches (7 sequences)
Processing 2512 of 8762 batches (7 sequences)
Processing 2513 of 8762 batches (7 sequences)
Processing 2514 of 8762 batches (7 sequences)
Processing 2515 of 8762 batches (7 sequences)
Processing 2516 of 8762 batches (7 sequences)
Processing 2517 of 8762 batches (7

Processing 2675 of 8762 batches (7 sequences)
Processing 2676 of 8762 batches (7 sequences)
Processing 2677 of 8762 batches (7 sequences)
Processing 2678 of 8762 batches (7 sequences)
Processing 2679 of 8762 batches (7 sequences)
Processing 2680 of 8762 batches (7 sequences)
Processing 2681 of 8762 batches (7 sequences)
Processing 2682 of 8762 batches (7 sequences)
Processing 2683 of 8762 batches (7 sequences)
Processing 2684 of 8762 batches (7 sequences)
Processing 2685 of 8762 batches (7 sequences)
Processing 2686 of 8762 batches (7 sequences)
Processing 2687 of 8762 batches (7 sequences)
Processing 2688 of 8762 batches (7 sequences)
Processing 2689 of 8762 batches (7 sequences)
Processing 2690 of 8762 batches (7 sequences)
Processing 2691 of 8762 batches (7 sequences)
Processing 2692 of 8762 batches (7 sequences)
Processing 2693 of 8762 batches (6 sequences)
Processing 2694 of 8762 batches (6 sequences)
Processing 2695 of 8762 batches (6 sequences)
Processing 2696 of 8762 batches (6

Processing 2854 of 8762 batches (6 sequences)
Processing 2855 of 8762 batches (6 sequences)
Processing 2856 of 8762 batches (6 sequences)
Processing 2857 of 8762 batches (6 sequences)
Processing 2858 of 8762 batches (6 sequences)
Processing 2859 of 8762 batches (6 sequences)
Processing 2860 of 8762 batches (6 sequences)
Processing 2861 of 8762 batches (6 sequences)
Processing 2862 of 8762 batches (6 sequences)
Processing 2863 of 8762 batches (6 sequences)
Processing 2864 of 8762 batches (6 sequences)
Processing 2865 of 8762 batches (6 sequences)
Processing 2866 of 8762 batches (6 sequences)
Processing 2867 of 8762 batches (6 sequences)
Processing 2868 of 8762 batches (6 sequences)
Processing 2869 of 8762 batches (6 sequences)
Processing 2870 of 8762 batches (6 sequences)
Processing 2871 of 8762 batches (6 sequences)
Processing 2872 of 8762 batches (6 sequences)
Processing 2873 of 8762 batches (6 sequences)
Processing 2874 of 8762 batches (6 sequences)
Processing 2875 of 8762 batches (6

Processing 3033 of 8762 batches (6 sequences)
Processing 3034 of 8762 batches (6 sequences)
Processing 3035 of 8762 batches (6 sequences)
Processing 3036 of 8762 batches (6 sequences)
Processing 3037 of 8762 batches (6 sequences)
Processing 3038 of 8762 batches (6 sequences)
Processing 3039 of 8762 batches (6 sequences)
Processing 3040 of 8762 batches (6 sequences)
Processing 3041 of 8762 batches (6 sequences)
Processing 3042 of 8762 batches (6 sequences)
Processing 3043 of 8762 batches (6 sequences)
Processing 3044 of 8762 batches (6 sequences)
Processing 3045 of 8762 batches (6 sequences)
Processing 3046 of 8762 batches (6 sequences)
Processing 3047 of 8762 batches (6 sequences)
Processing 3048 of 8762 batches (6 sequences)
Processing 3049 of 8762 batches (6 sequences)
Processing 3050 of 8762 batches (6 sequences)
Processing 3051 of 8762 batches (6 sequences)
Processing 3052 of 8762 batches (6 sequences)
Processing 3053 of 8762 batches (6 sequences)
Processing 3054 of 8762 batches (6

Processing 3212 of 8762 batches (6 sequences)
Processing 3213 of 8762 batches (6 sequences)
Processing 3214 of 8762 batches (6 sequences)
Processing 3215 of 8762 batches (6 sequences)
Processing 3216 of 8762 batches (6 sequences)
Processing 3217 of 8762 batches (6 sequences)
Processing 3218 of 8762 batches (6 sequences)
Processing 3219 of 8762 batches (6 sequences)
Processing 3220 of 8762 batches (6 sequences)
Processing 3221 of 8762 batches (6 sequences)
Processing 3222 of 8762 batches (6 sequences)
Processing 3223 of 8762 batches (6 sequences)
Processing 3224 of 8762 batches (6 sequences)
Processing 3225 of 8762 batches (6 sequences)
Processing 3226 of 8762 batches (6 sequences)
Processing 3227 of 8762 batches (6 sequences)
Processing 3228 of 8762 batches (6 sequences)
Processing 3229 of 8762 batches (6 sequences)
Processing 3230 of 8762 batches (6 sequences)
Processing 3231 of 8762 batches (6 sequences)
Processing 3232 of 8762 batches (6 sequences)
Processing 3233 of 8762 batches (6

Processing 3391 of 8762 batches (5 sequences)
Processing 3392 of 8762 batches (5 sequences)
Processing 3393 of 8762 batches (5 sequences)
Processing 3394 of 8762 batches (5 sequences)
Processing 3395 of 8762 batches (5 sequences)
Processing 3396 of 8762 batches (5 sequences)
Processing 3397 of 8762 batches (5 sequences)
Processing 3398 of 8762 batches (5 sequences)
Processing 3399 of 8762 batches (5 sequences)
Processing 3400 of 8762 batches (5 sequences)
Processing 3401 of 8762 batches (5 sequences)
Processing 3402 of 8762 batches (5 sequences)
Processing 3403 of 8762 batches (5 sequences)
Processing 3404 of 8762 batches (5 sequences)
Processing 3405 of 8762 batches (5 sequences)
Processing 3406 of 8762 batches (5 sequences)
Processing 3407 of 8762 batches (5 sequences)
Processing 3408 of 8762 batches (5 sequences)
Processing 3409 of 8762 batches (5 sequences)
Processing 3410 of 8762 batches (5 sequences)
Processing 3411 of 8762 batches (5 sequences)
Processing 3412 of 8762 batches (5

Processing 3570 of 8762 batches (5 sequences)
Processing 3571 of 8762 batches (5 sequences)
Processing 3572 of 8762 batches (5 sequences)
Processing 3573 of 8762 batches (5 sequences)
Processing 3574 of 8762 batches (5 sequences)
Processing 3575 of 8762 batches (5 sequences)
Processing 3576 of 8762 batches (5 sequences)
Processing 3577 of 8762 batches (5 sequences)
Processing 3578 of 8762 batches (5 sequences)
Processing 3579 of 8762 batches (5 sequences)
Processing 3580 of 8762 batches (5 sequences)
Processing 3581 of 8762 batches (5 sequences)
Processing 3582 of 8762 batches (5 sequences)
Processing 3583 of 8762 batches (5 sequences)
Processing 3584 of 8762 batches (5 sequences)
Processing 3585 of 8762 batches (5 sequences)
Processing 3586 of 8762 batches (5 sequences)
Processing 3587 of 8762 batches (5 sequences)
Processing 3588 of 8762 batches (5 sequences)
Processing 3589 of 8762 batches (5 sequences)
Processing 3590 of 8762 batches (5 sequences)
Processing 3591 of 8762 batches (5

Processing 3749 of 8762 batches (5 sequences)
Processing 3750 of 8762 batches (5 sequences)
Processing 3751 of 8762 batches (5 sequences)
Processing 3752 of 8762 batches (5 sequences)
Processing 3753 of 8762 batches (5 sequences)
Processing 3754 of 8762 batches (5 sequences)
Processing 3755 of 8762 batches (5 sequences)
Processing 3756 of 8762 batches (5 sequences)
Processing 3757 of 8762 batches (5 sequences)
Processing 3758 of 8762 batches (5 sequences)
Processing 3759 of 8762 batches (5 sequences)
Processing 3760 of 8762 batches (5 sequences)
Processing 3761 of 8762 batches (5 sequences)
Processing 3762 of 8762 batches (5 sequences)
Processing 3763 of 8762 batches (5 sequences)
Processing 3764 of 8762 batches (5 sequences)
Processing 3765 of 8762 batches (5 sequences)
Processing 3766 of 8762 batches (5 sequences)
Processing 3767 of 8762 batches (5 sequences)
Processing 3768 of 8762 batches (5 sequences)
Processing 3769 of 8762 batches (5 sequences)
Processing 3770 of 8762 batches (5

Processing 3928 of 8762 batches (5 sequences)
Processing 3929 of 8762 batches (5 sequences)
Processing 3930 of 8762 batches (5 sequences)
Processing 3931 of 8762 batches (5 sequences)
Processing 3932 of 8762 batches (5 sequences)
Processing 3933 of 8762 batches (5 sequences)
Processing 3934 of 8762 batches (5 sequences)
Processing 3935 of 8762 batches (5 sequences)
Processing 3936 of 8762 batches (5 sequences)
Processing 3937 of 8762 batches (5 sequences)
Processing 3938 of 8762 batches (5 sequences)
Processing 3939 of 8762 batches (5 sequences)
Processing 3940 of 8762 batches (5 sequences)
Processing 3941 of 8762 batches (5 sequences)
Processing 3942 of 8762 batches (5 sequences)
Processing 3943 of 8762 batches (5 sequences)
Processing 3944 of 8762 batches (5 sequences)
Processing 3945 of 8762 batches (5 sequences)
Processing 3946 of 8762 batches (5 sequences)
Processing 3947 of 8762 batches (5 sequences)
Processing 3948 of 8762 batches (5 sequences)
Processing 3949 of 8762 batches (5

Processing 4107 of 8762 batches (4 sequences)
Processing 4108 of 8762 batches (4 sequences)
Processing 4109 of 8762 batches (4 sequences)
Processing 4110 of 8762 batches (4 sequences)
Processing 4111 of 8762 batches (4 sequences)
Processing 4112 of 8762 batches (4 sequences)
Processing 4113 of 8762 batches (4 sequences)
Processing 4114 of 8762 batches (4 sequences)
Processing 4115 of 8762 batches (4 sequences)
Processing 4116 of 8762 batches (4 sequences)
Processing 4117 of 8762 batches (4 sequences)
Processing 4118 of 8762 batches (4 sequences)
Processing 4119 of 8762 batches (4 sequences)
Processing 4120 of 8762 batches (4 sequences)
Processing 4121 of 8762 batches (4 sequences)
Processing 4122 of 8762 batches (4 sequences)
Processing 4123 of 8762 batches (4 sequences)
Processing 4124 of 8762 batches (4 sequences)
Processing 4125 of 8762 batches (4 sequences)
Processing 4126 of 8762 batches (4 sequences)
Processing 4127 of 8762 batches (4 sequences)
Processing 4128 of 8762 batches (4

Processing 4286 of 8762 batches (4 sequences)
Processing 4287 of 8762 batches (4 sequences)
Processing 4288 of 8762 batches (4 sequences)
Processing 4289 of 8762 batches (4 sequences)
Processing 4290 of 8762 batches (4 sequences)
Processing 4291 of 8762 batches (4 sequences)
Processing 4292 of 8762 batches (4 sequences)
Processing 4293 of 8762 batches (4 sequences)
Processing 4294 of 8762 batches (4 sequences)
Processing 4295 of 8762 batches (4 sequences)
Processing 4296 of 8762 batches (4 sequences)
Processing 4297 of 8762 batches (4 sequences)
Processing 4298 of 8762 batches (4 sequences)
Processing 4299 of 8762 batches (4 sequences)
Processing 4300 of 8762 batches (4 sequences)
Processing 4301 of 8762 batches (4 sequences)
Processing 4302 of 8762 batches (4 sequences)
Processing 4303 of 8762 batches (4 sequences)
Processing 4304 of 8762 batches (4 sequences)
Processing 4305 of 8762 batches (4 sequences)
Processing 4306 of 8762 batches (4 sequences)
Processing 4307 of 8762 batches (4

Processing 4465 of 8762 batches (4 sequences)
Processing 4466 of 8762 batches (4 sequences)
Processing 4467 of 8762 batches (4 sequences)
Processing 4468 of 8762 batches (4 sequences)
Processing 4469 of 8762 batches (4 sequences)
Processing 4470 of 8762 batches (4 sequences)
Processing 4471 of 8762 batches (4 sequences)
Processing 4472 of 8762 batches (4 sequences)
Processing 4473 of 8762 batches (4 sequences)
Processing 4474 of 8762 batches (4 sequences)
Processing 4475 of 8762 batches (4 sequences)
Processing 4476 of 8762 batches (4 sequences)
Processing 4477 of 8762 batches (4 sequences)
Processing 4478 of 8762 batches (4 sequences)
Processing 4479 of 8762 batches (4 sequences)
Processing 4480 of 8762 batches (4 sequences)
Processing 4481 of 8762 batches (4 sequences)
Processing 4482 of 8762 batches (4 sequences)
Processing 4483 of 8762 batches (4 sequences)
Processing 4484 of 8762 batches (4 sequences)
Processing 4485 of 8762 batches (4 sequences)
Processing 4486 of 8762 batches (4

Processing 4644 of 8762 batches (4 sequences)
Processing 4645 of 8762 batches (4 sequences)
Processing 4646 of 8762 batches (4 sequences)
Processing 4647 of 8762 batches (4 sequences)
Processing 4648 of 8762 batches (4 sequences)
Processing 4649 of 8762 batches (4 sequences)
Processing 4650 of 8762 batches (4 sequences)
Processing 4651 of 8762 batches (4 sequences)
Processing 4652 of 8762 batches (4 sequences)
Processing 4653 of 8762 batches (4 sequences)
Processing 4654 of 8762 batches (4 sequences)
Processing 4655 of 8762 batches (4 sequences)
Processing 4656 of 8762 batches (4 sequences)
Processing 4657 of 8762 batches (4 sequences)
Processing 4658 of 8762 batches (4 sequences)
Processing 4659 of 8762 batches (4 sequences)
Processing 4660 of 8762 batches (4 sequences)
Processing 4661 of 8762 batches (4 sequences)
Processing 4662 of 8762 batches (4 sequences)
Processing 4663 of 8762 batches (4 sequences)
Processing 4664 of 8762 batches (4 sequences)
Processing 4665 of 8762 batches (4

Processing 4823 of 8762 batches (4 sequences)
Processing 4824 of 8762 batches (4 sequences)
Processing 4825 of 8762 batches (4 sequences)
Processing 4826 of 8762 batches (4 sequences)
Processing 4827 of 8762 batches (4 sequences)
Processing 4828 of 8762 batches (4 sequences)
Processing 4829 of 8762 batches (4 sequences)
Processing 4830 of 8762 batches (4 sequences)
Processing 4831 of 8762 batches (4 sequences)
Processing 4832 of 8762 batches (4 sequences)
Processing 4833 of 8762 batches (4 sequences)
Processing 4834 of 8762 batches (4 sequences)
Processing 4835 of 8762 batches (4 sequences)
Processing 4836 of 8762 batches (4 sequences)
Processing 4837 of 8762 batches (4 sequences)
Processing 4838 of 8762 batches (4 sequences)
Processing 4839 of 8762 batches (4 sequences)
Processing 4840 of 8762 batches (4 sequences)
Processing 4841 of 8762 batches (4 sequences)
Processing 4842 of 8762 batches (4 sequences)
Processing 4843 of 8762 batches (4 sequences)
Processing 4844 of 8762 batches (4

Processing 5002 of 8762 batches (4 sequences)
Processing 5003 of 8762 batches (4 sequences)
Processing 5004 of 8762 batches (4 sequences)
Processing 5005 of 8762 batches (4 sequences)
Processing 5006 of 8762 batches (4 sequences)
Processing 5007 of 8762 batches (4 sequences)
Processing 5008 of 8762 batches (4 sequences)
Processing 5009 of 8762 batches (4 sequences)
Processing 5010 of 8762 batches (4 sequences)
Processing 5011 of 8762 batches (4 sequences)
Processing 5012 of 8762 batches (4 sequences)
Processing 5013 of 8762 batches (4 sequences)
Processing 5014 of 8762 batches (4 sequences)
Processing 5015 of 8762 batches (4 sequences)
Processing 5016 of 8762 batches (4 sequences)
Processing 5017 of 8762 batches (4 sequences)
Processing 5018 of 8762 batches (4 sequences)
Processing 5019 of 8762 batches (4 sequences)
Processing 5020 of 8762 batches (4 sequences)
Processing 5021 of 8762 batches (4 sequences)
Processing 5022 of 8762 batches (4 sequences)
Processing 5023 of 8762 batches (4

Processing 5181 of 8762 batches (3 sequences)
Processing 5182 of 8762 batches (3 sequences)
Processing 5183 of 8762 batches (3 sequences)
Processing 5184 of 8762 batches (3 sequences)
Processing 5185 of 8762 batches (3 sequences)
Processing 5186 of 8762 batches (3 sequences)
Processing 5187 of 8762 batches (3 sequences)
Processing 5188 of 8762 batches (3 sequences)
Processing 5189 of 8762 batches (3 sequences)
Processing 5190 of 8762 batches (3 sequences)
Processing 5191 of 8762 batches (3 sequences)
Processing 5192 of 8762 batches (3 sequences)
Processing 5193 of 8762 batches (3 sequences)
Processing 5194 of 8762 batches (3 sequences)
Processing 5195 of 8762 batches (3 sequences)
Processing 5196 of 8762 batches (3 sequences)
Processing 5197 of 8762 batches (3 sequences)
Processing 5198 of 8762 batches (3 sequences)
Processing 5199 of 8762 batches (3 sequences)
Processing 5200 of 8762 batches (3 sequences)
Processing 5201 of 8762 batches (3 sequences)
Processing 5202 of 8762 batches (3 sequences)

Processing 5360 of 8762 batches (3 sequences)
Processing 5361 of 8762 batches (3 sequences)
Processing 5362 of 8762 batches (3 sequences)
Processing 5363 of 8762 batches (3 sequences)
Processing 5364 of 8762 batches (3 sequences)
Processing 5365 of 8762 batches (3 sequences)
Processing 5366 of 8762 batches (3 sequences)
Processing 5367 of 8762 batches (3 sequences)
Processing 5368 of 8762 batches (3 sequences)
Processing 5369 of 8762 batches (3 sequences)
Processing 5370 of 8762 batches (3 sequences)
Processing 5371 of 8762 batches (3 sequences)
Processing 5372 of 8762 batches (3 sequences)
Processing 5373 of 8762 batches (3 sequences)
Processing 5374 of 8762 batches (3 sequences)
Processing 5375 of 8762 batches (3 sequences)
Processing 5376 of 8762 batches (3 sequences)
Processing 5377 of 8762 batches (3 sequences)
Processing 5378 of 8762 batches (3 sequences)
Processing 5379 of 8762 batches (3 sequences)
Processing 5380 of 8762 batches (3 sequences)
Processing 5381 of 8762 batches (3 sequences)

Processing 5539 of 8762 batches (3 sequences)
Processing 5540 of 8762 batches (3 sequences)
Processing 5541 of 8762 batches (3 sequences)
Processing 5542 of 8762 batches (3 sequences)
Processing 5543 of 8762 batches (3 sequences)
Processing 5544 of 8762 batches (3 sequences)
Processing 5545 of 8762 batches (3 sequences)
Processing 5546 of 8762 batches (3 sequences)
Processing 5547 of 8762 batches (3 sequences)
Processing 5548 of 8762 batches (3 sequences)
Processing 5549 of 8762 batches (3 sequences)
Processing 5550 of 8762 batches (3 sequences)
Processing 5551 of 8762 batches (3 sequences)
Processing 5552 of 8762 batches (3 sequences)
Processing 5553 of 8762 batches (3 sequences)
Processing 5554 of 8762 batches (3 sequences)
Processing 5555 of 8762 batches (3 sequences)
Processing 5556 of 8762 batches (3 sequences)
Processing 5557 of 8762 batches (3 sequences)
Processing 5558 of 8762 batches (3 sequences)
Processing 5559 of 8762 batches (3 sequences)
Processing 5560 of 8762 batches (3 sequences)

Processing 5718 of 8762 batches (3 sequences)
Processing 5719 of 8762 batches (3 sequences)
Processing 5720 of 8762 batches (3 sequences)
Processing 5721 of 8762 batches (3 sequences)
Processing 5722 of 8762 batches (3 sequences)
Processing 5723 of 8762 batches (3 sequences)
Processing 5724 of 8762 batches (3 sequences)
Processing 5725 of 8762 batches (3 sequences)
Processing 5726 of 8762 batches (3 sequences)
Processing 5727 of 8762 batches (3 sequences)
Processing 5728 of 8762 batches (3 sequences)
Processing 5729 of 8762 batches (3 sequences)
Processing 5730 of 8762 batches (3 sequences)
Processing 5731 of 8762 batches (3 sequences)
Processing 5732 of 8762 batches (3 sequences)
Processing 5733 of 8762 batches (3 sequences)
Processing 5734 of 8762 batches (3 sequences)
Processing 5735 of 8762 batches (3 sequences)
Processing 5736 of 8762 batches (3 sequences)
Processing 5737 of 8762 batches (3 sequences)
Processing 5738 of 8762 batches (3 sequences)
Processing 5739 of 8762 batches (3 sequences)

Processing 5897 of 8762 batches (3 sequences)
Processing 5898 of 8762 batches (3 sequences)
Processing 5899 of 8762 batches (3 sequences)
Processing 5900 of 8762 batches (3 sequences)
Processing 5901 of 8762 batches (3 sequences)
Processing 5902 of 8762 batches (3 sequences)
Processing 5903 of 8762 batches (3 sequences)
Processing 5904 of 8762 batches (3 sequences)
Processing 5905 of 8762 batches (3 sequences)
Processing 5906 of 8762 batches (3 sequences)
Processing 5907 of 8762 batches (3 sequences)
Processing 5908 of 8762 batches (3 sequences)
Processing 5909 of 8762 batches (3 sequences)
Processing 5910 of 8762 batches (3 sequences)
Processing 5911 of 8762 batches (3 sequences)
Processing 5912 of 8762 batches (3 sequences)
Processing 5913 of 8762 batches (3 sequences)
Processing 5914 of 8762 batches (3 sequences)
Processing 5915 of 8762 batches (3 sequences)
Processing 5916 of 8762 batches (3 sequences)
Processing 5917 of 8762 batches (3 sequences)
Processing 5918 of 8762 batches (3 sequences)

Processing 6076 of 8762 batches (3 sequences)
Processing 6077 of 8762 batches (3 sequences)
Processing 6078 of 8762 batches (3 sequences)
Processing 6079 of 8762 batches (3 sequences)
Processing 6080 of 8762 batches (3 sequences)
Processing 6081 of 8762 batches (3 sequences)
Processing 6082 of 8762 batches (3 sequences)
Processing 6083 of 8762 batches (3 sequences)
Processing 6084 of 8762 batches (3 sequences)
Processing 6085 of 8762 batches (3 sequences)
Processing 6086 of 8762 batches (3 sequences)
Processing 6087 of 8762 batches (3 sequences)
Processing 6088 of 8762 batches (3 sequences)
Processing 6089 of 8762 batches (3 sequences)
Processing 6090 of 8762 batches (3 sequences)
Processing 6091 of 8762 batches (3 sequences)
Processing 6092 of 8762 batches (3 sequences)
Processing 6093 of 8762 batches (3 sequences)
Processing 6094 of 8762 batches (3 sequences)
Processing 6095 of 8762 batches (3 sequences)
Processing 6096 of 8762 batches (3 sequences)
Processing 6097 of 8762 batches (3 sequences)

Processing 6255 of 8762 batches (2 sequences)
Processing 6256 of 8762 batches (2 sequences)
Processing 6257 of 8762 batches (2 sequences)
Processing 6258 of 8762 batches (2 sequences)
Processing 6259 of 8762 batches (2 sequences)
Processing 6260 of 8762 batches (2 sequences)
Processing 6261 of 8762 batches (2 sequences)
Processing 6262 of 8762 batches (2 sequences)
Processing 6263 of 8762 batches (2 sequences)
Processing 6264 of 8762 batches (2 sequences)
Processing 6265 of 8762 batches (2 sequences)
Processing 6266 of 8762 batches (2 sequences)
Processing 6267 of 8762 batches (2 sequences)
Processing 6268 of 8762 batches (2 sequences)
Processing 6269 of 8762 batches (2 sequences)
Processing 6270 of 8762 batches (2 sequences)
Processing 6271 of 8762 batches (2 sequences)
Processing 6272 of 8762 batches (2 sequences)
Processing 6273 of 8762 batches (2 sequences)
Processing 6274 of 8762 batches (2 sequences)
Processing 6275 of 8762 batches (2 sequences)
Processing 6276 of 8762 batches (2 sequences)

Processing 6434 of 8762 batches (2 sequences)
Processing 6435 of 8762 batches (2 sequences)
Processing 6436 of 8762 batches (2 sequences)
Processing 6437 of 8762 batches (2 sequences)
Processing 6438 of 8762 batches (2 sequences)
Processing 6439 of 8762 batches (2 sequences)
Processing 6440 of 8762 batches (2 sequences)
Processing 6441 of 8762 batches (2 sequences)
Processing 6442 of 8762 batches (2 sequences)
Processing 6443 of 8762 batches (2 sequences)
Processing 6444 of 8762 batches (2 sequences)
Processing 6445 of 8762 batches (2 sequences)
Processing 6446 of 8762 batches (2 sequences)
Processing 6447 of 8762 batches (2 sequences)
Processing 6448 of 8762 batches (2 sequences)
Processing 6449 of 8762 batches (2 sequences)
Processing 6450 of 8762 batches (2 sequences)
Processing 6451 of 8762 batches (2 sequences)
Processing 6452 of 8762 batches (2 sequences)
Processing 6453 of 8762 batches (2 sequences)
Processing 6454 of 8762 batches (2 sequences)
Processing 6455 of 8762 batches (2 sequences)

Processing 6613 of 8762 batches (2 sequences)
Processing 6614 of 8762 batches (2 sequences)
Processing 6615 of 8762 batches (2 sequences)
Processing 6616 of 8762 batches (2 sequences)
Processing 6617 of 8762 batches (2 sequences)
Processing 6618 of 8762 batches (2 sequences)
Processing 6619 of 8762 batches (2 sequences)
Processing 6620 of 8762 batches (2 sequences)
Processing 6621 of 8762 batches (2 sequences)
Processing 6622 of 8762 batches (2 sequences)
Processing 6623 of 8762 batches (2 sequences)
Processing 6624 of 8762 batches (2 sequences)
Processing 6625 of 8762 batches (2 sequences)
Processing 6626 of 8762 batches (2 sequences)
Processing 6627 of 8762 batches (2 sequences)
Processing 6628 of 8762 batches (2 sequences)
Processing 6629 of 8762 batches (2 sequences)
Processing 6630 of 8762 batches (2 sequences)
Processing 6631 of 8762 batches (2 sequences)
Processing 6632 of 8762 batches (2 sequences)
Processing 6633 of 8762 batches (2 sequences)
Processing 6634 of 8762 batches (2 sequences)

Processing 6792 of 8762 batches (2 sequences)
Processing 6793 of 8762 batches (2 sequences)
Processing 6794 of 8762 batches (2 sequences)
Processing 6795 of 8762 batches (2 sequences)
Processing 6796 of 8762 batches (2 sequences)
Processing 6797 of 8762 batches (2 sequences)
Processing 6798 of 8762 batches (2 sequences)
Processing 6799 of 8762 batches (2 sequences)
Processing 6800 of 8762 batches (2 sequences)
Processing 6801 of 8762 batches (2 sequences)
Processing 6802 of 8762 batches (2 sequences)
Processing 6803 of 8762 batches (2 sequences)
Processing 6804 of 8762 batches (2 sequences)
Processing 6805 of 8762 batches (2 sequences)
Processing 6806 of 8762 batches (2 sequences)
Processing 6807 of 8762 batches (2 sequences)
Processing 6808 of 8762 batches (2 sequences)
Processing 6809 of 8762 batches (2 sequences)
Processing 6810 of 8762 batches (2 sequences)
Processing 6811 of 8762 batches (2 sequences)
Processing 6812 of 8762 batches (2 sequences)
Processing 6813 of 8762 batches (2 sequences)

Processing 6971 of 8762 batches (2 sequences)
Processing 6972 of 8762 batches (2 sequences)
Processing 6973 of 8762 batches (2 sequences)
Processing 6974 of 8762 batches (2 sequences)
Processing 6975 of 8762 batches (2 sequences)
Processing 6976 of 8762 batches (2 sequences)
Processing 6977 of 8762 batches (2 sequences)
Processing 6978 of 8762 batches (2 sequences)
Processing 6979 of 8762 batches (2 sequences)
Processing 6980 of 8762 batches (2 sequences)
Processing 6981 of 8762 batches (2 sequences)
Processing 6982 of 8762 batches (2 sequences)
Processing 6983 of 8762 batches (2 sequences)
Processing 6984 of 8762 batches (2 sequences)
Processing 6985 of 8762 batches (2 sequences)
Processing 6986 of 8762 batches (2 sequences)
Processing 6987 of 8762 batches (2 sequences)
Processing 6988 of 8762 batches (2 sequences)
Processing 6989 of 8762 batches (2 sequences)
Processing 6990 of 8762 batches (2 sequences)
Processing 6991 of 8762 batches (2 sequences)
Processing 6992 of 8762 batches (2 sequences)

Processing 7150 of 8762 batches (2 sequences)
Processing 7151 of 8762 batches (2 sequences)
Processing 7152 of 8762 batches (2 sequences)
Processing 7153 of 8762 batches (2 sequences)
Processing 7154 of 8762 batches (2 sequences)
Processing 7155 of 8762 batches (2 sequences)
Processing 7156 of 8762 batches (2 sequences)
Processing 7157 of 8762 batches (2 sequences)
Processing 7158 of 8762 batches (2 sequences)
Processing 7159 of 8762 batches (2 sequences)
Processing 7160 of 8762 batches (2 sequences)
Processing 7161 of 8762 batches (2 sequences)
Processing 7162 of 8762 batches (2 sequences)
Processing 7163 of 8762 batches (2 sequences)
Processing 7164 of 8762 batches (2 sequences)
Processing 7165 of 8762 batches (2 sequences)
Processing 7166 of 8762 batches (2 sequences)
Processing 7167 of 8762 batches (2 sequences)
Processing 7168 of 8762 batches (2 sequences)
Processing 7169 of 8762 batches (2 sequences)
Processing 7170 of 8762 batches (2 sequences)
Processing 7171 of 8762 batches (2 sequences)

Processing 7329 of 8762 batches (2 sequences)
Processing 7330 of 8762 batches (2 sequences)
Processing 7331 of 8762 batches (2 sequences)
Processing 7332 of 8762 batches (2 sequences)
Processing 7333 of 8762 batches (2 sequences)
Processing 7334 of 8762 batches (2 sequences)
Processing 7335 of 8762 batches (2 sequences)
Processing 7336 of 8762 batches (2 sequences)
Processing 7337 of 8762 batches (2 sequences)
Processing 7338 of 8762 batches (2 sequences)
Processing 7339 of 8762 batches (2 sequences)
Processing 7340 of 8762 batches (2 sequences)
Processing 7341 of 8762 batches (2 sequences)
Processing 7342 of 8762 batches (2 sequences)
Processing 7343 of 8762 batches (2 sequences)
Processing 7344 of 8762 batches (2 sequences)
Processing 7345 of 8762 batches (2 sequences)
Processing 7346 of 8762 batches (2 sequences)
Processing 7347 of 8762 batches (2 sequences)
Processing 7348 of 8762 batches (2 sequences)
Processing 7349 of 8762 batches (2 sequences)
Processing 7350 of 8762 batches (2 sequences)

Processing 7508 of 8762 batches (1 sequences)
Processing 7509 of 8762 batches (1 sequences)
Processing 7510 of 8762 batches (1 sequences)
Processing 7511 of 8762 batches (1 sequences)
Processing 7512 of 8762 batches (1 sequences)
Processing 7513 of 8762 batches (1 sequences)
Processing 7514 of 8762 batches (1 sequences)
Processing 7515 of 8762 batches (1 sequences)
Processing 7516 of 8762 batches (1 sequences)
Processing 7517 of 8762 batches (1 sequences)
Processing 7518 of 8762 batches (1 sequences)
Processing 7519 of 8762 batches (1 sequences)
Processing 7520 of 8762 batches (1 sequences)
Processing 7521 of 8762 batches (1 sequences)
Processing 7522 of 8762 batches (1 sequences)
Processing 7523 of 8762 batches (1 sequences)
Processing 7524 of 8762 batches (1 sequences)
Processing 7525 of 8762 batches (1 sequences)
Processing 7526 of 8762 batches (1 sequences)
Processing 7527 of 8762 batches (1 sequences)
Processing 7528 of 8762 batches (1 sequences)
Processing 7529 of 8762 batches (1 sequences)

Processing 7687 of 8762 batches (1 sequences)
Processing 7688 of 8762 batches (1 sequences)
Processing 7689 of 8762 batches (1 sequences)
Processing 7690 of 8762 batches (1 sequences)
Processing 7691 of 8762 batches (1 sequences)
Processing 7692 of 8762 batches (1 sequences)
Processing 7693 of 8762 batches (1 sequences)
Processing 7694 of 8762 batches (1 sequences)
Processing 7695 of 8762 batches (1 sequences)
Processing 7696 of 8762 batches (1 sequences)
Processing 7697 of 8762 batches (1 sequences)
Processing 7698 of 8762 batches (1 sequences)
Processing 7699 of 8762 batches (1 sequences)
Processing 7700 of 8762 batches (1 sequences)
Processing 7701 of 8762 batches (1 sequences)
Processing 7702 of 8762 batches (1 sequences)
Processing 7703 of 8762 batches (1 sequences)
Processing 7704 of 8762 batches (1 sequences)
Processing 7705 of 8762 batches (1 sequences)
Processing 7706 of 8762 batches (1 sequences)
Processing 7707 of 8762 batches (1 sequences)
Processing 7708 of 8762 batches (1 sequences)

Processing 7866 of 8762 batches (1 sequences)
Processing 7867 of 8762 batches (1 sequences)
Processing 7868 of 8762 batches (1 sequences)
Processing 7869 of 8762 batches (1 sequences)
Processing 7870 of 8762 batches (1 sequences)
Processing 7871 of 8762 batches (1 sequences)
Processing 7872 of 8762 batches (1 sequences)
Processing 7873 of 8762 batches (1 sequences)
Processing 7874 of 8762 batches (1 sequences)
Processing 7875 of 8762 batches (1 sequences)
Processing 7876 of 8762 batches (1 sequences)
Processing 7877 of 8762 batches (1 sequences)
Processing 7878 of 8762 batches (1 sequences)
Processing 7879 of 8762 batches (1 sequences)
Processing 7880 of 8762 batches (1 sequences)
Processing 7881 of 8762 batches (1 sequences)
Processing 7882 of 8762 batches (1 sequences)
Processing 7883 of 8762 batches (1 sequences)
Processing 7884 of 8762 batches (1 sequences)
Processing 7885 of 8762 batches (1 sequences)
Processing 7886 of 8762 batches (1 sequences)
Processing 7887 of 8762 batches (1 sequences)

Processing 8045 of 8762 batches (1 sequences)
Processing 8046 of 8762 batches (1 sequences)
Processing 8047 of 8762 batches (1 sequences)
Processing 8048 of 8762 batches (1 sequences)
Processing 8049 of 8762 batches (1 sequences)
Processing 8050 of 8762 batches (1 sequences)
Processing 8051 of 8762 batches (1 sequences)
Processing 8052 of 8762 batches (1 sequences)
Processing 8053 of 8762 batches (1 sequences)
Processing 8054 of 8762 batches (1 sequences)
Processing 8055 of 8762 batches (1 sequences)
Processing 8056 of 8762 batches (1 sequences)
Processing 8057 of 8762 batches (1 sequences)
Processing 8058 of 8762 batches (1 sequences)
Processing 8059 of 8762 batches (1 sequences)
Processing 8060 of 8762 batches (1 sequences)
Processing 8061 of 8762 batches (1 sequences)
Processing 8062 of 8762 batches (1 sequences)
Processing 8063 of 8762 batches (1 sequences)
Processing 8064 of 8762 batches (1 sequences)
Processing 8065 of 8762 batches (1 sequences)
Processing 8066 of 8762 batches (1 sequences)

Processing 8224 of 8762 batches (1 sequences)
Processing 8225 of 8762 batches (1 sequences)
Processing 8226 of 8762 batches (1 sequences)
Processing 8227 of 8762 batches (1 sequences)
Processing 8228 of 8762 batches (1 sequences)
Processing 8229 of 8762 batches (1 sequences)
Processing 8230 of 8762 batches (1 sequences)
Processing 8231 of 8762 batches (1 sequences)
Processing 8232 of 8762 batches (1 sequences)
Processing 8233 of 8762 batches (1 sequences)
Processing 8234 of 8762 batches (1 sequences)
Processing 8235 of 8762 batches (1 sequences)
Processing 8236 of 8762 batches (1 sequences)
Processing 8237 of 8762 batches (1 sequences)
Processing 8238 of 8762 batches (1 sequences)
Processing 8239 of 8762 batches (1 sequences)
Processing 8240 of 8762 batches (1 sequences)
Processing 8241 of 8762 batches (1 sequences)
Processing 8242 of 8762 batches (1 sequences)
Processing 8243 of 8762 batches (1 sequences)
Processing 8244 of 8762 batches (1 sequences)
Processing 8245 of 8762 batches (1 sequences)

Processing 8403 of 8762 batches (1 sequences)
Processing 8404 of 8762 batches (1 sequences)
Processing 8405 of 8762 batches (1 sequences)
Processing 8406 of 8762 batches (1 sequences)
Processing 8407 of 8762 batches (1 sequences)
Processing 8408 of 8762 batches (1 sequences)
Processing 8409 of 8762 batches (1 sequences)
Processing 8410 of 8762 batches (1 sequences)
Processing 8411 of 8762 batches (1 sequences)
Processing 8412 of 8762 batches (1 sequences)
Processing 8413 of 8762 batches (1 sequences)
Processing 8414 of 8762 batches (1 sequences)
Processing 8415 of 8762 batches (1 sequences)
Processing 8416 of 8762 batches (1 sequences)
Processing 8417 of 8762 batches (1 sequences)
Processing 8418 of 8762 batches (1 sequences)
Processing 8419 of 8762 batches (1 sequences)
Processing 8420 of 8762 batches (1 sequences)
Processing 8421 of 8762 batches (1 sequences)
Processing 8422 of 8762 batches (1 sequences)
Processing 8423 of 8762 batches (1 sequences)
Processing 8424 of 8762 batches (1 sequences)

Processing 8582 of 8762 batches (1 sequences)
Processing 8583 of 8762 batches (1 sequences)
Processing 8584 of 8762 batches (1 sequences)
Processing 8585 of 8762 batches (1 sequences)
Processing 8586 of 8762 batches (1 sequences)
Processing 8587 of 8762 batches (1 sequences)
Processing 8588 of 8762 batches (1 sequences)
Processing 8589 of 8762 batches (1 sequences)
Processing 8590 of 8762 batches (1 sequences)
Processing 8591 of 8762 batches (1 sequences)
Processing 8592 of 8762 batches (1 sequences)
Processing 8593 of 8762 batches (1 sequences)
Processing 8594 of 8762 batches (1 sequences)
Processing 8595 of 8762 batches (1 sequences)
Processing 8596 of 8762 batches (1 sequences)
Processing 8597 of 8762 batches (1 sequences)
Processing 8598 of 8762 batches (1 sequences)
Processing 8599 of 8762 batches (1 sequences)
Processing 8600 of 8762 batches (1 sequences)
Processing 8601 of 8762 batches (1 sequences)
Processing 8602 of 8762 batches (1 sequences)
Processing 8603 of 8762 batches (1 sequences)

Processing 8761 of 8762 batches (1 sequences)
Processing 8762 of 8762 batches (1 sequences)
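
In the log above, the number of sequences per batch falls from 4 to 1 as the run progresses. A likely explanation is that the extraction script sorts sequences by length and packs each batch up to a fixed token budget, so longer proteins leave room for fewer sequences per batch. The sketch below illustrates that idea only; the function name `batch_by_token_budget` and its parameters are hypothetical and are not taken from the ESM code.

```python
def batch_by_token_budget(lengths, toks_per_batch, extra_toks_per_seq=0):
    """Group sequence indices into batches under a padded-token budget.

    Hypothetical helper for illustration. Sequences are sorted by length,
    and a batch is closed as soon as adding the next sequence would make
    (longest sequence in batch) * (number of sequences) exceed the budget.
    """
    # Sort by length, keeping the original index of each sequence.
    order = sorted((length, idx) for idx, length in enumerate(lengths))

    batches = []
    current = []
    max_len = 0
    for length, idx in order:
        size = length + extra_toks_per_seq
        # Padded cost of the batch if this sequence were added.
        if current and max(size, max_len) * (len(current) + 1) > toks_per_batch:
            batches.append(current)
            current = []
            max_len = 0
        max_len = max(max_len, size)
        current.append(idx)

    if current:
        batches.append(current)
    return batches


# Toy example: short sequences share a batch, the longest one is alone.
toy_batches = batch_by_token_budget([10, 10, 10, 30, 30, 50], toks_per_batch=60)
print([len(batch) for batch in toy_batches])  # [3, 2, 1]
```

Under this scheme the last batches contain the longest proteins, which matches the single-sequence batches at the end of the log.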


## Step 4: Convert to Embeddings File

In [51]:
!python map_gene_symbol_to_protein_ids.py \
    --fasta_path ./data/{NAME}.fa \
    --save_path ./data/{NAME}.gene_symbol_to_protein_ID.json


!python convert_protein_embeddings_to_gene_embeddings.py \
    --embedding_dir ./data/{NAME}.clean.fa_esm1b \
    --gene_symbol_to_protein_ids_path ./data/{NAME}.gene_symbol_to_protein_ID.json \
    --embedding_model ESM1b \
    --save_path ./data/{NAME}.gene_symbol_to_embedding_ESM1b.pt


100%|█████████████████████████████████| 49792/49792 [00:00<00:00, 253804.51it/s]
Number of gene symbols = 14,718
Number of protein IDs = 39,310
100%|███████████████████████████████████| 39310/39310 [00:21<00:00, 1834.55it/s]
data/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.clean.fa_esm1b/ENSXETP00000100112.2.pt
100%|██████████████████████████████████| 14718/14718 [00:00<00:00, 36799.51it/s]
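
The output above reports 39,310 protein IDs collapsing to 14,718 gene symbols, so several protein embeddings are combined into each gene embedding. As a rough sketch of what such a conversion can look like, the code below reads the `gene_symbol:` field from Ensembl-style FASTA headers and averages the per-protein vectors for each gene. The function names, the toy headers, and the choice of a plain mean are assumptions made for illustration; the two scripts may differ in detail.

```python
import re
from collections import defaultdict

# Ensembl peptide FASTA headers carry key:value fields such as
# "gene_symbol:<name>"; records without that field are skipped here.
GENE_SYMBOL_RE = re.compile(r"gene_symbol:(\S+)")


def map_gene_symbols_to_protein_ids(fasta_lines):
    """Return {gene_symbol: [protein_id, ...]} from FASTA header lines."""
    mapping = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        protein_id = line[1:].split()[0]
        match = GENE_SYMBOL_RE.search(line)
        if match:
            mapping[match.group(1)].append(protein_id)
    return dict(mapping)


def average_protein_embeddings(gene_to_proteins, protein_to_embedding):
    """Average the embeddings of all proteins that share a gene symbol."""
    gene_to_embedding = {}
    for gene, protein_ids in gene_to_proteins.items():
        vectors = [protein_to_embedding[p] for p in protein_ids
                   if p in protein_to_embedding]
        if not vectors:
            continue  # no embedding was produced for this gene's proteins
        n = len(vectors)
        # Element-wise mean across the protein vectors.
        gene_to_embedding[gene] = [sum(column) / n for column in zip(*vectors)]
    return gene_to_embedding


# Toy data (made up): two proteins for gene "abc", one record without a symbol.
toy_fasta = [
    ">P1.1 pep gene:G1.1 gene_symbol:abc description:example",
    "MKV",
    ">P2.1 pep gene:G1.1 gene_symbol:abc",
    "MKL",
    ">P3.1 pep gene:G2.1",
    "MAA",
]
toy_embeddings = {"P1.1": [1.0, 3.0], "P2.1": [3.0, 5.0], "P3.1": [9.0, 9.0]}

toy_mapping = map_gene_symbols_to_protein_ids(toy_fasta)
toy_gene_embeddings = average_protein_embeddings(toy_mapping, toy_embeddings)
print(toy_gene_embeddings)  # {'abc': [2.0, 4.0]}
```

Plain Python lists stand in for tensors so the sketch runs without extra dependencies.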


## Step 5: Running SPEAR

In [50]:
# Your final embeddings will be located at: 
os.path.abspath(f"./data/{NAME}.gene_symbol_to_embedding_ESM1b.pt")

'/dfs/project/cross-species/yanay/code/SPEAR/protein_embeddings/data/Xenopus_tropicalis.UCB_Xtro_10.0.pep.all.gene_symbol_to_embedding_ESM1b.pt'