<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Read-in-USPTO-from-ORD" data-toc-modified-id="Read-in-USPTO-from-ORD-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Read in USPTO from ORD</a></span><ul class="toc-item"><li><span><a href="#Preface" data-toc-modified-id="Preface-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Preface</a></span></li><li><span><a href="#Extract-USPTO-data-from-ORD" data-toc-modified-id="Extract-USPTO-data-from-ORD-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Extract USPTO data from ORD</a></span></li><li><span><a href="#Tests:-Figure-out-how-to-access-the-info-I-need-in-the-dataset-file" data-toc-modified-id="Tests:-Figure-out-how-to-access-the-info-I-need-in-the-dataset-file-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Tests: Figure out how to access the info I need in the dataset file</a></span></li></ul></li><li><span><a href="#Preprocessing-of-USPTO---Molecular-AI" data-toc-modified-id="Preprocessing-of-USPTO---Molecular-AI-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Preprocessing of USPTO - Molecular AI</a></span><ul class="toc-item"><li><span><a href="#Read-in-data-cleaned-by-rxn-utils" data-toc-modified-id="Read-in-data-cleaned-by-rxn-utils-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Read in data cleaned by rxn utils</a></span></li></ul></li></ul></div>

# Read in USPTO from ORD

## Preface

In [None]:
# I tried to read the USPTO data from the ORD schema, e.g. the data available at this link:
# url = "https://github.com/Open-Reaction-Database/ord-data/blob/main/data/02/ord_dataset-026684a62f91469db49c7767d16c39fb.pb.gz?raw=true"
# However, ORD reads in literally EVERYTHING from USPTO, so this produced a DataFrame of around 90k x 120k,
# which neither Joe's computer nor my laptop has the memory to handle.

# There may be 90k columns, but many of them probably hold superfluous info, e.g. a type column (= SMILES),
# email columns, etc.
# One possible solution would therefore be to pre-filter the columns (drop all the unnecessary ones)
# and only then load the data.

# I could use the code below to do this.
# However, it's unnecessary, as Joe is parsing the original USPTO XML files!


In [None]:
# # import ord_schema
# # import pandas as pd
# # from ord_schema import message_helpers, validations
# # from ord_schema.proto import dataset_pb2

# # import wget

# # # url = "https://github.com/Open-Reaction-Database/ord-data/blob/main/data/02/ord_dataset-026684a62f91469db49c7767d16c39fb.pb.gz?raw=true"
# # url = "https://github.com/open-reaction-database/ord-data/blob/main/data/68/ord_dataset-68cb8b4b2b384e3d85b5b1efae58b203.pb.gz?raw=true"
# # pb = wget.download(url)

# # # Load Dataset message
# # data = message_helpers.load_message(pb, dataset_pb2.Dataset)

# # Flatten each reaction into a dict; print the keys of the first one only, then stop
# rows = []
# for d in data.reactions:
#     row = message_helpers.message_to_row(d)
#     rows.append(row)
#     for k in row:
#         print(k)
#     break
# df = pd.DataFrame(rows)
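A minimal sketch of the pre-filtering idea from the preface: drop unwanted keys from each flattened row dict before it is appended, so the DataFrame is never built with those columns. The full row from `message_to_row` still exists briefly, but only one at a time. The helper name `prefilter_row` and the substrings in `DROP_SUBSTRINGS` are hypothetical; inspect the keys printed by the loop above and adapt them.

```python
# Hypothetical substrings: any flattened key containing one of these is dropped.
# Inspect the keys printed by message_to_row() and adapt this tuple.
DROP_SUBSTRINGS = ("email", "provenance")


def prefilter_row(row, drop_substrings=DROP_SUBSTRINGS):
    """Return a new dict without the keys that contain any of `drop_substrings`."""
    return {
        key: value
        for key, value in row.items()
        if not any(sub in key for sub in drop_substrings)
    }


# Usage inside the loop above (requires ord_schema and pandas):
# rows = [prefilter_row(message_helpers.message_to_row(d)) for d in data.reactions]
# df = pd.DataFrame(rows)
```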

In [None]:
# Or following the example here
# https://github.com/open-reaction-database/ord-schema/blob/main/examples/applications/Perera_Science_Granda_Nature_Suzuki/Granda_Perera_ml_example.ipynb

# # Download dataset from ord-data
# url = "https://github.com/open-reaction-database/ord-data/blob/main/data/68/ord_dataset-68cb8b4b2b384e3d85b5b1efae58b203.pb.gz?raw=true"
# pb = wget.download(url)

# # Load Dataset message
# data = message_helpers.load_message(pb, dataset_pb2.Dataset)

# # Ensure dataset validates
# valid_output = validations.validate_message(data)

# # Convert dataset to pandas dataframe
# df = message_helpers.messages_to_dataframe(data.reactions, drop_constant_columns=True)

# # View dataframe
# df


# # View all columns with variation in the dataset
# list(df.columns)
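`drop_constant_columns=True` asks `messages_to_dataframe` to discard columns that hold the same value in every row. The pandas sketch below shows that idea for a DataFrame you build yourself (e.g. from pre-filtered rows); it is an illustration, not necessarily how ord_schema implements the flag, and `drop_constant_columns` here is a hypothetical helper.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns that take more than one distinct value.

    dropna=False makes NaN count as a value, so a partly empty column is kept,
    while a column that is entirely NaN is dropped.
    """
    return df.loc[:, df.nunique(dropna=False) > 1]


example = pd.DataFrame(
    {
        "type": ["SMILES", "SMILES", "SMILES"],  # constant -> dropped
        "smiles": ["CCO", "CC=O", "c1ccccc1"],   # varies -> kept
        "yield": [None, 85.0, None],             # NaN vs 85.0 -> kept
    }
)
print(list(drop_constant_columns(example).columns))  # ['smiles', 'yield']
```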


## Extract USPTO data from ORD

1. All of the USPTO grants data is available here: https://github.com/open-reaction-database/ord-data
2. It is batched by year. It's best to keep this batching, since it makes the data easier to handle (no single file gets excessively large).
3. Read in the data contained in each .pb.gz file; each entry in the dataset's list of reactions is one reaction. Loop over this list and extract the following from each reaction:
    - Reactants
    - Products
    - Solvents
    - Reagents
    - Catalyst
    - Temperature
    - Yield
    - Anything else?
4. Build a list for each of these, combine them into a DataFrame, and save it as a Parquet file.
5. Repeat this for each of the 41 years (41 datasets) of USPTO data. It will probably be easiest to convert the code in this notebook into a script and run it automatically on each dataset.
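Steps 3 and 4 could look roughly like the sketch below. `record_from_ord` walks one ORD `Reaction` message using field names from the ORD `reaction.proto` (inputs → components → identifiers / reaction_role, outcomes → products → measurements, conditions.temperature.setpoint); treat these as assumptions to check against the schema. `build_columns` then turns the per-reaction records into one list per column. Both function names and the record layout are hypothetical, and ord_schema is imported inside `record_from_ord` so the column-building part runs without it.

```python
from collections import defaultdict

# Input roles to keep, named as in ORD's ReactionRole enum.
INPUT_ROLES = ("REACTANT", "REAGENT", "SOLVENT", "CATALYST")


def record_from_ord(reaction):
    """Pull one flat record out of an ORD Reaction message (sketch; needs ord_schema)."""
    from ord_schema.proto import reaction_pb2  # only needed for real ORD messages

    def smiles_of(compound):
        # First SMILES identifier of a Compound / ProductCompound, or None.
        for identifier in compound.identifiers:
            if identifier.type == reaction_pb2.CompoundIdentifier.SMILES:
                return identifier.value
        return None

    inputs = [
        (reaction_pb2.ReactionRole.ReactionRoleType.Name(comp.reaction_role), smiles_of(comp))
        for reaction_input in reaction.inputs.values()
        for comp in reaction_input.components
    ]
    products, yields = [], []
    for outcome in reaction.outcomes:
        for product in outcome.products:
            products.append(smiles_of(product))
            for measurement in product.measurements:
                if measurement.type == reaction_pb2.ProductMeasurement.YIELD:
                    yields.append(measurement.percentage.value)
    temp = reaction.conditions.temperature
    # The setpoint carries its own units (e.g. CELSIUS); they are not converted here.
    temperature = temp.setpoint.value if temp.HasField("setpoint") else None
    return {"inputs": inputs, "products": products,
            "temperature": temperature, "yields": yields}


def build_columns(records):
    """Turn per-reaction records into one list per column (step 4)."""
    columns = defaultdict(list)
    for record in records:
        by_role = defaultdict(list)
        for role, smiles in record.get("inputs", []):
            if smiles:  # skip components without a SMILES identifier
                by_role[role].append(smiles)
        for role in INPUT_ROLES:
            columns[role.lower()].append(by_role[role])
        columns["products"].append([s for s in record.get("products", []) if s])
        columns["temperature"].append(record.get("temperature"))
        columns["yields"].append(record.get("yields", []))
    return dict(columns)
```

With pandas and a Parquet engine (pyarrow or fastparquet) installed, something like `pd.DataFrame(build_columns(record_from_ord(r) for r in data.reactions)).to_parquet(path)` should then cover step 4. Roles outside `INPUT_ROLES` (e.g. WORKUP) are simply ignored in this sketch.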

In [None]:
# Find the schema here
# https://github.com/open-reaction-database/ord-schema/blob/main/ord_schema/proto/reaction.proto

In [1]:
# Import modules
import ord_schema
from ord_schema import message_helpers, validations
from ord_schema.proto import dataset_pb2

import math
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import os
import wget

from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn import model_selection, metrics
from glob import glob

from tqdm import tqdm



In [2]:
# Download dataset from ord-data
#url = "https://github.com/open-reaction-database/ord-data/blob/main/data/68/ord_dataset-68cb8b4b2b384e3d85b5b1efae58b203.pb.gz?raw=true"
#https://github.com/open-reaction-database/ord-data
url = "https://github.com/Open-Reaction-Database/ord-data/blob/main/data/02/ord_dataset-026684a62f91469db49c7767d16c39fb.pb.gz?raw=true"
pb = wget.download(url)



In [2]:
# Load Dataset message
pb = 'data/USPTO/ord_dataset-026684a62f91469db49c7767d16c39fb.pb.gz'
data = message_helpers.load_message(pb, dataset_pb2.Dataset)
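For step 5, one way to process every yearly file is to glob the downloaded `.pb.gz` files and derive one output name per dataset. A minimal sketch, assuming all files sit in a single directory such as `data/USPTO/` and that one Parquet file per dataset is wanted; `list_datasets` and `parquet_path_for` are hypothetical helpers.

```python
from pathlib import Path

SUFFIX = ".pb.gz"


def list_datasets(data_dir):
    """Return the ORD dataset files in `data_dir`, sorted for a reproducible order."""
    return sorted(Path(data_dir).glob(f"*{SUFFIX}"))


def parquet_path_for(pb_path, out_dir):
    """Map '<dir>/ord_dataset-<id>.pb.gz' to '<out_dir>/ord_dataset-<id>.parquet'."""
    name = Path(pb_path).name
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return Path(out_dir) / f"{name}.parquet"


# Sketch of the driver loop (needs ord_schema; extraction code omitted):
# for pb in list_datasets("data/USPTO"):
#     data = message_helpers.load_message(str(pb), dataset_pb2.Dataset)
#     ...  # extract the columns and write them to parquet_path_for(pb, "data/USPTO/parquet")
```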

In [19]:
valid_output = validations.validate_message(data)

[... output truncated: many RDKit warnings of the form "reactant N has no mapped atoms." and "product atom-mapping number N found multiple times." ...]
[20:58:01] reactant 3 has no mapped atoms.
[20:58:01] reactant 1 has no mapped atoms.
[20:58:01] reactant 1 has no mapped atoms.
[20:58:01] reactant 2 has no mapped atoms.
[20:58:01] 

[20:58:02] reactant 1 has no mapped atoms.
[20:58:02] product atom-mapping number 1 found multiple times.
[20:58:02] product atom-mapping number 2 found multiple times.
[20:58:02] product atom-mapping number 7 found multiple times.
[20:58:02] product atom-mapping number 6 found multiple times.
[20:58:02] product atom-mapping number 5 found multiple times.
[20:58:02] product atom-mapping number 8 found multiple times.
[20:58:02] product atom-mapping number 12 found multiple times.
[20:58:02] product atom-mapping number 13 found multiple times.
[20:58:02] product atom-mapping number 15 found multiple times.
[20:58:02] product atom-mapping number 16 found multiple times.
[20:58:02] product atom-mapping number 21 found multiple times.
[20:58:02] product atom-mapping number 20 found multiple times.
[20:58:02] product atom-mapping number 19 found multiple times.
[20:58:02] product atom-mapping number 22 found multiple times.
[20:58:02] product atom-mapping number 25 found multiple times.
[20

[20:58:02] reactant 2 has no mapped atoms.
[20:58:02] reactant 3 has no mapped atoms.
[20:58:02] reactant 4 has no mapped atoms.
[20:58:02] reactant 5 has no mapped atoms.
[20:58:02] reactant 6 has no mapped atoms.
[20:58:02] reactant 7 has no mapped atoms.
[20:58:02] reactant 1 has no mapped atoms.
[20:58:02] reactant 3 has no mapped atoms.
[20:58:02] reactant 4 has no mapped atoms.
[20:58:02] reactant 2 has no mapped atoms.
[20:58:02] reactant 3 has no mapped atoms.
[20:58:02] reactant 1 has no mapped atoms.
[20:58:02] reactant 2 has no mapped atoms.
[20:58:02] reactant 1 has no mapped atoms.
[20:58:02] reactant 3 has no mapped atoms.
[20:58:02] reactant 4 has no mapped atoms.
[20:58:02] reactant 1 has no mapped atoms.
[20:58:02] reactant 1 has no mapped atoms.
[20:58:02] reactant 2 has no mapped atoms.
[20:58:02] reactant 1 has no mapped atoms.
[20:58:03] reactant 2 has no mapped atoms.
[20:58:03] reactant 3 has no mapped atoms.
[20:58:03] reactant 4 has no mapped atoms.
[20:58:03] 

[20:58:03] reactant 0 has no mapped atoms.
[20:58:03] reactant 2 has no mapped atoms.
[20:58:03] product atom-mapping number 8 found multiple times.
[20:58:03] product atom-mapping number 12 found multiple times.
[20:58:03] product atom-mapping number 13 found multiple times.
[20:58:03] product atom-mapping number 15 found multiple times.
[20:58:03] product atom-mapping number 19 found multiple times.
[20:58:03] product atom-mapping number 20 found multiple times.
[20:58:03] product atom-mapping number 21 found multiple times.
[20:58:03] product atom-mapping number 22 found multiple times.
[20:58:03] product atom-mapping number 23 found multiple times.
[20:58:03] product atom-mapping number 24 found multiple times.
[20:58:03] product atom-mapping number 27 found multiple times.
[20:58:03] product atom-mapping number 28 found multiple times.
[20:58:03] product atom-mapping number 29 found multiple times.
[20:58:03] product atom-mapping number 30 found multiple times.
[20:58:03] product 

[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] product atom-mapping number 1 found multiple times.
[20:58:04] product atom-mapping number 16 found multiple times.
[20:58:04] product atom-mapping number 17 found multiple times.
[20:58:04] product atom-mapping number 18 found multiple times.
[20:58:04] product atom-mapping number 19 found multiple times.
[20:58:04] product atom-mapping number 20 found multiple times.
[20:58:04] product atom-mapping number 21 found multiple times.
[20:58:04] product atom-mapping number 2 found multiple times.
[20:58:04] product atom-mapping number 3 found multiple times.
[20:58:04] product atom-mapping number 4 found multiple times.
[20:58:04] product atom-mapping number 7 found multiple times.
[20:58:04] product atom-mapping number 8 found multiple times.
[20:58:04] product atom-mapping number 12 found multiple times.
[20:58:04] product atom-mapping number 11 found multiple times.
[20:58:04] product atom-

[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 0 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 3 has no mapped atoms.
[20:58:04] reactant 4 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 3 has no mapped atoms.
[20:58:04] reactant 4 has no mapped atoms.
[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 1 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 3 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] reactant 3 has no mapped atoms.
[20:58:04] reactant 2 has no mapped atoms.
[20:58:04] 

[20:58:05] reactant 1 has no mapped atoms.
[20:58:05] reactant 2 has no mapped atoms.
[20:58:05] reactant 4 has no mapped atoms.
[20:58:05] reactant 5 has no mapped atoms.
[20:58:05] reactant 2 has no mapped atoms.
[20:58:05] reactant 3 has no mapped atoms.
[20:58:05] reactant 2 has no mapped atoms.
[20:58:05] reactant 2 has no mapped atoms.
[20:58:05] reactant 2 has no mapped atoms.
[20:58:05] reactant 3 has no mapped atoms.
[20:58:05] reactant 0 has no mapped atoms.
[20:58:05] reactant 2 has no mapped atoms.
[20:58:05] reactant 4 has no mapped atoms.
[20:58:05] reactant 5 has no mapped atoms.
[20:58:06] reactant 0 has no mapped atoms.
[20:58:06] reactant 1 has no mapped atoms.
[20:58:06] reactant 2 has no mapped atoms.
[20:58:06] reactant 3 has no mapped atoms.
[20:58:06] reactant 4 has no mapped atoms.
[20:58:06] reactant 2 has no mapped atoms.
[20:58:06] reactant 3 has no mapped atoms.
[20:58:06] reactant 2 has no mapped atoms.
[20:58:06] reactant 4 has no mapped atoms.
[20:58:06] 

ValidationError: Dataset.reactions[1478].inputs["m1_m2"].components[1].identifiers[2]: RDKit 2022.03.5 could not validate InChI identifier InChI=1S/CHCl/c1-2/h1H
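The run stops at `Dataset.reactions[1478]` because RDKit 2022.03.5 could not validate one InChI identifier, and before that it floods the output with atom-mapping warnings. Below is a minimal sketch of a more forgiving loop. It assumes validation can be applied one reaction at a time through a callable. `split_valid` and its `validate` argument are hypothetical names; in practice `validate` would wrap whichever ord_schema validation call was used when loading. Failures are recorded with their index and message instead of aborting the whole dataset. If RDKit is installed, its log is silenced with `RDLogger.DisableLog("rdApp.*")`.

```python
from typing import Callable, Iterable, List, Tuple

try:
    from rdkit import RDLogger
    RDLogger.DisableLog("rdApp.*")  # silence warnings such as "reactant 1 has no mapped atoms."
except ImportError:  # RDKit not installed: nothing to silence
    pass


def split_valid(
    reactions: Iterable, validate: Callable[[object], None]
) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Validate reactions one at a time (hypothetical helper).

    `validate` is any callable that raises an exception for an invalid reaction.
    Returns the indices that passed and (index, error message) pairs for the failures.
    """
    ok: List[int] = []
    failed: List[Tuple[int, str]] = []
    for i, rxn in enumerate(reactions):
        try:
            validate(rxn)
        except Exception as exc:  # record the failure and keep going
            failed.append((i, str(exc)))
        else:
            ok.append(i)
    return ok, failed


# Toy usage: a stand-in validator that rejects negative numbers.
def toy_validate(x):
    if x < 0:
        raise ValueError(f"negative value {x}")


ok, failed = split_valid([3, -1, 5], toy_validate)
print(ok, failed)  # [0, 2] [(1, 'negative value -1')]
```

Skipping records silently discards data, so `failed` is worth inspecting afterwards. The toy validator only illustrates the control flow.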

In [None]:
# inputs: components[i].reaction_role values
# UNSPECIFIED = 0;
# REACTANT = 1;
# REAGENT = 2;
# SOLVENT = 3;
# CATALYST = 4;
# WORKUP = 5;
# INTERNAL_STANDARD = 6;
# AUTHENTIC_STANDARD = 7;
# PRODUCT = 8;

# temperature: conditions.temperature.control.type values
# UNSPECIFIED = 0;
# CUSTOM = 1;
# AMBIENT = 2;
# OIL_BATH = 3;
# WATER_BATH = 4;
# SAND_BATH = 5;
# ICE_BATH = 6;
# DRY_ALUMINUM_PLATE = 7;
# MICROWAVE = 8;
# DRY_ICE_BATH = 9;
# AIR_FAN = 10;
# LIQUID_NITROGEN = 11;

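# structure of a Reaction message (as used in the loop below)
# inputs -> dict-like map keyed 'm1', 'm2', 'm3', ... (combined inputs appear as e.g. 'm1_m2'), each with components
# conditions -> temperature, ...
# notes
# workups
# outcomes -> reaction_time, products (yield?)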
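The role and temperature codes listed above can be turned into lookup tables so the extracted columns hold readable labels instead of bare integers. Below is a minimal sketch. The numbers are copied from the comments in this cell. The helpers assume that each input exposes `.components`, each component `.reaction_role` and `.identifiers`, and each identifier `.type` and `.value`, as the ORD protobuf messages do. Instead of relying on `identifiers[1]`, `pick_identifier` (a hypothetical helper) returns the first identifier of a preferred type, falling back to the next type in the list. It also addresses the name-only case flagged in the next cell. With the real schema the preferred types would presumably be `reaction_pb2.CompoundIdentifier.SMILES` and `reaction_pb2.CompoundIdentifier.NAME` from `ord_schema.proto`. The sketch takes them as parameters rather than hard-coding their numeric values. `components_by_role` (also hypothetical) walks every component of every input, not just `components[0]`.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List, Optional

# Codes copied from the reaction_role comments above.
REACTION_ROLES: Dict[int, str] = {
    0: "unspecified", 1: "reactant", 2: "reagent", 3: "solvent", 4: "catalyst",
    5: "workup", 6: "internal_standard", 7: "authentic_standard", 8: "product",
}

# Codes copied from the temperature control comments above.
TEMPERATURE_CONTROL: Dict[int, str] = {
    0: "unspecified", 1: "custom", 2: "ambient", 3: "oil_bath", 4: "water_bath",
    5: "sand_bath", 6: "ice_bath", 7: "dry_aluminum_plate", 8: "microwave",
    9: "dry_ice_bath", 10: "air_fan", 11: "liquid_nitrogen",
}


def pick_identifier(identifiers: Iterable, preferred_types: List) -> Optional[str]:
    """Return the value of the first identifier whose type comes earliest in preferred_types."""
    by_type = {}
    for ident in identifiers:
        by_type.setdefault(ident.type, ident.value)  # keep the first identifier of each type
    for t in preferred_types:
        if t in by_type:
            return by_type[t]
    return None  # none of the preferred types is present


def components_by_role(inputs: Dict[str, object], preferred_types: List) -> Dict[str, List[Optional[str]]]:
    """Group every component of every input by its role label."""
    grouped: Dict[str, List[Optional[str]]] = defaultdict(list)
    for key in inputs:
        for comp in inputs[key].components:
            role = REACTION_ROLES.get(comp.reaction_role, "unspecified")
            grouped[role].append(pick_identifier(comp.identifiers, preferred_types))
    return dict(grouped)


# Toy stand-ins for ORD messages; these type markers are placeholders, not the schema's enum values.
SMILES, NAME = "smiles", "name"


def ident(t, v):
    return SimpleNamespace(type=t, value=v)


inputs = {
    "m1_m2": SimpleNamespace(components=[
        SimpleNamespace(reaction_role=1, identifiers=[ident(NAME, "ethanol"), ident(SMILES, "CCO")]),
        SimpleNamespace(reaction_role=3, identifiers=[ident(NAME, "water")]),
    ]),
}
print(components_by_role(inputs, [SMILES, NAME]))  # {'reactant': ['CCO'], 'solvent': ['water']}
```

The name fallback still leaves a name string that has to be resolved to SMILES separately, as noted in the next cell.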

In [68]:
# Known problems
# Some reactants (and probably other categories) only have the 'name' identifier, so it is not possible to extract
# the SMILES using the identifiers[1] syntax.
#   I'd need to write a try statement, and if we can't extract the SMILES, then extract the name. Then we need to
#   resolve the name and return a SMILES.
# Only components[0] of each input is read, so inputs that bundle several components (keys like 'm1_m2') keep
# only their first component.

import numpy as np
from tqdm import tqdm

reactants_all = []
reagents_all = []
products_all = []
solvents_all = []
catalysts_all = []

temperature_all = []

rxn_times_all = []

for i in tqdm(range(len(data.reactions))):  # `data` is the Dataset loaded earlier
    rxn = data.reactions[i]
    # handle rxn inputs: reactants, reagents etc
    reactants = []
    reagents = []
    solvents = []
    catalysts = []
    products = []

    temperatures = []

    rxn_times = []

    # inputs
    for key in rxn.inputs:  # these are the keys in the 'dict' style data struct, e.g. 'm1', 'm1_m2'
        try:
            component = rxn.inputs[key].components[0]
            rxn_role = component.reaction_role
            if rxn_role == 1:  # reactant
                reactants += [component.identifiers[1].value]  # this should be the SMILES
            elif rxn_role == 2:  # reagent
                reagents += [component.identifiers[1].value]
            elif rxn_role == 3:  # solvent
                solvents += [component.identifiers[1].value]
            elif rxn_role == 4:  # catalyst
                catalysts += [component.identifiers[1].value]
            elif rxn_role in [5, 6, 7]:  # workup, internal standard, authentic standard: not needed
                continue
            elif rxn_role == 8:  # product
                products += [component.identifiers[1].value]
        except IndexError:  # no components, or fewer than two identifiers (e.g. name only)
            # print(i, key)
            continue

    # temperature: protobuf returns a default (0 = UNSPECIFIED) for unset fields instead of raising
    temperatures += [rxn.conditions.temperature.control.type]

    # outcomes: reaction time and the first recorded product
    if len(rxn.outcomes) > 0:
        outcome = rxn.outcomes[0]
        rxn_times = (outcome.reaction_time.value, outcome.reaction_time.units)
        try:
            products += [outcome.products[0].identifiers[1].value]  # this should be the SMILES
        except IndexError:  # no product, or no second identifier
            pass
    else:
        rxn_times = (np.nan, np.nan)
    if not products:
        products = [np.nan]

    reactants_all += [reactants]
    reagents_all += [reagents]
    solvents_all += [solvents]
    catalysts_all += [catalysts]

    temperature_all += [temperatures]

    rxn_times_all += [rxn_times]
    products_all += [products]
              

  5%|▍         | 4266/93834 [00:00<00:04, 20005.80it/s]

0 m4
1 m4_5
1 m4_m5
5 m4
39 m2_m3_m4_m5_m7_m6_m8
98 m4
167 m2
178 m3
185 m4_m6
243 m3_m6
245 m6_m7
305 m6_m9
310 m1
321 m3
437 m3
466 m2
551 m1_m2
568 m4
638 m2
649 m3
656 m4_m6
680 m6_m10
692 m1
692 m3
699 m4
700 m5_m1_m2_m3
701 m5_m1_m2_m3
702 m5_m1_m2_m3
703 m5_m1_m2_m3
704 m5_m1_m2_m3
705 m5_m1_m2_m3
707 m2_m3
708 m3_m4
724 m2
729 m3_m4
737 m1_m2_m4
739 m2
740 m3_m4_m6
773 m9
773 m1
776 m1_m4_m2_m3
778 m3
805 m3
806 m3
893 m1_m2_m3_m4
894 m5
907 m6_m7_m8
913 m7_m8_m9
914 m7_m8_m9
918 m8_m9
919 m1_m2_m3_m5
920 m3
920 m4
921 m3
922 m3
931 m4
942 m3
948 m3
966 m1_m2_m4_m3
998 m3_m6
1002 m2_m3_m4_m5
1030 m1
1051 m4
1053 m2
1087 m4
1101 m5
1115 m7
1190 m1_m2_m5_m3
1192 m1_m2_m3
1193 m1_m2_m3
1194 m1_m2_m5
1195 m1_m2_m5
1275 m1_m2_m3_m6_m4_m7
1281 m7_m4
1287 m1_m2_m3_m4_m5
1298 m6
1298 m3
1303 m4
1304 m4
1315 m1_m2_m4
1316 m7
1320 m3
1321 m5
1326 m3
1327 m3
1328 m3
1329 m3
1338 m3
1339 m3
1340 m3
1341 m3
1343 m3
1344 m3
1345 m3
1346 m3
1347 m3
1348 m3
1349 m3
1350 m3
1353 m3
1354 m3
1355

  7%|▋         | 6337/93834 [00:00<00:05, 16988.32it/s]

 m3
4344 m3
4347 m3
4356 m3
4359 m3
4370 m6_m8
4390 m4
4390 m3
4401 m2
4413 m4
4419 m3_m6_m4
4421 m3
4427 m3
4448 m2
4453 m2_m3
4464 m5
4482 m3
4482 m6_m7_m8
4542 m1_m2_m3_m5
4671 m2
4680 m3
4769 m2
4772 m2
4846 m5_m6
4851 m1_m2_m3
4852 m1_m2_m3
4854 m1_m2_m3
4855 m1_m2_m3
4862 m1_m2
4956 m4
4998 m1_m2_m3
5000 m1_m2_m3
5002 m1
5003 m2
5004 m1
5004 m3
5005 m1
5006 m2
5009 m2
5011 m3
5012 m3
5012 m5
5014 m1_m2_m3
5016 m1_m2
5022 m1_m2
5022 m3
5023 m6
5026 m3_m4
5026 m1
5026 m5
5059 m4
5060 m3
5142 m3_m4_m9
5146 m7_m4
5146 m5
5147 m5_m4
5165 m5
5165 m3
5174 m7_m1_m2_m3_m8_m9
5254 m1_m2_m7
5254 m4_m5_m6
5257 m2_m3_m7
5257 m4_m5_m8
5259 m2_m3_m7_m4_m5
5260 m1_m10_m2_m3_m6_m7
5278 m6
5285 m2_m9_m3_m4_m7
5361 m1_m6_m2_m3_m8
5365 m2_m11_m3_m4_m7_m8
5371 m3
5373 m2_m11_m3_m4_m7_m8
5382 m2
5398 m4
5436 m3
5440 m4_m5
5442 m3_m4_m5
5470 m2
5472 m5
5486 m4_m5
5518 m2
5520 m5
5556 m3
5579 m4_m5
5582 m4
5584 m5
5620 m4
5715 m3_m4
5722 m3
5787 m4
5976 m2
5983 m2
6053 m3
6104 m3
6120 m1
6120 m2
6131 m2

 11%|█▏        | 10736/93834 [00:00<00:05, 15988.64it/s]

8755 m5
8766 m2_m3
8766 m4_m5
8775 m4
8788 m5
8789 m5
8793 m2
8811 m1_m2_m3
8812 m4
8816 m1_m2_m3
8817 m1_m2
8826 m3
8827 m4
8830 m1_m2_m4_m3
8831 m1_m2_m3_m6_m4
8834 m1_m2_m4_m3_m5
8835 m1_m2_m4_m3
8857 m4
8861 m2_m3_m4
8862 m4
8885 m4
8895 m5
8897 m3
8904 m3
8915 m2_m3
8917 m1_m2_m4
8918 m3
8920 m5
8922 m8
8922 m7
8952 m4
8982 m6
8982 m5
9020 m6
9023 m4
9024 m3
9025 m3
9027 m4
9028 m6
9035 m2_m6
9040 m3
9040 m5
9040 m4
9049 m6_m7_m8
9056 m2
9069 m2
9073 m2
9076 m2
9077 m2
9092 m3
9098 m5
9098 m3
9098 m7_m8_m9
9098 m4
9115 m4
9127 m4
9127 m3
9128 m6_m12
9128 m9_m13
9128 m5
9137 m4
9145 m3
9221 m7_m10
9291 m2_m3_m5
9310 m3
9315 m3
9321 m1_m3_m2_m4
9340 m3
9340 m5
9345 m1_m2_m4
9351 m4
9358 m5
9385 m3
9397 m5_m4
9419 m2
9428 m4
9444 m2
9451 m4
9468 m2_m3_m4
9500 m3
9504 m5
9517 m2
9519 m3
9574 m4
9608 m6
9614 m3
9638 m3
9644 m3_m4
9661 m4
9662 m4
9684 m4_m6
9685 m6
9685 m5
9685 m7
9685 m8
9685 m2
9690 m1_m3
9706 m1_m2_m3_m4
9707 m1_m2_m4_m3
9712 m2_m3_m4_m5
9744 m1_m2_m3_m4_m5
9745 m4
9

 14%|█▎        | 12699/93834 [00:00<00:05, 15635.98it/s]

 m6_m7_m3
12654 m1_m2_m6_m3
12679 m1
12684 m1_m2_m3
12685 m1
12691 m1_m2_m3_m6_m4_m7
12693 m6
12698 m4
12698 m6
12698 m9
12698 m8
12698 m10
12698 m11
12698 m7
12698 m5
12701 m3
12703 m2
12709 m1
12711 m5_m6
12718 m2
12720 m2_m3
12721 m6
12721 m3_m4_m8
12722 m3
12725 m6
12727 m4
12733 m6
12736 m1
12737 m6
12737 m1
12738 m1
12740 m2
12741 m2
12759 m4
12760 m4
12761 m4
12762 m4
12766 m3
12767 m3
12773 m0_m7
12780 m6
12780 m1
12781 m2_m3_m7
12782 m2_m3_m7
12788 m1
12796 m1_m2_m3
12818 m1_m2_m3
12822 m1_m2_m6
12823 m4
12823 m3
12834 m3
12838 m4_m5
12841 m4
12855 m2_m3_m5_m4_m6
12928 m7
12941 m1_m2_m3_m4_m5_m6
12944 m5
12951 m4_m5_m6_m8
12955 m4_m5_m6_m8
12963 m2_m3_m6
12965 m5_m8
12966 m4_m5
12968 m3
12978 m1_m4_m6_m2_m5
12979 m5_m6
12980 m1_m2_m5_m3
12982 m1_m2_m5_m3
12989 m4_m5
12990 m3
12995 m4
12995 m5
12997 m4
12997 m5
12997 m3
12997 m6
12998 m4
12998 m6
12998 m3
12998 m5
12999 m6
12999 m5
12999 m4
12999 m7
13000 m4
13000 m5
13000 m6
13000 m7
13000 m3
13001 m6
13001 m3
13001 m5
13001 m

 17%|█▋        | 16073/93834 [00:01<00:06, 11181.96it/s]


14794 m8
14795 m8
14816 m3
14817 m3
14833 m4_m5
14855 m1
14857 m2
14858 m2
14859 m2
14860 m1
14864 m3
14900 m5
14902 m3
14915 m3
14915 m7
14919 m7
14919 m3
14927 m3
14930 m4
14931 m4
14938 m4
14948 m4
15062 m5_m3
15075 m4
15079 m2
15082 m4
15108 m2_m4
15112 m4
15120 m2
15125 m3_m4_m5
15151 m2
15211 m3
15212 m4
15239 m4_m5
15245 m4_m5
15263 m4
15275 m4_m5
15284 m4
15287 m2_m3
15293 m2_m3
15303 m2_m3
15343 m3
15383 m7
15383 m8
15482 m3
15493 m4
15494 m3
15529 m1_m2_m3
15564 m5
15618 m2
15629 m3
15635 m12
15635 m15
15635 m23
15635 m17
15635 m7_m25
15635 m24
15635 m22
15635 m9
15635 m18_m19_m20
15635 m13
15635 m11
15635 m14
15636 m5
15636 m4
15636 m12
15636 m7
15636 m6
15637 m1
15646 m5_m6
15647 m4
15648 m3
15655 m4
15656 m4
15666 m4
15667 m3
15669 m7
15671 m4
15672 m3
15674 m7
15687 m4
15764 m3
15764 m5
15775 m2_m3_m4
15808 m2_m3_m4
15823 m3
15828 m3
15871 m2_m3_m7
15876 m3_m4
15877 m1_m2_m4_m3
15878 m3
15884 m5_m7
15885 m4
15914 m1
15916 m2
15916 m3
15923 m2
15923 m3
15925 m2
15925 m3
1

 21%|██        | 19541/93834 [00:01<00:05, 12873.18it/s]

 m2
17193 m3_m5
17220 m5
17224 m4
17225 m1_m2_m3_m8
17237 m4_m5
17237 m2_m3_m6
17242 m4_m7_m5_m8_m6
17246 m2
17254 m4_m5_m10
17254 m7
17254 m6
17273 m5_m6
17277 m1_m2_m5
17278 m4_m5_m9
17283 m1_m2_m5
17289 m3
17289 m4
17298 m2
17308 m2
17308 m6_m7
17326 m3_m4
17374 m3
17421 m3_m4
17485 m3
17486 m3
17494 m3
17495 m3
17496 m3
17497 m3
17520 m3
17542 m2
17573 m3
17621 m7
17621 m6
17621 m10
17621 m9
17621 m8
17621 m5
17636 m1_m2_m3
17636 m4_m5_m8_m6
17655 m3
17762 m2_m3_m4
17778 m7_m8
17788 m1_m10_m2_m3_m6_m7
17793 m1_m2_m3_m4
17804 m1
17807 m3
17807 m1
17808 m0_m6
17808 m4_m8
17819 m1_m2_m4
17838 m5
17839 m1_m3
17842 m5_m8
17854 m2_m3
17865 m3
17883 m6
17890 m10_m11
17914 m3
17916 m2_m3
17935 m10_m6_m7
17938 m6_m2_m3
17944 m3
17949 m4
17951 m1_m2_m7
17952 m2
17957 m3
17957 m4
17959 m2
17963 m4
17964 m3
17966 m7
17968 m4
17969 m3
17971 m7
17989 m1
17991 m3_m5
18004 m3
18004 m1
18004 m2
18016 m1
18019 m1
18020 m1
18021 m1
18022 m1
18023 m1
18024 m1
18025 m1
18026 m1
18027 m1
18028 m1
18029 

 23%|██▎       | 21300/93834 [00:01<00:05, 13417.74it/s]

m5
21228 m4
21232 m2
21244 m2
21250 m5
21256 m5
21261 m1_m4_m2_m5
21266 m5_m4
21267 m4
21267 m7_m8
21269 m1_m7_m6
21273 m1_m2_m7
21274 m4
21275 m5
21275 m3
21277 m3
21289 m4
21290 m1_m2
21293 m1_m2_m3
21295 m4
21298 m1
21298 m2_m3_m8
21299 m2
21299 m3
21299 m1
21299 m5
21299 m6
21299 m7
21313 m4
21316 m4
21321 m3_m4_m5
21329 m5
21331 m3
21333 m3
21348 m4
21349 m4
21351 m4
21352 m3
21354 m4
21359 m2
21375 m3
21381 m3
21405 m1
21442 m1
21443 m1
21447 m3
21453 m4
21454 m4
21460 m1
21470 m1
21495 m1_m2_m3_m6
21499 m1_m2_m3_m6
21513 m1_m2_m5
21546 m5_m9
21548 m7_m13
21548 m1
21555 m5_m6
21587 m4_m6
21595 m3_m4
21598 m3_m7
21636 m4
21665 m4
21685 m8
21685 m4
21685 m7
21688 m8
21688 m7
21744 m2_m7_m5_m3_m4_m6
21767 m5_m9_m6_m7
21771 m5
21778 m5
21783 m5_m9_m6
21798 m5
21802 m6
21804 m6
21805 m6
21806 m6
21860 m5_m7
21861 m4_m5
21865 m4
21869 m4
21905 m2_m3
21910 m1_m2_m3
21910 m6_m7
21913 m3_m6
21914 m3
21919 m5
21922 m3
21928 m3
21931 m3
21942 m4_m9_m8_m7
21943 m9_m4_m5
21977 m3
21999 m1_m2_

 27%|██▋       | 25676/93834 [00:01<00:04, 14308.57it/s]


24146 m3
24152 m4
24153 m4
24160 m3
24173 m3
24177 m3
24178 m2
24179 m3_m4
24187 m5
24188 m2
24188 m3
24206 m1_m4
24207 m1
24207 m3
24208 m19
24208 m23
24208 m5
24208 m3
24208 m20
24208 m2
24208 m18
24208 m22
24208 m17
24209 m1
24211 m1
24212 m1
24213 m1
24214 m1
24215 m1
24216 m1
24217 m1
24218 m1
24220 m1
24221 m1
24222 m1
24223 m1
24225 m4_m5
24231 m1
24232 m1
24233 m1
24234 m1
24235 m1
24236 m1
24237 m1
24238 m1
24239 m1
24240 m1
24241 m1
24242 m1
24243 m1
24244 m1
24246 m1
24247 m1
24248 m1
24249 m1
24250 m1
24251 m1
24252 m1
24253 m1
24275 m4
24293 m6
24295 m1_m2_m3_m4_m5
24314 m3
24344 m1
24347 m2
24347 m1
24349 m4
24349 m3_m7
24350 m7_m8
24350 m4
24364 m3_m4
24366 m5_m9_m6
24366 m7
24384 m5
24403 m3
24424 m3
24425 m2_m3_m4
24439 m2
24486 m5
24491 m3
24529 m2
24529 m3
24606 m2_m3
24612 m4
24613 m4
24615 m5
24619 m1
24626 m4
24653 m3
24678 m1
24679 m1_m2_m4
24680 m1_m2_m4
24686 m1_m2_m5
24689 m4
24698 m3
24698 m5
24698 m4
24699 m2_m3_m6
24702 m7
24702 m4
24704 m2_m3_m6
24709 m2_

 30%|███       | 28525/93834 [00:02<00:04, 13803.85it/s]

 m6
26223 m5
26224 m5
26224 m6
26225 m4
26230 m3
26248 m3
26248 m4_m5_m8
26294 m5
26308 m5_m6
26323 m5
26328 m1
26329 m1
26330 m1
26332 m1
26333 m1
26334 m1
26335 m1
26336 m1
26338 m1
26339 m1
26387 m5
26387 m2
26387 m8
26387 m3
26387 m7
26433 m3
26456 m4
26460 m2
26466 m1_m2_m5
26474 m5
26482 m4
26484 m5
26488 m1
26494 m6
26496 m6
26498 m6
26499 m6
26501 m6
26503 m6
26504 m6
26505 m5
26506 m5
26517 m1_m2_m4
26518 m1_m2_m4
26523 m3_m4
26524 m5
26532 m4
26534 m4
26535 m5
26536 m3
26566 m3_m4
26570 m3
26581 m2
26629 m3
26634 m3
26635 m3
26649 m2
26771 m3
26809 m4_m5
26810 m4
26826 m3
26852 m3
26860 m1_m2_m3_m4_m5
26915 m4
26920 m4_m5
26923 m1_m2_m4_m3_m5
26925 m5
26983 m3
27000 m5
27004 m3
27018 m5
27050 m4
27072 m1_m2_m3_m4_m5
27077 m1_m2_m3_m4_m5
27095 m5
27103 m3
27108 m4
27120 m5
27143 m4
27148 m6
27149 m6_m9
27150 m4
27150 m6
27156 m5
27157 m4
27164 m3
27164 m4
27174 m3
27175 m5
27175 m4
27185 m4
27197 m7
27199 m4
27210 m5
27212 m4
27213 m3
27225 m4
27227 m5
27229 m4
27234 m5
27235 

 34%|███▍      | 32039/93834 [00:02<00:04, 14734.20it/s]

 m4_m5
29755 m4
29756 m5
29757 m3_m6
29758 m4
29759 m3
29761 m3
29764 m3
29767 m4
29768 m3
29769 m2
29769 m4
29770 m3
29771 m5
29772 m4
29772 m6
29781 m3
29783 m3_m4
29801 m1
29943 m1_m2_m3_m4_m5
29956 m5_m0
29984 m8
29991 m6_m4_m5
29993 m4
29993 m3
30005 m1
30006 m1
30007 m1
30008 m1
30009 m1
30010 m1
30011 m1
30012 m1
30013 m1
30014 m1
30069 m6_m5_m0
30117 m3
30133 m3
30151 m1_m6
30221 m6_m7
30260 m3_m4
30281 m4_m5_m10
30289 m7_m2
30296 m3
30317 m4
30329 m4
30338 m4
30375 m4_m5
30376 m4
30383 m3
30403 m2_m3_m4_m5_m6_m7
30413 m1
30439 m3_m4_m6
30441 m4
30443 m5
30445 m5
30464 m6
30584 m1_m2_m5_m3_m6
30602 m3
30602 m8_m2
30609 m3
30630 m9
30655 m8
30661 m1_m2_m3
30691 m7
30694 m7
30697 m3
30701 m3
30708 m5_m6
30728 m3
30738 m5
30739 m3_m4
30740 m6
30740 m1
30741 m1
30750 m6_m7
30754 m3_m4
30773 m3_m4
30806 m2_m3_m4_m5
30808 m3
30813 m3
30817 m2_m3
30831 m3
30832 m4_m7
30841 m4_m5_m8
30851 m5_m9
30853 m4
30854 m5
30861 m4
30866 m4
30867 m7
30890 m3
30938 m4_m5_m7
30940 m4
30941 m3
30991

 38%|███▊      | 35889/93834 [00:02<00:03, 15437.00it/s]

 m9_m5
33872 m4
33881 m6
33893 m1_m3_m2_m4
33897 m1
33901 m2
33902 m5
33902 m2
33902 m1
33918 m4_m3
33929 m4
33934 m7_m8
33934 m1
33934 m5
33934 m3
33935 m1
33935 m3
33935 m5
33935 m6_m7
33936 m3
33936 m1
33936 m6_m7
33936 m5
33937 m3
33937 m5
33937 m1
33937 m6_m7
33938 m3
33938 m5
33938 m1
33938 m6_m7
33939 m1
33939 m5
33939 m3
33939 m6_m7
33940 m5
33940 m6_m7
33940 m3
33940 m1
33941 m5
33941 m1
33941 m6_m7
33941 m3
33942 m5
33942 m6_m7
33942 m1
33942 m3
33951 m3_m4_m6
33962 m3
33970 m3_m4
33999 m1
33999 m4_m5_m6
34007 m2_m6
34017 m3_m4
34022 m3
34028 m5_m6_m10_m7_m8
34037 m5_m7_m8
34045 m2_m3_m6
34046 m2_m3_m6
34048 m2_m3_m6
34050 m2_m3_m5
34051 m5_m6_m9
34051 m2_m3_m10
34053 m8
34054 m1_m2_m5
34057 m1_m2_m3_m4_m6_m7
34058 m2
34063 m3
34070 m2
34071 m3
34074 m3
34084 m7_m8
34086 m3
34087 m8
34089 m3_m9
34089 m7_m10
34089 m5
34091 m7_m8
34092 m7_m8
34093 m7_m8
34126 m1_m5_m6
34127 m7
34128 m2_m3_m4
34129 m5
34131 m5
34135 m1
34142 m3_m4
34153 m1_m2_m3_m7_m6
34194 m4
34195 m3_m4
34197 

 40%|████      | 37976/93834 [00:02<00:03, 15694.01it/s]


36957 m3
36969 m3
36973 m4
36974 m3_m4
36979 m3
36982 m4_m5
36989 m3
37006 m2
37007 m4
37012 m5
37014 m4
37019 m4
37033 m6
37044 m3_m4
37044 m5_m8
37068 m7
37068 m6
37070 m5_m6
37072 m5_m7
37073 m2
37080 m3
37082 m6
37118 m4
37141 m3_m4
37143 m3
37145 m3
37149 m3
37155 m2
37161 m1_m4
37164 m2
37165 m3
37165 m4
37165 m8
37167 m2
37168 m2
37169 m3
37170 m3
37170 m4
37171 m2
37172 m2
37173 m2
37174 m2
37175 m4
37176 m2
37178 m3
37179 m2
37180 m2
37181 m2
37182 m2
37184 m3
37184 m4
37185 m3
37186 m2
37187 m4
37188 m2
37189 m2
37190 m2
37193 m4
37194 m4
37206 m5
37259 m5
37259 m1_m2_m4
37268 m5
37283 m5
37288 m4
37293 m5
37306 m5
37307 m6
37331 m8
37332 m5
37335 m6
37335 m10
37335 m7_m8_m13_m9_m16_m15_m14
37338 m5
37339 m5
37344 m5
37405 m4
37406 m3
37422 m4_m5
37470 m3
37481 m0_m1_m2_m3_m4
37483 m0_m1_m2_m3_m4
37484 m0_m1_m2_m3_m4
37485 m0_m1_m2_m3_m4
37486 m0_m1_m2_m3_m4
37487 m0_m1_m2_m3_m4
37488 m0_m1_m2_m3_m4
37489 m0_m1_m2_m3_m4
37490 m0_m1_m2_m3_m4
37491 m0_m1_m2_m3_m4
37493 m0_m1_m

 45%|████▌     | 42559/93834 [00:02<00:02, 18031.74it/s]

m5
40113 m3
40120 m1
40121 m1
40122 m1
40123 m1
40124 m1
40125 m1
40125 m6
40148 m5
40149 m6_m7
40149 m3
40151 m5_m6
40168 m4
40170 m4
40171 m5
40172 m3
40202 m3_m4
40206 m3
40217 m2
40265 m3
40270 m3
40271 m3
40285 m2
40407 m3
40445 m4_m5
40446 m4
40462 m3
40488 m3
40496 m1_m2_m3_m4_m5
40551 m4
40556 m4_m5
40559 m1_m2_m4_m3_m5
40561 m5
40619 m3
40636 m5
40640 m3
40654 m5
40685 m5
40686 m5
40691 m6
40700 m4
40703 m4
40705 m3
40706 m3
40725 m3_m4_m5
40726 m1_m2_m6
40726 m3_m4_m7
40727 m1_m2_m3_m4
40728 m6_m7_m12
40728 m4_m5_m10_m11
40728 m1_m2_m9_m3
40729 m3_m4_m7_m8
40734 m1_m2_m6
40734 m3_m4_m7
40735 m1
40735 m5_m6_m7
40737 m2_m3_m6
40743 m5
40746 m6
40748 m1_m2_m3
40753 m8_m9
40763 m5
40773 m4
40777 m7_m6_m0
40823 m4
40825 m6_m2_m3_m7
40842 m5_m8
40847 m3
40849 m3
40850 m3
40876 m5
40886 m3
40895 m4_m7_m0
40900 m1
40926 m3_m6
40942 m1
40942 m4
40968 m3_m4
40973 m3
40981 m6_m9
40996 m3
41018 m3
41022 m3
41040 m2
41074 m3_m4
41078 m4_m5
41081 m8_m10
41081 m6
41142 m3_m4_m6
41149 m1_m2_

 49%|████▉     | 46179/93834 [00:03<00:02, 17024.28it/s]

44074 m3
44076 m3
44080 m4
44080 m2_m3
44083 m4_m5
44083 m3
44085 m2
44091 m3
44091 m6
44091 m4_m5
44093 m6
44093 m4_m5
44093 m3
44094 m3_m5
44111 m5_m6_m0
44121 m5_m6
44170 m5
44189 m4
44192 m4
44194 m4
44198 m4
44207 m3_m4_m5_m8
44316 m4
44317 m4
44318 m4
44319 m4
44320 m1_m5
44330 m2
44375 m3
44383 m4_m5_m6
44398 m4_m5_m6_m8_m9
44401 m1_m11
44403 m3_m4_m6
44406 m5
44409 m3_m7_m4
44423 m2_m5_m3_m6
44423 m1
44504 m1
44513 m7_m11_m12_m13
44513 m3_m9_m4_m5_m6
44521 m2_m8_m9
44679 m4_m5_m8
44690 m5_m6
44719 m3_m5
44733 m4
44734 m4
44735 m1_m2_m7_m3_m8
44735 m6
44736 m1
44738 m4_m3
44744 m4
44778 m2
44788 m3
44798 m1_m2_m3_m5
44804 m5_m6
44806 m4_m5
44808 m4_m5
44809 m5
44809 m1
44811 m7_m8
44825 m2_m3_m7
44830 m1_m2_m3_m6_m4_m7
44837 m1_m2_m4_m3_m5
44838 m1_m2_m3
44841 m5_m7_m6
44853 m1_m2_m4_m3_m5
44861 m1_m2_m4_m3_m5
44865 m1_m2_m4_m3_m5
44868 m1_m2_m4_m3_m5
44872 m4_m5
44909 m3_m7
44910 m2_m3
44924 m2
44953 m3
44953 m4
44953 m2
44960 m4
44960 m3
44963 m2_m3_m5_m4
44982 m4_m5
44988 m4


 53%|█████▎    | 49879/93834 [00:03<00:02, 16909.70it/s]

47602 m1
47604 m1
47605 m3
47606 m3
47607 m3
47612 m2_m3_m4
47612 m6
47613 m5_m6
47664 m1
47665 m1
47666 m1
47667 m1
47668 m1
47669 m2
47677 m1
47715 m5
47715 m4
47718 m4
47720 m4
47722 m4
47722 m5
47761 m4
47761 m5
47804 m5
47812 m2
47819 m5_m7
47824 m4
47826 m5
47826 m4
47833 m2_m3
47834 m2_m3
47847 m3
47852 m0_m7
47852 m2
47853 m1
47876 m3_m4
47888 m4
47902 m1
47924 m2_m3
47925 m6
47927 m2_m3
47928 m6
47929 m1
47931 m4
47933 m4
47934 m7
47940 m6
47941 m1
47942 m4
47947 m4
47950 m3
47953 m1
47961 m5
47963 m4
48021 m3
48025 m2_m3_m6
48027 m3
48044 m3
48066 m1_m2_m5_m3_m6
48072 m4
48078 m4
48094 m1_m2_m6
48113 m4
48139 m1_m2_m5_m3_m6
48165 m1_m2_m3_m5
48205 m4
48210 m5
48212 m5
48214 m6
48215 m6
48216 m6
48216 m5_m9
48220 m5
48222 m3
48223 m6
48227 m4
48229 m4
48242 m5
48245 m4
48248 m5
48250 m5
48261 m4_m8
48269 m5
48272 m4
48312 m5
48327 m3
48328 m3
48344 m8_m9
48348 m3
48353 m6_m9
48359 m6
48377 m4_m5_m6
48415 m3
48439 m2
48456 m2_m3
48480 m3
48486 m4
48486 m3
48502 m2_m3
48507 m2_m

 57%|█████▋    | 53562/93834 [00:03<00:02, 17301.17it/s]


51713 m4
51754 m8
51774 m7_m2_m8
51797 m4_m8
51819 m1_m2_m7_m3_m8
51832 m2
51833 m5
51841 m1
51845 m3
51847 m4
51848 m2
51898 m5
51906 m4
51911 m4
51960 m1
51960 m7
51961 m1
51970 m4_m5
51971 m3_m4_m5
51981 m2
51984 m8
51986 m5
51991 m3
51992 m3
51994 m1
51995 m3
51997 m8
52003 m6
52006 m4
52007 m4
52022 m2
52030 m5
52030 m1
52044 m4
52055 m5
52060 m4
52071 m5
52083 m1_m2_m6_m3_m4_m5
52109 m2
52110 m5
52111 m2_m3_m4
52117 m5
52117 m1
52126 m4
52136 m4_m5
52202 m3
52209 m6
52215 m5
52223 m5
52236 m5
52242 m5
52246 m5
52257 m5
52260 m4
52264 m3
52268 m3_m4_m5
52272 m3_m4_m5
52278 m4
52288 m3_m5
52332 m3
52337 m4
52342 m2
52425 m1_m2_m5
52440 m3
52499 m2_m3_m4_m5
52538 m1
52541 m4
52555 m0_m1
52566 m3_m7
52572 m3
52573 m1
52594 m2
52606 m3
52607 m3
52609 m4
52645 m2
52698 m3
52699 m3
52700 m3
52701 m3
52702 m4
52703 m4
52705 m3
52705 m2
52734 m6_m7_m9
52734 m1
52745 m1
52746 m4
52746 m6
52750 m1
52751 m4_m5
52776 m1_m5_m2
52784 m1_m2_m3_m4
52785 m4_m6
52786 m4_m6
52787 m2_m3_m5_m4
52790 

 59%|█████▉    | 55302/93834 [00:03<00:02, 13608.78it/s]

m1
54406 m1
54408 m1
54410 m1
54412 m1
54413 m1
54413 m3
54415 m1
54417 m1
54418 m3
54419 m1
54420 m1
54421 m1
54422 m1
54423 m1
54425 m1
54427 m1
54429 m1
54433 m1
54436 m1
54439 m2
54441 m2
54444 m1
54447 m1
54447 m3
54450 m1
54452 m1
54453 m1
54459 m6_m1_m2_m3_m4_m5
54465 m6
54467 m4
54471 m7
54472 m6
54481 m2_m3_m6
54504 m4_m5_m7
54504 m2_m3_m8
54508 m4_m5_m7
54523 m2_m3
54528 m5
54531 m3_m4_m5_m9
54540 m1
54541 m1
54541 m5
54564 m4
54571 m3
54592 m4_m5
54597 m3
54607 m4
54608 m4
54610 m5
54638 m3
54652 m5_m4_m0
54687 m5_m6
54691 m1_m5_m2_m6
54694 m1_m4
54732 m4
54735 m5
54736 m2
54736 m6
54736 m4
54737 m2
54776 m6_m4
54777 m3
54778 m5
54779 m1_m2_m3
54786 m3
54788 m2
54793 m1
54795 m5_m6
54800 m2
54803 m3_m4_m8
54803 m6
54804 m3
54807 m6
54809 m4
54814 m3
54826 m2_m7
54841 m7
54844 m5_m11
54857 m4_m5
54859 m3
54863 m4
54863 m3
54865 m3
54865 m4
54867 m4
54868 m2
54882 m5_m6
54885 m7
54885 m3
54887 m1_m2_m3_m5
54891 m3_m7_m4_m5
54900 m1_m2_m3_m4_m5_m6
54905 m3_m4
54906 m2_m3_m6
549

 61%|██████    | 56786/93834 [00:03<00:03, 11471.71it/s]


55899 m3
55899 m4
55906 m4
55943 m3
55945 m4_m6
55958 m3
55972 m3
55994 m3
55999 m1
56022 m3
56065 m4
56085 m5_m8
56101 m2
56102 m2
56130 m1
56137 m2_m3
56138 m2_m3
56150 m3
56156 m4_m5
56157 m4
56167 m7
56167 m6
56173 m4
56181 m5
56182 m5
56182 m7
56184 m3_m4_m5
56184 m6
56189 m3_m4
56204 m4
56207 m3_m5
56208 m3
56212 m1_m2_m4
56214 m1_m2_m4
56220 m2_m3_m4_m6
56226 m3_m4_m5_m6
56227 m4_m5_m6_m8
56230 m4_m5
56232 m6
56256 m6_m1_m2_m3
56284 m3
56286 m12
56286 m10
56287 m6
56333 m7
56349 m6_m7_m9
56349 m4_m5_m10
56360 m4_m5_m6
56369 m2
56530 m4_m5_m6
56530 m3
56531 m3
56532 m3
56533 m3
56534 m2
56540 m6
56540 m5
56569 m4
56589 m1_m2_m3_m4_m5
56591 m1_m6_m2_m3_m7
56620 m2
56658 m5
56661 m5
56670 m2
56683 m4
56691 m1_m2_m6
56691 m4_m5
56694 m4
56695 m2
56696 m2
56697 m2
56698 m2
56700 m4
56702 m2
56703 m4
56704 m2
56705 m2
56706 m2
56707 m6
56707 m7
56707 m2
56707 m9
56707 m4
56707 m1
56707 m3
56765 m1_m4_m2
56768 m4_m5_m6
56768 m7
56769 m5_m7
56770 m2
56770 m1
56770 m6
56773 m1
56774 m1


 66%|██████▌   | 61756/93834 [00:04<00:02, 11201.98it/s]

57985 m6
57986 m3
57987 m5
57989 m3
57994 m3
57996 m3
57998 m3
57999 m5
58001 m3
58063 m3_m4_m8
58065 m1
58136 m0_m5_m6
58141 m9
58141 m10
58141 m1
58141 m2
58142 m6
58142 m7
58142 m8
58143 m3
58183 m1_m2_m3_m6
58211 m5
58211 m1_m2_m6_m3
58214 m3
58249 m3
58261 m5
58286 m5
58300 m3
58308 m5
58321 m1_m2_m6
58321 m5
58363 m3
58505 m3
58511 m1_m2_m4_m5
58513 m1_m2_m7_m3_m4
58514 m3_m9
58516 m5
58535 m3
58536 m3
58537 m3
58538 m3
58539 m3
58540 m3
58541 m3
58542 m3
58543 m3
58544 m3
58545 m3
58546 m3
58547 m3
58548 m4
58549 m3_m4_m5
58568 m5
58568 m6
58568 m9
58570 m10
58570 m9
58570 m6
58570 m5
58572 m3_m4_m8
58576 m3
58579 m4_m7_m10
58579 m3
58584 m4
58584 m5_m6_m9
58614 m4
58615 m6
58616 m4
58617 m4
58620 m6
58622 m4
58623 m4
58674 m4
58677 m4
58692 m1_m4_m2
58694 m4
58707 m3
58721 m3_m4
58723 m5
58723 m1_m2_m3_m4
58725 m3
58741 m4
58745 m3
58746 m5
58751 m4
58753 m4
58753 m5
58826 m4
58837 m4
58837 m3
58850 m6
58850 m7
58851 m6
58851 m2_m3_m4
58853 m3
58899 m4
58944 m1_m2_m3_m4
58969 m

 70%|███████   | 65848/93834 [00:04<00:01, 14964.97it/s]

 m4
61812 m5
61812 m3
61813 m6_m9
61814 m5_m8
61842 m3
61848 m3
61852 m3
61853 m4
61854 m4
61856 m4
61858 m5
61861 m3
61870 m5
61871 m3
61873 m7
61873 m5
61887 m6
61894 m6_m2_m3_m4
61895 m2_m3_m7
61897 m2
61915 m1_m4_m2
61930 m6
61976 m6
61992 m3
61994 m4
61995 m4
62012 m3
62023 m5
62027 m1
62027 m7
62028 m2
62030 m3
62071 m4
62075 m3
62081 m5
62083 m4_m5
62096 m5
62108 m4
62112 m5
62159 m4_m9_m5_m6
62186 m2_m3
62186 m4
62189 m2
62192 m0_m4
62204 m3
62220 m1_m2_m5_m3
62243 m3
62255 m0_m4
62268 m2_m3
62269 m5
62273 m3_m4
62273 m5
62291 m4
62293 m5
62301 m2_m3
62313 m6
62332 m5
62332 m4
62332 m8
62332 m1_m2
62332 m3
62339 m4
62340 m4
62341 m4
62342 m4
62354 m6
62354 m5
62354 m4
62355 m2_m3
62366 m2
62385 m5
62385 m4
62386 m3
62405 m3_m5
62408 m5
62408 m4
62409 m3
62424 m3
62426 m1
62430 m4
62481 m4
62510 m2
62558 m5
62624 m1_m2_m3_m5
62625 m1_m2_m3_m4_m6
62627 m4
62628 m3
62629 m4
62630 m6
62637 m5_m7
62640 m2
62682 m2
62709 m5
62735 m1_m2_m3_m4
62740 m1_m2_m3_m4
62762 m4
62771 m4
62784 

 m1_m2_m3
66133 m4
66144 m4
66151 m4
66195 m1_m2_m3
66286 m4
66328 m3
66331 m2
66333 m3
66335 m2
66336 m5
66337 m4
66337 m3
66337 m6
66338 m2
66338 m5
66338 m6
66339 m6
66339 m2
66339 m5
66340 m3
66341 m2
66341 m4
66342 m4
66342 m3
66343 m3
66345 m3
66351 m5
66351 m2
66352 m5
66352 m4
66353 m6
66353 m5
66353 m2
66356 m4
66357 m4
66361 m4
66377 m3_m7
66378 m5
66378 m3_m8
66388 m4
66417 m3
66419 m3
66434 m5
66456 m5
66471 m3
66474 m3
66490 m4_m5_m8_m6
66555 m5
66555 m4
66556 m3_m5_m6
66561 m4
66563 m4
66567 m4
66570 m4
66571 m3
66572 m3
66573 m4
66586 m4
66592 m6
66597 m4
66601 m1_m2_m3
66604 m1_m2
66609 m5
66611 m4
66613 m1
66614 m1
66617 m5
66617 m1
66618 m7
66621 m5
66621 m6
66631 m4_m5
66634 m3_m4
66663 m2
66689 m3
66689 m6_m7_m8
66691 m2
66758 m4_m5
66759 m5_m6
66759 m2
66773 m1_m2_m3
66775 m5_m6
66778 m2
66779 m3
66780 m4
66782 m2
66784 m5_m6
66784 m3
66785 m5
66785 m4
66787 m2
66787 m4_m5
66788 m4
66796 m5_m6
66798 m5
66801 m3
66826 m4
66828 m4
66829 m4
66835 m4
66836 m5
66837 m1_

 m3
70094 m2
70099 m3
70099 m1
70145 m6
70146 m5
70251 m4
70252 m3_m4
70269 m2
70269 m3
70270 m4
70290 m4
70294 m1_m2_m7_m3_m4_m8
70295 m1
70296 m4
70296 m6_m1_m2_m3_m5
70301 m4
70303 m1_m2_m3_m4_m5_m8
70310 m6
70310 m7
70310 m8
70315 m3_m4_m6
70318 m1_m2_m3_m4_m7
70330 m1_m2_m3_m4_m6
70344 m5
70392 m1_m2_m6_m3_m4_m5
70429 m6_m1
70438 m4
70480 m4
70492 m1_m2_m3
70494 m1_m2_m3
70495 m1_m2_m3_m4
70497 m1_m2_m3
70522 m4
70525 m2_m3
70563 m2
70573 m3_m4_m6
70589 m2
70596 m2
70610 m4
70621 m3
70624 m3
70630 m5
70632 m5
70635 m4
70636 m4
70638 m4
70640 m3
70641 m3
70642 m3
70645 m3
70646 m3
70647 m2_m3
70649 m5
70657 m3
70668 m4
70670 m4
70707 m4_m6
70711 m4_m6
70714 m4_m6
70756 m6_m9
70757 m4
70758 m4
70765 m4
70766 m4
70770 m6
70790 m5_m8
70799 m5
70801 m4
70832 m4_m5_m11
70832 m8_m9
70836 m3
70893 m2
70898 m2
70901 m2
70909 m6
70914 m1_m2_m3_m4
70914 m6
70914 m5
70916 m6
70916 m1_m2_m3_m4
70916 m5
70919 m4
70924 m5
70992 m4
71007 m1_m2_m5_m3_m6
71009 m5
71019 m5
71022 m4
71031 m3
71034 m3

 m4
73341 m6
73352 m3
73357 m5
73362 m5
73370 m5
73376 m8
73377 m3
73379 m3
73380 m5
73380 m3
73388 m1_m3_m2_m4
73390 m7
73396 m2
73396 m4
73401 m4
73404 m7
73414 m3
73415 m2
73416 m2
73418 m1_m2_m4_m3_m5_m6
73425 m4
73428 m2_m3_m4_m5_m7_m6
73435 m6
73435 m5
73448 m3_m4
73459 m4_m5
73464 m1_m2_m3_m5
73471 m1_m2_m3_m5
73486 m6_m7
73486 m1
73562 m4
73571 m3_m4_m6
73625 m6
73629 m6
73630 m5
73633 m5
73638 m3
73645 m4
73651 m3
73654 m3_m4
73655 m7_m6_m0
73662 m8_m7_m0
73664 m4
73665 m2
73666 m9_m6_m0
73670 m1_m2_m9
73673 m1_m2_m5
73674 m1_m2_m7
73684 m1_m7
73690 m1_m8
73690 m4
73703 m2_m5
73704 m6
73708 m0_m1_m3
73709 m1_m5_m2
73711 m13
73711 m6_m7_m11_m12_m8_m9
73724 m3
73724 m2
73756 m2
73756 m4
73760 m2
73761 m2
73762 m2
73763 m2
73764 m2
73765 m2
73766 m2
73767 m3
73768 m3
73769 m3
73770 m2
73780 m2
73784 m4
73785 m4
73793 m4
73829 m4
73836 m3_m4
73837 m3_m5
73852 m4_m5_m6_m8
73866 m4
73869 m12_m2_m3_m4
73870 m12_m2_m3_m4
73871 m12_m2_m3_m4
73889 m4_m3
73897 m2
73900 m1
73902 m4_m3
739

 m4
76072 m5
76083 m3
76092 m3
76092 m5
76106 m0_m3
76115 m7
76116 m2_m3
76121 m4
76183 m3
76186 m3
76217 m3
76222 m3
76225 m3
76226 m2_m3_m5
76228 m3
76230 m4
76232 m5
76232 m6_m0_m1
76235 m6_m0_m1
76235 m5
76287 m4
76289 m5
76294 m3
76354 m5
76374 m4
76374 m2
76375 m1_m5
76376 m1_m5
76377 m1_m2_m3
76390 m1_m2_m3_m4_m6
76394 m4_m5
76395 m4_m5
76406 m1
76434 m3
76440 m4
76459 m6
76468 m6
76472 m4
76473 m2
76511 m2_m3
76520 m2_m3
76575 m3
76589 m4
76589 m1
76591 m2
76629 m4
76634 m3_m4
76699 m4_m5
76718 m3
76740 m6
76770 m1_m2_m4
76771 m1_m2_m4_m3_m5
76773 m6_m7_m8
76773 m5_m12
76802 m5_m6
76815 m4_m5_m6
76830 m4
76830 m2_m3
76836 m4
76865 m4_m5
76865 m2_m3_m8
76874 m2_m3_m6
76915 m3
76920 m4_m11
76920 m7_m8
76923 m1_m2_m11_m3_m8
76924 m1_m2_m11_m3_m8
76934 m4
76942 m1_m2_m11_m3_m8
76947 m3
76983 m2_m3_m4
77015 m1_m2_m9
77120 m3
77202 m0_m1_m2_m3
77213 m4
77223 m3
77230 m7
77230 m6
77231 m4
77242 m5
77248 m6
77255 m5
77262 m2
77272 m4
77291 m7_m6
77293 m7_m6
77313 m6_m7
77318 m1_m2_m6_m


80357 m1
80400 m2
80430 m3_m4_m7
80430 m5
80431 m3_m4
80447 m5
80470 m4
80470 m1_m5
80479 m3
80498 m1_m2_m3_m4_m7
80513 m4
80516 m5
80520 m5
80521 m2
80523 m5
80530 m5
80530 m4
80534 m5
80535 m3
80537 m4
80540 m2
80543 m4
80544 m4
80545 m4
80546 m4
80547 m4
80553 m1
80553 m3
80560 m6
80562 m1_m4_m2
80573 m2
80575 m4
80577 m5
80578 m7
80579 m3
80579 m2
80580 m7
80580 m5
80583 m6
80583 m4
80583 m5
80584 m6
80584 m4
80584 m5
80585 m5
80585 m4
80585 m6
80586 m5
80586 m4
80586 m3
80587 m4
80587 m6
80587 m5
80588 m5
80588 m4
80589 m3
80589 m4
80589 m2
80590 m6
80590 m4
80590 m5
80594 m5
80594 m3
80594 m4
80595 m5
80595 m4
80595 m3
80596 m4
80596 m5
80596 m6
80608 m6
80608 m4
80608 m5
80608 m3
80612 m2
80700 m5
80701 m5
80704 m3
80705 m2
80715 m4_m5_m6
80727 m1_m2_m3_m4
80751 m2
80751 m3
80752 m2
80772 m5_m8
80774 m0_m5_m4
80777 m2
80785 m2
80789 m4
80790 m2
80793 m3
80809 m2_m3_m7_m4_m5_m8
80814 m4
80824 m1_m2_m4_m3
80826 m1_m2_m5
80832 m1_m2_m3_m5
80841 m2
80913 m1_m6_m5_m2_m3
80913 m4
809

 m2
83699 m4
83717 m3_m4
83726 m5_m6
83737 m3_m5
83739 m6_m7
83742 m5_m6
83777 m3
83777 m1_m6_m2_m7
83880 m4
83882 m3
83896 m3
83898 m4
83902 m3_m4
83906 m3
83931 m2
83931 m1
83932 m1
83932 m2
83933 m3
83934 m3
83935 m3
83938 m5
83938 m6
83939 m5
83948 m6_m8
83972 m4_m8_m5
83972 m2_m7_m3
83973 m2
83977 m1_m2_m5
83979 m4
83979 m3
83985 m4_m8
83996 m2_m3_m5
84010 m4
84017 m4
84033 m3
84033 m7
84034 m3
84037 m7
84037 m3
84038 m2
84038 m5
84038 m4
84046 m2_m3_m5
84066 m4
84068 m5
84072 m4
84092 m5
84092 m1_m2_m4
84099 m4_m10
84101 m7
84101 m8
84109 m1_m2_m5_m3_m6
84127 m1_m2_m5_m3_m6
84133 m4
84139 m4
84155 m1_m2_m7
84174 m4
84200 m1_m2_m5_m3_m6
84214 m4
84227 m3_m6_m7_m4
84229 m1_m2_m3_m5
84283 m5
84291 m5
84304 m5
84310 m5
84317 m5
84329 m5
84342 m5
84345 m5
84358 m6_m4
84367 m4
84372 m4
84380 m4_m5
84403 m3
84405 m3
84504 m1
84506 m1
84536 m9_m10
84553 m1_m2_m3
84556 m6
84573 m1_m2_m3
84574 m1_m2_m3
84576 m1_m2_m3
84577 m1_m2
84593 m1_m5
84606 m1_m2_m3
84607 m1_m2_m3
84610 m1_m2_m3_m4_m

 m2_m3
87661 m2_m3
87663 m2_m3
87664 m2_m3
87671 m2_m3
87678 m6
87681 m1
87682 m1
87682 m6
87683 m1
87685 m2
87686 m2
87711 m5
87749 m3
87754 m3
87755 m3
87785 m1_m2_m3_m11_m4_m5_m10_m9
87785 m6_m7
87787 m4
87787 m6
87794 m6
87817 m1_m2_m4
87819 m1_m2_m3_m5
87843 m4
87845 m10_m4_m5
87847 m10_m4_m5
87851 m6
87916 m3_m7_m4_m5_m6_m9
87922 m4
87937 m4
87960 m2
87961 m2
87962 m2
87964 m2
87965 m2
87966 m2
87967 m1
87970 m1_m2_m3_m4_m5
87996 m7_m5
88058 m2_m3_m4_m5_m10
88073 m2_m3
88118 m5
88123 m5
88128 m4
88148 m5_m0
88155 m1_m4_m2_m3_m5
88157 m2_m6
88170 m3
88194 m3
88213 m5
88213 m6
88220 m3
88221 m3
88222 m3
88223 m3
88227 m2_m3
88228 m3
88236 m3
88288 m3_m10
88288 m4_m5_m6
88288 m7
88297 m3
88314 m3_m4_m7
88318 m4
88329 m6
88330 m4_m5
88333 m1
88357 m4
88369 m1
88385 m3_m4_m6
88412 m4
88426 m1_m2_m4_m3_m5
88435 m1_m2_m4_m3_m5
88456 m1_m2_m5_m3_m6
88461 m2_m3_m5
88462 m2_m3_m10
88462 m5_m6_m9
88464 m8
88465 m1_m2_m5
88468 m1_m2_m3_m4_m6_m7
88469 m3
88477 m3
88480 m3
88481 m2_m3_m5
88482

100%|██████████| 93834/93834 [00:06<00:00, 14978.73it/s]

m5
91222 m5
91226 m0_m4
91229 m3
91245 m1_m4_m2
91248 m3_m4
91249 m2
91251 m3
91257 m2_m3_m7_m4
91264 m2_m3_m4
91268 m2_m3_m4
91276 m3
91279 m3
91281 m3
91283 m3
91284 m3_m8_m4
91302 m1_m2_m6
91331 m3_m4_m5_m6
91353 m5
91353 m2
91366 m4
91431 m4
91432 m4
91434 m4
91435 m3
91436 m4
91467 m4
91468 m4
91470 m4
91471 m3
91473 m4
91499 m3
91506 m2_m3_m5_m4
91509 m5
91510 m6
91511 m1
91565 m1_m6
91567 m1_m6
91567 m3_m4
91569 m3_m4
91569 m1_m6
91571 m1_m2_m4_m3_m5
91572 m3_m4
91572 m1_m5
91576 m6
91580 m6
91581 m5
91584 m5
91589 m3
91596 m4
91602 m1_m2_m6
91602 m3
91606 m7_m6_m0
91613 m8_m7_m0
91615 m4
91616 m2
91617 m9_m6_m0
91623 m1_m2_m5
91624 m1_m2_m7
91634 m1_m7
91640 m4
91640 m1_m8
91653 m2_m5
91654 m6
91658 m0_m1_m3
91659 m1_m5_m2
91661 m13
91661 m6_m7_m11_m12_m8_m9
91674 m2
91674 m3
91723 m4
91724 m3
91738 m1_m2_m3
91755 m4
91760 m1_m2
91768 m6_m7_m9
91768 m4
91794 m2_m6_m3
91810 m4
91814 m4
91818 m4
91821 m4
91823 m4
91835 m1
91836 m1
91837 m1
91838 m1
91841 m3_m4
91841 m6
91843 m6
9




In [36]:
for i in products_all:
    if i:
        print(i)

In [3]:
data.name

'uspto-grants-2016'

In [4]:
# List the top-level fields of the Dataset message
res = [field.name for field in data.DESCRIPTOR.fields]
res
# data.name gives the dataset name, which contains the year, e.g. 'uspto-grants-2016'
# data.dataset_id gives the ORD file name, e.g. 'ord_dataset-026684a62f91469db49c7767d16c39fb'

['name', 'description', 'reactions', 'reaction_ids', 'dataset_id']
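Since `data.name` carries the year, here is a minimal sketch for pulling it out as an integer. It assumes every file follows the `uspto-grants-YYYY` pattern; only one example name is shown here, so that pattern is an assumption. `year_from_dataset_name` is a hypothetical helper.

```python
import re


def year_from_dataset_name(name):
    """Return the trailing four-digit year of a dataset name, or None if there is none."""
    # Assumes the 'uspto-grants-YYYY' naming seen above; other datasets may differ.
    match = re.search(r"(\d{4})$", name)
    return int(match.group(1)) if match else None


# With the dataset loaded earlier in the notebook: year_from_dataset_name(data.name)
print(year_from_dataset_name("uspto-grants-2016"))  # 2016
```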

In [7]:
1+1

2

## Tests: Figure out how to access the info I need in the dataset file

In [62]:
# SMILES of the first product of reaction 3 (identifiers[1] happens to hold the SMILES here)
data.reactions[3].outcomes[0].products[0].identifiers[1].value

'NC=1C(C=C(C(C1)=N)OCCCC)=NC1=CC=C(C=C1)N(CCO)CCO'
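`identifiers[1]` relies on the SMILES sitting in the second position of a repeated field. The sketch below picks the identifier by its type instead, on the assumption that the order of `identifiers` is not guaranteed. `identifier_value` is a hypothetical helper. The enum path `reaction_pb2.CompoundIdentifier.SMILES` in the commented usage is my assumption about `ord_schema` and does not appear in this notebook. `SimpleNamespace` objects stand in for the protobuf messages, so the sketch runs without `ord_schema`.

```python
from types import SimpleNamespace


def identifier_value(identifiers, wanted_type):
    """Return the value of the first identifier whose .type equals wanted_type, else None."""
    for ident in identifiers:
        if ident.type == wanted_type:
            return ident.value
    return None


# Usage with ORD (enum path assumed, not verified here):
# from ord_schema.proto import reaction_pb2
# product = data.reactions[3].outcomes[0].products[0]
# identifier_value(product.identifiers, reaction_pb2.CompoundIdentifier.SMILES)

# Stand-in objects so the sketch runs without ord_schema
fake = [SimpleNamespace(type="NAME", value="aniline"),
        SimpleNamespace(type="SMILES", value="Nc1ccccc1")]
print(identifier_value(fake, "SMILES"))  # Nc1ccccc1
```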

In [70]:
# reaction 91434, input 'm4' (one of the entries printed above)
data.reactions[91434].inputs['m4']

components {
  identifiers {
    type: NAME
    value: "Hexanes EtOAc"
  }
  amount {
    moles {
      value: 0.0
      precision: 1.0
      units: MOLE
    }
  }
  reaction_role: REACTANT
}

In [None]:
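# Structure of an ORD Reaction message (main fields used here):
# inputs -> m1, m2, m3 ... (each holding a list of components)
# conditions -> temperature, ...
# notes
# workups
# outcomes -> products (with measurements such as yield)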
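The sketch below follows the structure above and flattens one reaction into a `reactants>>products` string, which is much smaller than the full ORD record. It makes three assumptions. Every compound in `inputs` goes on the left, including solvents and catalysts, because there is no filtering on `reaction_role`. Compounds without a SMILES identifier are skipped. The SMILES enum value is passed in rather than hard-coded. `reaction_smiles` is a hypothetical helper, and the test uses stand-in objects with the same attribute layout instead of real ORD messages.

```python
from types import SimpleNamespace


def reaction_smiles(rxn, smiles_type):
    """Build 'reactants>>products' from an ORD-style Reaction object.

    rxn.inputs maps keys such as 'm1', 'm2' to inputs that hold .components;
    rxn.outcomes is a list whose items hold .products. For each compound the
    first identifier whose .type equals smiles_type is used; compounds without
    one are skipped. All inputs land on the left, whatever their reaction_role.
    """
    def smiles_of(compound):
        return next((i.value for i in compound.identifiers if i.type == smiles_type), None)

    left = [smiles_of(c) for inp in rxn.inputs.values() for c in inp.components]
    right = [smiles_of(p) for out in rxn.outcomes for p in out.products]
    return ".".join(s for s in left if s) + ">>" + ".".join(s for s in right if s)


# Stand-in objects with the same attribute layout as the ORD messages
def compound(*pairs):
    return SimpleNamespace(identifiers=[SimpleNamespace(type=t, value=v) for t, v in pairs])


rxn_demo = SimpleNamespace(
    inputs={"m1": SimpleNamespace(components=[compound(("SMILES", "OCCBr"))]),
            "m2": SimpleNamespace(components=[compound(("SMILES", "CCS(=O)(=O)Cl"))]),
            "m3": SimpleNamespace(components=[compound(("NAME", "triethylamine"))])},
    outcomes=[SimpleNamespace(products=[compound(("NAME", "product"),
                                                 ("SMILES", "CCS(=O)(=O)OCCBr"))])],
)
print(reaction_smiles(rxn_demo, "SMILES"))  # OCCBr.CCS(=O)(=O)Cl>>CCS(=O)(=O)OCCBr

# With ORD (enum path assumed): reaction_smiles(rxn, reaction_pb2.CompoundIdentifier.SMILES)
```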


In [24]:
rxn.inputs['m1'].components[0].reaction_role

1

In [34]:
rxn.inputs['m1'].components[1].identifiers[0].value

'ethyl 4-bromo-1-(3,4-difluorophenyl)-1H-pyrazole-3-carboxylate'

In [6]:
rxn.conditions.temperature.control.type

2
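The two cells above return bare integers (`1`, `2`) because protobuf stores enum fields as numbers. The sketch below looks the number up in the message's own descriptor to get the symbolic name. It uses only the standard protobuf descriptor API (`DESCRIPTOR.fields_by_name`, `enum_type`, `values_by_number`), so it should also work on ORD messages, though I have not checked which names those two numbers map to. `enum_name` is a hypothetical helper. The demo uses `FieldDescriptorProto`, which ships with protobuf, so it runs without `ord_schema`.

```python
from google.protobuf import descriptor_pb2


def enum_name(message, field_name):
    """Return the symbolic name of the enum value stored in message.field_name."""
    field = message.DESCRIPTOR.fields_by_name[field_name]
    number = getattr(message, field_name)
    try:
        return field.enum_type.values_by_number[number].name
    except KeyError:
        # proto3 enums may hold numbers the descriptor does not know about
        return str(number)


# On the ORD objects above:
# enum_name(rxn.inputs['m1'].components[0], 'reaction_role')
# enum_name(rxn.conditions.temperature.control, 'type')

demo = descriptor_pb2.FieldDescriptorProto(type=descriptor_pb2.FieldDescriptorProto.TYPE_STRING)
print(enum_name(demo, "type"))  # TYPE_STRING
```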

# Preprocessing of USPTO - Molecular AI

In [1]:
# Running code from:
# https://molecularai.github.io/reaction_utils/uspto.html

# Run from within the folder containing all the USPTO data.

# First run:
# conda activate <rxnutilities>
# python -m rxnutils.data.uspto.preparation_pipeline run --nbatches 200 --max-workers 8 --max-num-splits 200

# Then I was supposed to run the atom-mapping step:
# conda activate rxnmapper
# python -m rxnutils.data.mapping_pipeline run --data-prefix uspto --nbatches 200 --max-workers 8 --max-num-splits 200
# That didn't work, so I only ran the first part.
# Even after I replaced the delimiter (from tab to comma) it still failed, so I gave up on this step.

## Read in data cleaned by rxn utils

In [1]:
import pandas as pd

In [2]:
# The file is tab-separated despite the .csv extension
cleaned_USPTO = pd.read_csv('/Users/dsw46/USPTO_data/uspto_data_cleaned.csv', sep='\t')
cleaned_USPTO.shape

(3740596, 7)

In [3]:
cleaned_USPTO['ReactionSmilesClean'][0]

'OCCBr.CCS(=O)(=O)Cl.CCOCC.CCN(CC)CC>>CCS(=O)(=O)OCCBr'
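The cleaned reactions are reaction SMILES of the form `reactants>agents>products`. In the example the agents part is empty, hence `>>`. The sketch below splits one string into its three parts, assuming each part is a `.`-separated list of molecules. `.` also separates the ions of a salt written as one component, so those come back as separate entries. Any text after a space, such as a CXSMILES extension, is dropped; this is an assumption about the format. `split_reaction_smiles` is a hypothetical helper.

```python
def split_reaction_smiles(rxn_smiles):
    """Split 'reactants>agents>products' into three lists of '.'-separated SMILES."""
    core = rxn_smiles.split(" ")[0]  # drop anything after a space, e.g. a CXSMILES extension
    parts = core.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn_smiles!r}")
    # An empty part (like the agents in 'A.B>>C') becomes an empty list
    return tuple(part.split(".") if part else [] for part in parts)


reactants, agents, products = split_reaction_smiles(
    "OCCBr.CCS(=O)(=O)Cl.CCOCC.CCN(CC)CC>>CCS(=O)(=O)OCCBr"
)
print(reactants)  # ['OCCBr', 'CCS(=O)(=O)Cl', 'CCOCC', 'CCN(CC)CC']
print(agents)     # []
print(products)   # ['CCS(=O)(=O)OCCBr']

# Whole column at once with pandas: one column each for reactants, agents, products
# parts = cleaned_USPTO['ReactionSmilesClean'].str.split('>', expand=True)
```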

In [4]:
#full_USPTO = pd.read_csv('/Users/dsw46/USPTO_data/uspto_data.csv', sep='\t')
#full_USPTO.shape
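The full-file read is commented out, presumably because of memory. The preface describes the same problem with the ORD version. The sketch below streams a tab-separated file in chunks with `pd.read_csv(..., chunksize=...)`, so only one chunk is in memory at a time; `usecols` can cut the parsed columns further. Counting rows is just a placeholder for whatever per-chunk processing is needed, and `count_rows_in_chunks` is a hypothetical helper.

```python
import io

import pandas as pd


def count_rows_in_chunks(path_or_buffer, chunksize=100_000, usecols=None):
    """Stream a tab-separated file chunk by chunk and return the total row count."""
    total = 0
    # With chunksize set, read_csv yields DataFrames one chunk at a time
    for chunk in pd.read_csv(path_or_buffer, sep="\t", chunksize=chunksize, usecols=usecols):
        total += len(chunk)  # replace with real per-chunk work (filtering, saving, ...)
    return total


# e.g. count_rows_in_chunks('/Users/dsw46/USPTO_data/uspto_data.csv')
demo = io.StringIO("a\tb\n1\t2\n3\t4\n5\t6\n")
print(count_rows_in_chunks(demo, chunksize=2))  # 3
```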