<h2>A Data Driven Approach to Predicting Chemical Reaction Kinetics: Data Preparation</h2>
<br>
Maneet Goyal<sup>1</sup>, Keren Zhang<sup>2</sup>
<br>
<i><sup>1</sup>School of Civil and Environmental Engineering, <sup>2</sup>School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA</i>
<hr>

In [1]:
# Displaying Old Proposal PDF
from IPython.display import IFrame
IFrame("Proposals/OldProposal.pdf", width=1000, height=500)

<h3>Response to Proposal Review</h3>

<u>Reviewer Comments: </u>The proposed project is certainly related to chemical engineering, and the dataset is far too large and complex for traditional tools, so this is an appropriate project idea. The fact that the team has already scraped the data is commendable, as is the idea of ultimately delivering a Python package. However, the goals are very vague, and should be significantly clarified. I suggest making major revisions to the proposal before moving forward with the project. Specifically, it is not clear to me <u>what the inputs/outputs of the proposed algorithm will be</u>, and <u>how this relates to existing work</u>. 

The proposal indicates that half of “goal 1” is already achieved, so I would focus on the “feature vector” part of this goal. The process of identifying feature vectors for chemical reactions is far from trivial. The proposal glosses over this complexity, but determining <u>an appropriate representation for the molecular inputs</u> may be the most challenging part of the project. <u>Will the reactants be cross-referenced with PubChem to extract information about the bonds that are formed/broken?</u> <u>How will radical species be represented?</u> <u>How will reactions with 2 reactants/products be compared to reactions with 1 or 3 reactants/products?</u> These questions are critical and should be explored further in the proposal goals.

The scope of the outputs should be narrowed, given the challenges with defining the inputs. I recommend focusing on activation energies, since this implicitly defines “feasibility”. Predicting activation energies is also extremely useful. The authors should review the work of Bill Green at MIT, and the RMG code, which uses some physically-inspired models to predict reaction barriers. This provides a good benchmark for what constitutes “good” performance from a new approach, and a high level of success could be defined as a model that out-performs the existing approach.

<u>Response</u>

<ol>
<li>The input to the model (at prediction time) will be a feature vector describing the reaction under consideration. The outputs will be reaction order and activation energy. The primary focus is on reaction order because we have more data for it: more than 55% of our reaction records have no reported activation energy.
<p>Here's a more complete picture: we have around 28000 reactions, and about 65000 records in total correspond to them. Each record stores the reaction order, activation energy, Arrhenius rate-law constants, reaction temperature, etc., as reported in a particular paper. Around 37000 of these records have no reported activation energy, whereas only around 1000 records are missing the reaction order.</p></li>

<li>We are currently looking into leveraging past work to aid our analysis. We have taken note of Bill Green's work and will refer to it as necessary.</li>

<li>Molecular representation will use a feature vector whose elements include counts of entities (bonds and functional groups) such as: C-H, C=H, C#H, C=C, -OH, -CHO, -COOH, -Cl, -Br, -F, etc. We plan to use around 20-30 such elements, including the reaction temperature.</li>

<li>We have used the ChemSpider Web API to query individual reactants. Around 35% of the reactants weren't found in its database. Of those found, some had multiple structures reported. The reactants that weren't found will be cleaned using OpenRefine and queried again through the same API. We will also look into the PubChem API to query these reactants and extract further elements needed to complete our features. Reactants with multiple reported structures will be reviewed to select the most relevant structure. Here, structure means the SMILES representation. In the end, reactants with fully developed feature vectors will be mapped to their reactions, and only those reactions will be used for training.</li>

<li>Currently, we plan to use a binary element in the feature vector to represent whether the reactant is a radical or not.</li>

<li>The feature vector for a reaction will be formed by concatenating the feature vectors of the reactants involved and appending the normalized reaction temperature. To ensure uniformity, the first half of the reaction's feature vector will correspond to the reactant of higher molecular weight. For 3-reactant reactions, the feature vectors of the 2 lowest-molecular-weight compounds will be summed and then appended to the feature vector of the heaviest compound. If the sum of the molecular weights of these 2 reactants exceeds that of the third compound, the summed feature vector will instead occupy the first half of the reaction feature vector.</li>

</ol>
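
The ordering rules in the last item of the response can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function name `reaction_feature_vector`, the `(molecular_weight, features)` input shape, and the zero-padding used for single-reactant reactions are all assumptions made for the example.

```python
def reaction_feature_vector(reactants, temperature_norm):
    """Build a reaction feature vector from 1 to 3 reactants.

    reactants: list of (molecular_weight, feature_list) pairs (hypothetical shape).
    temperature_norm: reaction temperature, already normalized.
    """
    if not 1 <= len(reactants) <= 3:
        raise ValueError("expected 1 to 3 reactants")

    # Heaviest reactant first; sorted() is stable, so ties keep input order.
    ordered = sorted(reactants, key=lambda r: r[0], reverse=True)

    if len(ordered) == 3:
        heaviest, second, third = ordered
        # Sum the weights and feature vectors of the two lightest reactants.
        merged = (second[0] + third[0],
                  [a + b for a, b in zip(second[1], third[1])])
        # Whichever block is heavier occupies the first half.
        halves = sorted([heaviest, merged], key=lambda r: r[0], reverse=True)
    elif len(ordered) == 2:
        halves = ordered
    else:
        # Assumption: a single-reactant reaction is padded with zeros.
        only = ordered[0]
        halves = [only, (0.0, [0] * len(only[1]))]

    return list(halves[0][1]) + list(halves[1][1]) + [temperature_norm]


# Two reactants: the heavier one (35.5) comes first.
two = reaction_feature_vector([(16.0, [4, 0]), (35.5, [0, 1])], 0.5)

# Three reactants: the two lightest (20 + 15 = 35) outweigh the heaviest (30),
# so their summed vector takes the first half.
three = reaction_feature_vector(
    [(30.0, [1, 0]), (20.0, [0, 1]), (15.0, [1, 1])], 0.5
)
```

Keeping the output length fixed regardless of the number of reactants is what allows 1-, 2-, and 3-reactant reactions to share one model input.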

<h3>Revised Proposal</h3>

In [2]:
# Displaying Revised Proposal PDF
from IPython.display import IFrame
IFrame("Proposals/RevisedProposal.pdf", width=1000, height=500)

<h3>HTML Parsing and Web Scraping</h3>

In [3]:
# Importing our Source File for HTML Parsing
import htmlparser as hp # Our source file. Open the file in a code editor to view the complete implementation.

<h4>Creating an in-memory table from the input HTML file</h4>

In [4]:
myTableCreator = hp.TableCreator("ReactionHTMLFile/NIST Chemical Kinetics Database.html")

----Status----
HTML File read in memory.
----Status----
Soup created out of the HTML file
----Status----
Done


<p style="color:magenta;">*A "soup", in BeautifulSoup terminology, is an alternate representation of the HTML data that makes it easy for us to filter out the required data.</p>
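
To make the idea of filtering parsed HTML concrete, here is a small sketch that pulls link targets out of a table. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies; the `LinkCollector` class and the sample HTML are made up for illustration and are not taken from `htmlparser.py` or the NIST page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Made-up HTML fragment: one reaction link and a record count per row.
sample_html = """
<table>
  <tr><td><a href="http://example.org/rxn?id=1">CH4 + OH</a></td><td>1</td></tr>
  <tr><td><a href="http://example.org/rxn?id=2">C2H6 + Cl</a></td><td>3</td></tr>
</table>
"""

collector = LinkCollector()
collector.feed(sample_html)
links = collector.links
```

BeautifulSoup offers the same kind of access through a searchable tree, which is more convenient when several fields have to be read from each table row.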

<h4>Creating an output ".tsv" file after reading all the reactions</h4>

In [5]:
myTableCreator.extrct_rxn_to_txt("PreliminaryOutput/DemoGenerated/reactions.tsv")

----Status----
All reactions written into the output .tsv file.


<h4>Read HREFs from the Input ".tsv" file path into a Pandas DataFrame</h4>

In [6]:
myRxnExtrator = hp.RxnDetailsExtractor("PreliminaryOutput/DemoGenerated/reactions.tsv")
print("\n----Here's how our dataframe looks----")
print(myRxnExtrator.reactions_df.head(10))
print('\n----No. of Rows---')
print(len(myRxnExtrator.reactions_df.index))

----Status----
Reactions '.tsv' file read into a Pandas DataFrame.

----Here's how our dataframe looks----
                                         Reaction Link  Records
RID                                                            
1    http://kinetics.nist.gov/kinetics/ReactionSear...        1
2    http://kinetics.nist.gov/kinetics/ReactionSear...        1
3    http://kinetics.nist.gov/kinetics/ReactionSear...        1
4    http://kinetics.nist.gov/kinetics/ReactionSear...        1
5    http://kinetics.nist.gov/kinetics/ReactionSear...        1
6    http://kinetics.nist.gov/kinetics/ReactionSear...        1
7    http://kinetics.nist.gov/kinetics/ReactionSear...        1
8    http://kinetics.nist.gov/kinetics/ReactionSear...        1
9    http://kinetics.nist.gov/kinetics/ReactionSear...        1
10   http://kinetics.nist.gov/kinetics/ReactionSear...        1

----No. of Rows---
28983


<h4>Scraping data from the URLs supplied by the DataFrame above into TSV files.</h4>
<h4 style="color: red;">We suggest terminating the code below as soon as you are convinced that it works. On our system, the entire scraping run took around 2 hours.</h4>
<h4 style="color: green;">The records.tsv and ref_reaction.tsv files were generated using BeautifulSoup and are provided with the code. You don't need to regenerate them.</h4>
<h4>Here, records.tsv will contain all the data pertaining to a reaction record, e.g., reaction order, activation energy, temperature, etc. "ref_reaction.tsv", on the other hand, will contain information on reactions whose kinetics were studied with respect to some other reactions. These reactions will not be included in our analysis.</h4>
<p>The code should start running and printing log messages immediately. If it runs for a long time without printing anything, check whether the NIST server is responding by going to http://kinetics.nist.gov/.</p>

In [7]:
# myRxnExtrator.extrct_rec_to_tsv("PreliminaryOutput/DemoGenerated/records.tsv", "PreliminaryOutput/DemoGenerated/ref_reaction.tsv")

<hr>
<h4 style="color: blue;">Beyond this point, we manually dealt with the records<u>.tsv</u> to fix some formatting issues that accompanied the scraping process. The cleaned version of the file is provided in <u>'CleanedData'</u> folder in a <u>.xlsx</u> format.</h4>
<p>If the NIST Database is updated in the future, researchers may need to re-run the scraping process. When cleaning, look for column offsets: because of stray delimiters, non-numeric entries sometimes end up in the reaction order or activation energy columns. Finding and resolving these offsets was the main cleaning task for this dataset.</p>
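
A quick way to locate such offsets is to scan the columns that should be numeric and report every cell that does not parse as a number. The sketch below is one possible check, not the procedure we followed (the cleaning was done by hand); the helper names and the sample rows are hypothetical, while `ReactionOrder` and `ActivationEnergy` are the column names used in our records.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_offset_rows(tsv_text, numeric_columns):
    """List (line_number, column, value) for non-numeric cells in numeric columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    suspicious = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in numeric_columns:
            value = (row.get(col) or "").strip()
            # Empty cells are treated as missing data, not as offsets.
            if value and not is_number(value):
                suspicious.append((line_no, col, value))
    return suspicious


# Made-up sample: the second record has text shifted into ReactionOrder.
sample = (
    "RecordID\tReactionOrder\tActivationEnergy\n"
    "1\t2\t173000.0\n"
    "2\tsome title text\t2\n"
    "3\t1\t\n"
)
problems = find_offset_rows(sample, ["ReactionOrder", "ActivationEnergy"])
```

Each reported line still has to be inspected manually, since the check only says where a value looks wrong, not which column it was shifted from.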
<hr>
[OpenRefine is a good tool to look at.](http://openrefine.org/)
<hr>

In [8]:
# The cleaned records were then stored as a Dataframe
myRxnExtrator.send_records_to_hdf(records_file="CleanedData/records.xlsx", dataframe_key="Records", output_hdf="PreliminaryOutput/DemoGenerated/DataDF.h5")

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->block2_values] [items->['RecordType', 'Squib', 'PaperDetails', 'Temperature', 'FrequencyFactor', 'RateConstant']]

  exec(code_obj, self.user_global_ns, self.user_ns)


<p>This is how our records dataframe looks:</p>

In [9]:
import species as sp
sp.Populator.print_from_hdf5(hdf5_store="PreliminaryOutput/DemoGenerated/DataDF.h5", dataframe_key="Records", lines=3)

          RID RecordType                                              Squib  \
RecordID                                                                      
1           1     Theory  http://kinetics.nist.gov/kinetics/Detail;jsess...   
2           2     Theory  http://kinetics.nist.gov/kinetics/Detail;jsess...   
3           3     Theory  http://kinetics.nist.gov/kinetics/Detail;jsess...   

                                               PaperDetails Temperature  \
RecordID                                                                  
1         Ab initio and DFT calculations on the gas phas...   613 - 273   
2         Kinetics of the formation reactions of trichlo...   200 - 400   
3         Kinetics of the formation reactions of trichlo...   200 - 400   

         FrequencyFactor  TemperatureRatioExponent  ActivationEnergy  \
RecordID                                                               
1               6.30E+12                       NaN          173000.0   
2           

<h3>Initializing Populator</h3>
<p><em>Here, we extract all the unique reactants and products from all our reactions and augment the database with some markers. Species data is manually fetched and these markers help us plan which species should be fetched earlier than the rest and which reactions should be considered for training ML models.</em></p>
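
As a rough illustration of the extraction step, the sketch below splits each side of a reaction into species names, collects the unique names, and numbers them. It is not the code in `species.py`: the `split_species` helper, the `" + "` separator, and the numbering of sorted names from 0 are assumptions based on how the Reactions and Species tables look.

```python
def split_species(side):
    """Split one side of a reaction, e.g. 'C2H4 + CH3NH2 + CO', into names."""
    return [name.strip() for name in side.split(" + ") if name.strip()]


# Reactants and products of the first two reactions, keyed by RID.
reactions = {
    1: ("C2H5OCH=CHNH2", "C2H4 + CH3NH2 + CO"),
    2: ("CBr3OF", "CBr3 + OF"),
}

# Gather every species that appears on either side of any reaction.
unique_species = sorted(
    {
        name
        for reactants, products in reactions.values()
        for name in split_species(reactants) + split_species(products)
    }
)

# Assign a species ID (SID) to each unique name.
species_ids = {name: sid for sid, name in enumerate(unique_species)}
```

Splitting on `" + "` with surrounding spaces, rather than on a bare `+`, keeps names that contain a plus sign (such as ions) in one piece.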

In [10]:
my_populator = sp.Populator()

DEBUG:chemspipy.api:Initializing ChemSpider


--Populator Initialized--


<h4>Moving Reactions and Individual Chemical Species to an HDF5 File as Pandas Dataframes.</h4>

In [11]:
my_populator.reactions_and_species('PreliminaryOutput/DemoGenerated/reactions.tsv', 'PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions', 'Species')

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block1_values] [items->['Reaction Link', 'Reactants', 'Products', 'Reactants_List', 'Products_List', 'Reactants_SIDs_List', 'Products_SIDs_List']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


-- DataFrames Created and Stored in PreliminaryOutput/DemoGenerated/DataDF.h5 --


<p>This is how our reactions dataframe looks:</p>

In [12]:
my_populator.print_from_hdf5(hdf5_store="PreliminaryOutput/DemoGenerated/DataDF.h5", dataframe_key="Reactions", lines=5)

                                         Reaction Link  Records  \
RID                                                               
1    http://kinetics.nist.gov/kinetics/ReactionSear...        1   
2    http://kinetics.nist.gov/kinetics/ReactionSear...        1   
3    http://kinetics.nist.gov/kinetics/ReactionSear...        1   
4    http://kinetics.nist.gov/kinetics/ReactionSear...        1   
5    http://kinetics.nist.gov/kinetics/ReactionSear...        1   

         Reactants            Products   Reactants_List       Products_List  \
RID                                                                           
1    C2H5OCH=CHNH2  C2H4 + CH3NH2 + CO  [C2H5OCH=CHNH2]  [C2H4, CH3NH2, CO]   
2           CBr3OF           CBr3 + OF         [CBr3OF]          [CBr3, OF]   
3           CBr3OF       CBr3O(·) + ·F         [CBr3OF]      [CBr3O(·), ·F]   
4           CCl3OF          ·F + CCl3O         [CCl3OF]         [·F, CCl3O]   
5           CCl3OF          ·CCl3 + OF         [CCl3OF] 

<h4>Assigning Scores/Markers to Individual Chemical Species.</h4>
<p style="color: green;">A <u>Score</u> (here) is defined as the <em>number of times a species occurs as a product or as a reactant</em>.</p>
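
The definition amounts to a simple occurrence count, which can be sketched with `collections.Counter`. The function name and input shape below are hypothetical, and the sketch assumes that every occurrence counts, including a species that appears more than once in the same reaction.

```python
from collections import Counter


def species_scores(reactions):
    """Count how often each species appears as a reactant or a product.

    reactions: iterable of (reactants_list, products_list) pairs.
    """
    scores = Counter()
    for reactants, products in reactions:
        scores.update(reactants)
        scores.update(products)
    return scores


# Two reactions taken from the Reactions table above (radical dots omitted).
reactions = [
    (["CBr3OF"], ["CBr3", "OF"]),
    (["CCl3OF"], ["CCl3", "OF"]),
]
scores = species_scores(reactions)
```

Here `OF` is produced in both reactions, so it receives a Score of 2, while every other species receives 1.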

In [13]:
sp.Populator.status_check('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions', 'Species')

-- Scores Assigned --



<p>This is how our updated species dataframe looks:</p>

In [14]:
my_populator.print_from_hdf5(hdf5_store="PreliminaryOutput/DemoGenerated/DataDF.h5", dataframe_key="Species", lines=8)

                       Species  Scores
SID                                   
0    ((CH3)2N)2C=C((N(CH3)2)2)       1
1                 ((CH3)2N)2CO       3
2                 ((CH3)3Si)2O       5
3             (-)-Trans-pinane       4
4                (.)CCl2CF2CF3       1
5                  (.)CD2CHDCl       1
6               (.)CF2C(O)O(.)       1
7                     (.)CF2OH       1


<h4>Assign boolean flags/markers to each reaction</h4>
<p>These boolean flags are Status50, Status75, Status100, and Products_Available. Here, <em>StatusX = True</em> implies that the corresponding reaction's reactants and products have a <u>Score >= X</u>, and Products_Available = True implies that the products of that reaction have been reported.</p>
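
One way to read these definitions is sketched below. It assumes that a StatusX flag is True only when every reactant and product of the reaction meets the threshold, which is an interpretation rather than something confirmed by `species.py`; the function name and the example scores are made up.

```python
def reaction_flags(reactants, products, scores, thresholds=(50, 75, 100)):
    """Return the StatusX and Products_Available flags for one reaction.

    scores: dict mapping species name to its Score.
    """
    species = list(reactants) + list(products)
    # Assumption: the weakest species decides the flag for the whole reaction.
    lowest = min((scores.get(name, 0) for name in species), default=0)
    flags = {f"Status{t}": lowest >= t for t in thresholds}
    flags["Products_Available"] = len(products) > 0
    return flags


# Made-up scores for illustration.
example_scores = {"CH4": 120, "OH": 300, "CH3": 80, "H2O": 60}
flags = reaction_flags(["CH4", "OH"], ["CH3", "H2O"], example_scores)
```

In this example the lowest Score is 60, so only Status50 is True, and the reaction would be picked up once species with Scores of 50 and above have been fetched.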

In [15]:
my_populator.reaction_status('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions', 'Species')

-- Boolean Flags Assigned --



your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Reaction Link', 'Reactants', 'Products', 'Reactants_List', 'Products_List', 'Reactants_SIDs_List', 'Products_SIDs_List']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


<hr>
<h4 style="color: blue;">Beyond this point, we manually fetched the data (PubChem IDs) for the "Species" dataframe. The dataframe was exported to Excel format and the PubChem IDs were added manually. The latest Excel file is provided in the <u>'CleanedData'</u> folder in <u>.xlsx</u> format.
</h4>
<hr>
<p>If the NIST database is updated and scraping is performed again, update your new Species dataframe with the PubChem IDs from the old species.xlsx file in the `CleanedData` folder. A function, `transfer_cid`, to perform this task is present in the same folder.</p>
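
The idea behind such a transfer can be sketched with plain dictionaries. This is not the `transfer_cid` function shipped in the folder, whose signature is not shown here; `carry_over_cids` and its inputs are hypothetical and only illustrate matching on species names.

```python
def carry_over_cids(old_cids, new_species):
    """Reuse known PubChem CIDs for a freshly scraped species list.

    old_cids: dict mapping species name to its PubChem CID.
    new_species: iterable of species names from the new scrape.
    Species without a known CID map to None and still need a manual lookup.
    """
    return {name: old_cids.get(name) for name in new_species}


# CIDs of methane and water; C2H6 stands in for a newly scraped species.
old_cids = {"CH4": 297, "H2O": 962}
updated = carry_over_cids(old_cids, ["CH4", "H2O", "C2H6"])
still_missing = [name for name, cid in updated.items() if cid is None]
```

Only the species listed in `still_missing` would then require a new manual PubChem search.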
<hr>

In [16]:
# Storing the updated species.xlsx into our HDF5 file
myRxnExtrator.send_records_to_hdf(records_file="CleanedData/new_species.xlsx", dataframe_key="Species", output_hdf="PreliminaryOutput/DemoGenerated/DataDF.h5")

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Species']]

  exec(code_obj, self.user_global_ns, self.user_ns)


<h3>Augmenting "Records" DF, "Reactions" DF and "Species" DF</h3>

In [17]:
import recordmapper as rm
rm.RecordMapper.fill_rxn_order('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions', 'Records')
rm.RecordMapper.fill_activ_enrgy('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions', 'Records')
rm.RecordMapper.map_rid_to_cid('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions', 'Species')

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Reaction Link', 'Reactants', 'Products', 'Reactants_List', 'Products_List', 'Reactants_SIDs_List', 'Products_SIDs_List', 'ReactionOrder']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


--Records stored into HDF5 file--


your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Reaction Link', 'Reactants', 'Products', 'Reactants_List', 'Products_List', 'Reactants_SIDs_List', 'Products_SIDs_List', 'ReactionOrder', 'ActivationEnergy']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


--Records stored into HDF5 file--
--Records stored into HDF5 file--


your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Reaction Link', 'Reactants', 'Products', 'Reactants_List', 'Products_List', 'Reactants_SIDs_List', 'Products_SIDs_List', 'ReactionOrder', 'ActivationEnergy', 'ReactantCID', 'ProductCID']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


<h3>Augmenting Reactions and Species Dataframes with PubChem Data</h3>
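
The debug log of the cell below shows one request per compound of the form `GET /rest/pug/compound/cid/<CID>/record/json` against `pubchem.ncbi.nlm.nih.gov`. As a small sketch of how such a request URL is put together, the hypothetical helper below builds it from a CID. It performs no network access and is not part of `species.py`.

```python
PUBCHEM_HOST = "https://pubchem.ncbi.nlm.nih.gov"


def pubchem_record_url(cid):
    """Build the URL of the full JSON record for one compound ID (CID)."""
    cid = int(cid)
    if cid <= 0:
        raise ValueError("a PubChem CID must be a positive integer")
    return f"{PUBCHEM_HOST}/rest/pug/compound/cid/{cid}/record/json"


# Same CID as the first request in the log of the cell below.
url = pubchem_record_url(180)
```

The resulting URL can be fetched with any HTTP client; the log suggests that our code issues these requests through `urllib3`.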

In [18]:
my_populator.get_pubchem_data("PreliminaryOutput/DemoGenerated/DataDF.h5", 'Species')

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov
DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/180/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


549 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/62695/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


894 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/62695/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


1048 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/137654/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


1209 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/7845/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


1396 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/11605/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


1406 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/123145/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


1553 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/7844/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


1554 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/5462311/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


2798 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/241/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


2882 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/123147/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3195 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/24408/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3356 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/5460627/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3390 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/5360770/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3399 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/5462310/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3419 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6326/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3537 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/123166/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3543 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6325/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3548 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/702/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3669 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6324/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3689 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6334/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


3735 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/137438/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


4162 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/123398/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


4619 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/10037/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


4872 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/137767/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


5162 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/712/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


5375 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/176/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


5650 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6335/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


5726 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/8252/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6152 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/177/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6256 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/137849/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6264 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6327/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6296 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/123144/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6384 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/887/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6488 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/123146/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6504 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/297/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6597 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/6373/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6736 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/5975/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6759 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/281/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6775 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/280/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6782 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/24526/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


6906 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/166686/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7007 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/5460635/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7510 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/783/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7511 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/783/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7937 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/10038/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7946 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/962/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7986 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/784/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7990 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/402/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


7996 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/260/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


8030 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/768/record/json HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pubchem.ncbi.nlm.nih.gov


8147 done


DEBUG:urllib3.connectionpool:https://pubchem.ncbi.nlm.nih.gov:443 "GET /rest/pug/compound/cid/157350/record/json HTTP/1.1" 200 None


11206 done


your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Species', 'BondsInfo']]

  """Entry point for launching an IPython kernel.


<h4>Some chemical/species names have been assigned artificial CIDs:</h4>

`.CH`: -1 | The .CH radical. PubChem has no record for it, so we hardcoded a feature vector for .CH and assigned it a CID of -1.<br>
`M`: -2 | This denotes a catalyst. It is not treated as a participant in the reaction and is assigned a feature vector of all zeros.<br>
<hr>
The following three entries indicate missing data: <br>
`Products`: -3 <br>
`Other Products`: -3 <br>
`Adduct`: -4 <br>
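
The artificial CID assignments listed above can be pictured as a small lookup table. The following is a minimal sketch only: the names `ARTIFICIAL_CIDS`, `MISSING_DATA_CIDS`, `resolve_cid`, and `is_missing` are hypothetical and are not taken from the project's modules, and the `pubchem_cids` argument stands in for whatever name-to-CID mapping was obtained from PubChem.

```python
# Hypothetical sketch: these names are illustrative, not the project's actual code.
ARTIFICIAL_CIDS = {
    ".CH": -1,             # radical with no PubChem record; feature vector is hardcoded
    "M": -2,               # catalyst; feature vector of all zeros
    "Products": -3,        # placeholder entry, indicates missing data
    "Other Products": -3,  # placeholder entry, indicates missing data
    "Adduct": -4,          # placeholder entry, indicates missing data
}

# CIDs that mark a species entry as missing data.
MISSING_DATA_CIDS = {-3, -4}


def resolve_cid(name, pubchem_cids):
    """Return the artificial CID if one is assigned, else the PubChem CID (or None)."""
    if name in ARTIFICIAL_CIDS:
        return ARTIFICIAL_CIDS[name]
    return pubchem_cids.get(name)


def is_missing(cid):
    """True if the CID is unknown or is one of the missing-data placeholders."""
    return cid is None or cid in MISSING_DATA_CIDS
```

With such a table, a species name is first checked against the artificial entries and only then against the PubChem results, so placeholder names never reach the PubChem lookup.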


<h4>Augmenting Species Dataframe with Feature Vectors</h4>

In [19]:
# Initializing Feature Constructor
import features as ft
my_constructor = ft.FeatureConstructor("FeatureLibrary/elements.csv", "FeatureLibrary/bonds.csv")
my_constructor.create_species_feat_vec('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Species')

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['Species', 'BondsInfo', 'FeatureVector']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)


<h4>Getting the subset of the Reaction Dataframe containing only those reactions whose feature vectors can be calculated</h4>

In [20]:
import expansion_utils as eu
rxn_df_subset = eu.Extender.get_rxn_subset('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Reactions')
print('------')
print('Upper limit on the No. of Reactions that can be used: {}'.format(len(rxn_df_subset)))
print('------')

------
Upper limit on the No. of Reactions that can be used: 1852
------
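
The internals of `get_rxn_subset` are not shown in this notebook. As a rough illustration of the idea, and assuming that a reaction is usable only when every reactant and product has a feature vector, the filter could look like the sketch below. The function name `usable_reactions`, the dictionary layout, and the toy CIDs are all hypothetical.

```python
# Hypothetical sketch: illustrates the filtering idea, not the project's implementation.
def usable_reactions(reactions, species_with_features):
    """Keep only reactions whose reactants and products all have feature vectors."""
    return [
        rxn
        for rxn in reactions
        if all(
            cid in species_with_features
            for cid in rxn["reactants"] + rxn["products"]
        )
    ]


# Toy data: CIDs 1, 2, 3 are ordinary species; -2 is the catalyst `M`;
# -3 marks a missing-data placeholder and therefore has no feature vector.
species_with_features = {1, 2, 3, -1, -2}
reactions = [
    {"id": 0, "reactants": [1, 2], "products": [3]},
    {"id": 1, "reactants": [1, -2], "products": [2, -2]},
    {"id": 2, "reactants": [3], "products": [-3]},
]

subset = usable_reactions(reactions, species_with_features)
print([rxn["id"] for rxn in subset])
```

In this toy case the third reaction is dropped because one of its products is a missing-data placeholder.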


<h4>Augmenting Reaction Dataframe with Feature Vectors and exporting that data to Excel</h4>

In [21]:
rxn_df_subset = my_constructor.bond_brk('PreliminaryOutput/DemoGenerated/DataDF.h5', 'Species', rxn_df_subset)
rxn_df_subset.to_excel('PreliminaryOutput/DemoGenerated/TrainingData.xlsx')

<h3>Some File IO Post Processing</h3>

<p>An HDF5 file does not discard the old content when a dataframe stored in it is updated: the new content is added while the old content remains, so the file keeps growing. We have to remove the old content manually (here, by repacking the file) to keep the file size under control. For more info: http://pandas.pydata.org/pandas-docs/stable/io.html#delete-from-a-table</p>

In [22]:
# Compress the present HDF5 file and replace it with the newer (smaller) version
!ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc PreliminaryOutput/DemoGenerated/DataDF.h5 PreliminaryOutput/DemoGenerated/Comp_DataDF.h5
import os
os.remove("PreliminaryOutput/DemoGenerated/DataDF.h5")
os.rename("PreliminaryOutput/DemoGenerated/Comp_DataDF.h5", "PreliminaryOutput/DemoGenerated/DataDF.h5")
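
The repack-and-replace step above can also be wrapped in a small helper that leaves the original file untouched if `ptrepack` fails. This is a sketch under stated assumptions: the helper names `build_ptrepack_cmd` and `repack_in_place` and the `.repacked` temporary suffix are hypothetical, while the `ptrepack` options are the same ones used in the cell above.

```python
import os
import subprocess


def build_ptrepack_cmd(src, dst):
    """Build the ptrepack command line with the options used in this notebook."""
    return [
        "ptrepack",
        "--chunkshape=auto",
        "--propindexes",
        "--complevel=9",
        "--complib=blosc",
        src,
        dst,
    ]


def repack_in_place(path):
    """Repack an HDF5 file into a temporary copy, then swap it in for the original."""
    tmp_path = path + ".repacked"  # hypothetical temporary name
    # check=True raises CalledProcessError on failure, so the original file
    # is only replaced after a successful repack.
    subprocess.run(build_ptrepack_cmd(path, tmp_path), check=True)
    os.replace(tmp_path, path)
```

Calling `repack_in_place("PreliminaryOutput/DemoGenerated/DataDF.h5")` would then perform the same cleanup as the cell above in a single step.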

In [23]:
# Write the updated/augmented Dataframes to Excel
eu.Extender.export_all_to_excel(input_hdf5="PreliminaryOutput/DemoGenerated/DataDF.h5", out_directory_path="PreliminaryOutput/DemoGenerated")

-- Dataframes written to Excel files (.xlsx) --


<hr>
Here's the workflow of our entire project. The portions in `blue` and `orange` were presented in this notebook. For the ML section, see `Project-MachineLearning.ipynb`.
<hr>

In [24]:
# Displaying Project Workflow
from IPython.display import IFrame
IFrame("Report/FlowChart.png", width=550, height=550)