
Chemical JSON format #1137

Closed
greglandrum opened this issue Oct 31, 2016 · 50 comments

@greglandrum
Member

greglandrum commented Oct 31, 2016

Discussion Document

It'd be great to have a chemical JSON format in the RDKit. We're collecting ideas here.

Please include ideas and/or pointers to other attempts at this in the comments below. I will integrate them up here.

Limitations

  • Let's limit ourselves to specified structures without query features for the moment, but keep in mind that we may want to include queries later.

Features that won't be in the first version, but that might come

  • Query features
  • Reaction support

Requirements

  • [] CTAB-like atoms and connectivity info
  • [] 2D (or 3D) coordinates not required in order to have correct stereochemistry
  • [] multi-conformer
  • [] can include atom labels
  • [] flexible stereochemistry model:
    • absolute and relative stereochemistry
    • flexible enough to accommodate atropisomers
  • [] agnostic to chemistry model (no aromaticity)
  • [] optional toolkit-specific fields for perceived properties and toolkit-specific info
  • [] supports flexible (and dynamic) properties attached to molecules, atoms, bonds, conformers
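For concreteness, a rough, non-normative sketch of a record covering most of the items above, written as a Python dict and serialized with the standard-library json module; every field name here is a placeholder for illustration, not a proposed spec:

import json

# Hypothetical molecule record; all key names are illustrative only.
mol_record = {
    "name": "example molecule",
    "atoms": [
        {"id": 0, "element": "C", "charge": 0, "label": "CA"},
        {"id": 1, "element": "O", "charge": 0},
    ],
    "bonds": [
        {"atoms": [0, 1], "order": 1},
    ],
    "stereo": [
        {"type": "tetrahedral", "atom": 0, "descriptor": "unspecified"},
    ],
    "conformers": [
        {"dim": 3, "coords": [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]},
    ],
    "properties": {"source": "sketch"},
    "toolkit": {"rdkit": {"aromatic_atoms": []}},
}

print(json.dumps(mol_record, indent=2))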
@mcs07
Contributor

mcs07 commented Oct 31, 2016

A few years ago I added support for JSON formats to Open Babel:
https://github.com/openbabel/openbabel/tree/master/src/formats/json

The two example formats I implemented were the ChemDoodle JSON format and the JSON output of the PubChem PUG REST API.

There is also the OpenChemistry Chemical JSON project:
https://github.com/OpenChemistry/chemicaljson

@proteneer

I don't think there's enough value in simply replacing the storage format itself. Yes, it's slightly easier to parse JSON than the row/column-based SDF format, but that by itself isn't sufficient. I also think it's really important to define the scope of this (otherwise things like query features start to creep in).

One thing that would be a real value-add is coming to some consensus on a minimally complete representation, distinct from computed properties, so as to minimize possible inconsistencies. For example, the treatment of stereochemistry in an SDF/molblock can be inconsistent between the calculated parity value (R/S) and the atomic coordinates with wedged bonds.

Another thing I'd like, though I may get flamed for it, is to avoid any kind of explicit ordering of atom indices, since the underlying graph structure and its associated properties should be invariant under isomorphism. That is, things that depend on a specific index should be labelled explicitly (e.g. atom mappings). Practically, this means that we shouldn't guarantee an iteration order over atoms/bonds.
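A minimal sketch of what index-independence could look like in practice (the field names here are hypothetical, for illustration only): bonds reference explicit atom ids rather than positions in the atom list, so permuting the list does not change the meaning of the record.

import json
import random

# Hypothetical record layout: bonds refer to stable atom "id" values, not list
# positions, so the (non-guaranteed) iteration order over atoms carries no meaning.
record = {
    "atoms": [{"id": "a1", "element": "C"}, {"id": "a2", "element": "O"}],
    "bonds": [{"begin": "a1", "end": "a2", "order": 1}],
}

shuffled = dict(record)
shuffled["atoms"] = random.sample(record["atoms"], k=len(record["atoms"]))

# Both serializations describe the same labelled graph.
print(json.dumps(record))
print(json.dumps(shuffled))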

@markussitzmann

We should take a look at things like JSON-LD, HAL, and Collection+JSON to see whether they might be helpful for creating proper media types and/or making the molecule format directly usable by Web APIs. The chemicaljson link Matt mentioned touches on this, too.

This here might be a starting point:

https://sookocheff.com/post/api/on-choosing-a-hypermedia-format/

which concludes

"""
If you are augmenting existing API responses choose JSON-LD. If you are keeping it simple choose HAL. If you are looking for a full featured media type choose Collection+JSON.
"""

@mcs07
Contributor

mcs07 commented Oct 31, 2016

On the topic of JSON-LD, there is SciData recently published by @stuchalk, which seems like it would be worth looking at, even if the scope is slightly different from what is relevant to RDKit.

@greglandrum
Member Author

greglandrum commented Oct 31, 2016

@proteneer : what's motivating the desire to avoid atom indices? I'm pretty sure that they make the file more (human) readable. If we treat the indices as a convenience feature for the input format, but not something that's guaranteed to be preserved on parsing the file, does that help?

@proteneer

@greglandrum - my point was simply that the spec itself (which is separate from the actual concrete JSON implementation) should not guarantee consistent ordering. So basically as you mentioned, one way to do this is on the implementation's serialization level (i.e. serialization and deserialization may permute the ordering).

As an example, there's a particular format (which I won't mention here) that prefers to put explicit hydrogens at the end of the molblock for "convenience" sake. This is great until the inevitable molblock violating this guarantee comes along and everything breaks.

Note that I'm fine with an implementation that actually uses a list of atoms. I do agree with you that it's far more accessible to read, even if we run the risk of implementers assuming consistent ordering.

@dmaziuk

dmaziuk commented Oct 31, 2016

There should be a column for atom index. Users are free to ignore it. You don't have to write the atoms out in that order, but if you do, people might be able to use simple stupid tools like diff to quickly compare two molecule files.

This may not sound very useful to a chemist but when I run a batch job over 25K ligands, diff'ing the ins and outs and flagging only the ones that changed -- or didn't, depending, -- for a closer look is a very useful feature trivially coded in a one-line shell post-script.

There should also be a column for atom label because I don't know of any algorithm that can label atoms C-alpha, H-beta-21, etc. for the two dozen molecules that use those. Every piece of code here has the atom tables for the "common" residues, each with its own typos and who knows what. We wouldn't have to do that if our exchange formats didn't throw away protons, atom labels and indexes, and everything else that is "obvious to a chemist".

@dmaziuk

dmaziuk commented Oct 31, 2016

A separate issue is that JSON itself is not a streaming format. Valid JSON has to be a single string that gets loaded into RAM in order to be parsed into a single "javascript object". Consider the size of the string describing a hundred "best models" for a moderately-sized polymer.

@stuchalk

Thanks for the mention. I am happy to answer any questions about SciData when/if you get to look at it.
Now that I have gotten SciData out I am going to work more on Chemical JSON…I have lots of ideas…

Stuart


@markussitzmann

markussitzmann commented Oct 31, 2016

@dmaziuk Never cater to users who use "stupid tools" - that is their own responsibility. A big advantage of JSON is that there are well-tested parsers for basically any language and environment (even for Unix shells, if you really need that).

And there are also JSON streaming solutions for basically all the important languages (conceptually, there is no big difference between parsing XML and JSON).
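One way to keep memory use bounded even without an incremental parser, sketched under the assumption of a hypothetical "one JSON document per line" (JSON Lines) layout; the file name and process() call are placeholders:

import json

# Each line of the file is a self-contained JSON document describing one molecule,
# so the collection can be parsed incrementally with the standard-library parser.
def iter_molecules(path):
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage sketch ("molecules.jsonl" and process() are placeholders):
# for mol in iter_molecules("molecules.jsonl"):
#     process(mol)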

@dmaziuk

dmaziuk commented Oct 31, 2016

Uh-huh. Well I'll stick to formats that let me use tools that actually work. It's a good thing by now I can write a format converter with my eyes closed and one hand tied behind my back.

@DavidACosgrove
Collaborator

For me, the key thing about the format is that it supports multiple conformers of the same molecule efficiently. That's what kicked the discussion off in the original rdkit-discuss thread. I would imagine that means one block defining the chemistry and then multiple sets of co-ordinates for the conformations. If 2D co-ordinates could be labelled as distinct from 3D ones, that would be helpful, though it might create problems in the RDKit molecule object.
Whilst a compact JSON format would be better than multiple MolBlocks for this, a binary format would be faster still - no need to convert ASCII/Unicode to integers and floats, for example. Is there any enthusiasm for defining a binary multi-conformer format at the same time as the JSON one? I would think it could have very similar features, but written out in binary without all the JSON plumbing.

@dmaziuk

dmaziuk commented Nov 1, 2016

There is an advantage to storing a table of atoms & bonds as delimited text: you can load it in Excel. Do not underestimate the power of Excel. (And other stupid tools.)

If you define the data structure, you can write it out as a Protocol Buffers definition and dump it into binary, or, e.g., as a Document Type Definition and dump it into XML. It's only a matter of picking up the appropriate library and feeding it your data structure in the way it understands.

@coleb
Collaborator

coleb commented Nov 2, 2016

+1 to @DavidACosgrove's general comments about multi-conformer support, and an additional note:

In my experience, reading conformers efficiently does come down to reading the coordinates efficiently (once chemistry perception is out of the way), and that means reading binary. Luckily, we don't need to come up with a new format to handle binary once we decide on the JSON structure: MsgPack is a 1-to-1 binary encoding of text JSON, with support for nearly as many languages as support JSON: http://msgpack.org/index.html

The RCSB is going down this exact same route for macro-molecule representation as well, with MMTF.

That format focuses heavily on compressing large macro-molecules for efficient transmission, so I doubt we want to use it for small molecules, but I could be wrong. An .mmtf reader would be a useful addition to RDKit regardless.

@dmaziuk Do Protocol Buffers have a 1-to-1 mapping to JSON like MsgPack? I am unfamiliar with the pros and cons of each.
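As a rough illustration of that 1-to-1 mapping (a sketch only, assuming the third-party msgpack-python package and a made-up record layout):

import json
import msgpack  # assumes the third-party msgpack-python package is installed

record = {
    "atoms": [{"element": "C"}, {"element": "O"}],
    "bonds": [{"begin": 0, "end": 1, "order": 1}],
    "conformers": [{"coords": [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]}],
}

packed = msgpack.packb(record)      # compact binary encoding of the same structure
unpacked = msgpack.unpackb(packed)  # back to plain dicts/lists

assert unpacked == record
print(len(json.dumps(record)), len(packed))  # rough text vs. binary size comparison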

@proteneer

Protocol buffers 3 does indeed have a JSON encoding in addition to the standard binary encoding.


@dmaziuk

dmaziuk commented Nov 2, 2016

Protobuf is the schema (aka DTD) plus the translator. AFAICT MsgPack just packs the bytes and lets the reader sort them out. IME people who didn't sit through Algorithms and Data Structures 101 tend to view the lack of a schema as a feature, whereas Comp. Sci. types call it a bug.

A table of coordinates would be a few bytes smaller in a binary format than in CSV: no comma delimiters, but the overhead is minimal. The CSV, OTOH, can be directly loaded into a database, edited with sed, and so on. Encoding it in JSON as a list of lists with a header row on top will add overhead and remove much of the usability of CSV: the worst of both worlds.

@greglandrum
Member Author

greglandrum commented Nov 2, 2016

This is a good discussion, but I'm afraid that we are heading a bit off into the weeds here. I think it would be more productive to figure out what information we need to capture and then to think about the technology (format) that we need to store that information. I suspect that we will actually end up with multiple formats in order to be able to balance robustness, portability, and performance.

@DavidACosgrove
Collaborator

@dmaziuk: The reason for favouring a binary format for these purposes is not size, it's speed. With a binary format, numerical content can be read directly into a float or int; with any ASCII format, something that ultimately calls atof has to be used, which imposes a significant overhead on reading. I think you may be mis-counting the size difference, however: an int in binary format will normally be 4 bytes whatever the value of the integer being stored, while in ASCII it can be anywhere between 1 and 10 bytes.

@greglandrum has a point, however - let's first decide what should be in the file!

@dmaziuk

dmaziuk commented Nov 2, 2016

IME speed has never been a practical problem. By the time it starts biting you, there are three more generations of hardware out there and your computer is long overdue for an upgrade. (We're unzipping/untarring text files on 3-7 year old hardware fast enough to saturate the SATA bus and hang the machine, and I have to configure cgroups on every cluster node to try to limit the I/O. I'm having a really hard time thinking of an instance of text files being "too slow"; the problem I happen to have with them right now is the exact opposite of that.)

I think one of the things missing from Greg's requirements is the intended audience: who is going to use the format, and for what purpose? And also, why does RDKit need another format?

JSON is the web's darling du jour, now that XML has settled into its niche and we've all moved on, but it's only really good for what it was intended for: sending small snippets of JavaScript directly into the browser. If the intended audience is not the browser, RDKit is not JavaScript, and the data is not small...

@dmaziuk

dmaziuk commented Nov 2, 2016

The other thing is that you can spin the math either way: you're not going to represent "ALA", "CA", etc. in binary any more efficiently than in ASCII/UTF-8, and 12.3 takes up 4 bytes in UTF-8 but 8 bytes as a double-precision IEEE 754 value. Plus there's the round-off error: if you really want to do it right, you want to send a "significant digits" integer alongside so that your users can tell whether 12.000019287547965 is actually 12 or 12.000.

If you send it as text, you can off-load that decision to the user: they can stare at "12.000" and try to figure out whether it really is accurate to the 3rd digit, or whether the programmer just printed it as "%7.3f" because the numbers line up prettily that way.

@arose

arose commented Nov 2, 2016

Related discussion at alchemistry.org: https://github.com/alchemistry/fileformat

@arose

arose commented Nov 2, 2016

Hi, I am one of the MMTF developers. One of the things we are thinking about is how to flexibly add more metadata to the format.

@markussitzmann

Why a new format? My answer to this would be:

  1. having a chemical structure format that is in the public domain but has a community that is strong enough to create traction and support it (and I think RDKit has the biggest potential in this regard - I know there is CML but I have the feeling it fails the "creating traction and support" part, sorry).
  2. having a chemical structure format that is more flexible than writing a sequential list of molecules and properties - chemical data sets are usually more complex than this, and quite often they contain structures that have a certain relationship to each other (e.g. "is stereoisomer of", "is conformer of", "is tautomer of"), so there may be a way to add these kinds of semantics (not necessarily as part of the file itself, which leads to the next point);
  3. creating a chemical structure format that better fits all these nice Web/REST service and Linked Data formats and mechanisms (i.e. can be serialized to standard representations like XML and JSON).

@markussitzmann

@dmaziuk "... but it's only really good for what it was intended for: sending small snippets of JavaScript directly into the browser. "

Hmm, I tend to disagree - a lot of web services nowadays use JSON as an exchange format for large amounts of data, and have you come into contact with the NoSQL world, with things like Lucene, Solr, and Elasticsearch, which all rely pretty strongly on (or at least support) JSON? JSON is also natively understood by JavaScript, which has a growing relevance on the server side of web services, and it is almost natively understood by Python (the JavaScript folks essentially took the Python dictionary data type when they developed JSON).

@greglandrum
Member Author

@dmaziuk is absolutely right: being a bit more explicit about what we want to accomplish with the format as well as who the intended users are is a good idea.

I will put some more meat on this later, but I'm primarily looking for an efficient and flexible format for storing and exchanging data about small molecules. It should be both machine and human readable (or at least have an easy way to get a human-readable form) and support optional toolkit-dependent information (like ring information, aromaticity, etc.) that can be ignored (or not) by other toolkits. I'm really not looking to create the one format to rule them all, and my focus at the moment is almost entirely on having something for the RDKit, though I want to be very sure that it's easily usable by other toolkits as well.

My biases on this one:

  • it's not going to be XML based (though likely it will be convertible into XML)
  • it's not going to be unstructured text where column numbering is important (i.e. ctab et al)

@khinsen

khinsen commented Nov 3, 2016

Please consider defining a data model first, and then a data format as an implementation of this data model.

I understand that the focus of this discussion is on JSON, for good technical reasons. But the technical requirements for data formats vary: one person needs JSON, another one needs XML, a third one needs HDF5. There will always be many formats for the same kind of data because of technical imperatives. And that means format conversion, which we all love to do, right?

Format conversion is actually not much of a problem if it's lossless in both directions, i.e. if the conversion happens between two formats that represent the same data. And that common abstract definition is the data model. Think of it as a high-level format description. For more details, see this article.

You might also want to look at my MOSAIC data model/format for computational chemistry, and read the paper that explains the rationale behind its design. You might be able to actually use MOSAIC by adding a JSON implementation, or extend MOSAIC to your needs. But the most important aspect of MOSAIC is the two-level design: a data model with multiple implementations.

@greglandrum
Member Author

Wow, I don't need to come back and flesh out what I was thinking too much, @khinsen just said a lot of it for me, and better than I probably would have.
Thanks for that Konrad, I could hardly agree more.

Restating, hopefully accurately, using a somewhat different vocabulary: we should really be defining a schema that describes the information we're trying to capture and then worry about the details of the physical representation (i.e. JSON, protobuf, msgpack, etc.).

@khinsen

khinsen commented Nov 3, 2016

@greglandrum Exactly. In my experience, the best approach to defining a data model is a hierarchical one, just like for program design. At the highest level, you may want to describe a molecule as a graph, for example, and decide which attributes you want to attach to vertices and edges. Next, you could define how to represent that graph plus its attributes in terms of more basic data structures such as arrays of strings, numbers, etc. The last step is the concrete data format.

@khinsen

khinsen commented Nov 3, 2016

@dmaziuk Your example concerning numbers is a nice illustration of what should be defined in a data model, and why it is important to have one. At the data model level, it matters whether you want to represent a measured or computed value with an attached precision, or a raw floating-point value from a computation. You probably don't want to off-load that decision to the user, but even if that's what you want, this choice is part of your data model.

If you start from the other end, e.g. the efficiency of representation, you will probably end up defining a format that is impossible to convert to anything else without losing information or, worse, having to make up information.

BTW, if you need to represent raw floating-point numbers in a text-based format, e.g. for continuing a computation at a precise state saved in a file, a decimal representation is a sure recipe for having to worry about round-off errors. A byte sequence in IEEE format is error-free and very portable, it's just not human-readable. As a compromise, you can consider floating-point notation in base 8 or 16, which permits error-free conversion to and from IEEE.
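A small illustration of that last point, using Python's built-in base-16 float notation (just a sketch of the idea, not a proposal for the format):

# Error-free text round trip of an IEEE double via hexadecimal notation.
x = 0.1 + 0.2
s = x.hex()                    # e.g. '0x1.3333333333334p-2'
assert float.fromhex(s) == x   # exact round trip, no decimal rounding involved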

@coleb
Collaborator

coleb commented Nov 3, 2016

To try and keep it on topic of data and not format:

[] can include atom labels

@greglandrum, how generalizable is this requirement? Is it as simple as the Tripos atom name field, i.e. a fixed-size string? Or something that can hold arbitrary key-value data? Hopefully the latter, and I would generalize it to both the molecule and the bonds. Something like the following to serialize RDKit properties:

{
  "_Name" : "CorpID",
  "foo" : "bar",
  "atoms" : [{"partial charge" : 1.23, "force" : [0.1, 0.2, 0.3], ... }, ...],
  "bonds" : [{"highlighted" : true, ... }]
}

Being able to add arbitrary properties to the molecule, atoms, and bonds would be very powerful. And it matches RDKit's property system, since I think targeting just RDKit is fine for now as well.
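For what it's worth, a rough sketch of how such a record could be populated from RDKit's existing property system (property names and values here are placeholders; this is an illustration, not a proposed serialization API):

import json
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")
mol.SetProp("_Name", "CorpID")
mol.SetProp("foo", "bar")
for atom in mol.GetAtoms():
    atom.SetDoubleProp("partial charge", 0.0)  # placeholder values

# Collect molecule- and atom-level properties into a JSON-serializable structure.
record = {
    "properties": mol.GetPropsAsDict(includePrivate=True),
    "atoms": [a.GetPropsAsDict(includePrivate=False, includeComputed=False)
              for a in mol.GetAtoms()],
}
print(json.dumps(record, indent=2))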

@greglandrum
Member Author

@coleb : I intended to cover that with:

[] supports flexible (and dynamic) properties attached to molecules, atoms, bonds, conformers

@coleb
Collaborator

coleb commented Nov 3, 2016

@greglandrum good, very cool. :-)

So what is "can include atom labels" then? How is that different?

@greglandrum
Member Author

greglandrum commented Nov 3, 2016

Ah, right. That is, in my mind, the equivalent of the "CA" or "CB" in a PDB file.
And I'm thinking that it's an actual attribute instead of a property since people keep telling me that molecules should have names. ;-)

@dmaziuk

dmaziuk commented Nov 3, 2016

@khinsen not sure what you mean by IEEE being error-free: as I recall the entire first chapter of our Sci.Comp. 201 textbook was about error control.

@greglandrum My vote would be for a segmented data model with an atom/bond table, a completely separate coordinate table, and so on. There has to be a core section that is mandatory (and once you define it and people start using it, it'll be very hard to change); conformers are optional; etc.

You can tar/zip the pieces and call the resulting archive .rdk (RPM and DEB packages, among others, are built that way). Or concatenate them into one file with section delimiters.

On the end-user side, IME number crunching typically involves tables: matrices and such, and pulling out subsets works well with tables, e.g. loaded into sqlite. Table-based is good; implicit column headers (numbers) -- not so much. But if I had to choose between that and a JSON list of maps (rows), I'd probably go for numbered columns.

@arose

arose commented Nov 3, 2016

For me, having a schema where properties (their names and what data they hold) are explicitly defined seems more and more important. Having fields with arbitrary (though typed as text/float/...) user data is a use case, but for interoperability different consumers of the format need to be able to "discover" the properties in order to actually use them. Properties can be optional, with a required core, to allow for slim files.

@khinsen

khinsen commented Nov 3, 2016

@dmaziuk It's the transmission (encoding/decoding) of floating-point numbers that is error-free if you use a binary representation. Computations are a different story.

@dmaziuk

dmaziuk commented Nov 3, 2016

Which binary encoding? IEEE binary encoding will turn 0.3 into 0.30000000000000004.

Transmission errors: noise, bit flips, etc. affect Unicode binary bits exactly the same way as IEEE binary bits.

Forgive me for having difficulties with the meaning of "error free" in this context.

@khinsen

khinsen commented Nov 3, 2016

@dmaziuk Ouch, there are too many distinct meanings of "binary" in this context!

I am thinking of the IEEE binary formats, which are by far the most used ones. Error-free conversion from and to text representations is possible only for (1) raw byte dumps, or (2) a base-2/8/16 representation.

Your example proves my point: you can't convert decimal "0.3" to IEEE binary float formats without error.

@greglandrum
Member Author

A request: the lack of threading in these comment threads makes it difficult enough to track long discussions, let's please try to stay on topic here and not continue the discussion about binary vs text (or other details of what the eventual physical format may be).

@dmaziuk

dmaziuk commented Nov 3, 2016

@khinsen no.

dmaziuk@stingray:~$ python
Python 2.7.5 (default, Sep 15 2016, 22:37:39) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1 + 0.2
0.30000000000000004
>>> 

@greglandrum the relevant point is whether you want to add the "num significant digits" field to every floating-point field in your data model.

@shenkin

shenkin commented Dec 4, 2016

I've not seen a good summary of requirements so far. I'd like to see included:

user-specifiable structure-level properties per structure
user-specifiable atom-level properties per atom
user-specifiable bond-level properties per bond

Some properties might be built in, perhaps by using reserved keywords to specify them; examples: formal charge (on an atom), partial charge (on an atom), bond order (on a bond), and so on. These could include properties that are always required to be present, as well as properties that are sufficiently commonly used that standard names would be desirable.

Since this whole discussion started out on the rdkit-discuss list as a way to store conformers (not just multiple molecules), it would be good if there were a way to take advantage of any storage savings that might be possible for a sequence of conformations. I'm not sure that's a requirement, though. In certain situations, there might be associated guarantees as well. For example, a molecule appearing after the first one in a sequence and known to be a conformer of it might share all the properties (CT, atom, bond) specified for the leading conformer unless they are overridden in the later conformer. So any conformer is, in effect, specified by its difference from the first conformer in the sequence.
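A possible shape for that "specified by difference" idea, sketched as a Python structure with entirely made-up field names (an illustration only, not a proposal):

# The first entry acts as a template; later conformers carry only coordinates
# plus any property overrides and inherit everything else from it.
conformer_sequence = {
    "template": {
        "atoms": [{"element": "C"}, {"element": "O"}],
        "bonds": [{"begin": 0, "end": 1, "order": 1}],
        "properties": {"energy_units": "kcal/mol"},
    },
    "conformers": [
        {"coords": [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]], "properties": {"energy": -1.0}},
        {"coords": [[0.0, 0.0, 0.1], [1.2, 0.1, 0.0]], "properties": {"energy": -0.7}},
    ],
}

def resolve(seq, i):
    """Merge the template with the i-th conformer's overrides (sketch)."""
    merged = dict(seq["template"])
    conf = seq["conformers"][i]
    merged["coords"] = conf["coords"]
    merged["properties"] = {**seq["template"]["properties"], **conf.get("properties", {})}
    return merged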

@dmaziuk

dmaziuk commented Dec 5, 2016

The PDB chemical component (ligand) model includes a list of structure-level properties as well as tables of atoms and bonds (with properties). One of the reasons they (and we) use STAR is that it's about the only format that lets you combine tables and key-value pairs in a reasonable fashion. (Don't get me started on the shortcomings of STAR.)

JSON does not have a built-in table data type.

@shenkin

shenkin commented Dec 5, 2016 via email

@dmaziuk

dmaziuk commented Dec 5, 2016

... or a list of lists, or a { "head" : [ ...], "body" : [[...], ...] } -- that's my point: there is no one standard way that everybody understands.

@markussitzmann

markussitzmann commented Dec 5, 2016 via email

@shenkin

shenkin commented Dec 5, 2016 via email

@shenkin

shenkin commented Dec 5, 2016 via email

@markussitzmann

markussitzmann commented Dec 5, 2016 via email

@dmaziuk

dmaziuk commented Dec 6, 2016

On the other hand, I am not sure what you really gain - it might be more space efficient (okay, not a too big argument anymore)

It can be if you're storing multiple conformers of a larger molecule. Coupled with JSON's requirement to read the whole string into memory at once, it has the potential to be... suboptimal.

@rdkit rdkit deleted a comment from shenkin Jul 10, 2017
@rdkit rdkit deleted a comment from shenkin Jul 10, 2017
@rdkit rdkit deleted a comment from shenkin Jul 10, 2017
@rdkit rdkit deleted a comment from shenkin Jul 10, 2017
@rdkit rdkit deleted a comment from shenkin Jul 10, 2017
@greglandrum
Member Author

Closing this because there's now (and has been for a while) an implementation of CommonChem, and an RDKit-specific extension of it, in rdMolInterchange: http://rdkit.org/docs/source/rdkit.Chem.rdMolInterchange.html
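A minimal usage sketch of that module (based on the documented MolToJSON/JSONToMols entry points; check the linked docs for the current API):

from rdkit import Chem
from rdkit.Chem import rdMolInterchange

mol = Chem.MolFromSmiles("CCO")
json_str = rdMolInterchange.MolToJSON(mol)    # CommonChem JSON with RDKit extensions
mols = rdMolInterchange.JSONToMols(json_str)  # returns a tuple of molecules
print(Chem.MolToSmiles(mols[0]))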
