Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SDF Export to PandasTools and adjusted SDF Import #404

Merged
merged 3 commits into from
Dec 23, 2014

Conversation

nhfechner
Copy link
Contributor

The pull request contains the commit from @samoturk in the pull request #401. In addition to these changes, I made the generation of a smiles column optional and defaulting to not generate it. Adding a smiles column later on is simple, but automatically doing it silently changes the matching of SD tags to dataframe columns.

The export is fairly self contained and is nearly identically reproducing the SDF input. The deviations are mainly occuring when some SD tags are not present in all molecules in the input SDF. Such cells are filled in the dataframe with "nans" which consequently end up in the written SDF.
Another deviation is that the generated SD file contains the valences for some atoms (e.g. Cu,Si,Se) in the atoms block even if they were not present in the input.

Are these deviations acceptable?

@samoturk : Do you think the method signatures and the changes to the input are reasonable or would you suggest doing things differently?

samoturk and others added 3 commits December 19, 2014 16:24
…to _MolPlusFingerprint and modified it to accept mol object, fixed AddMoleculeColumnToFrame to use new _MolPlusFingerprint
Fixed LoadSDF to keep 3D info, renamed _MolPlusFingerprintFromSmiles to ...
@samoturk
Copy link
Contributor

I agree with the changes. I guess round trip works in most cases which is good enough. We could remove SD tags with nan from mols in cases when some SD tags are not in all molecules but it would complicate the code and probably still not be 100% fool proof.

Nice effort! I think it can be merged.

@greglandrum
Copy link
Member

Great. I will merge next time I am in front of my laptop

On Tuesday, December 23, 2014, Samo Turk notifications@github.com wrote:

I agree with the changes. I guess round trip works in most cases which is
good enough. We could remove SD tags with nan from mols in cases when some
SD tags are not in all molecules but it would complicate the code and
probably still not be 100% fool proof.

Nice effort! I think it can be merged.


Reply to this email directly or view it on GitHub
#404 (comment).

greglandrum added a commit that referenced this pull request Dec 23, 2014
Added SDF Export to PandasTools and adjusted SDF Import
@greglandrum greglandrum merged commit 505f5a3 into rdkit:master Dec 23, 2014
@greglandrum
Copy link
Member

@nhfechner and @samoturk : thanks for this!

@greglandrum greglandrum added this to the 2015_03_1 milestone Mar 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants