# Chemical structure standardisation with AMBIT

- ambitcli: command-line tool
- Guide and download: http://ambit.sourceforge.net/ambitcli_standardisation.html
- Download: https://zenodo.org/record/1145812
- Used to standardize the chemical structures in [ExCAPE-DB](https://jcheminf.springeropen.com/articles/10.1186/s13321-017-0203-5)

This is a Jupyter notebook using BeakerX kernels. The main kernel is Python; the `%%java` cells show how Java code (here, the AMBIT library) can be used from a Python notebook.

### Configuring Maven repositories and Maven dependencies

In [2]:
%%java
%classpath config resolver mvnLocal
%classpath config resolver nexus-idea-releases https://nexus.ideaconsult.net/content/repositories/releases
%classpath config resolver nexus-idea-snapshots https://nexus.ideaconsult.net/content/repositories/snapshots
%classpath add mvn ambit ambit2-tautomers 3.3.0-SNAPSHOT
%classpath add mvn ambit ambit2-dbcli 3.3.0-SNAPSHOT

### Standardize single structure

In [3]:
%%java
import ambit2.tautomers.processor.StructureStandardizer;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.silent.SilentChemObjectBuilder;

// Parse the input structure (warfarin) from SMILES
SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
IAtomContainer mol = sp.parseSmiles("CC(=O)CC(C1=CC=CC=C1)C2=C(C3=CC=CC=C3OC2=O)O");

// Create the standardizer and print its default settings
StructureStandardizer std = new StructureStandardizer();
System.out.println(String.format("Clear isotopes %s\tGenerate 2D %s\tGenerate InChI %s\tGenerate SMILES %s\tAromatic %s\tCanonical %s\nStereo from 2D %s\tTautomers %s\tImplicit H %s\tNeutralise %s\tSplit fragments %s",
        std.isClearIsotopes(),
        std.isGenerate2D(),
        std.isGenerateInChI(),
        std.isGenerateSMILES(),
        std.isGenerateSMILES_Aromatic(),
        std.isGenerateSMILES_Canonical(),
        std.isGenerateStereofrom2D(),
        std.isGenerateTautomers(),
        std.isImplicitHydrogens(),
        std.isNeutralise(),
        std.isSplitFragments()));

// Standardize the structure and return the result as SMILES
IAtomContainer mol_std = std.process(mol);
return SmilesGenerator.generic().create(mol_std);

Clear isotopes false	Generate 2D false	Generate InChI true	Generate SMILES true	Aromatic false	Canonical false
Stereo from 2D false	Tautomers false	Implicit H false	Neutralise false	Split fragments false
2018-08-03 15:19:25:624 +0300 [javash0] INFO  ConfigManager - Loading global configuration
2018-08-03 15:19:25:628 +0300 [javash0] INFO  ConfigManager - Loading artefact configuration: jniinchi-1.03_1
2018-08-03 15:19:25:632 +0300 [javash0] INFO  ClasspathRepository - Searching classpath for: jniinchi-1.03_1-WINDOWS-AMD64
2018-08-03 15:19:25:635 +0300 [javash0] INFO  LocalRepository - Searching local repository for: jniinchi-1.03_1-WINDOWS-AMD64
2018-08-03 15:19:25:643 +0300 [javash0] INFO  ManifestReader - Reading manifest


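CC(=O)CC(C1=CC=CC=C1)C2=C(C3=CC=CC=C3OC2=O)O

With the default settings printed above (tautomer generation, neutralisation and fragment splitting all disabled), the returned SMILES is identical to the input.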

### Standardize file with chemical structures
- input: a tab-delimited file with a SMILES column
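The contents of test.txt are not shown in this notebook. As a sketch for reproducing the example, the cell below writes a comparable tab-delimited file with Python's csv module. The header names `ID` and `SMILES`, the file name `example_input.txt` and the helper `format_smiles_tsv` are hypothetical; check the ambitcli guide linked above for how the SMILES column is recognised.

```python
import csv
import io


def format_smiles_tsv(rows, header=("ID", "SMILES")):
    """Return tab-delimited text: one header row, then one line per (id, smiles) pair.

    The default header names are assumptions for illustration only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical records: warfarin (the molecule used above) and ethanol
records = [
    ("warfarin", "CC(=O)CC(C1=CC=CC=C1)C2=C(C3=CC=CC=C3OC2=O)O"),
    ("ethanol", "CCO"),
]

# Hypothetical file name, chosen so that the existing test.txt is not overwritten
with open("example_input.txt", "w", encoding="utf-8", newline="") as fh:
    fh.write(format_smiles_tsv(records))
```

If the column-name assumption holds, `example_input.txt` can then be passed to ambitcli in place of test.txt.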

In [14]:
import pandas as pd
# test.txt is tab-delimited, so the separator must be given explicitly
df = pd.read_csv("test.txt", sep="\t")
df

In [12]:
%%java
import ambit2.dbcli.AmbitCli;
import ambit2.dbcli.CliOptions;

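// Input: tab-delimited file with a SMILES column; output: standardized file
String infile = "test.txt";
String out = "test_std.txt";

// Same arguments as on the ambitcli command line:
// -a standardize (command), -m post (subcommand), -i/-o input/output files,
// and one "-d key=value" pair per standardization option
String[] args = new String[] { "-a", "standardize", "-i", infile, "-m", "post", "-o", out,
        "-d", "smiles=true", "-d", "inchi=true", "-d", "tautomers=true" };
CliOptions options = new CliOptions();
if (options.parse(args)) {
    AmbitCli cli = new AmbitCli(options);
    cli.go(options.getCmd(), options.getSubcommand().name());
}
return out;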

INFO   InChI native library loaded in 1 msec 
INFO   Reading from test.txt Writing to test_std.txt
INFO   Records read:1  [1.000      msec/record]	processed:1  [441.000    msec/record]	error:0	skipped:0	Total time 442 msec.
INFO   Records read:2  [0.500      msec/record]	processed:2  [224.500    msec/record]	error:0	skipped:0	Total time 450 msec.
INFO   Records read:2  [0.500      msec/record]	processed:2  [224.500    msec/record]	error:0	skipped:0	Total time 450 msec.
INFO   ambitcli-3.3.0-SNAPSHOT Records read:2  [0.500      msec/record]	processed:2  [224.500    msec/record]	error:0	skipped:0	Total time 450 msec.


test_std.txt
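The Java cell above hands ambitcli its command-line arguments in-process. A sketch of the equivalent standalone call from Python, assuming the downloaded jar is named `ambitcli-3.3.0-SNAPSHOT.jar` (the version printed in the log; the actual file name may differ), that it can be started with `java -jar` as described in the guide, and that `java` is on the PATH. The helper `build_ambitcli_command` is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ambitcli_command(jar, infile, outfile, options):
    """Build the argument list for a standalone ambitcli standardization run.

    Mirrors the Java cell: -a standardize -m post, plus one
    "-d key=value" pair per standardization option.
    """
    cmd = ["java", "-jar", str(jar),
           "-a", "standardize", "-i", str(infile), "-m", "post", "-o", str(outfile)]
    for key, value in options.items():
        # Write booleans in lower case ("true"/"false"), as in the Java cell
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd += ["-d", f"{key}={value}"]
    return cmd


cmd = build_ambitcli_command(
    "ambitcli-3.3.0-SNAPSHOT.jar",  # assumed jar file name
    "test.txt",
    "test_std.txt",
    {"smiles": True, "inchi": True, "tautomers": True},
)
print(" ".join(cmd))

# Only run when java and the jar are actually available
if shutil.which("java") and Path(cmd[2]).exists():
    subprocess.run(cmd, check=True)
```

With these options the printed command is `java -jar ambitcli-3.3.0-SNAPSHOT.jar -a standardize -i test.txt -m post -o test_std.txt -d smiles=true -d inchi=true -d tautomers=true`. Building the list in one function keeps the repeated `-d key=value` pairs in one place.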

In [24]:
# keep_default_na=False keeps empty fields as empty strings instead of NaN
df = pd.read_csv("test_std.txt", sep="\t", keep_default_na=False)
df
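For a quick check of which fields were filled in, the output can also be read with the standard csv module. `csv.DictReader` keeps every value as a string, so, as with `keep_default_na=False`, empty cells stay empty instead of becoming NaN. A minimal sketch; the helper `non_empty_counts` is hypothetical and makes no assumption about the column names, which depend on the options passed to ambitcli.

```python
import csv
import io
from pathlib import Path


def non_empty_counts(tsv_text):
    """Count non-empty values per column of tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Start every header column at zero so that completely empty columns still appear
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            if value:  # "" (empty field) and None (missing field) are not counted
                counts[name] = counts.get(name, 0) + 1
    return counts


# Only read the file if the standardization step has produced it
out_file = Path("test_std.txt")
if out_file.exists():
    print(non_empty_counts(out_file.read_text(encoding="utf-8")))
```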