Skip to content

Commit

Permalink
Bugfix for handling single fragment in Mol Extractor input data (#19)
Browse files Browse the repository at this point in the history
* Bugfix for handling a single fragment in input data of RDKit Molecule
Extractor node (Fixes #18).

Single fragments are treated as expected, resulting in a single output
row. If the sanitization option is on, it will be sanitized as multiple
fragments would.
Updated documentation and regression tests.
Additionally, added JRebel startup parameter for test preparation launch
file.

* Updated copyright information
  • Loading branch information
manuelschwarze authored and greglandrum committed Apr 27, 2017
1 parent 6c301e4 commit f445971
Show file tree
Hide file tree
Showing 4 changed files with 872 additions and 866 deletions.
Original file line number Diff line number Diff line change
@@ -1,56 +1,57 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE knimeNode PUBLIC "-//UNIKN//DTD KNIME Node 2.0//EN" "http://www.knime.org/Node.dtd">
<knimeNode icon="./default.png" type="Manipulator">
<name>RDKit Molecule Extractor</name>

<shortDescription>
Splits up fragment molecules contained in a single RDKit molecule cell and extracts these molecules into separate cells.
</shortDescription>

<fullDescription>
<intro>
Splits up disconnected fragment molecules contained in a single RDKit molecule cell
and extracts these molecules into separate cells. If the input molecule contains only one fragment
or if the input cell is empty (missing), the input cell will be used as result with the
appropriate reference column. It can either be used with an input table or based on
flow variable input for the molecules and their format. Supported molecule formats are
RDKit Mol cells (when connecting an input table), SMILES, MOL and SDF. <br></br>
Please be aware that auto-conversion (e.g for SMILES input) may fail when connecting an input table. <br></br>
The Advanced Tab offers different options to treat conversion failures, empty input cells
and zero-atom molecules (empty molecules). You may configure the node to fail, to generate empty cells
with or without warning, or to skip the input with or without warning.<br></br>
The node can be used for instance after a Quickform Molecule Input node, which brings up a sketcher in the
KNIME Web Portal. When the user draws multiple molecules at once this node will split up the users
input into multiple molecules.
</intro>
<option name="Table Input - RDKit Mol column">The input column with RDKit Molecules, which may contain multiple disconnected fragments to be extracted.</option>
<option name="Table Input - Reference column (e.g. an ID)">The column to be used as reference column. Its values are assigned to
to the cells with the extracted molecules. You may use the Row ID here or set it to None, in which case the reference column will not be added.</option>
<option name="Variable/Data Input - Molecules">A textual representation of the molecules to be split. Usually, you will
attach a flow variable to control this setting, e.g. coming from a Molecule Sketcher. This setting will only
be used, if no table is connected.</option>
<option name="Variable/Data Input - Format">The format of the molecules: MOL, SDF or SMILES are supported.
This setting will only be used, if no table is connected.</option>
<option name="Output - Column name for extracted molecules">The name to be used for the new column used to store extracted molecules.</option>
<option name="Output - Column name for copied reference data">The name to be used for the new reference column used to store the reference values.</option>
<option name="Advanced - Sanitize fragments">Flag to determine, if fragments shall be sanitized when being extracted.
Selecting this option may lead to additional fragmentation errors. Default is false.</option>
<option name="Advanced - How to react on conversion errors">Define here, how the node shall behave, if
an input molecule to be processed could not be converted into an RDKit molecule. Usually, this is the case
if the molecule is invalid. By default, the node
will generate a missing cell and no warning.
If conversion requires special treatment, you may use the RDKit From Molecule node
to perform the conversion before executing this node. It offers different options for conversion.</option>
<option name="Advanced - How to react on empty (missing) cells">Define here, how the node shall behave, if
an empty (missing) cell is used as input. By default, the node
will generate a missing cell and no warning.</option>
<option name="Advanced - How to react on empty (zero atom) molecules">Define here, how the node shall behave, if
an empty molecule is used as input. This means that the molecule has zero atoms. By default, the node
will generate a missing cell and no warning.</option>
</fullDescription>

<ports>
<inPort index="0" name="Input table with RDKit Molecules">Input table with RDKit Molecules, which may contain disconnected fragment molecules.</inPort>
<outPort index="0" name="Result table with extracted RDKit Molecules">Output table with extracted fragment molecules.</outPort>
</ports>
</knimeNode>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE knimeNode PUBLIC "-//UNIKN//DTD KNIME Node 2.0//EN" "http://www.knime.org/Node.dtd">
<knimeNode icon="./default.png" type="Manipulator">
<name>RDKit Molecule Extractor</name>

<shortDescription>
Splits up fragment molecules contained in a single RDKit molecule cell and extracts these molecules into separate cells.
</shortDescription>

<fullDescription>
<intro>
Splits up disconnected fragment molecules contained in a single RDKit molecule cell
and extracts these molecules into separate cells, also sanitizing these molecules if desired.
If the input cell is empty (missing), the input cell will be used as result with the
appropriate reference column. If the input molecule contains only one fragment it will
result in a single row. The node can either be used with an input table or based on
flow variable input for the molecules and their format. Supported molecule formats are
RDKit Mol cells (when connecting an input table), SMILES, MOL and SDF. <br></br>
Please be aware that auto-conversion (e.g for SMILES input) may fail when connecting an input table. <br></br>
The Advanced Tab offers different options to treat conversion failures, empty input cells
and zero-atom molecules (empty molecules). You may configure the node to fail, to generate empty cells
with or without warning, or to skip the input with or without warning.<br></br>
The node can be used for instance after a Quickform Molecule Input node, which brings up a sketcher in the
KNIME Web Portal. When the user draws multiple molecules at once this node will split up the users
input into multiple molecules.
</intro>
<option name="Table Input - RDKit Mol column">The input column with RDKit Molecules, which may contain multiple disconnected fragments to be extracted.</option>
<option name="Table Input - Reference column (e.g. an ID)">The column to be used as reference column. Its values are assigned to
to the cells with the extracted molecules. You may use the Row ID here or set it to None, in which case the reference column will not be added.</option>
<option name="Variable/Data Input - Molecules">A textual representation of the molecules to be split. Usually, you will
attach a flow variable to control this setting, e.g. coming from a Molecule Sketcher. This setting will only
be used, if no table is connected.</option>
<option name="Variable/Data Input - Format">The format of the molecules: MOL, SDF or SMILES are supported.
This setting will only be used, if no table is connected.</option>
<option name="Output - Column name for extracted molecules">The name to be used for the new column used to store extracted molecules.</option>
<option name="Output - Column name for copied reference data">The name to be used for the new reference column used to store the reference values.</option>
<option name="Advanced - Sanitize fragments">Flag to determine, if fragments shall be sanitized when being extracted.
Selecting this option may lead to additional fragmentation errors. Default is false.</option>
<option name="Advanced - How to react on conversion errors">Define here, how the node shall behave, if
an input molecule to be processed could not be converted into an RDKit molecule. Usually, this is the case
if the molecule is invalid. By default, the node
will generate a missing cell and no warning.
If conversion requires special treatment, you may use the RDKit From Molecule node
to perform the conversion before executing this node. It offers different options for conversion.</option>
<option name="Advanced - How to react on empty (missing) cells">Define here, how the node shall behave, if
an empty (missing) cell is used as input. By default, the node
will generate a missing cell and no warning.</option>
<option name="Advanced - How to react on empty (zero atom) molecules">Define here, how the node shall behave, if
an empty molecule is used as input. This means that the molecule has zero atoms. By default, the node
will generate a missing cell and no warning.</option>
</fullDescription>

<ports>
<inPort index="0" name="Input table with RDKit Molecules">Input table with RDKit Molecules, which may contain disconnected fragment molecules.</inPort>
<outPort index="0" name="Result table with extracted RDKit Molecules">Output table with extracted fragment molecules.</outPort>
</ports>
</knimeNode>
Loading

0 comments on commit f445971

Please sign in to comment.