Flame

Flame is a flexible framework supporting predictive modeling and similarity search within the eTRANSAFE (http://etransafe.eu) project.

Flame allows you to:

  • Easily develop machine-learning models, for example QSAR-like models, starting from annotated collections of chemical compounds stored in standard formats (e.g. SDFiles)
  • Transfer new models into a production environment where they can be used by web services to predict the properties of new compounds.

Flame can be used from the command line or through a web-based GUI. The code for the GUI is accessible here.

Installation (binaries and docker)

We provide Windows and Linux installers for performing local installations which include the graphic interface.

The latest versions can be downloaded here:

These versions can be run using a script which starts a locally installed web server, accessible from a web browser at http://localhost:8000

A fully configured Docker container (https://www.docker.com/) can be downloaded from DockerHub and started using:

docker run -d -p 8010:8000 mpastorphi/flame

Then, the Flame GUI will be accessible from a web browser at http://localhost:8010

Please note that the port of this address is defined in the command line above and can be easily customized.

It is also possible to use an existing local folder for storing the models and the predictions generated by Flame. Let's assume you wish to use 'c:\flame_repo' as the local flame repository. Start by creating three folders inside named 'models', 'predictions', and 'spaces'. Then, run the following command:

docker run -d -p 8010:8000 -v c:\flame_repo:/data mpastorphi/flame

Then, as in the previous example, you can access the Flame GUI from a web browser at http://localhost:8010
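The folder preparation described above can be scripted. The following is a minimal sketch for Linux or macOS shells; the FLAME_REPO variable and its default location are assumptions made for this example (the text uses 'c:\flame_repo' on Windows), and only the three folder names and the docker command come from the instructions above.

```shell
# FLAME_REPO is a hypothetical variable; point it to any folder you like.
FLAME_REPO="${FLAME_REPO:-$PWD/flame_repo}"

# Create the three folders Flame expects inside the repository.
mkdir -p "$FLAME_REPO/models" "$FLAME_REPO/predictions" "$FLAME_REPO/spaces"
ls "$FLAME_REPO"

# Then start the container with this folder mounted on /data:
# docker run -d -p 8010:8000 -v "$FLAME_REPO":/data mpastorphi/flame
```

The docker line is left as a comment so the folder creation can be tried on a machine without Docker.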

Documentation

A Flame walkthrough, showing the main features, is accessible here

A collection of short videos illustrates how Flame can be used to:

  • predict a single molecule here
  • predict by sketching the input structure here
  • profile a collection of molecules here
  • build a simple model here
  • document a model here

Flame is described in the following open-access article:

Flame: an open source framework for model development, hosting, and usage in production environments

Manuel Pastor, José Carlos Gómez-Tamayo & Ferran Sanz 

Journal of Cheminformatics volume 13, Article number: 31 (2021)

(https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00509-z)

Installing from source code

Flame can be used in most Windows, Linux or macOS configurations, provided that a suitable execution environment is set up. We recommend, as a first step, installing the Conda package and environment manager. Download a suitable Conda or Anaconda distribution for your operating system from here

Download the repository:

git clone https://github.com/phi-grib/flame.git

Go to the repository directory

cd flame

and create the conda environment with all the dependencies and extra packages (numpy, RDKit...):

conda env create -f environment.yml

Once the environment is created type:

source activate flame

to activate the environment.

Conda environments can be easily updated using a new version of the environment definition

conda env update -f new_environment.yml

Flame must be installed as a regular Python package. From the flame directory type (note the dot at the end):

pip install . 

or

python setup.py install

For development, use pip with the -e flag or setup with develop instead of install. This makes the latest changes accessible to other components (e.g. flame_API)

pip install -e .

or

python setup.py develop

Configuration

After installation is completed, you must run the configuration command to configure the directory where flame will place the models and chemical spaces. If Flame has not been configured previously the following command

flame -c config

will suggest a default directory structure following the XDG specification in GNU/Linux, %APPDATA% in Windows and ~/Library/Application Support/flame_models in macOS.

To specify a custom path use the -d parameter to enter the root folder where the models and chemical spaces will be placed:

flame -c config -d /my/custom/path

will set up the model repository to /my/custom/path/models, the chemical spaces repository to /my/custom/path/spaces and the predictions to /my/custom/path/predictions.

Once Flame has been configured, the current settings can be displayed by running the same command again

flame -c config

As a fallback, Flame can also be configured using the following command

flame -c config -a silent

This option sets up the models, spaces and predictions repositories within the Flame installation directory (flame\flame\models, flame\flame\spaces, flame\flame\predictions). Unlike other options, this command does not ask the end-user for permission to create the directories or set up the repositories. It is used internally by automatic installers and for software development.

Main features

  • Native support of most common machine-learning algorithms, including rich configuration options and facilitating the model optimization.
  • Easy creation of chemical spaces for similarity search, using fingerprints or molecular descriptors.
  • Support for any standard formatted input: from a tsv table to a collection of compounds in SMILES or SDFile format.
  • Multiple interfaces adapted to the needs of different users: as a web service, for end-user prediction, as a full featured GUI for model development, as command line, integration in Jupyter notebooks, etc.
  • Support for parallel processing.
  • Support for multilevel models: the output of a model can be used as input for other models.
  • Integrated model version management.

Quickstarting

Flame provides a simple command-line interface flame.py, which is useful for accessing its functionality and getting acquainted with its use.

You can run the following commands from any terminal, on a computer where Flame has been installed and the environment (flame) has been activated (source activate flame in Linux, activate flame in Windows)

Let's start by creating a new model:

flame -c manage -a new -e MyModel

This creates a new entry in the model repository and the development version of the model, populating these entries with default options. The contents of the model repository are shown using the command:

flame -c manage -a list

Building a model only requires entering an input file formatted for training one of the supported machine-learning methods. In the case of QSAR models, the input file can be an SDFile, where the biological property is annotated in one of the fields.

The details of how Flame normalizes the structures, obtains molecular descriptors and applies the machine-learning algorithm are defined in a parameters file (parameters.yaml) which now contains default options. These can be changed as we will describe later, but for now let's use the defaults to obtain a Random Forest model on a series of chemical compounds annotated with a biological property in the field <activity>. For this example we will use the file PXR_train.sdf that can be found in the \data folder:

flame -c build -e MyModel -f PXR_train.sdf

After a few seconds, the model is built and a summary of the model quality is presented on the screen. This model is immediately accessible for predicting the properties of new compounds. This can be done (for the example compound rofecoxib.sdf) using the command:

flame -c predict -e MyModel -v 0 -f rofecoxib.sdf

And this will show the properties predicted for the compounds in the query SDFile.
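The three steps above (create, build, predict) can be chained in a small script. This is only a sketch: the run helper, the DRY_RUN switch and the quickstart function are hypothetical names introduced for this example, while the flame commands are the ones shown above. By default the script only prints the commands, so it can be tried before Flame is installed.

```shell
# run: hypothetical helper that prints the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

MODEL="MyModel"          # endpoint name used in the text
TRAIN="PXR_train.sdf"    # training series from the \data folder
QUERY="rofecoxib.sdf"    # example query compound

# quickstart: create the model, build it with default parameters and
# predict the query using the development version (-v 0).
quickstart() {
  run flame -c manage -a new -e "$MODEL" &&
  run flame -c build -e "$MODEL" -f "$TRAIN" &&
  run flame -c predict -e "$MODEL" -v 0 -f "$QUERY"
}

quickstart
```

Setting DRY_RUN=0 before calling quickstart runs the commands for real; the && chaining stops at the first command that fails.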

The parameters used for building the models can be inspected using the following command:

flame -c manage -e MyModel -a parameters

In order to customize the model building we need to pass as an argument of build a file containing any change we want to introduce. This is what we call a "delta" file. Delta files can be easily generated by redirecting the output of the above command to a text file...

flame -c manage -e MyModel -a parameters > delta.txt

... and then editing it. The new, edited file can be used in the build command as follows:

flame -c build -e MyModel -f series.sdf -p delta.txt

For model documentation we need to obtain a delta file which already includes some information extracted from both the parameters and the quality metrics from model building. Other fields are empty because they must be filled in manually (e.g. institution info or model interpretation). The documentation delta file can be obtained by executing:

flame -c manage -e MyModel -a documentation > delta.txt

The file delta.txt can be edited to include all the required information. After editing, the changes can be made persistent by executing the following command:

flame -c manage -e MyModel -a documentation -t delta.txt

Flame can also use a TSV file as input, containing all the X and Y values required to build the model. This file must have the following format: columns must be separated by a single tab. The first row must contain variable names. One row per compound. Molecule names are optional but, if present, must be placed in the first column. SMILES are optional, but can be inserted in any column with the label SMILES (all capitals). The activity must be placed in a column with the label specified by the parameter TSV_activity.

The commands for building a model or predicting from a TSV file are identical to the ones used with SDFiles. Please make sure that in the parameter file, the input_type is set to data.

flame -c build -e MyModel -f series.tsv 
flame -c predict -e MyModel -f query.tsv 
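As an illustration of the TSV layout described above, the following sketch writes a tiny file and checks that every row has the same number of tab-separated columns. The file name example_series.tsv, the descriptor labels x1 and x2, and all the values are made-up placeholders; the sketch also assumes that the parameter TSV_activity is set to "activity".

```shell
# Header: molecule name first, then SMILES, two placeholder descriptors
# and the activity column (label assumed to match TSV_activity).
printf 'name\tSMILES\tx1\tx2\tactivity\n'   >  example_series.tsv
printf 'mol1\tCCO\t0.12\t3.4\t5.1\n'        >> example_series.tsv
printf 'mol2\tc1ccccc1\t0.87\t1.9\t6.3\n'   >> example_series.tsv

# Sanity check: all rows must have as many columns as the header.
awk -F'\t' 'NR==1 {n=NF} NF!=n {bad=1} END {exit bad+0}' example_series.tsv \
  && echo "column count is consistent"
```

A check like this catches rows with missing or extra tabs before the file is passed to the build command.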

In the predict command shown earlier we specified the model version used for the prediction (-v 0). So far we only have a model in the development folder (version 0). This version will be overwritten every time we develop a new model for this endpoint. Let's imagine that we are very satisfied with our model and want to store it for future use. We can obtain a persistent copy of it with the command

flame -c manage -a publish -e MyModel

This will create model version 1. We can list the existing versions for a given endpoint using the list command shown below

flame -c manage -e MyModel -a list

Now, the output says we have a published version of model MyModel.

Imagine that the model is so good you want to send it elsewhere, for example a company that wants to obtain predictions for confidential compounds in their own computing facilities. The model can be exported using the command

flame -c manage -a export -e MyModel

This creates a very compact file with the extension .tgz in the local directory. It can be sent by e-mail or uploaded to a repository in the cloud from where the company can download it. In order to use it, the company can easily install the new model using the command

flame -c manage -a import -f MyModel.tgz

The model is then immediately operative and able to produce exactly the same predictions we obtain in the development environment.

To test the similarity search capabilities of Flame create a new chemical space:

flame -c manage -a new -s MySpace

This creates a new entry in the spaces repository and the development version of the chemical space, populating these entries with default options.

Now provide the collection of compounds to include in the chemical space as a SDFile and set up the parameters (e.g. the molecular descriptors used to characterize it) using a delta file, as described above for the models.

You can obtain the current parameters by using the command:

flame -c manage -s MySpace -a parameters > delta.txt

The file delta.txt can be edited and then the new parameters can be applied by making reference to the edited file in the sbuild command, as follows:

flame -c sbuild -s MySpace -f series.sdf -p delta.txt

Once built, this chemical space can be used to efficiently search for compounds similar to one or more query compounds.

flame -c search -s MySpace -v 0 -f query.sdf -p similarity.yaml

The file query.sdf can contain the chemical structure of one or many compounds. The file similarity.yaml must define the metric used for the search, the distance cutoff and the maximum number of similars to extract per query compound. The last two fields can be left empty to avoid applying these limits.

Flame commands

| Command | Description |
| --- | --- |
| -c/ --command | Action to be performed. Acceptable values are build, predict, profile, sbuild, search and manage |
| -e/ --endpoint | Name of the model which will be used by the command. This name is defined when the model is created for the first time with the command -c manage -a new |
| -s/ --space | Name of the chemical space which will be used by the command. This name is defined when the chemical space is created for the first time with the command -c manage -a new |
| -v/ --version | Version of the model, typically an integer. Version 0 refers to the model development "sandbox" which is created automatically upon model creation |
| -a/ --action | Management action to be carried out. Acceptable values are list, new, kill, publish, remove, export and import. The meaning of these actions and examples of use are provided below |
| -f/ --infile | Name of the input file used by the command. This file can correspond to the training data (build) or the query compounds (predict) |
| -m/ --multi | Name of a yaml file with endpoint names and versions for multiple predictions, used by the profile command |
| -p/ --parameters | Name of an input file used to pass a set of parameters used to train a model (build) or to perform a similarity search (search) |
| -t/ --documentation_file | Name of an input file used to pass documentation information using the command (manage -a documentation) |
| -inc/ --incremental | Indicates that the input file must not replace the existing training series; instead, its compounds are added to it |
| -h/ --help | Shows a help message on the screen |

Management commands deserve further description:

Management commands

| Command | Example | Description |
| --- | --- | --- |
| new | flame -c manage -a new -e NEWMODEL | Creates a new entry in the model repository named NEWMODEL |
| kill | flame -c manage -a kill -e NEWMODEL | Removes NEWMODEL from the model repository. Use with extreme care, since the program will not ask for confirmation and the removal is permanent and irreversible |
| publish | flame -c manage -a publish -e NEWMODEL | Clones the development version, creating a new version in the model repository. Versions are assigned sequential numbers |
| remove | flame -c manage -a remove -e NEWMODEL -v 2 | Removes the version specified from the NEWMODEL model repository |
| list | flame -c manage -a list | Lists the models present in the repository and the published versions for each one. If the name of a model is provided, lists only the published versions for this model |
| info | flame -c manage -e MODEL -a info | Shows summary information about the characteristics of model MODEL |
| parameters | flame -c manage -e MODEL -a parameters | Shows a list of the main modeling parameters used by build to generate model MODEL |
| series | flame -c manage -e MODEL -a series | Downloads a copy of the training series used to build MODEL as "training_series.sdf" |
| documentation | flame -c manage -e MODEL -a documentation | Shows a list with the main documentation information of model MODEL. When called with parameter -t, it can be used to add new documentation information |
| export | flame -c manage -a export -e NEWMODEL | Exports the model entry NEWMODEL, creating a compressed tar file NEWMODEL.tgz which contains all the versions. This file can be imported by another Flame instance (installed in a different host or company) with the -c manage -a import command |
| import | flame -c manage -a import -f NEWMODEL.tgz | Imports file NEWMODEL.tgz, typically generated using the command -c manage -a export, creating model NEWMODEL in the local model repository |
| refresh | flame -c manage -a refresh -e MODEL | Rebuilds all model versions within the MODEL tree. If called without the -e argument it will rebuild all models present in the current repository |

Flame GUI

You can install Flame_API (https://github.com/phi-grib/flame_API) to access most of the functionalities using a simple web application.

Please refer to the manual page of Flame_API for further information

Technical details

Using Flame

Flame was designed to be used in different ways, using diverse interfaces. For example:

  • Using a web GUI
  • Using the flame.py command described above
  • As a Python package, making direct calls to the high-level objects predict, build or manage
  • As a Python package, making calls to the lower level objects idata, apply, learn, odata

Developing models

Typically, Flame models are developed by modeling engineers. This task requires importing an appropriate training series and defining the model building workflow.

Model building can be easily customized with the Flame modeling GUI or by modifying the parameters defined in a command file (called parameters.yaml) by passing a file with the new parameter values at building time (using parameter -p/--parameters, as described above). Then, the model can be built using the flame.py build command, and its quality can be assessed in an iterative process which is repeated until optimum results are obtained. This task can also be carried out by making calls to the objects mentioned above from an interactive Python environment, like a Jupyter notebook. Full documentation of the library can be obtained by running Doxygen in the root directory.

Advanced users can customize the models by editing the objects idata_child, apply_child, learn_child and odata_child present in the model/dev folder. These empty objects are children of the corresponding objects called by Flame, and it is possible to override any of the parents' methods simply by copying them into the children's code files and editing them there.

Models can be published to obtain persistent versions, usable for prediction in the same environment, or exported for use in external production environments, as described above.

Incremental re-training of existing models

An existing model can be re-built with the option -inc (or --incremental), which adds the compounds present in the input file to the existing training series instead of replacing it.

For example, imagine 'MyModel' is a model generated using a series of 1000 compounds and 'series.sdf' contains a collection of 500 additional compounds

flame -c build -e MyModel -f series.sdf -inc

This command will add all the compounds present in the file 'series.sdf' at the end of the existing training series, thus generating a new model with 1500 compounds.

No check for duplicate molecules, or any other test, is carried out in this process.
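Since the incremental build performs no checks, it can be worth confirming the size of each series before and after re-training. The following is a minimal sketch, assuming the standard SDFile convention that every record ends with a line starting with $$$$; count_mols is a hypothetical helper, not part of Flame, and demo_count.sdf is a stand-in file created only to exercise it.

```shell
# count_mols: hypothetical helper; counts the records in an SDFile by
# counting the "$$$$" delimiter lines that terminate each molecule.
count_mols() {
  grep -c '^\$\$\$\$' "$1"
}

# Tiny stand-in file with two (empty) records, just to try the helper.
printf 'mol1\n$$$$\nmol2\n$$$$\n' > demo_count.sdf
count_mols demo_count.sdf   # prints 2

# With Flame installed, the same helper could be used like this:
# flame -c manage -e MyModel -a series   # writes training_series.sdf
# count_mols training_series.sdf         # compounds already in the model
# count_mols series.sdf                  # compounds about to be added
```

In the example from the text, the two counts would be 1000 and 500, and the training series obtained after the incremental build would contain 1500 records.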

Refreshing models

Models built with very old versions of Flame can be updated using the refresh command. This command uses the existing training series and modeling parameters to completely rebuild the models. It is possible to refresh a single model version

flame -c manage -a refresh -e MyModel -v 1

all versions of a single model

flame -c manage -a refresh -e MyModel

or all models present in the current model repository

flame -c manage -a refresh

Please note that rebuilding a large number of models can be a very long and computationally intensive process.

Documenting models

When a new model is created, an empty documentation template is generated in the model folder. All the fields related to the modeling methodology and the model quality evaluation are automatically completed when the model is built, but other fields (e.g. author name, institution, mechanism, result interpretation) require manual user input. The mechanism for completing this template is similar to the method used to complete the parameters. The command

flame -c manage -e MyModel -a documentation > MyModel-documentation.yaml

can be used to dump the half-filled template to a plain text file, which can be edited to complete the missing information. Then, once completed, we can use the command

flame -c manage -e MyModel -a documentation -t MyModel-documentation.yaml

to load the completed information into the internal model documentation template.

The model documentation can be shown on the screen using the first command without output redirection

flame -c manage -e MyModel -a documentation

Internally, the documentation mimics the fields of the OECD-QMRF reports.

Predicting using models

Models built in Flame can be used for obtaining predictions using diverse methods. We can use the command mode interface with a simple call:

flame -c predict -e MyModel -v 1 -f query.sdf

This makes it possible to integrate the prediction in scripts or workflow tools like KNIME and Pipeline Pilot.

It is also possible to run multiple predictions on a single input file. For this, we must write a YAML file indicating which models and versions must be run. For example, the following file MULTI.YAML indicates that the profiling will use the models PXR, version 2, and BBB, version 3

	endpoints   : ['PXR','BBB']
	versions    : [2,3]

and then run the profile with the following command:

flame -c profile -m MULTI.YAML -f query.sdf
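The endpoints and versions lists in this file are matched by position (PXR with 2, BBB with 3 in the example above), so it can be convenient to generate the file from name:version pairs. The following is a sketch of that idea; the pairs variable and its name:version syntax are conventions invented for this example, not something Flame reads.

```shell
# Hypothetical input: one "endpoint:version" pair per model to profile.
pairs="PXR:2 BBB:3"

endpoints=""
versions=""
for p in $pairs; do
  name=${p%%:*}    # text before the colon
  ver=${p##*:}     # text after the colon
  endpoints="${endpoints:+$endpoints,}'$name'"
  versions="${versions:+$versions,}$ver"
done

# Write the two lists in the layout shown above.
printf 'endpoints   : [%s]\nversions    : [%s]\n' "$endpoints" "$versions" > MULTI.YAML
cat MULTI.YAML
```

Building both lists in the same loop keeps each endpoint aligned with its version, which is easy to get wrong when the file is edited by hand.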

Also, the models can run as prediction web-services. These services can be consumed by the stand-alone web GUI provided and described above or connected to a more complex platform, like the one currently in development in the eTRANSAFE project.

Licensing

Flame was produced at the PharmacoInformatics lab (http://phi.upf.edu), in the framework of the eTRANSAFE project (http://etransafe.eu). eTRANSAFE has received support from IMI2 Joint Undertaking under Grant Agreement No. 777365. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and the European Federation of Pharmaceutical Industries and Associations (EFPIA).


Copyright 2020 Manuel Pastor (manuel.pastor@upf.edu)

Flame is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation version 3.

Flame is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Flame. If not, see http://www.gnu.org/licenses/.