lszeremeta committed Aug 24, 2020

[SDF](https://pubs.acs.org/doi/abs/10.1021/ci00007a012) parser written in Java that runs from the command-line interface (CLI). SDFEater not only ~~eats~~ parses your SDF files but can also add additional data to the output.

## Publications and resources

If you need more detailed information, take a look at these publications and resources. There you will find a detailed description of the parser, performance tests and example Cypher outputs.

1. Ł. Szeremeta, "SDFEater: A Parser for Chemoinformatics Formats"
… Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 15. …
4. D. Tomaszuk, “chemskos”. figshare, 29-Aug-2018 [Online]. Available: https://doi.org/10.6084/m9.figshare.7022144.

## How to start?

Simply download one of the [ready-to-use JAR files](https://github.com/lszeremeta/SDFEater/releases) from the project releases. You can also [clone this repository](https://help.github.com/articles/cloning-a-repository/) and build the project yourself.

### Build the project yourself

1. Clone this repository:

```shell
git clone https://github.com/lszeremeta/SDFEater.git
```

2. Build SDFEater using [Apache Maven](https://maven.apache.org/):

```shell
cd SDFEater
mvn clean package
```

Built JAR files can be found in the _target_ directory.
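To check that the build worked, you can locate the assembled JAR and run it without parameters, which displays the help. The following is a minimal POSIX shell sketch; the helper name `find_sdfeater_jar` is hypothetical, and it assumes the built file follows the `SDFEater-<version>-jar-with-dependencies.jar` naming used in the example usage section.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first SDFEater fat JAR in a directory.
# Assumption: the file name matches SDFEater-<version>-jar-with-dependencies.jar;
# adjust the glob if your build produces a different name.
find_sdfeater_jar() {
  for jar in "${1:-target}"/SDFEater-*-jar-with-dependencies.jar; do
    # When nothing matches, the loop sees the literal pattern; skip it.
    [ -e "$jar" ] || continue
    printf '%s\n' "$jar"
    return 0
  done
  echo "No SDFEater JAR found in ${1:-target}; run 'mvn clean package' first." >&2
  return 1
}

# Running SDFEater without parameters displays help.
if JAR=$(find_sdfeater_jar); then
  java -jar "$JAR"
fi
```

Run it from the _SDFEater_ directory after `mvn clean package`; if no matching JAR exists, it only prints a hint to standard error.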

## Example usage

```shell
java -jar SDFEater-version-jar-with-dependencies.jar -i ../examples/chebi_special_char_test.sdf -f cypher -up
```

The example above reads the SDF input file, adds periodic table data for the atoms, tries to replace chemical database IDs with URLs and produces a [Cypher](https://neo4j.com/developer/cypher-query-language/) file as output.
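To process several files at once, the same command can be wrapped in a small loop. This is a hypothetical sketch, not part of SDFEater: it assumes SDFEater writes the converted data to standard output (so it can be redirected to a file) and that the `JAR` variable points to your JAR file; `cypher_path` and `convert_dir_to_cypher` are illustrative names.

```shell
#!/bin/sh
# Hypothetical batch helper: converts every SDF file in a directory to Cypher,
# using the same options as the example above (-f cypher -up).
# Assumptions: SDFEater writes to standard output, and JAR points to your JAR
# (the default value below is only a placeholder).
JAR=${JAR:-SDFEater-version-jar-with-dependencies.jar}

# Derive the output path: examples/foo.sdf -> examples/foo.cypher
cypher_path() {
  printf '%s.cypher\n' "${1%.sdf}"
}

convert_dir_to_cypher() {
  for sdf in "${1:-../examples}"/*.sdf; do
    [ -e "$sdf" ] || continue   # skip the literal pattern when nothing matches
    java -jar "$JAR" -i "$sdf" -f cypher -up > "$(cypher_path "$sdf")"
  done
}
```

Set `JAR` to your file first (for example `JAR=target/SDFEater-version-jar-with-dependencies.jar`), then call `convert_dir_to_cypher ../examples`.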


## CLI options

Running SDFEater without parameters displays help.

* `-i,--input <arg>` - input SDF file path (required)
* `-u,--urls` - try to generate full database URLs instead of IDs (enabled in `cvme`)

## Output formats

You can specify the output format using `-f,--format`. Available output formats:

* `cypher` - [Cypher](https://neo4j.com/developer/cypher-query-language/) statements for molecules, atoms, bonds and relations, ready to [import into the Neo4j graph database](https://neo4j.com/developer/kb/export-sub-graph-to-cypher-and-import/),
* `cvme` - [CVME](http://cs.aalto.fi/en/current/events/2017-09-22-002/) file format based on SKOS,
* `smiles` - plain text SMILES (if available in the molecule property)
* `microdata` - Simple HTML with [Microdata](https://www.w3.org/TR/microdata/) (based on the [MolecularEntity](https://bioschemas.org/types/MolecularEntity/) type)
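Because the format is selected with `-f,--format`, producing several formats from one input means repeating the call once per format. This is another hypothetical sketch under the same assumptions (SDFEater writes to standard output, `JAR` points to your JAR file); the `<name>.<format>.out` naming and the helper names are only illustrations.

```shell
#!/bin/sh
# Hypothetical helper: converts one SDF file into several output formats,
# writing one file per format.
# Assumptions: SDFEater writes to standard output, and JAR points to your JAR.
JAR=${JAR:-SDFEater-version-jar-with-dependencies.jar}

# examples/foo.sdf + smiles -> examples/foo.smiles.out
format_output_path() {
  printf '%s.%s.out\n' "${1%.sdf}" "$2"
}

convert_to_formats() {
  input=$1
  shift
  for format in "$@"; do
    java -jar "$JAR" -i "$input" -f "$format" > "$(format_output_path "$input" "$format")"
  done
}
```

For example: `convert_to_formats ../examples/chebi_special_char_test.sdf cypher cvme smiles microdata`.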

## Used open source projects

* [Apache Commons CLI](https://github.com/apache/commons-cli) as the CLI controller ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)),
* [Gson](https://github.com/google/gson) as the periodic table JSON parser ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)),
* [periodic-table](https://github.com/andrejewski/periodic-table) as the base JSON periodic table file ([ISC License](https://choosealicense.com/licenses/isc/)),
* [Apache Jena](https://jena.apache.org/) for some output formats ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)),
* [Apache Commons Text](https://commons.apache.org/proper/commons-text/) for HTML escaping in the RDFa and Microdata formats ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)).

The sample SDF files in the _examples_ directory are based on data from the [ChEBI](https://www.ebi.ac.uk/chebi/init.do) ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)) and [DrugBank](https://www.drugbank.ca/releases/latest#open-data) open structures ([CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)) databases.

## Contribution

Would you like to improve SDFEater? Great! We welcome your help and suggestions. If you are new to open source contributions, read [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/).

## License

Distributed under the [MIT license](https://github.com/lszeremeta/chebi-sdf-parser/blob/master/LICENSE.txt).
