
SDFEater 2.0.0: A journey into deep waters ⛵

Released by github-actions on 21 Apr, 23:04.

The new and better SDFEater 2.0.0 is now available! Below you will find an overview of the most important changes since the last release.

🚀 New features

In the new version of SDFEater, you will find the following new features. You must try them out!

Subject type selector

You can now select the preferred subject type for all output formats except cypher, cvme, smiles, and inchi, using the -s, --subject option. Supported subject types are iri, uuid, and bnode. If you are not sure which one to choose, keep the default subject type (iri) and omit -s, --subject entirely.
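To illustrate what the three subject types mean, here is a minimal Java sketch. It is not SDFEater's actual code: the class names and the way the iri subject is formed (base IRI followed by the molecule's index) are assumptions made for illustration. Here uuid is rendered as a urn:uuid: IRI and bnode as a document-local blank node label, in N-Triples style.

```java
import java.util.Locale;

// Hypothetical sketch of the three subject types; not SDFEater's implementation.
enum SubjectType {
    IRI, UUID, BNODE;

    // Parses a value such as "iri", "uuid", or "bnode" (case-insensitive).
    static SubjectType fromOption(String value) {
        return SubjectType.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

final class SubjectFactory {
    private SubjectFactory() {}

    // Returns an N-Triples-style subject term for the molecule with the given index.
    static String subjectFor(SubjectType type, String base, int index) {
        switch (type) {
            case IRI:
                // Assumption: the subject IRI is the base followed by the molecule index.
                return "<" + base + index + ">";
            case UUID:
                // A random UUID wrapped in a urn:uuid: IRI; the base is not used.
                return "<urn:uuid:" + java.util.UUID.randomUUID() + ">";
            case BNODE:
                // A blank node label that is only meaningful inside one output document.
                return "_:b" + index;
            default:
                throw new IllegalArgumentException("Unsupported subject type: " + type);
        }
    }
}
```

For example, subjectFor(SubjectType.fromOption("iri"), "https://example.com/molecule#entity", 1) yields <https://example.com/molecule#entity1> under the stated assumption.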

Base subject selector

You can also set your own base IRI for the iri subject type. To override the default base ('https://example.com/molecule#entity'), use the -b, --base option. If the base IRI contains #, an additional id attribute is added to the HTML-based output formats.
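A minimal sketch of how an id attribute might be derived from a base IRI that contains #. It assumes, hypothetically, that the subject IRI is the base followed by the molecule index and that the id is the fragment part after #. The BaseIri class and the itemid-based div are illustrative only; a real implementation would also HTML-escape both values before writing them into attributes.

```java
// Hypothetical sketch: deriving an HTML id from a base IRI with a fragment.
final class BaseIri {
    static final String DEFAULT_BASE = "https://example.com/molecule#entity";

    private BaseIri() {}

    // Returns the fragment-derived id for a molecule, or null if the base has no '#'.
    static String htmlId(String base, int index) {
        int hash = base.indexOf('#');
        if (hash < 0) {
            return null; // no fragment, so no id attribute is emitted
        }
        return base.substring(hash + 1) + index;
    }

    // Renders the opening tag of a molecule element (Microdata-style, for illustration).
    // Note: values are not escaped here; real HTML output should escape them.
    static String openingDiv(String base, int index) {
        String id = htmlId(base, index);
        String subject = base + index;
        StringBuilder sb = new StringBuilder("<div");
        if (id != null) {
            sb.append(" id=\"").append(id).append('"');
        }
        sb.append(" itemid=\"").append(subject).append("\">");
        return sb.toString();
    }
}
```

With the default base, the second molecule would get id="entity2", which also lets the subject IRI https://example.com/molecule#entity2 resolve to that element within the page.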

Dataset type support

Support for the schema.org Dataset type has been added to the JSON-LD HTML, JSON-LD, RDFa, and Microdata formats. This can make the generated datasets more visible to search engines, e.g. in Google Dataset Search. Read more on the Google Developers page.
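For reference, a schema.org Dataset is commonly embedded as a JSON-LD node. The following Java sketch builds such a node by hand. The hasPart layout and the DatasetJsonLd class are hypothetical and do not claim to match SDFEater's actual output; Google's Dataset guidelines expect at least a name and a description.

```java
import java.util.List;

// Hypothetical sketch of a schema.org Dataset node in JSON-LD.
final class DatasetJsonLd {
    private DatasetJsonLd() {}

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Wraps molecule IRIs in a Dataset node (assumed layout: one hasPart entry per molecule).
    static String build(String name, String description, List<String> moleculeIris) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"@context\":\"https://schema.org\",\"@type\":\"Dataset\",");
        sb.append("\"name\":").append(jsonString(name)).append(',');
        sb.append("\"description\":").append(jsonString(description)).append(',');
        sb.append("\"hasPart\":[");
        for (int i = 0; i < moleculeIris.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"@id\":").append(jsonString(moleculeIris.get(i))).append('}');
        }
        return sb.append("]}").toString();
    }
}
```

In practice a JSON library would handle escaping; the hand-rolled version only keeps the sketch dependency-free.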

📈 Improvements

The new version brings some improvements. Below you will find the most important of them.

Java 8+ JARs

The project's baseline Java version has been lowered. Previously, the published JAR files were built with Java 11 and required Java 11 or later to run. Now Java 8 or later is enough to run SDFEater from the JAR files.

New supported SDF keys

Newly supported SDF keys are ChEBI ID, DATABASE_ID, DRUGBANK_ID, and ChEBI Name. A dedicated Wiki page lists all currently supported SDF keys.
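In an SDF record, data items follow the molecule block: a header line such as > <ChEBI ID>, then the value, then a blank line, with $$$$ ending the record. The sketch below is a hypothetical helper, not SDFEater's parser. It extracts only the keys it is told about, here the four keys added in this release.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: extracting known data items from one SDF record.
final class SdfFields {
    // Only the keys newly supported in 2.0.0; the full list is on the project Wiki.
    static final Set<String> NEW_KEYS = new HashSet<>(Arrays.asList(
            "ChEBI ID", "DATABASE_ID", "DRUGBANK_ID", "ChEBI Name"));

    private SdfFields() {}

    static Map<String, String> extract(String sdfRecord, Set<String> knownKeys) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] lines = sdfRecord.split("\\r?\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.startsWith("$$$$")) {
                break; // end of record
            }
            if (!line.startsWith(">")) {
                continue; // molecule block or value line, not a data header
            }
            int open = line.indexOf('<');
            int close = line.indexOf('>', open + 1);
            if (open < 0 || close < 0) {
                continue; // malformed header, skip it
            }
            String key = line.substring(open + 1, close);
            // The value spans the following lines up to the next blank line.
            StringBuilder value = new StringBuilder();
            while (i + 1 < lines.length && !lines[i + 1].trim().isEmpty()) {
                if (value.length() > 0) {
                    value.append('\n');
                }
                value.append(lines[++i]);
            }
            if (knownKeys.contains(key)) {
                fields.put(key, value.toString());
            }
        }
        return fields;
    }
}
```

Unknown keys are skipped rather than reported, which mirrors the idea that only supported keys contribute to the output.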

Better error handling

Error handling has been improved in this release. For example, when you select an unsupported format, you get a clear message and the help is displayed. SDFEater also returns appropriate exit codes (e.g. status 2 on a parse error), so that the operating system and calling scripts can detect errors when they occur.
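A minimal sketch of this pattern: validate the requested format, print a message and help on failure, and return a non-zero status that the shell can inspect. Exit status 2 for parse errors comes from commit d201839 in the list below. Treating an unsupported format as status 2, the ExitCodes class, and the hard-coded format set are assumptions for illustration, not SDFEater's real command-line handling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of format validation with exit codes.
final class ExitCodes {
    static final int OK = 0;
    static final int PARSE_ERROR = 2; // status 2 on parse errors (commit d201839)

    private ExitCodes() {}

    // Returns the exit status for the requested format; prints help on failure.
    static int validateFormat(String format, Set<String> supported) {
        if (format == null || !supported.contains(format)) {
            System.err.println("Unsupported format: " + format);
            printHelp(supported);
            return PARSE_ERROR; // assumption: unsupported format counts as a parse error
        }
        return OK;
    }

    static void printHelp(Set<String> supported) {
        System.err.println("Supported formats: " + String.join(", ", new TreeSet<>(supported)));
    }

    public static void main(String[] args) {
        // Illustrative subset of formats; the real list is longer.
        Set<String> supported = new HashSet<>(Arrays.asList("cypher", "cypheru", "cypherp", "cypherup"));
        String format = args.length > 0 ? args[0] : null;
        // A non-zero status lets shells and scripts detect the failure (e.g. via $?).
        System.exit(validateFormat(format, supported));
    }
}
```

Keeping validation in a method that returns the status, and calling System.exit only in main, makes the logic testable without terminating the JVM.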

Output format improvements

The JSON-LD and JSON-LD in HTML outputs have been completely rewritten. As was already the case for RDFa and Microdata, these formats now omit any molecule for which no keys were matched. This release also improves the escaping of special characters in HTML outputs.
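Both behaviours can be sketched in a few lines of Java. Here htmlEscape is a hypothetical stand-in for SDFEater's own helper (its exact character set is not documented in these notes), and molecules are modelled, as an assumption, as maps from matched SDF keys to values, so a molecule with no matched keys is an empty map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: HTML escaping and skipping molecules without matched keys.
final class HtmlOutput {
    private HtmlOutput() {}

    // Escapes the five characters that are special in HTML text and attribute values.
    static String htmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Keeps only molecules with at least one matched key; empty ones produce no output.
    static List<Map<String, String>> nonEmpty(List<Map<String, String>> molecules) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> molecule : molecules) {
            if (!molecule.isEmpty()) {
                kept.add(molecule);
            }
        }
        return kept;
    }
}
```

Escaping '&' first is not required here because each character is handled exactly once, so already-produced entities are never escaped twice.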

Better documentation

This release adds a Quick start section to the README, which should make it easier to get started with SDFEater. Additionally, some README content, such as the manual build instructions, has been moved to the project Wiki, where you can find more useful information.

💔 Breaking changes

These changes are not backward compatible.

Simplified set of options

The -p, --periodic and -u, --urls options were mainly used with the cypher output format. To keep things simple, these options have been removed from SDFEater. Instead of these switches, there are now 4 separate output formats (cypher, cypheru, cypherp, and cypherup):

  • cypher - Cypher statements for the molecule, its atoms, bonds, and relations, ready to import into the Neo4j graph database,
  • cypheru - the same as cypher, but tries to generate full database URLs instead of IDs,
  • cypherp - the same as cypher, but adds extra atom data from the periodic table,
  • cypherup - the same as cypher, but with both full database URLs and extra atom data from the periodic table.
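The mapping from the removed switches to the new format names can be expressed as a small enum. This is a hypothetical sketch, not SDFEater's Format enum: an old cypher invocation with both -u, --urls and -p, --periodic corresponds to the new cypherup format, and so on. Consistent with commit 1d45446, periodic table data is needed only for cypherp and cypherup.

```java
import java.util.Locale;

// Hypothetical sketch: how the old -u/--urls and -p/--periodic switches map to the new formats.
enum CypherFormat {
    CYPHER(false, false),
    CYPHERU(true, false),
    CYPHERP(false, true),
    CYPHERUP(true, true);

    final boolean urls;     // try to generate full database URLs instead of IDs
    final boolean periodic; // add atom data from the periodic table

    CypherFormat(boolean urls, boolean periodic) {
        this.urls = urls;
        this.periodic = periodic;
    }

    // Lowercase name to pass as the output format, e.g. "cypherup".
    String formatName() {
        return name().toLowerCase(Locale.ROOT);
    }

    // New format equivalent to an old cypher invocation with the given switches.
    static CypherFormat fromOldSwitches(boolean urls, boolean periodic) {
        for (CypherFormat f : values()) {
            if (f.urls == urls && f.periodic == periodic) {
                return f;
            }
        }
        throw new AssertionError("unreachable: all four combinations are covered");
    }

    // Periodic table data only needs to be loaded for cypherp and cypherup.
    boolean needsPeriodicTable() {
        return periodic;
    }
}
```

Encoding the two former switches as fields keeps the four formats and their meaning in one place, so each combination has exactly one name.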

🕶️ Changes under the hood

The changes under the hood don't affect you directly, but you might find them interesting.

Rewritten GitHub Actions workflows

Workflows in GitHub Actions have been rewritten for better clarity and understanding.

Below are just some of the additional changes:

  • SDFEater is now built and tested on 3 Java versions (8, 11, and 16) on Ubuntu, macOS, and Windows, instead of on a single Java version,
  • GitHub Actions now builds the container images and pushes them to Docker Hub, instead of having them built on Docker Hub,
  • Automated builds and tests now run not only on the master branch but also for pull requests and other branches,
  • Maven and Docker builds use caching,
  • Workflows wait for concurrent jobs when needed.

If you are interested, you can see the workflows for the SDFEater project.

Code simplification

SDFEater was originally intended to support only one output format. Over time, it has grown to support many more, and today SDFEater is a powerful tool for converting SDF files to various formats. A rewrite of the code has long been planned to make new formats easier to add and the code more readable. This release is a small step in that direction.

This changelog contains only the most significant changes. Below you will find a list of all commits since the last release.

Commits

  • [36ff794]: Downgrade project to Java 8 for better Java compatibility (Łukasz Szeremeta)
  • [2912686]: Add build and test on multiple Java version on Ubuntu/MacOS/Windows (Łukasz Szeremeta)
  • [1019c6a]: Java 15 -> Java 16 in maven.yml (Łukasz Szeremeta)
  • [d201839]: Exit with status 2 on parse error (Łukasz Szeremeta)
  • [559edc4]: Add note about supported Java versions (Łukasz Szeremeta)
  • [d8f2a04]: Update dependencies (Łukasz Szeremeta)
  • [23e2612]: Simplify Cypher options set (Łukasz Szeremeta)
  • [3b33dad]: Change subject base URI for RDFa and Microdata formats (Łukasz Szeremeta)
  • [34c4f39]: Add id for molecule div tag (Łukasz Szeremeta)
  • [9e76ac9]: baseURI for all formats, Format enum (Łukasz Szeremeta)
  • [3887081]: Simplify File.parse method (Łukasz Szeremeta)
  • [33ab519]: Update README (Łukasz Szeremeta)
  • [00ed6fb]: Add subject type option (Łukasz Szeremeta)
  • [d249ad8]: Simplify options set and options parse (Łukasz Szeremeta)
  • [1d45446]: Load periodic table data only for cypherp or cypherup formats (Łukasz Szeremeta)
  • [5a0253b]: Use equals instead of == (Łukasz Szeremeta)
  • [505d424]: Add Dataset schema, JSON-LD rewrite, code cleanup (Łukasz Szeremeta)
  • [bd879ea]: http -> https (Łukasz Szeremeta)
  • [82b498a]: jsonldhtml output improvements (Łukasz Szeremeta)
  • [184af4c]: JSON-LD output fix (Łukasz Szeremeta)
  • [f8b9498]: Improve JSON-LD outputs for empty files (Łukasz Szeremeta)
  • [d0ad287]: Add new line after jsonld/jsonldhtml output (Łukasz Szeremeta)
  • [13ac9f0]: https -> http for Google Rich Results Test (Łukasz Szeremeta)
  • [2d3dbf0]: MolecularEntitly type -> profile with version (Łukasz Szeremeta)
  • [bcb69d5]: MolecularEntity typo fix in README (Łukasz Szeremeta)
  • [f62b083]: Add --base option (Łukasz Szeremeta)
  • [12de21d]: Use own htmlEscape, escape subjectBase for html (Łukasz Szeremeta)
  • [80573f1]: Disable HTML escaping for JSON-LD, rewrite htmlEscape (Łukasz Szeremeta)
  • [48db0e9]: Don't create molecule if no data inside it (JSON-LD) (Łukasz Szeremeta)
  • [610d0ca]: Improve README (Łukasz Szeremeta)
  • [0112757]: Add schema:url support (Łukasz Szeremeta)
  • [353c648]: Add "ChEBI Name" key support (Łukasz Szeremeta)
  • [a6d18d7]: Move manual build instructions to the Wiki (Łukasz Szeremeta)
  • [6bf7fa3]: Add Quick start (Łukasz Szeremeta)
  • [60102c5]: Add info about DrugBank SDF convert (Łukasz Szeremeta)
  • [8e0f71a]: Examples -> Additional examples (Łukasz Szeremeta)
  • [ee6786d]: README improvements (Łukasz Szeremeta)
  • [4986884]: Change ftp link to http link (Łukasz Szeremeta)
  • [b01263b]: Update README.md (Łukasz Szeremeta)
  • [65f2e27]: Add link to supported keys Wiki (Łukasz Szeremeta)
  • [7fdaba4]: Improve README (Łukasz Szeremeta)
  • [6214314]: Update README.md (Łukasz Szeremeta)
  • [2eb53c0]: Update README.md (Łukasz Szeremeta)
  • [be676c9]: Improve workflows (Łukasz Szeremeta)
  • [48e650c]: Turnstyle -> Wait for concurrent jobs (Łukasz Szeremeta)
  • [09484d0]: Cache local Maven repository (Łukasz Szeremeta)
  • [fc69245]: Add quotes for schema:temporal in JSON-LD outputs (Łukasz Szeremeta)
  • [4d293be]: Improve tests (Łukasz Szeremeta)
  • [b675822]: Change default licence to CC-BY 3.0 (Łukasz Szeremeta)
  • [c511826]: Bump to 2.0.0 (Łukasz Szeremeta)