Improved database/querying (+ additional changes). #14

Merged: 63 commits, Nov 12, 2019

svandenhoek
Collaborator

  • Repository split into 2 major directories: "app" & "database".
  • App now uses an optimized TDB.
  • SPARQL construct .rq scripts are included to create optimized TDB from full one.
  • Designed for DisGeNET v6, with phenotype-disease annotations from DisGeNET v5 (instead of purely v5).
  • Orphanet HOOM data is now also used to find relevant information.
  • Results now only use GDA scores of GDAs matched with relevant diseases instead of all diseases that matched to found genes.
  • Fixed multiple maven build warnings.
  • General code improvements (such as for the BiologicalEntityCollection).
  • Split README into separate ones (and adjusted them according to the changes).
  • Some currently unused code was commented out (code that could not be commented out without causing errors, e.g. because of existing unit tests, was left in place).
  • Adjustments to the output writer because the optimized TDB includes less information (some information is no longer stored/retrieved).
  • Added separate .rq files with SPARQL select queries which can be used with Apache Jena's tdbquery for testing/validation (optimized vs full query/TDB).

…tySets that were missing previously) to remove "unchecked cast" warning. Solution might be less efficient.
}

protected void setCompareValue(int compareValue) {
this.compareValue = requireNonNull(compareValue);
Contributor

compareValue is non-null because it is a primitive. The requireNonNull check always passes. Did you intend to check whether compareValue is zero?

Collaborator Author

It was simply a tendency to overvalidate things without considering it might have been unnecessary. Will fix.
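
The point raised in this thread can be illustrated with a minimal sketch. The class name `CompareValueDemo` and the methods `setWithNullCheck` and `setFromBoxed` are hypothetical and not part of the project; only `setCompareValue` and the `compareValue` field mirror the reviewed snippet. The sketch assumes the static import in the original refers to `java.util.Objects.requireNonNull`.

```java
import java.util.Objects;

// Hypothetical demo class; not part of the reviewed code base.
final class CompareValueDemo {
    private int compareValue;

    // Pattern from the reviewed diff: the int argument is autoboxed to an
    // Integer, which is never null, so the check can never fail.
    void setWithNullCheck(int compareValue) {
        this.compareValue = Objects.requireNonNull(compareValue);
    }

    // Equivalent setter without the redundant check.
    void setCompareValue(int compareValue) {
        this.compareValue = compareValue;
    }

    // A null check is only meaningful for a reference type such as Integer.
    void setFromBoxed(Integer compareValue) {
        this.compareValue = Objects.requireNonNull(compareValue);
    }

    int getCompareValue() {
        return compareValue;
    }

    public static void main(String[] args) {
        CompareValueDemo demo = new CompareValueDemo();
        demo.setWithNullCheck(0);   // passes, even for zero
        demo.setCompareValue(7);    // same effect, no boxing
        try {
            demo.setFromBoxed(null); // only this call can fail the check
        } catch (NullPointerException e) {
            System.out.println("null rejected for boxed Integer");
        }
    }
}
```

If zero or negative values were meant to be rejected, an explicit range check would be needed instead; the thread does not state that this was the intent.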


[jena_download]: https://jena.apache.org/download/index.cgi
[jena_configure]: https://jena.apache.org/documentation/tools/#setting-up-your-environment
[disgenet_rdf_v6_dump]: http://rdf.disgenet.org/download/v5.0.0/disgenetv5.0-rdf-v5.0.0-dump.tar.gz
Contributor

Points to v5 resources instead of v6.

Collaborator Author

Fixed.

## Creating optimized TDB

1. Create a directory to store optimized `.ttl` files in.
2. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/hpo.rq 1> /path/to/optimized/ttl/hpo.ttl`
Contributor

db_creation --> database/optimized_construct

Collaborator Author

Updated to vibe/database/sparql_queries/optimized_construct/.


1. Download the data.
2. Rename `owlapi.xrdf` to `owlapi.xml` (otherwise `tdbloader2` will give `org.apache.jena.riot.RiotException: Failed to determine the content type`)
3. Run `tdbloader2 --loc /path/to/initial/TDB /path/to/disgenet_v6/dump/*.ttl /path/to/disgenet_v5/pda.ttl /path/to/disgenet_v5/phenotype.ttl /path/to/disgenet_v5/void.ttl /path/to/sio-release.owl /path/to/owlapi.xml`
Contributor

Mention `tdbloader` for Windows users.

Collaborator Author

Special paragraph for Windows users was added.

4. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/gene.rq 1> /path/to/optimized/ttl/gene.ttl`
5. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/gda.rq 1> /path/to/optimized/ttl/gda.ttl`
6. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/source.rq 1> /path/to/optimized/ttl/source.ttl`
7. Run `tdbloader2 --loc /path/to/store/optimized/TDB /path/to/optimized/ttl/*.ttl /path/to/sio-release.owl`
Contributor

Add a note that when you receive the following error:

org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1

then the encoding of the `.ttl` files might be incorrect and needs to be changed to UTF-8 manually.

Collaborator Author

Added F.A.Q. with this information.
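
For readers hitting the `MalformedInputException` mentioned above, the following is a minimal sketch of one way to check whether a `.ttl` file is valid UTF-8 and, if not, write a re-encoded copy. It is not taken from the project or its F.A.Q.; the class name `TtlEncodingCheck`, the `.utf8.ttl` output suffix, and the default source charset (ISO-8859-1) are assumptions chosen for illustration. The actual source encoding of a given file has to be determined separately.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the project.
final class TtlEncodingCheck {

    // Returns true if the bytes decode as strict UTF-8
    // (malformed input is reported instead of silently replaced).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Re-encodes bytes from an assumed source charset to UTF-8.
    static byte[] toUtf8(byte[] bytes, Charset assumedSource) {
        return new String(bytes, assumedSource).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java TtlEncodingCheck <file.ttl> [source-charset]");
            return;
        }
        Path file = Paths.get(args[0]);
        // Assumption: fall back to ISO-8859-1 when no source charset is given.
        Charset source = args.length > 1 ? Charset.forName(args[1]) : StandardCharsets.ISO_8859_1;
        byte[] bytes = Files.readAllBytes(file);
        if (isValidUtf8(bytes)) {
            System.out.println(file + " is already valid UTF-8");
        } else {
            // Write a copy next to the original instead of overwriting it.
            Path out = file.resolveSibling(file.getFileName() + ".utf8.ttl");
            Files.write(out, toUtf8(bytes, source));
            System.out.println("wrote " + out + " (re-encoded from " + source + ")");
        }
    }
}
```

Writing to a sibling file keeps the original intact, so a wrong guess about the source charset can be corrected by rerunning with a different second argument.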

@@ -19,7 +21,7 @@
 /**
  * The HGNC (HUGO Gene Nomenclature Committee) name.
  */
-private String symbol;
+// private String symbol; // Currently unused.
Contributor

Then why not remove it?

Collaborator Author

I suggest leaving it in for now and doing a general code cleanup when:

  • A more definitive output writer is implemented.
  • This project is added to SonarCloud.

@@ -81,14 +81,16 @@ public void run() throws IOException {
BufferedWriter writer = getWriter();

// Writes header.
writer.write("gene" + getSeparator() + "diseases" + getSeparator() + "highest GDA score" +
getSeparator() + "DSI" + getSeparator() + "DPI");
// Currently dsi & dpi are not retrieved.
Contributor

Then why not remove it?

Collaborator Author

See response above.

Assert.assertEquals(collection.getByDisease(diseases[1]), null);
}

//TODO: More tests for basic java.util.Collection functionalities!
Contributor

I suggest creating an issue for your TODOs to prevent them from sticking around forever. Better: resolve the TODO now.

Collaborator Author

Forgot to remove the TODO. Methods that do more than simply perform the exact same action on the combinationsMap or combinationsMap.keySet() should now all have at least one unit test.

@dennishendriksen dennishendriksen merged commit 37b87e9 into molgenis:master Nov 12, 2019
@svandenhoek svandenhoek deleted the tdb_design_app branch December 10, 2019 16:13
2 participants