Improved database/querying (+ additional changes). #14
svandenhoek commented on Nov 5, 2019
- Repository split into 2 major directories: "app" & "database".
- App now uses an optimized TDB.
- SPARQL construct .rq scripts are included to create optimized TDB from full one.
- Designed with DisGeNET v6 in mind, with phenotype-disease annotations from DisGeNET v5 (instead of purely v5).
- Orphanet HOOM data is now also used to find relevant information.
- Results now only use GDA scores of GDAs matched with relevant diseases instead of all diseases that matched to found genes.
- Fixed multiple maven build warnings.
- General code improvements (such as for the BiologicalEntityCollection).
- Split README into separate ones (and made adjustments according to the changes).
- Some code was commented out as it is currently not used (code that would cause errors when commented out, e.g. due to existing unit tests, was left in place).
- Adjustments to outputwriter due to optimized TDB including less information (some information is not stored/retrieved anymore).
- Added separate .rq files with SPARQL select queries which can be used with Apache Jena's tdbquery for testing/validation (optimized vs full query/TDB).
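The change to GDA scoring listed above (only GDAs matched with relevant diseases contribute to a gene's score) can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the `Gda` class, the method name, and the identifiers and scores in `main` are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RelevantGdaScores {
    /** Hypothetical gene-disease association record; not a class from this repository. */
    static final class Gda {
        final String gene;
        final String disease;
        final double score;

        Gda(String gene, String disease, double score) {
            this.gene = gene;
            this.disease = disease;
            this.score = score;
        }
    }

    /**
     * Returns the highest GDA score per gene, looking only at GDAs whose
     * disease is in the set of relevant diseases. GDAs for other diseases
     * are ignored, even if they belong to a gene that was found.
     */
    static Map<String, Double> highestRelevantScorePerGene(List<Gda> gdas,
                                                           Set<String> relevantDiseases) {
        Map<String, Double> highest = new HashMap<>();
        for (Gda gda : gdas) {
            if (relevantDiseases.contains(gda.disease)) {
                // Keep the larger of the stored score and the current one.
                highest.merge(gda.gene, gda.score, Math::max);
            }
        }
        return highest;
    }

    public static void main(String[] args) {
        // Made-up identifiers and scores, purely for illustration.
        List<Gda> gdas = Arrays.asList(
                new Gda("geneA", "disease1", 0.3),
                new Gda("geneA", "disease2", 0.9),
                new Gda("geneB", "disease2", 0.5));
        Set<String> relevant = Collections.singleton("disease1");
        System.out.println(highestRelevantScorePerGene(gdas, relevant));
    }
}
```

In this sketch, `geneA` is scored by its `disease1` association only, and `geneB` drops out because none of its GDAs involve a relevant disease.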
…d TDB (for speed/equality output).
…ts to associated test classes).
… now uses custom TDB.
…ame results as original TDB/query.
…tySets that missing previously) to remove "unchecked cast" warning. Solution might be less efficient.
}

protected void setCompareValue(int compareValue) {
    this.compareValue = requireNonNull(compareValue);
compareValue is non-null because it is a primitive. The requireNonNull check always passes. Did you intend to check whether compareValue is zero?
It was simply a tendency to overvalidate things without considering it might have been unnecessary. Will fix.
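The point raised in this thread can be illustrated with a small sketch. The class and the `nullCheckCanFail` helper are hypothetical and exist only for demonstration; only the `setCompareValue(int)` signature is taken from the diff above. Passing an `int` to `Objects.requireNonNull` autoboxes it into a non-null `Integer`, so the check cannot throw.

```java
import java.util.Objects;

final class CompareValueExample {
    private int compareValue;

    /**
     * Demonstrates that a null check on a primitive cannot fail: the int is
     * autoboxed to a non-null Integer before requireNonNull ever sees it.
     */
    static boolean nullCheckCanFail(int value) {
        try {
            Objects.requireNonNull(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** One possible fix: a plain assignment without the redundant check. */
    void setCompareValue(int compareValue) {
        this.compareValue = compareValue;
    }

    int getCompareValue() {
        return compareValue;
    }

    public static void main(String[] args) {
        System.out.println(nullCheckCanFail(0));
        System.out.println(nullCheckCanFail(Integer.MIN_VALUE));
    }
}
```

If a particular value such as zero should be rejected, that would need an explicit range check instead; whether one is wanted here is not stated in the thread.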
database/README.md
Outdated
[jena_download]: https://jena.apache.org/download/index.cgi
[jena_configure]: https://jena.apache.org/documentation/tools/#setting-up-your-environment
[disgenet_rdf_v6_dump]: http://rdf.disgenet.org/download/v5.0.0/disgenetv5.0-rdf-v5.0.0-dump.tar.gz
Points to v5 resources instead of v6.
Fixed.
database/README.md
Outdated
## Creating optimized TDB

1. Create a directory to store optimized `.ttl` files in.
2. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/hpo.rq 1> /path/to/optimized/ttl/hpo.ttl`
db_creation --> database/optimized_construct
Updated to `vibe/database/sparql_queries/optimized_construct/`.
1. Download the data.
2. Rename `owlapi.xrdf` to `owlapi.xml` (otherwise `tdbloader2` will give `org.apache.jena.riot.RiotException: Failed to determine the content type`)
3. Run `tdbloader2 --loc /path/to/initial/TDB /path/to/disgenet_v6/dump/*.ttl /path/to/disgenet_v5/pda.ttl /path/to/disgenet_v5/phenotype.ttl /path/to/disgenet_v5/void.ttl /path/to/sio-release.owl /path/to/owlapi.xml`
tdbloader for Windows
Special paragraph for Windows users was added.
4. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/gene.rq 1> /path/to/optimized/ttl/gene.ttl`
5. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/gda.rq 1> /path/to/optimized/ttl/gda.ttl`
6. Run `tdbquery --loc=/path/to/initial/TDB/ --query=/path/to/vibe/db_creation/sparql_queries/source.rq 1> /path/to/optimized/ttl/source.ttl`
7. Run `tdbloader2 --loc /path/to/store/optimized/TDB /path/to/optimized/ttl/*.ttl /path/to/sio-release.owl`
Add a note that when you receive the following error:
org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
then the encoding of the ttls might be incorrect and needs to be changed to UTF-8 manually.
Added F.A.Q. with this information.
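For readers hitting the `MalformedInputException` mentioned above, re-encoding a `.ttl` file to UTF-8 could look roughly like the sketch below. It is not part of this pull request. The class name is hypothetical, and ISO-8859-1 as the original encoding is purely an assumption: the thread only says the encoding "might be incorrect", so the actual source charset has to be determined per file.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TtlReencoder {
    /**
     * Reads the source file using the given charset and writes the same
     * text to the target file encoded as UTF-8.
     */
    static void reencode(Path source, Path target, Charset sourceCharset) throws IOException {
        byte[] raw = Files.readAllBytes(source);
        String text = new String(raw, sourceCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TtlReencoder <input.ttl> <output.ttl>");
            return;
        }
        // Assumption: the input is ISO-8859-1. Adjust to the file's real encoding.
        reencode(Paths.get(args[0]), Paths.get(args[1]), StandardCharsets.ISO_8859_1);
    }
}
```

Writing to a separate output file keeps the original intact in case the assumed source encoding turns out to be wrong.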
@@ -19,7 +21,7 @@
/**
 * The HGNC (HUGO Gene Nomenclature Committee) name.
 */
private String symbol;
// private String symbol; // Currently unused.
Then why not remove it?
I suggest leaving it in for now and doing a general code cleanup when:
- A more definitive output writer is implemented.
- This project is added to SonarCloud.
@@ -81,14 +81,16 @@ public void run() throws IOException {
BufferedWriter writer = getWriter();

// Writes header.
writer.write("gene" + getSeparator() + "diseases" + getSeparator() + "highest GDA score" +
getSeparator() + "DSI" + getSeparator() + "DPI");
// Currently dsi & dpi are not retrieved.
Then why not remove it?
See response above.
Assert.assertEquals(collection.getByDisease(diseases[1]), null);
}

//TODO: More tests for basic java.util.Collection functionalities!
I suggest creating an issue for your TODOs to prevent them from sticking around forever. Better: resolve the TODO now.
Forgot to remove the TODO. Methods that are more complicated than simply doing the exact same action on the `combinationsMap` or `combinationsMap.keySet()` should now all have at least one unit test.