ld_dir(...) using graph URI in global.graph even when specified explicitly #638

Open
randykerber opened this issue Mar 23, 2017 · 1 comment

@randykerber

I'm passing a ld_dir( dir, filePattern, graphUri ) command into isql.

In the directory I am loading data from, there is a global.graph file with the default graph
URI ("http://rdf.ncats.nih.gov/opddr") for data files in that directory.

Even though I specify the third argument in the ld_dir() calls, Virtuoso ignores it and uses
the value from the 'global.graph' file instead.

This happens even for the file 'pubchem_pd2_assay.ttl.gz', which has its own
'pubchem_pd2_assay.ttl.gz.graph' file containing the correct graph URI.

The same behavior is seen both on Ubuntu and with Virtuoso running in Docker on macOS.

Below is some sample output from a bash shell:

Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.4.2.3217-pthreads as of Mar 17 2017
Compiled for Linux (x86_64-unknown-linux-gnu)
Copyright (C) 1998-2016 OpenLink Software


ubuntu$ ls -l /staging/ncats/data

-rw-rw-r-- 1 ubuntu ubuntu    30 Jan 14  2016 global.graph
-rw-rw-r-- 1 ubuntu ubuntu   848 Oct 13  2015 npcpd2_assay.ttl.gz
-rw-rw-r-- 1 ubuntu ubuntu  1034 Oct 13  2015 npcpd2_bao.ttl.gz
-rw-rw-r-- 1 ubuntu ubuntu 21460 Oct 13  2015 npcpd2_substance.ttl.gz
-rw-rw-r-- 1 ubuntu ubuntu  1442 Oct 13  2015 pubchem_pd2_assay.ttl.gz
-rw-rw-r-- 1 ubuntu ubuntu    38 Jan 14  2016 pubchem_pd2_assay.ttl.gz.graph

ubuntu$ cat /staging/ncats/data/global.graph
http://rdf.ncats.nih.gov/opddr

ubuntu$ cat /staging/ncats/data/pubchem_pd2_assay.ttl.gz.graph
http://rdf.ncats.nih.gov/opddr/pubchem

Before calling 'ld_dir()', there are no entries in the DB.DBA.load_list table for files in /staging/ncats/data:

ubuntu$ isql 1111 dba dba exec="select ll_file, ll_graph from DB.DBA.load_list where ll_file like '/staging/ncats%';"

Connected to OpenLink Virtuoso
Driver: 07.20.3217 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
ll_file                                                                           ll_graph
VARCHAR NOT NULL                                                                  VARCHAR
_______________________________________________________________________________


0 Rows. -- 0 msec.

ubuntu$ isql 1111 dba dba exec="ld_dir( '/staging/ncats/data', 'pubchem_pd2_assay.ttl.gz', 'http://rdf.ncats.nih.gov/opddr/pubchem' );"

Done. -- 3 msec.

ubuntu$ isql 1111 dba dba exec="ld_dir( '/staging/ncats/data', 'npcpd2_substance.ttl.gz', 'http://rdf.ncats.nih.gov/opddr/substance' );"

Done. -- 0 msec.

Both files are registered in DB.DBA.load_list with the graph URI from the global.graph file, despite the URIs passed to 'ld_dir()':

ubuntu$ isql 1111 dba dba exec="select ll_file, ll_graph from DB.DBA.load_list where ll_file like '/staging/ncats%';"

ll_file                                               ll_graph
_______________________________________________________________________________

/staging/ncats/data/npcpd2_substance.ttl.gz           http://rdf.ncats.nih.gov/opddr
/staging/ncats/data/pubchem_pd2_assay.ttl.gz          http://rdf.ncats.nih.gov/opddr

2 Rows. -- 1 msec.
@HughWilliams
Collaborator

This is expected behaviour as detailed in the docs at:

https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader#Bulk loading process

i.e. a .graph file takes precedence over the graph IRI specified in ld_dir() ...

Note also that the pubchem_pd2_assay.ttl.gz.graph file name you specify is incorrect; it should be pubchem_pd2_assay.ttl.graph, i.e. ignore the .gz compression extension, as the file will be uncompressed for loading, at which point its name will be pubchem_pd2_assay.ttl. That is why its contents were not used when ld_dir() was run in the example you gave.
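
To make the lookup order concrete, here is a rough shell sketch that predicts which graph IRI would end up in ll_graph for a given file. It is only an illustration of the behaviour described in this thread, not Virtuoso's actual code: resolve_graph is a hypothetical helper name, the order "per-file .graph, then global.graph, then the ld_dir() argument" is my reading of the bulk loader docs, and only the .gz extension is stripped because that is the only case discussed here.

```shell
# Hypothetical helper (not part of Virtuoso): predict the graph IRI the bulk
# loader would record for FILE in DIR, given the graph argument to ld_dir().
# Assumed lookup order: <file minus .gz>.graph, then global.graph, then GRAPH_ARG.
# usage: resolve_graph DIR FILE GRAPH_ARG
resolve_graph() {
  local dir="$1" file="$2" graph_arg="$3"
  # Per-file graph file: drop a trailing .gz (if any) before appending .graph
  local per_file="$dir/${file%.gz}.graph"
  if [ -f "$per_file" ]; then
    # Strip spaces and line endings from the stored IRI
    tr -d ' \r\n' < "$per_file"
  elif [ -f "$dir/global.graph" ]; then
    tr -d ' \r\n' < "$dir/global.graph"
  else
    # No .graph file applies, so the ld_dir() argument is used
    printf '%s' "$graph_arg"
  fi
  printf '\n'
}
```

With the directory listing from the report, `resolve_graph /staging/ncats/data pubchem_pd2_assay.ttl.gz http://rdf.ncats.nih.gov/opddr/pubchem` would print http://rdf.ncats.nih.gov/opddr, since pubchem_pd2_assay.ttl.graph does not exist there and global.graph does. That matches the ll_graph values shown in the load_list output above.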
