Problem with query synthesis : ~/src/test segfaults #9

Open
louisjac opened this issue Jul 6, 2018 · 5 comments

@louisjac

louisjac commented Jul 6, 2018

Hello,

I have tried to use the tool to generate queries, but I have problems changing the use cases. My first problem is that gMark segfaults on various "use cases" I design. For instance, here is a minimal example where gMark segfaults on my machine (I supposed that having only 1 conjunct is allowed, but maybe not):

git clone https://github.com/graphMark/gmark.git
cd gmark
sed 's/<conjuncts min="3" max="4"/<conjuncts min="1" max="3"/' -i use-cases/test.xml
cd src
make
./test -c ../use-cases/test.xml -g ../demo/test/test-graph.txt -w ../demo/test/test-workload.xml -r ../demo/test/
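
One thing worth ruling out with the steps above: `sed -i` exits quietly when its pattern matches nothing, so the file may be unchanged without any warning. Below is a minimal Python sketch of the same substitution that fails loudly instead. The `SAMPLE` string is a hypothetical excerpt (only the two elements quoted in this thread, with an assumed closing `/>` on `<conjuncts>`), and `set_conjuncts` is an illustrative helper, not part of gMark.

```python
import re

# Hypothetical excerpt of a workload configuration. Only the two elements
# quoted in this thread are shown; the closing "/>" on <conjuncts> is assumed.
SAMPLE = '<conjuncts min="3" max="4"/>\n<multiplicity star="0.5"/>\n'


def set_conjuncts(xml_text, new_min, new_max):
    """Rewrite the min/max attributes of <conjuncts>, raising if nothing matched."""
    pattern = r'<conjuncts min="\d+" max="\d+"'
    replacement = f'<conjuncts min="{new_min}" max="{new_max}"'
    updated, count = re.subn(pattern, replacement, xml_text)
    if count == 0:
        raise ValueError("no <conjuncts min=... max=...> element found")
    return updated


edited = set_conjuncts(SAMPLE, 1, 3)
print(edited)
```

The other elements are left untouched, so any incompatibility between the new conjunct range and the remaining settings is still present after the edit.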

Furthermore, I have tried several settings but I cannot manage to create recursive queries. It seems to me that the relevant setting should be the star setting, but it does not seem to work. For instance, none of the queries in the demo/ folder seem to include recursion, even though the use cases all have
<multiplicity star="0.5"/>

Thanks in advance!

@ang3ela

ang3ela commented Jul 23, 2018

Hi,
you have to be careful when changing one parameter and leaving the others unchanged in the workload configuration. In your example, you changed the number of conjuncts but what about the number of conjuncts and the multiplicity? Are they compatible?

Btw, star is the query shape, not the multiplicity.

You have to use the multiplicity parameter to obtain recursive queries. For instance, setting it to 0.5 means that roughly 50% of your queries will have a Kleene star.

And yes, we have recursive queries in the pre-defined query workloads. In the SPARQL syntax, we bounded them with a fixed number. In openCypher, this is not the case.

I hope this helps!

Thanks for using gMark,
Angela

@louisjac
Author

Hi, thanks for the answer.

> you have to be careful when changing one parameter and leaving the others unchanged in the workload configuration. In your example, you changed the number of conjuncts but what about the number of conjuncts and the multiplicity? Are they compatible?

Well, it is not very clear to me what controls what in the settings. From what I understood the queries are composed of:

  • N Path Patterns (PP)
  • each PP having the shape A or A^{,3} (I am not sure of the semantics of {,3} but I suppose it indicates that A should be repeated 0, 1, 2 or 3 times);
  • each A has the shape (w_1| ... | w_K);
  • each w_i has the shape p_1/.../p_L.

Therefore:

  • N controls the number of PPs and is called conjuncts
  • K controls the number of disjunctions and thus is called disjuncts
  • L controls the number of predicates and is called length

and all these parameters seemed independent, but I might be wrong, so do not hesitate to correct my interpretation.
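
To make this reading concrete, here is a small Python sketch that assembles path patterns from N, K and L exactly as described in the list above. It is only an illustration of that interpretation, not gMark's generator: the predicate names `p0`, `p1`, ... and the helper names are made up, and the `{,3}` suffix simply mirrors the bounded form discussed in this thread.

```python
def concat(predicates):
    """w_i = p_1/.../p_L: a concatenation of L predicates."""
    return "/".join(predicates)


def disjunction(paths):
    """A = (w_1|...|w_K): a disjunction of K concatenations."""
    return "(" + "|".join(paths) + ")"


def path_pattern(paths, bounded_star=False, bound=3):
    """A path pattern is either A or the bounded form A{,bound}."""
    body = disjunction(paths)
    return f"{body}{{,{bound}}}" if bounded_star else body


def build_query(n_conjuncts, n_disjuncts, length, starred=()):
    """Return N path patterns; indices listed in `starred` get the bounded star."""
    patterns = []
    counter = 0  # used only to give every predicate a distinct placeholder name
    for i in range(n_conjuncts):
        paths = []
        for _ in range(n_disjuncts):
            preds = [f"p{counter + j}" for j in range(length)]
            counter += length
            paths.append(concat(preds))
        patterns.append(path_pattern(paths, bounded_star=i in starred))
    return patterns


for pattern in build_query(n_conjuncts=2, n_disjuncts=2, length=3, starred={1}):
    print(pattern)
# prints:
# (p0/p1/p2|p3/p4/p5)
# (p6/p7/p8|p9/p10/p11){,3}
```

In this toy version the three parameters really are independent, which is the assumption being questioned here; the sketch ignores the schema and any other constraints a real generator has to satisfy.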

> Btw, star is the query shape, not the multiplicity.
>
> You have to use the multiplicity parameter to obtain recursive queries. For instance, setting it to 0.5 means that roughly 50% of your queries will have Kleene-star.

Yes, as I mentioned in my first comment, the setting I was looking at was indeed "multiplicity star". But as you said, the SPARQL queries do not contain such stars; instead they contain a "bounded star" using a custom extension of PP.

Louis

@ang3ela

ang3ela commented Jul 27, 2018

Hi,

I would not call A or A^{,3} a shape, since this has nothing to do with the shape parameter.

As I said, if you do not want multiplicity, you just need to set the multiplicity parameter to 0.

Notice that A^{,3} is only used in the SPARQL concrete syntax, whereas in other syntaxes, such as Cypher, we directly use A^*. We set the bound in order to be able to execute the SPARQL queries in a popular engine (for the experiments in our paper).

Similarly, I would not use the word "shape" here:

> each A has the shape (w_1| ... | w_K);
> each w_i has the shape p_1/.../p_L.

This is the syntax, and it has no connection whatsoever with the shapes of queries.

Regarding what you wrote:

N controls the numbers of PP and is called conjuncts (CORRECT)
K controls the number of disjunctions and thus is called disjuncts (CORRECT)
L controls the number of predicates and is called length (NOT CORRECT)

L is in fact the length of concatenations in RPQs.

Example: abc (length is 3)

The parameters are somewhat dependent on each other, since they all act as constraints in the query workload generation process.

Remember that gMark can generate UCRPQs and with these three parameters you can control the number of U, the number of C and the length of your paths in the RPQ part of the query.
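
As a rough illustration of what the three parameters measure, the Python sketch below reads the number of disjuncts and the concatenation lengths back out of path-pattern strings. The textual form it accepts (a parenthesised disjunction, optionally followed by `*` or `{,n}`) and the helper name `describe` are assumptions made for this example, not gMark's actual output format.

```python
import re

# Assumed textual form: "(w_1|...|w_K)" optionally followed by "*" or "{,n}".
PATTERN = re.compile(r"\((?P<body>[^()]*)\)(?P<star>\*|\{,\d+\})?")


def describe(pattern):
    """Report the number of disjuncts, each concatenation length, and star use."""
    match = PATTERN.fullmatch(pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    disjuncts = match.group("body").split("|")
    return {
        "disjuncts": len(disjuncts),
        "lengths": [len(d.split("/")) for d in disjuncts],
        "starred": match.group("star") is not None,
    }


# A query is a conjunction of path patterns, so the list length is the
# number of conjuncts.
query = ["(a/b/c)", "(a/b/c|d/e){,3}"]
print("conjuncts:", len(query))
for conjunct in query:
    print(conjunct, describe(conjunct))
```

Here `(a/b/c)` has one disjunct of length 3, matching the example above, while `(a/b/c|d/e){,3}` has two disjuncts of lengths 3 and 2 under a bounded star.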

If you want a Kleene star, you specify it in the multiplicity parameter.

Also, you may not want to constrain the shapes of the queries that much.

Finally, if you send me the configuration file on which you get segfault, I can take a look.

Best regards,
Angela

@louisjac
Author

louisjac commented Jul 28, 2018

Hi, thanks for the fast answer!

I understand what you are saying about shape, but I was indeed only interested in SPARQL queries for the moment.

I don't really see the difference between what I understood regarding the length and what you are explaining. Maybe I was unclear: for me L controls the number of predicates per w_i (thus a/b/c has length 3) which seems to be also what you are explaining (but I might be missing something).

If that is true, I do not really understand how all these parameters are interdependent, as they seem to address different parts of the query.

> Finally, if you send me the configuration file on which you get segfault, I can take a look.

In my first comment I gave such an example: take use-cases/test.xml and replace conjuncts min="3" max="4" with conjuncts min="1" max="3". The result corresponds to this file: test.xml.gz.

Warm regards.
Louis

(edited because the anchors are invisible)

@ang3ela

ang3ela commented Jul 30, 2018

Hi again,

Honestly, it should work in this case, since the range for the number of conjuncts still includes values from the default configuration of test.

The only explanation and help I can provide at this point is to try with the other schemas (social, shop and uniprot), since test is a kind of toy schema (only five labels). The test schema is too simple, which is why gMark sometimes fails: it cannot apply all the configuration parameters.

Varying the parameters is a good direction to pursue.

Concerning recursive queries, of course you can get those. In SPARQL, you replace the bounded value with * .
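
A minimal Python sketch of that replacement follows, assuming the bound appears literally as `{,n}` right after a path group, which is how it is written in this thread. The query string, the `:p1`/`:p2` predicates, the prefix IRI, and the helper name `unbound` are all made up for illustration.

```python
import re

# Matches a bounded repetition written as "{,n}". The braces of the WHERE
# block do not match because they are not of the form "{,<digits>}".
BOUNDED = re.compile(r"\{,\d+\}")


def unbound(query):
    """Replace every bounded repetition "{,n}" with an unbounded "*"."""
    return BOUNDED.sub("*", query)


# Hypothetical query in the bounded form discussed in this thread.
query = "PREFIX : <http://example.org/> SELECT ?x ?y WHERE { ?x (:p1/:p2){,3} ?y . }"
print(unbound(query))
# prints: PREFIX : <http://example.org/> SELECT ?x ?y WHERE { ?x (:p1/:p2)* ?y . }
```

Other bounded forms, such as `{3}` or `{1,3}`, are deliberately left alone, since only the `{,n}` form is mentioned here.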

Best regards,
Angela
