Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BQL does not include IN expression #263

Open
marcoct opened this issue Oct 18, 2015 · 5 comments
Open

BQL does not include IN expression #263

marcoct opened this issue Oct 18, 2015 · 5 comments

Comments

@marcoct
Copy link
Collaborator

marcoct commented Oct 18, 2015

This command is not supported

bdb.execute('select * from satellites where Purpose IN ("Communications", "Reconnaissance")')
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/bayesdb.py", line 149, in execute
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 104, in <genexpr>
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 91, in <genexpr>
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 77, in parse_bql_phrases
bayeslite.parse.BQLParseError: Parse error: syntax error near Communications

But with sql_execute it is:

bdb.sql_execute('select * from satellites where Purpose IN ("Communications", "Reconnaissance")

In the longer-tmer, is the intention for SQL to co-exist alongside BQL with a separate entry point? Is there a plan for extending BQL to implement a superset of SQL syntax?

In the short-term, are we intending for users to use sql_execute() as well as execute()? Is there a list of SQL features that are not currently supported by BQL somewhere?

@marcoct
Copy link
Collaborator Author

marcoct commented Oct 18, 2015

This feature would be helpful to run things like this:

estimate dependence probability as dp from pairwise columns of satellites_cc where name0 IN ("Ant
icipated_Lifetime", "Contractor") AND name1 IN ("Purpose", "Eccentricity") 

Note that separately, for the above to be run, the user must be aware of the field names "name0" and "name1", which are (I think) currently undocumented.

@riastradh-probcomp
Copy link
Contributor

What is missing is IN with a collection of literals. This shouldn't be too hard to add, if anyone would like to take a stab at it:

  1. Add an AST type to ast.py for literal IN expressions, alongside the query IN expressions.
  2. Add a grammar rule or two to grammar.y, for the equality symbol.
  3. Add parse actions to parse.py for these grammar rules to make the AST nodes.
  4. Add a case to compile_expression in compiler.py to compile it to SQL.
  5. Add some test cases.

Each of these changes should be maybe three or four lines -- not hard, and a worthwhile tour of the parts of the BQL->SQL compiler.

@riastradh-probcomp
Copy link
Contributor

On documentation of columns in the fancy BQL queries: yes, that should go in the cheat sheet we discussed, until someone either determines that the current state of affairs is worth committing to or invents a better one.

On sql_execute: It is not going away until all of SQL is added to BQL, which may take a long time.

@axch
Copy link
Contributor

axch commented Oct 19, 2015

Labeling "help wanted" because the issue now seems to contain enough material for a new contributor to use it as a starter project.

@raxraxraxraxrax raxraxraxraxrax added this to the 0.1.4 milestone Nov 19, 2015
@gregory-marton
Copy link
Contributor

Question: do we actually compute the values correctly for this, even if the explicit lists were supported?

Would we want to revisit resampling for the restricted population, and running some analysis? Or giving the user the option to do that, especially if we're using a small sample of the available data for most of our analysis?

estimate dependence probability as dp from pairwise columns of satellites_cc where name0 IN ("Anticipated_Lifetime", "Contractor") AND name1 IN ("Purpose", "Eccentricity")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants