Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix query parameter processing to more correctly match PostgreSQL's lexer #315

Merged
merged 4 commits into from Aug 14, 2014

Conversation

Emill
Copy link
Contributor

@Emill Emill commented Aug 9, 2014

Fixes all the problems mentioned in #309.
I also separated the "states" of the state machine so lookup of state doesn't have to be executed for every character, which should enhance performance.
I also combined multi-statement identifying into the parameter processing routine so the string doesn't have to be scanned twice.

@glenebob
Copy link
Contributor

glenebob commented Aug 9, 2014

Hey Emill,

I've looked at this briefly, and it looks pretty good. It is amazingly readable considering the complexity of the routine. I like the idea of separate nested loops for each token type, to minimize decision making on each character, and I definitely like the single-scan approach to detecting multiple commands. The tests all pass, but outside of that I haven't had time to do much serious checking of functionality and performance (although it certainly appears that it should perform very well). But I do have some initial observations; hopefully not too nitpicky :)

I'm kind of leery of goto's, but they do seem to work well here. However, the entire outer loop is implemented using goto's. Can we find a way to use the language's looping support for that? It doesn't look like it would be too difficult.

I noticed that the routine knows if its processing a prepared statement. However, when it discovers that it is processing a multi-command statement, it continues even if it's doing a prepared statement. I think it would be better to throw an exception at that point, rather than spending a potentially considerable amount of time working on a statement that is doomed. Then I don't think we even need to tell the caller if it's a multi-statement.

There's room for readability improvements via formatting. The project has a few "rules", if you will. For one, blocked if/loop statements are preferred:

if (something)
{
SomethingElse();
}

Also, the goto labels would stand out a little better if either they were out-dented, or if the code they point to was indented.

-Glen

@Emill
Copy link
Contributor Author

Emill commented Aug 9, 2014

Is the preferred coding style documented somewhere?

…n a query that is not allowed, quit immediately. Also improve readability.
@glenebob
Copy link
Contributor

glenebob commented Aug 9, 2014

I sure thought there was, but I can't find it :(

@franciscojunior, can you help here?

-Glen

@Emill
Copy link
Contributor Author

Emill commented Aug 9, 2014

Better now?

@franciscojunior
Copy link
Member

Is the preferred coding style documented somewhere?

Err...
Unfortunately, no :(

I tried to write something but couldn't get it done. It is my fault.

What we have is based on the code already written. As @glenebob pointed out, in Npgsql locked if/loop statements are preferred. Of course this "rule" accept exceptions and I think that readability is a very good reason to have an exception.

In the specific case of if blocks, I think it is more readable if we use a style like that:

if (something)
    something();
else
    somethingelse();

Only when we have more than one statement we would use curly braces:

if (something)
{
    something1();
    something2();
}
else
{
    somethingelse1();
    somethingelse2();
}

Then, when I checked the code, I noticed that we have mixed cases. :(

We definitely need a code style document. I'll create an issue so we can discuss about that and then write the document in our wiki.

@franciscojunior
Copy link
Member

I just created an issue where we can discuss about Npgsql code style: #316
All comments are very welcome.

@Emill
Copy link
Contributor Author

Emill commented Aug 9, 2014

Haha, you contradict each other :)

There's room for readability improvements via formatting. The project has a few "rules", if you will. For one, blocked if/loop statements are preferred:

if (something)
{
SomethingElse();
}

vs.

if (something)
    something();
else
    somethingelse();

I just rewrote it to the blocked style...

@franciscojunior
Copy link
Member

Haha, you contradict each other :)

Yeah, that wasn't my intention.
I notice that recent code is using what @glenebob said. I'm who is talking some nonsense. :(
Sorry about that.

@glenebob
Copy link
Contributor

glenebob commented Aug 9, 2014

Hi Emill,

I do like the formatting better now. The goto labels are much more obvious now, and that really helps. Thanks for doing that.

But I don't understand why you added the allowMultipleStatements parameter. It always equals (!prepare).

-Glen

@Emill
Copy link
Contributor Author

Emill commented Aug 9, 2014

It's always false for stored procedures, even for non-prepared statements, at least in the current version. Do we want to check that?
I mean, the ADO.NET documentation simply says the command text should be equal to the function name and nothing about semicolons.
Same thing for TableDirect. ADO.NET docs simply says put in the table name in the command text and nothing about multiple statements.

@Emill
Copy link
Contributor Author

Emill commented Aug 9, 2014

One thing I notice now: since commit 06328fc, if output parameters are specified in the query, they are simply removed from the query. Before that commit, they weren't touched. Is that a mistake or is it something intended?

Compare https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1613 (newer) and https://github.com/npgsql/Npgsql/blob/40b2076c46f49c6373d1d21b3be50e9a58e1bb55/src/Npgsql/NpgsqlCommand.cs#L1588 (older)

@glenebob
Copy link
Contributor

I think you're reading it wrong. It looks to me like it writes the param
name out if it didn't do a sub, no matter what kind of parameter. I can't
imagine it would have passed the tests if what you're saying is true.

Writes param name here:
https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1620

-Glen

On Sat, Aug 9, 2014 at 4:21 PM, Emill notifications@github.com wrote:

One thing I notice now: since commit 06328fc
06328fc,
if output parameters are specified in the query, they are simply removed
from the query. Before that commit, they weren't touched. Is that a mistake
or is it something intended?

Compare
https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1613
(newer) and
https://github.com/npgsql/Npgsql/blob/40b2076c46f49c6373d1d21b3be50e9a58e1bb55/src/Npgsql/NpgsqlCommand.cs#L1588
(older)


Reply to this email directly or view it on GitHub
#315 (comment).

@Emill
Copy link
Contributor Author

Emill commented Aug 11, 2014

Maybe I'm wrong, but in that version you are linking to, shouldn't line
1615 and 1613 switch places?

2014-08-11 17:17 GMT+02:00 Glen Parker notifications@github.com:

I think you're reading it wrong. It looks to me like it writes the param
name out if it didn't do a sub, no matter what kind of parameter. I can't
imagine it would have passed the tests if what you're saying is true.

Writes param name here:

https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1620

-Glen

On Sat, Aug 9, 2014 at 4:21 PM, Emill notifications@github.com wrote:

One thing I notice now: since commit 06328fc
<
06328fcab3ecf66835bde0feae9da2a0e6e8cb7c>,

if output parameters are specified in the query, they are simply removed
from the query. Before that commit, they weren't touched. Is that a
mistake
or is it something intended?

Compare

https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1613
(newer) and

https://github.com/npgsql/Npgsql/blob/40b2076c46f49c6373d1d21b3be50e9a58e1bb55/src/Npgsql/NpgsqlCommand.cs#L1588
(older)


Reply to this email directly or view it on GitHub
#315 (comment).


Reply to this email directly or view it on GitHub
#315 (comment).

@glenebob
Copy link
Contributor

Ah yes, now I see. I think you're right. But, how do all the tests pass?
Maybe we don't have tests for out params?

-Glen

On Mon, Aug 11, 2014 at 8:38 AM, Emill notifications@github.com wrote:

Maybe I'm wrong, but in that version you are linking to, shouldn't line
1615 and 1613 switch places?

2014-08-11 17:17 GMT+02:00 Glen Parker notifications@github.com:

I think you're reading it wrong. It looks to me like it writes the param
name out if it didn't do a sub, no matter what kind of parameter. I
can't
imagine it would have passed the tests if what you're saying is true.

Writes param name here:

https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1620

-Glen

On Sat, Aug 9, 2014 at 4:21 PM, Emill notifications@github.com wrote:

One thing I notice now: since commit 06328fc
<

06328fcab3ecf66835bde0feae9da2a0e6e8cb7c>,

if output parameters are specified in the query, they are simply
removed
from the query. Before that commit, they weren't touched. Is that a
mistake
or is it something intended?

Compare

https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1613

(newer) and

https://github.com/npgsql/Npgsql/blob/40b2076c46f49c6373d1d21b3be50e9a58e1bb55/src/Npgsql/NpgsqlCommand.cs#L1588

(older)


Reply to this email directly or view it on GitHub
#315 (comment).


Reply to this email directly or view it on GitHub
#315 (comment).


Reply to this email directly or view it on GitHub
#315 (comment).

@Emill
Copy link
Contributor Author

Emill commented Aug 11, 2014

We obviously don't have a test for that ;)
Den 11 aug 2014 18:11 skrev "Glen Parker" notifications@github.com:

Ah yes, now I see. I think you're right. But, how do all the tests pass?
Maybe we don't have tests for out params?

-Glen

On Mon, Aug 11, 2014 at 8:38 AM, Emill notifications@github.com wrote:

Maybe I'm wrong, but in that version you are linking to, shouldn't line
1615 and 1613 switch places?

2014-08-11 17:17 GMT+02:00 Glen Parker notifications@github.com:

I think you're reading it wrong. It looks to me like it writes the
param
name out if it didn't do a sub, no matter what kind of parameter. I
can't
imagine it would have passed the tests if what you're saying is true.

Writes param name here:

https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1620

-Glen

On Sat, Aug 9, 2014 at 4:21 PM, Emill notifications@github.com
wrote:

One thing I notice now: since commit 06328fc
<

06328fcab3ecf66835bde0feae9da2a0e6e8cb7c>,

if output parameters are specified in the query, they are simply
removed
from the query. Before that commit, they weren't touched. Is that a
mistake
or is it something intended?

Compare

https://github.com/npgsql/Npgsql/blob/06328fcab3ecf66835bde0feae9da2a0e6e8cb7c/src/Npgsql/NpgsqlCommand.cs#L1613

(newer) and

https://github.com/npgsql/Npgsql/blob/40b2076c46f49c6373d1d21b3be50e9a58e1bb55/src/Npgsql/NpgsqlCommand.cs#L1588

(older)


Reply to this email directly or view it on GitHub
#315 (comment).


Reply to this email directly or view it on GitHub
#315 (comment).


Reply to this email directly or view it on GitHub
#315 (comment).


Reply to this email directly or view it on GitHub
#315 (comment).

@roji
Copy link
Member

roji commented Aug 12, 2014

Unfortunately seems like we need to include a fix for this for 2.2, right? I mean, we can't knowingly release a version with no output param support?

@Emill
Copy link
Contributor Author

Emill commented Aug 12, 2014

I've fixed so it doesn't touch output parameters, so now they work as expected ;)
I too think this should be included in 2.2, especially because it supports E''-strings. And like I said in #309, there is an SQL injection vulnerability if we don't support E''-strings.

Is there anything else now that should be changed in this pull request? Otherwise I suggest it should be merged.

But I don't understand why you added the allowMultipleStatements parameter. It always equals (!prepare).

It's always false for stored procedures, even for non-prepared statements, at least in the current version. Do we want to check that?
I mean, the ADO.NET documentation simply says the command text should be equal to the function name and nothing about semicolons.
Same thing for TableDirect. ADO.NET docs simply says put in the table name in the command text and nothing about multiple statements.

comments on that?

@franciscojunior
Copy link
Member

I've fixed so it doesn't touch output parameters, so now they work as expected ;)
I too think this should be included in 2.2, especially because it supports E''-strings. And like I said in #309, there is an SQL injection vulnerability if we don't support E''-strings.

I also agree we should include it in 2.2 release.
+1

@franciscojunior
Copy link
Member

Is there anything else now that should be changed in this pull request? Otherwise I suggest it should be merged.

I think we should add some tests for this output parameter issue.

@glenebob
Copy link
Contributor

On Tue, Aug 12, 2014 at 5:17 AM, Emill notifications@github.com wrote:

I've fixed so it doesn't touch output parameters, so now they work as
expected ;)
I too think this should be included in 2.2, especially because it supports
E''-strings. And like I said in #309
#309, there is an SQL injection
vulnerability if we don't support E''-strings.

Is there anything else now that should be changed in this pull request?
Otherwise I suggest it should be merged.

But I don't understand why you added the allowMultipleStatements
parameter. It always equals (!prepare).

It's always false for stored procedures, even for non-prepared statements,
at least in the current version. Do we want to check that?
I mean, the ADO.NET documentation simply says the command text should be
equal to the function name and nothing about semicolons.
Same thing for TableDirect. ADO.NET docs simply says put in the table
name in the command text and nothing about multiple statements.

comments on that?

No, you're right. It looks good.

I'm really skittish on such a major change to go into 2.2 so quickly, but
it does seem like a very important bug fix.

+1 to adding tests for output parameters. That should have never gotten
past the test suite in the first place.

-Glen


Reply to this email directly or view it on GitHub
#315 (comment).

@Emill
Copy link
Contributor Author

Emill commented Aug 12, 2014

I added one test case for the problem with output parameters in the latest commit.

@Emill
Copy link
Contributor Author

Emill commented Aug 14, 2014

Merge?

@franciscojunior
Copy link
Member

Em 13/08/2014 22:09, "Emill" notifications@github.com escreveu:

Merge?

+1


Reply to this email directly or view it on GitHub.

Emill added a commit that referenced this pull request Aug 14, 2014
Fix query parameter processing to more correctly match PostgreSQL's lexer
@Emill Emill merged commit 85ef261 into npgsql:master Aug 14, 2014
@roji
Copy link
Member

roji commented Aug 14, 2014

I'm going to backport this into the 2.2 release branch, everyone seems to be in agreement we need to release 2.2 with this. We can do an rc2 and wait another week or two?

@roji roji added bug labels Aug 14, 2014
@roji roji added this to the 2.2 milestone Aug 14, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants