Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/vet: detect mismatch in no. of columns and expressions in sql queries #28611

Closed
agnivade opened this issue Nov 6, 2018 · 11 comments

Comments

Projects
None yet
5 participants
@agnivade
Copy link
Member

commented Nov 6, 2018

What version of Go are you using (go version)?

$ go version
go version go1.11.2 linux/amd64

Consider a sql query like this -

_, err = db.Exec("insert into table (col1, col2) values ($1, $2)", param1, param2)

Now, it let's say that you get a quick feature request to add a new column, you do -

_, err = db.Exec("insert into table (col1, col2, col3) values ($1, $2)", param1, param2, param3)

Let's assume you are lazy programmer who doesn't write tests. So everything compiles fine and you deploy. Until you get this error in prod -

pq: INSERT has more target columns than expressions

All because you missed a $3.

Alternatively, you might add the $3 and forget to add the col3, in which case, the error is -

pq: INSERT has more expressions than target columns

This can also happen while removing a particular column from a query. Another scenario can be when you forget to add/remove the param3, although that has been a rarity for me. It is also difficult to stumble into this in an UPDATE query since you will have values in the format of col=$1 pairs.

But I have been bitten by the first 2 errors way too often, and wanted to check whether it's a good idea to have a vet check for this. Or whether it is feasible at all in the first place to do.

/cc @mvdan @kardianos

Please feel free to close if this doesn't make sense, or not feasible to do.

@mvdan

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

Sounds reasonable, and reminds me of the checks that already take place for fmt.Printf.

The check ticks the correctness vet requirement, since a different number of arguments is definitely a bug. Precision shouldn't be a problem either, as it should be easy to avoid false positives. Frequency is always a bit more hard to prove - do you have any proof of people running into this issue besides your own experience?

/cc @alandonovan @josharian

@agnivade

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

do you have any proof of people running into this issue besides your own experience?

I don't. Maybe we can spread this bug around and let people +1 or -1 the issue and gauge reaction.

@kardianos

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

This would be great for an external tool.

I'm not sure it could properly lex SQL server, pg, MySQL, Oracle, both named and indexed(@ $ :), understand comments (vendors support different comment styles), properly handle :: symbols but reading :N vars. It would need to handle T-SQL and PL/SQL. As soon as the sql string was not a constant it would need to ignore the check.

This would be fine in a limited case for an external tool.

@dgryski

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

The query strings are interpreted by the driver and are database specific. PostgreSQL uses $1, MySQL uses ?, SQLite can use ? but has others. Any check would need to understand SQL and dialects for all common servers in order to know if there's a problem at compile time.

@mvdan

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

I didn't know that database/sql had to work with many syntaxes - then perhaps it's too complex of a check for vet. I'm suddenly less sure that a precise check would be easy to do :)

I agree that an external tool could be a good start, to gather data about the check's viability and usefulness.

@agnivade

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

Yeah .. that's why I thought it wouldn't be feasible. Maybe I will create something just for postgres since I need it.

Closing. Thanks for the input everyone.

@agnivade agnivade closed this Nov 6, 2018

@alandonovan

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

It seems like a good application for a static checker. SQL is a fairly widely used standard package, and SQL queries would benefit from detection similar to printf format strings. That said, SQL is much less frequently used than printf, and printf misformatting often goes undetected because it does not return an error, whereas a malformed SQL query is destined to return an error and such errors are invariably checked, so your checker will uncover relatively few problems in practice.

If you proceed, I suggest you use the new golang.org/x/tools/go/analysis.Analyzer API so that the checker can be freely incorporated into vet or other tools. Take a look at the existing printf checker as a starting point. You should attempt to quickly skip the analyzer if the package does not import database/sql, or you should attempt to identify and check db.Exec wrapper functions similar to the way we identify and check printf wrapper functions in packages other than fmt.

@agnivade

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

Awesome, thanks Alan !

@agnivade

This comment has been minimized.

Copy link
Member Author

commented Jan 8, 2019

FYI - I went ahead and made this https://github.com/agnivade/sqlargs. It works only for postgres queries as of now. @alandonovan - Any feedback will be appreciated !

I didn't use the Fact type as I didn't see any need for it. From what I understand, it is needed when another pass needs info from the current pass, or tests need it. But I was able to run tests by just testing the Reportf output which gets caught by "//want " stanzas. Is this okay ?

P.S. I am thinking of writing a blog post on how to write a vet analyzer pass.

@alandonovan

This comment has been minimized.

Copy link
Contributor

commented Jan 8, 2019

Correct, facts are only needed in the situation you describe. In your scenario, you might want to use facts if you have functions that call db.Exec, and you want to check calls to them as if they were direct calls to db.Exec.

Quick review:

  • Consider using the inspector pass; it's faster than ast.Inspect and easier to use.
  • sel.Sel == nil can't happen
  • Rather than use arg[0].BasicLit and Unquote, use info.Types[arg0].Value, which will give you the constant value directly. Also, it works for any constant, such as k, or "foo"+"bar", not just string literals.
  • Style: for "if fnName". use a switch with 'case "Exec", "QueryRow", "Query"'.
  • call ptr.Elem.String only once as it alllocates memory. Better still, call it zero times by picking apart the pointer-to-Named and checking that the name is one of (FB, Tx, Stmt) and the package is database/sql.
    This may be an unnecessary optimization here because you've already applied enough conditions, but especially in the early checks of a visit function, avoiding allocation can really make a big difference.
  • delete FactTypes, you don't need it.
@agnivade

This comment has been minimized.

Copy link
Member Author

commented Jan 8, 2019

Thanks for taking the time to review !

info.Types[arg0].Value sounds great !

For the switch case, I had thought of it. Since I am actually checking for the inverse of the condition, doing case "Exec", "QueryRow", "Query" becomes a mess of nested code, which I wanted to avoid.

FactTypes - yes I will remove that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.