Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converting values using function during insert, update and querying (e.g. pgcrypto) #928

Open
mdedetrich opened this issue Jul 22, 2014 · 15 comments

Comments

@mdedetrich
Copy link
Contributor

added by @cvogt:
While the discussion was largely around crypto, the request of this ticket is to support applying db functions to values before insertion.

According to tminglei/slick-pg#82, due to slicks design, its not possible to implement something like pgcryptos encryption module.

In slick, you can't freely express SQL like the following

update set col1 = encrypt(?) from table1 where id = ?

because slick only accepts standard types (i.e long,int) instead of Column[T] in inserts/updates, it won't work with the SQL query above (it will just insert the serialized value of the type)

How hard would it be to implement this functionality, so stuff like database level encryption can be done using slick (I assume other databases also have similar encryption modules)

@cvogt
Copy link
Member

cvogt commented Jul 22, 2014

If you encrypt on the client side, you could already do that using http://slick.typesafe.com/doc/2.0.2/userdefined.html#scalar-types

Running a server-side function on the value before update/insert is not supported at the moment.

Besides, database-side encryption seems conceptually flawed. So doing it on the client is probably the right thing to do regardless.

Who are you trying to protect against? You DBA from seeing user passwords? A hacker who infiltrated the database server? Both could trivially enable some sort of logging, or replace the encrypt function by something that logs the keys and unencrypted values. Because you are sending unencrypted information and keys to the place, where you are intending to store the encrypted information. Doesn't sound particularly great.

pgcrypto may possibly provide some sort of protection against SQL injection attackers accessing encrypted information in plain text, but even that is not obviously certain and SQL injection vulnerabilities are more unlikely with Slick (you would have to circumvent type-safe api and SQL interpolation in order to achieve that).

pgcrypto can probably successfully protect you from people who can access external storage where you store a backup/dump of your database. But that's a very narrow use case. If you do client-side encryption the above use cases are all included for free (as far as anything in crypto can be considered free). (You still need to be protective of the place where you store the keys in any case.)

Also see:
http://dba.stackexchange.com/questions/24370/how-to-use-aes-encryption-in-postgresql
http://dba.stackexchange.com/questions/59154/postgresql-storing-encrypted-data-or-encrypt-database-with-pgcrypto-what-is-m
http://stackoverflow.com/questions/8054503/storing-encrypted-data-in-postgres

@mdedetrich
Copy link
Contributor Author

At our company we are currently looking at client side vs database side encryption and both have their pros and cons. One (fairly) big con in regards to using Scala types is that we are unable to use the json datatype (we not only do need to encrypt text fields, but JSON data structures as well, and PostGreSql has a JSON datatype which we use through https://github.com/tminglei/slick-pg). If we use database side encryption, we can still use the JSON datatype while it being encrypted, where as if we do client side encryption, we would have to store it as text and do all of the JSON processing client side.

pgcrypto can probably successfully protect you from people who can access external storage where you store a backup/dump of your database.

We actually fall into this use case, we will be storing a lot of sensitive financial information, and other people in the company will be reading through backups/dumps that are going to need to be encrypted

Long story short, we are storing a lot of sensitive data and we are just looking at different methods on encrypting the data. Also have a look at this for more reasons for pgcrypto encryption

http://stackoverflow.com/questions/9326264/is-there-some-security-advantage-to-using-pgcrypto-instead-of-partition-level-en

@cvogt
Copy link
Member

cvogt commented Jul 23, 2014

If we use database side encryption, we can still use the JSON datatype while it being encrypted

You mean something like that?

SELECT decrypt(someJson) as j FROM someTable WHERE j::json->'b' == 123

http://stackoverflow.com/questions/9326264/is-there-some-security-advantage-to-using-pgcrypto-instead-of-partition-level-en

You probably know this, but this compares partition level and of these points the only security features you do not loose, when doing server-side rather than client-side encryption are:

Your backups (dumps) are encrypted

For the things you chose to encrypt, yes.

You can give the DBA SQL level access without compromising data.

Probably, but I would take extra are in checking that this cannot be broken out of somehow, e.g. by overriding encrypt by a custom function in some way (not sure postgres allows that).

pgcrypto can probably successfully protect you from people who can access external storage where you store a backup/dump of your database.

We actually fall into this use case, we will be storing a lot of sensitive financial information, and other people in the company will be reading through backups/dumps that are going to need to be encrypted

If that is the only thing you are protecting against, a much simpler solution would be: strip the data people are not supposed to see from the dumps you hand to them. No encryption needed.

P.S. I know this encryption architecture stuff is not slick related really, but I have some background in security, so this is fun.

@mdedetrich
Copy link
Contributor Author

SELECT decrypt(someJson) as j FROM someTable WHERE j::json->'b' == 123

Yes that is precisely what I mean, and its actually one of the major reasons for us to use database level encryption

Probably, but I would take extra are in checking that this cannot be broken out of somehow, e.g. by overriding encrypt by a custom function in some way (not sure postgres allows that).

I don't think it is possible, it would definitely have come up as a major loophole in postgres if this was the case

If that is the only thing you are protecting against, a much simpler solution would be: strip the data people are not supposed to see from the dumps you hand to them. No encryption needed.

We actually have a lot of data that is secure, that is only one example. Without giving too much away (for NDA reasons), what is being built is a finance product that will be storing a lot of financial data, some will be queried very often, some wont be, and it will all be continuously backed up. There might be programs in other languages (like ruby or something else), that will also be quering this data, which is where the argument for client side encryption starts getting weaker

P.S. I know this encryption architecture stuff is not slick related really, but I have some background in security, so this is fun.

All good, as I said we are currently exploring options, and there are significantly good reasons why we would wan't to use pgcrypto. I am getting an impression that integrating something like pgrypto into slick (either directly or by an extension, such as through slick-pg) is going to be a lot of work.

I also don't think that slick should just cut off options for ideological reasons, there is reason why slick for example, provides a mechanism to drop to pure SQL. Theoretically speaking, you shouldn't do that at all, but there are good in reasons in rare circumstances why you would use pure SQL.

@cvogt
Copy link
Member

cvogt commented Jul 24, 2014

Probably, but I would take extra are in checking that this cannot be broken out of somehow, e.g. by overriding encrypt by a custom function in some way (not sure postgres allows that).

I don't think it is possible, it would definitely have come up as a major loophole in postgres if this was the case

I didn't mean it as a flaw of pgcrypt. With the right configuration I am sure it holds up to their claims. But what options are there for breaking security through wrong configuration. E.g. does security still hold, even if other postgres extensions allow filesystem access (e.g. http://www.postgresql.org/message-id/CAHyXU0zjQ=Fv7b+6inC2Ck9x5Z6s3JG0LXDySNysUCYAGcSJzQ@mail.gmail.com ), which may allow disabling the crypto plugin. Just saying these are things that need to be thought about even as a pgcrypto user.

There might be programs in other languages (like ruby or something else), that will also be quering this data, which is where the argument for client side encryption starts getting weaker

There are cross-language encryption standards like AES and PGP.

I also don't think that slick should just cut off options for ideological reasons, there is reason why slick for example, provides a mechanism to drop to pure SQL. Theoretically speaking, you shouldn't do that at all, but there are good in reasons in rare circumstances why you would use pure SQL.

I wasn't arguing that Slick shouldn't support it. It just doesn't right now and I was suggesting alternative out of unrelated interest. But I suppose a case can be made that pgcrypto covers a narrow use case, when most people are better off with client side encryption, so it has probably lower priority. Are there other use cases, where calling a SQL function on values before update is required?

@cvogt cvogt changed the title Limitation in slick in expressing queries Converting values using function during insert, update and querying (e.g. pgcrypto) Jul 24, 2014
@cvogt cvogt added this to the Future milestone Jul 24, 2014
@mdedetrich
Copy link
Contributor Author

I didn't mean it as a flaw of pgcrypt. With the right configuration I am sure it holds up to their claims. But what options are there for breaking security through wrong configuration. E.g. does security still hold, even if other postgres extensions allow filesystem access (e.g. http://www.postgresql.org/message-id/CAHyXU0zjQ=Fv7b+6inC2Ck9x5Z6s3JG0LXDySNysUCYAGcSJzQ@mail.gmail.com ), which may allow disabling the crypto plugin. Just saying these are things that need to be thought about even as a pgcrypto user.

Of course, my main point is that in our case, there is a fairly compelling reason to use PGCrypto

There are cross-language encryption standards like AES and PGP.

Indeed there are, but I have had issues in the past with regards to encryption working a bit weirdly across different languages

I wasn't arguing that Slick shouldn't support it. It just doesn't right now and I was suggesting alternative out of unrelated interest. But I suppose a case can be made that pgcrypto covers a narrow use case, when most people are better off with client side encryption, so it has probably lower priority. Are there other use cases, where calling a SQL function on values before update is required?

I am not that much aware of other SQL implementations where you call an SQL function before update/insert etc etc. Maybe Oracle maybe has something like this?

@ayuudee
Copy link

ayuudee commented Aug 5, 2014

Hey, I could've used this underlying feature also.

In my case, a long standing database bug caused me to need to wrap any rendering of a value in a function. In the end, I couldn't find a clean way to do that, but I can see the general utility of this as a hedge against db peculiarities (for which the general alternative solution is to go raw SQL -- this offers a compromise for special cases).

Specifically, something like:

case class Feature(point: Point)
class FeatureTable(tag: Tag) extends Table[Feature](tag, "feature") {
  def point = column[Point]("pt", O.DBType("GEOGRAPHY(Point, 4326)"))
  def * = (point) <>(Pt.tupled, Pt.unapply)
}
object Features extends TableQuery(new FeatureTable(_))

// Plus a ColumnType like the below that's properly in-scope for FeatureTable.
abstract class GeometryMapper[A <: Point](override val sqlType: Int = java.sql.Types.OTHER,
                                             override val sqlTypeName: String = "geography")
  extends DriverJdbcType[A] {

  override def setValue(v: A, p: PositionedParameters) {
    val v = toDbStringForm(v)
    p.setString(v)    
  }

  // ... Removed for brevity, but something somewhere needs to wrap the value.
  // lets presume it's wrapped with db_function(node)
}

Case 1. Filter.

Features.filter(_.point === someOtherPoint)
// would be.
"select * from feature where point = db_function(?)"

Case 2. Insert.

Features += aPointIWantToPersist
// would be.
"insert into feature(point) values(db_function(?))"

@arobertn
Copy link

For some reason this issue turned into a debate about the merits of a specific usage, but as ayuudee also points out, the request is simply to remove the present asymmetry where custom functions can only be used on the query side but not the insert/update side. Is there a reason to have the asymmetry, or is it mainly an implementation issue?

And does anyone else have other specific use cases to motivate it? I have to admit finding my way here because of wanting to use pgcrypto myself, but it seems like there would be other common situations where it would be handy to insert/update with DB function usage.

@cvogt
Copy link
Member

cvogt commented Nov 13, 2014

@arobertn A discussion about specific usage is useful to determine the importance of this feature and the applicability to the suggested use cases.

As I pointed out, I think the usefulness for crypto is minor with the risk of actually being harmful, tempting people into solutions that provide a false sense of security.

@ayuudee's use case is certainly a good one. In the core team, I think we have other more pressing things though. The fallback is using plain SQL as usual.

@arobertn
Copy link

OK. FWIW, I'm no security expert but do feel that good security is a matter of clogging as many avenues of incursion as possible, and that using pgcrypto can defend against an important class of cases where static access to DB data is gained, but not process access. Whether someone derives a false sense of security from that is a different problem, though if you are saying there is something especially seductive and misleading about it this mechanism in particular, that is unfortunate.

@jstultz
Copy link

jstultz commented Feb 6, 2015

One general case where it seems like this is needed (at least, I would like to have it right now, and if there's a better way I'd love to hear about it) is when using SQL column types that are not directly supported by slick, and using a database function is necessary to translate to/from the column type to something that slick does understand. For example, using the POINT column type on a MySQL table, I can use a Column[String] in slick, using the asText MySQL function to convert the value to a string on reads, and geomFromText on writes. I can then write a projection to convert the string to and from some scala Point type (even better would be a MappedColumnType to use the Point type directly, but combining MappedColumnType and a database function is something I have no idea how to do).

However, the limitation mentioned in this issue prevents me from doing so cleanly. Ideally I would be able to specify a single projection that dictated use of asText on read and geomFromText on write; failing that, separate projections for read and insert, but not even that works.

@cvogt
Copy link
Member

cvogt commented Feb 9, 2015

That's a good use case. Better extensibility for Slick. Maybe @szeiger can squeeze this one in some time.

@jstultz
Copy link

jstultz commented Feb 11, 2015

I should also mention some related issues, as I see them. I may be misunderstanding some of this, or be unaware of some functionality, so please correct if I'm mistaken about any of this.

1: projections must define conversions in both directions; thus if I want to provide a projection that uses a SQL function on a column on read, I must also define the reverse mapping, even though it will never work at runtime for writes, and will only fail at runtime, not compile time, which is always a bummer. It would be nice to be able to define read-only projections.

2: If we suppose that using functions on writes is supported, there's still a need to be able to specify different functions for read and for write. Unless a function is it's own inverse, using the same function to read and write results in the (likely undesirable property) that performing a read and then writing back the same object results in a change to the database row.

Much of this can probably avoided (at least in the case of the POINT type) if there's a way to extend slick functionality in a different way than described above to support SQL additional column types without applying SQL functions at read/write time. I'm not positive that there isn't, but I haven't found documentation indicating that there is, either.

@virusdave
Copy link
Contributor

virusdave commented May 5, 2017

Here's another use case for this, which has bitten us:

We have a sequence that is used as input to a function which generates primary keys across multiple tables (for various reasons, that you can assume are legitimate for the purpose of this discussion). Thus our inserts tend to look like

INSERT INTO table(pkey, f1, f2, ...)
VALUES (my_pkey_function(NEXTVAL(my_pkey_seq)), 123, 456, ...)

This bug prevents us from using anything other than the sql interpolated strings for inserts, which while reasonable for this minimal example, can be pretty complex and unwieldy in practice.

@jstultz
Copy link

jstultz commented May 6, 2017

I should update in case anyone reaches this issue having had the same issues that I had with MySQL geometry types: I worked around that problem recently by mapping those columns to byte arrays rather than strings, and performing all conversions in scala in mapped column types and set parameter implicits; no SQL functions needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants