Skip to content

Commit

Permalink
Ensure uploaded secrets are stable (#24325)
Browse files Browse the repository at this point in the history
* Ensure uploaded secrets are stable

Fixes: #23034

Background:
Uploaded secrets are stored as bytes in our application db since cloud
doesn't have a filesystem. To make db connections we stuff them into
temporary files and use those files.

We also are constantly watching for db detail changes so we can
recompose the connection pool. Each time you call
`db->pooled-connection-spec` we check if the hash of the connection spec
has changed and recompose the pool if it has.

Problem:
These uploaded files have temporary files and we make new temp files
each time we call `db->pooled-connection-spec`. So the hashes always
appear different:

```clojure
connection=> (= x y)
true
connection=> (take 2
                   (clojure.data/diff (connection-details->spec :postgres (:details x))
                                      (connection-details->spec :postgres (:details y))))
({:sslkey
  #object[java.io.File 0x141b0f09 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_1388256635324085910.tmp"],
  :sslrootcert
  #object[java.io.File 0x6f443fac "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_9248342447139746747.tmp"],
  :sslcert
  #object[java.io.File 0xbb13300 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_17076432929457451876.tmp"]}
 {:sslkey
  #object[java.io.File 0x6fbb3b7b "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_18336254363340056265.tmp"],
  :sslrootcert
  #object[java.io.File 0x6ba4c390 "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_11775804023700307206.tmp"],
  :sslcert
  #object[java.io.File 0x320184a0
  "/var/folders/1d/3ns5s1gs7xjgb09bh1yb6wpc0000gn/T/metabase-secret_10098480793225259237.tmp"]})
```

And this is quite a problem: each time we get a db connection we are
making a new file, putting the contents of the secret in it, and then
considering the pool stale, recomposing it, starting our query. And if
you are on a dashboard, each card will kill the pool of the previously
running cards.

This behavior does not happen with the local-file path because the
secret is actually the filepath and we load that. So the file returned
is always the same. It's only for the uploaded bits that we dump into a
temp file (each time).

Solution:
Let's memoize the temp file created by the secret. We cannot use the
secret as the key though because the secret can (always?) includes a
byte array:

```clojure
connection-test=> (hash {:x (.getBytes "hi")})
1771366777
connection-test=> (hash {:x (.getBytes "hi")})
-709002180
```

So we need to come up with a stable key. I'm using `value->string` here,
falling back to `(gensym)` because `value->string` doesn't always return
a value due to its cond.

```clojure
(defn value->string
  "Returns the value of the given `secret` as a String.  `secret` can be a Secret model object, or a
  secret-map (i.e. return value from `db-details-prop->secret-map`)."
  {:added "0.42.0"}
  ^String [{:keys [value] :as _secret}]
  (cond (string? value)
        value
        (bytes? value)
        (String. ^bytes value StandardCharsets/UTF_8)))
```

Why did this bug come up recently?
[pull/21604](#21604) gives some
light. That changed
`(hash details)` -> `(hash (connection-details->spec driver details))`

with the message

> also made some tweaks so the SQL JDBC driver connection pool cache is
> invalidated when the (unpooled) JDBC spec returned by
> connection-details->spec changes, rather than when the details map
> itself changes. This means that changes to other things outside of
> connection details that affect the JDBC connection parameters, for
> example the report-timezone or start-of-week Settings will now properly
> result in the connection pool cache being flushed

So we want to continue to hash the db spec but ensure that the spec is
stable.

* typehint the memoized var with ^java.io.File

* Switch memoization key from String to vector of bytes

Copying comment from Github:

When you upload a sequence of bytes as a secret, we want to put them in
a file once and only once and always reuse that temporary file. We will
eventually hash the whole connection spec but I don't care about
collisions there. It's only given the same sequence of bytes, you should
always get back the exact same temporary file that has been created.

So i'm making a function `f: Secret -> file` that given the same Secret
always returns the exact same file. This was not the case before
this. Each uploaded secret would return a new temporary file with the
contents of the secret each time you got its value. So you would end up
with 35 temporary files each with the same key in it.

An easy way to get this guarantee is to memoize the function. But the
secret itself isn't a good key to memoize against because it contains a
byte array.

If the memoization key is the byte-array itself, this will fail because
arrays have reference identity:

```clojure
user=> (= (.getBytes "hi") (.getBytes "hi"))
false
```

So each time we load the same secret from the database we get a new byte
array, ask for its temp file and get a different temp file each time.

This means that memoization cannot be driven off of the byte array. But
one way to gain this back quickly is just stuff those bytes into a
string, because strings compare on value not identity. This is what is
currently in the PR (before this change). I was banking on the
assumption that Strings are just opaque sequences of bytes that will
compare byte by byte, regardless of whether those bytes make sense.

But you've pointed out a good point that maybe that is flirting with
undefined behavior. If we use the hash of the contents of the byte array
as the memoization key with (j.u.Arrays/hashCode array), then we open
ourselves up the (albeit rare) case that two distinct secret values hash
to the same value. This sounds really bad. Two distinct secrets (think
two ssh keys) but both would map to only one file containing a single
ssh key.

An easy way to have the value semantics we want for the memoization is
just to call (vec array) on the byte array and use this sequence of
bytes as the memoization key. Clojure vectors compare by value not
reference. So two secrets would return the same file if and only if the
sequence of bytes are identical, in which case we would expect the files
to be identical. This gives me the same guarantee that I was wanting
from the String behavior I used initially without entwining this with
charsets, utf8, etc.
  • Loading branch information
dpsutton committed Jul 27, 2022
1 parent 60cb22c commit 7359927
Show file tree
Hide file tree
Showing 3 changed files with 66 additions and 4 deletions.
21 changes: 20 additions & 1 deletion src/metabase/models/secret.clj
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
(ns metabase.models.secret
(:require [cheshire.generate :refer [add-encoder encode-map]]
[clojure.core.memoize :as memoize]
[clojure.java.io :as io]
[clojure.string :as str]
[clojure.tools.logging :as log]
Expand Down Expand Up @@ -54,7 +55,7 @@
(->> (filter #(= :secret (keyword (:type %))) conn-props)
(reduce (fn [acc prop] (assoc acc (:name prop) prop)) {})))

(defn value->file!
(defn value->file!*
"Returns the value of the given `secret` instance in the form of a file. If the given instance has a `:file-path` as
its source, a `File` referring to that is returned. Otherwise, the `:value` is written to a temporary file, which is
then returned.
Expand Down Expand Up @@ -100,6 +101,24 @@
(.write out v)))
tmp-file)))

(def
^java.io.File
^{:arglists '([{:keys [connection-property-name id value] :as secret} driver?])}
value->file!
"Returns the value of the given `secret` instance in the form of a file. If the given instance has a `:file-path` as
its source, a `File` referring to that is returned. Otherwise, the `:value` is written to a temporary file, which is
then returned.
`driver?` is an optional argument that is only used if an ostensibly existing file value (i.e. `:file-path`) can't be
resolved, in order to render a more user-friendly error message (by looking up the display names of the connection
properties involved)."
(memoize/memo
(with-meta value->file!*
{::memoize/args-fn (fn [[secret _driver?]]
;; not clear if value->string could return nil due to the cond so we'll just cache on a key
;; that is unique
[(vec (:value secret))])})))

(defn get-sub-props
"Return a map of secret subproperties for the property `connection-property-name`."
[connection-property-name]
Expand Down
24 changes: 22 additions & 2 deletions test/metabase/driver/sql_jdbc/connection_test.clj
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
[metabase.driver.sql-jdbc.connection :as sql-jdbc.conn]
[metabase.driver.sql-jdbc.test-util :as sql-jdbc.tu]
[metabase.driver.util :as driver.u]
[metabase.models.database :refer [Database]]
[metabase.models :refer [Database Secret]]
[metabase.sync :as sync]
[metabase.test :as mt]
[metabase.test.data :as data]
Expand Down Expand Up @@ -153,4 +153,24 @@
(is (not= db-hash-1 db-hash-2)))))))
(finally
;; restore the original test DB details, no matter what just happened
(db/update! Database (mt/id) :details (:details db))))))))
(db/update! Database (mt/id) :details (:details db)))))))
(testing "postgres secrets are stable (#23034)"
(mt/with-temp* [Secret [secret {:name "file based secret"
:kind :perm-cert
:source nil
:value (.getBytes "super secret")}]]
(let [db {:engine :postgres
:details {:ssl true
:ssl-mode "verify-ca"
:ssl-root-cert-options "uploaded"
:ssl-root-cert-creator-id (mt/user->id :crowberto)
:ssl-root-cert-source nil
:ssl-root-cert-id (:id secret)
:ssl-root-cert-created-at "2022-07-25T15:57:51.556-05:00"}}]
(is (instance? java.io.File
(:sslrootcert (#'sql-jdbc.conn/connection-details->spec :postgres
(:details db))))
"Secrets not loaded for db connections")
(is (= (#'sql-jdbc.conn/jdbc-spec-hash db)
(#'sql-jdbc.conn/jdbc-spec-hash db))
"Same db produced different hashes due to secrets")))))
25 changes: 24 additions & 1 deletion test/metabase/models/secret_test.clj
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,30 @@
(let [result (byte-array (.length val-file))]
(with-open [in (DataInputStream. (io/input-stream val-file))]
(.readFully in result))
result))))))))))
result)))))))))
(testing "value->file! returns the same file for secrets"
(testing "for file paths"
(let [file-secret-val "dingbat"
^File tmp-file (doto (File/createTempFile "value-to-file-test_" ".txt")
(.deleteOnExit))]
(spit tmp-file file-secret-val)
(mt/with-temp* [Secret [secret {:name "file based secret"
:kind :perm-cert
:source "file-path"
:value (.getAbsolutePath tmp-file)}]]
(is (instance? java.io.File (secret/value->file! secret nil)))
(is (= (secret/value->file! secret nil)
(secret/value->file! secret nil))
"Secret did not return the same file"))))
(testing "for upload files (#23034)"
(mt/with-temp* [Secret [secret {:name "file based secret"
:kind :perm-cert
:source nil
:value (.getBytes "super secret")}]]
(is (instance? java.io.File (secret/value->file! secret nil)))
(is (= (secret/value->file! secret nil)
(secret/value->file! secret nil))
"Secret did not return the same file")))))

(defn- decode-ssl-db-property [content mime-type property]
(let [value-key (keyword (str property "-value"))
Expand Down

0 comments on commit 7359927

Please sign in to comment.