Dump/Load Metabase DB to/from H2 file #10662

Closed
wants to merge 35 commits into from
Changes from 18 commits
Commits
ab50944
[cmd] dump to h2 wip
ogeagla Aug 21, 2019
fb76399
[cmd] dump to h2 wip
ogeagla Aug 21, 2019
5d91236
[api] wip dump to h2 endpoint
ogeagla Aug 21, 2019
d48921b
[api] wip dump to h2
ogeagla Aug 21, 2019
96f8a05
[crypto] pub-key for aes secret, aes for dump contents WIP
ogeagla Aug 22, 2019
a26f843
[crypto] zip/unzip encrypted content
ogeagla Aug 22, 2019
f03aad9
[crypto] wip up/download secure dumps
ogeagla Aug 22, 2019
88554f2
[s3][test] e2e encryption test, s3 upload/download
ogeagla Aug 23, 2019
3b6289a
[test] crypto resources
ogeagla Aug 23, 2019
1d9922f
[api] secure dump up/down endpoints
ogeagla Aug 24, 2019
c90e435
[s3] working upload to s3 url
ogeagla Aug 26, 2019
f265e0e
[crypto] decrypt dump from provided secret key from submarine
ogeagla Aug 27, 2019
aa579a0
[secure-dump] wip e2e
ogeagla Aug 27, 2019
1f3a477
[crypto][style] use bytes, break up commands (wip)
ogeagla Aug 27, 2019
02b4eb7
[refactor]
ogeagla Aug 28, 2019
345977a
[docs]
ogeagla Aug 28, 2019
6e90725
[docs]
ogeagla Aug 28, 2019
ea18d6c
[dump] fix npe when no dump file provided
ogeagla Aug 28, 2019
8dec304
Ensure that we don't overwrite existing output database
walterl Aug 28, 2019
2e88c02
Cosmetic changes
walterl Aug 28, 2019
4dde536
Get db type and connection details from specified connection, not glo…
walterl Aug 28, 2019
d1a2f3d
Bail if _output_ db is H2, not input db
walterl Aug 28, 2019
5ac0514
Since the output db value is a filename we can just assume that the d…
walterl Aug 28, 2019
244faba
[secure-dump] fix async api endpoints
ogeagla Aug 29, 2019
2b809eb
[style]
ogeagla Aug 29, 2019
8aadcf5
Refactor to dump currently configured Metabase db to output H2 dir
walterl Aug 30, 2019
6abb6bd
[secret] generate aes secret
ogeagla Aug 30, 2019
9c53382
[secure-dump] rely on user arg for dump path input to secure upload
ogeagla Aug 30, 2019
4b8ba26
[style] separate zip ns
ogeagla Aug 30, 2019
d758fff
[deps] remove amazonica
ogeagla Aug 30, 2019
61f426d
Exit with return code, when one was returned
walterl Aug 30, 2019
150305e
Upper case column names
walterl Aug 30, 2019
70e2fb2
Return positive error code in stead of calling `System/exit` directly
walterl Aug 30, 2019
228d409
Output error in red
walterl Aug 30, 2019
ddb874f
[api] non-async the dump endpoints
ogeagla Aug 30, 2019
3 changes: 3 additions & 0 deletions project.clj
@@ -48,6 +48,7 @@
:exclusions [org.clojure/clojure
org.clojure/clojurescript]] ; fixed length queue implementation, used in log buffering
[amalloy/ring-gzip-middleware "0.1.4"] ; Ring middleware to GZIP responses if client can handle it
[amazonica "0.3.145"] ; AWS SDK wrapper
[aleph "0.4.6" :exclusions [org.clojure/tools.logging]] ; Async HTTP library; WebSockets
[bigml/histogram "4.1.3"] ; Histogram data structure
[buddy/buddy-core "1.5.0" ; various cryptograhpic functions
@@ -62,6 +63,7 @@
[clojurewerkz/quartzite "2.1.0" ; scheduling library
:exclusions [c3p0]]
[colorize "0.1.1" :exclusions [org.clojure/clojure]] ; string output with ANSI color codes (for logging)
[com.amazonaws/aws-java-sdk-s3 "1.11.618"] ; AWS S3 SDK
[com.cemerick/friend "0.2.3" ; auth library
:exclusions [commons-codec
org.apache.httpcomponents/httpclient
@@ -97,6 +99,7 @@
javax.jms/jms
com.sun.jdmk/jmxtools
com.sun.jmx/jmxri]]
[me.raynes/fs "1.4.6"] ; FS tools
[medley "1.2.0"] ; lightweight lib of useful functions
[metabase/connection-pool "1.0.2"] ; simple wrapper around C3P0. JDBC connection pools
[metabase/mbql "1.3.3"] ; MBQL language schema & util fns
98 changes: 98 additions & 0 deletions src/metabase/api/dump.clj
@@ -0,0 +1,98 @@
(ns metabase.api.dump
  "/api/dump endpoints."
  (:require [cheshire.core :as json]
            [clojure.core.async :as a]
            [clojure.string :as str]
            [clojure.tools.logging :as log]
            [compojure.core :refer [POST]]
            [medley.core :as m]
            [metabase.api.common :as api]
            [metabase.cmd :as cmd]
            [metabase.mbql.schema :as mbql.s]
            [metabase.models
             [card :refer [Card]]
             [database :as database :refer [Database]]
             [query :as query]]
            [metabase.query-processor :as qp]
            [metabase.query-processor
             [async :as qp.async]
             [util :as qputil]]
            [metabase.query-processor.middleware.constraints :as constraints]
            [metabase.util
             [date :as du]
             [export :as ex]
             [i18n :refer [trs tru]]
             [schema :as su]]
            [schema.core :as s]
            [metabase.api.dataset :as dataset-api])
  (:import clojure.core.async.impl.channels.ManyToManyChannel))


(def dump-targets
  "Map of dump targets to their relevant metadata."
  {"h2" {}})

(def DumpTarget
  "Schema for valid dump targets."
  (apply s/enum (keys dump-targets)))

(def dump-target-regex
  "Regex for matching valid dump targets (e.g., `h2`).
  Intended for use in an endpoint definition:

    (api/defendpoint POST [\"/:dump-target\", :dump-target dump-target-regex]"
  (re-pattern (str "(" (str/join "|" (keys dump-targets)) ")")))

(s/defn as-async
  "Pass the result read from `in-chan` to the API `respond` function, or to `raise` if the result is a Throwable.
  `in-chan` should be a core.async channel that delivers the result of the dump operation."
  {:style/indent 3}
  [respond :- (s/pred fn?), raise :- (s/pred fn?), in-chan :- ManyToManyChannel]
  (a/go
    (try
      (let [results (a/<! in-chan)]
        (if (instance? Throwable results)
          (raise results)
          (respond results)))
      (catch Throwable e
        (raise e))
      (finally
        (a/close! in-chan))))
  nil)

;; curl -i -X POST -H "Content-Type: application/json" -d '{"db-conn-str": "test1", "h2-filename": "test2"}' -H "X-Metabase-Session: 273cdf75-3e9a-42e7-a7fd-57421d69ec76" "localhost:3000/api/dump/to-h2"
(api/defendpoint-async
  POST ["/to-h2"]
  "Dump db to H2 file."
  [{{:keys [db-conn-str h2-filename] :as body} :body} respond raise]
  {db-conn-str su/NonBlankString
   h2-filename su/NonBlankString}
  ;; `as-async` expects a channel, so run the blocking dump on a separate thread. A caught
  ;; Throwable is put on the channel so that `as-async` can pass it to `raise`.
  (as-async respond raise
    (a/thread
      (try
        (log/info (trs "Dumping to H2: {0} -> {1}" db-conn-str h2-filename))
        (cmd/dump-to-h2 db-conn-str h2-filename)
        (catch Throwable e
          e)))))

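(api/defendpoint-async
  POST ["/secure-upload"]
  "Encrypt, compress, and upload an H2 dump to S3. Does not perform an H2 dump."
  [{{:keys [s3-upload-str] :as body} :body} respond raise]
  {s3-upload-str su/NonBlankString}
  ;; `as-async` expects a channel, so run the blocking upload on a separate thread. A caught
  ;; Throwable is put on the channel so that `as-async` can pass it to `raise`.
  (as-async respond raise
    (a/thread
      (try
        (log/info (trs "Secure dump and upload: {0}" s3-upload-str))
        (cmd/secure-dump-and-upload s3-upload-str nil)
        (catch Throwable e
          e)))))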

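(api/defendpoint-async
  POST ["/download-and-unlock"]
  "Download, uncompress, and decrypt a secure dump from S3. Does not load the H2 db."
  [{{:keys [h2-dump-path s3-bucket s3-key secret-key] :as body} :body} respond raise]
  {h2-dump-path su/NonBlankString
   s3-bucket    su/NonBlankString
   s3-key       su/NonBlankString
   secret-key   su/NonBlankString}
  ;; `as-async` expects a channel, so run the blocking download on a separate thread. A caught
  ;; Throwable is put on the channel so that `as-async` can pass it to `raise`.
  (as-async respond raise
    (a/thread
      (try
        ;; log only the length of the secret key, never the key itself
        (log/info (trs "Download secure dump: {0} {1} {2} (secret key length: {3})"
                       h2-dump-path s3-bucket s3-key (count secret-key)))
        (cmd/secure-dump-download-and-unlock h2-dump-path s3-bucket s3-key secret-key)
        (catch Throwable e
          e)))))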

(api/define-routes)
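Review note: `dump-target-regex` is an alternation built from the keys of `dump-targets`. The sketch below reproduces that construction on a hypothetical two-entry map (`example-targets` and its `"sql"` key are illustrative and not part of this PR) to show what the pattern accepts and rejects.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for `dump-targets`; "sql" exists only to show the alternation.
(def example-targets {"h2" {} "sql" {}})

;; Same construction as `dump-target-regex`: "(h2|sql)".
(def example-target-regex
  (re-pattern (str "(" (str/join "|" (keys example-targets)) ")")))

;; `re-matches` returns the whole match plus the captured group, or nil when nothing matches.
(re-matches example-target-regex "h2")  ; a known target matches
(re-matches example-target-regex "csv") ; an unknown target does not
```

With only `"h2"` in `dump-targets`, the real pattern reduces to `(h2)`.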
2 changes: 2 additions & 0 deletions src/metabase/api/routes.clj
@@ -11,6 +11,7 @@
[dashboard :as dashboard]
[database :as database]
[dataset :as dataset]
[dump :as dump]
[email :as email]
[embed :as embed]
[field :as field]
@@ -67,6 +68,7 @@
(context "/dashboard" [] (+auth dashboard/routes))
(context "/database" [] (+auth database/routes))
(context "/dataset" [] (+auth dataset/routes))
(context "/dump" [] (+auth dump/routes))
(context "/email" [] (+auth email/routes))
(context "/embed" [] (+message-only-exceptions embed/routes))
(context "/field" [] (+auth field/routes))
23 changes: 23 additions & 0 deletions src/metabase/cmd.clj
@@ -37,6 +37,29 @@
(binding [mdb/*disable-data-migrations* true]
((resolve 'metabase.cmd.load-from-h2/load-from-h2!) h2-connection-string))))

(defn ^:command dump-to-h2
  "Transfer data from the existing database specified by env vars to a newly created H2 DB."
  ([]
   (dump-to-h2 nil nil))
  ([db-connection-string]
   (dump-to-h2 db-connection-string nil))
  ([db-connection-string h2-filename]
   (classloader/require 'metabase.cmd.dump-to-h2)
   (binding [mdb/*disable-data-migrations* true]
     ((resolve 'metabase.cmd.dump-to-h2/dump-to-h2!) db-connection-string h2-filename))))

(defn ^:command secure-dump-and-upload
  "Encrypt, compress, and upload an H2 dump to S3. Does not perform an H2 dump."
  ([s3-upload-url-str curr-db-conn-str]
   (classloader/require 'metabase.cmd.dump-upload)
   (binding [mdb/*disable-data-migrations* true]
     ((resolve 'metabase.cmd.dump-upload/up!) s3-upload-url-str curr-db-conn-str))))

(defn ^:command secure-dump-download-and-unlock
  "Download, uncompress, and decrypt a secure dump from S3. Does not load the H2 db."
  ([h2-dump-path s3-bucket s3-key secret-key]
   (classloader/require 'metabase.cmd.dump-download)
   (binding [mdb/*disable-data-migrations* true]
     ((resolve 'metabase.cmd.dump-download/down!) h2-dump-path s3-bucket s3-key secret-key))))

(defn ^:command profile
"Start Metabase the usual way and exit. Useful for profiling Metabase launch time."
[]
61 changes: 61 additions & 0 deletions src/metabase/cmd/dump_download.clj
@@ -0,0 +1,61 @@
(ns metabase.cmd.dump-download
  (:require [clojure.java.io :as io]
            [metabase.crypto.symmetric :as symm])
  (:import java.io.ByteArrayOutputStream
           java.util.zip.ZipInputStream))

(defn- file->bytes
  "Read the entire contents of `file` into a byte array."
  ^bytes [file]
  (with-open [xin  (io/input-stream file)
              xout (ByteArrayOutputStream.)]
    (io/copy xin xout)
    (.toByteArray xout)))

(defn- unzip-secure-dump
  "Extract the `dump.enc` and `secret.enc` entries of the zip at `zip-path` to `dump-path` and `secret-path`."
  [{:keys [zip-path dump-path secret-path]}]
  (println "Unzipping" zip-path "-->" dump-path secret-path)
  (with-open [stream (ZipInputStream. (io/input-stream zip-path))]
    (loop [entry (.getNextEntry stream)]
      (when entry
        (let [entry-name (.getName entry)]
          (println "Unzipping" entry-name)
          (case entry-name
            "dump.enc"   (io/copy stream (io/file dump-path))
            "secret.enc" (io/copy stream (io/file secret-path))
            ;; `case` without a default clause throws on an unexpected entry, so skip it instead
            (println "Skipping unexpected entry" entry-name))
          (recur (.getNextEntry stream)))))))

(defn uri->file
  "Copy the contents of `uri` to `file`, creating parent directories as needed."
  [uri file]
  (println "Transfer:" uri "->" file)
  (io/make-parents file)
  (with-open [in  (io/input-stream uri)
              out (io/output-stream file)]
    (io/copy in out)))

(defn- decrypt-payload-bytes ^bytes [^bytes enc-payload secret-key]
  (symm/decrypt-b64-bytes enc-payload secret-key))

(defn down!
  "Download the secure dump zip from S3, unzip it, decrypt the payload with `secret-key`, and write the result to
  `h2-dump-path`. Does not load the H2 db."
  [h2-dump-path s3-bucket s3-key secret-key]
  (let [enc-dump-path   "./dumps_in/dumped.enc"
        enc-secret-path "./dumps_in/dumped_secret.enc"
        zip-path        "./dumps_in/dumped.zip"]
    ;; The item will have been made public so that we can download it
    (uri->file (format "https://%s.s3.amazonaws.com/%s" s3-bucket s3-key) zip-path)
    (unzip-secure-dump {:zip-path    zip-path
                        :dump-path   enc-dump-path
                        :secret-path enc-secret-path})
    (let [enc-payload (file->bytes enc-dump-path)
          decrypted   (decrypt-payload-bytes enc-payload secret-key)]
      (with-open [w (io/output-stream h2-dump-path)]
        (.write w decrypted)))
    ;; do not print `secret-key`
    (println "Done" h2-dump-path s3-bucket s3-key)))
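Review note: `unzip-secure-dump` relies on `ZipInputStream` reporting end-of-stream at the end of each entry, so `io/copy` reads exactly one entry per call. A minimal in-memory sketch of that behaviour follows; `zip-bytes` and `unzip-bytes` are hypothetical helpers written for this illustration and are not part of the PR.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(java.util.zip ZipEntry ZipInputStream ZipOutputStream))

(defn zip-bytes
  "Build an in-memory zip from a map of entry name -> byte array."
  ^bytes [entries]
  (let [out (ByteArrayOutputStream.)]
    (with-open [zip (ZipOutputStream. out)]
      (doseq [[entry-name ^bytes content] entries]
        (.putNextEntry zip (ZipEntry. ^String entry-name))
        (.write zip content 0 (alength content))
        (.closeEntry zip)))
    ;; the zip stream is closed (and therefore finished) before the bytes are read
    (.toByteArray out)))

(defn unzip-bytes
  "Read every entry of an in-memory zip into a map of entry name -> byte array."
  [^bytes zipped]
  (with-open [zip (ZipInputStream. (ByteArrayInputStream. zipped))]
    (loop [entry (.getNextEntry zip), acc {}]
      (if-not entry
        acc
        (let [buf (ByteArrayOutputStream.)]
          ;; io/copy stops at the end of the current entry, not the end of the archive
          (io/copy zip buf)
          (recur (.getNextEntry zip)
                 (assoc acc (.getName entry) (.toByteArray buf))))))))

;; Round trip using the two entry names that `unzip-secure-dump` expects.
(def round-tripped
  (unzip-bytes (zip-bytes {"dump.enc"   (.getBytes "payload")
                           "secret.enc" (.getBytes "key")})))
```

Collecting entries into a map keyed by name avoids depending on entry order, which the `case` dispatch in `unzip-secure-dump` also does not depend on.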