Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

support snapshots or similar #2906

Closed
mickroll opened this issue Sep 29, 2020 · 7 comments
Closed

support snapshots or similar #2906

mickroll opened this issue Sep 29, 2020 · 7 comments

Comments

@mickroll
Copy link

mickroll commented Sep 29, 2020

One of the use cases for h2 databases is definitley being a storage backend for testing. We use it for our integration tests and really like its speed and ease to use 馃憤

Now we want to isolate our tests and for that, we currently rely on basically 3 schemes:

  1. transaction rollback, hoping the test did not commit any data: this is error prone, and won't catch half the use cases
  2. manually coding data-removal commands: this is error prone, one test misses to clean all data and another unrelated test fails due to residual data. It is very hard to find the culprit.
  3. clearing the whole database and rebuilding it from scratch using scripts (or flyway/liquibase): takes ages regarding the amount of tests and complexity of data.

We generally have a few megabytes (10..100) of database data as a 'base line', and each test creates a very small amount of data on top of that.

I am aware of SCRIPT/RUNSCRIPT or the possibility to 'manually' backup/restore the underlying database files. But all of these solutions have their own drawbacks:

  • creating and parsing a large sql files takes time
  • changing the database files while h2 is running doesn't seem right and asks for trouble

A really nice feature would be to have a createsnapshot/restoresnapshot functionality built into h2.

From my point of view it could work as follows:
createsnapshot:

  1. disallow new transactions from being created / disallow commands outside of transactions
  2. wait for already running transactions to complete
  3. copy underlying database files, maybe to a subdirectory with name of the snapshot
  4. continue normal operation

restoresnapshot:

  1. disallow new transactions from being created / disallow commands outside of transactions
  2. wait for already running transactions to complete
  3. delete underlying database files
  4. copy snapshot database files to original place
  5. continue normal operation

Any in-memory-objects could:

  • either be saved to disk as well
  • or held in memory: snapshot/restore would be blazingly fast

Now, what do you think?

@andreitokar
Copy link
Contributor

andreitokar commented Sep 29, 2020

changing the database files while h2 is running doesn't seem right and asks for trouble

and a dozen lines later

  1. delete underlying database files
  2. copy snapshot database files to original place

Looks like a contradiction to me.
On the other hand, how this would be different from shutting down database, copying file from a known good copy and starting again. Should not take ages.

@mickroll
Copy link
Author

mickroll commented Sep 30, 2020

I do not think it is a contradiction, because when triggered from within h2, the database can be prepared for the whole process. I see a lot of reasons why the whole snapshot-process should be covered by the database itself, and not from the outside:

  • Shutting down the database is not an option, because components using this database may continue to run across multiple tests. They would loose their connection etc.
  • Using the file-copy-mechanism should definitely be handled internally by h2 and not from the outside. And this is only one possible solution.
  • In-memory-databases do not rely on files and may be switched extremely fast.
  • Caching structures may also be stored alongside the database data.
  • Maybe snapshots are incorporated into the database files in the future, instead of duplicating them.
  • Maybe some kind of delta-storage is implemented in the future.

None of these future changes would break existing code that relies on the proposed createsnapshot/restoresnapshot functionality.

@andreitokar
Copy link
Contributor

Shutting down the database is not an option, because components using this database may continue to run across multiple tests. They would loose their connection etc.

But what is there to loose, besides connection itself, if all transactions need to be closed, therefore all connections will be idle anyway? Also IMHO, it is not a good idea to keep connection open between test cases.

As far as use of in-memory database in conjunction with copying to/from real file - that popped up a few times before, i.e. here.

In any case, it is not a trivial exercise, but approach with shutdown / copy / start, or shutdown / start with initialization script (for in-memory case) may solve your problem today.

@mickroll
Copy link
Author

mickroll commented Oct 1, 2020

i.e. here

These are the messages in the mentioned google group:

niklas...@gmail.com
20.06.2019, 11:32:38
an H2 Database
Hi,

We are using H2 for end-to-end tests for our JEE application.

So basically the setup is:

* start H2 (in memory) and application server
* run many testcases
* stop H2 and application server

Each testcase brings its own data and clears all tables before running.
Although H2 is pretty fast the setup of the data takes some time.

What I am thinking about is finding a way to start H2 e.g. from a file that contains some prefilled data and then only add the special data the test needs on top.
But I have not found any way to reset the database to the initial state after the test. 
Of course copying the file again and restarting H2 would work, but this is probably also slow when running very many tests.

Has anybody a similar setup and have you solved this problem somehow? Running in-memory but backed by a read-only file for example?

Thanks & best regards,
  Niklas
Noel Grandin
20.06.2019, 11:37:30
an h2-da...@googlegroups.com

This has come up several times before, and the reply I gave there was that we have a FilePath abstraction which sits
between our engine and various kind of storage backends (memory, disk, etc).

It should be possible to
(a) restore a disk file to an in-memory backend, since the FilePath abstraction looks like a filesystem
(b) make a copy of a populated database from an in-memory backend to a disk file.
(c) overwrite an existing in-memory database "virtual file" from another in-memory "virtual file"

However, to my knowledge, no-one has ever done the work to actually make this happen.

(not including a mention of savepoint/rollback, because this would need a single running transaction)

@manticore-projects
Copy link
Contributor

Hi Mick.

Pardon my ignorance, but have you tried to simply Copy that H2 Database file (e. g. using Apache Commons IO) in order to create your snapshot?
We do exactly that on quite large databases (> 2GByte) running in server mode, even on a Windows Filesystem.
We ensure that there is no User Connected and have never seen any corruption or negative side effects.

Our biggest challenge from that practise is, that we have now a large zoo of such H2 database files laying around and when we want to migrate to a newer H2 version, we need to migrate all of them, while each may have its own incompatibilities (depending, when exactly they have been created). But that's the only Contra argument I have in mind.

Cheers

@manticore-projects
Copy link
Contributor

Alternatively: Why not just loop through all schemas and tables (e.g. via JDBC DatabaseMetaData) and create snapshots as per sql CREATE TABLE target AS SELECT * FROM source;

If the schemas do not change, then you could add a timestamp field in order to identify your snapshot.
If the schemas change, then create new schemas with a timestamp alike suffix (e.g. MY_SCHEMA_20200930120000)

Cheers

@mickroll
Copy link
Author

mickroll commented Oct 7, 2020

Hi @manticore-projects,

thanks for your ideas, but:

  • copying files won't work with our in-memory databases
  • looping through tables is not enough. There are sequences, views, changed constraints, maybe procedures... the lot

We now went with SCRIPT command for dumping and DROP ALL OBJECTS + executing the generated script for restoring.
This comes with the (now usual) workaround for #2390 (sorting VIEW-related commands to the end of the dump).

I hoped for a more native approach but it seems this won't happen in the near future.

regards,
Michael

@mickroll mickroll closed this as completed Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants