Mnesia is unable to merge the schema for tables using external storage backends #7423

Open
ieQu1 opened this issue Jun 20, 2023 · 15 comments
Labels

  • bug: Issue is reported as a bug
  • Planned: Focus issue added in sprint planning
  • stalled: waiting for input by the Erlang/OTP team
  • team:PS: Assigned to OTP team PS
  • waiting: waiting for changes/input from author

Comments


ieQu1 commented Jun 20, 2023

Describe the bug
mnesia_schema.erl contains the following code:

change_storage_type(N, ram_copies, Cs) ->

This function is called during schema merging. However, it doesn't handle external storage backends created via mnesia:add_backend_type (e.g. https://github.com/aeternity/mnesia_rocksdb), causing the following error:

(<0.2909.0>) call mnesia_schema:change_storage_type('emqx2@127.0.0.1',{ext,rocksdb_copies,mnesia_rocksdb},{cstruct,emqx_ee_schema_registry_protobuf_cache_tab,set,[],[],[],
         [{{rocksdb_copies,mnesia_rocksdb},['emqx1@127.0.0.1']}],
         0,read_write,false,[],[],false,protobuf_cache,
         [fingerprint,module,module_binary],
         [],[],[],
         {{1686330881043734966,-576460752303423474,1},'emqx1@127.0.0.1'},
         {{4,1},{'emqx1@127.0.0.1',{1687,264314,407434}}}}) ({mnesia_schema,
                                                              merge_storage_type,
                                                              5})
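
For illustration, here is a minimal, self-contained sketch of the kind of clause that appears to be missing. It is not the OTP code and not a tested fix: the `cs` record is a simplified stand-in for Mnesia's internal `#cstruct{}`, the module name `storage_merge_sketch` and the field name `external_copies` are assumptions, and the `{ext, Alias, Mod}` clause is inferred only from the arguments visible in the trace above.

```erlang
-module(storage_merge_sketch).
-export([change_storage_type/3, demo/0]).

%% Simplified stand-in for Mnesia's internal #cstruct{} record (assumption:
%% only the copy-holder lists relevant to this issue are modelled).
-record(cs, {ram_copies = [],
             disc_copies = [],
             disc_only_copies = [],
             external_copies = []}).   % [{{Alias, Mod}, Nodes}], as in the trace

%% Clauses for the built-in storage types: record node N as a copy holder.
change_storage_type(N, ram_copies, Cs) ->
    Cs#cs{ram_copies = lists:usort([N | Cs#cs.ram_copies])};
change_storage_type(N, disc_copies, Cs) ->
    Cs#cs{disc_copies = lists:usort([N | Cs#cs.disc_copies])};
change_storage_type(N, disc_only_copies, Cs) ->
    Cs#cs{disc_only_copies = lists:usort([N | Cs#cs.disc_only_copies])};
%% Hypothetical extra clause for external backends. Without a clause of this
%% shape, a call with {ext, Alias, Mod} matches nothing and crashes.
change_storage_type(N, {ext, Alias, Mod}, Cs) ->
    Key = {Alias, Mod},
    Ext = Cs#cs.external_copies,
    Nodes = case lists:keyfind(Key, 1, Ext) of
                {Key, Ns} -> Ns;
                false -> []
            end,
    NewEntry = {Key, lists:usort([N | Nodes])},
    Cs#cs{external_copies = lists:keystore(Key, 1, Ext, NewEntry)}.

%% Replays the storage argument from the trace against the simplified record
%% and returns the resulting external_copies list.
demo() ->
    Cs0 = #cs{external_copies =
                  [{{rocksdb_copies, mnesia_rocksdb}, ['emqx1@127.0.0.1']}]},
    Cs1 = change_storage_type('emqx2@127.0.0.1',
                              {ext, rocksdb_copies, mnesia_rocksdb}, Cs0),
    Cs1#cs.external_copies.
```

Calling `storage_merge_sketch:demo()` exercises only the new clause; whether the real function needs additional bookkeeping for external backends is something the maintainers would have to confirm.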

To Reproduce

  1. Add an external storage backend (e.g. mnesia_rocksdb:register) in a cluster of two nodes (A and B)
  2. Create a table with this backend and ensure it has copies on both nodes.
  3. Trigger schema merge. We did it by shutting down B, removing a remote table copy on the surviving node A and restarting B, but there could be an easier way.
  4. Mnesia on B fails to start with this error.
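
The steps above can be sketched as data plus a small runner, so the exact calls can be reviewed before anything is executed. This is an untested sketch under several assumptions: `A` and `B` are running, connected nodes that already share a Mnesia schema; registering the backend once on `A` is sufficient; "shutting down B" is approximated by `mnesia:stop/0`; and the module name `repro_steps_sketch`, the table name, and the `Alias`/`Mod` arguments are placeholders.

```erlang
-module(repro_steps_sketch).
-export([steps/5, run/1]).

%% Returns the reproduction steps as {Node, Module, Function, Args} tuples.
%% Alias/Mod name the external backend and Tab is a placeholder table name.
steps(A, B, Alias, Mod, Tab) ->
    [%% 1. Register the external backend (assumption: once, on A).
     {A, mnesia, add_backend_type, [Alias, Mod]},
     %% 2. Create the table with copies on both nodes.
     {A, mnesia, create_table, [Tab, [{Alias, [A, B]}]]},
     %% 3. Stop Mnesia on B, drop B's copy from A while B is down,
     %%    then start Mnesia on B again to trigger the schema merge.
     {B, mnesia, stop, []},
     {A, mnesia, del_table_copy, [Tab, B]},
     {B, mnesia, start, []}].

%% Executes each step on its node via rpc and collects the results
%% as {Node, Function, Result} tuples.
run(Steps) ->
    [{Node, F, rpc:call(Node, M, F, Args)} || {Node, M, F, Args} <- Steps].
```

For example, `repro_steps_sketch:run(repro_steps_sketch:steps(A, B, rocksdb_copies, mnesia_rocksdb, t))` would run the sequence against two prepared nodes; the result of the last step is where the failure described in step 4 would be expected to show up.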

Expected behavior
Schema is merged.

Affected versions
Probably all OTP versions that support 3rd party backends.

Additional context

ieQu1 added the "bug" label (Issue is reported as a bug) on Jun 20, 2023
IngelaAndin added the "team:PS" label (Assigned to OTP team PS) on Jun 21, 2023
Contributor

dgud commented Aug 22, 2023

I am having trouble reproducing this with the instructions you gave. Can you write a test case in mnesia?
The table types ext_ets and ext_dets are configured by default, so you can use them to remove the mnesia_rocksdb dependency.

IngelaAndin added the "waiting" label (waiting for changes/input from author) on Sep 7, 2023
Author

ieQu1 commented Sep 13, 2023

Hello,

I think we found a reliable way to reproduce it in our own test suite. Porting to the OTP test suite may take time, since I am not familiar with it, but I might attempt it.

The steps are:

  • Create a mnesia cluster with 2 nodes N1, N2
  • Create a table T with ext backend that has copies on both nodes
  • Shut down N2
  • Start a new node N3, and create a copy of T there
  • Restart N2
  • N2 crashes with the case clause in change_storage_type.

IngelaAndin added and removed the "waiting" label (waiting for changes/input from author) on Oct 11, 2023
Contributor

dgud commented Oct 19, 2023

I would still like to have a test case, or some code I can run, that reproduces this.

@IngelaAndin
Contributor

ping @ieQu1


axpxp commented May 22, 2024

@dgud I have a test case. How should I send it to you?

Contributor

dgud commented May 22, 2024

Post it here, or add a gist.
The test case should not depend on mnesia_rocksdb; I don't want to debug that.


axpxp commented May 22, 2024

@dgud I have re-uploaded a version for Erlang 27. Please check, thank you.
#8045

@Mikaka27
Contributor

I don't think I'm seeing the correct problem when running this:

mnesia_bug_stacktrace.txt

Please verify; I am not familiar with rocksdb at all.
Would it be possible to have a reproduction without mnesia_rocksdb?

Author

ieQu1 commented May 29, 2024

Hello,

Sorry for no answer, I was head deep in other stuff. This problem is pretty rare (thankfully), so I don't know the precise conditions to trigger it. I pinpointed the function with a missing clause from the stacktrace, and have a preliminary fix, but no reliable way to test it.

Maybe OTP experts can suggest what scenarios can trigger various types of schema merge. Edit: apparently it's right there #7423 (comment)

@Mikaka27
Contributor

> Hello,
>
> Sorry for no answer, I was head deep in other stuff. This problem is pretty rare (thankfully), so I don't know the precise conditions to trigger it. I pinpointed the function with a missing clause from the stacktrace, and have a preliminary fix, but no reliable way to test it.
>
> Maybe OTP experts can suggest what scenarios can trigger various types of schema merge.

But you mean that there is an error in your reproduction? And that's why I'm seeing this wrong stacktrace?

But otherwise this reproduction should trigger the error? After running a few times perhaps?

Author

ieQu1 commented May 29, 2024

Your stacktrace looks different. You likely have stumbled on a different issue that looks specific to rocksdb:

{noproc,
 {gen_server,call,
  [mnesia_rocksdb_admin,
   {rdb,{get_ref,t}},
   infinity]}}

This doesn't look like a Mnesia process.

In our case, BUP was not involved. It happened after a regular node restart.


axpxp commented May 30, 2024

> I don't think I'm seeing the correct problem when running this:
>
> mnesia_bug_stacktrace.txt
>
> Please verify, not familiar with rocksdb at all. Would it be possible to have a reproduction without mnesia_rocksdb?

After mnesia:add_backend_type(Alias, Module), if the table is empty, everything is fine, because Module:init_backend is called first.
After mnesia:add_backend_type(Alias, Module), if there is data in the table and you run mnesia:backup("bk.BUP"), then mnesia:install_fallback("bk.BUP"), then mnesia:start(), the bug appears, because Module:init_backend is not called.

IngelaAndin added the "Planned" label (Focus issue added in sprint planning) on Jun 19, 2024
@Mikaka27
Contributor

> Hello,
>
> I think we found a reliable way to reproduce it in our own test suite. Porting to the OTP test suite may take time, since I am not familiar with it, but I might attempt it.
>
> The steps are:
>
> * Create a mnesia cluster with 2 nodes N1, N2
> * Create a table T with ext backend that has copies on both nodes
> * Shut down N2
> * Start a new node N3, and create a copy of T there
> * Restart N2
> * N2 crashes with the case clause in `change_storage_type`.

Couple of questions about this process:

  • Does N3 have its table created with or without the external backend?
  • Does N3 have its table created before or after joining the cluster?


zmstone commented Aug 23, 2024

Hi @Mikaka27

  • N3 has the table created with external backend
  • N3 has the table created before joining the cluster

Contributor

Mikaka27 commented Aug 23, 2024

> Hi @Mikaka27
>
> * N3 has the table created with external backend
> * N3 has the table created before joining the cluster

Hi, I'm having trouble reproducing this problem.
Could you share which mnesia (and other) functions (and their arguments) you use in each step?

> * Create a mnesia cluster with 2 nodes N1, N2
> * Create a table T with ext backend that has copies on both nodes
> * Shut down N2
> * Start a new node N3, and create a copy of T there
> * Restart N2
> * N2 crashes with the case clause in `change_storage_type`.

And also it would be good to see how you initialize the schema on each node.

IngelaAndin added the "stalled" label (waiting for input by the Erlang/OTP team) on Sep 10, 2024
6 participants