Scylla core dump on EC2, reporting Checksum error #598

Closed
tzach opened this issue Nov 23, 2015 · 15 comments

@tzach
Contributor

tzach commented Nov 23, 2015

Using Scylla AMI 0.12.
A stress run results in the scylla service going down.

Last messages in the journal:
Nov 23 14:36:33 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Log replay of /data/commitlog/CommitLog-1-1130808.log complete, 0 replayed mutations (0 invalid, 0 skipped)
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Log replay of /data/commitlog/CommitLog-1-1130801.log complete, 7192 replayed mutations (0 invalid, 0 skipped)
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Error recovering /data/commitlog/CommitLog-1-18014398510612785.log: std::runtime_error (Checksum error in data entry)
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Error recovering /data/commitlog/CommitLog-1-1130800.log: std::runtime_error (Checksum error in data entry)
Nov 23 14:36:34 ip-10-16-211-188 scylla_run[10715]: WARNING: exceptional future ignored of type 'std::runtime_error': Checksum error in data entry
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Log replay of /data/commitlog/CommitLog-1-18014398510612786.log complete, 10688 replayed mutations (0 invalid, 0 skipped)
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Error recovering /data/commitlog/CommitLog-1-1130798.log: std::runtime_error (Checksum error in data entry)
Nov 23 14:36:34 ip-10-16-211-188 scylla_run[10715]: WARNING: exceptional future ignored of type 'std::runtime_error': Checksum error in data entry
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Error recovering /data/commitlog/CommitLog-1-18014398510612784.log: std::runtime_error (Checksum error in data entry)
Nov 23 14:36:34 ip-10-16-211-188 scylla_run[10715]: WARNING: exceptional future ignored of type 'std::runtime_error': Checksum error in data entry
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] commitlog_replayer - Log replay of /data/commitlog/CommitLog-1-1130799.log complete, 6815 replayed mutations (0 invalid, 0 skipped)
Nov 23 14:36:34 ip-10-16-211-188 scylla_run[10715]: Exiting on unhandled exception of type 'std::runtime_error': Checksum error in data entry
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] compaction_manager - compaction task handler stopped due to shutdown
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 0] compaction_manager - compaction task handler stopped due to shutdown
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 1] compaction_manager - compaction task handler stopped due to shutdown
Nov 23 14:36:34 ip-10-16-211-188 scylla[10728]:  [shard 1] compaction_manager - compaction task handler stopped due to shutdown
Nov 23 14:36:34 ip-10-16-211-188 sudo[10715]: pam_unix(sudo:session): session closed for user scylla

Core dump:

      Message: Process 2927 (scylla) of user 995 dumped core.

                Stack trace of thread 2927:
                #0  0x00007f7da64d19c8 raise (libc.so.6)
                #1  0x00007f7da64d365a abort (libc.so.6)
                #2  0x0000000000deb2d9 _ZN6futureIJEE12then_wrappedIZNS0_7finallyIZ7do_withI13basic_sstringIcjLj15EEZZN3rpc11send_helperIN3net10serializerENS8_14messaging_verbENS6_12no_
                #3  0x0000000000e53080 _ZZN3rpc11send_helperIN3net10serializerENS1_14messaging_verbENS_12no_wait_typeEJRK15frozen_mutationSt6vectorIN3gms12inet_addressESaISA_EESA_jmEEED
                #4  0x0000000000e533c5 _ZZN3rpc11send_helperIN3net10serializerENS1_14messaging_verbENS_12no_wait_typeEJRK15frozen_mutationSt6vectorIN3gms12inet_addressESaISA_EESA_jmEEED
                #5  0x0000000000d94d5f _ZN3net19send_message_onewayIJRK15frozen_mutationSt6vectorIN3gms12inet_addressESaIS6_EES6_jmEEEDaPNS_17messaging_serviceENS_14messaging_verbENS_8s
                #6  0x0000000000aa84b4 _ZZZZN7service13storage_proxy22init_messaging_serviceEvENKUl15frozen_mutationSt6vectorIN3gms12inet_addressESaIS4_EES4_jmE1_clES1_S6_S4_jmENKUlRKS1
                #7  0x0000000000aaa2bf apply (scylla)
                #8  0x0000000000e36e64 _ZZN3rpc11recv_helperIN3net10serializerENS1_14messaging_verbESt8functionIFNS_12no_wait_typeE15frozen_mutationSt6vectorIN3gms12inet_addressESaIS9_E
                #9  0x0000000000e37218 _ZNSt17_Function_handlerIFv13lw_shared_ptrIN3rpc8protocolIN3net10serializerENS3_14messaging_verbEE6server10connectionEEl16temporary_bufferIcEEZNS1
                #10 0x0000000000e2d26d _ZNKSt8functionIFv13lw_shared_ptrIN3rpc8protocolIN3net10serializerENS3_14messaging_verbEE6server10connectionEEl16temporary_bufferIcEEEclES9_lSB_ (
                #11 0x0000000000e2d408 _ZN8futurizeIvE5applyIZZN3rpc8protocolIN3net10serializerENS4_14messaging_verbEE6server10connection7processEvENUlvE0_clEvEUlS6_l16temporary_bufferI
                #12 0x00000000004757cd _ZN7reactor9run_tasksER15circular_bufferISt10unique_ptrI4taskSt14default_deleteIS2_EESaIS5_EE (scylla)
                #13 0x000000000049e9b0 _ZN7reactor3runEv (scylla)
                #14 0x00000000004fd174 _ZN12app_template14run_deprecatedEiPPcOSt8functionIFvvEE (scylla)
                #15 0x000000000041d23b main (scylla)
                #16 0x00007f7da64bd700 __libc_start_main (libc.so.6)
                #17 0x0000000000472749 _start (scylla)

                Stack trace of thread 2928:
                #0  0x00007f7da65a0193 epoll_wait (libc.so.6)
                #1  0x0000000000645c9c eal_intr_thread_main (scylla)
                #2  0x00007f7da6864555 start_thread (libpthread.so.0)
                #3  0x00007f7da659fb9d __clone (libc.so.6)

                Stack trace of thread 2938:
                #0  0x00007f7da686c54d read (libpthread.so.0)
                #1  0x0000000000475e95 _ZN11thread_pool4workEv (scylla)
                #2  0x00000000004eb08e _ZNKSt8functionIFvvEEclEv (scylla)
                #3  0x00007f7da6864555 start_thread (libpthread.so.0)
                #4  0x00007f7da659fb9d __clone (libc.so.6)

                Stack trace of thread 2937:
                #0  0x00007f7da686c54d read (libpthread.so.0)
                #1  0x0000000000475e95 _ZN11thread_pool4workEv (scylla)
                #2  0x00000000004eb08e _ZNKSt8
@slivne
Contributor

slivne commented Nov 23, 2015

Isn't this a dup of #593?

@tzach please note that you are replaying commitlogs, which means the
server was stopped/killed and then restarted. Was that intentional?

@tzach
Contributor Author

tzach commented Nov 23, 2015

> Isn't this a dup of #593?

Feel free to close this issue if it's a duplicate.

> @tzach please note that you are replaying commitlogs, which means the
> server was stopped/killed and then restarted. Was that intentional?

Not on purpose. This might be the result of either:

  • the Ansible script restarting the server after updating scylla.yaml (but before there is any data)
  • the process dumping core and then being restarted by systemd

I'm guessing the second is more likely.

@slivne
Contributor

slivne commented Nov 23, 2015

> Not on purpose. This might be the result of either:
>
>   • the Ansible script restarting the server after updating scylla.yaml (but before there is any data)

Is there any data aside from cluster data?

In that case, it's strange that this happened, as the only items that should
exist are related to system tables. It's especially bad in this case: if
the system tables are not flushed and the commitlog is not correct, we
may not be able to recover from such a failure. Is there any way to check
this?

>   • the process dumping core and then being restarted by systemd

Is there an additional core on the machine, or is this the only one? If
there is an additional core, it may provide the needed information, and
we have another bug to fix.

If there is no additional core, then this is likely not what happened.

> I'm guessing the second is more likely.


@tzach
Contributor Author

tzach commented Nov 23, 2015

It looks like the core dump is the original issue, and the commit log checksum errors happened after the restart (again and again).
Also, the profile uses LeveledCompactionStrategy (might be related).

@tzach
Contributor Author

tzach commented Nov 23, 2015

I reproduced the problem (or a problem) by:

  • stopping the service
  • cleaning the data
  • starting it again
  • running the same load

The new core dump:

     Message: Process 19013 (scylla) of user 995 dumped core.

                Stack trace of thread 19017:
                #0  0x0000000000472ee3 _ZNSt13__atomic_baseImE8fetch_orEmSt12memory_order (scylla)
                #1  0x00007f412cdda430 __restore_rt (libpthread.so.0)
                #2  0x00007f412cb0d193 epoll_wait (libc.so.6)
                #3  0x0000000000645c9c eal_intr_thread_main (scylla)
                #4  0x00007f412cdd1555 start_thread (libpthread.so.0)
                #5  0x00007f412cb0cb9d __clone (libc.so.6)

                Stack trace of thread 19034:
                #0  0x00007f412cdd954d read (libpthread.so.0)
                #1  0x0000000000475e95 _ZN11thread_pool4workEv (scylla)
                #2  0x00000000004eb08e _ZNKSt8functionIFvvEEclEv (scylla)
                #3  0x00007f412cdd1555 start_thread (libpthread.so.0)
                #4  0x00007f412cb0cb9d __clone (libc.so.6)

                Stack trace of thread 19032:
                #0  0x00007f412cb0d193 epoll_wait (libc.so.6)
                #1  0x0000000000480078 _ZN21reactor_backend_epoll16wait_and_processEv (scylla)
                #2  0x000000000049e9da _ZN7reactor9poll_onceEv (scylla)
                #3  0x00000000004ad4b6 _ZZN3smp9configureEN5boost15program_options13variables_mapEENKUlvE_clEv.constprop.2703 (scylla)
                #4  0x0000000000472ebe _ZNKSt8functionIFvvEEclEv (scylla)
                #5  0x000000000064245b eal_thread_loop (scylla)
                #6  0x00007f412cdd1555 start_thread (libpthread.so.0)
                #7  0x00007f412cb0cb9d __clone (libc.so.6)

                Stack trace of thread 19033:
                #0  0x00007f412cdd954d read (libpthread.so.0)
                #1  0x0000000000475e95 _ZN11thread_pool4workEv (scylla)
                #2  0x00000000004eb08e _ZNKSt8functionIFvvEEclEv (scylla)
                #3  0x00007f412cdd1555 start_thread (libpthread.so.0)
                #4  0x00007f412cb0cb9d __clone (libc.so.6)

                Stack trace of thread 19013:
                #0  0x00007ffec01f6b5f __vdso_clock_gettime (linux-vdso.so.1)
                #1  0x00007f412cb1ad9d __clock_gettime (libc.so.6)
                #2  0x00007f412fe3fa4e _ZNSt6chrono3_V212system_clock3nowEv (libstdc++.so.6)
                #3  0x000000000049f07c _ZN7reactor3runEv (scylla)
                #4  0x00000000004fd174 _ZN12app_template14run_deprecatedEiPPcOSt8functionIFvvEE (scylla)
                #5  0x000000000041d23b main (scylla)
                #6  0x00007f412ca2a700 __libc_start_main (libc.so.6)
                #7  0x0000000000472749 _start (scylla)

@avikivity
Member

This looks like a live process; all the threads are where you would expect them to be.

Or was the list truncated? There should be 2 threads per lcore.

@tzach tzach added the bug label Jan 10, 2016
@tzach tzach added this to the GA milestone Jan 10, 2016
@tzach
Contributor Author

tzach commented Jan 10, 2016

A user reported the same issue with the 0.13 AMI.

@elcallio
Contributor

You are running Scylla AMI 0.12, which apparently does not have the changes that make CRC failures non-fatal.
Recent Scylla does not emit the error message "Checksum error in data entry" at all, so this is not running the fixed code.

Whether the commit log segments should be corrupted at all is another issue, but assuming a harsh kill, and given that the errors above point to data sections that were only partially written, it is not really that strange.

@elcallio
Copy link
Contributor

Having said that, and with my foot firmly in my mouth, I do see a pretty obvious bug in the replay iterator that might have something to do with the issue (false CRC errors).
Fixing.

@tzach
Contributor Author

tzach commented Jan 11, 2016

> You are running Scylla AMI 0.12. It apparently does not have the changes to make CRC failures non-fatal.

The customer error, with similar logs, is from AMI 0.13.

@elcallio
Contributor

So, the good news is that with a fix for a file position issue, I can, as far as I can tell, read the log segments fine.
The bad news is that when running the data in question, I still get a whole lot of marshal_exception errors, both in the commit log and when trying to write sstables. I would assume the former is because the replayed mutations contain data whose representation has changed across versions(?).
The latter might be due to frozen mutations creating memtables with slightly invalid data?

I'll send the patch for the commit log reader issue, but I think the marshalling errors might indicate some other compatibility issue.

@slivne
Contributor

slivne commented Jan 12, 2016

Are you reading the commitlogs/sstables created by 0.13 using HEAD (with your patch)?

If so, can we try a test using 0.13 plus your patches? Does that still have an issue?

@elcallio
Contributor

Cherry-picking the commit log fix onto 0.13 lets me start scylla with the data dump and no errors/printouts.

@penberg
Contributor

penberg commented Jan 18, 2016

@slivne @elcallio So what's the status of this issue now? Fixed in 0.15 by 1d449b4?

@slivne
Contributor

slivne commented Jan 18, 2016

Yes, based on Calle's test, until we get a different sample that causes an issue.

@slivne slivne closed this as completed Jan 18, 2016