-
Notifications
You must be signed in to change notification settings - Fork 407
Status: Work-in-progress
Try to reproduce a problem on the latest version. Probably it is already fixed.
Sometimes Tarantool has a problem that triggered in CI runner and cannot be reproduced locally. There is a helpful Github Action action-tmate
(doc) that helps to debug your GitHub Actions via SSH by using tmate
.
By default a failed step will cause all following steps to be skipped. You can specify that the tmate
session only starts if a previous step failed.
- name: Setup tmate session
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3
Reproduced locally? If so, nice. If not, whether it reproduces on a given OS or distro? If so, good. strace
and gdb
requires ptrace
capacity of a Docker container: use docker run --privileged -it blah:blah
(or --cap-add SYS_PTRACE
). If you setup everything already in a container, but forgot --privileged
, stop it, add ptrace
capacity into json, start again.
grep -o CapAdd /var/lib/docker/containers/6b842b1290f5ca4e7b6a46bc6b985a93ab52f70d68804c75cc5c180e113126f8/hostconfig.json`
CapAdd:["SYS_PTRACE"]
If you suspect that the problem is a regression, let's check and, if so, just bisect it. It is semi-automatic action that may give much more information about the problem without any real investigation.
There are good examples, where git bisect allows to better understand the problem in a short time:
- https://github.com/tarantool/tarantool/issues/3747
- https://github.com/tarantool/tarantool/issues/4179
- https://github.com/tarantool/tarantool/issues/4736
- https://github.com/tarantool/tarantool/issues/4774
$ git clone --recurse-submodules git@github.com:tarantool/tarantool.git
$ cd tarantool
$ git bisect start
$ git clean -xffd; git submodule foreach --recursive 'git clean -xffd'
$ cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_FEEDBACK_DAEMON=OFF -DENABLE_BUNDLED_LIBCURL=OFF && make -j
$ ./src/tarantool ../repro.lua # !! BUG !!
$ git bisect bad
$ git clean -xffd; git submodule foreach --recursive 'git clean -xffd'
$ git checkout <..commit hash..>
$ git submodule update --recursive --init
$ cmake . -DCMAKE_BUILD_TYPE=Debug -DENABLE_BACKTRACE=ON -DENABLE_DIST=ON -DENABLE_FEEDBACK_DAEMON=OFF -DENABLE_BUNDLED_LIBCURL=OFF && make -j
$ ./src/tarantool ../repro.lua # !! OK !!
$ git bisect good
w
<..repeat git clean, cmake && make, repro.lua..>
Use say_debug
in C, io.stderr:write()
or log
-methods in Lua.
See https://www.tarantool.io/en/doc/latest/reference/reference_lua/log/
Enabling verbose mode in third party components:
- Curl
- curl_global_trace;
-
CURLOPT_VERBOSE (could be enabled in Lua:
require('http.client').new():request(method, url, body, { verbose=true })
, see documentation); - CURLOPT_DEBUGFUNCTION;
A weird someone-breaks-a-memory problem may be caught early on some assertion in the Debug build. Our packages are built with -DCMAKE_BUILD_TYPE=RelWithDebInfo
: it has debugging info, but has no assertions and built with -O2
. Debug mode could be enabled with -DCMAKE_BUILD_TYPE=Debug
. Use debug build for a module if the problem may be there.
Build Tarantool with enabled Address Sanitizer (ENABLE_ASAN
) or Valgrind (ENABLE_VALGRIND
).
ASAN vs Valgrind: https://developers.redhat.com/blog/2021/05/05/memory-error-checking-in-c-and-c-comparing-sanitizers-and-valgrind and https://github.com/google/sanitizers/wiki/AddressSanitizerComparisonOfMemoryTools
Note, Valgrind cannot run executables that were built by Clang compiler. It was fixed in Valgrind 3.19, see https://bugs.kde.org/show_bug.cgi?id=452758.
Tarantool has a suppression list with known memory leaks, this list should be specified in environment variable on Tarantool start:
LSAN_OPTIONS=suppressions=${PWD}/asan/lsan.supp tarantool
Otherwise Addres Sanitizer will report about memory leaks to stdout.
==532476== Memcheck, a memory error detector
==532476== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==532476== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==532476== Command: /home/sergeyb/sources/MRG/tarantool/build/src/tarantool /home/sergeyb/sources/MRG/tarantool-ee/tarantool/test-run/lib/luatest/bin/luatest -c test/integration/general_test.lua
==532476==
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x23
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x1b
==532476== Valgrind: debuginfo reader: ensure_valid failed:
==532476== Valgrind: during call to ML_(img_get)
==532476== Valgrind: request for range [299270124, +4) exceeds
==532476== Valgrind: valid image size of 23591872 for image:
==532476== Valgrind: "/home/sergeyb/sources/MRG/tarantool/build/src/tarantool"
==532476==
==532476== Valgrind: debuginfo reader: Possibly corrupted debuginfo file.
==532476== Valgrind: I can't recover. Giving up. Sorry.
==532476==
Valgrind limitation: https://github.com/tarantool/tarantool/issues/3071
Fun fact: there is a built in Lua variable tarantoool.build.asan
that controls max slice check, see commit 232c28f333ac75be26abb54dc308e2e5813b7acd:
commit 232c28f333ac75be26abb54dc308e2e5813b7acd
Author: Nikolay Shirokovskiy <nshirokovskiy@tarantool.org>
Date: Tue Jul 11 17:36:37 2023 +0300
fiber: turn off max slice check with ASAN
FIXME: tarabrt.sh
/ gdb.sh
можно поменять на tt coredump
уже, see https://www.tarantool.io/en/doc/latest/reference/tooling/tt_cli/coredump/
Give the following link to a customer: https://raw.githubusercontent.com/tarantool/tarantool/master/tools/tarabrt.sh Basically, usage is the following:
$ /path/to/tarabrt.sh -c /path/to/core -d /target/dir
The customer sends us a tarball with coredump file, executable, libraries.
Now use tools/gdb.sh
from tarantool repository to run gdb
with collected executable and libraries. Basically, usage is the following:
$ tar -xvf /path/to/archive.tar.gz
$ /path/to/gdb.sh
I have a 'screencast' of usage of a preliminary version of the script: see https://lists.tarantool.org/tarantool-patches/20210315163046.lobxpj7pgy7tvgiu@tkn_work_nb/. But most of the highlighted problems are already fixed.
Tarantool написан на языке С, но пользователь может работать с ним с помощью Lua. Ранее отлаживать исходники на Lua с помощью стандартных средств консоли Tarantool было очень сложно. В этом релизе 2.11 добавили оболочку консольного отладчика, которая значительно упрощает отладку модулей Lua.
Отладчик Lua поддерживает все стандартные отладочные операции:
- пошаговое выполнение;
- проверку значений переменных в локальном или глобальном контексте;
- обход кадров стека;
- установку точек остановки и многое другое.
local dbg = require("luadebug")
dbg() -- Вызов оболочки отладчика
local a = 1
local function b()
print("hello")
local m = 1
local b = 2
local a = 1
local c = m + b
if c == 2 then
b = 22
end
end
if a == 1 then
b()
a = 2
b()
end
$ tarantool dbg-example.lua
luadebug: Loaded for 2.11.0-entrypoint-1031-gdc451913f
break via dbg() => debug-dbg.lua:3 in chunk at debug-dbg.lua:0
1 local debugger = require 'luadebug'
2 debugger()
3 => local date = require 'datetime'
4
5 local T = date.new{hour = 3, tzoffset = '+0300'}
6 print(T)
luadebug>
В дополнение к использованию отладчика через вызов require('luadebug')(), о котором говорилось выше, можно использовать менее инвазивный подход — новый параметр -d. Таким образом, вы можете отлаживать Lua-скрипты напрямую:
$ tarantool -d debug.lua
Tarantool debugger 2.11.0-entrypoint-1031-gdc451913f
type 'help' for interactive help
luadebug: Loaded for 2.11.0-entrypoint-1031-gdc451913f
break via debug.lua => debug.lua:1 in chunk at debug.lua:0
1 => local date = require 'datetime'
2
3 local T = date.new{hour = 3, tzoffset = '+0300'}
4 print(T)
Screencast: https://asciinema.org/a/560039
Note, Lua debugger has a limitations - there is no fiber support yet.
Note, SMALL allocators poisons deallocated memory regions by certain symbols. It could be helpful on debugging session.
- region -
P
, see https://github.com/tarantool/small/blob/373d78f0fc3cc7fea4de592d4c3f9c09ef923c6b/small/region.c#L90 - slab -
P
, see https://github.com/tarantool/small/blob/373d78f0fc3cc7fea4de592d4c3f9c09ef923c6b/small/slab_cache.c#L112
There are two debugger extensions: LuaJIT-specific luajit-gdb.py
and Tarantool-specific tarantool-gdb.py
.
NOTE: В ближайшее время расширения luajit-gdb.py
и luajit_lldb.py
будут объединены.
$ cat ~/.gdbinit
# https://stackoverflow.com/a/3176802/1598057
set history save on
set history size unlimited
set history remove-duplicates 100
set history filename ~/.gdb_history
Load extension: source ~/tools/tarantool/luajit/luajit-gdb.py
.
Load extension: source tools/tarantool-gdb.py
.
(gdb) source ~/sources/MRG/tarantool/tools/tarantool-gdb.py
(gdb) info tt-fibers
Id Target Name Stack Function
114 TtFiber "etcd-client read stream" 0x7f3202a80918 0x555e49eec670 <lua_fiber_run_f>
113 TtFiber "etcd-client write stream" 0x7f3202980a58 0x555e49eec670 <lua_fiber_run_f>
111 TtFiber "lua" 0x7f3203380a78 0x555e49eec670 <lua_fiber_run_f>
108 TtFiber "lua" 0x7f3203080a78 0x555e49eec670 <lua_fiber_run_f>
105 TtFiber "lua" 0x7f3203980a78 0x555e49eec670 <lua_fiber_run_f>
103 TtFiber "luatest" 0x7f3209380978 0x555e49ee8060 <run_script_f>
101 TtFiber "lua" 0x7f3209180c68 0x555e49eec670 <lua_fiber_run_f>
* 1 TtFiber "sched" * 0x7fffd0d4a118 0x0
(gdb)
(gdb) tt-fiber 114
#1 __pthread_kill_internal (signo=6, threadid=139852930820864) at ./nptl/pthread_kill.c:78
78 in ./nptl/pthread_kill.c
(gdb) bt
#3 0x0000555e49f2012b in fiber_yield_impl (will_switch_back=true) at /home/sergeyb/sources/MRG/tarantool/src/lib/core/fiber.c:753
#4 0x0000555e49f1f5ce in fiber_yield () at /home/sergeyb/sources/MRG/tarantool/src/lib/core/fiber.c:760
#5 0x0000555e49f202af in fiber_yield_timeout (delay=15768000000) at /home/sergeyb/sources/MRG/tarantool/src/lib/core/fiber.c:804
#6 0x0000555e49f2f2ed in fiber_cond_wait_timeout (c=0x7f3209040250, timeout=15768000000)
<snipped>
Ищем файбер по fid
(fiber id
):
(gdb) p ((struct fiber *)((char *)cord_ptr->alive->next - 144))->fid
$98 = 125
(gdb) p ((struct fiber *)((char *)cord_ptr->alive->next->next - 144))->fid
$99 = 119
Начинать с одного ->next
. Добавлять ->next
до попадания.
Посмотреть гостевой стек:
(gdb) lj-stack ((struct fiber *)((char *)cord_ptr->alive->next->next - 144))->storage.lua.stack
Отсюда 144:
(gdb) p &cord_ptr->fiber->link
$94 = (rlist *) 0x55a29fb92488 <main_cord+1160>
(gdb) p cord_ptr->fiber
$95 = (fiber *) 0x55a29fb923f8 <main_cord+1016>
(gdb) p 0x55a29fb923f8-0x55a29fb92488
$96 = -144
Tarantool speaks with clients using IPROTO protocol based on MsgPack. See IPROTO description in Tarantool Documentation. Inspecting binary traffic is a boring task. There are two network sniffers that supports IPROTO decoding: Wireshark and Tcpdump. Wireshark supports custom dissectors written in Lua, tcpdump has a work-in-progress IPROTO support.
To enable the plugin, follow the steps below:
Clone the tarantool-dissector
repository:
git clone https://github.com/tarantool/tarantool-dissector.git
Copy or symlink the plugin files into the Wireshark plugin directory:
mkdir -p ~/.local/lib/wireshark/plugins
cd ~/.local/lib/wireshark/plugins
ln -s /path/to/tarantool-dissector/MessagePack.lua ./
ln -s /path/to/tarantool-dissector/tarantool.dissector.lua ./
(For the location of the plugin directory on macOS and Windows, please refer to the Plugin folders chapter in the Wireshark documentation.)
Run the Wireshark GUI and ensure that the plugins are loaded:
Open Help > About Wireshark > Plugins.
Find MessagePack.lua and tarantool.dissector.lua in the list.
Now you can inspect incoming and outgoing Tarantool packets with user-friendly annotations.
Visit the project page for details: https://github.com/tarantool/tarantool-dissector.
To use tcpdump with IPROTO support one need to build tcpdump with applied patch. See steps to build in https://github.com/the-tcpdump-group/tcpdump/blob/master/INSTALL.md.
- Попыриться в htop, увидеть, где bottleneck по CPU (если он есть).
- Еще есть 'l' вместо 'f' — дает file:lino вместо названий функций.
- Сам не пробовал, но есть рецепт от Сергоса:
see https://www.tarantool.io/en/doc/latest/book/admin/server_introspection/#poor-mans-profilers
LuaJIT has a builtin profiler. It generates simple textual summaries or source code annotations. It can be accessed with the -jp command line option or from Lua code by loading the underlying jit.p
module.
Note: Sampling works for both interpreted and JIT-compiled code. The results for JIT-compiled code may sometimes be surprising. LuaJIT heavily optimizes and inlines Lua code — there's no simple one-to-one correspondence between source code lines and the sampled machine code.
require('jit.p').start('f', 'profile.txt')
require('jit.p').stop()
It is possible to gnerate flamegraph using bundled profiler:
require('jit.p').start('FG','output')
flamegraph.pl output > output.svg
flamegraph.pl
is a script that can be obtained in repository https://github.com/brendangregg/FlameGraph.
See notes about jit.p
in Tarantool Documentation.
Если в Lua не видно явно горячей функции, то тут возможны варианты: bottleneck вообще не там (см. пункт 6) или нагрузка размазана (см. пункт 7).
See https://habr.com/ru/companies/vk/articles/758128/
See https://www.tarantool.io/en/doc/latest/reference/tooling/luajit_memprof/
perf record -a -g
and perf report
. Тут можно увидеть, например, упираемся ли мы в декодинг tls на nginx, принимающем входящие соединения. Хотя процессными мерами (см. пункт 7) это тоже можно выяснить.
https://www.tarantool.io/en/doc/latest/book/admin/server_introspection/#perf
- Это общая рекомендация: отрезать крупные куски, затыкать их заглушками и смотреть, что изменилось (по RPS, по профилю — но не одновременно). Можно делать заглушки с сопоставимыми по размеру данными, можно намеренно с маленькими.
Тут клево оторвать вообще все, кроме сетевых походов, и посмотреть на недостижимый идеал. Потом отрезаниями кода понять, где сколько до идеала скрывается.
see https://www.tarantool.io/en/doc/latest/book/admin/server_introspection/#gperftools
require('fiber').top_enable()
- Большие readahead.
- Много таплов в runtime arena.
Ещё можно в ванильном луа включить трейс. Не нашел этого в доке, обошелся этим https://www.lua.org/pil/23.2.html
- попробовать выполнить скрипт в PUC Rio Lua
Попробовать выключить разные компоненты и выяснить их влияние:
- OSX/arm64 -> GC64
- FFI
- GC64
- Enable / disable JIT, компиляция (
jit.off
) - уровни оптимизации и виды оптимизации, см. https://ujit.readthedocs.io/en/latest/public/test-reduce-cases.html
- is it arch-specific
амальгамация Lua скрипта
luarocks install amalg
- https://github.com/siffiejoe/lua-amalg
- https://github.com/rindeal/Amalgamate
- https://github.com/vinniefalco/Amalgamate
- https://github.com/edlund/amalgamate
- https://www.mikeash.com/pyblog/friday-qa-2018-06-29-debugging-with-c-reduce.html
минимизация тестового примера
- избавиться от использования внешних модулей в Lua скрипте
- минимизировать руками или автоматически
Create a script interestingness.sh
:
#!/bin/sh
<fullpath>/src/luajit crash.lua 2>&1 | tee output.txt
grep "LuaJIT ASSERT lj_asm.c:147: checkmclim: red zone overflow:" output.txt
Execute creduce
:
$ creduce interestingness.sh crash.lua
creduce
is good enough in reducing Lua snippets. However, one could replace creduce
with another one reducer. Consider these - alternatives: cvise
, treereduce
and jubnzv/dd. Note, testcases produced by fuzzing tests could be reduced automatically by test itself, pass -minimize_crash=1
option and wait.
For other alternatives, see https://github.com/ligurio/sqa-wiki/wiki/testcase-minimization.
formatting
- применить форматирование к crash.lua
- https://staff.fnwi.uva.nl/h.vandermeer/docs/lua/luajit/luajit_debug.html
- если не получается получить репродьюсер, то записать поведение с помощью rr
- https://staff.fnwi.uva.nl/h.vandermeer/docs/lua/luajit/luajit_debug.html
- https://blast.hk/moonloader/luajit/ext_profiler.html
- https://ujit.readthedocs.io/en/latest/public/test-benchmark.html
- https://ujit.readthedocs.io/en/latest/public/test-reduce-cases.html
- https://github.com/geoffleyland/luatrace
Copyright © 2014-2024 Sergey Bronnikov. Follow me on Mastodon @sergeyb@honk.bronevichok.ru and Telegram.
Learning
- Glossary
- Books:
- Courses
- Learning Tools
- Bugs And Learned Lessons
- Cheatsheets
Tools / Services / Tests
- Quality Assurance Tools
- Test Runners
- Testing-As-A-Service
- Conformance Test Suites
- Test Infrastructure
- Fault injection
- TTCN-3
- Continuous Integration
- Speedup your CI
- Performance
- Formal Specification
- Toy Projects
- Test Impact Analysis
- Formats
Functional testing
- Automated testing
- By type:
WIP sections
Community
Links