observer fails to start, and the error messages printed in the log file are too vague to pinpoint the actual error #97

Closed
GangLiCN opened this issue Jun 10, 2021 · 5 comments
@GangLiCN

[Issue summary]
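observer fails to start, but the log file (observer.log) gives the user very little useful error information. The log records a return code (-4147),
but looking up that code does not lead to any corresponding error message. This is far less helpful than Oracle's oerr tool: given an error number, oerr returns
the corresponding error message, and the message is available in multiple languages. For example, with the locale set to zh_CN, running oerr xxx returns the error message in Chinese.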

[Steps]

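  1. I followed the deployment guide on the official website, but starting the cluster failed. According to observer.log, the configuration appears to be invalid. The specific error message is: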
    ERROR [SERVER] init_config (ob_server.cpp:832) [1320][0][Y0-0000000000000000] [lt=32] invalid config from cmdline options(opts_.optstr_="__min_full_resource_pool_memory=268435456,datafile_size=8G,memory_limit=4G,system_memory=2G,stack_size=512K,cpu_count=1,cache_wash_threshold=1G,workers_per_cpu_quota=10,schema_history_expire_time=1d,net_thread_count=1,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_disk_percentage=20", ret=-4147) BACKTRACE:0x90a107e 0x90008fb 0x24d18a1 0x251b43b 0x8702a2e 0x86fe493 0x24be805 0x7fb34909f555 0x24bd4e9

Full ERROR output from observer.log:
[root@redis-server-1 log]# grep -i "error" observer.log
[2021-06-10 17:36:44.655716] ERROR [SERVER] init_config (ob_server.cpp:832) [6630][0][Y0-0000000000000000] [lt=32] invalid config from cmdline options(opts_.optstr_="__min_full_resource_pool_memory=268435456,datafile_size=8G,memory_limit=2G,system_memory=2G,stack_size=128K,cpu_count=1,cache_wash_threshold=512M,workers_per_cpu_quota=1,schema_history_expire_time=1d,net_thread_count=1,sys_bkgd_migration_retry_num=3,minor_freeze_times=10,enable_separate_sys_clog=0,enable_merge_by_turn=False,datafile_disk_percentage=20", ret=-4147) BACKTRACE:0x90a107e 0x90008fb 0x24d18a1 0x251b43b 0x8702a2e 0x86fe493 0x24be805 0x7f5e95a07555 0x24bd4e9
[2021-06-10 17:36:44.656767] INFO ob_server_config.cpp:242 [6630][0][Y0-0000000000000000] [lt=4] | ignore_replay_checksum_error = False
[2021-06-10 17:36:44.656810] INFO ob_server_config.cpp:242 [6630][0][Y0-0000000000000000] [lt=4] | ignore_replica_checksum_error = False
[2021-06-10 17:36:44.657534] INFO ob_server_config.cpp:242 [6630][0][Y0-0000000000000000] [lt=4] | enable_rich_error_msg = False
[2021-06-10 17:36:44.661805] ERROR [SERVER] init (ob_server.cpp:165) [6630][0][Y0-0000000000000000] [lt=5] init config fail(ret=-4147) BACKTRACE:0x90a107e 0x90008fb 0x24c152f 0x24c04f6 0x86fee88 0x24be805 0x7f5e95a07555 0x24bd4e9
[2021-06-10 17:36:44.663553] ERROR stop (ob_ddl_task_executor.cpp:176) [6630][0][Y0-0000000000000000] [lt=5] invalid tg id BACKTRACE:0x90a107e 0x90008fb 0x24c00eb 0x24bd7a5 0x6257766 0x5ef0630 0x65dfebf 0x86fd61f 0x86fef2d 0x24be805 0x7f5e95a07555 0x24bd4e9
[2021-06-10 17:36:44.663732] ERROR wait (ob_ddl_task_executor.cpp:181) [6630][0][Y0-0000000000000000] [lt=177] invalid tg id BACKTRACE:0x90a107e 0x90008fb 0x24c00eb 0x24bd7a5 0x6257956 0x5ef0638 0x65dfebf 0x86fd61f 0x86fef2d 0x24be805 0x7f5e95a07555 0x24bd4e9
[2021-06-10 17:36:44.666418] ERROR [SERVER] main (main.cpp:485) [6630][0][Y0-0000000000000000] [lt=6] observer init fail(ret=-4147) BACKTRACE:0x90a107e 0x90008fb 0x24c152f 0x24c04f6 0x24bea5e 0x7f5e95a07555 0x24bd4e9

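I went through the configuration file for a long time and could not find anything wrong. I am testing on a virtual machine that does not have that much memory or CPU allocated, so I changed only the corresponding parameters.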
oceanbase-ce:
  servers:
    - 127.0.0.1
  global:
    home_path: /root/observer
    devname: lo
    mysql_port: 2883
    rpc_port: 2882
    zone: zone1
    cluster_id: 1
    datafile_size: 8G
    memory_limit: 4G
    system_memory: 2G
    stack_size: 512K
    cpu_count: 1
    cache_wash_threshold: 512M
    __min_full_resource_pool_memory: 268435456
    workers_per_cpu_quota: 1
    schema_history_expire_time: 1d
    net_thread_count: 1
    sys_bkgd_migration_retry_num: 3
    minor_freeze_times: 10
    enable_separate_sys_clog: 0
    enable_merge_by_turn: FALSE
    datafile_disk_percentage: 20

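  2. When the configuration file should be validated
    The first validity check of the configuration parameters should happen when obd cluster deploy runs:
    -- If validation passes, the deployment completes.
    -- If validation fails (for example, because the /root/observer directory is not empty), the corresponding error message is printed.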

Then the configuration file must certainly be validated again when the cluster is started, because the user may well have modified it after the deploy.

[Suggestions]
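The current validation flow itself has no major problems; the main issue is the interaction with the user. The log file is an important clue for helping users locate errors,
but at present, although it prints a great deal of content, the substantive, useful information it provides is limited, and it rarely gives users
a clear lead to the real problem. This needs improvement; the product is still some distance from product-grade standards.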

@SanmuWangZJU
Contributor

SanmuWangZJU commented Jun 10, 2021

  1. See src/share/ob_errno.h for an explanation of the error codes; error code -4147 is OB_INVALID_CONFIG.
  2. The error occurs in src/observer/ob_server.cpp and indicates an invalid configuration. The configuration parameters are defined in src/share/parameter/ob_parameter_seed.ipp; according to that file, the value of stack_size must be in the range 512K to 20M.
    The config provided at https://github.com/oceanbase/obdeploy/blob/master/example/mini-local-example.yaml is the MINIMUM CONFIGURATION needed to start the observer process. Please DO NOT dial the configuration down any further.
  3. You can start a discussion on GitHub and get involved in the development of OB.
  4. Log files really can help developers locate problems, but some source-code reading skill is needed to understand their contents. How to give users clear clues for locating a problem from the logs needs further discussion together. Finally, thank you for contributing such valuable ideas to the community.
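
As a rough illustration of the oerr-style lookup discussed above, here is a minimal Python sketch that builds a code-to-name table from header text. The line shape matched by the regular expression and the `SAMPLE` excerpt are assumptions made for illustration, not a copy of src/share/ob_errno.h; only the pairing of -4147 with OB_INVALID_CONFIG comes from this thread. The helper names `load_error_names` and `describe` are hypothetical.

```python
import re

# Assumed line shape: "<qualifiers> int OB_SOME_NAME = -1234;".
# The real header may be laid out differently.
_DEFINITION = re.compile(r"\b(OB_[A-Z0-9_]+)\s*=\s*(-?\d+)\s*;")


def load_error_names(header_text):
    """Map each numeric error code to the first symbolic name defined for it."""
    names = {}
    for name, code in _DEFINITION.findall(header_text):
        names.setdefault(int(code), name)
    return names


def describe(code, names):
    """Return the symbolic name for a code, or a fallback message."""
    return names.get(code, "unknown error code %d" % code)


# Hypothetical excerpt; only the -4147 pairing is confirmed in this thread.
SAMPLE = "static const int OB_INVALID_CONFIG = -4147;\n"

if __name__ == "__main__":
    print(describe(-4147, load_error_names(SAMPLE)))
```

In practice the text passed to `load_error_names` would be read from a local checkout of the header instead of `SAMPLE`.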

@nroskill
Contributor

nroskill commented Jun 11, 2021

There is an internal restriction that memory_limit must be in the range [8G, ), that is, at least 8G.

The documentation and the parameter description are wrong, and we will fix them later.

For your problem, setting memory_limit to 8G may solve it.

Thank you very much for your feedback.
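
The two limits mentioned so far in this thread (stack_size between 512K and 20M, memory_limit at least 8G) could be checked before starting observer with something like the sketch below. It is not how observer or obd validates parameters; the 1024-based units, the inclusive bounds, and the names `parse_size`, `LIMITS`, and `check` are assumptions for illustration.

```python
# Assumption: K/M/G suffixes are powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert strings such as '512K' or '8G' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


# (lower, upper) in bytes, both assumed inclusive; None means no upper bound.
# Only the limits stated in this thread are listed.
LIMITS = {
    "stack_size": (parse_size("512K"), parse_size("20M")),
    "memory_limit": (parse_size("8G"), None),
}


def check(config):
    """Return one message per parameter that falls outside its known range."""
    problems = []
    for key, (low, high) in LIMITS.items():
        if key not in config:
            continue
        value = parse_size(str(config[key]))
        if value < low or (high is not None and value > high):
            problems.append("%s=%s is outside the allowed range" % (key, config[key]))
    return problems


if __name__ == "__main__":
    # Values taken from the second log excerpt in the original report.
    for message in check({"stack_size": "128K", "memory_limit": "2G"}):
        print(message)
```

A message naming the offending parameter, as printed here, is the kind of output the original report asks for in place of a bare ret=-4147.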

@GangLiCN
Author

Bad luck: after modifying the configuration parameters, observer still failed to start.

......
[2021-06-11 15:00:12.069212] ERROR [SERVER] main (main.cpp:485) [17818][0][Y0-0000000000000000] [lt=7] observer init fail(ret=-4147) BACKTRACE:0x90a107e 0x90008fb 0x24c152f 0x24c04f6 0x24bea5e 0x7fe790bc2555 0x24bd4e9
......

@SanmuWangZJU
Contributor

Please provide more of the log for this error:

  1. Adjust the log level to DEBUG.
  2. Find an error log entry that contains a valid trace ID (a string like Yxxxxxxx-xxxxxxx).
  3. grep for that trace ID in observer.log.
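
Steps 2 and 3 can be sketched in Python as follows. The regular expression is inferred from the bracketed `[Y0-0000000000000000]` field visible in the logs above, and treating an all-zero ID as "not valid" is an assumption; the function names are hypothetical.

```python
import re

# Trace IDs appear in brackets, e.g. [Y0-0000000000000000] in the logs above.
TRACE_ID = re.compile(r"\[(Y[0-9A-Fa-f]+-[0-9A-Fa-f]+)\]")


def is_valid(trace_id):
    """Assumption: an ID made only of zeros is a placeholder, not a real trace."""
    digits = trace_id[1:].replace("-", "")
    return any(ch != "0" for ch in digits)


def error_trace_ids(lines):
    """Collect, in order of first appearance, valid trace IDs on ERROR lines."""
    found = []
    for line in lines:
        if " ERROR " not in line:
            continue
        match = TRACE_ID.search(line)
        if match and is_valid(match.group(1)) and match.group(1) not in found:
            found.append(match.group(1))
    return found


def lines_for(trace_id, lines):
    """Return every log line that carries the given trace ID."""
    marker = "[" + trace_id + "]"
    return [line for line in lines if marker in line]
```

With the lines of observer.log loaded into a list, `error_trace_ids` gives the IDs worth following, and `lines_for` plays the role of grepping for one of them.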

@watchpoints
Contributor

watchpoints commented Jun 16, 2021

  • This is a configuration error. Please run

grep ERROR observer.log

and find the first error.

  • You have not shown the first error.
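
For logs that are awkward to grep, the same "first error" search can be sketched in Python. Like `grep ERROR` without `-i`, the match is case-sensitive, so INFO lines that merely mention parameters such as ignore_replay_checksum_error are skipped. The function name `first_error` is hypothetical.

```python
def first_error(lines):
    """Return (line_number, text) for the first line containing 'ERROR', or None."""
    for number, line in enumerate(lines, start=1):
        if "ERROR" in line:  # case-sensitive, like grep without -i
            return number, line.rstrip("\n")
    return None
```

Calling it on the lines of observer.log returns the earliest ERROR entry, which is usually the one closest to the root cause; later errors are often only consequences of it.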
