
docker 1.3.6: TOP tablespaces cannot be retrieved for an Alibaba Cloud RDS instance #9

Closed
eason0420 opened this issue Nov 6, 2018 · 13 comments

Comments

@eason0420

eason0420 commented Nov 6, 2018

Process status and slow SQL for the RDS instance can both be retrieved; only the TOP tablespaces fail to display.

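I noticed the following:
1. After I delete the instance ID and the corresponding master instance name from the Aliyun RDS configuration, processes and TOP tablespaces are displayed, but slow SQL can no longer be queried.
2. After I add the instance ID and the corresponding master instance name back to the Aliyun RDS configuration, processes and TOP tablespaces are no longer displayed, but slow SQL is.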

@hhyo
Owner

hhyo commented Nov 6, 2018

db_diagnostic.py#L25
slowlog.py#L36
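As the code above shows, whether the RDS SDK is called is decided by the mapping table. Three features in the project call the RDS SDK: slow log, processes, and TOP tablespaces.
In the RDS console, the process list and TOP tablespaces are CloudDBA features and require a key with management permissions; a read-only key is not authorized to call them. Check downloads/log/archer.log for details; there should be an error entry.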
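To make the dispatch rule concrete, here is a minimal Python sketch of the decision described above. It is not Archery's code: `RdsMapping`, `pick_backend`, the `rds_instance_id` field, and the `rm-xxxxxxxx` ID are hypothetical stand-ins for the mapping table that db_diagnostic.py and slowlog.py consult.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical stand-in for one row of the "Aliyun RDS configuration"
# (Archery instance name -> Alibaba Cloud RDS instance ID).
@dataclass
class RdsMapping:
    instance_name: str
    rds_instance_id: str


def pick_backend(instance_name: str, mappings: Dict[str, RdsMapping]) -> str:
    """Decide which backend serves slow log, process list and TOP tablespaces.

    A mapped instance goes through the Alibaba Cloud SDK for all three
    features; an unmapped one is answered by SQL run on the instance itself.
    """
    mapping: Optional[RdsMapping] = mappings.get(instance_name)
    return "aliyun_sdk" if mapping is not None else "direct_sql"


mappings = {"prod-master": RdsMapping("prod-master", "rm-xxxxxxxx")}
print(pick_backend("prod-master", mappings))  # aliyun_sdk
print(pick_backend("test-db", mappings))      # direct_sql
```

This is why toggling the mapping flips which features work: all three switch backends together.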

@eason0420
Author

eason0420 commented Nov 7, 2018

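I'm using the management key of the Alibaba Cloud primary account. If I don't add the mapping between the master instance and the Aliyun RDS instance ID, process and TOP tablespace information is queried and displayed, but the slow log is not. Whenever I click the slow query log, downloads/log/archer.log only prints the following: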
[2018-11-07 10:07:58,015][MainThread:140414134847296][task_id:django_apscheduler][jobstores.py:187][DEBUG]- Got event: <SchedulerEvent (code=16)>, <class 'apscheduler.events.SchedulerEvent'>, {'code': 16, 'alias': 'default'}
[2018-11-07 10:07:58,015][MainThread:140414134847296][task_id:django_apscheduler][jobstores.py:187][DEBUG]- Got event: <SchedulerEvent (code=1)>, <class 'apscheduler.events.SchedulerEvent'>, {'code': 1, 'alias': None}
[2018-11-07 10:07:58,016][APScheduler:140413938427648][task_id:django_apscheduler][jobstores.py:66][DEBUG]- get_due_jobs for time=2018-11-07 10:07:58.016142+08:00
[2018-11-07 10:07:58,022][APScheduler:140413938427648][task_id:django_apscheduler][jobstores.py:69][DEBUG]- Got []
[2018-11-07 10:08:05,690][MainThread:140414134847296][task_id:django_apscheduler][jobstores.py:187][DEBUG]- Got event: <SchedulerEvent (code=16)>, <class 'apscheduler.events.SchedulerEvent'>, {'code': 16, 'alias': 'default'}
[2018-11-07 10:08:05,690][MainThread:140414134847296][task_id:django_apscheduler][jobstores.py:187][DEBUG]- Got event: <SchedulerEvent (code=1)>, <class 'apscheduler.events.SchedulerEvent'>, {'code': 1, 'alias': None}
[2018-11-07 10:08:05,691][APScheduler:140413938165504][task_id:django_apscheduler][jobstores.py:66][DEBUG]- get_due_jobs for time=2018-11-07 10:08:05.691178+08:00
[2018-11-07 10:08:05,697][APScheduler:140413938165504][task_id:django_apscheduler][jobstores.py:69][DEBUG]- Got []

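If I add the mapping between the master instance name and the Aliyun RDS instance ID, that instance's processes and TOP tablespaces can no longer be queried and displayed, but the slow query log can. Clicking instance management also prints only these entries: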
[2018-11-07 10:13:20,472][MainThread:139869766567744][task_id:django_apscheduler][jobstores.py:187][DEBUG]- Got event: <SchedulerEvent (code=16)>, <class 'apscheduler.events.SchedulerEvent'>, {'code': 16, 'alias': 'default'}
[2018-11-07 10:13:20,472][MainThread:139869766567744][task_id:django_apscheduler][jobstores.py:187][DEBUG]- Got event: <SchedulerEvent (code=1)>, <class 'apscheduler.events.SchedulerEvent'>, {'code': 1, 'alias': None}
[2018-11-07 10:13:20,473][APScheduler:139869570148096][task_id:django_apscheduler][jobstores.py:66][DEBUG]- get_due_jobs for time=2018-11-07 10:13:20.473070+08:00
[2018-11-07 10:13:20,478][APScheduler:139869570148096][task_id:django_apscheduler][jobstores.py:69][DEBUG]- Got []

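I'm running the docker version, 1.3.6. This behavior is strange: if the key were the problem, nothing should display and there should be an error. Right now there is no error log at all.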

@hhyo
Owner

hhyo commented Nov 7, 2018

The logs above come from the scheduled job and are unrelated.

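  • With the mapping in place, the slow log displays, which means the key can call the API. In Chrome, press F12 and inspect the responses of the process and tablespace endpoints; they should contain an error.
  • Without the mapping, processes and tablespaces display because RDS is not called at all: the data is queried with SQL directly on the instance. However, because of RDS account privilege restrictions, processes cannot be killed this way. The slow log fails to display for the same reason (it is not collected locally).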
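The direct-SQL path for TOP tablespaces can be approximated with a query against MySQL's information_schema.TABLES that ranks tables by data plus index size. This is an assumed illustration, not the query Archery actually runs; `top_tables_query` and `format_size` are hypothetical helpers, and the `%s` placeholder assumes a DB-API driver such as PyMySQL. It only covers reading: as the second bullet says, killing a session on RDS still needs the SDK.

```python
from typing import Tuple

# Illustrative query (not Archery's): rank user tables by data + index size.
TOP_TABLES_SQL = (
    "SELECT table_schema, table_name, engine, table_rows, "
    "data_length, index_length, data_free, "
    "data_length + index_length AS total_size "
    "FROM information_schema.TABLES "
    "WHERE table_schema NOT IN "
    "('information_schema', 'mysql', 'performance_schema', 'sys') "
    "ORDER BY total_size DESC "
    "LIMIT %s"
)


def top_tables_query(limit: int = 10) -> Tuple[str, Tuple[int]]:
    """Return (sql, params) ready for a DB-API cursor.execute() call."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return TOP_TABLES_SQL, (limit,)


def format_size(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 1536 -> '1.50 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"


sql, params = top_tables_query(5)
print(params)             # (5,)
print(format_size(1536))  # 1.50 KB
```

With PyMySQL, one would pass the pair to `cursor.execute(sql, params)` on a connection to the instance.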

Sorry, I currently have no RDS instance to test with, so I'm not sure whether Alibaba Cloud has disabled CloudDBA calls.

@eason0420
Author

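I asked Alibaba Cloud: the process and TOP APIs have been shut down. Could the instance session page query the data without going through the Aliyun configuration?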

@hhyo
Owner

hhyo commented Nov 7, 2018

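As usual, Alibaba Cloud support is just muddying the waters. I spun up a temporary instance to test: with a management key (AliyunRDSFullAccess policy), all three features work. With a read-only key, session management and tablespaces fail with the following error: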

[2018-11-07 23:33:52,119][MainThread:140675498256192][task_id:default][exception_logging_middleware.py:11][ERROR]- Traceback (most recent call last):
  File "/opt/venv4archery/lib/python3.6/site-packages/django/core/handlers/base.py", line 126, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/venv4archery/lib/python3.6/site-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/opt/archery/sql/db_diagnostic.py", line 27, in process
    result = aliyun_process_status(request)
  File "/opt/archery/sql/aliyun_rds.py", line 104, in process_status
    {"Language": "zh", "Command": command_type})
  File "/opt/archery/common/utils/aliyun_sdk.py", line 88, in RequestServiceOfCloudDBA
    result = self.request_api(request, values)
  File "/opt/archery/common/utils/aliyun_sdk.py", line 37, in request_api
    result = self.clt.do_action_with_exception(request)
  File "/opt/venv4archery/lib/python3.6/site-packages/aliyunsdkcore/client.py", line 288, in do_action_with_exception
    request_id=request_id)
aliyunsdkcore.acs_exception.exceptions.ServerException: HTTP Status: 403 Error:Forbidden.RAM User not authorized to operate on the specified resource, or this API does not support RAM. RequestID: 8C470334-77B3-429B-AC06-7F1B7EBA6BEE

Currently, changing the key settings requires restarting the service.
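With a read-only key, the SDK raises aliyunsdkcore's ServerException, whose message has the form shown in the traceback. Below is a hedged sketch of turning that message into an explicit permission error instead of an empty page. It parses the string form (`str(e)`) so it runs without the SDK installed; `parse_sdk_error` and `is_permission_error` are hypothetical helpers, and the regex assumes the `HTTP Status: <code> Error:<ErrorCode>` layout from the traceback.

```python
import re
from typing import Optional, Tuple

# Assumed layout, copied from the traceback:
# "HTTP Status: 403 Error:Forbidden.RAM User not authorized ..."
_SDK_ERROR_RE = re.compile(r"HTTP Status: (\d+) Error:(\S+)")


def parse_sdk_error(message: str) -> Optional[Tuple[int, str]]:
    """Extract (http_status, error_code) from an SDK error string, if present."""
    match = _SDK_ERROR_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def is_permission_error(message: str) -> bool:
    """True for 403 Forbidden.* errors, e.g. a read-only key calling CloudDBA."""
    parsed = parse_sdk_error(message)
    return parsed is not None and parsed[0] == 403 and parsed[1].startswith("Forbidden")


msg = ("HTTP Status: 403 Error:Forbidden.RAM User not authorized to operate on "
       "the specified resource, or this API does not support RAM.")
print(parse_sdk_error(msg))      # (403, 'Forbidden.RAM')
print(is_permission_error(msg))  # True
```

In a view, the SDK call would sit in a `try/except ServerException as e` block and pass `str(e)` to `is_permission_error`.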

@eason0420
Author

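I'm using the admin key, and archery's log has no error entries either (running docker version 1.3.6). Which Alibaba Cloud API does the program call to get processes and TOP tablespaces? I'll try it directly with the key.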

@hhyo
Owner

hhyo commented Nov 8, 2018

The APIs and scripts are all in this file: aliyun_sdk.py

@eason0420
Author

I called the SDK with the key from python3: the tablespace data comes back empty, while slow SQL is returned. There is no key permission error either.

[root@dbmonitor ~]# python3 getSpace.py
Fetching tablespace info:
{'status': 0, 'msg': 'ok', 'data': []}
Fetching slow SQL:
{'total': 5, 'rows': [{'ParseMaxRowCount': 1186758, 'MySQLTotalExecutionCounts': 1, 'SQLId': '991484519352209408', 'SQLText': 'select count ( 0 ) from ( se

@hhyo
Owner

hhyo commented Nov 8, 2018

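Open the RDS console and check the problem diagnosis page in CloudDBA. The tablespace and process data are fetched from there; maybe there is simply no data.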

@eason0420
Author

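This instance does have data. In Archery, after I delete the instance ID and the corresponding master instance name from the Aliyun RDS configuration, the process and tablespace information for this instance can be retrieved.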

@hhyo
Owner

hhyo commented Nov 8, 2018

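Querying with SQL directly will certainly return data, so you need to confirm whether CloudDBA has it too; just check in the RDS console. Processes on RDS that are listed via SQL cannot be killed, so skipping the SDK query for the process list is not an option, but the tablespace feature could be decoupled from the SDK.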
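The suggestion to decouple the tablespace feature while keeping the process list on the SDK could look roughly like this. It is a hypothetical policy sketch, not Archery's implementation; `choose_source`, the feature names, and the idea of falling back when CloudDBA returns no rows are assumptions.

```python
from typing import List, Optional

FEATURES = {"slowlog", "process", "tablespace"}


def choose_source(feature: str, has_rds_mapping: bool,
                  sdk_rows: Optional[List[dict]] = None) -> str:
    """Pick where a diagnostic feature reads its data from (hypothetical policy)."""
    if feature not in FEATURES:
        raise ValueError(f"unknown feature: {feature}")
    if not has_rds_mapping:
        # No RDS mapping: everything is answered by SQL on the instance.
        return "direct_sql"
    if feature == "tablespace" and not sdk_rows:
        # Decoupled: CloudDBA returned nothing, so fall back to information_schema.
        return "direct_sql"
    # Slow log and process list stay on the SDK; only the SDK path can kill sessions on RDS.
    return "aliyun_sdk"


print(choose_source("process", True, []))     # aliyun_sdk
print(choose_source("tablespace", True, []))  # direct_sql
```

Keeping the process list on the SDK preserves the ability to kill sessions, while an empty CloudDBA result for tablespaces no longer leaves the page blank.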

@eason0420
Author

Wow, CloudDBA doesn't show this information anymore either. Alibaba Cloud is really a pain.

@eason0420
Author

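I asked their engineers again: the CloudDBA permissions are all still in place. The cause was that I had enabled sql_mode=only_full_group_by on RDS, which made CloudDBA fail to fetch the TOP tablespaces. Presumably the SQL it uses to fetch them is not written to the standard.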
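To check whether an instance is affected, read its sql_mode (for example with `SELECT @@GLOBAL.sql_mode`, which returns a comma-separated list) and look for ONLY_FULL_GROUP_BY. The sketch below performs that check; the two queries only illustrate the kind of GROUP BY that MySQL rejects under this mode (error 1055). CloudDBA's actual SQL is not known here, and `sql_mode_flags` / `has_only_full_group_by` are hypothetical helpers.

```python
# Illustration only: CloudDBA's real query is not shown in this thread.
# Rejected under ONLY_FULL_GROUP_BY: table_name is neither aggregated nor grouped.
NON_COMPLIANT = (
    "SELECT table_schema, table_name, SUM(data_length + index_length) "
    "FROM information_schema.TABLES GROUP BY table_schema"
)
# Accepted: every non-aggregated column in the SELECT list appears in GROUP BY.
COMPLIANT = (
    "SELECT table_schema, table_name, SUM(data_length + index_length) "
    "FROM information_schema.TABLES GROUP BY table_schema, table_name"
)


def sql_mode_flags(sql_mode: str) -> set:
    """Split a comma-separated sql_mode value into upper-case flag names."""
    return {flag.strip().upper() for flag in sql_mode.split(",") if flag.strip()}


def has_only_full_group_by(sql_mode: str) -> bool:
    """True if ONLY_FULL_GROUP_BY is among the active sql_mode flags."""
    return "ONLY_FULL_GROUP_BY" in sql_mode_flags(sql_mode)


print(has_only_full_group_by("ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES"))      # True
print(has_only_full_group_by("STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"))  # False
```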
