
[procstat] Supports obtaining process information directly through the supervisor #13416

Closed
chenbt-hz opened this issue Jun 9, 2023 · 9 comments · Fixed by #13417
Labels: feature request, waiting for response

Comments

@chenbt-hz
Contributor

chenbt-hz commented Jun 9, 2023

Use Case

Currently, procstat cannot be configured to select processes managed by supervisor; supervisor information is only available through the supervisor plugin, which queries supervisor's HTTP server.
This causes two problems:

  1. supervisor must have its HTTP server enabled, and some installed versions are too old to support it. In addition, supervisor currently has an unfixed 100% CPU usage issue. Upgrading requires changes to the environment and may interrupt services, which may be unacceptable in some scenarios.
  2. The supervisor plugin collects too little information: it cannot monitor child processes or report data such as memory and CPU usage.

Expected behavior

I would like to collect runtime information about the services managed by supervisor through the procstat plugin.

I have tried to implement this feature, but it may still have many shortcomings. I would appreciate guidance and help to keep improving it.

Actual behavior

Config:

  pid_finder = "pgrep"
  supervisor_unit = "cmdb,sleep:sleep-0,cmdb_mq"
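As written above, supervisor_unit is a single comma-separated string, and each entry is either a plain program name (cmdb) or supervisor's group:process form (sleep:sleep-0). The Go sketch below shows one way such a value could be split. parseUnits and unit are hypothetical names for illustration, not the code from #13417, and comma separation with whitespace trimming is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// unit is a hypothetical representation of one supervisor_unit entry.
type unit struct {
	Group string // empty for a standalone program such as "cmdb"
	Name  string // process name, e.g. "sleep-0" in "sleep:sleep-0"
	Raw   string // the entry as written, usable with supervisorctl status
}

// parseUnits splits a comma-separated supervisor_unit value.
// Empty entries (e.g. from a trailing comma) are skipped.
func parseUnits(value string) []unit {
	var units []unit
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		u := unit{Name: entry, Raw: entry}
		// supervisor names grouped processes as "group:process".
		if group, name, ok := strings.Cut(entry, ":"); ok {
			u.Group, u.Name = group, name
		}
		units = append(units, u)
	}
	return units
}

func main() {
	for _, u := range parseUnits("cmdb,sleep:sleep-0,cmdb_mq") {
		fmt.Printf("group=%q name=%q raw=%q\n", u.Group, u.Name, u.Raw)
	}
}
```

Keeping the raw entry alongside the split parts means the original string can be passed unchanged to supervisorctl, while the group is still available if it were ever used as a tag.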

supervisorctl status output:

cmdb                             RUNNING   pid 11779, uptime 2 days, 0:56:47
cmdb_mq                          FATAL     Exited too quickly (process log may have details)
sleep:sleep-0                    RUNNING   pid 11781, uptime 2 days, 0:56:47
sleep:sleep-1                    RUNNING   pid 11780, uptime 2 days, 0:56:47
sleep:sleep-2                    RUNNING   pid 11782, uptime 2 days, 0:56:47
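The PID that procstat needs can be read from each line of this output. A minimal Go sketch of that parsing step, assuming the column layout shown above (name, state, then a description that contains "pid N," only for running processes). parseStatusLine and status are hypothetical helpers, and actually running supervisorctl status (for example via os/exec) is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// status is a hypothetical parsed line of supervisorctl status output.
type status struct {
	Name  string
	State string
	PID   int // 0 when no PID is reported (e.g. FATAL)
}

// parseStatusLine parses lines such as
//   "cmdb   RUNNING   pid 11779, uptime 2 days, 0:56:47"
// It assumes name and state are the first two whitespace-separated fields.
func parseStatusLine(line string) (status, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return status{}, fmt.Errorf("unexpected status line: %q", line)
	}
	s := status{Name: fields[0], State: fields[1]}
	// Look for the "pid" token followed by the number and a trailing comma.
	for i := 2; i+1 < len(fields); i++ {
		if fields[i] == "pid" {
			pid, err := strconv.Atoi(strings.TrimSuffix(fields[i+1], ","))
			if err != nil {
				return status{}, fmt.Errorf("bad pid in %q: %w", line, err)
			}
			s.PID = pid
			break
		}
	}
	return s, nil
}

func main() {
	lines := []string{
		"cmdb                             RUNNING   pid 11779, uptime 2 days, 0:56:47",
		"cmdb_mq                          FATAL     Exited too quickly (process log may have details)",
		"sleep:sleep-0                    RUNNING   pid 11781, uptime 2 days, 0:56:47",
	}
	for _, l := range lines {
		s, err := parseStatusLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s state=%s pid=%d\n", s.Name, s.State, s.PID)
	}
}
```

For a FATAL unit such as cmdb_mq no PID is reported, so a collector built this way would have no process to look up for it and could only report the state.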

Got:

>procstat,host=13_144,parent_pid=11779,pattern=11779,process_name=gunicorn,status=RUNNING,supervisor_unit=cmdb,uptimes=2,user=root child_major_faults=0i,child_minor_faults=0i,cpu_time=47i,cpu_time_guest=0,cpu_time_guest_nice=0,cpu_time_idle=0,cpu_time_iowait=0,cpu_time_irq=0,cpu_time_nice=0,cpu_time_soft_irq=0,cpu_time_steal=0,cpu_time_system=10.29,cpu_time_user=37.14,cpu_usage=0,created_at=1686125665000000000i,involuntary_context_switches=39i,major_faults=0i,memory_data=50212864i,memory_locked=0i,memory_rss=57044992i,memory_stack=151552i,memory_swap=0i,memory_usage=0.6956536769866943,memory_vms=361857024i,minor_faults=12734i,nice_priority=20i,num_fds=13i,num_threads=1i,pid=11790i,ppid=11779i,read_bytes=0i,read_count=1651i,realtime_priority=0i,rlimit_cpu_time_hard=9223372036854775807i,rlimit_cpu_time_soft=9223372036854775807i,rlimit_file_locks_hard=9223372036854775807i,rlimit_file_locks_soft=9223372036854775807i,rlimit_memory_data_hard=9223372036854775807i,rlimit_memory_data_soft=9223372036854775807i,rlimit_memory_locked_hard=65536i,rlimit_memory_locked_soft=65536i,rlimit_memory_rss_hard=9223372036854775807i,rlimit_memory_rss_soft=9223372036854775807i,rlimit_memory_stack_hard=9223372036854775807i,rlimit_memory_stack_soft=8388608i,rlimit_memory_vms_hard=9223372036854775807i,rlimit_memory_vms_soft=9223372036854775807i,rlimit_nice_priority_hard=0i,rlimit_nice_priority_soft=0i,rlimit_num_fds_hard=204800i,rlimit_num_fds_soft=204800i,rlimit_realtime_priority_hard=0i,rlimit_realtime_priority_soft=0i,rlimit_signals_pending_hard=31198i,rlimit_signals_pending_soft=31198i,signals_pending=0i,voluntary_context_switches=183632i,write_bytes=0i,write_count=1i 1686300837000000000
>procstat_lookup,host=13_144,parent_pid=11781,pattern=11781,pid_finder=pgrep,result=success,status=RUNNING,supervisor_unit=sleep:sleep-0,uptimes=2 pid_count=8i,result_code=0i,running=8i 1686300837000000000

Additional info

No response

chenbt-hz added the feature request label Jun 9, 2023
chenbt-hz added a commit to chenbt-hz/telegraf that referenced this issue Jun 9, 2023
@powersj
Contributor

powersj commented Jun 9, 2023

Hi,

Not sure I am following your request.

> Currently, procstat does not support the configuration of the supervisor, but obtains information through the supervisor http server of the supervisor plug-in.

Can you point to where you see this please?

Thanks

powersj added the waiting for response label Jun 9, 2023
@chenbt-hz
Contributor Author

> Hi,
>
> Not sure I am following your request.
>
> > Currently, procstat does not support the configuration of the supervisor, but obtains information through the supervisor http server of the supervisor plug-in.
>
> Can you point to where you see this please?
>
> Thanks

Hello, I'm not sure what your question is?
Here is some additional explanation:
My services are managed by supervisor. I tried using the procstat plugin to collect process information; it can collect resource metrics such as memory and CPU, but procstat may not be able to match the correct process. The supervisor plugin requires supervisor 3.2 or later with the HTTP server enabled, and it cannot collect memory and CPU metrics the way procstat does.
I only started testing these plugins this week, so I tried adding the relevant features myself. If you have a better approach, I would greatly appreciate it.

telegraf-tiger bot removed the waiting for response label Jun 10, 2023
@powersj
Contributor

powersj commented Jun 13, 2023

> Hello, I'm not sure what your question is?

And I'm not sure I understand the use case :)

> supervisor_unit = "cmdb,sleep:sleep-0,cmdb_mq"

You have this in your actual behavior, but this does not exist today

> procstat may not be able to match the correct process

Why not? Can you provide exactly what is missing? What you tried?

powersj added the waiting for response label Jun 13, 2023
@chenbt-hz
Contributor Author

> > Hello, I'm not sure what your question is?
>
> And I'm not sure I understand the use case :)
>
> > supervisor_unit = "cmdb,sleep:sleep-0,cmdb_mq"
>
> You have this in your actual behavior, but this does not exist today

Yes, it doesn't exist today. I modified procstat myself to add this feature, and I hope to get it merged into master if possible: #13417
There are still some issues with this code.

> > procstat may not be able to match the correct process
>
> Why not? Can you provide exactly what is missing? What you tried?

Admittedly, I haven't done much validation yet. However, due to historical reasons, the same service (different versions) may have different process names in my cluster, making regular matching very cumbersome.

telegraf-tiger bot removed the waiting for response label Jun 14, 2023
@powersj
Contributor

powersj commented Jun 14, 2023

> However, due to historical reasons, the same service (different versions) may have different process names in my cluster, making regular matching very cumbersome.

You have not helped me understand or describe what your issue or use-case is any further.

powersj added the waiting for response label Jun 14, 2023
@chenbt-hz
Contributor Author

> > However, due to historical reasons, the same service (different versions) may have different process names in my cluster, making regular matching very cumbersome.
>
> You have not helped me understand or describe what your issue or use-case is any further.

For example, the process trees of two versions of the same service might look like this:

  \_ cmdb_v1 run -c 4    (for example, PID 11233)
      \_ metric
      \_ db
      \_ webserver
      \_ Collecting
  \_ cmdb_v2 run -c 4    (for example, PID 14124)
      \_ metrics_service
      \_ web_service
      \_ cmdb_service
      \_ cmdb_service

By running supervisorctl status cmdb, you can obtain the PID directly and then find its child processes, regardless of how the process names change between versions.
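Once supervisorctl has reported the top-level PID, the child processes can be found by walking the process tree downwards. The sketch below assumes a snapshot mapping each PID to its parent PID (on Linux this could be built from the ppid field in /proc/<pid>/stat). descendants is a hypothetical helper, and apart from 11779 and 11790, which appear in the output earlier in this issue, the PIDs are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// descendants returns every PID below root in the process tree, given a
// hypothetical snapshot mapping each PID to its parent PID.
func descendants(root int, parentOf map[int]int) []int {
	// Invert the snapshot into parent -> children.
	children := make(map[int][]int)
	for pid, ppid := range parentOf {
		children[ppid] = append(children[ppid], pid)
	}
	// Breadth-first walk so grandchildren (workers of workers) are included.
	var result []int
	queue := []int{root}
	seen := map[int]bool{root: true}
	for len(queue) > 0 {
		pid := queue[0]
		queue = queue[1:]
		for _, child := range children[pid] {
			if !seen[child] {
				seen[child] = true
				result = append(result, child)
				queue = append(queue, child)
			}
		}
	}
	sort.Ints(result) // map iteration order is random; sort for stable output
	return result
}

func main() {
	// Illustrative snapshot: 11779 is the PID supervisorctl reported for cmdb.
	parentOf := map[int]int{
		11779: 1,
		11790: 11779,
		11791: 11779, // hypothetical
		11795: 11790, // hypothetical grandchild
		11781: 1,
	}
	fmt.Println(descendants(11779, parentOf)) // prints [11790 11791 11795]
}
```

Because the lookup starts from the PID supervisor reports rather than from a name pattern, renamed worker processes such as metric versus metrics_service are found the same way.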

telegraf-tiger bot removed the waiting for response label Jun 14, 2023
@powersj
Contributor

powersj commented Jun 14, 2023

Can you please share what was asked in the original bug template?

  1. What did the config you used with procstat look like?
  2. What output did you get? Even if it is wrong, it helps us understand what is currently going on.
  3. What output did you expect? This lets us see what you think is missing.

powersj added the waiting for response label Jun 14, 2023
@telegraf-tiger
Contributor

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem. If not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!


srebhan pushed a commit to chenbt-hz/telegraf that referenced this issue Nov 1, 2023
srebhan pushed a commit to chenbt-hz/telegraf that referenced this issue Nov 1, 2023
srebhan pushed a commit to chenbt-hz/telegraf that referenced this issue Nov 3, 2023
srebhan pushed a commit to chenbt-hz/telegraf that referenced this issue Nov 13, 2023