Add subcommands plugins {list,status}, to inquire plugins #91

Merged
merged 3 commits into from
Sep 21, 2023

@amotl amotl commented Sep 20, 2023

About

At grafana-toolbox/grafana-client#110, @bhks added a few more API wrapper functions for Grafana. Thanks! This patch wraps them once more into the command line interface of grafana-wtf.

Synopsis

# Explore plugins.
grafana-wtf plugins list
grafana-wtf plugins status

amotl commented Sep 20, 2023

Problem

@bhks: Even with the most recent Grafana 10.1.2 release, grafana-wtf plugins status does not work well on my machine. It fails to query the corresponding health check and metrics endpoints.

2023-09-20 23:09:47,850 [grafana_wtf.core                    ] INFO   : Health check failed: Server Error 503: Plugin unavailable
2023-09-20 23:09:47,895 [grafana_client.elements.plugin      ] INFO   : Got error in fetching metrics for plugin satellogic-3d-globe-panel and error = Server Error 503: Plugin unavailable
2023-09-20 23:09:47,896 [grafana_wtf.core                    ] INFO   : Metrics inquiry failed: get_plugin_metrics returned nothing

Thoughts

  • Maybe I am using them wrong?
  • Do they need to be enabled within grafana.ini?
  • On the other hand, maybe both are enterprise features, and not available in Grafana OSS?

Q&A

You can install the package including this feature directly from the corresponding branch using this pip command, in order to check if it works on your end. I will be happy to hear back about the outcome.

pip install --upgrade 'git+https://github.com/panodata/grafana-wtf@list-plugins'

bhks commented Sep 20, 2023

Thanks for pinging me into this thread.

Panel plugins do not run a backend server, so there is nothing that could report health or metrics. Also, not all plugin servers implement these endpoints.

The plugin SDK more or less requires plugin developers to implement the protobuf model from version 1, so I don't know whether all marketplace plugins of type datasource or app follow it.

bhks commented Sep 20, 2023

I can try this out when I get a chance and see what works and what does not.

plugins = self.grafana.plugin.get_installed_plugins()
for plugin in plugins:
    plugin = munchify(plugin)
    item = Munch(

I think we can filter out plugins of type panel, as they don't have a server to respond.

bhks commented Sep 20, 2023

Not all servers/plugins have implemented the metrics endpoint. I believe it was introduced with the new SDK, which helps to build external plugins.

Also, core plugins do not have a server, because they are built into the Grafana Go code, so they don't have a health or metrics endpoint. We need to filter them out as well.

amotl commented Sep 20, 2023

Thanks for your response. According to grafana-wtf plugins list, a representation looks like this.

{
    "name": "Alert list",
    "type": "panel",
    "id": "alertlist",
    "enabled": true,
    "pinned": false,
    "info": {
        "author": {
            "name": "Grafana Labs",
            "url": "https://grafana.com"
        },
        "description": "Shows list of alerts and their current status",
        "links": null,
        "logos": {
            "small": "public/app/plugins/panel/alertlist/img/icn-singlestat-panel.svg",
            "large": "public/app/plugins/panel/alertlist/img/icn-singlestat-panel.svg"
        },
        "build": {},
        "screenshots": null,
        "version": "",
        "updated": ""
    },
    "dependencies": {
        "grafanaDependency": "",
        "grafanaVersion": "*",
        "plugins": []
    },
    "latestVersion": "",
    "hasUpdate": false,
    "defaultNavUrl": "/grafana",
    "category": "",
    "state": "",
    "signature": "internal",
    "signatureType": "",
    "signatureOrg": ""
}

Also, core plugins do not have a server, because they are built into the Grafana Go code, so they don't have a health or metrics endpoint. We need to filter them out as well.

Would skipping all plugins having "signature": "internal" on health and metrics inquiry be a good option to proceed with here?

bhks commented Sep 20, 2023

Would skipping all plugins having "signature": "internal" on health and metrics inquiry be a good option to proceed with here?

Exactly that, and the following one as well:

"type": "panel",

amotl commented Sep 20, 2023

Maybe just including the items with "type": "datasource" would be the right choice, not bothering about skipping certain others like "signature": "internal" and "type": "panel" at all?

amotl commented Sep 20, 2023

I've amended the patch to only use if item.type == "datasource" at this spot, significantly reducing unnecessary probes, and I think it works well so far. Thank you very much.

@amotl amotl requested a review from bhks September 20, 2023 22:25

bhks commented Sep 20, 2023

But internal/core plugins like CloudWatch, Prometheus, and RDS will not be able to respond, even though they are of type datasource.

Also, app plugins do respond to these endpoints, for example api/plugins/aws-datasource-provisioner-app/health:

{
  "message": "",
  "status": "OK"
}

Similarly for metrics, at api/plugins/aws-datasource-provisioner-app/metrics:

# HELP go_sync_mutex_wait_total_seconds_total Approximate cumulative time goroutines have spent blocked on a sync.Mutex or sync.RWMutex. This metric is useful for identifying global changes in lock contention. Collect a mutex or block profile using the runtime/pprof package for more detailed contention data.
# TYPE go_sync_mutex_wait_total_seconds_total counter
go_sync_mutex_wait_total_seconds_total 0
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="CollectMetrics",grpc_service="pluginv2.Diagnostics",grpc_type="unary"} 1
grpc_server_msg_received_total{grpc_method="StreamStdio",grpc_service="plugin.GRPCStdio",grpc_type="server_stream"} 1
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="CollectMetrics",grpc_service="pluginv2.Diagnostics",grpc_type="unary"} 1
grpc_server_started_total{grpc_method="StartStream",grpc_service="plugin.GRPCBroker",grpc_type="bidi_stream"} 1
grpc_server_started_total{grpc_method="StreamStdio",grpc_service="plugin.GRPCStdio",grpc_type="server_stream"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.7
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65535
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
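
The metrics endpoint responds in the Prometheus text exposition format, as the sample above shows. The following is a minimal sketch of how such a payload could be turned into a dictionary, for example to make assertions in a test case. The function name parse_metrics is hypothetical and not part of grafana-client or grafana-wtf, and the parser only assumes the simple line shape visible in this sample (no timestamps, no spaces inside label values).

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample: float}.

    Simplified on purpose: comment lines (# HELP / # TYPE) and blank
    lines are skipped, and the value is taken to be the last token.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the last space: left is name plus labels, right is the value.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


payload = """\
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.7
grpc_server_started_total{grpc_method="CollectMetrics",grpc_service="pluginv2.Diagnostics",grpc_type="unary"} 1
"""

metrics = parse_metrics(payload)
print(metrics["go_threads"])
```

For anything beyond a quick check, a dedicated parser for the exposition format would likely be the more robust choice.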

amotl commented Sep 20, 2023

Thanks. I've added if item.type == "datasource" and item.signature != "internal" to my working tree, and it reduces unnecessary probes even further.

bhks commented Sep 20, 2023

Do you mind updating that if condition to the following:

if item.type != "panel" and item.signature != "internal"
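
To make the difference between the two conditions discussed in this thread concrete, here is a small sketch that applies both to a handful of trimmed-down plugin records. The helper names datasource_only, non_panel, and selected are hypothetical. The signature values for alertlist and aws-datasource-provisioner-app are taken from this conversation; the others are assumptions for illustration only.

```python
# Trimmed-down plugin records; only "type" and "signature" matter here.
# Signature values other than those of "alertlist" and
# "aws-datasource-provisioner-app" are assumed for illustration.
plugins = [
    {"id": "alertlist", "type": "panel", "signature": "internal"},
    {"id": "prometheus", "type": "datasource", "signature": "internal"},
    {"id": "yesoreyeram-infinity-datasource", "type": "datasource", "signature": "valid"},
    {"id": "aws-datasource-provisioner-app", "type": "app", "signature": "valid"},
    {"id": "satellogic-3d-globe-panel", "type": "panel", "signature": "valid"},
]


def datasource_only(plugin):
    """Probe only non-internal plugins of type "datasource"."""
    return plugin["type"] == "datasource" and plugin["signature"] != "internal"


def non_panel(plugin):
    """Probe every non-internal plugin which is not a panel."""
    return plugin["type"] != "panel" and plugin["signature"] != "internal"


def selected(predicate):
    """Return the ids of all plugins the predicate would probe."""
    return [plugin["id"] for plugin in plugins if predicate(plugin)]


print(selected(datasource_only))
print(selected(non_panel))
```

With these records, the second condition additionally selects the app plugin, which is the case raised above.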

@amotl amotl force-pushed the list-plugins branch 2 times, most recently from 5bd2674 to 435595b Compare September 20, 2023 23:30
Comment on lines 588 to 615
def test_plugins_status(grafana_version, docker_grafana, capsys, caplog):
    """
    Verify the plugin status (metrics endpoint).

    TODO: Verify a plugin which properly responds to a health check response.
    """
    if version.parse(grafana_version) < version.parse("8"):
        raise pytest.skip(f"Plugin status inquiry only works on Grafana 8 and newer")

    # Before conducting a plugin status test, install a non-internal one.
    grafana = grafana_client.GrafanaApi.from_url(url=docker_grafana, timeout=15)
    grafana.plugin.install_plugin("yesoreyeram-infinity-datasource")

    # Which subcommand to test?
    set_command("plugins status", "--format=yaml")

    # Run command and capture YAML output.
    with caplog.at_level(logging.DEBUG):
        grafana_wtf.commands.run()
    captured = capsys.readouterr()
    data = yaml.safe_load(captured.out)

    # Grafana 6 has 28 plugins preinstalled.
    assert len(data) >= 28

    # Proof the output is correct.
    infinity_datasource = next(item for item in data if item["id"] == "yesoreyeram-infinity-datasource")
    assert "go_gc_duration_seconds" in infinity_datasource["metrics"]
Contributor Author

@bhks: This is a first real integration test, using your contribution to the grafana-client library.

Awesome work Andreas.

Contributor Author

Just building upon your work here ;], thanks again.

Contributor Author

CI currently fails here, because a few more adjustments will be submitted to grafana-client to make it work.

Contributor Author

Also:

TODO: Verify a plugin which properly responds to a health check response.

I haven't been able to discover one yet. If the plugin sample you just shared with me responds to such a request, I will be happy about it.

amotl commented Sep 20, 2023

Do you mind updating that if condition to the following:

if item.type != "panel" and item.signature != "internal"

Can you convince me why this is better? To me, it sounds more inadequate, because we are mostly talking about "datasource" plugins here? Are there any other types of plugins which yield sensible responses on their metrics or health endpoints?

If so, can you provide a sample (plugin id) of that kind, so I can use it on behalf of a corresponding software test case? Thanks!

bhks commented Sep 20, 2023

Can you convince me why this is better? To me, it sounds more inadequate, because we are mostly talking about "datasource" plugins here? Are there any other types of plugins which yield sensible responses on their metrics or health endpoints?

If so, can you provide a sample (plugin id) of that kind, so I can use it on behalf of a corresponding software test case?

Apologies about being too specific here.

Maybe the example plugin id in this comment can help: #91 (comment)

I was referring to a third type, the app plugin type, which bundles data source as well as panel plugins. An example is

aws-datasource-provisioner-app.

Here are multiple app type plugins we can explore. I am not 100% sure whether they all have these endpoints: https://grafana.com/grafana/plugins/app-plugins/

amotl commented Sep 20, 2023

aws-datasource-provisioner-app works well, and provides both health and metrics. Thank you!

[
    {
        "name": "AWS Data Sources",
        "type": "app",
        "id": "aws-datasource-provisioner-app",
        "enabled": false,
        "category": "",
        "version": "1.13.0",
        "signature": "valid",
        "health": {
            "message": "",
            "status": "OK"
        },
        "metrics": "# HELP go_cgo_go_to_c_calls_calls_total Count of calls made from Go to C by the current process.\n# TYPE go_cgo_go_to_c_calls_calls_total counter\ngo_cgo_go_to_c_calls_calls_total 0\n# HELP go_cpu_classes_gc_mark_assist_cpu_seconds_total Estimated total CPU time goroutines spent performing GC tasks to assist the GC and prevent it from falling behind the application. This metric is an overestimate, and not directly ..."
    }
]

amotl commented Sep 21, 2023

plugins = self.grafana.plugin.get_installed_plugins()
for plugin in plugins:
    plugin = munchify(plugin)
    item = Munch(

Wow, I haven't ever used munchify. That seems like a great way to remove the tedious key-value mapping with strings.

Contributor Author

If you can afford the runtime overhead, it definitely saves a few keystrokes!

By the way, big thanks to @dsc, @vmalloc, @JosePVB, @innermatrix, @d1618033, and all others who conceived it, and are maintaining it.
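
For readers unfamiliar with the pattern: munchify converts nested dictionaries into objects that allow attribute-style access, which is why the patch can write item.type instead of item["type"]. The following standard-library-only sketch illustrates the idea. The names AttrDict and to_attr are hypothetical, and this is not how the munch package itself is implemented.

```python
class AttrDict(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only invoked when regular attribute lookup fails,
        # so dict methods like .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def to_attr(obj):
    """Recursively convert dicts (also those inside lists) to AttrDict."""
    if isinstance(obj, dict):
        return AttrDict({key: to_attr(value) for key, value in obj.items()})
    if isinstance(obj, list):
        return [to_attr(value) for value in obj]
    return obj


plugin = to_attr({
    "id": "alertlist",
    "type": "panel",
    "info": {"author": {"name": "Grafana Labs"}},
})

print(plugin.type)
print(plugin.info.author.name)
```

The recursive conversion is where the extra runtime cost mentioned above comes from: every nested dictionary is copied once.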


# Status inquiry is not provided by all plugins. Let's filter them.
# Effectively, run it only on non-internal "app" and "datasource" items.
if item.type != "panel" and item.signature != "internal":

Thank you for incorporating this!

        log.warning(f"Metrics inquiry failed for plugin {item.id}, type={item.type}: {ex}")
else:
    log.info(f"Skipping status inquiry for plugin {item.id}, type={item.type}")
status.append(item)

Nice, we can get all of them in one go.
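
The snippets reviewed above are fragments of a loop. As a rough, self-contained sketch of how such a loop could fit together, assuming the probes are passed in as plain callables: the names collect_status, get_health, and get_metrics are hypothetical stand-ins and do not reproduce the actual grafana-wtf or grafana-client code.

```python
import logging

log = logging.getLogger(__name__)


def collect_status(plugins, get_health, get_metrics):
    """Return one record per plugin, enriched with health and metrics.

    `get_health` and `get_metrics` are caller-supplied callables which
    accept a plugin id. They are assumptions made for this sketch.
    """
    status = []
    for plugin in plugins:
        item = dict(plugin)
        # Status inquiry is not provided by all plugins, so filter them.
        if item["type"] != "panel" and item["signature"] != "internal":
            try:
                item["health"] = get_health(item["id"])
            except Exception as ex:
                log.warning("Health check failed for plugin %s: %s", item["id"], ex)
            try:
                item["metrics"] = get_metrics(item["id"])
            except Exception as ex:
                log.warning("Metrics inquiry failed for plugin %s: %s", item["id"], ex)
        else:
            log.info("Skipping status inquiry for plugin %s, type=%s", item["id"], item["type"])
        # Every plugin ends up in the result, probed or not.
        status.append(item)
    return status


def fake_health(plugin_id):
    """Stand-in probe which always reports a healthy plugin."""
    return {"message": "", "status": "OK"}


def failing_metrics(plugin_id):
    """Stand-in probe which simulates an unavailable endpoint."""
    raise RuntimeError("Server Error 503: Plugin unavailable")


result = collect_status(
    [
        {"id": "alertlist", "type": "panel", "signature": "internal"},
        {"id": "aws-datasource-provisioner-app", "type": "app", "signature": "valid"},
    ],
    get_health=fake_health,
    get_metrics=failing_metrics,
)
print(result)
```

Probing health and metrics in separate try blocks means a failing metrics endpoint does not discard a successful health response.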

Also, test status inquiry after plugin uninstall.
@amotl amotl merged commit 3885ff5 into this-and-that Sep 21, 2023
10 checks passed
@amotl amotl deleted the list-plugins branch September 21, 2023 10:49
@amotl amotl restored the list-plugins branch September 21, 2023 10:50