Commit 0f58fd0

mergify[bot] and zmoog authored
[Azure Monitor] Add default timegrain to Azure Storage Account metricset (#46786) (#46851)
Adds a new `default_timegrain` configuration option to allow users to customize the timegrain used in the Storage Account metricset. The default value remains PT5M, but users can now choose a different value.

Without this option, users can only collect metrics with a PT5M time grain. It is a sensible default, but some users want to collect metrics with a PT1M time grain.

To learn more, see elastic/integrations#15464.

(cherry picked from commit 8f145b9)

# Conflicts:
#	docs/reference/metricbeat/metricbeat-metricset-azure-storage.md
#	x-pack/metricbeat/module/azure/storage/_meta/docs.md

Co-authored-by: Maurizio Branca <maurizio.branca@elastic.co>
1 parent 970c136 commit 0f58fd0

File tree

9 files changed: +282 -19 lines changed

CHANGELOG.next.asciidoc

Lines changed: 21 additions & 0 deletions
@@ -364,6 +364,27 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
 - Only watch metadata for ReplicaSets in metricbeat k8s module {pull}41289[41289]
 - Preserve queries for debugging when `merge_results: true` in SQL module {pull}42271[42271]
 - Collect more fields from ES node/stats metrics and only those that are necessary {pull}42421[42421]
+- Add new metricset wmi for the windows module. {pull}42017[42017]
+- Update beat module with apm-server tail sampling monitoring metrics fields {pull}42569[42569]
+- Log every 401 response from Kubernetes API Server {pull}42714[42714]
+- Add a new `match_by_parent_instance` option to `perfmon` module. {pull}43002[43002]
+- Add a warning log to metricbeat.vsphere in case vSphere connection has been configured as insecure. {pull}43104[43104]
+- Changed the Elasticsearch module behavior to only pull settings from non-system indices. {pull}43243[43243]
+- Exclude dotted indices from settings pull in Elasticsearch module. {pull}43306[43306]
+- Add a `jetstream` metricset to the NATS module {pull}43310[43310]
+- Updated Meraki API endpoint for Channel Utilization data. Switched to `GetOrganizationWirelessDevicesChannelUtilizationByDevice`. {pull}43485[43485]
+- Upgrade Prometheus Library to v0.300.1. {pull}43540[43540]
+- Add GCP Dataproc metadata collector in GCP module. {pull}43518[43518]
+- Add new metrics to vSphere Virtual Machine dataset (CPU usage percentage, disk average usage, disk read/write rate, number of disk reads/writes, memory usage percentage). {pull}44205[44205]
+- Added checks for the Resty response object in all Meraki module API calls to ensure proper handling of nil responses. {pull}44193[44193]
+- Add latency config option to Azure Monitor module. {pull}44366[44366]
+- Increase default polling period for MongoDB module from 10s to 60s {pull}44781[44781]
+- Upgrade github.com/microsoft/go-mssqldb from v1.7.2 to v1.8.2 {pull}44990[44990]
+- Add SSL support for sql module: drivers mysql, postgres, and mssql. {pull}44748[44748]
+- Add support for Kafka 4.0 in the Kafka module. {pull}44723[44723]
+- Add NTP response validation for system/ntp module. {pull}46184[46184]
+- Add vertexai_logs metricset to GCP for prompt response collection from VertexAI service. {pull}46383[46383]
+- Add default timegrain to Azure Storage Account metricset. {pull}46786[46786]

 *Metricbeat*

docs/reference/metricbeat/metricbeat-metricset-azure-storage.md

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+---
+mapped_pages:
+  - https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-azure-storage.html
+applies_to:
+  stack: ga
+---
+
+% This file is generated! See scripts/docs_collector.py
+
+# Azure storage metricset [metricbeat-metricset-azure-storage]
+
+This is the storage metricset of the module azure.
+
+This metricset allows users to retrieve all metrics from specified storage accounts.
+
+## Metricset-specific configuration notes [_metricset_specific_configuration_notes_11]
+
+`refresh_list_interval`
+: Resources will be retrieved at each fetch call (`period` interval), this means a number of Azure REST calls will be executed each time. This will be helpful if the azure users will be adding/removing resources that could match the configuration options so they will not added/removed to the list. To reduce on the number of API calls we are executing to retrieve the resources each time, users can configure this setting and make sure the list or resources will not be refreshed as often. This is also beneficial for performance and rate/ cost reasons ([https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits)).
+
+`resources`
+: This will contain all options for identifying resources and configuring the desired metrics
+
+### Config options to identify resources [_config_options_to_identify_resources_11]
+
+`resource_id`
+: (*[]string*) The fully qualified ID’s of the resource, including the resource name and resource type. Has the format `/subscriptions/{{guid}}/resourceGroups/{{resource-group-name}}/providers/{{resource-provider-namespace}}/{resource-type}/{{resource-name}}`. Should return a list of resources.
+
+`resource_group`
+: (*[]string*) This option will return all storage accounts inside the resource group.
+
+`service_type`
+: (*[]string*) This configuration key can be used with any of the 2 options above, for example:
+
+```
+resources:
+  - resource_id: ""
+    service_type: ["blob", "table"]
+  - resource_group: ""
+    service_type: ["queue", "file"]
+```
+
+It will filter the metric values to be returned by specific metric namespaces. The supported metrics and namespaces can be found here [https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported#microsoftstoragestorageaccounts](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported#microsoftstoragestorageaccounts). The service type values allowed are `blob`, `table`, `queue`, `file` based on the namespaces `Microsoft.Storage/storageAccounts/blobServices`,`Microsoft.Storage/storageAccounts/tableServices`,`Microsoft.Storage/storageAccounts/fileServices`,`Microsoft.Storage/storageAccounts/queueServices`. If no service_type is specified all values are applied.
+
+Also, if the `resources` option is not specified, then all the storage accounts from the entire subscription will be selected. The primary aggregation value will be retrieved for all the metrics contained in the namespaces. The aggregation options are `avg`, `sum`, `min`, `max`, `total`, `count`.
+
+A default non configurable timegrain of 5 min is set so users are advised to configure an interval of 300s or a multiply of it.
+
+`default_timegrain`:
+: (*string*) Sets the default time grain to use when collecting storage account metrics. Defaults to PT5M.
+
+To collect storage account metrics with a PT1M time grain, we recommend using one of the following configurations:
+
+```yaml
+# (1) With `period: 60s` and `default_timegrain: "PT1M"`, the metricset
+# collects 1 data point every 60s.
+- module: azure
+  metricsets:
+  - storage
+  enabled: true
+  period: 60s
+  client_id: '${AZURE_CLIENT_ID:""}'
+  client_secret: '${AZURE_CLIENT_SECRET:""}'
+  tenant_id: '${AZURE_TENANT_ID:""}'
+  subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
+  refresh_list_interval: 3600s # 1h
+  enable_batch_api: true
+  default_timegrain: "PT1M"
+```
+
+```yaml
+# (2) With `period: 300s` and `default_timegrain: "PT1M"`, the metricset
+# collects 5 data points every 300s (5 minutes) — one for each minute,
+# but all data points arrive after 5 minutes
+- module: azure
+  metricsets:
+  - storage
+  enabled: true
+  period: 300s
+  client_id: '${AZURE_CLIENT_ID:""}'
+  client_secret: '${AZURE_CLIENT_SECRET:""}'
+  tenant_id: '${AZURE_TENANT_ID:""}'
+  subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
+  refresh_list_interval: 3600s # 1h
+  enable_batch_api: true
+  default_timegrain: "PT1M"
+```
+
+These two configurations trade off scalability and freshness. Configuration (1) prioritizes freshness over scalability, while configuration (2) prioritizes scalability over freshness.
+
+Suggested changes:
+
+- `enable_batch_api: true`: Retrieves metric values for multiple Azure resources in one API call, supporting more storage accounts.
+- `refresh_list_interval: 3600s`: Looks for new storage accounts every 60 minutes instead of 10 minutes, helping to avoid or reduce gaps when monitoring many storage accounts.
+
+Note: By setting the collection `period: 1m`, the metricset only has 60s to collect all metric values instead of 300s, so it can handle fewer storage accounts. Keep in mind that the storage accounts metricset collects metrics for five different namespaces (storage account, blob, file, queue, and table).
+
+## Fields [_fields]
+
+For a description of each field in the metricset, see the [exported fields](/reference/metricbeat/exported-fields-azure.md) section.
+
+Here is an example document generated by this metricset:
+
+```json
+{
+    "@timestamp": "2017-10-12T08:05:34.853Z",
+    "azure": {
+        "namespace": "Microsoft.Storage/storageAccounts/queueServices",
+        "resource": {
+            "group": "obs-infrastructure",
+            "type": "Microsoft.Storage/storageAccounts"
+        },
+        "storage": {
+            "queue_capacity": {
+                "avg": 0
+            },
+            "queue_count": {
+                "avg": 0
+            },
+            "queue_message_count": {
+                "avg": 0
+            }
+        },
+        "subscription_id": "fd675b6f-b5e5-426e-ac45-d1f876d0ffa6",
+        "timegrain": "PT1H"
+    },
+    "cloud": {
+        "instance": {
+            "id": "/subscriptions/fd675b6f-b5e5-426e-ac45-d1f876d0ffa6/resourceGroups/obs-infrastructure/providers/Microsoft.Storage/storageAccounts/urcbyscmrkbygsawinvm/queueServices/default",
+            "name": "urcbyscmrkbygsawinvm"
+        },
+        "provider": "azure",
+        "region": "westeurope"
+    },
+    "event": {
+        "dataset": "azure.storage",
+        "duration": 115000,
+        "module": "azure"
+    },
+    "metricset": {
+        "name": "storage",
+        "period": 10000
+    },
+    "service": {
+        "type": "azure"
+    }
+}
+```
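
The two YAML examples in the documentation above differ only in `period`, and their comments state how many data points each fetch yields. A minimal sketch of that arithmetic follows. The helpers `parseTimeGrain` and `pointsPerFetch` are hypothetical and not part of the module; the sketch assumes time-only ISO 8601 grains such as `PT1M` or `PT5M`, and it ignores anything Azure or the module may do to the query window (for example latency adjustments).

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseTimeGrain converts a time-only ISO 8601 duration such as "PT1M",
// "PT5M" or "PT1H" into a time.Duration. Hypothetical helper: date-based
// grains such as "P1D" are deliberately not handled in this sketch.
func parseTimeGrain(timeGrain string) (time.Duration, error) {
	if !strings.HasPrefix(timeGrain, "PT") || len(timeGrain) == 2 {
		return 0, fmt.Errorf("unsupported time grain %q", timeGrain)
	}
	// "1M" becomes "1m", a form that time.ParseDuration understands.
	return time.ParseDuration(strings.ToLower(strings.TrimPrefix(timeGrain, "PT")))
}

// pointsPerFetch estimates how many data points one fetch covers, given the
// collection period (as written in the YAML, e.g. "300s") and the time grain.
func pointsPerFetch(period, timeGrain string) (int, error) {
	p, err := time.ParseDuration(period)
	if err != nil {
		return 0, fmt.Errorf("parsing period: %w", err)
	}
	g, err := parseTimeGrain(timeGrain)
	if err != nil {
		return 0, err
	}
	if g <= 0 {
		return 0, fmt.Errorf("time grain %q must be positive", timeGrain)
	}
	return int(p / g), nil
}

func main() {
	for _, c := range [][2]string{{"60s", "PT1M"}, {"300s", "PT1M"}, {"300s", "PT5M"}} {
		n, err := pointsPerFetch(c[0], c[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("period=%s timegrain=%s -> %d data point(s) per fetch\n", c[0], c[1], n)
	}
}
```

With `period: 60s` the sketch reports one point per fetch and with `period: 300s` it reports five, which matches the comments in configurations (1) and (2).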

x-pack/metricbeat/module/azure/azure.go

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ var supportedMonitorMetricsets = []string{"monitor", "container_registry", "cont
 // NewMetricSet will instantiate a new azure metricset
 func NewMetricSet(base mb.BaseMetricSet) (*MetricSet, error) {
 	metricsetName := base.Name()
-	var config Config
+	config := createDefaultConfig()
 	err := base.Module().UnpackConfig(&config)
 	if err != nil {
 		return nil, fmt.Errorf("error unpack raw module config using UnpackConfig: %w", err)

x-pack/metricbeat/module/azure/client_utils.go

Lines changed: 0 additions & 3 deletions
@@ -19,9 +19,6 @@ import (
 	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor"
 )

-// DefaultTimeGrain is set as default timegrain for the azure metrics
-const DefaultTimeGrain = "PT5M"
-
 var instanceIdRegex = regexp.MustCompile(`.*?(\d+)$`)

 // mapMetricValues should map the metric values

x-pack/metricbeat/module/azure/config.go

Lines changed: 16 additions & 0 deletions
@@ -51,6 +51,22 @@ type Config struct {
 	BillingScopeAccountId string `config:"billing_scope_account_id"` // retrieve usage details from billing account ID scope
 	// Use BatchApi for metric values collection
 	EnableBatchApi bool `config:"enable_batch_api"` // defaults to false
+	// DefaultTimeGrain sets the default time interval when the resource config
+	// doesn't specify one. If no time grain is configured, this value will be
+	// used whenever possible.
+	//
+	// When the metric definition doesn't support this time grain, we fall back
+	// to the smallest supported interval.
+	//
+	// Note: currently, this is only used for the storage metricset.
+	DefaultTimeGrain string `config:"default_timegrain"` // defaults to PT5M
+}
+
+// createDefaultConfig creates a default config for the metricset.
+func createDefaultConfig() Config {
+	return Config{
+		DefaultTimeGrain: "PT5M",
+	}
 }

 // ResourceConfig contains resource and metric list specific configuration.
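
The change from `var config Config` to `config := createDefaultConfig()` in azure.go relies on a common pattern: fill the struct with defaults first, then unpack user settings on top, so that keys the user omits keep their default. The sketch below illustrates the pattern only. The trimmed `Config` struct and the `unpack` helper are hypothetical, and `encoding/json` stands in for the module's real config unpacker (`base.Module().UnpackConfig`), whose behavior is assumed to be similar for absent keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a trimmed-down, hypothetical stand-in for the module's Config.
type Config struct {
	EnableBatchApi   bool   `json:"enable_batch_api"`
	DefaultTimeGrain string `json:"default_timegrain"`
}

// createDefaultConfig mirrors the helper added in this commit.
func createDefaultConfig() Config {
	return Config{DefaultTimeGrain: "PT5M"}
}

// unpack overlays raw settings on a pre-populated Config. json.Unmarshal is
// an assumption standing in for the real unpacker: fields whose keys are
// absent from raw keep the value they already had.
func unpack(raw string) (Config, error) {
	cfg := createDefaultConfig()
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return Config{}, fmt.Errorf("unpacking config: %w", err)
	}
	return cfg, nil
}

func main() {
	withoutOption, _ := unpack(`{"enable_batch_api": true}`)
	withOption, _ := unpack(`{"default_timegrain": "PT1M"}`)
	fmt.Println(withoutOption.DefaultTimeGrain) // PT5M
	fmt.Println(withOption.DefaultTimeGrain)    // PT1M
}
```

Starting from a zero-value `Config` instead would leave `DefaultTimeGrain` empty whenever the option is omitted, which is why the default has to be set before unpacking rather than after.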
x-pack/metricbeat/module/azure/storage/_meta/docs.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+This is the storage metricset of the module azure.
+
+This metricset allows users to retrieve all metrics from specified storage accounts.
+
+## Metricset-specific configuration notes [_metricset_specific_configuration_notes_11]
+
+`refresh_list_interval`
+: Resources will be retrieved at each fetch call (`period` interval), this means a number of Azure REST calls will be executed each time. This will be helpful if the azure users will be adding/removing resources that could match the configuration options so they will not added/removed to the list. To reduce on the number of API calls we are executing to retrieve the resources each time, users can configure this setting and make sure the list or resources will not be refreshed as often. This is also beneficial for performance and rate/ cost reasons ([https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits)).
+
+`resources`
+: This will contain all options for identifying resources and configuring the desired metrics
+
+### Config options to identify resources [_config_options_to_identify_resources_11]
+
+`resource_id`
+: (*[]string*) The fully qualified ID’s of the resource, including the resource name and resource type. Has the format `/subscriptions/{{guid}}/resourceGroups/{{resource-group-name}}/providers/{{resource-provider-namespace}}/{resource-type}/{{resource-name}}`. Should return a list of resources.
+
+`resource_group`
+: (*[]string*) This option will return all storage accounts inside the resource group.
+
+`service_type`
+: (*[]string*) This configuration key can be used with any of the 2 options above, for example:
+
+```
+resources:
+  - resource_id: ""
+    service_type: ["blob", "table"]
+  - resource_group: ""
+    service_type: ["queue", "file"]
+```
+
+It will filter the metric values to be returned by specific metric namespaces. The supported metrics and namespaces can be found here [https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported#microsoftstoragestorageaccounts](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported#microsoftstoragestorageaccounts). The service type values allowed are `blob`, `table`, `queue`, `file` based on the namespaces `Microsoft.Storage/storageAccounts/blobServices`,`Microsoft.Storage/storageAccounts/tableServices`,`Microsoft.Storage/storageAccounts/fileServices`,`Microsoft.Storage/storageAccounts/queueServices`. If no service_type is specified all values are applied.
+
+Also, if the `resources` option is not specified, then all the storage accounts from the entire subscription will be selected. The primary aggregation value will be retrieved for all the metrics contained in the namespaces. The aggregation options are `avg`, `sum`, `min`, `max`, `total`, `count`.
+
+A default non configurable timegrain of 5 min is set so users are advised to configure an interval of 300s or a multiply of it.
+
+`default_timegrain`:
+: (*string*) Sets the default time grain to use when collecting storage account metrics. Defaults to PT5M.
+
+To collect storage account metrics with a PT1M time grain, we recommend using one of the following configurations:
+
+```yaml
+# (1) With `period: 60s` and `default_timegrain: "PT1M"`, the metricset
+# collects 1 data point every 60s.
+- module: azure
+  metricsets:
+  - storage
+  enabled: true
+  period: 60s
+  client_id: '${AZURE_CLIENT_ID:""}'
+  client_secret: '${AZURE_CLIENT_SECRET:""}'
+  tenant_id: '${AZURE_TENANT_ID:""}'
+  subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
+  refresh_list_interval: 3600s # 1h
+  enable_batch_api: true
+  default_timegrain: "PT1M"
+```
+
+```yaml
+# (2) With `period: 300s` and `default_timegrain: "PT1M"`, the metricset
+# collects 5 data points every 300s (5 minutes) — one for each minute,
+# but all data points arrive after 5 minutes
+- module: azure
+  metricsets:
+  - storage
+  enabled: true
+  period: 300s
+  client_id: '${AZURE_CLIENT_ID:""}'
+  client_secret: '${AZURE_CLIENT_SECRET:""}'
+  tenant_id: '${AZURE_TENANT_ID:""}'
+  subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
+  refresh_list_interval: 3600s # 1h
+  enable_batch_api: true
+  default_timegrain: "PT1M"
+```
+
+These two configurations trade off scalability and freshness. Configuration (1) prioritizes freshness over scalability, while configuration (2) prioritizes scalability over freshness.
+
+Suggested changes:
+
+- `enable_batch_api: true`: Retrieves metric values for multiple Azure resources in one API call, supporting more storage accounts.
+- `refresh_list_interval: 3600s`: Looks for new storage accounts every 60 minutes instead of 10 minutes, helping to avoid or reduce gaps when monitoring many storage accounts.
+
+Note: By setting the collection `period: 1m`, the metricset only has 60s to collect all metric values instead of 300s, so it can handle fewer storage accounts. Keep in mind that the storage accounts metricset collects metrics for five different namespaces (storage account, blob, file, queue, and table).

x-pack/metricbeat/module/azure/storage/client_helper.go

Lines changed: 6 additions & 10 deletions
@@ -58,7 +58,7 @@ func mapMetrics(client *azure.Client, resources []*armresources.GenericResourceE
 	}

 	// some metrics do not support the default PT5M timegrain so they will have to be grouped in a different API call, else call will fail
-	groupedMetrics := groupOnTimeGrain(filteredMetricDefinitions)
+	groupedMetrics := groupOnTimeGrain(filteredMetricDefinitions, client.Config.DefaultTimeGrain)

 	for time, groupedMetricList := range groupedMetrics {
 		// metrics will have to be grouped by allowed dimensions
@@ -80,11 +80,11 @@ func mapMetrics(client *azure.Client, resources []*armresources.GenericResourceE
 }

 // groupOnTimeGrain - some metrics do not support the default timegrain value so the closest supported timegrain will be selected
-func groupOnTimeGrain(list []armmonitor.MetricDefinition) map[string][]armmonitor.MetricDefinition {
+func groupOnTimeGrain(list []armmonitor.MetricDefinition, defaultTimeGrain string) map[string][]armmonitor.MetricDefinition {
 	var groupedList = make(map[string][]armmonitor.MetricDefinition)

 	for _, metric := range list {
-		timegrain := retrieveSupportedMetricAvailability(metric.MetricAvailabilities)
+		timegrain := retrieveSupportedMetricAvailability(metric.MetricAvailabilities, defaultTimeGrain)
 		if _, ok := groupedList[timegrain]; !ok {
 			groupedList[timegrain] = make([]armmonitor.MetricDefinition, 0)
 		}
@@ -94,21 +94,17 @@ func groupOnTimeGrain(list []armmonitor.MetricDefinition) map[string][]armmonito
 }

 // retrieveSupportedMetricAvailability func will return the default timegrain if supported, else will return the next timegrain
-func retrieveSupportedMetricAvailability(availabilities []*armmonitor.MetricAvailability) string {
+func retrieveSupportedMetricAvailability(availabilities []*armmonitor.MetricAvailability, defaultTimeGrain string) string {
 	// common case in metrics supported by storage account - one availability
 	if len(availabilities) == 1 {
 		return *availabilities[0].TimeGrain
 	}
 	// check if the default timegrain is supported
 	for _, availability := range availabilities {
-		if *availability.TimeGrain == azure.DefaultTimeGrain {
-			return azure.DefaultTimeGrain
+		if *availability.TimeGrain == defaultTimeGrain {
+			return defaultTimeGrain
 		}
 	}
-	// select first timegrain, should be bigger than the min timegrain of 1M, timegrains are returned in asc order
-	if *availabilities[0].TimeGrain != "PT1M" {
-		return *availabilities[0].TimeGrain
-	}
 	return *availabilities[1].TimeGrain
 }
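
The selection order in the updated `retrieveSupportedMetricAvailability` can be easier to follow on plain strings. The sketch below is a simplified illustration, not the module's code: `pickTimeGrain` is a hypothetical name, and it takes a `[]string` of supported time grains instead of `[]*armmonitor.MetricAvailability`. Like the function in the diff, it assumes that a list with more than one entry has at least two entries.

```go
package main

import "fmt"

// pickTimeGrain is a simplified, hypothetical rendition of
// retrieveSupportedMetricAvailability operating on plain strings.
func pickTimeGrain(supported []string, defaultTimeGrain string) string {
	// A metric with a single availability leaves no choice.
	if len(supported) == 1 {
		return supported[0]
	}
	// Prefer the configured default when the metric supports it.
	for _, tg := range supported {
		if tg == defaultTimeGrain {
			return defaultTimeGrain
		}
	}
	// Otherwise take the second entry, as the updated function does.
	// This indexes position 1 without a length check, mirroring the diff.
	return supported[1]
}

func main() {
	fmt.Println(pickTimeGrain([]string{"PT1H"}, "PT1M"))
	fmt.Println(pickTimeGrain([]string{"PT1M", "PT5M", "PT1H"}, "PT1M"))
	fmt.Println(pickTimeGrain([]string{"PT5M", "PT1H", "PT1D"}, "PT1M"))
}
```

In the third call the default `PT1M` is not supported, so the second entry (`PT1H`) is chosen even though `PT5M` is smaller. With the removed `PT1M` special case, the first entry would have been returned in that situation, so the fallback may not always be the "smallest supported interval" that the comment in config.go describes.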

x-pack/metricbeat/module/azure/storage/client_helper_concurrent.go

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ func getStorageMappedResourceDefinitions(client *azure.BatchClient, resourceId s
 		filteredMetricDefinitions = append(filteredMetricDefinitions, *metricDefinition)
 	}
 	// some metrics do not support the default PT5M timegrain so they will have to be grouped in a different API call, else call will fail
-	groupedMetrics := groupOnTimeGrain(filteredMetricDefinitions)
+	groupedMetrics := groupOnTimeGrain(filteredMetricDefinitions, client.Config.DefaultTimeGrain)
 	for time, groupedMetricList := range groupedMetrics {
 		// metrics will have to be grouped by allowed dimensions
 		dimMetrics := groupMetricsByAllowedDimensions(groupedMetricList)

x-pack/metricbeat/module/azure/storage/client_helper_test.go

Lines changed: 4 additions & 4 deletions
@@ -166,7 +166,7 @@ func TestFilterOnTimeGrain(t *testing.T) {
 		{MetricAvailabilities: availability2},
 		{MetricAvailabilities: availability3},
 	}
-	response := groupOnTimeGrain(list)
+	response := groupOnTimeGrain(list, "PT5M")
 	assert.Equal(t, len(response), 2)
 	result := [][]armmonitor.MetricDefinition{
 		{
@@ -184,11 +184,11 @@ func TestFilterOnTimeGrain(t *testing.T) {
 }

 func TestRetrieveSupportedMetricAvailability(t *testing.T) {
-	response := retrieveSupportedMetricAvailability(availability1)
+	response := retrieveSupportedMetricAvailability(availability1, "PT5M")
 	assert.Equal(t, response, time2)
-	response = retrieveSupportedMetricAvailability(availability2)
+	response = retrieveSupportedMetricAvailability(availability2, "PT5M")
 	assert.Equal(t, response, time3)
-	response = retrieveSupportedMetricAvailability(availability3)
+	response = retrieveSupportedMetricAvailability(availability3, "PT5M")
 	assert.Equal(t, response, time3)
 }
