Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE REQUEST] Make telemetry parameters configurable in application parameters #292

Closed
negberts opened this issue Jan 22, 2024 · 15 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@negberts
Copy link

negberts commented Jan 22, 2024

Is your feature request related to a problem? Please describe.
Telemetry parameters like TelemetryProvider and LogAnalyticsWorkspaceId are currently not available in de application manifest XML and can therefore not be overridden when the application is deployed.

Describe the solution you'd like
Make the telemetry parameters available in ApplicationManifest.xml

@GitTorre
Copy link
Member

This will ship in 3.2.14. This is a breaking change for existing Telemetry settings that exist in Settings.xml today. Those settings will need to be overridden in ApplicationManifest.xml, so that related App parameters will host the setting values, specified in their DefaultValue attributes. Both Settings.xml and ApplicationManifest.xml will be commented to reflect this to make sure it is understood.

@GitTorre GitTorre added this to the 3.2.14 milestone Jan 24, 2024
@negberts
Copy link
Author

@GitTorre thanks. Shouldn't the ClusterObserverApp application manifest XML be updated as well? In your commit I see only changes in the FabricObserverApp

@GitTorre
Copy link
Member

Yes. We will update CO as well. The feature is still in development and testing. Thanks for the reminder.

@GitTorre
Copy link
Member

@negberts CO support pushed to develop. Please give it a try (CO and FO) if you have time. Ensure it works for your scenarios (it should).

@negberts
Copy link
Author

@GitTorre looks good to me! :-)

@negberts
Copy link
Author

negberts commented Feb 1, 2024

@GitTorre Thank you for implementing this. Are the assets for cluster observer missing in the release?

@GitTorre
Copy link
Member

GitTorre commented Feb 1, 2024

Yes. Will correct. Sorry about that.

@GitTorre
Copy link
Member

GitTorre commented Feb 1, 2024

@negberts CO SFpkgs uploaded to Release.

@negberts
Copy link
Author

negberts commented Feb 9, 2024

@GitTorre some observations:

In Application Insights all health events are logged with state Invalid:

image

Probably because Invalid is 0 and Ok = 1 in the enum... I should expect them to be logged as Ok.

And it would be nice if we had some loglevel setting that only warnings and errors would be sent to application insights / log analytics so we can create alerts.

@GitTorre
Copy link
Member

GitTorre commented Feb 9, 2024

These are raw metric data events. HealthState of the related entity is not considered. If the metric value meets or exceeds a specified threshold, then you will see Warning or Error or Ok events in the case where the usage has gone below the threshold in a subsequent run in a different event. This is by-design behavior.

If you don't want raw telemetry, then, yes, a new setting would need to be added to only emit telemetry that carries HealthState, so only when an entity is in Warning, Error or has recovered. That would mean you wouldn't be able to query for usage over time, which is useful to do. However, if the concern is telemetry volume, a valid concern, then adding such a pre-filter is useful. You would need to be comfortable only getting Warning/Error and recovery (as Ok) telemetry. Nothing prevents you from creating alerts today as unhealthy events are emitted in addition to raw telemetry.

@negberts
Copy link
Author

Aren't the events also stored in Azure Storage? Because yes, the concern is the telemetry volume... So than we can say that the raw telemetry is still in Azure Storage and the unhealthy events are only in Application Insights/Log Analytics? Would be a nice feature for me...

@GitTorre
Copy link
Member

GitTorre commented Feb 12, 2024

These events are not also present in storage account. These are specifically LogAnalytics or ApplicationInsights telemetry events, depending upon your configuration.

The idea is that each observer will emit the raw metric data per run, per metric. If the data reaches a threshold, then the warning telemetry will also come in.

There are two types of events that you will see in AppInsights:

FabricObserver.EntityHealthData
FabricObserver.EntityMetricData

EntityMetricData events do not carry HealthState information (thus, Invalid). If you only care about health states, then you should create alerts on EntityHealthData events as these will be ones that carry Warning/Error/Ok (recovery) information.

Please create a Work Item (Issue) and we will then look into this request: Disable Entity Raw Metric telemetry (this will only apply to observers that monitor metrics (so, 5 observers: AppObs, ContainerObs, DiskObs, FabricSystemObs, NodeObs). The concern about telemetry volume is certainly legitimate, especially for AppObs, depending on the number of service instances (processes) and metrics it's configured to monitor and report on. However, nothing prevents you from creating HealthState based alerts today by querying for events named "FabricObserver.EntityHealthData".

// Health data
customEvents
| where name has "FabricObserver.EntityHealthData"
| order by timestamp desc 

// Raw metric data
customEvents
| where name has "FabricObserver.EntityMetricData"
| order by timestamp desc 

And, if you also deploy ClusterObserver, you can just disable Telemetry for FO, as CO will emit unhealthy events created by FO for any observed entity and metric, by default when you enable Telemetry for CO. You then alert on CO Telemetry events (named ClusterObserver.EntityHealthData). These will carry the serialized TelemetryData instance created by FO as part of the HealthEvent it generated (so, all pertinent information will be carried in the CO event).

@GitTorre
Copy link
Member

@negberts This feature is available in develop branch if you want to check it out. You can disable metric telemetry data for 5 observers individually in ApplicationManifest.

@negberts
Copy link
Author

@GitTorre wow thanks :-) Next week on holidays but will check it out after that!

@negberts
Copy link
Author

@GitTorre took me some time, but looks good to me :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants