Merge branch 'develop' into oma-cleanup
jeff-colucci committed Apr 26, 2024
2 parents 849a957 + a29a8fd commit a2b9884
Showing 36 changed files with 260 additions and 128 deletions.
14 changes: 7 additions & 7 deletions src/content/docs/ai-monitoring/view-ai-data.mdx
@@ -49,8 +49,8 @@ If you own several apps with various implementations of different AI frameworks,
### Track total responses, average response time, and token usage

<img
title="AI responses response billboard and timeseries graphs"
alt="A cropped screenshot displaying the timeseries graphs and billboard info about AI data"
title="AI responses response billboard and time series graphs"
alt="A cropped screenshot displaying the time series graphs and billboard info about AI data"
src={aiTimeseriesBillboard}
/>

@@ -65,17 +65,17 @@ The three tiles show general performance metrics about your AI's responses. Thes
* If you notice a drop in total responses or an increase in average response time, it can indicate that some technology in your AI toolchain has prevented your AI-powered app from posting a response.
* A drop or increase in average token usage per response can give you insight into how your model creates a response. Maybe it's pulling too much context, thus driving up token cost while generating its response. Maybe its responses are too spare, leading to lower token costs and unhelpful responses.
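
If you want to reproduce these tiles in the query builder, the sketch below shows one way to approximate them. It assumes AI monitoring's `LlmChatCompletionSummary` event and illustrative attribute names (`duration` for response time and `response.usage.total_tokens` for token usage); the exact attribute names can vary by agent and agent version, so treat them as assumptions.

```sql
FROM LlmChatCompletionSummary
SELECT count(*) AS 'Total responses',
  average(duration) AS 'Average response time',
  average(`response.usage.total_tokens`) AS 'Average tokens per response'
SINCE 1 day ago
```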

### Adjust the timeseries graphs
### Adjust the time series graphs

<img
title="AI timeseries graphs"
alt="A cropped screenshot displaying timeseries info about AI data"
title="AI time series graphs"
alt="A cropped screenshot displaying time series info about AI data"
src={aiCroppedImageofAItimeseries}
/>

You can refer to the time series graphs to better visualize when anomalous behavior first appears.

* Adjust the timeseries graph by dragging over a spike or drop. This scopes the timeseries to a certain time window.
* Adjust the time series graph by dragging over a spike or drop. This scopes the time series to a certain time window.
* Select the dropdown to run a comparative analysis of different performance parameters. You can choose from total responses, average response time, or average tokens per response.
* If you've enabled the [feedback feature](/docs/ai-monitoring/customize-agent-ai-monitoring), you can scope the graphs to analyze responses by positive and negative feedback.

@@ -146,7 +146,7 @@ Selecting an AI entity takes you to the APM summary page for that app. From the

Selecting an AI entity takes you to the APM summary page. To find your AI data, choose <DoNotTranslate>**AI responses**</DoNotTranslate> in the left nav. We recommend using this page when you've identified that a particular AI entity has contributed to anomalies.

* The APM version of AI responses contains the same tiles, timeseries graphs, and response tables collected as the top-level AI responses page.
* The APM version of AI responses contains the same tiles, time series graphs, and response tables as the top-level AI responses page.
* Instead of showing aggregated data, the APM AI responses page shows data scoped to the service you selected from AI entities.
* While the top-level AI responses page lets you filter by service across all AI entities, the APM AI responses page limits filter functionality to the app's own attributes.

Changes in another file:
@@ -221,7 +221,7 @@ The table below provides descriptions for all of the global decisions that are a
</td>

<td>
Correlation is activated because the New Relic [condition IDs](/docs/new-relic-solutions/get-started/glossary/#condition_id) and deep link url are the same. Deep link url provides timeseries and time range information in addition to [alert condition](/docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/create-alert-conditions/). Correlating these issues make it easier for you to look at related incidents in the incident response flow with time-scoped metrics, and perform deep analysis. Deep link url can be automatically generated if incidents are triggered by New Relic alert conditions, while for REST source [deepLinkUrl](/docs/data-apis/ingest-apis/event-api/incident-event-rest-api/#api-specs) should be user defined.
Correlation is activated because the New Relic [condition IDs](/docs/new-relic-solutions/get-started/glossary/#condition_id) and deep link URLs are the same. The deep link URL provides time series and time range information in addition to the [alert condition](/docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/create-alert-conditions/). Correlating these issues makes it easier for you to look at related incidents in the incident response flow with time-scoped metrics and perform deep analysis. The deep link URL can be generated automatically if incidents are triggered by New Relic alert conditions, while for the REST source, [deepLinkUrl](/docs/data-apis/ingest-apis/event-api/incident-event-rest-api/#api-specs) should be user defined.
</td>
</tr>

Changes in another file:
@@ -90,7 +90,7 @@ For how to create a dashboard with data from multiple accounts, see [the NerdGra

## Create embeddable charts [#embeddable-charts]

In addition to returning raw data, you can fetch embeddable chart links for the data to use in an application. For example, instead of a single count of [transaction](/docs/insights/insights-data-sources/default-data/apm-default-event-attributes-insights#transaction-event), you can create a [chart](/docs/insights/use-insights-ui/manage-dashboards/chart-types#widget-types) that illustrates a timeseries of bucketed counts over time. Add `TIMESERIES` to your query with `embeddedChartUrl`:
In addition to returning raw data, you can fetch embeddable chart links for the data to use in an application. For example, instead of a single count of [transaction](/docs/insights/insights-data-sources/default-data/apm-default-event-attributes-insights#transaction-event) events, you can create a [chart](/docs/insights/use-insights-ui/manage-dashboards/chart-types#widget-types) that illustrates a time series of bucketed counts over time. Add `TIMESERIES` to your query with `embeddedChartUrl`:

```graphql
{
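  # The rest of this example is collapsed in the diff view above; what follows
  # is an illustrative completion, not the original query. YOUR_ACCOUNT_ID and
  # the NRQL string are placeholders.
  actor {
    account(id: YOUR_ACCOUNT_ID) {
      nrql(query: "SELECT count(*) FROM Transaction TIMESERIES") {
        embeddedChartUrl
      }
    }
  }
}
```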
Changes in another file:
@@ -365,7 +365,7 @@ Here are some sample queries for the event data to help you get started.
<CollapserGroup>
<Collapser
id="percentile-time"
title="Percentile over timeseries"
title="Percentile over time series"
>
Show the 95th percentile of first paint and first contentful paint over a time series:
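
A minimal sketch of such a query, assuming the `PageViewTiming` event and its `firstPaint` and `firstContentfulPaint` attributes:

```sql
SELECT percentile(firstPaint, 95), percentile(firstContentfulPaint, 95)
FROM PageViewTiming TIMESERIES
```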

Changes in another file:
@@ -60,19 +60,19 @@ The following default limits apply for all Metric data:

<tr>
<td>
Max unique timeseries (cardinality) per account per day
Max unique time series (cardinality) per account per day
</td>

<td>
1-15 million [(learn more)](#additional-considerations)

A timeseries is a single, unique combination of a metric name and any attributes.
A time series is a single, unique combination of a metric name and any attributes.
</td>
</tr>

<tr>
<td>
Max unique timeseries (cardinality) per metric name per day
Max unique time series (cardinality) per metric name per day
</td>

<td>
@@ -232,13 +232,13 @@ The following default limits apply only to data collected via the Prometheus Rem
<tbody>
<tr>
<td>
Max unique Count and Summary timeseries (cardinality) per account per 5 minute interval
Max unique Count and Summary time series (cardinality) per account per 5 minute interval
</td>

<td>
1-15 million [(learn more)](#additional-considerations)

A timeseries is a single, unique combination of a metric name and any attributes. Timeseries received above this limit are dropped. This limit is enforced prior to and in addition to standard metric limits.
A time series is a single, unique combination of a metric name and any attributes. Time series received above this limit are dropped. This limit is enforced prior to and in addition to standard metric limits.
</td>
</tr>
</tbody>
@@ -268,22 +268,22 @@ This section describes how the Metric API behaves when you exceed the rate limit

<Collapser
id="incident-unique-timeseries"
title="Max unique timeseries per account per day"
title="Max unique time series per account per day"
>
A timeseries is a single, unique combination of a metric name and any attributes assigned to that metric. For example, if a `CPU utilization` metric with a single attribute `hostname` is sent from ten different hosts, this equals ten distinct values for the `hostname` attribute and ten unique metric timeseries.
A time series is a single, unique combination of a metric name and any attributes assigned to that metric. For example, if a `CPU utilization` metric with a single attribute `hostname` is sent from ten different hosts, this equals ten distinct values for the `hostname` attribute and ten unique metric time series.

If the per-account, per-day unique metric timeseries (cardinality) limit is exceeded during a 24 hour period, the endpoint will continue to receive and store raw metric data. However, New Relic will stop creating additional aggregate rollups (1 minute, 5 minutes, etc.) for the remainder of the 24 hour period. (These rollups are used used by default to query time windows longer than 60 minutes.)
If the per-account, per-day unique metric time series (cardinality) limit is exceeded during a 24-hour period, the endpoint will continue to receive and store raw metric data. However, New Relic will stop creating additional aggregate rollups (1 minute, 5 minutes, etc.) for the remainder of the 24-hour period. (These rollups are used by default to query time windows longer than 60 minutes.)

You can continue to query your data when such an incident occurs by specifying a 60 minute or shorter time window or specifying the RAW keyword (for more on that, see [High cardinality metrics](/docs/data-apis/ingest-apis/metric-api/NRQL-high-cardinality-metrics)). This can be helpful in identifying potential causes for the incident.
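
To see which attribute values are driving cardinality for a particular metric, a query along these lines can help; `CPU utilization` and `hostname` are the hypothetical names from the example above, so substitute your own metric and attribute names:

```sql
FROM Metric SELECT uniqueCount(hostname)
WHERE metricName = 'CPU utilization' SINCE 1 day ago
```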
</Collapser>

<Collapser
id="incident-unique-timeseries"
title="Max unique timeseries per metric name per day"
title="Max unique time series per metric name per day"
>
A timeseries is a single, unique combination of a metric name and any attributes assigned to that metric. For example, if a `CPU utilization` metric with a single attribute `hostname` is sent from ten different hosts, this equals ten distinct values for the `hostname` attribute and ten unique metric timeseries.
A time series is a single, unique combination of a metric name and any attributes assigned to that metric. For example, if a `CPU utilization` metric with a single attribute `hostname` is sent from ten different hosts, this equals ten distinct values for the `hostname` attribute and ten unique metric time series.

If the per-metric name, per-day unique metric timeseries (cardinality) limit is exceeded during a 24 hour period, the endpoint will continue to receive and store raw metric data. However, New Relic will stop creating additional aggregate rollups (1 minute, 5 minutes, etc.) for the remainder of the 24 hour period. (These rollups are used used by default to query time windows longer than 60 minutes.)
If the per-metric-name, per-day unique metric time series (cardinality) limit is exceeded during a 24-hour period, the endpoint will continue to receive and store raw metric data. However, New Relic will stop creating additional aggregate rollups (1 minute, 5 minutes, etc.) for the remainder of the 24-hour period. (These rollups are used by default to query time windows longer than 60 minutes.)

You can continue to query your data when such an incident occurs by specifying a 60 minute or shorter time window or specifying the RAW keyword (for more on that, see [High cardinality metrics](/docs/data-apis/ingest-apis/metric-api/NRQL-high-cardinality-metrics)). This can be helpful in identifying potential causes for the incident.
</Collapser>
Changes in another file:
@@ -113,9 +113,9 @@ The `category` indicates the type of error and the `message` provides more detai
</td>

<td>
You have Prometheus servers reporting too many unique timeseries via [New Relic's remote_write endpoint](/docs/integrations/prometheus-integrations/get-started/monitor-prometheus-new-relic#remote-write).
You have Prometheus servers reporting too many unique time series via [New Relic's remote_write endpoint](/docs/integrations/prometheus-integrations/get-started/monitor-prometheus-new-relic#remote-write).

Reduce the number of unique timeseries reported by modifying your [Prometheus server configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) to reduce the number of targets being scraped, or by using [relabel rules](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) in the [remote_write section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) of your server configuration to drop timeseries or highly unique labels.
Reduce the number of unique time series reported by modifying your [Prometheus server configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) to reduce the number of targets being scraped, or by using [relabel rules](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) in the [remote_write section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) of your server configuration to drop time series or highly unique labels.
</td>
</tr>

Changes in another file:
@@ -54,7 +54,7 @@ Limits are enforced per account (not per [organization](/docs/glossary/glossary/

* We place a limit on the number of ingested requests per minute (RPM) per data type. When this limit is reached, we stop accepting data and return a 429 status code for the duration of the minute.
* For queries, we place limits on the number of queries per minute and the number of records inspected (see [NRQL query rate limits](/docs/query-your-data/nrql-new-relic-query-language/get-started/rate-limits-nrql-queries)).
* For metrics, we place a limit on the number of unique timeseries (cardinality) per account and per metric. When this limit is reached, aggregated data is turned off for the rest of the UTC day.
* For metrics, we place a limit on the number of unique time series (cardinality) per account and per metric. When this limit is reached, aggregated data is turned off for the rest of the UTC day.

For every major limit incident, New Relic creates an [`NrIntegrationError` event](/docs/telemetry-data-platform/manage-data/nrintegrationerror) for that account, which has these limit-related attributes:

Changes in another file:
@@ -37,17 +37,17 @@ At a high level, delta conversion is performed by taking two data points sequent

### Resets [#resets]

If data for a [timeseries](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#timeseries-model) suddenly decreases in value, we treat this as a reset and will emit that new measurement as its own delta value (in other words, as if it were preceded by a `0` measurement).
If data for a [time series](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#timeseries-model) suddenly decreases in value, we treat this as a reset and will emit that new measurement as its own delta value (in other words, as if it were preceded by a `0` measurement).

[OpenTelemetry also defines](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#resets-and-gaps) situations where a decrease in value is unexpected, and we do our best to detect these cases and notify you via [New Relic integration errors](/docs/data-apis/manage-data/nrintegrationerror) (see [troubleshooting](#troubleshooting) below).

### Reordering data [#reorder-data]

We understand that many things can cause data points to arrive at New Relic out of order. As such, we will buffer data points and reorder them if we detect an unexpected gap in the reporting timeseries. Gaps are inferred by an expected reporting interval determined by the rate at which we receive data for a given timeseries. Buffering is bounded and eventually we will consider a data point "too late for resequencing". In this case, a delta is computed across the detected gap and processing of the timeseries continues.
We understand that many things can cause data points to arrive at New Relic out of order. As such, we will buffer data points and reorder them if we detect an unexpected gap in the reporting time series. Gaps are inferred by an expected reporting interval determined by the rate at which we receive data for a given time series. Buffering is bounded and eventually we will consider a data point "too late for resequencing". In this case, a delta is computed across the detected gap and processing of the time series continues.

### Stale data [#stale-data]

As delta conversion is a stateful operation, we must be cognizant of timeseries that may stop reporting and eventually drop its state. If a timeseries has not reported any new data points for <DoNotTranslate>**5 minutes**</DoNotTranslate>, we will flush the state we have, including computing deltas across any buffered gaps. This means that if a data point arrives at a later point in time, it will be treated as if it were the beginning of that timeseries, effectively losing the delta between the last data point before the flush and the first data point after the flush. This means that metric reporting intervals should be less than <DoNotTranslate>**5 minutes**</DoNotTranslate> to get the benefit of delta conversion.
As delta conversion is a stateful operation, we must be cognizant of time series that may stop reporting, and we eventually drop their state. If a time series has not reported any new data points for <DoNotTranslate>**5 minutes**</DoNotTranslate>, we will flush the state we have, including computing deltas across any buffered gaps. This means that if a data point arrives at a later point in time, it will be treated as if it were the beginning of that time series, effectively losing the delta between the last data point before the flush and the first data point after the flush. As a result, metric reporting intervals should be less than <DoNotTranslate>**5 minutes**</DoNotTranslate> to get the benefit of delta conversion.

### Special note about cumulative sums [#cumulative-sums]

@@ -73,7 +73,7 @@ The OpenTelemetry SDK allows you to [configure its cardinality limits](https://o

### Cardinality limits [#card-limits]

During translation, we also loosely enforce metric cardinality limits that are based on your metric entitlements as a system protection. Rather than enforcing the limit [per day, as we do with rollups](/docs/data-apis/ingest-apis/metric-api/metric-api-limits-restricted-attributes/#incident-unique-timeseries), this limit is enforced as the number of concurrent timeseries being tracked. Once there are too many concurrent [unique metric timeseries](/docs/data-apis/ingest-apis/metric-api/NRQL-high-cardinality-metrics/#what-why), we will drop new incoming timeseries until an old one ages out (see [Stale data](#stale-data)).
During translation, we also loosely enforce metric cardinality limits that are based on your metric entitlements as a system protection. Rather than enforcing the limit [per day, as we do with rollups](/docs/data-apis/ingest-apis/metric-api/metric-api-limits-restricted-attributes/#incident-unique-timeseries), this limit is enforced as the number of concurrent time series being tracked. Once there are too many concurrent [unique metric time series](/docs/data-apis/ingest-apis/metric-api/NRQL-high-cardinality-metrics/#what-why), we will drop new incoming time series until an old one ages out (see [Stale data](#stale-data)).

### Cumulative metric resets [#cumulative-resets]
