---
title: Drop data using NerdGraph
tags:
  - Drop rules
  - Data ingest cost
metaDescription: "With the New Relic NerdGraph API, you can drop data that meets certain criteria and have it not count towards ingest and billing."
redirects:
  - /docs/telemetry-data-platform/manage-data/drop-data-using-nerdgraph
  - /docs/drop-events-attributes
  - /docs/drop-events-nerdgraph
  - /docs/drop-data-using-nerdgraph
  - /telemetry-data-platform/get-started/manage-data/drop-data-using-nerdgraph
  - /docs/accounts/accounts/data-management/drop-data-using-nerdgraph
freshnessValidatedDate: never
---

One way to manage your data ingest is to set up data dropping rules. With data dropping you can:

  • Filter out unimportant low-value data
  • Filter out potentially sensitive data

Overview [#overview]

With data dropping rules, you can specify which types of data you don't want saved to your New Relic organization.

Dropped data does not count towards your data ingest and so is not billable. To learn more about what data counts as billable or not, see Data ingest.

Drop rules only apply to data that arrives from the moment you create the rule. They don't delete data that's already been ingested.

Learn more about dropping data in this video (7:09 minutes):

Besides creating drop-data rules, other ways to minimize unwanted data include:

Requirements [#requirements]

The ability to create and edit drop filter rules is linked to the NRQL drop rules capability.

The following data types can be targeted for data dropping:

  • APM-reported events
  • Browser-reported events
  • Mobile-reported events
  • Synthetics-reported events
  • Custom events (like those generated by the APM agent APIs or the Event API)
  • Log data (you can also use the UI to drop data)
  • Distributed tracing spans
  • Default infrastructure monitoring events and infrastructure integrations events. Some caveats:
    • When you drop this data, the raw data is dropped, but the aggregated SystemSample, ProcessSample, NetworkSample and StorageSample events are still available (for more on this, see Data retention). Though still available, this data doesn't count towards ingest and is not billable.
    • Raw infrastructure data is used for alerting, so if you drop that data, you can't alert on it. Because the aggregated data is still available, you may still see that data in charts with time ranges above 59 minutes.
  • Dimensional metrics (the Metric data type). Some caveats:
    • For organizations on our original pricing model: billing is based on product subscription, meaning dropped dimensional metrics remain billable.
    • For metrics generated by the events-to-metrics service: drop rules won't work, but you can stop these metrics or prune their attributes by disabling or reconfiguring the events-to-metrics rule.

Create a drop data rule [#how-to]

Use caution when deciding to drop data. The data you drop can't be recovered. For more details on potential issues, see [Caution notes](#caution).

To drop data, create a NerdGraph-format drop rule that includes:

  • A NRQL string that specifies what data types to drop
  • An action type specifying how to apply the NRQL string

You can form and make the call in the NerdGraph API explorer: one.newrelic.com > Apps > NerdGraph API explorer.

The NRQL query in a drop rule is limited to 4096 characters. If the query exceeds that length, NerdGraph returns an INVALID_NRQL_TOO_LONG error.
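To catch an over-long query before you submit it, a client-side length check can help. The following is a minimal sketch in Python. The helper name `nrql_fits_limit` is hypothetical, and it assumes the limit is inclusive and counted in characters, which this document does not spell out.

```python
MAX_NRQL_LENGTH = 4096  # limit stated in this document


def nrql_fits_limit(nrql: str) -> bool:
    """Return True if the NRQL string is within the documented length limit.

    Assumption: the limit is inclusive and measured in characters.
    """
    return len(nrql) <= MAX_NRQL_LENGTH


short_rule = "SELECT * FROM Event1, Event2"
# A long IN list is one easy way to exceed the limit.
app_names = ", ".join(f"'app{i}'" for i in range(1000))
long_rule = f"SELECT * FROM MyCustomEvent WHERE appName IN ({app_names})"

print(nrql_fits_limit(short_rule))  # True
print(nrql_fits_limit(long_rule))   # False
```

If a query is too long, splitting it into several narrower rules is one option.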

There are two ways to drop data:

  • Drop entire data types or a data subset (with optional filter). This uses the DROP_DATA action type and uses NRQL of the form:

    SELECT * FROM DATA_TYPE_1, DATA_TYPE_2 (WHERE OPTIONAL_FILTER)

    For this type of drop rule, you cannot use anything other than * in the SELECT clause.

  • Drop attributes from data types (with optional filter). This uses the DROP_ATTRIBUTES action type and uses NRQL of the form:

    SELECT dropAttr1, dropAttr2 FROM DATA_TYPE (WHERE OPTIONAL_FILTER)

    For this type of drop rule, you must pass in a non-empty list of raw attribute names.

NRQL restrictions [#restrictions]

Not all NRQL clauses make sense for generating drop rules. You can provide a WHERE clause to select data with specific attributes. Most other clauses, including LIMIT, TIMESERIES, COMPARE WITH, and FACET, cannot be used.

SINCE and UNTIL are not supported in drop rules. If you have time-specific rules (say, drop everything until a time in the future), use WHERE timestamp < (epoch milliseconds in the future). You also can't use SINCE to drop historical data: NRQL drop rules only apply to data reported after the drop rule was created. If you need to delete data that has already been reported, contact your New Relic representative.
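The timestamp filter described above needs a cutoff in epoch milliseconds. Here is a minimal Python sketch for producing one. The helper names `epoch_millis` and `drop_until` are hypothetical, and the event type and cutoff date are placeholders.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def drop_until(event_type: str, cutoff: datetime) -> str:
    """Build DROP_DATA NRQL that matches data timestamped before the cutoff."""
    return f"SELECT * FROM {event_type} WHERE timestamp < {epoch_millis(cutoff)}"


cutoff = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(drop_until("MyCustomEvent", cutoff))
# SELECT * FROM MyCustomEvent WHERE timestamp < 1893456000000
```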

JOIN and subqueries are also not supported. Drop rules are applied to each data point independently, and other data cannot be queried to determine whether a drop rule should be applied.

The two action types have these restrictions:

  • DROP_DATA can use only SELECT *.
  • DROP_ATTRIBUTES requires use of SELECT with "raw" attributes (attributes with no aggregator function applied). This also means you cannot use SELECT *. Additionally, there are some attributes that are integral to their data type and cannot be dropped (such as timestamp on event data). If you include them, registration will fail.
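The restrictions above can be approximated with a rough pre-check before you register a rule. This Python sketch is a heuristic only, not the validation NerdGraph performs. The function name `check_rule` is hypothetical, and the regular expressions only cover the simple query shapes shown in this document.

```python
import re

# Clauses this document lists as unsupported in drop rules.
UNSUPPORTED = ("LIMIT", "TIMESERIES", "COMPARE WITH", "FACET", "SINCE", "UNTIL", "JOIN")
PATTERNS = {
    clause: re.compile(r"\b" + r"\s+".join(clause.split()) + r"\b", re.IGNORECASE)
    for clause in UNSUPPORTED
}


def check_rule(action: str, nrql: str) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    match = re.match(r"\s*SELECT\s+(.+?)\s+FROM\s", nrql, re.IGNORECASE | re.DOTALL)
    if not match:
        return ["expected NRQL of the form SELECT ... FROM ..."]
    select_list = match.group(1).strip()
    problems = []
    if action == "DROP_DATA" and select_list != "*":
        problems.append("DROP_DATA can use only SELECT *")
    if action == "DROP_ATTRIBUTES":
        if "*" in select_list:
            problems.append("DROP_ATTRIBUTES cannot use SELECT *")
        if "(" in select_list:
            problems.append("DROP_ATTRIBUTES needs raw attributes, not functions")
    # Blank out quoted values so keywords inside strings are not flagged.
    without_strings = re.sub(r"'[^']*'", "''", nrql)
    for clause, pattern in PATTERNS.items():
        if pattern.search(without_strings):
            problems.append(f"{clause} is not supported in drop rules")
    return problems


print(check_rule("DROP_DATA", "SELECT * FROM Event1, Event2"))  # []
print(check_rule("DROP_ATTRIBUTES", "SELECT count(userEmail) FROM MyCustomEvent"))
```

A check like this does not know which attributes are integral to a data type, so registration can still fail for reasons it does not detect.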

Example drop rules [#example-rules]

Here are some example drop rules:

Let's say you notice that some event types being sent to New Relic are not important to you. Quickly stopping the source from sending those event types is unrealistic, because it requires changes to agents or API instrumentation. Using a drop rule is an easier way to accomplish the same goal.
Here is an example NerdGraph call that drops two event types: `Event1` and `Event2`.

```graphql
mutation {
    nrqlDropRulesCreate(accountId: YOUR_ACCOUNT_ID, rules: [
        {
            action: DROP_DATA
            nrql: "SELECT * FROM Event1, Event2"
            description: "Drops all data for Event1 and Event2."
        }
    ])
    {
        successes { id }
        failures {
            submitted { nrql }
            error { reason description }
        }
    }
}
```
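If you generate rules from a script, the request text above can be assembled programmatically. The following Python sketch only builds the mutation string and does not send it. The function name `build_create_mutation` is hypothetical, and it assumes that JSON string escaping is acceptable for GraphQL string literals, which holds for ordinary quotes and backslashes.

```python
import json

# Action types named in this document.
VALID_ACTIONS = {"DROP_DATA", "DROP_ATTRIBUTES", "DROP_ATTRIBUTES_FROM_METRIC_AGGREGATES"}


def build_create_mutation(account_id: int, action: str, nrql: str, description: str) -> str:
    """Return mutation text in the same shape as the examples in this document."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action type: {action}")
    # json.dumps adds the surrounding double quotes and escapes embedded ones.
    return (
        "mutation {\n"
        f"    nrqlDropRulesCreate(accountId: {account_id}, rules: [\n"
        "        {\n"
        f"            action: {action}\n"
        f"            nrql: {json.dumps(nrql)}\n"
        f"            description: {json.dumps(description)}\n"
        "        }\n"
        "    ])\n"
        "    {\n"
        "        successes { id }\n"
        "        failures {\n"
        "            submitted { nrql }\n"
        "            error { reason description }\n"
        "        }\n"
        "    }\n"
        "}"
    )


# 1234567 is a placeholder account ID.
print(build_create_mutation(
    1234567, "DROP_DATA", "SELECT * FROM Event1, Event2",
    "Drops all data for Event1 and Event2.",
))
```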

Drop events meeting certain criteria

Let’s say you have a high-volume custom event type that arrives from multiple sources. If you don't find all of that data important, you can use a drop rule. Here is an example of a drop rule that filters out events based on specific criteria.

```graphql
mutation {
    nrqlDropRulesCreate(accountId: YOUR_ACCOUNT_ID, rules: [
        {
            action: DROP_DATA
            nrql: "SELECT * FROM MyCustomEvent WHERE appName='LoadGeneratingApp' AND environment='development'"
            description: "Drops all data for MyCustomEvent that comes from the LoadGeneratingApp in the dev environment, because there is too much and we don’t look at it."
        }
    ])
    {
        successes { id }
        failures {
            submitted { nrql }
            error { reason description }
        }
    }
}
```

Drop sensitive attributes while maintaining the rest of the data

Let's say you noticed an event has attributes that contain Personally Identifiable Information (PII). You are working to update your services to stop sending the data, but until then, you need to cease storing further PII in New Relic. Although you could drop all of the data as it comes in the door with a `DROP_DATA` rule, the rest of the data still provides value. Therefore, you can register a drop rule to remove only the offending PII from your data:

```graphql
mutation {
    nrqlDropRulesCreate(accountId: YOUR_ACCOUNT_ID, rules: [
        {
            action: DROP_ATTRIBUTES
            nrql: "SELECT userEmail, userName FROM MyCustomEvent"
            description: "Removes the user name and email fields from MyCustomEvent"
        }
    ])
    {
        successes { id }
        failures {
            submitted { nrql }
            error { reason description }
        }
    }
}
```
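Each create request in these examples asks for `successes` and `failures`, so it is worth checking both in the response. This Python sketch assumes the standard GraphQL response envelope with the result under `data.nrqlDropRulesCreate`. The function name `summarize_create_response` is hypothetical, and the sample values, including the rule ID and the error description, are made up for illustration.

```python
def summarize_create_response(response: dict) -> tuple[list[str], list[str]]:
    """Split a create response into created rule IDs and readable error lines."""
    result = response["data"]["nrqlDropRulesCreate"]
    created_ids = [item["id"] for item in result.get("successes") or []]
    errors = [
        f'{item["submitted"]["nrql"]}: {item["error"]["reason"]} ({item["error"]["description"]})'
        for item in result.get("failures") or []
    ]
    return created_ids, errors


# Hypothetical response body, shaped after the fields requested in the examples.
sample = {
    "data": {
        "nrqlDropRulesCreate": {
            "successes": [{"id": "1234"}],
            "failures": [
                {
                    "submitted": {"nrql": "SELECT userEmail FROM MyCustomEvent ..."},
                    "error": {
                        "reason": "INVALID_NRQL_TOO_LONG",
                        "description": "hypothetical description",
                    },
                }
            ],
        }
    }
}

created, errors = summarize_create_response(sample)
print(created)
print(errors)
```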

Verify your drop rule works [#verify]

After you create a drop rule, verify that it is working as expected. The rule should take effect quickly after a successful registration, so try running a TIMESERIES version of the query you registered to see that the data drops off.

<table>
  <thead>
    <tr>
      <th>
        Drop rule type
      </th>

      <th>
        NRQL
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        `DROP_DATA`
      </td>

      <td>
        <DNT>**Drop rule NRQL:**</DNT>

        ```sql
        SELECT * FROM MyEvent WHERE foo = bar
        ```

        <DNT>**Validation NRQL:**</DNT>

        ```sql
        SELECT count(*) FROM MyEvent WHERE foo = bar TIMESERIES
        ```

        This should drop to 0. To verify that it did not affect anything else, invert the `WHERE` clause.
      </td>
    </tr>

    <tr>
      <td>
        `DROP_ATTRIBUTES`
      </td>

      <td>
        <DNT>**Drop rule NRQL:**</DNT>

        ```sql
        SELECT dropAttr1, dropAttr2 FROM MyEvent WHERE foo = bar
        ```

        <DNT>**Validation NRQL:**</DNT>

        ```sql
        SELECT count(dropAttr1), count(dropAttr2) FROM MyEvent WHERE foo = bar TIMESERIES
        ```

        Both lines should drop to 0. To verify that it did not affect events that contained these attributes and still should, invert the `WHERE` clause.
      </td>
    </tr>
  </tbody>
</table>
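The validation queries follow mechanically from the drop rule NRQL, so you can derive them in a script. This Python sketch is one way to do that for the simple query shapes used in this section. The function name `validation_nrql` is hypothetical, and the parsing is deliberately naive.

```python
import re


def validation_nrql(action: str, rule_nrql: str) -> str:
    """Turn drop rule NRQL into a TIMESERIES count query for verification."""
    match = re.match(r"\s*SELECT\s+(.+?)\s+(FROM\s.+)", rule_nrql, re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("expected NRQL of the form SELECT ... FROM ...")
    select_list, rest = match.group(1), match.group(2).strip()
    if action == "DROP_DATA":
        counts = "count(*)"
    else:
        # One count per dropped attribute, so each gets its own chart line.
        attributes = [name.strip() for name in select_list.split(",")]
        counts = ", ".join(f"count({name})" for name in attributes)
    return f"SELECT {counts} {rest} TIMESERIES"


print(validation_nrql("DROP_DATA", "SELECT * FROM MyEvent WHERE foo = bar"))
# SELECT count(*) FROM MyEvent WHERE foo = bar TIMESERIES
print(validation_nrql("DROP_ATTRIBUTES", "SELECT dropAttr1, dropAttr2 FROM MyEvent WHERE foo = bar"))
# SELECT count(dropAttr1), count(dropAttr2) FROM MyEvent WHERE foo = bar TIMESERIES
```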

View rules [#view]

Here is an example NerdGraph call that returns the drop rules set on an account:

{
    actor {
        account(id: YOUR_ACCOUNT_ID) {
            nrqlDropRules {
                list {
                    rules {
                        id
                        nrql
                        accountId
                        action
                        createdBy
                        createdAt
                        description
                    }
                    error { reason description }
                }
            }
        }
    }
}
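Once you have the list response, you may want to pick out specific rules, for example to review or delete them. This Python sketch assumes the standard GraphQL response envelope, with the rules nested under the same path as the query above. The function name `rule_ids_for` is hypothetical, the sample data is made up, and treating rule IDs as strings is an assumption based on the delete example in this document.

```python
def rule_ids_for(response: dict, action: str) -> list[str]:
    """Collect the IDs of listed rules that use the given action type."""
    listing = response["data"]["actor"]["account"]["nrqlDropRules"]["list"]
    return [rule["id"] for rule in listing.get("rules") or [] if rule["action"] == action]


# Hypothetical response body with two rules.
sample = {"data": {"actor": {"account": {"nrqlDropRules": {"list": {
    "rules": [
        {"id": "48", "action": "DROP_DATA", "nrql": "SELECT * FROM Event1, Event2"},
        {"id": "98", "action": "DROP_ATTRIBUTES", "nrql": "SELECT userEmail, userName FROM MyCustomEvent"},
    ],
    "error": None,
}}}}}}

print(rule_ids_for(sample, "DROP_DATA"))  # ['48']
```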

Delete drop rules [#delete]

Here is an example NerdGraph call deleting two specific drop rules:

mutation {
    nrqlDropRulesDelete(accountId: YOUR_ACCOUNT_ID, ruleIds: ["48", "98"]) {
        successes {
            id
            nrql
            accountId
            action
            description
        }
        failures {
            error { reason description }
            submitted { ruleId accountId }
        }
    }
}
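The delete request can be assembled from a list of rule IDs in the same way. This Python sketch only builds the mutation text and does not send it. The function name `build_delete_mutation` is hypothetical, and the account ID is a placeholder.

```python
import json


def build_delete_mutation(account_id: int, rule_ids: list[str]) -> str:
    """Return delete mutation text in the same shape as the example above."""
    if not rule_ids:
        raise ValueError("provide at least one rule ID")
    # json.dumps renders the IDs as a quoted list, such as ["48", "98"].
    ids_literal = json.dumps([str(rule_id) for rule_id in rule_ids])
    return (
        "mutation {\n"
        f"    nrqlDropRulesDelete(accountId: {account_id}, ruleIds: {ids_literal}) {{\n"
        "        successes { id nrql accountId action description }\n"
        "        failures {\n"
        "            error { reason description }\n"
        "            submitted { ruleId accountId }\n"
        "        }\n"
        "    }\n"
        "}"
    )


print(build_delete_mutation(1234567, ["48", "98"]))
```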

Audit drop rule history [#history]

To see who created and deleted drop rules, query your account audit logs. The list endpoint also includes the user ID of the person who created the rule.

Cautions when dropping data [#caution]

When creating drop rules, you are responsible for ensuring that the rules accurately identify and discard the data that meets the conditions that you have established. You are also responsible for monitoring the rule, as well as the data you disclose to New Relic.

New Relic cannot guarantee that this functionality will completely resolve data disclosure concerns you may have. New Relic does not review or monitor how effective the rules you develop are.

Creating rules about sensitive data can leak information about what kinds of data you maintain, including the format of your data or systems (for example, through referencing email addresses or specific credit card numbers). Rules you create, including all information in those rules, can be viewed and edited by any user with the relevant role-based access control permissions.

Only new data will be dropped. Existing data cannot be edited or deleted.

Drop attributes on dimensional metric rollups

Dimensional metrics aggregate metrics into rollups for long term storage and as a way to optimize longer term queries. Metric cardinality limits are applied to this data.

You can use this feature to decide which attributes you don't need for long term storage and query, but would like to maintain for real time queries.

For example, adding containerId as an attribute can be useful for live troubleshooting or recent analysis, but may not be needed when querying over longer periods of time for larger trends. Because an attribute like containerId can have so many unique values, it can quickly drive you toward your metric cardinality limits. When you hit those limits, the synthesis of rollups stops for the remainder of that UTC day.

This feature also allows you to keep high cardinality attributes on the raw data and drop them from rollups, which gives you more control over how quickly you approach your cardinality limits.
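To see why a single attribute matters so much, consider a toy model in which cardinality is the number of distinct attribute combinations. The Python sketch below uses made-up data points and a hypothetical helper, `unique_series`. It illustrates the idea only and does not reproduce how New Relic counts cardinality.

```python
def unique_series(points: list[dict], dropped: frozenset = frozenset()) -> int:
    """Count distinct attribute combinations, ignoring dropped attributes."""
    return len({
        tuple(sorted((key, value) for key, value in point.items() if key not in dropped))
        for point in points
    })


# 100 made-up data points: 2 hosts, but a different containerId on every point.
points = [
    {"metricName": "some.metric", "host": f"host-{i % 2}", "containerId": f"c-{i}"}
    for i in range(100)
]

print(unique_series(points))                              # 100
print(unique_series(points, frozenset({"containerId"})))  # 2
```

In this model, removing containerId from the rollups collapses 100 combinations into 2, while the raw data points keep the attribute.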

Usage

Drop attributes from dimensional metrics rollups (with optional filter). This uses the DROP_ATTRIBUTES_FROM_METRIC_AGGREGATES action type and uses NRQL of the form:

SELECT dropAttr1, dropAttr2 FROM Metric (WHERE OPTIONAL_FILTER)

Here is an example NerdGraph request:

mutation {
    nrqlDropRulesCreate(accountId: YOUR_ACCOUNT_ID, rules: [
        {
            action: DROP_ATTRIBUTES_FROM_METRIC_AGGREGATES
            nrql: "SELECT containerId FROM Metric WHERE metricName = 'some.metric'"
            description: "Removes the containerId from long-term queries."
        }
    ])
    {
        successes { id }
        failures {
            submitted { nrql }
            error { reason description }
        }
    }
}

To verify it's working, wait 3 to 5 minutes for the rule to be picked up and for aggregate data to be generated. Then, assuming the example NRQL above is your drop rule, run the following queries:

SELECT count(containerId) FROM Metric WHERE metricName = 'some.metric' TIMESERIES SINCE 2 hours ago
SELECT count(containerId) FROM Metric WHERE metricName = 'some.metric' TIMESERIES SINCE 2 hours ago RAW

The first query retrieves metric rollups and should drop to 0, because containerId has been dropped by the new drop rule. The second query uses the RAW keyword to retrieve raw metric data and should hold steady, because raw data is not affected by the new drop rule. For more information on how to see the impact this will have on your cardinality, check out Understand and query high cardinality metrics.

Restrictions

All restrictions that apply to DROP_ATTRIBUTES also apply to DROP_ATTRIBUTES_FROM_METRIC_AGGREGATES, with the additional restriction that you can only target the Metric data type. These rules also do not work on Metric queries targeting data created by an events-to-metrics rule or on Metric queries targeting timeslice data.

Learn more

Recommendations for learning more: