19 changes: 16 additions & 3 deletions docs/configuration.asciidoc
@@ -346,7 +346,7 @@ If your service handles data like this, we advise to only enable this feature wi

|============
| Environment | Django/Flask | Default
-| `ELASTIC_APM_FLUSH_INTERVAL` | `FLUSH_INTERVAL` | `60`
+| `ELASTIC_APM_FLUSH_INTERVAL` | `FLUSH_INTERVAL` | `10`
|============

Interval with which transactions should be sent to the APM server, in seconds.
@@ -374,8 +374,8 @@ Setting an upper limit will prevent overloading the agent and the APM server wit
==== `max_queue_size`

|============
-| Environment | Django/Flask | Default
-| `ELASTIC_APM_MAX_EVENT_QUEUE_LENGTH` | `MAX_QUEUE_SIZE` | `500`
+| Environment | Django/Flask | Default
+| `ELASTIC_APM_MAX_QUEUE_SIZE` | `MAX_QUEUE_SIZE` | `500`
|============

Maximum number of queued transactions before the queue is flushed to the APM server.
@@ -405,6 +405,19 @@ For more information, see <<sanitizing-data, Sanitizing Data>>.
WARNING: We recommend always including the default set of validators if you customize this setting.


[float]
[[config-transaction-sample-rate]]
==== `transaction_sample_rate`

|============
| Environment | Django/Flask | Default
| `ELASTIC_APM_TRANSACTION_SAMPLE_RATE` | `TRANSACTION_SAMPLE_RATE` | `1.0`
|============

By default, the agent will sample every transaction (e.g. request to your service).
To reduce overhead and storage requirements, you can set the sample rate to a value between `0.0` and `1.0`.
We still record overall time and the result for unsampled transactions, but no context information, tags, or spans.
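The sampling decision itself is simple to picture. The following is a minimal sketch of rate-based sampling, not the agent's actual implementation; the function name is made up for illustration:

```python
import random

def is_sampled(sample_rate):
    """Decide whether a transaction should be sampled.

    random.random() returns a float in [0.0, 1.0), so a rate of 1.0
    samples every transaction and a rate of 0.0 samples none.
    """
    return random.random() < sample_rate

# A rate of 1.0 always samples; 0.0 never does.
assert is_sampled(1.0)
assert not is_sampled(0.0)
```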

[float]
[[config-include-paths]]
==== `include_paths`
1 change: 1 addition & 0 deletions docs/index.asciidoc
@@ -39,3 +39,4 @@ include::./sanitizing-data.asciidoc[Sanitizing Data]
include::./run-tests-locally.asciidoc[Run Tests Locally]

include::./api.asciidoc[API documentation]
include::./tuning.asciidoc[Tuning and Overhead considerations]
75 changes: 75 additions & 0 deletions docs/tuning.asciidoc
@@ -0,0 +1,75 @@
[[tuning-and-overhead]]
== Tuning and Overhead considerations

Using an APM solution comes with certain trade-offs, and the Python agent for Elastic APM is no different.
Instrumenting your code, measuring timings, recording context data, and so on all consume resources:

* CPU time
* memory
* bandwidth use
* Elasticsearch storage

We have invested, and continue to invest, a lot of effort to keep the overhead of using Elastic APM as low as possible.
But because every deployment is different, there are some knobs you can turn to adapt it to your specific needs.

[float]
[[tuning-sample-rate]]
=== Transaction Sample Rate

The most straightforward way to reduce the overhead of the agent is to tell it to do less.
If you set the <<config-transaction-sample-rate,`transaction_sample_rate`>> to a value below `1.0`,
the agent will randomly sample only a subset of transactions.
If a transaction is not sampled, the agent has to do a lot less work:
for unsampled transactions, only the transaction name, the overall transaction time, and the result are recorded.

[options="header"]
|============
| Field | Sampled | Unsampled
| Transaction name | yes | yes
| Duration | yes | yes
| Result | yes | yes
| Context | yes | no
| Tags | yes | no
| Spans | yes | no
|============

Reducing the sample rate to a fraction of all transactions can make a huge difference in all four of the mentioned resource types.

[float]
[[tuning-queue]]
=== Transaction Queue

To reduce the load on the APM Server, the agent does not send each transaction as soon as it finishes.
Instead, it queues transactions and flushes the queue from a background thread, either periodically or when the queue reaches its maximum size.

While this reduces the load on the APM Server (and to a certain extent on the agent),
holding on to the transaction data in a queue uses memory.
If you notice that using the Python agent results in a large increase in memory use,
you can use these settings:

* <<config-flush-interval,`flush_interval`>> to reduce the time between queue flushes
* <<config-max-queue-size,`max_queue_size`>> to reduce the maximum size of the queue

The first setting, `flush_interval`, is helpful if you have a sustained high number of transactions.
The second setting, `max_queue_size`, can help if you experience peaks of transactions
(a large amount of transactions in a short period of time).

Keep in mind that reducing the value of either setting will cause the agent to send more HTTP requests to the APM Server,
potentially causing a higher load.
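The flush-on-size-or-interval behavior described above can be sketched as follows. This is a simplified illustration with made-up names, not the agent's actual queue implementation:

```python
import threading
import time

class TransactionQueue:
    """Simplified queue that flushes when it reaches max_queue_size,
    or flush_interval seconds after the last flush, whichever comes first."""

    def __init__(self, send, max_queue_size=500, flush_interval=10):
        self._send = send  # callable that ships a batch to the server
        self._items = []
        self._lock = threading.Lock()
        self.max_queue_size = max_queue_size
        self.flush_interval = flush_interval
        self._last_flush = time.time()

    def add(self, transaction):
        with self._lock:
            self._items.append(transaction)
            full = len(self._items) >= self.max_queue_size
            due = time.time() - self._last_flush >= self.flush_interval
        if full or due:
            self.flush()

    def flush(self):
        with self._lock:
            batch, self._items = self._items, []
            self._last_flush = time.time()
        if batch:
            self._send(batch)
```

Lowering `max_queue_size` caps the memory held by the queue at any moment; lowering `flush_interval` does the same for steady traffic. Either way, the same number of transactions is spread over more, smaller HTTP requests.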


[float]
[[tuning-max-spans]]
=== Spans per transaction

The average number of spans per transaction influences how much time the agent spends in each transaction collecting contextual data for each span,
and how much storage space is needed in Elasticsearch.
In our experience, most typical transactions have well below 100 spans.
In some cases, however, the number of spans can explode:

* long-running transactions
* unoptimized code, e.g. doing hundreds of SQL queries in a loop

To prevent such edge cases from overloading both the agent and the APM Server,
the agent stops recording spans once a limit is reached.
You can configure this limit by changing the <<config-transaction-max-spans,`transaction_max_spans`>> setting.
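A minimal sketch of such a span cap, purely for illustration (the class and attribute names are hypothetical, not the agent's internals):

```python
class Transaction:
    """Records spans up to a configurable cap, then only counts the rest."""

    def __init__(self, max_spans=500):
        self.max_spans = max_spans
        self.spans = []
        self.dropped_spans = 0

    def record_span(self, span):
        if len(self.spans) < self.max_spans:
            self.spans.append(span)
        else:
            # Past the limit: count the span, but drop its details.
            self.dropped_spans += 1

tx = Transaction(max_spans=2)
for name in ("SELECT 1", "SELECT 2", "SELECT 3"):
    tx.record_span(name)
# Only the first two spans keep their details; the third is counted as dropped.
```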
2 changes: 1 addition & 1 deletion elasticapm/conf/__init__.py
@@ -155,7 +155,7 @@ class Config(_ConfigBase):
'elasticapm.processors.sanitize_http_request_querystring',
'elasticapm.processors.sanitize_http_request_body',
])
-flush_interval = _ConfigValue('FLUSH_INTERVAL', type=int, default=60)
+flush_interval = _ConfigValue('FLUSH_INTERVAL', type=int, default=10)
transaction_sample_rate = _ConfigValue('TRANSACTION_SAMPLE_RATE', type=float, default=1.0)
transaction_max_spans = _ConfigValue('TRANSACTION_MAX_SPANS', type=int, default=500)
max_queue_size = _ConfigValue('MAX_QUEUE_SIZE', type=int, default=500)
5 changes: 4 additions & 1 deletion tests/contrib/django/django_tests.py
@@ -1030,7 +1030,10 @@ def test_perf_database_render_no_instrumentation(benchmark, django_elasticapm_cl


@pytest.mark.django_db
-@pytest.mark.parametrize('django_elasticapm_client', [{'_wait_to_first_send': 100}], indirect=True)
+@pytest.mark.parametrize('django_elasticapm_client', [{
+    '_wait_to_first_send': 100,
+    'flush_interval': 100
+}], indirect=True)
def test_perf_transaction_with_collection(benchmark, django_elasticapm_client):
django_elasticapm_client.instrumentation_store.get_all()
with mock.patch("elasticapm.traces.TransactionsStore.should_collect") as should_collect: