Support for file-logging OTLP Metrics/Traces/LogsData data for experimental and research purposes #13626
Comments
Not sure why format-preserving would be a requirement. I think the biggest challenge with exporting data for research purposes, one I encountered at two large companies, is privacy concerns. The guidelines I got were: obfuscated or not, any privacy-sensitive data must not be included in public data sets.
Format-preserving helps make compression results comparable.
This is likely true in most places. Even so, it would be useful to have the plugins I described so that a customer could capture data and run diagnostic tools on it while protecting that data appropriately. For example, the benchmarks here (https://github.com/lquerel/otel-arrow-adapter/tree/main/tools) could be adapted to use data files generated in this way, where the obfuscation is just an extra layer of security.
Can I take on this issue?
@jmacd Could you help review? Thank you.
Thank you @atingchen!
Hi @jmacd.
Hi @atingchen. Thank you, and sorry for the delay in answering. I agree with your analysis that Feistel cannot be applied in a format-preserving way to data that is not UTF-8 text, so numbers won't work. In my prototype work on this topic, I simply did not obfuscate numeric data, since I was reasonably sure in this case that numeric data would not leak sensitive information. Since Feistel obfuscation would change numeric field values, it would also affect studies meant to evaluate compression performance on numeric data, so it would not be "format preserving" in the sense we need for evaluating protocol decisions. So, I think it would be reasonable for the tools in question to obfuscate only string-valued fields in the protocol.
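To make the "obfuscate only string-valued fields" idea concrete, here is a minimal sketch of a keyed Feistel permutation over lowercase letters. It is not the prototype mentioned above: the alphabet, round count, HMAC-based round function, and the names `obfuscate` and `obfuscate_attributes` are all assumptions chosen for illustration, and a toy like this is not a vetted format-preserving encryption scheme.

```python
import hashlib
import hmac
import string

ALPHABET = string.ascii_lowercase  # assumption: only these characters are permuted
N = len(ALPHABET)
ROUNDS = 8  # assumption: arbitrary even round count for the sketch


def _round_values(key, round_no, half, count):
    """Derive `count` offsets in [0, N) from the other half using HMAC-SHA256."""
    msg = bytes([round_no]) + bytes(half)
    out = []
    counter = 0
    while len(out) < count:
        block = hmac.new(key, msg + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        out.extend(b % N for b in block)  # slight modulo bias, acceptable for a sketch
        counter += 1
    return out[:count]


def _feistel(key, idx, decrypt=False):
    """Alternating Feistel network over a list of alphabet indices."""
    mid = len(idx) // 2
    left, right = idx[:mid], idx[mid:]
    rounds, sign = range(ROUNDS), 1
    if decrypt:
        rounds, sign = reversed(range(ROUNDS)), -1
    for r in rounds:
        if r % 2 == 0:
            # Even rounds modify the left half using the (unchanged) right half.
            f = _round_values(key, r, right, len(left))
            left = [(x + sign * y) % N for x, y in zip(left, f)]
        else:
            # Odd rounds modify the right half using the (unchanged) left half.
            f = _round_values(key, r, left, len(right))
            right = [(x + sign * y) % N for x, y in zip(right, f)]
    return left + right


def obfuscate(key, text, decrypt=False):
    """Permute lowercase letters; every other character keeps its value and position."""
    positions = [i for i, ch in enumerate(text) if ch in ALPHABET]
    idx = [ALPHABET.index(text[i]) for i in positions]
    out = list(text)
    for i, v in zip(positions, _feistel(key, idx, decrypt)):
        out[i] = ALPHABET[v]
    return "".join(out)


def obfuscate_attributes(key, attrs):
    """Obfuscate only str values; numeric values pass through unchanged."""
    return {k: obfuscate(key, v) if isinstance(v, str) else v for k, v in attrs.items()}


if __name__ == "__main__":
    demo = {"http.url": "https://example.com/users/alice", "http.status_code": 200}
    print(obfuscate_attributes(b"demo-key", demo))
```

Because string lengths, digits, and punctuation are preserved and numeric values are untouched, payload sizes stay the same, which is the property that matters for keeping compression studies comparable.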
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
The exporter/fileexporter code could easily be extended to support capturing OTLP data for research purposes, as follows.

- exporter/fileexporter: encode data using a protobuf stream in the OTLP batch format described in "Add proto messages for signals data independent of OTLP protocol" (opentelemetry-proto#332).
- receiver/filereceiver: replay telemetry recorded by the file exporter. (Although this could be done with the existing JSON support, it would be more efficient to use protobuf.)

With these steps accomplished, it should be possible to set up an OTel collector that records large volumes of data with relatively low overhead. It should be possible either to obfuscate telemetry data on the write path or to run a second OpenTelemetry collector that replays those logs and applies obfuscation after the fact. It should also be possible to replay those logs to test OTel processors and exporters. Lastly, it is important that any code used for obfuscating telemetry data for research be widely reviewed by the community; these tools need to be well reviewed and held by the community.
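As a rough illustration of the record and replay steps, the sketch below writes opaque byte payloads to a stream and reads them back. The framing is an assumption: the issue only says "protobuf stream", so the varint length prefix used here (a common convention for delimiting protobuf messages) and the function names are hypothetical. Each payload stands in for one serialized OTLP batch.

```python
import io


def write_varint(stream, value):
    """Write a non-negative integer as a base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            stream.write(bytes([byte]))
            return


def read_varint(stream):
    """Read one varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def write_record(stream, payload):
    """Record path: length prefix followed by the serialized batch."""
    write_varint(stream, len(payload))
    stream.write(payload)


def read_records(stream):
    """Replay path: yield each recorded batch in the order it was written."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated record")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for batch in (b"batch-1", b"batch-2"):  # placeholders for serialized OTLP data
        write_record(buf, batch)
    buf.seek(0)
    for batch in read_records(buf):
        print(len(batch), batch)
```

A replaying receiver could pass each yielded payload through an obfuscation step before handing it to downstream processors, which matches the "apply obfuscation after the fact" path described above.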