Support for file-logging OTLP Metrics/Traces/LogsData data for experimental and research purposes #13626

Closed
jmacd opened this issue Aug 25, 2022 · 9 comments
Labels: closed as inactive, enhancement, exporter/file, good first issue, priority:needed, Stale

Comments

@jmacd
Contributor

jmacd commented Aug 25, 2022

The exporter/fileexporter code could easily be extended to support capturing OTLP data for research purposes, as follows.

With these steps accomplished, it should be possible to set up an OTel collector that records large volumes of data with relatively low overhead. It should be possible to obfuscate telemetry data on the write path, or to re-execute an OpenTelemetry collector that replays those logs and applies obfuscation after the fact. It should also be possible to replay those logs to test OTel processors and exporters. Lastly, it is important that any code used for obfuscating telemetry data for research be widely reviewed by the community: these tools need to be well reviewed and community-owned.
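As a rough illustration of the replay idea, here is a minimal Go sketch that reads captured records back and hands each one to a callback. It assumes the capture holds one JSON-encoded record per line, which is an assumption made for this example rather than something this issue specifies. The names `replayLines` and `replayBytes` are hypothetical and are not part of the collector.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// replayLines reads newline-delimited JSON records from r and passes each
// non-empty record to handle. It returns the number of records handled.
// The slice passed to handle is only valid until the next record is read,
// so a handler that keeps the data must copy it.
func replayLines(r io.Reader, handle func(record []byte) error) (int, error) {
	sc := bufio.NewScanner(r)
	// Telemetry batches can be large; raise the per-line limit above the
	// scanner's 64 KiB default. The 16 MiB cap is an arbitrary choice.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	n := 0
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue // skip blank lines
		}
		if !json.Valid(line) {
			return n, fmt.Errorf("record %d is not valid JSON", n+1)
		}
		if err := handle(line); err != nil {
			return n, err
		}
		n++
	}
	return n, sc.Err()
}

// replayBytes is a convenience wrapper for in-memory captures.
func replayBytes(data []byte, handle func(record []byte) error) (int, error) {
	return replayLines(bytes.NewReader(data), handle)
}

func main() {
	// Two made-up records separated by a blank line.
	captured := []byte("{\"resourceSpans\":[]}\n\n{\"resourceSpans\":[]}\n")
	n, err := replayBytes(captured, func(record []byte) error {
		// A real handler might obfuscate the record or feed it to a
		// processor or exporter under test.
		fmt.Printf("replaying %d bytes\n", len(record))
		return nil
	})
	fmt.Println("records:", n, "error:", err)
}
```

The callback is where an obfuscation step or a processor under test would plug in, so the same loop could serve both the after-the-fact obfuscation and the replay-for-testing cases.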

@yurishkuro
Member

> apply format-preserving encryption to obfuscate the data

not sure why format-preserving would be a requirement

I think the biggest challenge with exporting data for research purposes that I encountered at two large companies is privacy concerns. The guidelines I got were: obfuscated or not, any privacy-sensitive data must not be included in public data sets.

@jmacd
Contributor Author

jmacd commented Aug 25, 2022

Format-preserving helps make compression results comparable.

> obfuscated or not, any privacy-sensitive data must not be included in public data sets.

This is likely true in most places. Even so, it would be useful to have the plugins I described so that a customer could capture data and run diagnostic tools on them while protecting their data appropriately. For example, the benchmarks here (https://github.com/lquerel/otel-arrow-adapter/tree/main/tools) could be adapted to use data files generated in this way, where the obfuscation is just an extra layer of security.

@evan-bradley added the enhancement, exporter/file, and priority:needed labels Sep 1, 2022
@atingchen
Contributor

atingchen commented Sep 4, 2022

Can I take on this issue?

@atingchen
Contributor

@jmacd Could you help review? Thank you.

@jmacd
Contributor Author

jmacd commented Sep 13, 2022

Thank you @atingchen!

@atingchen
Contributor

atingchen commented Oct 8, 2022

Hi @jmacd.
I found that feistel may not obfuscate the data in a format-preserving way.
I tried using it to encrypt the 12-digit number '123123123123'. With format-preserving encryption, I should get another 12-digit number. But what I actually get is `ZW±ªX^²[[®R', which is a string of length 12.
I'm also in contact with its developers about this issue, but haven't heard back yet.
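For illustration, here is a minimal sketch of a byte-oriented Feistel network. It is written from scratch for this example and is not the library's code; the SHA-256 round function, the key, and the round count are all assumptions. A construction like this preserves the length of the input but not its alphabet, so a 12-digit string generally comes back as 12 arbitrary bytes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// roundKeystream derives n pseudo-random bytes from the key, the round
// number, and one half of the block. Hypothetical helper for this sketch.
func roundKeystream(key string, round int, half []byte, n int) []byte {
	out := make([]byte, 0, n+sha256.Size)
	for counter := 0; len(out) < n; counter++ {
		h := sha256.New()
		h.Write([]byte(key))
		h.Write([]byte{byte(round), byte(counter)})
		h.Write(half)
		out = h.Sum(out) // appends the digest to out
	}
	return out[:n]
}

// feistelEncrypt runs the given number of Feistel rounds over data.
// Each round computes (L, R) -> (R, L XOR F(R)).
func feistelEncrypt(data []byte, key string, rounds int) []byte {
	split := len(data) / 2
	left, right := data[:split], data[split:]
	for r := 0; r < rounds; r++ {
		ks := roundKeystream(key, r, right, len(left))
		next := make([]byte, len(left))
		for i := range left {
			next[i] = left[i] ^ ks[i]
		}
		left, right = right, next
	}
	out := make([]byte, 0, len(data))
	out = append(out, left...)
	return append(out, right...)
}

// feistelDecrypt undoes feistelEncrypt by running the rounds in reverse.
func feistelDecrypt(data []byte, key string, rounds int) []byte {
	split := len(data) / 2
	if rounds%2 == 1 {
		// The halves swap every round, so for odd-length input the
		// split point depends on the parity of the round count.
		split = len(data) - split
	}
	left, right := data[:split], data[split:]
	for r := rounds - 1; r >= 0; r-- {
		ks := roundKeystream(key, r, left, len(right))
		prev := make([]byte, len(right))
		for i := range right {
			prev[i] = right[i] ^ ks[i]
		}
		left, right = prev, left
	}
	out := make([]byte, 0, len(data))
	out = append(out, left...)
	return append(out, right...)
}

// allDigits reports whether every byte is an ASCII digit.
func allDigits(b []byte) bool {
	for _, c := range b {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

func main() {
	plain := []byte("123123123123")
	ct := feistelEncrypt(plain, "example-key", 10)
	fmt.Printf("ciphertext: %q (len %d, all digits: %v)\n", ct, len(ct), allDigits(ct))
	fmt.Printf("roundtrip:  %s\n", feistelDecrypt(ct, "example-key", 10))
}
```

Keeping digits as digits would need a cipher defined over the digit alphabet, for example a round function combined by addition modulo 10 per digit instead of XOR on bytes.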

@jmacd
Contributor Author

jmacd commented Nov 30, 2022

Hi @atingchen. Thank you, and sorry for the delay in answering. I agree with your analysis that feistel cannot be applied in a format-preserving way to non-utf-8-formatted data, so numbers won't work. In my prototype work on this topic, I simply did not obfuscate numeric data, since I was reasonably sure in this case that it would not leak sensitive information. Since feistel's obfuscation would change numeric field values, it would also affect studies meant to evaluate compression performance on numeric data, so it would not be "format preserving" in the sense we need for evaluating protocol decisions.

So, I think it would be reasonable for the tools in question to obfuscate only string-value fields in the protocol.
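A minimal sketch of that policy, assuming telemetry attributes are held in a generic `map[string]any` rather than the collector's own data types: string values are passed through an obfuscation function, while numbers, booleans, and map keys are left untouched. The function `obfuscateStrings` is hypothetical, and `rot13` is only a length-preserving stand-in for a real cipher; it offers no actual protection.

```go
package main

import "fmt"

// obfuscateStrings returns a copy of v in which every string value has been
// passed through obfuscate. Maps and slices are walked recursively; all
// other values (numbers, booleans, nil) are returned unchanged. Map keys
// are not obfuscated in this sketch.
func obfuscateStrings(v any, obfuscate func(string) string) any {
	switch x := v.(type) {
	case string:
		return obfuscate(x)
	case map[string]any:
		out := make(map[string]any, len(x))
		for k, val := range x {
			out[k] = obfuscateStrings(val, obfuscate)
		}
		return out
	case []any:
		out := make([]any, len(x))
		for i, val := range x {
			out[i] = obfuscateStrings(val, obfuscate)
		}
		return out
	default:
		return v
	}
}

// rot13 is a placeholder that keeps length and character class intact.
// It is NOT secure and only stands in for a real obfuscation function.
func rot13(s string) string {
	b := []byte(s)
	for i, c := range b {
		switch {
		case c >= 'a' && c <= 'z':
			b[i] = 'a' + (c-'a'+13)%26
		case c >= 'A' && c <= 'Z':
			b[i] = 'A' + (c-'A'+13)%26
		}
	}
	return string(b)
}

func main() {
	// Made-up attribute values for illustration.
	attrs := map[string]any{
		"service.name":     "checkout",
		"http.status_code": 200,
		"duration_ms":      12.5,
		"tags":             []any{"blue", 7},
	}
	fmt.Println(obfuscateStrings(attrs, rot13))
}
```

Because numeric values pass through unchanged, any compression measurements taken on the numeric fields of the obfuscated data stay comparable with the original.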

@github-actions
Contributor

github-actions bot commented Feb 2, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions bot closed this as not planned May 26, 2023