Add ability to trace mutation form from each data source during query #11130

Closed
tgrabiec opened this issue Jul 26, 2022 · 6 comments · Fixed by #14347

Comments

@tgrabiec
Contributor

tgrabiec commented Jul 26, 2022

Knowing the mutation form returned by each of the data sources touched by a query would be very helpful in debugging issues related to incorrect query results.

The user wouldn't have to share whole sstables with us, so the process would be easier and faster. Sensitive data could be easily removed. We could add an option to do this automatically, removing cell values and obfuscating keys. The user could keep the obfuscation translation map for reference.
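
To make the redaction idea concrete, here is a minimal sketch in Python. It is not Scylla code: the dict layout of the mutation, the `redact_mutation` helper and the `k_` token format are all assumptions made up for illustration. It drops cell values, replaces keys with opaque tokens, and records the token-to-key mapping so the user can keep it for reference.

```python
import hashlib


def redact_mutation(mutation, translation_map):
    """Return a copy of `mutation` with keys obfuscated and cell values removed.

    `mutation` is a hypothetical dict representation, not a real Scylla type.
    `translation_map` is filled with token -> original key entries.
    """
    def obfuscate(key):
        # Unsalted hash, for illustration only; a real tool would likely want
        # a keyed hash or random tokens so that keys cannot be guessed.
        token = "k_" + hashlib.sha256(key.encode()).hexdigest()[:8]
        translation_map[token] = key
        return token

    return {
        "partition_key": obfuscate(mutation["partition_key"]),
        "rows": [
            {
                "clustering_key": obfuscate(row["clustering_key"]),
                # Keep column names and timestamps, drop the values.
                "cells": {
                    name: {"timestamp": cell["timestamp"]}
                    for name, cell in row["cells"].items()
                },
            }
            for row in mutation["rows"]
        ],
    }


# Made-up example data.
original = {
    "partition_key": "user42",
    "rows": [
        {
            "clustering_key": "2022-07-26",
            "cells": {"email": {"timestamp": 100, "value": "a@example.com"}},
        }
    ],
}
translation_map = {}
redacted = redact_mutation(original, translation_map)
```

The user would send `redacted` and keep `translation_map` locally, so tokens in a bug report can still be mapped back to real keys on their side.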

It could be reported via CQL tracing and enabled with a new query syntax (like "bypass cache"), e.g.:

cqlsh> tracing on
cqlsh> select * from my_table where ... TRACE MUTATIONS 

cc @bhalevy

@bhalevy
Member

bhalevy commented Jul 26, 2022

Printing the mutation as JSON seems like a good idea, so that analyzing it could be automated.
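
As a rough illustration of what automation could look like, the Python sketch below parses a JSON dump and reports which data source holds the newest write for each row. The JSON layout, the source names and the `newest_source_per_row` helper are assumptions for the example only, not a proposed output format.

```python
import json

# Hypothetical per-source mutation fragments for a single query.
fragments = [
    {
        "source": "memtable",
        "partition_key": "pk1",
        "rows": [{"clustering_key": "ck1", "timestamp": 10}],
    },
    {
        "source": "sstable-1",
        "partition_key": "pk1",
        "rows": [
            {"clustering_key": "ck1", "timestamp": 7},
            {"clustering_key": "ck2", "timestamp": 8},
        ],
    },
]

# What the user would attach to a bug report.
dump = json.dumps(fragments, indent=2, sort_keys=True)


def newest_source_per_row(dump_text):
    """For each clustering key, report which source holds the newest write."""
    newest = {}
    for fragment in json.loads(dump_text):
        for row in fragment["rows"]:
            key = row["clustering_key"]
            if key not in newest or row["timestamp"] > newest[key][1]:
                newest[key] = (fragment["source"], row["timestamp"])
    return {key: source for key, (source, _) in newest.items()}


result = newest_source_per_row(dump)
```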

@denesb
Contributor

denesb commented Jul 27, 2022

@avikivity
Member

When a mutation reader creates subordinate readers, we could create subordinate trace_state objects that name the subordinate and its parent. This would recreate the reader tree at runtime without much effort.
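
A toy model of the parent/subordinate idea, written in Python rather than against the real tracing code: the `TraceState` class below, its `make_subordinate` method and the reader names are hypothetical stand-ins, only meant to show how naming each subordinate and its parent is enough to rebuild the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceState:
    """Hypothetical stand-in for a trace state attached to one reader."""
    name: str
    # Excluded from repr/compare to avoid walking the parent/child cycle.
    parent: Optional["TraceState"] = field(default=None, repr=False, compare=False)
    children: List["TraceState"] = field(default_factory=list)

    def make_subordinate(self, name):
        """Create a child trace state that records this one as its parent."""
        child = TraceState(name=name, parent=self)
        self.children.append(child)
        return child

    def render(self, depth=0):
        """Return the reader tree as indented lines, one per trace state."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Made-up reader names, for illustration only.
root = TraceState("combined_reader")
mem = root.make_subordinate("memtable_reader")
sst_set = root.make_subordinate("sstable_set_reader")
sst_set.make_subordinate("sstable_reader_1")

tree = "\n".join(root.render())
```

Printing `tree` gives one indented line per reader, with each subordinate nested under the reader that created it.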

@slivne slivne added this to the 5.x milestone Aug 11, 2022
@denesb
Contributor

denesb commented May 25, 2023

I'm thinking a virtual table would be a better fit than tracing. The potential amount of data is huge, and it can be duplicated across all the different data sources, or even multiple times within a single data source. A virtual table naturally lends itself to delivering large amounts of data, and if done right, advanced filtering/redacting should be easily possible.

@denesb
Contributor

denesb commented May 30, 2023

I opened an RFC PR using the virtual table approach: #14083.

@denesb denesb self-assigned this May 30, 2023
@DoronArazii DoronArazii modified the milestones: 5.x, 5.4 May 30, 2023
@mykaul
Contributor

mykaul commented Jun 11, 2023

@bhalevy - this looks like an important improvement to our ability to debug issues - can we push it forward?
