-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tools/scylla-sstable]: add built-in scripting support #9679
Milestone
Comments
Closed
avikivity
added a commit
that referenced
this issue
Jan 9, 2023
…énes Introduce a new "script" operation, which loads a script from the specified path, then feeds the mutation fragment stream to it. The script can then extract, process and present information from the sstable as it wishes. For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available. Example: ```lua function new_stats(key) return { partition_key = key, total = 0, partition = 0, static_row = 0, clustering_row = 0, range_tombstone_change = 0, }; end total_stats = new_stats(nil); function inc_stat(stats, field) stats[field] = stats[field] + 1; stats.total = stats.total + 1; total_stats[field] = total_stats[field] + 1; total_stats.total = total_stats.total + 1; end function on_new_sstable(sst) max_partition_stats = new_stats(nil); if sst then current_sst_filename = sst.filename; else current_sst_filename = nil; end end function consume_partition_start(ps) current_partition_stats = new_stats(ps.key); inc_stat(current_partition_stats, "partition"); end function consume_static_row(sr) inc_stat(current_partition_stats, "static_row"); end function consume_clustering_row(cr) inc_stat(current_partition_stats, "clustering_row"); end function consume_range_tombstone_change(crt) inc_stat(current_partition_stats, "range_tombstone_change"); end function consume_partition_end() if current_partition_stats.total > max_partition_stats.total then max_partition_stats = current_partition_stats; end end function on_end_of_sstable() if current_sst_filename then print(string.format("Stats for sstable %s:", current_sst_filename)); else print("Stats for stream:"); end print(string.format("\t%d fragments in %d partitions - %d static rows, %d clustering rows and %d range tombstone changes", total_stats.total, total_stats.partition, total_stats.static_row, total_stats.clustering_row, total_stats.range_tombstone_change)); print(string.format("\tPartition with max number of fragments (%d): %s - %d static rows, %d clustering rows and %d range tombstone changes", max_partition_stats.total, max_partition_stats.partition_key, max_partition_stats.static_row, max_partition_stats.clustering_row, max_partition_stats.range_tombstone_change)); end ``` Running this script wilt yield the following: ``` $ scylla sstable script --script-file fragment-stats.lua --system-schema system_schema.columns /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-1-big-Data.db Stats for sstable /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//me-1-big-Data.db: 397 fragments in 7 partitions - 0 static rows, 362 clustering rows and 28 range tombstone changes Partition with max number of fragments (180): system - 0 static rows, 179 clustering rows and 0 range tombstone changes ``` Fixes: #9679 Closes #11649 * github.com:scylladb/scylladb: tools/scylla-sstable: consume_reader(): improve pause heuristincs test/cql-pytest/test_tools.py: add test for scylla-sstable script tools: add scylla-sstable-scripts directory tools/scylla-sstable: remove custom operation tools/scylla-sstable: add script operation tools/sstable: introduce the Lua sstable consumer dht/i_partitioner.hh: ring_position_ext: add weight() accessor lang/lua: export Scylla <-> lua type conversion methods lang/lua: use correct lib name for string lib lang/lua: fix type in aligned_used_data (meant to be user_data) lang/lua: use lua_State* in Scylla type <-> Lua type conversions tools/sstable_consumer: more consistent method naming tools/scylla-sstable: extract sstable_consumer interface into own header tools/json_writer: add accessor to underlying writer tools/scylla-sstable: fix indentation tools/scylla-sstable: export mutation_fragment_json_writer declaration tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer tools/scylla-sstable: extract json writing logic from json_dumper tools/scylla-sstable: extract json_writer into its own header tools/scylla-sstable: use json_writer::DataKey() to write all keys tools/scylla-types: fix use-after-free on main lambda captures
Not a bug, not backporting. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Although #9678 already enables scripting, dumping the entire content of the sstable can be very slow for large sstables. Add JSON generation + parsing on top of that. It would be much faster if
scylla-sstable
supported loading scripts and feeding data to them directly, allowing them to do anything they want with it.Possible scripting languages:
The text was updated successfully, but these errors were encountered: