Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upWrite support for tables implemented from extensions #4094
Conversation
|
@alessandrogario has updated the pull request. |
|
This is really interesting, what potential use cases do you envision for this sort of thing? |
|
I've marked a few things as what I think are API Changes which might break other people's building of things that use the osquery SDK (at least I think that's what would happen). This is not a huge problem but it is something we want to make sure we're not doing flippantly. Overall I like this a lot, this is going to be a super cool feature to have and will provide a lot of value. Thanks! |
| } | ||
|
|
||
| // We may have to update our rowid | ||
| std::string new_rowid; |
obelisk
Feb 1, 2018
Contributor
Nit. Could use a ternary here.
Nit. Could use a ternary here.
| /// Callback for DELETE queries | ||
| QueryData delete_(QueryContext& context, const PluginRequest& request) { | ||
| auto rowid = request.at("id"); | ||
| if (id_to_internal_id.count(rowid) == 0) { |
obelisk
Feb 1, 2018
Contributor
Prefer find. Then you will have a pointer to the item to use below.
Prefer find. Then you will have a pointer to the item to use below.
alessandrogario
Feb 1, 2018
Author
Contributor
Yeah, you are right! I'm not sure why but I ended up matching some other code I found in the other files without thinking! Going to fix it right now
Yeah, you are right! I'm not sure why but I ended up matching some other code I found in the other files without thinking! Going to fix it right now
| // Callback for UPDATE queries | ||
| QueryData update(QueryContext& context, const PluginRequest& request) { | ||
| std::string current_rowid = request.at("id"); | ||
| if (id_to_internal_id.count(current_rowid) == 0) { |
obelisk
Feb 1, 2018
Contributor
Prefer find.
Prefer find.
| RowData user_row_data; | ||
| auto column_index = 0U; | ||
|
|
||
| for (std::string value; std::getline(stream, value, ';');) { |
obelisk
Feb 1, 2018
Contributor
Much of this logic is duplicated in the insert function. Is there a way we can utilize that so that if/when this code is updated there is a smaller chance of error (updating one but not the other)?
Much of this logic is duplicated in the insert function. Is there a way we can utilize that so that if/when this code is updated there is a smaller chance of error (updating one but not the other)?
|
|
||
| /// Callback for DELETE queries | ||
| QueryData delete_(QueryContext& context, const PluginRequest& request) { | ||
| auto rowid = request.at("id"); |
obelisk
Feb 1, 2018
Contributor
Can be const &?
Can be const &?
| sqlite3_module* module = | ||
| tables::sqlite::getVirtualTableModule(name, is_extension); | ||
| if (module == nullptr) { | ||
| auto error_message = |
obelisk
Feb 1, 2018
Contributor
I would just do this in the VLOG(1). In my opinion, the only time that returning an error message this verbose in a Status is useful is if you are checking it in code, or are going to print it out. Since you've already printed it out, you probably don't need to return it and Status(1) should suffice.
cc: @muffins Do you agree with this as a design decision? Please tell me I'm wrong if you think so.
I would just do this in the VLOG(1). In my opinion, the only time that returning an error message this verbose in a Status is useful is if you are checking it in code, or are going to print it out. Since you've already printed it out, you probably don't need to return it and Status(1) should suffice.
cc: @muffins Do you agree with this as a design decision? Please tell me I'm wrong if you think so.
| @@ -752,7 +777,7 @@ class TablePlugin : public Plugin { | |||
|
|
|||
| protected: | |||
| /// An SQL table containing the table definition/syntax. | |||
| std::string columnDefinition() const; | |||
| std::string columnDefinition(bool is_extension) const; | |||
obelisk
Feb 1, 2018
Contributor
This change could break other people who are compiling extensions against this header correct? Could we provide default values instead?
This change could break other people who are compiling extensions against this header correct? Could we provide default values instead?
alessandrogario
Feb 1, 2018
Author
Contributor
The columnDefinition method was defined several times (as a method and as a function) both overloaded and with default parameters, and this made it hard to read because you never knew where the call was going when passing some parameters (and this is why I didn't provide a default value here).
Isn't this method only used by osquery and not by the extension developers? (I'm asking because I didn't have to change the other example extensions after this patch).
The columnDefinition method was defined several times (as a method and as a function) both overloaded and with default parameters, and this made it hard to read because you never knew where the call was going when passing some parameters (and this is why I didn't provide a default value here).
Isn't this method only used by osquery and not by the extension developers? (I'm asking because I didn't have to change the other example extensions after this patch).
obelisk
Feb 1, 2018
Contributor
It is possible that other people are using them because they are in our public headers. This (and the changes I marked API Change) could in theory break anyone who is using the SDK. If this is the only way to do it, or this is a great improvement in the API then we can merge this but we shouldn't do so without evaluating the possibility of breaking some third party extensions.
It is possible that other people are using them because they are in our public headers. This (and the changes I marked API Change) could in theory break anyone who is using the SDK. If this is the only way to do it, or this is a great improvement in the API then we can merge this but we shouldn't do so without evaluating the possibility of breaking some third party extensions.
| @@ -846,10 +871,6 @@ class TablePlugin : public Plugin { | |||
| static void setRequestFromContext(const QueryContext& context, | |||
| PluginRequest& request); | |||
|
|
|||
| /// Helper data structure transformation methods. | |||
| static void setContextFromRequest(const PluginRequest& request, | |||
obelisk
Feb 1, 2018
Contributor
API Change
API Change
alessandrogario
Feb 1, 2018
Author
Contributor
I moved it because it wasn't referenced anywhere! Thanks for spotting this!
I moved it because it wasn't referenced anywhere! Thanks for spotting this!
| FRIEND_TEST(VirtualTableTests, test_tableplugin_statement); | ||
| FRIEND_TEST(VirtualTableTests, test_indexing_costs); | ||
| FRIEND_TEST(VirtualTableTests, test_table_results_cache); | ||
| FRIEND_TEST(VirtualTableTests, test_yield_generator); | ||
| }; | ||
|
|
||
| /// Helper method to generate the virtual table CREATE statement. | ||
| std::string columnDefinition(const TableColumns& columns); | ||
| std::string columnDefinition(const TableColumns& columns, bool is_extension); |
obelisk
Feb 1, 2018
Contributor
API Change.
API Change.
|
|
||
| /// Helper method to generate the virtual table CREATE statement. | ||
| std::string columnDefinition(const PluginResponse& response, | ||
| bool aliases = false); | ||
| bool aliases, |
obelisk
Feb 1, 2018
Contributor
API Change.
API Change.
|
@alessandrogario has updated the pull request. View: changes |
|
@alessandrogario has updated the pull request. |
1 similar comment
|
@alessandrogario has updated the pull request. |
| row_data.rowid = new_rowid; | ||
|
|
||
| // Build the primary key and make sure it's unique | ||
| std::string internal_rowid = |
obelisk
Feb 1, 2018
Contributor
This should be abstracted as a function that generates the primary key of the table. Currently adding a new column to the primary key is going to require changes to places and will be error prone (even if you use all the right columns but in the wrong order).
This should be abstracted as a function that generates the primary key of the table. Currently adding a new column to the primary key is going to require changes to places and will be error prone (even if you use all the right columns but in the wrong order).
| } | ||
|
|
||
| // Build the primary key and make sure it's unique | ||
| std::string new_internal_id = |
obelisk
Feb 1, 2018
Contributor
Call new primary key function.
Call new primary key function.
|
|
||
| auto current_internal_id = internal_id_it->second; | ||
|
|
||
| // This variable contains a value for each column in the table, separated by |
obelisk
Feb 1, 2018
Contributor
Is this still true?
Is this still true?
| auto new_rowid_it = request.find("new_id"); | ||
|
|
||
| std::string new_rowid; | ||
| if (new_rowid_it != request.end()) { |
obelisk
Feb 1, 2018
Contributor
This could be a ternary.
std::string new_rowid = new_rowid_it != request.end() ? new_rowid_it->second : current_rowid;
This could be a ternary.
std::string new_rowid = new_rowid_it != request.end() ? new_rowid_it->second : current_rowid;
|
|
||
| /// Callback for INSERT queries | ||
| QueryData insert(QueryContext& context, const PluginRequest& request) { | ||
| // This variable contains a value for each column in the table, separated by |
obelisk
Feb 1, 2018
Contributor
Is this still true?
Is this still true?
| QueryData results; | ||
|
|
||
| for (const auto& row_descriptor : row_map) { | ||
| const auto row_data = row_descriptor.second; |
obelisk
Feb 1, 2018
Contributor
I think this can be a reference
I think this can be a reference
|
@alessandrogario has updated the pull request. |
|
@alessandrogario has updated the pull request. View: changes |
|
@alessandrogario has updated the pull request. View: changes |
| response = update(context, request); | ||
|
|
||
| } else if (action == "columns") { | ||
| static_cast<void>(request); |
obelisk
Feb 1, 2018
Contributor
What does this do?
What does this do?
alessandrogario
Feb 1, 2018
Author
Contributor
This is just to make static analysis (and warnings with -Wunused) happy; I kind of keep adding it due to habits for other projects! I can see it's not applicable here, I'll remove it!
This is just to make static analysis (and warnings with -Wunused) happy; I kind of keep adding it due to habits for other projects! I can see it's not applicable here, I'll remove it!
| class WritableTable : public TablePlugin { | ||
| private: | ||
| /// Contents for a single row | ||
| struct RowData final { |
obelisk
Feb 1, 2018
Contributor
Can you elaborate on why you chose to use a struct here instead of an unordered_map? It seems to me that if these were maps you could just return them in your gen function instead of copying the values to the Row object. Maybe you can just use Row objects as well? Another upside is that most of your int conversions will go away.
The only draw back I see is not being able to store types in the their native types (they all become strings).
Let me know if I missed something obvious about your implementation :p
Can you elaborate on why you chose to use a struct here instead of an unordered_map? It seems to me that if these were maps you could just return them in your gen function instead of copying the values to the Row object. Maybe you can just use Row objects as well? Another upside is that most of your int conversions will go away.
The only draw back I see is not being able to store types in the their native types (they all become strings).
Let me know if I missed something obvious about your implementation :p
alessandrogario
Feb 1, 2018
Author
Contributor
I couldn't use strings because the data is now serialized using rapidjson, automatically matching the type of each column
I couldn't use strings because the data is now serialized using rapidjson, automatically matching the type of each column
| } | ||
|
|
||
| /// Expands a value list returned by osquery into a RowData structure | ||
| bool getRowData(RowData& row_data, const std::string& value_list) const { |
obelisk
Feb 1, 2018
Contributor
This should return a status
This should return a status
| rapidjson::Document document; | ||
| document.Parse(value_list); | ||
| if (document.HasParseError() || !document.IsArray()) { | ||
| VLOG(1) << "Invalid format"; |
obelisk
Feb 1, 2018
Contributor
Put these into the message of the status
Put these into the message of the status
| } | ||
|
|
||
| if (document.Size() != 2U) { | ||
| VLOG(1) << "Wrong column count."; |
obelisk
Feb 1, 2018
Contributor
As above.
As above.
| row_data.text = document[0].GetString(); | ||
| } | ||
|
|
||
| if (document[0].IsNull()) { |
obelisk
Feb 1, 2018
Contributor
Nit. Could be ternary.
Nit. Could be ternary.
|
ok to test |
|
@alessandrogario has updated the pull request. View: changes |
|
Hi @alessandrogario, why is it you say "Write support for core/built-in tables is NOT implemented, and will not work."? |
|
@fmanco Building in tables won't work because the code is generated for those tables early in the build process without the insert method set. We would have to change a lot more code and create some more context to support this in core tables. |
|
@alessandrogario has updated the pull request. View: changes |
| } | ||
|
|
||
| /// Saves the given row | ||
| Status saveRow(const Row& row, PrimaryKey primary_key = std::string()) { |
obelisk
Feb 5, 2018
Contributor
I would think that saveRow should check the primary key doesn't already exist, otherwise if you add saveRow in another place, you need to duplicate that logic.
I would think that saveRow should check the primary key doesn't already exist, otherwise if you add saveRow in another place, you need to duplicate that logic.
alessandrogario
Feb 5, 2018
Author
Contributor
I need to keep them separate so that the statements can be blocked earlier (i.e. without modifying the current data or calling the rowid generator) in case of a constraint failure. If I do this check inside the saveRow call, then I would have to undo changes
I need to keep them separate so that the statements can be blocked earlier (i.e. without modifying the current data or calling the rowid generator) in case of a constraint failure. If I do this check inside the saveRow call, then I would have to undo changes
|
|
||
| /// Expands a value list returned by osquery into a Row (without the rowid | ||
| /// column) | ||
| Status getRowData(Row& row, const std::string& value_list) const { |
obelisk
Feb 5, 2018
Contributor
Nit. We are no longer using a value list, we should rename this to JSON because that’s what it is.
Nit. We are no longer using a value list, we should rename this to JSON because that’s what it is.
alessandrogario
Feb 5, 2018
Author
Contributor
I kept value_list because it is more descriptive than just 'json' (it is in fact a list of values, regardless of the format). Maybe we can use json_value_array
I kept value_list because it is more descriptive than just 'json' (it is in fact a list of values, regardless of the format). Maybe we can use json_value_array
|
|
||
| row["text"] = document[0].IsNull() ? "" : document[0].GetString(); | ||
|
|
||
| auto value = document[1].IsNull() ? 0 : document[1].GetInt(); |
obelisk
Feb 5, 2018
Contributor
Combine these two lines so we can get rid of the value intermediate variable.
Combine these two lines so we can get rid of the value intermediate variable.
alessandrogario
Feb 5, 2018
Author
Contributor
Duh! Was trying to keep short lines :D Will fix right now!
Duh! Was trying to keep short lines :D Will fix right now!
|
|
||
| // Finally, save the row; also pass the primary key we calculated so that | ||
| // the function doesn't have to compute it again | ||
| status = saveRow(row, primary_key); |
obelisk
Feb 5, 2018
Contributor
If we fold the primary key check logic in here we can combine the error checking here and on line 152.
If we fold the primary key check logic in here we can combine the error checking here and on line 152.
| std::vector<std::string> extension_table_list; | ||
| std::mutex extension_table_list_mutex; | ||
|
|
||
| bool isExtensionTable(const std::string& table_name) { |
obelisk
Feb 5, 2018
Contributor
Do we need a new array to keep track of this or can we use the registry?
Do we need a new array to keep track of this or can we use the registry?
alessandrogario
Feb 5, 2018
Author
Contributor
I tried to use the registry many times, but it doesn't look like it can be used reliably while it is initializing the tables
I tried to use the registry many times, but it doesn't look like it can be used reliably while it is initializing the tables
| @@ -178,13 +488,20 @@ int xCreate(sqlite3* db, | |||
| return SQLITE_ERROR; | |||
| } | |||
|
|
|||
| // Tables implemented from extensions can be made read/write if they overwrite | |||
obelisk
Feb 5, 2018
Contributor
S/overwrite/implement
S/overwrite/implement
|
|
||
| struct sqlite3_module* getVirtualTableModule(const std::string& table_name, | ||
| bool extension) { | ||
| static std::unordered_map<std::string, struct sqlite3_module> modules; |
obelisk
Feb 5, 2018
Contributor
I don't like this static definition, I think it makes the code messier.
cc @muffins Thoughts on this?
I don't like this static definition, I think it makes the code messier.
cc @muffins Thoughts on this?
alessandrogario
Feb 5, 2018
Author
Contributor
This is done to actually avoid making the code messier (because we are reducing the scope as much as possible since it is not needed outside the dedicated interface function). This is also thread safe (as long as we compile with C++11 or better).
I can move this outside and remove the static if you prefer
This is done to actually avoid making the code messier (because we are reducing the scope as much as possible since it is not needed outside the dedicated interface function). This is also thread safe (as long as we compile with C++11 or better).
I can move this outside and remove the static if you prefer
obelisk
Feb 5, 2018
Contributor
Makes sense :)
Makes sense :)
uptycs-nishant
Feb 5, 2018
Contributor
Hi @alessandrogario I believe initialization 'static std::unordered_map<std::string, struct sqlite3_module> modules;' is thread safe but 'modules[table_name] = {};' is not thead safe
I.e. initialization of a function local static object is thread safe not the operations on them.
Hi @alessandrogario I believe initialization 'static std::unordered_map<std::string, struct sqlite3_module> modules;' is thread safe but 'modules[table_name] = {};' is not thead safe
I.e. initialization of a function local static object is thread safe not the operations on them.
alessandrogario
Feb 7, 2018
•
Author
Contributor
Yes! I used the wrong terms (I meant that this construct is useful and use for singletons for example); initialization is performed on a single thread (you can see the original code doesn't protect access to this variable)
Yes! I used the wrong terms (I meant that this construct is useful and use for singletons for example); initialization is performed on a single thread (you can see the original code doesn't protect access to this variable)
uptycs-nishant
Feb 9, 2018
Contributor
In the present code usage of variable 'module' is thread safe since -
- It is initialized as static variable.
- After initialization it is being used as read-only.
While the code you are proposing 'modules' variable is not read-only.
In the present code usage of variable 'module' is thread safe since -
- It is initialized as static variable.
- After initialization it is being used as read-only.
While the code you are proposing 'modules' variable is not read-only.
alessandrogario
Feb 15, 2018
•
Author
Contributor
You are right it is not accessed in read only, but I was under the assumption that initialization was single threaded.
EDIT: Taking a look at it again, and it does seem that everything is performed in a unique call from a single thread, so it doesn't matter whether you lock it or not. I will move the function call under the attachLock call just to be sure. What do you think?
Thanks!
You are right it is not accessed in read only, but I was under the assumption that initialization was single threaded.
EDIT: Taking a look at it again, and it does seem that everything is performed in a unique call from a single thread, so it doesn't matter whether you lock it or not. I will move the function call under the attachLock call just to be sure. What do you think?
Thanks!
uptycs-nishant
Feb 19, 2018
Contributor
"it does seem that everything is performed in a unique call from a single thread, so it doesn't matter whether you lock it or not."
I observed that the code is written with keeping thread safety in mind. Until now access to module object was being done in thread safe manner since it is being accessed as read-only. At present only single thread of control is accessing this code but it may not be the case in future.
I believe attachlock will not be enough since it is synchronizing access to instance but modules object can be accessed across the instances. Please correct me if I am wrong.
"it does seem that everything is performed in a unique call from a single thread, so it doesn't matter whether you lock it or not."
I observed that the code is written with keeping thread safety in mind. Until now access to module object was being done in thread safe manner since it is being accessed as read-only. At present only single thread of control is accessing this code but it may not be the case in future.
I believe attachlock will not be enough since it is synchronizing access to instance but modules object can be accessed across the instances. Please correct me if I am wrong.
alessandrogario
Feb 23, 2018
Author
Contributor
You may be right! I've ended up reverting the last commit, and adding a mutex. Thanks!
You may be right! I've ended up reverting the last commit, and adding a mutex. Thanks!
|
@alessandrogario has updated the pull request. View: changes |
|
I'm not really sure why osqueryer commented that the test has failed; it is passing on my virtual machine, and it is not shown as failing in the check list at the bottom of the page! |
|
@alessandrogario has updated the pull request. |
|
@alessandrogario has updated the pull request. |
|
@alessandrogario has updated the pull request. View: changes |
|
@alessandrogario has updated the pull request. |
|
Ok, I left myself some pointers for a larger review later. I think this approach works. I want to be extra-sure about the keys (their names) used in serialization. |
| @@ -192,12 +192,16 @@ Status SQLiteSQLPlugin::attach(const std::string& name) { | |||
| return status; | |||
| } | |||
|
|
|||
| auto statement = columnDefinition(response); | |||
| bool is_extension = true; | |||
| auto statement = columnDefinition(response, false, is_extension); | |||
theopolis
Jun 1, 2018
Member
Confirm that this attach method is only called for external tables.
Confirm that this attach method is only called for external tables.
| @@ -702,14 +1094,16 @@ void attachVirtualTables(const SQLiteDBInstanceRef& instance) { | |||
| } | |||
|
|
|||
| PluginResponse response; | |||
| bool is_extension = false; | |||
theopolis
Jun 1, 2018
Member
TODO(Ted): Double check this.
TODO(Ted): Double check this.
| // Tables implemented by extension can be made read/write; make sure to always | ||
| // keep the rowid column; this makes them easier to manage, and also removes | ||
| // any limit on the primary key count | ||
| if (is_extension) { |
theopolis
Jun 1, 2018
Member
Let's see if we can fix the INDEX=True, so we wont need this bool.
Let's see if we can fix the INDEX=True, so we wont need this bool.
| std::mutex extension_table_list_mutex; | ||
|
|
||
| bool isExtensionTable(const std::string& table_name) { | ||
| std::lock_guard<std::mutex> locker(extension_table_list_mutex); |
theopolis
Jun 1, 2018
Member
You can use ReadLock lock(...);
You can use ReadLock lock(...);
|
|
||
| struct sqlite3_module* getVirtualTableModule(const std::string& table_name, | ||
| bool extension) { | ||
| std::lock_guard<std::mutex> lock(sqlite_module_map_mutex); |
theopolis
Jun 1, 2018
Member
nb: ReadLock lock...;
nb: ReadLock lock...;
|
|
||
| std::string table_name = pVtab->content->name; | ||
|
|
||
| if (FLAGS_table_delay > 0 && pVtab->instance->tableCalled(content)) { |
theopolis
Jun 1, 2018
Member
You can delete this.
You can delete this.
| // Apply an optional sleep between table calls. | ||
| sleepFor(FLAGS_table_delay); | ||
| } | ||
| pVtab->instance->addAffectedTable(content); |
theopolis
Jun 1, 2018
Member
You can delete this.
You can delete this.
|
|
||
| // The SQLite instance communicates to the TablePlugin via the context. | ||
| QueryContext context(content); | ||
| context.useCache(pVtab->instance->useCache()); |
theopolis
Jun 1, 2018
Member
You can delete this.
You can delete this.
| return serializerError; | ||
| } | ||
|
|
||
| plugin_request.insert({"json_value_array", json_value_array}); |
theopolis
Jun 1, 2018
Member
Investigate the use of this map key, why json_value_array?
Investigate the use of this map key, why json_value_array?
| return serializerError; | ||
| } | ||
|
|
||
| plugin_request.insert({"json_value_array", json_value_array}); |
theopolis
Jun 1, 2018
Member
Again
Again
| /// Callback for DELETE statements | ||
| virtual QueryData delete_(QueryContext& context, | ||
| const PluginRequest& request) { | ||
| static_cast<void>(context); |
akindyakov
Jul 12, 2018
Contributor
May I ask you to use in such cases boost::ignore_unused as more explicit way to switch off this warning. ref
May I ask you to use in such cases boost::ignore_unused as more explicit way to switch off this warning. ref
| "list", | ||
| "semi", | ||
| "csv", | ||
| "pretty", |
akindyakov
Jul 12, 2018
Contributor
May I ask you to submit formatting changes in separate PR?
May I ask you to submit formatting changes in separate PR?
alessandrogario
Jul 12, 2018
Author
Contributor
This sometimes happens when we update the clang-format executable from the brew repository and there are changes on how code is formatted by the tool. If I remove this, then the make audit step will fail (and the CI workers will fail too).
These lines will disappear once we rebase on master after the correctly formatted code is committed.
This sometimes happens when we update the clang-format executable from the brew repository and there are changes on how code is formatted by the tool. If I remove this, then the make audit step will fail (and the CI workers will fail too).
These lines will disappear once we rebase on master after the correctly formatted code is committed.
akindyakov
Jul 13, 2018
Contributor
As far as I know make audit checks only changed lines. If some particular code is not satisfy clang-format and is not changed the make audit will not complain about it.
For instance:
$ git checkout -b test
$ make audit
And disregarding to the wrong formatting in shell.cpp audit is not complain about it. But if you change one letter in this string - it will start complain.
There is a instrument to avoid complains of audit test in changed code: make format_master. It get a diff between local master and feature brach and fix up format only in changed lines.
As far as I know make audit checks only changed lines. If some particular code is not satisfy clang-format and is not changed the make audit will not complain about it.
For instance:
$ git checkout -b test
$ make auditAnd disregarding to the wrong formatting in shell.cpp audit is not complain about it. But if you change one letter in this string - it will start complain.
There is a instrument to avoid complains of audit test in changed code: make format_master. It get a diff between local master and feature brach and fix up format only in changed lines.
alessandrogario
Jul 13, 2018
Author
Contributor
That's really cool! Has this been introduced recently? I had several PR fail due to this in the recent past!
That's really cool! Has this been introduced recently? I had several PR fail due to this in the recent past!
| @@ -1083,7 +1088,7 @@ inline void meta_schema(int nArg, char** azArg) { | |||
| fprintf(stdout, | |||
| "CREATE TABLE %s%s;\n", | |||
| table.c_str(), | |||
| osquery::columnDefinition(response, true).c_str()); | |||
| osquery::columnDefinition(response, true, false).c_str()); | |||
akindyakov
Jul 12, 2018
Contributor
I'd like to suggest you make such calls more explicit in terms of passing arguments. Smth like this:
auto const aliases = true;
auto const is_extension = false;
osquery::columnDefinition(response, aliases, is_extension).c_str());
What do you think?
I'd like to suggest you make such calls more explicit in terms of passing arguments. Smth like this:
auto const aliases = true;
auto const is_extension = false;
osquery::columnDefinition(response, aliases, is_extension).c_str());What do you think?
| return (std::find(extension_table_list.begin(), | ||
| extension_table_list.end(), | ||
| table_name) != extension_table_list.end()); | ||
| } |
akindyakov
Jul 12, 2018
Contributor
- There are 3 things that are used only together, would you like to make a make a class for them?
- There is no usage of it outside of cpp file. Would you like to put it under anonymous namespace?
- As far as order of elements in
extension_table_list is not important (I didn't find it, may be I'm wrong) I'd suggest to use unordered_set instead of it.
- There are 3 things that are used only together, would you like to make a make a class for them?
- There is no usage of it outside of cpp file. Would you like to put it under anonymous namespace?
- As far as order of elements in
extension_table_listis not important (I didn't find it, may be I'm wrong) I'd suggest to useunordered_setinstead of it.
|
|
||
| bool getColumnValue(std::string& value, | ||
| int index, | ||
| int argc, |
akindyakov
Jul 12, 2018
Contributor
index and argc are used for indexing, would you like to define them as std::size_t? To avoid a problems with negative values.
index and argc are used for indexing, would you like to define them as std::size_t? To avoid a problems with negative values.
|
|
||
| auto buffer_size = sqlite3_value_bytes(sqlite_value); | ||
| value.resize(buffer_size); | ||
| std::memcpy(&value[0], data_pointer, buffer_size); |
akindyakov
Jul 12, 2018
Contributor
You could do the same operation without using low level function memcpy.
Smth like this:
value.assign(data_pointer, buffer_size);
You could do the same operation without using low level function memcpy.
Smth like this:
value.assign(data_pointer, buffer_size);| } | ||
|
|
||
| case SQLITE_NULL: { | ||
| break; |
akindyakov
Jul 12, 2018
Contributor
Should we put in the value something like "0" or similar in this case?
Should we put in the value something like "0" or similar in this case?
alessandrogario
Jul 12, 2018
Author
Contributor
The value variable is cleared before we enter the switch, so in case of a NULL we get an empty string. This is how NULL values are generally handled by tables
The value variable is cleared before we enter the switch, so in case of a NULL we get an empty string. This is how NULL values are generally handled by tables
|
|
||
| // Type-aware serializer to pass parameters between osquery and the extension | ||
| int serializeUpdateParameters(std::string& json_value_array, | ||
| int argc, |
akindyakov
Jul 12, 2018
Contributor
Would you like to consider of using std::size_t for argc variable?
And also define it as const?
Would you like to consider of using std::size_t for argc variable?
And also define it as const?
| int argc, | ||
| sqlite3_value** argv, | ||
| const TableColumns& columnDescriptors) { | ||
| if (columnDescriptors.size() != static_cast<size_t>(argc - 2)) { |
akindyakov
Jul 12, 2018
Contributor
It could be problematic to substitute something from number and cast it to unsigned type. Is here any possibility for argc to be less than 2?
It could be problematic to substitute something from number and cast it to unsigned type. Is here any possibility for argc to be less than 2?
alessandrogario
Jul 12, 2018
Author
Contributor
All the argc parameters are defined as int to match the type used by sqlite3 to define that variable.
If we switch the type to something else, we will have to cast it to size_t when we receive it and back to int when we return it back to the library (would you prefer this approach?).
Parameter serialization only happens when an UPDATE or INSERT statement has been received.
This is a short explanation of how it works for this case, but the documentation goes in more detail:
Update queries
We are updating an existing row; the ID for that row is inside argv[0]
argv[1] == NULL
The table code is expected to keep using the ID associated with the row being modified. This ID can be found in argv[0].
argv[1] == INTEGER
The table handler is expected to update the row, invalidate the existing row ID (argv[0]) and use the new one found in argv[1].
Insert queries
In this case, the row is new so there is no existing row ID. This means that argv[0] will be set to SQLITE_NULL.
argv[1] == NULL
The table is expected to come up with a new row ID and return it back to sqlite.
argv[1] == INTEGER
sqlite requires the table logic to use the supplied row ID when saving the row.
All the argc parameters are defined as int to match the type used by sqlite3 to define that variable.
If we switch the type to something else, we will have to cast it to size_t when we receive it and back to int when we return it back to the library (would you prefer this approach?).
Parameter serialization only happens when an UPDATE or INSERT statement has been received.
This is a short explanation of how it works for this case, but the documentation goes in more detail:
Update queries
We are updating an existing row; the ID for that row is inside argv[0]
argv[1] == NULL
The table code is expected to keep using the ID associated with the row being modified. This ID can be found in argv[0].
argv[1] == INTEGER
The table handler is expected to update the row, invalidate the existing row ID (argv[0]) and use the new one found in argv[1].
Insert queries
In this case, the row is new so there is no existing row ID. This means that argv[0] will be set to SQLITE_NULL.
argv[1] == NULL
The table is expected to come up with a new row ID and return it back to sqlite.
argv[1] == INTEGER
sqlite requires the table logic to use the supplied row ID when saving the row.
akindyakov
Jul 13, 2018
Contributor
Thank you for the explanation. I see your point. But even so I'd like to suggest you to use a std::size_t here.
int in sqlite3 interface is just a part of interface. And is good to see this conversion from std::size_t to int only when we use the sqlite library.
- Mixing up a signed and unsigned types is bad idea. It could be cause for many bugs and even UB, which is pure evil. Look at the example:
#include <iostream>
int main() {
const auto sv = -2;
const auto usv = 1u;
auto res = sv + usv;
std::cout << res << '\n';
return 0;
}
What is the type of res and what is the answer? This is potential UB.
3. You right for high load pieces of code static conversion from one type to another could be have a notable performance penalty. But we can not solve this problem by using a int type here. Because there still will be a lot of conversions. And, that is the worst, they will be implicit. Which is hard to find, to control, test and debug.
4. And semantics, std::size_t is a special type dedicated to represent indexes, address offsets and so on. If you see that type you will know - that variable is for the size. int is too common, variable of this type could be anything.
Thank you for the explanation. I see your point. But even so I'd like to suggest you to use a std::size_t here.
intinsqlite3interface is just a part of interface. And is good to see this conversion fromstd::size_ttointonly when we use thesqlitelibrary.- Mixing up a signed and unsigned types is bad idea. It could be cause for many bugs and even UB, which is pure evil. Look at the example:
#include <iostream>
int main() {
const auto sv = -2;
const auto usv = 1u;
auto res = sv + usv;
std::cout << res << '\n';
return 0;
}What is the type of res and what is the answer? This is potential UB.
3. You right for high load pieces of code static conversion from one type to another could be have a notable performance penalty. But we can not solve this problem by using a int type here. Because there still will be a lot of conversions. And, that is the worst, they will be implicit. Which is hard to find, to control, test and debug.
4. And semantics, std::size_t is a special type dedicated to represent indexes, address offsets and so on. If you see that type you will know - that variable is for the size. int is too common, variable of this type could be anything.
| break; | ||
| } | ||
|
|
||
| default: { return SQLITE_MISMATCH; } |
akindyakov
Jul 12, 2018
Contributor
Should we write something to LOG(ERROR) here?
Should we write something to LOG(ERROR) here?
|
@alessandrogario looks like @theopolis and @akindyakov had a few more things to fix, and this needs a rebase, but we should be getting close on this yeah? |
|
Hey @muffins! I've rebased and fixed the issues that have been mentioned; the remaining comments are personal notes/bookmarks that Ted has left for his review during QueryCon. I don't think he is requesting for any change, but correct me if I'm wrong! |
|
LGTM. Let's use smaller follow-up PRs for any additionally changes folks would like to see, but for now I think we're ok bringing this in and iterating. |
Description
This PR adds support for writable extension tables by providing an xUpdate sqlite3 implementation.
Disclamer
Write support for core/built-in tables is NOT implemented, and will not work.
Demo
This is a small demo for the following example: osquery/examples/example_writable_table.cpp