New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write support for tables implemented from extensions #4094

Merged
merged 6 commits into from Jul 17, 2018

Conversation

@alessandrogario
Contributor

alessandrogario commented Jan 31, 2018

Description

This PR adds support for writable extension tables by providing an xUpdate sqlite3 implementation.

Disclamer

Write support for core/built-in tables is NOT implemented, and will not work.

Demo

This is a small demo for the following example: osquery/examples/example_writable_table.cpp

osquery> select * from WritableTable;
osquery> insert into WritableTable (text, integer) values ("1", 1);
osquery> insert into WritableTable (text, integer) values ("1", 1);
Error: constraint failed
osquery> insert into WritableTable (text, integer) values ("2", 2);
osquery> select * from WritableTable;
+------+---------+
| text | integer |
+------+---------+
| 2    | 2       |
| 1    | 1       |
+------+---------+
osquery> delete from WritableTable where text = "1";
osquery> select * from WritableTable;
+------+---------+
| text | integer |
+------+---------+
| 2    | 2       |
+------+---------+
osquery> insert into WritableTable (text, integer) values ("3", 3);
osquery> select * from WritableTable;
+------+---------+
| text | integer |
+------+---------+
| 3    | 3       |
| 2    | 2       |
+------+---------+
osquery> update WritableTable set text="2", integer=2 where text="3";
Error: constraint failed
osquery> update WritableTable set text="5", integer=5 where text="3";
osquery> select * from WritableTable;
+------+---------+
| text | integer |
+------+---------+
| 5    | 5       |
| 2    | 2       |
+------+---------+
@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Jan 31, 2018

@alessandrogario has updated the pull request.

@obelisk

This comment has been minimized.

Contributor

obelisk commented Jan 31, 2018

This is really interesting, what potential use cases do you envision for this sort of thing?

@obelisk

I've marked a few things as what I think are API Changes which might break other people's building of things that use the osquery SDK (at least I think that's what would happen). This is not a huge problem but it is something we want to make sure we're not doing flippantly.

Overall I like this a lot, this is going to be a super cool feature to have and will provide a lot of value.

Thanks!

}
// We may have to update our rowid
std::string new_rowid;

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Nit. Could use a ternary here.

/// Callback for DELETE queries
QueryData delete_(QueryContext& context, const PluginRequest& request) {
auto rowid = request.at("id");
if (id_to_internal_id.count(rowid) == 0) {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Prefer find. Then you will have a pointer to the item to use below.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 1, 2018

Contributor

Yeah, you are right! I'm not sure why but I ended up matching some other code I found in the other files without thinking! Going to fix it right now

// Callback for UPDATE queries
QueryData update(QueryContext& context, const PluginRequest& request) {
std::string current_rowid = request.at("id");
if (id_to_internal_id.count(current_rowid) == 0) {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Prefer find.

RowData user_row_data;
auto column_index = 0U;
for (std::string value; std::getline(stream, value, ';');) {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Much of this logic is duplicated in the insert function. Is there a way we can utilize that so that if/when this code is updated there is a smaller chance of error (updating one but not the other)?

/// Callback for DELETE queries
QueryData delete_(QueryContext& context, const PluginRequest& request) {
auto rowid = request.at("id");

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Can be const &?

sqlite3_module* module =
tables::sqlite::getVirtualTableModule(name, is_extension);
if (module == nullptr) {
auto error_message =

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

I would just do this in the VLOG(1). In my opinion, the only time that returning an error message this verbose in a Status is useful is if you are checking it in code, or are going to print it out. Since you've already printed it out, you probably don't need to return it and Status(1) should suffice.

cc: @muffins Do you agree with this as a design decision? Please tell me I'm wrong if you think so.

@@ -752,7 +777,7 @@ class TablePlugin : public Plugin {
protected:
/// An SQL table containing the table definition/syntax.
std::string columnDefinition() const;
std::string columnDefinition(bool is_extension) const;

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

This change could break other people who are compiling extensions against this header correct? Could we provide default values instead?

This comment has been minimized.

@alessandrogario

alessandrogario Feb 1, 2018

Contributor

The columnDefinition method was defined several times (as a method and as a function) both overloaded and with default parameters, and this made it hard to read because you never knew where the call was going when passing some parameters (and this is why I didn't provide a default value here).

Isn't this method only used by osquery and not by the extension developers? (I'm asking because I didn't have to change the other example extensions after this patch).

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

It is possible that other people are using them because they are in our public headers. This (and the changes I marked API Change) could in theory break anyone who is using the SDK. If this is the only way to do it, or this is a great improvement in the API then we can merge this but we shouldn't do so without evaluating the possibility of breaking some third party extensions.

@@ -846,10 +871,6 @@ class TablePlugin : public Plugin {
static void setRequestFromContext(const QueryContext& context,
PluginRequest& request);
/// Helper data structure transformation methods.
static void setContextFromRequest(const PluginRequest& request,

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

API Change

This comment has been minimized.

@alessandrogario

alessandrogario Feb 1, 2018

Contributor

I moved it because it wasn't referenced anywhere! Thanks for spotting this!

FRIEND_TEST(VirtualTableTests, test_tableplugin_statement);
FRIEND_TEST(VirtualTableTests, test_indexing_costs);
FRIEND_TEST(VirtualTableTests, test_table_results_cache);
FRIEND_TEST(VirtualTableTests, test_yield_generator);
};
/// Helper method to generate the virtual table CREATE statement.
std::string columnDefinition(const TableColumns& columns);
std::string columnDefinition(const TableColumns& columns, bool is_extension);

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

API Change.

/// Helper method to generate the virtual table CREATE statement.
std::string columnDefinition(const PluginResponse& response,
bool aliases = false);
bool aliases,

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

API Change.

@obelisk obelisk added the API change label Feb 1, 2018

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 1, 2018

@alessandrogario has updated the pull request. View: changes

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 1, 2018

@alessandrogario has updated the pull request.

1 similar comment
@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 1, 2018

@alessandrogario has updated the pull request.

row_data.rowid = new_rowid;
// Build the primary key and make sure it's unique
std::string internal_rowid =

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

This should be abstracted as a function that generates the primary key of the table. Currently adding a new column to the primary key is going to require changes to places and will be error prone (even if you use all the right columns but in the wrong order).

}
// Build the primary key and make sure it's unique
std::string new_internal_id =

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Call new primary key function.

auto current_internal_id = internal_id_it->second;
// This variable contains a value for each column in the table, separated by

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Is this still true?

auto new_rowid_it = request.find("new_id");
std::string new_rowid;
if (new_rowid_it != request.end()) {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

This could be a ternary.

std::string new_rowid = new_rowid_it != request.end() ? new_rowid_it->second : current_rowid;
/// Callback for INSERT queries
QueryData insert(QueryContext& context, const PluginRequest& request) {
// This variable contains a value for each column in the table, separated by

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Is this still true?

QueryData results;
for (const auto& row_descriptor : row_map) {
const auto row_data = row_descriptor.second;

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

I think this can be a reference

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 1, 2018

@alessandrogario has updated the pull request.

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 1, 2018

@alessandrogario has updated the pull request. View: changes

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 1, 2018

@alessandrogario has updated the pull request. View: changes

response = update(context, request);
} else if (action == "columns") {
static_cast<void>(request);

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

What does this do?

This comment has been minimized.

@alessandrogario

alessandrogario Feb 1, 2018

Contributor

This is just to make static analysis (and warnings with -Wunused) happy; I kind of keep adding it due to habits for other projects! I can see it's not applicable here, I'll remove it!

class WritableTable : public TablePlugin {
private:
/// Contents for a single row
struct RowData final {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Can you elaborate on why you chose to use a struct here instead of an unordered_map? It seems to me that if these were maps you could just return them in your gen function instead of copying the values to the Row object. Maybe you can just use Row objects as well? Another upside is that most of your int conversions will go away.

The only draw back I see is not being able to store types in the their native types (they all become strings).

Let me know if I missed something obvious about your implementation :p

This comment has been minimized.

@alessandrogario

alessandrogario Feb 1, 2018

Contributor

I couldn't use strings because the data is now serialized using rapidjson, automatically matching the type of each column

}
/// Expands a value list returned by osquery into a RowData structure
bool getRowData(RowData& row_data, const std::string& value_list) const {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

This should return a status

rapidjson::Document document;
document.Parse(value_list);
if (document.HasParseError() || !document.IsArray()) {
VLOG(1) << "Invalid format";

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Put these into the message of the status

}
if (document.Size() != 2U) {
VLOG(1) << "Wrong column count.";

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

As above.

row_data.text = document[0].GetString();
}
if (document[0].IsNull()) {

This comment has been minimized.

@obelisk

obelisk Feb 1, 2018

Contributor

Nit. Could be ternary.

@obelisk

This comment has been minimized.

Contributor

obelisk commented Feb 1, 2018

ok to test

@obelisk obelisk removed the API change label Feb 1, 2018

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Feb 1, 2018

👎 The commit d10a5ee (Job results: 3349) failed one or more tests (Windows).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Feb 1, 2018

👎 The commit d10a5ee (Job results: 3350) failed one or more tests (Windows).

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 2, 2018

@alessandrogario has updated the pull request. View: changes

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Feb 2, 2018

👎 The commit b4eb518 (Job results: 3352) failed one or more tests (Windows).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Feb 2, 2018

👎 The commit b4eb518 (Job results: 3353) failed one or more tests (Windows).

@fmanco

This comment has been minimized.

Contributor

fmanco commented Feb 2, 2018

Hi @alessandrogario, why is it you say "Write support for core/built-in tables is NOT implemented, and will not work."?

@obelisk

This comment has been minimized.

Contributor

obelisk commented Feb 2, 2018

@fmanco Building in tables won't work because the code is generated for those tables early in the build process without the insert method set. We would have to change a lot more code and create some more context to support this in core tables.

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 3, 2018

@alessandrogario has updated the pull request. View: changes

}
/// Saves the given row
Status saveRow(const Row& row, PrimaryKey primary_key = std::string()) {

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

I would think that saveRow should check the primary key doesn't already exist, otherwise if you add saveRow in another place, you need to duplicate that logic.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 5, 2018

Contributor

I need to keep them separate so that the statements can be blocked earlier (i.e. without modifying the current data or calling the rowid generator) in case of a constraint failure. If I do this check inside the saveRow call, then I would have to undo changes

/// Expands a value list returned by osquery into a Row (without the rowid
/// column)
Status getRowData(Row& row, const std::string& value_list) const {

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

Nit. We are no longer using a value list, we should rename this to JSON because that’s what it is.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 5, 2018

Contributor

I kept value_list because it is more descriptive than just 'json' (it is in fact a list of values, regardless of the format). Maybe we can use json_value_array

row["text"] = document[0].IsNull() ? "" : document[0].GetString();
auto value = document[1].IsNull() ? 0 : document[1].GetInt();

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

Combine these two lines so we can get rid of the value intermediate variable.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 5, 2018

Contributor

Duh! Was trying to keep short lines :D Will fix right now!

// Finally, save the row; also pass the primary key we calculated so that
// the function doesn't have to compute it again
status = saveRow(row, primary_key);

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

If we fold the primary key check logic in here we can combine the error checking here and on line 152.

This comment has been minimized.

@alessandrogario
std::vector<std::string> extension_table_list;
std::mutex extension_table_list_mutex;
bool isExtensionTable(const std::string& table_name) {

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

Do we need a new array to keep track of this or can we use the registry?

This comment has been minimized.

@alessandrogario

alessandrogario Feb 5, 2018

Contributor

I tried to use the registry many times, but it doesn't look like it can be used reliably while it is initializing the tables

@@ -178,13 +488,20 @@ int xCreate(sqlite3* db,
return SQLITE_ERROR;
}
// Tables implemented from extensions can be made read/write if they overwrite

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

S/overwrite/implement

struct sqlite3_module* getVirtualTableModule(const std::string& table_name,
bool extension) {
static std::unordered_map<std::string, struct sqlite3_module> modules;

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

I don't like this static definition, I think it makes the code messier.

cc @muffins Thoughts on this?

This comment has been minimized.

@alessandrogario

alessandrogario Feb 5, 2018

Contributor

This is done to actually avoid making the code messier (because we are reducing the scope as much as possible since it is not needed outside the dedicated interface function). This is also thread safe (as long as we compile with C++11 or better).

I can move this outside and remove the static if you prefer

This comment has been minimized.

@obelisk

obelisk Feb 5, 2018

Contributor

Makes sense :)

This comment has been minimized.

@uptycs-nishant

uptycs-nishant Feb 5, 2018

Contributor

Hi @alessandrogario I believe initialization 'static std::unordered_map<std::string, struct sqlite3_module> modules;' is thread safe but 'modules[table_name] = {};' is not thead safe
I.e. initialization of a function local static object is thread safe not the operations on them.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 7, 2018

Contributor

Yes! I used the wrong terms (I meant that this construct is useful and use for singletons for example); initialization is performed on a single thread (you can see the original code doesn't protect access to this variable)

This comment has been minimized.

@uptycs-nishant

uptycs-nishant Feb 9, 2018

Contributor

In the present code usage of variable 'module' is thread safe since -

  1. It is initialized as static variable.
  2. After initialization it is being used as read-only.
    While the code you are proposing 'modules' variable is not read-only.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 15, 2018

Contributor

You are right it is not accessed in read only, but I was under the assumption that initialization was single threaded.

EDIT: Taking a look at it again, and it does seem that everything is performed in a unique call from a single thread, so it doesn't matter whether you lock it or not. I will move the function call under the attachLock call just to be sure. What do you think?

Thanks!

This comment has been minimized.

@uptycs-nishant

uptycs-nishant Feb 19, 2018

Contributor

"it does seem that everything is performed in a unique call from a single thread, so it doesn't matter whether you lock it or not."
I observed that the code is written with keeping thread safety in mind. Until now access to module object was being done in thread safe manner since it is being accessed as read-only. At present only single thread of control is accessing this code but it may not be the case in future.

I believe attachlock will not be enough since it is synchronizing access to instance but modules object can be accessed across the instances. Please correct me if I am wrong.

This comment has been minimized.

@alessandrogario

alessandrogario Feb 23, 2018

Contributor

You may be right! I've ended up reverting the last commit, and adding a mutex. Thanks!

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Feb 5, 2018

@alessandrogario has updated the pull request. View: changes

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Feb 5, 2018

👎 The commit 312e938 (Job results: 3358) failed one or more tests (Windows).

@alessandrogario

This comment has been minimized.

Contributor

alessandrogario commented Feb 5, 2018

I'm not really sure why osqueryer commented that the test has failed; it is passing on my virtual machine, and it is not shown as failing in the check list at the bottom of the page!

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 3, 2018

👎 The commit 45b582e (Job results: 3692) failed one or more tests (Windows).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 3, 2018

👎 The commit 45b582e (Job results: 6798) failed one or more tests (Linux).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 3, 2018

👎 The commit 45b582e (Job results: 3693) failed one or more tests (Windows).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 3, 2018

👎 The commit 45b582e (Job results: 6799) failed one or more tests (Linux).

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Apr 3, 2018

@alessandrogario has updated the pull request.

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 3, 2018

👎 The commit 967a99c (Job results: 6800) failed one or more tests (Linux).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 3, 2018

👎 The commit 967a99c (Job results: 6801) failed one or more tests (Linux).

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented Apr 4, 2018

@alessandrogario has updated the pull request.

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 4, 2018

👎 The commit 48f5e7f (Job results: 1267) failed one or more tests (FreeBSD).

@osqueryer

This comment has been minimized.

Collaborator

osqueryer commented Apr 4, 2018

👎 The commit 48f5e7f (Job results: 1268) failed one or more tests (FreeBSD).

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented May 1, 2018

@alessandrogario has updated the pull request. View: changes

@facebook-github-bot

This comment has been minimized.

facebook-github-bot commented May 16, 2018

@alessandrogario has updated the pull request.

@theopolis

Ok, I left myself some pointers for a larger review later.

I think this approach works. I want to be extra-sure about the keys (their names) used in serialization.

@@ -192,12 +192,16 @@ Status SQLiteSQLPlugin::attach(const std::string& name) {
return status;
}
auto statement = columnDefinition(response);
bool is_extension = true;
auto statement = columnDefinition(response, false, is_extension);

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

Confirm that this attach method is only called for external tables.

@@ -702,14 +1094,16 @@ void attachVirtualTables(const SQLiteDBInstanceRef& instance) {
}
PluginResponse response;
bool is_extension = false;

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

TODO(Ted): Double check this.

// Tables implemented by extension can be made read/write; make sure to always
// keep the rowid column; this makes them easier to manage, and also removes
// any limit on the primary key count
if (is_extension) {

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

Let's see if we can fix the INDEX=True, so we wont need this bool.

std::mutex extension_table_list_mutex;
bool isExtensionTable(const std::string& table_name) {
std::lock_guard<std::mutex> locker(extension_table_list_mutex);

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

You can use ReadLock lock(...);

struct sqlite3_module* getVirtualTableModule(const std::string& table_name,
bool extension) {
std::lock_guard<std::mutex> lock(sqlite_module_map_mutex);

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

nb: ReadLock lock...;

std::string table_name = pVtab->content->name;
if (FLAGS_table_delay > 0 && pVtab->instance->tableCalled(content)) {

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

You can delete this.

// Apply an optional sleep between table calls.
sleepFor(FLAGS_table_delay);
}
pVtab->instance->addAffectedTable(content);

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

You can delete this.

// The SQLite instance communicates to the TablePlugin via the context.
QueryContext context(content);
context.useCache(pVtab->instance->useCache());

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

You can delete this.

return serializerError;
}
plugin_request.insert({"json_value_array", json_value_array});

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

Investigate the use of this map key, why json_value_array?

return serializerError;
}
plugin_request.insert({"json_value_array", json_value_array});

This comment has been minimized.

@theopolis

theopolis Jun 1, 2018

Contributor

Again

/// Callback for DELETE statements
virtual QueryData delete_(QueryContext& context,
const PluginRequest& request) {
static_cast<void>(context);

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

May I ask you to use in such cases boost::ignore_unused as more explicit way to switch off this warning. ref

"list",
"semi",
"csv",
"pretty",

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

May I ask you to submit formatting changes in separate PR?

This comment has been minimized.

@alessandrogario

alessandrogario Jul 12, 2018

Contributor

This sometimes happens when we update the clang-format executable from the brew repository and there are changes on how code is formatted by the tool. If I remove this, then the make audit step will fail (and the CI workers will fail too).

These lines will disappear once we rebase on master after the correctly formatted code is committed.

This comment has been minimized.

@akindyakov

akindyakov Jul 13, 2018

Contributor

As far as I know make audit checks only changed lines. If some particular code is not satisfy clang-format and is not changed the make audit will not complain about it.
For instance:

$ git checkout -b test
$ make audit

And disregarding to the wrong formatting in shell.cpp audit is not complain about it. But if you change one letter in this string - it will start complain.

There is a instrument to avoid complains of audit test in changed code: make format_master. It get a diff between local master and feature brach and fix up format only in changed lines.

This comment has been minimized.

@alessandrogario

alessandrogario Jul 13, 2018

Contributor

That's really cool! Has this been introduced recently? I had several PR fail due to this in the recent past!

@@ -1083,7 +1088,7 @@ inline void meta_schema(int nArg, char** azArg) {
fprintf(stdout,
"CREATE TABLE %s%s;\n",
table.c_str(),
osquery::columnDefinition(response, true).c_str());
osquery::columnDefinition(response, true, false).c_str());

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

I'd like to suggest you make such calls more explicit in terms of passing arguments. Smth like this:

auto const aliases = true;
auto const is_extension = false;
osquery::columnDefinition(response, aliases, is_extension).c_str());

What do you think?

return (std::find(extension_table_list.begin(),
extension_table_list.end(),
table_name) != extension_table_list.end());
}

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor
  1. There are 3 things that are used only together, would you like to make a make a class for them?
  2. There is no usage of it outside of cpp file. Would you like to put it under anonymous namespace?
  3. As far as order of elements in extension_table_list is not important (I didn't find it, may be I'm wrong) I'd suggest to use unordered_set instead of it.
bool getColumnValue(std::string& value,
int index,
int argc,

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

index and argc are used for indexing, would you like to define them as std::size_t? To avoid a problems with negative values.

This comment has been minimized.

@alessandrogario
auto buffer_size = sqlite3_value_bytes(sqlite_value);
value.resize(buffer_size);
std::memcpy(&value[0], data_pointer, buffer_size);

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

You could do the same operation without using low level function memcpy.
Smth like this:

value.assign(data_pointer, buffer_size);
}
case SQLITE_NULL: {
break;

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

Should we put in the value something like "0" or similar in this case?

This comment has been minimized.

@alessandrogario

alessandrogario Jul 12, 2018

Contributor

The value variable is cleared before we enter the switch, so in case of a NULL we get an empty string. This is how NULL values are generally handled by tables

// Type-aware serializer to pass parameters between osquery and the extension
int serializeUpdateParameters(std::string& json_value_array,
int argc,

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

Would you like to consider of using std::size_t for argc variable?
And also define it as const?

This comment has been minimized.

@alessandrogario
int argc,
sqlite3_value** argv,
const TableColumns& columnDescriptors) {
if (columnDescriptors.size() != static_cast<size_t>(argc - 2)) {

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

It could be problematic to substitute something from number and cast it to unsigned type. Is here any possibility for argc to be less than 2?

This comment has been minimized.

@alessandrogario

alessandrogario Jul 12, 2018

Contributor

All the argc parameters are defined as int to match the type used by sqlite3 to define that variable.

If we switch the type to something else, we will have to cast it to size_t when we receive it and back to int when we return it back to the library (would you prefer this approach?).

Parameter serialization only happens when an UPDATE or INSERT statement has been received.

This is a short explanation of how it works for this case, but the documentation goes in more detail:

Update queries

We are updating an existing row; the ID for that row is inside argv[0]

argv[1] == NULL

The table code is expected to keep using the ID associated with the row being modified. This ID can be found in argv[0].

argv[1] == INTEGER

The table handler is expected to update the row, invalidate the existing row ID (argv[0]) and use the new one found in argv[1].

Insert queries

In this case, the row is new so there is no existing row ID. This means that argv[0] will be set to SQLITE_NULL.

argv[1] == NULL

The table is expected to come up with a new row ID and return it back to sqlite.

argv[1] == INTEGER

sqlite requires the table logic to use the supplied row ID when saving the row.

This comment has been minimized.

@akindyakov

akindyakov Jul 13, 2018

Contributor

Thank you for the explanation. I see your point. But even so I'd like to suggest you to use a std::size_t here.

  1. int in sqlite3 interface is just a part of interface. And is good to see this conversion from std::size_t to int only when we use the sqlite library.
  2. Mixing up a signed and unsigned types is bad idea. It could be cause for many bugs and even UB, which is pure evil. Look at the example:
#include <iostream>
int main() {
  const auto sv = -2;
  const auto usv = 1u;
  auto res = sv + usv;
  std::cout << res << '\n';
  return 0;
}

What is the type of res and what is the answer? This is potential UB.
3. You right for high load pieces of code static conversion from one type to another could be have a notable performance penalty. But we can not solve this problem by using a int type here. Because there still will be a lot of conversions. And, that is the worst, they will be implicit. Which is hard to find, to control, test and debug.
4. And semantics, std::size_t is a special type dedicated to represent indexes, address offsets and so on. If you see that type you will know - that variable is for the size. int is too common, variable of this type could be anything.

break;
}
default: { return SQLITE_MISMATCH; }

This comment has been minimized.

@akindyakov

akindyakov Jul 12, 2018

Contributor

Should we write something to LOG(ERROR) here?

@muffins

This comment has been minimized.

Contributor

muffins commented Jul 15, 2018

@alessandrogario looks like @theopolis and @akindyakov had a few more things to fix, and this needs a rebase, but we should be getting close on this yeah?

@muffins muffins added this to the 3.3.0 milestone Jul 15, 2018

@alessandrogario

This comment has been minimized.

Contributor

alessandrogario commented Jul 16, 2018

Hey @muffins!

I've rebased and fixed the issues that have been mentioned; the remaining comments are personal notes/bookmarks that Ted has left for his review during QueryCon. I don't think he is requesting for any change, but correct me if I'm wrong!

Dismissing as the review is stale.

@muffins

LGTM. Let's use smaller follow-up PRs for any additionally changes folks would like to see, but for now I think we're ok bringing this in and iterating.

@muffins muffins merged commit 8fe570b into facebook:master Jul 17, 2018

5 checks passed

Code Audit Build finished.
Details
FreeBSD Build finished.
Details
Linux Build finished.
Details
Windows Build finished.
Details
macOS/OS X Build finished.
Details

@alessandrogario alessandrogario deleted the trailofbits:alessandro/feature/xupdate-support branch Jul 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment