Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update augeas table to use native pattern matching (BREAKING) #6982

Merged
merged 5 commits into from Jul 25, 2021

Conversation

directionless
Copy link
Member

@directionless directionless commented Mar 1, 2021

Add querying by LIKE to the augeas table. This fixes querying the
/augeas part of the tree, and should additionally improve the behavior
around file paths.

Prior, there is implicit magic that makes the path part work -- we
append //* the augeas recursive wildcard. This combined with sqlite
filtering made it appear to work for simple trailing wildcards. But, it
would fail for leading or infix wildcards.

This replaces that with some additional routines to allow wildcard
conversion. The UX is a bit weird, in that path should not have a
trailing wildcard, and the behavior of % vs %% is different. But
this feels like the simplest bridge between osquery and augeas.

The current problem is mostly clearly visible with select * from augeas where node like "/augeas/load/%"; returning zero results.

We could consider whether to keep using /files//* as the default
pattern or not.

@directionless directionless requested review from a team as code owners March 1, 2021 18:43
Copy link
Member

@theopolis theopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes to use normal std::string instead of std::ostringstream. The rest of the suggestions are up to you.

osquery/tables/system/posix/augeas.cpp Outdated Show resolved Hide resolved
osquery/tables/system/posix/augeas.cpp Outdated Show resolved Hide resolved
osquery/tables/system/posix/augeas.cpp Outdated Show resolved Hide resolved
osquery/tables/system/posix/augeas.cpp Outdated Show resolved Hide resolved
SQL("select * from augeas where path LIKE '/etc/hosts/%'").rows().size(),
0U);
ASSERT_EQ(
SQL("select * from augeas where path LIKE '/etc/hosts%'").rows().size(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd that this returns 0 results. It sounds like this is an augeas quirk vs. a confound the SQL abstraction adds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. AFAICT augeas doesn't do partial matches on a leaf's name:

augtool> match /files/etc/shells/1
/files/etc/shells/1 = /bin/bash

augtool> match /files/etc/shells/1*
  (no matches)

augtool> match /files/etc/shells/*
/files/etc/shells/#comment[1] = List of acceptable shells for chpass(1).
/files/etc/shells/#comment[2] = Ftpd will not allow users to connect who are not using
/files/etc/shells/#comment[3] = one of these shells.
/files/etc/shells/1 = /bin/bash
/files/etc/shells/2 = /bin/csh
/files/etc/shells/3 = /bin/ksh
/files/etc/shells/4 = /bin/sh
/files/etc/shells/5 = /bin/tcsh
/files/etc/shells/6 = /bin/zsh

Thinking aloud... I went down this path because I couldn't query LIKE on the /augeas hierarchy. This is because the current implementation has complexity to pass EQUALS around. But when there are no EQUALS, it defaults to matching off /files//* and letting sqlite filter. Which obviously omits the /augeas// tree.

This approach extends the prior logic parsing the sql expressions to include LIKE. And barring a lot more complexity in the translation, I think this means it's subject to the underlying augtool limitations.

I think there are other possible implementations.

We could change the default match to //*, which would dump the entire augeas tree back to sqlite for filtering. This feels simple. As I think the entire filespace is loaded into augeas on initialization, I don't think there'd be a significant performance impact. It doesn't feel right, but I'm not sure why.

We could change the general handling to not auto load files unless explicitly requested, then return the full space and let sqlite filter. This is appealing to me, but I don't see how to do it performantly.

Though, one advantage to the current approach is that you can use the % vs %% difference to explore a path. For example:

osquery> select node from augeas where node like '/augeas/%';
+-------------------+
| node              |
+-------------------+
| /augeas/root      |
| /augeas/context   |
| /augeas/variables |
| /augeas/version   |
| /augeas/save      |
| /augeas/span      |
| /augeas/load      |
| /augeas/files     |
+-------------------+
osquery> select node from augeas where node like '/augeas/files/%';
+---------------------+
| node                |
+---------------------+
| /augeas/files/Users |
| /augeas/files/etc   |
+---------------------+

Copy link
Member

@theopolis theopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, actually this test is failing

22: [ RUN      ] AugeasTests.select_hosts_by_path_expression
22: /__w/osquery/osquery/workspace/padding-required-by-rpm-packages/src/osquery/tables/system/tests/posix/augeas_tests.cpp:40: Failure
22: Expected equality of these values:
22:   results.rows().size()
22:     Which is: 0
22:   1U
22:     Which is: 1
22: [  FAILED  ] AugeasTests.select_hosts_by_path_expression (522 ms)

@theopolis
Copy link
Member

Do you think we need any additional documentation about how to use the table?

@directionless
Copy link
Member Author

I need to stare at the tests. I only found them last night, and haven't understood what's happening yet.

Docs are also a good callout.

Add querying by LIKE to the augeas table. This fixes querying the
`/augeas` part of the tree, and should additionally improve the behavior
around file paths.

Prior, there is implicit magic that makes the path part work -- we
append `//*` the augeas recursive wildcard. This combined with sqlite
filtering made it appear to work for simple trailing wildcards. But, it
would fail for leading or infix wildcards.

This replaces that with some additional routines to allow wildcard
conversion. The UX is a bit weird, in that `path` should _not_ have a
trailing wildcard, and the behavior of `%` vs `%%` is different. But
this feels like the simplest bridge between osquery and augeas.

This is mostly clearly visible with `select * from augeas where node like "/augeas/load/%";` returning zero results.

We could consider whether to keep using `/files//*` as the default
pattern or not.
@directionless
Copy link
Member Author

I found the test issue....

So Augeas looks at files and their contents, right? So you could request either /etc/hosts OR /etc/hosts/*. Previously we did weird things, so if you said where path = '/etc/hosts you'd get both.

I'm not sure the prior behavior was correct, but I'm also not sure it it was wrong. So, I ported it.

Now when you select where path = '/etc/hosts, this is translated into two search patterns /files/etc/hosts and /file/etc/hosts/*

I also concluded there wasn't really a simple place to document how augeas works in our project :\

@theopolis
Copy link
Member

This seems reasonable but we should mark it as an API change due to change with queries like select * from augeas where path LIKE '/etc/hosts%';, where before this would full-scan and have SQL apply the LIKE filtering.

@directionless directionless changed the title Augeas Table: Fix like queries and wildcard behavior Update augeas table to use native pattern matching Jul 25, 2021
@directionless directionless changed the title Update augeas table to use native pattern matching Update augeas table to use native pattern matching (BREAKING) Jul 25, 2021
@directionless directionless merged commit 5b5942f into osquery:master Jul 25, 2021
@directionless directionless deleted the seph/augeas-fix-queries branch July 25, 2021 22:22
sharvilshah pushed a commit to sharvilshah/osquery that referenced this pull request Aug 3, 2021
…ery#6982)

Rework the query parameters on the `augeas` table. Supports querying against `node` (The augeas column), and the osquery mediated `path` column. Patterns are passed to the underlying augeas library, and follow their (somewhat surprising) wildcard semantics.
@directionless
Copy link
Member Author

Some comments for a future doc update...

Some interesting queries to compare, that show the confusion nature of the wildcards, and the mapping between path and node

select * from augeas where path LIKE '/etc/%';
select * from augeas where path LIKE '/etc/%%';

select * from augeas where path LIKE '/etc/hosts%%';
select * from augeas where path LIKE '/etc/hosts%';
select * from augeas where path LIKE '/etc/hosts/%%';
Query is converted to augeas matches sql matches
/etc/% /files/etc/* YES -- single level wildcard YES
/etc/%% /files/etc//* YES -- recursive wildcard YES
/etc/hosts%% /files/etc/hosts/* YES -- recursive wildcard YES
/etc/hosts% /files/etc/hosts* NO -- not how matches work N/A
/etc/hosts/%% /files/etc/hosts//* NO -- // is invalid N/A (but also no, trailing slash won't match)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants