-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new ADS table #8190
base: master
Are you sure you want to change the base?
Add new ADS table #8190
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is a Windows "feature" I did not know about previously 😅
// Folders can have ADS streams too | ||
if (!(boost::filesystem::is_regular_file(path, ec) || | ||
boost::filesystem::is_directory(path, ec))) { | ||
continue; | ||
} | ||
enumerateStreams(results, path.string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this miss directories that are included in the directory
constraint?
|
||
if (hFind != INVALID_HANDLE_VALUE) { | ||
do { | ||
// Skip first stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? As a reader previously unfamiliar with ADS, I don't understand why the first stream is not relevant. Is it because the first stream is the standard file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, per FindFirstStreamW docs:
The FindFirstStreamW function opens a search handle and returns information about the first $DATA stream in the specified file or directory. For files, this is always the default, unnamed data stream, "::$DATA". Directories do not have $DATA streams by default and cannot have an unnamed data stream, but may have named data streams set after they have been created.
The unnamed data stream is the file contents so I thought it's best to skip that one. Although I have just noticed that by the way the table works now it will skip the first stream of a directory when it shouldn't do it since they can't have unnamed data streams.
I will address this alongside the other comments.
const std::string& value) { | ||
Row r; | ||
r["path"] = path; | ||
r["directory"] = boost::filesystem::path(path).parent_path().string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of a directory would this erroneously return the parent directory? I'm thinking in that case path
and directory
should be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if a directory is specified the table will return something like:
> select * from ads where path = 'C:\Users\ignacior\Downloads\subdir';
+------------------------------------+-----------------------------+----------+-----------------+--------+
| path | directory | key | value | base64 |
+------------------------------------+-----------------------------+----------+-----------------+--------+
| C:\Users\ignacior\Downloads\subdir | C:\Users\ignacior\Downloads | hide.txt | secret | 0 |
+------------------------------------+-----------------------------+----------+-----------------+--------+
Which is a similar behaviour to the extended_attributes
table:
Using a virtual database. Need help, type '.help'
osquery> select version from osquery_info;
+---------+
| version |
+---------+
| 5.4.0 |
+---------+
osquery> select * from extended_attributes where path = '/Users/ignacior/Downloads';
+---------------------------+-----------------+----------------+--------------------------------------+--------+
| path | directory | key | value | base64 |
+---------------------------+-----------------+----------------+--------------------------------------+--------+
| /Users/ignacior/Downloads | /Users/ignacior | com.apple.macl | BAAoRInjLHdMmrS8U0MEzCBvBADshbS1+VFBm| 1 |
+---------------------------+-----------------+----------------+--------------------------------------+--------+
I'm happy to change it though if it's better to keep path
and directory
the same.
@zwass I applied some of your suggested changes. I will take a look at having the tests create a file later this week. |
} | ||
} | ||
|
||
QueryData genAds(QueryContext& context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little unsure of how the path
and directory
expansion logic work.
I think the intent is when one can query a directory, and get back all the files inside. (Similar to the file
table). Or one can query a specific path. (And both probably support LIKE
). This makes sense to me.
But the implementation enumerates the the path
first, and then the directory
. And I think that will result in equal work being done. Sometimes? I guess it's confusing.
If both are part of the query predicate, sqlite will filter to only return rows that match both. Eg select * from ads where directory = '/tmp/' AND path = '/var/tmp/foo'
would result first in enumerating /var/tmp/foo
, then all the files in /tmp
, and would ultimately return nothing, because sqlite filtered them.
Though, to correct myself, that's not at all true there's an OR
in there.
I guess I'd suggest a reasonable pattern is to generate the list of things to enumerate, and then find the union of them. This would, at least, prevent duplicate enumeration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is a copy paste from the extended_attributes table since it was from where I based by work form. I agree is a bit confusing 😅
I can take a look into making it more efficient.
Column("path", TEXT, "Absolute file path", required=True, index=True), | ||
Column("directory", TEXT, "Directory of file(s)", required=True), | ||
Column("key", TEXT, "Name of the value generated from the stream"), | ||
Column("value", TEXT, "The parsed information from the attribute"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of data shows up here? Some internet things talk about malware smuggling entire file contents here. Will that be okay to push back in a column? (We don't generally push that much data through osquery, so it feels a little amiss)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agree. I think the main value of this table is the content of the Zone.Identifier
stream which can help during investigations to identify where the file was downloaded from.
In my PR description I left an open question about how to handle cases where a stream contains an entire file. Maybe we can set a hard limit on the length of the content and warn users if the content is too large to be displayed by osquery?
Happy to hear other thoughts.
I'm also not sure how the extended_attributes
table handles this type of cases in nix systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure how the
extended_attributes
table handles this type of cases in nix systems.
The possible max value size is much more limited: https://en.wikipedia.org/wiki/Extended_file_attributes
The Linux kernel allows extended attribute to have names of up to 255 bytes and values of up to 64 KiB, as do XFS and ReiserFS, but ext2/3/4 and btrfs impose much smaller limits, requiring all the attributes (names and values) of one file to fit in one "filesystem block" (usually 4 KiB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it probably depends on what we're concerned about.
If it's content, the most conservative approach would be to only fetch the value
for keys on a compiled allowlist. (pushing people to use carves if they want the content)
If it's size, truncation and warnings probably make sense.
This PR adds a new table called
ads
to query Alternate Data Streams (ADS) in Windows. Resolves #5250The table uses FindFirstStreamW and FindNextStreamW to enumerate the streams in a file and read the content.
If the stream name is
Zone.Identifier
it parses the values in a separate function.I tried to keep the logic similar to the
extended_attributes
table and how it handles thekMDItemWhereFroms
andquarantine
attributes.Screenshots
Additional resources
Practical Guide to Alternative Data Streams in NTFS
Alternate Data Streams Documentation
Highway To The Danger Zone.Identifier
About URL Security Zones
Open question
NTFS ADS also support executable files, for example we can hide the notepad executable in
test.txt
.As of now the table will base64 encode the binary but would love some guidance on what would be the best option for this cases.