Skip to content

Commit

Permalink
fix existential issue with SCM aware queries
Browse files Browse the repository at this point in the history
Summary:
This was a bit of a thorny issue to run down.  What was happening was
this: when source control aware mode is enabled we subvert the normal recency
index and use the `pathGenerator` rather than the `timeGenerator`.

We pass the list of files that we got back from the source control system to
the path generator and let it evaluate the query based on information that
we've observed in the InMemoryView.

If watchman has never observed some components of the paths returned by the
source control query, such as in the case where watchman was restarted after
checking out the revision, and we're asked about changes since an earlier hash,
the `pathGenerator` will not be able to furnish information about the paths
returned by source control.   This means that we are potentially inaccurate for
files that exist on the filesystem (although we will likely soon observe the
current state of those files), but more critically, we cannot return
information about files that were deleted in between the commits in the query
range.

The resolution is not to use the InMemoryView as the source of information
about these files.

This diff adds a LocalFileResult for just this purpose; this is an
implementation of the FileResult class that knows how to stat the local
filesystem as needed to furnish the metadata that may be needed by the query.
I'm not thrilled that it needs to stat the files, but this the simplest way to
obtain correct results and we can figure out if we need to do better than this
in practice as a follow up.

I've also broken out the guts of `ContentHashCache::computeHashImmediate`
as a static function so that we can call it from `LocalFileResult`.
Ideally we'd find a way to plumb in the cache that is held in the
`InMemoryView`, but we're operating at a different layer and cannot
guarantee that it is present; it isn't present for Eden and we still
need to solve the hash computation issue for Eden.  Those are out of
scope wrt. resolving this issue in the short term.

Reviewed By: simpkins

Differential Revision: D7142088

fbshipit-source-id: 35b7d7745e30adabb8b9a7b3713d6ba3500b3552
  • Loading branch information
wez authored and facebook-github-bot committed Mar 6, 2018
1 parent f6cbdc4 commit 72950c2
Show file tree
Hide file tree
Showing 8 changed files with 175 additions and 20 deletions.
13 changes: 9 additions & 4 deletions ContentHash.cpp
Expand Up @@ -46,13 +46,11 @@ Future<std::shared_ptr<const Node>> ContentHashCache::get(
key, [this](const ContentHashCacheKey& k) { return computeHash(k); });
}

HashValue ContentHashCache::computeHashImmediate(
const ContentHashCacheKey& key) const {
HashValue ContentHashCache::computeHashImmediate(const char* fullPath) {
HashValue result;
uint8_t buf[8192];

auto fullPath = w_string::pathCat({rootPath_, key.relativePath});
auto stm = w_stm_open(fullPath.c_str(), O_RDONLY);
auto stm = w_stm_open(fullPath, O_RDONLY);
if (!stm) {
throw std::system_error(
errno,
Expand Down Expand Up @@ -130,6 +128,13 @@ HashValue ContentHashCache::computeHashImmediate(
GetLastError(), std::system_category(), "CryptGetHashParam HP_HASHVAL");
}
#endif
return result;
}

HashValue ContentHashCache::computeHashImmediate(
const ContentHashCacheKey& key) const {
auto fullPath = w_string::pathCat({rootPath_, key.relativePath});
auto result = computeHashImmediate(fullPath.c_str());

// Since TOCTOU is everywhere and everything, double check to make sure that
// the file looks like we were expecting at the start. If it isn't, then
Expand Down
5 changes: 5 additions & 0 deletions ContentHash.h
Expand Up @@ -56,6 +56,11 @@ class ContentHashCache {
// Throws exceptions for any errors that may occur.
HashValue computeHashImmediate(const ContentHashCacheKey& key) const;

// Compute the hash value for a given input.
// This will block the calling thread while the I/O is performed.
// Throws exceptions for any errors that may occur.
static HashValue computeHashImmediate(const char* fullPath);

// Compute the hash value for a given input via the thread pool.
// Returns a future to operate on the result of this async operation
Future<HashValue> computeHash(const ContentHashCacheKey& key) const;
Expand Down
65 changes: 65 additions & 0 deletions LocalFileResult.cpp
@@ -0,0 +1,65 @@
#include "LocalFileResult.h"
#include "ContentHash.h"
#include "watchman_error_category.h"
namespace watchman {

LocalFileResult::LocalFileResult(
const std::shared_ptr<w_root_t>& root,
w_string_piece path,
w_clock_t clock)
: root_(root), path_(path.asWString()), clock_(clock) {}

void LocalFileResult::getInfo() const {
if (!needInfo_) {
return;
}
needInfo_ = false;
try {
info_ = getFileInformation(path_.c_str(), root_->case_sensitive);
exists_ = true;
} catch (const std::exception&) {
// Treat any error as effectively deleted
exists_ = false;
}
}

const watchman::FileInformation& LocalFileResult::stat() const {
getInfo();
return info_;
}

w_string_piece LocalFileResult::baseName() const {
return w_string_piece(path_).baseName();
}

w_string_piece LocalFileResult::dirName() {
return w_string_piece(path_).dirName();
}

bool LocalFileResult::exists() const {
getInfo();
return exists_;
}

watchman::Future<w_string> LocalFileResult::readLink() const {
auto absPath = w_string::pathCat({root_->root_path, path_});
return makeFuture(readSymbolicLink(absPath.c_str()));
}

const w_clock_t& LocalFileResult::ctime() const {
return clock_;
}

const w_clock_t& LocalFileResult::otime() const {
return clock_;
}

watchman::Future<FileResult::ContentHash> LocalFileResult::getContentSha1() {
// TODO: find a way to reference a ContentHashCache instance
// that will work with !InMemoryView based views.
return makeFuture(makeResultWith([&] {
auto absPath = w_string::pathCat({root_->root_path, path_});
return ContentHashCache::computeHashImmediate(absPath.c_str());
}));
}
} // namespace watchman
54 changes: 54 additions & 0 deletions LocalFileResult.h
@@ -0,0 +1,54 @@
#pragma once
#include "thirdparty/jansson/jansson.h"
#include "watchman.h"
#include "watchman_query.h"
#include "watchman_string.h"

namespace watchman {

/** A FileResult that exists on the local filesystem.
* This differs from InMemoryFileResult in that we don't maintain any
* long-lived persistent information about the file and that the methods of
* this instance will query the local filesystem to discover the information as
* it is accessed.
* We do cache the results of any filesystem operations that we may perform
* for the duration of the lifetime of a given LocalFileResult, but that
* information is not shared beyond that lifetime.
* FileResult objects are typically extremely short lived, existing between
* the point in time at which a file is matched by a query and the time
* at which the file is rendered into the results of the query.
*/
class LocalFileResult : public FileResult {
public:
LocalFileResult(
const std::shared_ptr<w_root_t>& root_,
w_string_piece path,
w_clock_t clock);
const watchman::FileInformation& stat() const override;
// Returns the name of the file in its containing dir
w_string_piece baseName() const override;
// Returns the name of the containing dir relative to the
// VFS root
w_string_piece dirName() override;
// Returns true if the file currently exists
bool exists() const override;
// Returns the symlink target
watchman::Future<w_string> readLink() const override;

const w_clock_t& ctime() const override;
const w_clock_t& otime() const override;

// Returns the SHA-1 hash of the file contents
watchman::Future<FileResult::ContentHash> getContentSha1() override;

private:
void getInfo() const;
mutable bool needInfo_{true};
mutable bool exists_{true};
mutable FileInformation info_;
std::shared_ptr<w_root_t> root_;
w_string path_;
w_clock_t clock_;
};

} // namespace watchman
1 change: 1 addition & 0 deletions Makefile.am
Expand Up @@ -18,6 +18,7 @@ watchman_SOURCES = \
CookieSync.cpp \
FileDescriptor.cpp \
FileInformation.cpp \
LocalFileResult.cpp \
InMemoryView.cpp \
Pipe.cpp \
PubSub.cpp \
Expand Down
33 changes: 18 additions & 15 deletions query/eval.cpp
Expand Up @@ -2,7 +2,11 @@
* Licensed under the Apache License, Version 2.0 */

#include "watchman.h"

#include "LocalFileResult.h"
#include "watchman_scopeguard.h"

using namespace watchman;
using watchman::Result;
using watchman::Future;
using watchman::collectAll;
Expand Down Expand Up @@ -283,22 +287,21 @@ w_query_res w_query_execute(
json_array_append_new(pathList, w_string_to_json(f));
}

// Re-cast this as a path-generator query
auto altQuerySpec = json_copy(query->query_spec);

altQuerySpec.object().erase("since");
altQuerySpec.set("path", std::move(pathList));

// And switch us over to run the rest of the query on this one
altQuery = w_query_parse(root, altQuerySpec);
query = altQuery.get();
disableFreshInstance = true;
// We may have been called with a custom generator; we don't need to use
// that for this case, so make sure that we use the default generator
// so that it will actually execute using the pathGenerator.
generator = [](w_query* q,
const std::shared_ptr<w_root_t>& r,
struct w_query_ctx* c) { r->view()->pathGenerator(q, c); };
generator = [pathList](
w_query* q,
const std::shared_ptr<w_root_t>& r,
struct w_query_ctx* c) {
auto spec = r->view()->getMostRecentRootNumberAndTickValue();
w_clock_t clock{};
clock.ticks = spec.ticks;
time(&clock.timestamp);
for (auto& pathEntry : pathList.array()) {
auto path = json_to_w_string(pathEntry);
w_query_process_file(
q, c, watchman::make_unique<LocalFileResult>(r, path, clock));
}
};
}
}

Expand Down
23 changes: 22 additions & 1 deletion tests/integration/test_scm.py
Expand Up @@ -16,7 +16,7 @@
if pywatchman.compat.PYTHON3:
STRING_TYPES = (str, bytes)
else:
STRING_TYPES = (str, unicode)
STRING_TYPES = (str, unicode) # noqa: F821


@WatchmanTestCase.expand_matrix
Expand Down Expand Up @@ -318,6 +318,27 @@ def test_scmHg(self):
self.assertEqual(dat[-1]['clock']['scm']['mergebase'], mergeBaseInitial)
self.assertFileListsEqual(self.getConsolidatedFileList(dat), [])

# Ensure that we reported deleted files correctly, even
# if we've never seen the files before. To do this, we're
# going to cancel the watch and restart it, so this is broken
# out separately from the earlier tests against feature3
self.hg(['co', '-C', 'feature3'], cwd=root)
self.watchmanCommand('watch-del', root)
self.watchmanCommand('watch', root)
res = self.watchmanCommand('query', root, {
'expression': ['not',
['anyof',
['name', '.hg'],
['match', 'hg-check*'],
['dirname', '.hg']]],
'fields': ['name'],
'since': {
'scm': {
'mergebase': '',
'mergebase-with': 'TheMaster'}}})
self.assertFileListsEqual(res['files'], ['car'])


def getConsolidatedFileList(self, dat):
fset = set()
for _ in dat:
Expand Down
1 change: 1 addition & 0 deletions winbuild/Makefile
Expand Up @@ -32,6 +32,7 @@ SRCS_CPP=\
CookieSync.cpp \
FileDescriptor.cpp \
FileInformation.cpp \
LocalFileResult.cpp \
InMemoryView.cpp \
Pipe.cpp \
PubSub.cpp \
Expand Down

0 comments on commit 72950c2

Please sign in to comment.