Skip to content

Commit

Permalink
[analyzer] Make recognition of hardened __FOO_chk functions explicit (#…
Browse files Browse the repository at this point in the history
…86536)

In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.

Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.

Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.

This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.

This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).

The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.

After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.

This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.

After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.

This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.

As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
  • Loading branch information
NagyDonat and steakhal committed Apr 5, 2024
1 parent c5d000b commit fb299ca
Show file tree
Hide file tree
Showing 7 changed files with 217 additions and 131 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -32,19 +32,22 @@ namespace ento {
class CallDescription {
public:
enum class Mode {
/// Match calls to functions from the C standard library. On some platforms
/// some functions may be implemented as macros that expand to calls to
/// built-in variants of the given functions, so in this mode we use some
/// heuristics to recognize these implementation-defined variants:
/// - We also accept calls where the name is derived from the specified
/// name by adding "__builtin" or similar prefixes/suffixes.
/// - We also accept calls where the number of arguments or parameters is
/// greater than the specified value.
/// Match calls to functions from the C standard library. This also
/// recognizes builtin variants whose name is derived by adding
/// "__builtin", "__inline" or similar prefixes or suffixes; but only
/// matches functions than are externally visible and are declared either
/// directly within a TU or in the namespace 'std'.
/// For the exact heuristics, see CheckerContext::isCLibraryFunction().
/// (This mode only matches functions that are declared either directly
/// within a TU or in the namespace `std`.)
CLibrary,

/// An extended version of the `CLibrary` mode that also matches the
/// hardened variants like __FOO_chk() and __builtin__FOO_chk() that take
/// additional arguments compared to the "regular" function FOO().
/// This is not the default behavior of `CLibrary` because in this case the
/// checker code must be prepared to handle the different parametrization.
/// For the exact heuristics, see CheckerContext::isHardenedVariantOf().
CLibraryMaybeHardened,

/// Matches "simple" functions that are not methods. (Static methods are
/// methods.)
SimpleFunc,
Expand Down Expand Up @@ -187,6 +190,9 @@ class CallDescription {
private:
bool matchesImpl(const FunctionDecl *Callee, size_t ArgCount,
size_t ParamCount) const;

bool matchNameOnly(const NamedDecl *ND) const;
bool matchQualifiedNameParts(const Decl *D) const;
};

/// An immutable map from CallDescriptions to arbitrary data. Provides a unified
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -366,19 +366,31 @@ class CheckerContext {
return getCalleeName(FunDecl);
}

/// Returns true if the callee is an externally-visible function in the
/// top-level namespace, such as \c malloc.
/// Returns true if the given function is an externally-visible function in
/// the top-level namespace, such as \c malloc.
///
/// If a name is provided, the function must additionally match the given
/// name.
///
/// Note that this deliberately excludes C++ library functions in the \c std
/// namespace, but will include C library functions accessed through the
/// \c std namespace. This also does not check if the function is declared
/// as 'extern "C"', or if it uses C++ name mangling.
/// Note that this also accepts functions from the \c std namespace (because
/// headers like <cstdlib> declare them there) and does not check if the
/// function is declared as 'extern "C"' or if it uses C++ name mangling.
static bool isCLibraryFunction(const FunctionDecl *FD,
StringRef Name = StringRef());

/// In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
/// functions are implemented as macros that expand to calls of hardened
/// functions that take additional arguments compared to the "usual"
/// variant and perform additional input validation. For example, a `memcpy`
/// call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
///
/// This method returns true if `FD` declares a fortified variant of the
/// standard library function `Name`.
///
/// NOTE: This method relies on heuristics; extend it if you need to handle a
/// hardened variant that's not yet covered by it.
static bool isHardenedVariantOf(const FunctionDecl *FD, StringRef Name);

/// Depending on wither the location corresponds to a macro, return
/// either the macro name or the token spelling.
///
Expand Down
76 changes: 48 additions & 28 deletions clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -124,34 +124,45 @@ class CStringChecker : public Checker< eval::Call,
const CallEvent &)>;

CallDescriptionMap<FnCheck> Callbacks = {
{{CDM::CLibrary, {"memcpy"}, 3},
{{CDM::CLibraryMaybeHardened, {"memcpy"}, 3},
std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Regular)},
{{CDM::CLibrary, {"wmemcpy"}, 3},
{{CDM::CLibraryMaybeHardened, {"wmemcpy"}, 3},
std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Wide)},
{{CDM::CLibrary, {"mempcpy"}, 3},
{{CDM::CLibraryMaybeHardened, {"mempcpy"}, 3},
std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Regular)},
{{CDM::Unspecified, {"wmempcpy"}, 3},
{{CDM::CLibraryMaybeHardened, {"wmempcpy"}, 3},
std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Wide)},
{{CDM::CLibrary, {"memcmp"}, 3},
std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
{{CDM::CLibrary, {"wmemcmp"}, 3},
std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Wide)},
{{CDM::CLibrary, {"memmove"}, 3},
{{CDM::CLibraryMaybeHardened, {"memmove"}, 3},
std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Regular)},
{{CDM::CLibrary, {"wmemmove"}, 3},
{{CDM::CLibraryMaybeHardened, {"wmemmove"}, 3},
std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Wide)},
{{CDM::CLibrary, {"memset"}, 3}, &CStringChecker::evalMemset},
{{CDM::CLibraryMaybeHardened, {"memset"}, 3},
&CStringChecker::evalMemset},
{{CDM::CLibrary, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
{{CDM::CLibrary, {"strcpy"}, 2}, &CStringChecker::evalStrcpy},
{{CDM::CLibrary, {"strncpy"}, 3}, &CStringChecker::evalStrncpy},
{{CDM::CLibrary, {"stpcpy"}, 2}, &CStringChecker::evalStpcpy},
{{CDM::CLibrary, {"strlcpy"}, 3}, &CStringChecker::evalStrlcpy},
{{CDM::CLibrary, {"strcat"}, 2}, &CStringChecker::evalStrcat},
{{CDM::CLibrary, {"strncat"}, 3}, &CStringChecker::evalStrncat},
{{CDM::CLibrary, {"strlcat"}, 3}, &CStringChecker::evalStrlcat},
{{CDM::CLibrary, {"strlen"}, 1}, &CStringChecker::evalstrLength},
// FIXME: C23 introduces 'memset_explicit', maybe also model that
{{CDM::CLibraryMaybeHardened, {"strcpy"}, 2},
&CStringChecker::evalStrcpy},
{{CDM::CLibraryMaybeHardened, {"strncpy"}, 3},
&CStringChecker::evalStrncpy},
{{CDM::CLibraryMaybeHardened, {"stpcpy"}, 2},
&CStringChecker::evalStpcpy},
{{CDM::CLibraryMaybeHardened, {"strlcpy"}, 3},
&CStringChecker::evalStrlcpy},
{{CDM::CLibraryMaybeHardened, {"strcat"}, 2},
&CStringChecker::evalStrcat},
{{CDM::CLibraryMaybeHardened, {"strncat"}, 3},
&CStringChecker::evalStrncat},
{{CDM::CLibraryMaybeHardened, {"strlcat"}, 3},
&CStringChecker::evalStrlcat},
{{CDM::CLibraryMaybeHardened, {"strlen"}, 1},
&CStringChecker::evalstrLength},
{{CDM::CLibrary, {"wcslen"}, 1}, &CStringChecker::evalstrLength},
{{CDM::CLibrary, {"strnlen"}, 2}, &CStringChecker::evalstrnLength},
{{CDM::CLibraryMaybeHardened, {"strnlen"}, 2},
&CStringChecker::evalstrnLength},
{{CDM::CLibrary, {"wcsnlen"}, 2}, &CStringChecker::evalstrnLength},
{{CDM::CLibrary, {"strcmp"}, 2}, &CStringChecker::evalStrcmp},
{{CDM::CLibrary, {"strncmp"}, 3}, &CStringChecker::evalStrncmp},
Expand All @@ -162,9 +173,19 @@ class CStringChecker : public Checker< eval::Call,
{{CDM::CLibrary, {"bcmp"}, 3},
std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
{{CDM::CLibrary, {"bzero"}, 2}, &CStringChecker::evalBzero},
{{CDM::CLibrary, {"explicit_bzero"}, 2}, &CStringChecker::evalBzero},
{{CDM::CLibrary, {"sprintf"}, 2}, &CStringChecker::evalSprintf},
{{CDM::CLibrary, {"snprintf"}, 2}, &CStringChecker::evalSnprintf},
{{CDM::CLibraryMaybeHardened, {"explicit_bzero"}, 2},
&CStringChecker::evalBzero},

// When recognizing calls to the following variadic functions, we accept
// any number of arguments in the call (std::nullopt = accept any
// number), but check that in the declaration there are 2 and 3
// parameters respectively. (Note that the parameter count does not
// include the "...". Calls where the number of arguments is too small
// will be discarded by the callback.)
{{CDM::CLibraryMaybeHardened, {"sprintf"}, std::nullopt, 2},
&CStringChecker::evalSprintf},
{{CDM::CLibraryMaybeHardened, {"snprintf"}, std::nullopt, 3},
&CStringChecker::evalSnprintf},
};

// These require a bit of special handling.
Expand Down Expand Up @@ -218,7 +239,7 @@ class CStringChecker : public Checker< eval::Call,
void evalSprintf(CheckerContext &C, const CallEvent &Call) const;
void evalSnprintf(CheckerContext &C, const CallEvent &Call) const;
void evalSprintfCommon(CheckerContext &C, const CallEvent &Call,
bool IsBounded, bool IsBuiltin) const;
bool IsBounded) const;

// Utility methods
std::pair<ProgramStateRef , ProgramStateRef >
Expand Down Expand Up @@ -2467,27 +2488,26 @@ void CStringChecker::evalBzero(CheckerContext &C, const CallEvent &Call) const {
void CStringChecker::evalSprintf(CheckerContext &C,
const CallEvent &Call) const {
CurrentFunctionDescription = "'sprintf'";
const auto *CE = cast<CallExpr>(Call.getOriginExpr());
bool IsBI = CE->getBuiltinCallee() == Builtin::BI__builtin___sprintf_chk;
evalSprintfCommon(C, Call, /* IsBounded */ false, IsBI);
evalSprintfCommon(C, Call, /* IsBounded = */ false);
}

void CStringChecker::evalSnprintf(CheckerContext &C,
const CallEvent &Call) const {
CurrentFunctionDescription = "'snprintf'";
const auto *CE = cast<CallExpr>(Call.getOriginExpr());
bool IsBI = CE->getBuiltinCallee() == Builtin::BI__builtin___snprintf_chk;
evalSprintfCommon(C, Call, /* IsBounded */ true, IsBI);
evalSprintfCommon(C, Call, /* IsBounded = */ true);
}

void CStringChecker::evalSprintfCommon(CheckerContext &C, const CallEvent &Call,
bool IsBounded, bool IsBuiltin) const {
bool IsBounded) const {
ProgramStateRef State = C.getState();
const auto *CE = cast<CallExpr>(Call.getOriginExpr());
DestinationArgExpr Dest = {{Call.getArgExpr(0), 0}};

const auto NumParams = Call.parameters().size();
assert(CE->getNumArgs() >= NumParams);
if (CE->getNumArgs() < NumParams) {
// This is an invalid call, let's just ignore it.
return;
}

const auto AllArguments =
llvm::make_range(CE->getArgs(), CE->getArgs() + CE->getNumArgs());
Expand Down
27 changes: 15 additions & 12 deletions clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -718,20 +718,23 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
{{{"isupper"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{{"isxdigit"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},

{{CDM::CLibrary, {BI.getName(Builtin::BIstrncat)}},
{{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrncat)}},
TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrlcpy)}},
{{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrlcpy)}},
TR::Prop({{1, 2}}, {{0}})},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrlcat)}},
{{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrlcat)}},
TR::Prop({{1, 2}}, {{0}})},
{{CDM::CLibrary, {{"snprintf"}}},
{{CDM::CLibraryMaybeHardened, {{"snprintf"}}},
TR::Prop({{1}, 3}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"sprintf"}}},
{{CDM::CLibraryMaybeHardened, {{"sprintf"}}},
TR::Prop({{1}, 2}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"strcpy"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"stpcpy"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"strcat"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"wcsncat"}}},
{{CDM::CLibraryMaybeHardened, {{"strcpy"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibraryMaybeHardened, {{"stpcpy"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibraryMaybeHardened, {{"strcat"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibraryMaybeHardened, {{"wcsncat"}}},
TR::Prop({{1}}, {{0, ReturnValueIndex}})},
{{CDM::CLibrary, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
{{CDM::CLibrary, {{"strdupa"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
Expand Down Expand Up @@ -759,13 +762,13 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},

// SinkProps
{{CDM::CLibrary, BI.getName(Builtin::BImemcpy)},
{{CDM::CLibraryMaybeHardened, BI.getName(Builtin::BImemcpy)},
TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDM::CLibrary, {BI.getName(Builtin::BImemmove)}},
{{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BImemmove)}},
TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrncpy)}},
{{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrncpy)}},
TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
MsgTaintedBufferSize)},
{{CDM::CLibrary, {BI.getName(Builtin::BIstrndup)}},
Expand Down

0 comments on commit fb299ca

Please sign in to comment.