You can integrate YARA into your C/C++ project by using the API provided by the libyara library. This API gives you access to every YARA feature, and it's the same API used by the command-line tools yara and yarac.
The first thing your program must do when using libyara is initialize the library. This is done by calling the yr_initialize() function, which allocates any resources needed by the library and initializes internal data structures. Its counterpart is yr_finalize(), which must be called when you are finished using the library.
In a multi-threaded program only the main thread must call yr_initialize() and yr_finalize(), but any additional thread using the library must call yr_finalize_thread() before exiting.
Before using your rules to scan any data you need to compile them into binary form. For that purpose you'll need a YARA compiler, which can be created with yr_compiler_create(). After being used, the compiler must be destroyed with yr_compiler_destroy().
You can use either yr_compiler_add_file() or yr_compiler_add_string() to add one or more input sources to be compiled. Both of these functions receive an optional namespace. Rules added under the same namespace behave as if they were contained within the same source file or string, so rule identifiers must be unique among all the sources sharing a namespace. If the namespace argument is NULL, the rules are put in the default namespace.
Both yr_compiler_add_file() and yr_compiler_add_string() return the number of errors found in the source code. If the rules are correct they will return 0. For more detailed error information you must set a callback function by using yr_compiler_set_callback() before calling yr_compiler_add_file() or yr_compiler_add_string(). The callback function has the following prototype:
    void callback_function(
        int error_level,
        const char* file_name,
        int line_number,
        const char* message);
Possible values for error_level are YARA_ERROR_LEVEL_ERROR and YARA_ERROR_LEVEL_WARNING. The arguments file_name and line_number contain the file name and line number where the error or warning occurs. file_name is the one passed to yr_compiler_add_file(). It can be NULL if you passed NULL or if you're using yr_compiler_add_string().
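As an illustration of the prototype above, here is a minimal sketch of a compiler callback that prints each diagnostic and keeps a running count. The names report_compile_problem, error_count and warning_count, as well as the USE_LIBYARA build switch, are hypothetical. The numeric stand-in values are assumptions that only exist so the sketch compiles without libyara; with the real library the constants come from yara.h. Because this prototype carries no user-data pointer, the counters live in globals.

```c
#include <stdio.h>

#ifdef USE_LIBYARA               /* hypothetical build switch */
#include <yara.h>
#else
/* Stand-in values (assumed) so the sketch compiles without libyara. */
#define YARA_ERROR_LEVEL_ERROR   0
#define YARA_ERROR_LEVEL_WARNING 1
#endif

/* Hypothetical counters; the prototype has no user-data argument. */
int error_count = 0;
int warning_count = 0;

/* Follows the four-argument callback prototype described in the text. */
void report_compile_problem(
    int error_level,
    const char* file_name,
    int line_number,
    const char* message)
{
  const char* kind;

  if (error_level == YARA_ERROR_LEVEL_ERROR)
  {
    kind = "error";
    error_count++;
  }
  else
  {
    kind = "warning";
    warning_count++;
  }

  /* file_name may be NULL, e.g. when the source was added as a string. */
  fprintf(stderr, "%s(%d): %s: %s\n",
      file_name != NULL ? file_name : "<string>",
      line_number, kind, message);
}
```

A callback like this would be registered with yr_compiler_set_callback() before any source is added, so that every diagnostic produced while compiling reaches it.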
After you have successfully added some sources you can get the compiled rules using the yr_compiler_get_rules() function. You'll get a pointer to a YR_RULES structure which can be used to scan your data as described in the section on scanning data. Once yr_compiler_get_rules() is invoked you cannot add more sources to the compiler, but you can get multiple instances of the compiled rules by calling yr_compiler_get_rules() multiple times.
Each instance of YR_RULES must be destroyed with yr_rules_destroy().
Compiled rules can be saved to a file and retrieved later by using yr_rules_save() and yr_rules_load(). Rules compiled and saved on one machine can be loaded on another machine as long as both have the same endianness, regardless of the operating system or whether they are 32-bit or 64-bit systems. However, files saved with older versions of YARA may not work with newer versions due to changes in the file layout.
Once you have an instance of YR_RULES you can use it to scan data either from a file or a memory buffer with yr_rules_scan_file() and yr_rules_scan_mem() respectively. The results of the scan are reported to your program via a callback function. The callback has the following prototype:
    int callback_function(
        int message,
        void* message_data,
        void* user_data);
Possible values for message are:
CALLBACK_MSG_RULE_MATCHING
CALLBACK_MSG_RULE_NOT_MATCHING
CALLBACK_MSG_SCAN_FINISHED
CALLBACK_MSG_IMPORT_MODULE
Your callback function will be called once for each existing rule with either a CALLBACK_MSG_RULE_MATCHING or CALLBACK_MSG_RULE_NOT_MATCHING message, depending on whether the rule matches or not. In both cases a pointer to the YR_RULE structure associated with the rule is passed in the message_data argument. You just need to cast it from void* to YR_RULE* to access the structure.
The callback is also called once for each imported module, with the CALLBACK_MSG_IMPORT_MODULE message. In this case message_data points to a YR_MODULE_IMPORT structure. This structure contains a module_name field pointing to a null-terminated string with the name of the module being imported, and two other fields, module_data and module_data_size. These fields are initially set to NULL and 0, but your program can assign a pointer to some arbitrary data to module_data while setting module_data_size to the size of the data. This way you can pass additional data to the modules that require it, such as the Cuckoo module.
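The module data hand-off can be sketched as a small helper that the scan callback would call when it receives CALLBACK_MSG_IMPORT_MODULE. The helper name attach_module_data, its parameters and the USE_LIBYARA build switch are hypothetical. The stand-in structure only mirrors the three fields named in the text and is an assumption about layout; with the real library the definition comes from yara.h.

```c
#include <stddef.h>
#include <string.h>

#ifdef USE_LIBYARA               /* hypothetical build switch */
#include <yara.h>
#else
/* Stand-in with the three fields named in the text (assumed layout). */
typedef struct
{
  const char* module_name;
  void* module_data;
  size_t module_data_size;
} YR_MODULE_IMPORT;
#endif

/*
 * Hypothetical helper, meant to be called from the scan callback when the
 * message is CALLBACK_MSG_IMPORT_MODULE. Attaches `data` only if the module
 * being imported is `wanted_module`. Returns 1 if data was attached, 0 if not.
 */
int attach_module_data(
    void* message_data,
    const char* wanted_module,
    void* data,
    size_t data_size)
{
  YR_MODULE_IMPORT* mi = (YR_MODULE_IMPORT*) message_data;

  if (strcmp(mi->module_name, wanted_module) != 0)
    return 0;

  mi->module_data = data;
  mi->module_data_size = data_size;

  return 1;
}
```

Checking module_name first keeps the data from being handed to a module that does not expect it when a rule set imports several modules.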
Lastly, the callback function is also called with the CALLBACK_MSG_SCAN_FINISHED message when the scan is finished. In this case message_data is NULL.
In all cases the user_data argument is the same pointer that was passed to yr_rules_scan_file() or yr_rules_scan_mem(). This pointer is not touched by YARA; it's just a way for your program to pass arbitrary data to the callback function.
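To show how user_data can carry state, the following sketch of a scan callback tallies the messages it receives into a structure supplied by the caller. The scan_stats type, the count_results name and the USE_LIBYARA build switch are hypothetical. The numeric stand-in values, and the use of CALLBACK_CONTINUE as the return value that lets the scan proceed, are assumptions not stated in the text above; with the real library these constants come from yara.h.

```c
#include <stddef.h>

#ifdef USE_LIBYARA               /* hypothetical build switch */
#include <yara.h>
#else
/* Stand-in values (assumed) so the sketch compiles without libyara. */
#define CALLBACK_MSG_RULE_MATCHING      1
#define CALLBACK_MSG_RULE_NOT_MATCHING  2
#define CALLBACK_MSG_SCAN_FINISHED      3
#define CALLBACK_MSG_IMPORT_MODULE      4
#define CALLBACK_CONTINUE               0
#endif

/* Hypothetical per-scan state, passed to the scan function as user_data. */
typedef struct
{
  int matching;
  int not_matching;
  int finished;
} scan_stats;

/* Follows the three-argument scan callback prototype described in the text. */
int count_results(int message, void* message_data, void* user_data)
{
  scan_stats* stats = (scan_stats*) user_data;

  /* For the two rule messages this would be a YR_RULE*; unused here. */
  (void) message_data;

  switch (message)
  {
    case CALLBACK_MSG_RULE_MATCHING:
      stats->matching++;
      break;
    case CALLBACK_MSG_RULE_NOT_MATCHING:
      stats->not_matching++;
      break;
    case CALLBACK_MSG_SCAN_FINISHED:
      stats->finished = 1;
      break;
    default:
      /* CALLBACK_MSG_IMPORT_MODULE and anything else: nothing to count. */
      break;
  }

  return CALLBACK_CONTINUE;
}
```

Keeping the counters in a caller-owned structure rather than in globals means several scans can run with independent state, each passing its own scan_stats as user_data.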
Both yr_rules_scan_file() and yr_rules_scan_mem() receive a flags argument and a timeout argument. The only flag defined at this time is SCAN_FLAGS_FAST_MODE, so you must pass either this flag or zero. The timeout argument forces the function to return after approximately the specified number of seconds, with zero meaning no timeout at all.
The SCAN_FLAGS_FAST_MODE flag makes scanning a little faster by avoiding multiple matches of the same string when they are not necessary. Once a string has been found in the file it is subsequently ignored, so you'll get a single match for the string even if it appears multiple times in the scanned data. This flag has the same effect as the -f command-line option described in the command-line documentation.