Create your own feature extractor class

Ayoub Benaissa edited this page Jul 19, 2019 · 5 revisions

Feature extractors are grouped according to the kind of feature they extract. There are currently three packages: one for PE features, one for ELF features, and one for features common to all formats. Choose the package your feature extractor belongs in, or create a new one.

Implementing the feature extractor class

Your class should extend the BaseFeature class:

class BaseFeature(object):

    name = ''
    is_image = False

    def can_extract(self, raw_exe):
        return True

    def extract_features(self, raw_exe):
        raise NotImplementedError

The name attribute should describe the features extracted by this class.

The is_image boolean attribute should indicate whether this class extracts an image.

The can_extract() method should report whether the feature can be extracted from the raw executable, so that the extractor can handle that case differently.
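As an illustration of overriding can_extract(), here is a minimal sketch. It assumes that raw_exe is a bytes object, which this page does not state explicitly. The PEOnlySize class and its check of the 'MZ' magic bytes are hypothetical and not part of the project. BaseFeature is repeated from the definition above only so that the snippet runs on its own.

```python
class BaseFeature(object):
    """Stand-in for the project's base class, repeated from the definition above."""

    name = ''
    is_image = False

    def can_extract(self, raw_exe):
        return True

    def extract_features(self, raw_exe):
        raise NotImplementedError


class PEOnlySize(BaseFeature):
    """Hypothetical extractor that only applies to PE-like files."""

    name = 'pe_only_size'

    def can_extract(self, raw_exe):
        # Assumption: raw_exe is a bytes object.
        # PE files begin with the DOS header magic bytes 'MZ'.
        return raw_exe[:2] == b'MZ'

    def extract_features(self, raw_exe):
        return {'size_in_bytes': len(raw_exe)}


extractor = PEOnlySize()
pe_like = b'MZ' + b'\x00' * 62        # 64 bytes starting with the PE magic
elf_like = b'\x7fELF' + b'\x00' * 12  # starts with the ELF magic instead

print(extractor.can_extract(pe_like))   # True
print(extractor.can_extract(elf_like))  # False
```

A cheap check like this lets the caller skip a file without attempting a full parse. A real extractor might prefer to rely on the parsed lief binary instead.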

The extract_features() method should contain the main logic for extracting a single feature or a set of common features from the raw executable. We use lief to parse executables, and we provide a helper function that builds a lief binary from the raw executable; you can import it from

Here is an example feature extractor class that extracts the names of the imported functions and their count:

class ImportedFunctions(BaseFeature):
    """Get the number and the list of imported functions."""

    name = 'imported_functions'

    def extract_features(self, raw_exe):
        # Build a lief binary from the raw executable with the project's helper.
        lief_file = lief_from_raw(raw_exe)
        imported_functions = lief_file.imported_functions
        features = {
            'imported_functions_counts': len(imported_functions),
            'imported_functions': imported_functions,
        }
        return features

As you can see, extract_features() should return a dictionary containing the extracted information. Its key-value pairs can then be found in the output JSON file.
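To make the link between the returned dictionary and the JSON output concrete, here is a rough sketch of how such dictionaries could be merged and serialized. This page does not show how the project itself collects or writes features, so collect_features and the FileSize extractor are hypothetical. The sketch only assumes the can_extract() / extract_features() interface described above.

```python
import json


class FileSize(object):
    """Hypothetical extractor that follows the BaseFeature interface."""

    name = 'file_size'
    is_image = False

    def can_extract(self, raw_exe):
        return True

    def extract_features(self, raw_exe):
        return {'file_size': len(raw_exe)}


def collect_features(extractors, raw_exe):
    """Merge the dictionaries returned by every applicable extractor.

    Hypothetical helper: the project's real collection logic may differ.
    """
    merged = {}
    for extractor in extractors:
        # Skip extractors that report they cannot handle this executable.
        if extractor.can_extract(raw_exe):
            merged.update(extractor.extract_features(raw_exe))
    return merged


features = collect_features([FileSize()], b'MZ\x00\x00')
print(json.dumps(features))  # {"file_size": 4}
```

Because the result is eventually written as JSON, the values in the returned dictionary should be JSON-serializable, such as numbers, strings, lists and dictionaries.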