
better management of unconfigured analyzers #306

Closed
mlodic opened this issue Dec 30, 2020 · 15 comments

Comments
@mlodic
Member

mlodic commented Dec 30, 2020

Regarding the closed issue #298: we have a problem with analyzers that require an API key when the user either does not have the key or has not configured it.

Problems faced by users:

  • they execute the "Run all available analyzers" option and get a lot of errors. This could be avoided by executing only the correctly configured analyzers.

  • they see those analyzers as "runnable" but, actually, they are not: when executed, they instantly crash. I think we should not allow the user to execute them without the configuration set but, at the same time, let the user understand from the GUI that those analyzers exist and can be appropriately configured (the requires_configuration flag could be enough for this).

This is easier said than done. ATM the API keys are checked in the code of each analyzer, while this should be done before the analysis actually runs. I am open to suggestions on how to solve this problem.
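
For context, this is roughly the current pattern, where each analyzer validates its own key only at run time (a minimal sketch; the get_secret helper and the exception name are illustrative, not the exact IntelOwl code):

# Rough sketch of the current situation: every analyzer checks its own API key
# only when it runs. Names (get_secret, AnalyzerRunException) are illustrative.
import os

class AnalyzerRunException(Exception):
    """Raised when an analyzer cannot run, e.g. because of a missing API key."""

def get_secret(name: str) -> str:
    return os.environ.get(name, "")

class SomeApiAnalyzer:
    def run(self):
        api_key = get_secret("SOME_ANALYZER_API_KEY")
        if not api_key:
            # The failure only surfaces here, after the job has already been started.
            raise AnalyzerRunException("no API key retrieved")
        # ... perform the actual request using api_key ...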

@eshaan7
Member

eshaan7 commented Dec 31, 2020

The way I see it, there are 2 cases here:

  • If the user explicitly selects an analyzer to run (configured or unconfigured), then showing the error in the logs and even in the final report makes sense.
  • In the case of run_all_available_analyzers, the analyzers that failed because they were not configured should not be in the final report.

My suggestion:

  • in the set_report_and_cleanup function, we can pass the boolean run_all_available_analyzers and implement a check so that the errored-out analyzers are not added to the final report when run_all_available_analyzers=True. (this can be done very easily and does not require us to change the implementation of analyzers; see the sketch after this list)

  • we can remove these 2 log statements from here.

    since it's already logged here:

    logger.warning(
    f"({analyzer_name}, job_id #{job_id}) -> set as FAILED. "
    f" Error message: {err_msg}"
    )

    this way the user is not bombarded with error messages.
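
A minimal sketch of the first suggestion (the report structure and the FAILED status string are assumptions based on this thread, not the exact IntelOwl code):

# Sketch: drop errored-out analyzers from the final report when the user chose
# "run all available analyzers". Field names and report shape are assumptions.
def set_report_and_cleanup(job_id, analyzer_reports, run_all_available_analyzers=False):
    if run_all_available_analyzers:
        # keep only the analyzers that actually produced a result
        analyzer_reports = [
            report for report in analyzer_reports
            if report.get("status") != "FAILED"
        ]
    # ... persist analyzer_reports on the Job and do the usual cleanup ...
    return analyzer_reports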

@eshaan7
Member

eshaan7 commented Dec 31, 2020

On another note, we should use a JSON logging format (for example, python-json-logger) instead of plain text. This would allow users to load the logs into Elastic's APM or other similar services. Thoughts?
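
For reference, a minimal python-json-logger setup could look like this (a sketch, not the project's current logging configuration):

# Sketch: emit every log record as a single JSON line using python-json-logger.
import logging
from pythonjsonlogger import jsonlogger

handler = logging.StreamHandler()
handler.setFormatter(jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("intel_owl")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.warning("analyzer failed", extra={"analyzer_name": "Shodan", "job_id": 42})
# -> {"asctime": "...", "levelname": "WARNING", "name": "intel_owl",
#     "message": "analyzer failed", "analyzer_name": "Shodan", "job_id": 42}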

@mlodic
Member Author

mlodic commented Jan 2, 2021

About the JSON logger, I think it could be helpful for some users. However, it is already easy to ingest logs into Elasticsearch with the current log format, because it is pretty standard and would just require a simple grok filter + Logstash.
Maybe we could talk a little more about how to extend the Elasticsearch support in general (and for logs too). This could be another issue.

in the set_report_and_cleanup function, we can pass the boolean run_all_available_analyzers and implement a check that the error-ed out analyzers are not added to the final report when run_all_available_analyzers=True. (this can be done very easily and does not require us to change the implementation of analyzers)

This is a good compromise to avoid making bigger changes. However, it feels a bit like a hack, because the application would lie to the user, who would think that those analyzers were never executed.

@CITIZENDOT
Contributor

I'll suggest one method, but you might be disappointed by the amount of work that has to be done for it. Still, I feel it's a safe way.
First of all, we can agree that every analyzer either has an API key or doesn't. No need to worry about the second case: those analyzers will work even if unconfigured. (Correct me if I am wrong here.)

In the first case, I believe every API key follows a fixed regex pattern. We should include that pattern in analyzer_config.json and, before sending a request/starting the analysis, check whether the API key matches that regex. If not, set the status to Ignored/Unconfigured, and it shouldn't be counted as an error in the report.

Work that needs to be done: we should identify the regex patterns of all existing API keys, and do the same for future analyzers.
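
A rough sketch of that check (the api_key_regex and api_key_name fields shown here are hypothetical and simplified; they do not match the current analyzer_config.json layout):

# Sketch: validate an API key against a regex declared per analyzer.
# "api_key_regex" is a hypothetical field that would have to be added.
import json
import os
import re

def is_analyzer_configured(analyzer_name: str, config_path: str = "analyzer_config.json") -> bool:
    with open(config_path) as f:
        config = json.load(f)[analyzer_name]
    pattern = config.get("api_key_regex")
    if not pattern:
        return True  # this analyzer does not need an API key
    api_key = os.environ.get(config.get("api_key_name", ""), "")
    return re.fullmatch(pattern, api_key) is not None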

@mlodic
Member Author

mlodic commented Jan 12, 2021

Thank you for sharing your idea.

One of the problems is that there can be more than one "secret" (API keys or similar) for each analyzer, and this should remain as it is (for example, see Censys or HoneyDB). However, your idea of including the key names in analyzer_config.json could work.

At the moment, we are just using the requires_configuration parameter, shown to users in the interface, to tell them whether an analyzer needs to be configured. It is a boolean value. One idea could be to replace this value with a required_secrets parameter containing the list of secret names that the analyzer needs. Before executing the analyzers, in the general.py module, we could check if those secrets are available and, if not, simply not execute the related analyzer.
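
A minimal sketch of that pre-run check (required_secrets is the proposed field; the surrounding function is simplified and not the real general.py code):

# Sketch: before starting the analysis, skip analyzers whose required secrets
# are missing or empty. "required_secrets" is the proposed config field.
import logging
import os

logger = logging.getLogger(__name__)

def filter_configured_analyzers(analyzers_config: dict, requested: list) -> list:
    runnable = []
    for name in requested:
        required_secrets = analyzers_config[name].get("required_secrets", [])
        missing = [s for s in required_secrets if not os.environ.get(s, "").strip()]
        if missing:
            logger.info(f"skipping analyzer {name}: missing secrets {missing}")
            continue
        runnable.append(name)
    return runnable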

Also, required_secrets could be used in the GUI in place of requires_configuration (no need to keep that boolean value anymore) to show the names of the required secrets in the table.

In this way, if the user runs run_all_available_analyzers or several analyzers manually, the unconfigured analyzers would transparently not be executed, just like it works now for analyzers that do not support a specific type of file, for instance. Obviously, we should log this behavior somewhere.

On the other hand, if the user tries to execute an unconfigured analyzer alone, we should handle this special case in the API and show the user that the analysis did not start because that analyzer is unconfigured, returning a proper status code.

This should solve all the aforementioned problems. Thoughts?

@CITIZENDOT
Contributor

CITIZENDOT commented Jan 12, 2021

I agree with the rest, but we shouldn't rely only on the presence of the API key/secrets.

Because the recommended usage is to copy .env_app_template to .env_app, and in that process all the API keys are given a default value, that presence check would always return true, which shouldn't be the case.

I suggest performing a basic check, like whether that secret differs from the default value assigned in the template, or whether it matches that analyzer's regex pattern (more robust, but it requires a lot of effort).

Edit: I am mistaken, I was thinking of the test env file, which has test as the API key. I got your idea. Yes, a secret presence check will do the job.

@mlodic
Member Author

mlodic commented Jan 12, 2021

Yes, we could just check whether the key is empty or not; that is the idea.

@CITIZENDOT
Contributor

I'll work on this, if that's okay.

@mlodic
Member Author

mlodic commented Jan 12, 2021

Sure, feel free and thank you!

@eshaan7
Member

eshaan7 commented Jan 20, 2021

One idea could be to replace this value with a required_secrets parameter containing the list of secret names that the analyzer needs.

In this way, we are duplicating the API-key variable names under api_key_name and required_secrets. IMHO, this seems like a hack. I feel this solution wouldn't last long and we would soon have to implement it in some other way to allow for newer features in the future.

Since making big changes to analyzer_config.json can't be avoided, I'd prefer to do it another way. To be precise, something similar to what was done here. Basically, instead of just having a key-value pair for api_key_name, it would be an object where we can add further information about the particular config parameter.

PS: Now that I think about it, this would involve tonnes of changes everywhere, including the clients, and may be overkill for the problem we are trying to solve, which is just silencing unconfigured analyzers.

@eshaan7
Member

eshaan7 commented Jan 20, 2021

Before executing the analyzers, in the general.py module, we could check if those secrets are available and, if not, we could just not execute the related analyzer.

The start_analyzers() function in general.py is called synchronously when we make a call to the /send_analysis_request endpoint, so if we add these checks there it would significantly slow down the response time. Not to mention the overhead it would add, since we would then be checking/reading the environment variables twice for each analyzer.

in the set_report_and_cleanup function, we can pass the boolean run_all_available_analyzers and implement a check that the error-ed out analyzers are not added to the final report when run_all_available_analyzers=True. (this can be done very easily and does not require us to change the implementation of analyzers)

Whether we like it or not, all we can do is let the analyzer run (like it does now) and handle the exception by deciding whether or not to show it to the user.

@mlodic
Member Author

mlodic commented Jan 20, 2021

the start_analyzers() function in general.py is called synchronously when we make a call to the /send_analysis_request endpoint, so if we add these checks their it would significantly slow down the response time. Not to mention, the overhead that it would add since now we are checking/reading the environment variables twice for each analyzer.

Yes, there would be an overhead, but is it really relevant? We are trying to solve a user-experience problem that is worth a few milliseconds of computation. I know this is difficult to judge without a real benchmark, but I guess IntelOwl's bottleneck is not there; it is inside all the tools the analyzers use and all the APIs they interact with.

Also, we could avoid reading the environment variables twice if we pass those values to the Celery worker when we execute the analyzers. However, this would require small changes to all the analyzer modules. It would be painful work to do.
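
A rough sketch of that idea, assuming a hypothetical task signature (the real analyzer modules would need matching changes):

# Sketch: read the secrets once in the API process and pass them to the Celery
# task, so the worker does not read the environment again. The task signature
# and config fields are hypothetical.
import os
from celery import shared_task

@shared_task
def run_analyzer(analyzer_name: str, job_id: int, secrets: dict):
    # the analyzer receives its secrets instead of reading os.environ itself
    ...

def start_analyzers(analyzers_config: dict, analyzers_to_execute: list, job_id: int):
    for name in analyzers_to_execute:
        secret_names = analyzers_config[name].get("required_secrets", [])
        secrets = {s: os.environ.get(s, "") for s in secret_names}
        run_analyzer.delay(name, job_id, secrets)

One thing to weigh with this approach is that the secret values would travel through the Celery broker along with the task arguments.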

@m0mosenpai
Member

m0mosenpai commented Feb 28, 2021

One idea could be to replace this value with a required_secrets parameter containing the list of secret names that the analyzer needs.

In this way, we are duplicating the API-key variable names under api_key_name and required_secrets. IMHO, this seems like a hack. I feel this solution wouldn't last long and we would soon have to implement it in some other way to allow for newer features in the future.

Since making big changes to analyzer_config.json can't be avoided, I'd prefer to do it another way. To be precise, something similar to what was done here. Basically, instead of just having a key-value pair for api_key_name, it would be an object where we can add further information about the particular config parameter.

PS: Now that I think about it, this would involve tonnes of changes everywhere, including the clients, and may be overkill for the problem we are trying to solve, which is just silencing unconfigured analyzers.

I feel this would be nice too. Right now, analyzers that require API keys show up in the list when starting a job, even if they aren't configured properly. If the user forgets to set them up, they don't find out until they run the job and see the error in the result JSON.

These unconfigured analyzers could be grayed out in the list, and hovering over them could show a tooltip telling the user what exactly they forgot to configure. For accurate tooltips, we might need objects like @eshaan7 mentioned, where at the time of adding an analyzer we can specify the required configuration in terms of objects that we can check to know whether the analyzer has been fully configured. If any of those objects of that analyzer class, for example, is unconfigured/empty, we show that in the tooltip; once all have been configured, the analyzer is un-grayed and can be selected.

This will prevent any unconfigured analyzer from running accidentally or via the run_all_available_analyzers option, and will also give the user real-time feedback on what they are missing. It also feels like a more scalable solution IMO, although it would certainly require more work than the currently preferred solution.

@CITIZENDOT
Contributor

Yes, one way or another, that's our final goal: the user simply shouldn't be able to select an unconfigured analyzer. I forgot to post the idea here.

For now, I'm considering only run_all_available_analyzers and ignoring the error reports of unconfigured analyzers. In the future, we'll implement the above idea, i.e., knowing beforehand which analyzers are unconfigured and disabling them.

@eshaan7
Member

eshaan7 commented Apr 5, 2021

This is an example analyzer object in the new format of analyzer_config.json:

"Cuckoo_Scan": {
  "type": "file",
  "disabled": false,
  "description": "scan a file on a Cuckoo instance",
  "python_module": "cuckoo_scan.CuckooAnalysis",
  "config": {
    "soft_time_limit": 500,
    "queue": "long",
    "max_post_tries": 5,
    "max_poll_tries": 20,
  },
  "secrets": {
    "api_key_name": {
      "secret_name": "CUCKOO_API_KEY",
      "type": "string",
      "required": true,
      "default": null,
      "description": "API key for your cuckoo instance"
    }
  }
}

Here, config is basically the additional_config_params as before, except that currently we also store which secrets are to be read as part of additional_config_params. In this new implementation, there is a separate secrets field, which is its own JSON object and stores, for each secret, the related information: secret_name, required, default, type (choices: number, string, bool) and description.

And this is how it would be stored in the cache after verify_analyzer_config() is called:

"Cuckoo_Scan": {
  "type": "file",
  "disabled": false,
  "description": "scan a file on a Cuckoo instance",
  "python_module": "cuckoo_scan.CuckooAnalysis",
  "config": {
    "soft_time_limit": 500,
    "queue": "long",
    "max_post_tries": 5,
    "max_poll_tries": 20,
  },
  "secrets": {
    "api_key_name": {
      "secret_name": "CUCKOO_API_KEY",
      "type": "string",
      "required": true,
      "default": null,
      "description": "API key for your cuckoo instance"
    }
  },
  "verification": {
    "configured": false,
    "error_message": "'api_key_name' unknown: expected string, got int (3 of 4 satisfied)",
    "missing_secrets": ["api_key_name"]
  }
}

Since the secrets field now stores all the information about whether a secret is required, its default value, etc., we can loop over each analyzer object inside analyzer_config.json, traverse its secrets field, and verify each secret, thereby marking the analyzer as configured or unconfigured and constructing the error_message that will be shown on the frontend as a tooltip.
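
A condensed sketch of that loop (simplified; not the exact verify_analyzer_config() implementation):

# Sketch: mark each analyzer as configured/unconfigured by checking its
# "secrets" object and build the error message shown as the frontend tooltip.
import os

def verify_analyzer_config(analyzer_config: dict) -> dict:
    for name, obj in analyzer_config.items():
        missing, errors = [], []
        for param, secret in obj.get("secrets", {}).items():
            value = os.environ.get(secret["secret_name"], "")
            if not value and secret.get("required", False):
                missing.append(param)
                errors.append(f"'{param}' not set")
            # type checks (string/number/bool) could be added here as well
        obj["verification"] = {
            "configured": not missing,
            "error_message": "; ".join(errors) or None,
            "missing_secrets": missing,
        }
    return analyzer_config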

@eshaan7 eshaan7 moved this from To do to In progress in Analyzers Manager/ Analyzers Jun 25, 2021
@eshaan7 eshaan7 assigned eshaan7 and m0mosenpai and unassigned eshaan7 Jun 25, 2021
@eshaan7 eshaan7 added this to the v3.0 milestone Jul 8, 2021
GSoC '21 - IntelOwl Improvements automation moved this from In progress to Done Jul 22, 2021
@sp35 sp35 moved this from In progress to Done in Analyzers Manager/ Analyzers Jul 23, 2021