## Security POC

#### Executive Summary

Recent advances in Machine Learning technology are starting to be integrated into mainstream software tools.  One example is security software, where a recent IDC vendor comparison report says:

>"Some security vendors were excluded from this analysis because IDC considered the endpoint STAP product incomplete or lacking full integration into the overall offering for signatureless defense, security incident response, and remediation tools. For example, some products **lacked advanced machine learning–based threat detection** at the time of this analysis." (emphasis added) - https://www.idc.com/getdoc.jsp?containerId=US42385717

In addition to specialized security software, Machine Learning is starting to be incorporated into more general-purpose data analysis software, such as the [Elastic Stack](https://www.elastic.co/guide/en/elastic-stack/current/), which is commonly used to collect, store, and analyze application log data.  This acts as a "force multiplier", helping IT staff identify unusual or unexpected conditions, including operational and security issues, as they appear in application logs.

These new Machine Learning techniques automatically and adaptively determine the "normal" behavior of systems, so that "anomalous" events can be identified in near real-time (within minutes) and trigger alerts for review by qualified staff.  The alerts carry enough context to help staff quickly judge whether further investigation and remediation are required.

A proof of concept (POC) undertaken by HMH Labs demonstrates that, although Machine Learning implementations for log analysis are still "immature" (they require a fair amount of configuration to deliver maximum value), even staff who are not data analysis specialists can quickly put together systems that uncover hidden value in the application log data already being collected.

Machine Learning will not eliminate the need for human judgment anytime soon, but it does deliver pertinent information that makes people more effective, by letting computers do what they are good at, such as sifting through large amounts of data.  Everyone is familiar with the problem of "information overload", and Machine Learning is one powerful tool for addressing it and enabling "lean" organizations.

#### Machine Learning

* software designed to improve its operation over time, usually through refinement of a statistical model.
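
To make "refinement of a statistical model" concrete, here is a deliberately simplified sketch: a running mean and variance (Welford's online algorithm) that is refined with each new observation and flags values far above the learned baseline. This is an illustrative stand-in only, not how Elastic's ML jobs actually model data; the class name `RunningBaseline`, the 4-standard-deviation threshold, and the 10-sample warm-up are hypothetical choices.

```python
import math


class RunningBaseline:
    """Hypothetical online baseline: mean and variance via Welford's algorithm."""

    def __init__(self, threshold: float = 4.0, min_samples: int = 10) -> None:
        self.threshold = threshold      # std deviations above the mean that count as anomalous
        self.min_samples = min_samples  # flag nothing until the model has seen enough data
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Refine the model with one new observation."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation of all observations seen so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x: float) -> bool:
        """One-sided test: is x unusually high relative to the learned baseline?"""
        if self.n < self.min_samples:
            return False
        sd = self.std()
        if sd == 0.0:
            return x > self.mean
        return (x - self.mean) / sd > self.threshold


if __name__ == "__main__":
    baseline = RunningBaseline()
    for length in [4, 5, 4, 6, 4, 5, 28, 4, 5, 4]:
        baseline.update(length)
    print(baseline.is_anomalous(77))  # an unusually long value
    print(baseline.is_anomalous(5))   # a typical value
```

Because the model is updated incrementally, it keeps adapting as new data arrives without storing the full history.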

##### Use cases

* finance

* market data analysis

* image recognition

* natural language processing, including machine translation

* security, particularly threat detection

##### Security POC use case

* analyze HMH application logs for evidence of SQL injection attacks

* alert Security personnel, with contextual information that allows quick assessment and detailed investigation.


##### Last path length ML job

The hard part of Machine Learning, as in many other areas, is figuring out what questions to ask.  Data features pertinent to the problem must be identified, and metrics developed so that those features can be analyzed.

One of the most obvious characteristics of SQL injection attacks is that they typically append SQL syntax to URLs.  Apart from trying to recognize that this SQL syntax is unusual, a simple test of the length of the last path component (for example, the '5699' in '/api/carts/5699') might be enough to distinguish them.  The longest path component in normal Huan API calls is 'forgot_password_notification', at 28 characters.  A sample SQL injection test can look like:
```
  /api/carts/5699%27riMfJs%3C%27%22%3EfQzUJf%29%20AND%205971%3D3322%20AND%20%281102%3D1102
```
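
The length test described above can be sketched as a fixed-threshold check. This only illustrates the idea and is not what the ML jobs do (they learn the baseline instead of using a hard-coded limit). The function names, and the choice to measure the still URL-encoded segment while ignoring any query string and trailing slash, are assumptions.

```python
from urllib.parse import urlsplit

# Longest last path component in normal Huan API traffic, per the analysis above.
LONGEST_NORMAL_COMPONENT = len("forgot_password_notification")  # 28 characters


def last_path_component(uri: str) -> str:
    """Return the final '/'-separated segment of a URI path, still URL-encoded.

    Any query string or fragment is dropped, as is a trailing slash.
    """
    path = urlsplit(uri).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def looks_suspicious(uri: str) -> bool:
    """Naive static test: flag a path whose last component is longer than any normal one."""
    return len(last_path_component(uri)) > LONGEST_NORMAL_COMPONENT


SAMPLE = ("/api/carts/5699%27riMfJs%3C%27%22%3EfQzUJf%29%20AND%205971%3D3322"
          "%20AND%20%281102%3D1102")

print(looks_suspicious("/api/carts/5699"))  # False
print(looks_suspicious(SAMPLE))             # True
```

A fixed limit is easy to reason about but brittle: a short payload such as `1%20AND%203781%3D4876` (21 characters) stays under it.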

Minor additions were made to the log transformation scripts to calculate the length of the last path component.
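
The log transformation scripts themselves are not shown here. As a minimal sketch, assuming each log record is a JSON object whose request path is stored in a field named `path` (a hypothetical field name), the enrichment step could look like this; the sample record is made up for illustration.

```python
import json


def add_last_path_length(record: dict, path_field: str = "path") -> dict:
    """Return a copy of a log record with a 'last_path_length' field added.

    The length is measured on the raw (still URL-encoded) last path segment,
    ignoring any query string and trailing slash. Records without a string
    path are returned unchanged (apart from being copied).
    """
    enriched = dict(record)
    path = record.get(path_field)
    if isinstance(path, str):
        segment = path.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]
        enriched["last_path_length"] = len(segment)
    return enriched


line = '{"path": "/api/carts/5699", "status": 200}'
print(json.dumps(add_last_path_length(json.loads(line))))
```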

Two Centralized Logging ML jobs, "last_path_length" and "sql_injection_detection2", were created to test this theory.  The first is a "single metric" job that tracks only max(last_path_length); the second is a "multi-metric" job that additionally detects anomalous mean(status), since a rising mean HTTP status code indicates a high rate of HTTP errors.  Neither job filters by environment (production or integration), but [sql_injection_detection2](json:redacted) sets the environment as a "partition field", which makes it easier to tell which environment an anomaly came from.
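
The actual job definitions are redacted. As a rough sketch of what a multi-metric job like sql_injection_detection2 might contain, the Python below builds a request body for Elastic's anomaly detection job API (`PUT _xpack/ml/anomaly_detectors/<job_id>` in 6.x, `PUT _ml/anomaly_detectors/<job_id>` in later versions). The field names `environment`, `path`, `status`, and `@timestamp`, and the 15-minute bucket span, are assumptions.

```python
import json


def build_job_config(partition_field: str = "environment") -> dict:
    """Sketch of a multi-metric anomaly detection job body (field names are assumptions)."""
    return {
        "description": "Long last path components and high HTTP error rates",
        "analysis_config": {
            "bucket_span": "15m",
            "detectors": [
                # Unusually long last path component, modeled separately per environment.
                {"function": "max", "field_name": "last_path_length",
                 "partition_field_name": partition_field},
                # Unusual mean HTTP status code, e.g. a burst of 4xx/5xx responses.
                {"function": "mean", "field_name": "status",
                 "partition_field_name": partition_field},
            ],
            # Fields reported as likely contributors to an anomaly.
            "influencers": ["path", partition_field],
        },
        "data_description": {"time_field": "@timestamp"},
    }


print(json.dumps(build_job_config(), indent=2))
```

The `mean` function flags deviations in either direction; Elastic also provides one-sided variants such as `high_mean` if only increases matter.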

ML job last_path_length training phase
![ML job last_path_length training phase](last_path_length_training.png)


ML job sql_injection_detection2 anomaly from sending [fake log records](json:redacted) to Huan INT

![sql_injection_detection2 anomaly from sending fake log records to Huan INT](sql_injection_detection2a.png)
![sql_injection_detection2 anomaly from sending fake log records to Huan INT](sql_injection_detection2b.png)

The path "influencers" reported in the lower right of the second screenshot are the attempted SQL injections.
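
Influencer results are stored alongside a job's other results and can be retrieved with an ordinary search. A minimal sketch of a search body listing the highest-scoring `path` influencers for one job, assuming the job writes to the default `.ml-anomalies-*` results indices and that `path` is configured as an influencer:

```python
import json


def top_path_influencers_query(job_id: str, size: int = 10) -> dict:
    """Search body for the highest-scoring 'path' influencer results of one ML job."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"job_id": job_id}},
                    {"term": {"result_type": "influencer"}},
                    {"term": {"influencer_field_name": "path"}},
                ]
            }
        },
        "sort": [{"influencer_score": {"order": "desc"}}],
        "_source": ["timestamp", "influencer_field_value", "influencer_score"],
    }


# Intended as the body of: GET .ml-anomalies-*/_search
print(json.dumps(top_path_influencers_query("sql_injection_detection2"), indent=2))
```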

#### Alerting

Elastic Watcher allows periodic checking of indices and alerting when given conditions are met.
The ML results indices, .ml-anomalies-* (in particular the default '.ml-anomalies-shared'), can be queried like any other data in Elasticsearch.
ML jobs write their results there by default, but a job can instead be assigned a dedicated results index.
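
A watch along these lines could poll the ML results and raise a PagerDuty alert. This is a sketch under several assumptions: the 6.x-era Watcher API (`PUT _xpack/watcher/watch/<watch_id>`), a PagerDuty account already configured, a 5-minute schedule, a 30-minute look-back, and a bucket `anomaly_score` threshold of 75. All of these values are illustrative.

```python
import json


def build_watch(job_id: str, min_score: float = 75.0) -> dict:
    """Sketch of a Watcher definition that pages on high-scoring ML bucket results."""
    return {
        # How often the watch runs.
        "trigger": {"schedule": {"interval": "5m"}},
        # Search the ML results indices for recent, high-scoring buckets from this job.
        "input": {
            "search": {
                "request": {
                    "indices": [".ml-anomalies-*"],
                    "body": {
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"job_id": job_id}},
                                    {"term": {"result_type": "bucket"}},
                                    {"range": {"anomaly_score": {"gte": min_score}}},
                                    {"range": {"timestamp": {"gte": "now-30m"}}},
                                ]
                            }
                        }
                    },
                }
            }
        },
        # Only run the actions when at least one matching bucket was found.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        "actions": {
            "notify_pagerduty": {
                "pagerduty": {"description": "ML job " + job_id + " reported an anomaly"}
            }
        },
    }


print(json.dumps(build_watch("sql_injection_detection2"), indent=2))
```

Newer Elasticsearch versions changed how `hits.total` is reported in search responses, so the condition may need adjusting depending on the version in use.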

Watcher actions are available for many services, including Slack and PagerDuty.  Unfortunately, they require account configuration in config/elasticsearch.yml, and because reloading that configuration is not supported, Elasticsearch must be restarted after changing it.

PagerDuty alert triggered by Elastic Watcher on ML job
![PagerDuty alert triggered by Elastic Watcher on ML job](PagerDuty_ML_alert_Screen_Shot.png)

## Appendix

### Log analysis demo system

Labs ["Centralized Logging ML" Kibana](redacted)

It primarily contains logs from:

* [Secret Fire](redacted)

* [Nauglamir](redacted)

* [Thorondor](redacted)

* [Huan](redacted)

The API of the Marketplace back-end system, Huan, was used as the test system.  The actual Marketplace system is not susceptible to SQL injection, but the POC detects attempts against this API.  Rather than running "real" attacks, the POC simulates them by modifying real log records to match what standard SQL injection tests against this API produce, then sending these records directly to Elasticsearch.
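
The script used to inject the simulated records is not included here. A minimal sketch of the approach, assuming records are dicts with a hypothetical `path` field and a hypothetical target index name, copies a real record, swaps in an sqlmap-style payload, and produces a newline-delimited body for Elasticsearch's `_bulk` API. The template record below is made up.

```python
import json


def fake_injection_records(template: dict, payloads: list[str]) -> list[dict]:
    """Copy a real log record once per payload, replacing its request path."""
    records = []
    for payload in payloads:
        record = dict(template)  # leave the original record untouched
        record["path"] = payload
        records.append(record)
    return records


def to_bulk_body(records: list[dict], index: str) -> str:
    """Serialize records as an NDJSON _bulk body (action line, then document line)."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


template = {"path": "/api/carts/5699", "status": 200}
payloads = ["/api/carts/5699%20AND%203781%3D4876"]
print(to_bulk_body(fake_injection_records(template, payloads), "huan-logs"))
```

Elasticsearch 6.x also expects a `_type` in each action line unless the type is given in the request URL.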

[ML ELK Stack configuration](redacted)

[Test single metric ML job](redacted)

[Sample log message for SQL error](redacted)

##### SQL Injection

[Sample ML job configuration for SQL insert of admin users](sql_insert_admin.json)

[Automated Audit using SQLMap](https://www.owasp.org/index.php/Automated_Audit_using_SQLMap)

Sample URI paths generated by `sqlmap --technique=B` (boolean-based blind):

* `/api/products/1`
* `/api/products/1`
* `/api/products/4987`
* `/api/products/1295`
* `/api/products/1%22%28%2C%2C%2C%2C..%2C%27`
* `/api/products/1%27riMfJs%3C%27%22%3EfQzUJf`
* `/api/products/1%29%20AND%205971%3D3322%20AND%20%281102%3D1102`
* `/api/products/1%29%20AND%204176%3D4176%20AND%20%281877%3D1877`
* `/api/products/1%20AND%203781%3D4876`
* `/api/products/1%20AND%204176%3D4176`
* `/api/products/1%27%29%20AND%207104%3D6594%20AND%20%28%27NKSa%27%3D%27NKSa`
* `/api/products/1%27%29%20AND%204176%3D4176%20AND%20%28%27VdqZ%27%3D%27VdqZ`
* `/api/products/1%27%20AND%207019%3D9045%20AND%20%27oLjV%27%3D%27oLjV`
* `/api/products/1%27%20AND%204176%3D4176%20AND%20%27ZvNO%27%3D%27ZvNO`
* `/api/products/1%25%27%20AND%205973%3D3153%20AND%20%27%25%27%3D%27`
* `/api/products/1%25%27%20AND%204176%3D4176%20AND%20%27%25%27%3D%27`
* `/api/products/1%20AND%202284%3D1303--%20qEMV`
* `/api/products/1%20AND%204176%3D4176--%20jHdc`
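
URL-decoding these paths makes the injected SQL readable. The short sketch below uses Python's standard `urllib.parse.unquote` on a few of the paths above and prints each last component's raw (encoded) length alongside its decoded form.

```python
from urllib.parse import unquote

SAMPLE_PATHS = [
    "/api/products/4987",
    "/api/products/1%20AND%203781%3D4876",
    "/api/products/1%27%20AND%207019%3D9045%20AND%20%27oLjV%27%3D%27oLjV",
    "/api/products/1%20AND%202284%3D1303--%20qEMV",
]

for path in SAMPLE_PATHS:
    last = path.rsplit("/", 1)[-1]
    # The raw length is what a last_path_length field would measure;
    # decoding reveals the boolean SQL condition sqlmap appended.
    print(len(last), unquote(last))
```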
  
*[URI]: Uniform Resource Identifier