# 10 minutes to CLX

This is a short introduction to CLX geared mainly towards new users.

## What are these libraries?

CLX provides a simple API that lets analysts, data scientists, and engineers quickly apply RAPIDS to real-world cyber use cases. It builds on cuDF, the RAPIDS GPU DataFrame library, so cyber analytics run on the GPU. The following packages are available:

<ul>
<li>ml - Machine learning functionality</li>
<li>ip - IPv4 data translation and parsing</li>
<li>parsers - Log event parsing</li>
<li>io - Input and output features for a workflow</li>
<li>workflow - Workflow which receives input data and produces analytical output data</li>
<li>osi - Open source integration (VirusTotal, Farsight)</li>
</ul>

## When to use CLX

Use CLX to build your cyber data analytics workflows in a GPU-accelerated environment. CLX contains common cyber and cyber-ML functionality, such as log parsing for specific data sources, parsing of cyber data types (such as IPv4), and DGA (domain generation algorithm) detection. CLX also lets you combine this functionality into a CLX workflow, which executes the series of parsing and ML functions needed to produce cyber analytic output.


## Log Parsing 

CLX provides parsers to parse common log types.
Here’s an example that parses a common [Windows Event Log](https://www.ultimatewindowssecurity.com/securitylog/encyclopedia/default.aspx) entry with event code [5156](https://www.ultimatewindowssecurity.com/securitylog/encyclopedia/event.aspx?eventid=5156) (the Windows Filtering Platform has permitted a connection).

```python
import cudf
from clx.parsers.windows_event_parser import WindowsEventParser

# A raw Windows event (EventCode 5156) stored as a single string
event = "04/03/2019 11:58:59 AM\\nLogName=Security\\nSourceName=Microsoft Windows security auditing.\\nEventCode=5156\\nEventType=0\\nType=Information\\nComputerName=user234.test.com\\nTaskCategory=Filtering Platform Connection\\nOpCode=Info\\nRecordNumber=241754521\\nKeywords=Audit Success\\nMessage=The Windows Filtering Platform has permitted a connection.\\r\\n\\r\\nApplication Information:\\r\\n\\tProcess ID:\\t\\t4\\r\\n\\tApplication Name:\\tSystem\\r\\n\\r\\nNetwork Information:\\r\\n\\tDirection:\\t\\tInbound\\r\\n\\tSource Address:\\t\\t100.20.100.20\\r\\n\\tSource Port:\\t\\t138\\r\\n\\tDestination Address:\\t100.20.100.30\\r\\n\\tDestination Port:\\t\\t138\\r\\n\\tProtocol:\\t\\t17\\r\\n\\r\\nFilter Information:\\r\\n\\tFilter Run-Time ID:\\t0\\r\\n\\tLayer Name:\\t\\tReceive/Accept\\r\\n\\tLayer Run-Time ID:\\t44"
wep = WindowsEventParser()
df = cudf.DataFrame()
df['raw'] = [event]
# Parse the 'raw' column into structured event-field columns
result_df = wep.parse(df, 'raw')
result_df.head()
```

| | attributes_sam_account_name | account_information_account_domain | attributes_profile_path | member_account_name | changed_attributes_display_name | attributes_allowed_to_delegate_to | application_information_application_name | filter_information_filter_run_time_id | additional_information_pre_authentication_type | attributes_script_path | ... | changed_attributes_password_last_set | network_information_destination_address | target_account_account_name | changed_attributes_home_directory | network_information_port | failure_information_sub_status | filter_information_layer_name | changed_attributes_account_expires | attributes_user_principal_name | network_information_source_port |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  | system | 0 |  |  | ... |  | 100.20.100.30 |  |  |  |  | receive/accept |  |  | 138 |
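The column names in the parsed output combine a message section heading with a field name, lowercased and joined with underscores: the `Destination Address` field under `Network Information:` becomes `network_information_destination_address`, and the values shown above are lowercased too (`System` becomes `system`). The pure-Python sketch below illustrates that mapping on part of the example event. `flatten_event_message` and `snake` are hypothetical helpers written for this guide, not how `WindowsEventParser` is implemented, and they only handle the indented `Field:\tValue` layout seen here, with the literal `\r\n` and `\t` escapes the example string uses.

```python
import re


def snake(text: str) -> str:
    """Lowercase and join words with underscores: 'Run-Time ID' -> 'run_time_id'."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def flatten_event_message(raw: str) -> dict:
    """Hypothetical sketch, not CLX's implementation.

    Unindented lines ending in ':' are treated as section headings; indented
    'Field:<tab>Value' lines become {'section_field': 'value'} entries.
    Assumes the literal two-character escapes '\\r\\n' and '\\t' used in the
    example event rather than real CR/LF/tab characters.
    """
    fields = {}
    section = None
    for line in raw.split("\\r\\n"):
        if line.startswith("\\t") and section is not None:
            # Field line: split on the first ':' only, so values may contain ':'
            key, sep, value = line.partition(":")
            if sep:
                name = section + "_" + snake(key.replace("\\t", ""))
                fields[name] = value.replace("\\t", "").strip().lower()
        elif line.endswith(":"):
            # Section heading such as 'Network Information:'
            section = snake(line)
    return fields


sample = (
    "Message=The Windows Filtering Platform has permitted a connection."
    "\\r\\n\\r\\nApplication Information:"
    "\\r\\n\\tProcess ID:\\t\\t4"
    "\\r\\n\\tApplication Name:\\tSystem"
    "\\r\\n\\r\\nNetwork Information:"
    "\\r\\n\\tDirection:\\t\\tInbound"
    "\\r\\n\\tSource Port:\\t\\t138"
)
fields = flatten_event_message(sample)
print(fields["application_information_application_name"])  # system
print(fields["network_information_source_port"])  # 138
```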


## Cyber Data Types (IPv4)

CLX can parse cyber-specific data types, such as IPv4 addresses. Here’s an example of how to get started.

#### Convert IPv4 strings to ints

```python
import clx.ip
import cudf

ips = cudf.Series(["5.79.97.178", "94.130.74.45"])
# Convert each dotted-quad string to its integer representation
result = clx.ip.ip_to_int(ips)
print(result)
```

```
0      89088434
1    1585596973
dtype: int64
```
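The integers above are the addresses read as big-endian 32-bit numbers: `a.b.c.d` maps to a*2^24 + b*2^16 + c*2^8 + d. For example, 5.79.97.178 gives 5*16777216 + 79*65536 + 97*256 + 178 = 89088434. The CPU-only sketch below shows that arithmetic with bit shifts and cross-checks it against Python's standard `ipaddress` module; the local `ip_to_int` function is an illustration, not CLX's GPU implementation.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 string into a 32-bit integer, most significant octet first."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


addresses = ["5.79.97.178", "94.130.74.45"]
ints = [ip_to_int(ip) for ip in addresses]
print(ints)  # [89088434, 1585596973]

# The standard library performs the same conversion
stdlib_ints = [int(ipaddress.IPv4Address(ip)) for ip in addresses]
```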


#### Check whether IPv4 strings are multicast

```python
import clx.ip
import cudf

ips = cudf.Series(["224.0.0.0", "239.255.255.255", "5.79.97.178"])
# True where the address falls in the IPv4 multicast range
result = clx.ip.is_multicast(ips)
print(result)
```

```
0     True
1     True
2    False
dtype: bool
```
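IPv4 multicast addresses occupy 224.0.0.0/4 (224.0.0.0 through 239.255.255.255), so an address is multicast exactly when its top four bits are `1110`. That is why both range endpoints above are `True` and 5.79.97.178 is `False`. The CPU-only sketch below applies the bit test with Python's standard `ipaddress` module and compares it with the module's own `is_multicast` property; it illustrates the rule and is not CLX code.

```python
import ipaddress


def is_multicast(ip: str) -> bool:
    """True if ip lies in 224.0.0.0/4, i.e. its top four bits are 1110."""
    return (int(ipaddress.IPv4Address(ip)) >> 28) == 0b1110


ips = ["224.0.0.0", "239.255.255.255", "5.79.97.178"]
flags = [is_multicast(ip) for ip in ips]
print(flags)  # [True, True, False]

# Cross-check with the standard library's classification
stdlib_flags = [ipaddress.IPv4Address(ip).is_multicast for ip in ips]
```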


## Machine Learning

CLX offers machine learning functions ready to integrate into your cuDF analytics workflow.

#### Calculate Rolling Z-Score
Follow this example to calculate the rolling z-score for a given cuDF Series.

```python
import clx.analytics.stats
import cudf

sequence = [3,4,5,6,1,10,34,2,1,11,45,34,2,9,19,43,24,13,23,10,98,84,10]
series = cudf.Series(sequence)
zscores_df = cudf.DataFrame()
# Rolling z-score over a window of 7 values
zscores_df['zscore'] = clx.analytics.stats.rzscore(series, 7)
print(zscores_df)
```

```
          zscore
0           null
1           null
2           null
3           null
4           null
5           null
6    2.374423424
7   -0.645941275
8   -0.683973734
9    0.158832461
10   1.847751909
11   0.880026019
12  -0.950835449
13  -0.360593742
14   0.111407599
15   1.228914145
16  -0.074966331
17  -0.570321249
18   0.327849973
19  -0.934372308
20   2.296828498
21   1.282966989
22  -0.795223674
```
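The output above is consistent with scoring each value against the window of 7 values ending at it, z = (x - mean) / std, which is why the first six positions, lacking a full window, are null. The plain-Python sketch below (no GPU, no CLX) reproduces that calculation under two assumptions: the window includes the current value, and the standard deviation is the population one (dividing by n). Under those assumptions it matches the first non-null values printed above, 2.374423424 and -0.645941275. The `rolling_zscore` function is a hypothetical illustration, not CLX's implementation.

```python
from statistics import fmean, pstdev


def rolling_zscore(values, window):
    """Z-score of each value relative to the trailing `window` values (inclusive).

    Positions without a full window, and windows with zero spread, yield None.
    Uses the population standard deviation (divide by n).
    """
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(None)
            continue
        win = values[i - window + 1 : i + 1]
        mu, sigma = fmean(win), pstdev(win)
        out.append((values[i] - mu) / sigma if sigma > 0 else None)
    return out


sequence = [3,4,5,6,1,10,34,2,1,11,45,34,2,9,19,43,24,13,23,10,98,84,10]
zscores = rolling_zscore(sequence, 7)
print(round(zscores[6], 6), round(zscores[7], 6))
```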


## Workflows

Now that we've gotten a handle on CLX functionality, let's tie some of it together in a CLX workflow that produces useful analytical output.

You define a workflow by subclassing `Workflow` and implementing a `workflow` method that receives a cuDF DataFrame, performs GPU operations on it, and returns an output cuDF DataFrame. In this example, the workflow parses raw Windows event logs.

```python
import cudf
from clx.workflow.workflow import Workflow
from clx.parsers.windows_event_parser import WindowsEventParser

wep = WindowsEventParser()

class LogParseWorkflow(Workflow):
    def workflow(self, dataframe):
        # Parse the "raw" column into structured event fields
        output = wep.parse(dataframe, "raw")
        return output

input_df = cudf.DataFrame()
input_df["raw"] = ["04/03/2019 11:58:59 AM\\nLogName=Security\\nSourceName=Microsoft Windows security auditing.\\nEventCode=5156\\nEventType=0\\nType=Information\\nComputerName=user234.test.com\\nTaskCategory=Filtering Platform Connection\\nOpCode=Info\\nRecordNumber=241754521\\nKeywords=Audit Success\\nMessage=The Windows Filtering Platform has permitted a connection.\\r\\n\\r\\nApplication Information:\\r\\n\\tProcess ID:\\t\\t4\\r\\n\\tApplication Name:\\tSystem\\r\\n\\r\\nNetwork Information:\\r\\n\\tDirection:\\t\\tInbound\\r\\n\\tSource Address:\\t\\t100.20.100.20\\r\\n\\tSource Port:\\t\\t138\\r\\n\\tDestination Address:\\t100.20.100.30\\r\\n\\tDestination Port:\\t\\t138\\r\\n\\tProtocol:\\t\\t17\\r\\n\\r\\nFilter Information:\\r\\n\\tFilter Run-Time ID:\\t0\\r\\n\\tLayer Name:\\t\\tReceive/Accept\\r\\n\\tLayer Run-Time ID:\\t44"]
lpw = LogParseWorkflow(name="my-log-parsing-workflow")
# Call the workflow method directly on an in-memory DataFrame (no source or destination configured)
lpw.workflow(input_df)
```

| | member_account_name | attributes_password_last_set | service_service_name | attributes_profile_path | account_information_security_id | additional_information_transited_services | additional_information_caller_computer_name | network_information_direction | new_logon_account_name | changed_attributes_home_drive | ... | certificate_information_certificate_issuer_name | network_information_source_network_address | service_information_service_name | privileges | account_for_which_logon_failed_account_domain | network_information_network_address | service_server | new_account_account_name | user_account_name | attributes_user_account_control |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  |  | inbound |  |  | ... |  |  |  |  |  |  |  |  |  |  |
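The pattern used above (subclass a base class, override `workflow`, and let the base class drive the run) can be sketched without a GPU. `MiniWorkflow` below is a hypothetical, CPU-only stand-in for `clx.workflow.workflow.Workflow`; its internals are an assumption and may differ from CLX's. It uses a dict of lists in place of a cuDF DataFrame, and plain callables for `source` and `destination`, where CLX takes configuration dictionaries (see Workflow I/O).

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Table = Dict[str, List[str]]  # column name -> values; stands in for a DataFrame


class MiniWorkflow(ABC):
    """Hypothetical stand-in: subclasses implement `workflow`; `run_workflow`
    reads from the source, applies `workflow`, and passes the result on."""

    def __init__(self, name: str, source: Callable[[], Table],
                 destination: Callable[[Table], None]):
        self.name = name
        self.source = source
        self.destination = destination

    @abstractmethod
    def workflow(self, dataframe: Table) -> Table:
        ...

    def run_workflow(self) -> Table:
        output = self.workflow(self.source())
        self.destination(output)
        return output


class LowercaseWorkflow(MiniWorkflow):
    # Toy transformation in place of log parsing: lowercase every raw value
    def workflow(self, dataframe: Table) -> Table:
        return {"lowered": [value.lower() for value in dataframe["raw"]]}


written: List[Table] = []
wf = LowercaseWorkflow(
    name="toy-workflow",
    source=lambda: {"raw": ["Inbound", "Receive/Accept"]},
    destination=written.append,
)
result = wf.run_workflow()
print(result)  # {'lowered': ['inbound', 'receive/accept']}
```

Keeping I/O in the base class means a subclass only expresses the transformation, which is the same split the CLX example shows between `workflow` and `run_workflow`.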


#### Workflow I/O

A workflow can read input from, and write output to, different locations, including CSV files and Kafka. To add I/O to your workflow, specify its configuration in a `workflow.yaml` file, or pass the configuration as Python dictionaries when you instantiate the workflow.  
The workflow class looks for a configuration file in these locations, in order:  
<ul>
    <li><code>/etc/clx/[workflow-name]/workflow.yaml</code>, then</li>
    <li><code>~/.config/clx/[workflow-name]/workflow.yaml</code></li>
</ul>
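The lookup order above can be illustrated with a small search over candidate directories. `find_workflow_config` is a hypothetical helper, not part of CLX, and it assumes the first file found wins; how CLX actually combines or prioritizes these files is not covered here. The demo uses temporary directories in place of `/etc/clx` and `~/.config/clx`.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_workflow_config(name: str, search_dirs: Optional[Sequence[Path]] = None) -> Optional[Path]:
    """Return the first existing <dir>/<name>/workflow.yaml, or None.

    Hypothetical helper: by default it checks /etc/clx first and
    ~/.config/clx second, assuming the first match is used.
    """
    if search_dirs is None:
        search_dirs = [Path("/etc/clx"), Path.home() / ".config" / "clx"]
    for base in search_dirs:
        candidate = Path(base) / name / "workflow.yaml"
        if candidate.is_file():
            return candidate
    return None


# Demo: temporary directories stand in for /etc/clx and ~/.config/clx
with tempfile.TemporaryDirectory() as tmp:
    etc_dir, user_dir = Path(tmp, "etc"), Path(tmp, "user")

    user_cfg = user_dir / "my-log-parsing-workflow" / "workflow.yaml"
    user_cfg.parent.mkdir(parents=True)
    user_cfg.write_text("")
    only_user = find_workflow_config("my-log-parsing-workflow", [etc_dir, user_dir]) == user_cfg

    etc_cfg = etc_dir / "my-log-parsing-workflow" / "workflow.yaml"
    etc_cfg.parent.mkdir(parents=True)
    etc_cfg.write_text("")
    etc_wins = find_workflow_config("my-log-parsing-workflow", [etc_dir, user_dir]) == etc_cfg

print(only_user, etc_wins)  # True True
```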


First, create the input data file:

```python
import cudf

input_df = cudf.DataFrame()
input_df["raw"] = ["04/03/2019 11:58:59 AM\\nLogName=Security\\nSourceName=Microsoft Windows security auditing.\\nEventCode=5156\\nEventType=0\\nType=Information\\nComputerName=user234.test.com\\nTaskCategory=Filtering Platform Connection\\nOpCode=Info\\nRecordNumber=241754521\\nKeywords=Audit Success\\nMessage=The Windows Filtering Platform has permitted a connection.\\r\\n\\r\\nApplication Information:\\r\\n\\tProcess ID:\\t\\t4\\r\\n\\tApplication Name:\\tSystem\\r\\n\\r\\nNetwork Information:\\r\\n\\tDirection:\\t\\tInbound\\r\\n\\tSource Address:\\t\\t100.20.100.20\\r\\n\\tSource Port:\\t\\t138\\r\\n\\tDestination Address:\\t100.20.100.30\\r\\n\\tDestination Port:\\t\\t138\\r\\n\\tProtocol:\\t\\t17\\r\\n\\r\\nFilter Information:\\r\\n\\tFilter Run-Time ID:\\t0\\r\\n\\tLayer Name:\\t\\tReceive/Accept\\r\\n\\tLayer Run-Time ID:\\t44"]
# index=False writes only the "raw" column, matching the single-column schema of the workflow source
input_df.to_csv("alert_data.csv", index=False)
```

Next, create and run the workflow:

```python
import os
from clx.workflow.workflow import Workflow
from clx.parsers.windows_event_parser import WindowsEventParser

dirpath = os.getcwd()

# Read raw events from the CSV file created above
source = {
    "type": "fs",
    "input_format": "csv",
    "input_path": os.path.join(dirpath, "alert_data.csv"),
    "schema": ["raw"],
    "delimiter": ",",
    "required_cols": ["raw"],
    "dtype": ["str"],
    "header": 0
}
# Write the parsed events to a new CSV file in the same directory
destination = {
    "type": "fs",
    "output_format": "csv",
    "output_path": os.path.join(dirpath, "alert_data_output.csv")
}
wep = WindowsEventParser()

class LogParseWorkflow(Workflow):
    def workflow(self, dataframe):
        output = wep.parse(dataframe, "raw")
        return output

# run_workflow() reads from the source, applies workflow(), and writes to the destination
lpw = LogParseWorkflow(source=source, destination=destination, name="my-log-parsing-workflow")
lpw.run_workflow()
```

Finally, read the output file:

```python
# Use a context manager so the file is closed after reading
with open("alert_data_output.csv", "r") as f:
    lines = f.readlines()
lines
```

['member_account_name,attributes_password_last_set,service_service_name,attributes_profile_path,account_information_security_id,additional_information_transited_services,additional_information_caller_computer_name,network_information_direction,new_logon_account_name,changed_attributes_home_drive,filter_information_layer_run_time_id,new_logon_security_id,additional_information_result_code,eventcode,changed_attributes_logon_hours,account_information_supplied_realm_name,additional_information_ticket_options,subject_security_id,detailed_authentication_information_key_length,changed_attributes_script_path,changed_attributes_display_name,detailed_authentication_information_transited_services,subject_logon_id,changed_attributes_sam_account_name,network_information_workstation_name,service_information_service_id,subject_account_name,account_information_user_id,new_logon_account_domain,attributes_user_workstations,account_locked_out_account_name,target_account_old_account_name,network_informati