# cyBERT: a flexible log parser based on the BERT language model

## Table of Contents
* Introduction
* Load model into cyBERT
* Load CSV data into cuDF
* Parse raw log data with cyBERT

## Introduction

One of the most arduous tasks in any security operation, and one that is just as time-consuming for a data scientist, is ETL and parsing. This notebook walks through the steps needed to parse a sample of DNS log data with cyBERT.

In [5]:
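import cudf

# Cybert is imported from a local `cybert` module here.
# The packaged CLX import is kept below for reference:
# from clx.analytics.cybert import Cybert
from cybert import Cybert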

In [2]:
# Config and weights of the DNS parsing model
CONFIG_FILENAME = "dns_parser/config.json"
MODEL_FILENAME = "dns_parser/pytorch_model.bin"

# Sample DNS log data
TESTING_DATA = "dns_data.csv"

## Load model into cyBERT

In [6]:
# Create the parser, then load the model weights and config
cybert = Cybert()
cybert.load_model(MODEL_FILENAME, CONFIG_FILENAME)
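Before loading, it can help to confirm that both model files are actually on disk. The sketch below is a minimal pre-flight check using only the standard library. `missing_files` is a hypothetical helper, not part of cyBERT or CLX.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the given paths (as strings) that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Hypothetical usage with the constants defined earlier in the notebook:
# missing = missing_files(MODEL_FILENAME, CONFIG_FILENAME)
# if missing:
#     raise FileNotFoundError(f"model files not found: {missing}")
```

Failing early with the list of missing paths is usually easier to debug than an error raised deep inside the model loader.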

## Load CSV data into cuDF

In [7]:
# Read the log CSV into a cuDF DataFrame; the unparsed log lines are in the "raw" column
logs_df = cudf.read_csv("Trainning_data_template.csv")
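The inference step reads the `raw` column, so it is worth checking that the CSV actually has that column and that it contains non-empty entries. The sketch below does this with the standard `csv` module so that it runs without a GPU. `check_raw_column` and the two-row sample are hypothetical and are not taken from the real data file.

```python
import csv
import io


def check_raw_column(csv_text, column="raw"):
    """Return the number of non-empty entries in `column`; raise KeyError if it is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"expected a '{column}' column, found {reader.fieldnames}")
    return sum(1 for row in reader if (row[column] or "").strip())


# Made-up sample: one populated log line and one empty one.
sample = 'id,raw\n0,"<30> example DNS log line"\n1,\n'
usable_rows = check_raw_column(sample)
```

With a cuDF DataFrame, a comparable check would be to test `"raw" in logs_df.columns` before calling inference.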

## Parse raw log data with cyBERT

In [9]:
# inference returns the parsed fields and the confidence scores for the predicted labels
parsed_df, confidence_df = cybert.inference(logs_df["raw"])



In [7]:
parsed_df

Unnamed: 0,Code,Date,ClientIP,Protocol,Query,dst,src,spt,app,InfobloxDNSView,destinationDnsDomain,InfobloxDNSQClass,InfobloxDNSQType,InfobloxDNSQFlags,InfobloxDNSRCode,InfobloxAnCount,InfobloxNsCount,InfobloxArCount,"msg="""""
0,<30>,28-Dec-2021 07:18:41.940,195.3.101.50#11763,UDP,identity-eu.webex.com IN A response:NOERROR-AE...,,,,,,,,,,,,,,
1,<134>1,2021-12-28T05:08:12.537Z,,proto=UDP,,NOERROR|1|dst=10.241.9.107,src=10.254.202.18,spt=59634,app=DNS,InfobloxDNSView=recursive,destinationDnsDomain=sin02-110-ru13.webex.com,InfobloxDNSQClass=IN,InfobloxDNSQType=AAAA,InfobloxDNSQFlags=+,InfobloxDNSRCode=NOERROR,InfobloxAnCount=0,InfobloxNsCount=0,InfobloxArCount =,
2,<30>,28-Dec-2021 07:18:41.708,10.248.228.79#59719,UDP,urrda.webex.com IN A response:NOERROR+E urrda....,,,,,,,,,,,,,,
3,<134>1,2021-12-28T05:08:12.006Z,,proto=UDP,,NOERROR|1|dst=10.241.9.107,src=150.253.208.65,spt=39308,app=DNS,InfobloxDNSView=recursive,destinationDnsDomain=eascb37602.webex.com,InfobloxDNSQClass=IN,InfobloxDNSQType=A,InfobloxDNSQFlags=+,InfobloxDNSRCode=NOERROR,InfobloxAnCount=1,InfobloxNsCount=0,InfobloxArCount =,"msg="" eascb37602.webex.com.265 IN A 150.253.20..."
4,<134>1,2021-12-28T05:08:11.998Z,,proto=UDP,,NOERROR|1|dst=10.241.9.107,src=10.247.140.189,spt=41402,app=DNS,InfobloxDNSView=recursive,destinationDnsDomain=esaas-nrt03-webexmeetings...,InfobloxDNSQClass=IN,InfobloxDNSQType=A,InfobloxDNSQFlags=+,InfobloxDNSRCode=NOERROR,InfobloxAnCount=2,InfobloxNsCount=0,InfobloxArCount =,"msg="" esaas-nrt03-webexmeetings-s.webex.com.25..."
5,<134>1,2021-12-28T05:08:11.534Z,,proto=UDP,,NOERROR|1|dst=10.241.9.107,src=114.29.210.167,spt=37399,app=DNS,InfobloxDNSView=recursive,destinationDnsDomain=adc-sjc02-f-1.fusion.prv....,InfobloxDNSQClass=IN,InfobloxDNSQType=A,InfobloxDNSQFlags=+,InfobloxDNSRCode=NOERROR,InfobloxAnCount=1,InfobloxNsCount=0,InfobloxArCount =,"msg="" adc-sjc02-f-1.FUSION.prv.webex.com.253 I..."
6,<30>,28-Dec-2021 07:18:41.635,156.154.76.146#34804,UDP,ams01-wxpd-lb01-adns.webex.com IN AAAA respons...,,,,,,,,,,,,,,
7,<134>1,2021-12-28T05:08:10.809Z,,proto=UDP,,NOERROR|1|dst=10.241.9.107,src=10.247.140.111,spt=38173,app=DNS,InfobloxDNSView=recursive,destinationDnsDomain=wsinmw-as-1.prod.infra.we...,InfobloxDNSQClass=IN,InfobloxDNSQType=A,InfobloxDNSQFlags=+,InfobloxDNSRCode=NOERROR,InfobloxAnCount=1,InfobloxNsCount=0,InfobloxArCount =,"msg="" wsinmw-as-1.prod.infra.webex.com.3 IN A ..."
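In the output above, many parsed values still carry their `key=` prefix (for example `proto=UDP` or `src=10.254.202.18`), and the `dst` value is preceded by a pipe-delimited fragment. A post-processing step could strip those prefixes. The sketch below is one possible approach, not part of cyBERT. `strip_key_prefix` is a hypothetical helper, and the regular expression assumes keys are identifier-like tokens.

```python
import re

# Optional pipe-delimited lead-in, then an identifier-like key, "=", and the value.
_KEY_VALUE = re.compile(r"^(?:.*\|)?\s*([A-Za-z]\w*)\s*=\s*(.*)$")


def strip_key_prefix(value):
    """Return the part after a leading 'key=', or the value unchanged if there is none."""
    match = _KEY_VALUE.match(value)
    if match is None:
        return value
    return match.group(2)


# Values copied from the parsed output above.
cleaned = [
    strip_key_prefix(v)
    for v in ["proto=UDP", "NOERROR|1|dst=10.241.9.107", "UDP", "InfobloxArCount ="]
]
```

Values without a `key=` prefix, such as the plain `UDP` in the rows with code `<30>`, pass through unchanged, so the same function can be applied to both log formats in the sample.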


In [9]:
confidence_df

Unnamed: 0,B-Code,I-Code,O,B-Date,I-Date,B-ClientIP,I-ClientIP,B-Protocol,B-Query,I-Query,...,B-InfobloxDNSRCode,I-InfobloxDNSRCode,B-InfobloxAnCount,I-InfobloxAnCount,B-InfobloxNsCount,I-InfobloxNsCount,B-InfobloxArCount,I-InfobloxArCount,"B-msg=""""","I-msg="""""
0,0.994032,0.996289,0.998772,0.991379,0.998529,0.989277,0.997686,0.991093,0.981679,0.998813,...,,,,,,,,,,
1,0.994053,0.997324,0.99911,0.993544,0.998712,,,0.994578,,,...,0.990725,0.991458,0.982728,0.985362,0.986288,0.989843,0.990357,0.990739,,
2,0.993899,0.996102,0.998722,0.99181,0.998595,0.986626,0.997658,0.991177,0.978159,0.998865,...,,,,,,,,,,
3,0.994094,0.997424,0.999109,0.993732,0.998704,,,0.994481,,,...,0.991909,0.992101,0.98994,0.989473,0.987984,0.990448,0.989992,0.990983,0.988798,0.998324
4,0.993574,0.99735,0.999101,0.994074,0.998687,,,0.994377,,,...,0.99193,0.992038,0.989676,0.988998,0.985543,0.988248,0.990054,0.991156,0.990316,0.998626
5,0.994159,0.99741,0.999105,0.994341,0.998714,,,0.994605,,,...,0.991018,0.991654,0.989991,0.989725,0.987865,0.99042,0.991809,0.992116,0.989559,0.998531
6,0.993728,0.996274,0.998747,0.990858,0.998559,0.987604,0.997703,0.991022,0.983381,0.998037,...,,,,,,,,,,
7,0.994123,0.997433,0.999103,0.994361,0.998699,,,0.994542,,,...,0.991561,0.992017,0.990088,0.989256,0.987486,0.990745,0.99152,0.99179,0.989839,0.998498
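The column names in `confidence_df` appear to follow the BIO tagging convention (`B-` for the first token of a field, `I-` for later tokens, `O` for tokens outside any field). Empty cells presumably mean that label was not predicted for that row. One way to use these scores is to flag fields that fall below a chosen threshold. The sketch below does this with the standard `csv` module on a few columns copied from rows 0 and 1 above. `low_confidence_labels`, the `row` header name, and the `0.99` threshold are illustrative assumptions.

```python
import csv
import io


def low_confidence_labels(csv_text, threshold, index_column="row"):
    """Map each row index to the labels whose score is below `threshold`.

    Empty cells are skipped; rows with no low-scoring label are omitted.
    """
    flagged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        index = row.pop(index_column)
        low = [label for label, score in row.items()
               if score and float(score) < threshold]
        if low:
            flagged[index] = low
    return flagged


# First seven score columns of rows 0 and 1 from the output above.
sample = (
    "row,B-Code,I-Code,O,B-Date,I-Date,B-ClientIP,I-ClientIP\n"
    "0,0.994032,0.996289,0.998772,0.991379,0.998529,0.989277,0.997686\n"
    "1,0.994053,0.997324,0.99911,0.993544,0.998712,,\n"
)
flagged = low_confidence_labels(sample, threshold=0.99)
```

Rows flagged this way can be routed to manual review or to a fallback parser. The right threshold depends on the data and would need to be tuned.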
