<a href="https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/Stolen_Szechuan_Sauce_Data_Upload.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Stolen Szechuan Sauce

This is a simple colab demonstrating one way of uploading data from the Stolen Szechuan Sauce challenge (found [here](https://dfirmadness.com/the-stolen-szechuan-sauce/)).

This colab will not go into any analysis of the data, only uploading data to a sketch.

A word of notice: this notebook can be run on a cloud runtime, but a few changes would then be needed. It is assumed here that you are connecting to a local runtime (see [instructions here](https://research.google.com/colaboratory/local-runtimes.html)), which makes it easier to import data that is already on your system.

More generic instructions for using Colab can be [found here](https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/colab-timesketch-demo.ipynb).

In [None]:
# @title Import libraries
# @markdown This cell loads libraries that we will use throughout the notebook.
import io
import os
import codecs

import altair as alt
import numpy as np
import pandas as pd

from timesketch_api_client import config
from timesketch_import_client import helper
from timesketch_import_client import importer

## AutoRuns File

Let's read the file that contains the AutoRuns output.

In [None]:
# @markdown This needs to be changed to reflect the correct path.

PATH_TO_FOLDER = '/mnt/chromeos/MyFiles/Downloads' # @param {type: "string"}
# @markdown the path to the folder will be used for all subsequent paths
# @markdown as a root folder.
AUTO_RUN_FILENAME = 'autoruns-desktop-sdn1rpt.csv' # @param {type: "string"}

PATH_TO_CSV = os.path.join(PATH_TO_FOLDER, AUTO_RUN_FILENAME)


Now we can read the content of the file:

In [None]:
df = None
with codecs.open(PATH_TO_CSV, 'r', encoding='utf-8', errors='replace') as fh:
  # pandas >= 1.3: on_bad_lines='warn' replaces the removed error_bad_lines=False.
  df = pd.read_csv(fh, on_bad_lines='warn')

print(df.shape)

Quite a few errors, let's look at the data.

In [None]:
df.head(3)

This does not look right. Let's look at the raw content of the file as a hex dump (the `!` prefix allows us to execute shell commands):

In [None]:
!dd if=$PATH_TO_CSV bs=128 count=1 | xxd

This file is not UTF-8; it is encoded as UTF-16. Let's read the file in again, this time as UTF-16.
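As a cross-check on reading the hex dump by eye, the byte order mark (BOM) at the start of the file can also be inspected in Python. This is a minimal sketch: `guess_encoding_from_bom` is a hypothetical helper written for this example, and it only recognises UTF-8 and UTF-16 BOMs. The demo runs on a throwaway file, not on the challenge data.

```python
import codecs
import os
import tempfile


def guess_encoding_from_bom(path):
    """Return an encoding name based on the file's BOM, or None if there is none."""
    with open(path, 'rb') as fh:
        head = fh.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    # UTF-32 is not handled here; its little-endian BOM also starts with FF FE.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return None


# Demo: Python's 'utf-16' codec writes a BOM at the start of the file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.csv')
    with open(sample_path, 'w', encoding='utf-16') as fh:
        fh.write('Time,Category\n')
    detected = guess_encoding_from_bom(sample_path)

print(detected)
```

For the AutoRuns export, `guess_encoding_from_bom(PATH_TO_CSV)` would be expected to return `'utf-16'` if the file starts with a BOM.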

In [None]:
df = None
with codecs.open(PATH_TO_CSV, 'r', encoding='utf-16', errors='replace') as fh:
  # pandas >= 1.3: on_bad_lines='warn' replaces the removed error_bad_lines=False.
  df = pd.read_csv(fh, on_bad_lines='warn')

print(df.shape)

No errors, let's look at the content

In [None]:
df.head(3)

This looks correct now, let's make the data a bit more Timesketch ready.

The first thing is to create a datetime field that contains the timestamp. We will use the built-in conversion in pandas:

In [None]:
df['datetime'] = pd.to_datetime(df['Time'])

The next thing is to add a few fields that Timesketch expects:

In [None]:
df['data_type'] = 'autoruns:record'
df['timestamp_desc'] = 'Entry Recorded'
df['message'] = 'AutoRun: [' + df['Category'] + ' - ' + df['Profile'] + '] ' + df['Image Path']

df.head(3)

We can take a quick look at the data frame we just read in:

In [None]:
df.info()

### Upload To TS
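Before uploading, it can help to confirm that the frame carries the columns Timesketch looks for. The sketch below assumes the mandatory fields are `message`, `datetime` and `timestamp_desc`. That tuple and the helper name `missing_timesketch_columns` are assumptions made here for illustration, so check them against the documentation for your Timesketch version.

```python
import pandas as pd

# Assumed mandatory columns; verify against your Timesketch version.
REQUIRED_COLUMNS = ('message', 'datetime', 'timestamp_desc')


def missing_timesketch_columns(data_frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [name for name in required if name not in data_frame.columns]


# Toy frame with made-up content: only 'message' is present so far.
sample = pd.DataFrame({
    'Time': ['2020-09-18 04:17:00'],
    'message': ['AutoRun: example entry'],
})
missing = missing_timesketch_columns(sample)
print(missing)
```

Calling `missing_timesketch_columns(df)` on the AutoRuns frame prepared above would be expected to return an empty list.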

Let's upload this data to TS. For that we first need to get a copy of the Timesketch client, then we will need to get a copy of a sketch object.

In [None]:
ts_client = config.get_client()
[(x.id, x.name) for x in ts_client.list_sketches()]

In [None]:
# @markdown This needs to be changed to reflect the correct sketch.

SKETCH_ID = 6 # @param {type: "integer"}

Now we are ready to upload the data. The sketch that we want to use is the one with the ID of 6.

We will use the importer client to import the data as a data frame. For that we need to set up an import streamer:

In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper) 

  streamer.set_timeline_name('autoruns_desktop_sdn1rpt')

  streamer.add_data_frame(df)

What we did there was create an instance of the import streamer and then configure it, defining the sketch to use and the name of the timeline.
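The same calls recur for every upload in this notebook, so they could be wrapped in a small helper. This is a minimal sketch: `upload_data_frame` and its `streamer_factory` argument are hypothetical names introduced here and are not part of the Timesketch clients. Only the streamer methods already shown in the cell above are used.

```python
def upload_data_frame(streamer_factory, sketch, config_helper, data_frame, timeline_name):
    """Stream one data frame into a sketch as a named timeline.

    streamer_factory is any callable returning a context manager that offers
    the streamer methods used in this notebook.
    """
    with streamer_factory() as streamer:
        streamer.set_sketch(sketch)
        streamer.set_config_helper(config_helper)
        streamer.set_timeline_name(timeline_name)
        streamer.add_data_frame(data_frame)
```

In this notebook it could be called as `upload_data_frame(importer.ImportStreamer, sketch, helper.ImportHelper(), df, 'autoruns_desktop_sdn1rpt')`. Passing the factory in keeps the helper free of imports and easy to try out with a stand-in object.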

The data has now been sent to the sketch, but the import failed on the TS side: if we visit the sketch, we can see that the timeline was not imported correctly. Let's copy the error here.

Then we will delete the failed timeline from the sketch so that no broken timeline is left in TS.


```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/timesketch/lib/tasks.py", line 558, in run_csv_jsonl
    for event in read_and_validate(file_handle):
  File "/usr/local/lib/python3.6/dist-packages/timesketch/lib/utils.py", line 225, in read_and_validate_jsonl
    linedict['timestamp'] = parser.parse(linedict['datetime'])
  File "/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py", line 646, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py", line 725, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens
  File "/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))
TypeError: Parser must be a string or character stream, not NoneType
```
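The last frames show `dateutil` being handed `None` instead of a string. Below is a minimal sketch that reproduces the failure, assuming a missing timestamp reaches the server as a JSON `null` in the `datetime` field. The sample line is made up and is not taken from the actual upload.

```python
import json

from dateutil import parser

# One JSON line, as a stand-in for an event whose datetime is missing.
line = '{"message": "AutoRun: example entry", "datetime": null}'
linedict = json.loads(line)

try:
    parser.parse(linedict['datetime'])
    error_text = None
except TypeError as error:
    # dateutil refuses anything that is not a string or a character stream.
    error_text = str(error)

print(error_text)
```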

This does indicate issues with datetime parsing. Let's take a closer look at the data frame in question. The first thing we check is to see whether there are any empty dates in the frame:

In [None]:
df[df.datetime.isna()]

The check we use is `isna`, which tests whether a field is empty: `NaN` (Not a Number) or, for datetime columns, `NaT` (Not a Time).

There are quite a few records with an empty date field. Let's exclude those. For that we will need to upload a slice of the data frame that doesn't contain any records with an empty date.
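Here is a minimal sketch of that slicing on a toy frame (the rows are made up): `isna()` yields a boolean mask, `~` inverts it, and `.copy()` gives an independent frame to upload.

```python
import pandas as pd

# Toy data: the second row has no timestamp.
sample = pd.DataFrame({
    'Time': ['2020-09-18 04:17:00', None, '2020-09-19 01:02:03'],
    'Entry': ['a', 'b', 'c'],
})
sample['datetime'] = pd.to_datetime(sample['Time'])

missing = sample['datetime'].isna()   # True where the date is NaT
with_time = sample[~missing].copy()   # keep only rows that have a date

print(int(missing.sum()), with_time.shape)
```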

In [None]:
df.shape

In [None]:
df[~df.datetime.isna()].shape

There seem to be 9 records without a date... let's remove them from the upload (by just uploading a slice of the data).

In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper) 

  streamer.set_timeline_name('autoruns_desktop_sdn1rpt_w_time')

  streamer.add_data_frame(df[~df.datetime.isna()].copy())

### Server AutoRun File

Let's take the server AutoRuns output next.

In [None]:
DC_FILENAME = 'autorunsc-citadel-dc01.csv' # @param {type: "string"}

dc_path = os.path.join(PATH_TO_FOLDER, DC_FILENAME)

auto_server_df = None
with codecs.open(dc_path, 'r', encoding='utf-16', errors='replace') as fh:
  # pandas >= 1.3: on_bad_lines='warn' replaces the removed error_bad_lines=False.
  auto_server_df = pd.read_csv(fh, on_bad_lines='warn')

print(auto_server_df.shape)

Let's `groom` it for TS

In [None]:
# Use the server frame's own 'Time' column (not the desktop frame `df`).
auto_server_df['datetime'] = pd.to_datetime(auto_server_df['Time'])
auto_server_df['data_type'] = 'autoruns:record'
auto_server_df['timestamp_desc'] = 'Entry Recorded'
auto_server_df['message'] = 'AutoRun: [' + auto_server_df['Category'] + ' - ' + auto_server_df['Profile'] + '] ' + auto_server_df['Image Path']

auto_server_df.head(3)

And upload (using the same method as before)

In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper) 

  streamer.set_timeline_name('autoruns_citadel_dc01_w_time')

  streamer.add_data_frame(auto_server_df[~auto_server_df.datetime.isna()].copy())

Now we've got both autoruns in there

## Plaso Files

Let's import the plaso files, using:

```shell
$ timesketch_importer --sketch_id 6 20200918_0417_DESKTOP-SDN1RPT.plaso 
```

or using the importer client in colab

In [None]:
DESKTOP_PATH = '20200918_0417_DESKTOP-SDN1RPT.plaso' #@param {type: "string"}
SERVER_PATH = '20200918_0347_CDrive_new.plaso' #@param {type: "string"}


In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper) 

  streamer.set_timeline_name('desktop-sdn1rpt.plaso')
  streamer.add_file(os.path.join(PATH_TO_FOLDER, DESKTOP_PATH))

In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper) 

  streamer.set_timeline_name('dc1_plaso')
  streamer.add_file(os.path.join(PATH_TO_FOLDER, SERVER_PATH))

## PCAP Files

Another important part of the challenge is the provided PCAP files. We need to get them into TS.

Let's start parsing them. There are essentially two different methods of doing so:

1. Use Wireshark to do the parsing and work with a CSV file.
2. Parse the PCAP file using Python libraries.

Let's explore both options.

### Wireshark Route

Wireshark has a neat feature to export a set of packets, or all packets, into various other formats, including CSV. As Timesketch is able to handle CSV data, this is worth an attempt.

To export packets to csv use:

```Wireshark → File → Export Packet Dissections```

And choose CSV.

The exported CSV will include all displayed columns. One thing to note here is that the time by default is relative to the first packet in the capture. You need to adjust that. 

Go to:

```Wireshark → View → Time Display Format```

And select ```UTC Date and Time of the Day```

To learn more about Time settings in Wireshark, visit wireshark.org
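If a CSV has already been exported with the default relative timestamps, the absolute times can still be recovered when the capture start time is known. This is a minimal sketch, assuming the `Time` column holds seconds since the first packet. The start time and offsets below are made-up placeholders, not values from the challenge data.

```python
import pandas as pd

# Placeholder start time of the capture (hypothetical).
capture_start = pd.Timestamp('2020-09-19 02:30:00', tz='UTC')

# Toy export with relative times in seconds.
relative = pd.DataFrame({'Time': [0.0, 0.5, 12.25]})

# Turn the offsets into timedeltas and shift them by the capture start.
offsets = pd.to_timedelta(relative['Time'], unit='s')
relative['datetime'] = offsets + capture_start

print(relative)
```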

The exported CSV now looks promising. Some things need to be adjusted, such as the datetime column name and format, but we already know how to do that from the AutoRuns CSV file.


In [None]:
# @markdown Change the path to what fits on your system.
PCAP_CSV_PATH = 'all_packets.csv' #@param {type: "string"}

In [None]:
pcap_df = pd.read_csv(
    os.path.join(PATH_TO_FOLDER, PCAP_CSV_PATH),
    encoding='utf-8', parse_dates=False)

pcap_df.shape

#### Modify DataFrame

Now let's rename fields and add other fields to make it work better for Timesketch.

In [None]:
pcap_df.head(3)

Now we've got a general idea of what the data looks like, so we can change it.

In [None]:
# Convert the 'Time' column to datetime format.
pcap_df['Time'] = pd.to_datetime(pcap_df['Time'])
pcap_df['data_type'] = 'pcap:wireshark:entry'
pcap_df['timestamp_desc'] = 'Time Logged'
pcap_df['source_short'] = 'LOG'
pcap_df['source'] = 'Network'
pcap_df['message'] = '[' + pcap_df['Protocol'] + '] ' + pcap_df['Info'] + ' (' + pcap_df['Source'] + ':' + pcap_df['src port'].astype('str') + ' -> ' + pcap_df['Destination'] + ':' + pcap_df['DST port'].astype('str') + ')'

pcap_df = pcap_df.rename(columns={'Time': 'datetime'})

pcap_df.info()

Let's look at the data frame now

In [None]:
pcap_df.head(3)

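Adjust the ports: convert both port columns to pandas' nullable integer type, which keeps missing values for packets that have no port.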
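Here is a small sketch of what the nullable dtype buys us, using made-up port values: a numeric column with a missing entry is held as float, and `pd.Int32Dtype()` turns it back into integers while keeping the gap.

```python
import numpy as np
import pandas as pd

# Toy column: the middle packet has no port, so pandas stores floats.
ports = pd.DataFrame({'src port': [49158.0, np.nan, 443.0]})

# As text, the float column carries a trailing '.0'.
as_text = ports['src port'].astype('str').tolist()

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>.
ports['src port'] = ports['src port'].astype(pd.Int32Dtype())

print(as_text[0], ports['src port'].dtype)
```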

In [None]:
pcap_df['DST port'] = pcap_df['DST port'].astype(pd.Int32Dtype())
pcap_df['src port'] = pcap_df['src port'].astype(pd.Int32Dtype())

pcap_df.head(3)

#### Upload CSV

Now we can upload the data to TS

In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper) 

  streamer.set_timeline_name('wireshark_decoded_pcap')

  streamer.add_data_frame(pcap_df[~pcap_df.datetime.isna()].copy())

### Using Python Libraries

Alternatively, we can use Python libraries such as scapy. This is much slower than using Wireshark and a CSV file, but it is more flexible, since more can be done with each packet.

(for this we also have a progress bar since this will take some time to execute)

Make sure that your environment has scapy installed, if not you can execute:

In [None]:
!pip install -q scapy
!pip install -q tqdm
!pip install -q ipywidgets

In [None]:
# @markdown Import needed libraries for using scapy.
import binascii
import datetime
import pytz

import tqdm
from tqdm import tqdm_notebook, tnrange

import ipywidgets as widgets

from scapy import all as scapy_all

In [None]:
# @markdown Change this to the correct path on your system.
PCAP_PATH = 'case001.pcap' # @param {type: "string"}


Let's read in the PCAP file. A word of warning: this will take a **really long time**.

In [None]:
packets = scapy_all.rdpcap(
    os.path.join(PATH_TO_FOLDER, PCAP_PATH))

In [None]:
# @markdown check how many packets are in there
packets

Now we can start going through the packets to generate the records that we want to upload to TS.

To convert the data we are borrowing code from https://github.com/secdevopsai/Packet-Analytics/blob/master/Packet-Analytics.ipynb (see the [medium post here](https://medium.com/hackervalleystudio/learning-packet-analysis-with-data-science-5356a3340d4e)).

In [None]:
# @markdown Collect field names from IP/TCP/UDP
# @markdown *These will be columns in DF*
ip_fields = [(field.name) for field in scapy_all.IP().fields_desc]
tcp_fields = [(field.name) for field in scapy_all.TCP().fields_desc]
udp_fields = [(field.name) for field in scapy_all.UDP().fields_desc]

print(ip_fields)
print(tcp_fields)
print(udp_fields)

ip_fields_new = [("ip_"+field.name) for field in scapy_all.IP().fields_desc]
tcp_fields_new = [("tcp_"+field.name) for field in scapy_all.TCP().fields_desc]
udp_fields_new = [("udp_"+field.name) for field in scapy_all.UDP().fields_desc]

dataframe_fields = ip_fields_new + ['time'] + tcp_fields_new + ['payload', 'datetime', 'raw']

#### Upload Data To Timesketch

Now that we've got the columns sorted out, we can go through each of the packets, create a dict, and upload that directly to Timesketch.
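The core of that per-packet step is small: pair the column names with the values of one packet and add an ISO 8601 timestamp. This is a minimal sketch with made-up field values (documentation-range IP addresses) and a hypothetical helper name `build_event`. It leaves out scapy entirely.

```python
import datetime


def build_event(field_names, field_values, epoch_seconds):
    """Combine names and values into a dict and add a UTC ISO 8601 datetime."""
    event = dict(zip(field_names, field_values))
    moment = datetime.datetime.fromtimestamp(
        float(epoch_seconds), tz=datetime.timezone.utc)
    event['datetime'] = moment.isoformat()
    return event


# Made-up values for illustration only.
event = build_event(
    ['ip_src', 'ip_dst', 'tcp_dport'],
    ['192.0.2.10', '192.0.2.20', 443],
    0)
print(event)
```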

Let's adapt the borrowed packet-parsing code, this time adding a progress bar, and stream the parsed results directly to Timesketch.

**Word of warning: this will also take a considerable amount of time to execute and it may even crash your notebook. You have been warned!**

In [None]:
sketch = ts_client.get_sketch(SKETCH_ID)
import_helper = helper.ImportHelper() 

with importer.ImportStreamer() as streamer:
  streamer.set_sketch(sketch)
  streamer.set_config_helper(import_helper)

  # Lower the threshold, which defines how many entries we go through before we flush the buffer.
  streamer.set_entry_threshold(1000)
  streamer.set_data_type('scapy:pcap:entry')
  streamer.set_timestamp_description('PCAP Entry')
  streamer.set_timeline_name('network_pcap_with_scapy')
  streamer.set_message_format_string('{raw:s}')

  for packet in tqdm_notebook(packets[scapy_all.IP]):
    # Field array for each row of DataFrame
 
    field_values = []
    # Add all IP fields to dataframe
    for field in ip_fields:
      if field == 'options':
        # Retrieving number of options defined in IP Header
        field_values.append(len(packet[scapy_all.IP].fields[field]))
      else:
        field_values.append(packet[scapy_all.IP].fields[field])
    
    field_values.append(packet.time)
    layer_type = type(packet[scapy_all.IP].payload)
    for field in tcp_fields:
      try:
        if field == 'options':
          field_values.append(len(packet[layer_type].fields[field]))
        else:
          field_values.append(packet[layer_type].fields[field])
      except Exception:  # e.g. the field does not exist on this layer; Ctrl-C still works
        field_values.append(None)
    
    # Append payload
    field_values.append(len(packet[layer_type].payload))
 
    # Newer scapy versions return a Decimal-like time, so convert to float first.
    date_value = datetime.datetime.fromtimestamp(float(packet.time), tz=pytz.utc)
    field_values.append(date_value.isoformat())
    # Call show2 with dump=True to get its text output, not the method's repr.
    field_values.append(packet.show2(dump=True))

    # Create a dict and upload it to timesketch.
    packet_dict = dict(zip(dataframe_fields, field_values))
    ip_flags = packet_dict.get('ip_flags')
    if ip_flags is not None:
      packet_dict['ip_flags'] = ip_flags.names

    tcp_flags = packet_dict.get('tcp_flags')
    if tcp_flags is not None:
      packet_dict['tcp_flags'] = tcp_flags.names

    del packet_dict['time']

    streamer.add_dict(packet_dict)