# Lacework Demo Notebook

This notebook is a simple demonstration of using the Lacebook container to connect to a Lacework instance and pull some data from it.

## Connect to an Instance

The basic imports are already done; they run when the lacebook kernel loads.

To see which imports are performed, see [the code here](https://github.com/lacework/python-sdk/blob/main/jupyter/docker/docker_build/00-import.py).

To connect to a Lacework instance we just need to create a client object, which is an instance of
`LaceworkJupyterHelper`. The Jupyter helper is a simple wrapper around the Python SDK that returns the
output of the API calls as a pandas DataFrame instead of a dict. The SDK can be queried directly, thus bypassing
the wrapper, by calling `client.sdk.<FUNCTION>`.

To execute a cell in a Jupyter notebook, click on it (for example, the cell below) and press "shift + enter".

**When no option or parameter is passed to `LaceworkJupyterHelper`, it attempts to read credentials from the system environment variables (inside the lacebook container) or from the CLI config on the host. If neither is available, you may need to enter the credentials manually.**

In [None]:
client = LaceworkJupyterHelper()

If you need to enter the credentials manually, run the above cell with these parameters:

```
client = LaceworkJupyterHelper(
    api_key=API_KEY,
    api_secret=API_SECRET,
    account=ACCOUNT,
)
```
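
Before falling back to manual entry, it can help to check which credentials are actually present in the environment. The following is a minimal sketch, not part of the Jupyter helper: the function `credentials_from_env` is hypothetical, and the variable names `LW_API_KEY`, `LW_API_SECRET`, and `LW_ACCOUNT` are an assumption that should be checked against the documentation of your SDK version.

```python
import os

# Assumed environment variable names, mapped to the helper's parameters.
ENV_TO_PARAM = {
    "LW_API_KEY": "api_key",
    "LW_API_SECRET": "api_secret",
    "LW_ACCOUNT": "account",
}


def credentials_from_env(environ=None):
    """Return (found, missing): keyword arguments that are set, and unset variable names."""
    environ = os.environ if environ is None else environ
    found = {
        param: environ[name]
        for name, param in ENV_TO_PARAM.items()
        if environ.get(name)
    }
    missing = [name for name in ENV_TO_PARAM if not environ.get(name)]
    return found, missing


found, missing = credentials_from_env()
if missing:
    print("Not set in the environment:", ", ".join(missing))
else:
    print("All three credentials are set in the environment.")
```

If nothing is missing, the resulting dict could be passed on inside the notebook as `LaceworkJupyterHelper(**found)`.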

There are other parameters as well. To see the full list, or the docstring of any function, run a cell with the function name followed by a `?`, e.g.:

```
LaceworkJupyterHelper?
```

## Events

Now that the client is connected, we can start querying for data. Let's start by looking at recent event activity. We can generate the start and end times manually, or we can take advantage of the `utils` library that comes as part of the Jupyter helper. Let's use that to query the last 5 days.

In [None]:
start_date, end_date = utils.parse_date_offset('LAST 5 DAYS')
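
For the manual route, a start and end time can be computed with the standard library. This is only a sketch: `last_n_days` is a hypothetical helper, and the ISO 8601 UTC string format is an assumption, since the exact format returned by `utils.parse_date_offset` is not shown here.

```python
from datetime import datetime, timedelta, timezone


def last_n_days(days, now=None):
    """Return (start, end) as UTC timestamp strings covering the last `days` days."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed output format
    return start.strftime(fmt), end.strftime(fmt)


manual_start, manual_end = last_n_days(5)
print(manual_start, manual_end)
```

The optional `now` argument exists only to make the function reproducible, for example when rerunning an old investigation against a fixed end time.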

Now that we have a start and end time, we can use the client to get the events from the last five days.

In [None]:
event_df = client.events.get_for_date_range(start_date, end_date)

Since the data that we get back is a data frame, we can start exploring it. One quick way of understanding the data is to use `.shape`, which gives us the number of rows and columns in the returned data frame.

This is not a tutorial on how pandas DataFrames work; there are better guides on the Internet for that. This is just an example of a few things one can do with the Lacework connection.

In [None]:
event_df.shape

Now let's use `value_counts` to aggregate a single column of the data frame. As an example, let's look at the severities of these events.

In [None]:
event_df.SEVERITY.value_counts()

We may only be interested in a subset of these events, so let's do a quick filter and only care about `Critical` and `High` events, at least for now.

In [None]:
subevents_df = event_df[event_df.SEVERITY.isin(['High', 'Critical'])]
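
To try these three operations without a Lacework connection, here is a small sketch on made-up data. The frame `sample_df` and its values are invented for illustration; only the column name `SEVERITY` and the severity labels are taken from the cells above.

```python
import pandas as pd

# Invented sample data standing in for the events returned by the client.
sample_df = pd.DataFrame({
    "EVENT_ID": [1, 2, 3, 4, 5],
    "SEVERITY": ["High", "Low", "Critical", "Medium", "High"],
})

# (rows, columns) of the frame.
print(sample_df.shape)

# Number of events per severity level.
severity_counts = sample_df.SEVERITY.value_counts()
print(severity_counts)

# Keep only the rows whose severity is in the given list.
high_critical_df = sample_df[sample_df.SEVERITY.isin(["High", "Critical"])]
print(high_critical_df)
```

`isin` returns a boolean Series, and indexing the frame with it keeps the rows where the value is `True`.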

Now we've got fewer events to look at. Let's take a closer look here.

In [None]:
subevents_df.head(3)

By looking at just three events, we can start to see what sort of information is stored in the data frame, which gives us better ideas on how to filter it.

Now, let's look at what sort of event types we've got for these high and critical events.

In [None]:
subevents_df.EVENT_TYPE.value_counts()

We can now look at some of these events in more detail. Let's look at one particular event type.

Since the events differ depending on when you run this command and on what your own environment generates, you will most likely need to change the value being filtered on below. Adjust it according to what you are seeing.

In [None]:
subevents_df[subevents_df.EVENT_TYPE == 'UserLoggedInFromNewLocation']

Now we may want to take a closer look at this particular event...

In [None]:
event_id = int(subevents_df[subevents_df.EVENT_TYPE == 'UserLoggedInFromNewLocation'].iloc[0].EVENT_ID)

event = client.events.get_details(event_id)
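
Note that `.iloc[0]` raises an `IndexError` when the filter matches no rows, which is likely if the event type does not occur in your environment. A guarded lookup could look like the sketch below; `first_event_id` is a hypothetical helper and the sample frame is invented.

```python
import pandas as pd


def first_event_id(df, event_type):
    """Return the EVENT_ID of the first row of the given type, or None if there is none."""
    matches = df[df.EVENT_TYPE == event_type]
    if matches.empty:
        return None
    return int(matches.iloc[0].EVENT_ID)


# Invented sample data; event IDs are stored as strings here to show the int() conversion.
demo_df = pd.DataFrame({
    "EVENT_ID": ["101", "102"],
    "EVENT_TYPE": ["UserLoggedInFromNewLocation", "NewExternalClientConn"],
})

print(first_event_id(demo_df, "UserLoggedInFromNewLocation"))
print(first_event_id(demo_df, "NoSuchEventType"))
```

Checking for `None` before calling `client.events.get_details` avoids a traceback when nothing matched.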

In [None]:
event

We get back the details of a single event, again as a data frame. Notice the field called `ENTITY_MAP`, which contains nested JSON. We can use a function in the `utils` library to flatten this field out.

In [None]:
event_flattened = utils.flatten_data_frame(event)
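
To make the idea of flattening concrete, here is a plain-Python sketch that turns nested dicts into dotted keys and list items into `_N` suffixes. It is not the implementation of `utils.flatten_data_frame`; the function `flatten`, the exact key naming, and the sample record are assumptions for illustration.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single dict with compound keys."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(item, name))
    elif isinstance(value, list):
        # Each list item gets its own numbered key, e.g. "User_0", "User_1".
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}_{index}"))
    else:
        flat[prefix] = value
    return flat


# Invented record shaped loosely like an event with a nested ENTITY_MAP.
record = {
    "EVENT_ID": 7,
    "ENTITY_MAP": {
        "User": [{"USERNAME": "alice"}],
        "Region": "us-east-1",
    },
}

for key, value in flatten(record).items():
    print(key, "=", value)
```

In this sketch, empty dicts and empty lists simply produce no keys.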

Let's look at this flattened DataFrame:

In [None]:
event_flattened

Now the entire `ENTITY_MAP` has been expanded into separate columns. Let's convert the row to a dict and print it in a more readable layout.

In [None]:
event_dict = event_flattened.iloc[0].to_dict()
# Width of the longest key, used to right-align all keys.
max_length = max(len(key) for key in event_dict)

for key, value in event_dict.items():
    print(f'[{key:>{max_length}s}] = {value}')

Now we can start reading through this event to see whether this is something we need to investigate further, and look for inside our environment.

## Vulnerabilities

Let's look at another part of the Python SDK: vulnerabilities.

In [None]:
vuln_df = client.vulnerabilities.get_host_vulnerabilities()

We can see that both the `packages` and the `summary` columns are JSON structures, so we can flatten them out again.

In [None]:
vuln_flatten_df = utils.flatten_data_frame(vuln_df)

Let's look at the flattened data frame here:

In [None]:
vuln_flatten_df.head(4)

Now we've got quite a lot more fields here. The flattening does not work well in this case, since it creates many `packages_N` subsections. The flattening function has an option to create new rows instead of generating more columns.

Let's flatten the DataFrame one more time, using this option now.

In [None]:
vuln_flatten_df = utils.flatten_data_frame(
    vuln_df, lists_to_rows=True
)
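
The difference between the two modes can be sketched in plain Python: instead of numbering the list items as extra columns, each item becomes its own row that repeats the parent fields. The function `explode_rows` and the sample record (including the placeholder CVE ID) are hypothetical and do not reflect how `lists_to_rows` is implemented.

```python
def explode_rows(record, list_key):
    """Return one row per item in record[list_key], repeating the other fields."""
    base = {key: value for key, value in record.items() if key != list_key}
    rows = []
    for item in record.get(list_key, []):
        row = dict(base)
        # Prefix the item's fields with the list name, e.g. "packages.name".
        row.update({f"{list_key}.{key}": value for key, value in item.items()})
        rows.append(row)
    # A record with an empty or missing list still yields one row.
    return rows or [base]


# Invented record: one CVE affecting two packages.
cve_record = {
    "cve_id": "CVE-0000-0001",
    "packages": [
        {"name": "openssl", "severity": "High"},
        {"name": "libssl", "severity": "High"},
    ],
}

for row in explode_rows(cve_record, "packages"):
    print(row)
```

The row-based layout is what makes column-wise operations such as `value_counts` on `packages.name` meaningful.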

In [None]:
vuln_flatten_df.head(4)

This looks better. Now we can start to look at package names, etc.

In [None]:
vuln_flatten_df.columns

Now we can summarize some of these for further inspection...

In [None]:
vuln_flatten_df['packages.name'].value_counts()

And to look at the criticality of things...

In [None]:
vuln_flatten_df['packages.severity'].value_counts()

In [None]:
vuln_flatten_df['packages.fix_available'].value_counts()

We can also look for signs of a particular kind of vulnerability. Let's say vulnerabilities where a fix is available, the severity is high or critical, and the word `remote` appears somewhere in the description.

In [None]:
vuln_slice = vuln_flatten_df[
    (vuln_flatten_df['packages.fix_available'] == '1') &
    (vuln_flatten_df['packages.severity'].isin(['High', 'Critical'])) &
    # na=False treats rows without a description as non-matches
    (vuln_flatten_df['packages.description'].str.contains('remote', na=False))
][
    ['packages.name', 'cve_id', 'packages.version', 'packages.fixed_version', 'packages.severity']
].drop_duplicates()

vuln_slice
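
A combined filter like this is easier to read and debug when each condition is built as a named boolean mask first. The sketch below uses an invented frame with placeholder CVE IDs; the column names follow the flattened frame above. Passing `na=False` to `str.contains` makes rows with a missing description count as non-matches.

```python
import pandas as pd

# Invented sample data using the flattened column names.
packages_df = pd.DataFrame({
    "cve_id": ["CVE-A", "CVE-B", "CVE-C", "CVE-D"],
    "packages.name": ["openssl", "bash", "curl", "zlib"],
    "packages.severity": ["Critical", "Low", "High", "High"],
    "packages.fix_available": ["1", "1", "0", "1"],
    "packages.description": [
        "Allows remote code execution",
        "Local privilege issue",
        "A remote attacker can read memory",
        None,
    ],
})

has_fix = packages_df["packages.fix_available"] == "1"
is_severe = packages_df["packages.severity"].isin(["High", "Critical"])
mentions_remote = packages_df["packages.description"].str.contains("remote", na=False)

# Rows must satisfy all three conditions.
selected = packages_df[has_fix & is_severe & mentions_remote]
print(selected[["cve_id", "packages.name", "packages.severity"]])
```

Printing an individual mask, or calling `.sum()` on it, shows how many rows each condition keeps on its own.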

We can look at CVEs and severity together, for instance:

In [None]:
vuln_slice[['cve_id', 'packages.severity']].value_counts()

In [None]:
vuln_slice[['cve_id', 'packages.severity']].drop_duplicates()

And from this we could get some indication of priority of tasks, etc.

## LQL Queries

You can also run LQL queries here. In the cell below, `some_lql_query` is a placeholder for a string containing the LQL query text.

In [None]:
client.queries.execute(
    evaluator_id='Cloudtrail',
    arguments={
        'StartTimeRange': '2021-09-01',
        'EndTimeRange': '2021-09-07'
    },
    query_text=some_lql_query)

## Final Words

To discover what is possible within the client, use Jupyter's wildcard search and `?` help:

In [None]:
client.*?

This will show you what API wrappers are available, and within each of these you can find out the available functions.

In [None]:
client.events.*?

And to find out how to use each function:

In [None]:
client.events.get_for_date_range?
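
The `?` and `*?` syntax only works in Jupyter and IPython. In a plain Python script, a similar overview can be built with `dir` and the `inspect` module. The sketch below runs against `DemoEvents`, a hypothetical stand-in class, so it works without a Lacework connection; `public_names` and `describe` are likewise illustrative helpers.

```python
import inspect


class DemoEvents:
    """Hypothetical stand-in for an API wrapper such as client.events."""

    def get_for_date_range(self, start_time, end_time):
        """Return events between start_time and end_time."""
        return []

    def get_details(self, event_id):
        """Return the details of one event."""
        return {}


def public_names(obj):
    """List attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


def describe(func):
    """Return the name, signature, and first docstring line of a callable."""
    signature = inspect.signature(func)
    summary = (inspect.getdoc(func) or "").split("\n")[0]
    return f"{func.__name__}{signature}: {summary}"


events = DemoEvents()
for name in public_names(events):
    print(describe(getattr(events, name)))
```

The same two helpers can be pointed at `client` or `client.events` inside the notebook.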