# Trunk Recorder Log Analysis

This notebook uses regular expressions to parse the trunk-recorder log file and compute some statistics. The easiest approach is to configure trunk-recorder to save logs; it will then happily create a new logfile for each day it is in service. You can also use output from `docker logs` or text copied from the console, but in those cases you have to strip the ANSI color codes from the output first, for example with the `ansi2txt` program from `colorized-logs`.
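
If you don't have `colorized-logs` handy, a few lines of Python can do roughly the same job. This is only a minimal sketch: the `strip_ansi_colors` helper and its pattern are my own illustration, not part of trunk-recorder or `colorized-logs`, and it only removes color (SGR) sequences, not every possible terminal escape code.

```python
import re

# Matches SGR color sequences such as "\x1b[32m" or "\x1b[0m".
# Assumption: the console output only contains color codes of this form.
ANSI_COLOR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi_colors(text: str) -> str:
    """Return text with ANSI color (SGR) escape sequences removed."""
    return ANSI_COLOR_RE.sub("", text)


colored = "\x1b[32m[2024-05-09 12:31:45.009426] (info)\x1b[0m   [pwcp25]"
print(strip_ansi_colors(colored))
```

You could run each line through this helper before matching it against the log pattern.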

## Usage

In general, gzip your logfile and copy it into this directory as `tr.log.gz`. Or, if you like, you can edit the line `logfile = gzip.open("tr.log.gz", "rt")` in the code cell immediately below to point at your own file.
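
If you'd rather do the compression from Python than from the shell, something like the following should work. It's a sketch using only the standard library; `compress_log` is a hypothetical helper name, and the source path is whatever your trunk-recorder log happens to be called.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(source: str, target: str = "tr.log.gz") -> Path:
    """Gzip-compress the file at source into target and return the target path."""
    # Stream the bytes across so large logfiles are not loaded into memory.
    with open(source, "rb") as plain, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(plain, packed)
    return Path(target)
```

Calling `compress_log("my-trunk-recorder.log")` (a placeholder filename) would write `tr.log.gz` into the current directory, ready for the cell below.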


In [None]:
import datetime
import gzip
import re

import pandas as pd

logfile = gzip.open("tr.log.gz", "rt")
calldict = {}
# Define the regex pattern for our log entries
# line = "[2024-05-09 12:31:45.009426] (info)   [pwcp25]	126C	TG:       1007 (            PWPD West 1)	Freq: 851.962500 MHz	Concluding Recorded Call - Last Update: 4s	Recorder last write:4.72949	Call Elapsed: 12"
log_pattern = r".*\[(\S+\s\S+)\]\s+\((\S+)\)\s+\[(\S+)\]\s+(\d+)\S+\s+\S+\s+(\d+).*Freq:\s+(\d+\.\d+).*MHz\s+(.*)"
# If you DO NOT have "talkgroupDisplayFormat": "id_tag" set, you can change the log_pattern to this below to grab the numeric
# talkgroup numbers:
# log_pattern = r".*\[(\S+\s\S+)\]\s+\((\S+)\)\s+\[(\S+)\]\s+(\d+)\S+\s+\S+\s+(\d+).*Freq:\s+(\d+\.\d+).*MHz\s+(.*)"
# And if you are using "talkgroupDisplayFormat": "tag_id", try this pattern:
# log_pattern = r".*\[(\S+\s\S+)\]\s+\((\S+)\)\s+\[(\S+)\]\s+(\d+).*(\d+).*Freq:\s+(\d+\.\d+).*MHz\s+(.*)"
for line in logfile:
    # line = "2024-05-09T12:23:26.007469771Z [2024-05-09 12:23:26.005761] (info)   [pwcp25]	16C	TG:       2003 (                PWFD 5B)	Freq: 851.725000 MHz	Concluding Recorded Call - Last Update: 4s	Recorder last write:4.79871	Call Elapsed: 13"
    if match := re.match(log_pattern, line):
        calldata = match[7]
        # Technically this isn't a real Unix timestamp as it's not timezone aware,
        # but we're just using it to create a unique index identifier.
        calldate = datetime.datetime.strptime(match[1], "%Y-%m-%d %H:%M:%S.%f")
        callts = calldate.timestamp()
        # Index for the dict is timestamp(ish)-talkgroup
        callindex = f"{int(callts)}{int(match[5].strip())}"

        # Second round of regexp.  Now we are going to harvest data from calldata - the "everything else"
        regexp_dict = {
            "excluded": r".*(Not recording talkgroup.).*",
            "encrypted": r".*(Not Recording: ENCRYPTED).*",
            "unknown_tg": r".*(TG not in Talkgroup File).*",
            "no_source": r".*(no source covering Freq).*",
            "standard": r".*Call Elapsed:\s+(\d+)",
        }
        for callclass, data_pattern in regexp_dict.items():
            if datamatch := re.match(data_pattern, calldata):
                calldict[callindex] = {"callclass": callclass}
                if callclass == "standard":
                    # Add the duration to the existing entry.  Assigning a brand new dict
                    # here would throw away the callclass we just set above.
                    calldict[callindex]["duration"] = int(datamatch[1])
                    # This log event happens at the end of a call, so we should adjust the calltime
                    # back by duration seconds to get to the start.
                    calldate = calldate + datetime.timedelta(seconds=-int(datamatch[1]))
                calldict[callindex].update(
                    {
                        "calldate": calldate,
                        "loglevel": str(match[2]),
                        "system": str(match[3]),
                        "callnumber": int(match[4]),
                        "talkgroup": int(match[5].strip()),
                        "frequency": float(match[6]),
                    }
                )
logfile.close()

calldf = pd.DataFrame.from_dict(calldict, orient="index")

# We're going to use ChanList.csv if we have it to convert decimal talkgroups to their
# Alpha Longform.  While this could be in the original log line, we do it here to take care
# of logs which might not have that enabled AND it allows us to see the number value of "unlisted" tg.
try:
    chanlist = pd.read_csv("ChanList.csv")
    calldf = pd.merge(
        left=calldf,
        right=chanlist,
        left_on="talkgroup",
        right_on="Decimal",
        how="left",
    )
    # Talkgroup was an int for matching; now it becomes a string
    calldf[["talkgroup"]] = calldf[["talkgroup"]].astype("str")
    # And now we merge in the Alpha Tag to talkgroups defined.  Undefined keep their
    # numeric value
    calldf.loc[calldf["Alpha Tag"].notna(), "talkgroup"] = calldf["Alpha Tag"]
except Exception:
    print("We couldn't open ChanList so talkgroups will remain numeric.")
# Finally, either way let's sort the columns in the dataframe and dump the extra columns
# from the ChanList merge
calldf = calldf.filter(
    [
        "calldate",
        "loglevel",
        "system",
        "callnumber",
        "callclass",
        "talkgroup",
        "frequency",
        "duration",
    ],
    axis=1,
)
calldf.sort_values(by="calldate", inplace=True)
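
To see what the parsing cell above pulls out of a single line, here is a small standalone sketch that applies the same `log_pattern` to the sample line from the comments and then runs the `standard` pattern on the remainder. The variable names `sample`, `end_time`, and `start_time` are mine, and it assumes the log uses the `id_tag` display format shown in that sample.

```python
import datetime
import re

log_pattern = r".*\[(\S+\s\S+)\]\s+\((\S+)\)\s+\[(\S+)\]\s+(\d+)\S+\s+\S+\s+(\d+).*Freq:\s+(\d+\.\d+).*MHz\s+(.*)"

# The sample line from the parsing cell, with its tabs written as \t.
sample = (
    "[2024-05-09 12:31:45.009426] (info)   [pwcp25]\t126C\tTG:       1007 "
    "(            PWPD West 1)\tFreq: 851.962500 MHz\tConcluding Recorded Call"
    " - Last Update: 4s\tRecorder last write:4.72949\tCall Elapsed: 12"
)

match = re.match(log_pattern, sample)

# Group 1 is the log timestamp, which marks the END of the call.
end_time = datetime.datetime.strptime(match[1], "%Y-%m-%d %H:%M:%S.%f")

# Group 7 is "everything else"; the standard pattern pulls the duration from it.
duration = int(re.match(r".*Call Elapsed:\s+(\d+)", match[7])[1])

# Step back by the duration to estimate when the call started.
start_time = end_time - datetime.timedelta(seconds=duration)

print(match[2], match[3], int(match[4]), int(match[5]), float(match[6]))
# info pwcp25 126 1007 851.9625
print(duration, start_time)
# 12 2024-05-09 12:31:33.009426
```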

The next cell shows just the first few rows of the pandas dataframe, so you can get a sense of what is there and check that it looks normal.


In [None]:
import numpy as np
import plotly.express as px

pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 5)
display(calldf.head().style.hide(axis="index"))

## Filtering the dataset

Before we work with the data, this is a good place to filter some of the dataset. I'm going to remove any calls with a frequency of 0 (as is common when trunk-recorder first starts).


In [None]:
# Filter out the 0 frequency listings common when trunk-recorder first starts up.
calldf = calldf[calldf["frequency"] != 0]

## Call Classes and high-level statistics


In [None]:
call_duration_count = calldf["duration"].notnull().sum()
average_call_duration = calldf["duration"].mean()
longest_call_duration = calldf["duration"].max()
average_call_duration = np.round(average_call_duration, 2)
display(f"The average call duration is: {average_call_duration} seconds")
display(f"And the longest call was: {longest_call_duration} seconds")

In [None]:
total_call_count = calldf.shape[0]

excludeddf = calldf[calldf["callclass"] == "excluded"]
excluded_call_count = excludeddf["callclass"].shape[0]

encrypteddf = calldf[calldf["callclass"] == "encrypted"]
encrypted_call_count = encrypteddf["callclass"].shape[0]

unknowndf = calldf[calldf["callclass"] == "unknown_tg"]
unknown_talkgroup_count = unknowndf["callclass"].shape[0]

nosourcedf = calldf[calldf["callclass"] == "no_source"]
no_source_count = nosourcedf["callclass"].shape[0]

standarddf = calldf[calldf["callclass"] == "standard"]
standard_count = standarddf["callclass"].shape[0]

# Graph time!
callcounts = [
    excluded_call_count,
    encrypted_call_count,
    unknown_talkgroup_count,
    no_source_count,
    standard_count,
]
callcategories = ("Excluded", "Encrypted", "Unknown Talkgroup", "No Source", "Recorded")

px.bar(
    x=callcategories,
    y=callcounts,
    color=callcategories,
    color_discrete_sequence=px.colors.qualitative.G10,
    labels={"x": "Class of Call", "y": "Count of Calls"},
    text_auto=True,
)

## Frequency Statistics


In [None]:
px.bar(
    calldf["frequency"].value_counts(),
    color=calldf["frequency"].value_counts().index.astype(str),
    color_discrete_sequence=px.colors.qualitative.G10,
    text_auto=True,
    labels={"frequency": "Frequency of Call", "value": "Count of Calls"},
    title="Frequencies Used",
)

In [None]:
nosourcedf = calldf[calldf["callclass"] == "no_source"]
freqs = nosourcedf["frequency"].unique()
for i in freqs:
    print(f"There were reports of SDRs not able to cover the {i} MHz frequency.")

## Talkgroup Stats


In [None]:
px.bar(
    calldf["talkgroup"].value_counts(),
    color=calldf["talkgroup"].value_counts().index.astype(str),
    color_discrete_sequence=px.colors.qualitative.G10,
    text_auto=True,
    labels={"talkgroup": "Talkgroup of Call", "value": "Count of Calls"},
    title="Talkgroups Used",
)

In [None]:
# Only the Calls we recorded
recordeddf = calldf[calldf["callclass"] == "standard"]

px.bar(
    recordeddf["talkgroup"].value_counts(),
    color=recordeddf["talkgroup"].value_counts().index.astype(str),
    color_discrete_sequence=px.colors.qualitative.G10,
    text_auto=True,
    labels={"talkgroup": "Talkgroup of Recorded Call", "value": "Count of Calls"},
    title="Recorded Talkgroups Used",
)

In [None]:
# Encrypted Calls Only
encrypteddf = calldf[calldf["callclass"] == "encrypted"]

px.bar(
    encrypteddf["talkgroup"].value_counts(),
    color=encrypteddf["talkgroup"].value_counts().index.astype(str),
    color_discrete_sequence=px.colors.qualitative.G10,
    text_auto=True,
    labels={"talkgroup": "Talkgroup of Encrypted Call", "value": "Count of Calls"},
    title="Encrypted Talkgroups Used",
)

In [None]:
# Unknown Talkgroup Calls Only
unknowndf = calldf[calldf["callclass"] == "unknown_tg"]

px.bar(
    unknowndf["talkgroup"].value_counts(),
    color=unknowndf["talkgroup"].value_counts().index.astype(str),
    color_discrete_sequence=px.colors.qualitative.G10,
    text_auto=True,
    labels={"talkgroup": "Unknown Talkgroup", "value": "Count of Calls"},
    title="Unknown Talkgroups",
)