## Determining 'baseline devices' with the Kismet packets table
A Kismet database has two main tables: devices, which holds a summary of each device and what information Kismet knows about it, and packets, which holds literally every packet and can be useful for determining a device's behavior, presence, activities, etc.

Here we'll see one way to pull just the 'baseline devices' out of the Kismet database. Not sure if this is the best way, but it's A WAY.

Basically, we'll divide the timeframe of the scan (in this case about two days) into 'buckets', or intervals, of 300 seconds (5 minutes) and then see which devices show up in at least 95 percent of those buckets.

You can, of course, redefine those parameters. I am simply selecting five minutes and 95 percent as decent numbers that work for me. In certain environments, especially when you have a lot of access points around, you may miss some less vocal devices, like TVs, game systems, etc. You may want to adjust the numbers a bit.
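
The bucketing idea above can be sketched in plain Python before bringing pandas in. This is only an illustration on made-up observations: the function name baseline_macs, its input shape, and the sample MAC addresses are hypothetical. Like the notebook below, it only counts buckets in which at least one packet was seen.

```python
from collections import defaultdict

def baseline_macs(observations, interval=300, threshold=0.95):
    """Return MACs seen in at least `threshold` of the observed buckets.

    observations: iterable of (ts_sec, mac) pairs (hypothetical input shape).
    """
    buckets_per_mac = defaultdict(set)
    all_buckets = set()
    for ts_sec, mac in observations:
        bucket = ts_sec - (ts_sec % interval)  # round down to the bucket start
        buckets_per_mac[mac].add(bucket)
        all_buckets.add(bucket)
    needed = threshold * len(all_buckets)
    return sorted(mac for mac, seen in buckets_per_mac.items() if len(seen) >= needed)

# made-up observations spanning 20 five-minute buckets
start = 3334 * 300
observations = []
for i in range(20):
    ts = start + i * 300
    observations.append((ts + 10, 'AA:AA:AA:AA:AA:01'))      # seen in every bucket
    if i != 7:
        observations.append((ts + 42, 'AA:AA:AA:AA:AA:02'))  # seen in 19 of 20 buckets
    if i % 2 == 0:
        observations.append((ts + 99, 'AA:AA:AA:AA:AA:03'))  # seen in 10 of 20 buckets

print(baseline_macs(observations))
```

With the default 95 percent threshold, the first two made-up devices qualify and the third does not. Raising threshold to 1.0 keeps only the device seen in every bucket.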


In [1]:
#necessary imports
import pandas as pd
import sqlite3

#set up connection to db
conn = sqlite3.connect('guate.kismet')

#define sql query to pull each mac and timestamp from packets table
query = 'SELECT ts_sec, sourcemac FROM packets;'

#read the data into a dataframe
kismet = pd.read_sql_query(query, conn)
conn.close()

#set a reasonable interval, could be seconds, minutes, whatever
#I'm going with 300s, since 5 minutes is a reasonable time
#to give every device a chance to be "present"
interval = '300s'
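
Since only one row per device per bucket matters in the end, the bucketing and de-duplication could also be pushed into the SQL query so that fewer rows reach pandas. This is a minimal sketch against an in-memory stand-in for the packets table, not what the notebook does. The sample rows are made up, and it assumes ts_sec is stored as integer seconds, as the use of unit='s' in this notebook suggests.

```python
import sqlite3

interval_s = 300  # bucket size in seconds

# in-memory stand-in for a .kismet file, with only the two columns used here
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE packets (ts_sec INTEGER, sourcemac TEXT)')
conn.executemany('INSERT INTO packets VALUES (?, ?)', [
    (1584967308, 'AA:AA:AA:AA:AA:01'),
    (1584967310, 'AA:AA:AA:AA:AA:01'),  # same device, same bucket
    (1584967700, 'AA:AA:AA:AA:AA:01'),  # same device, next bucket
    (1584967320, 'AA:AA:AA:AA:AA:02'),
])

# integer division rounds each timestamp down to the start of its bucket;
# DISTINCT keeps one row per (bucket, mac) pair
query = '''
    SELECT DISTINCT (ts_sec / ?) * ? AS bucket, sourcemac
    FROM packets
    ORDER BY bucket, sourcemac;
'''
rows = conn.execute(query, (interval_s, interval_s)).fetchall()
conn.close()

print(rows)
```

The four packets collapse to three (bucket, mac) rows. On a capture with millions of packets, this kind of reduction may make the dataframe considerably smaller.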

In [2]:
#get the start and stop time with .min() and .max() on the ts_sec column
min_ts = pd.to_datetime(kismet.ts_sec.min(), unit='s')
max_ts = pd.to_datetime(kismet.ts_sec.max(), unit='s')

#round the timestamps down to five-minute intervals via the .floor() method on the ts
#this will push each timestamp down to the nearest even multiple of that time interval
#i.e., if you started scanning at 16:57:01, the first interval will start at 16:55:00
#then the intervals will be 17:00:00, 17:05:00, etc.
kismet.ts_sec = pd.to_datetime(kismet.ts_sec, unit='s').dt.floor(interval)

#get total number of intervals
#(only intervals in which at least one packet was seen are counted)
total_intervals = kismet.ts_sec.nunique()

print(f'Start Time: {min_ts} Stop Time: {max_ts}')
print(f'Number of intervals: {total_intervals}')

#dropping all duplicates within each 'bucket'
#we only want to see which devices are present in each bucket
#and don't care in this case about each packet
kismet.drop_duplicates(inplace=True)

Start Time: 2020-03-23 12:41:48 Stop Time: 2020-03-25 12:21:18
Number of intervals: 573
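
One thing to keep in mind: counting the distinct values of ts_sec only counts buckets in which at least one packet was captured. As a cross-check, the number of buckets the scan should span can be worked out from the printed start and stop times. The sketch below uses pd.date_range for that, which is an addition here and not part of the notebook itself.

```python
import pandas as pd

interval = '300s'

# start and stop times as printed by the notebook
min_ts = pd.Timestamp('2020-03-23 12:41:48')
max_ts = pd.Timestamp('2020-03-25 12:21:18')

# every bucket start between the first and last floored timestamps, inclusive
expected = pd.date_range(min_ts.floor(interval), max_ts.floor(interval), freq=interval)

print(expected[0], expected[-1], len(expected))
```

The expected count comes out to 573, matching the number of intervals printed above, so no bucket in this capture was empty. If the two numbers differed, the gap would point to stretches with no packets at all.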


In [3]:
#group the dataframe by mac, count (size) how many intervals it appears in
#and sort in descending order, showing most-active (baseline) devices at the top
#this could be used to filter, identify, whatever...
k1 = kismet.groupby(['sourcemac']).size().to_frame('intervals').sort_values('intervals', ascending=False)

In [4]:
#k1 holds just the mac (index) and the number of intervals present
k1.head(10)

sourcemac,intervals
00:00:00:00:00:00,573
F0:9F:C2:F4:10:8D,573
F2:9F:C2:F7:F8:30,573
F2:9F:C2:F8:F4:B6,573
F2:9F:C2:F5:10:8D,573
F2:9F:C2:F8:F8:30,573
F0:9F:C2:FE:A4:95,573
F2:9F:C2:FD:A4:95,573
F2:9F:C2:FE:A4:95,573
F0:9F:C2:FD:A4:95,573


In [5]:
#I'm gonna say that anything appearing in at least 95% of the intervals is baseline
baseline_devices = k1[(k1.intervals >= 0.95 * total_intervals)]
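
To make the cutoff concrete, here is how the comparison works out with the 573 intervals found earlier (plain arithmetic, nothing Kismet-specific; the variable names are just for illustration):

```python
import math

total_intervals = 573
threshold = 0.95

cutoff = threshold * total_intervals       # about 544.35
min_intervals = math.ceil(cutoff)          # smallest whole count that passes >=
max_missed = total_intervals - min_intervals

print(min_intervals, max_missed)
```

So a device needs to appear in at least 545 of the 573 intervals, meaning it can be absent from up to 28 five-minute buckets (roughly 2 hours 20 minutes in total) and still count as baseline.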

In [6]:
#baseline_devices at this point is a new dataframe
baseline_devices

sourcemac,intervals
00:00:00:00:00:00,573
F0:9F:C2:F4:10:8D,573
F2:9F:C2:F7:F8:30,573
F2:9F:C2:F8:F4:B6,573
F2:9F:C2:F5:10:8D,573
F2:9F:C2:F8:F8:30,573
F0:9F:C2:FE:A4:95,573
F2:9F:C2:FD:A4:95,573
F2:9F:C2:FE:A4:95,573
F0:9F:C2:FD:A4:95,573


In [7]:
#just look at the number of baseline devices in baseline_devices
len(baseline_devices)

21

In [8]:
#just convert to a Python list
baseline_list = list(baseline_devices.index)

In [9]:
baseline_list

['00:00:00:00:00:00',
 'F0:9F:C2:F4:10:8D',
 'F2:9F:C2:F7:F8:30',
 'F2:9F:C2:F8:F4:B6',
 'F2:9F:C2:F5:10:8D',
 'F2:9F:C2:F8:F8:30',
 'F0:9F:C2:FE:A4:95',
 'F2:9F:C2:FD:A4:95',
 'F2:9F:C2:FE:A4:95',
 'F0:9F:C2:FD:A4:95',
 'F0:9F:C2:F8:F8:30',
 'F0:9F:C2:F8:F4:B6',
 'F0:9F:C2:F7:F8:30',
 'F0:9F:C2:F7:F4:B6',
 'F0:9F:C2:F5:10:8D',
 'F2:9F:C2:F7:F4:B6',
 'CC:2D:E0:3E:27:FB',
 'F2:9F:C2:F4:10:8D',
 'F0:9F:C2:FD:AA:B7',
 '60:38:E0:DA:F9:63',
 'F2:9F:C2:FD:AA:B7']
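
One possible use for baseline_list, in line with the "filter, identify, whatever" idea mentioned earlier, is to set the baseline devices aside and look at what remains. This is a minimal sketch on a made-up frame shaped like the de-duplicated kismet dataframe. The non-baseline MAC addresses and the timestamps are invented for illustration, and only a few entries of the real list are repeated to keep it short.

```python
import pandas as pd

# a few entries from the baseline list above
baseline_list = ['00:00:00:00:00:00', 'F0:9F:C2:F4:10:8D', 'CC:2D:E0:3E:27:FB']

# made-up stand-in for the de-duplicated (bucket, mac) dataframe
kismet = pd.DataFrame({
    'ts_sec': pd.to_datetime(['2020-03-23 12:40:00', '2020-03-23 12:40:00',
                              '2020-03-23 12:45:00', '2020-03-23 12:45:00']),
    'sourcemac': ['F0:9F:C2:F4:10:8D', 'AA:AA:AA:AA:AA:01',
                  'CC:2D:E0:3E:27:FB', 'AA:AA:AA:AA:AA:02'],
})

# keep only rows whose source MAC is NOT one of the baseline devices
transient = kismet[~kismet.sourcemac.isin(baseline_list)]

print(transient)
```

What is left are the devices that come and go, which is often the more interesting set once the always-present infrastructure has been filtered out.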