Workshop on IoT Networking - Data Analysis Track - Version 0.8

Network Traffic Analysis Using Python/Jupyter

Guilherme G. Martins - gmartins uchicago edu 2020

Other tracks:
- IoT Networking - Data Analysis - Using Python to Analyze IoT Network Traffic  
- IoT Networking - Data Collection - Using Single Board Computers to Collect and Monitor your Network Traffic. (TBD)
- IoT Survey on open source components and IoT building blocks. (TBD)

Prerequisites:
- macOS, Windows or Linux with a terminal
- Python 3, Jupyter Notebook and virtualenv + pip (https://jupyter.org/install)
- Wireshark, which provides the tshark command-line tool (https://www.wireshark.org/#download)
- *bash* and *wget* available from the terminal
- The IoT Lab's geoip API (available only inside the IoT Lab network, 192.168.XXX.0/24)

Motivation:

It is no secret that the proliferation of connected devices poses security and privacy challenges. Your home network used to be a safe place with a handful of well-known connected devices. Now it is hard even to keep track of how many devices are connected: temperature sensors, cameras, smart toys and refrigerators, to name just a few. Multiple technologies enable these devices to communicate and interact with each other; Bluetooth, Zigbee and Near Field Communication (NFC) are just a few examples. But to use the full set of features of an IoT device and its application, an internet connection is required in most cases, to send data to and receive data from the cloud or the IoT backend. Beyond hoping that IoT designers and operators are doing the right thing and keeping both the backend and the device software secure, there are a few concepts, tools and techniques we can use to expose how these devices operate.

Goals
- How to decode a network traffic capture file (pcap) into csv (comma-separated values);
- How to identify network packets from a specific device in your network;
- How to visualize the TCP/UDP endpoints of all established external connections;
- How to correlate activity on, and interaction with, the IoT devices with the volume of data sent and received.
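Visualizing external endpoints starts with deciding which addresses are external. A minimal sketch, assuming the IP addresses are available as plain strings (for example from a tshark csv export): Python's standard `ipaddress` module flags RFC 1918 ranges such as 192.168.0.0/16 as private, so any address that is neither private nor multicast is treated as external. The helper names `is_external` and `external_endpoints` and the endpoint values are hypothetical, and the sketch only classifies addresses; it does not check whether a TCP connection was actually established.

```python
import ipaddress

def is_external(addr):
    """Return True if addr is neither a private nor a multicast IP address."""
    ip = ipaddress.ip_address(addr)
    # is_private covers RFC 1918 ranges (10/8, 172.16/12, 192.168/16)
    # as well as loopback and link-local addresses.
    return not (ip.is_private or ip.is_multicast)

def external_endpoints(pairs):
    """Hypothetical helper: unique (ip, port) pairs whose IP is external."""
    return sorted({(ip, port) for ip, port in pairs if is_external(ip)})

# Made-up endpoints contacted by a device on the local 192.168.x.x network
seen = [("192.168.1.20", 554), ("8.8.8.8", 53), ("52.1.2.3", 443), ("8.8.8.8", 53)]
print(external_endpoints(seen))  # [('52.1.2.3', 443), ('8.8.8.8', 53)]
```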

References:

https://en.wikipedia.org/wiki/MAC_address

### 1. Testing Requirements

In [1]:
# We'll use bash to run wget and the tshark transformation scripts
!/bin/bash --version

GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
Copyright (C) 2007 Free Software Foundation, Inc.


In [2]:
# wget is used to download the datasets and files from within Jupyter
!wget --version

GNU Wget 1.20.3 built on darwin19.0.0.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls 
+ntlm +opie -psl +ssl/openssl 

Wgetrc: 
    /usr/local/etc/wgetrc (system)
Locale: 
    /usr/local/Cellar/wget/1.20.3_2/share/locale 
Compile: 
    clang -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc" 
    -DLOCALEDIR="/usr/local/Cellar/wget/1.20.3_2/share/locale" -I. 
    -I../lib -I../lib -I/usr/local/opt/openssl@1.1/include -DNDEBUG -g 
    -O2 
Link: 
    clang -DNDEBUG -g -O2 -lidn2 -L/usr/local/opt/openssl@1.1/lib -lssl 
    -lcrypto -ldl -lz ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a 
    -liconv -lintl -Wl,-framework -Wl,CoreFoundation -lunistring 

Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Nik

In [4]:
# Make sure the 'tshark' command is in your PATH environment variable and ready to be used.
# We'll use tshark to extract .csv data from the .pcap (packet capture) file so we can analyze it.
# On macOS, tshark lives inside the Wireshark app bundle, so that directory is appended to PATH.
path=%env PATH
%env PATH=$path:/Applications/Wireshark.app/Contents/MacOS/
!tshark --version

env: PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/gmartins/golang/go1.13.1/go/bin/:/Applications//Visual Studio Code.app/Contents/Resources/app/bin/:/Users/gmartins/.local/bin/:/Users/gmartins/Library/Python/3.7/bin/:/Applications/Wireshark.app/Contents/MacOS/:/Applications/Wireshark.app/Contents/MacOS/
TShark (Wireshark) 3.0.7 (v3.0.7-0-g9435717b91f5)

Copyright 1998-2019 Gerald Combs <gerald@wireshark.org> and contributors.
License GPLv2+: GNU GPL version 2 or later <http://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with libpcap, without POSIX capabilities, with GLib 2.37.6,
with zlib 1.2.11, with SMI 0.4.8, with c-ares 1.15.0, with Lua 5.2.4, with
GnuTLS 3.4.17, with Gcrypt 1.7.7, with MIT Kerberos, with MaxMind DB resolver,
with nghttp2 1.39.2, with LZ4, with Snappy, with libxml2 2.9.9.

Running on 
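The `!command --version` cells above check each tool through the notebook's shell. A minimal pure-Python alternative, assuming only the standard library, is `shutil.which`, which searches the current `PATH` (including the Wireshark directory appended by `%env` in the same kernel) and returns the executable's full path or `None`. The tool list comes from the prerequisites; `missing_tools` is a hypothetical helper name.

```python
import shutil

# Command-line tools this workshop relies on (from the prerequisites)
REQUIRED_TOOLS = ["bash", "wget", "tshark"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [t for t in tools if shutil.which(t) is None]

missing = missing_tools()
if missing:
    print("WARN: not found on PATH:", ", ".join(missing))
else:
    print("INFO: all required tools found")
```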

### 2. Downloading the Data

In [5]:
# Analyzing IoT devices requires linking information from multiple sources.
# The first step is to look at the MAC addresses and translate the first 3 octets
# into the manufacturer ID. (Keep in mind that MAC addresses can be cloned, or simply
# set to any arbitrary address by malicious code running with root privileges
# in the IoT firmware.)
# MAC address resolution can be done with the fields eth.dst_resolved and
# eth.src_resolved while extracting csv from pcap (or by enabling MAC address
# resolution in the Wireshark GUI), but here we learn how to link this information
# from a reliable source without relying on an external application.
# https://en.wikipedia.org/wiki/MAC_address
# https://en.wikipedia.org/wiki/Organizationally_unique_identifier
ouiurl="http://standards-oui.ieee.org/oui/oui.txt" 
#ouiurl="https://linuxnet.ca/ieee/oui.txt" #sanitized version of oui dataset
!if [ ! -f 'oui.txt' ]; then wget $ouiurl; else echo "INFO: file present"; fi

--2020-01-13 15:04:39--  http://standards-oui.ieee.org/oui/oui.txt
Resolving standards-oui.ieee.org (standards-oui.ieee.org)... 140.98.223.27
Connecting to standards-oui.ieee.org (standards-oui.ieee.org)|140.98.223.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4340119 (4.1M) [text/plain]
Saving to: ‘oui.txt’


2020-01-13 15:04:42 (1.48 MB/s) - ‘oui.txt’ saved [4340119/4340119]
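Before looking prefixes up in oui.txt, MAC addresses have to be brought into one form, and it helps to know when a lookup is meaningless. A minimal sketch with hypothetical helper names: it normalizes addresses written with `:`, `-` or `.` separators, extracts the first 3 octets in the lowercase `xx:xx:xx` form, and inspects the first octet. Per IEEE 802, bit 0x02 marks a locally administered address (not assigned from the OUI registry, as with MAC randomization or a manually set address) and bit 0x01 marks a multicast address; neither resolves to a real manufacturer.

```python
import re

def normalize_mac(mac):
    """Return a MAC address as 12 lowercase hex digits, or raise ValueError."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()  # drop separators
    if len(digits) != 12:
        raise ValueError("not a 48-bit MAC address: " + mac)
    return digits

def mac_oui(mac):
    """First 3 octets in the 'xx:xx:xx' form used for the OUI lookup."""
    d = normalize_mac(mac)
    return ":".join(d[i:i + 2] for i in range(0, 6, 2))

def is_locally_administered(mac):
    """True if the U/L bit (0x02 of the first octet) is set."""
    return bool(int(normalize_mac(mac)[:2], 16) & 0x02)

def is_multicast(mac):
    """True if the I/G bit (0x01 of the first octet) is set."""
    return bool(int(normalize_mac(mac)[:2], 16) & 0x01)

print(mac_oui("64-BC-58-12-34-56"))                  # 64:bc:58
print(is_locally_administered("64:bc:58:12:34:56"))  # False
print(is_locally_administered("da:a1:19:00:00:01"))  # True
```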



In [6]:
# Download the pcap dataset for a single device
pcapurl="http://192.168.143.1/camera1.pcap" # 3 days of packet capture
#pcapurl="http://192.168.143.1/camera2.pcap"
!if [ ! -f 'camera1.pcap' ]; then wget $pcapurl; else echo "INFO: file present"; fi

--2020-01-13 15:04:50--  http://192.168.143.1/camera1.pcap
Connecting to 192.168.143.1:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-01-13 15:04:50 ERROR 404: Not Found.
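Once the pcap is available, decoding it into csv can be done with tshark's field export. A sketch, assuming tshark is on `PATH` and using the hypothetical helpers `tshark_csv_command` and `pcap_to_csv`: `-r` reads the capture, `-T fields` with repeated `-e` options selects the fields, and `-E header=y`, `-E separator=,` and `-E quote=d` produce a quoted csv with a header row. The field list is an illustrative choice that includes the `eth.src_resolved`/`eth.dst_resolved` fields mentioned earlier; adjust it to your analysis.

```python
import subprocess

# Illustrative selection; any Wireshark display-filter field name can be used.
FIELDS = [
    "frame.time_epoch", "frame.len",
    "eth.src", "eth.dst", "eth.src_resolved", "eth.dst_resolved",
    "ip.src", "ip.dst",
    "tcp.srcport", "tcp.dstport", "udp.srcport", "udp.dstport",
]

def tshark_csv_command(pcap, fields=FIELDS):
    """Build the tshark argument list that exports `fields` from `pcap` as csv."""
    cmd = ["tshark", "-r", pcap, "-T", "fields",
           "-E", "header=y", "-E", "separator=,", "-E", "quote=d"]
    for field in fields:
        cmd += ["-e", field]
    return cmd

def pcap_to_csv(pcap, csv_path, fields=FIELDS):
    """Run tshark and write its csv output to csv_path (requires tshark)."""
    with open(csv_path, "w") as out:
        subprocess.run(tshark_csv_command(pcap, fields), stdout=out, check=True)

# Usage, once camera1.pcap has been downloaded:
# pcap_to_csv("camera1.pcap", "camera1.csv")
# packets = pd.read_csv("camera1.csv")
```

Passing the arguments as a list (no shell) avoids quoting problems with values such as `separator=,`.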



In [7]:
!tail -n 20 oui.txt

				Farum    DK-3520
				DK

64-BC-58   (hex)		Intel Corporate
64BC58     (base 16)		Intel Corporate
				Lot 8, Jalan Hi-Tech 2/3
				Kulim  Kedah  09000
				MY

28-E3-4E   (hex)		HUAWEI TECHNOLOGIES CO.,LTD
28E34E     (base 16)		HUAWEI TECHNOLOGIES CO.,LTD
				No.2 Xin Cheng Road, Room R6,Songshan Lake Technology Park
				Dongguan    523808
				CN

94-E9-EE   (hex)		Huawei Device Co., Ltd.
94E9EE     (base 16)		Huawei Device Co., Ltd.
				No.2 of Xincheng Road, Songshan Lake Zone
				Dongguan  Guangdong  523808
				CN


In [11]:
#import sys
#sys.path.append("/Users/gmartins/Library/Python/3.7/lib/python/site-packages")
import re
import pandas as pd

In [12]:
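def generate_oui_dataframe(path='oui.txt'):
    """Parse the IEEE OUI registry (oui.txt) into a DataFrame with columns
    'macoui' (e.g. '44:4a:db') and 'macman' (e.g. 'Apple, Inc.')."""
    # Only the "(hex)" lines match, e.g. "64-BC-58   (hex)\t\tIntel Corporate":
    # group 1 is the MAC prefix, group 2 the manufacturer after the two tabs.
    p = re.compile(r"^(..-..-..).*\t\t(.*)")
    macoui = []  # Organizationally Unique Identifier (OUI), e.g. 44:4a:db
    macman = []  # manufacturer, e.g. "Apple, Inc."
    # oui.txt contains non-ASCII characters; read it as UTF-8 so parsing
    # does not depend on the platform's default encoding
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            r = p.match(line)
            if r is None:
                continue  # "(base 16)" lines, address lines, blank lines
            macoui.append(r.group(1).replace("-", ":").lower())
            macman.append(r.group(2).strip())
    return pd.DataFrame({'macoui': macoui, 'macman': macman})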

df=generate_oui_dataframe()
df

       macoui    macman
0      00:22:72  American Micro-Fuel Device Corp.
1      00:d0:ef  IGT
2      08:61:95  Rockwell Automation
3      f4:bd:9e  Cisco Systems, Inc
4      58:85:e9  Realme Chongqing MobileTelecommunications Corp...
...    ...       ...
27432  40:2e:71  Texas Instruments
27433  70:76:dd  OxyGuard Internation A/S
27434  64:bc:58  Intel Corporate
27435  28:e3:4e  HUAWEI TECHNOLOGIES CO.,LTD
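With the OUI table in hand, identifying packets from a specific device can start by attaching a manufacturer to every source MAC address. A minimal sketch, assuming a packets DataFrame with an `eth.src` column like the one a tshark csv export would produce; the packet rows are made up, and `add_manufacturer` is a hypothetical helper. A left merge keeps packets whose prefix is not in the registry (for example locally administered addresses) with `NaN` as the manufacturer, and the lookup is de-duplicated in case a prefix appears more than once, so the merge cannot multiply packet rows.

```python
import pandas as pd

def add_manufacturer(packets, oui_df, mac_col="eth.src"):
    """Return a copy of packets with the OUI prefix and manufacturer of mac_col."""
    out = packets.copy()
    # First 3 octets, lowercase and colon-separated, to match oui_df['macoui']
    out["macoui"] = (out[mac_col].str.lower()
                     .str.replace("-", ":", regex=False).str[:8])
    # One manufacturer per prefix so the merge keeps one row per packet
    lookup = oui_df.drop_duplicates(subset="macoui")
    return out.merge(lookup, on="macoui", how="left")

# Small stand-in for the notebook's df, using two entries shown in oui.txt above
oui_df = pd.DataFrame({"macoui": ["64:bc:58", "28:e3:4e"],
                       "macman": ["Intel Corporate", "HUAWEI TECHNOLOGIES CO.,LTD"]})
# Made-up packets: one Intel prefix, one locally administered address
packets = pd.DataFrame({"eth.src": ["64:bc:58:aa:bb:cc", "da:a1:19:00:00:01"],
                        "frame.len": [60, 1514]})
print(add_manufacturer(packets, oui_df))
```

In the notebook, `df` from `generate_oui_dataframe()` would take the place of the small `oui_df`; grouping the result by `eth.src` or `macman` then separates the traffic of each device.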
