# NFStream: a Flexible Network Data Analysis Framework

In [1]:
import nfstream
print(nfstream.__version__)

6.5.3


[**NFStream**][repo] is a multiplatform Python framework providing fast, flexible, and expressive data structures designed to make 
working with **online** or **offline** network data easy and intuitive. It aims to be Python's fundamental high-level 
building block for doing practical, **real-world** network flow data analysis. Additionally, it has the broader 
goal of becoming **a unifying network data analytics framework for researchers** providing data reproducibility 
across experiments.

* **Performance:** NFStream is designed to be fast: **AF_PACKET_V3/FANOUT** on Linux, multiprocessing, a native
[**CFFI-based**][cffi] computation engine, and full [**PyPy**][pypy] support.
* **Encrypted layer-7 visibility:** NFStream's deep packet inspection is based on [**nDPI**][ndpi]. 
It allows NFStream to perform [**reliable**][reliable] identification of encrypted applications and metadata 
fingerprinting (e.g. TLS, SSH, DHCP, HTTP).
* **System visibility:** NFStream probes the monitored system's kernel to obtain information on open Internet sockets 
and collects guaranteed ground-truth (process name, PID, etc.) at the application level.
* **Statistical features extraction:** NFStream provides state-of-the-art flow-based statistical feature extraction. 
It includes post-mortem statistical features (e.g., minimum, mean, standard deviation, and maximum of packet size and 
inter-arrival time) and early flow features (e.g., the sizes, inter-arrival times, and directions of the first n packets).
* **Flexibility:** NFStream is easily extensible using [**NFPlugins**][nfplugin]. It allows the creation of a new flow 
feature within a few lines of Python.
* **Machine Learning oriented:** NFStream aims to make machine learning approaches for network traffic management 
reproducible and deployable. By using NFStream as a common framework, researchers ensure that models are trained with 
the same feature computation logic, which makes a fair comparison possible. Moreover, trained models can be deployed 
and evaluated on live networks using [**NFPlugins**][nfplugin].


In this notebook, we demonstrate a subset of features provided by [**NFStream**][repo].

[ndpi]: https://github.com/ntop/nDPI
[nfplugin]: https://nfstream.github.io/docs/api#nfplugin
[reliable]: http://people.ac.upc.edu/pbarlet/papers/ground-truth.pam2014.pdf
[repo]: https://nfstream.org/
[pypy]: https://www.pypy.org/
[cffi]: https://cffi.readthedocs.io/en/latest/index.html

In [2]:
from nfstream import NFStreamer, NFPlugin
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

## Flow aggregation made simple

In the following, we use the main object provided by nfstream, `NFStreamer`, which has the following parameters:

* `source` [default=None]: Packet capture source: a pcap file path or a network interface name.
* `decode_tunnels` [default=True]: Enable/Disable GTP/TZSP tunnel decoding.
* `bpf_filter` [default=None]: Specify a [BPF filter][bpf] to select the traffic to capture.
* `promiscuous_mode` [default=True]: Enable/Disable promiscuous capture mode.
* `snapshot_length` [default=1500]: Control the packet slicing (truncation) size in bytes.
* `idle_timeout` [default=120]: Flows that are idle (no packets received) for more than this value in seconds are expired.
* `active_timeout` [default=1800]: Flows that are active for more than this value in seconds are expired.
* `accounting_mode` [default=0]: Specify the accounting mode used to report byte-related features (0: link layer, 1: IP layer, 2: transport layer, 3: payload).
* `udps` [default=None]: Specify user-defined NFPlugins used to extend NFStreamer.
* `n_dissections` [default=20]: Number of packets per flow to dissect for the L7 visibility feature. When set to 0, L7 visibility is disabled.
* `statistical_analysis` [default=False]: Enable/Disable post-mortem flow statistical analysis.
* `splt_analysis` [default=0]: Specify the number of first packets per flow whose direction, size, and inter-arrival time are recorded for early statistical analysis. When set to 0, splt_analysis is disabled.
* `max_nflows` [default=0]: Specify the maximum number of flows to capture before returning. Unset when equal to 0.
* `n_meters` [default=0]: Specify the number of parallel metering processes. When set to 0, NFStreamer automatically scales metering according to the available physical cores on the running host.
* `performance_report` [default=0]: [**Performance report**](https://github.com/nfstream/nfstream/blob/master/assets/PERFORMANCE_REPORT.md) interval in seconds. Disabled when set to 0. Ignored for offline capture.
* `system_visibility_mode` [default=0]: Enable system process mapping by probing the host machine.
* `system_visibility_poll_ms` [default=100]: Set the polling interval in milliseconds for the system process mapping feature (0 is the maximum achievable rate).
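The two timeouts interact as follows: a flow is expired once it has been silent for longer than `idle_timeout`, or once it has been active for longer than `active_timeout`, even if packets are still arriving. The sketch below expresses that rule as a plain function. It is our illustration of the documented semantics, not NFStream's metering code; the order of the two checks and the use of `>` rather than `>=` are assumptions.

```python
def expiration_reason(first_seen_ms, last_seen_ms, now_ms,
                      idle_timeout=120, active_timeout=1800):
    """Return "idle", "active", or None if the flow should stay in the cache.

    Timeouts are in seconds (as in NFStreamer); timestamps are in milliseconds,
    like the *_seen_ms flow fields. Checking idle first is an assumption.
    """
    if now_ms - last_seen_ms > idle_timeout * 1000:
        return "idle"
    if now_ms - first_seen_ms > active_timeout * 1000:
        return "active"
    return None


# A flow whose last packet arrived 121 s ago is expired by the idle timeout.
print(expiration_reason(0, 1_000, 122_000))
```

Lowering `idle_timeout` splits bursty conversations into more, shorter flows, while lowering `active_timeout` caps how long any single flow record can span.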

`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to a pandas DataFrame using the `to_pandas()` method.

[bpf]: https://biot.com/capstats/bpf.html

In [3]:
df = NFStreamer(source="pcap/instagram.pcap").to_pandas()

In [4]:
df.head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type
0,0,0,192.168.0.103,40:f3:08:c3:8e:e1,40:f3:08,33936,31.13.93.52,00:1b:2f:f0:7e:b4,00:1b:2f,443,6,4,0,0,1436720898386,1436720908442,10056,68,45688,1436720898386,1436720908442,10056,34,5555,1436720898475,1436720908442,9967,34,40133,TLS,Web,0,6,,,,,
1,1,0,192.168.0.106,00:16:44:1f:59:66,00:16:44,17500,255.255.255.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,17500,17,4,0,0,1436720906017,1436720906024,7,4,580,1436720906017,1436720906024,7,4,580,0,0,0,0,0,Dropbox,Cloud,0,6,,,,,
2,2,0,192.168.0.106,00:16:44:1f:59:66,00:16:44,17500,192.168.0.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,17500,17,4,0,0,1436720906022,1436720906022,0,1,145,1436720906022,1436720906022,0,1,145,0,0,0,0,0,Dropbox,Cloud,0,6,,,,,
3,3,0,192.168.0.1,00:1b:2f:f0:7e:b4,00:1b:2f,520,192.168.0.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,520,17,4,0,0,1436720906025,1436720906025,0,1,66,1436720906025,1436720906025,0,1,66,0,0,0,0,0,Unknown,Unspecified,0,0,,,,,
4,4,0,192.168.0.103,00:00:00:00:00:00,00:00:00,0,192.168.0.103,00:00:00:00:00:00,00:00:00,0,1,4,0,0,1436720908464,1436720911139,2675,5,510,1436720908464,1436720911139,2675,5,510,0,0,0,0,0,ICMP,Network,0,6,,,,,


In [5]:
df.shape

(38, 38)

We can enable post-mortem statistical flow feature extraction as follows:

In [6]:
df = NFStreamer(source="pcap/instagram.pcap", statistical_analysis=True).to_pandas()

In [7]:
df.head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,bidirectional_min_ps,bidirectional_mean_ps,bidirectional_stddev_ps,bidirectional_max_ps,src2dst_min_ps,src2dst_mean_ps,src2dst_stddev_ps,src2dst_max_ps,dst2src_min_ps,dst2src_mean_ps,dst2src_stddev_ps,dst2src_max_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stddev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stddev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stddev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type
0,0,0,192.168.0.103,40:f3:08:c3:8e:e1,40:f3:08,33936,31.13.93.52,00:1b:2f:f0:7e:b4,00:1b:2f,443,6,4,0,0,1436720898386,1436720908442,10056,68,45688,1436720898386,1436720908442,10056,34,5555,1436720898475,1436720908442,9967,34,40133,66,671.882353,661.76184,1464,66,163.382353,322.650107,1431,66,1180.382353,502.204535,1464,0,150.089552,951.791862,7669,0,304.727273,1349.724098,7669,0,302.030303,1358.385703,7709,0,0,0,0,68,10,0,0,0,0,0,0,34,3,0,0,0,0,0,0,34,7,0,0,TLS,Web,0,6,,,,,
1,1,0,192.168.0.106,00:16:44:1f:59:66,00:16:44,17500,255.255.255.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,17500,17,4,0,0,1436720906017,1436720906024,7,4,580,1436720906017,1436720906024,7,4,580,0,0,0,0,0,145,145.0,0.0,145,145,145.0,0.0,145,0,0.0,0.0,0,1,2.333333,1.527525,4,1,2.333333,1.527525,4,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dropbox,Cloud,0,6,,,,,
2,2,0,192.168.0.106,00:16:44:1f:59:66,00:16:44,17500,192.168.0.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,17500,17,4,0,0,1436720906022,1436720906022,0,1,145,1436720906022,1436720906022,0,1,145,0,0,0,0,0,145,145.0,0.0,145,145,145.0,0.0,145,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dropbox,Cloud,0,6,,,,,
3,3,0,192.168.0.1,00:1b:2f:f0:7e:b4,00:1b:2f,520,192.168.0.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,520,17,4,0,0,1436720906025,1436720906025,0,1,66,1436720906025,1436720906025,0,1,66,0,0,0,0,0,66,66.0,0.0,66,66,66.0,0.0,66,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Unknown,Unspecified,0,0,,,,,
4,4,0,192.168.0.103,00:00:00:00:00:00,00:00:00,0,192.168.0.103,00:00:00:00:00:00,00:00:00,0,1,4,0,0,1436720908464,1436720911139,2675,5,510,1436720908464,1436720911139,2675,5,510,0,0,0,0,0,102,102.0,0.0,102,102,102.0,0.0,102,0,0.0,0.0,0,0,668.75,1173.672122,2420,0,668.75,1173.672122,2420,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,ICMP,Network,0,6,,,,,
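These post-mortem values can be checked by hand. Flow 1 (the Dropbox broadcast) has four 145-byte packets, and its inter-arrival gaps are 2, 1 and 4 ms (the `splt_piat_ms` column for the same flow in the early-feature output below lists them after the leading 0 of the first packet). Recomputing min, mean, standard deviation and max from those gaps reproduces the reported `bidirectional_*_piat_ms` values. The reported 1.527525 matches the sample (n-1) standard deviation; the population version would be about 1.247. We infer this from these numbers only, so treat it as an observation rather than a documented definition.

```python
import statistics


def summarize(values):
    """Return (min, mean, stddev, max) using the sample (n-1) standard deviation.

    statistics.stdev needs at least two values, so a single value yields 0.0,
    like the 0.0 shown for one-packet flows above.
    """
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return min(values), statistics.mean(values), stddev, max(values)


# Flow 1 (Dropbox broadcast): four 145-byte packets, gaps of 2, 1 and 4 ms.
ps = summarize([145, 145, 145, 145])
piat = summarize([2, 1, 4])
print(ps)
print(tuple(round(v, 6) for v in piat))

# Flow 0 (TLS): the mean packet size is simply bidirectional_bytes / bidirectional_packets.
tls_mean_ps = 45688 / 68
print(round(tls_mean_ps, 6))
```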


We can enable early statistical flow feature extraction as follows:

In [8]:
df = NFStreamer(source="pcap/instagram.pcap", splt_analysis=10).to_pandas()

In [9]:
df.head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,splt_direction,splt_ps,splt_piat_ms,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type
0,0,0,192.168.0.103,40:f3:08:c3:8e:e1,40:f3:08,33936,31.13.93.52,00:1b:2f:f0:7e:b4,00:1b:2f,443,6,4,0,0,1436720898386,1436720908442,10056,68,45688,1436720898386,1436720908442,10056,34,5555,1436720898475,1436720908442,9967,34,40133,"[0, 1, 1, 0, 0, 1, 1, 0, 1, 0]","[1431, 66, 679, 66, 1063, 66, 1464, 66, 209, 66]","[0, 89, 76, 0, 1523, 50, 340, 0, 2, 0]",TLS,Web,0,6,,,,,
1,1,0,192.168.0.106,00:16:44:1f:59:66,00:16:44,17500,255.255.255.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,17500,17,4,0,0,1436720906017,1436720906024,7,4,580,1436720906017,1436720906024,7,4,580,0,0,0,0,0,"[0, 0, 0, 0, -1, -1, -1, -1, -1, -1]","[145, 145, 145, 145, -1, -1, -1, -1, -1, -1]","[0, 2, 1, 4, -1, -1, -1, -1, -1, -1]",Dropbox,Cloud,0,6,,,,,
2,2,0,192.168.0.106,00:16:44:1f:59:66,00:16:44,17500,192.168.0.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,17500,17,4,0,0,1436720906022,1436720906022,0,1,145,1436720906022,1436720906022,0,1,145,0,0,0,0,0,"[0, -1, -1, -1, -1, -1, -1, -1, -1, -1]","[145, -1, -1, -1, -1, -1, -1, -1, -1, -1]","[0, -1, -1, -1, -1, -1, -1, -1, -1, -1]",Dropbox,Cloud,0,6,,,,,
3,3,0,192.168.0.1,00:1b:2f:f0:7e:b4,00:1b:2f,520,192.168.0.255,ff:ff:ff:ff:ff:ff,ff:ff:ff,520,17,4,0,0,1436720906025,1436720906025,0,1,66,1436720906025,1436720906025,0,1,66,0,0,0,0,0,"[0, -1, -1, -1, -1, -1, -1, -1, -1, -1]","[66, -1, -1, -1, -1, -1, -1, -1, -1, -1]","[0, -1, -1, -1, -1, -1, -1, -1, -1, -1]",Unknown,Unspecified,0,0,,,,,
4,4,0,192.168.0.103,40:f3:08:c3:8e:e1,40:f3:08,38816,46.33.70.160,00:1b:2f:f0:7e:b4,00:1b:2f,80,6,4,0,0,1436720900684,1436720900750,66,52,58994,1436720900684,1436720900750,66,13,1118,1436720900716,1436720900744,28,39,57876,"[0, 1, 0, 1, 1, 1, 1, 1, 1, 1]","[326, 1484, 66, 1484, 1484, 1484, 1484, 1484, ...","[0, 32, 1, 0, 1, 2, 2, 0, 0, 0]",HTTP.Instagram,SocialNetwork,0,6,photos-h.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,
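Each `splt_*` cell holds one value per early packet and is padded with -1 when the flow has fewer packets than `splt_analysis` (compare rows 1 to 3). The cells are displayed here as bracketed strings; if they also come back as strings in your DataFrame, `ast.literal_eval` turns them back into lists. The sketch below does that, drops the padding, and splits packet sizes by direction. For row 0, direction 0 peaks at 1431 bytes and direction 1 at 1464 bytes, consistent with `src2dst_max_ps` and `dst2src_max_ps` for the same flow in the statistical output above, which suggests that 0 means src2dst and 1 means dst2src. That mapping is our inference from the data.

```python
import ast


def parse_splt(value):
    """Return an SPLT cell as a list of ints with the -1 padding removed.

    Accepts either a real list or its string form, e.g. "[0, 1, -1]".
    """
    seq = ast.literal_eval(value) if isinstance(value, str) else list(value)
    return [v for v in seq if v != -1]


def sizes_by_direction(directions, sizes):
    """Split early packet sizes into (src2dst, dst2src) lists.

    Assumes direction 0 = src2dst and 1 = dst2src (inferred, see above).
    """
    src2dst = [s for d, s in zip(directions, sizes) if d == 0]
    dst2src = [s for d, s in zip(directions, sizes) if d == 1]
    return src2dst, dst2src


# Row 0 (the TLS flow) from the output above.
directions = parse_splt("[0, 1, 1, 0, 0, 1, 1, 0, 1, 0]")
sizes = parse_splt("[1431, 66, 679, 66, 1063, 66, 1464, 66, 209, 66]")
up, down = sizes_by_direction(directions, sizes)
print(max(up), max(down))
```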


We can anonymize selected columns (here, IP and MAC addresses) by passing them to `to_pandas()` through `columns_to_anonymize`:

In [10]:
df = NFStreamer(source="pcap/instagram.pcap", 
                statistical_analysis=True).to_pandas(columns_to_anonymize=["src_ip", "src_mac", "dst_ip", "dst_mac"])

In [11]:
df.head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,bidirectional_min_ps,bidirectional_mean_ps,bidirectional_stddev_ps,bidirectional_max_ps,src2dst_min_ps,src2dst_mean_ps,src2dst_stddev_ps,src2dst_max_ps,dst2src_min_ps,dst2src_mean_ps,dst2src_stddev_ps,dst2src_max_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stddev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stddev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stddev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type
0,0,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,57936,3a44c94fd7c9aefa07df278f016460d75aa809c94a571c...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,6,4,0,0,1436720900687,1436720901200,513,58,50220,1436720900687,1436720901200,513,24,1837,1436720900744,1436720901200,456,34,48383,66,865.862069,696.739485,1484,66,76.541667,51.643409,319,186,1423.029412,252.360311,1484,0,9.0,45.124035,321,0,22.304348,70.131976,322,0,13.818182,58.73109,323,0,0,0,0,58,4,0,0,0,0,0,0,24,1,0,0,0,0,0,0,34,3,0,0,HTTP.Instagram,SocialNetwork,0,6,photos-g.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,
1,1,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,38816,c26866c915aaa410921c4fc309477eb0ceba2caec77bcf...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,6,4,0,0,1436720900684,1436720900750,66,52,58994,1436720900684,1436720900750,66,13,1118,1436720900716,1436720900744,28,39,57876,66,1134.5,612.257779,1484,66,86.0,72.111026,326,1484,1484.0,0.0,1484,0,1.294118,4.49575,32,0,5.5,9.17011,33,0,0.736842,0.68514,2,0,0,0,0,52,1,0,0,0,0,0,0,13,1,0,0,0,0,0,0,39,0,0,0,HTTP.Instagram,SocialNetwork,0,6,photos-h.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,
2,2,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,37350,68df6e56301d6c238c302eaa732d2075ef802914771e7b...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,6,4,0,0,1436720901262,1436720901262,0,1,324,1436720901262,1436720901262,0,1,324,0,0,0,0,0,324,324.0,0.0,324,324,324.0,0.0,324,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,HTTP.Instagram,SocialNetwork,0,6,photos-a.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,
3,3,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,33603,107d5f2f69c5e2f3da41be7ce2a59c0d818947212d6ef0...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,53,17,4,0,0,1436720908524,1436720908575,51,2,298,1436720908524,1436720908524,0,1,89,1436720908575,1436720908575,0,1,209,89,149.0,84.852814,209,89,89.0,0.0,89,209,209.0,0.0,209,51,51.0,0.0,51,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,DNS.Instagram,Network,0,6,igcdn-photos-a-a.akamaihd.net,,,,
4,4,0,d7feda5309e4f8477aac71903e83486f9f13566cc836f8...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,40855,6,4,0,0,1436720952611,1436720952611,0,2,140,1436720952611,1436720952611,0,1,74,1436720952611,1436720952611,0,1,66,66,70.0,5.656854,74,74,74.0,0.0,74,66,66.0,0.0,66,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,1,0,0,0,2,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,HTTP,Web,1,1,,,,,
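In the output above, anonymized addresses become long hex digests, and the same address always gets the same digest. For example, the host with OUI `40:f3:08` appears as `5885370f...` in every row, whether it is the source or the destination, so flows remain linkable. Note that `src_oui` and `dst_oui` were not listed in `columns_to_anonymize` and still show the vendor prefix in clear. We have not checked which hash function NFStream uses internally. The sketch below only illustrates the idea with keyed BLAKE2b from Python's standard library; `make_anonymizer` is a hypothetical helper. The key matters because an unkeyed hash of an IPv4 address could be reversed by hashing all 2^32 candidates.

```python
import hashlib
import secrets


def make_anonymizer(key=None):
    """Return a function mapping a value to a keyed BLAKE2b hex digest.

    With one key per run, equal inputs give equal digests, so flows stay linkable.
    """
    if key is None:
        key = secrets.token_bytes(32)  # random secret; keep it private or discard it

    def anonymize(value):
        data = str(value).encode("utf-8")
        return hashlib.blake2b(data, key=key, digest_size=32).hexdigest()

    return anonymize


anonymize = make_anonymizer()
digest = anonymize("192.168.0.103")
print(digest[:16], len(digest))
```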


Now that we have our DataFrame, we can analyze it like any other tabular data. For example, we can compute additional features:

* Compute the byte ratio in each direction (src2dst and dst2src)

In [12]:
df["src2dst_bytes_data_ratio"] = df['src2dst_bytes'] / df['bidirectional_bytes']
df["dst2src_bytes_data_ratio"] = df['dst2src_bytes'] / df['bidirectional_bytes']

In [13]:
df.head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,bidirectional_min_ps,bidirectional_mean_ps,bidirectional_stddev_ps,bidirectional_max_ps,src2dst_min_ps,src2dst_mean_ps,src2dst_stddev_ps,src2dst_max_ps,dst2src_min_ps,dst2src_mean_ps,dst2src_stddev_ps,dst2src_max_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stddev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stddev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stddev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type,src2dst_bytes_data_ratio,dst2src_bytes_data_ratio
0,0,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,57936,3a44c94fd7c9aefa07df278f016460d75aa809c94a571c...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,6,4,0,0,1436720900687,1436720901200,513,58,50220,1436720900687,1436720901200,513,24,1837,1436720900744,1436720901200,456,34,48383,66,865.862069,696.739485,1484,66,76.541667,51.643409,319,186,1423.029412,252.360311,1484,0,9.0,45.124035,321,0,22.304348,70.131976,322,0,13.818182,58.73109,323,0,0,0,0,58,4,0,0,0,0,0,0,24,1,0,0,0,0,0,0,34,3,0,0,HTTP.Instagram,SocialNetwork,0,6,photos-g.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,,0.036579,0.963421
1,1,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,38816,c26866c915aaa410921c4fc309477eb0ceba2caec77bcf...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,6,4,0,0,1436720900684,1436720900750,66,52,58994,1436720900684,1436720900750,66,13,1118,1436720900716,1436720900744,28,39,57876,66,1134.5,612.257779,1484,66,86.0,72.111026,326,1484,1484.0,0.0,1484,0,1.294118,4.49575,32,0,5.5,9.17011,33,0,0.736842,0.68514,2,0,0,0,0,52,1,0,0,0,0,0,0,13,1,0,0,0,0,0,0,39,0,0,0,HTTP.Instagram,SocialNetwork,0,6,photos-h.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,,0.018951,0.981049
2,2,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,37350,68df6e56301d6c238c302eaa732d2075ef802914771e7b...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,6,4,0,0,1436720901262,1436720901262,0,1,324,1436720901262,1436720901262,0,1,324,0,0,0,0,0,324,324.0,0.0,324,324,324.0,0.0,324,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,HTTP.Instagram,SocialNetwork,0,6,photos-a.ak.instagram.com,,,Instagram 7.1.1 Android (19/4.4.2; 480dpi; 108...,,1.0,0.0
3,3,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,33603,107d5f2f69c5e2f3da41be7ce2a59c0d818947212d6ef0...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,53,17,4,0,0,1436720908524,1436720908575,51,2,298,1436720908524,1436720908524,0,1,89,1436720908575,1436720908575,0,1,209,89,149.0,84.852814,209,89,89.0,0.0,89,209,209.0,0.0,209,51,51.0,0.0,51,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,DNS.Instagram,Network,0,6,igcdn-photos-a-a.akamaihd.net,,,,,0.298658,0.701342
4,4,0,d7feda5309e4f8477aac71903e83486f9f13566cc836f8...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,80,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,40855,6,4,0,0,1436720952611,1436720952611,0,2,140,1436720952611,1436720952611,0,1,74,1436720952611,1436720952611,0,1,66,66,70.0,5.656854,74,74,74.0,0.0,74,66,66.0,0.0,66,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,1,0,0,0,2,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,HTTP,Web,1,1,,,,,,0.528571,0.471429


* Filter data according to some criteria:

In [14]:
df[df["dst_port"] == 443].head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,bidirectional_min_ps,bidirectional_mean_ps,bidirectional_stddev_ps,bidirectional_max_ps,src2dst_min_ps,src2dst_mean_ps,src2dst_stddev_ps,src2dst_max_ps,dst2src_min_ps,dst2src_mean_ps,dst2src_stddev_ps,dst2src_max_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stddev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stddev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stddev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type,src2dst_bytes_data_ratio,dst2src_bytes_data_ratio
5,5,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,33763,6c662e3d71901ed2227ad7c8bd2e074e240ca921855da1...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,443,6,4,0,0,1436720908466,1436720910950,2484,11,5397,1436720908466,1436720908723,257,5,1279,1436720908518,1436720910950,2432,6,4118,66,490.636364,588.17264,1464,66,255.8,424.405702,1015,66,686.333333,668.351006,1464,0,248.4,698.093945,2227,0,64.25,126.502635,254,0,486.4,976.910078,2227,0,0,0,0,11,3,0,0,0,0,0,0,5,1,0,0,0,0,0,0,6,2,0,0,TLS,Web,0,6,,,,,,0.236984,0.763016
8,8,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,41181,c9f555cb103cf7cb79a76242fe4b121682293ebb993677...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,443,6,4,0,0,1436720908576,1436720908733,157,14,5567,1436720908576,1436720908733,157,8,896,1436720908615,1436720908662,47,6,4671,66,397.642857,566.204041,1484,66,112.0,86.328277,292,66,778.5,720.057706,1484,0,12.076923,22.746654,71,0,22.428571,28.56488,71,0,9.4,17.728508,41,2,0,0,0,13,4,0,0,1,0,0,0,7,2,0,0,1,0,0,0,6,2,0,0,TLS.Instagram,SocialNetwork,0,6,igcdn-photos-a-a.akamaihd.net,54ae5fcb0159e2ddf6a50e149221c7c7,34d6f0ad0a79e4cfdf145e640cc93f78,,,0.160948,0.839052
9,9,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,58690,7e269e3e2e4c87e46547c961422353c0034227df8cb6e5...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,443,6,4,0,0,1436720952561,1436720952561,0,2,169,1436720952561,1436720952561,0,2,169,0,0,0,0,0,66,84.5,26.162951,103,66,84.5,26.162951,103,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0,0,0,0,0,2,1,0,1,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,0,TLS,Web,0,6,,,,,,1.0,0.0
11,11,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,56382,8296dd65fdefef3bde6b1205337bd3270f4960d0747ae6...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,443,6,4,0,0,1436720898354,1436720899158,804,17,2647,1436720898354,1436720899158,804,9,1583,1436720898499,1436720899122,623,8,1064,66,155.705882,128.137994,530,66,175.888889,164.147376,530,66,133.0,74.989523,231,0,50.25,72.009722,181,0,100.5,83.5139,183,0,89.0,84.261498,183,2,0,0,0,16,8,0,0,1,0,0,0,8,4,0,0,1,0,0,0,8,4,0,0,TLS.Instagram,SocialNetwork,0,6,telegraph-ash.instagram.com,54ae5fcb0159e2ddf6a50e149221c7c7,acb741bcdffb787c5a52654c78645bdf,,,0.598036,0.401964
15,15,0,5885370fbc1de250a4570351f2679e915e15245a5534bd...,b5d836f0b4088481bd22d1bcdbf78c8bb4ed6c5b5a3175...,40:f3:08,41182,c9f555cb103cf7cb79a76242fe4b121682293ebb993677...,7f6b3b13330898c4dcf505e44b642c996b9674139831e0...,00:1b:2f,443,6,4,0,0,1436720908577,1436720908737,160,14,5567,1436720908577,1436720908737,160,8,896,1436720908616,1436720908665,49,6,4671,66,397.642857,566.204041,1484,66,112.0,86.328277,292,66,778.5,720.057706,1484,0,12.307692,23.346608,71,1,22.857143,28.439577,71,0,9.8,20.801442,47,2,0,0,0,13,4,0,0,1,0,0,0,7,2,0,0,1,0,0,0,6,2,0,0,TLS.Instagram,SocialNetwork,0,6,igcdn-photos-a-a.akamaihd.net,54ae5fcb0159e2ddf6a50e149221c7c7,34d6f0ad0a79e4cfdf145e640cc93f78,,,0.160948,0.839052
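Conditions can also be combined. The sketch below rebuilds a tiny frame from a few of the rows shown above (only the columns it filters on, values copied from the outputs) and combines conditions with `&`, which requires parentheses around each comparison, and with `isin`. The Instagram test relies on the `master.app` naming visible in `application_name` (e.g. `TLS.Instagram`).

```python
import pandas as pd

# A few rows copied from the outputs above (ids 0, 1, 3, 5, 8, 9, 11).
df = pd.DataFrame({
    "id": [0, 1, 3, 5, 8, 9, 11],
    "dst_port": [80, 80, 53, 443, 443, 443, 443],
    "protocol": [6, 6, 17, 6, 6, 6, 6],
    "application_name": ["HTTP.Instagram", "HTTP.Instagram", "DNS.Instagram",
                         "TLS", "TLS.Instagram", "TLS", "TLS.Instagram"],
    "bidirectional_bytes": [50220, 58994, 298, 5397, 5567, 169, 2647],
})

# Port 443 flows that nDPI attributed to Instagram.
https_instagram = df[(df["dst_port"] == 443)
                     & df["application_name"].str.endswith(".Instagram")]

# Flows to common web ports, and flows larger than 5000 bytes.
web = df[df["dst_port"].isin([80, 443])]
large = df[df["bidirectional_bytes"] > 5000]

print(https_instagram["id"].tolist(), web["id"].tolist(), large["id"].tolist())
```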


## Extend nfstream

In some use cases, we need to add features that are computed at the packet level. nfstream handles such scenarios with [**NFPlugin**][nfplugin].

[nfplugin]: https://nfstream.github.io/docs/api#nfplugin

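* Let's suppose that we want a per-flow counter of packets (in both directions) whose IP size is exactly 40 bytes (for IPv4 TCP, typically a 20-byte IP header plus a 20-byte TCP header with no options or payload, such as a bare ACK).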

In [15]:
class Packet40Count(NFPlugin):
    def on_init(self, pkt, flow):  # called once, with the first packet of a new flow
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size = 1
        else:
            flow.udps.packet_with_40_ip_size = 0

    def on_update(self, pkt, flow):  # called for each subsequent packet of the flow
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size += 1

In [16]:
df = NFStreamer(source="pcap/google_ssl.pcap", udps=[Packet40Count()]).to_pandas()

In [17]:
df.head()

Unnamed: 0,id,expiration_id,src_ip,src_mac,src_oui,src_port,dst_ip,dst_mac,dst_oui,dst_port,protocol,ip_version,vlan_id,tunnel_id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,bidirectional_duration_ms,bidirectional_packets,bidirectional_bytes,src2dst_first_seen_ms,src2dst_last_seen_ms,src2dst_duration_ms,src2dst_packets,src2dst_bytes,dst2src_first_seen_ms,dst2src_last_seen_ms,dst2src_duration_ms,dst2src_packets,dst2src_bytes,application_name,application_category_name,application_is_guessed,application_confidence,requested_server_name,client_fingerprint,server_fingerprint,user_agent,content_type,udps.packet_with_40_ip_size
0,0,0,172.31.3.224,80:c6:ca:00:9e:9f,80:c6:ca,42835,216.58.212.100,00:0e:8e:4d:b4:a8,00:0e:8e,443,6,4,0,0,1434443394683,1434443401353,6670,28,9108,1434443394683,1434443401353,6670,16,1512,1434443394717,1434443401308,6591,12,7596,TLS,Web,1,1,,,,,,14


Our DataFrame now has a new column named `udps.packet_with_40_ip_size`.
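To see how the two callbacks of the `Packet40Count` plugin cooperate without a capture, the sketch below drives the same counting logic by hand, with `SimpleNamespace` stand-ins for NFStream's packet and flow objects. `Packet40CountSketch` and `run_plugin` are hypothetical names, not part of NFStream. The driver assumes, as the plugin's own logic implies, that `on_init` sees only the first packet and `on_update` each later one; otherwise the first packet would be counted twice.

```python
from types import SimpleNamespace


class Packet40CountSketch:
    """Same counting logic as Packet40Count, without the NFPlugin base class."""

    def on_init(self, pkt, flow):
        # First packet of the flow: start the counter at 1 or 0.
        flow.udps.packet_with_40_ip_size = 1 if pkt.ip_size == 40 else 0

    def on_update(self, pkt, flow):
        # Each later packet: increment only on an exact 40-byte IP size.
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size += 1


def run_plugin(plugin, ip_sizes):
    """Hypothetical driver: feed the IP sizes of one flow's packets to the plugin."""
    flow = SimpleNamespace(udps=SimpleNamespace())
    packets = [SimpleNamespace(ip_size=size) for size in ip_sizes]
    plugin.on_init(packets[0], flow)
    for pkt in packets[1:]:
        plugin.on_update(pkt, flow)
    return flow.udps.packet_with_40_ip_size


print(run_plugin(Packet40CountSketch(), [60, 40, 52, 40, 1500]))
```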