# nfstream: a flexible network data analysis framework

[**nfstream**][repo] is a Python package providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real-world** network data analysis in Python. Additionally, it has the broader goal of becoming **a common network data processing framework for researchers**, providing data reproducibility across experiments.

* **Performance:** **nfstream** is designed to be fast (x10 faster with pypy3 support) with a small CPU and memory footprint.
* **Layer-7 visibility:** the **nfstream** deep packet inspection engine is based on [**nDPI**][ndpi]. It allows nfstream to perform [**reliable**][reliable] encrypted application identification and metadata extraction (e.g. TLS, QUIC, TOR, HTTP, SSH, DNS).
* **Flexibility:** add a flow feature in 2 lines as an [**NFPlugin**][nfplugin].
* **Machine Learning oriented:** add your trained model as an [**NFPlugin**][nfplugin].

In this notebook, we demonstrate a subset of features provided by [**nfstream**][repo].

[documentation]: https://nfstream.github.io/
[ndpi]: https://github.com/ntop/nDPI
[nfplugin]: https://nfstream.github.io/docs/api#nfplugin
[reliable]: http://people.ac.upc.edu/pbarlet/papers/ground-truth.pam2014.pdf
[repo]: https://nfstream.github.io/

In [1]:
from nfstream import NFStreamer, NFPlugin
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

## Flow aggregation made simple

In the following, we use the main object provided by nfstream, `NFStreamer`, which has the following parameters:

* `source` [default=`None`]: Source of packets. Possible values: `live_interface_name` or `pcap_file_path`.
* `snaplen` [default=`65535`]: Packet capture length.
* `idle_timeout` [default=`30`]: Flows that are inactive for more than this value (in seconds) will be exported.
* `active_timeout` [default=`300`]: Flows that are active for more than this value (in seconds) will be exported.
* `plugins` [default=`()`]: Set of user-defined NFPlugins.
* `dissect` [default=`True`]: Enable the nDPI deep packet inspection library for Layer 7 visibility.
* `max_tcp_dissections` [default=`10`]: Maximum number of TCP packets to dissect per flow (ignored when `dissect=False`).
* `max_udp_dissections` [default=`16`]: Maximum number of UDP packets to dissect per flow (ignored when `dissect=False`).
* `statistics` [default=`False`]: Enable statistical flow feature extraction.
* `account_ip_padding_size` [default=`False`]: Enable Ethernet padding accounting when reporting IP sizes.
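
To make the two timeouts concrete, here is a minimal pure-Python sketch of the decision they describe. It is a simplified model written for illustration, not nfstream's implementation: the function name `expiration_reason` and its return values are hypothetical, and the order in which the two checks are applied is an assumption.

```python
def expiration_reason(first_seen, last_seen, now, idle_timeout=30, active_timeout=300):
    """Return why a flow would be exported at time `now`, or None to keep it.

    All arguments are in seconds. Simplified model, not nfstream's code.
    """
    if now - last_seen > idle_timeout:
        return "idle"      # no packet seen for longer than idle_timeout
    if now - first_seen > active_timeout:
        return "active"    # flow has been open for longer than active_timeout
    return None            # flow stays in the table


print(expiration_reason(first_seen=0, last_seen=10, now=45))    # idle
print(expiration_reason(first_seen=0, last_seen=290, now=301))  # active
print(expiration_reason(first_seen=0, last_seen=20, now=25))    # None
```

A long-lived flow that keeps receiving packets is therefore still exported once it exceeds `active_timeout`, while a quiet flow is exported as soon as it exceeds `idle_timeout`.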

`NFStreamer` returns a flow iterator. We can iterate over the flows or convert them directly to a pandas DataFrame using the `to_pandas()` method.

In [2]:
df = NFStreamer(source="pcaps/instagram.pcap").to_pandas()

In [3]:
df.head()

Unnamed: 0,id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,src2dst_first_seen_ms,src2dst_last_seen_ms,dst2src_first_seen_ms,dst2src_last_seen_ms,version,src_port,dst_port,protocol,vlan_id,src_ip,dst_ip,bidirectional_packets,bidirectional_raw_bytes,bidirectional_ip_bytes,bidirectional_duration_ms,src2dst_packets,src2dst_raw_bytes,src2dst_ip_bytes,src2dst_duration_ms,dst2src_packets,dst2src_raw_bytes,dst2src_ip_bytes,dst2src_duration_ms,expiration_id,master_protocol,app_protocol,application_name,category_name,client_info,server_info,j3a_client,j3a_server
0,27,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,80,58216,6,4,31.13.86.52,192.168.0.103,150,153558,151458,1704.681885,103,150456,149014,1704.681885,47,3102,2444,1700.713867,0,7,119,HTTP.Facebook,SocialNetwork,,,,
1,19,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,26540,53,17,4,192.168.0.103,8.8.8.8,2,298,270,46.539062,1,89,75,46.539062,1,209,195,0.0,0,5,211,DNS.Instagram,SocialNetwork,,igcdn-photos-g-a.akamaihd.net,,
2,6,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,33976,80,6,4,192.168.0.103,77.67.29.17,34,29039,28563,7361.755127,14,924,728,7361.755127,20,28115,27835,7360.779053,0,0,7,HTTP,Web,,,,
3,29,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,0.0,0.0,4,58690,443,6,4,192.168.0.103,46.33.70.159,2,169,141,0.336182,2,169,141,0.336182,0,0,0,-1.0,0,0,91,TLS,Web,,,,
4,25,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,58052,80,6,4,192.168.0.103,82.85.26.162,75,57239,56189,90.485107,37,2702,2184,90.485107,38,54537,54005,29.175049,0,7,211,HTTP.Instagram,SocialNetwork,,photos-g.ak.instagram.com,,


We can enable statistical flow feature extraction as follows:

In [4]:
df = NFStreamer(source="pcaps/instagram.pcap", statistics=True).to_pandas()

In [5]:
df.head()

Unnamed: 0,id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,src2dst_first_seen_ms,src2dst_last_seen_ms,dst2src_first_seen_ms,dst2src_last_seen_ms,version,src_port,dst_port,protocol,vlan_id,src_ip,dst_ip,bidirectional_packets,bidirectional_raw_bytes,bidirectional_ip_bytes,bidirectional_duration_ms,src2dst_packets,src2dst_raw_bytes,src2dst_ip_bytes,src2dst_duration_ms,dst2src_packets,dst2src_raw_bytes,dst2src_ip_bytes,dst2src_duration_ms,expiration_id,bidirectional_min_raw_ps,bidirectional_mean_raw_ps,bidirectional_stdev_raw_ps,bidirectional_max_raw_ps,src2dst_min_raw_ps,src2dst_mean_raw_ps,src2dst_stdev_raw_ps,src2dst_max_raw_ps,dst2src_min_raw_ps,dst2src_mean_raw_ps,dst2src_stdev_raw_ps,dst2src_max_raw_ps,bidirectional_min_ip_ps,bidirectional_mean_ip_ps,bidirectional_stdev_ip_ps,bidirectional_max_ip_ps,src2dst_min_ip_ps,src2dst_mean_ip_ps,src2dst_stdev_ip_ps,src2dst_max_ip_ps,dst2src_min_ip_ps,dst2src_mean_ip_ps,dst2src_stdev_ip_ps,dst2src_max_ip_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stdev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stdev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stdev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,master_protocol,app_protocol,application_name,category_name,client_info,server_info,j3a_client,j3a_server
0,27,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,80,58216,6,4,31.13.86.52,192.168.0.103,150,153558,151458,1704.681885,103,150456,149014,1704.681885,47,3102,2444,1700.713867,0,66,1023.72,649.692298,1464,1128,1460.737864,33.107064,1464,66,66.0,0.0,66,52,1009.72,649.692298,1450,1114,1446.737864,33.107064,1450,52,52.0,0.0,52,0.0,11.440818,105.965686,1246.765137,0.030029,16.712567,132.534657,1246.856934,0.1521,36.972041,189.939848,1247.467041,0,0,0,0,150,12,0,0,0,0,0,0,103,12,0,0,0,0,0,0,47,0,0,0,7,119,HTTP.Facebook,SocialNetwork,,,,
1,19,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,26540,53,17,4,192.168.0.103,8.8.8.8,2,298,270,46.539062,1,89,75,46.539062,1,209,195,0.0,0,89,149.0,84.852814,209,89,89.0,0.0,89,209,209.0,0.0,209,75,135.0,84.852814,195,75,75.0,0.0,75,195,195.0,0.0,195,46.539062,46.539062,0.0,46.539062,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,211,DNS.Instagram,SocialNetwork,,igcdn-photos-g-a.akamaihd.net,,
2,6,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,33976,80,6,4,192.168.0.103,77.67.29.17,34,29039,28563,7361.755127,14,924,728,7361.755127,20,28115,27835,7360.779053,0,66,854.088235,711.201475,1484,66,66.0,0.0,66,66,1405.75,317.048207,1484,52,840.088235,711.201475,1470,52,52.0,0.0,52,52,1391.75,317.048207,1470,0.029785,223.083489,1274.296656,7321.503174,0.092041,566.288856,2113.051097,7321.503174,0.031006,387.409424,1684.562322,7343.780029,0,0,0,0,34,1,0,2,0,0,0,0,14,0,0,1,0,0,0,0,20,1,0,1,0,7,HTTP,Web,,,,
3,29,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,0.0,0.0,4,58690,443,6,4,192.168.0.103,46.33.70.159,2,169,141,0.336182,2,169,141,0.336182,0,0,0,-1.0,0,66,84.5,26.162951,103,66,84.5,26.162951,103,-1,-1.0,-1.0,-1,52,70.5,26.162951,89,52,70.5,26.162951,89,-1,-1.0,-1.0,-1,0.336182,0.336182,0.0,0.336182,0.336182,0.336182,-1.0,0.336182,-1.0,-1.0,-1.0,-1.0,0,0,0,0,2,1,0,1,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,0,0,91,TLS,Web,,,,
4,25,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,58052,80,6,4,192.168.0.103,82.85.26.162,75,57239,56189,90.485107,37,2702,2184,90.485107,38,54537,54005,29.175049,0,66,763.186667,702.481388,1484,66,73.027027,42.743737,326,396,1435.184211,212.312602,1484,52,749.186667,702.481388,1470,52,59.027027,42.743737,312,382,1421.184211,212.312602,1470,0.0,1.222772,7.099327,61.310059,0.031006,2.484639,10.258239,62.164062,0.030029,0.788515,0.774075,2.411133,0,0,0,0,75,4,0,0,0,0,0,0,37,1,0,0,0,0,0,0,38,3,0,0,7,211,HTTP.Instagram,SocialNetwork,,photos-g.ak.instagram.com,,
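
As a sanity check on what the `*_raw_ps` (packet size) columns mean, the sketch below recomputes them for the two-packet DNS flow in row 1 above, using only the standard library. The result agrees with the reported values (min 89, mean 149.0, stdev 84.852814, max 209), which suggests the `stdev` columns are sample standard deviations. How nfstream computes them internally is not shown here, so treat that as an inference from the numbers.

```python
import statistics

# Raw packet sizes of the DNS flow (row 1 above): one packet in each direction,
# src2dst_raw_bytes = 89 and dst2src_raw_bytes = 209.
sizes = [89, 209]

summary = {
    "min": min(sizes),
    "mean": statistics.mean(sizes),
    "stdev": round(statistics.stdev(sizes), 6),  # sample standard deviation (n - 1)
    "max": max(sizes),
}
print(summary)
```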


Now that we have our DataFrame, we can analyze it like any other tabular data. For example, we can compute additional features:

* Compute the data ratio in both directions (src2dst and dst2src):

In [6]:
df["src2dst_raw_bytes_data_ratio"] = df['src2dst_raw_bytes'] / df['bidirectional_raw_bytes']
df["dst2src_raw_bytes_data_ratio"] = df['dst2src_raw_bytes'] / df['bidirectional_raw_bytes']

In [7]:
df.head()

Unnamed: 0,id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,src2dst_first_seen_ms,src2dst_last_seen_ms,dst2src_first_seen_ms,dst2src_last_seen_ms,version,src_port,dst_port,protocol,vlan_id,src_ip,dst_ip,bidirectional_packets,bidirectional_raw_bytes,bidirectional_ip_bytes,bidirectional_duration_ms,src2dst_packets,src2dst_raw_bytes,src2dst_ip_bytes,src2dst_duration_ms,dst2src_packets,dst2src_raw_bytes,dst2src_ip_bytes,dst2src_duration_ms,expiration_id,bidirectional_min_raw_ps,bidirectional_mean_raw_ps,bidirectional_stdev_raw_ps,bidirectional_max_raw_ps,src2dst_min_raw_ps,src2dst_mean_raw_ps,src2dst_stdev_raw_ps,src2dst_max_raw_ps,dst2src_min_raw_ps,dst2src_mean_raw_ps,dst2src_stdev_raw_ps,dst2src_max_raw_ps,bidirectional_min_ip_ps,bidirectional_mean_ip_ps,bidirectional_stdev_ip_ps,bidirectional_max_ip_ps,src2dst_min_ip_ps,src2dst_mean_ip_ps,src2dst_stdev_ip_ps,src2dst_max_ip_ps,dst2src_min_ip_ps,dst2src_mean_ip_ps,dst2src_stdev_ip_ps,dst2src_max_ip_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stdev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stdev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stdev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,master_protocol,app_protocol,application_name,category_name,client_info,server_info,j3a_client,j3a_server,src2dst_raw_bytes_data_ratio,dst2src_raw_bytes_data_ratio
0,27,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,80,58216,6,4,31.13.86.52,192.168.0.103,150,153558,151458,1704.681885,103,150456,149014,1704.681885,47,3102,2444,1700.713867,0,66,1023.72,649.692298,1464,1128,1460.737864,33.107064,1464,66,66.0,0.0,66,52,1009.72,649.692298,1450,1114,1446.737864,33.107064,1450,52,52.0,0.0,52,0.0,11.440818,105.965686,1246.765137,0.030029,16.712567,132.534657,1246.856934,0.1521,36.972041,189.939848,1247.467041,0,0,0,0,150,12,0,0,0,0,0,0,103,12,0,0,0,0,0,0,47,0,0,0,7,119,HTTP.Facebook,SocialNetwork,,,,,0.979799,0.020201
1,19,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,26540,53,17,4,192.168.0.103,8.8.8.8,2,298,270,46.539062,1,89,75,46.539062,1,209,195,0.0,0,89,149.0,84.852814,209,89,89.0,0.0,89,209,209.0,0.0,209,75,135.0,84.852814,195,75,75.0,0.0,75,195,195.0,0.0,195,46.539062,46.539062,0.0,46.539062,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,211,DNS.Instagram,SocialNetwork,,igcdn-photos-g-a.akamaihd.net,,,0.298658,0.701342
2,6,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,33976,80,6,4,192.168.0.103,77.67.29.17,34,29039,28563,7361.755127,14,924,728,7361.755127,20,28115,27835,7360.779053,0,66,854.088235,711.201475,1484,66,66.0,0.0,66,66,1405.75,317.048207,1484,52,840.088235,711.201475,1470,52,52.0,0.0,52,52,1391.75,317.048207,1470,0.029785,223.083489,1274.296656,7321.503174,0.092041,566.288856,2113.051097,7321.503174,0.031006,387.409424,1684.562322,7343.780029,0,0,0,0,34,1,0,2,0,0,0,0,14,0,0,1,0,0,0,0,20,1,0,1,0,7,HTTP,Web,,,,,0.031819,0.968181
3,29,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,0.0,0.0,4,58690,443,6,4,192.168.0.103,46.33.70.159,2,169,141,0.336182,2,169,141,0.336182,0,0,0,-1.0,0,66,84.5,26.162951,103,66,84.5,26.162951,103,-1,-1.0,-1.0,-1,52,70.5,26.162951,89,52,70.5,26.162951,89,-1,-1.0,-1.0,-1,0.336182,0.336182,0.0,0.336182,0.336182,0.336182,-1.0,0.336182,-1.0,-1.0,-1.0,-1.0,0,0,0,0,2,1,0,1,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,0,0,91,TLS,Web,,,,,1.0,0.0
4,25,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,58052,80,6,4,192.168.0.103,82.85.26.162,75,57239,56189,90.485107,37,2702,2184,90.485107,38,54537,54005,29.175049,0,66,763.186667,702.481388,1484,66,73.027027,42.743737,326,396,1435.184211,212.312602,1484,52,749.186667,702.481388,1470,52,59.027027,42.743737,312,382,1421.184211,212.312602,1470,0.0,1.222772,7.099327,61.310059,0.031006,2.484639,10.258239,62.164062,0.030029,0.788515,0.774075,2.411133,0,0,0,0,75,4,0,0,0,0,0,0,37,1,0,0,0,0,0,0,38,3,0,0,7,211,HTTP.Instagram,SocialNetwork,,photos-g.ak.instagram.com,,,0.047206,0.952794


* Filter data according to some criteria:

In [8]:
df[df["dst_port"] == 443].head()

Unnamed: 0,id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,src2dst_first_seen_ms,src2dst_last_seen_ms,dst2src_first_seen_ms,dst2src_last_seen_ms,version,src_port,dst_port,protocol,vlan_id,src_ip,dst_ip,bidirectional_packets,bidirectional_raw_bytes,bidirectional_ip_bytes,bidirectional_duration_ms,src2dst_packets,src2dst_raw_bytes,src2dst_ip_bytes,src2dst_duration_ms,dst2src_packets,dst2src_raw_bytes,dst2src_ip_bytes,dst2src_duration_ms,expiration_id,bidirectional_min_raw_ps,bidirectional_mean_raw_ps,bidirectional_stdev_raw_ps,bidirectional_max_raw_ps,src2dst_min_raw_ps,src2dst_mean_raw_ps,src2dst_stdev_raw_ps,src2dst_max_raw_ps,dst2src_min_raw_ps,dst2src_mean_raw_ps,dst2src_stdev_raw_ps,dst2src_max_raw_ps,bidirectional_min_ip_ps,bidirectional_mean_ip_ps,bidirectional_stdev_ip_ps,bidirectional_max_ip_ps,src2dst_min_ip_ps,src2dst_mean_ip_ps,src2dst_stdev_ip_ps,src2dst_max_ip_ps,dst2src_min_ip_ps,dst2src_mean_ip_ps,dst2src_stdev_ip_ps,dst2src_max_ip_ps,bidirectional_min_piat_ms,bidirectional_mean_piat_ms,bidirectional_stdev_piat_ms,bidirectional_max_piat_ms,src2dst_min_piat_ms,src2dst_mean_piat_ms,src2dst_stdev_piat_ms,src2dst_max_piat_ms,dst2src_min_piat_ms,dst2src_mean_piat_ms,dst2src_stdev_piat_ms,dst2src_max_piat_ms,bidirectional_syn_packets,bidirectional_cwr_packets,bidirectional_ece_packets,bidirectional_urg_packets,bidirectional_ack_packets,bidirectional_psh_packets,bidirectional_rst_packets,bidirectional_fin_packets,src2dst_syn_packets,src2dst_cwr_packets,src2dst_ece_packets,src2dst_urg_packets,src2dst_ack_packets,src2dst_psh_packets,src2dst_rst_packets,src2dst_fin_packets,dst2src_syn_packets,dst2src_cwr_packets,dst2src_ece_packets,dst2src_urg_packets,dst2src_ack_packets,dst2src_psh_packets,dst2src_rst_packets,dst2src_fin_packets,master_protocol,app_protocol,application_name,category_name,client_info,server_info,j3a_client,j3a_server,src2dst_raw_bytes_data_ratio,dst2src_raw_bytes_data_ratio
3,29,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,0.0,0.0,4,58690,443,6,4,192.168.0.103,46.33.70.159,2,169,141,0.336182,2,169,141,0.336182,0,0,0,-1.0,0,66,84.5,26.162951,103,66,84.5,26.162951,103,-1,-1.0,-1.0,-1,52,70.5,26.162951,89,52,70.5,26.162951,89,-1,-1.0,-1.0,-1,0.336182,0.336182,0.0,0.336182,0.336182,0.336182,-1.0,0.336182,-1.0,-1.0,-1.0,-1.0,0,0,0,0,2,1,0,1,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,0,0,91,TLS,Web,,,,,1.0,0.0
7,23,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,60908,443,6,4,192.168.0.103,46.33.70.136,19,9340,9074,188.201904,10,1369,1229,188.201904,9,7971,7845,166.321045,0,66,491.578947,599.508624,1484,66,136.9,120.091123,375,66,885.666667,678.590082,1484,52,477.578947,599.508624,1470,52,122.9,120.091123,361,52,871.666667,678.590082,1470,0.061035,10.455661,16.187532,56.304932,0.671143,17.177979,19.019548,56.304932,0.701904,20.790131,31.604038,88.165039,2,0,0,0,18,6,0,0,1,0,0,0,9,3,0,0,1,0,0,0,9,3,0,0,91,211,TLS.Instagram,SocialNetwork,igcdn-photos-g-a.akamaihd.net,"a248.e.akamai.net,*.akamaihd.net,*.akamaihd-st...",54ae5fcb0159e2ddf6a50e149221c7c7,34d6f0ad0a79e4cfdf145e640cc93f78,0.146574,0.853426
11,32,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,4,49355,443,6,4,192.168.2.17,31.13.86.52,1366,1310382,1291258,14291.343994,456,33086,26702,14291.343994,910,1277296,1264556,14276.072021,0,66,959.284041,656.4863,1454,66,72.557018,57.267879,657,66,1403.621978,231.165855,1454,52,945.284041,656.4863,1440,52,58.557018,57.267879,643,52,1389.621978,231.165855,1440,0.0,10.469849,284.036113,10107.126953,0.00293,31.409547,492.263563,10107.126953,0.0,15.70525,348.827125,2804.834961,2,0,0,0,1365,24,0,2,1,0,0,0,455,7,0,1,1,0,0,0,910,17,0,1,91,211,TLS.Instagram,SocialNetwork,scontent-mxp1-1.cdninstagram.com,,7a29c223fb122ec64d10f0a159e07996,f4febc55ea12b31ae17cfb7e614afda8,0.025249,0.974751
13,33,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,4,49357,443,6,4,192.168.2.17,31.13.86.52,144,107306,105290,13539.177979,63,6340,5458,13539.177979,81,100966,99832,13526.353027,0,66,745.180556,674.991247,1454,66,100.634921,129.268825,663,66,1246.493827,468.87448,1454,52,731.180556,674.991247,1440,52,86.634921,129.268825,649,52,1232.493827,468.87448,1440,0.000977,94.679566,894.821462,10413.415039,0.00293,218.373838,1366.185495,10413.415039,0.000977,169.079413,1201.038973,2566.97998,2,0,0,0,143,11,0,2,1,0,0,0,62,5,0,1,1,0,0,0,81,6,0,1,91,211,TLS.Instagram,SocialNetwork,scontent-mxp1-1.cdninstagram.com,,44dab16d680ef93487bc16ad23b3ffb1,,0.059083,0.940917
14,34,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,1568796000000.0,4,49358,443,6,4,192.168.2.17,31.13.86.52,388,309238,303806,13538.522949,165,14193,11883,13538.522949,223,295045,291923,13525.730225,0,66,797.005155,680.517617,1454,66,86.018182,101.044151,654,66,1323.071749,382.173896,1454,52,783.005155,680.517617,1440,52,72.018182,101.044151,640,52,1309.071749,382.173896,1440,0.000977,34.983263,531.727457,10201.950928,0.003906,82.551969,818.381155,10201.950928,0.000977,60.926713,702.976026,2355.734131,2,0,0,0,387,18,0,2,1,0,0,0,164,7,0,1,1,0,0,0,223,11,0,1,91,211,TLS.Instagram,SocialNetwork,scontent-mxp1-1.cdninstagram.com,,44dab16d680ef93487bc16ad23b3ffb1,,0.045897,0.954103
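
Filters can also be combined. The sketch below uses a tiny hand-built DataFrame (a stand-in for `df`, with a few values copied from the rows printed above) so that it runs without a pcap file. The variable names `flows`, `tls_with_traffic`, and `instagram` are illustrative only.

```python
import pandas as pd

# Tiny stand-in for the flow DataFrame, using values from rows shown above.
flows = pd.DataFrame({
    "dst_port": [58216, 53, 80, 443, 443],
    "protocol": [6, 17, 6, 6, 6],
    "bidirectional_packets": [150, 2, 34, 2, 19],
    "application_name": ["HTTP.Facebook", "DNS.Instagram", "HTTP", "TLS", "TLS.Instagram"],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
tls_with_traffic = flows[(flows["dst_port"] == 443) & (flows["bidirectional_packets"] > 10)]

# String columns can be matched with the .str accessor.
instagram = flows[flows["application_name"].str.contains("Instagram")]

print(tls_with_traffic)
print(instagram)
```

The same expressions should apply unchanged to the real `df`, since they rely only on column names that appear in the output above.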


## Extend nfstream

In some use cases, we need to add features that are computed at packet level. nfstream handles such scenarios using [**NFPlugin**][nfplugin].


* Let's suppose that we want a per-flow counter of bidirectional packets whose IP size is exactly 40 bytes.

In [9]:
class packet_with_40_ip_size(NFPlugin):
    def on_init(self, pkt): # flow creation with the first packet
        if pkt.ip_size == 40:
            return 1
        else:
            return 0
        
    def on_update(self, pkt, flow): # flow update with each packet belonging to the flow
        if pkt.ip_size == 40:
            flow.packet_with_40_ip_size += 1

In [10]:
df = NFStreamer(source="pcaps/google_ssl.pcap", plugins=[packet_with_40_ip_size()]).to_pandas()

In [11]:
df.head()

Unnamed: 0,id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,src2dst_first_seen_ms,src2dst_last_seen_ms,dst2src_first_seen_ms,dst2src_last_seen_ms,version,src_port,dst_port,protocol,vlan_id,src_ip,dst_ip,bidirectional_packets,bidirectional_raw_bytes,bidirectional_ip_bytes,bidirectional_duration_ms,src2dst_packets,src2dst_raw_bytes,src2dst_ip_bytes,src2dst_duration_ms,dst2src_packets,dst2src_raw_bytes,dst2src_ip_bytes,dst2src_duration_ms,expiration_id,master_protocol,app_protocol,application_name,category_name,client_info,server_info,j3a_client,j3a_server,packet_with_40_ip_size
0,0,1434443000000.0,1434443000000.0,1434443000000.0,1434443000000.0,1434443000000.0,1434443000000.0,4,42835,443,6,4,172.31.3.224,216.58.212.100,28,9108,8696,6669.871094,16,1512,1288,6669.871094,12,7596,7408,6591.211182,0,91,126,TLS.Google,Web,,,,,14


Our DataFrame has a new column named `packet_with_40_ip_size`.
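
To see how the two hooks cooperate, the following sketch replays the same plugin logic over a handful of made-up packets without nfstream. `ToyPlugin` and `aggregate` are hypothetical stand-ins written for this illustration. They assume that the feature is named after the plugin class, that `on_init` runs on the first packet, and that `on_update` runs on every later packet, as the comments in the plugin above suggest. nfstream's real dispatch code is not reproduced here.

```python
from types import SimpleNamespace


class ToyPlugin:
    """Hypothetical stand-in for NFPlugin: the feature is named after the class."""

    @property
    def name(self):
        return type(self).__name__


class packet_with_40_ip_size(ToyPlugin):
    def on_init(self, pkt):
        return 1 if pkt.ip_size == 40 else 0

    def on_update(self, pkt, flow):
        if pkt.ip_size == 40:
            flow.packet_with_40_ip_size += 1


def aggregate(packets, plugins):
    """Build one flow from a packet list, calling the plugin hooks in order."""
    flow = SimpleNamespace()
    for index, pkt in enumerate(packets):
        for plugin in plugins:
            if index == 0:
                # The first packet creates the feature from on_init's return value.
                setattr(flow, plugin.name, plugin.on_init(pkt))
            else:
                # Later packets update the feature in place.
                plugin.on_update(pkt, flow)
    return flow


# Made-up packets: three of the five have an IP size of 40 bytes.
packets = [SimpleNamespace(ip_size=size) for size in (40, 1500, 40, 52, 40)]
flow = aggregate(packets, [packet_with_40_ip_size()])
print(flow.packet_with_40_ip_size)  # 3
```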

In some cases, we need volatile features: intermediate values that other plugins rely on but that should not appear in the exported flow.
Consider the following use case:

* We want to compute the maximum packet inter-arrival time per flow.
* Our feature will be based on `iat`, which we do not want to export as a feature.

Note that this feature is already implemented among nfstream's statistical features.

In [12]:
class iat(NFPlugin):
    def on_init(self, pkt):
        return [-1, pkt.time] # [iat value, last packet timestamp]
    def on_update(self, pkt, flow):
        flow.iat = [pkt.time - flow.iat[1], pkt.time]

class maximum_iat_ms(NFPlugin):
    def on_init(self, pkt):
        return -1 # we will set it as -1 as init value
    def on_update(self, pkt, flow):
        if flow.iat[0] > flow.maximum_iat_ms:
            flow.maximum_iat_ms = flow.iat[0]

In [13]:
df = NFStreamer(source="pcaps/instagram.pcap", plugins=[iat(volatile=True), maximum_iat_ms()]).to_pandas()

In [14]:
df.head()

Unnamed: 0,id,bidirectional_first_seen_ms,bidirectional_last_seen_ms,src2dst_first_seen_ms,src2dst_last_seen_ms,dst2src_first_seen_ms,dst2src_last_seen_ms,version,src_port,dst_port,protocol,vlan_id,src_ip,dst_ip,bidirectional_packets,bidirectional_raw_bytes,bidirectional_ip_bytes,bidirectional_duration_ms,src2dst_packets,src2dst_raw_bytes,src2dst_ip_bytes,src2dst_duration_ms,dst2src_packets,dst2src_raw_bytes,dst2src_ip_bytes,dst2src_duration_ms,expiration_id,master_protocol,app_protocol,application_name,category_name,client_info,server_info,j3a_client,j3a_server,maximum_iat_ms
0,27,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,80,58216,6,4,31.13.86.52,192.168.0.103,150,153558,151458,1704.681885,103,150456,149014,1704.681885,47,3102,2444,1700.713867,0,7,119,HTTP.Facebook,SocialNetwork,,,,,1246.765137
1,19,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,26540,53,17,4,192.168.0.103,8.8.8.8,2,298,270,46.539062,1,89,75,46.539062,1,209,195,0.0,0,5,211,DNS.Instagram,SocialNetwork,,igcdn-photos-g-a.akamaihd.net,,,46.539062
2,6,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,33976,80,6,4,192.168.0.103,77.67.29.17,34,29039,28563,7361.755127,14,924,728,7361.755127,20,28115,27835,7360.779053,0,0,7,HTTP,Web,,,,,7321.503174
3,29,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,0.0,0.0,4,58690,443,6,4,192.168.0.103,46.33.70.159,2,169,141,0.336182,2,169,141,0.336182,0,0,0,-1.0,0,0,91,TLS,Web,,,,,0.336182
4,25,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,1436721000000.0,4,58052,80,6,4,192.168.0.103,82.85.26.162,75,57239,56189,90.485107,37,2702,2184,90.485107,38,54537,54005,29.175049,0,7,211,HTTP.Instagram,SocialNetwork,,photos-g.ak.instagram.com,,,61.310059


Our DataFrame has a new column named `maximum_iat_ms` containing the maximum observed packet inter-arrival time per flow, set to -1 when the flow contains only one packet.
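
The interplay between the volatile `iat` feature and `maximum_iat_ms` can be traced by hand. The sketch below replays both plugins' `on_init` and `on_update` logic over a list of made-up packet timestamps, without nfstream. It assumes that plugins run in the order they are listed, so that `iat` is refreshed before `maximum_iat_ms` reads it. The helper function is hypothetical and exists only for this illustration.

```python
from types import SimpleNamespace


def maximum_iat_ms(timestamps):
    """Replay the two plugins above over one flow's packet timestamps (ms)."""
    first, *rest = timestamps
    # Values returned by the two on_init hooks for the first packet.
    flow = SimpleNamespace(iat=[-1, first], maximum_iat_ms=-1)
    for time in rest:
        # iat.on_update: store [gap since previous packet, current timestamp].
        flow.iat = [time - flow.iat[1], time]
        # maximum_iat_ms.on_update: keep the largest gap seen so far.
        if flow.iat[0] > flow.maximum_iat_ms:
            flow.maximum_iat_ms = flow.iat[0]
    return flow.maximum_iat_ms


print(maximum_iat_ms([0, 5, 30, 32]))  # 25
print(maximum_iat_ms([1000]))          # -1
```

With a single packet, `on_update` never runs, which is why the feature keeps its initial value of -1.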