# Band Depth

Given an ensemble of data drawn from a distribution F, _data depth_ quantifies how central (or deep) a particular sample is within the cloud of sampled data. Deeper samples are considered more representative of the ensemble and are assigned high depth values, whereas samples far from the rest of the ensemble are considered outliers and are assigned correspondingly lower depth values. The notion of data depth therefore provides a center-outward ordering (a form of order statistics) of an ensemble of sampled data. [Mirzargar, Whitaker (2014)](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6875964)
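To make the center-outward ordering concrete, here is a minimal sketch for the simplest case, a one-dimensional sample. It uses the univariate halfspace (Tukey) depth, which is one common depth notion and not necessarily the one used elsewhere in this notebook. The function name `halfspace_depth_1d` and the sample values are illustrative assumptions.

```python
import numpy as np

def halfspace_depth_1d(sample):
    """Univariate halfspace (Tukey) depth of each point within its own sample.

    Illustrative helper, not part of the `curves` module used in this notebook.
    """
    x = np.asarray(sample, dtype=float)
    # Fraction of sample points at or below, and at or above, each value.
    at_or_below = (x[None, :] <= x[:, None]).mean(axis=1)
    at_or_above = (x[None, :] >= x[:, None]).mean(axis=1)
    # A point is deep when it has many points on both sides.
    return np.minimum(at_or_below, at_or_above)

sample = np.array([2.0, 9.0, 4.0, 5.0, 30.0])
depth = halfspace_depth_1d(sample)
order = np.argsort(-depth, kind="stable")  # indices from deepest to shallowest

print(depth)             # [0.2 0.4 0.4 0.6 0.2]
print(sample[order[0]])  # 5.0, the sample median
```

The deepest point is the median, and the extreme values 2 and 30 receive the lowest depth, which is the ordering described above.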

The work of [López-Pintado, Romo (2009)](https://www.tandfonline.com/doi/pdf/10.1198/jasa.2009.0108) introduces the notion of _functional band depth_, a generalization of data depth to higher dimensions that is designed for ensembles of functions. Unlike other higher-dimensional generalizations, functional band depth goes beyond point-wise analysis of functional data: it measures the centrality of a function within an ensemble in a way that is sensitive to both the shape and the position of the function relative to the other ensemble members.
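As a rough illustration of the idea, the sketch below computes band depth with bands formed by pairs of curves (the J = 2 case): the depth of a curve is the fraction of pairs whose band contains the curve at every sampled point. This is a simplified sketch under stated assumptions, not the implementation behind the `curves` module imported below. The function name `band_depth` and the toy data are hypothetical, and curves are assumed to be sampled on a common grid and stored one per row.

```python
from itertools import combinations

import numpy as np

def band_depth(curves):
    """Band depth (J = 2) for an array of shape (n_curves, n_points).

    Illustrative helper; assumes every curve is sampled on the same grid.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    counts = np.zeros(n)
    for i, j in combinations(range(n), 2):
        # The band delimited by curves i and j is their pointwise envelope.
        lower = np.minimum(curves[i], curves[j])
        upper = np.maximum(curves[i], curves[j])
        # A curve counts only if it stays inside the band at every point.
        inside = np.all((curves >= lower) & (curves <= upper), axis=1)
        counts += inside
    # Normalize by the number of pairs, n choose 2.
    return counts / (n * (n - 1) / 2)

# Toy ensemble: five parallel lines with offsets 0..4.
t = np.linspace(0.0, 1.0, 5)
curves = np.arange(5)[:, None] + t[None, :]
depth = band_depth(curves)

print(depth)  # [0.4 0.7 0.8 0.7 0.4]
```

The middle line is the deepest and the two outermost lines the shallowest. Because a curve must lie inside a band over the whole domain, a curve that crosses the others is contained in few bands even when its values are unremarkable at individual points.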

In [1]:
from curves import plot_lines, functional_boxplot, functional_boxplot_from_df, split_datasets
from bokeh.plotting import show, output_file

In [6]:
output_file('taxis_v3_top50.html')
plot_taxis_v3 = plot_lines('../data/taxis_v3.csv',"../outputs/taxis_v3_out.txt",'fd','1/1/2014','12/31/2018', 20, use_top50=False)
show(plot_taxis_v3)

(36, 1826)


In [7]:
output_file('fbplot_taxisv1.html')
fbplot_taxisv1 = functional_boxplot('../data/taxis_v1.csv',"../outputs/taxis_v1_out.txt",'01/01/2018','12/31/2018','tmd')
show(fbplot_taxisv1)

(36, 53)
(36, 5)
DatetimeIndex(['2018-01-01', '2018-12-25', '2018-12-26', '2018-08-19',
               '2018-12-30'],
              dtype='datetime64[ns]', freq=None)


In [8]:
data_weekdays_taxi, data_weekends_taxi = split_datasets('../data/taxis_v1.csv',"../outputs/taxis_v1_out.txt",'01/01/2018','12/31/2018')
print(data_weekdays_taxi.shape, data_weekends_taxi.shape)

(36, 261) (36, 104)
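The shapes above are consistent with 2018 having 261 weekdays and 104 weekend days. A weekday/weekend split of this kind can be sketched with pandas alone. This is not the source of `split_datasets`; it assumes a layout of one column per calendar day (suggested by the printed shapes) and uses synthetic values in place of the taxi data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the taxi data: 36 rows, one column per day of 2018.
days = pd.date_range("2018-01-01", "2018-12-31", freq="D")
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((36, len(days))), columns=days)

# dayofweek encodes Monday as 0 and Sunday as 6, so 5 and 6 are the weekend.
is_weekend = np.asarray(data.columns.dayofweek >= 5)
weekdays = data.loc[:, ~is_weekend]
weekends = data.loc[:, is_weekend]

print(weekdays.shape, weekends.shape)  # (36, 261) (36, 104)
```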


In [9]:
fbplot_taxisv1_weekday_od = functional_boxplot_from_df(data_weekdays_taxi,'od','Taxi trips on weekdays using original depth')
show(fbplot_taxisv1_weekday_od)

(36, 5)
(36, 5)
DatetimeIndex(['2018-01-01', '2018-04-03', '2018-04-19', '2018-05-31',
               '2018-01-04'],
              dtype='datetime64[ns]', freq=None)
