# Bring Your Own Data (BYOD) Outlier Detection

In this notebook, we show a simple use case of our system using an [OECD](https://data.oecd.org/) dataset. We detect three different types of outliers in the dataset:
* Global outliers: values that rarely appear in real-world data.
* Local outliers: values that differ from the other values in the same attribute.
* Null outliers: values that carry no meaning.

## Setup

* Set up the `__HOME__` directory
* Set pandas options to display full dataframes

In [1]:
%load_ext autoreload
%autoreload 2

from pathlib import Path
from labext.prelude import M, A, W

M.DataTable.register()

__HOME__ = Path("../byod-cleaning-api")

In [2]:
import pandas as pd

# options to display full dataframe
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

In [3]:
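# Read the input CSV file into a pandas DataFrame, keeping every cell as a string
input_file = __HOME__ / "data/aid_worker.csv"

df = pd.read_csv(input_file, dtype=str, keep_default_na=False)
# Strip surrounding whitespace from every cell
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
df = df.applymap(lambda x: x.strip())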

## Outlier Detection
---------------------------------
The BYOD outlier detection service is deployed at https://bclean.mint.isi.edu/detect.

The `POST` request body is a JSON object that maps each column name to its list of values:
```json
{
    "table":{
        "column1": ["val1", "val2"],
        "column2": ["val3", "val4"]
    }
}
```
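
As a minimal sketch of how a DataFrame maps onto this request format, the snippet below builds the payload from a two-column toy table. The names `toy_df` and `payload` are illustrative and not part of the service; the column names and values simply mirror the JSON example above.

```python
import json

import pandas as pd

# Hypothetical toy table that mirrors the JSON example above
toy_df = pd.DataFrame({
    "column1": ["val1", "val2"],
    "column2": ["val3", "val4"],
})

# orient="list" maps each column name to the list of its values
payload = {"table": toy_df.to_dict(orient="list")}

print(json.dumps(payload, indent=4))
```

Using `orient="list"` keeps the values of each column together, which matches the column-oriented layout the request expects.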

In [4]:
data = df.to_dict(orient="list")

----------------------------------------
The response data has the following form:
```json
{
    "table":{
        "column1": ["[[[val1]]]", "val2"],
        "column2": ["val3", "val4"]
    }
}
```
where `[[[value]]]` marks a value that was detected as an outlier.
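
One way to work with these markers is to build a boolean mask of flagged cells and count them per column. This is only a sketch: `toy_result` is a hypothetical response table shaped like the example above, and it assumes every flagged cell starts with the literal `[[[` prefix.

```python
import pandas as pd

# Hypothetical response table shaped like the JSON example above
toy_result = pd.DataFrame({
    "column1": ["[[[val1]]]", "val2"],
    "column2": ["val3", "val4"],
})

# True wherever a cell starts with the "[[[" outlier marker
flagged_mask = toy_result.apply(lambda col: col.str.startswith("[[["))

# Number of flagged cells in each column
flagged_per_column = flagged_mask.sum()

print(flagged_per_column)
```

The same mask can be reused to filter rows, for example `toy_result[flagged_mask.any(axis=1)]` to keep only rows with at least one flagged cell.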

In [12]:
from requests.auth import HTTPBasicAuth
import requests

auth = HTTPBasicAuth('mint', 'asf12jkj!%&')

# response = requests.post("https://bclean.mint.isi.edu/detect", json={"table": data}, auth=auth) # for deployed service
response = requests.post("http://127.0.0.1:5000/detect", json={"table": data}, auth=auth) # for local
# Each column arrives as a list: load the columns as rows, then transpose back
result_df = pd.DataFrame.from_dict(response.json()["table"], orient="index").transpose()

--------------------------------
Outliers are annotated as `[[[value]]]`. For example, in the first column `GDP per capita`, all values are global outliers because values matching the regex pattern `[0-9]+ [0-9]+` rarely appear in real-world data.
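
To make the pattern concrete, the sketch below checks a few strings against `[0-9]+ [0-9]+`. The sample strings are hypothetical, and the use of a full-string match is an assumption; the source does not say how the service applies the pattern internally.

```python
import re

# The pattern quoted above: digits, a single space, digits
PATTERN = re.compile(r"[0-9]+ [0-9]+")

# Hypothetical sample values, not taken from the dataset
samples = ["12 345", "12345", "12,345", "1 2"]

# fullmatch requires the whole string to fit the pattern (an assumption here)
matches = {s: bool(PATTERN.fullmatch(s)) for s in samples}

for value, is_match in matches.items():
    print(repr(value), is_match)
```

Only the space-separated digit groups match; plain integers and comma-separated numbers do not.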

In [10]:
# show result in the same column order as original file
result_df[df.columns]

Unnamed: 0,Incident ID,Year,Month,Day,Country,Region,District,City,UN,INGO,LNGO/NRCS,ICRC,IFRC,Other,Nationals killed,Nationals wounded,Nationals kidnapped,Total nationals,Internationals killed,Internationals wounded,Internationals kidnapped,Total internationals,Total killed,Total wounded,Total kidnapped,Total affected,Gender Male,Gender Female,Gender Unknown,Means of attack,Attack context,Location,Latitude,Longitude,Actor type,Actor name,Details,Verified,Source
0,22,1997,9,24,Ethiopia,Ogaden,"[[[]]] (True, False, True)","[[[]]] (True, False, True)",2,0,0,0,0,0,2,0,0,2,0,0,0,0,2,0,0,2,1,0,1,Shooting,Individual attack,Unknown,8.53056,44.795,Unknown,Unknown,"[[[2 UN national staffers shot dead in apparent robbery attempt in Ogaden region Sept 24.]]] (True, False, False)",Archived,Archived
1,47,1998,6,25,Ethiopia,Somali,"[[[]]] (True, False, True)","[[[travelling from Gode to Degeh Bur]]] (False, True, False)",0,0,0,6,0,0,0,0,0,0,0,0,6,6,0,0,6,6,1,0,5,Kidnapping,Ambush,Road,7.0,44.0,"[[[Non-state armed group: Regional]]] (True, False, False)","[[[Al-Itihaad al-Islamiya]]] (False, True, False)","[[[6 ICRC international staff (1 Swiss, 5 Somali) abducted when travelling from Gode to Degeh Bur in three marked vehicles on June 25. On July 3 the Islamic group al-Ittihad al-Islami claimed responsibility, stating that the hostages were under investigation.]]] (True, False, False)",Archived,Archived
2,73,1999,4,"[[[]]] (True, True, True)",Ethiopia,"[[[]]] (True, False, True)","[[[]]] (True, False, True)","[[[]]] (True, False, True)",0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,0,1,Kidnapping,Unknown,Unknown,9.145,40.489673,Unknown,Unknown,"[[[1 INGO international (French) staff kidnapped and later released]]] (True, False, False)",Archived,Archived
3,103,2000,2,"[[[]]] (True, True, True)",Ethiopia,"[[[]]] (True, False, True)","[[[]]] (True, False, True)","[[[]]] (True, False, True)",0,2,0,0,0,0,1,0,0,1,0,1,0,1,1,1,0,2,0,0,2,Unknown,Ambush,Unknown,9.145,40.489673,Unknown,Unknown,"[[[1 INGO national staff killed and 1 international staff wounded when vehicle ambushed.]]] (True, False, False)",Archived,Archived
4,475,2006,9,20,Ethiopia,"[[[]]] (True, False, True)","[[[]]] (True, False, True)","[[[]]] (True, False, True)",0,0,0,2,0,0,0,0,1,1,0,0,1,1,0,0,2,2,0,0,2,Kidnapping,Unknown,Unknown,9.145,40.489673,Unknown,Unknown,"[[[2 ICRC (1 international [Irish], 1 national) staff kidnapped, Sept 18; released unharmed on Sept 23.]]] (True, True, False)",Archived,Archived
5,782,2008,"[[[]]] (True, True, True)","[[[]]] (True, True, True)",Ethiopia,"[[[]]] (True, False, True)","[[[]]] (True, False, True)","[[[]]] (True, False, True)",0,0,3,0,0,0,3,0,0,3,0,0,0,0,3,0,0,3,0,0,3,Landmine,Ambush,Road,9.145,40.489673,Unknown,Unknown,Three national staff of a local partner organization of an INGO were killed in a mine incident returning from an area they worked under a sub-grant.,Yes,Focal Point
6,793,2008,7,1,Ethiopia,"[[[]]] (True, False, True)","[[[]]] (True, False, True)","[[[Road, between Dire Dawa and Addis Ababa]]] (True, True, False)",1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,1,0,0,Shooting,Ambush,Road,9.02497,38.74689,Unknown,Unknown,"[[[1 UN national staff in Ethiopia was shot and injured by armed robbers on the road between Dire Dawa and Addis Ababa, while travelling on private business , 1 July 2008.]]] (True, False, False)",Yes,Focal Point
7,942,2009,3,1,Ethiopia,Gode,"[[[]]] (True, False, True)","[[[]]] (True, False, True)",1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,0,Bodily assault,"[[[Mob violence]]] (False, True, False)",Unknown,5.9526975,43.5522312,Unknown,Unknown,"[[[1 UN national staff injured when her vehicle was attacked by civilians thowing stones.]]] (True, False, False)",Yes,Focal Point
8,964,2010,1,3,Ethiopia,Gode,"[[[]]] (True, False, True)","[[[]]] (True, False, True)",1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,1,0,0,Bodily assault,Detention,Unknown,5.9526975,43.5522312,"[[[Host State]]] (False, True, False)","[[[Ethiopian Defence Forces]]] (True, True, False)","[[[1 UN national staff injured when physically assaulted by EDF military forces for carrying a VHF radio.]]] (True, False, False)",Yes,Focal Point
9,1069,2010,3,23,Ethiopia,"[[[Oromiya]]] (False, False, True)","[[[]]] (True, False, True)","[[[Rayitu]]] (False, False, True)",2,0,0,0,0,0,0,2,0,2,0,0,0,0,0,2,0,2,2,0,0,Unknown,"[[[Mob violence]]] (False, True, False)",Road,8.0,39.0,Unaffiliated,Oromo clan,"[[[A convoy of contracted trucks was attacked with rudimentary weapons by the Oromo clan and food looted while on the road through Oromo territory; two drivers were slightly injured.]]] (True, True, False)",Yes,Focal Point
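
In the output above, each flagged cell is followed by three booleans in parentheses. The source does not state what each boolean means, so the sketch below only separates the inner value from the flags without naming them. The helper `parse_cell` and the regex are illustrative assumptions based on the displayed format, not part of the service.

```python
import re
from typing import Optional, Tuple

# Matches "[[[value]]] (True, False, True)" as displayed in the output above
ANNOTATED = re.compile(
    r"^\[\[\[(.*)\]\]\] \((True|False), (True|False), (True|False)\)$",
    re.DOTALL,
)


def parse_cell(cell: str) -> Tuple[str, Optional[Tuple[bool, ...]]]:
    """Return (value, flags); flags is None when the cell is not annotated."""
    match = ANNOTATED.match(cell)
    if match is None:
        return cell, None
    value = match.group(1)
    # Convert the three "True"/"False" strings into booleans
    flags = tuple(group == "True" for group in match.groups()[1:])
    return value, flags


print(parse_cell("[[[]]] (True, False, True)"))
print(parse_cell("[[[Rayitu]]] (False, False, True)"))
print(parse_cell("Ethiopia"))
```

Applied cell by cell (for example with `DataFrame.applymap` or `DataFrame.map`), this recovers the original values alongside their flags.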
