# BASS Experiment
***TODO: Make this functional!***

## Import Libs
This project needs only three libraries: `math`, `numpy`, and `pandas`.

In [475]:
import pandas as pd
import numpy as np
import math

## Get Data
### Display a maximum of 50 columns
The dataset has 50 columns. Pandas hides columns when there are too many.

To view them all, we need to set the maximum viewable columns of Pandas to 50.

In [476]:
pd.options.display.max_columns = 50

### Import data from BASS csv file
Data copied from: [https://bass.bnshosting.net/api/scanresults?_format=csv](https://bass.bnshosting.net/api/scanresults?_format=csv)

In [477]:
data = pd.read_csv('./scan_results.csv')
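The later cleaning steps compare cells against strings such as `"null"` and `"Kbps"`, so it can help to read every column as text first. The following is a minimal sketch, not the notebook's method. It assumes a tiny in-memory CSV (made-up rows) in place of `scan_results.csv`. Note that reading everything as strings changes what the later type-based filters see.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for scan_results.csv.
sample_csv = io.StringIO(
    "BANDWIDTH,OPERATOR,LOCATION_mLatitude\n"
    "512 Kbps,Globe Telecom,11.24\n"
    "null,SMART,null\n"
)

# dtype=str keeps every column as text, so pandas does not infer mixed types.
# keep_default_na=False keeps the literal string "null" instead of turning it into NaN.
sample = pd.read_csv(sample_csv, dtype=str, keep_default_na=False)
print(sample)
```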

## Clean data

### Get only the columns needed
Keep only the columns used to clean the data and to find the best connection; the rest are ignored.

In [478]:
# `.ix` was removed from pandas; `.loc` with a list of labels selects the same columns.
needed_data = data.loc[:, ['BANDWIDTH', 'OPERATOR', 'CONNECTIVITY_extraInfo', 'LOCATION_mLatitude', 'LOCATION_mLongitude', 'CONNECTIVITY_typeName']]

### Remove WIFI-connected data
We focus on data from mobile connections rather than data from WiFi.

In [479]:
needed_data = needed_data[needed_data['CONNECTIVITY_typeName'] == 'MOBILE']

### Transform `CONNECTIVITY_extraInfo` and `OPERATOR` values

#### Create table for simplifying values

In [480]:
# Keys must be lower case: force_value lower-cases the input before the substring test.
operator_values = [("smart", "smart"), ("globe", "globe"), ("talk", "tnt"), ("tnt", "tnt"), ("tm", "tm"), ("sun", "sun")]
extrainfo_values = [("internet.globe.com.ph", "globe")]

#### Define force_value function and helpers
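`force_value` maps a raw string to a canonical value. It returns the forced value of the first `(contained, forced)` pair whose `contained` substring appears in the lower-cased input, or `"invalid"` if no pair matches.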

Example:
```python
force_value("globe telecom", operator_values) # This will result in "globe"
```

In [481]:
def force_value(operator, values):
    if operator is None:
        return "invalid"
    for contained, forced in values:
        if contained in str(operator).lower():
            return forced
    return "invalid"

def force_operator(x):
    return force_value(x, operator_values)
    
def force_extrainfo(x):
    return force_value(x, extrainfo_values)
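To make the matching rule concrete, here is a small self-contained sketch. It re-declares `force_value` as defined above so it runs on its own, and it uses a hypothetical `demo_values` table rather than the notebook's tables.

```python
# Re-declared so the example runs on its own; mirrors the definition above.
def force_value(operator, values):
    if operator is None:
        return "invalid"
    for contained, forced in values:
        if contained in str(operator).lower():
            return forced
    return "invalid"

# Hypothetical lookup table for illustration. Keys are lower case because
# the input is lower-cased before the substring test.
demo_values = [("smart", "smart"), ("globe", "globe"), ("talk", "tnt")]

print(force_value("Globe Telecom", demo_values))  # globe
print(force_value("Talk 'N Text", demo_values))   # tnt
print(force_value(float("nan"), demo_values))     # invalid
print(force_value(None, demo_values))             # invalid
```

Missing values read by pandas arrive as `NaN` rather than `None`; they still end up as `"invalid"` because the string `"nan"` matches none of the keys.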

#### Apply transformation

##### Transform `OPERATOR` column to accepted values

In [482]:
needed_data['OPERATOR'] = needed_data['OPERATOR'].apply(force_operator)

##### Transform `CONNECTIVITY_extraInfo` column to accepted values

In [483]:
needed_data['CONNECTIVITY_extraInfo'] = needed_data['CONNECTIVITY_extraInfo'].apply(force_extrainfo)

### Filter rows with only valid values

#### Define filter functions

In [484]:
def not_invalid_str(x):
    return x != "invalid"

def not_null_str(x):
    return x != "null"

def string_type(x):
    return isinstance(x, str)

def not_zero(x):
    return x != 0

def bandwidth_to_float(x):
    return float(x.replace(" Kbps", ""))

def filter_bandwidth(bandwidth):
    if type(bandwidth) == str and "Kbps" not in bandwidth:
        return "invalid"
    else:
        return bandwidth

#### Remove invalid rows with the filters

In [485]:
operator_filter  = needed_data['OPERATOR'].apply(not_invalid_str)
extrainfo_filter = needed_data['CONNECTIVITY_extraInfo'].apply(not_invalid_str)
latitude_filter  = needed_data['LOCATION_mLatitude'].apply(not_null_str)
longitude_filter = needed_data['LOCATION_mLongitude'].apply(not_null_str)
bandwidth_filter = needed_data['BANDWIDTH'].apply(string_type)
combined_filters = np.logical_and.reduce((operator_filter, extrainfo_filter, latitude_filter, longitude_filter, bandwidth_filter))
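# .copy() makes filtered_data an independent frame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning.
filtered_data = needed_data[combined_filters].copy()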
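The combination step relies on `np.logical_and.reduce`, which ANDs several boolean masks element-wise. A minimal sketch on made-up data (the `toy` frame and its values are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up rows; column names follow the notebook.
toy = pd.DataFrame({
    "OPERATOR": ["globe", "invalid", "smart"],
    "LOCATION_mLatitude": ["11.2", "11.3", "null"],
})

operator_ok = toy["OPERATOR"] != "invalid"
latitude_ok = toy["LOCATION_mLatitude"] != "null"

# reduce ANDs the masks element-wise: a row survives only if every mask is True.
keep = np.logical_and.reduce((operator_ok, latitude_ok))
kept_rows = toy[keep]
print(kept_rows)
```

Only the first row passes both checks, so `kept_rows` contains the single `globe` measurement.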

#### Remove 'Kbps' from `BANDWIDTH` data

In [486]:
filtered_data['BANDWIDTH'] = filtered_data['BANDWIDTH'].apply(filter_bandwidth)
filtered_data['BANDWIDTH'] = filtered_data['BANDWIDTH'].apply(bandwidth_to_float)
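In the cell above, a string without `Kbps` becomes `"invalid"`, and `float("invalid")` would raise a `ValueError` if such a row were present. One possible alternative, sketched here on made-up values and not taken from the original notebook, is to strip the unit and let `pd.to_numeric` coerce anything unparseable to `NaN`:

```python
import pandas as pd

# Made-up values for illustration, including one unparseable entry.
raw = pd.Series(["512 Kbps", "1024.5 Kbps", "n/a"])

# Strip the unit as a literal string, then coerce unparseable entries to NaN.
parsed = pd.to_numeric(raw.str.replace(" Kbps", "", regex=False), errors="coerce")

# Drop the rows that could not be parsed.
clean = parsed.dropna()
print(clean.tolist())  # [512.0, 1024.5]
```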

### Define Euclidean distance function
`get_distance` computes the Euclidean distance between two coordinate pairs. We use it to measure how far each recorded point is from a given location.

In [487]:
X = 0
Y = 1
def get_distance(coords, target_coords):
    return math.sqrt(((coords[X] - target_coords[X]) ** 2) + ((coords[Y] - target_coords[Y]) ** 2))
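As a quick sanity check, the sketch below re-declares `get_distance` so it runs on its own and compares it with `math.hypot`. Because the inputs are latitude and longitude, the result is in degrees, which only approximates ground distance.

```python
import math

X = 0
Y = 1

# Re-declared so the example runs on its own; mirrors the definition above.
def get_distance(coords, target_coords):
    return math.sqrt(((coords[X] - target_coords[X]) ** 2) + ((coords[Y] - target_coords[Y]) ** 2))

# 3-4-5 triangle: the distance from the origin to (3, 4) is 5.
print(get_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0

# math.hypot computes the same quantity from the coordinate differences.
d1 = get_distance((11.0, 125.0), (11.3, 125.4))
d2 = math.hypot(11.0 - 11.3, 125.0 - 125.4)
print(math.isclose(d1, d2))  # True
```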

## Define function for getting best operator

In [488]:
def get_best_operator(data, latitude, longitude, k=10):
    # Create a new `DISTANCE` column which is the distance of each point to the current location.
    data["DISTANCE"] = data[["LOCATION_mLatitude", "LOCATION_mLongitude"]].apply(lambda x: get_distance((float(x["LOCATION_mLatitude"]), float(x["LOCATION_mLongitude"])), (latitude, longitude)), axis=1)
    
    # Remove all measurements taken at the current location, i.e., where the Euclidean distance is zero
    all_points_not_including_itself = data[data["DISTANCE"].apply(not_zero)]
    
    # Sort data in ascending order according to distance
    sorted_according_to_distance = all_points_not_including_itself.sort_values("DISTANCE")
    
    # Get nearest k locations -- default k is 10
    nearest_k_points = sorted_according_to_distance[:k]
    
    # Sort according to bandwidth in descending order
    sorted_according_to_bandwidth = nearest_k_points.sort_values("BANDWIDTH", ascending=False)
    
    # Get first row
    target_row = sorted_according_to_bandwidth[:1]
    
    # Get OPERATOR column
    target_column = target_row["OPERATOR"]
    
    # Target value
    target = target_column.iloc[0]
    
    # Target value in upper case
    target_uppercase = target.upper()
    
    return target_uppercase
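The same steps (compute distances, drop the query point, keep the `k` nearest, pick the highest bandwidth) can be tried on a toy frame. This is a compact sketch under stated assumptions: `best_operator_sketch` and the `toy` data are hypothetical, and it uses `assign`, `nsmallest`, and `nlargest` instead of the explicit sorts above, so the caller's frame is not modified.

```python
import numpy as np
import pandas as pd

# Made-up measurements; column names follow the notebook.
toy = pd.DataFrame({
    "OPERATOR": ["globe", "smart", "sun", "tnt"],
    "BANDWIDTH": [300.0, 900.0, 5000.0, 100.0],
    "LOCATION_mLatitude": [11.0, 11.1, 20.0, 11.0],
    "LOCATION_mLongitude": [125.0, 125.0, 130.0, 125.1],
})

def best_operator_sketch(data, latitude, longitude, k=10):
    # Euclidean distance (in degrees) from every row to the query point.
    dist = np.hypot(data["LOCATION_mLatitude"].astype(float) - latitude,
                    data["LOCATION_mLongitude"].astype(float) - longitude)
    # assign returns a new frame, leaving the caller's data unchanged.
    with_distance = data.assign(DISTANCE=dist)
    # Drop points at the query location, then keep the k nearest.
    nearest = with_distance[with_distance["DISTANCE"] != 0].nsmallest(k, "DISTANCE")
    # Among those, take the operator of the highest-bandwidth measurement.
    return nearest.nlargest(1, "BANDWIDTH")["OPERATOR"].iloc[0].upper()

print(best_operator_sketch(toy, 11.0, 125.0, k=2))  # SMART
```

With `k=2` the far-away `sun` measurement is excluded even though it has the highest bandwidth; with `k=3` it would be included and win, which shows how sensitive the result is to the choice of `k`.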

### Run on given latitude and longitude

In [489]:
latitude = 11.24079
longitude = 125.00229
get_best_operator(filtered_data, latitude, longitude)

'GLOBE'