<h1><span style="color:red">Data Preparation for SuAVE</span></h1>

Several data enhancement operations are included in this notebook:
* identifying the number of header rows, and the rows and columns to keep or drop
* assigning SuAVE qualifiers by introspecting the data (and letting you edit and approve the assignments)
* adding geographic coordinates (in WGS84) based on a selected variable with placenames
* adding geometric information based on a supplied GeoJSON file
* generating images based on a selected text variable

You can either enhance an existing survey dataset passed from SuAVE or load a local CSV file.

If you need to convert between the binary representation of multiple-response variables and SuAVE #multi variables, or to prepare Likert scale variables for analysis, use the launcher in step 2 to open the respective notebook.

<h1><span style="color:red">Once you have retrieved and explored the data file, run only the cells you need!</span></h1>

Author: Enrique Sanchez

## 1. Retrieve survey parameters from the URL
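
The JavaScript cell below reads each survey parameter from the query string of the notebook URL and copies it into a Python variable. As a minimal sketch, the same parsing can be done in Python with the standard library; the example URL and the helper name `get_query_value` are hypothetical and not part of this notebook.

```python
from urllib.parse import urlparse, parse_qs

def get_query_value(url, key, default=""):
    """Return the first value of `key` in the URL's query string, or `default`."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return query.get(key, [default])[0]

# Hypothetical notebook URL, for illustration only
example_url = ("http://localhost:8888/notebooks/operations/prep.ipynb"
               "?user=demo&csv=demo_survey.csv&views=1110001")
user_value = get_query_value(example_url, "user")
csv_value = get_query_value(example_url, "csv")
missing_value = get_query_value(example_url, "dzc")  # absent key falls back to ""
print(user_value, csv_value, repr(missing_value))
```

Unlike the JavaScript regular expression, which matches keys case-insensitively, this lookup is case-sensitive.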

In [None]:
%%javascript
function getQueryStringValue (key)
{  
    return unescape(window.location.search.replace(new RegExp("^(?:.*[&\\?]" + escape(key).replace(/[\.\+\*]/g, "\\$&") + "(?:\\=([^&]*))?)?.*$", "i"), "$1"));
}
IPython.notebook.kernel.execute("survey_url='".concat(getQueryStringValue("surveyurl")).concat("'"));
IPython.notebook.kernel.execute("views='".concat(getQueryStringValue("views")).concat("'"));
IPython.notebook.kernel.execute("view='".concat(getQueryStringValue("view")).concat("'"));
IPython.notebook.kernel.execute("user='".concat(getQueryStringValue("user")).concat("'"));
IPython.notebook.kernel.execute("csv_file='".concat(getQueryStringValue("csv")).concat("'")); 
IPython.notebook.kernel.execute("dzc_file='".concat(getQueryStringValue("dzc")).concat("'")); 
IPython.notebook.kernel.execute("params='".concat(getQueryStringValue("params")).concat("'")); 
IPython.notebook.kernel.execute("active_object='".concat(getQueryStringValue("activeobject")).concat("'")); 
IPython.notebook.kernel.execute("full_notebook_url='" + window.location + "'"); 

## 2. Import libraries, and select how to process the data

In [None]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import pandas as pd
from IPython.display import Markdown, display

# Importing additional libraries
import panel as pn
import requests
import re

# Loading extensions
pn.extension()

# Importing scripts
import FileScript as fs
import QualifierSuave as ql
import StringImageSuave as si
import GeoToolsSuave as gt

def printmd(string):
    display(Markdown(string))

absolutePath = "../../temp_csvs/"


<h2><span style="color:red">To launch a notebook for processing #multi and Likert scale variables, make a selection and click the URL below</span></h2>
Otherwise, continue to step 3.
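
The link consists of the target notebook's path followed by the survey parameters this notebook received. A minimal sketch of assembling such a link with `urllib.parse.urlencode`, which also percent-encodes special characters, could look like this; the helper `build_notebook_url` and the sample values are hypothetical.

```python
from urllib.parse import urlencode, quote

def build_notebook_url(base_url, nb_name, params):
    """Join the wrangling notebook path with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urlencode(params, quote_via=quote)
    return "{}/operations/wrangling/{}.ipynb?{}".format(base_url, nb_name, query)

# Hypothetical values, for illustration only
sample_params = {
    "surveyurl": "http://example.org/main/file=demo_survey.csv",
    "user": "demo",
    "csv": "demo survey.csv",
}
launch_url = build_notebook_url("http://localhost:8888/notebooks",
                                "Likert_recode", sample_params)
print(launch_url)
```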


In [None]:
radio_group = pn.widgets.RadioBoxGroup(name='Select notebook', options=['Convert binary variables to #multi', 
                                                                        'Convert #multi to binary',
                                                                        'Recode Likert scale variables'], 
                                       inline=False)
radio_group

In [None]:
if radio_group.value == 'Convert binary variables to #multi':
    nb_name = "Binary_to_multi"
elif radio_group.value == 'Convert #multi to binary':
    nb_name = "Binary_to_multi"  # NOTE: same notebook as the branch above; confirm this is intended
elif radio_group.value == 'Recode Likert scale variables':
    nb_name = "Likert_recode"
    
import webbrowser
url_partitioned = full_notebook_url.partition('/operations')
base_url = url_partitioned[0]
# Build the link by plain concatenation so braces in parameter values cannot break str.format
url1 = (base_url + '/operations/wrangling/' + nb_name + '.ipynb?'
        + 'surveyurl=' + survey_url + '&views=' + views + '&view=' + view
        + '&user=' + user + '&csv=' + csv_file + '&dzc=' + dzc_file
        + '&activeobject=' + active_object)

printmd("<b><span style='color:red'>Click the URL to open the selected notebook:</span></b>")
print(url1)

# webbrowser.open(url1)


<hr>
<h2><span style="color:red">3. Execute one block of cells, under either 3a or 3b, to:</span></h2>
<h3>
<span style="color:red">
<ul>
    <li>3a: Load survey file from SuAVE</li>
    <li>3b: Import a local CSV file</li>
</ul>    
</span>
</h3>
<h2><span style="color:red">3a: Load survey file from SuAVE</span></h2>


In [None]:
fname = absolutePath + csv_file
printmd("<b><span style='color:red'>Now continue to step 4</span></b>")

# fs.view_data(absolutePath + csv_file)

<h2><span style="color:red">3b: Import a local CSV file</span></h2>


In [None]:
data = pn.widgets.FileInput()
data

In [None]:
fname = absolutePath + data.filename
data.save(fname)
printmd("<b><span style='color:red'>Now continue to step 4</span></b>")


## 4. Explore the data and define the dataframe to work with
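
Exploring the file here means deciding how many leading rows form the header and which rows and columns to keep. `fs.view_data` handles this interactively. The sketch below shows comparable choices made directly with pandas; it is an assumption about the kind of operation involved, not the `FileScript` implementation, and the sample CSV text and column names are made up.

```python
import io
import pandas as pd

# Made-up file: one title row above the real header, plus an unwanted column
raw_csv = io.StringIO(
    "Survey export 2020\n"
    "Name,City,Notes\n"
    "Ann,San Diego,x\n"
    "Bob,Austin,y\n"
    "Cy,Boston,z\n"
)

# skiprows=1 drops the title row so the next line becomes the header
sample_df = pd.read_csv(raw_csv, skiprows=1)
# Drop a column that is not needed, then drop the last data row
sample_df = sample_df.drop(columns=["Notes"]).iloc[:-1]
print(sample_df)
```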

In [None]:
fs.view_data(fname)

## 5. Generate & Edit Qualifiers
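
`ql.qualifier_editor()` proposes qualifiers by inspecting the values in each column and lets you edit the result. The sketch below illustrates the general idea of such introspection with a single rule. The rule itself, the helper name `guess_qualifier`, the use of `#number` for numeric columns, and the convention of appending the qualifier to the column name are all assumptions for illustration, not the logic in `QualifierSuave`.

```python
def guess_qualifier(values):
    """Suggest '#number' when every non-empty value parses as a float, else ''."""
    non_empty = [v for v in values if str(v).strip() != ""]
    if not non_empty:
        return ""
    try:
        for v in non_empty:
            float(v)
    except (TypeError, ValueError):
        return ""
    return "#number"

# Made-up columns: name -> list of raw string values
columns = {"Age": ["34", "51", ""], "City": ["Austin", "Boston", "Reno"]}
suggested = {name: name + guess_qualifier(vals) for name, vals in columns.items()}
print(suggested)
```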

In [None]:
ql.qualifier_editor()

In [None]:
# Local updated data frame
df = ql.updated_df

## 6. Geocoder (Optional)
Select a placename variable and generate Latitude and Longitude columns
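
Geocoding typically looks up each distinct placename once and writes the resulting coordinates back to every matching row. The sketch below shows that pattern with a small in-memory table standing in for a geocoding service. The helper `geocode_rows`, the table, and its approximate coordinates are hypothetical, and `gt.geocoder` may work differently.

```python
def geocode_rows(rows, place_key, lookup):
    """Add Latitude/Longitude to each row dict; unknown places get None."""
    cache = {}
    for row in rows:
        place = row[place_key]
        if place not in cache:
            # Each distinct placename is resolved only once
            cache[place] = lookup.get(place, (None, None))
        row["Latitude"], row["Longitude"] = cache[place]
    return rows

# Stand-in for a geocoding service: approximate WGS84 (latitude, longitude)
gazetteer = {"San Diego": (32.72, -117.16), "Austin": (30.27, -97.74)}
records = [{"City": "San Diego"}, {"City": "Austin"}, {"City": "Atlantis"}]
geocoded = geocode_rows(records, "City", gazetteer)
print(geocoded)
```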

In [None]:
gt.geocoder(ql.stored_text)

In [None]:
# Local updated data frame
df = ql.updated_df

## 7. GeoJSON to Geometry
Generate a 'geometry' column based on an external GeoJSON file. One of the feature properties in the GeoJSON file must contain feature names that match the feature names in the survey file.
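
The matching step can be sketched with the standard library alone: build a lookup from the chosen name property to each feature's geometry, then fill the new column row by row. The sample GeoJSON, the property name `NAME`, the helper `geometry_by_name`, and the choice to store the geometry as a JSON string are assumptions for illustration; `gt.json_to_geometry` may use a different representation.

```python
import json

# Made-up GeoJSON with a single feature
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"NAME": "Texas"},
   "geometry": {"type": "Point", "coordinates": [-99.9, 31.97]}}
]}
"""

def geometry_by_name(geojson, name_property):
    """Map each feature's name property to its geometry, serialized as JSON."""
    return {
        feature["properties"][name_property]: json.dumps(feature["geometry"])
        for feature in geojson["features"]
    }

lookup = geometry_by_name(json.loads(geojson_text), "NAME")
survey_rows = [{"State": "Texas"}, {"State": "Ohio"}]
for row in survey_rows:
    # Rows without a matching feature get an empty geometry
    row["geometry"] = lookup.get(row["State"], "")
print(survey_rows)
```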

In [None]:
file = pn.widgets.FileInput()
file

In [None]:
gt.json_to_geometry(file.value, ql.stored_text)

In [None]:
# Local updated data frame
df = ql.updated_df

## 8. Generate images based on text values
Create a set of images from the values of a selected variable, for use with SuAVE.

In [None]:
si.image_display(df, ql.stored_text)

In [None]:
# Local updated data frame
df = ql.updated_df

## 9. Final Data
Explore the dataframe before generating a new survey from it

In [None]:
df = ql.updated_df
ql.slider(df)

## 10. Generate a new survey and open it in SuAVE
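
The upload cell at the end of this step replaces every character in the survey name other than letters, digits, and underscores with an underscore before building the survey link. A small sketch of that sanitizing step in isolation, using the same regular expression, is shown here; the helper `survey_file_name` and the sample values are hypothetical.

```python
import re

def survey_file_name(user, survey_name):
    """Build '<user>_<sanitized name>.csv' as it appears in the survey link."""
    safe_name = re.sub(r"[^0-9a-zA-Z_]", "_", survey_name)
    return "{}_{}.csv".format(user, safe_name)

example_file = survey_file_name("demo", "My survey (2020)")
print(example_file)
```

Because distinct names can collapse to the same sanitized form, it is worth checking the printed name before uploading.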

In [None]:
def printmd(string):
    display(Markdown(string))

# Input temporary CSV name

input_csv = widgets.Text()
output_csv = widgets.Text()

def bind_input_to_output1(sender):
    output_csv.value = input_csv.value

# Tell the text input widget to call bind_input_to_output1() on submit
input_csv.on_submit(bind_input_to_output1)

printmd("<b><span style='color:red'>Input temporary csv name here, " + 
        "press Enter, and then run the next cell:</span></b>")
# Display input text box widget for input
display(input_csv)
display(output_csv)

In [None]:
absolutePath = "../../temp_csvs/"
csv_file = input_csv.value.replace(' ', '_')

# new filename
new_file = absolutePath + csv_file + '_v1.csv'
printmd("<b><span style='color:red'>A new temporary file will be created at: </span></b>")
print(new_file)
df.to_csv(new_file, index=False)

In [None]:
#Input survey name

input_text = widgets.Text()
output_text = widgets.Text()

def bind_input_to_output2(sender):
    output_text.value = input_text.value

# Tell the text input widget to call bind_input_to_output2() on submit
input_text.on_submit(bind_input_to_output2)

printmd("<b><span style='color:red'>Input survey name here, " +
        "press Enter, and then run the next cell:</span></b>")
# Display input text box widget for input
display(input_text)
display(output_text)

In [None]:
#Print survey name
survey_name = output_text.value
printmd("<b><span style='color:red'>Survey Name is: </span></b>" + survey_name)

In [None]:
#Input SuAVE username

input_user = widgets.Text()
output_user = widgets.Text()

def bind_input_to_output3(sender):
    output_user.value = input_user.value

# Tell the text input widget to call bind_input_to_output3() on submit
input_user.on_submit(bind_input_to_output3)

printmd("<b><span style='color:red'>Input SuAVE username here, " + 
        "press Enter, and then run the next cell:</span></b>")
# Display input text box widget for input
display(input_user)
display(output_user)

In [None]:
# TODO: un-hardcode the SuAVE server URL below
user = input_user.value
survey_url = 'http://suave-dev.sdsc.edu/main/file=' + user + '_' + survey_name + '.csv'
referer = survey_url.split("/main")[0] +"/"
upload_url = referer + "uploadCSV"
new_survey_url_base = survey_url.split(user)[0]
views = ''
view = ''
dzc_file = ''

upload_data = {
    'name': input_text.value,
    'dzc': dzc_file,
    'user': user
}
headers = {
    'User-Agent': 'suave user agent',
    'referer': referer
}

# Open the CSV in a context manager so the file handle is closed after the upload
with open(new_file, "rb") as csv_handle:
    r = requests.post(upload_url, files={"file": csv_handle}, data=upload_data, headers=headers)

if r.status_code == 200:
    printmd("<b><span style='color:red'>New survey created successfully</span></b>")
    regex = re.compile('[^0-9a-zA-Z_]')
    s_url = survey_name
    s_url =  regex.sub('_', s_url)

    # views and view are empty for a new survey, so the link omits them
    adjusted_url = new_survey_url_base + user + "_" + s_url + ".csv"
    print(adjusted_url)
    printmd("<b><span style='color:red'>Click the URL to open the new survey</span></b>")
else:
    printmd("<b><span style='color:red'>Error creating new survey. Check if a survey with this name already exists.</span></b>")
    printmd("<b><span style='color:red'>Reason: </span></b>"+ str(r.status_code) + " " + r.reason)

## Explore with HoloViz

In [None]:
%%javascript
function getQueryStringValue (key)
{  
    return unescape(window.location.search.replace(new RegExp("^(?:.*[&\\?]" + escape(key).replace(/[\.\+\*]/g, "\\$&") + "(?:\\=([^&]*))?)?.*$", "i"), "$1"));
}
IPython.notebook.kernel.execute("full_notebook_url='" + window.location + "'"); 

In [None]:
holoviz_notebook = full_notebook_url.split('operations')[0]+'operations/holoviz/holoviz.ipynb?'
survey = 'surveyurl=' + adjusted_url + '&user=' + user + '&csv=' + new_file.split('/')[-1]
holoviz_url = holoviz_notebook+survey

printmd("<b><span style='color:red'>Click the URL to open the HoloViz notebook:</span></b>")
print(holoviz_url)