<h1><span style="color:red">Data Preparation for SuAVE</span></h1>

Several data enhancement operations are included in this notebook:
* identifying the number of header rows, and rows and columns to keep or drop 
* assigning SuAVE qualifiers by introspecting the data (and letting you edit and approve the assignments)
* adding geographic coordinates (in WGS84) based on a selected variable with placenames
* adding geometric information based on a supplied GeoJSON file
* generating images based on a selected text variable

You can either enhance an existing survey dataset passed from SuAVE or load a local CSV file.

Additionally, launch the respective notebooks if you need to convert between the binary representation of multiple-response variables and SuAVE #multi variables, or to prepare ordinal scale variables for analysis.
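
As a rough illustration of what converting between the two representations involves, here is a minimal sketch. It is not the code of the conversion notebooks; the function names and the `|` separator are assumptions made for this example.

```python
def binary_to_multi(row, option_columns, sep="|"):
    """Join the names of options flagged 1 into a single delimited string."""
    return sep.join(col for col in option_columns if str(row.get(col, "")).strip() == "1")

def multi_to_binary(value, option_columns, sep="|"):
    """Expand a delimited string back into one 0/1 flag per option."""
    chosen = {part.strip() for part in value.split(sep) if part.strip()}
    return {col: int(col in chosen) for col in option_columns}

# Hypothetical multiple-response question with three options
options = ["tea", "coffee", "juice"]
packed = binary_to_multi({"tea": 1, "coffee": 0, "juice": 1}, options)
unpacked = multi_to_binary(packed, options)
```

Here `packed` holds one delimited string per respondent, and `unpacked` restores one flag per option.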

<h1><span style="color:red">Once you have retrieved and explored the data file, please run only those cells that you need!</span></h1>

Author: Enrique Sanchez

## 1. Retrieve survey parameters from the URL

In [None]:
%%javascript
function getQueryStringValue (key)
{  
    return unescape(window.location.search.replace(new RegExp("^(?:.*[&\\?]" + escape(key).replace(/[\.\+\*]/g, "\\$&") + "(?:\\=([^&]*))?)?.*$", "i"), "$1"));
}
IPython.notebook.kernel.execute("survey_url='".concat(getQueryStringValue("surveyurl")).concat("'"));
IPython.notebook.kernel.execute("views='".concat(getQueryStringValue("views")).concat("'"));
IPython.notebook.kernel.execute("view='".concat(getQueryStringValue("view")).concat("'"));
IPython.notebook.kernel.execute("user='".concat(getQueryStringValue("user")).concat("'"));
IPython.notebook.kernel.execute("csv_file='".concat(getQueryStringValue("csv")).concat("'")); 
IPython.notebook.kernel.execute("dzc_file='".concat(getQueryStringValue("dzc")).concat("'")); 
IPython.notebook.kernel.execute("params='".concat(getQueryStringValue("params")).concat("'")); 
IPython.notebook.kernel.execute("active_object='".concat(getQueryStringValue("activeobject")).concat("'")); 
IPython.notebook.kernel.execute("full_notebook_url='" + window.location + "'"); 
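
The JavaScript cell above pulls each parameter out of the browser's query string and assigns it to a Python variable in the kernel. The same extraction can be sketched in Python with the standard library, assuming the full notebook URL is already available as a string. The function name and the example URL are hypothetical. Unlike the case-insensitive regular expression above, `parse_qs` matches keys case-sensitively.

```python
from urllib.parse import urlparse, parse_qs

def get_query_value(url, key, default=""):
    """Return the first value of `key` in the URL's query string, or `default`."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return query.get(key, [default])[0]

# Hypothetical notebook URL, for illustration only
example_url = (
    "https://example.org/operations/qualgeoimage/notebook.ipynb"
    "?surveyurl=https%3A%2F%2Fexample.org%2Fmain%2Ffile%3Dsurvey.csv"
    "&user=demo&csv=demo_survey.csv"
)

# parse_qs percent-decodes values, so the survey URL comes back in readable form
survey_url_example = get_query_value(example_url, "surveyurl")
user_example = get_query_value(example_url, "user")
```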

## 2. Import libraries, and select how to process the data

In [None]:
# common imports
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import Markdown, display

import pandas as pd
pd.set_option('display.max_colwidth', 0)
    
import numpy as np
import panel as pn

pn.extension()
def printmd(string):
    display(Markdown(string))

absolutePath = "../../temp_csvs/"

# local imports
import sys
sys.path.insert(1, '../../helpers')
import panel_libs as panellibs
import suave_integration as suaveint

# specific imports
import requests
import re

# Importing scripts
import FileScript as fs
import QualifierSuave as ql
import StringImageSuave as si
import GeoToolsSuave as gt

url_partitioned = full_notebook_url.partition('/operations')
base_url = url_partitioned[0]


<h2><span style="color:red">To launch a notebook for processing #multi and ordinal scale variables, make a selection and click the URL below</span></h2>
Otherwise, continue to step 3.


In [None]:
radio_group = pn.widgets.RadioBoxGroup(name='Select notebook', options=['Convert binary variables to #multi', 
                                                                        'Convert #multi to binary',
                                                                        'Recode ordinal scale variables'], 
                                       inline=False)
radio_group

In [None]:
if radio_group.value == 'Convert binary variables to #multi':
    nb_name = "Binary_to_multi"
elif radio_group.value == 'Convert #multi to binary':
    nb_name = "Multi_to_binary"
elif radio_group.value == 'Recode ordinal scale variables':
    nb_name = "Ordinal_recode"
    
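import webbrowser
# Build the target notebook URL, forwarding the current survey parameters
url1 = ('{base_url}/operations/wrangling/{nb_name}.ipynb?'
        'surveyurl={survey_url}&views={views}&view={view}&user={user}'
        '&csv={csv_file}&dzc={dzc_file}&activeobject={active_object}').format(
    base_url=base_url, nb_name=nb_name, survey_url=survey_url, views=views, view=view,
    user=user, csv_file=csv_file, dzc_file=dzc_file, active_object=active_object)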

printmd("<b><span style='color:red'>Click the URL to open the selected notebook:</span></b>")
print(url1)

# webbrowser.open(url1)
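
The cell above assembles the query string by plain string concatenation, so parameter values are passed through unencoded. If values may contain spaces, `&`, or `?`, a percent-encoding variant could look like the sketch below. The helper name and the example values are hypothetical, and whether the receiving notebook expects encoded values is an assumption to verify before adopting this.

```python
from urllib.parse import quote, urlencode

def build_notebook_url(base_url, nb_path, params):
    """Join a base URL, a notebook path, and percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    return "{}/operations/{}?{}".format(base_url, nb_path, urlencode(params, quote_via=quote))

# Hypothetical values, for illustration only
example = build_notebook_url(
    "https://example.org",
    "wrangling/Binary_to_multi.ipynb",
    {"surveyurl": "https://example.org/s?file=a b.csv", "user": "demo"},
)
print(example)
```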


## 3. Select a survey file from SuAVE or import a local CSV file

In [None]:
data_select = pn.widgets.RadioBoxGroup(name='Select data source', options=['Load survey file from SuAVE', 
                                                                        'Import a local CSV file'], 
                                       inline=False)
data_select

In [None]:
data_input = pn.widgets.FileInput()
    
def check_selection():
    if data_select.value == 'Load survey file from SuAVE':
        global fname
        fname = absolutePath + csv_file
        printmd("<b><span style='color:red; font-size: 200%;'>Current SuAVE survey will be loaded. Continue to step 4.</span></b>")

    else:
        message = pn.pane.HTML("<b><span style='color:red; font-size: 200%;'>Upload data and continue to step 4.</span><br><span style='font-size: 150%;'>IMPORTANT: The local CSV file should not have SuAVE-specific variable names!</span></b>", width=700)
        return pn.Column(message, data_input)
    
check_selection()

## 4. Explore the data and define the dataframe to work with

In [None]:
if not pd.isnull(data_input.filename):
    fname = absolutePath + data_input.filename
    data_input.save(fname)
df = fs.updated_df
# visualize the dataframe
with pd.option_context("display.max_columns", None):
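    # Hide geometry column(s), including qualified names that contain 'geometry', to keep the preview readable
    geometry_cols = [col for col in df.columns if "geometry" in col]
    display(df.drop(columns=geometry_cols))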
    


In [None]:
# Define a dataframe subset
fs.view_data(fname)
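
The internals of `fs.view_data` are not shown in this notebook. As a rough idea of what choosing header rows and the rows and columns to keep amounts to, here is a minimal sketch in plain pandas. The CSV content and column names are hypothetical, and this is not the FileScript implementation.

```python
import io
import pandas as pd

# Hypothetical CSV with one descriptive line above the real header row
raw = io.StringIO(
    "Exported from a survey tool\n"
    "id,name,score,notes\n"
    "1,Ann,4.5,ok\n"
    "2,Bob,3.0,\n"
    "3,Cy,5.0,late\n"
)

subset = pd.read_csv(
    raw,
    skiprows=1,                       # skip the descriptive line before the header
    usecols=["id", "name", "score"],  # keep only these columns
).iloc[:2]                            # keep only the first two data rows
```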



## 5. Generate & Edit Qualifiers

In [None]:
printmd("<b><span style='color:red'>If you see an error message, you probably haven't clicked 'Finish & Save Data' in the previous dataframe view.</span></b>")

ql.qualifier_editor()

In [None]:
# Local updated data frame
df = ql.updated_df
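
The qualifier editor suggests qualifiers by introspecting the data. The rules it applies are not shown here, so the sketch below only illustrates the general idea with a hypothetical heuristic: it appends `#multi` to columns whose values contain a `|` separator and `#number` to columns whose values all parse as numbers. The function name, the `#number` qualifier, and the separator are assumptions for this example.

```python
def guess_qualifier(values):
    """Guess a qualifier suffix for a column from its non-empty string values."""
    non_empty = [v.strip() for v in values if v is not None and v.strip()]
    if not non_empty:
        return ""
    if any("|" in v for v in non_empty):
        return "#multi"
    try:
        for v in non_empty:
            float(v)  # raises ValueError on the first non-numeric value
        return "#number"
    except ValueError:
        return ""

# Hypothetical columns read from a CSV file (all values are strings)
columns = {
    "age": ["34", "51", ""],
    "hobbies": ["chess|golf", "golf"],
    "city": ["Lima", "Oslo"],
}
renamed = {name: name + guess_qualifier(vals) for name, vals in columns.items()}
```

Automatic guesses like these can be wrong, for example for numeric identifiers, which is why the editor lets you review and approve the assignments.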

## 6. Geocoder: placenames to point coordinates (Optional)
Select a placename variable to generate Latitude and Longitude columns from it.

In [None]:
gt.geocoder(ql.stored_text)

In [None]:
# Local updated data frame
df = ql.updated_df
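
How `gt.geocoder` resolves placenames is not shown in this notebook. The sketch below illustrates only the shape of the operation, using a small hypothetical lookup table with approximate coordinates in place of a real geocoding service. All names in it are assumptions.

```python
# Hypothetical lookup table; a real geocoder would query a gazetteer service
GAZETTEER = {
    "san diego": (32.7157, -117.1611),
    "oslo": (59.9139, 10.7522),
}

def geocode_rows(rows, place_key):
    """Return copies of rows with Latitude/Longitude added (None when unmatched)."""
    out = []
    for row in rows:
        key = str(row.get(place_key, "")).strip().lower()
        lat, lon = GAZETTEER.get(key, (None, None))
        out.append({**row, "Latitude": lat, "Longitude": lon})
    return out

geocoded = geocode_rows([{"city": "Oslo"}, {"city": "Atlantis"}], "city")
```

Unmatched placenames are kept with empty coordinates rather than dropped, so no survey rows are lost.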

## 7. GeoJSON to Geometry (Optional)
Generate a 'geometry' column based on an external GeoJSON file. One of the feature properties in the GeoJSON file should contain feature names that match the feature names in the survey file.

In [None]:
file = pn.widgets.FileInput()
file

In [None]:
gt.json_to_geometry(file.value, ql.stored_text)

In [None]:
# Local updated data frame
df = ql.updated_df
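
The matching described above can be sketched with the standard library alone. This is not the `gt.json_to_geometry` implementation. The property name `NAME`, the row key `region`, and storing each geometry as a JSON string are assumptions made for this example.

```python
import json

# Hypothetical FeatureCollection with a NAME property on each feature
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "North"},
         "geometry": {"type": "Point", "coordinates": [10.0, 60.0]}},
        {"type": "Feature", "properties": {"NAME": "South"},
         "geometry": {"type": "Point", "coordinates": [10.0, 50.0]}},
    ],
})

def attach_geometry(rows, geojson_str, name_property, row_key):
    """Add a 'geometry' value (a JSON string, or '' when unmatched) to each row."""
    features = json.loads(geojson_str)["features"]
    by_name = {f["properties"].get(name_property): f["geometry"] for f in features}
    return [
        {**row, "geometry": json.dumps(by_name[row[row_key]]) if row[row_key] in by_name else ""}
        for row in rows
    ]

rows = attach_geometry([{"region": "North"}, {"region": "East"}], geojson_text, "NAME", "region")
```

Rows whose name has no matching feature keep an empty geometry, which makes mismatches between the two files easy to spot.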

## 8. Generate images based on text values
Create a set of images from the values of a selected text variable, for use with SuAVE.

In [None]:
si.image_display(df, ql.stored_text, full_notebook_url.split('/qualgeoimage')[0])

In [None]:
# Local updated data frame
df = ql.updated_df

## 9. Final Data
Explore the dataframe before generating a new survey from it.

In [None]:
df = ql.updated_df.fillna('')
panellibs.slider(df)

## 10. Generate a new survey and open it in SuAVE

In [None]:
if data_select.value == 'Import a local CSV file':
    csv_file = data_input.filename
    dzc_file = ''
    
# Save the final dataframe from step 9 (the name updated_df is not defined in this notebook)
new_file = suaveint.save_csv_file(df, absolutePath, csv_file)

In [None]:
#Input survey name

import ipywidgets as widgets
from IPython.display import display

input_text = widgets.Text(placeholder='Enter Survey Name...')
output_text = widgets.Text()

def bind_input_to_output(sender):
    output_text.value = input_text.value

# Tell the text input widget to call bind_input_to_output() on submit
input_text.on_submit(bind_input_to_output)

printmd("<b><span style='color:red'>Input survey name here, press Enter, and then run the next cell:</span></b>")
# Display input text box widget for input
display(input_text)

display(output_text)

In [None]:
#Print survey name
survey_name = output_text.value
printmd("<b><span style='color:red'>Survey Name is: </span></b>" + survey_name)

In [None]:
suaveint.create_survey(survey_url,new_file, survey_name, dzc_file, user, csv_file, view, views, data_select.value)

## 11. Explore the data frame with HoloViz

In [None]:
nb_name = 'holoviz/holoviz.ipynb'
# Build the HoloViz notebook URL, pointing it at the newly saved CSV file
url1 = ('{base_url}/operations/{nb_name}?surveyurl={survey_url}&user={user}'
        '&csv={csv}&dzc={dzc_file}&activeobject=null').format(
    base_url=base_url, nb_name=nb_name, survey_url=survey_url, user=user,
    csv=new_file.split('/')[-1], dzc_file=dzc_file)

printmd("<b><span style='color:red'>Click the URL to open the selected notebook:</span></b>")
print(url1)