<h1><span style="color:red">Data Preparation for SuAVE</span></h1>

Several data enhancement operations are included in this notebook:
* identifying the number of header rows, and rows and columns to keep or drop 
* assigning SuAVE qualifiers by introspecting the data (and letting you edit and approve the assignments)
* adding geographic coordinates (in WGS84) based on a selected variable with placenames
* adding geometric information based on a supplied GeoJSON file
* generating images based on a selected text variable

You can either enhance an existing survey dataset passed from SuAVE or load a local CSV file.

If you need to convert between the binary representation of multiple-response variables and SuAVE #multi variables, or to prepare ordinal scale variables for analysis, launch the corresponding notebook from step 2.
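To illustrate the two representations of a multiple-response variable, the sketch below collapses one-column-per-option binary answers into a single pipe-delimited value, which is the form the `#multi` columns take in the sample data in step 4, and expands it back. The helper names and the option columns are hypothetical, and the dedicated conversion notebooks may work differently.

```python
# Hypothetical helpers for illustration; they are not part of the SuAVE notebooks.
def binary_to_multi(row, option_columns, sep="|"):
    """Join the names of the options marked 1 in this row."""
    return sep.join(
        col for col in option_columns if str(row.get(col, "")).strip() == "1"
    )


def multi_to_binary(value, option_columns, sep="|"):
    """Expand a delimited value back into one 0/1 flag per option."""
    chosen = {part.strip() for part in value.split(sep) if part.strip()}
    return {col: int(col in chosen) for col in option_columns}


options = ["aquifer", "transboundary", "karst"]  # made-up option columns
row = {"Name": "respondent 1", "aquifer": 1, "transboundary": 1, "karst": 0}

multi_value = binary_to_multi(row, options)
print(multi_value)
print(multi_to_binary(multi_value, options))
```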

<h1><span style="color:red">Once you have retrieved and explored the data file, please run only the cells that you need!</span></h1>

Author: Enrique Sanchez

## 1. Retrieve survey parameters from the URL

In [1]:
%%javascript
function getQueryStringValue (key)
{  
    return unescape(window.location.search.replace(new RegExp("^(?:.*[&\\?]" + escape(key).replace(/[\.\+\*]/g, "\\$&") + "(?:\\=([^&]*))?)?.*$", "i"), "$1"));
}
IPython.notebook.kernel.execute("survey_url='".concat(getQueryStringValue("surveyurl")).concat("'"));
IPython.notebook.kernel.execute("views='".concat(getQueryStringValue("views")).concat("'"));
IPython.notebook.kernel.execute("view='".concat(getQueryStringValue("view")).concat("'"));
IPython.notebook.kernel.execute("user='".concat(getQueryStringValue("user")).concat("'"));
IPython.notebook.kernel.execute("csv_file='".concat(getQueryStringValue("csv")).concat("'")); 
IPython.notebook.kernel.execute("dzc_file='".concat(getQueryStringValue("dzc")).concat("'")); 
IPython.notebook.kernel.execute("params='".concat(getQueryStringValue("params")).concat("'")); 
IPython.notebook.kernel.execute("active_object='".concat(getQueryStringValue("activeobject")).concat("'")); 
IPython.notebook.kernel.execute("full_notebook_url='" + window.location + "'"); 

<IPython.core.display.Javascript object>
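The JavaScript cell above reads each parameter from the notebook URL's query string and assigns it to a kernel variable. For readers who prefer Python, the same lookup can be sketched with the standard library. The URL below is a made-up example, `get_query_value` is a hypothetical helper, and unlike the JavaScript regular expression this version matches keys case-sensitively.

```python
from urllib.parse import parse_qs, urlsplit


def get_query_value(url, key, default=""):
    """Return the first value of `key` in the URL's query string, or `default`."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return query.get(key, [default])[0]


# Made-up URL that mimics the parameters passed from SuAVE
example_url = (
    "https://example.org/notebooks/prep.ipynb"
    "?surveyurl=https://example.org/main/file=demo.csv"
    "&views=&view=grid&user=demo&csv=demo.csv&activeobject=null"
)

for name in ("surveyurl", "views", "view", "user", "csv", "dzc"):
    print(name, "=", repr(get_query_value(example_url, name)))
```

A parameter that is absent from the URL (here `dzc`) comes back as an empty string, which matches what the JavaScript version assigns.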

## 2. Import libraries, and select how to process the data

In [2]:
# common imports
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import Markdown, display

import pandas as pd
pd.set_option('display.max_colwidth', 0)
    
import numpy as np
import panel as pn

pn.extension()
def printmd(string):
    display(Markdown(string))

absolutePath = "/home/jovyan/jupyter-suave/temp_csvs/"

# local imports
import sys
sys.path.insert(1, '../../helpers')
import panel_libs as panellibs
import suave_integration as suaveint

# specific imports
import requests
import re

# Importing scripts
import FileScript as fs
import QualifierSuave as ql
import StringImageSuave as si
import GeoToolsSuave as gt

# Everything before '/operations' is the base URL of this Jupyter deployment
url_partitioned = full_notebook_url.partition('/operations')
base_url = url_partitioned[0]


<h2><span style="color:red">To launch a notebook for processing #multi and ordinal scale variables, make a selection and click the URL below</span></h2>
Otherwise, continue to step 3


In [3]:
radio_group = pn.widgets.RadioBoxGroup(name='Select notebook', options=['Convert binary variables to #multi', 
                                                                        'Convert #multi to binary',
                                                                        'Recode ordinal scale variables'], 
                                       inline=False)
radio_group

In [7]:
if radio_group.value == 'Convert binary variables to #multi':
    nb_name = "Binary_to_multi"
elif radio_group.value == 'Convert #multi to binary':
    nb_name = "Multi_to_binary"
elif radio_group.value == 'Recode ordinal scale variables':
    nb_name = "Ordinal_recode"
    
import webbrowser
# Format only the path template, so braces inside parameter values cannot break str.format
query = ('surveyurl=' + survey_url + '&views=' + views + '&view=' + view
         + '&user=' + user + '&csv=' + csv_file + '&dzc=' + dzc_file
         + '&activeobject=' + active_object)
url1 = '{base_url}/operations/wrangling/{nb_name}.ipynb?'.format(base_url=base_url, nb_name=nb_name) + query

printmd("<b><span style='color:red'>Click the URL to open the selected notebook:</span></b>")
print(url1)

# webbrowser.open(url1)


<b><span style='color:red'>Click the URL to open the selected notebook:</span></b>

https://jupyter-suave.nrp-nautilus.io/user/jkaminsky@ucsd.edu/notebooks/jupyter-suave/operations/wrangling/Binary_to_multi.ipynb?surveyurl=https://suave-net.sdsc.edu/main/file=joeykaminsky2_Tester_13.csv&views=&view=grid&user=joeykaminsky2&csv=joeykaminsky2_Tester_13.csv&dzc=https://dzgen.sdsc.edu/dzgen/lib-staging-uploads/6eb3af87e3c855ed01cdaad5591b4722/content.dzc&activeobject=null
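The link above is assembled by plain string concatenation, so values containing spaces, `&`, or `=` pass through unescaped. A possible alternative, sketched here with the standard library, percent-encodes each value. The helper `build_notebook_url` and all parameter values are hypothetical, and whether the target notebook expects encoded values is an assumption (its JavaScript `unescape` call decodes `%XX` sequences but not `+`, hence `quote_via=quote`).

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_notebook_url(base_url, nb_path, params):
    """Return base_url/operations/nb_path?key=value&... with encoded values."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urlencode(params, quote_via=quote)
    return "{}/operations/{}?{}".format(base_url, nb_path, query)


# Made-up values standing in for the variables set in step 1
params = {
    "surveyurl": "https://example.org/main/file=demo.csv",
    "views": "",
    "view": "grid",
    "user": "demo user",
    "csv": "demo.csv",
    "dzc": "",
    "activeobject": "null",
}

url = build_notebook_url(
    "https://example.org/user/demo/notebooks/jupyter-suave",
    "wrangling/Binary_to_multi.ipynb",
    params,
)
print(url)

# Decoding the query string recovers the original values
decoded = {
    key: values[0]
    for key, values in parse_qs(urlsplit(url).query, keep_blank_values=True).items()
}
print(decoded == params)
```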


## 3. Select a survey file from SuAVE or import a local CSV file

In [4]:
data_select = pn.widgets.RadioBoxGroup(
    name='Select data source',
    options=['Load survey file from SuAVE', 'Import a local CSV file'],
    inline=False)
data_select

In [5]:
data_input = pn.widgets.FileInput()
    
def check_selection():
    if data_select.value == 'Load survey file from SuAVE':
        global fname
        fname = absolutePath + csv_file
        printmd("<b><span style='color:red; font-size: 200%;'>Current SuAVE survey will be loaded. Continue to step 4.</span></b>")

    else:
        message = pn.pane.HTML("<b><span style='color:red; font-size: 200%;'>Upload data and continue to step 4.</span><br><span style='font-size: 150%;'>IMPORTANT: The local CSV file should not have SuAVE-specific variable names!</span></b>", width=700)
        return pn.Column(message, data_input)
    
check_selection()

<b><span style='color:red; font-size: 200%;'>Current SuAVE survey will be loaded. Continue to step 4.</span></b>

## 4. Explore the data and define the dataframe to work with

In [10]:
if not pd.isnull(data_input.filename):
    fname = absolutePath + data_input.filename
    data_input.save(fname)
df = fs.updated_df
# visualize the dataframe
with pd.option_context("display.max_columns", None):
    if any("geometry" in col for col in df.columns):
        display(df.drop(['geometry'],axis=1))
    else:
        display(df)
    


Unnamed: 0,Name,OAID#link#multi,Affiliation#sortquan,City#sortquan,Region#sortquan,Country#sortquan,Latitude#hidden,Longitude#hidden,Collaborators#multi#link#sortquan,Scope#multi#sortquan,Keywords#multi#sortquan,OA concepts#multi#sortquan,Publications#hidden,Publication Dates#multi#sortquan,#img,#netvis
0,A Olioso,https://openalex.org/A4227955457,Unknown,,,,,,https://openalex.org/A4227955454|https://openalex.org/A4227955461|https://openalex.org/A4227955455|https://openalex.org/A4227955463|https://openalex.org/A4227955453|https://openalex.org/A4227955464|https://openalex.org/A4227955456|https://openalex.org/A4227955462|https://openalex.org/A4227955460|https://openalex.org/A4227955459|https://openalex.org/A4227955452|https://openalex.org/A4227955458,aquifer|transboundary,,Groundwater|Geology|Geotechnical engineering|Hydrology (agriculture)|Aquifer|Environmental science|Computer science|Water resource management,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A4227955457"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2021,US,02ac504b6e11517e2110d174ea70a1a7ac1cf19899e1a0f23c29558f6225db03
1,A Olioso,https://openalex.org/A4226682424,Unknown,,,,,,https://openalex.org/A4226682420|https://openalex.org/A4226682425|https://openalex.org/A4226682421|https://openalex.org/A4226682429|https://openalex.org/A4226682431|https://openalex.org/A4226682426|https://openalex.org/A4226682422|https://openalex.org/A4226682428|https://openalex.org/A4226682427|https://openalex.org/A4226682419|https://openalex.org/A4226682423|https://openalex.org/A4226682430,aquifer|transboundary,,Groundwater|Geology|Geotechnical engineering|Hydrology (agriculture)|Aquifer|Environmental science|Water resource management,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A4226682424"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2021,US,8f95a1d08aacc416f1abe22426fe9c9fd2f8f338bb7365407f284e3985165d23
2,A. Alassane,https://openalex.org/A2484425674,Cheikh Anta Diop University,Dakar,,Senegal,14.686944,-17.463333,https://openalex.org/A2434763705|https://openalex.org/A3069707669|https://openalex.org/A2182351332|https://openalex.org/A3051995119,aquifer|transboundary,,Sociology|Population|Water supply|Demography|Groundwater|Water quality|Ecology|Geology|Geotechnical engineering|Groundwater recharge|Hydrology (agriculture)|Environmental engineering|Aquifer|Biology|Environmental science|Water resource management,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A2484425674"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2010,US,f55f8f5c25002f0f2a2e121be602248623f07494d5161a13d399ab12aa746bac
3,A. Aureli,https://openalex.org/A2422334401,Unknown,,,,,,https://openalex.org/A2304341794|https://openalex.org/A2182540860,aquifer|transboundary,,Karst|Biology|Tourism|Business|Environmental planning|Archaeology|Environmental science|Groundwater|Geotechnical engineering|Water resources|Water resource management|Environmental resource management|Environmental protection|Law|Ecology|Engineering|Multidisciplinary approach|Aquifer|Geography|Political science,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A2422334401"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2010,US,fee72a7c6e6595abd9a1fe8878cb3c9be76652d50b3986c6b1cb4ac610869e76
4,A. Aureli,https://openalex.org/A3086349667,Unknown,,,,,,https://openalex.org/A3085518772|https://openalex.org/A3085940897|https://openalex.org/A3086175637|https://openalex.org/A3086707070|https://openalex.org/A3216340081|https://openalex.org/A3084770820|https://openalex.org/A3085504345,aquifer|transboundary,,Environmental resource management|Groundwater|Environmental planning|Geology|Geotechnical engineering|Hydrology (agriculture)|Aquifer|Environmental science|Water resource management,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A3086349667"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2015,US,bd15eb707485fff043de670a9d011e35d88324f1f4113701f894b92ce4c64d88
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1372,Ľudovít Molnár,https://openalex.org/A2292818572,Unknown,,,,,,https://openalex.org/A2291355566|https://openalex.org/A2292165767,aquifer|transboundary,,Oceanography|Structural basin|Inflow|Cartography|Groundwater|Geology|Tributary|Geotechnical engineering|Hydrology (agriculture)|Aquifer|Alluvium|Environmental science|Geomorphology|Geography,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A2292818572"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2005,US,
1373,Ž. Pekaš,https://openalex.org/A2491145178,Unknown,,,,,,https://openalex.org/A2478352227|https://openalex.org/A2641079011|https://openalex.org/A3114708001|https://openalex.org/A2061648226,aquifer|transboundary,,Karst|Business|Archaeology|Environmental planning|Process management|Geography,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A2491145178"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2016,US,46e5ed3ab9553bfac4c049dec30c1d226638344208cecd69b71825f0a43c111b
1374,Želimir Pekaš,https://openalex.org/A4267790636,Unknown,,,,,,https://openalex.org/A4267790634|https://openalex.org/A4267790637|https://openalex.org/A4267790638|https://openalex.org/A4267790639|https://openalex.org/A4267790635,aquifer|transboundary,,Paleontology|Ideal (ethics)|Civil engineering|Karst|Groundwater|Law|Water resource management|Geology|Engineering|Geotechnical engineering|Hydrology (agriculture)|Aquifer|Environmental science|Geography|Political science,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A4267790636"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2016,US,ae12a61c22472adc53867b99f9e307e4e45871a8cc14c841b752512b2835f1a9
1375,Željko KramariÄ,https://openalex.org/A2591057633,Unknown,,,,,,https://openalex.org/A3200145164|https://openalex.org/A1183704316,aquifer|transboundary,,Virology|Karst|Groundwater|Business|Environmental planning|Archaeology|Geology|Geotechnical engineering|Replication (statistics)|Hydrology (agriculture)|Aquifer|Biology|Environmental science|Geography|Water resource management,"<a href='#' onClick='javascript:getPublication({oaids:""https://openalex.org/A2591057633"",search:""Keywords,Scope"",OAConcepts:""OA concepts""})'>Show publications</a>",2012,US,fe94c788e0bd99c18f77c0f32e6e3976c77e2c0b1e420fead4293dda6417de75


In [8]:
# Define a dataframe subset
fs.view_data(fname)
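
The internals of `fs.view_data` are not shown in this notebook. As a rough sketch of the choices it asks for (how many header rows the file has, and which rows and columns to keep), the standard-library example below applies them to a small in-memory CSV. The function `load_subset`, its parameters, and the "Survey export" title line are hypothetical; the names and places come from the table above.

```python
import csv
import io


def load_subset(text, header_rows=1, keep_columns=None, drop_rows=()):
    """Parse CSV text, treating the last of `header_rows` (>= 1) as column names.

    keep_columns: names of columns to keep (None keeps all).
    drop_rows: zero-based indexes of data rows to discard.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[header_rows - 1]
    body = rows[header_rows:]
    keep = [
        i for i, name in enumerate(header)
        if keep_columns is None or name in keep_columns
    ]
    return [
        {header[i]: row[i] for i in keep}
        for n, row in enumerate(body)
        if n not in drop_rows
    ]


# Two header rows: a title line, then the column names
sample = (
    "Survey export\n"
    "Name,City,Country\n"
    "A. Alassane,Dakar,Senegal\n"
    "A. Aureli,,\n"
)

records = load_subset(sample, header_rows=2, keep_columns={"Name", "City"}, drop_rows={1})
print(records)
```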



## 5. Generate & Edit Qualifiers

In [12]:
printmd("<b><span style='color:red'>If you see an error message, you probably haven't clicked 'Finish & Save Data' in the previous dataframe view.</span></b>")

ql.qualifier_editor()

<b><span style='color:red'>If you see an error message, you probably haven't clicked 'Finish & Save Data' in the previous dataframe view.</span></b>

In [13]:
# Local updated data frame
df = ql.updated_df
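
How `ql.qualifier_editor` chooses its suggestions is not visible here. The sketch below shows one way qualifiers could be guessed from column values, using only two qualifiers that appear in the sample header in step 4 (`#multi` and `#link`). The heuristics and the function `guess_qualifiers` are assumptions for illustration, not the notebook's actual rules.

```python
def guess_qualifiers(values):
    """Suggest a qualifier suffix for a column from its string values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return ""
    qualifiers = ""
    # Assumption: a pipe inside any value marks a multiple-response column
    if any("|" in v for v in non_empty):
        qualifiers += "#multi"
    # Assumption: a column whose every item is a URL is a link column
    parts = [p for v in non_empty for p in v.split("|")]
    if all(p.startswith(("http://", "https://")) for p in parts):
        qualifiers += "#link"
    return qualifiers


# Values taken from the sample data shown in step 4
columns = {
    "Name": ["A. Alassane", "A. Aureli"],
    "Scope": ["aquifer|transboundary", "aquifer"],
    "Collaborators": [
        "https://openalex.org/A2304341794|https://openalex.org/A2182540860",
        "",
    ],
}

renamed = {name: name + guess_qualifiers(vals) for name, vals in columns.items()}
print(renamed)
```

Suggestions like these would still need the manual review that the editor above provides.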

## 6. Geocoder: placenames to point coordinates (Optional)
Select a placename variable and generate Latitude and Longitude columns

In [14]:
gt.geocoder(ql.stored_text)

In [15]:
# Local updated data frame
df = ql.updated_df
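
The geocoding step turns a placename column into Latitude and Longitude columns. The sketch below shows that transformation in isolation, with a hard-coded dictionary standing in for the geocoding service. The function `add_coordinates` and the `GAZETTEER` lookup are hypothetical, and which service `gt.geocoder` calls is not shown in this notebook. The Dakar coordinates are the ones in the sample data.

```python
# Stand-in for a geocoding service: placename -> (latitude, longitude) in WGS84
GAZETTEER = {"dakar": (14.686944, -17.463333)}


def add_coordinates(rows, place_column, gazetteer):
    """Return copies of rows with Latitude/Longitude added (blank if unmatched)."""
    result = []
    for row in rows:
        key = (row.get(place_column) or "").strip().lower()
        lat, lon = gazetteer.get(key, ("", ""))
        result.append({**row, "Latitude": lat, "Longitude": lon})
    return result


rows = [
    {"Name": "A. Alassane", "City": "Dakar"},
    {"Name": "A. Aureli", "City": ""},
]

geocoded = add_coordinates(rows, "City", GAZETTEER)
for row in geocoded:
    print(row)
```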

## 7. GeoJSON to Geometry (Optional)
Generate a 'geometry' column based on an external GeoJSON file. One of the feature properties in the GeoJSON file should contain feature names that match the feature names in the survey file.

In [16]:
file = pn.widgets.FileInput()
file

In [17]:
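# file.value stays None until a GeoJSON file is uploaded in the previous cell.
# Calling gt.json_to_geometry with None raises
# "TypeError: the JSON object must be str, bytes or bytearray, not NoneType".
if file.value is None:
    printmd("<b><span style='color:red'>Upload a GeoJSON file in the previous cell, then rerun this cell.</span></b>")
else:
    gt.json_to_geometry(file.value, ql.stored_text)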

In [None]:
# Local updated data frame
df = ql.updated_df
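
The matching described in this step, where a feature property in the GeoJSON file is joined to a name column in the survey, can be sketched with the standard library as follows. The functions, the `name` property, and the one-feature collection are hypothetical. Storing the geometry as a JSON string is also an assumption, since the format that `gt.json_to_geometry` writes is not shown here.

```python
import json


def geometry_by_name(geojson_text, name_property):
    """Map each feature's name property to its geometry object."""
    collection = json.loads(geojson_text)
    return {
        feature["properties"].get(name_property): feature["geometry"]
        for feature in collection.get("features", [])
    }


def add_geometry(rows, name_column, lookup):
    """Return copies of rows with a 'geometry' column (blank if no match)."""
    result = []
    for row in rows:
        geometry = lookup.get(row.get(name_column))
        result.append({**row, "geometry": json.dumps(geometry) if geometry else ""})
    return result


# Made-up FeatureCollection; GeoJSON coordinates are [longitude, latitude]
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Dakar"},
        "geometry": {"type": "Point", "coordinates": [-17.463333, 14.686944]},
    }],
})

lookup = geometry_by_name(geojson_text, "name")
with_geometry = add_geometry([{"City": "Dakar"}, {"City": "Unknown"}], "City", lookup)
for row in with_geometry:
    print(row)
```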

## 8. Generate images based on text values
Creates a set of images based on a selected variable for use with SuAVE

In [18]:
si.image_display(df, ql.stored_text, full_notebook_url.split('/qualgeoimage')[0])

In [19]:
# Local updated data frame
df = ql.updated_df
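
To show the idea of generating one image per distinct text value, the sketch below renders each value as a small SVG using only the standard library. SVG is a swapped-in technique chosen so the example has no dependencies; the image format and layout that `si.image_display` produces are not shown in this notebook, and `text_to_svg` is a hypothetical helper.

```python
from html import escape


def text_to_svg(text, width=300, height=100):
    """Return an SVG document with `text` centered on a white background."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}">'
        '<rect width="100%" height="100%" fill="white"/>'
        '<text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle" '
        'font-family="sans-serif" font-size="20">{t}</text></svg>'
    ).format(w=width, h=height, t=escape(text))


values = ["Dakar", "Unknown", "Dakar"]  # made-up values of the selected variable

# One image per distinct value, so repeated values share an image
images = {value: text_to_svg(value) for value in sorted(set(values))}
for value, svg in images.items():
    print(value, len(svg))
```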

## 9. Final Data
Explore the dataframe before generating a new survey from it

In [20]:
df = ql.updated_df.fillna('')
panellibs.slider(df)

## 10. Generate a new survey and open it in SuAVE

In [None]:
if data_select.value == 'Import a local CSV file':
    csv_file = data_input.filename
    dzc_file = ''
    
# `updated_df` is never defined in this notebook; save the final dataframe from step 9
new_file = suaveint.save_csv_file(df, absolutePath, csv_file)

In [None]:
# Input survey name
# (widgets and display are already imported in step 2)
input_text = widgets.Text(placeholder='Enter Survey Name...', continuous_update=False)
output_text = widgets.Text()

def bind_input_to_output(change):
    output_text.value = change['new']

# With continuous_update=False the value changes when Enter is pressed (or the
# box loses focus); on_submit is deprecated in recent ipywidgets releases.
input_text.observe(bind_input_to_output, names='value')

printmd("<b><span style='color:red'>Input survey name here, press Enter, and then run the next cell:</span></b>")
# Display input text box widget for input
display(input_text)

display(output_text)

In [None]:
#Print survey name
survey_name = output_text.value
printmd("<b><span style='color:red'>Survey Name is: </span></b>" + survey_name)

In [None]:
suaveint.create_survey(survey_url, new_file, survey_name, dzc_file, user, csv_file, view, views, data_select.value)

## Explore the data frame with HoloViz

In [None]:
nb_name = 'holoviz/holoviz.ipynb'
# Format only the path template, so braces inside parameter values cannot break str.format
query = ('surveyurl=' + survey_url + '&user=' + user
         + '&csv=' + new_file.split('/')[-1] + '&dzc=' + dzc_file
         + '&activeobject=null')
url1 = '{base_url}/operations/{nb_name}?'.format(base_url=base_url, nb_name=nb_name) + query

printmd("<b><span style='color:red'>Click the URL to open the selected notebook:</span></b>")
print(url1)