# SDL.2 Using External Scripts & Modules 

### Objective: Learn how to apply an external script on data within Data Lab. Once the script is applied, send the calculated data to Workbench.

### Scenario: You have developed an algorithm that calculates the health of a Cooling Tower from the temperature data of each of its areas. In this notebook, you will apply the `complex_health_score` function to that temperature data to get a health score for the Cooling Towers.


## Step 1: Upload Script to the Directory

Upload the `advanced_datascience_algorithms.py` script to the directory we are working in. Then, import it into our current notebook as shown below.


In [None]:
# spy is used by the later cells to pull, push, and schedule
from seeq import spy

import advanced_datascience_algorithms as ada

<div class="alert alert-block alert-info">
<b>Tip:</b> If the script is used frequently, you can package it as a module and install it across multiple projects.
</div>
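A practical note when working with an uploaded script: Python caches a module after its first import, so re-running the import cell after you edit or re-upload the script does not pick up the changes. The sketch below illustrates this with a throwaway module written to a temporary folder; `demo_algorithms` and `VERSION` are hypothetical names used only for the demonstration, not part of the course script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module (hypothetical stand-in for an uploaded script)
workdir = Path(tempfile.mkdtemp())
script = workdir / 'demo_algorithms.py'
script.write_text('VERSION = 1\n')
sys.path.insert(0, str(workdir))

import demo_algorithms as demo
first_version = demo.VERSION

# Simulate editing the script on disk after it has been imported
script.write_text('VERSION = 2  # edited\n')

# A plain `import` would return the cached module; reload re-executes the file
demo = importlib.reload(demo)
second_version = demo.VERSION

print(first_version, second_version)
```

In this notebook the equivalent call would be `importlib.reload(ada)`; restarting the kernel has the same effect.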

***

## Step 2: Pull Data into Data Lab from a Worksheet

The Data Science/Engineering Team has informed us that the Health Score Algorithm requires Temperature tags for all Areas in a Cooling Tower. 

In SDL.1, the required data was created for Cooling Tower 1. Update the display range to the last 7 days and copy the URL of the <tt>Push Calculated Data</tt> (or <tt>Push Multiple Formulas</tt>) worksheet.

If you did not complete SDL.1, perform the following steps:
1. Open a new Workbench
2. Add the `Area A_Temperature`, `Area B_Temperature`, and `Area C_Temperature` signals from the `Example Data` datasource
3. (Optional): Cleanse `Area C_Temperature` using formula: `$signal.remove((($signal < 0).merge(15min))).agilefilter(5min)`

In [None]:
workbook_url = 'PASTE URL HERE'

cleansed_data = spy.pull(workbook_url, grid='12h')
cleansed_data.head()

<div class="alert alert-block alert-warning">
    <b>Discussion Topic:</b> What will be the data range of this <tt>spy.pull()</tt>? How can we change it?
</div>
 

In SDL.1, we used a dictionary to search for the signals before pulling the data. Here, we pass the worksheet URL as a string, which finds the signals and pulls their data in one step.
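As far as the SPy documentation describes, `spy.pull()` also accepts `start` and `end` keyword arguments, and when neither is given with a worksheet URL the worksheet's display range is used. The sketch below builds an explicit 7-day range with the standard library; `last_n_days` is a hypothetical helper, and the `spy.pull()` call is left commented out because it needs a live Seeq connection.

```python
from datetime import datetime, timedelta, timezone


def last_n_days(days, now=None):
    """Return start/end keyword arguments covering the last `days` days."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    # ISO 8601 strings can be parsed by pandas.to_datetime
    return {'start': start.isoformat(), 'end': end.isoformat()}


pull_range = last_n_days(7)
print(pull_range)

# In Data Lab, with workbook_url defined (assumes start/end are accepted as described above):
# cleansed_data = spy.pull(workbook_url, grid='12h', **pull_range)
```

Passing the range explicitly makes the notebook independent of whatever display range the worksheet happens to have when it runs.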

***

## Step 3: Call the `complex_health_score` function
Call the `complex_health_score` function in `ada` and store the result as `health_score`.
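The internals of `complex_health_score` are not shown in this notebook. Purely to illustrate the shape of data such a function works with (a DataFrame of temperature signals in, a DataFrame with a `Health Score` column out), here is a hypothetical stand-in. The scoring rule, the threshold of 20 degrees, and the sample values are all invented for illustration and are not the course algorithm.

```python
import pandas as pd


def complex_health_score_sketch(temperatures: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in, NOT the algorithm in advanced_datascience_algorithms.py."""
    # Spread between the hottest and coldest area at each timestamp
    spread = temperatures.max(axis=1) - temperatures.min(axis=1)
    # Map the spread to a 0-100 score: no spread -> 100, spread of 20 or more -> 0
    score = (100 - 5 * spread).clip(lower=0, upper=100)
    return score.to_frame(name='Health Score')


# Invented sample data on a 12-hour grid, mimicking the pulled DataFrame
index = pd.date_range('2024-01-01', periods=3, freq=pd.Timedelta(hours=12), tz='UTC')
sample = pd.DataFrame({
    'Area A_Temperature': [70.0, 72.0, 75.0],
    'Area B_Temperature': [71.0, 80.0, 75.0],
    'Area C_Temperature': [69.0, 60.0, 75.0],
}, index=index)

sketch_score = complex_health_score_sketch(sample)
sketch_score = sketch_score.rename(columns={'Health Score': 'CT1 Health Score'})
print(sketch_score)
```

Keeping the timestamp index and returning one named column per output signal is what lets the result be renamed and pushed in the following steps.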

In [None]:
health_score = ada.complex_health_score(cleansed_data)

# Rename the output signal so it identifies Cooling Tower 1
health_score = health_score.rename(columns={'Health Score': 'CT1 Health Score'})

# See the top 5 rows of the data table
health_score.head()

***
## Step 4: Push the Health Score back to Seeq

Push the `health_score` calculated using the algorithm from the Engineering/Data Science Team back into Seeq, where it can be used, for instance, in a dashboard.

In [None]:
spy.push(data=health_score, worksheet='Health Score')

<div class="alert alert-block alert-warning">
    <b>Discussion Topic:</b> When should you perform calculations in Workbench vs Data Lab?
</div>

***

## Step 5: Schedule job to Run Periodically (Optional) 

`spy.jobs.schedule()` schedules the notebook to run periodically in the background. For more information, see the [spy.jobs documentation](https://python-docs.seeq.com/user_guide/spy.jobs.html).

Let's rerun this calculation every 7 days to view the new data.
 
<div class="alert alert-block alert-danger">
⚠<b> Warning </b>⚠ Scheduling jobs should be done with great care. Jobs consume CPU, memory, and disk space resources and can easily cause degraded performance on Seeq Server, Seeq Data Lab or an external system that you may be accessing.
</div>

<div class="alert alert-block alert-info">
<b>Tip:</b> If you are scheduling anything to run more frequently than every 15 minutes, discuss it with your Seeq administrator.
</div>



In [None]:
spy.jobs.schedule('every 7 days')

# The following line unschedules the job, so it is removed the first time the scheduled notebook runs.
# This is done so we don't use unnecessary resources on the learn server.
spy.jobs.unschedule()

***

## SDL.2: Summary

<div class="alert alert-block alert-warning">
    <b>Discussion Topic:</b> Are you a data scientist, or do you work with one? How can you use this example in your organization?
        <details>    
     <summary>✼</summary>
<i>This example demonstrates how data science teams can collaborate with engineering for code-based analysis.  An algorithm can be developed outside of Seeq and the results can be returned to Workbench and Organizer to be viewed by engineers, operators, leadership, and others.  This workflow effectively operationalizes data science/code products and streamlines cooperation between these distinct groups within your organization.</i>
    </details>       
</div>

The four steps above have been combined into one cell. In this cell, we will calculate the health score for Cooling Tower 2. Copy the URL of the worksheet created in the **SDL.1 Summary**, titled Push Multiple Formulas CT2, and paste it below.


In [None]:
# Step 1: Upload Script to the Directory
import advanced_datascience_algorithms as ada

# Step 2: Pull data into Data Lab from a Worksheet
cleansed_data = spy.pull('PASTE URL HERE', grid='12h')

# Step 3: Call the `complex_health_score` function
health_score = ada.complex_health_score(cleansed_data)
health_score = health_score.rename(columns={'Health Score': 'CT2 Health Score'})

# Step 4: Push the Health Score back to Seeq
spy.push(data=health_score, worksheet='Health Score')

# Step 5: Schedule job to Run Periodically

### Bonus: Perform a `spy.search()` on a URL


<div class="alert alert-block alert-warning">
    <b>Discussion Topic:</b> How does the output of <tt>spy.search()</tt> differ when it is given a URL instead of a dictionary of query parameters?
</div>
 

In [None]:
cleansed_signals = spy.search(workbook_url)
cleansed_signals.head()