## LASI'17 Writing Analytics Workshop

This is the Jupyter notebook for the 2017 LASI Writing Analytics workshop. It is a starter that demonstrates some key ideas in writing analytics, and it is meant to be extended so that you can try these ideas on your own work.

This notebook uses the Scala kernel, but all of the examples could be implemented in Python, R, or any other language with a Jupyter kernel.

### Accessing data

We need to be able to access our raw data, send it for processing, and store the results. The two objects below provide helper methods to:
- Read from and write to the file system
- Talk to the TAP API

#### Access the file system

In [1]:
object FileSys {
    
    import ammonite.ops._
    
    val IN_DIR_NAME = "input_files"
    val OUT_DIR_NAME = "output_files"
    
    val thisDir = pwd
    val inputFileDir = thisDir/IN_DIR_NAME
    val outputFileDir = thisDir/OUT_DIR_NAME
    
    // List this directory, skipping files and directories whose names start with '.'
    def listThisDir = ls(thisDir).filterNot(_.last.startsWith("."))
    
    def resetOutputDir = {
        rm(outputFileDir)
        mkdir(outputFileDir)
    }
    
    def listInputFiles = {
        ls(inputFileDir) |? (_.ext == "txt")
    }
    
    def firstFile = listInputFiles.head
    
    def getLinesForFile(filepath:Path) = read.lines(filepath)
}

defined object FileSys

#### Access the Text Analytics Pipeline (TAP)

In [11]:
object Tap {
    import scalaj.http._
    
    val API_URL = "https://b9yiddda69.execute-api.ap-southeast-2.amazonaws.com/initialtest/v1"
    val HEALTH_URL = API_URL+"/health"
    
    case class Message(message:String)

    def serverDetails = Http(API_URL).asString

    def getHealthMessage = {
        println(s"Connecting to $HEALTH_URL")
        val response = Http(HEALTH_URL).asString
        println(response)
        upickle.default.read[Message](response.body)
    }

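    // Healthy only if the health endpoint returns the message "ok";
    // any exception (network failure, unparseable body) counts as unhealthy
    def serverIsHealthy = {
        try { getHealthMessage.message == "ok" }
        catch { case e: Exception =>
            println(s"There was a problem with the server: $e")
            false
        }
    }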
}

defined object Tap

#### Testing data access - file system

In [12]:
//Remove an output dir if it exists and recreate
FileSys.resetOutputDir

//Check if it's there
show(FileSys.listThisDir)

//If we need to write temporary files
//val tempDir = tmp.dir()

//Get a list of all text files in the input directory
val inputFiles = FileSys.listInputFiles

List(
  /Users/andrew/Documents/development/_projects/Notebooks/LASI-17/LASI-17 Writing Analytics Workshop.ipynb,
  /Users/andrew/Documents/development/_projects/Notebooks/LASI-17/README.md,
  /Users/andrew/Documents/development/_projects/Notebooks/LASI-17/input_files,
  /Users/andrew/Documents/development/_projects/Notebooks/LASI-17/output_files
)


inputFiles: Seq[ammonite.ops.Path] = List(
  /Users/andrew/Documents/development/_projects/Notebooks/LASI-17/input_files/pharm-sample.txt
)

#### Testing data access - TAP

In [13]:
//Tap.serverDetails
if(Tap.serverIsHealthy) println("The server is healthy")
else println("The server is not working")

Connecting to https://b9yiddda69.execute-api.ap-southeast-2.amazonaws.com/initialtest/v1/health
HttpResponse({"message": "Network error communicating with endpoint"},504,Map(Connection -> Vector(keep-alive), Content-Length -> Vector(56), Content-Type -> Vector(application/json), Date -> Vector(Thu, 27 Apr 2017 08:55:20 GMT), Status -> Vector(HTTP/1.1 504 Gateway Timeout), Via -> Vector(1.1 034e630e8674ac0b9c29358c6c3163d4.cloudfront.net (CloudFront)), X-Amz-Cf-Id -> Vector(9q7x2w3HUphKaA--JIEOsUY8_xDP8q_jnt6JDfWW6kiogw9Y91-S8A==), x-amzn-RequestId -> Vector(3debc5fd-2b27-11e7-a6cb-bb69941cde3f), X-Cache -> Vector(Error from cloudfront)))
The server is not working


#### Putting the two together

Let's read a file from the file system (the input_files directory), submit it to the TAP API, and then save the results back to the file system (the output_files directory).
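
The read, submit, and save steps can be sketched as a single helper. This is a minimal sketch under stated assumptions, not the workshop's implementation: `Pipeline`, `processFile`, and the `.out` file suffix are hypothetical names introduced here, and the `analyse` parameter stands in for a call to a TAP endpoint, since no analysis endpoint has been defined so far. It uses `java.nio` rather than `ammonite.ops` so that it runs without extra dependencies.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper, not part of the FileSys or Tap objects above
object Pipeline {

    // Read inFile, pass its text to `analyse`, and write the result into outDir.
    // `analyse` is an assumed stand-in for a request to a TAP endpoint.
    // Returns the path of the file that was written.
    def processFile(inFile: Path, outDir: Path, analyse: String => String): Path = {
        val text = new String(Files.readAllBytes(inFile), StandardCharsets.UTF_8)
        val result = analyse(text)
        // Create the output directory if it does not exist yet
        Files.createDirectories(outDir)
        // Assumed naming scheme: the input file name plus ".out"
        val outFile = outDir.resolve(inFile.getFileName.toString + ".out")
        Files.write(outFile, result.getBytes(StandardCharsets.UTF_8))
        outFile
    }
}
```

Passing the analysis step as a function keeps the file handling testable while the server is unavailable: a local function such as `_.toUpperCase` can be supplied first and swapped for a real TAP call later.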