## Reflecting, writing, and analytics: What can we learn from student text as data?
### HERN Workshop - 27 Nov 2017

What does a student’s language say about their learning? When students put their personal thoughts into words, what does it reveal about them, their thinking, and their interactions with others? In this workshop we will explore some of the ways reflective writing can be used for learning, and take an introductory look at how computational analysis can uncover meaningful aspects of that writing. During the workshop we will experiment with a couple of tools for analysing writing, examine some cases of how these tools have been used for learning, and establish some important principles for using writing analytics in a learning and teaching context.

### A couple of RWA examples 

#### Academic Reflective Writing

[Academic Writing Analytics (AWA)](http://awa.uts.edu.au/) 

- Login using AAF
- Try exemplar reflections
- Look at the theoretical framework for reflective writing
- Examine how the framework links through to the feedback

#### Reflection and Metacognition

[Towards the Discovery of Learner Metacognition From Reflective Writing](http://nlytx.io/2016/metacognition/index.html) 

- Try different examples and view the features
- Look at the theoretical link between metacognition and reflection
- Examine how the theory translates to the features

### DIY Reflective Writing Analytics

A basic Reflective Writing Analytics (RWA) task, step by step using the [Text Analytics Pipeline (TAP)](http://tap-test.utscic.edu.au)

#### The Task - Group Efficacy

Consider a large cohort of students undertaking an assignment in small groups. Most work is undertaken outside of face-to-face time, and therefore monitoring group interaction is not practical.

Suppose we wish to identify which groups are functioning well and which groups are having problems so that we can intervene early.

**The Writing:** *Students use [GoingOK](http://goingok.org) to write short personal reflections about their group work after each group interaction (or at least once per week).*

#### [1] Setup code necessary to do our analytics

Before we can perform the analytics, we need to import some additional software and set up some helper functions. The following code blocks do this.

In [67]:
/* DEPENDENCIES */

//Load additional software to make a web connection to TAP and decode the JSON response
%AddDeps org.scalaj scalaj-http_2.11 2.3.0
%AddDeps org.json4s json4s-jackson_2.11 3.5.3

//Import the libraries that we are going to use into this notebook
import scalaj.http._                            // Handle web connection to TAP
import org.json4s._                             // Encode and decode JSON
import org.json4s.jackson.JsonMethods._
import scala.io.Source                          // Read from filesystem
import org.apache.toree.magic.{CellMagicOutput, CellMagic}   //Display custom HTML in the notebook
import org.apache.toree.kernel.protocol.v5.{Data, MIMEType}

Marking org.scalaj:scalaj-http_2.11:2.3.0 for download
Preparing to fetch from:
-> file:/tmp/toree_add_deps244737840313834164/
-> https://repo1.maven.org/maven2
-> New file at /tmp/toree_add_deps244737840313834164/https/repo1.maven.org/maven2/org/scalaj/scalaj-http_2.11/2.3.0/scalaj-http_2.11-2.3.0.jar
Marking org.json4s:json4s-jackson_2.11:3.5.3 for download
Preparing to fetch from:
-> file:/tmp/toree_add_deps244737840313834164/
-> https://repo1.maven.org/maven2
-> New file at /tmp/toree_add_deps244737840313834164/https/repo1.maven.org/maven2/org/json4s/json4s-jackson_2.11/3.5.3/json4s-jackson_2.11-3.5.3.jar


In [68]:
/* HELPER FUNCTIONS */

// Get a list of lines (paragraphs) from a text file in the example_text directory
def getLinesFromFile(name: String) = Source.fromFile(s"example_text/$name").getLines.toList

// Output HTML to the notebook after current cell
def displayHtml(html: String) = Left(CellMagicOutput(MIMEType.TextHtml -> html))

// Output list of strings with index and separator
def printOut(list:List[String],label:String=""): Unit = {
    val indexed = list.zipWithIndex
    indexed.foreach { case (str,idx) =>
        println(s"\n---[$label ${idx+1}]--------------------------------\n")
        println(str)
    }
}

// Build a GraphQL query from TAP query and input text
def buildGraphQlQuery(input:String,query:String)= {
    import org.json4s.JsonDSL._
    val variables = ("input" -> input)
    val fullQuery = ("query" -> query) ~ ("variables" -> variables)
    compact(render(fullQuery))
}

// The data structures for analytics from TAP
case class Token(idx:Int,term:String,lemma:String,postag:String)
case class Analytic(idx:Int,tokens:List[Token])

// Post query to TAP Url and return Analytic object for each sentence
def getAnalytics(server:String,query:String):List[Analytic] = {
    val url = s"http://$server/graphql" //The URL - graphql endpoint at server
    val queryRequest = Http(url).postData(query).header("content-type", "application/json") //The request
    val queryData = parse(queryRequest.asString.body)
    implicit val formats = DefaultFormats                 // The implicit formats allow extraction of a Scala object from a JValue
    (queryData \ "data" \ "annotations" \ "analytics").extract[List[Analytic]]
}

// Extraction and HTML markup of personal pronouns
object Pronouns {
    
    def extractPersonal(analytics:List[Analytic]):List[String] = {
        analytics.flatMap( _.tokens.filter(_.postag.contains("PRP")).map(_.term))
    }

    def otherOne(term:String) = List("he","him","his","she","her","hers").contains(term.toLowerCase)
    def others(term:String) = List("them","they","theirs").contains(term.toLowerCase)
    def group(term:String) = List("we","us","our").contains(term.toLowerCase)
    def self(term:String) = List("i","me","my").contains(term.toLowerCase)

    def wrap(term:String,id:String) = s"""<span class="$id">$term</span>"""

    def markupPersonal(list: List[String]): String = list match {
        case Nil => ""
        case term :: rest => {
            val mt = term match {
                case t if otherOne(t) => wrap(t,"otherone")
                case t if others(t) => wrap(t,"others")
                case t if self(t) => wrap(t,"self")
                case t if group(t) => wrap(t,"group")
                case t => t
            }
            mt + " " + markupPersonal(rest)   // Recurse on the remaining terms
        }
    }
}

#### [2] Look at writing cases for language features

To address our task, we need to answer a couple of questions about the writing:

- *What features are we likely to see in the students' writing when group work is going well?*
- *What about when the group is not functioning?*

Load example files and take a look...

In [69]:
//Load two files - Fake students from two different groups
val file1 = getLinesFromFile("grp1-pers1.txt")
val file2 = getLinesFromFile("grp2-pers1.txt")

//Get the text of the first paragraph for each
val student1 = file1.head
val student2 = file2.head

//View the text as HTML
displayHtml(s"<p><b>student 1: </b>$student1</p>"+s"<p><b>student 2: </b>$student2</p>")

#### Insignificant words that are significant

When processing text computationally, we are often interested mainly in content. Words that contribute little to the content, called stop words (a, the, this, then, me, I, us), are typically discarded, and the algorithm works with the remaining content words.

- *What do content words tell us about the effectiveness of the groups?*

- *What about the stop words? Do they tell us anything?*
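
To make the distinction concrete, here is a minimal sketch that separates stop words from content words in the first sentence of student 1's reflection. The stop-word list is a small hypothetical sample built around the words mentioned above, and the tokeniser is deliberately naive; neither reflects how TAP itself processes text.

```scala
// Sketch only: this stop-word list is an illustrative assumption,
// not the list used by TAP or by any particular library.
object StopWordDemo {
  val stopWords: Set[String] =
    Set("a", "the", "this", "then", "me", "i", "us", "we", "our", "that", "has", "for")

  // Naive tokeniser: lower-case, then split on anything that is not a letter or apostrophe
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("[^a-z']+").filter(_.nonEmpty).toList

  // Returns (stop words, content words), each in order of appearance
  def partition(text: String): (List[String], List[String]) =
    tokenize(text).partition(stopWords.contains)
}

val sample = "I can't believe that Harry has done nothing for our project."
val (stops, content) = StopWordDemo.partition(sample)
println(s"Stop words:    ${stops.mkString(", ")}")
println(s"Content words: ${content.mkString(", ")}")
```

Notice that the pronouns "I" and "our" end up in the discarded list, even though they hint at how the writer positions themselves relative to the group.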

#### Querying the Text Analytics Pipeline (TAP) 

To analyse the text with [TAP](http://tap-test.utscic.edu.au), we need to formulate a query for the type of analytics we want, send that query with the text to be analysed, and capture the result. 


In [70]:
//Formulate a query to tell TAP what analytics are wanted
val query = """
    query SentenceAnalysis($input: String!) {
      annotations(text:$input) {
        analytics {
          idx
          tokens {
            idx
            term
            lemma
            postag
          }
        }
      }
    }"""

//For convenience, put the texts in a list
val students = List(student1,student2)

//Format the query for each input text as GraphQL for TAP
val graphQlQueries = students.map(s => buildGraphQlQuery(s,query))

//View the final queries
printOut(graphQlQueries,"Query")


---[Query 1]--------------------------------

{"query":"\n    query SentenceAnalysis($input: String!) {\n      annotations(text:$input) {\n        analytics {\n          idx\n          tokens {\n            idx\n            term\n            lemma\n            postag\n          }\n        }\n      }\n    }","variables":{"input":"I can't believe that Harry has done nothing for our project. Everyone has been working diligently on it except him. He hasn't even started. If he doesn't lift his game this week, I'm going to talk to our tutor about it. I don't want us all to get a bad mark because he can't make an effort."}}

---[Query 2]--------------------------------

{"query":"\n    query SentenceAnalysis($input: String!) {\n      annotations(text:$input) {\n        analytics {\n          idx\n          tokens {\n            idx\n            term\n            lemma\n            postag\n          }\n        }\n      }\n    }","variables":{"input":"Our group has been working really well tog

In [71]:
//Set the address for TAP
val tapAddress = "tap-test.utscic.edu.au"

//Get the analytics
val analytics = graphQlQueries.map(q => getAnalytics(tapAddress,q))

//The RAW analytics...
printOut(analytics.map(_.toString), "Analytics")


---[Analytics 1]--------------------------------

List(Analytic(0,List(Token(0,I,i,PRP), Token(1,ca,ca,MD), Token(2,n't,n't,RB), Token(3,believe,believe,VB), Token(4,that,that,IN), Token(5,Harry,harry,NNP), Token(6,has,have,VBZ), Token(7,done,do,VBN), Token(8,nothing,nothing,NN), Token(9,for,for,IN), Token(10,our,our,PRP$), Token(11,project,project,NN), Token(12,.,.,.))), Analytic(1,List(Token(0,Everyone,everyone,NN), Token(1,has,have,VBZ), Token(2,been,be,VBN), Token(3,working,work,VBG), Token(4,diligently,diligently,RB), Token(5,on,on,IN), Token(6,it,it,PRP), Token(7,except,except,IN), Token(8,him,him,PRP), Token(9,.,.,.))), Analytic(2,List(Token(0,He,he,PRP), Token(1,has,have,VBZ), Token(2,n't,n't,RB), Token(3,even,even,RB), Token(4,started,start,VBN), Token(5,.,.,.))), Analytic(3,List(Token(0,If,if,IN), Token(1,he,he,PRP), Token(2,does,do,VBZ), Token(3,n't,n't,RB), Token(4,lift,lift,VB), Token(5,his,his,PRP$), Token(6,game,game,NN), Token(7,this,this,DT), Token(8,week,week,NN), To

In [72]:
//Extract personal pronouns for each of these sets of analytics
val personalPronouns = analytics.map(Pronouns.extractPersonal(_))

//The final pronouns for each text
printOut(personalPronouns.map(_.mkString(", ")),"Student")


---[Student 1]--------------------------------

I, our, it, him, He, he, his, I, our, it, I, us, he

---[Student 2]--------------------------------

Our, we, we, them, our, I, we


#### Formulating a hypothesis

When performing writing analytics, we combine observations with knowledge of the context and relevant theory to formulate a hypothesis about how the text might be analysed to yield insights of practical benefit.

**Hypothesis:** The use of singular versus plural personal pronouns is an indicator of group efficacy.

We can then test this on the data.

In [73]:
//Get the terms from the analytics and markup the personal pronouns
val markedUpTerms:List[String] = analytics.map { student => 
    val terms = student.flatMap( analytic => analytic.tokens.map(_.term))
    Pronouns.markupPersonal(terms)
}

//Set some CSS for the HTML markup
val css = """
<style>
  .otherone { color: red;   font-weight: bold; }
  .group    { color: green; font-weight: bold; }    
  .self     { color: blue;  font-weight: bold; }
  .others   { border-bottom: 2px red solid; }
</style>
"""

//Create the final HTML Snippet and display it
val finalText = markedUpTerms.zipWithIndex.map { case (text,idx) =>
    s"<p><b>Student ${idx+1}: </b>$text</p>"
}.mkString("\n")

displayHtml(css+finalText)
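
The colour markup supports a visual check of the hypothesis. As a complementary, hedged sketch, the pronouns extracted earlier can also be tallied by category. The category word lists mirror those in the `Pronouns` helper; the `PronounTally` object and its `groupShare` measure are hypothetical names introduced here for illustration, not a validated metric and not part of TAP.

```scala
// Sketch only: "group share" is an illustrative measure, not a validated indicator.
object PronounTally {
  // Same word lists as the Pronouns helper, keyed by category name
  val categories: Map[String, Set[String]] = Map(
    "self"     -> Set("i", "me", "my"),
    "group"    -> Set("we", "us", "our"),
    "otherone" -> Set("he", "him", "his", "she", "her", "hers"),
    "others"   -> Set("them", "they", "theirs")
  )

  // Category of a pronoun, or None if it is in no list (e.g. "it")
  def categoryOf(term: String): Option[String] = {
    val t = term.toLowerCase
    categories.collectFirst { case (name, words) if words.contains(t) => name }
  }

  // Count of categorised pronouns per category
  def tally(pronouns: List[String]): Map[String, Int] =
    pronouns.flatMap(categoryOf).groupBy(identity).map { case (k, v) => k -> v.size }

  // Fraction of categorised pronouns that refer to the group
  def groupShare(pronouns: List[String]): Double = {
    val counts = tally(pronouns)
    val total = counts.values.sum
    if (total == 0) 0.0 else counts.getOrElse("group", 0).toDouble / total
  }
}

// The pronoun lists printed for each student earlier in the notebook
val student1Pronouns = List("I", "our", "it", "him", "He", "he", "his", "I", "our", "it", "I", "us", "he")
val student2Pronouns = List("Our", "we", "we", "them", "our", "I", "we")

println(s"Student 1: ${PronounTally.tally(student1Pronouns)}")
println(s"Student 2: ${PronounTally.tally(student2Pronouns)}")
```

For these two texts, group pronouns make up 3 of 11 categorised pronouns for student 1 and 5 of 7 for student 2, while student 1 also uses five pronouns that single out one other person. Two short examples cannot confirm the hypothesis, but they show the kind of evidence that could be gathered across a whole cohort.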