# Python OSEMN process
---

---

1.- [Scrub](#scrub)

&nbsp;
    1.1.- [Pre-test](#pre)
    
&nbsp;
        1.1.1.- [Build the time table](#timeTable)
        
&nbsp;
        1.1.2.- [Get events in the search-and-selection stage](#searchAndSelect)
        
&nbsp;
        1.1.3.- [Action identifiers](#actionsId)

&nbsp;
    1.2.- [Post-test](#post)

    
&nbsp;
        1.2.1.- [Build the time table](#timeTablePost)
        
&nbsp;
        1.2.2.- [Get events in the search-and-selection stage](#searchAndSelectPost)
        
&nbsp;
        1.2.3.- [Action identifiers](#actionsIdPost)

2.- [Explore](#explore)
    
3.- [Model](#model)
    
4.- [Interpret](#interpret)

# 1.- Scrub 
<a id="scrub"></a>

---

## 1.1.- Pre-test
<a id="pre"></a>

Import pandas and NumPy for DataFrame handling.

In [None]:
import pandas as pd
import numpy as np

### 1.1.1.- Build the time table
<a id="timeTable"></a>

Import the table of links visited during the pre-test stage.

In [2]:
visitedLinksPre = pd.read_csv('Tablas generadas/Pre-test/VisitedLinks.PreTest.csv')

Inspect the first rows of the table.

In [3]:
visitedLinksPre.head()

Unnamed: 0,username,userId,X_id,state,url,localTimestamp,serverTimestamp
0,101BSCE120003,KnqPytrKdYvoWobR6,XnghJcJWpt9kawj6D,PageExit,/login,1488795765263,1488795765417
1,101BSCE120003,KnqPytrKdYvoWobR6,sikmeettZJj8XEhTp,PageEnter,/start,1488795765264,1488795765584
2,101BSCE120003,KnqPytrKdYvoWobR6,Dx664JM6NAwmhEpSx,PageExit,/start,1488795781045,1488795781689
3,101BSCE120003,KnqPytrKdYvoWobR6,NYhgvvR2Ky2SKwbbZ,PageEnter,/affective?stage=begin,1488795781051,1488795781699
4,101BSCE120003,KnqPytrKdYvoWobR6,ye6WFzgGgnfGb87Mp,PageExit,/affective?stage=begin,1488795820262,1488795820483
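
The `localTimestamp` and `serverTimestamp` values appear to be Unix epoch times in milliseconds. This is an assumption based on their magnitude; the notebook does not state the unit. Under that assumption, a minimal sketch for turning them into readable datetimes, using two `serverTimestamp` values from the table above (the `serverTime` column name is introduced here for illustration):

```python
import pandas as pd

# Two serverTimestamp values copied from the table above.
sample = pd.DataFrame({"serverTimestamp": [1488795765417, 1488795765584]})

# Assumption: the integers are milliseconds since the Unix epoch (UTC).
sample["serverTime"] = pd.to_datetime(sample["serverTimestamp"], unit="ms")
print(sample)
```

If the assumption holds, the first row falls on 6 March 2017.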


Check the number of users and rows.

In [4]:
print(len(visitedLinksPre["username"].unique())," usuarios")
print(len(visitedLinksPre), " filas")

512  usuarios
56750  filas


Get the table of start times for the search-and-selection stage.

In [5]:
startTimePre = visitedLinksPre.loc[(visitedLinksPre["state"]=="PageExit") & 
                               (visitedLinksPre["url"]=="/tutorial?stage=search"),["username","serverTimestamp"]]
print(len(startTimePre["username"].unique())," usuarios")
print(len(startTimePre)," filas")

512  usuarios
515  filas


Because of an error in the data capture system, some users have duplicate records in the tutorial phase. All of them are removed except the first occurrence.

In [6]:
startTimePre = startTimePre.drop_duplicates(subset='username',keep='first')
print(len(startTimePre["username"].unique())," usuarios")
print(len(startTimePre)," filas")
startTimePre.columns = ["username","start"]
startTimePre.head()

512  usuarios
512  filas


Unnamed: 0,username,start
16,101BSCE120003,1488795953781
82,101BSCE120004,1488795974227
201,101BSCE120008,1488795937180
278,101BSCE120012,1488796022746
368,101BSCE120014,1488796497576
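
The effect of the `keep` argument of `drop_duplicates` can be seen on a small made-up table (the usernames and timestamps below are illustrative, not taken from the data set). When rows are in chronological order, `keep='first'` retains the earliest row per user and `keep='last'` retains the latest:

```python
import pandas as pd

# Made-up data: user u1 has two records, user u2 has one.
toy = pd.DataFrame({
    "username": ["u1", "u1", "u2"],
    "serverTimestamp": [100, 250, 180],
})

# Keep the first or the last row found for each username.
first = toy.drop_duplicates(subset="username", keep="first")
last = toy.drop_duplicates(subset="username", keep="last")

print(first["serverTimestamp"].tolist())
print(last["serverTimestamp"].tolist())
```

Both variants depend on row order, so sorting by `serverTimestamp` beforehand is a reasonable safeguard when the order of the file is not guaranteed.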


Get the table of finish times for the search-and-selection stage.

In [7]:
finishTimePre = visitedLinksPre.loc[(visitedLinksPre["state"]=="PageEnter") & 
                                    (visitedLinksPre["url"]=="/collection"),["username","serverTimestamp"]]
print(len(finishTimePre["username"].unique())," usuarios")
print(len(finishTimePre)," filas")

512  usuarios
524  filas


The same problem as in the previous table appears, so duplicates are removed, this time keeping the last occurrence.

In [8]:
finishTimePre = finishTimePre.drop_duplicates(subset='username',keep='last')
print(len(finishTimePre["username"].unique())," usuarios")
print(len(finishTimePre)," filas")
finishTimePre.columns = ["username","finish"]
finishTimePre.head()

512  usuarios
512  filas


Unnamed: 0,username,finish
51,101BSCE120003,1488796554066
156,101BSCE120004,1488796840407
247,101BSCE120008,1488796645263
337,101BSCE120012,1488796806432
415,101BSCE120014,1488797036983


Build a table that holds, for each user, the time at which the search-and-selection task started and finished (join).

In [9]:
timeTablePre = pd.merge(startTimePre, finishTimePre, on='username', how='inner')
timeTablePre.head()

Unnamed: 0,username,start,finish
0,101BSCE120003,1488795953781,1488796554066
1,101BSCE120004,1488795974227,1488796840407
2,101BSCE120008,1488795937180,1488796645263
3,101BSCE120012,1488796022746,1488796806432
4,101BSCE120014,1488796497576,1488797036983
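
With both timestamps in one row, the duration of the task is a single subtraction. A minimal sketch on the first two rows shown above, assuming the timestamps are in milliseconds (the `durationSeconds` column name is introduced here for illustration):

```python
import pandas as pd

# First two rows of the time table shown above.
timeTable = pd.DataFrame({
    "username": ["101BSCE120003", "101BSCE120004"],
    "start": [1488795953781, 1488795974227],
    "finish": [1488796554066, 1488796840407],
})

# Assumption: timestamps are in milliseconds, so dividing by 1000 gives seconds.
timeTable["durationSeconds"] = (timeTable["finish"] - timeTable["start"]) / 1000
print(timeTable[["username", "durationSeconds"]])
```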


### 1.1.2.- Get events in the search-and-selection stage
<a id="searchAndSelect"></a>

Load the tables of users, events (queries, page entries and exits, bookmarks, among others), scrolls, clicks, and text input.

In [10]:
usersPre = pd.read_csv('Tablas generadas/Pre-Test/Users.PreTest.csv')
print(len(usersPre), " usuarios")
usersPre.head()

512  usuarios


Unnamed: 0,child.ID,userName,T.Inicial,T.Final,Total.Time,Stay.Pages,Stay.Pag.Relv,Stay.Pag.NotRelv,Total.Cover,Doc.Relv.vist,...,Recall,F1,Score,Pos,Cal,Ask1,Ask2,Sex,Group,class
0,1109,101BSCE120003,1488795953616,1488796447580,8233,3112,1943,1169,4,2,...,667,572,3333,-2,1,2,3,2,0,A
1,1121,101BSCE120004,1488795974073,1488796824341,14171,5837,544,5293,9,2,...,667,333,1667,1,3,3,2,2,0,R
2,1118,101BSCE120008,1488795936998,1488796634284,11621,5521,2739,2782,5,3,...,1,75,5,-3,3,4,2,2,0,A
3,1112,101BSCE120012,1488796022585,1488796762579,12333,5655,947,4708,9,2,...,667,333,1429,-1,0,4,3,2,0,R
4,1106,101BSCE120014,1488796497561,1488796914443,6948,2051,587,1464,6,3,...,1,667,3,2,2,3,3,1,0,R


In [11]:
eventlogsPre = pd.read_csv('Tablas generadas/Pre-test/EventLogs.PreTest.csv')
eventlogsPre = eventlogsPre[["username","actionId","clientTimestamp","serverTimestamp","action"]]
eventlogsPre.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
print(len(eventlogsPre)," eventos")
eventlogsPre.head()

131474  eventos


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCE120003,,1488795764636,1488795764636,StatusOnline
1,101BSCE120003,ixZRM67XDCcPHCdgm,1488795764777,1488795764949,Login
2,101BSCE120003,XnghJcJWpt9kawj6D,1488795765263,1488795765417,PageExit
3,101BSCE120003,sikmeettZJj8XEhTp,1488795765264,1488795765584,PageEnter
4,101BSCE120003,,1488795765603,1488795765603,StatusOnline


In [12]:
mouseClicksPre = pd.read_csv("Tablas generadas/Pre-test/MouseClicks.PreTest.csv")
mouseClicksPre = mouseClicksPre[["username","X_id","localTimestamp","serverTimestamp"]]
actions = np.repeat('Click',len(mouseClicksPre))
mouseClicksPre['action'] = actions
print(len(mouseClicksPre)," clicks")
mouseClicksPre.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
mouseClicksPre.head()

11573  clicks


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCE120003,ksPnwtEMzNFyjDoPb,1488796295459,1488796295703,Click
1,101BSCE120003,WwR8e66RKxWB8ab7e,1488796296027,1488796296249,Click
2,101BSCE120003,srCWodk9K6YsrxTML,1488796296403,1488796296650,Click
3,101BSCE120003,8TsGqXjMRxCDZ47wQ,1488796298115,1488796298422,Click
4,101BSCE120003,js36dPcccXmZH4fp7,1488796355318,1488796355445,Click


In [13]:
scrollsPre = pd.read_csv("Tablas generadas/Pre-Test/ScrollMoves.PreTest.csv")
scrollsPre = scrollsPre[["username","X_id","localTimestamp","serverTimestamp"]]
actions = np.repeat('Scroll',len(scrollsPre))
scrollsPre['action'] = actions
scrollsPre.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
print(len(scrollsPre), "scrolls")
scrollsPre.head()

78080 scrolls


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCE120003,8hMHWgfKp3XAifx7m,1488796059472,1488796059765,Scroll
1,101BSCE120003,GTmc6XXe4JHkvyM3S,1488796064039,1488796064204,Scroll
2,101BSCE120003,s6uWaQxpTjzFMFzGj,1488796065388,1488796065551,Scroll
3,101BSCE120003,Saw37kgjSe3XD6sBs,1488796104153,1488796104456,Scroll
4,101BSCE120003,NCeRGGvzrXayjjd6G,1488796192735,1488796192980,Scroll
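
The click and scroll cells follow the same pattern: keep four columns, rename `X_id` to `actionId`, and tag every row with a constant action name. A minimal sketch of a helper that captures that pattern; the name `toActionTable` is hypothetical and the sample rows are made up:

```python
import pandas as pd

def toActionTable(table, actionName, idColumn="X_id"):
    """Return a copy of `table` in the common five-column action layout."""
    out = table[["username", idColumn, "localTimestamp", "serverTimestamp"]].copy()
    out = out.rename(columns={idColumn: "actionId"})
    out["action"] = actionName  # the same label for every row
    return out

# Made-up rows with an extra column that the helper drops.
raw = pd.DataFrame({
    "username": ["u1", "u1"],
    "X_id": ["a1", "a2"],
    "localTimestamp": [10, 20],
    "serverTimestamp": [11, 21],
    "extra": [0, 0],
})
clicks = toActionTable(raw, "Click")
print(clicks)
```

Working on a copy avoids assigning a new column to a slice of the original DataFrame.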


For text input, only arrow-key presses (key codes 38 and 40) are counted, as a way to detect user activity on the page.

In [14]:
keystrokesPre = pd.read_csv("Tablas generadas/Pre-test/Keystrokes.PreTest.csv")
print(len(keystrokesPre), " keystrokes")
arrowPressPre = keystrokesPre[(keystrokesPre["keyCode"] == 38) | (keystrokesPre["keyCode"] == 40)]
arrowPressPre = arrowPressPre[["username","userId","localTimestamp","serverTimestamp"]]
actions = np.repeat('ArrowKey',len(arrowPressPre))
arrowPressPre['action'] = actions
arrowPressPre.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
print(len(arrowPressPre)," arrow press")
arrowPressPre.head()

1135698  keystrokes
4010  arrow press


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
373,101BSCE120003,KnqPytrKdYvoWobR6,1488797000000.0,1488797082493,ArrowKey
34100,101BSSA110011,gLDBxT8bE4hQHJwN3,1487848000000.0,1487848109770,ArrowKey
36089,101BSSA110011,gLDBxT8bE4hQHJwN3,1487849000000.0,1487849006600,ArrowKey
57581,102BSCE120003,YEtPcxMzxZWganuJf,1487588000000.0,1487587807180,ArrowKey
78087,102BSCE120017,LqLxDjphjYvhu9pPA,1487588000000.0,1487588019105,ArrowKey


Combine all the elements into a single table.

In [15]:
actionsInSearchTask = pd.concat([eventlogsPre,mouseClicksPre,scrollsPre,arrowPressPre])

Get the set of events that occur during the search-and-selection stage.

In [16]:
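def getEventsInSearchTask (users,events,timeTable):
    # Collect, user by user, the events whose serverTimestamp falls strictly
    # between that user's start and finish times.
    # Row i of users is paired with row i of timeTable, so both tables must
    # hold the same users in the same order.
    columns = ['username','actionId','localTimestamp','serverTimestamp','action']
    acumulator = pd.DataFrame(columns=columns)
    for i in range(len(users["userName"])):
        actionsPerUser = events[
            (events['serverTimestamp'] > timeTable['start'][i]) &
            (events['serverTimestamp'] < timeTable['finish'][i]) &
            (events['username'] == users['userName'][i])
        ]
        acumulator = pd.concat([acumulator,actionsPerUser])
    return acumulator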

In [17]:
actionsInSearchTask = getEventsInSearchTask(usersPre,actionsInSearchTask,timeTablePre)
actionsInSearchTask.head()

Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
46,101BSCE120003,8GExxyQeMScDeQgbB,1488796000000.0,1488795953964,PageEnter
47,101BSCE120003,BAmeJRb6f9Lrzuuoo,1488796000000.0,1488795990319,Query
48,101BSCE120003,,1488796000000.0,1488796000582,StatusAway
49,101BSCE120003,rGaDb5iAEwqngG7Li,1488796000000.0,1488796005639,Query
50,101BSCE120003,,1488796000000.0,1488796006289,StatusAway


Some of the remaining events are not useful for studying user behavior. They are removed.

In [24]:
actionsInSearchTask = actionsInSearchTask[
    (actionsInSearchTask["action"]!="StatusAway") & 
    (actionsInSearchTask["action"]!="StatusOnline") &
    (actionsInSearchTask["action"]!="BookmarkSelected") &
    (actionsInSearchTask["action"]!="StatusOffline") &
    (actionsInSearchTask["action"]!="TutorialSelected") &
    (actionsInSearchTask["action"]!="FormResponse") &
    (actionsInSearchTask["action"]!="Login") &
    (actionsInSearchTask["action"]!="SubtaskSelected") &
    (actionsInSearchTask["action"]!="Logout") &
    (actionsInSearchTask["action"]!="TimeoutTriggered")
                                         ]
actionsInSearchTask = actionsInSearchTask.sort_values(["username","serverTimestamp"])
actionsInSearchTask = actionsInSearchTask.reset_index(drop=True)
actionsInSearchTask.head(10)

Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCE120003,8GExxyQeMScDeQgbB,1.488796e+12,1488795953964,PageEnter
1,101BSCE120003,BAmeJRb6f9Lrzuuoo,1.488796e+12,1488795990319,Query
2,101BSCE120003,rGaDb5iAEwqngG7Li,1.488796e+12,1488796005639,Query
3,101BSCE120003,AXEv4fGMmYkmhebxu,1.488796e+12,1488796037940,Query
4,101BSCE120003,FajRFTo5JeMzJWhca,1.488796e+12,1488796047520,Query
5,101BSCE120003,,1.488796e+12,1488796051949,SearchResultSelected
6,101BSCE120003,vNw9Suw42CF2hq9gN,1.488796e+12,1488796051953,PageExit
7,101BSCE120003,edkYh8cwLdzmN8nmr,1.488796e+12,1488796051964,PageEnter
8,101BSCE120003,8hMHWgfKp3XAifx7m,1.488796e+12,1488796059765,Scroll
9,101BSCE120003,GTmc6XXe4JHkvyM3S,1.488796e+12,1488796064204,Scroll
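
The chain of `!=` comparisons can also be written with `Series.isin`. A minimal sketch on made-up rows; the `ignoredActions` list (a name introduced here) repeats the action names filtered in the cell above:

```python
import pandas as pd

# Action names excluded in the cell above.
ignoredActions = [
    "StatusAway", "StatusOnline", "BookmarkSelected", "StatusOffline",
    "TutorialSelected", "FormResponse", "Login", "SubtaskSelected",
    "Logout", "TimeoutTriggered",
]

# Made-up events for a single user.
toy = pd.DataFrame({
    "username": ["u1", "u1", "u1", "u1"],
    "serverTimestamp": [4, 1, 3, 2],
    "action": ["Scroll", "Login", "StatusAway", "Query"],
})

# Keep the rows whose action is NOT in the ignored list, then sort and re-index.
kept = toy[~toy["action"].isin(ignoredActions)]
kept = kept.sort_values(["username", "serverTimestamp"]).reset_index(drop=True)
print(kept)
```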


### 1.1.3.- Action identifiers
<a id="actionsId"></a>

Function that assigns each action a letter that identifies it.

In [41]:
def characterIdGenerator(events,relevantList):
    # Map every action to a one-letter identifier, in row order.
    # np.array (the function object) seeds the result, so the returned array
    # carries one placeholder element at position 0 that the caller must drop.
    actionLetter = np.array
    for i in range(len(events)):
        if events["action"][i] == "Scroll":
            actionLetter = np.append(actionLetter,"S")
        elif events["action"][i] == "Click":
            actionLetter = np.append(actionLetter,"C")
        elif events["action"][i] == "BackButtonSelected":
            actionLetter = np.append(actionLetter,"K")
        elif events["action"][i] == "BookmarkListSelected":
            actionLetter = np.append(actionLetter,"L")
        elif events["action"][i] == "PageEnter":
            actionLetter = np.append(actionLetter,"E")
        elif events["action"][i] == "BookmarkScore":
            actionLetter = np.append(actionLetter,"O")
        elif events["action"][i] == "PageExit":
            actionLetter = np.append(actionLetter,"X")
        elif events["action"][i] == "Query":
            actionLetter = np.append(actionLetter,"Q")
        elif events["action"][i] == "ReadyButtonSelected":
            actionLetter = np.append(actionLetter,"R")
        elif events["action"][i] == "SearchResultSelected":
            actionLetter = np.append(actionLetter,"H")
        elif events["action"][i] == "TaskSelected":
            actionLetter = np.append(actionLetter,"T")
        elif events["action"][i] == "Unbookmark":
            actionLetter = np.append(actionLetter,"U")
        elif events["action"][i] == "ArrowKey":
            actionLetter = np.append(actionLetter,"W")
        elif events["action"][i] == "Bookmark":
            # Upper-case B marks a relevant bookmark, lower-case b a non-relevant one.
            if events['actionId'][i] in relevantList:
                actionLetter = np.append(actionLetter,"B")
            else:
                actionLetter = np.append(actionLetter,"b")
        else:
            # Any action without a dedicated letter.
            actionLetter = np.append(actionLetter,"D")
    return(actionLetter)
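
The same letter assignment can be expressed as a lookup table plus `Series.map`, which avoids the row-by-row loop. This is a sketch rather than the notebook's method: `actionLetters` and `assignLetters` are hypothetical names, the sample rows are made up, and bookmark relevance is assumed to arrive as a flat collection of action ids.

```python
import pandas as pd

# Letter for every action that has a fixed identifier.
actionLetters = {
    "Scroll": "S", "Click": "C", "BackButtonSelected": "K",
    "BookmarkListSelected": "L", "PageEnter": "E", "BookmarkScore": "O",
    "PageExit": "X", "Query": "Q", "ReadyButtonSelected": "R",
    "SearchResultSelected": "H", "TaskSelected": "T", "Unbookmark": "U",
    "ArrowKey": "W",
}

def assignLetters(events, relevantIds):
    """Return a Series of one-letter identifiers aligned with `events`."""
    # Actions missing from the lookup table fall back to "D".
    letters = events["action"].map(actionLetters).fillna("D")
    isBookmark = events["action"] == "Bookmark"
    isRelevant = events["actionId"].isin(set(relevantIds))
    # Bookmarks get "B" when relevant and "b" otherwise.
    letters = letters.mask(isBookmark & isRelevant, "B")
    letters = letters.mask(isBookmark & ~isRelevant, "b")
    return letters

toy = pd.DataFrame({
    "action": ["PageEnter", "Query", "Bookmark", "Bookmark", "SomethingElse"],
    "actionId": ["a", "b", "c", "d", None],
})
letters = assignLetters(toy, relevantIds=["c"])
print(letters.tolist())
```

A two-dimensional array of ids, such as the result of `np.asarray` on a one-column DataFrame, would need to be flattened (for example with `.ravel()`) before being passed as `relevantIds`.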

Identify the relevant bookmarks.

In [26]:
bookmarksPre = pd.read_csv("Tablas generadas/Pre-Test/Bookmarks.PreTest.csv")
bookmarksPre.head()

Unnamed: 0,username,userId,url,docId,action,X_id,relevant,localTimestamp,serverTimestamp,userMade
0,101BSCE120003,KnqPytrKdYvoWobR6,/page/PBDC3CgALvd6sh73m,PBDC3CgALvd6sh73m,Bookmark,83hA48zwoGzRJHmQj,True,1488796108619,1488796110289,True
1,101BSCE120003,KnqPytrKdYvoWobR6,/page/huuiENfZH4roDYB7s,huuiENfZH4roDYB7s,Bookmark,kKb3fmoqSoqFEhoou,True,1488796232599,1488796232734,True
2,101BSCE120003,KnqPytrKdYvoWobR6,/page/PBDC3CgALvd6sh73m,PBDC3CgALvd6sh73m,Unbookmark,3pd5jmAMc3p4XABWo,True,1488796463823,1488796463924,False
3,101BSCE120003,KnqPytrKdYvoWobR6,/page/huuiENfZH4roDYB7s,huuiENfZH4roDYB7s,Unbookmark,M4KgNZ4urmfeqvbTQ,True,1488796463823,1488796463932,False
4,101BSCE120003,KnqPytrKdYvoWobR6,/page/PBDC3CgALvd6sh73m,PBDC3CgALvd6sh73m,Bookmark,LTt6FvZ2qqu3J3H9J,True,1488796465072,1488796465338,False


In [27]:
relevantBookmarks = bookmarksPre.loc[(bookmarksPre["relevant"]==True),["X_id"]]
relevantBookmarks = np.asarray(relevantBookmarks)
print(relevantBookmarks)

[['83hA48zwoGzRJHmQj']
 ['kKb3fmoqSoqFEhoou']
 ['3pd5jmAMc3p4XABWo']
 ...
 ['mkAt8L92S7LQSYJsL']
 ['7vRKaznwzqX86Aph5']
 ['xpF2iwYSq6PGW4zKm']]


In [56]:
letterList = characterIdGenerator(actionsInSearchTask,relevantBookmarks)
letterList = np.delete(letterList,0)
print(letterList)

['E' 'Q' 'Q' ... 'E' 'R' 'X']


In [57]:
actionsInSearchTask["charId"] = letterList

In [59]:
actionsInSearchTask.head(10)

Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action,charId
0,101BSCE120003,8GExxyQeMScDeQgbB,1488796000000.0,1488795953964,PageEnter,E
1,101BSCE120003,BAmeJRb6f9Lrzuuoo,1488796000000.0,1488795990319,Query,Q
2,101BSCE120003,rGaDb5iAEwqngG7Li,1488796000000.0,1488796005639,Query,Q
3,101BSCE120003,AXEv4fGMmYkmhebxu,1488796000000.0,1488796037940,Query,Q
4,101BSCE120003,FajRFTo5JeMzJWhca,1488796000000.0,1488796047520,Query,Q
5,101BSCE120003,,1488796000000.0,1488796051949,SearchResultSelected,H
6,101BSCE120003,vNw9Suw42CF2hq9gN,1488796000000.0,1488796051953,PageExit,X
7,101BSCE120003,edkYh8cwLdzmN8nmr,1488796000000.0,1488796051964,PageEnter,E
8,101BSCE120003,8hMHWgfKp3XAifx7m,1488796000000.0,1488796059765,Scroll,S
9,101BSCE120003,GTmc6XXe4JHkvyM3S,1488796000000.0,1488796064204,Scroll,S


## 1.2.- Post-test
<a id="post"></a>

### 1.2.1.- Build the time table
<a id="timeTablePost"></a>

Import the table of visited links.

In [45]:
visitedLinksPost = pd.read_csv('Tablas generadas/Post-test/VisitedLinks.PostTest.csv')
visitedLinksPost.head()

Unnamed: 0,username,userId,X_id,state,url,localTimestamp,serverTimestamp
0,101BSCA210001,84wLsixL5RDcG4mnS,gHCwzPyjWFnYTdMMz,PageExit,/login,1494239000000.0,1494238579797
1,101BSCA210001,84wLsixL5RDcG4mnS,gSoee6bwQeNMcYMmv,PageEnter,/start,1494239000000.0,1494238579963
2,101BSCA210001,84wLsixL5RDcG4mnS,KKmtFDFMBfPDSuNeF,PageExit,/start,1494239000000.0,1494238580713
3,101BSCA210001,84wLsixL5RDcG4mnS,73urXKdzBJqPJRm44,PageEnter,/affective?stage=begin,1494239000000.0,1494238580727
4,101BSCA210001,84wLsixL5RDcG4mnS,pETGgCzqeMAjTcHt7,PageExit,/affective?stage=begin,1494239000000.0,1494238630703


Get the start times.

In [48]:
startTimePost = visitedLinksPost.loc[(visitedLinksPost['state'] == "PageEnter") & 
                                 (visitedLinksPost['url'] == "/search" ),["username","serverTimestamp"]]
print(len(startTimePost["username"].unique())," usuarios")
print(len(startTimePost)," filas")

546  usuarios
609  filas


Remove repeated rows.

In [50]:
startTimePost = startTimePost.drop_duplicates(subset='username',keep='first')
print(len(startTimePost["username"].unique())," usuarios")
print(len(startTimePost)," filas")
startTimePost.columns = ["username","start"]
startTimePost.head()

546  usuarios
546  filas


Unnamed: 0,username,start
13,101BSCA210001,1494238695692
76,101BSCA210002,1494239021269
169,101BSCA210003,1494238644597
221,101BSCA210004,1494238887562
312,101BSCA210005,1494239142759


Get the finish times.

In [133]:
finishTimePost = visitedLinksPost.loc[(visitedLinksPost['state'] == "PageExit") &
                                     (visitedLinksPost['url'] == "/collection"),["username","serverTimestamp"]]
print(len(finishTimePost["username"].unique())," usuarios")
print(len(finishTimePost)," filas")

545  usuarios
548  filas


Remove duplicates.

In [108]:
finishTimePost = finishTimePost.drop_duplicates(subset='username',keep='last')
print(len(finishTimePost["username"].unique())," usuarios")
print(len(finishTimePost)," filas")
finishTimePost.columns = ["username","finish"]
finishTimePost.head()

545  usuarios
545  filas


Unnamed: 0,username,finish
40,101BSCA210001,1494238989094
133,101BSCA210002,1494239474395
189,101BSCA210003,1494238832878
280,101BSCA210004,1494239605802
388,101BSCA210005,1494239990230


In [109]:
timeTablePost = pd.merge(startTimePost, finishTimePost, on='username', how='inner')
timeTablePost.head()

Unnamed: 0,username,start,finish
0,101BSCA210001,1494238695692,1494238989094
1,101BSCA210002,1494239021269,1494239474395
2,101BSCA210003,1494238644597,1494238832878
3,101BSCA210004,1494238887562,1494239605802
4,101BSCA210005,1494239142759,1494239990230


### 1.2.2.- Get events in the search-and-selection stage
<a id = "searchAndSelectPost"></a>

Users table

In [63]:
usersPost = pd.read_csv('Tablas generadas/Post-Test/Users.PostTest.csv')
print(len(usersPost), " usuarios")
usersPost.head()

546  usuarios


Unnamed: 0,child.ID,userName,T.Inicial,T.Final,Total.Time,Stay.Pages,Stay.Pag.Relv,Stay.Pag.NotRelv,Total.Cover,Doc.Relv.vist,...,Recall,F1,Score,Pos,Cal,Ask1,Ask2,Sex,Group,class
0,1115,101BSCA210001,1494238695502,1494238968981,4558,3584,674,291,4,3,...,1,857,5,0,3,3,2,1,0,A
1,1124,101BSCA210002,1494239021107,1494239443538,7041,3385,2494,891,6,3,...,1,667,5,1,2,4,3,1,0,A
2,1120,101BSCA210003,1494238644316,1494238817727,289,1232,744,487,4,3,...,1,857,375,-2,0,2,4,1,0,A
3,1123,101BSCA210004,1494238887195,1494239565956,11313,5443,2724,2719,7,3,...,667,4,3333,1,3,3,3,1,0,A
4,1121,101BSCA210005,1494239142660,1494239708989,9439,3754,519,3235,13,1,...,333,125,385,3,4,3,2,2,0,R


In [132]:
usersPost = usersPost[usersPost["userName"] != (set(usersPost["userName"]) - set(timeTablePost["username"]))]
len(usersPost)

546

In [123]:
usersPost["userName"] in timeTablePost["username"]

TypeError: 'Series' objects are mutable, thus they cannot be hashed
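
The two cells above do not filter anything. The recorded length shows that the `!=` comparison against a Python `set` keeps every row (still 546), and `in` applied to a Series checks index labels and has to hash its left operand, hence the `TypeError`. A minimal sketch of what the filter seems intended to do, using `Series.isin` on made-up usernames (`hasWindow` and `usersWithWindow` are names introduced here):

```python
import pandas as pd

# Made-up data: u2 has no row in the time table.
users = pd.DataFrame({"userName": ["u1", "u2", "u3"]})
timeTable = pd.DataFrame({
    "username": ["u1", "u3"],
    "start": [10, 30],
    "finish": [20, 40],
})

# True for every user that has a start/finish row in the time table.
hasWindow = users["userName"].isin(timeTable["username"])

missing = users.loc[~hasWindow, "userName"].tolist()
usersWithWindow = users[hasWindow].reset_index(drop=True)

print(missing)
print(usersWithWindow)
```

Resetting the index keeps the row labels contiguous, which matters for any later code that looks rows up by position.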

Event log table

In [65]:
eventlogsPost = pd.read_csv('Tablas generadas/Post-test/EventLogs.PostTest.csv')
eventlogsPost = eventlogsPost[["username","actionId","clientTimestamp","serverTimestamp","action"]]
eventlogsPost.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
print(len(eventlogsPost)," eventos")
eventlogsPost.head()

138840  eventos


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCA210001,,1494239000000.0,1494238568531,StatusOnline
1,101BSCA210001,6wuBk3eii4FhbxhwS,1494239000000.0,1494238568847,Login
2,101BSCA210001,,1494239000000.0,1494238569877,StatusOnline
3,101BSCA210001,,1494239000000.0,1494238575116,StatusOnline
4,101BSCA210001,,1494239000000.0,1494238579018,StatusOnline


Clicks table

In [66]:
mouseClicksPost = pd.read_csv("Tablas generadas/Post-test/MouseClicks.PostTest.csv")
mouseClicksPost = mouseClicksPost[["username","X_id","localTimestamp","serverTimestamp"]]
actions = np.repeat('Click',len(mouseClicksPost))
mouseClicksPost['action'] = actions
print(len(mouseClicksPost)," clicks")
mouseClicksPost.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
mouseClicksPost.head()

10090  clicks


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCA210001,xd4jPieYG4yErNQSd,1494239013324,1494239013411,Click
1,101BSCA210001,pn8n9tHFpePQxebuY,1494239021969,1494239022076,Click
2,101BSCA210001,pcZTa8eyQJ9ZxKyLE,1494239035153,1494239035375,Click
3,101BSCA210001,sJ2sWtD2eqiGf7K6H,1494239035584,1494239035827,Click
4,101BSCA210001,6vKNme6EXFj7CEyLd,1494239064503,1494239064880,Click


Scrolls table

In [68]:
scrollsPost = pd.read_csv("Tablas generadas/Post-Test/ScrollMoves.PostTest.csv")
scrollsPost = scrollsPost[["username","X_id","localTimestamp","serverTimestamp"]]
actions = np.repeat('Scroll',len(scrollsPost))
scrollsPost['action'] = actions
scrollsPost.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
print(len(scrollsPost), "scrolls")
scrollsPost.head()

72284 scrolls


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCA210001,YuFQjsnQC3M5q8ynY,1494238799546,1494238799655,Scroll
1,101BSCA210001,btt2s66dv48iHjSwe,1494238863267,1494238863546,Scroll
2,101BSCA210001,7kjLcZTi3DN6xSzxJ,1494238863395,1494238863549,Scroll
3,101BSCA210001,bBHGww376bbx7gr2t,1494238863562,1494238863745,Scroll
4,101BSCA210001,GPcGumjcQwciTmt9i,1494238863661,1494238863747,Scroll


Text input table

In [69]:
keystrokesPost = pd.read_csv("Tablas generadas/Post-test/Keystrokes.PostTest.csv")
print(len(keystrokesPost), " keystrokes")
arrowPressPost = keystrokesPost[(keystrokesPost["keyCode"] == 38) | (keystrokesPost["keyCode"] == 40)]
arrowPressPost = arrowPressPost[["username","userId","localTimestamp","serverTimestamp"]]
actions = np.repeat('ArrowKey',len(arrowPressPost))
arrowPressPost['action'] = actions
arrowPressPost.columns = ["username","actionId","localTimestamp","serverTimestamp","action"]
print(len(arrowPressPost)," arrow press")
arrowPressPost.head()

1144585  keystrokes
4764  arrow press


Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
101580,103BSCA210001,mMRXvwnMMM7iArezX,1494582363004,1494582370041,ArrowKey
101581,103BSCA210001,mMRXvwnMMM7iArezX,1494582363009,1494582370246,ArrowKey
101582,103BSCA210001,mMRXvwnMMM7iArezX,1494582363881,1494582370943,ArrowKey
101583,103BSCA210001,mMRXvwnMMM7iArezX,1494582363885,1494582371184,ArrowKey
101584,103BSCA210001,mMRXvwnMMM7iArezX,1494582364267,1494582371329,ArrowKey


Combine all the elements into a single table.

In [110]:
actionsInSearchTaskPost = pd.concat([eventlogsPost,mouseClicksPost,scrollsPost,arrowPressPost])
actionsInSearchTaskPost.head()

Unnamed: 0,username,actionId,localTimestamp,serverTimestamp,action
0,101BSCA210001,,1494239000000.0,1494238568531,StatusOnline
1,101BSCA210001,6wuBk3eii4FhbxhwS,1494239000000.0,1494238568847,Login
2,101BSCA210001,,1494239000000.0,1494238569877,StatusOnline
3,101BSCA210001,,1494239000000.0,1494238575116,StatusOnline
4,101BSCA210001,,1494239000000.0,1494238579018,StatusOnline


Get the events in the search-and-selection stage.

In [111]:
actionsInSearchTaskPost = getEventsInSearchTask(usersPost,actionsInSearchTaskPost,timeTablePost)
actionsInSearchTaskPost.head() 

KeyError: 545
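
The `KeyError: 545` is consistent with `getEventsInSearchTask` pairing `users` and `timeTable` by row position: `usersPost` has 546 rows, while the finish-time table covers only 545 users, so row label 545 does not exist in `timeTablePost`. A minimal sketch of a variant that pairs rows by username instead; `getEventsInWindow` is a hypothetical name and the data is made up:

```python
import pandas as pd

def getEventsInWindow(events, timeTable):
    """Keep the events that fall strictly inside each user's start/finish window."""
    # An inner merge on username drops users without a time window.
    merged = events.merge(timeTable, on="username", how="inner")
    inWindow = (
        (merged["serverTimestamp"] > merged["start"]) &
        (merged["serverTimestamp"] < merged["finish"])
    )
    # Return only the original event columns.
    return merged.loc[inWindow, list(events.columns)].reset_index(drop=True)

events = pd.DataFrame({
    "username": ["u1", "u1", "u2", "u3"],
    "serverTimestamp": [5, 15, 35, 50],
    "action": ["Login", "Query", "Scroll", "Query"],
})
timeTable = pd.DataFrame({
    "username": ["u1", "u2"],  # u3 has no window
    "start": [10, 30],
    "finish": [20, 40],
})

inTask = getEventsInWindow(events, timeTable)
print(inTask)
```

Because the match is made on `username`, the result does not depend on the two tables having the same length or order.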

In [None]:
actionsInSearchTaskPost = actionsInSearchTaskPost[
    (actionsInSearchTaskPost["action"]!="StatusAway") & 
    (actionsInSearchTaskPost["action"]!="StatusOnline") &
    (actionsInSearchTaskPost["action"]!="BookmarkSelected") &
    (actionsInSearchTaskPost["action"]!="StatusOffline") &
    (actionsInSearchTaskPost["action"]!="TutorialSelected") &
    (actionsInSearchTaskPost["action"]!="FormResponse") &
    (actionsInSearchTaskPost["action"]!="Login") &
    (actionsInSearchTaskPost["action"]!="SubtaskSelected") &
    (actionsInSearchTaskPost["action"]!="Logout") &
    (actionsInSearchTaskPost["action"]!="TimeoutTriggered")
                                         ]
actionsInSearchTaskPost = actionsInSearchTaskPost.sort_values(["username","serverTimestamp"])
actionsInSearchTaskPost = actionsInSearchTaskPost.reset_index(drop=True)
actionsInSearchTaskPost.head(10)

### 1.2.3.- Action identifiers
<a id="actionsIdPost"></a>

# 2.- Explore
<a id="explore"></a>

In [191]:
events = pd.read_csv('Tablas generadas/Pre-Test/fullEventLogs.PreTest.csv', sep=";")
events.head()

Unnamed: 0,username,id,action,localTimestamp,serverTimestamp,actionID,acumulator,aprovedArray
0,101BSCE120003,8GExxyQeMScDeQgbB,PageEnter,1488795953616,1488795953964,E,0.0,A
1,101BSCE120003,BAmeJRb6f9Lrzuuoo,Query,1488795990019,1488795990319,Q,0.0,A
2,101BSCE120003,rGaDb5iAEwqngG7Li,Query,1488796005341,1488796005639,Q,0.0,A
3,101BSCE120003,AXEv4fGMmYkmhebxu,Query,1488796037651,1488796037940,Q,0.0,A
4,101BSCE120003,FajRFTo5JeMzJWhca,Query,1488796047341,1488796047520,Q,0.0,A


In [199]:
studentList = pd.read_csv('Tablas generadas/Pre-Test/usersActions.PreTest.csv', sep=";")
studentList.head()

Unnamed: 0,userName,Score,class,group,value
0,101BSCE120003,3333,A,C,EQQQQHXESSSSBKXETQHXESSSSSSBKXEQQHXECCCSSSSCSS...
1,101BSCE120004,1667,R,C,EQQQHXECCSSSCCXEHXECCCCCCCCSSXEXEQHXESSSSSXEHX...
2,101BSCE120008,5,A,C,EQXEQHXESSSSSSSBKXEQHXESSSSSSSSSSSSSSSSSSSSSSS...
3,101BSCE120012,1429,R,C,EQTHXESXEXEQHXESSBXEHXESSbSXEHXESSSBROXEXEXEQH...
4,101BSCE120014,3,R,C,EQTHXESSSSSSSbKXEHXESSSBKXEQHXESSSSSSSSSSKXEXE...
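
The `value` column looks like the per-user concatenation of the `actionID` letters of the event table above, in chronological order. The notebook does not show how `usersActions.PreTest.csv` was produced, so the following is only a sketch of one way to build such strings, on made-up events:

```python
import pandas as pd

# Made-up events for two users, deliberately out of order.
events = pd.DataFrame({
    "username": ["u1", "u2", "u1", "u1", "u2"],
    "serverTimestamp": [3, 1, 1, 2, 2],
    "actionID": ["H", "E", "E", "Q", "Q"],
})

# Sort chronologically within each user, then join the letters into one string.
sequences = (
    events.sort_values(["username", "serverTimestamp"])
          .groupby("username")["actionID"]
          .agg("".join)
          .reset_index(name="value")
)
print(sequences)
```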


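In [149]:
import plotly.plotly as py
import plotly.graph_objs as go
import plotly
# Replace the placeholder with your own Plotly API key.
plotly.tools.set_credentials_file(username='iOrellana', api_key='YOUR_API_KEY')

def direction(studentList, traceSet, colorName, activities, bookmark):
    # Python port of the R function directions() shown further down.
    # One 3D trace per student, built from the action string in "value":
    # Q moves +1 on x, H moves +1 on y, and z follows R (bookmark=False)
    # or B (+1) and b (-1) (bookmark=True). Other letters do not move the path.
    data = [traceSet]
    for i in range(len(studentList)):
        x, y, z, lengthMarker = [0], [0], [0], [0]
        word = studentList["value"][i]
        temp = activities.loc[activities["username"] == studentList["userName"][i], "acumulator"].values
        for j in range(len(word)):
            if word[j] == "Q":
                step = (1, 0, 0)
            elif word[j] == "H":
                step = (0, 1, 0)
            elif (word[j] == "R" and not bookmark) or (word[j] == "B" and bookmark):
                step = (0, 0, 1)
            elif word[j] == "b" and bookmark:
                step = (0, 0, -1)
            else:
                continue
            x.append(x[-1] + step[0])
            y.append(y[-1] + step[1])
            z.append(z[-1] + step[2])
            lengthMarker.append(temp[j])
        # Marker size follows the accumulated value, scaled by 1/7 as in the R version.
        trace = go.Scatter3d(x=x, y=y, z=z, mode="lines+markers",
                             line=dict(color=colorName, width=1),
                             marker=dict(size=np.array(lengthMarker) / 7))
        data.append(trace)
    return data

In [201]:
traceSet = go.Scatter3d(x=[0], y=[0], z=[0])

In [151]:
# activities is the event table loaded above; bookmark=True is an assumed setting.
lists = direction(studentList, traceSet, 'red', events, True)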


In [None]:
directions <- function(listR, traceSet, colorName, activities,bookmark){
  
  for(i in 1:nrow(listR)){
    x <- array()
    y <- array()
    z <- array() 
    lengthMarker <- array()
    x[1] <- 0
    y[1] <- 0
    z[1] <- 0
    lengthMarker[1] <- 0 
    u = 2
    word <- as.character(listR$value[i])
    temp <- subset(activities$acumulator, activities$username == listR$userName[i])
    for(j in 1:nchar(word)){
      switch (substring(word,j,j),
              Q = {
               x[u] <- x[u-1] + 1 
               y[u] <- y[u-1]
               z[u] <- z[u-1] 
               lengthMarker[u] <- temp[j]
               u = u + 1
              },
              H = {
                x[u] <- x[u-1]
                y[u] <- y[u-1] + 1
                z[u] <- z[u-1] 
                lengthMarker[u] <- temp[j]
                u = u + 1
               },
              R = {
                if(!bookmark){
                  x[u] <- x[u-1] 
                  y[u] <- y[u-1] 
                  z[u] <- z[u-1] + 1
                  lengthMarker[u] <- temp[j]
                  u = u + 1
                }
              },
              B = {
                if(bookmark){
                  x[u] <- x[u-1] 
                  y[u] <- y[u-1] 
                  z[u] <- z[u-1] + 1
                  lengthMarker[u] <- temp[j]
                  u = u + 1
                }
              },
              b = {
                if(bookmark){
                  x[u] <- x[u-1] 
                  y[u] <- y[u-1] 
                  z[u] <- z[u-1] - 1
                  lengthMarker[u] <- temp[j]
                  u = u + 1
                }
              }
              
      )
    }
    #print(lengthMarker)
    lengthMarker <- lengthMarker/7
    traceSet<-add_trace(traceSet, y=y, x=x, z=z , type="scatter3d", mode="lines+markers",
                        line = list(color = colorName, width = 1),
                        marker = list(size = lengthMarker))
    
  }
  return(traceSet)
}

In [183]:
x = [0,1,2]
y = [0,1,2]
y2 = [0,-1,-2]
z = [0,1,2]
x = np.asarray(x)
trace = go.Scatter3d(x=x, y=y, z=z)
trace2 = go.Scatter3d(x=x, y=y2, z=z)
data = np.array(trace)
data = np.append(data,trace2)
data2 = [trace,trace2]
print(list(data))
print(data2)

[Scatter3d({
    'x': array([0., 1., 2.]), 'y': [0, 1, 2], 'z': [0, 1, 2]
}), Scatter3d({
    'x': array([0., 1., 2.]), 'y': [0, -1, -2], 'z': [0, 1, 2]
})]
[Scatter3d({
    'x': array([0., 1., 2.]), 'y': [0, 1, 2], 'z': [0, 1, 2]
}), Scatter3d({
    'x': array([0., 1., 2.]), 'y': [0, -1, -2], 'z': [0, 1, 2]
})]


In [185]:
fig = dict(data=list(data))
py.iplot(fig, height=700)

In [198]:
x = "bhudajsi"
len(x)

8

In [194]:
f('s')

9