# Sequence Analysis

## I. Setup

### I.1 Load Libraries
First we have to load a few libraries:


In [None]:
library(TraMineR)
library(RColorBrewer)
library(cluster)

### I.2 Load Data
Afterwards we have to load the data using:

In [None]:
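# stringsAsFactors=FALSE keeps the states as plain strings, so they can be recoded later
rawData = read.csv("Pedro_Review_22062015.csv",sep=";",stringsAsFactors=FALSE)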

Note that this is the very same file you sent me. I just saved it as `.csv`.

The following code splits the rows of each dimension into separate variables:

In [None]:
companies = rawData[rawData[,5]=="Companies",]
titles = rawData[rawData[,5]=="Titles",]
membership = rawData[rawData[,5]=="Board Memberships",]
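To make the filter above concrete, here is a minimal sketch on a toy data frame. The column names and values are made up for illustration (the real file has its own), and the only assumption carried over is that column 5 holds the dimension label. It also shows `split()`, which would build all the per-dimension data frames in one call.

```r
# Toy stand-in for rawData: column 5 holds the dimension label.
# Column names and values are hypothetical, not those of the real file.
toy <- data.frame(
  id = c(1, 1, 2, 2),
  name = c("Ann", "Ann", "Bob", "Bob"),
  born = c(1950, 1950, 1960, 1960),
  sector = c("x", "x", "y", "y"),
  dimension = c("Companies", "Titles", "Companies", "Titles"),
  y1990 = c("A", "CEO", "B", "CFO"),
  y1991 = c("A", "CEO", "C", "CEO"),
  stringsAsFactors = FALSE
)

# Same logical-index filter as above, applied to the toy data
toyCompanies <- toy[toy[, 5] == "Companies", ]

# split() returns a named list with one data frame per dimension
byDimension <- split(toy, toy[, 5])
print(names(byDimension))
```

Either form gives the same rows; the explicit filter is easier to read when there are only three dimensions.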

### I.3 Auxiliary Functions
We'll also define a few auxiliary functions.

Right now the variables ``companies``, ``titles`` and ``membership`` are data frames containing all the data. The function ``getSequence`` drops the first five (descriptive) columns and keeps only the sequence part.

The ``trim`` function cleans the strings in the data. Probably due to errors during coding, some of the data is, for example, "A " instead of "A"; ``trim`` removes those leading and trailing spaces.

The last function, ``makeSequencePlots``, creates the five basic plots we look at (two index plots, the most frequent sequences, the state distribution and the mean time per state), so I don't have to rewrite them every time.


In [None]:
# Drop the five descriptive columns and keep only the sequence columns
getSequence <- function(raw){
  return(raw[,-(1:5)])
}

# Remove leading and trailing whitespace from a string
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

# Draw the five standard plots for a state sequence object
makeSequencePlots <- function(seq){
  seqIplot(seq,sortv="from.start",withlegend=TRUE,title="Companies - All sequences - Sort by start")

  seqIplot(seq,sortv="from.end",withlegend=TRUE,title="Companies - All sequences - Sort by end")

  seqfplot(seq, withlegend = TRUE, border = NA, title="10 Most Frequent Sequences")

  seqdplot(seq, withlegend = TRUE, border = NA, title="States Distribution Over Time")

  seqmtplot(seq, withlegend = TRUE, title="Mean Time spent in each State")
  return(TRUE)
}
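As a quick sanity check of the two data helpers, here is a small sketch on a made-up row. The toy column names are hypothetical; the function bodies just repeat the definitions above so the snippet runs on its own.

```r
# Same definitions as above, repeated so this check is self-contained
getSequence <- function(raw) raw[, -(1:5)]
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical toy row: five descriptive columns, then three states
toyRow <- data.frame(a = 1, b = "x", c = "y", d = "z", e = "Companies",
                     t1 = "A ", t2 = " B", t3 = "C",
                     stringsAsFactors = FALSE)

toySeq <- getSequence(toyRow)          # only t1, t2 and t3 remain
print(names(toySeq))

trimmed <- trim(c("A ", " B", "C"))    # spaces removed on both sides
print(trimmed)
```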

## II. Sequence Analysis

### II.1 Cleaning the data

The first step is to create a sequence-only data frame. Then we make sure all the letters are upper case and recode some values that seem to be wrongly coded. It might be worth double-checking them on your end. These are:

* Partner
* Consultant
* President
* President, COO,CEO
* Senior Advisor
* sr. v.p., corp. Strategy
* President, COO
* CHECK

I replaced all of them with ``Z``.

The ``for`` loops trim all the strings in the sequences. In the code they run before the recoding, so that values with stray spaces (for example "Partner " or "b ") still match the comparisons that follow.

In [None]:

companiesSequence <- getSequence(companies)

# Trim first, so that the comparisons below see clean strings
for (row in seq_len(nrow(companiesSequence))){
  for (col in seq_len(ncol(companiesSequence))){
    companiesSequence[row,col] <- trim(companiesSequence[row,col])
  }
}

companiesSequence[companiesSequence == "b"] <- "B"
companiesSequence[companiesSequence == "c"] <- "C"
companiesSequence[companiesSequence == "Partner"] <- "Z"
companiesSequence[companiesSequence == "Consultant"] <- "Z"
companiesSequence[companiesSequence == "President"] <- "Z"
companiesSequence[companiesSequence == "President, COO,CEO"] <- "Z"
companiesSequence[companiesSequence == "Senior Advisor"] <- "Z"
companiesSequence[companiesSequence == "sr. v.p., corp. Strategy"] <- "Z"
companiesSequence[companiesSequence == "President, COO"] <- "Z"
companiesSequence[companiesSequence == "CHECK"] <- "Z"
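The same cleaning can also be written column by column, without the nested loops. The sketch below is an alternative, not what the notebook runs: it uses a toy data frame with made-up values, a hypothetical helper called `cleanColumn`, and base R's `trimws()` in place of our own `trim`. It trims before recoding and finishes with a `table()` so any leftover odd codes are easy to spot.

```r
# Hypothetical toy sequences with the kinds of coding problems listed above
toy <- data.frame(t1 = c("A ", "b"), t2 = c(" c", "Partner "),
                  t3 = c("B", "CHECK"), stringsAsFactors = FALSE)

# Labels treated as wrongly coded (copied from the list above)
badCodes <- c("Partner", "Consultant", "President", "President, COO,CEO",
              "Senior Advisor", "sr. v.p., corp. Strategy",
              "President, COO", "CHECK")

# cleanColumn is an illustrative helper, not part of the original code
cleanColumn <- function(x) {
  x <- trimws(as.character(x))        # trim first so "Partner " still matches
  lower <- x %in% c("b", "c")
  x[lower] <- toupper(x[lower])       # only "b" and "c" are upper-cased
  x[x %in% badCodes] <- "Z"           # wrongly coded values become "Z"
  x
}

toy[] <- lapply(toy, cleanColumn)     # toy[] keeps the data frame structure
print(table(unlist(toy)))             # remaining states and their counts
```

Running the `table()` check on the real ``companiesSequence`` should show only the expected state letters and ``Z``; anything else is a code that still needs a decision.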

### II.2 Companies, Single Dimension, Asynchronous

We have to build the sequence object. In this first step we'll use the simplest approach: we only tell ``seqdef`` that ``Z`` marks missing values. That's done in the first command.

Note that the sequence is asynchronous, meaning that it is kept exactly as it was coded.

In [None]:
companiesSequence.seq <- seqdef(companiesSequence,NULL,missing="Z")

# Set the colors used in the charts (one color per state)
cpal(companiesSequence.seq) <- brewer.pal(3,"Spectral")

makeSequencePlots(companiesSequence.seq)
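To see what ``missing="Z"`` does, here is a small sketch on made-up data (the toy values and column names are assumptions). It also checks the number of states before setting the palette, since ``brewer.pal(3, ...)`` only fits if exactly three states are left once ``Z`` is treated as missing.

```r
library(TraMineR)
library(RColorBrewer)

# Hypothetical toy data: three states plus "Z" for missing
toy <- data.frame(t1 = c("A", "Z", "C"),
                  t2 = c("A", "B", "C"),
                  t3 = c("B", "B", "Z"),
                  stringsAsFactors = FALSE)

toySeq <- seqdef(toy, missing = "Z")

# "Z" is treated as missing, so it should not appear in the alphabet
states <- alphabet(toySeq)
print(states)

# The palette needs exactly one color per state
stopifnot(length(states) == 3)
cpal(toySeq) <- brewer.pal(3, "Spectral")
```

If the check fails on the real data, some unexpected code probably survived the cleaning step.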


### II.3 Companies, Single Dimension, Synchronous

Using the same data we build another sequence object. This time the only difference is that all the missing values at the beginning and at the end of each sequence are deleted.

This implies that if a given sequence starts in, say, 1990, that year will be considered $$\tau = 1$$

Arguably this approach makes sense, because we are not trying to observe changes in given years. All that matters to us is the path, not when it happened.

The options ``right="DEL"`` and ``left="DEL"`` make those deletions.

Again, after creating the sequence object we generate the plots.

In [None]:
companiesSequence.seq <- seqdef(companiesSequence,NULL,missing="Z",right="DEL",left="DEL")

# Set the colors used in the charts (one color per state)
cpal(companiesSequence.seq) <- brewer.pal(3,"Spectral")

makeSequencePlots(companiesSequence.seq)
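The difference between the two definitions is easier to see on a tiny example. The sketch below uses made-up data with hypothetical year columns: the first row only starts in 1990 and ends one year early. As far as I can tell from the ``seqdef`` documentation, ``right="DEL"`` is already the default, so the option that really changes things here is ``left="DEL"``.

```r
library(TraMineR)

# Hypothetical toy data: row 1 has two leading and one trailing "Z"
toy <- data.frame(y1988 = c("Z", "A"), y1989 = c("Z", "A"),
                  y1990 = c("A", "B"), y1991 = c("B", "C"),
                  y1992 = c("B", "C"), y1993 = c("Z", "C"),
                  stringsAsFactors = FALSE)

# As coded: leading "Z" values stay in place as missing states
asyncSeq <- seqdef(toy, missing = "Z")

# Aligned: leading and trailing "Z" values are deleted,
# so every sequence starts at position 1
syncSeq <- seqdef(toy, missing = "Z", left = "DEL", right = "DEL")

print(asyncSeq)
print(syncSeq)

# Number of positions left in each aligned sequence
syncLengths <- as.vector(seqlength(syncSeq))
print(syncLengths)
```

In the aligned version the first row should begin directly with its first valid state, which is what makes the sorted index plots comparable across people who started in different years.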