#### Fixing the vector issue

While reading the scores produced by the [Decision tree implementation](https://github.com/Galeforse/Advanced-Cyber-Analytics-for-Attack-Detection/blob/main/Alex/Decision%20Tree%20Implementation.ipynb), which is used to help the models distinguish between red_team anomalies and general traffic anomalies, I discovered that the initially uploaded vector of scores only contained data for Days 57 through 83. This short appendix fixes that issue.

I read the data locally.

In [2]:
# setwd("D://LA//ATI Data//Summaries//Just_Auth")
df <- read.table(file = "LA.txt", header = TRUE, sep = ",")
Score <- read.table(file = "AuthScores.txt", header = TRUE)

Remove the unnecessary first column.

In [None]:
df<- df[, -1]

Evaluate the current dimensions.

In [8]:
head(Score)
head(df)
cat("\n")
dim(Score)
dim(df)

c
0
0
0
0
0
0


UserName,SrcDevice,DstDevice,AuthType,Failure,DailyCount
User035855,Comp808475,Comp081330,TGS,0,17
Comp655251$,Comp655251,ActiveDirectory,NetworkLogon,0,350
User762066,Comp306129,ActiveDirectory,TGS,0,22
User384215,Comp095190,EnterpriseAppServer,NetworkLogon,0,35
User043263,Comp883307,Comp384394,TGS,0,2
User631552,Comp621781,Comp915658,NetworkLogon,0,1





The number of rows does not match, because the scores do not cover the normal traffic days that fall outside Days 57 through 83. The quick fix is to pad the current 'Score' vector with 'zero' scores at the start and at the end.

In [10]:
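# Rows in the first 56/90 of the data (approximately Days 1 to 56)
N1 <- nrow(df) * 56 / 90
Auth1 <- df[as.numeric(rownames(df)) <= N1, ]
# Rows from the 83/90 mark onwards (approximately the days after Day 83)
N3 <- nrow(df) * 83 / 90
Auth3 <- df[as.numeric(rownames(df)) >= N3, ]

# Zero scores for both blocks, in a column named "c" to match Score
c1 <- as.data.frame(rep(0, times = nrow(Auth1)))
c2 <- as.data.frame(rep(0, times = nrow(Auth3)))
colnames(c1) <- c("c")
colnames(c2) <- c("c")

# Prepend and append the zero blocks
Score <- rbind(c1, Score)
Score <- rbind(Score, c2)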
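
As a minimal sketch of the same padding idea on toy data, the snippet below pads a short score column with zeros on both sides. The names `toy_df`, `toy_score` and `pad_scores` are hypothetical and are not part of the original analysis. The one-row-per-day layout is also an assumption made purely for illustration, since the real data holds many rows per day.

```r
# Toy stand-in for the real data: 90 rows, one per "day" (illustrative assumption).
toy_df <- data.frame(DailyCount = seq_len(90))

# Scores exist only for Days 57 through 83, i.e. 27 rows in this toy case.
toy_score <- data.frame(c = rep(1, times = 27))

# Hypothetical helper: prepend and append zero rows to a one-column data frame.
pad_scores <- function(score, n_before, n_after) {
  before <- data.frame(c = rep(0, times = n_before))
  after  <- data.frame(c = rep(0, times = n_after))
  rbind(before, score, after)
}

# 56 zero rows before, 7 zero rows after: 56 + 27 + 7 = 90.
padded <- pad_scores(toy_score, n_before = 56, n_after = 7)
nrow(padded) == nrow(toy_df)
```

Passing the two pad lengths explicitly keeps the helper independent of how the day boundaries are worked out.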

The issue should now be resolved.

In [11]:
identical(nrow(Score), nrow(df))   ## Should be TRUE
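
A matching row count does not by itself show that the original scores ended up at the right offset. The sketch below, on toy vectors with hypothetical names (`original`, `n_before`, `n_after`), shows one possible extra check: the middle block of the padded vector should be identical to the original scores, and both padded ends should be all zeros. Zeros are legitimate score values here, so the check compares positions rather than looking for non-zero entries.

```r
# Toy scores; zeros are valid values, not just padding.
original <- c(0, 1, 0, 0, 1)
n_before <- 3
n_after  <- 2

padded <- c(rep(0, n_before), original, rep(0, n_after))

# The original block should sit untouched right after the leading zeros.
middle <- padded[(n_before + 1):(n_before + length(original))]
middle_ok <- identical(middle, original)

# Both padded ends should contain only zeros.
ends_ok <- all(padded[seq_len(n_before)] == 0) && all(tail(padded, n_after) == 0)

middle_ok && ends_ok
```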

This is what the new dataset looks like:

In [16]:
# Score is a one-column data frame, so the new column keeps its name "c"
df <- cbind(df, 'Score' = Score)
head(df)

UserName,SrcDevice,DstDevice,AuthType,Failure,DailyCount,c
User035855,Comp808475,Comp081330,TGS,0,17,0
Comp655251$,Comp655251,ActiveDirectory,NetworkLogon,0,350,0
User762066,Comp306129,ActiveDirectory,TGS,0,22,0
User384215,Comp095190,EnterpriseAppServer,NetworkLogon,0,35,0
User043263,Comp883307,Comp384394,TGS,0,2,0
User631552,Comp621781,Comp915658,NetworkLogon,0,1,0


Lastly, we save our results. DO NOT run the following line of code; there is no need to. The [old upload is still available here](https://github.com/Galeforse/Advanced-Cyber-Analytics-for-Attack-Detection/blob/main/Data/AuthScores.txt).

In [17]:
# write.csv(x=Score, file="AuthScores.csv")
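
Note that `write.csv` writes row names by default, which adds an extra column when the file is read back. The sketch below shows one way to save a one-column score table so that `read.table(..., header = TRUE)` returns it in the same shape. It uses toy data and a temporary file (both assumptions for illustration), so it does not touch the uploaded files.

```r
# Toy one-column score table, mirroring the "c" column used above.
toy_score <- data.frame(c = c(0, 0, 1, 0))

# Temporary path, so nothing in the working directory is overwritten.
tmp <- tempfile(fileext = ".txt")

# row.names = FALSE keeps the file to a single column with the header "c".
write.table(toy_score, file = tmp, row.names = FALSE, quote = FALSE)

# Read it back the same way the notebook reads AuthScores.txt.
reread <- read.table(file = tmp, header = TRUE)
dim(reread)

unlink(tmp)  # clean up the temporary file
```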

The resulting vector of scores can be [found here](). We chose to upload only the vector rather than the entire dataset to save storage space.