# Systems Immunogenetics Project

## Buxco Parsing Workflow

### McWeeney Lab, Oregon Health & Science University

**Authors: Gabrielle Choonoo (choonoo@ohsu.edu) and Michael Mooney (mooneymi@ohsu.edu)**

## Introduction

This notebook is a step-by-step workflow for parsing the Buxco data into a database for each batch.

Required Files:
* Buxco Data
* This notebook (notebook.ipynb): [[Download here]](https://raw.githubusercontent.com/gchoonoo/Buxco_notebook/master/notebook.ipynb)

Required R packages:
- `plethy`
- `plyr`

**Note:** this notebook can also be downloaded as an R script (only the code blocks seen below will be included): [[Download R script here]](https://raw.githubusercontent.com/gchoonoo/Buxco_notebook/master/parse_buxco.r)

**All code is available on GitHub: [https://github.com/gchoonoo/Buxco_notebook](https://github.com/gchoonoo/Buxco_notebook)**

# Load Buxco Data

In [None]:
# Skip the first line, keep the first 21 columns (types guessed), and drop the 22nd column
aug_2013_data <- read.delim(file="./Buxco_Data/2013_iAugust - buxco.txt", sep=",", skip=1, header=TRUE, colClasses=c(rep(NA,21),"NULL"))
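
The `colClasses` argument in the call above keeps the first 21 columns with guessed types (`NA`) and drops the 22nd (`"NULL"`). The following is a minimal sketch of the same idea on a toy input, assuming the extra column comes from a trailing comma at the end of each row (not confirmed here). The toy text and its column names are made up for illustration.

```r
# Toy input: one title line to skip, then a header and rows ending in a trailing comma
toy_text <- c(
  "Table title line (skipped)",
  "Time,Subject,f,",
  "10:00:01,16513x16188 f105 FLU,250.1,",
  "10:00:03,16513x16188 f105 FLU,248.7,"
)

# NA lets read.delim guess a column's type; "NULL" drops that column entirely
toy_data <- read.delim(
  text = toy_text, sep = ",", skip = 1, header = TRUE,
  colClasses = c(rep(NA, 3), "NULL")
)

dim(toy_data)       # number of rows and columns that were kept
colnames(toy_data)  # names of the columns that were kept
```

If the real file has a different number of columns, the `21` in `rep(NA,21)` would need to change accordingly.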

# Check Unique Sample Names

In [None]:
# The default lines that are removed are "Measurement", "Create measurement", "Waiting for", and "Site Acknowledgement Changed".
# "Subject" and blanks are also removed.
# Note any other entries that do not have the form "Mating RIX Virus" (e.g. "Responding to")
unique(aug_2013_data[,"Subject"])
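
Scanning a long list of unique names by eye is error-prone, so the check can also be done with a pattern. This is a minimal sketch, assuming sample names look like `16513x16188 f105 FLU` (the example used later in this notebook). The regular expression, the toy `subjects` vector, and the helper `starts_with_any` are illustrative assumptions, not part of `plethy`.

```r
# Hypothetical pattern for "Mating RIX Virus", e.g. "16513x16188 f105 FLU";
# adjust it to the naming scheme actually used in your batch
sample_pattern <- "^[0-9]+x[0-9]+ [A-Za-z]*[0-9]+ [A-Za-z]+$"

# Toy stand-in for unique(aug_2013_data[,"Subject"]); values are made up
subjects <- c("16513x16188 f105 FLU", "Measurement", "Responding to",
              "Subject", "", "8002x3032 f12 Mock")

# Entries that are already removed by default (blanks are handled separately below)
default_prefixes <- c("Measurement", "Create measurement", "Waiting for",
                      "Site Acknowledgement Changed", "Subject")

# TRUE for each element of x that starts with at least one of the prefixes
starts_with_any <- function(x, prefixes) {
  vapply(x, function(s) any(startsWith(s, prefixes)), logical(1), USE.NAMES = FALSE)
}

is_sample  <- grepl(sample_pattern, subjects)
is_default <- subjects == "" | starts_with_any(subjects, default_prefixes)

# Anything left over is a candidate to add to burn.in.lines
unexpected <- subjects[!is_sample & !is_default]
unexpected
```

With the real data, `subjects` would be replaced by `as.character(unique(aug_2013_data[,"Subject"]))`.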

# Create Buxco Database

In [None]:
# Load plethy, which provides parse.buxco and the database functions used below
library(plethy)
# Set the file path to the Buxco data
aug_2013 <- "./Buxco_Data/2013_iAugust - buxco.txt"
# Set the chunk size to the number of rows in the file
chunk.size <- dim(aug_2013_data)[1]
# Set the file path where the database will be saved
db.name <- file.path("./Buxco_Data/Database/August2013_database.db")
# Parse the data; add "Responding to" to burn.in.lines if necessary
parse.buxco(file.name=aug_2013, chunk.size=chunk.size, db.name=db.name, verbose=FALSE, burn.in.lines=c("Measurement", "Create measurement", "Waiting for","Site Acknowledgement Changed"))
# Note any parsing warnings that get printed (Sample Name and Break number), none in this case
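
Warnings that are only printed are easy to lose between batches. Below is a minimal sketch of one way to collect them programmatically, assuming the parsing warnings are raised as ordinary R warnings (this is not confirmed here). `collect_warnings` and `toy_parser` are hypothetical helpers written for illustration and are not part of `plethy`.

```r
# Evaluate an expression, returning its value together with any warning messages
collect_warnings <- function(expr) {
  warnings_seen <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message, then suppress the default printing of the warning
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = warnings_seen)
}

# Toy function standing in for a parser that emits a warning
toy_parser <- function() {
  warning("toy warning: sample A, break 158")
  "done"
}

result <- collect_warnings(toy_parser())
result$warnings
```

Under the same assumption, the `parse.buxco` call above could be wrapped in `collect_warnings()` and the `warnings` element kept alongside the database.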

# Add Annotation

In [None]:
# Read in the database that was created
August2013_database.db <- makeBuxcoDB(db.name=file.path("./Buxco_Data/Database/August2013_database.db"))

# Add the Day and Break type levels (EXP, ACC, ERR, or UNK)
addAnnotation(August2013_database.db, query=day.infer.query, index=TRUE)
addAnnotation(August2013_database.db, query=break.type.query, index=TRUE)

# Check Break type levels
annoLevels(August2013_database.db)

# Save Parsing Warning, Error, and Unknown Rows

In [None]:
acc.aug2013 <- retrieveData(August2013_database.db, variables=variables(August2013_database.db), Break_type_label = 'ACC')
exp.aug2013 <- retrieveData(August2013_database.db, variables=variables(August2013_database.db), Break_type_label = 'EXP')
err.aug2013 <- retrieveData(August2013_database.db, variables=variables(August2013_database.db), Break_type_label = 'ERR')

# Subset the parsing warning rows in each file
# Example:
# acc.feb2013_warning <- acc.feb2013[which(acc.feb2013[,1] == "16513x16188 f105 FLU" & acc.feb2013[,"Break_number"] == 158),]
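
The commented example above subsets the rows named in a parsing warning but does not write them anywhere. The sketch below shows one way to subset and save such rows. It uses a toy data frame in place of the `retrieveData` output, so the column names `Sample_Name`, `Break_number`, and `Value`, as well as the values themselves, are assumptions for illustration.

```r
# Toy stand-in for a data frame returned by retrieveData(); column names are assumed
acc_toy <- data.frame(
  Sample_Name  = c("16513x16188 f105 FLU", "16513x16188 f105 FLU", "8002x3032 f12 Mock"),
  Break_number = c(158, 159, 158),
  Value        = c(250.1, 248.7, 301.4),
  stringsAsFactors = FALSE
)

# Keep only the rows for the sample and break number named in the warning;
# which() drops any rows where the comparison is NA
acc_toy_warning <- acc_toy[which(acc_toy$Sample_Name == "16513x16188 f105 FLU" &
                                   acc_toy$Break_number == 158), ]

# Write the subset to a CSV file (a temporary file here; choose a real path in practice)
out_file <- tempfile(fileext = ".csv")
write.csv(acc_toy_warning, file = out_file, row.names = FALSE)

# Read the file back to confirm what was saved
saved <- read.csv(out_file)
nrow(saved)
```

The same pattern would apply to the `err.aug2013` rows, which can be written out whole without subsetting.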