# Creating vectors/factors and dataframes
#### We are performing RNA-Seq on cancer samples, each treated with one of three treatments (A, B, and P). There are 12 samples in total, with 4 replicates per treatment. Write the R code you would use to construct the metadata table shown below.

In [31]:
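meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    #store text columns as factors (the default before R 4.0), matching the output shown
    stringsAsFactors = TRUE
)
#name the rows sample1 ... sample12
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))
print(meta)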

         sex stage treatment  myc
sample1    M     I         A 2343
sample2    F    II         A  457
sample3    M    II         A 4593
sample4    F     I         A 9035
sample5    M    II         B 3450
sample6    F    II         B 3524
sample7    M     I         B  958
sample8    F    II         B 1053
sample9    M    II         P 8674
sample10   F     I         P 3424
sample11   M    II         P  463
sample12   F    II         P 5105
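
A quick sanity check can confirm that the table matches the experimental design. The sketch below rebuilds `meta` so that it runs on its own; `stringsAsFactors = TRUE` is an assumption made here so that the text columns are factors on any R version.

```r
# Rebuild the metadata table (text columns stored as factors by assumption)
meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    stringsAsFactors = TRUE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# Each treatment should have 4 replicates
print(table(meta$treatment))

# Three factor columns and one numeric column
print(sapply(meta, class))

# Levels of the stage factor
print(levels(meta$stage))
```

If the counts from `table()` are not all 4, the `rep()` arguments are the first place to look.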


# Subsetting vectors/factors and dataframes
#### Using the meta data frame from question #1, write out the R code you would use to perform the following operations (questions DO NOT build upon each other):

- return only the treatment and sex columns using []
- return the treatment values for samples 5, 7, 9, and 10 using []
- use filter() to return all data for those samples receiving treatment P
- use filter()/select() to return only the stage and treatment columns for those samples with myc > 5000
- remove the treatment column from the dataset using []
- remove samples 7, 8 and 9 from the dataset using []
- keep only samples 1-6 using []
- add a column called pre_treatment to the beginning of the dataframe with the values T, F, F, F, T, T, F, T, F, F, T, T (Hint: use cbind())
- change the names of the columns to: "A", "B", "C", "D"

In [32]:
#return only the treatment and sex columns using []
print(meta[, c("treatment", "sex")])

         treatment sex
sample1          A   M
sample2          A   F
sample3          A   M
sample4          A   F
sample5          B   M
sample6          B   F
sample7          B   M
sample8          B   F
sample9          P   M
sample10         P   F
sample11         P   M
sample12         P   F


In [33]:
#return the treatment values for samples 5, 7, 9, and 10 using []
print(meta[c(5, 7, 9, 10), "treatment"])

[1] B B P P
Levels: A B P
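
Selecting a single column with `[` returns a plain vector (here a factor), which is why the output above lists levels instead of showing a table. A minimal sketch of the difference, assuming `meta` is built with factor columns as in question #1; `as_vector` and `as_frame` are names chosen for illustration only.

```r
# Rebuild the metadata table so the example runs on its own
meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    stringsAsFactors = TRUE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# Default behaviour: a single column is simplified to a vector
as_vector <- meta[c(5, 7, 9, 10), "treatment"]

# drop = FALSE keeps the result as a one-column data frame with row names
as_frame <- meta[c(5, 7, 9, 10), "treatment", drop = FALSE]

print(as_vector)
print(as_frame)
```

Keeping the data frame form is handy when the sample names need to stay attached to the values.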


In [36]:
#use filter() to return all data for those samples receiving treatment P
library(tidyverse)
#equivalent pipe form: meta %>% filter(treatment == "P")
filter(meta, treatment == "P")


sex,stage,treatment,myc
M,II,P,8674
F,I,P,3424
M,II,P,463
F,II,P,5105


In [37]:
#use filter()/select() to return only the stage and treatment columns for those samples with myc > 5000
filter(meta, myc > 5000) %>% select(stage, treatment)


stage,treatment
I,A
II,P
II,P
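
The same result can be obtained in base R with logical indexing, which may be useful when the tidyverse is not available. This is a sketch rather than the required answer; `high_myc` is a hypothetical name, and `meta` is rebuilt here so the block runs alone.

```r
# Rebuild the metadata table so the example runs on its own
meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    stringsAsFactors = TRUE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# Rows: logical test on myc; columns: selected by name
high_myc <- meta[meta$myc > 5000, c("stage", "treatment")]
print(high_myc)
```

Base R indexing keeps the sample names as row names, so the rows can be traced back to samples 4, 9, and 12.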


In [15]:
#remove the treatment column from the dataset using []
  meta[, -3]


         sex stage  myc
sample1    M     I 2343
sample2    F    II  457
sample3    M    II 4593
sample4    F     I 9035
sample5    M    II 3450
sample6    F    II 3524
sample7    M     I  958
sample8    F    II 1053
sample9    M    II 8674
sample10   F     I 3424
sample11   M    II  463
sample12   F    II 5105
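
Dropping a column by position works, but it breaks silently if the column order changes. A hedged alternative is to drop by name; `no_treatment` is a hypothetical name, and `meta` is rebuilt so the block is self-contained.

```r
# Rebuild the metadata table so the example runs on its own
meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    stringsAsFactors = TRUE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# Keep every column whose name is not "treatment"
no_treatment <- meta[, colnames(meta) != "treatment"]
print(colnames(no_treatment))
```

Note that `meta` itself is unchanged; the result only persists if it is assigned to a variable.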


In [16]:
#remove samples 7, 8 and 9 from the dataset using []
meta[-(7:9), ]


         sex stage treatment  myc
sample1    M     I         A 2343
sample2    F    II         A  457
sample3    M    II         A 4593
sample4    F     I         A 9035
sample5    M    II         B 3450
sample6    F    II         B 3524
sample10   F     I         P 3424
sample11   M    II         P  463
sample12   F    II         P 5105


In [17]:
#keep only samples 1-6 using []
meta[1:6, ]


        sex stage treatment  myc
sample1   M     I         A 2343
sample2   F    II         A  457
sample3   M    II         A 4593
sample4   F     I         A 9035
sample5   M    II         B 3450
sample6   F    II         B 3524


In [18]:
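#add a column called pre_treatment to the beginning of the dataframe with 
#the values T, F, F, F, T, T, F, T, F, F, T, T (Hint: use cbind())
#TRUE/FALSE are spelled out because T and F are ordinary variables that can be reassigned
pre_treatment <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
cbind(pre_treatment, meta)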


         pre_treatment sex stage treatment  myc
sample1           TRUE   M     I         A 2343
sample2          FALSE   F    II         A  457
sample3          FALSE   M    II         A 4593
sample4          FALSE   F     I         A 9035
sample5           TRUE   M    II         B 3450
sample6           TRUE   F    II         B 3524
sample7          FALSE   M     I         B  958
sample8           TRUE   F    II         B 1053
sample9          FALSE   M    II         P 8674
sample10         FALSE   F     I         P 3424
sample11          TRUE   M    II         P  463
sample12          TRUE   F    II         P 5105
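
`cbind()` returns a new data frame and leaves `meta` untouched, which fits the note that the questions do not build on each other. The sketch below stores the result under the hypothetical name `meta_pre` and names the new column explicitly; `meta` is rebuilt so the block runs alone.

```r
# Rebuild the metadata table so the example runs on its own
meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    stringsAsFactors = TRUE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

pre_treatment <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
                   FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

# The argument order decides the column order: new column first, then meta
meta_pre <- cbind(pre_treatment = pre_treatment, meta)

print(colnames(meta_pre))
print(ncol(meta))  # the original data frame still has 4 columns
```

Swapping the arguments, `cbind(meta, pre_treatment = pre_treatment)`, would append the column at the end instead.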


In [21]:
#change the names of the columns to: “A”, “B”, “C”, “D”
  colnames(meta) <- c("A", "B", "C", "D")
  print(meta)

         A  B C    D
sample1  M  I A 2343
sample2  F II A  457
sample3  M II A 4593
sample4  F  I A 9035
sample5  M II B 3450
sample6  F II B 3524
sample7  M  I B  958
sample8  F II B 1053
sample9  M II P 8674
sample10 F  I P 3424
sample11 M II P  463
sample12 F II P 5105
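
Assigning to `colnames(meta)` overwrites the names in place. Where the original names should be kept, `setNames()` returns a renamed copy instead. A minimal sketch, with `renamed` as a hypothetical name and `meta` rebuilt so the block is self-contained:

```r
# Rebuild the metadata table so the example runs on its own
meta <- data.frame(
    sex = rep(c("M", "F"), 6),
    stage = rep(c("I", "II", "II"), 4),
    treatment = rep(c("A", "B", "P"), each = 4),
    myc = c(2343, 457, 4593, 9035, 3450, 3524, 958, 1053, 8674, 3424, 463, 5105),
    stringsAsFactors = TRUE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# setNames() returns a copy with new column names; meta keeps its own
renamed <- setNames(meta, c("A", "B", "C", "D"))

print(colnames(renamed))
print(colnames(meta))
```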
