# PyGnocchi

PyGnocchi computes statistical associations using the ADAM genomics analysis platform. The currently supported operations are genome-wide association analyses (GWAS) using linear and logistic models with either dominant or additive assumptions.
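To make the difference between the additive and dominant assumptions concrete, here is a minimal sketch of how a diploid genotype could be turned into a numeric predictor under each one. This is not PyGnocchi's implementation; the function names are hypothetical, and the input is assumed to be a VCF-style `GT` string such as `0/1`.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of the PyGnocchi API.

def alt_allele_count(gt):
    """Count non-reference alleles in a VCF-style GT string ("0/1", "1|1")."""
    alleles = gt.replace("|", "/").split("/")
    # "0" is the reference allele and "." is a missing call; neither counts.
    return sum(1 for a in alleles if a not in ("0", "."))

def encode_additive(gt):
    """Additive coding: the number of alternate-allele copies (0, 1, or 2)."""
    return alt_allele_count(gt)

def encode_dominant(gt):
    """Dominant coding: 1 if at least one alternate allele is present, else 0."""
    return 1 if alt_allele_count(gt) > 0 else 0

genotypes = ["0/0", "0/1", "1/1"]
additive = [encode_additive(g) for g in genotypes]
dominant = [encode_dominant(g) for g in genotypes]
print(additive)  # [0, 1, 2]
print(dominant)  # [0, 1, 1]
```

Under the additive coding each extra alternate allele shifts the predictor by one, while the dominant coding treats heterozygous and homozygous-alternate samples identically.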

## Installation and Usage

(TODO) Make this a pip package

Download the Gnocchi [source code](https://github.com/nathanielparke/gnocchi) and package the project
```
mvn package
```

Start a virtual environment and build the Python files
``` 
virtualenv gnocchi
. gnocchi/bin/activate
mvn -Ppython package
```

Start a Jupyter notebook by running the PyGnocchi notebook script
```
. bin/pygnocchi-notebook
```

## Using GnocchiSession

```
from bdgenomics.gnocchi.gnocchiSession import GnocchiSession
from bdgenomics.gnocchi.linearGnocchiModel import LinearGnocchiModel
from bdgenomics.gnocchi.logisticGnocchiModel import LogisticGnocchiModel
from bdgenomics.gnocchi.regressPhenotypes import RegressPhenotypes
```

```
genotypesPath = "../examples/testData/1snp10samples.vcf"
phenotypesPath = "../examples/testData/10samples1Phenotype.txt"
```

## Create GnocchiSession in Python

`GnocchiSession` handles much of the pipeline work involved in loading and preparing raw genotype and phenotype data.

```
gs = GnocchiSession(spark)
```

```
# Returns CalledVariantDataset which is a Python wrapper
# around a Scala Dataset[CalledVariant]
genos = gs.loadGenotypes(genotypesPath)
phenos = gs.loadPhenotypes(phenotypesPath, "SampleID", "pheno1", "\t")
```
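The arguments to `loadPhenotypes` above name a sample ID column, a phenotype column, and a delimiter. As a rough illustration of what those three values describe, the sketch below reads a delimited phenotype table with the Python standard library. It assumes the file has a header row containing those column names; the `read_phenotypes` helper and the sample rows are hypothetical and are not part of PyGnocchi or its test data.

```python
import csv
import io

def read_phenotypes(handle, sample_col, pheno_col, delimiter):
    """Map each sample ID to its phenotype value (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[sample_col]: float(row[pheno_col]) for row in reader}

# Made-up, in-memory stand-in for a tab-delimited phenotype file.
example = io.StringIO("SampleID\tpheno1\nsample1\t1.5\nsample2\t2.0\n")
phenotypes = read_phenotypes(example, "SampleID", "pheno1", "\t")
print(phenotypes)  # {'sample1': 1.5, 'sample2': 2.0}
```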

## Build a LinearGnocchiModel

We can use the loaded genotypes and phenotypes to build a GnocchiModel, which packages all the GWAS outputs and can be merged with other models.

```
lgm = LinearGnocchiModel.New(spark, genos, phenos, ["AD"], ["GI"])
```
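For intuition about what a linear association test estimates at a single variant, the sketch below fits an ordinary least squares line of phenotype on additive genotype dosage in plain Python. It is a textbook single-predictor regression, not the code that `LinearGnocchiModel` runs, and the dosages and phenotype values are made up for illustration.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        # Every sample has the same genotype, so no slope can be estimated.
        raise ValueError("genotype has no variation; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up additive dosages (alternate-allele counts) and phenotype values.
dosages = [0, 1, 2, 1, 0, 2]
phenotype = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
intercept, slope = simple_linear_regression(dosages, phenotype)
print(intercept, slope)  # 1.0 1.0
```

The slope is the estimated change in phenotype per additional alternate allele, which is the effect a GWAS reports for each variant under the additive assumption.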

## Regress Phenotypes on full data

While GnocchiModels package the operations in a portable form, use `RegressPhenotypes` to see GWAS outputs directly. It takes a single string of arguments, using the same flags as the regular Gnocchi command line, and runs the specified regression.

```
rp = RegressPhenotypes(spark)
rp.apply("../examples/testData/1snp10samples.vcf ../examples/testData/10samples1Phenotype.txt ADDITIVE_LINEAR ../examples/testData/DELETEME -saveAsText -sampleIDName SampleID -phenoName pheno1 -overwriteParquet")
```
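Because the argument string is long, it can be easier to assemble it from named pieces. The helper below is a hypothetical convenience, not part of PyGnocchi; it uses only the positional arguments and flags that appear in the call above and joins them with spaces.

```python
def build_regress_args(genotypes, phenotypes, association_type, output,
                       sample_id_name, pheno_name,
                       save_as_text=True, overwrite=True):
    """Assemble a space-separated argument string (hypothetical helper)."""
    args = [genotypes, phenotypes, association_type, output]
    if save_as_text:
        args.append("-saveAsText")
    args += ["-sampleIDName", sample_id_name, "-phenoName", pheno_name]
    if overwrite:
        args.append("-overwriteParquet")
    return " ".join(args)

arg_string = build_regress_args(
    "../examples/testData/1snp10samples.vcf",
    "../examples/testData/10samples1Phenotype.txt",
    "ADDITIVE_LINEAR",
    "../examples/testData/DELETEME",
    sample_id_name="SampleID",
    pheno_name="pheno1",
)
print(arg_string)
```

The resulting string can then be passed as `rp.apply(arg_string)`. Since the pieces are joined with plain spaces, this sketch assumes that none of the paths or names contain whitespace.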

```
# Verify that the Python run regression is concordant
# with the results of the gnocchi CLI
! bash ../bin/gnocchi-submit regressPhenotypes ../examples/testData/1snp10samples.vcf ../examples/testData/10samples1Phenotype.txt ADDITIVE_LINEAR ../examples/testData/DELETEME2 -saveAsText -sampleIDName SampleID -phenoName pheno1 -overwriteParquet
```

```
Using GNOCCHI_MAIN=org.bdgenomics.gnocchi.cli.GnocchiMain
Using SPARK_SUBMIT=/Users/adithya/spark-2.1.0-bin-hadoop2.7/bin/spark-submit
ADAM invoked with args: regressPhenotypes ../examples/testData/1snp10samples.vcf ../examples/testData/10samples1Phenotype.txt ADDITIVE_LINEAR ../examples/testData/DELETEME2 -saveAsText -sampleIDName SampleID -phenoName pheno1 -overwriteParquet
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/12 17:09:33 INFO SparkContext: Running Spark version 2.2.0
17/11/12 17:09:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/12 17:09:33 INFO SparkContext: Submitted application: regressPhenotypes
17/11/12 17:09:33 INFO SecurityManager: Changing view acls to: adithya
17/11/12 17:09:33 INFO SecurityManager: Changing modify acls to: adithya
17/11/12 17:09:33 INFO SecurityManager: Changing view acls groups to: 
17/11/12 17:09:33 INFO SecurityManager: 
```

17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO Executor: Finished task 43.0 in stage 15.0 (TID 453). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 48.0 in stage 15.0 (TID 458, localhost, executor driver, partition 48, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Running task 48.0 in stage 15.0 (TID 458)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 43.0 in stage 15.0 (TID 453) in 21 ms on localhost (executor driver) (41/200)
1

17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO Executor: Finished task 53.0 in stage 15.0 (TID 463). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 57.0 in stage 15.0 (TID 467, localhost, executor driver, partition 57, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Running task 57.0 in stage 15.0 (TID 467)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 53.0 in stage 15.0 (TID 463) in 23 ms on localhost (executor driver) (50/200)
17/11/12 17:09:44 INFO Executor: Finished task 52.0 in stage 15.0 (TID 462). 4390 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 58.0 in stage 15.0 (TID 468, localhost, executor driver, partition 58, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Running task 58.0 in stage 15.0 (TID 468)
17/11/

17/11/12 17:09:44 INFO TaskSetManager: Finished task 61.0 in stage 15.0 (TID 471) in 36 ms on localhost (executor driver) (63/200)
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO Executor: Finished task 64.0 in stage 15.0 (TID 474). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 71.0 in stage 15.0 (TID 481, localhost, executor driver, partition 71, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 64.0 in stage 15.0 (TID 474) in 33 ms on localhost (executor driver) (64/200)
17/11/12 17:09:44 INFO Executor: Finished task 59.0 in stage 15.0 (TID 469). 4347 bytes result sent to driver
17/11

17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
17/11/12 17:09:44 INFO Executor: Finished task 86.0 in stage 15.0 (TID 496). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO Executor: Finished task 85.0 in stage 15.0 (TID 495). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO Executor: Finished task 84.0 in stage 15.0 (TID 494). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 89.0 in stage 15.0 (TID 499, localhost, executor driver, partition 89, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO TaskSetManager: Starting task 90.0 in stage 15.0 (TID 500, localhost, executor driver, partition 90, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO TaskSetManager: Starting task 91.0 in stage 15.0 (TID 501, localhost, executor driver, partition 91, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Exec

17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 

17/11/12 17:09:44 INFO Executor: Finished task 104.0 in stage 15.0 (TID 514). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 115.0 in stage 15.0 (TID 525, localhost, executor driver, partition 115, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Running task 115.0 in stage 15.0 (TID 525)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 104.0 in stage 15.0 (TID 514) in 40 ms on localhost (executor driver) (108/200)
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0

17/11/12 17:09:44 INFO TaskSetManager: Starting task 124.0 in stage 15.0 (TID 534, localhost, executor driver, partition 124, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 116.0 in stage 15.0 (TID 526) in 39 ms on localhost (executor driver) (117/200)
17/11/12 17:09:44 INFO Executor: Finished task 119.0 in stage 15.0 (TID 529). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO TaskSetManager: Starting task 125.0 in stage 15.0 (TID 535, localhost, executor driver, partition 125, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Finished task 115.0 in stage 15.0 (TID 525). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 126.0 in stage 15.0 (TID 536, localhost, executor driver, partition 126, PROCESS_LOCAL, 5092 bytes)

17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 4
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 57
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 0
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 3
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 2
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 58
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 56
17/11/12 17:09:44 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 10.142.169.233:56322 in memory (size: 23.5 KB, free: 366.2 MB)
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:44 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 54
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 1
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator 55
17/11/12 17:09:44 INFO ContextCleaner: Cleaned accumulator

17/11/12 17:09:44 INFO Executor: Finished task 136.0 in stage 15.0 (TID 546). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 143.0 in stage 15.0 (TID 553, localhost, executor driver, partition 143, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Running task 143.0 in stage 15.0 (TID 553)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 136.0 in stage 15.0 (TID 546) in 24 ms on localhost (executor driver) (136/200)
17/11/12 17:09:44 INFO Executor: Finished task 139.0 in stage 15.0 (TID 549). 4347 bytes result sent to driver
17/11/12 17:09:44 INFO TaskSetManager: Starting task 144.0 in stage 15.0 (TID 554, localhost, executor driver, partition 144, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:44 INFO Executor: Running task 144.0 in stage 15.0 (TID 554)
17/11/12 17:09:44 INFO TaskSetManager: Finished task 139.0 in stage 15.0 (TID 549) in 21 ms on localhost (executor driver) (137/200)
17/11/12 17:09:44 INFO Executor: Finished task

17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 bl

17/11/12 17:09:45 INFO Executor: Finished task 169.0 in stage 15.0 (TID 578). 4347 bytes result sent to driver
17/11/12 17:09:45 INFO TaskSetManager: Starting task 175.0 in stage 15.0 (TID 584, localhost, executor driver, partition 175, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:45 INFO Executor: Running task 175.0 in stage 15.0 (TID 584)
17/11/12 17:09:45 INFO TaskSetManager: Finished task 169.0 in stage 15.0 (TID 578) in 20 ms on localhost (executor driver) (167/200)
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0

17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO Executor: Finished task 185.0 in stage 15.0 (TID 594). 4347 bytes result sent to driver
17/11/12 17:09:45 INFO Executor: Finished task 183.0 in stage 15.0 (TID 592). 4347 bytes result sent to driver
17/11/12 17:09:45 INFO TaskSetManager: Starting task 189.0 in stage 15.0 (TID 598, localhost, executor driver, partition 189, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:45 INFO TaskSetManager: Starting task 190.0 in stage 15.0 (TID 599, localhost, executor driver, partition 190, PROCESS_LOCAL, 5092 bytes)
17/11/12 17:09:45 INFO TaskSetManager: Finished task 185.0 in stage 15.0 (TID 594) in 23 ms on localho

17/11/12 17:09:45 INFO Executor: Running task 148.0 in stage 15.0 (TID 609)
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 200 blocks
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/11/12 17:09:45 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
17/11/12 17:09:45 IN

Finished submitting to Gnocchi.


In [10]:
# Verify the files are identical
! diff ../examples/testData/DELETEME/part-00000-fab16dec-e163-448c-a5f5-be921bf52584-c000.csv ../examples/testData/DELETEME2/part-00000-8cf524fc-b4a4-43a2-85b7-26822a089110-c000.csv

diff: ../examples/testData/DELETEME/part-00000-fab16dec-e163-448c-a5f5-be921bf52584-c000.csv: No such file or directory
diff: ../examples/testData/DELETEME2/part-00000-8cf524fc-b4a4-43a2-85b7-26822a089110-c000.csv: No such file or directory
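
The `diff` above reports `No such file or directory` for both hard-coded part file names. A likely cause is that Spark generates a new identifier for each part file every time it writes, so names copied from an earlier run go stale. A minimal sketch of a comparison that does not depend on those names, assuming each output directory contains CSV part files matching `part-*.csv` (`read_part_lines` and `same_csv_output` are hypothetical helpers, not part of Gnocchi):

```python
import glob
import os


def read_part_lines(output_dir):
    """Return the sorted lines of every part-*.csv file in output_dir."""
    paths = sorted(glob.glob(os.path.join(output_dir, "part-*.csv")))
    if not paths:
        # Fail loudly rather than treating two empty directories as equal.
        raise FileNotFoundError("no part-*.csv files in " + output_dir)
    lines = []
    for path in paths:
        with open(path) as handle:
            lines.extend(line.rstrip("\n") for line in handle)
    return sorted(lines)


def same_csv_output(dir_a, dir_b):
    """True if both directories hold the same CSV rows.

    File names and row order are ignored, because both may vary
    between Spark runs.
    """
    return read_part_lines(dir_a) == read_part_lines(dir_b)
```

In the notebook this could be called as `same_csv_output("../examples/testData/DELETEME", "../examples/testData/DELETEME2")`. Sorting the lines trades strictness for robustness: two outputs that differ only in row order compare as equal.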


## Filter out variants

Beyond loading genotypes and phenotypes, `GnocchiSession` can also filter variants and samples. The Python API mirrors the one exposed in the Scala shell. The cells below check that the filtered Datasets have reasonable properties.

In [106]:
filteredGenos = gs.filterVariants(genos, 0.0, 0.5)
unfilteredGenos = gs.filterVariants(genos, 0.0, 0.0)

In [108]:
assert genos.get().count() != filteredGenos.get().count(), "Counts are same"
assert filteredGenos.get().count() == 0, "Not all items were filtered out"

In [110]:
assert genos.get().count() == unfilteredGenos.get().count(), "Counts differ"
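
To make the thresholds concrete, here is a plain-Python sketch of variant filtering on toy data. It assumes, following the usual GWAS convention, that the two numeric arguments are a maximum per-variant missingness rate and a minimum minor allele frequency (MAF). Whether `filterVariants` interprets its arguments exactly this way is an assumption, and `filter_variants_sketch` and its helpers are hypothetical, not Gnocchi code.

```python
def minor_allele_frequency(allele_counts, ploidy=2):
    """MAF from per-sample alternate allele counts; None marks a missing call."""
    called = [c for c in allele_counts if c is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (ploidy * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def missingness(allele_counts):
    """Fraction of samples with no call at this variant."""
    return sum(c is None for c in allele_counts) / len(allele_counts)


def filter_variants_sketch(variants, max_missing, min_maf):
    """Keep variants with missingness <= max_missing and MAF >= min_maf."""
    return {
        name: counts
        for name, counts in variants.items()
        if missingness(counts) <= max_missing
        and minor_allele_frequency(counts) >= min_maf
    }


toy = {
    "rs1": [0, 1, 2, 1],      # alt frequency 4/8 = 0.5, no missing calls
    "rs2": [0, 0, 1, None],   # alt frequency 1/6, one of four calls missing
}
print(sorted(filter_variants_sketch(toy, 0.0, 0.0)))  # ['rs1']
print(sorted(filter_variants_sketch(toy, 0.5, 0.1)))  # ['rs1', 'rs2']
```

Under these assumed semantics, a missingness threshold of `0.0` only removes variants that have missing calls, which would be consistent with the unfiltered count matching the original count when the test VCF is fully called.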

## Another example with filter samples

Here we replicate the Gnocchi Scala example that uses the time dataset (`time_genos_1.vcf` and `tab_time_phenos_1.txt`), since it exercises the full suite of `GnocchiSession` functionality. We load the genotypes and phenotypes, filter the samples, and pass the result into a variant filter. Finally, we access the underlying Dataset to verify the output.

In [114]:
genotypesPath = "../examples/testData/time_genos_1.vcf"
phenotypesPath = "../examples/testData/tab_time_phenos_1.txt"

In [115]:
geno = gs.loadGenotypes(genotypesPath)
pheno = gs.loadPhenotypes(phenotypesPath, "IID", "pheno_1", "\t", phenotypesPath, ["pheno_4", "pheno_5"])

In [117]:
filteredGenos = gs.filterSamples(geno, 0.1, 2)
filteredGenosVariants = gs.filterVariants(filteredGenos, 0.1, 0.1)

In [125]:
assert filteredGenosVariants.get().head().position() == 75094266
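
The sample-filtering step can be pictured with a similar toy sketch. It assumes that the first numeric argument to `filterSamples` is a maximum per-sample missingness rate. That reading is an assumption rather than something confirmed here, the ploidy argument is left out, and `filter_samples_sketch` is a hypothetical helper, not Gnocchi code.

```python
def sample_missingness(calls):
    """Fraction of variants at which this sample has no call (None)."""
    return sum(c is None for c in calls) / len(calls)


def filter_samples_sketch(samples, max_missing):
    """Keep samples whose missingness rate is at most max_missing."""
    return {
        sample_id: calls
        for sample_id, calls in samples.items()
        if sample_missingness(calls) <= max_missing
    }


toy_samples = {
    "s1": [0, 1, 2, 0, 1, 1, 0, 2, 1, 0],        # no missing calls
    "s2": [0, None, 2, None, 1, 1, 0, 2, 1, 0],  # 2 of 10 calls missing
}
kept = filter_samples_sketch(toy_samples, 0.1)
print(sorted(kept))  # ['s1']
```

Filtering samples before variants matters because per-variant statistics such as MAF are then computed only over the samples that remain.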