VEP run from within the pipeline fails for GRCh38 and VEP99 #196

Closed
NLSVTN opened this issue Feb 21, 2020 · 2 comments

NLSVTN commented Feb 21, 2020

I can run VEP successfully locally with the very same command that fails from within my current Hail 0.2 pipeline:

/vep/ensembl-tools-release-99/vep -i batch1.vcf --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --verbose --assembly GRCh38 --dir_cache /var/lib/spark/vep/vep_cache --fasta /vep/homo_sapiens/GRCh38/hg38.fa --plugin LoF,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o vep_output

The output from the pipeline is very uninformative: when mt.write() happens, VEP fails, the command is printed (the same as above, but without -i batch1.vcf and with -o STDOUT), and the exit code is 2. Here is the output:

ERROR: [pid 4129] Worker Worker(...)
Traceback (most recent call last):
  File "/.conda/envs/py37/lib/python3.7/site-packages/luigi/worker.py", line 199, in run
    new_deps = self._run_get_new_deps()
  File "/.conda/envs/py37/lib/python3.7/site-packages/luigi/worker.py", line 141, in _run_get_new_deps
    task_gen = self.task.run()
  File "/hail-elasticsearch-pipelines/luigi_pipeline/seqr_loading.py", line 54, in run
    self.read_vcf_write_mt()
  File "/hail-elasticsearch-pipelines/luigi_pipeline/seqr_loading.py", line 90, in read_vcf_write_mt
    mt.write(self.output().path, overwrite=True)
  File "</.conda/envs/py37/lib/python3.7/site-packages/decorator.py:decorator-gen-1036>", line 2, in write
  File "/hail-elasticsearch-pipelines/hail_builds/v02/hail-0.2-3a68be23cb82d7c7fb5bf72668edcd1edf12822e.zip/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/hail-elasticsearch-pipelines/hail_builds/v02/hail-0.2-3a68be23cb82d7c7fb5bf72668edcd1edf12822e.zip/hail/matrixtable.py", line 2508, in write
    Env.backend().execute(MatrixWrite(self._mir, writer))
  File "/hail-elasticsearch-pipelines/hail_builds/v02/hail-0.2-3a68be23cb82d7c7fb5bf72668edcd1edf12822e.zip/hail/backend/backend.py", line 109, in execute
    result = json.loads(Env.hc()._jhc.backend().executeJSON(self._to_java_ir(ir)))
  File "/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/hail-elasticsearch-pipelines/hail_builds/v02/hail-0.2-3a68be23cb82d7c7fb5bf72668edcd1edf12822e.zip/hail/utils/java.py", line 225, in deco
    'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
hail.utils.java.FatalError: HailException: VEP command '/vep/ensembl-tools-release-99/vep --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/GRCh38/hg38.fa --plugin LoF,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o STDOUT' failed with non-zero exit status 2
  VEP Error output:


Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 7.0 failed 4 times, most recent failure: Lost task 2.3 in stage 7.0 (TID 3017, 137.187.60.61, executor 2):
is.hail.utils.HailException: VEP command '/vep/ensembl-tools-release-99/vep --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/GRCh38/hg38.fa --plugin LoF,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o STDOUT' failed with non-zero exit status 2
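
For what it's worth, the in-pipeline command has no -i flag, so presumably Hail feeds the VCF on stdin. Something like the following should approximate the failing call outside the pipeline (a sketch; flags abbreviated, and the stdin assumption is mine):

```bash
# Approximate what the pipeline runs (assumes VEP reads the VCF from stdin;
# substitute the full flag list from the command above).
cat batch1.vcf | /vep/ensembl-tools-release-99/vep --format vcf --json \
    --offline --cache --dir_cache /vep/vep_cache --assembly GRCh38 -o STDOUT
```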

Exit code 2 may be related to permissions, but I have already changed the permissions on all VEP folders, subfolders, and files to 777. I checked the Hadoop file permissions too; they are fine. I am not sure how to proceed with debugging. Could you suggest anything else I could look into?
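
For the record, roughly the commands I used for those checks (the HDFS path is a placeholder):

```bash
chmod -R 777 /vep                    # open up every VEP folder, subfolder, and file
chmod -R 777 /var/lib/spark/vep      # same for the local VEP cache directory
hdfs dfs -ls -R /user/hadoop/data    # placeholder path: spot-check HDFS permissions
```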

I checked Hadoop with hdfs dfsadmin -report: there are 200 GB available (DFS Remaining%: 20.37%), but cache usage is 100%. I am not sure whether that is OK, but Hadoop otherwise seems to be working well.
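
The check itself, for anyone who wants to repeat it (the grep pattern just picks out the two fields mentioned above):

```bash
hdfs dfsadmin -report | egrep 'DFS Remaining%|Cache Used%'
```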

I also asked on the Hail forum, but they are not sure either:

https://discuss.hail.is/t/no-vep-debug-output/1302/6

I am using the new VEP99.

NLSVTN commented Feb 24, 2020

I am wondering whether there is any way to redirect the VEP error output to a file to see what is wrong.
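
One idea, sketched with made-up names (vep_wrapper.sh and the log path are hypothetical): point the pipeline's VEP configuration at a small wrapper script instead of the vep executable, and have the wrapper capture stderr:

```bash
#!/bin/bash
# vep_wrapper.sh (hypothetical): forward all arguments to the real VEP
# executable while appending its stderr to a log file for inspection.
exec /vep/ensembl-tools-release-99/vep "$@" 2>>/tmp/vep_stderr.log
```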

NLSVTN commented Mar 2, 2020

It seems the Hail team has solved this issue; the bug was in Hail itself, where some debug output was not written to the Hail log:

hail-is/hail#8146

I was able to look into the Spark work directory logs and gradually get it working. A number of packages (via Red Hat yum and cpanm) had to be installed manually, and the PERL5LIB environment variable had to be set, before VEP would start. Not as easy as it used to be. So, the pipeline now works fine with VEP99.
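
To give a flavour of what was missing (illustrative only; the exact package list depends on the VEP/LOFTEE installation):

```bash
yum install -y perl-core gcc zlib-devel        # build prerequisites for the Perl modules
cpanm DBI DBD::SQLite Bio::DB::HTS             # modules VEP and LOFTEE commonly require
export PERL5LIB=/vep/loftee_grch38:$PERL5LIB   # so Perl can find the LoF plugin
```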
