FastaFile iteration fails with multiple processes #409
Hi, I am the original poster of the question in the Google group. I have done a bit of testing on my own to narrow down the problem and get closer to the underlying issue. I noticed that pysam fails with multiple processes if I pass the full path to the genome to the function I want to run in parallel. However, if I change the working directory inside the function and open the FastaFile with string formatting, the processes read the file correctly and everything works as expected. Below is a small example of the issue, in the hope that it is helpful.

This code, which passes the complete path to the genome, fails to open the file:

```python
import pysam as ps
import multiprocessing as mp
import os

genome_fa = "/home/inigo/msc_thesis/genome_data/hg38.fa"
#working_dir = '/home/inigo/msc_thesis/genome_data/'
number_of_cores = 3

def get_fasta(genome_fasta):
    """Fetch a fasta sequence from a genome."""
    # change working directory
    #os.chdir(working_dir)
    # some coordinates
    chr1 = "chr1"
    start = 200000
    end = 200050
    # open the file
    fastafile = ps.FastaFile("%s" % genome_fasta)
    # get the sequence
    fasta = fastafile.fetch(chr1, start, end)
    print(fasta)
    return None

# test the function in non-parallel mode
print(get_fasta(genome_fa))

# run the function in parallel
if __name__ == '__main__':
    jobs = []
    # start the processes
    for i in range(number_of_cores):
        print(i)
        p = mp.Process(target=get_fasta, args=(genome_fa))
        jobs.append(p)
        p.start()
    # wait for the processes to finish
    for p in jobs:
        p.join()
```

This is the error output:
This code, which passes the working directory to the function and changes into it inside the function, works as expected:

```python
import pysam as ps
import multiprocessing as mp
import os

genome_fa = "hg38.fa"
working_dir = '/home/inigo/msc_thesis/genome_data/'
number_of_cores = 3

def get_fasta(genome_fasta, working_dir):
    """Fetch a fasta sequence from a genome."""
    # change working directory
    os.chdir(working_dir)
    # some coordinates
    chr1 = "chr1"
    start = 200000
    end = 200050
    # open the file
    fastafile = ps.FastaFile("%s" % genome_fasta)
    # get the sequence
    fasta = fastafile.fetch(chr1, start, end)
    print(fasta)
    return None

# test the function in non-parallel mode
print(get_fasta(genome_fa, working_dir))

# run the function in parallel
if __name__ == '__main__':
    jobs = []
    # start the processes
    for i in range(number_of_cores):
        print(i)
        p = mp.Process(target=get_fasta, args=(genome_fa, working_dir))
        jobs.append(p)
        p.start()
    # wait for the processes to finish
    for p in jobs:
        p.join()
```

With the following expected output:
I hope this is helpful for the developers or for anybody with the same problem. Best,
Hi, it took a while, but the issue is a typo. Use `p = mp.Process(target=get_fasta, args=(genome_fa,))` and note the trailing `,`, which ensures you pass a tuple.
see https://groups.google.com/forum/#!topic/pysam-user-group/bRPZoGQEcLc
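To make the typo concrete, here is a minimal sketch (not from the thread; `get_args` is a made-up helper) of how `multiprocessing` treats the `args` parameter:

```python
import multiprocessing as mp

def get_args(*args):
    # Report how many positional arguments the child process received.
    print(len(args), "positional argument(s):", args)

if __name__ == "__main__":
    genome_fa = "hg38.fa"

    # Without the trailing comma, (genome_fa) is just the string itself,
    # so multiprocessing unpacks it into one argument per character and
    # the target is called as get_args('h', 'g', '3', '8', '.', 'f', 'a').
    p = mp.Process(target=get_args, args=(genome_fa))
    p.start()
    p.join()

    # With the trailing comma, (genome_fa,) is a one-element tuple and
    # the target is called as get_args('hg38.fa').
    p = mp.Process(target=get_args, args=(genome_fa,))
    p.start()
    p.join()
```

Since `get_fasta` takes a single parameter, the character-by-character call presumably raises a `TypeError` in each child process before the fasta file is ever opened; the second script worked not because of the working directory but because `args=(genome_fa, working_dir)` already happens to be a valid tuple.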