Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate sequence outputs after HMMER #149

Closed
linzhi2013 opened this issue Jun 7, 2022 · 4 comments
Closed

duplicate sequence outputs after HMMER #149

linzhi2013 opened this issue Jun 7, 2022 · 4 comments

Comments

@linzhi2013
Copy link
Owner

E004/mt_assembly/megahit/E004.megahit.hmmtblout.besthit.sim:

k141_3217       ATP6    1       656     387     1055    5126    +       abun=44.9709
**k141_3036**       ATP6    1       90      1623    1728    1822    +       abun=16.6466
**k141_3036**       ATP8    1       62      1480    1564    1822    +       abun=16.6466
**k141_949**        COX1    10      1318    1320    1       1954    -       abun=30.6360
k141_3036       COX1    814     1532    3       731     1822    +       abun=16.6466
k141_1118       COX1    10      296     40      340     342     +       abun=6.0000
k141_3036       COX2    4       640     731     1378    1822    +       abun=16.6466
k141_3217       COX2    528     654     5       153     5126    +       abun=44.9709
k141_3217       COX3    8       785     1057    1845    5126    +       abun=44.9709
k141_597        CYTB    204     1094    1125    214     1125    -       abun=24.4634
k141_3758       CYTB    482     920     3       469     528     +       abun=5.0000
k141_1726       CYTB    55      440     411     4       916     -       abun=28.0000
k141_2817       ND1     25      876     681     1560    1781    +       abun=24.0000
**k141_949**        ND2     673     726     1734    1641    1954    -       abun=30.6360
k141_2118       ND2     673     724     285     194     504     -       abun=8.0303
k141_163        ND2     75      376     518     166     2425    -       abun=57.0000
k141_491        ND2     458     506     202     112     362     -       abun=9.0000
k141_3217       ND2     458     506     4766    4677    5126    -       abun=44.9709
k141_3217       ND3     114     347     1969    2214    5126    +       abun=44.9709
k141_3823       ND3     144     345     353     151     468     -       abun=22.0673
k141_3217       ND4     394     1165    5125    4322    5126    -       abun=44.9709
k141_491        ND4     586     918     362     11      362     -       abun=9.0000
k141_1408       ND4     308     410     535     663     663     +       abun=34.0000
k141_1408       ND4L    56      110     39      140     663     +       abun=34.0000
k141_3217       ND5     142     1087    4059    3019    5126    -       abun=44.9709
k141_1726       ND6     158     225     751     644     916     -       abun=28.0000

E004/mt_assembly/megahit/E004.megahit.hmmtblout.besthit.sim.fa:

>k141_491 abun=9.0000
>k141_597 abun=24.4634
**>k141_3036 abun=16.6466**
>k141_1118 abun=6.0000
>k141_163 abun=57.0000
>k141_2118 abun=8.0303
>k141_1726 abun=28.0000
>k141_3217 abun=44.9709
>k141_3758 abun=5.0000
>k141_2817 abun=24.0000
>k141_1408 abun=34.0000
>k141_3823 abun=22.0673
**>k141_949 abun=30.6360**
**>k141_949 abun=30.6360**
@linzhi2013
Copy link
Owner Author

mitoz/findmitoscaf/script/extract_fasta.py

	count = len(seqids)
	firstline = 1
	seqti  = ""
	seq = ""
	for i in fh:
		#i = i.strip()
		if i.startswith(">"):
			if firstline:
				firstline = 0
			else:
				if seqid in seqids:
					print(seqti, seq, sep="", end="", file=fhout)
					count -= 1
					if count <= 0: # the bug is here. we should reset seqti before the 'break' clause!! 
						break

			seqti = i
			seqid = i.strip().split()[0]
			seqid = seqid.replace(">", "")
			seq = ""

		else:
			seq += i

	# the last seq
	if seqid in seqids:
		print(seqti, seq, sep="", end="", file=fhout)

@linzhi2013
Copy link
Owner Author

linzhi2013 commented Jun 7, 2022

this bug does not happen, as long as the last sequence in the input fasta file is not what we want.

will be fixed in the next release. For now, you can modify the script (mitoz/findmitoscaf/script/extract_fasta.py) manually:

					if count <= 0:
						seqti = seq = ''
						break

You have to use the <TAB> key for indentation

@linzhi2013
Copy link
Owner Author

the issue here: #130

@linzhi2013
Copy link
Owner Author

fixed in MitoZ 3.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant