Functional Annotation: Blast Output vs Krona Output #225

Open
Kyxsune opened this issue Jul 16, 2015 · 2 comments

Kyxsune commented Jul 16, 2015

Hey Guys,

Thanks for helping me out with my previous issue. This one comes up after running the pipeline successfully (awesome, btw). I ran the functional annotation with BLAST and got a lot of annotated scaffolds, but only a fraction of those are reported in the Krona output. I was wondering why this is the case, and whether I was doing something wrong (as usual).

For comparison the entire krona.ec.input file:
D1BS28 6.3.2.n2 0.0
Q59800 1.2.1.12 2e-132
Q5YTD8 4.-.-.- 1e-154
C5C5T3 4.2.3.5 3e-123
C5C687 6.3.5.5 0.0
C5C692 2.7.7.6 5e-34
B8H6T9 2.5.1.9 2e-34
Q6AF75 3.6.1.31 6e-26
A1RAW1 3.5.4.13 1e-91
P0CH00 1.17.4.1 0.0
Q9CBQ2 1.17.4.1 3e-166
A0R7G6 5.5.1.4 3e-166
C5C0K2 2.7.7.6 0.0
C5C0G1 2.7.7.6 5e-161
C5C0S6 1.6.99.5 2e-80
Q9X8I9 3.6.1.1 1e-62
C5C046 6.1.1.10 0.0
A6WDJ3 3.6.5.n1 0.0
C5C3D1 4.2.1.33 6e-100
C5C2I9 1.1.1.85 8e-139
C5C2I2 1.1.1.86 5e-27
C5C2I2 1.1.1.86 1e-94
A0JXZ9 4.2.1.9 0.0
C5BWR2 2.7.7.8 0.0
Q0SEI1 2.1.1.45 1e-129
C5C1U6 3.6.3.14 0.0
A6W7G9 3.6.3.14 0.0
A1SNY6 4.6.1.12 1e-07
C5C093 4.2.1.11 0.0
C5C1C4 2.7.1.30 0.0
A0JUS8 1.17.7.1 0.0
B2GKD7 6.3.4.5 0.0
A1R559 4.2.1.19 2e-75
P03631 3.1.21.-;6.5.1.1 2e-160

And a mere portion of the blast.out file:
scaffold_2_106306_106737_+ B7GQS6 74.02 127 33 0 16 142 10 136 4e-51 199
scaffold_2_136137_137282_+ O07147 55.28 360 154 4 1 358 18 372 6e-86 317
scaffold_2_138042_139004_+ Q5YTD8 90.72 291 27 0 29 319 16 306 1e-154 545
scaffold_2_139698_140882_+ O58489 34.76 374 238 4 1 370 17 388 5e-58 225
scaffold_2_140884_141510_+ A1SJA2 66.01 203 65 1 3 205 1 199 9e-71 266
scaffold_2_141595_142350_+ C5C5Q3 84.06 251 39 1 1 250 2 252 7e-113 406
scaffold_2_142357_143103_+ A4FBA6 68.39 155 49 0 59 213 1 155 2e-47 189
scaffold_2_143224_143832_+ C5C5Q7 70.62 177 47 1 1 177 6 177 8e-49 193
scaffold_2_143846_144850_+ Q47N43 76.85 311 70 1 23 333 32 340 1e-118 426
scaffold_2_144946_145359_+ P65026 43.24 74 39 1 4 77 10 80 4e-07 53.5
scaffold_2_145447_147390_+ Q53955 50.94 373 172 5 190 557 198 564 1e-74 281
scaffold_2_145447_147390_+ Q53955 41.76 91 50 2 39 126 8 98 7e-12 72.8
scaffold_2_147468_148493_+ Q53956 48.64 294 147 3 1 293 27 317 5e-61 234
scaffold_2_148514_149053_+ A5CSP7 58.60 157 64 1 21 177 20 175 8e-31 132
scaffold_2_149197_151521_+ P52560 73.44 768 186 4 16 765 77 844 0.0 1159
scaffold_2_154038_154814_- P46697 44.85 136 58 5 97 221 129 258 3e-22 105
scaffold_2_154966_155667_+ Q49649 35.06 231 130 5 3 232 6 217 2e-20 99.0
scaffold_2_155673_157043_+ Q6A8J7 60.22 455 172 4 1 451 2 451 6e-141 501
scaffold_2_157094_157618_- P30536 47.54 122 62 1 50 169 33 154 5e-17 87.0
scaffold_2_157724_158791_+ P65024 66.56 299 96 1 1 295 2 300 2e-88 325
scaffold_2_158800_160830_+ C5C5S1 78.92 593 115 4 66 648 1 593 0.0 867
scaffold_2_164023_164610_+ Q5NZT6 42.13 197 99 3 1 184 29 223 4e-28 124
scaffold_2_165763_168012_- O88022 49.33 223 113 0 524 746 466 688 1e-44 182
scaffold_2_165763_168012_- O88022 38.21 301 156 5 44 344 35 305 3e-21 103
scaffold_2_168117_169550_+ Q50739 67.63 207 67 0 260 466 241 447 3e-70 266
scaffold_2_168117_169550_+ Q50739 62.42 165 51 3 2 164 9 164 4e-41 169
scaffold_2_169757_170383_+ C5C5S5 78.26 207 45 0 1 207 2 208 5e-88 323
scaffold_2_171490_174186_+ A6WCF7 64.13 895 312 3 1 895 2 887 0.0 938
scaffold_2_174255_174818_+ Q9KXQ0 54.84 155 63 2 2 156 16 163 3e-25 114
scaffold_2_174953_175978_+ O34758 27.98 336 204 9 14 325 25 346 9e-25 114
scaffold_2_176002_176868_+ Q5SJF8 49.09 110 51 3 1 110 7 111 2e-16 86.3
scaffold_2_176986_178659_+ P45792 44.38 525 285 4 36 554 42 565 3e-122 439
scaffold_2_178964_180139_+ P24559 53.73 335 153 2 20 354 12 344 1e-97 356
scaffold_2_180190_181365_+ P22609 36.77 378 231 2 13 389 3 373 4e-65 248
scaffold_2_184930_185724_+ P45794 36.73 275 144 4 9 259 18 286 6e-23 107
scaffold_2_185809_186873_+ P57309 22.40 183 118 2 142 300 160 342 9e-06 51.6
scaffold_2_188478_189668_+ C5C5T3 80.51 395 76 1 1 395 2 395 3e-123 441
scaffold_2_189761_190237_+ Q47QY8 51.13 133 60 1 2 134 34 161 2e-20 98.2
scaffold_2_190288_191364_+ B8H8V5 68.12 345 106 1 1 345 22 362 3e-124 445
scaffold_2_192870_193520_+ A0JX80 79.68 187 38 0 29 215 1 187 2e-86 318
scaffold_2_193522_193938_+ A0JX79 62.04 137 50 1 1 137 2 136 6e-34 142
scaffold_2_194135_194728_+ Q9KXR1 66.28 172 55 1 20 191 14 182 5e-46 183
scaffold_2_194776_195792_+ C5C683 77.93 299 58 3 1 298 19 310 6e-116 417
scaffold_2_195846_197141_+ Q9KXR3 75.39 386 95 0 27 412 40 425 7e-166 583
scaffold_2_197782_198906_+ Q9KXR5 72.29 350 78 3 1 350 45 375 5e-142 504
scaffold_2_199011_202229_+ C5C687 80.53 1063 199 2 1 1058 37 1096 0.0 1605
scaffold_2_199011_202229_+ C5C687 33.66 407 240 11 521 914 9 398 2e-38 162
scaffold_2_202259_203083_+ Q9KXR8 68.21 151 48 0 2 152 14 164 2e-46 186
scaffold_2_203719_204294_+ Q47R17 65.93 182 61 1 6 187 14 194 6e-56 216
scaffold_2_204330_204596_+ C5C692 86.75 83 11 0 1 83 2 84 5e-34 142
scaffold_2_204613_205827_+ P67733 59.61 406 157 5 1 401 10 413 2e-71 269
scaffold_2_205857_207062_+ B1W470 74.23 388 96 2 5 389 4 390 9e-160 563
scaffold_2_207068_209266_+ Q9CCQ3 50.00 428 198 5 315 730 227 650 1e-46 188
scaffold_2_207068_209266_+ Q9CCQ3 59.62 104 42 0 53 156 2 105 1e-19 98.6
scaffold_2_209284_209883_+ Q8K370 32.65 98 59 3 82 176 136 229 1e-05 49.7
scaffold_2_209895_210836_+ Q827P7 65.70 309 104 2 1 308 2 309 4e-106 384
scaffold_2_210938_212320_+ P71675 49.89 459 205 8 1 459 24 457 2e-53 210
scaffold_2_212385_213074_+ Q9L0Z5 63.89 216 78 0 9 224 3 218 4e-64 244
scaffold_2_213340_214440_+ P71677 52.09 311 143 3 47 356 29 334 4e-53 208
scaffold_2_214455_215102_+ P65327 63.07 176 60 3 27 198 23 197 3e-37 154
scaffold_2_215129_216403_+ A8LY38 57.31 424 160 4 1 422 4 408 1e-122 439
scaffold_2_216487_216870_+ B8H6T9 84.09 88 12 1 22 109 58 143 2e-34 144
scaffold_2_217653_217916_+ Q6AF75 82.56 86 15 0 1 86 2 87 6e-26 115
scaffold_2_217954_218802_+ B8H6U1 68.31 284 86 2 1 281 2 284 1e-95 349
scaffold_2_218812_219309_+ P64848 35.90 78 45 1 64 141 59 131 6e-04 43.5
scaffold_2_220700_221743_+ P54076 44.60 213 113 1 8 215 7 219 1e-34 147
scaffold_2_221778_222716_+ P54075 35.69 283 165 8 1 274 15 289 2e-27 122
scaffold_2_223173_223502_+ C0ZZU1 59.46 37 15 0 21 57 20 56 2e-06 51.6
scaffold_2_223510_224325_+ P66895 33.57 283 165 6 3 269 15 290 5e-30 131
scaffold_2_224343_225248_+ O53526 36.60 235 132 6 67 297 85 306 2e-26 119
scaffold_2_225258_228137_+ Q10701 47.16 634 320 5 314 940 263 888 8e-113 409
scaffold_2_225258_228137_+ Q10701 69.20 224 66 2 42 264 14 235 5e-81 303
scaffold_2_230205_231857_+ Q8D124 27.52 505 320 17 36 517 10 491 1e-20 102
scaffold_2_231872_232636_+ O14466 37.56 221 124 6 2 216 6 218 3e-33 142
scaffold_2_233228_234325_- Q82PX1 51.23 367 166 5 1 362 2 360 3e-84 312
scaffold_2_234475_235035_+ P54570 25.71 175 121 4 4 176 7 174 2e-06 52.4
scaffold_2_236621_236980_+ P28267 43.14 51 29 0 48 98 48 98 3e-05 47.0
scaffold_2_238270_239031_+ O66489 37.25 255 156 3 1 252 3 256 6e-38 157
scaffold_2_240898_241617_- A3PSZ4 42.31 78 44 1 12 89 59 135 7e-05 47.8
scaffold_2_241915_242858_- O31020 23.68 321 213 10 3 309 3 305 2e-15 83.6
scaffold_2_243704_244156_+ P67748 44.09 127 71 0 6 132 2 128 2e-21 100
scaffold_2_244182_245135_+ P77735 32.22 329 168 12 6 309 10 308 1e-26 120
scaffold_2_245174_245539_+ O06008 42.34 111 62 1 7 115 4 114 1e-15 81.6
scaffold_2_259771_260640_- P64982 29.76 168 98 7 51 203 68 230 1e-04 47.8
scaffold_2_261187_261978_- P69167 49.38 243 112 4 3 245 4 235 4e-48 191
scaffold_2_262092_262639_+ P39897 40.98 61 36 0 23 83 23 83 7e-06 50.1
scaffold_2_263729_264412_+ P25150 27.59 87 59 1 49 131 78 164 1e-04 47.0
scaffold_2_264521_265990_+ P39886 30.07 439 277 5 23 448 25 446 2e-20 100
scaffold_2_265995_266948_+ P96662 37.63 287 176 2 28 313 3 287 1e-55 216
scaffold_2_267305_268558_+ P55183 25.48 208 108 2 200 386 199 380 4e-08 59.7
scaffold_2_268573_269214_+ P55184 38.86 211 123 3 1 211 12 216 2e-22 105
scaffold_2_271028_271573_- Q11063 30.64 173 83 6 16 178 19 164 4e-04 44.3
scaffold_2_271626_272462_+ O05730 34.83 201 131 0 1 201 8 208 5e-25 115
scaffold_2_273123_274223_+ Q8MZR6 29.14 405 237 15 1 361 43 441 7e-30 131
scaffold_2_274333_275829_+ P0A0J9 28.48 302 216 0 38 339 20 321 2e-08 61.2
scaffold_2_280147_281052_+ O52866 28.62 290 182 8 19 295 5 282 3e-13 75.9
scaffold_2_282402_283583_- O53656 54.24 59 27 0 315 373 291 349 2e-09 63.5
scaffold_2_284145_285140_+ B6EH86 37.25 298 166 7 5 289 3 292 5e-50 198
scaffold_2_285266_286543_+ Q0H904 33.33 168 99 4 93 252 75 237 7e-19 95.5
scaffold_2_287303_288142_+ Q60283 27.36 201 136 4 34 225 10 209 1e-13 77.4
scaffold_2_288399_289117_- P67672 74.42 129 33 0 34 162 64 192 2e-52 205
scaffold_2_289259_289822_- Q50604 43.86 114 63 1 54 167 50 162 2e-19 95.1

scaffold_3_10979_11410_+ P37424 37.70 122 70 3 1 119 13 131 1e-10 65.1
scaffold_3_11468_13114_- P54744 45.27 243 119 7 35 271 7 241 4e-40 166
scaffold_3_14611_15516_- Q8A1M1 43.50 200 113 0 54 253 2 201 8e-44 177
scaffold_3_18605_19567_- Q58619 28.07 171 108 4 79 242 15 177 5e-10 65.5
scaffold_3_20645_22375_+ P38569 60.33 552 207 3 20 571 4 543 3e-173 608
scaffold_3_22732_23409_+ Q88A30 30.87 230 138 5 6 219 13 237 1e-12 73.2
scaffold_3_29859_31631_+ P07003 47.06 561 285 4 22 582 21 569 2e-127 456
scaffold_3_31675_32421_- P39367 55.33 244 109 0 4 247 5 248 1e-75 283
scaffold_3_32639_34021_- Q46892 38.61 417 239 4 52 458 44 453 3e-47 189
scaffold_3_34071_34652_- P46859 45.95 148 78 2 31 176 10 157 3e-33 140
scaffold_3_34723_35436_+ P31460 33.65 211 124 6 12 215 12 213 3e-10 65.5
scaffold_3_37737_38447_+ P41780 33.04 115 67 3 130 234 225 339 4e-09 62.0
scaffold_3_38942_40348_+ Q797A7 26.74 344 224 7 10 335 18 351 2e-16 87.4
scaffold_3_43031_43843_+ Q04605 33.33 102 68 0 122 223 6 107 8e-13 74.3
scaffold_3_43922_44386_+ P94562 42.57 148 84 1 1 148 3 149 8e-16 82.8
scaffold_3_44966_46180_- P10482 22.01 368 247 10 21 351 54 418 1e-11 71.2
scaffold_3_46228_47028_- Q9KEE9 37.17 191 117 2 24 211 35 225 4e-29 128
scaffold_3_47106_48086_- O32155 35.58 267 168 3 62 324 26 292 3e-28 126
scaffold_3_48184_49443_- Q8FVS7 26.33 376 245 15 41 400 35 394 5e-16 85.9
scaffold_3_49592_50578_- P24242 29.76 289 188 6 1 281 12 293 2e-19 96.7
scaffold_3_50712_51731_+ P82594 57.14 287 111 3 4 288 52 328 1e-84 313
scaffold_3_51783_57656_- Q8NXX6 30.94 265 142 6 1537 1801 793 1016 1e-19 100
scaffold_3_51783_57656_- Q8NXX6 30.27 294 155 8 1527 1820 560 803 5e-19 98.6
scaffold_3_51783_57656_- Q8NXX6 27.68 271 137 8 1536 1797 902 1122 1e-12 77.0
scaffold_3_51783_57656_- Q8NXX6 28.85 260 143 8 1537 1795 682 900 2e-12 76.6
scaffold_3_61532_62455_- Q9KP71 32.84 204 129 3 85 286 7 204 3e-12 72.4
scaffold_3_62861_63664_- Q8XH28 33.82 68 45 0 54 121 54 121 2e-06 53.1
scaffold_3_66079_66972_- P77716 29.25 253 177 1 46 296 27 279 7e-15 81.3
scaffold_3_66996_67940_- O32155 35.51 245 151 4 46 288 30 269 2e-36 152
scaffold_3_68011_69276_- O32156 21.91 388 253 14 46 411 52 411 3e-07 56.6
scaffold_3_69465_70466_+ Q65TP0 29.48 329 226 4 2 326 3 329 1e-34 147
scaffold_3_70676_71749_+ Q9X1E2 42.75 138 58 1 200 316 372 509 1e-19 97.8
scaffold_3_73807_75186_- P05656 47.37 323 151 9 25 335 35 350 2e-78 293
scaffold_3_75188_76174_- O31520 31.80 239 143 4 94 326 72 296 6e-29 128
scaffold_3_76176_77123_- O32155 32.67 251 164 4 68 314 42 291 5e-21 101
scaffold_3_78663_79688_- Q87QW9 30.77 338 220 7 1 334 3 330 1e-26 120
scaffold_3_81051_81473_+ Q52996 32.89 76 47 1 22 97 26 97 1e-04 45.4
scaffold_3_81584_82759_+ Q52997 44.60 361 176 7 2 356 37 379 1e-40 167
scaffold_3_83962_84900_- P65050 33.33 306 126 9 25 309 17 265 5e-24 112
scaffold_3_84972_86210_- Q9P6J2 44.34 106 57 2 145 250 16 119 7e-05 48.9
scaffold_3_86809_87819_- P96253 33.53 334 211 8 2 329 5 333 1e-32 140
scaffold_3_87972_88835_- P39315 45.32 278 151 1 2 279 2 278 2e-49 196
scaffold_3_88962_89378_+ P0ACN3 52.43 103 49 0 24 126 8 110 2e-26 117
scaffold_3_89500_90435_+ P42458 40.52 116 68 1 195 309 3 118 8e-16 84.7
scaffold_3_90553_92640_+ A7NR66 36.61 691 424 8 8 692 10 692 8e-99 362
scaffold_3_92683_93186_- P44558 36.89 122 76 1 31 152 2 122 1e-15 82.4
scaffold_3_93950_94648_+ P0AFR6 47.32 205 108 0 26 230 1 205 1e-53 209
scaffold_3_94713_95540_+ P64786 57.08 219 84 3 55 271 54 264 2e-62 239
scaffold_3_95645_96538_- P0AG84 57.93 290 120 2 7 295 5 293 2e-90 332
scaffold_3_98452_99375_+ C6A3T5 30.77 260 151 8 1 236 15 269 3e-11 69.7
scaffold_3_99510_100367_+ O13963 30.26 152 93 4 99 242 163 309 3e-06 52.4
scaffold_3_101655_102629_+ Q9KWF6 57.37 319 135 1 3 321 33 350 1e-75 283
scaffold_3_103945_105069_+ P54550 43.51 370 170 7 3 370 5 337 3e-76 285
scaffold_3_105580_107223_+ P64778 37.55 229 129 5 267 487 26 248 2e-14 80.9
scaffold_3_110392_111720_+ O53522 29.02 224 132 7 174 379 110 324 2e-05 50.8
scaffold_3_111785_112648_- P0AEQ3 31.86 204 134 4 70 272 44 243 2e-17 90.1
scaffold_3_112705_113403_- P0AE36 33.87 186 100 2 30 192 42 227 3e-25 115
scaffold_3_113451_114248_- P54537 48.78 246 118 2 19 263 1 239 7e-64 243
etc

Kyxsune commented Jul 16, 2015

For the functional annotation step the following commands were run:

|2015-07-14 16:14:22|# [FUNCTIONALANNOTATION]
|2015-07-14 16:26:43| [Path to Directory]/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/blastall -p blastp -i [Path to Directory]/Desktop/GenomeTK2/Mix2/FindORFS/out/proba.faa -d [Path to Directory]/Desktop/metAMOS-1.5rc3/Utilities/DB//uniprot_sprot.fasta -a 39 -e 0.001 -m 8 -b 1 > [Path to Directory]/Desktop/GenomeTK2/Mix2/FunctionalAnnotation/out/blast.out
|2015-07-14 16:26:44|[Path to Directory]/Desktop/metAMOS-1.5rc3/KronaTools/bin/ktImportEC [Path to Directory]/Desktop/GenomeTK2/Mix2/FunctionalAnnotation/out/krona.ec.input

Since these two commands ran without incident, I can only imagine that the bottleneck is in the krona.ec.input file. I do not know Perl, but I get the feeling that the ktImportEC command only pulls the annotations for the hits with a parsed EC number?
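
One way to see which hits drop out between the two files is to compare the subject IDs in blast.out against the IDs in krona.ec.input. A minimal Python sketch, assuming blast.out is the standard 12-column tabular output produced by `-m 8` (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and that krona.ec.input holds one whitespace-separated `ID EC e-value` record per line, as in the excerpts above. The function names are hypothetical and are not part of metAMOS or KronaTools.

```python
def blast_subject_ids(lines):
    """Collect subject (hit) IDs from BLAST -m 8 tabular lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 12:  # skip blank or malformed lines
            continue
        ids.add(fields[1])    # column 2 is the subject ID
    return ids


def krona_ids(lines):
    """Collect the IDs in the first column of krona.ec.input lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def missing_from_krona(blast_lines, krona_lines):
    """Return the sorted subject IDs present in blast.out but not in krona.ec.input."""
    return sorted(blast_subject_ids(blast_lines) - krona_ids(krona_lines))
```

Both functions accept any iterable of lines, so open file handles for blast.out and krona.ec.input can be passed in directly.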

Kyxsune commented Jul 16, 2015

Found it (I think).

The bottleneck is inside the fannotate.py file (https://github.com/marbl/metAMOS/blob/v1.5rc3/src/fannotate.py). When parsing the data from the blast.out file, it only writes the hits that have a length over 50 and a percent identity over 80. I don't actually know why the cutoffs were set at those levels, but at least we know where they are now. If you could explain why, I would be grateful. I am quite new to bioinformatics, after all ^^
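
The filter described above can be illustrated with a short Python sketch. This is not the code from fannotate.py, only an approximation of the cutoffs as described (alignment length over 50, percent identity over 80), assuming the 12-column `-m 8` layout where column 3 is percent identity and column 4 is alignment length. The function names and default parameters are hypothetical.

```python
def passes_cutoff(blast_line, min_identity=80.0, min_length=50):
    """Return True if a BLAST -m 8 line exceeds both cutoffs."""
    fields = blast_line.split()
    if len(fields) < 12:  # blank or malformed line
        return False
    percent_identity = float(fields[2])
    alignment_length = int(fields[3])
    return percent_identity > min_identity and alignment_length > min_length


def filter_hits(blast_lines, min_identity=80.0, min_length=50):
    """Keep only the lines that pass both cutoffs."""
    return [line for line in blast_lines
            if passes_cutoff(line, min_identity, min_length)]
```

Applied to the blast.out excerpt above, the Q5YTD8 hit (90.72% identity over 291 residues) passes, while the O58489 hit (34.76% identity) is dropped.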
