CLUSTALW and MSF format #32

Closed
ErisonChen opened this issue Apr 28, 2022 · 4 comments


@ErisonChen

Hi @rcedgar,
With muscle v5, how can I write the aligned output in CLUSTALW or GCG MSF format?

Thanks a lot

@ErisonChen
Author

Another question:
How can I manually set the number of CPU cores that muscle uses?

muscle 5.1.linux64 [] 65.6Gb RAM, 40 cores

65.6Gb of RAM and 40 cores is pretty huge.

Thanks

@rcedgar
Owner

rcedgar commented Apr 28, 2022

  1. Muscle v5 only supports FASTA and EFA output formats. To get CLUSTALW or MSF, you need to convert the output with a third-party tool.

  2. That line reports the amount of RAM and the number of CPU cores available, not the amounts actually used. Actual usage is shown later as the command progresses. You can use the -threads N option to set a maximum number of cores.
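As an illustration of the conversion mentioned in point 1, here is a minimal sketch in plain Python that rewrites an aligned FASTA file in a CLUSTAL-style layout. It is not part of muscle. The function names `read_fasta` and `write_clustal` are made up for this example, the header text is an assumption (most parsers only check that the first line starts with "CLUSTAL"), and the conservation line that Clustal W normally prints under each block is left out. An established converter, such as Biopython's `AlignIO.convert`, is probably the safer choice for real work.

```python
import io


def read_fasta(handle):
    """Parse aligned FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first whitespace-delimited token as the label.
            name = (line[1:].split() or ["unnamed"])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def write_clustal(records, handle, width=60):
    """Write records as interleaved CLUSTAL-style blocks of `width` columns."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    total = lengths.pop()
    # Pad labels so that every sequence starts in the same column.
    pad = max(len(name) for name, _ in records) + 6
    # Assumed header; real Clustal W output also includes a version number.
    handle.write("CLUSTAL W multiple sequence alignment\n\n\n")
    for start in range(0, total, width):
        for name, seq in records:
            handle.write(name.ljust(pad) + seq[start:start + width] + "\n")
        # Blank line between blocks; no conservation line is computed.
        handle.write("\n")
```

Usage would look something like `write_clustal(read_fasta(open("aln.afa")), open("aln.aln", "w"))`, where both file names are placeholders.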

@ErisonChen
Author

> That line reports the amount of RAM and CPU cores available, not the number actually used, this will be shown later as the command progresses. You can use the -threads N option to set a maximum number of cores.

Hi @rcedgar,
How can we limit the maximum amount of RAM used?
Today one run spiked from 5G to 200G and was then killed.
Our compute nodes can't handle such a high memory spike.

Thanks so much.

@rcedgar
Owner

rcedgar commented May 5, 2022

Currently, muscle v5 does not provide a command-line option to specify the maximum amount of RAM. If you are using -align, note that -super5 generally uses less RAM. By trying some examples, you can estimate roughly how much RAM your input data will need as a function of sequence length and number of sequences, and then filter out datasets that are too big. Most cluster job managers and cloud services can also set a memory maximum for a process.
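The approach above could be scripted roughly as follows. This is a sketch under assumptions, not something muscle provides: `fasta_stats`, `small_enough`, `muscle_command`, and `run_capped` are hypothetical helper names, and the size thresholds have to come from your own trial runs. The memory cap uses the Unix `RLIMIT_AS` limit, which restricts virtual address space rather than resident RAM, so a multi-threaded program may hit it earlier than expected. A job scheduler's own memory limit is likely the more robust option where one is available.

```python
import resource
import subprocess


def fasta_stats(path):
    """Return (number of sequences, longest sequence length) for a FASTA file."""
    count, longest, current = 0, 0, 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
                longest = max(longest, current)
                current = 0
            else:
                current += len(line.strip())
    return count, max(longest, current)


def small_enough(path, max_seqs, max_len):
    """Apply user-calibrated thresholds (assumed values, found by trial runs)."""
    n_seqs, longest = fasta_stats(path)
    return n_seqs <= max_seqs and longest <= max_len


def muscle_command(in_path, out_path, threads, use_super5=False):
    """Build a muscle v5 command line with a cap on the number of threads."""
    mode = "-super5" if use_super5 else "-align"
    return ["muscle", mode, in_path, "-output", out_path, "-threads", str(threads)]


def run_capped(cmd, max_bytes):
    """Run cmd with an address-space limit (Unix only); return its exit code."""
    def limit():
        # Runs in the child process just before the command is executed.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return subprocess.run(cmd, preexec_fn=limit).returncode
```

For example, `run_capped(muscle_command("seqs.fa", "aln.afa", threads=8, use_super5=True), 32 * 1024**3)` would ask for at most 8 threads and roughly a 32 GiB address-space ceiling. The file names and numbers here are placeholders.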

@rcedgar rcedgar closed this as completed Aug 24, 2023