Error PEMA ASV inference: Directory swarm does not exists #59

Closed
savvas-paragkamian opened this issue Oct 9, 2023 · 5 comments

@savvas-paragkamian
Contributor

savvas-paragkamian commented Oct 9, 2023

PEMA ASV inference fails with an error, possibly because of a spelling (capitalization) mismatch in the parameters file.

This is the line that was possibly skipped because of that mismatch:

} else if (paramsDereplication{'clusteringAlgo'} == 'algo_Swarm') {

In the parameters file I wrote clusteringAlgo algo_swarm, whereas the file's own instruction is: write "Swarm" or "vsearch" or "CROP" after algo_.

In the initialize.bds script there is a line that creates the folder Swarm.
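To illustrate how a misspelled value could be caught early instead of silently skipping a branch, here is a minimal Python sketch (PEMA itself is written in BDS). The allowed values are taken from the parameters-file hint quoted above; the function name `check_clustering_algo` is hypothetical and not part of PEMA.

```python
# Allowed values, built from the hint: "Swarm", "vsearch" or "CROP" after "algo_".
ALLOWED_ALGOS = ("algo_Swarm", "algo_vsearch", "algo_CROP")


def check_clustering_algo(value: str) -> str:
    """Return the value on an exact, case-sensitive match; otherwise raise with a hint."""
    if value in ALLOWED_ALGOS:
        return value
    # Look for an allowed value that differs only in letter case.
    for candidate in ALLOWED_ALGOS:
        if candidate.lower() == value.lower():
            raise ValueError(
                f"clusteringAlgo '{value}' is not recognised; did you mean '{candidate}'?"
            )
    raise ValueError(
        f"clusteringAlgo '{value}' is not recognised; expected one of {ALLOWED_ALGOS}"
    )


try:
    check_clustering_algo("algo_swarm")
except ValueError as err:
    print(err)
```

A check like this would stop the run at the start with a readable message, rather than failing later when a directory is missing.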

The error:

Fatal error: /home/modules/taxAssignment.bds, line 11. Directory '/mnt/analysis/isd_crete_2016_20230823/7.mainOutput/gene_16S/swarm' does not exists
pema_latest.bds, line 156 :     if ( paramsForTaxAssign{'custom_ref_db'} != 'Yes'){
pema_latest.bds, line 158 :        if ( paramsForTaxAssign{'gene'} == 'gene_16S') {
pema_latest.bds, line 170 :           if (paramsForTaxAssign{'taxonomyAssignmentMethod'} != 'phylogeny') {
pema_latest.bds, line 172 :              crestAssign(paramsForTaxAssign, globalVars)
taxAssignment.bds, line 4 :     string crestAssign(string{} params, string{} globalVars) {
taxAssignment.bds, line 6 :        if ( params{'custom_ref_db'} != 'Yes') {
taxAssignment.bds, line 9 :           if ( (params{'gene'} == 'gene_16S' || params{'gene'} == 'gene_18S') && params{'taxonomyAssignmentMethod'} != 'phylogeny' ) {
taxAssignment.bds, line 11 :             globalVars{'assignmentPath'}.chdir()

The parameters file:
parameters0f.isd_crete_2016_20230823.txt

@hariszaf
Owner

hariszaf commented Oct 9, 2023

Hi @savvas-paragkamian and thanks for sharing.

I am a bit confused, though: have you tried using clusteringAlgo algo_Swarm, as suggested?

If yes, do you still get an error?

@savvas-paragkamian
Contributor Author

Yes, I wrote clusteringAlgo algo_swarm, whereas the suggested spelling is clusteringAlgo algo_Swarm.

The initialize script also contains:

      } else if ( params{'clusteringAlgo'} == 'algo_Swarm' ) {
         string algo = 'Swarm'
         algo.mkdir()
      }

Maybe that's why the folder wasn't created?
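The suspicion can be illustrated with a small Python sketch that mimics the quoted branch. It is an analogy, not PEMA's code: `initialize_output` and its `ignore_case` flag are hypothetical, and the sketch runs in temporary directories.

```python
import os
import tempfile


def initialize_output(params: dict, base_dir: str, ignore_case: bool = False) -> list:
    """Create the 'Swarm' folder only when clusteringAlgo matches 'algo_Swarm'."""
    value = params.get("clusteringAlgo", "")
    expected = "algo_Swarm"
    if ignore_case:
        matches = value.lower() == expected.lower()
    else:
        matches = value == expected  # exact comparison, as in the quoted branch
    if matches:
        os.mkdir(os.path.join(base_dir, "Swarm"))
    return sorted(os.listdir(base_dir))


with tempfile.TemporaryDirectory() as strict_dir:
    strict_result = initialize_output({"clusteringAlgo": "algo_swarm"}, strict_dir)

with tempfile.TemporaryDirectory() as lenient_dir:
    lenient_result = initialize_output(
        {"clusteringAlgo": "algo_swarm"}, lenient_dir, ignore_case=True
    )

print(strict_result)   # []
print(lenient_result)  # ['Swarm']
```

With the exact comparison, the lowercase value falls through every branch and no folder is created, which is consistent with the missing directory reported in the error.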

Currently, I am running the analysis from the dereplication step checkpoint. I have manually created both folders, Swarm and swarm (777 permissions), in the directory 7.mainOutput/gene_16.

In addition I changed the parameters file to algo_Swarm.

My question is: if I change the parameters file and continue the analysis from the checkpoint, does PEMA read the new parameter values?

@hariszaf
Owner

hariszaf commented Oct 9, 2023

Let's break this down into steps! 😉
First, could you please give it a shot with just a few samples (2-3), setting clusteringAlgo to algo_Swarm?

If you have already tried that, did you get an error?

if I change the parameters file and continue the analysis from the checkpoint, PEMA reads the new changes of the parameters?

At the beginning of each checkpoint, PEMA reads the parameters file through the readParameterFile() function.

@savvas-paragkamian
Contributor Author

The new job with 4 samples and with the correct algo_Swarm parameter works!

The job from the checkpoint failed with the following error:

bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/home/scripts/dereplicateSwarm.sh: line 99: 29689 Killed                  awk 'BEGIN {FS = "[>_]"}

     # Parse the sample files
     /^>/ {contingency[$2][FILENAME] = $3
           amplicons[$2] += $3
           if (FNR == 1) {
               samples[++i] = FILENAME
           }
          }

     END {# Create table header
          printf "amplicon"
          s = length(samples)
          for (i = 1; i <= s; i++) {
              printf "\t%s", samples[i]
          }
          printf "\t%s\n", "total"

          # Sort amplicons by decreasing total abundance (use a coprocess)
          command = "LC_ALL=C sort -k1,1nr -k2,2d"
          for (amplicon in amplicons) {
               printf "%d\t%s\n", amplicons[amplicon], amplicon |& command
          }
          close(command, "to")
          FS = "\t"
          while ((command |& getline) > 0) {
              amplicons_sorted[++j] = $2
          }
          close(command)

          # Print the amplicon occurrences in the different samples
          n = length(amplicons_sorted)
          for (i = 1; i <= n; i++) {
               amplicon = amplicons_sorted[i]
               printf "%s", amplicon
               for (j = 1; j <= s; j++) {
                   printf "\t%d", contingency[amplicon][samples[j]]
               }
               printf "\t%d\n", amplicons[amplicon]
          }}' linearized.dereplicate* > ../amplicon_contingency_table.tsv
Fatal error: /home/modules/preprocess.bds, line 449, pos 5. Exec failed.
	Exit value : 137
	Command    :  bash /home/scripts/dereplicateSwarm.sh
pema_latest.bds, line 103 :	if ( paramsDereplication{'clusteringAlgo'} == 'algo_Swarm' ) {
pema_latest.bds, line 106 :	   swarmDereplicate(paramsDereplication, globalVars)
preprocess.bds, line 445 :	string swarmDereplicate(string{} params, string{} globalVars){
preprocess.bds, line 449 :	    sys bash $globalVars{'path'}/scripts/dereplicateSwarm.sh
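A note on reading the log above: by shell convention, an exit value above 128 means the process was ended by signal number (exit value minus 128), so 137 corresponds to signal 9 (SIGKILL), which matches the "Killed" message. The Python sketch below just decodes that value; `describe_exit_value` is a hypothetical helper, and it assumes a POSIX system. On Linux, a SIGKILL on a memory-heavy step is often, though not necessarily, the out-of-memory killer.

```python
import signal


def describe_exit_value(exit_value: int) -> str:
    """Translate a shell exit value into a short human-readable description."""
    if exit_value <= 128:
        return f"exited with status {exit_value}"
    number = exit_value - 128  # shells report 128 + signal number
    try:
        name = signal.Signals(number).name
    except ValueError:
        name = "an unknown signal"
    return f"terminated by {name} (signal {number})"


print(describe_exit_value(137))  # terminated by SIGKILL (signal 9)
```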

@hariszaf
Owner

hariszaf commented Nov 1, 2023

Thanks for sharing, @savvas-paragkamian.
The issue is with the global parameters, which are set during initialization and do not change after each checkpoint.

I'll fix that as part of PEMA v.2.1.5 and get back to you as soon as it's released.
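The behaviour described here can be sketched in Python as follows. This is only an illustration of stale derived values: `derive_globals` and the path layout are assumptions made for the example, since the thread does not show how PEMA actually builds assignmentPath.

```python
def derive_globals(params: dict) -> dict:
    """Compute values that depend on the parameters (hypothetical path layout)."""
    algo_name = params["clusteringAlgo"].replace("algo_", "", 1)
    return {"assignmentPath": f"7.mainOutput/{params['gene']}/{algo_name}"}


# Initialization with the misspelled value.
params = {"gene": "gene_16S", "clusteringAlgo": "algo_swarm"}
global_vars = derive_globals(params)

# Restart from a checkpoint: the parameters are re-read with the corrected value,
params = {"gene": "gene_16S", "clusteringAlgo": "algo_Swarm"}

# but the globals computed at initialization still carry the old value.
stale_path = global_vars["assignmentPath"]

# Recomputing the globals after every re-read keeps them in sync.
fresh_path = derive_globals(params)["assignmentPath"]

print(stale_path)  # 7.mainOutput/gene_16S/swarm
print(fresh_path)  # 7.mainOutput/gene_16S/Swarm
```

Under these assumptions, correcting the parameters file alone is not enough for a checkpoint restart, because anything derived at initialization keeps pointing at the old value.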

hariszaf added a commit that referenced this issue Dec 7, 2023