Pipeline fails with mag_depth script error when bins are empty #630

Closed
felipemachado85 opened this issue Jun 25, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@felipemachado85

felipemachado85 commented Jun 25, 2024

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-lsetp6nu because the default path (/users/f/s/fsantann/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

Hi all!

First of all, thank you for this amazing tool. It's really refreshing to have such a powerful pipeline at hand.

I just want to share this "bug" (quoted because it's entirely related to my samples) that I encountered yesterday. I was running the pipeline with three samples, and I got this error message:

Caused by:
  Process `NFCORE_MAG:MAG:DEPTHS:MAG_DEPTHS_PLOT (MEGAHIT-MaxBin2-MAS)` terminated with an error exit status (1)


Command executed:

  plot_mag_depths.py --bin_depths MEGAHIT-MaxBin2-MAS-binDepths.tsv                     --groups sample_groups.tsv                     --out "MEGAHIT-MaxBin2-MAS-binDepths.heatmap.png"
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:DEPTHS:MAG_DEPTHS_PLOT":
      python: $(python --version 2>&1 | sed 's/Python //g')
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
      seaborn: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('seaborn').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Matplotlib created a temporary config/cache directory at /tmp/matplotlib-lsetp6nu because the default path (/users/f/s/fsantann/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  Traceback (most recent call last):
    File "/users/f/s/fsantann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 83, in <module>
      sys.exit(main())
    File "/users/f/s/fsantann/.nextflow/assets/nf-core/mag/bin/plot_mag_depths.py", line 70, in main
      sns.clustermap(
    File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1402, in clustermap
      return plotter.plot(metric=metric, method=method,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1220, in plot
      self.plot_dendrograms(row_cluster, col_cluster, metric, method,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 1065, in plot_dendrograms
      self.dendrogram_row = dendrogram(
    File "/usr/local/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 784, in dendrogram
      plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis,
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 594, in __init__
      self.linkage = self.calculated_linkage
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 661, in calculated_linkage
      return self._calculate_linkage_scipy()
    File "/usr/local/lib/python3.9/site-packages/seaborn/matrix.py", line 629, in _calculate_linkage_scipy
      linkage = hierarchy.linkage(self.array, method=self.method,
    File "/usr/local/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1068, in linkage
      n = int(distance.num_obs_y(y))
    File "/usr/local/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2572, in num_obs_y
      raise ValueError("The number of observations cannot be determined on "
  ValueError: The number of observations cannot be determined on an empty distance matrix.

Work dir:
  /gpfs1/home/f/s/fsantann/work/64/858b740fe77cd5e95503ae127a3a77

On closer inspection of the QC for these samples, I found that only one of them had actual bins (bin_summary.tsv). I re-ran the pipeline with only the binned sample and it worked fine. I am unsure whether the missing bins are what caused the failure. I attached the nextflow.log for closer inspection.

Thanks!

Best,

Felipe

Command used and terminal output

No response

Relevant files

06_24_nextflow.log

System information

Nextflow version: 24.04.2
HPC
slurm
Singularity
Linux
nf-core/mag version 3.0.1

@felipemachado85 felipemachado85 added the bug Something isn't working label Jun 25, 2024
@jfy133 jfy133 changed the title from "Matplotlib is not a writable directory" to "Pipeline fails with mag_depth script error when bins are empty" Jun 26, 2024
@jfy133
Member

jfy133 commented Jun 26, 2024

I think this is more to do with how we handle samples when they don't result in any bins at all.

My feeling is we need to add a filter to remove such samples, plus a warning that it has happened.
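
A minimal sketch of what such a filter plus warning could look like on the Python side. This is not the fix that went into the pipeline. It assumes the bin depths table is tab-separated with a single header line and one row per bin, and that clustering needs at least two rows (my reading of the `ValueError` in the traceback). The function names, the `min_bins` parameter, and the example table contents are all made up for illustration.

```python
import csv
import io
import warnings


def count_bins(handle):
    """Count non-empty data rows in a tab-separated table with one header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line, if there is one
    return sum(1 for row in reader if any(cell.strip() for cell in row))


def should_plot(handle, label, min_bins=2):
    """Return True if there are enough bins to cluster; otherwise warn and return False.

    min_bins=2 is an assumption: hierarchical clustering needs at least two
    observations to build a non-empty distance matrix.
    """
    n_bins = count_bins(handle)
    if n_bins < min_bins:
        warnings.warn(f"{label}: only {n_bins} bin(s) found, skipping depth heatmap")
        return False
    return True


# Hypothetical table contents, only to exercise the check.
header_only = "bin\tsample1\tsample2\n"
two_bins = "bin\tsample1\tsample2\nbin.001\t3.2\t0.4\nbin.002\t1.1\t5.0\n"

print(should_plot(io.StringIO(header_only), "MEGAHIT-MaxBin2-MAS"))
print(should_plot(io.StringIO(two_bins), "MEGAHIT-MaxBin2-MAS"))
```

The same check could equally be done upstream in the workflow so that the plotting process is never started for such samples. The point here is only that the empty case is detected and reported instead of reaching the clustering call.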

@maxibor
Member

maxibor commented Jul 4, 2024

Fixed with #635

@maxibor maxibor closed this as completed Jul 4, 2024