Error while installing remora #22

Closed
AzlanNI opened this issue Jun 3, 2022 · 16 comments

@AzlanNI

AzlanNI commented Jun 3, 2022

Hello Everyone,

I am currently trying to get Remora and the Bonito basecaller working on our HPC. I am using the pip install command, but I always get the following error:

      ############################
      # Package would be ignored #
      ############################
      Python recognizes 'remora.trained_models' as an importable package, however it is
      included in the distribution as "data".
      This behavior is likely to change in future versions of setuptools (and
      therefore is considered deprecated).
  
      Please make sure that 'remora.trained_models' is included as a package by using
      setuptools' `packages` configuration field or the proper discovery methods
      (for example by using `find_namespace_packages(...)`/`find_namespace:`
      instead of `find_packages(...)`/`find:`).
  
      You can read more about "package discovery" and "data files" on setuptools
      documentation page.
  
  
  !!
  
    check.warn(importable)
  error: command 'icc' failed: No such file or directory
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for ont-remora
Failed to build ont-remora
ERROR: Could not build wheels for ont-remora, which is required to install pyproject.toml-based projects

Maybe this is a known issue, or someone can help me out. I am currently using a PyPI mirror, since the HPC has no internet connection.

I would appreciate any help!

Kind regards,

Azlan

@marcus1487
Collaborator

I've not encountered this error on installation. Is this the entire error message? If not would you mind sending the complete error message? Could you also share your OS and python versions and the full command submitted?

To venture a guess, though, this does appear to be a system compiler issue. icc is the Intel C compiler, which Python is identifying as the C compiler configured on this system, and the "No such file or directory" message means the icc executable itself could not be found. Remora requires portions of its code to be compiled and thus requires a working compiler. You may have some luck setting the compiler at Remora install time (e.g. CC=/path/to/gcc pip install ont-remora).
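A minimal Python sketch of the check suggested above, assuming a Linux node and a pip-based install: sysconfig.get_config_var("CC") reports the compiler command Python was built with (the default setuptools uses on Unix), shutil.which tests whether that executable is actually on PATH, and a CC override is passed through the environment to a pip subprocess, which is the same thing as prefixing CC=... on the shell. The helper names are hypothetical and "gcc" is only an example override.

```python
import os
import shlex
import shutil
import subprocess
import sys
import sysconfig
from typing import Dict, List, Optional


def configured_compiler() -> str:
    """C compiler command recorded when this Python was built (e.g. 'icc -pthread').

    On Unix, setuptools uses this value unless the CC environment variable overrides it.
    """
    return sysconfig.get_config_var("CC") or ""


def compiler_on_path(cc_command: str) -> bool:
    """True if the executable at the start of cc_command is found on PATH."""
    parts = shlex.split(cc_command)
    return bool(parts) and shutil.which(parts[0]) is not None


def install_env(cc_override: Optional[str] = None) -> Dict[str, str]:
    """Copy of the current environment, with CC set when an override is given."""
    env = dict(os.environ)
    if cc_override:
        env["CC"] = cc_override
    return env


def pip_install_cmd(package: str) -> List[str]:
    """pip invocation bound to the running interpreter, installing to the user site."""
    return [sys.executable, "-m", "pip", "install", "--user", package]


def install(package: str, cc_override: Optional[str] = None) -> None:
    """Equivalent of `CC=<override> pip install --user <package>`."""
    subprocess.run(pip_install_cmd(package), env=install_env(cc_override), check=True)


if __name__ == "__main__":
    cc = configured_compiler()
    print(f"CC from build config: {cc!r}, on PATH: {compiler_on_path(cc)}")
    # install("ont-remora", cc_override="gcc")  # hypothetical retry with gcc
```

If the first line reports an icc command that is not on PATH, either load a module that provides icc or retry with a compiler that is available.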

@AzlanNI
Author

AzlanNI commented Jun 14, 2022

Hello @marcus1487

The OS on our HPC cluster is Linux, and I am using Python 3.8.3.

I've used a PyPI mirror, since we don't have internet access from the HPC, with the following command:

PIP_CONFIG_FILE=/software/python/pip.conf pip install --user ont-bonito
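To confirm which mirror pip is actually using, the file named by PIP_CONFIG_FILE can be inspected (pip config list prints the same settings from the command line). The contents of /software/python/pip.conf are not shown in this thread; this sketch assumes the usual INI layout with an index-url key, checking the command-specific [install] section before [global], since command sections take precedence in pip's configuration. The function name is hypothetical, and environment variables such as PIP_INDEX_URL, which would also override the file, are not handled.

```python
import configparser
import os
from typing import Optional


def mirror_url(conf_path: Optional[str] = None) -> Optional[str]:
    """Return the index-url a pip config file sets, or None if it sets none.

    Falls back to the PIP_CONFIG_FILE environment variable when no path is given.
    """
    path = conf_path or os.environ.get("PIP_CONFIG_FILE")
    if not path:
        return None
    # interpolation=None so '%' characters in URLs are read literally
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        return None  # file missing or unreadable
    for section in ("install", "global"):
        if parser.has_option(section, "index-url"):
            return parser.get(section, "index-url")
    return None


if __name__ == "__main__":
    print(mirror_url())
```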

So one solution could be to load a newer compiler module, such as intel/xe2020.4, and try the pip install command again?

Thanks for your help!

@cjw85
Member

cjw85 commented Jun 14, 2022

Hi @AzlanNI

Is there a particular reason you want to use Bonito for basecalling? You may wish to look at the production Guppy basecaller, which implements a near-identical algorithm to that used in Bonito (a slightly earlier version of the Remora algorithm).

@AzlanNI
Author

AzlanNI commented Jun 14, 2022

I am using these basecallers to detect modified DNA bases in a CpG context in cfDNA. I just saw a presentation from ONT showing that the Remora models are better at detecting modified bases, since they don't sacrifice basecalling accuracy for canonical bases. This is the main reason I wanted to use the Megalodon or Bonito basecaller: to use the Remora models.

@cjw85
Member

cjw85 commented Jun 14, 2022

Can you provide details of that presentation? It sounds like it needs updating. It is no longer the case with Guppy that asking it to perform modified base calling of CpG will lead to lower canonical base accuracy: Guppy has used the Remora algorithm since v6.1.1, https://community.nanoporetech.com/downloads/guppy/release_notes.

@AzlanNI
Author

AzlanNI commented Jun 14, 2022

I watched "London Calling 2022: Update from Oxford Nanopore Technologies", from which I understood that using the Remora models increases the accuracy of modified base calling. But maybe I misunderstood that Remora would be the best option for modified base calling. If both are equal in strength and accuracy, then Guppy would be the better choice, since we are currently using Guppy 5.0.7 on the HPC. That version is somewhat outdated, though, so maybe we should update to the newest version.

@AzlanNI
Author

AzlanNI commented Jun 15, 2022

Did I mix things up between Guppy and the Remora models? Sadly, my Bonito basecaller is still not working on the HPC cluster.

@AzlanNI
Author

AzlanNI commented Jun 15, 2022

I also tried the Megalodon basecaller, but there I always get the error: ERROR: Guppy version string does not match expected pattern: "b'Intel MKL FATAL ERROR: Cannot load /software/guppy/5.0.7/cpu/bin/guppy_basecall_server.\n'"

I think this could also be because I am trying to use the newest version of Megalodon (2.5) with Guppy version 5.0.7.

@marcus1487
Collaborator

The Remora algorithms are now the backend for all modified base calling across the different basecaller implementations (Megalodon/Bonito/Guppy). Megalodon and Bonito directly use the implementations from the Remora Python package, but these may be less stable, as they are research demonstrators. The implementation in Guppy is the recommended one, but newer features may lag behind the research basecallers. The next version of Guppy will add support for version 1 Remora models (higher accuracy with a signal re-scaling stage).

Note that Guppy > 6.1 is required for running Remora models within Guppy.
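Whether an installation meets this requirement can be checked from a script. A minimal sketch, assuming the output of guppy_basecaller --version contains a "Version X.Y.Z" token (the exact banner may differ between builds); the 6.1.1 threshold comes from the Guppy release-note reference earlier in this thread, and the function names are hypothetical. Megalodon's error above is the case where the text it tried to parse was an MKL error instead of a version banner, which this parser reports as None.

```python
import re
from typing import Optional, Tuple

# Assumption: version output contains e.g. "Version 6.1.7+<commit hash>".
VERSION_RE = re.compile(r"Version\s+(\d+)\.(\d+)\.(\d+)")

# Guppy has used the Remora algorithm since v6.1.1 (per the release notes cited above).
MIN_REMORA_GUPPY = (6, 1, 1)


def parse_guppy_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from version output, or None if absent."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def supports_remora(text: str) -> bool:
    """True if the version output reports a Guppy release with Remora support."""
    version = parse_guppy_version(text)
    return version is not None and version >= MIN_REMORA_GUPPY


if __name__ == "__main__":
    print(supports_remora("Guppy Basecalling Software ... Version 5.0.7+2332e8d"))
```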

@AzlanNI
Author

AzlanNI commented Jun 17, 2022

Alright, I got it! If Guppy is recommended for modified base calling, then maybe we should just update the Guppy version on the HPC. As already said, the premise of using Bonito and Megalodon was to use the Remora models, since we thought that Guppy was sacrificing canonical base accuracy. But can you currently use Remora models in Guppy version > 6.1?

Thanks a lot for the information and help!

@cjw85
Member

cjw85 commented Jun 17, 2022

But can u currently use Remora models in Guppy version > 6.1 ?

Correct. If you update Guppy, run it with the configuration dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg (or similar), and use the --bam_out and --align_ref options, Guppy will output BAM files of aligned reads annotated with the modified-base tags defined in the SAM specification.
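A minimal sketch of both halves of that answer, with hypothetical helper names. The first function only assembles the command line from the options named above plus -i/-s for input and output directories; the paths are placeholders. The second decodes the modified-base tag values: the SAM tags specification names them MM and ML (some earlier tool versions wrote Mm and Ml). It assumes single-letter modification codes, '+' strand groups only, and that seq is the read in its original sequencing orientation (for reverse-strand alignments, reverse-complement the BAM SEQ first). Each skip count says how many occurrences of the canonical base to pass over before the next called one, and ML value N covers the probability range N/256 to (N+1)/256, so the midpoint is used here.

```python
import shlex
from typing import List, Tuple


def guppy_modbase_cmd(input_dir: str, save_dir: str, reference: str,
                      config: str = "dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg") -> List[str]:
    """Command line for aligned BAM output carrying modified-base tags."""
    return [
        "guppy_basecaller",
        "-i", input_dir,            # directory of input FAST5 files
        "-s", save_dir,             # output directory
        "-c", config,               # modified-base configuration
        "--bam_out",                # write BAM output
        "--align_ref", reference,   # reference used to align the reads
    ]


def decode_mm_ml(seq: str, mm: str, ml: List[int]) -> List[Tuple[int, str, float]]:
    """Turn MM/ML tag values into (read position, mod code, probability) tuples."""
    calls: List[Tuple[int, str, float]] = []
    ml_pos = 0
    for group in filter(None, mm.split(";")):
        header, *skips = group.split(",")
        if header[1] != "+":
            raise ValueError(f"only '+' strand MM groups are handled: {header!r}")
        base, mod_codes = header[0], header[2:].rstrip("?.")
        # Positions of the canonical base ('N' matches any base).
        candidates = [i for i, b in enumerate(seq) if base in ("N", b)]
        idx = -1
        for skip in skips:
            idx += int(skip) + 1
            # With several codes (e.g. "C+hm"), ML holds one value per code per position.
            for code in mod_codes:
                calls.append((candidates[idx], code, (ml[ml_pos] + 0.5) / 256))
                ml_pos += 1
    return calls


if __name__ == "__main__":
    print(shlex.join(guppy_modbase_cmd("fast5/", "calls/", "ref.fa")))
    print(decode_mm_ml("ACGCGTCG", "C+m?,0,1;", [255, 0]))
```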

@AzlanNI
Copy link
Author

AzlanNI commented Jun 20, 2022

Alright, I will try using the Remora models in Guppy ASAP. Is there a command to list the Remora models accessible to Guppy 6.1.7?

@AzlanNI
Author

AzlanNI commented Jun 20, 2022

We now have Guppy 6.1.7 installed on the HPC, and I wanted to test some Remora models for modified base detection. Is there a listing of the custom tags for the models, or a list from which I could see which model would be the best match? Using guppy_basecaller --print_workflow, I don't see any modbases models.

@cjw85
Member

cjw85 commented Jun 20, 2022

I found the reference to dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg simply by digging around in the data directory of the guppy installation. I'm not sure how one is supposed to do this but here is a listing of all the configuration files:

dna_r10.4_e8.1_modbases_5hmc_5mc_cg_fast.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_fast_prom.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_hac.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_hac_prom.cfg
dna_r10.4_e8.1_modbases_5hmc_5mc_cg_sup.cfg
dna_r10.4_e8.1_modbases_5mc_cg_fast.cfg
dna_r10.4_e8.1_modbases_5mc_cg_fast_prom.cfg
dna_r10.4_e8.1_modbases_5mc_cg_hac.cfg
dna_r10.4_e8.1_modbases_5mc_cg_hac_prom.cfg
dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_fast.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_fast_prom.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_hac.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_hac_prom.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup.cfg
dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup_prom.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_fast.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_fast_prom.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_hac.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_hac_prom.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_sup.cfg
dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_fast.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_fast_prom.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_hac.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_hac_prom.cfg
dna_r9.4.1_e8.1_modbases_5mc_cg_sup.cfg

The only ones likely of interest to you are the dna_r9.4.1... ones. The others are not widely released chemistries.
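The same "digging around in the data directory" can be scripted. A minimal sketch, assuming the modified-base configuration files sit directly in the Guppy installation's data directory and contain "modbases" in their names, as in the listing above; the directory path passed in is up to you (for example, something under /software/guppy/, which is hypothetical here), and the function name is hypothetical.

```python
import sys
from pathlib import Path
from typing import List


def modbase_configs(data_dir: str, prefix: str = "dna_r9.4.1") -> List[str]:
    """Sorted names of modified-base configs in data_dir starting with prefix."""
    return sorted(
        p.name
        for p in Path(data_dir).glob("*modbases*.cfg")
        if p.name.startswith(prefix)
    )


if __name__ == "__main__":
    # Usage: python list_modbase_configs.py <guppy data directory>
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in modbase_configs(data_dir):
        print(name)
```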

@AzlanNI
Author

AzlanNI commented Jun 20, 2022

Great, thanks! I was just trying to find a way to list them. Can you tell me if there is documentation that explains what the custom tags mean, e.g. hac means high accuracy? So what do prom and sup mean? Thanks for your help!

@cjw85
Member

cjw85 commented Jun 20, 2022

fast: fast basecaller
hac: high accuracy basecaller
sup: super accuracy basecaller
prom: PromethION
(lack of) prom: MinION/GridION

The Guppy user guide can be found in the Nanopore community: https://community.nanoporetech.com/docs/prepare/library_prep_protocols/Guppy-protocol/v/gpb_2003_v1_revae_14dec2018
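The legend above can be applied mechanically to the configuration names listed earlier. A minimal sketch, assuming every modified-base config follows the dna_&lt;pore&gt;_&lt;variant&gt;_modbases_&lt;mods&gt;_&lt;tier&gt;[_prom].cfg pattern seen in that listing; this is an observed pattern, not a documented naming contract, and the function and field names are hypothetical.

```python
from typing import Dict

# Legend from this thread: fast / hac / sup tiers; a "prom" suffix means PromethION.
TIERS = {"fast": "fast", "hac": "high accuracy", "sup": "super accuracy"}


def describe_config(name: str) -> Dict[str, str]:
    """Split a Guppy modbase config name into pore, variant, mods, tier and device."""
    stem = name[: -len(".cfg")] if name.endswith(".cfg") else name
    parts = stem.split("_")
    if "modbases" not in parts:
        raise ValueError(f"not a modbases config: {name!r}")
    tier_idx = next((i for i, p in enumerate(parts) if p in TIERS), None)
    if tier_idx is None:
        raise ValueError(f"no fast/hac/sup tier in: {name!r}")
    mod_idx = parts.index("modbases")
    return {
        "pore": parts[1],
        "variant": "_".join(parts[2:mod_idx]),
        "modifications": "_".join(parts[mod_idx + 1 : tier_idx]),
        "tier": TIERS[parts[tier_idx]],
        "device": "PromethION" if parts[-1] == "prom" else "MinION/GridION",
    }


if __name__ == "__main__":
    print(describe_config("dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup_prom.cfg"))
```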

@cjw85 cjw85 closed this as completed Jun 20, 2022