Massive update of generalized code and improvements #100

Merged
merged 5 commits into from Sep 27, 2023


Roleren
Contributor

@Roleren Roleren commented Sep 20, 2023

This is a very large commit, with thousands of lines changed. Preliminary tests passed, and I added some new ones too.

Please wait to accept this until I have run the final tests. Some key points are:

  1. Annotation used to mean GFF only, but the getter should cover both GFF and GTF, with a format specification. This is now fixed and generalized.
  2. Generalized more URLs (this is still very unsafe and bloated)
  3. Fixed the fungi collection
  4. Added protists support
  5. Updated wrong paths
  6. Removed EnsemblGenome (it is an artificial split, and users cannot know which one to use)
  7. Generalized the code overall (e.g. getGTF is now just a wrapper around getGFF with a specified format)
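The wrapper idea in point 7 can be illustrated with a small, language-agnostic sketch. This is not the package's R code: `annotation_filename`, `get_gff`, `get_gtf`, and the file-name pattern are hypothetical names chosen only to show one generalized getter that takes a format, with thin wrappers that fix that format.

```python
# Hypothetical sketch: one generalized getter plus thin format-specific wrappers.
# The extensions and the naming pattern are assumptions for illustration only.
SUPPORTED_FORMATS = {"gff3": ".gff3.gz", "gtf": ".gtf.gz"}


def annotation_filename(organism: str, release: int, fmt: str = "gff3") -> str:
    """Build an annotation file name for the requested format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported annotation format: {fmt!r}")
    name = organism.strip().replace(" ", "_")
    return f"{name}.{release}{SUPPORTED_FORMATS[fmt]}"


def get_gff(organism: str, release: int) -> str:
    """Wrapper that only fixes the format to GFF3."""
    return annotation_filename(organism, release, fmt="gff3")


def get_gtf(organism: str, release: int) -> str:
    """Wrapper that only fixes the format to GTF."""
    return annotation_filename(organism, release, fmt="gtf")
```

With this shape, any fix to the shared logic lands in one place, and the two wrappers cannot drift apart.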

@HajkD
Member

HajkD commented Sep 20, 2023

Incredible, @Roleren! Very very well done and thank you so so much!

I will hold off on the merge until you give the go-ahead, and I will take a closer look as well.

With many thanks and very best wishes,
Hajk

@Roleren
Contributor Author

Roleren commented Sep 22, 2023

All right, it is ready to go.
I am now up to 5000 lines changed, and a new system is emerging. Some key points are:

  1. Annotation used to mean GFF only, but the getter should cover both GFF and GTF, with a format specification. This is now fixed and generalized.
  2. Generalized more URLs (this is still very unsafe, and some HTTP errors most likely remain)
  3. Fixed fungi fetching to work for all collections
  4. Added protists support for Ensembl
  5. Updated wrong URLs
  6. Removed EnsemblGenome (it is an artificial split, and users cannot know which one to use)
  7. Improved the RefSeq and GENCODE functions in general
  8. Fetching of assembly summaries is now simplified
  9. Generalized the code overall (e.g. getGTF is now just a wrapper around getGFF with a specified format)
  10. RefSeq still checks the md5 sum even if the file exists and has already been checked. Can we avoid this?
  11. Total rewrite of the RefSeq/GenBank summary file / kingdom file system
  12. Added a cache system for backend files (faster testing, and useful for power users)
  13. Added more tests and improved many of the existing ones
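On the open question in point 10, one possible approach is to record a successful check next to the file and skip the hash when nothing has changed. The sketch below is an assumption, not what the package does: the `.md5ok` marker suffix and the function names are hypothetical, and it is written in Python purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads are not read into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(path: Path, expected_md5: str) -> bool:
    """Return True if the file matches expected_md5, re-hashing only when needed.

    A hypothetical sidecar marker (<file>.md5ok) stores the checksum together
    with the file size and modification time seen at the last successful check.
    """
    marker = path.with_name(path.name + ".md5ok")
    stat = path.stat()
    fingerprint = {
        "md5": expected_md5,
        "size": stat.st_size,
        "mtime_ns": stat.st_mtime_ns,
    }
    if marker.exists():
        try:
            if json.loads(marker.read_text()) == fingerprint:
                return True  # unchanged since the last successful check
        except json.JSONDecodeError:
            pass  # unreadable marker: fall through and re-hash
    if md5_of(path) != expected_md5:
        return False
    marker.write_text(json.dumps(fingerprint))
    return True
```

The trade-off is that size and modification time are only a proxy for "unchanged", so a forced re-check option would still be useful.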

Problems still remaining:

  • Much of the package does not use the generalized formats yet. It is simply too much to change in one go, as the whole package suffers from heavily copied code that has drifted apart over time.
  • Not all tests are correct; some still fail, and some were never correct to begin with.

This is now ready to be merged

@HajkD
Member

HajkD commented Sep 24, 2023

This is absolutely amazing, @Roleren!

Thank you so so much for all your efforts.

Would you also like to copy/paste and adjust a few of your key points into the NEWS.md file?
Once you are happy that everything new has been added there (with a reference to this PR), I will gladly merge right away.

With many thanks and very best wishes,
Hajk

@Roleren
Contributor Author

Roleren commented Sep 27, 2023

Okay, done. Please check that the NEWS file looks right; I tried to follow your old formatting style.

I think it is ready to go; biomaRt is now bumped to version 1.05.

@HajkD
Member

HajkD commented Sep 27, 2023

Brilliant! Thank you so much! I will also have a quick look into fixing the getProtein() retrieval from UniProt and then we can push the new version to CRAN. In the meantime we could also start addressing the open issues.

With many thanks and very best wishes,
Hajk

@HajkD HajkD merged commit bbbfb56 into ropensci:master Sep 27, 2023