Parse ENA Non-coding product in Python #15

Closed
5 tasks done
AntonPetrov opened this issue Oct 13, 2017 · 2 comments

@AntonPetrov
Member

AntonPetrov commented Oct 13, 2017

ENA data is provided to RNAcentral as EMBL-formatted files. These are processed into CSV files that are subsequently loaded into a database. The current code is written in Perl and relies on BioPerl and Ensembl Hive.

To streamline the production process, I suggest moving from Perl to Python.

  • parse EMBL files into CSV files using Biopython
  • create a Luigi pipeline
  • review how TPA entries are stored (for example, miRBase entries are identified by DR lines in TPAs)
  • combine EMBL files before processing to avoid launching tens of thousands of LSF jobs
  • once done, delete the old Perl code from the repository
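
As a rough illustration of the "combine EMBL files" step, here is a minimal sketch using only the Python standard library. It assumes that plain concatenation is enough, since each EMBL record ends with a `//` terminator line. The function names, the `batch_size` parameter and the `batch_NNNNN.embl` naming are hypothetical and not taken from the RNAcentral code.

```python
from pathlib import Path


def combine_embl_files(paths, output_dir, batch_size=1000):
    """Concatenate EMBL flat files into batches of at most batch_size files.

    Hypothetical helper: returns the list of combined files it wrote.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    paths = sorted(Path(p) for p in paths)
    combined = []
    for start in range(0, len(paths), batch_size):
        target = output_dir / f"batch_{start // batch_size:05d}.embl"
        with target.open("w") as out:
            for path in paths[start:start + batch_size]:
                text = path.read_text()
                out.write(text)
                # Keep records separated if a file lacks a trailing newline.
                if text and not text.endswith("\n"):
                    out.write("\n")
        combined.append(target)
    return combined


def count_records(path):
    """Count EMBL records by counting '//' terminator lines."""
    with Path(path).open() as handle:
        return sum(1 for line in handle if line.rstrip() == "//")
```

Each combined file could then be handed to a single job, so the number of LSF jobs depends on `batch_size` rather than on the number of input files. `count_records` is only a sanity check that no records were lost during concatenation.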
@blakesweeney
Member

I am holding off on the last step, deleting the old Perl code, because I want to do more than just delete code as part of it. I would like to restructure the repository as a standard Python package. The nice part is that we can stop running export PYTHONPATH=$PYTHONPATH:luigi in order for this to work. We can also then place our tests in a more standard location. The downside is that I will have to change all imports to reflect their new location, so I want to wait until after the release for that task. Other than that, this issue is complete on the release-9 branch.

@AntonPetrov
Member Author

Closed by #33
