Parse ENA Non-coding product in Python #15

AntonPetrov · 2017-10-13T09:46:45Z

The ENA data is provided to RNAcentral as EMBL-formatted files. These are processed into CSV files that are subsequently loaded into a database. The current code is written in Perl and relies on BioPerl and Ensembl Hive.

To streamline the production process I suggest moving from Perl to Python.

Parse EMBL files into CSV files using Biopython
Create a Luigi pipeline
Review how TPA entries are stored (for example, miRBase entries are identified by DR lines in TPAs)
Combine EMBL files before processing to avoid launching tens of thousands of LSF jobs
Once done, delete the old Perl code from the repository
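
A rough sketch of the parsing and DR-line steps, for illustration only. It uses just the standard library rather than Biopython, so it is not the importer itself. It relies on two properties of the EMBL flat-file format: each line starts with a two-letter code followed by three spaces, and each entry ends with a `//` line. Because entries are self-delimiting, the same loop can stream one combined file holding many entries. The function names, the CSV columns, and the sample entries are all made up for this example.

```python
import csv
import io


def iter_embl_entries(handle):
    """Yield each EMBL entry as a list of lines; entries end with a '//' line."""
    entry = []
    for line in handle:
        line = line.rstrip("\n")
        if line.strip() == "//":
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:  # tolerate a final entry that lacks a terminator
        yield entry


def entry_to_row(lines):
    """Pull the primary accession (AC) and cross-references (DR) from an entry."""
    accession = ""
    xrefs = []
    for line in lines:
        code, value = line[:2], line[5:].strip()
        if code == "AC" and not accession:
            # AC lines look like "AC   X00001; X00002;"; the first is primary.
            accession = value.split(";")[0].strip()
        elif code == "DR":
            # DR lines look like "DR   database; primary id; secondary id."
            fields = [f.strip() for f in value.rstrip(".").split(";")]
            if len(fields) >= 2:
                xrefs.append(f"{fields[0]}:{fields[1]}")
    # Hypothetical column layout: one row per entry, xrefs joined with '|'.
    return {"accession": accession, "xrefs": "|".join(xrefs)}


def embl_to_csv(in_handle, out_handle):
    """Write one CSV row per entry and return the number of entries seen."""
    writer = csv.DictWriter(out_handle, fieldnames=["accession", "xrefs"])
    writer.writeheader()
    count = 0
    for lines in iter_embl_entries(in_handle):
        writer.writerow(entry_to_row(lines))
        count += 1
    return count


# Made-up sample: two entries concatenated into a single "combined" file.
SAMPLE = """\
ID   BK000001; SV 1; linear; genomic RNA; TPA; INV; 99 BP.
AC   BK000001;
DR   miRBase; MI0000001; cel-let-7.
//
ID   BK000002; SV 1; linear; genomic RNA; STD; INV; 80 BP.
AC   BK000002;
//
"""

if __name__ == "__main__":
    out = io.StringIO()
    embl_to_csv(io.StringIO(SAMPLE), out)
    print(out.getvalue())
```

In a real importer, Biopython's `SeqIO.parse(handle, "embl")` would replace the hand-rolled line scanning. The point here is only that streaming a combined file needs one job rather than one job per input file.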

blakesweeney · 2018-03-20T15:54:14Z

I am holding off on the last step, deleting the old Perl code, because I want to do more than just delete code. I would like to restructure the repository as a standard Python package. The benefit is that we can stop running export PYTHONPATH=$PYTHONPATH:luigi before anything will run, and we can move our tests to a more standard location. The downside is that I will have to change every import to reflect the new layout, so I want to wait until after the release for that task. Other than that, this issue is complete on the release-9 branch.

AntonPetrov · 2018-04-20T16:24:14Z

Closed by #33

AntonPetrov added the new-feature label Oct 13, 2017

AntonPetrov assigned blakesweeney Oct 13, 2017

blakesweeney mentioned this issue Dec 21, 2017

Import ena with python #23

Merged

AntonPetrov closed this as completed Apr 20, 2018
