The ENA data is provided to RNAcentral as EMBL-formatted files. They are processed into csv files that are subsequently loaded into a database. The code is written in Perl and relies on BioPerl and Ensembl Hive.
To streamline the production process I suggest moving from Perl to Python.
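To make the EMBL-to-CSV step concrete, here is a deliberately minimal sketch that uses only the Python standard library. It is not the BioPython-based parser proposed in this issue (BioPython reads this format through Bio.SeqIO.parse(handle, "embl")). It only reads a few two-letter EMBL line codes (ID, DE, DR, SQ and the "//" record terminator) and writes one CSV row per record. The function names `parse_embl` and `records_to_csv`, the chosen columns, the "|" separator for cross-references and the sample record are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional, TextIO


def parse_embl(handle: TextIO) -> Iterator[Dict]:
    """Yield one dict per EMBL record (hypothetical, minimal field set)."""
    record: Optional[Dict] = None
    in_sequence = False
    for raw in handle:
        line = raw.rstrip("\n")
        code, data = line[:2], line[5:].strip()
        if code == "//":
            # "//" terminates a record in the EMBL flat-file format.
            if record is not None:
                yield record
            record, in_sequence = None, False
        elif in_sequence:
            # Sequence lines: keep the bases, drop spaces and the position count.
            record["sequence"] += "".join(c for c in data if c.isalpha()).upper()
        elif code == "ID":
            record = {"id": data.split(";")[0].strip(), "description": "",
                      "xrefs": [], "sequence": ""}
        elif record is None:
            continue  # ignore anything before the first ID line
        elif code == "DE":
            # Descriptions may span several DE lines; join them with spaces.
            record["description"] = f"{record['description']} {data}".strip()
        elif code == "DR":
            # "DR   DATABASE; PRIMARY_ID; SECONDARY_ID." -> "DATABASE:PRIMARY_ID"
            parts = [p.strip() for p in data.rstrip(".").split(";")]
            if len(parts) >= 2:
                record["xrefs"].append(f"{parts[0]}:{parts[1]}")
        elif code == "SQ":
            in_sequence = True  # the following lines hold the sequence


def records_to_csv(records: Iterable[Dict], out: TextIO) -> None:
    """Write a header plus one CSV row per record, ready for bulk loading."""
    writer = csv.writer(out)
    writer.writerow(["id", "description", "xrefs", "length", "sequence"])
    for rec in records:
        writer.writerow([rec["id"], rec["description"], "|".join(rec["xrefs"]),
                         len(rec["sequence"]), rec["sequence"]])


# Hypothetical sample record; accession and cross-reference are made up.
SAMPLE = """\
ID   XX000001; SV 1; linear; other RNA; STD; UNC; 12 BP.
XX
DE   Hypothetical example record
DR   EXAMPLEDB; EX0000001; example-1.
XX
SQ   Sequence 12 BP;
     acgtacgtac gt                                                        12
//
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    records_to_csv(parse_embl(io.StringIO(SAMPLE)), buffer)
    print(buffer.getvalue())  # header row followed by one data row
```

A real implementation would rely on BioPython, which handles the parts of the EMBL grammar this sketch ignores (feature tables, references, multi-line qualifiers).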
parse EMBL files into csv files using BioPython
create luigi pipeline
review how TPA entries are stored (for example, miRBase entries are identified by DR lines in TPAs)
combine EMBL files before processing to avoid launching tens of thousands of LSF jobs
once done, delete old Perl code from the repository
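The "combine EMBL files" step can be sketched with the standard library alone. Because every EMBL record ends with a "//" line, files can be concatenated safely as long as each one ends with a newline, and each resulting batch can then be handled by a single job instead of one job per file. The names `chunk_files` and `combine_batch` and the `files_per_job` parameter are hypothetical, and how a batch is actually submitted to LSF or wrapped in a luigi task is deliberately left out.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def chunk_files(paths: Sequence[Path], files_per_job: int) -> List[List[Path]]:
    """Split input files into fixed-size batches, one batch per job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    ordered = sorted(paths)  # sort so batches are reproducible between runs
    return [ordered[i:i + files_per_job]
            for i in range(0, len(ordered), files_per_job)]


def combine_batch(batch: Sequence[Path], out_path: Path) -> None:
    """Concatenate EMBL files; records stay separated by their "//" lines."""
    with out_path.open("w") as out:
        for path in batch:
            last_line = ""
            with Path(path).open() as handle:
                for line in handle:  # stream line by line, no full read
                    out.write(line)
                    last_line = line
            # Guard against a file whose final "//" lacks a newline, which
            # would otherwise run into the next file's ID line.
            if last_line and not last_line.endswith("\n"):
                out.write("\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inputs = []
        for i in range(5):
            path = root / f"part{i}.embl"
            path.write_text(f"ID   EX{i:05d}; SV 1;\n//\n")
            inputs.append(path)
        for n, batch in enumerate(chunk_files(inputs, files_per_job=2)):
            combined = root / f"batch{n}.embl"
            combine_batch(batch, combined)
            print(combined.name, "<-", [p.name for p in batch])
```

The batch size is the main tuning knob: larger batches mean fewer jobs to schedule but longer-running individual jobs.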
I am holding off on the last step, deleting the old Perl code, because I want to do more than just delete code as part of it. I would like to restructure this into a standard Python package. The benefit is that we will no longer need to run export PYTHONPATH=$PYTHONPATH:luigi for the pipeline to work, and we can place our tests in a more standard location. The downside is that I will have to change all imports to reflect their new locations, so I want to wait until after the release for that task. Other than that, this issue is complete on the release-9 branch.