support exporting lineage spreadsheet from LCA databases #1080

Open
ctb opened this issue Jul 3, 2020 · 2 comments
Labels
code herein lies code plugin_todo Write a plugin for this! taxonomy

Comments


ctb commented Jul 3, 2020

would be nice to be able to export the key bits of the lineage spreadsheet from an LCA database.


ctb commented Jul 4, 2020

script export-lineage-csv-from-lca.py --

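#! /usr/bin/env python
"""Export the lineage spreadsheet (name + taxonomy columns) from an LCA database."""
import sys
import csv

from sourmash.lca import lca_utils
from sourmash.lca import lca_db


# load the LCA database named on the command line; ksize and scaled are unused
db, _, _ = lca_db.load_single_database(sys.argv[1])

# header row: 'name' followed by the taxonomic ranks
w = csv.writer(sys.stdout)
w.writerow(["name"] + list(lca_utils.taxlist()))

for ident, name in db.ident_to_name.items():
    idx = db.ident_to_idx[ident]
    lid = db.idx_to_lid.get(idx)
    if lid is None:
        # no lineage assigned to this identifier; skip it rather than
        # reusing the lineage left over from the previous iteration
        continue

    lineage = db.lid_to_lineage[lid]
    w.writerow([name] + lca_utils.zip_lineage(lineage))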

use like so:

./export-lineage-csv-from-lca.py podar-ref.lca.json.gz > out.csv
sourmash lca index out.csv xyz.lca.json podar-ref.lca.json.gz

which yields

== This is sourmash version 3.3.2.dev9+g462bc387. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

Building LCA database with ksize=31 scaled=10000 moltype=DNA.
examining spreadsheet headers...
** assuming column 'name' is identifiers in spreadsheet
64 distinct identities in spreadsheet out of 64 rows.
64 distinct lineages in spreadsheet out of 64 rows.
... loaded 1 signatures.
loaded 19993 hashes at ksize=31 scaled=10000
64 assigned lineages out of 64 distinct lineages in spreadsheet.
64 identifiers used out of 64 distinct identifiers in spreadsheet.
saving to LCA DB: xyz.lca.json

👍
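
One way to sanity-check out.csv before handing it to sourmash lca index is to count rows, distinct identifiers, and distinct lineages with the standard library alone. This is only a rough sketch: summarize_lineage_csv is a made-up helper name, it assumes the first column is the identifier and the rest are rank columns (as written by the export script), and its counts may not match the numbers sourmash reports if sourmash normalizes lineages differently.

```python
import csv


def summarize_lineage_csv(path):
    """Return (rows, distinct identifiers, distinct lineages) for a lineage CSV.

    Assumption: column 0 is the identifier and the remaining columns are
    the taxonomic ranks, in the order written by the export script.
    """
    with open(path, newline="") as fp:
        reader = csv.reader(fp)
        next(reader, None)  # skip the header row
        rows = [row for row in reader if row]  # ignore blank lines

    identifiers = {row[0] for row in rows}
    lineages = {tuple(row[1:]) for row in rows}
    return len(rows), len(identifiers), len(lineages)
```

Calling summarize_lineage_csv("out.csv") returns the three counts as a tuple, which can be compared by eye against the "distinct identities" and "distinct lineages" lines in the log.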


ctb commented Jul 4, 2020

(note that the above command builds the new xyz.lca.json using the signatures loaded directly from the old podar-ref.lca.json.gz.)
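
A possible round-trip check, sketched under assumptions: run the export script on xyz.lca.json as well, then compare that CSV against out.csv. The helpers load_lineages and diff_lineages below are hypothetical names, not part of sourmash, and they only compare the CSV text, not the signatures or hashes.

```python
import csv


def load_lineages(path):
    """Map identifier -> tuple of rank names from a lineage CSV.

    Assumption: column 0 is the identifier, the rest are rank columns.
    """
    with open(path, newline="") as fp:
        reader = csv.reader(fp)
        next(reader, None)  # skip the header row
        return {row[0]: tuple(row[1:]) for row in reader if row}


def diff_lineages(old_csv, new_csv):
    """Return identifiers that are missing, added, or changed between two CSVs."""
    old = load_lineages(old_csv)
    new = load_lineages(new_csv)
    missing = sorted(set(old) - set(new))  # in old only
    added = sorted(set(new) - set(old))    # in new only
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return missing, added, changed
```

If the rebuilt database carries the same lineages, all three returned lists should be empty.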

I think #969 is relevant to this issue...

@ctb ctb added code herein lies code taxonomy labels Jul 4, 2020
@ctb ctb added the plugin_todo Write a plugin for this! label Sep 23, 2023