genEra v1.0 (C) Max Planck Society for the Advancement of Science

Starting time of run: Thu Aug 18 09:48:20 CST 2022

Your temporary files will be stored in /mnt/data4/disk/yxj/diamond_nr/tmp_9606_7699

STARTING STEP 1: SEARCHING FOR HOMOLOGS WITHIN THE DATABASE USING DIAMOND
--------------------------------------------------
Matching the query genes against themselves
--------------------------------------------------
Searching for homologs against the NR database
--------------------------------------------------
Step 1 finished! The DIAMOND/MMseqs2 table can be found in /mnt/data4/disk/yxj/diamond_nr/tmp_9606_7699/9606_Diamond_results.bout
This file is usually HUGE, please dispose of it if you no longer find it useful
It can still be used (-p) in case the user wants to re-run genEra while skipping step 1

STARTING STEP 2: GENERATING TAXONOMIC DATABASE FOR THE PHYLOSTRATIGRAPHIC ASSIGNMENT OF YOUR GENES
--------------------------------------------------
Running ncbitax2lin to generate a raw "ncbi_lineages" file from the NCBI taxdump
2022-08-22 00:30:47,317|INFO|time spent on load_nodes: 0:00:03.902392
2022-08-22 00:30:54,329|INFO|time spent on load_names: 0:00:07.011851
2022-08-22 00:30:58,683|INFO|# of tax ids: 2,329,066
2022-08-22 00:30:59,138|INFO|df.info:
RangeIndex: 2329066 entries, 0 to 2329065
Data columns (total 4 columns):
 #   Column         Dtype
---  ------         -----
 0   tax_id         int64
 1   parent_tax_id  int64
 2   rank           object
 3   rank_name      object
dtypes: int64(2), object(2)
memory usage: 362.3 MB
2022-08-22 00:30:59,139|INFO|Generating a dictionary of taxonomy: tax_id => tax_unit ...
2022-08-22 00:31:08,280|INFO|size of taxonomy_dict: ~80 MB
2022-08-22 00:31:08,363|INFO|Finding all lineages ...
2022-08-22 00:31:08,364|INFO|will use 6 processes to find lineages for all 2,329,066 tax ids
2022-08-22 00:31:08,364|INFO|chunk_size = 388178
2022-08-22 00:31:08,382|INFO|chunked sizes: [388178, 388178, 388178, 388178, 388178, 388176]
2022-08-22 00:31:08,383|INFO|Starting 6 processes ...
2022-08-22 00:31:08,622|INFO|Joining 6 processes ...
working on tax_id: 50000
working on tax_id: 100000
working on tax_id: 150000
working on tax_id: 200000
working on tax_id: 250000
working on tax_id: 300000
working on tax_id: 350000
working on tax_id: 400000
working on tax_id: 450000
working on tax_id: 1500000
working on tax_id: 1550000
working on tax_id: 1600000
working on tax_id: 1650000
working on tax_id: 1700000
working on tax_id: 1750000
working on tax_id: 1800000
working on tax_id: 1850000
working on tax_id: 1050000
working on tax_id: 1150000
working on tax_id: 1200000
working on tax_id: 1250000
working on tax_id: 1300000
working on tax_id: 1350000
working on tax_id: 1400000
working on tax_id: 1900000
working on tax_id: 1950000
working on tax_id: 2000000
working on tax_id: 2050000
working on tax_id: 2100000
working on tax_id: 2150000
working on tax_id: 2200000
working on tax_id: 2250000
working on tax_id: 2300000
working on tax_id: 2350000
working on tax_id: 500000
working on tax_id: 650000
working on tax_id: 700000
working on tax_id: 750000
working on tax_id: 850000
working on tax_id: 900000
working on tax_id: 1000000
working on tax_id: 2400000
working on tax_id: 2450000
working on tax_id: 2500000
working on tax_id: 2550000
working on tax_id: 2600000
working on tax_id: 2650000
working on tax_id: 2750000
working on tax_id: 2800000
2022-08-22 00:31:26,282|INFO|adding lineages from /tmp/tmpek2i4pez_ncbitax2lin/_lineages_0.pkl ...
2022-08-22 00:31:32,747|INFO|adding lineages from /tmp/tmpek2i4pez_ncbitax2lin/_lineages_1.pkl ...
2022-08-22 00:31:37,162|INFO|adding lineages from /tmp/tmpek2i4pez_ncbitax2lin/_lineages_2.pkl ...
2022-08-22 00:31:41,640|INFO|adding lineages from /tmp/tmpek2i4pez_ncbitax2lin/_lineages_3.pkl ...
2022-08-22 00:31:45,255|INFO|adding lineages from /tmp/tmpek2i4pez_ncbitax2lin/_lineages_4.pkl ...
2022-08-22 00:31:49,817|INFO|adding lineages from /tmp/tmpek2i4pez_ncbitax2lin/_lineages_5.pkl ...
2022-08-22 00:31:55,296|INFO|Preparings all lineages into a dataframe to be written to disk ...
2022-08-22 00:33:34,111|INFO|Writing lineages to ncbi_lineages_2022-08-21.csv.gz ...
--------------------------------------------------
Finished generating a raw "ncbi_lineages" file named ncbi_lineages_2022-08-19.csv ncbi_lineages_2022-08-21.csv
Keep it in case you want to run genEra with another species (-r)

ERROR: genEra could not find the raw "ncbi_lineages" file: ncbi_lineages_2022-08-19.csv ncbi_lineages_2022-08-21.csv
If step 1 ran succesfully, the user can resume from this step using the argument -p
Exiting

4725766.75user 228295.81system 86:46:26elapsed 1585%CPU (0avgtext+0avgdata 72127756maxresident)k
77176inputs+15997328160outputs (0major+42940708812minor)pagefaults 0swaps
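
The ERROR message above names two files at once (ncbi_lineages_2022-08-19.csv and ncbi_lineages_2022-08-21.csv). This suggests, though the log does not confirm it, that an older lineage file from a previous attempt was sitting in the working directory and that both names ended up being treated as a single file name. A minimal Python sketch for checking that before re-running is shown below. The helper names `find_lineage_files` and `pick_newest` are hypothetical and not part of genEra or ncbitax2lin; the file-name pattern is an assumption taken from the names printed in the log.

```python
import re
from pathlib import Path

# Assumed naming scheme, copied from the log: ncbi_lineages_YYYY-MM-DD.csv
LINEAGE_RE = re.compile(r"^ncbi_lineages_(\d{4}-\d{2}-\d{2})\.csv$")


def find_lineage_files(directory):
    """Return (date, path) pairs for uncompressed lineage files, oldest first."""
    hits = []
    for path in Path(directory).iterdir():
        match = LINEAGE_RE.match(path.name)
        if match and path.is_file():
            hits.append((match.group(1), path))
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return sorted(hits)


def pick_newest(directory):
    """Return the path of the most recent lineage file, or None if there is none."""
    files = find_lineage_files(directory)
    if not files:
        return None
    return files[-1][1]
```

If `find_lineage_files(".")` returns more than one entry, moving the older file out of the way, or passing a single file explicitly with -r together with the Step 1 table via -p (the two options the log itself mentions), may avoid the error.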