Highlighting features used by prediction path #75

Closed
Rkubinski opened this issue Dec 11, 2019 · 20 comments

Comments

@Rkubinski

Is there any way to remove the highlighted features used by the prediction path, or to print them differently, for example in a table along with their importances? Please see the picture below: the features are quite smushed together, and I would like to present them in a different manner or not present them at all.
[Image: tree]

@tlapusan
Collaborator

Hi @Rkubinski, thanks for your feedback.
Indeed, feature names and their values aren't displayed properly in your picture above. Give us some time and we will come up with a better solution. Thanks.

@Rkubinski
Author

Great! Thanks :)

@parrt
Owner

parrt commented Dec 11, 2019

Weird. Is that on Windows? Mac? Does it do the same if vertical?

@Rkubinski
Author

Rkubinski commented Dec 12, 2019

Same problem with vertical. This is on Ubuntu 18.10.
The problem is caused by my feature names being quite long. These are bacterial names (20-30 characters) and I don't really have a way of concatenating them.
A nice solution would be the ability to generate a separate table listing the features, how they contributed, and maybe their importance.
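For readers who want such a table before any library change lands, here is a minimal sketch in plain Python. It is not part of dtreeviz. It assumes the tree is available as the flat arrays that scikit-learn exposes on a fitted decision tree (`estimator.tree_.children_left`, `children_right`, `feature`, `threshold`) plus `estimator.feature_importances_`. The helper names `prediction_path_rows` and `format_table` are hypothetical, and the toy tree and values below are made up for illustration.

```python
LEAF = -1  # scikit-learn stores -1 in children_left/children_right for leaves


def prediction_path_rows(children_left, children_right, feature, threshold,
                         x, feature_names, importances):
    """Walk one sample down the tree; return one row per split it passes."""
    rows = []
    node = 0  # the root is node 0
    while children_left[node] != LEAF:
        f = feature[node]
        go_left = x[f] <= threshold[node]  # samples with x <= threshold go left
        rows.append((feature_names[f], x[f], "<=" if go_left else ">",
                     threshold[node], importances[f]))
        node = children_left[node] if go_left else children_right[node]
    return rows


def format_table(rows):
    """Render the rows as a left-aligned plain-text table."""
    header = ("feature", "value", "test", "threshold", "importance")
    cells = [header] + [
        (name, f"{value:.3g}", op, f"{thr:.3g}", f"{imp:.3f}")
        for name, value, op, thr, imp in rows
    ]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in cells
    )


# Toy tree (illustrative values only): node 0 splits on feature 1,
# node 2 splits on feature 0, nodes 1, 3 and 4 are leaves.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [1, -2, 0, -2, -2]
threshold = [0.5, -2.0, 2.0, -2.0, -2.0]
names = ["Genus Bacteroides", "Genus Faecalibacterium"]
importances = [0.4, 0.6]

rows = prediction_path_rows(children_left, children_right, feature, threshold,
                            [1.5, 0.9], names, importances)
print(format_table(rows))
```

With a fitted scikit-learn tree, the same function could be called as `prediction_path_rows(t.children_left, t.children_right, t.feature, t.threshold, Xp, OTUs, estimator.feature_importances_)` where `t = estimator.tree_`, assuming `Xp` is a single 1-D sample as in the commands posted later in this thread.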

@parrt
Owner

parrt commented Dec 12, 2019

OK, this was a font metrics problem when it happened on Mac/Windows. Maybe we need to make the font configurable, or if it already is, please try a different font.

@Rkubinski
Author

Where do I pass a font type option? I don't see such a parameter for the dtreeviz object.

@tlapusan
Collaborator

@Rkubinski can you provide the list of feature names? I tried to reproduce the issue with long feature names and it works for me.

Screen Shot 2019-12-18 at 5 44 25 PM

@tlapusan
Collaborator

> Where do I pass a font type option? I don't see such a parameter for the dtreeviz object.

The dtreeviz() function has a 'fontname' parameter.

@Rkubinski
Author

Ok, I was not sure what you meant by font metrics.
Here are my commands:
colors_2_classes = [None,              # 0 classes
                    None,              # 1 class
                    ['green', 'red']]  # 2 classes
viz3 = dtreeviz(estimator,
                X,
                y,
                target_name='Diagnosis',
                feature_names=OTUs,
                class_names=["nonIBD", "IBD"],
                orientation='LR',
                colors={'classes': colors_2_classes},
                X=Xp)  # instance whose prediction path is highlighted

OTUs:
[
"Genus Methanobacterium",
"Genus Methanobrevibacter",
"Family Methanomethylophilaceae",
"Genus Bryocella",
"Genus Bryobacter",
"Order Subgroup 2",
"Genus Blastocatella",
"Family Blastocatellaceae",
"Order DS-100",
"Genus Luteitalea",
"Class Subgroup 6",
"Genus Iamia",
"Order Microtrichales",
"Genus Actinomyces",
"Genus Actinotignum",
"Genus Varibaculum",
"Genus Bifidobacterium",
"Genus Corynebacterium",
"Genus Corynebacterium 1",
"Genus Lawsonella",
"Family Corynebacteriaceae",
"Genus Rhodococcus",
"Genus Blastococcus",
"Genus Geodermatophilus",
"Genus Modestobacter",
"Genus Nakamurella",
"Genus Brevibacterium",
"Genus Actinotalea",
"Genus Cellulomonas",
"Genus Brachybacterium",
"Family Intrasporangiaceae",
"Genus Rathayibacter",
"Family Microbacteriaceae",
"Family Microbacteriaceae",
"Genus Citricoccus",
"Genus Glutamicibacter",
"Genus Kocuria",
"Genus Micrococcus",
"Genus Rothia",
"Family Micrococcaceae",
"Genus Marmoricola",
"Genus Nocardioides",
"Genus Cutibacterium",
"Genus Atopobium",
"Genus Collinsella",
"Family Coriobacteriales Incertae Sedis",
"Genus Eggerthella",
"Genus Gordonibacter",
"Genus Senegalimassilia",
"Genus Slackia",
"Family Eggerthellaceae",
"Order Coriobacteriales",
"Order Gaiellales",
"Order Gaiellales",
"Genus metagenome",
"Genus Solirubrobacter",
"Genus Bacteroides",
"Genus Barnesiella",
"Genus Coprobacter",
"Family Barnesiellaceae",
"Genus Dysgonomonas",
"Genus Proteiniphilum",
"Genus Butyricimonas",
"Genus Odoribacter",
"Genus Muribaculum",
"Family Muribaculaceae",
"Genus metagenome",
"Family Muribaculaceae",
"Family Muribaculaceae",
"Family Muribaculaceae",
"Family Muribaculaceae",
"Genus F0058",
"Genus Paludibacter",
"Genus Porphyromonas",
"Genus Alloprevotella",
"Genus Paraprevotella",
"Genus Prevotella",
"Genus Prevotella 2",
"Genus Prevotella 6",
"Genus Prevotella 7",
"Genus Prevotella 9",
"Genus Prevotellaceae Ga6A1 group",
"Genus Prevotellaceae NK3B31 group",
"Genus Prevotellaceae UCG-001",
"Genus Prevotellaceae UCG-003",
"Family Prevotellaceae",
"Genus Alistipes",
"Genus Rikenella",
"Genus Rikenellaceae RC9 gut group",
"Genus Parabacteroides",
"Genus Tannerella",
"Order Bacteroidales",
"Genus Cnuella",
"Family Saprospiraceae",
"Genus Cytophaga",
"Genus Hymenobacter",
"Genus Chryseolinea",
"Genus OLB12",
"Genus Siphonobacter",
"Family Microscillaceae",
"Genus Dyadobacter",
"Genus Capnocytophaga",
"Genus Flavobacterium",
"Family Flavobacteriaceae",
"Genus metagenome",
"Genus Chryseobacterium",
"Genus Cloacibacterium",
"Genus Bacteroidetes bacterium OLB10",
"Family NS11-12 marine group",
"Genus Pedobacter",
"Genus Sphingobacterium",
"Family env.OPS 17",
"Order OPB56",
"Family Anaerolineaceae",
"Family Caldilineaceae",
"Genus Chloronema",
"Genus Oscillochloris",
"Class Gitt-GS-136",
"Genus metagenome",
"Class KD4-96",
"Genus Candidatus Melainabacteria bacterium MEL.A1",
"Genus Clostridium sp. CAG:306",
"Order Gastranaerophilales",
"Order Gastranaerophilales",
"Genus Candidatus Obscuribacter phosphatis",
"Order Chloroplast",
"Genus Gloeocapsa PCC-7428",
"Genus Meiothermus",
"Genus Thermus",
"Genus Arcobacter",
"Genus Campylobacter",
"Genus Helicobacter",
"Genus Anoxybacillus",
"Genus Bacillus",
"Genus Geobacillus",
"Genus Virgibacillus",
"Genus Thermicanus",
"Genus Gemella",
"Genus Exiguobacterium",
"Genus Brochothrix",
"Genus Brevibacillus",
"Genus Paenibacillus",
"Genus Solibacillus",
"Genus Sporosarcina",
"Genus Staphylococcus",
"Genus Abiotrophia",
"Genus Aerococcus",
"Genus Facklamia",
"Genus Alloiococcus",
"Genus Atopostipes",
"Genus Granulicatella",
"Genus Enterococcus",
"Genus Lactobacillus",
"Genus Pediococcus",
"Genus Weissella",
"Genus Lactococcus",
"Genus Streptococcus",
"Genus Catabacter",
"Genus Christensenella",
"Genus Christensenellaceae R-7 group",
"Family Christensenellaceae",
"Genus Caloramator",
"Genus Clostridium sensu stricto 1",
"Genus Clostridium sensu stricto 13",
"Genus Clostridium sensu stricto 2",
"Genus Proteiniclasticum",
"Genus Sarcina",
"Genus Clostridiales bacterium enrichment culture clone 06-1235251-67",
"Family Clostridiales vadinBB60 group",
"Genus metagenome",
"Family Clostridiales vadinBB60 group",
"Family Clostridiales vadinBB60 group",
"Family Clostridiales vadinBB60 group",
"Genus Defluviitaleaceae UCG-011",
"Genus Acetobacterium",
"Genus Anaerofustis",
"Genus Eubacterium",
"Genus Anaerococcus",
"Genus Ezakiella",
"Genus Finegoldia",
"Genus Murdochiella",
"Genus Parvimonas",
"Genus Peptoniphilus",
"Genus Family XIII AD3011 group",
"Genus Family XIII UCG-001",
"Genus Mogibacterium",
"Genus S5-A14a",
"Genus [Eubacterium] brachy group",
"Genus [Eubacterium] nodatum group",
"Family Family XIII",
"Family Family XIII",
"Genus Acetitomaculum",
"Genus Agathobacter",
"Genus Anaerosporobacter",
"Genus Anaerostipes",
"Genus Blautia",
"Genus Butyrivibrio",
"Genus CAG-56",
"Genus CHKCI001",
"Genus Catonella",
"Genus Cellulosilyticum",
"Genus Coprococcus 1",
"Genus Coprococcus 2",
"Genus Coprococcus 3",
"Genus Cuneatibacter",
"Genus Dorea",
"Genus Eisenbergiella",
"Genus Epulopiscium",
"Genus Fusicatenibacter",
"Genus GCA-900066575",
"Genus GCA-900066755",
"Genus Howardella",
"Genus Hungatella",
"Genus Johnsonella",
"Genus Lachnoanaerobaculum",
"Genus Lachnoclostridium",
"Genus Lachnoclostridium 5",
"Genus Lachnospira",
"Genus Lachnospiraceae FCS020 group",
"Genus Lachnospiraceae ND3007 group",
"Genus Lachnospiraceae NK4A136 group",
"Genus Lachnospiraceae UCG-001",
"Genus Lachnospiraceae UCG-003",
"Genus Lachnospiraceae UCG-004",
"Genus Lachnospiraceae UCG-008",
"Genus Lachnospiraceae UCG-009",
"Genus Lachnospiraceae UCG-010",
"Genus Lactonifactor",
"Genus Marvinbryantia",
"Genus Moryella",
"Genus Oribacterium",
"Genus Robinsoniella",
"Genus Roseburia",
"Genus Sellimonas",
"Genus Shuttleworthia",
"Genus Stomatobaculum",
"Genus Tyzzerella",
"Genus Tyzzerella 3",
"Genus Tyzzerella 4",
"Genus UC5-1-2E3",
"Genus [Eubacterium] eligens group",
"Genus [Eubacterium] fissicatena group",
"Genus [Eubacterium] hallii group",
"Genus [Eubacterium] ruminantium group",
"Genus [Eubacterium] ventriosum group",
"Genus [Eubacterium] xylanophilum group",
"Genus [Ruminococcus] gauvreauii group",
"Genus [Ruminococcus] gnavus group",
"Genus [Ruminococcus] torques group",
"Family Lachnospiraceae",
"Family Lachnospiraceae",
"Family Lachnospiraceae",
"Genus Peptococcus",
"Family Peptococcaceae",
"Genus Clostridioides",
"Genus Intestinibacter",
"Genus Paeniclostridium",
"Genus Peptostreptococcus",
"Genus Romboutsia",
"Genus Terrisporobacter",
"Genus Anaerofilum",
"Genus Anaerotruncus",
"Genus Angelakisella",
"Genus Butyricicoccus",
"Genus CAG-352",
"Genus Candidatus Soleaferrea",
"Genus Caproiciproducens",
"Genus DTU089",
"Genus Faecalibacterium",
"Genus Fastidiosipila",
"Genus Flavonifractor",
"Genus Fournierella",
"Genus GCA-900066225",
"Genus Hydrogenoanaerobacterium",
"Genus Intestinimonas",
"Genus Negativibacillus",
"Genus Oscillibacter",
"Genus Oscillospira",
"Genus Phocea",
"Genus Pseudoflavonifractor",
"Genus Ruminiclostridium",
"Genus Ruminiclostridium 1",
"Genus Ruminiclostridium 5",
"Genus Ruminiclostridium 6",
"Genus Ruminiclostridium 9",
"Genus Ruminococcaceae NK4A214 group",
"Genus Ruminococcaceae UCG-002",
"Genus Ruminococcaceae UCG-003",
"Genus Ruminococcaceae UCG-004",
"Genus Ruminococcaceae UCG-005",
"Genus Ruminococcaceae UCG-008",
"Genus Ruminococcaceae UCG-009",
"Genus Ruminococcaceae UCG-010",
"Genus Ruminococcaceae UCG-013",
"Genus Ruminococcaceae UCG-014",
"Genus Ruminococcus 1",
"Genus Ruminococcus 2",
"Genus Subdoligranulum",
"Genus UBA1819",
"Genus [Eubacterium] coprostanoligenes group",
"Family Ruminococcaceae",
"Family Ruminococcaceae",
"Order Clostridiales",
"Order DTU014",
"Genus Thermoanaerobacterium",
"Genus Candidatus Stoquefichus",
"Genus Catenibacterium",
"Genus Coprobacillus",
"Genus Dielma",
"Genus Erysipelatoclostridium",
"Genus Erysipelotrichaceae UCG-003",
"Genus Erysipelotrichaceae UCG-004",
"Genus Faecalicoccus",
"Genus Faecalitalea",
"Genus Holdemanella",
"Genus Holdemania",
"Genus Merdibacter",
"Genus Solobacterium",
"Genus Turicibacter",
"Genus [Clostridium] innocuum group",
"Family Erysipelotrichaceae",
"Family Erysipelotrichaceae",
"Genus Acidaminococcus",
"Genus Phascolarctobacterium",
"Genus Succiniclasticum",
"Genus Allisonella",
"Genus Anaeroglobus",
"Genus Dialister",
"Genus Megamonas",
"Genus Megasphaera",
"Genus Mitsuokella",
"Genus Negativicoccus",
"Genus Selenomonas",
"Genus Selenomonas 3",
"Genus Veillonella",
"Family Veillonellaceae",
"Family Veillonellaceae",
"Genus Fusobacterium",
"Genus Leptotrichia",
"Genus Streptobacillus",
"Family Longimicrobiaceae",
"Genus Victivallis",
"Family vadinBE97",
"Family vadinBE97",
"Family vadinBE97",
"Genus TM7 phylum sp. oral clone DR034",
"Genus Pir4 lineage",
"Genus Pirellula",
"Genus Belnapia",
"Genus Rhodovarius",
"Genus Roseomonas",
"Genus Skermanella",
"Genus Asticcacaulis",
"Genus Brevundimonas",
"Genus Caulobacter",
"Genus Phenylobacterium",
"Family Caulobacteraceae",
"Family Caulobacteraceae",
"Genus Reyranella",
"Genus Methylobacterium",
"Genus Microvirga",
"Genus Roseiarcus",
"Family Beijerinckiaceae",
"Genus Devosia",
"Family Devosiaceae",
"Genus Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium",
"Genus Mesorhizobium",
"Genus Phyllobacterium",
"Genus Bauldia",
"Genus Nordella",
"Genus Bradyrhizobium",
"Genus Pseudorhodoplanes",
"Genus Rhodopseudomonas",
"Family Xanthobacteraceae",
"Genus Gemmobacter",
"Genus Paracoccus",
"Genus Rubellimicrobium",
"Family Rhodobacteraceae",
"Genus Azospirillum sp. 47_25",
"Order Rhodospirillales",
"Order Rhodospirillales",
"Order Rhodospirillales",
"Family Mitochondria",
"Genus Altererythrobacter",
"Genus Qipengyuania",
"Genus Sphingobium",
"Genus Sphingomonas",
"Genus Sphingopyxis",
"Genus Bacteriovorax",
"Genus Peredibacter",
"Genus Bilophila",
"Genus Desulfovibrio",
"Family Desulfovibrionaceae",
"Genus Haliangium",
"Family Polyangiaceae",
"Genus metagenome",
"Order Myxococcales",
"Genus Alishewanella",
"Genus Shewanella",
"Genus Aquabacterium",
"Genus Brachymonas",
"Genus Burkholderia-Caballeronia-Paraburkholderia",
"Genus Caenimonas",
"Genus Duganella",
"Genus Hydrogenophaga",
"Genus Janthinobacterium",
"Genus Limnobacter",
"Genus Massilia",
"Genus Oxalobacter",
"Genus Parasutterella",
"Genus Ralstonia",
"Genus Ramlibacter",
"Genus Sutterella",
"Genus Tepidimonas",
"Genus Variovorax",
"Family Burkholderiaceae",
"Genus Formivibrio",
"Genus Methylophilus",
"Genus Eikenella",
"Genus Neisseria",
"Family Neisseriaceae",
"Family SC-I-84",
"Order CCD24",
"Genus Cellvibrio",
"Genus Escherichia-Shigella",
"Genus Hafnia-Obesumbacterium",
"Genus Morganella",
"Genus Pantoea",
"Genus Pectobacterium",
"Genus Proteus",
"Family Enterobacteriaceae",
"Genus Actinobacillus",
"Genus Aggregatibacter",
"Genus Gallibacterium",
"Genus Haemophilus",
"Family Pasteurellaceae",
"Genus Acinetobacter",
"Genus Alkanindiges",
"Genus Enhydrobacter",
"Genus Moraxella",
"Family Moraxellaceae",
"Genus Pseudomonas",
"Genus Steroidobacter",
"Genus Candidatus Tenderia",
"Order WD260",
"Family Rhodanobacteraceae",
"Genus Luteimonas",
"Genus Pseudoxanthomonas",
"Genus Xanthomonas",
"Order Rokubacteriales",
"Genus Brachyspira",
"Genus Treponema 2",
"Genus Cloacibacillus",
"Genus Jonquetella",
"Genus Anaeroplasma",
"Order Izimaplasmatales",
"Order Izimaplasmatales",
"Order Mollicutes RF39",
"Order Mollicutes RF39",
"Order Mollicutes RF39",
"Order Mollicutes RF39",
"Genus IMCC26134",
"Family Pedosphaeraceae",
"Genus Akkermansia",
"Genus Luteolibacter",
"Family Verrucomicrobiaceae"
]
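As a stopgap for names this long, one option is to abbreviate them before passing them as `feature_names`. The sketch below is only an illustration and is not something dtreeviz provides: the rank abbreviations, the 24-character limit, and the helper names `shorten` and `unique_short_names` are all assumptions. Because the list above contains repeated names (for example "Genus metagenome"), repeats get a numeric suffix so that labels stay distinguishable.

```python
from collections import Counter

# Assumed abbreviations for the taxonomic rank prefix of each name.
RANK_ABBREV = {"Genus": "G.", "Family": "F.", "Order": "O.", "Class": "C."}


def shorten(name, max_len=24):
    """Abbreviate the rank prefix and truncate the result to max_len characters."""
    rank, _, rest = name.partition(" ")
    short = f"{RANK_ABBREV.get(rank, rank)} {rest}" if rest else name
    if len(short) > max_len:
        short = short[:max_len - 3] + "..."
    return short


def unique_short_names(names, max_len=24):
    """Shorten every name; repeats get ' #2', ' #3', ... appended after truncation."""
    seen = Counter()
    result = []
    for name in names:
        short = shorten(name, max_len)
        seen[short] += 1
        result.append(short if seen[short] == 1 else f"{short} #{seen[short]}")
    return result


sample = [
    "Genus metagenome",
    "Genus metagenome",
    "Genus Clostridiales bacterium enrichment culture clone 06-1235251-67",
    "Family Family XIII",
]
short_names = unique_short_names(sample)
for original, short in zip(sample, short_names):
    print(f"{short:<28} <- {original}")
```

Truncation can make two different taxa share a prefix, in which case only the numeric suffix tells them apart, so keeping a mapping from short name back to the full name (as the printed pairs do) is advisable.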

@tlapusan
Collaborator

@Rkubinski, indeed you have a long list of features :).
If you train your model with, let's say, only 30% of the features, do you have the same issue?

Meanwhile I will try to reproduce the issue with a list of features similar in length to yours ;)

@tlapusan
Collaborator

I created a vertical version of the instance table. It is configurable through the instance_orientation parameter.
This vertical version might help @Rkubinski. It's also helpful when we have a large set of features and the horizontal table becomes too large...

@parrt do you have a better name for the instance_orientation parameter?

Screen Shot 2019-12-30 at 4 09 16 PM
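To make the two options mentioned in this thread concrete, here is a small hedged sketch that gathers them into one keyword-argument dictionary. The parameter names `fontname`, `instance_orientation`, `feature_names` and `X` come from the comments above. The helper `path_viz_kwargs`, the font "DejaVu Sans", and the value "TD" for a vertical instance table are assumptions, so check the dtreeviz signature in the version you install.

```python
def path_viz_kwargs(feature_names, instance, *,
                    fontname="DejaVu Sans", instance_orientation="TD"):
    """Collect the prediction-path display options discussed in this thread."""
    return {
        "feature_names": list(feature_names),
        "X": instance,                                  # sample to run down the tree
        "fontname": fontname,                           # assumed to be an installed font
        "instance_orientation": instance_orientation,   # "TD" is an assumed value
    }


kwargs = path_viz_kwargs(["Genus Bacteroides", "Genus Faecalibacterium"],
                         [1.5, 0.9])
print(kwargs)

# With dtreeviz available, the dictionary could be forwarded to the same kind
# of call shown earlier in the thread (not executed here):
#   viz = dtreeviz(estimator, X, y, target_name='Diagnosis',
#                  class_names=["nonIBD", "IBD"], **kwargs)
```

Keeping the options in one dictionary makes it easy to try a different font or orientation without editing the plotting call itself.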

@Rkubinski
Author

Hi,
Sorry for the delay in responding. I have the same issue when I do a feature reduction, so it seems to be due to the feature names. When do you think the instance orientation update could be pushed?

@tlapusan
Collaborator

I plan to make a PR tomorrow :)

@parrt
Owner

parrt commented Dec 30, 2019

instance_orientation seems pretty good. Maybe test_orientation? prediction_orientation? Hmm... maybe yours is best.

@tlapusan
Collaborator

@Rkubinski I have just created PR #78. I hope it will solve your issue. Waiting for your feedback ;)

@Rkubinski
Author

Hi guys. Thank you so much for the help. I am still new to this. I tried to update dtreeviz via pip, but I guess the package has not been updated yet. So to use this new version, I assume I would have to pull from GitHub and run the scripts directly?

@tlapusan
Collaborator

tlapusan commented Jan 2, 2020

Hi @Rkubinski, yes, you can use the master version from the repo until @parrt makes a new release.

@Rkubinski
Author

@tlapusan Great! I'll get back to you when I've tested it out.

@tlapusan
Collaborator

tlapusan commented Feb 25, 2020

Hi @Rkubinski, any feedback? I would like to close this issue.

@Rkubinski
Author

Hi guys, sorry for the delay in feedback, it did work out in the end :)
