Cran integration #7

Closed
pommedeterresautee opened this issue Jan 4, 2015 · 17 comments

Comments

@pommedeterresautee
Contributor

I am working on a plot function for this ML package https://github.com/tqchen/xgboost (I want to plot the tree model generated). I think DiagrammeR is perfect for this job.

However, it would require the xgboost R package to depend on your package, and I can't do that properly until your package is on CRAN. Do you plan to submit your package to CRAN?

Kind regards,
Michaël

@rich-iannone
Owner

Yes. Plan to get that underway this week. Hope beyond hope that will be a smooth process... Would love to see a sample of that use later on if that's possible.

@pommedeterresautee
Contributor Author

It is not finished yet, but of course I will send it to you once it is written.
By the way, I suspect the answer is no, but I prefer to ask: is it possible to have several trees in the same image?

The ML package I am working on belongs to the tree ensemble family, meaning the model combines several decision trees to make a decision. Therefore I will need to plot several trees.

To give you an idea, this is a text dump of a simple model:

booster[0]:
0:[f0<1.00001] yes=1,no=2,missing=2 gain=9.00675,cover=21
    1:[f3<62.5] yes=3,no=4,missing=3 gain=0.588164,cover=10.75
        3:leaf=-1.36842cover=8.5
        4:[f3<65.5] yes=7,no=8,missing=7 gain=0.307692,cover=2.25
            7:leaf=-0cover=1
            8:leaf=-0.666667cover=1.25
    2:[f3<39] yes=5,no=6,missing=5 gain=3.19787,cover=10.25
        5:leaf=-0.909091cover=1.75
        6:[f3<61.5] yes=9,no=10,missing=9 gain=5.62929,cover=8.5
            9:[f3<51.5] yes=11,no=12,missing=11 gain=0.405797,cover=4.75
                11:leaf=0.222222cover=1.25
                12:leaf=1.11111cover=3.5
            10:[f2<1.00001] yes=13,no=14,missing=14 gain=0.361134,cover=3.75
                13:leaf=-0.8cover=1.5
                14:[f3<67.5] yes=15,no=16,missing=15 gain=1.42308,cover=2.25
                    15:leaf=0.5cover=1
                    16:leaf=-0.666667cover=1.25
booster[1]:
0:[f3<53.5] yes=1,no=2,missing=1 gain=1.233,cover=16.2397
    1:[f0<1.00001] yes=3,no=4,missing=4 gain=0.235218,cover=5.902
        3:leaf=-0.722073cover=3.23434
        4:[f2<1.00001] yes=7,no=8,missing=8 gain=0.324221,cover=2.66767
            7:leaf=0.143253cover=1.06578
            8:leaf=-0.416187cover=1.60189
    2:[f3<57.5] yes=5,no=6,missing=5 gain=2.28004,cover=10.3377
        5:leaf=0.734631cover=2.08826
        6:[f3<64.5] yes=9,no=10,missing=9 gain=0.576289,cover=8.24939
            9:[f3<61.5] yes=11,no=12,missing=11 gain=0.520244,cover=4.85999
                11:leaf=-0.0681728cover=2.46091
                12:leaf=-0.703358cover=2.39908
            10:[f0<1.00001] yes=13,no=14,missing=14 gain=0.0666218,cover=3.3894
                13:leaf=-0.082765cover=1.37079
                14:leaf=0.145369cover=2.01861
booster[2]:
0:[f3<31.5] yes=1,no=2,missing=1 gain=0.994684,cover=14.4559
    1:leaf=-0.725791cover=1.20749
    2:[f0<1.00001] yes=3,no=4,missing=4 gain=0.43832,cover=13.2484
        3:[f7<1.00001] yes=5,no=6,missing=6 gain=0.221549,cover=6.05906
            5:[f3<61.5] yes=9,no=10,missing=9 gain=0.0545174,cover=2.72321
                9:leaf=-0.0736994cover=1.61054
                10:leaf=0.143768cover=1.11267
            6:[f6<1.00001] yes=11,no=12,missing=12 gain=0.00975752,cover=3.33585
                11:leaf=-0.402298cover=1.56072
                12:leaf=-0.135091cover=1.77513
        4:[f3<68.5] yes=7,no=8,missing=7 gain=0.715236,cover=7.18931
            7:[f7<1.00001] yes=13,no=14,missing=14 gain=0.284033,cover=6.038
                13:[f2<1.00001] yes=15,no=16,missing=16 gain=0.866298,cover=3.33827
                    15:leaf=-0.435249cover=1.17335
                    16:leaf=0.386262cover=2.16492
                14:leaf=0.488874cover=2.69973
            8:leaf=-0.372583cover=1.15131
booster[3]:
0:[f5<1.00001] yes=1,no=2,missing=2 gain=0.291621,cover=13.0014
    1:leaf=-0.437401cover=1.14007
    2:[f3<56.5] yes=3,no=4,missing=3 gain=0.296743,cover=11.8614
        3:[f3<32.5] yes=5,no=6,missing=5 gain=0.767718,cover=3.7931
            5:leaf=-0.312716cover=1.30799
            6:[f3<53.5] yes=9,no=10,missing=9 gain=0.0421358,cover=2.48511
                9:leaf=0.151712cover=1.42147
                10:leaf=0.569316cover=1.06365
        4:[f3<58.5] yes=7,no=8,missing=7 gain=0.288247,cover=8.06826
            7:leaf=-0.449068cover=1.2372
            8:[f3<60.5] yes=11,no=12,missing=11 gain=0.218973,cover=6.83106
                11:leaf=0.222998cover=1.51689
                12:[f2<1.00001] yes=13,no=14,missing=14 gain=0.216326,cover=5.31417
                    13:leaf=0.154285cover=1.28126
                    14:[f3<64.5] yes=15,no=16,missing=15 gain=0.0556476,cover=4.03291
                        15:leaf=-0.356278cover=1.26271
                        16:[f3<67.5] yes=17,no=18,missing=17 gain=0.0487459,cover=2.7702
                            17:leaf=0.0423429cover=1.23857
                            18:leaf=-0.173485cover=1.53163
booster[4]:
0:[f4<1.00001] yes=1,no=2,missing=2 gain=0.327443,cover=12.76
    1:leaf=0.32368cover=1.06381
    2:[f3<54.5] yes=3,no=4,missing=3 gain=0.391578,cover=11.6962
        3:[f0<1.00001] yes=5,no=6,missing=6 gain=0.100243,cover=3.19493
            5:leaf=-0.473333cover=1.77612
            6:leaf=-0.0766397cover=1.41881
        4:[f3<57.5] yes=7,no=8,missing=7 gain=0.233588,cover=8.50126
            7:leaf=0.307528cover=1.14447
            8:[f3<66.5] yes=9,no=10,missing=9 gain=0.193483,cover=7.3568
                9:[f3<63.5] yes=11,no=12,missing=11 gain=0.0894609,cover=5.31245
                    11:[f2<1.00001] yes=13,no=14,missing=14 gain=0.0373335,cover=3.80138
                        13:leaf=-0.124951cover=1.74452
                        14:[f3<60.5] yes=15,no=16,missing=15 gain=0.0176378,cover=2.05686
                            15:leaf=0.0971158cover=1.05058
                            16:leaf=-0.0390881cover=1.00628
                    12:leaf=-0.305334cover=1.51107
                10:leaf=0.15303cover=2.04435
booster[5]:
0:[f3<32.5] yes=1,no=2,missing=1 gain=0.202925,cover=12.3398
    1:leaf=-0.29677cover=1.26005
    2:[f3<47] yes=3,no=4,missing=3 gain=0.271826,cover=11.0798
        3:leaf=0.345466cover=1.23939
        4:[f7<1.00001] yes=5,no=6,missing=6 gain=0.110205,cover=9.84037
            5:[f2<1.00001] yes=7,no=8,missing=8 gain=0.13005,cover=5.43926
                7:leaf=-0.120634cover=1.8881
                8:[f3<58.5] yes=11,no=12,missing=11 gain=0.529277,cover=3.55116
                    11:leaf=-0.235791cover=1.3903
                    12:[f3<61.5] yes=13,no=14,missing=13 gain=0.111476,cover=2.16086
                        13:leaf=0.543864cover=1.04246
                        14:leaf=0.0710653cover=1.1184
            6:[f3<66.5] yes=9,no=10,missing=9 gain=0.150789,cover=4.40111
                9:leaf=-0.276771cover=2.27269
                10:leaf=0.0468063cover=2.12842
booster[6]:
0:[f4<1.00001] yes=1,no=2,missing=2 gain=0.170242,cover=12.0306
    1:leaf=0.260167cover=1.13877
    2:[f3<53.5] yes=3,no=4,missing=3 gain=0.283698,cover=10.8919
        3:leaf=-0.306286cover=2.11739
        4:[f3<57.5] yes=5,no=6,missing=5 gain=0.312135,cover=8.77446
            5:leaf=0.335615cover=1.6591
            6:[f3<64.5] yes=7,no=8,missing=7 gain=0.14184,cover=7.11536
                7:[f2<1.00001] yes=9,no=10,missing=10 gain=0.0247509,cover=4.01684
                    9:leaf=-0.0443511cover=1.81671
                    10:leaf=-0.223651cover=2.20013
                8:[f3<67.5] yes=11,no=12,missing=11 gain=0.100104,cover=3.09852
                    11:leaf=0.240929cover=1.10811
                    12:leaf=-0.0519394cover=1.99042
booster[7]:
0:[f3<32.5] yes=1,no=2,missing=1 gain=0.197457,cover=11.9813
    1:leaf=-0.265111cover=1.19593
    2:[f3<56.5] yes=3,no=4,missing=3 gain=0.306775,cover=10.7854
        3:[f6<1.00001] yes=5,no=6,missing=6 gain=0.0724137,cover=3.03167
            5:leaf=0.0705344cover=1.62101
            6:leaf=0.402162cover=1.41066
        4:[f3<58.5] yes=7,no=8,missing=7 gain=0.112676,cover=7.7537
            7:leaf=-0.241201cover=1.26529
            8:[f3<65.5] yes=9,no=10,missing=9 gain=0.0552068,cover=6.48841
                9:[f2<1.00001] yes=11,no=12,missing=12 gain=0.407982,cover=3.64093
                    11:leaf=-0.227347cover=1.54115
                    12:[f3<61.5] yes=15,no=16,missing=15 gain=0.00967593,cover=2.09978
                        15:leaf=0.386804cover=1.01897
                        16:leaf=0.0974195cover=1.08081
                10:[f3<68.5] yes=13,no=14,missing=13 gain=0.0909914,cover=2.84748
                    13:leaf=-0.190313cover=1.5918
                    14:leaf=0.0909942cover=1.25568
booster[8]:
0:[f2<1.00001] yes=1,no=2,missing=2 gain=0.243304,cover=11.8183
    1:[f3<58.5] yes=3,no=4,missing=3 gain=0.0995766,cover=3.14931
        3:leaf=0.347267cover=1.28762
        4:leaf=0.022496cover=1.86169
    2:[f3<58.5] yes=5,no=6,missing=5 gain=0.0824455,cover=8.66899
        5:[f3<56.5] yes=7,no=8,missing=7 gain=0.194386,cover=4.17696
            7:[f0<1.00001] yes=11,no=12,missing=12 gain=0.200633,cover=3.14686
                11:leaf=0.183377cover=1.65508
                12:leaf=-0.211774cover=1.49178
            8:leaf=-0.409993cover=1.03009
        6:[f3<62.5] yes=9,no=10,missing=9 gain=0.159963,cover=4.49203
            9:leaf=0.210931cover=1.48626
            10:[f3<67.5] yes=13,no=14,missing=13 gain=0.000921036,cover=3.00578
                13:leaf=-0.0424455cover=1.47902
                14:leaf=-0.136137cover=1.52676
booster[9]:
0:[f5<1.00001] yes=1,no=2,missing=2 gain=0.232486,cover=12.0491
    1:leaf=-0.316969cover=1.05291
    2:[f3<32.5] yes=3,no=4,missing=3 gain=0.203656,cover=10.9962
        3:leaf=-0.239403cover=1.14459
        4:[f3<49.5] yes=5,no=6,missing=5 gain=0.258366,cover=9.85156
            5:leaf=0.42213cover=1.02418
            6:[f0<1.00001] yes=7,no=8,missing=8 gain=0.122395,cover=8.82738
                7:[f3<62.5] yes=9,no=10,missing=9 gain=0.420141,cover=4.28074
                    9:[f3<56] yes=13,no=14,missing=13 gain=0.131082,cover=2.6222
                        13:leaf=-0.0102133cover=1.1535
                        14:leaf=-0.427704cover=1.4687
                    10:leaf=0.229973cover=1.65853
                8:[f3<68.5] yes=11,no=12,missing=11 gain=0.38422,cover=4.54665
                    11:[f2<1.00001] yes=15,no=16,missing=16 gain=0.104008,cover=3.49122
                        15:leaf=-0.00161279cover=1.0497
                        16:leaf=0.35645cover=2.44152
                    12:leaf=-0.258477cover=1.05543

Each booster is an independent decision tree, usually focusing on the part of the data not learned by the previous trees. Each f[number] is an ID that will be replaced by the name of the feature used for the split. The yes=, no= and missing= fields give the IDs of the child nodes, which is the key to understanding how the branches of the tree relate to each other. The gain is a metric of the importance of the feature in the decision tree.
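
For readers who want to experiment with this dump format, here is a minimal Python sketch of a parser. It is not the R code discussed in this thread. The regular expressions are inferred only from the lines shown above (including the missing comma before `cover` on leaf lines), and `parse_dump` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Line shapes assumed from the dump above:
#   booster[2]:
#   0:[f3<31.5] yes=1,no=2,missing=1 gain=0.994684,cover=14.4559
#   1:leaf=-0.725791cover=1.20749
BOOSTER_RE = re.compile(r"^booster\[(\d+)\]:$")
SPLIT_RE = re.compile(
    r"^(\d+):\[(\w+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+) "
    r"gain=([^,]+),cover=(.+)$"
)
# The comma before "cover" is optional because the leaf lines above lack one.
LEAF_RE = re.compile(r"^(\d+):leaf=(.+?),?cover=(.+)$")


def parse_dump(text):
    """Return one dict per node, tagged with the index of its tree."""
    nodes = []
    tree = None
    for raw in text.splitlines():
        line = raw.strip()  # indentation only reflects depth, so drop it
        if not line:
            continue
        m = BOOSTER_RE.match(line)
        if m:
            tree = int(m.group(1))
            continue
        m = SPLIT_RE.match(line)
        if m:
            nodes.append({
                "tree": tree, "id": int(m.group(1)), "feature": m.group(2),
                "threshold": float(m.group(3)),
                "yes": int(m.group(4)), "no": int(m.group(5)),
                "missing": int(m.group(6)),
                "gain": float(m.group(7)), "cover": float(m.group(8)),
            })
            continue
        m = LEAF_RE.match(line)
        if m:
            nodes.append({
                "tree": tree, "id": int(m.group(1)), "feature": None,
                "leaf": float(m.group(2)), "cover": float(m.group(3)),
            })
            continue
        raise ValueError(f"unrecognised dump line: {line!r}")
    return nodes


# The first three lines of booster[2] from the dump above.
SAMPLE = """booster[2]:
0:[f3<31.5] yes=1,no=2,missing=1 gain=0.994684,cover=14.4559
    1:leaf=-0.725791cover=1.20749
    2:[f0<1.00001] yes=3,no=4,missing=4 gain=0.43832,cover=13.2484
"""

for node in parse_dump(SAMPLE):
    print(node)
```

Each record keeps the tree index alongside the node ID, because node IDs restart at 0 in every booster and are only unique within one tree.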

@timelyportfolio
Contributor

Would really like to see the functionality proposed and also like to see DiagrammeR extended to cover rpart or the more comprehensive partykit. See here as an experiment before DiagrammeR existed.

I do think, though, that Suggests will be better than Imports, since I would say this is an enhancement rather than a requirement. See Package Dependencies.

@pommedeterresautee
Contributor Author

I have tried it, and it was easy to have several graphs in the same image. That's a very good thing.
@timelyportfolio first, thanks for your blog post about the DiagrammeR package; that is how I discovered it (and thanks to @rich-iannone for having built it). I have posted an image of the first tree here https://github.com/tqchen/xgboost/issues/123. Basically I parse the text model with some regex and convert it to a data.table. Then I build the markdown with some paste commands using the data.table. I am waiting for this package to be on CRAN before pushing my code to xgboost (which also gives me time to polish my code). I am very pleased with the result.
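
As a rough illustration of the "build the graph text by pasting strings together" step, here is a Python sketch that turns node records into a single graph containing several trees. It is not the R code from this thread. The output syntax is Graphviz DOT, which is an assumption: the thread does not say which diagram syntax was actually generated. The record layout and the name `to_dot` are hypothetical, and the two small subtrees are taken from booster[8] (node 1) and booster[4] (node 3) of the dump above.

```python
# Hypothetical node records; values copied from two subtrees of the dump above.
NODES = [
    {"tree": 8, "id": 1, "feature": "f3", "threshold": 58.5,
     "yes": 3, "no": 4, "missing": 3},
    {"tree": 8, "id": 3, "feature": None, "leaf": 0.347267},
    {"tree": 8, "id": 4, "feature": None, "leaf": 0.022496},
    {"tree": 4, "id": 3, "feature": "f0", "threshold": 1.00001,
     "yes": 5, "no": 6, "missing": 6},
    {"tree": 4, "id": 5, "feature": None, "leaf": -0.473333},
    {"tree": 4, "id": 6, "feature": None, "leaf": -0.0766397},
]


def to_dot(nodes):
    """Build one DOT graph holding every tree, one cluster per tree."""
    trees = {}
    for n in nodes:
        trees.setdefault(n["tree"], []).append(n)

    lines = ["digraph ensemble {", "  node [shape=box];"]
    for tree, members in sorted(trees.items()):
        lines.append(f"  subgraph cluster_{tree} {{")
        lines.append(f'    label="booster[{tree}]";')
        for n in members:
            # Prefix node names with the tree index: IDs repeat across trees.
            name = f"t{tree}_n{n['id']}"
            if n["feature"] is None:
                label = f"leaf={n['leaf']}"
                lines.append(f'    {name} [label="{label}"];')
                continue
            label = f"{n['feature']} < {n['threshold']}"
            lines.append(f'    {name} [label="{label}"];')
            for branch in ("yes", "no"):
                child = f"t{tree}_n{n[branch]}"
                # Mark the branch that rows with a missing value follow.
                tag = branch + (", missing" if n["missing"] == n[branch] else "")
                lines.append(f'    {name} -> {child} [label="{tag}"];')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


print(to_dot(NODES))
```

The resulting text can be handed to any Graphviz renderer. Putting each tree in its own `subgraph cluster_...` block is one way to get several trees into the same image while keeping them visually separate.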

@vnijs
Contributor

vnijs commented Jan 4, 2015

@pommedeterresautee could you perhaps post just the code + example of "Basically I parse the text model with some regex and convert it to a data.table." somewhere? Sorry to thread-jack

@pommedeterresautee
Contributor Author

@mostly-harmless my WIP code is here: https://github.com/pommedeterresautee/xgboost/blob/master/R-package/R/xgb.plot.tree.R

The file it reads is the dump I posted two comments ago. Just put the content in a text file, change the path, and generate the visualization.
All the trees are generated.

@rich-iannone @timelyportfolio
Does anyone know if it's possible in Shiny to collapse a branch of a generated tree? (For example, you click on a node and the branches below that node are collapsed.)

@timelyportfolio
Contributor

I like the direction this conversation is headed. To separate from CRAN integration, I thought it might be good to start issue #8 for

Does anyone know if it's possible in Shiny to collapse a branch of a generated tree? (For example, you click on a node and the branches below that node are collapsed.)

@pommedeterresautee
Contributor Author

@mostly-harmless the function is complete.

@vnijs
Contributor

vnijs commented Jan 5, 2015

Thanks @pommedeterresautee !

@rich-iannone
Owner

Thanks again @pommedeterresautee, that's great!

@pommedeterresautee
Contributor Author

@rich-iannone did you find time to submit your package to CRAN?

@rich-iannone
Owner

@pommedeterresautee there is still a problem building the vignette. I need to resolve that issue before submitting to CRAN.

@rich-iannone
Owner

Okay, @pommedeterresautee and @timelyportfolio, I figured out the build issue with the vignette: I had a slightly older version of knitr. Once I updated it, I could build the vignette, and the source package built without errors. I'll submit to CRAN.

@rich-iannone
Owner

Now submitted to CRAN. Just need to wait for a reply from BDR.

@rich-iannone
Owner

After a few rounds of fixes, it's now on CRAN.

@pommedeterresautee
Contributor Author

Awesome! I am pushing my code to XGBoost. First reverse dependency for DiagrammeR :-)

@rich-iannone
Owner

That is great to hear! Thanks @pommedeterresautee for all the help and interest so far.
