what is Y_train in classRF_train? #28

Closed
GoogleCodeExporter opened this issue Jan 23, 2016 · 7 comments

Comments

@GoogleCodeExporter

In the function classRF_train(X, Y, ntree, mtry, extra_options), what are X and Y?
According to the readme, X is the data matrix and Y is the target values. Could you please explain their individual roles more clearly?
As far as I understand, xtrain and xtest take the features as input, but what about ytrain and ytest? What should go in there? Is it some kind of index? Please correct me if I am wrong.
Also, please tell me when to use RF_Class_C and when to use RF_Reg_C, with an example.
Thank you.

Original issue reported on code.google.com by abhi4emb...@gmail.com on 7 Mar 2012 at 3:54

@GoogleCodeExporter

Hi,

The X and Y included in the package (for the diabetes dataset, say) are the data itself.

X and Y for the diabetes dataset are described here:
http://www-stat.stanford.edu/~tibs/ftp/lars.pdf (pg 2, table-1)

The goal is to predict the response Y from the inputs X.

RF is mostly used in a supervised learning setting, where multiple features (in X) are used to predict a single response or target (in Y).

So in your setting, you pass xtrain and ytrain together to the _train() functions. To see how well the RF algorithm performs, pass only xtest to _predict() and compare the results from _predict() with ytest.
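
To make the roles of the four arrays concrete, here is a minimal Python sketch of the train / predict / compare workflow described above. It does not call the package: `train` and `predict` are hypothetical stand-ins built on a 1-nearest-neighbour rule, used only so the example runs on its own, and the data values are made up.

```python
# Sketch of the train / predict / compare workflow.
# NOTE: train() and predict() are hypothetical stand-ins (1-nearest-neighbour),
# not the package's classRF_train / classRF_predict.

def train(x_train, y_train):
    """'Training' for 1-NN just stores each feature row with its label."""
    return list(zip(x_train, y_train))

def predict(x_test, model):
    """Give each test row the label of its closest training row."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(model, key=lambda pair: sq_dist(pair[0], row))[1] for row in x_test]

# X: one row per example, one column per feature. Y: one label per row.
x_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_train = [-1, -1, 1, 1]
x_test = [[0.1, 0.1], [1.0, 1.0]]
y_test = [-1, 1]  # held back, used only for scoring

model = train(x_train, y_train)   # labels are needed here
y_pred = predict(x_test, model)   # only features are given here

# Compare the predictions with the held-back labels.
error_rate = sum(p != t for p, t in zip(y_pred, y_test)) / len(y_test)
print("predicted:", y_pred, "error rate:", error_rate)
```

The point is that ytest never reaches the model. It is only compared with the predictions afterwards, which is what makes the error rate an honest estimate.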

You can run either classification (using RF_Class_C) or regression (using RF_Reg_C).

Take a look at pg-11 of
http://www.cs.colorado.edu/~grudic/teaching/CSCI5622_2006/Introduction.pdf

It's better if you take a look at an introductory statistics book.

I am not responding to your question in issue-25, as it's all covered here.

Original comment by abhirana on 7 Mar 2012 at 4:15

@GoogleCodeExporter

When I run RF_tutorial.m, it loads data/twonorm. Actually, it loads twonorm.mat, which produces two matrices named output and input. Where do these values come from? What do the values in the output variable signify? Are they random?

Original comment by abhi4emb...@gmail.com on 7 Mar 2012 at 8:50

@GoogleCodeExporter

These are the details of twonorm.mat:

http://www.cs.toronto.edu/~delve/data/twonorm/desc.html

The data in twonorm.mat is a subsample of about 300 examples from the twonorm distribution.

Original comment by abhirana on 7 Mar 2012 at 8:54

@GoogleCodeExporter

output (class labels/target values, a 1 dimensional vector) = Y
input (matrix from multiple features) = X
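
As a rough illustration of that layout, the Python sketch below builds a tiny X and Y by hand and checks that they line up. The numbers are invented for the example and are not taken from twonorm.mat.

```python
# Toy data in the layout described above (values are made up for illustration).
X = [
    [5.1, 3.5, 1.4],   # example 1: three feature values
    [4.9, 3.0, 1.3],   # example 2
    [6.2, 2.9, 4.3],   # example 3
    [5.9, 3.0, 5.1],   # example 4
]
Y = [1, 1, -1, -1]     # one class label per row of X

n_examples = len(X)
n_features = len(X[0])

# Every row must have the same number of features,
# and Y needs exactly one entry per row of X.
assert all(len(row) == n_features for row in X)
assert len(Y) == n_examples

print(f"X is {n_examples} x {n_features}, Y has {len(Y)} entries")
```

So Y is not an index into X. Row i of X and entry i of Y simply describe the same example.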

Original comment by abhirana on 7 Mar 2012 at 8:55

@GoogleCodeExporter

I have seen the tar file but could not figure out exactly what is in output. It is some combination of 1's and -1's, but in what pattern? Why are they written that way? Is there a reason behind it, or is it just a way to represent a 1-D vector? And why a combination of 1 and -1?
Would I get a wrong result if I put all values as '1' in the output matrix?

Original comment by abhi4emb...@gmail.com on 7 Mar 2012 at 9:38

@GoogleCodeExporter

Are you familiar with classification and regression problems, where the goal is to learn a function from data? I think you need to brush up on that. I gave you the link so that you can see what distribution generates twonorm.

In the simplest terms, I can generate a synthetic dataset as follows:
Yhat = (X1 + X2)^2, where X1 and X2 are two features and Yhat is the output, with the goal that the classifier can predict the output for future examples from this distribution.

In classification, I can make a rule saying that if Yhat > 2 it's class-1, else it's class-2. It's no fun learning if all labels are the same. The pattern is not in Yhat or Y but in X, and that is what the classifier is expected to learn.

In regression, I try to learn the rule for predicting Yhat values directly rather than via labels.
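
The synthetic dataset above can be sketched in a few lines of Python. The sampling range of -2 to 2, the sample count of 200 and the fixed seed are assumptions made for this illustration. Only the formula Yhat = (X1 + X2)^2 and the Yhat > 2 rule come from the description above.

```python
import random

random.seed(0)  # fixed seed so the toy data is reproducible

def make_example():
    # Assumed sampling range; the thread does not say how X1 and X2 are drawn.
    x1 = random.uniform(-2.0, 2.0)
    x2 = random.uniform(-2.0, 2.0)
    y_hat = (x1 + x2) ** 2          # continuous target: what regression learns
    label = 1 if y_hat > 2 else 2   # thresholded target: what classification learns
    return [x1, x2], y_hat, label

X, Y_reg, Y_class = [], [], []
for _ in range(200):
    features, y_hat, label = make_example()
    X.append(features)       # same X for both problems
    Y_reg.append(y_hat)      # Y for a regression run
    Y_class.append(label)    # Y for a classification run

print("examples:", len(X))
print("class-1 count:", Y_class.count(1), "class-2 count:", Y_class.count(2))
```

The same X serves both problems and only Y changes: continuous values for regression, class labels for classification. If every entry of Y_class were set to 1, there would be nothing left to distinguish, which is why a constant output vector is not useful.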


Another example: can you predict the chance of some disease (yes/no classes) or the amount of cholesterol (continuous values) if you are given features such as height, weight, age, etc.? The goal is to learn patterns from features like height and predict disease/cholesterol for future patients.

Original comment by abhirana on 7 Mar 2012 at 9:50

@GoogleCodeExporter

Original comment by abhirana on 31 Mar 2012 at 8:39

  • Changed state: Done
