what is Y_train in classRF_train? #28

GoogleCodeExporter · 2016-01-23T01:56:46Z

In the function classRF_train(X,Y,ntree,mtry, extra_options), what are X and Y?
As per the readme file, they are X: data matrix, Y: target values. Could you please
explain their individual roles more clearly?
As far as I understand, for xtrain and xtest, features are taken as input,
but what about ytrain and ytest? What should the input be there? Is
that some kind of index? Please correct me if I am wrong.
Also, please tell me when to use RF_Class_C and when to use RF_Reg_C, with some examples.
Thank you.

Original issue reported on code.google.com by abhi4emb...@gmail.com on 7 Mar 2012 at 3:54

GoogleCodeExporter · 2016-01-23T01:56:46Z

hi 

The X and Y included in the package (for, say, the diabetes dataset) represent
the data.

X and Y for the diabetes dataset are described here:
http://www-stat.stanford.edu/~tibs/ftp/lars.pdf (pg 2, table-1)

The goal is to predict the response Y based on the inputs X.

RF is mostly used in a supervised learning setting, where multiple features (in
X) are used to predict a single response or target (in Y).

So in your setting, you pass xtrain and ytrain together to the _train()
functions, then check the performance of the RF algorithm by passing only
xtest to _predict() and comparing the results from _predict() with ytest.

You can run either classification (using RF_Class_C), when Y holds class
labels, or regression (using RF_Reg_C), when Y holds continuous values.
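
The train-then-predict workflow can be sketched as follows. This is a minimal Python sketch, not the package's MATLAB code: `train_centroids` and `predict_centroids` are hypothetical stand-ins for the _train() and _predict() calls (a nearest-centroid rule rather than a random forest), and the toy data is made up for illustration. The point is only which arrays go where.

```python
import random

def train_centroids(X, Y):
    """Stand-in for a _train() call: store the mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, Y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroids(model, X):
    """Stand-in for a _predict() call: only features go in, labels come out."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda label: dist2(row, model[label])) for row in X]

# Toy data (assumption): two features per example, class label 1 or 2.
rng = random.Random(0)
X, Y = [], []
for _ in range(100):
    label = rng.choice([1, 2])
    center = 0.0 if label == 1 else 3.0
    X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
    Y.append(label)

# Features and targets are split the same way, row for row.
xtrain, ytrain = X[:70], Y[:70]
xtest, ytest = X[70:], Y[70:]

model = train_centroids(xtrain, ytrain)   # training sees features AND targets
ypred = predict_centroids(model, xtest)   # prediction sees features only
accuracy = sum(p == t for p, t in zip(ypred, ytest)) / len(ytest)
print("test accuracy:", accuracy)         # ytest is used only for scoring
```

ytest never enters the model; it is held back so the predictions can be scored against it.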

take a look at pg-11
http://www.cs.colorado.edu/~grudic/teaching/CSCI5622_2006/Introduction.pdf

It's better if you take a look at an introductory statistics book.

I am not responding to your question in issue-25, as it's all written here.

Original comment by abhirana on 7 Mar 2012 at 4:15

GoogleCodeExporter · 2016-01-23T01:56:46Z

When I run RF_tutorial.m, it loads data/twonorm. Actually, it loads twonorm.mat,
producing two matrices named output and input. Where do these values come
from? What do the values in the output variable signify? Are they random?

Original comment by abhi4emb...@gmail.com on 7 Mar 2012 at 8:50

GoogleCodeExporter · 2016-01-23T01:56:47Z

These are the details of twonorm.mat:

http://www.cs.toronto.edu/~delve/data/twonorm/desc.html

The data in twonorm.mat is a subsample of about 300 examples from the twonorm
distribution.

Original comment by abhirana on 7 Mar 2012 at 8:54

GoogleCodeExporter · 2016-01-23T01:56:47Z

output (class labels/target values, a 1-dimensional vector) = Y
input (matrix of multiple features) = X
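
To make the shapes concrete, here is a small Python sketch that draws a twonorm-like sample. It assumes the definition given in the linked DELVE description as I understand it (two 20-dimensional unit-variance Gaussians, one centred at +a and one at -a in every coordinate, with a = 2/sqrt(20)); the function name `sample_twonorm` and the choice of 1 and -1 as the two class labels are illustrative, not taken from the package.

```python
import math
import random

def sample_twonorm(n, dim=20, seed=0):
    """Draw n examples: each row of X is one example, Y holds one label per row."""
    rng = random.Random(seed)
    a = 2.0 / math.sqrt(dim)      # assumed class-mean offset per coordinate
    X, Y = [], []
    for _ in range(n):
        label = rng.choice([1, -1])                  # which class this example is from
        X.append([rng.gauss(label * a, 1.0) for _ in range(dim)])
        Y.append(label)
    return X, Y

X, Y = sample_twonorm(300)
print(len(X), len(X[0]))   # number of examples, number of features
print(len(Y))              # one label per example
```

X has one row per example and one column per feature, while Y has exactly one entry per row of X.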

Original comment by abhirana on 7 Mar 2012 at 8:55

GoogleCodeExporter · 2016-01-23T01:56:47Z

I have seen the tar file but could not figure out exactly what is in
output. It is some combination of 1's and -1's, but in what pattern? Why are
they written that way? Is there a reason behind it, or is it just a way to
represent a 1-D vector? And why a combination of 1 and -1?
Would it give me a wrong result if I put all values as '1' in the output matrix?

Original comment by abhi4emb...@gmail.com on 7 Mar 2012 at 9:38

GoogleCodeExporter · 2016-01-23T01:56:47Z

Are you familiar with classification and regression problems, where the goal is
to learn a function from data? I think you need to brush up on that. I gave
you the link so that you can see what distribution generates twonorm.


In the simplest terms, I can generate a synthetic dataset as follows:
Yhat = (X1 + X2)^2, where X1 and X2 are two features and Yhat is the output, with
the goal that the model can predict Yhat for future examples from this
distribution.

In classification, I can make a rule saying that if Yhat > 2 it's class-1, else
it's class-2. It's no fun learning if all the labels are the same. The pattern is
not in Yhat or Y but in X, and that is what the classifier is expected to learn.

In regression, I try to learn the rule for predicting the Yhat values directly
rather than via labels.
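
The synthetic dataset described above can be sketched in a few lines of Python. The sampling range for X1 and X2, the function name `make_dataset`, and the variable names are assumptions made for illustration; only the formula and the threshold rule come from the description.

```python
import random

def make_dataset(n, seed=0):
    """Build features X plus a regression target and a class label per example."""
    rng = random.Random(seed)
    X, y_reg, y_cls = [], [], []
    for _ in range(n):
        x1 = rng.uniform(-1.5, 1.5)   # assumed feature range
        x2 = rng.uniform(-1.5, 1.5)
        yhat = (x1 + x2) ** 2         # continuous output
        X.append([x1, x2])
        y_reg.append(yhat)                    # regression target: Yhat itself
        y_cls.append(1 if yhat > 2 else 2)    # classification target: class 1 or 2
    return X, y_reg, y_cls

X, y_reg, y_cls = make_dataset(200)
print(len(X), len(y_reg), len(y_cls))   # one target per row of X
print(sorted(set(y_cls)))               # the distinct class labels present
```

The same X serves both tasks: pairing it with y_cls gives a classification problem and pairing it with y_reg gives a regression problem. If every entry of y_cls were replaced by the same value, the labels would carry no information about X and there would be nothing to learn.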


Another example: can you predict the chance of some disease (yes/no classes)
or the amount of cholesterol (continuous values) if you are given features such
as height, weight, and age? The goal is to learn patterns from features
like height and predict disease/cholesterol for future patients.

Original comment by abhirana on 7 Mar 2012 at 9:50

GoogleCodeExporter · 2016-01-23T01:56:47Z

Original comment by abhirana on 31 Mar 2012 at 8:39

Changed state: Done

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Jan 23, 2016

GoogleCodeExporter closed this as completed Jan 23, 2016
