Using RandomForestClassifier on vectors for predicting labels gives "Found input variables with inconsistent numbers of samples" #115

@berserker1

I am learning from the Active Regression tutorial page, but it does not cover applying learners to feature vectors with more than one dimension. (I was not able to find a specific example of this in the docs, so please point me to one if you know of it.)

In a function named my_stuff, my learner is:

regressor = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=entropy_sampling,
    X_training=X_training, y_training=y_training.ravel()
)

My dataset X has shape (13084, 50), meaning 13084 vectors of length 50 each, and y has shape (13084, 1), one label per vector.

Here X_training has shape (5, 50) and y_training has shape (5, 1).
The problem occurs in this section of the code (taken directly from the tutorial page mentioned above):

for idx in range(n_queries):
    query_idx, query_instance = regressor.query(X)
    print(query_idx, 'query_idx', X_training.shape, y_training.shape)
    regressor.teach(X[query_idx].reshape(-1, 1), y[query_idx].reshape(-1, 1))

The program ended abruptly, so I ran it under the Python debugger and found this error:

ValueError: Found input variables with inconsistent numbers of samples: [50, 1]
> /path/to/file/predict.py(286)my_stuff()
-> regressor.teach(X[query_idx].reshape(-1, 1), y[query_idx].reshape(-1, 1))

Here X[query_idx].reshape(-1, 1) has shape (50, 1) and y[query_idx].reshape(-1, 1) has shape (1, 1), so teach sees 50 samples in X but only 1 label in y.

What is the correct way to call teach in this case?
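
To make the shape mismatch concrete, here is a minimal NumPy-only sketch that reproduces the shapes reported above and shows one reshape that keeps a single 50-feature row per sample. The data is random stand-in data, the index 42 is arbitrary, and it assumes that query returns an array holding one row index; none of this is confirmed against the library, so treat it as a likely direction rather than a verified fix.

```python
import numpy as np

# Hypothetical stand-in data with the shapes described in the issue.
rng = np.random.default_rng(0)
X = rng.normal(size=(13084, 50))
y = rng.integers(0, 2, size=(13084, 1))

# Assumption: query() returns an array containing a single row index.
query_idx = np.array([42])

# The reshape used in the issue turns one 50-feature vector
# into 50 samples with one feature each.
X_bad = X[query_idx].reshape(-1, 1)
y_bad = y[query_idx].reshape(-1, 1)

# Keeping one row per sample makes the sample counts agree.
X_new = X[query_idx].reshape(1, -1)
y_new = y[query_idx].ravel()

print(X_bad.shape, y_bad.shape)  # (50, 1) (1, 1)
print(X_new.shape, y_new.shape)  # (1, 50) (1,)

# With these shapes, the call in the loop would become:
# regressor.teach(X_new, y_new)
```

The reshape(-1, 1) in the tutorial presumably works there because each sample has a single feature, so (1, 1) is the same shape either way. With 50 features, reshape(1, -1) preserves the (n_samples, n_features) layout that scikit-learn estimators expect, and ravel() matches how y_training was passed to the learner.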
