The samples in this dataset correspond to 30×30 m patches of forest in the US, collected for the task of predicting each patch's cover type, i.e. the dominant species of tree. There are seven cover types, making this a multiclass classification problem. Each sample has 54 features, described on the dataset's homepage. Some of the features are boolean indicators, while others are discrete or continuous measurements.
Data Set Characteristics:
Classes: 7
Samples total: 581012
Dimensionality: 54
Features: int
sklearn.datasets.fetch_covtype will load the covertype dataset; it returns a dictionary-like 'Bunch' object with the feature matrix in the data member and the target values in target. If the optional argument as_frame is set to True, it will return data and target as pandas objects rather than NumPy arrays, and there will be an additional member frame as well. The dataset will be downloaded from the web if necessary.
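
The loading behaviour described above can be sketched as follows. This is a minimal illustration rather than an official recipe: the helper name summarize is hypothetical and not part of scikit-learn, and the script assumes network access (or a previously cached copy) when it is run directly.

```python
import numpy as np
from sklearn.datasets import fetch_covtype


def summarize(bunch):
    """Return (n_samples, n_features, {class_label: count}).

    `summarize` is a hypothetical helper for this example only. It accepts
    any object exposing `data` and `target` attributes, such as the Bunch
    returned by fetch_covtype.
    """
    X = np.asarray(bunch.data)
    y = np.asarray(bunch.target)
    labels, counts = np.unique(y, return_counts=True)
    class_counts = {int(label): int(count) for label, count in zip(labels, counts)}
    return X.shape[0], X.shape[1], class_counts


if __name__ == "__main__":
    # Downloads the dataset if it is not already cached locally.
    covtype = fetch_covtype()
    n_samples, n_features, class_counts = summarize(covtype)
    # Per the characteristics listed above, these should be 581012 and 54.
    print(n_samples, n_features)
    # One entry per cover type, mapping the class label to its sample count.
    print(class_counts)

    # With as_frame=True, data and target are pandas objects and the
    # additional `frame` member is available.
    covtype_df = fetch_covtype(as_frame=True)
    print(covtype_df.frame.head())
```

Keeping the summary logic in a separate function means it can be checked on a small stand-in object without downloading the full dataset.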