Merge branch 'pr/3316'
Fixes #3316.
larsmans committed Jul 13, 2014
2 parents 851d914 + 6668adf commit fbfcdc5
Showing 4 changed files with 100 additions and 18 deletions.
20 changes: 20 additions & 0 deletions sklearn/datasets/california_housing.py
@@ -6,6 +6,10 @@
The data contains 20,640 observations on 9 variables.
This dataset contains the average house value as target variable
and the following input variables (features): average income,
housing average age, average rooms, average bedrooms, population,
average occupation, latitude, and longitude in that order.
References
----------
@@ -56,6 +60,22 @@ def fetch_california_housing(data_home=None, download_if_missing=True):
If False, raise an IOError if the data is not locally available
instead of trying to download the data from the source site.
Returns
-------
dataset : dict-like object with the following attributes:
dataset.data : ndarray, shape [20640, 8]
Each row corresponds to the 8 feature values in order.
dataset.target : numpy array of shape (20640,)
Each value corresponds to the average house value in units of 100,000.
dataset.feature_names : array of length 8
Array of ordered feature names used in the dataset.
dataset.DESCR : string
Description of the California housing dataset.
Notes
------
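A minimal sketch of how the documented feature order and target units might be used. The labels in `FEATURE_ORDER` are descriptive names taken from the module docstring above, not necessarily the strings stored in `dataset.feature_names`; `row_to_dict` and `target_to_dollars` are hypothetical helpers, and the sample row is a placeholder rather than real data.

```python
# Feature order as described in the module docstring (descriptive labels,
# assumed for illustration; the real dataset.feature_names may differ).
FEATURE_ORDER = [
    "average income",
    "housing average age",
    "average rooms",
    "average bedrooms",
    "population",
    "average occupation",
    "latitude",
    "longitude",
]


def row_to_dict(row):
    """Pair one row of dataset.data with the documented feature order."""
    if len(row) != len(FEATURE_ORDER):
        raise ValueError("expected %d values, got %d"
                         % (len(FEATURE_ORDER), len(row)))
    return dict(zip(FEATURE_ORDER, row))


def target_to_dollars(target):
    """Convert a target value (units of 100,000) to a plain amount."""
    return target * 100000


# Placeholder row: the values are just column indices, not real observations.
placeholder_row = [0, 1, 2, 3, 4, 5, 6, 7]
labelled = row_to_dict(placeholder_row)
print(labelled["latitude"], labelled["longitude"])
print(target_to_dollars(2.5))
```

Keeping the order in one list makes it easy to check that a row has exactly 8 values before labelling it.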
21 changes: 21 additions & 0 deletions sklearn/datasets/covtype.py
@@ -2,6 +2,12 @@
A classic dataset for classification benchmarks, featuring categorical and
real-valued features.
The dataset page is available from UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/datasets/Covertype
Courtesy of Jock A. Blackard and Colorado State University.
"""

# Author: Lars Buitinck <L.J.Buitinck@uva.nl>
@@ -58,6 +64,21 @@ def fetch_covtype(data_home=None, download_if_missing=True,
shuffle : bool, default=False
Whether to shuffle dataset.
Returns
-------
dataset : dict-like object with the following attributes:
dataset.data : numpy array of shape (581012, 54)
Each row corresponds to the 54 features in the dataset.
dataset.target : numpy array of shape (581012,)
Each value corresponds to one of the 7 forest covertypes with values
ranging from 1 to 7.
dataset.DESCR : string
Description of the forest covertype dataset.
"""

data_home = get_data_home(data_home=data_home)
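Because the documented covertype labels run from 1 to 7, code that expects zero-based class indices needs a shift. The sketch below shows one way to do that in plain Python; `to_zero_based` is a hypothetical helper, not part of scikit-learn, and the sample labels are illustrative.

```python
def to_zero_based(labels, n_classes=7):
    """Map labels in 1..n_classes to indices in 0..n_classes-1."""
    shifted = []
    for label in labels:
        if not 1 <= label <= n_classes:
            raise ValueError("label %r outside 1..%d" % (label, n_classes))
        shifted.append(label - 1)
    return shifted


# Illustrative labels only, not taken from the dataset.
print(to_zero_based([1, 7, 3]))
```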
41 changes: 40 additions & 1 deletion sklearn/datasets/lfw.py
@@ -30,7 +30,7 @@
import numpy as np

try:
import urllib.request as urllib #for backwards compatibility
import urllib.request as urllib #for backwards compatibility
except ImportError:
import urllib

@@ -260,6 +260,25 @@ def fetch_lfw_people(data_home=None, funneled=True, resize=0.5,
download_if_missing: optional, True by default
If False, raise an IOError if the data is not locally available
instead of trying to download the data from the source site.
Returns
-------
dataset : dict-like object with the following attributes:
dataset.data : numpy array of shape (13233, 2914)
Each row corresponds to a ravelled face image of original size 62 x 47
pixels.
dataset.images : numpy array of shape (13233, 62, 47)
Each row is a face image corresponding to one of the 5749 people in
the dataset.
dataset.target : numpy array of shape (13233,)
Labels associated with each face image. The labels range from 0 to 5748
and correspond to the person IDs.
dataset.DESCR : string
Description of the Labeled Faces in the Wild (LFW) dataset.
"""
lfw_home, data_folder_path = check_fetch_lfw(
data_home=data_home, funneled=funneled,
@@ -402,6 +421,26 @@ def fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5,
download_if_missing: optional, True by default
If False, raise an IOError if the data is not locally available
instead of trying to download the data from the source site.
Returns
-------
The data is returned as a Bunch object with the following attributes:
data : numpy array of shape (2200, 5828)
Each row corresponds to 2 ravel'd face images of original size 62 x 47
pixels.
pairs : numpy array of shape (2200, 2, 62, 47)
Each row has 2 face images corresponding to same or different person
from the dataset containing 5749 people.
target : numpy array of shape (2200,)
Labels associated with each pair of images. The two label values indicate
different persons or the same person.
DESCR : string
Description of the Labeled Faces in the Wild (LFW) dataset.
"""
lfw_home, data_folder_path = check_fetch_lfw(
data_home=data_home, funneled=funneled,
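The flattened row lengths in the LFW docstrings follow from the image size: 62 x 47 = 2914 values per face, and 2 x 2914 = 5828 values per pair. The sketch below checks that arithmetic and shows how a flat row could be turned back into image rows. It assumes row-major (C) order, which is NumPy's default for `ravel`, and that a pair row stores the first image followed by the second; `flat_length`, `unravel`, and `split_pair` are hypothetical helpers written in plain Python.

```python
IMAGE_SHAPE = (62, 47)  # (height, width) as documented for fetch_lfw_people


def flat_length(shape):
    """Number of values in one ravelled image of the given shape."""
    height, width = shape
    return height * width


def unravel(flat, shape):
    """Rebuild a list of image rows from a row-major flattened image."""
    height, width = shape
    if len(flat) != height * width:
        raise ValueError("expected %d values, got %d"
                         % (height * width, len(flat)))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def split_pair(flat_pair, shape):
    """Split one flattened pair row into its two images (assumed layout)."""
    n = flat_length(shape)
    if len(flat_pair) != 2 * n:
        raise ValueError("expected %d values, got %d"
                         % (2 * n, len(flat_pair)))
    return unravel(flat_pair[:n], shape), unravel(flat_pair[n:], shape)


print(flat_length(IMAGE_SHAPE))      # values per face image
print(2 * flat_length(IMAGE_SHAPE))  # values per pair of face images
```

With the real arrays, `dataset.images` and `dataset.pairs` already hold the unravelled form, so helpers like these are only needed when working from `dataset.data` alone.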
36 changes: 19 additions & 17 deletions sklearn/datasets/species_distributions.py
@@ -143,23 +143,8 @@ def fetch_species_distributions(data_home=None,
If False, raise an IOError if the data is not locally available
instead of trying to download the data from the source site.
Notes
------
This dataset represents the geographic distribution of species.
The dataset is provided by Phillips et. al. (2006).
The two species are:
- `"Bradypus variegatus"
<http://www.iucnredlist.org/apps/redlist/details/3038/0>`_ ,
the Brown-throated Sloth.
- `"Microryzomys minutus"
<http://www.iucnredlist.org/apps/redlist/details/13408/0>`_ ,
also known as the Forest Small Rice Rat, a rodent that lives in Peru,
Colombia, Ecuador, Peru, and Venezuela.
Returns
--------
The data is returned as a Bunch object with the following attributes:
coverages : array, shape = [14, 1592, 1212]
@@ -186,6 +171,23 @@ def fetch_species_distributions(data_home=None,
grid_size : float
The spacing between points of the grid, in degrees
Notes
------
This dataset represents the geographic distribution of species.
The dataset is provided by Phillips et al. (2006).
The two species are:
- `"Bradypus variegatus"
<http://www.iucnredlist.org/apps/redlist/details/3038/0>`_ ,
the Brown-throated Sloth.
- `"Microryzomys minutus"
<http://www.iucnredlist.org/apps/redlist/details/13408/0>`_ ,
also known as the Forest Small Rice Rat, a rodent that lives in
Colombia, Ecuador, Peru, and Venezuela.
References
----------
