Machine learning datasets used in tutorials on MachineLearningMastery.com
Type Name Latest commit message Commit time
.gitignore
HAR_Smartphones.names
HAR_Smartphones.zip
IndoorMovement.names
IndoorMovement.zip
README.md
abalone.csv
abalone.names Added more datasets. Mar 10, 2018
airline-passengers.csv
airline-passengers.names
auto-insurance.csv
auto-insurance.names
banknote_authentication.csv
banknote_authentication.names
breast-cancer-wisconsin.csv
breast-cancer-wisconsin.names
breast-cancer.csv
breast-cancer.names Added breast cancer and horse colic datasets. Mar 26, 2018
daily-max-temperatures.csv
daily-max-temperatures.names
daily-min-temperatures.csv
daily-min-temperatures.names
daily-total-female-births.csv
daily-total-female-births.names Added more time series datasets. Mar 10, 2018
glass.csv
glass.names
horse-colic.csv
horse-colic.data
horse-colic.names
household_power_consumption.names
household_power_consumption.zip
housing.csv
housing.data
housing.names
ionosphere.csv
ionosphere.names
iris.csv
iris.names
longley.csv
longley.names
monthly-car-sales.csv
monthly-car-sales.names
monthly-mean-temp.csv
monthly-mean-temp.names
monthly-robberies.csv
monthly-robberies.names
monthly-sunspots.csv
monthly-sunspots.names Added more time series datasets used in tutorials. Mar 13, 2018
monthly-writing-paper-sales.csv
monthly-writing-paper-sales.names
monthly_champagne_sales.csv
monthly_champagne_sales.names
pima-indians-diabetes.data.csv
pima-indians-diabetes.names
pollution.csv Added the pollution dataset. Mar 15, 2018
pollution.names
shampoo.csv Added time series datasets. Mar 10, 2018
shampoo.names
sonar.csv
sonar.names
wheat-seeds.csv
wheat-seeds.names
winequality-red.csv
winequality-white.csv
winequality.names
yearly-water-usage.csv
yearly-water-usage.names

README.md

Machine Learning Datasets

This repository contains copies of the machine learning datasets used in tutorials on MachineLearningMastery.com.

This repository was created so that the datasets used in tutorials remain available and do not depend on unreliable third parties.

In many cases, tutorials link directly to a dataset's raw URL, so a dataset's filename should not be changed once the file has been added to the repository.
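
As a rough illustration of what linking to a raw dataset URL can look like in practice, the sketch below builds a `raw.githubusercontent.com` URL and parses the downloaded CSV with the Python standard library. The helper names (`raw_url`, `parse_csv`, `load_dataset`), the `owner` and `repo` arguments, and the `master` default branch are assumptions made for this example; they are not taken from the repository or its tutorials.

```python
import csv
import io
from urllib.request import urlopen


def raw_url(owner, repo, filename, branch="master"):
    """Build a raw.githubusercontent.com URL for a file in a GitHub repository.

    The owner, repo, and branch values are supplied by the caller; the
    "master" default is an assumption, not something stated in the README.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}"


def parse_csv(text):
    """Parse CSV text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def load_dataset(owner, repo, filename, branch="master"):
    """Download a CSV file from the repository and return its rows as lists of strings."""
    with urlopen(raw_url(owner, repo, filename, branch)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

A call such as `load_dataset("<owner>", "<repo>", "iris.csv")` would then fetch one file by name. Because the filename is part of the URL, renaming a file would break any code written this way, which is why filenames are meant to stay fixed.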

Datasets

This section provides a summary of the datasets in this repository.

Binary Classification Datasets

  • Breast Cancer (Wisconsin)
  • Breast Cancer (Yugoslavia)
  • Bank Note Authentication
  • Horse Colic
  • Ionosphere
  • Pima Indians Diabetes
  • Sonar Returns

Multiclass Classification Datasets

  • Glass Identification
  • Iris Flower Species
  • Wheat Seeds
  • Abalone Age (or regression)
  • Wine Quality (or regression)
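
The "(or regression)" notes above mean the same file can be framed either way, depending on how the target column is interpreted. The following is a minimal sketch of that idea, assuming the target is the last column of each row and is numeric; that layout is an assumption for illustration and should be checked against each dataset's `.names` file. The function names are hypothetical.

```python
def split_features_target(rows):
    """Split each row into input columns and the final column (assumed to be the target)."""
    X = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    return X, y


def as_regression_target(y):
    """Regression framing: treat the target as a continuous number."""
    return [float(value) for value in y]


def as_class_labels(y):
    """Classification framing: map each distinct target value to an integer class index.

    Distinct values are sorted as strings, so the index order is lexical.
    """
    classes = sorted(set(y))
    index = {label: i for i, label in enumerate(classes)}
    return [index[value] for value in y], classes


# Made-up placeholder rows for illustration only; these are not real dataset values.
example_rows = [["0.1", "0.2", "7"], ["0.3", "0.4", "9"], ["0.5", "0.6", "7"]]
X, y = split_features_target(example_rows)
y_regression = as_regression_target(y)
y_labels, classes = as_class_labels(y)
```

With the regression framing the model predicts the numeric value directly; with the classification framing each distinct value becomes its own class.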

Regression Datasets

  • Boston Housing
  • Longley Economic
  • Auto Insurance Total Claims

Univariate Time Series Datasets

  • Daily Minimum Temperatures in Melbourne
  • Daily Maximum Temperatures in Melbourne
  • Daily Female Births in California
  • Monthly International Airline Passengers
  • Monthly Armed Robberies in Boston
  • Monthly Sunspots
  • Monthly Champagne Sales
  • Monthly Shampoo Sales
  • Monthly Car Sales
  • Monthly Mean Temperatures in Nottingham Castle
  • Monthly Specialty Writing Paper Sales
  • Yearly Water Usage in Baltimore

Multivariate Time Series Datasets

  • Hourly Pollution Levels in Beijing
  • Minutely Individual Household Electric Power Consumption
  • Human Activity Recognition Using Smartphones
  • Indoor Movement Prediction