# How To Load Machine Learning Data in Python
You must be able to load your data before you can start your machine learning project.

The most common format for machine learning data is CSV files. There are a number of ways to load a CSV file in Python.

In this post you will discover different ways to load your machine learning data in Python.

Let’s get started.

## Considerations When Loading CSV Data

### CSV File Header
Does your data have a file header?

If so, this can help in automatically assigning names to each column of data. If not, you may need to name your attributes manually.

Either way, you should explicitly specify whether or not your CSV file has a file header when loading your data.
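
As a rough illustration of stating the header explicitly, the sketch below uses pandas.read_csv() with its header argument. The two in-memory samples and the column names are made up for this example and are not taken from any real dataset.

```python
import io
import pandas

# Hypothetical in-memory CSV samples, used here instead of a real file.
with_header = "age,mass\n50,33.6\n31,26.6\n"
without_header = "50,33.6\n31,26.6\n"

# header=0 says the first line holds the column names.
df_named = pandas.read_csv(io.StringIO(with_header), header=0)

# header=None says there is no header row, so names are supplied manually.
df_manual = pandas.read_csv(io.StringIO(without_header), header=None,
                            names=["age", "mass"])

print(list(df_named.columns))
print(df_named.shape, df_manual.shape)
```

Both calls should give the same two rows of data; only the source of the column names differs.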

### Comments
Does your data have comments?

Comments in a CSV file are often indicated by a hash (“#”) at the start of a line.

If you have comments in your file, depending on the method used to load your data, you may need to indicate whether or not to expect comments and the character to expect to signify a comment line.
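
Here is a small sketch of how two loaders can be told which character marks a comment line: numpy.loadtxt() takes a comments argument and pandas.read_csv() takes a comment argument. The sample text is invented for this example.

```python
import io
import numpy
import pandas

# Hypothetical sample with two comment lines mixed in with the data.
text = "# sample data\n6,148,72\n# another comment\n1,85,66\n"

# NumPy: lines starting with the comments character are skipped.
arr = numpy.loadtxt(io.StringIO(text), delimiter=",", comments="#")

# Pandas: the comment argument plays the same role.
df = pandas.read_csv(io.StringIO(text), header=None, comment="#")

print(arr.shape)
print(df.shape)
```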

### Delimiter
The standard delimiter that separates values in fields is the comma (“,”) character.

Your file could use a different delimiter like tab (“\t”) in which case you must specify it explicitly.
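
For example, a tab-separated file could be read with the standard library's csv.reader() by passing the delimiter. This is a minimal sketch on a made-up in-memory sample rather than a real file.

```python
import csv
import io

# Hypothetical tab-separated sample, used here instead of a real file.
tab_text = "6\t148\t72\n1\t85\t66\n"

# Pass the delimiter explicitly because it is not the default comma.
rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))

print(rows)
```

Note that csv.reader() returns every value as a string, so numeric columns still need converting afterwards.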

### Quotes
Sometimes field values can have spaces, or even contain the delimiter itself. In these CSV files the values are often quoted.

The default quote character is the double quotation mark (“"”). Other characters can be used, in which case you must specify the quote character used in your file.
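
The sketch below shows the idea with csv.reader() and its quotechar argument. The sample rows are invented for this example; the second one assumes a file that uses single quotes instead of the default.

```python
import csv
import io

# Hypothetical samples: one uses the default double quotes, one uses single quotes.
double_quoted = 'name,city\n"Ada Lovelace","London, UK"\n'
single_quoted = "name,city\n'Ada Lovelace','London, UK'\n"

# The default quote character needs no extra argument.
rows_default = list(csv.reader(io.StringIO(double_quoted)))

# A different quote character must be named explicitly.
rows_single = list(csv.reader(io.StringIO(single_quoted), quotechar="'"))

print(rows_default)
print(rows_single)
```

In both cases the comma inside the quoted city value should stay part of the field rather than splitting it.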

## Machine Learning Data Loading Recipes
Each recipe is standalone.

This means that you can copy and paste it into your project and use it immediately.

If you have any questions about these recipes or suggested improvements, please leave a comment and I will do my best to answer.

### Load CSV with Python Standard Library
The Python standard library provides the csv module and its reader() function, which can be used to load CSV files.

Once loaded, you convert the CSV data to a NumPy array and use it for machine learning.


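```python
# Load CSV using the Python standard library
import csv
import numpy

filename = 'pima-indians-diabetes.data.csv'
# Open with newline='' as the csv module documentation recommends;
# the with block closes the file once the rows are read
with open(filename, 'rt', newline='') as raw_data:
    reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
    x = list(reader)
# Convert the rows of strings to a NumPy array of floats
data = numpy.array(x).astype('float')
print(data.shape)
```

```
(768, 9)
```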


### Load CSV File With NumPy
You can load your CSV data using NumPy and the numpy.loadtxt() function.

By default, this function assumes there is no header row and that all data has the same format. The example below assumes that the file pima-indians-diabetes.data.csv is in your current working directory.

```python
# Load CSV using NumPy
import numpy

filename = 'pima-indians-diabetes.data.csv'
# loadtxt accepts a file path directly and takes care of opening and closing the file
data = numpy.loadtxt(filename, delimiter=',')
print(data.shape)
```

```
(768, 9)
```
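
If your file does have a header row, one option is to tell numpy.loadtxt() to skip it with the skiprows argument. The sketch below uses a made-up in-memory sample rather than the Pima file, which has no header.

```python
import io
import numpy

# Hypothetical sample with a one-line header followed by numeric rows.
text = "preg,plas,pres\n6,148,72\n1,85,66\n"

# skiprows=1 skips the header line so only numeric rows are parsed.
data = numpy.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

print(data.shape)
```

The column names are discarded with this approach, so keep track of them separately if you need them.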


### Load CSV File With Pandas
You can load your CSV data using Pandas and the pandas.read_csv() function.

This function is very flexible and is my recommended approach for loading your machine learning data. The function returns a pandas.DataFrame that you can immediately start summarizing and plotting.

The example below assumes that the ‘pima-indians-diabetes.data.csv’ file is in the current working directory.

```python
# Load CSV using Pandas
import pandas

filename = 'pima-indians-diabetes.data.csv'
# The file has no header row, so the column names are supplied manually
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pandas.read_csv(filename, names=names)
print(data.shape)
```

```
(768, 9)
```
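
As a rough sketch of what you might do with the DataFrame next, the example below summarizes a small made-up sample with describe() and converts it to a NumPy array with to_numpy(). The three rows and the choice of the last column as the output variable are assumptions for illustration only.

```python
import io
import pandas

# Hypothetical three-row sample standing in for a real dataset.
sample = "6,148,1\n1,85,0\n8,183,1\n"
names = ["preg", "plas", "class"]
df = pandas.read_csv(io.StringIO(sample), names=names)

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe()
print(summary)

# Convert to a NumPy array and split into inputs and output,
# assuming the last column is the output variable.
array = df.to_numpy()
X = array[:, :-1]
y = array[:, -1]
print(X.shape, y.shape)
```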


## Summary
In this post you discovered how to load your machine learning data in Python.

You learned three specific techniques that you can use:
* Load CSV with Python Standard Library.
* Load CSV File With NumPy.
* Load CSV File With Pandas.

Your action step for this post is to type or copy-and-paste each recipe and get familiar with the different ways that you can load machine learning data in Python.