# Working with Data in OpenCV
Now that we have gotten a taste of the three most common types of machine learning problems, it is time to delve a little deeper into the different parts that make up a typical machine learning system.

Specifically, we want to address the following questions:
- What does a typical machine learning workflow look like?
- What are training data, validation data, and test data - and what are they good for?
- How do I load, store, and work with such data in OpenCV using Python?

## Dealing with data in OpenCV and Python

Although raw data can come from a variety of sources and in a wide range of formats, it will help us to think of all data fundamentally as arrays of numbers. For example, images can be thought of as two-dimensional arrays of numbers representing pixel brightness across an area. Sound clips can be thought of as one-dimensional arrays of intensity over time. For this reason, efficient storage and manipulation of numerical arrays is fundamental to machine learning.

The way to do this in Python is to use the NumPy package (short for "Numerical Python"). Once we understand how to manipulate data in NumPy, we will turn our attention to OpenCV's own data containers.
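
To make the "everything is an array of numbers" idea concrete, here is a minimal sketch. The image and the sound clip are hypothetical stand-ins built by hand (a blank image with one bright patch, and a pure sine tone), not data loaded from a file; the sizes and the 8 kHz sample rate are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical 4x6 grayscale "image": rows x columns of pixel brightness (0-255)
image = np.zeros((4, 6), dtype=np.uint8)
image[1:3, 2:4] = 255  # a bright 2x2 patch in the middle

# Hypothetical one-second "sound clip": a 440 Hz sine tone sampled at 8 kHz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # time stamps in seconds
clip = np.sin(2 * np.pi * 440 * t)        # intensity over time

print("image:", image.shape, "dimensions:", image.ndim)
print("clip: ", clip.shape, "dimensions:", clip.ndim)
```

The image ends up as a two-dimensional array and the clip as a one-dimensional array, which is the distinction the paragraph above describes.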

## Reminder about built-in documentation

As you read through this chapter, don't forget that IPython lets you quickly explore the contents of a package (using the tab-completion feature), as well as the documentation of various functions (using the <tt>?</tt> character).

For example, to display all the contents of the numpy namespace, you can type this:

    $ ipython
    In [1]: np.<TAB>

And to display NumPy's built-in documentation, you can use this:

    $ ipython
    In [2]: np?

More detailed documentation, along with tutorials and other resources, can be found at http://www.numpy.org.
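
The cells that follow rely on the name `np` and on a Python list called `int_list`, neither of which is defined in this part of the text. A minimal setup might look like the sketch below. It assumes the conventional `np` alias and assumes that `int_list` holds the integers 0 through 9, which is inferred from the array output shown in the next cell rather than stated explicitly.

```python
import numpy as np  # conventional alias, assumed by the cells in this chapter

# Assumption: int_list contains the integers 0..9, matching the outputs shown later
int_list = list(range(10))
print(int_list)
```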

In [10]:
int_arr = np.array(int_list)
int_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
int_arr * 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
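
The multiplication above is applied element by element, which is different from what the same operator does to a plain Python list. The following sketch contrasts the two; the variable names `doubled_list` and `doubled_arr` are introduced here only for illustration.

```python
import numpy as np

int_list = list(range(5))
int_arr = np.array(int_list)

# On a Python list, * 2 repeats the list (concatenation with itself)
doubled_list = int_list * 2

# On a NumPy array, * 2 multiplies every element by 2
doubled_arr = int_arr * 2

print(doubled_list)
print(doubled_arr)
```

The list grows to twice its length, whereas the array keeps its length and every value is doubled.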

In addition, every NumPy array comes with a number of attributes, such as its number of dimensions, shape, size, and data type:

In [12]:
print("int_arr ndim", int_arr.ndim)
print("int_arr shape", int_arr.shape)
print("int_arr size", int_arr.size)
print("int_arr dtype", int_arr.dtype)

('int_arr ndim', 1)
('int_arr shape', (10L,))
('int_arr size', 10)
('int_arr dtype', dtype('int32'))
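
The `dtype` attribute depends on the data the array was built from, and the default integer type can differ between platforms and NumPy versions. As a rough sketch, the arrays `float_arr` and `small_arr` below are hypothetical examples (not part of the original notebook) showing how the data type follows the input or can be requested explicitly.

```python
import numpy as np

int_arr = np.array(list(range(10)))

# A single float in the input makes the whole array floating point
float_arr = np.array([0.5, 1, 2])

# The data type can also be requested explicitly via the dtype argument
small_arr = np.array(range(10), dtype=np.float32)

print("int_arr dtype  ", int_arr.dtype)    # an integer type; exact width varies by platform
print("float_arr dtype", float_arr.dtype)
print("small_arr dtype", small_arr.dtype, "bytes per element:", small_arr.itemsize)
```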


You can display all attributes and methods of a NumPy array by typing the name of the array, adding a period, and then pressing <tt>&lt;TAB&gt;</tt>:

<tt>In [X]: int_arr.&lt;TAB&gt;</tt>

This will display a dropdown menu listing the available attributes and methods, including the ones mentioned above.

## Accessing single array elements by indexing
In a one-dimensional array, the i-th value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [13]:
int_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [14]:
int_arr[0]

0

In [15]:
int_arr[3]

3

To index from the back of the array, you can use negative indices:

In [16]:
int_arr[-1]

9

In [17]:
int_arr[-2]

8
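
One way to read a negative index is as an offset from the length of the array: index `-k` refers to the same element as index `size - k`. The sketch below checks this and also shows that an index outside the valid range raises an `IndexError`. The names `n` and `out_of_bounds` are introduced here only for illustration.

```python
import numpy as np

int_arr = np.arange(10)
n = int_arr.size

# A negative index -k addresses the same element as index n - k
print(int_arr[-1] == int_arr[n - 1])
print(int_arr[-2] == int_arr[n - 2])

# Valid indices run from -n to n - 1; anything else raises IndexError
out_of_bounds = False
try:
    int_arr[n]
except IndexError:
    out_of_bounds = True
print("index", n, "out of bounds:", out_of_bounds)
```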

Beyond single indices, the colon syntax lets you select whole ranges of elements (slices), which enables a few other useful tricks.

For example, return all elements from index 2 up to index 5, but don't include index 5:

In [18]:
int_arr[2:5]

array([2, 3, 4])

Return all elements from the beginning of the array up to, but not including, index 5:

In [19]:
int_arr[:5]

array([0, 1, 2, 3, 4])

Return all elements from index 5 up to the end of the array:

In [20]:
int_arr[5:]

array([5, 6, 7, 8, 9])

Return every other element, starting at index 0:

In [21]:
int_arr[::2]

array([0, 2, 4, 6, 8])

Return all elements of the array, but in reverse order:

In [22]:
int_arr[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
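
All of the examples above are special cases of the general slice form `start:stop:step`, where any omitted part falls back to a default. One detail worth knowing is that a basic slice of a NumPy array is a view onto the original data rather than a copy, so writing to the slice changes the original array. The sketch below illustrates both points; the names `subset`, `view`, and `independent` are hypothetical and used only here.

```python
import numpy as np

int_arr = np.arange(10)

# General form start:stop:step -> indices 1, 4, 7
subset = int_arr[1:8:3]
print(subset)

# A basic slice is a view: modifying it modifies the original array
view = int_arr[2:5]
view[0] = 99
print(int_arr[2])

# Call .copy() to get independent data that can be changed safely
independent = int_arr[5:].copy()
independent[0] = -1
print(int_arr[5])
```

If the original array must stay untouched, taking an explicit copy as in the last step is the safer choice.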

## Creating multi-dimensional arrays