## Introduction
An overview of data processing and the NumPy library.

When asked about Google's model for success, Peter Norvig, the director of research at Google, famously stated,

`"We don't have better algorithms than anyone else; we just have more data."`

Though probably an understatement (given the amount of talent employed at Google), the quote does give a sense of just how vital data is to successful outcomes.

People normally discuss the importance of data in the context of machine learning. No matter how sophisticated a machine learning model is, it will not perform well unless it has a reasonable amount of data to train on. On the other hand, given a large and diverse set of training data, a good deep learning model can often significantly outperform non-deep learning algorithms.

However, the importance of data is not limited to machine learning. Companies use data to identify customer trends, political parties use data to determine which demographics they should target, sports teams use data to analyze players, and so on.

The universal usage of data makes data processing, the act of converting raw data into a meaningful form, an essential skill to have.

## NumPy

Many scenarios involve mostly numeric datasets. For example, medical data contains many numeric metrics, such as height, weight, and blood pressure. Furthermore, the majority of neural networks use input data that is either numeric or has been converted to a numeric form.

When we deal with numeric data, the standard Python library to reach for is NumPy. The NumPy library allows us to perform many operations on numeric data and convert the data to more usable forms.

In [1]:
import numpy as np  # import the NumPy library

# Initializing a NumPy array
arr = np.array([-1, 2, 5], dtype=np.float32)

# Print the representation of the array
print(repr(arr))

array([-1.,  2.,  5.], dtype=float32)
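
As a rough sketch of what "operations on numeric data" can look like, the snippet below applies an elementwise conversion and a simple aggregate to two small arrays. The variable names and measurement values are hypothetical and made up purely for illustration; they do not come from a real dataset.

```python
import numpy as np

# Hypothetical measurements, made up purely for illustration
heights_cm = np.array([150.0, 160.0, 170.0])
weights_kg = np.array([50.0, 60.0, 70.0])

# Elementwise arithmetic: convert every height from centimeters to meters
heights_m = heights_cm / 100.0

# Aggregate operation: average weight across the whole array
mean_weight = weights_kg.mean()

print(repr(heights_m))
print(mean_weight)
```

Both operations act on the whole array at once, so no explicit Python loop is needed.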


### Arrays

NumPy arrays are basically Python lists with added features. In fact, you can easily convert a Python list to a NumPy array using the `np.array` function, which takes a Python list as its required argument. The function also accepts several keyword arguments, but the main one to know is `dtype`. The `dtype` keyword argument takes a **NumPy type** and casts the array to that type.

In [2]:
# The code below is an example usage of np.array to create a 2D matrix. Note that the array is manually cast to np.float32.

arr = np.array([[0,1,2],[3,4,5]], dtype=np.float32)

print(repr(arr))

array([[0., 1., 2.],
       [3., 4., 5.]], dtype=float32)
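
To make the effect of `dtype` a little more concrete, here is a minimal sketch that builds arrays with and without the keyword and inspects the resulting type through the array's `dtype` attribute. The `astype` method shown at the end is one common way to cast an existing array; the variable names are illustrative only.

```python
import numpy as np

# Same nested list as above, but cast to 32-bit integers instead
int_arr = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int32)
print(int_arr.dtype)

# Without dtype, NumPy infers a type from the elements
# (Python floats are stored as 64-bit floats)
inferred = np.array([[0.0, 1.0], [2.0, 3.0]])
print(inferred.dtype)

# astype returns a new array cast to the requested type;
# the original array is left unchanged
as_float = int_arr.astype(np.float32)
print(repr(as_float))
```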


When the elements of a NumPy array are of mixed types, the array's type is upcast to the highest-level type. This means that if an array input has mixed `int` and `float` elements, all the integers will be cast to their floating-point equivalents. If an array input mixes `int`, `float`, and string elements, everything is cast to a string.

In [3]:
# The code below is an example of np.array upcasting. Both integers are cast to their floating point equivalents.

arr = np.array([0, 0.1, 2])
print(repr(arr))

array([0. , 0.1, 2. ])
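
The string case can be checked the same way. The short sketch below mixes an integer, a float, and a string, then inspects the array's `dtype`; the values are arbitrary. A `dtype.kind` of `'U'` indicates a Unicode string type.

```python
import numpy as np

# Mixing int, float, and str: every element is stored as a string
mixed = np.array([0, 0.1, 'two'])
print(repr(mixed))

# dtype.kind is a one-character code for the general kind of data
print(mixed.dtype.kind)

# Mixing only int and float upcasts to a floating-point type
numeric = np.array([1, 2.5])
print(numeric.dtype)
```

Because the numbers become strings, arithmetic on `mixed` no longer behaves numerically, so it is usually worth checking `dtype` after building an array from mixed input.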
