# Numpy Arrays: The Workhorse of Data Science

The time has come for us to introduce the workhorse that makes nearly all data science in Python possible: the `numpy` `array`.

Like the lists and dictionaries we've already seen, arrays are objects for collecting and organizing lots of individual records -- what's sometimes referred to as a *collection*. Unlike lists and dictionaries, however, which can store nearly anything you want to put into them, arrays are *homogeneously typed*: each array is designed to store one specific type of data, and can *only* store that type of data. An array of integers, for example, can only store integers, and an array of floating-point numbers can only hold floating-point numbers.
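To make *homogeneously typed* concrete, here is a small sketch that inspects an array's `dtype` attribute, which records the single type every element shares. It uses the `np` alias explained in the *Using numpy* section. Note that the exact integer type name can vary by platform (e.g. `int64` vs. `int32`).

```python
import numpy as np

# An array built from integers gets an integer dtype
ints = np.array([1, 2, 3])
print(ints.dtype)   # an integer type, e.g. int64 on most platforms

# Mixing ints and floats: numpy picks ONE type (float) for every element
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Assigning a float into an integer array does not change the array's type;
# the value is converted (truncated) to an integer instead
ints[0] = 9.7
print(ints)         # [9 2 3]
```

The last step is the important one: the array keeps its type, and it is the incoming value that gets converted.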

While this may initially seem to make arrays inferior to more flexible collections like lists, there is a major upside: this specialization makes arrays *fast* -- often orders of magnitude faster than the equivalent operations on lists -- and it is this speed that makes data science in Python possible.
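To get a feel for the speed difference, here is a rough sketch that times summing a million integers stored in a list versus stored in an array, using Python's standard `timeit` module. `np.arange(n)` builds an array of the integers 0 through n-1. The exact timings, and the size of the gap, depend on your machine and on the operation, so treat the printed numbers as illustrative only.

```python
import timeit

import numpy as np

n = 1_000_000
numbers_list = list(range(n))
# dtype=np.int64 avoids overflow on platforms where the default integer is 32-bit
numbers_array = np.arange(n, dtype=np.int64)

# Time 10 repetitions of each approach
list_time = timeit.timeit(lambda: sum(numbers_list), number=10)
array_time = timeit.timeit(lambda: numbers_array.sum(), number=10)

print(f"list sum:  {list_time:.4f} s")
print(f"array sum: {array_time:.4f} s")
```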

And as we'll see in future readings, arrays make possible a style of programming called *vectorized programming* that not only tends to be very fast, but also makes the kinds of operations we perform constantly in data science especially concise and easy to read. And if you know any linear algebra, you'll also see that this way of programming produces code that looks more like the math of linear algebra, which many data scientists find really appealing.
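As a small preview of vectorized programming, the sketch below multiplies every element of a collection by 1.1 (a made-up 10% tax rate, purely for illustration), first with a list and then with an array. The array version needs no loop at all: one expression applies the operation to every element.

```python
import numpy as np

# Prices stored in a regular Python list
prices = [10.0, 20.0, 30.0]

# List approach: a loop (here, a list comprehension) visits each element in turn
with_tax_list = [p * 1.1 for p in prices]

# Array approach: a single expression operates on every element at once
prices_array = np.array(prices)
with_tax_array = prices_array * 1.1
```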

Arrays come in many forms, but there are two specific types of arrays whose names will be familiar: when an array organizes its data along a single dimension, we call it a *vector*, and when its data is organized along two dimensions (like how data is laid out in an Excel spreadsheet), we call it a *matrix*. 
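Here is a minimal sketch of the difference: a vector is built from a flat list, a matrix from a list of lists (one inner list per row). The `ndim` attribute reports the number of dimensions, and `shape` reports the length along each one.

```python
import numpy as np

# A vector: data organized along one dimension
vector = np.array([1, 2, 3])

# A matrix: data organized along two dimensions (2 rows, 3 columns)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```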

Most of the time when doing data science, these are actually the only two kinds of arrays you are likely to encounter, and so they will be the focus of the following several readings. However, as we'll see towards the end of this course, there isn't really anything different about working with arrays in three, four, or more dimensions; everything you learn in these first few lessons will generalize easily to all arrays. 
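As a quick illustration of that generalization, the sketch below builds a three-dimensional array of zeros with `np.zeros` and applies the same attributes and element-wise arithmetic you would use on a vector or matrix.

```python
import numpy as np

# A three-dimensional array: 2 "layers", each a 3 x 4 matrix of zeros
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape)  # 3 (2, 3, 4)

# Element-wise arithmetic works exactly as it does in one or two dimensions
cube_plus_one = cube + 1
print(cube_plus_one.sum())    # 24.0
```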

## Using numpy

In past lessons, we saw that not all Python functionality is accessible when you first start a Python session -- some functionality is located in libraries we have to import to use. For example, in a previous module, we used the command `import [something Drew and Geneview imported]` to gain access to functions for [something or other]. 

Numpy basically works the same way -- we can gain access to its functionality by typing `import numpy`:

```python
import numpy
numpy.array([1, 2, 3])
```

```
array([1, 2, 3])
```

But numpy differs from the libraries we've used before in two ways.

First, numpy is what's called a *third-party library*, meaning that you don't automatically get it when you install Python. Instead, you have to install the numpy library before it can be used. numpy has already been installed in the version of Python you have access to here on Coursera, but if you are working with Python on another computer, you'll need to install it using a tool like `pip` or `conda` -- we have a reading on installing third-party packages here [we'll need obviously!].
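If you are unsure whether numpy is available on a given computer, one quick check is to try importing it and catch the `ImportError` Python raises when a module is missing. `numpy.__version__` holds the installed version string. The `pip install numpy` and `conda install numpy` commands in the message are the usual ways to install it, though the right one depends on how your Python environment is set up.

```python
# Check whether numpy is available in the current Python environment
try:
    import numpy
    numpy_installed = True
    print("numpy is installed, version", numpy.__version__)
except ImportError:
    numpy_installed = False
    print("numpy is not installed; try `pip install numpy` or `conda install numpy`")
```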

Second, because we will be working with numpy a lot, we don't want to have to type out `numpy.` every time we access a numpy function. Instead, it is common practice to give numpy an alias (a different name) by typing `import numpy as np`. Once we do that, we can access functions from the numpy library with the prefix `np.` instead of `numpy.`:

```python
import numpy as np

np.array([1, 2, 3])
```

```
array([1, 2, 3])
```
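An alias is just a second name, not a copy of the library. The sketch below checks that `np` and `numpy` refer to the very same module object, so calling a function through either name runs identical code.

```python
import numpy
import numpy as np

# `np` is not a copy of numpy: both names refer to the same module object
print(np is numpy)      # True

# So these two calls use exactly the same function and give the same result
a = numpy.array([1, 2, 3])
b = np.array([1, 2, 3])
print((a == b).all())   # True
```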

And now that we know how to access the functionality of numpy, let's turn to using it to create vectors!