# Numpy and Matplotlib (Part 1)

Today, we'll learn the basics of numpy and matplotlib. These two packages are the basic workhorses for any general computing and analysis done in Python by scientists. 

To start our notebook, we need to import both matplotlib and numpy.
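The import cell itself is not shown in this text. A conventional version, assuming the widely used `np` and `plt` aliases (a convention, not something this notebook confirms), might look like:

```python
# Conventional aliases; the alias names are an assumption of this sketch
import numpy as np
import matplotlib.pyplot as plt
```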

## Introduction

The `numpy` package (module) is used in almost all numerical computation using Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use `numpy` you need to import the module, using for example:
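One common way to do this, assuming the conventional `np` alias that the sketches in this text also use:

```python
import numpy as np

# Printing the version is a quick check that the import worked
print(np.__version__)
```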

In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. 



## Creating `numpy` arrays

There are a number of ways to initialize new numpy arrays, for example:

* from Python lists or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.
* by reading data from files

### From lists

For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.
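A minimal sketch of this, with example values chosen only for illustration:

```python
import numpy as np

# A vector: the argument to the array function is a Python list
v = np.array([1, 2, 3, 4])

# A matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])

print(type(v), type(M))
```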

The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.

The difference between the `v` and `M` arrays is only their shapes. We can get information about the shape of an array by using the `ndarray.shape` property.
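For instance, using the same illustrative `v` and `M` values as a sketch:

```python
import numpy as np

v = np.array([1, 2, 3, 4])
M = np.array([[1, 2], [3, 4]])

print(v.shape)  # (4,)   one dimension with four elements
print(M.shape)  # (2, 2) two rows and two columns
```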

The number of elements in the array is available through the `ndarray.size` property:
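A short sketch with an illustrative 2x2 matrix:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# size counts every element, regardless of the number of dimensions
print(M.size)  # 4
```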

Equivalently, we could use the functions `numpy.shape` and `numpy.size`.
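The function forms give the same results as the properties, as this sketch with an illustrative matrix shows:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

print(np.shape(M))  # (2, 2)
print(np.size(M))   # 4
```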

So far the `numpy.ndarray` looks awfully much like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

* Python lists are very general. They can contain any kind of object. They are dynamically typed. They do not support mathematical functions such as matrix and dot multiplications, etc. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
* Numpy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.
* Numpy arrays are memory efficient.
* Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented efficiently in a compiled language (C and Fortran are used).
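The difference in behaviour can be seen in a small sketch. The values are made up for illustration, and no timing claims are made here:

```python
import numpy as np

values = [1.0, 2.0, 3.0]

# For a list, * means repetition, so arithmetic needs an explicit loop
repeated = values * 2                   # a list with six elements
doubled_list = [2 * x for x in values]

# For an array, arithmetic is applied element by element
arr = np.array(values)
doubled_arr = 2 * arr

print(repeated)
print(doubled_list)
print(doubled_arr)
```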

### Array type
One of the most fundamental aspects of controlling our arrays is the data type.

What data types are available? Here is a list from the Numpy website:
* **bool_** - Boolean (True or False) stored as a byte
* **int_** - Default integer type (same as C long; normally either int64 or int32)
* **intc** - Identical to C int (normally int32 or int64)
* **intp** - Integer used for indexing (same as C ssize_t; normally either int32 or int64)
* **int8** - Byte (-128 to 127)
* **int16** - Integer (-32768 to 32767)
* **int32** - Integer (-2147483648 to 2147483647)
* **int64** - Integer (-9223372036854775808 to 9223372036854775807)
* **uint8** - Unsigned integer (0 to 255)
* **uint16** - Unsigned integer (0 to 65535)
* **uint32** - Unsigned integer (0 to 4294967295)
* **uint64** - Unsigned integer (0 to 18446744073709551615)
* **float_** - Shorthand for float64 (this alias was removed in NumPy 2.0; use `float64` there).
* **float16** - Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
* **float32** - Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
* **float64** - Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
* **complex_** - Shorthand for complex128 (this alias was removed in NumPy 2.0; use `complex128` there).
* **complex64** - Complex number, represented by two 32-bit floats (real and imaginary components)
* **complex128** - Complex number, represented by two 64-bit floats (real and imaginary components)

Using the `dtype` (data type) property of an `ndarray`, we can see what type the data of an array has:
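A minimal sketch with an illustrative integer matrix. The exact default integer type depends on the platform, so no specific output is assumed:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Built from Python ints, so the array gets the platform's default integer type
print(M.dtype)
```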

We get an error if we try to assign a value of the wrong type to an element in a numpy array:
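As a sketch, assigning a string that cannot be converted to an integer raises a `ValueError`:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])  # integer array

raised = False
try:
    M[0, 0] = "hello"  # cannot be converted to an integer
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Not every mismatch raises an error: assigning a float such as `3.7` to an element of an integer array silently truncates it to `3`.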

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 
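For example, a sketch that requests complex elements for an array built from integers:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]], dtype=complex)

print(M.dtype)  # complex128
print(M)
```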

Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.

We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128` (not available on all platforms), `complex128`.

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

#### linspace
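`linspace` generates a given number of evenly spaced points between two end points. A minimal sketch, with end points and point count chosen only for illustration:

```python
import numpy as np

# 25 evenly spaced points from 0 to 10; both end points are included
x = np.linspace(0, 10, 25)

print(x.size)
print(x[0], x[-1])
```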

#### zeros and ones
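`zeros` and `ones` create arrays of a given shape filled with 0 or 1. A short sketch with illustrative shapes:

```python
import numpy as np

z = np.zeros((3, 3))  # 3x3 array of floating-point zeros
o = np.ones((2, 4))   # 2x4 array of floating-point ones

print(z)
print(o)
```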

## File I/O

### Comma-separated values (CSV)

A very common file format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values). To read data from such files into Numpy arrays we can use the `numpy.genfromtxt` function. For example, 
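The data file used in the original example is not reproduced here, so this sketch first writes a tiny made-up CSV file to a temporary directory (the file name and values are hypothetical) and then reads it back:

```python
import os
import tempfile

import numpy as np

# Write a small, made-up CSV file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w") as f:
    f.write("1.0,2.0,3.0\n4.0,5.0,6.0\n")

# delimiter tells genfromtxt how the columns are separated
data = np.genfromtxt(path, delimiter=",")

print(data.shape)  # (2, 3)
```

For a tab-separated file, `delimiter="\t"` would be used instead.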

Using `numpy.savetxt` we can store a Numpy array to a file in CSV format:
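A minimal sketch, writing an illustrative matrix to a temporary file (the file name and the `fmt` choice are assumptions of this example):

```python
import os
import tempfile

import numpy as np

M = np.array([[0.1, 0.2], [0.3, 0.4]])

path = os.path.join(tempfile.mkdtemp(), "matrix.csv")

# fmt controls how each number is written; here, five decimal places
np.savetxt(path, M, delimiter=",", fmt="%.5f")

with open(path) as f:
    print(f.read())
```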

### Numpy's native file format

Numpy's native file format is useful for storing and reading back numpy array data. Use the functions `numpy.save` and `numpy.load`:
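A short round-trip sketch, using a temporary file with a hypothetical name:

```python
import os
import tempfile

import numpy as np

M = np.array([[1, 2], [3, 4]])

path = os.path.join(tempfile.mkdtemp(), "matrix.npy")

np.save(path, M)       # writes the array in binary .npy format
loaded = np.load(path)

print(loaded)
```

Unlike a CSV file, the `.npy` file also preserves the data type and shape of the array.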

## Manipulating arrays

### Indexing

We can index elements in an array using square brackets and indices:
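A sketch with an illustrative vector and matrix:

```python
import numpy as np

v = np.array([1, 2, 3, 4])
M = np.array([[1, 2], [3, 4]])

# v is a vector and has only one dimension, so it takes one index
print(v[0])     # 1

# M is a matrix (2-dimensional array), so it takes two indices
print(M[1, 1])  # 4
```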

If we omit an index of a multidimensional array, it returns the whole row (or, in general, an (N-1)-dimensional array).
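For example, with an illustrative 2x2 matrix:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Only the row index is given, so the whole second row is returned
print(M[1])  # [3 4]
```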

The same thing can be achieved by using `:` instead of an index:
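A sketch showing `:` in either position of an illustrative matrix:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

print(M[1, :])  # [3 4]  row 1
print(M[:, 1])  # [2 4]  column 1
```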

We can assign new values to elements in an array using indexing:
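A minimal sketch that assigns to a single element, then to a whole row and a whole column (values chosen for illustration):

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

M[0, 0] = 10   # a single element
M[1, :] = 0    # every element in row 1
M[:, 1] = -1   # every element in column 1

print(M)
# [[10 -1]
#  [ 0 -1]]
```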

### Index slicing

Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:
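A sketch on an illustrative one-dimensional array. Note that the `upper` index is not included in the result:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

print(A[1:3])    # [2 3]    indices 1 and 2
print(A[0:5:2])  # [1 3 5]  every second element
```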

Array slices are *mutable*: if they are assigned a new value the original array from which the slice was extracted is modified:
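For example, assigning to a slice of an illustrative array changes the array itself:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

A[1:3] = [-2, -3]

print(A)  # [ 1 -2 -3  4  5]
```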

We can omit any of the three parameters in `M[lower:upper:step]`:
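A sketch of the defaults: an omitted `lower` means the beginning, an omitted `upper` means the end, and an omitted `step` means 1.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

print(A[::])   # [1 2 3 4 5]  all three omitted: the whole array
print(A[::2])  # [1 3 5]      step of 2 over the whole array
print(A[:3])   # [1 2 3]      first three elements
print(A[3:])   # [4 5]        elements from index 3 onwards
```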

Negative indices count from the end of the array (positive indices from the beginning):
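A short sketch on an illustrative array:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

print(A[-1])   # 5        the last element
print(A[-3:])  # [3 4 5]  the last three elements
```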

### Data processing

Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate statistics of datasets in arrays. 

For example, let's calculate some properties from the Stockholm temperature dataset used above.

#### sum
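The dataset itself is not reproduced in this text, so this sketch uses a small made-up array of temperatures (`temps` is a hypothetical name) to show the call:

```python
import numpy as np

# Made-up temperatures in degrees Celsius, for illustration only
temps = np.array([-2.5, 0.0, 4.5, 10.0, 18.0])

print(np.sum(temps))  # 30.0
print(temps.sum())    # the method form gives the same result
```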

#### mean

The daily mean temperature in Stockholm over the last 200 years has been about 6.2 °C.
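The call that produces such a figure can be sketched on a small made-up array (hypothetical values, not the Stockholm data):

```python
import numpy as np

# Made-up temperatures in degrees Celsius, for illustration only
temps = np.array([-2.5, 0.0, 4.5, 10.0, 18.0])

print(np.mean(temps))  # 6.0
```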

#### standard deviations and variance
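A minimal sketch with made-up values. By default, `numpy.std` and `numpy.var` compute the population versions (dividing by N); passing `ddof=1` gives the sample versions.

```python
import numpy as np

# Made-up data chosen so that the results are round numbers
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.var(data))  # variance: mean squared deviation from the mean
print(np.std(data))  # standard deviation: square root of the variance
```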

#### min and max
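A short sketch on a made-up array of temperatures (hypothetical values):

```python
import numpy as np

# Made-up temperatures in degrees Celsius, for illustration only
temps = np.array([-2.5, 0.0, 4.5, 10.0, 18.0])

print(temps.min())  # -2.5  lowest value
print(temps.max())  # 18.0  highest value
```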

## Further reading

* http://numpy.scipy.org
* http://scipy.org/Tentative_NumPy_Tutorial
* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.