# NumPy arrays

Nikolay Koldunov

koldunovn@gmail.com

================

<img  height="100" src="files/numpy.png" >

-    a powerful N-dimensional array object
-    sophisticated (broadcasting) functions
-    tools for integrating C/C++ and Fortran code
-    useful linear algebra, Fourier transform, and random number capabilities


In [1]:
import numpy as np
%matplotlib inline

In [2]:
np.set_printoptions(precision=3, suppress=True)  # this is just to make the output look better

## Load data

Load the data into a variable:

In [3]:
temp = np.loadtxt('Ham_3column.txt')

In [4]:
temp

array([[1891.,    1.,    1.,  -72.],
       [1891.,    1.,    2.,  -43.],
       [1891.,    1.,    3.,  -32.],
       ...,
       [2014.,    8.,   29.,  216.],
       [2014.,    8.,   30.,  198.],
       [2014.,    8.,   31.,  184.]])

In [5]:
temp.shape

(45168, 4)

<img  height="100" src="files/anatomyarray.png" >

In [6]:
temp.size

180672

NumPy stores arrays in *row-major* (C) order by default. Matlab and Fortran use *column-major* order for arrays.
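A minimal sketch of what the two orders mean, using a small made-up array `a` rather than the loaded data. It also shows that `size` is simply the product of the entries in `shape`:

```python
import numpy as np

# Small hypothetical array, not the temperature data
a = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns

# size is the product of the entries in shape
n_elements = a.shape[0] * a.shape[1]

# Row-major (C) order walks along each row first
c_order = a.ravel(order='C')

# Column-major (Fortran) order walks down each column first
f_order = a.ravel(order='F')

print(a.size, n_elements)
print(c_order)
print(f_order)
```

For `temp`, the same relation gives 45168 * 4 = 180672 elements.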

In [7]:
type(temp)

numpy.ndarray

NumPy arrays are statically typed, which allows faster operations:

In [8]:
temp.dtype

dtype('float64')

You can't assign a value of a different type to an element of a NumPy array:

In [9]:
temp[0,0] = 'Year'

ValueError: could not convert string to float: Year
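The flip side of a fixed dtype is that values which *can* be converted are cast silently. A small sketch with a made-up integer array (`ints` and `floats` are illustrative names, not part of the dataset):

```python
import numpy as np

ints = np.array([1, 2, 3])    # integer dtype is inferred from the values
ints[0] = 7.9                 # the float is truncated to fit the integer dtype

floats = ints.astype(float)   # astype returns a new array with the requested dtype
floats[0] = 7.9               # now the fractional part is kept

print(ints, ints.dtype)
print(floats, floats.dtype)
```

If you need to store fractional values, convert the array first with `astype`, as in the second half of the sketch.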

Slicing works similarly to Matlab:

In [None]:
temp[0:5,:]

In [None]:
temp[-5:-1,:]
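Note that the stop index of a slice is excluded, so `temp[-5:-1,:]` returns four rows, not five. A minimal sketch on a made-up 1-D array (`rows` is only a stand-in for row numbers):

```python
import numpy as np

rows = np.arange(10)     # stand-in for row numbers 0..9

last_five = rows[-5:]    # from the fifth-from-last element to the end
almost = rows[-5:-1]     # the stop index -1 is excluded, so the last element is dropped

print(last_five)
print(almost)
```

Leaving the stop index empty is the usual way to include the last element.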

We can also look at the data by plotting it with the matplotlib module:

In [None]:
import matplotlib.pyplot as plt
plt.plot(temp[:,3])

## Index slicing

In general, it is similar to Matlab.

First 12 elements of the **third** column (days). Remember that indexing starts with 0, so the third column has index 2:

In [None]:
temp[0:12,2]

First row:

In [None]:
temp[0,:]

## Exercise

 - Plot only first 1000 values
 - Plot last 1000 values


In [None]:
plt.plot(temp[-1000:,3])

We can create a mask that selects all rows where the value in the third column (days) equals 10:

In [None]:
mask = (temp[:,2]==10)

Here we apply this mask and show only the first 5 rows of the array:

In [None]:
temp[mask][:5,:]

You don't have to create a separate variable for the mask; you can apply it directly. Here, instead of the first five rows, I show the last five rows:

In [None]:
temp[temp[:,2]==10][-5:,:]

You can combine conditions. In this case we select days from 10 to 12 (only the first 10 rows are shown):

In [None]:
temp[(temp[:,2]>=10)&(temp[:,2]<=12)][0:10,:]
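The parentheses around each comparison are required, because `&` binds more tightly than `>=` and `<=`. A short sketch of the three element-wise logical operators on a made-up array (`day` stands in for `temp[:,2]`):

```python
import numpy as np

# Hypothetical day-of-month values, standing in for temp[:, 2]
day = np.array([1, 9, 10, 11, 12, 13, 31])

in_range = (day >= 10) & (day <= 12)   # element-wise AND
at_edges = (day == 1) | (day == 31)    # element-wise OR
outside = ~in_range                    # element-wise NOT

print(day[in_range])
print(day[at_edges])
print(day[outside])
```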

## Exercise

 - Select only summer months
 - Select only first half of the year


## Basic operations

Create an example array from the first 12 values of the third column (days) and perform some basic operations:

In [None]:
days = temp[0:12,2]
days

In [None]:
days+10

In [None]:
days*20

In [None]:
days*days
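All of these operations act element by element. A minimal sketch with made-up numbers, contrasted with how a plain Python list behaves (`values` is an illustrative name):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])   # hypothetical values

shifted = values + 10       # the scalar is added to every element
squared = values * values   # element-wise product, not a matrix product
repeated = [1, 2, 3] * 2    # a plain Python list is repeated instead

print(shifted)
print(squared)
print(repeated)
```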

What's wrong with this figure?

In [None]:
plt.plot(temp[:100,3])

## Exercise

- Create new array that will contain only temperatures

- Convert temperature to deg C

- Convert all temperatures to deg F


## Basic statistics

Create *temp_values* that will contain only data values:

In [None]:
temp_values = temp[:,3]/10.
temp_values

Simple statistics:

In [None]:
temp_values.min()

In [None]:
temp_values.max()

In [None]:
temp_values.mean()

In [None]:
temp_values.std()

In [None]:
temp_values.sum()

You can also use the *np.sum* function:

In [None]:
np.sum(temp_values)

You can also perform operations on subsets of the data.
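For instance, a slice or a mask can be followed directly by a statistics method. A small sketch on made-up numbers (`month` and `t` are hypothetical arrays, not the loaded record):

```python
import numpy as np

# Hypothetical month numbers and temperatures
month = np.array([1, 1, 7, 7, 7])
t = np.array([-2.0, 0.0, 18.0, 20.0, 22.0])

july_mean = t[month == 7].mean()   # mean over the elements selected by a mask
first_two_max = t[:2].max()        # maximum over a slice

print(july_mean)
print(first_two_max)
```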

## Exercise

 - Calculate mean for first 1000 values of temperature


## Saving data

You can save your data as a text file:

In [None]:
np.savetxt('temp_only_values.csv',temp[:, 3]/10., fmt='%.4f')

Head of resulting file:

In [None]:
!head temp_only_values.csv
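The `fmt` argument controls how each number is written. A minimal round-trip sketch, assuming a few made-up values and an in-memory buffer in place of a file on disk:

```python
import io
import numpy as np

values = np.array([-7.2, -4.3, -3.2])    # hypothetical temperatures

buffer = io.StringIO()                   # in-memory stand-in for a text file
np.savetxt(buffer, values, fmt='%.4f')   # one value per line, four decimals
text = buffer.getvalue()

buffer.seek(0)
restored = np.loadtxt(buffer)            # read the same text back into an array

print(text)
print(restored)
```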

You can also save it as binary:

In [None]:
# open in binary mode; the file is closed automatically on leaving the block
with open('temp_only_values.bin', 'wb') as f:
    temp[:,3].tofile(f)
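A file written with `tofile` contains only the raw bytes, with no record of shape or dtype, so the reader has to supply them. A minimal sketch of a round trip, assuming made-up values and a scratch file in a temporary directory (`example.bin` is a hypothetical name):

```python
import os
import tempfile
import numpy as np

values = np.array([-72.0, -43.0, -32.0])   # hypothetical float64 values

# Hypothetical scratch file, so the example does not touch real data
path = os.path.join(tempfile.mkdtemp(), 'example.bin')

values.tofile(path)                               # raw bytes only: no shape, no dtype
restored = np.fromfile(path, dtype=np.float64)    # the dtype must be given by the reader

print(os.path.getsize(path))   # 3 values * 8 bytes each
print(restored)
```

If the array should carry its own shape and dtype, `np.save` and `np.load` are the usual alternative.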

## Exercises

* Select and plot only data for October
* Calculate monthly means for years from 1990 to 1999 and plot them