## What are numpy arrays?

**Numpy arrays** are a commonly used scientific data structure in Python that store data as a **grid, or a matrix**. Like **Python lists**, numpy arrays are composed of ordered values (elements) and use indexing to organize and manipulate those elements.

    array = numpy.array([0.7, 0.75, 1.85])
    
The example creates a numpy array with a simple grid structure along one dimension. 

## Key differences between Python lists and Numpy Arrays

While Python lists and numpy arrays have similarities in that they are both collections of values that use indexing to help you store and access data, there are a few key differences between these two data structures:

1. Unlike a Python list, all elements in a numpy array must be the **same data type** (i.e. all integers, decimals, text strings, etc).
2. Numpy arrays **support arithmetic and other mathematical operations** that run on each element of the array (e.g. element-by-element multiplication). Recall that lists cannot have these numeric calculations applied directly to them.
3. Unlike a Python list, a numpy array has a **fixed size**. Individual elements can be replaced in place, but elements cannot be added to or removed from an existing array. Operations that appear to do so (e.g. np.append) actually **create a new array** and leave the original unchanged.
4. Numpy arrays can store data along **multiple dimensions** (e.g. rows, columns) that are relative to each other. This makes numpy arrays a very efficient data structure for large datasets.
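The differences listed above can be checked with a short sketch. The variable names are illustrative and not part of the lesson's dataset; the three precipitation values are the ones used in the first example.

```python
import numpy as np

# Integers mixed with a float are all stored as one common type (float64).
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# Arithmetic is applied element by element.
precip_in = np.array([0.7, 0.75, 1.85])
precip_mm = precip_in * 25.4
print(precip_mm)

# Multiplying a list by an integer repeats the list instead.
repeated = [0.7, 0.75, 1.85] * 2
print(len(repeated))

# np.append returns a new, longer array; the original keeps its size.
longer = np.append(precip_in, 2.93)
print(precip_in.size, longer.size)
```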

## Dimensionality of Numpy Arrays

Numpy arrays can be:
* one-dimensional composed of values along one dimension (like a Python list).
* two-dimensional composed of rows of individual arrays with one or more columns.
* multi-dimensional composed of nested arrays with one or more dimensions.

For numpy arrays, brackets **[ ]** are used to assign and identify the dimensions of the numpy arrays.

In [1]:
import numpy as np

In [2]:
avg_monthly_precip = np.array([0.7, 0.75, 1.85])
print(avg_monthly_precip)

[0.7  0.75 1.85]


To create a two-dimensional array, you need to specify two sets of brackets [ ]. The outer set defines the entire array structure and the inner sets define the rows of the array.

In [3]:
precip_2002_2003 = np.array([
    [1.07, 0.44, 1.50], 
    [0.27, 1.13, 1.72]
])

print(precip_2002_2003)

[[1.07 0.44 1.5 ]
 [0.27 1.13 1.72]]
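The same bracket nesting extends to more than two dimensions. The sketch below is only an illustration: it reuses the first six monthly values of each year from the dataset imported later in this lesson, and the grouping into two blocks of three months per year is an assumption made for the example.

```python
import numpy as np

# Three levels of brackets give three dimensions:
# 2 years x 2 blocks of months x 3 months (illustrative grouping).
precip_3d = np.array([
    [[1.07, 0.44, 1.50], [0.20, 3.20, 1.18]],
    [[0.27, 1.13, 1.72], [4.14, 2.66, 0.61]],
])

print(precip_3d.ndim)   # number of dimensions
print(precip_3d.shape)  # number of elements along each dimension
```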


## Import text files into numpy arrays

Scientific data can come in a variety of file formats and types. Two of the most commonly used file formats are:
* Plain text files (.txt)
* Comma-separated values files (.csv)

### txt files
Plain text files simply list out the values on separate lines without any symbols or delimiters to indicate separate values. Due to their simplicity, text files (.txt) can be very useful for collecting very large datasets that are all the same type of observation or data type.

### csv files
Unlike plain-text files, which simply list out the values on separate lines without any symbols or delimiters, files containing comma-separated values (.csv) use commas (or some other delimiter such as tabs or semi-colons) to indicate separate values. This means that .csv files can easily support multiple rows and columns of related data.

In [1]:
import os
import numpy as np
import earthpy as et

Begin by downloading a .txt file for average monthly precipitation (inches) for Boulder, CO collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from the following URL:

https://ndownloader.figshare.com/files/12565616

In [39]:
monthly_precip_url = 'https://ndownloader.figshare.com/files/12565616'
et.data.get_data(url = monthly_precip_url)

'C:\\Users\\user\\earth-analytics\\data\\earthpy-downloads\\avg-monthly-precip.txt'

The month names are stored in a different .txt file, which you can download from the following URL:

https://ndownloader.figshare.com/files/12565619

In [37]:
month_names_url = 'https://ndownloader.figshare.com/files/12565619'
et.data.get_data(url = month_names_url, replace = True)

Downloading from https://ndownloader.figshare.com/files/12565619


'C:\\Users\\user\\earth-analytics\\data\\earthpy-downloads\\months.txt'

Next, download a .csv file that contains the monthly precipitation (inches) for Boulder, CO for the years 2002 and 2013, collected by the U.S. National Oceanic and Atmospheric Administration (NOAA).

In [40]:
precip_2002_2013_url = 'https://ndownloader.figshare.com/files/12707792'
et.data.get_data(url = precip_2002_2013_url, replace = True)

Downloading from https://ndownloader.figshare.com/files/12707792


'C:\\Users\\user\\earth-analytics\\data\\earthpy-downloads\\monthly-precip-2002-2013.csv'

### Import Data From TXT File
To import data from a .txt file, you simply need to specify a value for the parameter called fname for the file name:

    np.loadtxt(fname)

Recall from the chapter on working with paths and directories that you can use **os.path.join()** to create paths that will work on any operating system.
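As a brief reminder of how this works, the sketch below passes each directory name to os.path.join() as a separate argument, so that Python inserts the correct separator for the current operating system. The path is the one used in this lesson and the file does not need to exist for the path to be built.

```python
import os

# Each path component is a separate argument; the separator is added for you.
fname = os.path.join("data", "earthpy-downloads", "avg-monthly-precip.txt")
print(fname)

# The file name is the last component of the path.
print(os.path.basename(fname))
```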

In the example below, fname is defined using os.path.join() with a relative path to the avg-monthly-precip.txt file. The relative path works because the working directory is first set to earth-analytics.

In [5]:
# Set work directory to earth-analytics
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

In [6]:
# Pass each directory separately so the path works on any operating system
fname = os.path.join('data', 'earthpy-downloads', 'avg-monthly-precip.txt')
avg_monthly_precip = np.loadtxt(fname)
print(avg_monthly_precip)   

[0.7  0.75 1.85 2.93 3.05 2.02 1.93 1.62 1.84 1.31 1.39 0.84]


In [7]:
type(avg_monthly_precip)

numpy.ndarray

### Import Data From CSV File
You can also use np.loadtxt(fname) to import data from .csv files that contain rows and columns of data.

You will need to specify both the fname parameter as well as the delimiter parameter to indicate the character that is being used to separate values in the file (e.g. commas, semi-colons):

    np.loadtxt(fname, delimiter = ",")

In [8]:
# Pass each directory separately so the path works on any operating system
fname = os.path.join('data', 'earthpy-downloads', 'monthly-precip-2002-2013.csv')
precip_2002_2013 = np.loadtxt(fname, delimiter = ',')
print(precip_2002_2013)

[[ 1.07  0.44  1.5   0.2   3.2   1.18  0.09  1.44  1.52  2.44  0.78  0.02]
 [ 0.27  1.13  1.72  4.14  2.66  0.61  1.03  1.4  18.16  2.24  0.29  0.5 ]]
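To try the delimiter parameter without downloading a file, here is a minimal sketch that reads comma-separated text from memory. The io.StringIO object is a stand-in for a file on disk and is an assumption of this example; the values are the first three months of each year from the output above.

```python
import io

import numpy as np

# In-memory stand-in for a small .csv file with two rows and three columns.
csv_text = "1.07,0.44,1.50\n0.27,1.13,1.72\n"

# np.loadtxt accepts a file-like object as well as a file name.
precip = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(precip)
print(precip.shape)
```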


### Import Text String Data from Text Files Into Numpy Arrays
As needed, you can also import text files with text string values (such as month names) to numpy arrays using the genfromtxt() function from numpy.

You need to specify a parameter value for fname as well as a parameter value for the data type as dtype='str':

    np.genfromtxt(fname, dtype='str')

In [9]:
# Pass each directory separately so the path works on any operating system
fname = os.path.join('data', 'earthpy-downloads', 'months.txt')
months = np.genfromtxt(fname, dtype = 'str')
print(months)

['Jan' 'Feb' 'Mar' 'Apr' 'May' 'June' 'July' 'Aug' 'Sept' 'Oct' 'Nov'
 'Dec']


In [10]:
type(months)

numpy.ndarray
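One reason to keep the month names in a numpy array is that they can be indexed with positions found in a numeric array of the same length. The sketch below assumes an in-memory text source (io.StringIO) in place of months.txt and uses only the first three months and their average precipitation values.

```python
import io

import numpy as np

# In-memory stand-in for a text file with one month name per line.
months_text = "Jan\nFeb\nMar\n"
months = np.genfromtxt(io.StringIO(months_text), dtype=str)

avg_precip = np.array([0.7, 0.75, 1.85])

# np.argmax returns the position of the largest value,
# which can then be used to look up the matching month name.
wettest_month = months[np.argmax(avg_precip)]
print(wettest_month)
```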

## Run calculation and summary statistics on numpy arrays

### Check dimensions and shape of numpy arrays

In [11]:
# Check dimensions and shape of numpy arrays
avg_monthly_precip.ndim

1

In [12]:
# Check dimensions and shape of numpy arrays
precip_2002_2013.ndim

2

Another useful attribute of numpy arrays is the **.shape** attribute, which returns the number of elements along each dimension as a tuple. For a two-dimensional array, the first value is the number of rows and the second value is the number of columns.

In [13]:
precip_2002_2013.shape

(2, 12)

In [14]:
avg_monthly_precip.shape

(12,)
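The shape tuple can be unpacked directly into variables, and its length always equals the number of dimensions. This sketch uses placeholder arrays of zeros with the same shapes as the two arrays above; the variable names are illustrative.

```python
import numpy as np

# Placeholder arrays with the same shapes as the lesson's data.
one_d = np.zeros(12)
two_d = np.zeros((2, 12))

# Unpack the rows and columns of a two-dimensional array.
n_rows, n_cols = two_d.shape
print(n_rows, n_cols)

# The length of the shape tuple equals .ndim.
print(len(one_d.shape) == one_d.ndim)

# (12,) is one-dimensional; reshaping to (1, 12) gives one row and 12 columns.
row_vector = one_d.reshape(1, 12)
print(row_vector.shape)
```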

### Run calculation on numpy arrays
A key benefit of numpy arrays is that they support mathematical operations on an element-by-element basis, meaning you can run one operation on the entire array with a single line of code.

In [15]:
print(avg_monthly_precip)

[0.7  0.75 1.85 2.93 3.05 2.02 1.93 1.62 1.84 1.31 1.39 0.84]


In [16]:
avg_monthly_precip *= 25.4
print(avg_monthly_precip)

[17.78  19.05  46.99  74.422 77.47  51.308 49.022 41.148 46.736 33.274
 35.306 21.336]


In [17]:
print(precip_2002_2013)

[[ 1.07  0.44  1.5   0.2   3.2   1.18  0.09  1.44  1.52  2.44  0.78  0.02]
 [ 0.27  1.13  1.72  4.14  2.66  0.61  1.03  1.4  18.16  2.24  0.29  0.5 ]]


In [18]:
precip_2002_2013 *= 25.4
print(precip_2002_2013)

[[ 27.178  11.176  38.1     5.08   81.28   29.972   2.286  36.576  38.608
   61.976  19.812   0.508]
 [  6.858  28.702  43.688 105.156  67.564  15.494  26.162  35.56  461.264
   56.896   7.366  12.7  ]]


In [42]:
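# Convert both arrays back to inches. Note the cell number: this cell was run
# after the summary statistics cells below, so their outputs are in millimeters.
precip_2002_2013 /= 25.4
avg_monthly_precip /= 25.4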
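The `*=` and `/=` operators modify an array in place. If you want to keep the original values, one alternative is to assign the result of the calculation to a new variable, as in the sketch below. The variable names are illustrative and only the first three monthly values are used.

```python
import numpy as np

precip_in = np.array([0.7, 0.75, 1.85])

# The * operator returns a new array and leaves precip_in unchanged.
precip_mm = precip_in * 25.4

print(precip_in)
print(precip_mm)
```

Keeping the inches and millimeters in separate arrays avoids having to convert back and forth, and it makes the units of each variable clear from its name.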

### Run summary statistics on one-dimensional numpy arrays

In [24]:
# Calculate mean
mean_avg_precip = np.mean(avg_monthly_precip)
print('mean average monthly precipitation:', round(mean_avg_precip, 2))

mean average monthly precipitation: 42.82


In [23]:
# Calculate median
median_avg_precip = np.median(avg_monthly_precip)
print('median average monthly precipitation:', round(median_avg_precip, 2))

median average monthly precipitation: 43.94


In [26]:
# Calculate minimum and maximum
print('minimum average monthly precipitation:', round(np.min(avg_monthly_precip), 2))
print('maximum average monthly precipitation:', round(np.max(avg_monthly_precip), 2))

minimum average monthly precipitation: 17.78
maximum average monthly precipitation: 77.47


### Run summary statistics on two-dimensional numpy arrays
To calculate statistics on two-dimensional arrays, you can use the **axis** argument in the same functions to specify which axis you want to summarize:
* axis = 0: the vertical axis (downwards), summarizing across rows and returning one value per column
* axis = 1: the horizontal axis, summarizing across columns and returning one value per row
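The effect of the axis argument is easiest to see on a small array. This sketch uses the first three months of each year from the data above; the variable names are illustrative.

```python
import numpy as np

# Two rows (years) and three columns (months).
precip = np.array([
    [1.07, 0.44, 1.50],
    [0.27, 1.13, 1.72],
])

# axis=0 collapses the rows: one maximum per column (month).
col_max = np.max(precip, axis=0)
print(col_max)  # [1.07 1.13 1.72]

# axis=1 collapses the columns: one maximum per row (year).
row_max = np.max(precip, axis=1)
print(row_max)  # [1.5  1.72]

# Without axis, the maximum of the entire array is returned.
overall_max = np.max(precip)
print(overall_max)  # 1.72
```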

In [27]:
print(precip_2002_2013)

[[ 27.178  11.176  38.1     5.08   81.28   29.972   2.286  36.576  38.608
   61.976  19.812   0.508]
 [  6.858  28.702  43.688 105.156  67.564  15.494  26.162  35.56  461.264
   56.896   7.366  12.7  ]]


In [32]:
# Calculate summary statistics across rows
print(np.max(precip_2002_2013, axis = 0))

[ 27.178  28.702  43.688 105.156  81.28   29.972  26.162  36.576 461.264
  61.976  19.812  12.7  ]


In [33]:
# Create new array of maximum values for each month
precip_2002_2013_monthly_max = np.max(precip_2002_2013, axis = 0)
type(precip_2002_2013_monthly_max)

numpy.ndarray

In [34]:
# Calculate summary statistics across columns
print(np.max(precip_2002_2013, axis = 1))

[ 81.28  461.264]


## Slice/select data from numpy arrays

### Slice one-dimensional numpy arrays
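By checking the shape of avg_monthly_precip using **.shape**, you know that it contains 12 elements along one dimension. You can select a single element by its index, or slice a range of elements from one-dimensional arrays by specifying an index range separated by a colon. The starting index is included in the slice and the ending index is not:
    
    [start_index:end_index]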

In [35]:
avg_monthly_precip.shape

(12,)

In [43]:
avg_monthly_precip[11]

0.84

In [44]:
avg_monthly_precip[-1]

0.84

In [45]:
# Slice a range from 3rd to 5th elements
print(avg_monthly_precip[2:5])

[1.85 2.93 3.05]
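Either side of the colon can be left out, and a third value sets the step between elements. The sketch below recreates the average monthly precipitation values (in inches) from the output above so that it runs on its own; the variable names are illustrative.

```python
import numpy as np

avg_precip = np.array([0.7, 0.75, 1.85, 2.93, 3.05, 2.02,
                       1.93, 1.62, 1.84, 1.31, 1.39, 0.84])

# Omitting the start index begins at the first element.
first_three = avg_precip[:3]
print(first_three)  # [0.7  0.75 1.85]

# A negative start index counts from the end of the array.
last_three = avg_precip[-3:]
print(last_three)  # [1.31 1.39 0.84]

# A step of 3 selects every third element, starting at index 0.
every_third = avg_precip[::3]
print(every_third)  # [0.7  2.93 1.93 1.31]
```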


### Slice two-dimensional numpy arrays
Using **.shape**, you can confirm that precip_2002_2013 is a two-dimensional array with 2 rows and 12 columns. To select an element from a two-dimensional array, you need to specify both a row and a column index:

    [row_index, column_index]

You can also use a range for the row and column index to slice multiple elements using

    [start_row_index:end_row_index, start_column_index:end_column_index]
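Note that a single index and a range behave differently: an index removes that dimension from the result, while a range keeps it, even when the range covers only one row or column. A minimal sketch, using the first three months of each year and illustrative variable names:

```python
import numpy as np

precip = np.array([
    [1.07, 0.44, 1.50],
    [0.27, 1.13, 1.72],
])

# A row index plus a column range returns a one-dimensional array.
by_index = precip[0, 0:2]
print(by_index.shape)  # (2,)

# A row range plus a column range returns a two-dimensional array.
by_range = precip[0:1, 0:2]
print(by_range.shape)  # (1, 2)
```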


In [46]:
precip_2002_2013.shape

(2, 12)

In [47]:
# Select an element in 2nd row, 3rd column
precip_2002_2013[1, 2]

1.72

In [49]:
# Select element in the last row, last column
precip_2002_2013[1, 11]

0.5

In [48]:
# Select element in the last row, last column using negative indices
precip_2002_2013[-1, -1]

0.5

In [50]:
# Slice first row, and first two columns
print(precip_2002_2013[0:1, 0:2])

[[1.07 0.44]]


In [51]:
# Slice first two rows and first column 
print(precip_2002_2013[0:2, 0:1])

[[1.07]
 [0.27]]


In [53]:
# Slice 2nd row, 2nd and 3rd columns
print(precip_2002_2013[1:2, 1:3])

[[1.13 1.72]]


In [52]:
# Slice first two rows and first two columns
print(precip_2002_2013[0:2, 0:2])

[[1.07 0.44]
 [0.27 1.13]]


### Use shortcuts to create new one-dimensional array from row or column slice

In [54]:
# Select 1st column
print(precip_2002_2013[:, 0])

[1.07 0.27]


In [55]:
# Select 1st row
print(precip_2002_2013[0, :])

[1.07 0.44 1.5  0.2  3.2  1.18 0.09 1.44 1.52 2.44 0.78 0.02]


In [57]:
# Select 1st row of data for 2002
precip_2002 = precip_2002_2013[0, :]
print(precip_2002.shape)
print(precip_2002)

(12,)
[1.07 0.44 1.5  0.2  3.2  1.18 0.09 1.44 1.52 2.44 0.78 0.02]


In [58]:
# Select 1st row of data for 2002 using only the row index
precip_2002 = precip_2002_2013[0]
print(precip_2002.shape)
print(precip_2002)

(12,)
[1.07 0.44 1.5  0.2  3.2  1.18 0.09 1.44 1.52 2.44 0.78 0.02]


In [59]:
# Select 2nd row of data for 2013 using only the row index
precip_2013 = precip_2002_2013[1]
print(precip_2013.shape)
print(precip_2013)

(12,)
[ 0.27  1.13  1.72  4.14  2.66  0.61  1.03  1.4  18.16  2.24  0.29  0.5 ]
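One detail to keep in mind when creating new arrays from slices: in numpy, a slice such as those above is a view of the original data rather than an independent copy, so changing the slice also changes the original array. The sketch below shows this with a small array and uses .copy() when an independent array is wanted; the variable names and replacement values are illustrative.

```python
import numpy as np

precip = np.array([
    [1.07, 0.44, 1.50],
    [0.27, 1.13, 1.72],
])

# A row selected by index is a view that shares memory with precip.
row_view = precip[0]
print(np.shares_memory(row_view, precip))  # True

# Changing the view changes the original array as well.
row_view[0] = 99.0
print(precip[0, 0])  # 99.0

# .copy() creates an independent array; the original is unaffected.
row_copy = precip[0].copy()
row_copy[1] = -1.0
print(precip[0, 1])  # 0.44
```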
