## What is the difference between regular Python and Numpy Python?

### Data Types
Python has a small set of built-in data types, far fewer than traditional C or FORTRAN.

In [1]:
print(type(1))  # Python integer - arbitrary precision, no fixed size
print(type(1.))  # Python float - a 64-bit (double precision) float
print(type('Hello World'))  # Python string, which is different from a character array

<class 'int'>
<class 'float'>
<class 'str'>
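A Python `int` has no fixed size at all, while a Python `float` is an IEEE 754 double (64 bits). A quick check, as a minimal sketch using only the standard library (the variable name `big` is just illustrative):

```python
import sys

big = 2 ** 100            # Python ints grow as needed; no overflow at 64 bits
print(big)
print(big.bit_length())   # number of bits needed to store this value

# A Python float is an IEEE 754 double: 53-bit mantissa inside 64 bits total
print(sys.float_info.mant_dig)
print(sys.float_info.max)
```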


Numpy can be thought of as a space within the Python language with more data types, like C or FORTRAN, and a richer mathematical and computational environment built on top of the base Python environment. All the regular Python functions still exist; Numpy just makes more options available.

In [2]:
import numpy as np  # Convention is to import and rename. Best to stick with convention

print('type(np.array(1)):', type(np.array(1)))
print()

print('np.int8:', np.array(1, dtype=np.int8).dtype)    # Numpy integer of size 8 bits (1 byte)
print('np.int16:', np.array(1, dtype=np.int16).dtype)  # Numpy integer of size 16 bits (2 bytes)
print('np.int32:', np.array(1, dtype=np.int32).dtype)  # Numpy integer of size 32 bits (4 bytes)
print('np.int64:', np.array(1, dtype=np.int64).dtype)  # Numpy integer of size 64 bits (8 bytes)
print('int:', np.array(1, dtype=int).dtype)            # Python int maps to the default Numpy integer (64 bits on most platforms)
print()

print('np.float16:', np.array(1, dtype=np.float16).dtype)  # Numpy float of size 16 bits (2 bytes)
print('np.float32:', np.array(1, dtype=np.float32).dtype)  # Numpy float of size 32 bits (4 bytes)
print('np.float64:', np.array(1, dtype=np.float64).dtype)  # Numpy float of size 64 bits (8 bytes)
print('float:', np.array(1, dtype=float).dtype)            # Python float maps to Numpy float64
print()

print('np.bool_:', np.array(1, dtype=np.bool_).dtype)    # Numpy boolean of size 8 bits (1 byte)
print('np.int_:', np.array(1, dtype=np.int_).dtype)      # Default Numpy integer (64 bits on most platforms)
print('np.double:', np.array(1, dtype=np.double).dtype)  # Numpy float of size 64 bits (np.float_ was removed in NumPy 2.0)
print('np.uint:', np.array(1, dtype=np.uint).dtype)      # Numpy unsigned integer (64 bits on most platforms)
print('complex:', np.array(1, dtype=complex).dtype)      # Numpy complex of size 128 bits (two 64-bit floats)
print('Hello World:', np.array('Hello World').dtype)     # <U11 means an 11-character Unicode string

type(np.array(1)): <class 'numpy.ndarray'>

np.int8: int8
np.int16: int16
np.int32: int32
np.int64: int64
int: int64

np.float16: float16
np.float32: float32
np.float64: float64
float: float64

np.bool_: bool
np.int_: int64
np.double: float64
np.uint: uint64
complex: complex128
Hello World: <U11
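Dtype names count bits, while a dtype's `itemsize` reports bytes. A small sketch that prints both, plus the value range of a few fixed-width integer types via `np.iinfo` (for example, an `int16` tops out at 32,767):

```python
import numpy as np

# itemsize is the number of bytes per element; multiply by 8 to get bits
for dt in (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64, np.complex128):
    d = np.dtype(dt)
    print(f'{d.name:>10}: {d.itemsize:2d} bytes = {d.itemsize * 8:3d} bits')

# Smallest and largest representable values for fixed-width integers
for dt in (np.int8, np.int16, np.int32):
    info = np.iinfo(dt)
    print(f'{np.dtype(dt).name:>10}: {info.min} to {info.max}')
```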


Numpy is built around arrays of data in which every value has the same data type. This greatly increases computational performance. Converting between basic Python and Numpy data is quick and easy.

In [3]:
a = [1, 2, 3]  # Create a Python list
b = np.array([1, 2, 3])  # Create a Numpy 1-D array
print('type(a):', type(a))  # Get type of a
print('type(b):', type(b))  # Get type of b
print('type(b[0]):', type(b[0]))  # Get type of index 0 of b
print('b.dtype:', b.dtype)

type(a): <class 'list'>
type(b): <class 'numpy.ndarray'>
type(b[0]): <class 'numpy.int64'>
b.dtype: int64
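Because every element of an array shares one dtype, Numpy picks a common type when the input mixes types. A minimal illustration (the names `mixed` and `text` are just for this example):

```python
import numpy as np

mixed = np.array([1, 2.5, 3])  # ints and a float: everything is upcast to float64
print(mixed, mixed.dtype)

text = np.array([1, 'two'])    # an int and a str: everything becomes a Unicode string
print(text, text.dtype)
```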


Numpy attributes and methods do not exist on regular Python objects, so trying to use them raises an error (an `AttributeError`). Also, notice that the printouts of the two are similar but slightly different: the Python list has commas, the Numpy array does not.

In [4]:
#print('a.dtype:', a.dtype)  # a.dtype will not work. dtype is numpy only.
print('a:', a)  
print('b:', b)

a: [1, 2, 3]
b: [1 2 3]


### Basic Python vs Numpy performance
Basic Python can perform computations relatively quickly.

In [10]:
%%timeit
num = 1_000_000
a = list(range(0, num))
for ii in a:
    a[ii] = a[ii] + 1
del a

76.9 ms ± 692 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


For some applications, the computation can be slightly faster using a list comprehension. This performs the same computation with a different syntax, which Python can execute somewhat more efficiently than an explicit loop.

In [11]:
%%timeit
num = 1_000_000
a = list(range(0, num))
a = [a[ii] + 1 for ii in a]

74.6 ms ± 781 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


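But Numpy is significantly faster for this kind of computation because its code is highly optimized and runs pre-compiled C libraries under the hood. One caveat about the cell below: `np.int16` cannot represent values above 32,767, so most of the values in this array overflow and it does not hold the same numbers as the list versions. Use a wider type such as `np.int64` for a fair comparison; the timing shown was measured with `np.int16`.

In [12]:
%%timeit
num = 1_000_000
a = np.arange(num, dtype=np.int16) + 1  # int16 overflows past 32,767; use np.int64 for correct values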

86.7 µs ± 170 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
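The `%%timeit` magic only works inside IPython/Jupyter. A rough sketch of the same comparison as a plain script using the standard `timeit` module; the function names and `NUM` are illustrative, and the Numpy version uses `int64` so all three compute the same values. Timings depend entirely on your machine, so none are shown here.

```python
import timeit

import numpy as np

NUM = 1_000_000

def python_loop():
    a = list(range(NUM))
    for ii in range(len(a)):  # explicit index loop
        a[ii] = a[ii] + 1
    return a

def list_comprehension():
    return [x + 1 for x in range(NUM)]

def numpy_vectorized():
    return np.arange(NUM, dtype=np.int64) + 1  # int64 holds every value exactly

for func in (python_loop, list_comprehension, numpy_vectorized):
    # Best of 3 repeats, 3 calls each, reported per call
    best = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f'{func.__name__:>20}: {best * 1e3:8.3f} ms per call')
```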


### Switching between Basic Python space and Numpy space

Create some data using Python lists

In [13]:
num = 10000
a = list(range(0, num))  # Defaults to integer values
print('type(a):', type(a))
print('len(a):', len(a))
print()

type(a): <class 'list'>
len(a): 10000



Convert the Python list into a Numpy array

In [14]:
b = np.array(a)
print('type(b):', type(b))
print('b.size:', b.size)
print('b.shape:', b.shape)
print()

type(b): <class 'numpy.ndarray'>
b.size: 10000
b.shape: (10000,)



Convert the Numpy array back into a Python list

In [15]:
c = b.tolist()
print('type(c):', type(c))
print('len(c):', len(c))
print()

type(c): <class 'list'>
len(c): 10000
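Numpy arrays convert back to lists with `tolist()`. One subtlety worth knowing: `list(b)` and `b.tolist()` both return a list, but only `tolist()` converts the elements to native Python types. A minimal sketch:

```python
import numpy as np

b = np.arange(5)

via_list = list(b)       # a list, but each element is still a Numpy scalar
via_tolist = b.tolist()  # a list of plain Python ints

print(type(via_list[0]))
print(type(via_tolist[0]))
```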



### Making Numpy Arrays
Numpy also comes with a large number of functions for creating data arrays, so you don't have to build them by hand.

In [17]:
a = np.array(range(10))   # Converts the iterable range object to a Numpy array
b = np.arange(10)         # Creates the Numpy array directly and faster
c = np.arange(1, 10, 2)   # Creates array counting by two
d = np.arange(10, 1, -1)  # Creates array descending by one (stop value 1 is excluded)
e = np.flip(a)            # Reverses the array a from increasing to decreasing
print('a:', a)
print('b:', b)
print('c:', c)
print('d:', d)
print('e:', e)

a: [0 1 2 3 4 5 6 7 8 9]
b: [0 1 2 3 4 5 6 7 8 9]
c: [1 3 5 7 9]
d: [10  9  8  7  6  5  4  3  2]
e: [9 8 7 6 5 4 3 2 1 0]
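For floating-point ranges, `np.arange` with a fractional step can be affected by rounding in the number of elements, so `np.linspace`, which takes the number of points instead of a step, is often the safer choice. A short comparison:

```python
import numpy as np

f = np.linspace(0, 1, 5)   # 5 evenly spaced values; the endpoint 1.0 IS included
g = np.arange(0, 1, 0.25)  # step-based; the stop value 1.0 is NOT included
print('f:', f)
print('g:', g)
```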


### Indexing, Slicing, Broadcasting
How to get specific data out of a Numpy array. Numpy uses the same slicing as Python lists: `start:stop` selects from the start index up to, but not including, the stop index.

In [18]:
print('a:', a)
print('a[0:5]:', a[0:5])  # selects up to but not including index 5
print('a[3:]:', a[3:])  # selects everything from index 3 to the end of the array
print('a[:5]:', a[:5])  # start defaults to 0; selects up to but not including index 5
print('a[3:5]:', a[3:5])  # selects from index 3 up to but not including index 5
print('a[0:-1]:', a[0:-1])  # -1 is the last index, so this drops the last element
print('a[0:100]:', a[0:100])  # stop is past the end of the array; slicing just stops at the end

a: [0 1 2 3 4 5 6 7 8 9]
a[0:5]: [0 1 2 3 4]
a[3:]: [3 4 5 6 7 8 9]
a[:5]: [0 1 2 3 4]
a[3:5]: [3 4]
a[0:-1]: [0 1 2 3 4 5 6 7 8]
a[0:100]: [0 1 2 3 4 5 6 7 8 9]
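Slices also accept a step (`start:stop:step`) and negative indices, exactly as with lists. One important difference from lists: a Python list slice is a copy, but a basic Numpy slice is a view of the same data, so writing into it changes the original array. A minimal sketch:

```python
import numpy as np

a = np.arange(10)
print(a[-1])    # negative index counts from the end
print(a[::2])   # step of 2: every other element
print(a[::-1])  # negative step reverses, like np.flip(a)

# A basic slice is a VIEW: it shares memory with a
s = a[0:3]
s[0] = 100
print(a[:4])    # a[0] is now 100
```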


In [21]:
c = np.arange(10)  # Create a 1-D array
c = c.reshape((2, 5))  # Change to a 2-D array
print('c:\n', c)
print('c.shape:', c.shape)
print('c.size:', c.size)
print()

a = np.zeros((2, 2))    # Create an array of all zeros. Notice defaults to type float
print('a:\n', a)

b = np.ones((2, 2), dtype=int)    # Create an array of all ones
print('\nb:\n', b)

c = np.full((2, 2), 7, dtype=np.int16)   # Create a constant array
print('\nc:\n', c)

d = np.eye(3)           # Create a 3x3 identity matrix
print('\nd:\n', d)

e = np.random.random((2, 4))  # Create an array filled with random values
print('\ne:', e)

c:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
c.shape: (2, 5)
c.size: 10

a:
 [[0. 0.]
 [0. 0.]]

b:
 [[1 1]
 [1 1]]

c:
 [[7 7]
 [7 7]]

d:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

e: [[0.92044085 0.83304554 0.38828201 0.75084949]
 [0.58390636 0.18004267 0.64482428 0.91737542]]
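For 2-D arrays, each axis gets its own index or slice, separated by a comma. The sketch below also shows `np.random.default_rng`, Numpy's newer random-number interface, which lets you fix a seed so the "random" values repeat from run to run (the seed 42 is arbitrary):

```python
import numpy as np

c = np.arange(10).reshape((2, 5))
print(c[1, 2])    # row 1, column 2
print(c[:, 0])    # every row, column 0
print(c[0, 1:4])  # row 0, columns 1 up to (not including) 4

# Reproducible random numbers with the Generator API
rng = np.random.default_rng(42)
e = rng.random((2, 4))  # values in [0, 1)
print(e.shape)
```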


One of the most important features of Numpy is broadcasting: when an operation combines an array with a single value (or a smaller array), Numpy automatically stretches the smaller operand to match, so the operation is applied to every value of the array without an explicit loop. This creates simpler, more readable code, and is significantly faster. General rule: if it's possible to remove a loop by broadcasting, do it!
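Broadcasting is not limited to a single number. Numpy compares shapes starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1; the size-1 dimension is stretched to match. A minimal sketch, with the equivalent explicit loop for comparison:

```python
import numpy as np

grid = np.zeros((3, 4))             # shape (3, 4)
row = np.array([1, 2, 3, 4])        # shape (4,): repeated down all 3 rows
col = np.array([[10], [20], [30]])  # shape (3, 1): repeated across all 4 columns

print(grid + row)
print(row + col)  # (4,) with (3, 1) broadcasts to (3, 4)

# The same result with an explicit double loop; broadcasting removes it entirely
looped = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        looped[i, j] = row[j] + col[i, 0]
```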

Create an array of all zeros

In [35]:
a = np.zeros(20, dtype=int)
a

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Add a value of 1.0 to every value in the array. Also notice that the initial array was of type integer. Because we are adding a float, all values are first upconverted to type float and then 1.0 is added to every value in the array.

In [36]:
a = a + 1.0  # Note how it upconverted from int to float
a

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

Here we change the data type of the entire array from float to integer.

In [37]:
a = a.astype(int)
a

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
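Note that converting float to int with `astype` truncates toward zero rather than rounding. If rounding is what you want, round first. A short sketch:

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])
print(x.astype(int))            # truncates toward zero: [ 1 -1  2]
print(np.round(x).astype(int))  # np.round rounds halves to even: [ 2 -2  2]
```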

Here we add 10 to a subset of the array: the values at indices 3 through 7 (the slice `3:8`). The rest of the values are unchanged.

In [38]:
a[3:8] = a[3:8] + 10
a

array([ 1,  1,  1, 11, 11, 11, 11, 11,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1])

Here we use the shorthand in-place notation `+=` to add a constant of 1000 to every value in the array.

In [40]:
a += 1000
a

array([1001, 1001, 1001, 1011, 1011, 1011, 1011, 1011, 1001, 1001, 1001,
       1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001])
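One difference between `a = a + 1.0` and the shorthand `a += 1.0`: the first creates a new (float) array, but the in-place form must store the result back into the existing integer array, and Numpy refuses that float-to-int cast. A small sketch (the flag `raised` is only there to record what happened):

```python
import numpy as np

a = np.ones(5, dtype=int)

b = a + 1.0               # new array: int + float is upcast to float64
print(b.dtype)

raised = False
try:
    a += 1.0              # in place: the float64 result can't be cast back into int storage
except TypeError as err:  # Numpy raises UFuncTypeError, a subclass of TypeError
    raised = True
    print('TypeError:', err)

a += 1                    # adding an int in place keeps the int dtype
print(a)
```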

### The IEEE Not A Number value
There is a special floating-point value called NaN (Not a Number), defined by the IEEE 754 standard and available as `np.nan`, which is used in calculations to represent a value that is not a number. Think of it as missing data, or a bad value. One of the most important things to remember about NaN is that it taints anything it touches: any arithmetic involving NaN returns NaN.

In [42]:
print(np.nan)
print(type(np.nan))   # np.nan is a plain Python float
print(10.0 * np.nan)  # Anything that uses NaN becomes a NaN
print(11 + np.nan, 12 - np.nan, 13 * np.nan, 14 / np.nan)

nan
<class 'float'>
nan
nan nan nan nan
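Because NaN is a floating-point value, only float (or complex) arrays can store it. Assigning NaN into an integer array raises an error, which is one reason data with missing values is usually kept as float. A minimal sketch:

```python
import numpy as np

ints = np.zeros(3, dtype=int)
try:
    ints[0] = np.nan   # integers have no NaN representation
except ValueError as err:
    print('ValueError:', err)

floats = np.zeros(3)   # float64 by default, so NaN fits
floats[0] = np.nan
print(floats)
```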


Because NaN is special, it acts funny. Looking for NaN requires some specific logic.

In [43]:
1 == 1

True

In [44]:
np.zeros(1, dtype=int) == np.zeros(1, dtype=int)

array([ True])

In [45]:
np.nan == np.nan  # What is going on here?

False

Because NaN never compares equal to anything, not even itself, you need to use Numpy's functions to search arrays if you want to find where NaNs are located.

In [46]:
np.isnan(np.nan)

True
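`np.isnan` also works on whole arrays, returning a boolean mask that can count, locate, or filter out NaNs. A short sketch:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, np.nan])
mask = np.isnan(a)        # boolean array, True where a is NaN
print(mask)
print(mask.sum())         # count of NaNs (True counts as 1)
print(np.where(mask)[0])  # indices of the NaNs
print(a[~mask])           # keep only the valid values
print(a == np.nan)        # does NOT work: every comparison with NaN is False
```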

Using NaNs properly requires a little bit of extra thought, but it is well worth it!

In [48]:
a = np.arange(10, dtype=float)
print('a:', a)
print('a.min():', a.min())      # This is the ndarray min method; same Numpy computation as np.min
print('np.min(a):', np.min(a))  # This is numpy min function

a: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
a.min(): 0.0
np.min(a): 0.0


Now we will assign a value in the Numpy array to NaN

In [50]:
a[0] = np.nan
a

array([nan,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

Calculate the mean. Notice how it returns NaN. The NaNs are tainting all the operations.

In [51]:
np.mean(a)

nan

We need to use a different method that understands how to correctly handle NaNs

In [52]:
np.nanmean(a)

5.0
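`np.nanmean` is one of a family of NaN-aware functions (`np.nanmin`, `np.nanmax`, `np.nansum`, `np.nanstd`, and others) that skip NaNs. Using the same array as above:

```python
import numpy as np

a = np.arange(10, dtype=float)
a[0] = np.nan

print(np.min(a), np.nanmin(a))  # the regular version returns nan
print(np.max(a), np.nanmax(a))
print(np.sum(a), np.nansum(a))
print(np.nanstd(a))             # standard deviation of the 9 valid values
```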

### Working with time in Numpy
If you deal with time series data you will need to work with time data types. Oh that's right, most atmospheric data is time series data...

Python has a standard library module called datetime that is great for working with dates and times. Its objects are timezone naive by default (no timezone attached unless you supply one), and it works on one value at a time. It is part of standard Python and you will see it all over the place. Here are some quick examples.

Get the datetime now.

In [54]:
import datetime

datetime.datetime.now()

datetime.datetime(2022, 4, 20, 13, 37, 41, 54542)

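Or get the datetime in the UTC timezone. (`utcnow()` is deprecated as of Python 3.12; `datetime.datetime.now(datetime.timezone.utc)` is the suggested replacement.)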

In [55]:
datetime.datetime.utcnow()

datetime.datetime(2022, 4, 20, 19, 37, 41, 904390)

Or convert from a Unix timestamp, the number of seconds since the epoch (1970-01-01 00:00:00 UTC). Note that `fromtimestamp()` returns the time in the computer's local timezone.

In [56]:
datetime.datetime.fromtimestamp(1326244364)

datetime.datetime(2012, 1, 10, 18, 12, 44)
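A sketch of timezone-aware equivalents of these calls, using only the standard library; the fixed timestamp is the same one used above:

```python
import datetime

now_utc = datetime.datetime.now(datetime.timezone.utc)  # aware: carries tzinfo
print(now_utc.tzinfo)

ts = 1326244364
as_utc = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
print(as_utc)                     # 2012-01-11 01:12:44+00:00
print(as_utc.timestamp() == ts)   # round-trips back to the timestamp
```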

But when working with many time samples we need to use the Numpy time data type, `datetime64`. It can take in a string and convert it to a Numpy datetime64 value. Notice that it prints out in ISO 8601 format.

To just get the date right now

In [57]:
np.datetime64('today')

numpy.datetime64('2022-04-20')

To get the date and time right now. Notice the time is in UTC. Most likely this is what you want; if not, make sure you understand how to convert to the timezone you need.

In [63]:
np.datetime64('now')

numpy.datetime64('2022-04-20T19:40:58')

In [64]:
np.datetime64('2005-02-25 03:30:55')

numpy.datetime64('2005-02-25T03:30:55')

By default, the precision is stored at the level of the input. We can change the precision by setting it when creating the value, or by converting the value afterwards to the desired precision.

In [65]:
np.datetime64('2012-03', 's')

numpy.datetime64('2012-03-01T00:00:00')

In [66]:
np.datetime64('2012-03').astype('datetime64[s]')

numpy.datetime64('2012-03-01T00:00:00')

OK, so you can set the precision. Why do I care? Because the precision sets the step size used in a range. Notice the following range did not provide a starting or ending day, and the end month is excluded. This means you don't need to know the length of each month.

In [67]:
np.arange('2005-02', '2005-03', dtype='datetime64[D]')

array(['2005-02-01', '2005-02-02', '2005-02-03', '2005-02-04',
       '2005-02-05', '2005-02-06', '2005-02-07', '2005-02-08',
       '2005-02-09', '2005-02-10', '2005-02-11', '2005-02-12',
       '2005-02-13', '2005-02-14', '2005-02-15', '2005-02-16',
       '2005-02-17', '2005-02-18', '2005-02-19', '2005-02-20',
       '2005-02-21', '2005-02-22', '2005-02-23', '2005-02-24',
       '2005-02-25', '2005-02-26', '2005-02-27', '2005-02-28'],
      dtype='datetime64[D]')
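The same idea works at other precisions: the dtype unit (`D` days, `M` months, `h` hours) sets the step. A short sketch, including a leap-year February to show the calendar is handled for you:

```python
import numpy as np

leap_feb = np.arange('2004-02', '2004-03', dtype='datetime64[D]')
print(leap_feb.size)  # leap-year February has 29 days

months = np.arange('2005-01', '2006-01', dtype='datetime64[M]')
print(months.size)    # monthly steps; the stop month is excluded

hours = np.arange('2005-02-01', '2005-02-02', dtype='datetime64[h]')
print(hours.size)     # hourly steps across one day
```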

There is also a timedelta64 data type that is the result of differencing times.

In [68]:
np.datetime64('2009-01-01') - np.datetime64('2008-01-01')

numpy.timedelta64(366,'D')

Time deltas are important for adding or subtracting times.

In [69]:
np.datetime64('2011-06-15T00:00') + np.timedelta64(12, 'h')

numpy.datetime64('2011-06-15T12:00')
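Datetime arithmetic also broadcasts over whole arrays. A common pattern is converting time differences to plain numbers by dividing by a one-unit `timedelta64`. A minimal sketch with illustrative half-hourly samples:

```python
import numpy as np

start = np.datetime64('2011-06-15T00:00')
# Half-hourly samples for 3 hours (stop excluded), built from timedelta64 steps
times = np.arange(start, start + np.timedelta64(3, 'h'), np.timedelta64(30, 'm'))
print(times)

steps = np.diff(times)                    # timedelta64 between neighbouring samples
print(steps / np.timedelta64(1, 'h'))     # as float hours

elapsed_s = (times - times[0]) / np.timedelta64(1, 's')  # float seconds since start
print(elapsed_s)

shifted = times + np.timedelta64(12, 'h')  # add 12 hours to every sample
```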

### Converting between Python datetime and Numpy datetime64
At some point you will need to convert a time value from one space to another. Don't memorize this, just remember it exists and where you can find the code.

In [70]:
dt = datetime.datetime.utcnow()  # Get current date and time with python datetime
# Convert from Python datetime to Numpy datetime64
dt64 = np.datetime64(dt)
print(dt64)

2022-04-20T19:41:38.212691


It is a bit more complicated to convert from Numpy datetime64 to Python datetime

In [71]:
# Set precision to datetime64 seconds and convert to Numpy integer.
# This will give number of seconds since epoch, or timestamp.
ts = dt64.astype('datetime64[s]').astype(int)

# Then pass that integer number of seconds to the utcfromtimestamp method.
datetime.datetime.utcfromtimestamp(ts)

datetime.datetime(2022, 4, 20, 19, 41, 38)
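There is also a shorter route that keeps the microseconds: cast to microsecond precision and let Numpy produce the Python object with `astype(datetime.datetime)` or `.item()`. Casting to `[us]` first matters, because `.item()` on a nanosecond-precision `datetime64` returns a plain integer. For the timestamp route, `datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)` avoids `utcfromtimestamp`, which is deprecated as of Python 3.12. A sketch, using a fixed value in place of the current time:

```python
import datetime

import numpy as np

dt64 = np.datetime64('2022-04-20T19:41:38.212691')  # parsed with microsecond precision

dt = dt64.astype('datetime64[us]').astype(datetime.datetime)  # naive datetime, microseconds kept
same = dt64.astype('datetime64[us]').item()                   # equivalent route
print(repr(dt))

# Timestamp route with an aware UTC result
ts = dt64.astype('datetime64[s]').astype(np.int64)
aware = datetime.datetime.fromtimestamp(int(ts), tz=datetime.timezone.utc)
print(aware)

# Caution: at nanosecond precision .item() gives an int, not a datetime
print(type(np.datetime64('2022-04-20T19:41:38', 'ns').item()))
```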