# CHAPTER 4
# NumPy Basics: Arrays and Vectorized Computation

In [1]:
# Here are some of the things you’ll find in NumPy:
# • ndarray, an efficient multidimensional array providing fast array-oriented arithmetic
# operations and flexible broadcasting capabilities.
# • Mathematical functions for fast operations on entire arrays of data without having
# to write loops.
# • Tools for reading/writing array data to disk and working with memory-mapped
# files.
# • Linear algebra, random number generation, and Fourier transform capabilities.

In [None]:
# For most data analysis applications, the main areas of functionality I’ll focus on are:
# • Fast vectorized array operations for data munging and cleaning, subsetting and
# filtering, transformation, and any other kinds of computations
# • Common array algorithms like sorting, unique, and set operations
# • Efficient descriptive statistics and aggregating/summarizing data
# • Data alignment and relational data manipulations for merging and joining
# together heterogeneous datasets
# • Expressing conditional logic as array expressions instead of loops with if-elif-else
# branches
# • Group-wise data manipulations (aggregation, transformation, function application)
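To make the point about conditional logic concrete, here is a minimal sketch that replaces an if/else loop with a single array expression. The sample values are made up for illustration, and `np.where` is only one of several ways to express this.

```python
import numpy as np

# Hypothetical sample data, not taken from the text above.
values = np.array([-2.0, 0.5, 3.0, -0.1])

# Loop version with an explicit if/else branch per element.
clipped_loop = [v if v > 0 else 0.0 for v in values]

# Array-expression version: np.where takes elements from the second
# argument where the condition is True and from the third elsewhere.
clipped_vec = np.where(values > 0, values, 0.0)

print(clipped_vec)
```

Both versions replace negative entries with zero; the array expression does it without a Python-level loop.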

In [2]:
# Let's compare the performance of a NumPy array and an equivalent Python list,
# each holding one million integers:

import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

# Now let's multiply each sequence by 2 and see how much time it takes:

%time for _ in range(10): my_arr2 = my_arr * 2
    
%time for _ in range(10): my_list2 = [x*2 for x in my_list]

Wall time: 15.9 ms
Wall time: 718 ms


In [None]:
# NumPy-based algorithms are generally 10 to 100 times faster (or more) than their
# pure Python counterparts and use significantly less memory. In the timing above,
# the array version was roughly 45 times faster (15.9 ms versus 718 ms).
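The memory claim can be checked with a rough estimate. The sketch below assumes an explicit `int64` dtype so the byte count does not depend on the platform default, and it uses `sys.getsizeof` to approximate the list's footprint; exact numbers vary by Python version and platform.

```python
import sys
import numpy as np

n = 1_000_000

# dtype is fixed to int64 so each element takes exactly 8 bytes.
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# The array keeps its elements as raw integers in one contiguous buffer.
arr_bytes = arr.nbytes

# The list holds pointers, and every element is a separate Python int object,
# so the estimate adds the list container to the size of each int.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"ndarray data: {arr_bytes / 1e6:.1f} MB")
print(f"list + ints:  {list_bytes / 1e6:.1f} MB")
```

This counts only the array's data buffer, not the small fixed overhead of the ndarray object itself.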

## 4.1 The NumPy ndarray: A Multidimensional Array Object

In [None]:
# One of the key features of NumPy is its N-dimensional array object, or ndarray,
# which is a fast, flexible container for large datasets in Python. Arrays enable you to
# perform mathematical operations on whole blocks of data using similar syntax to the
# equivalent operations between scalar elements.

In [3]:
# Generate some random data

data = np.random.randn(2,3)
data

array([[ 3.19315199,  0.71329227, -1.00180566],
       [ 0.44663474, -0.28424277,  0.64670784]])
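Because the data is random, the values shown above will differ on every run. If reproducible output is wanted, one option is NumPy's `Generator` interface with a fixed seed. This is a sketch rather than what the cell above uses, and the seed value is an arbitrary choice.

```python
import numpy as np

# The seed 12345 is arbitrary and only serves the illustration.
rng = np.random.default_rng(12345)
sample = rng.standard_normal((2, 3))

# A fresh generator with the same seed reproduces the same draws.
same = np.random.default_rng(12345).standard_normal((2, 3))

print(sample.shape, sample.dtype)
print(np.array_equal(sample, same))
```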

In [9]:
# Arithmetic on the array is applied elementwise: data * 10 multiplies every element
# by 10, and data + data adds each element to itself:

print(data*10, """

""" , data + data)

[[ 31.93151988   7.13292265 -10.01805663]
 [  4.46634741  -2.84242769   6.46707835]] 

 [[ 6.38630398  1.42658453 -2.00361133]
 [ 0.89326948 -0.56848554  1.29341567]]


In [10]:
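# Every array has a shape, a tuple giving the size of each dimension, and a dtype,
# an object describing the data type of its elements: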

print(data.shape, ",", data.dtype)

(2, 3) , float64
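As a small illustration of how `shape` and `dtype` depend on the input, the sketch below builds three arrays from made-up values. The default integer dtype can differ between platforms, so only its kind is relied on here.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])           # integer input, two rows
floats = np.array([1.5, 2, 3])                    # one float promotes the whole array
explicit = np.array([1, 2, 3], dtype=np.float32)  # dtype requested explicitly

for name, a in [("ints", ints), ("floats", floats), ("explicit", explicit)]:
    print(name, a.shape, a.dtype)
```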


### Creating ndarrays

In [None]:
# The easiest way to create an array is to use the array function. This accepts any
# sequence-like object (including other arrays) and produces a new NumPy array containing
# the passed data. For example, a list is a good candidate for conversion:

In [19]:
data1 = [6, 7, 5, 8, 9, 10]
arr1 = np.array(data1)

data2 = [[6, 3, 2, 1], [9, 6, 5, 4]]
arr2 = np.array(data2)

print("These are the shapes of 2 arrays")
print(arr1.shape, ",", arr2.shape)
print("""

""")
print("These are the arrays")
print(arr1, """

""", arr2)

These are the shapes of 2 arrays
(6,) , (2, 4)



These are the arrays
[ 6  7  5  8  9 10] 

 [[6 3 2 1]
 [9 6 5 4]]
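The text notes that `np.array` also accepts other arrays and produces a new array. A minimal sketch of what "new" means in practice follows; `np.asarray` is not mentioned above and is included only for contrast.

```python
import numpy as np

original = np.array([6, 7, 5, 8, 9, 10])

copied = np.array(original)    # new array with its own copy of the data
viewed = np.asarray(original)  # returns the input itself when no conversion is needed

# Changing the original affects the non-copy but not the copy.
original[0] = -1
print(copied[0], viewed[0])
```

The copy keeps the old value 6, while `viewed` reflects the change because it is the same object as `original`.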


In [20]:
# Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape
# inferred from the data.

arr2.ndim

2
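The relationship between nesting depth, `ndim`, and `shape` can be sketched as follows. `arr3` is a made-up three-level example that does not appear above.

```python
import numpy as np

arr1 = np.array([6, 7, 5, 8, 9, 10])
arr2 = np.array([[6, 3, 2, 1], [9, 6, 5, 4]])
# Hypothetical extra level of nesting, added for illustration.
arr3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for a in (arr1, arr2, arr3):
    # ndim equals the length of the shape tuple; size is the total element count.
    print(a.ndim, a.shape, a.size)
```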