## Python Data Structures and Sequences

###### Tuple

Tuples are like read-only lists. We use them to store a fixed sequence of items, but once we define a tuple, we cannot add or remove items or replace the existing ones.

In [209]:
tup = 4,5,6
tup

(4, 5, 6)

In [210]:
type(tup)

tuple

In [211]:
nested_tup = (4,5,6), (7,8)
nested_tup

((4, 5, 6), (7, 8))

In [212]:
# We can convert any sequence or iterator to a tuple by invoking tuple:
tuple([4, 0, 2])
#type(tuple([4, 0, 2]))

(4, 0, 2)

In [213]:
tup = tuple("String")
tup
#type(tup)

('S', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [] as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python:


In [214]:
tup[0]

'S'

In [215]:
tup = tuple(['foo', [1,2], True])
tup

('foo', [1, 2], True)

In [216]:
tup[1]

[1, 2]

While the objects stored in a tuple may be mutable themselves, once the tuple is created
it’s not possible to modify which object is stored in each slot:

In [51]:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in-place:

In [378]:
tup[1].append(3)


In [379]:
tup 

('foo', [1, 2, 3], True)

In [58]:
# concatenate tuples using the + operator
(4, None, 'foo')+(6,0)+('bar',)

(4, None, 'foo', 6, 0, 'bar')
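The `('bar',)` operand in the cell above needs its trailing comma. A short sketch of why: parentheses on their own only group an expression, so it is the comma that makes a one-element tuple (the variable names here are just illustrative).

```python
# Parentheses alone just group an expression; the comma is what makes a tuple
not_a_tuple = ('bar')
one_tuple = ('bar',)
also_a_tuple = 'bar',   # the parentheses are optional

print(type(not_a_tuple))          # <class 'str'>
print(type(one_tuple))            # <class 'tuple'>
print(one_tuple == also_a_tuple)  # True
```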

In [59]:
# Multiplying a tuple by an integer concatenates that many copies of the tuple
('foo', 'bar')*4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Unpacking tuples

In [92]:
tup = (4, 5, 6)
a, b, c = tup

In [94]:
b

5

In [97]:
# sequences with nested tuples can be unpacked
a, b, (c, d) = (4, 5, (6, 7))

In [98]:
d

7

In [102]:
# Unpacking also makes it easy to swap the values of two variables
a, b = (1, 2)
a

1

In [103]:
b

2

In [104]:
b, a = (a, b)

In [105]:
b

1

In [106]:
a

2

In [217]:
# A common use of variable unpacking is iterating over sequences of tuples or lists
seq = [(1,2,3), (4,5,6), (7,8,9)]
for a, b, c in seq:
    print('a={}, b={}, c={}'.format(a, b, c))  # str.format fills each {} placeholder in order

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [109]:
user_data = ('Pappu', 31, 'Male', 'Cooking', 'Germany')
name, age, gender, *others = user_data
name, age, gender, *_ = user_data

In [110]:
print(others)
print(_)

['Cooking', 'Germany']
['Cooking', 'Germany']
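The starred target does not have to come last. A small sketch reusing the same `user_data` tuple, where the starred name collects everything between the first and last items (`first`, `middle`, and `last` are illustrative names):

```python
user_data = ('Pappu', 31, 'Male', 'Cooking', 'Germany')

# The starred name collects everything not matched by the plain names, as a list
first, *middle, last = user_data
print(first)   # Pappu
print(middle)  # [31, 'Male', 'Cooking']
print(last)    # Germany
```

Only one starred target is allowed in a single assignment.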


In [89]:
# A function that returns several values actually returns them as a single tuple
def get_user_data():
    user_email = input("Enter your email: ")
    user_pass = input("Enter your password: ")

    return (user_email, user_pass)

In [90]:
print(get_user_data())

Enter your email: pappu@gmail.com
Enter your password: 123
('pappu@gmail.com', '123')


In [130]:
# count returns the number of occurrences of a value
a = (1,2,2,2,3,4,2)
a.count(2)

4

## List

In [183]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')

In [184]:
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [185]:
b_list[1]='peekaboo'
b_list

['foo', 'peekaboo', 'baz']

Lists and tuples are semantically similar (though tuples cannot be modified) and can
be used interchangeably in many functions.
The list function is frequently used in data processing as a way to materialize an
iterator or generator expression:

In [186]:
gen = range(10)
gen

range(0, 10)

In [187]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
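range is a lazy sequence that can be iterated as often as you like. A generator expression, by contrast, can be consumed only once, which is one reason to materialize it with list when the values are needed more than once. A minimal sketch (the name `squares` is illustrative):

```python
# A generator expression produces its values lazily, one at a time
squares = (x * x for x in range(5))

first_pass = list(squares)   # materializes all values
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
```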

Adding and removing elements

In [188]:
b_list.append("drawf")
b_list

['foo', 'peekaboo', 'baz', 'drawf']

In [189]:
# Using insert we can insert an element at a specific location in the list
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'drawf']

The insertion index should be between 0 and the length of the list, inclusive.
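A quick sketch of what happens outside that range: list.insert clamps an out-of-range index instead of raising an error. Inserting near the front also shifts every later element, so insert is more expensive than append; collections.deque is the usual alternative when you need fast insertion at both ends.

```python
from collections import deque

items = ['a', 'b', 'c']
items.insert(100, 'end')     # index past the end: behaves like append
items.insert(-100, 'start')  # index before the start: inserts at position 0
print(items)  # ['start', 'a', 'b', 'c', 'end']

# deque supports cheap appends and pops at both ends
dq = deque(['a', 'b', 'c'])
dq.appendleft('start')
dq.append('end')
print(list(dq))  # ['start', 'a', 'b', 'c', 'end']
```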

The inverse operation to insert is pop, which removes and returns an element at a
particular index:

In [190]:
b_list.pop(2)

'peekaboo'

In [191]:
b_list

['foo', 'red', 'baz', 'drawf']

Elements can be removed by value with remove, which locates the first such value and
removes it from the list:

In [192]:
b_list.append('foo')
b_list

['foo', 'red', 'baz', 'drawf', 'foo']

In [193]:
b_list.remove('foo')
b_list

['red', 'baz', 'drawf', 'foo']
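remove deletes only the first match, and it raises ValueError when the value is absent. A minimal guard, assuming you want a missing value to be skipped rather than treated as an error (the list contents here are illustrative):

```python
b_list = ['red', 'baz', 'foo']

# remove raises ValueError when the value is not in the list
try:
    b_list.remove('missing')
except ValueError:
    print("'missing' was not in the list")

# Checking membership first avoids the exception
if 'baz' in b_list:
    b_list.remove('baz')
print(b_list)  # ['red', 'foo']
```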

In [195]:
# Check if a list contains a value using the in keyword
'drawf' in b_list

True

In [196]:
# The keyword not can be used to negate in
'drawf' not in b_list

False

Concatenating and combining lists


In [197]:
# Similar to tuples, adding two lists together with + concatenates them
[4, None, 'foo'] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [202]:
# If you have a list already defined, you can append multiple elements to it using the extend method
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]
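One practical difference between the two: + builds a brand-new list, while extend modifies the existing list in place, which avoids creating an intermediate list when appending to a large one. A short sketch that uses `is` to compare object identity:

```python
x = [4, None, 'foo']
original = x

x = x + [7, 8]          # + creates a new list and rebinds x to it
print(x is original)    # False

y = [4, None, 'foo']
same = y
y.extend([7, 8])        # extend appends to the same list object
print(y is same)        # True
print(y)                # [4, None, 'foo', 7, 8]
```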

Sorting

In [205]:
# You can sort a list in place (without creating a new object) by calling its sort method
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [208]:
# Pass a key function that produces the value to sort by; here, strings are sorted by length
b = ['saw', 'small', 'he', 'foxes', 'six']
b.sort(key=len)
b

['he', 'saw', 'six', 'small', 'foxes']
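When you need to keep the original order, the built-in sorted function returns a new sorted list and accepts the same key argument, plus reverse. Python's sort is stable, so strings of equal length keep their original relative order even with reverse=True:

```python
b = ['saw', 'small', 'he', 'foxes', 'six']

# sorted returns a new list and leaves the original untouched
by_length_desc = sorted(b, key=len, reverse=True)
print(by_length_desc)  # ['small', 'foxes', 'saw', 'six', 'he']
print(b)               # ['saw', 'small', 'he', 'foxes', 'six']
```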

Binary search and maintaining a sorted list

In [239]:
# The built-in bisect module implements binary search and insertion into a sorted list.
import bisect
c = [1, 2, 2, 2, 3, 4, 7]

In [240]:
# bisect.bisect finds the location where an element should be inserted to keep it sorted
bisect.bisect(c, 2)

4

In [241]:
bisect.bisect(c, 5)

6

In [242]:
#  bisect.insort actually inserts the element into that location
bisect.insort(c, 6)
c

[1, 2, 2, 2, 3, 4, 6, 7]
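bisect.bisect is the same as bisect_right, which returns the position after any existing equal entries; bisect_left returns the position before them. Note that these functions assume the list is already sorted and do not check it, so an unsorted list silently yields meaningless positions. A small sketch that also uses the two positions to count a value:

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

print(bisect.bisect_left(c, 2))   # 1: before the existing 2s
print(bisect.bisect_right(c, 2))  # 4: after the existing 2s (same as bisect.bisect)

# Counting occurrences of a value in a sorted list via the two positions
count_of_2 = bisect.bisect_right(c, 2) - bisect.bisect_left(c, 2)
print(count_of_2)  # 3
```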

Slicing

We can select sections of most sequence types by using slice notation, which in its
basic form consists of start:stop passed to the indexing operator []:
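The start index is included and the stop index is not, so when both bounds are within range the slice has stop - start elements. Bounds past the end are clamped rather than raising an error. A quick check:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

start, stop = 1, 5
part = seq[start:stop]
print(part)                       # [2, 3, 7, 5]
print(len(part) == stop - start)  # True

# Slice bounds past the end are clamped instead of raising IndexError
print(seq[5:100])  # [6, 0, 1]
```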

In [252]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [253]:
seq[3:4]

[7]

In [254]:
# Slices can also be assigned to with a sequence:
seq[3:4] = [6,3]


In [255]:
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

In [256]:
seq[:5]

[7, 2, 3, 6, 3]

In [257]:
seq[3:]

[6, 3, 5, 6, 0, 1]

In [259]:
seq[-4:]

[5, 6, 0, 1]

In [260]:
seq[-6:-2]

[6, 3, 5, 6]

In [261]:
# A step can also be used after a second colon to, say, take every other element
seq[::2]

[7, 3, 3, 6, 1]

In [262]:
# Passing a step of -1 reverses a list or tuple
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]

## Dict

dict is likely the most important built-in Python data structure. A more common
name for it is hash map or associative array. It is a flexibly sized collection of key-value
pairs, where key and value are Python objects. One approach for creating one is to use
curly braces {} and colons to separate keys and values:

In [381]:
empty_dict = {}
d1 = {'a':'some value', 'b':[1,2,3,4,5]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4, 5]}

We can access, insert, or set elements using the same syntax as for accessing elements
of a list or tuple:

In [382]:
d1[7] = 'an integer'

In [383]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4, 5], 7: 'an integer'}

In [384]:
d1['b']

[1, 2, 3, 4, 5]

In [301]:
# Check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value
'b' in d1

True

In [302]:
# Delete values either using the del keyword or the pop method (which simultaneously returns the value and deletes the key)
d1[5] = 'some value'
d1

{'a': 'some value', 'b': [1, 2, 3, 4, 5], 7: 'an integer', 5: 'some value'}

In [303]:
d1['dummy'] = 'Another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4, 5],
 7: 'an integer',
 5: 'some value',
 'dummy': 'Another value'}

In [304]:
del d1[5]

In [305]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4, 5],
 7: 'an integer',
 'dummy': 'Another value'}

In [309]:
d1.pop('dummy')

'Another value'

In [310]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4, 5], 7: 'an integer'}

The keys and values methods give you views of the dict’s keys and values, respectively.
Since Python 3.7, dicts preserve insertion order, and these methods output the keys
and values in the same order:

In [315]:
list(d1.keys())

['a', 'b', 7]

In [316]:
list(d1.values())

['some value', [1, 2, 3, 4, 5], 'an integer']
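The items method returns the key-value pairs as tuples in that same order, which pairs naturally with the tuple unpacking shown earlier. A sketch using a dict with the same contents as `d1` at this point:

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4, 5], 7: 'an integer'}

# keys(), values(), and items() all follow the same insertion order
pairs = list(d1.items())
print(pairs)  # [('a', 'some value'), ('b', [1, 2, 3, 4, 5]), (7, 'an integer')]
print(pairs == list(zip(d1.keys(), d1.values())))  # True

for key, value in d1.items():
    print(key, '->', value)
```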

In [319]:
# Merge one dict into another using the update method; values for existing keys such as 'b' are overwritten
d1.update({'b':'foo', 'c':12})
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

Valid dict key types

While the values of a dict can be any Python object, the keys generally have to be
immutable objects like scalar types (int, float, string) or tuples (all the objects in the
tuple need to be immutable, too). The technical term here is hashability. You can
check whether an object is hashable (can be used as a key in a dict) with the hash
function. String hashes are randomized per interpreter session, so the value you get
for 'string' will differ:
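Since hash raises TypeError for unhashable objects, the check can be wrapped in a small helper. `is_hashable` is a hypothetical name for illustration, not a standard library function:

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('string'))        # True
print(is_hashable((1, 2, (2, 3))))  # True
print(is_hashable((1, 2, [1, 2])))  # False: the tuple contains a list
```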

In [329]:
hash('string')

4189693817001924655

In [333]:
hash((1, 2, (2,3)))

-9209053662355515447

In [335]:
hash((1, 2,[1,2]))

TypeError: unhashable type: 'list'

In [337]:
# To use a list as a key, one option is to convert it to a tuple
d = {}
d[tuple([1,2,3])] = 5
d

{(1, 2, 3): 5}

In [340]:
hash(tuple([1,2,3]))

529344067295497451

## Set

A set is an unordered collection of unique elements. You can think of them like dicts,
but keys only, no values. A set can be created in two ways: via the set function or via
a set literal with curly braces:
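One pitfall with the literal syntax: `{}` creates an empty dict, not an empty set, so an empty set has to be created with set(). A quick check:

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

empty_set.add(3)
print(empty_set)         # {3}
```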

In [345]:
# using set function
set([2,2,2,1,3,3])

{1, 2, 3}

In [347]:
{2,2,2,1,3,3}

{1, 2, 3}

In [348]:
# Sets support mathematical set operations like union, intersection, difference, and symmetric difference. 
a = {1,2,3,4,5}
b = {3,4,5,6,7,8}

In [354]:
# The union of these two sets is the set of distinct elements occurring in either set
# Union can be done in two ways
# Using union method
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [355]:
# Using the | operator
a|b

{1, 2, 3, 4, 5, 6, 7, 8}

In [356]:
# The intersection contains the elements occurring in both sets.
# Using intersection method
a.intersection(b)

{3, 4, 5}

In [357]:
# using & operator
a & b

{3, 4, 5}

All of the logical set operations have in-place counterparts, which enable you to
replace the contents of the set on the left side of the operation with the result. For
very large sets, this may be more efficient:
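The same pattern covers the other operations mentioned above, difference and symmetric difference. A sketch with the same `a` and `b`, printing sorted lists so the output order is predictable:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

print(sorted(a - b))  # [1, 2]: in a but not in b (a.difference(b))
print(sorted(a ^ b))  # [1, 2, 6, 7, 8]: in exactly one set (a.symmetric_difference(b))

# In-place intersection: d is replaced by d & b, while a stays unchanged
d = a.copy()
d &= b
print(sorted(d))      # [3, 4, 5]
```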

In [363]:
c = a.copy()
c |= b
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [366]:
# Like dict keys, set elements generally must be immutable (hashable). To use list-like elements, convert them to tuples:
my_data = [1,2,3,4]

In [370]:
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [None]:
# check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set:

In [373]:
{1,2,3}.issubset(c)

True

In [374]:
c.issuperset({1,2,3})

True

In [375]:
# Sets are equal if and only if their contents are equal
{1,2,3} == {3,2,1}

True

#### String

In [387]:
# String Object Methods
# a comma-separated string can be broken into pieces with split
val = 'a,b, guido'
val.split(',')

['a', 'b', ' guido']

In [389]:
# split is often combined with strip to trim whitespace (including line breaks)
pieces = [x.strip() for x in val.split(',')]
pieces

['a', 'b', 'guido']

In [390]:
# These substrings could be concatenated together with a two-colon delimiter using addition
first, second, third = pieces
first+"::"+second+"::"+third

'a::b::guido'

In [391]:
# A faster and more Pythonic way is to pass a list or tuple to the join method on the string '::'
"::".join(pieces)

'a::b::guido'

In [392]:
# Using Python’s in keyword is the best way to detect a substring, though index and find can also be used
'guido' in val

True

In [394]:
val.index(',')

1

In [399]:
val.find(":")

-1

In [400]:
# The difference between find and index is that index raises an exception if the substring isn’t found (versus returning -1)
val.index(":")

ValueError: substring not found

In [402]:
# count returns the number of occurrences of a particular substring
val.count(',')

2

In [403]:
# replace will substitute occurrences of one pattern for another
val.replace(',', "::")

'a::b:: guido'

In [405]:
# It is commonly used to delete patterns, too, by passing an empty string
val.replace(',', '')

'ab guido'

#### List, Set, and Dict Comprehensions


List comprehensions allow you to form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one concise expression. They take the basic form:
[expr for val in collection if condition]

In [406]:
# This is equivalent to the following for loop:
# result = []
# for val in collection:
#     if condition:
#         result.append(expr)

The filter condition can be omitted, leaving only the expression. For example, given a
list of strings, we could filter out strings with length 2 or less and also convert them to
uppercase like this:

In [409]:
strings = ['a','as','bat','car','dove','python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are a natural extension, producing sets and dicts in an
idiomatically similar way instead of lists. A dict comprehension looks like this:
dict_comp = {key-expr : value-expr for value in collection
 if condition}
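For example, a dict comprehension can map each string in the `strings` list used above to its position in that list, or, with a filter, keep only the longer strings. A sketch (the names `loc_mapping` and `long_lengths` are illustrative):

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Map each string to its index in the list
loc_mapping = {val: index for index, val in enumerate(strings)}
print(loc_mapping)
# {'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

# With a filter condition, only strings longer than 2 characters are kept
long_lengths = {s: len(s) for s in strings if len(s) > 2}
print(long_lengths)  # {'bat': 3, 'car': 3, 'dove': 4, 'python': 6}
```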


A set comprehension looks like the equivalent list comprehension except with curly
braces instead of square brackets:
set_comp = {expr for value in collection if condition}


In [414]:
# Suppose we wanted a set containing just the lengths of the strings contained in the collection
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [416]:
# We could also express this more functionally using the map function
set(map(len, strings))

{1, 2, 3, 4, 6}

Nested list comprehensions

In [424]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

In [429]:
# suppose we wanted to get a single list containing all names with two or more e’s in them.
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e')>=2]
    names_of_interest.extend(enough_es)
print(names_of_interest)

['Steven']


In [431]:
# We can actually wrap this whole operation up in a single nested list comprehension,
results = [name for names in all_data for name in names if name.count('e')>=2]
results

['Steven']

The for parts of the list comprehension are arranged according to the order of nesting, and
any filter condition is put at the end as before.

Here is another example, where we “flatten” a list of tuples of integers into a simple list of integers:

In [432]:
some_tuples = [(1,2,3),(4,5,6),(7,8,9)]
[x for tup in some_tuples for x in tup]

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [439]:
flattened = []
for tup in some_tuples:
    # Build each tuple's elements as a list, then add them all at once
    elements = [x for x in tup]
    flattened.extend(elements)
print(flattened)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


In [440]:
# The nested comprehension is equivalent to these nested for loops
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)
print(flattened)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


It’s important to distinguish the syntax just shown
from a list comprehension inside a list comprehension, which is also perfectly valid:

In [441]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# NumPy

In [553]:
import numpy as np

To see the performance difference, consider a NumPy array of one million integers and the equivalent Python list:

In [554]:
my_arr = np.arange(1000000)
my_list = list(range(1000000))

In [564]:
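%timeit my_arr * 2
# Note: for a Python list, * 2 repeats the list (2 million elements) rather than doubling each value
%timeit my_list * 2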

2.28 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
34.9 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [571]:
%time for _ in range(100): my_arr2 = my_arr * 2

Wall time: 242 ms


In [573]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 1.19 s


In [579]:
# Generate some random data
data = np.random.rand(2,3)
data

array([[0.84104116, 0.69263374, 0.44590044],
       [0.36234046, 0.51609462, 0.45733218]])

In [580]:
# Some mathematical operations with data
data*10

array([[8.41041156, 6.92633744, 4.45900443],
       [3.62340463, 5.16094623, 4.57332177]])

In [581]:
data+data

array([[1.68208231, 1.38526749, 0.89180089],
       [0.72468093, 1.03218925, 0.91466435]])

In [582]:
data.shape

(2, 3)

In [583]:
data.dtype

dtype('float64')

###### Creating ndarrays

In [597]:
data1 = [6,7.5,8,0,1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [602]:
# Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array
data2 = [[1,2,3,4], [5,6,7,8]]  
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [604]:
# The default integer dtype is platform-dependent; this run produced int32
arr2.dtype, arr1.dtype

(dtype('int32'), dtype('float64'))

In [603]:
arr2.ndim

2

In [605]:
arr2.shape

(2, 4)

In [607]:
zeros = np.zeros(10)
zeros

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [609]:
zeros1 = np.zeros((3,6))
zeros1

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [610]:
# zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape.
# empty creates an array without initializing its values to any particular value
np.empty((2,3,2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

It’s not safe to assume that np.empty will return an array of all
zeros. In some cases, it may return uninitialized “garbage” values.

In [617]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [619]:
np.ones((3,6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

In [611]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [616]:
np.arange(1,15,2)

array([ 1,  3,  5,  7,  9, 11, 13])

Data Types for ndarrays

In [628]:
arr1 = np.array([1,2,3], dtype=np.float64)
arr1

array([1., 2., 3.])

In [629]:
arr1.dtype

dtype('float64')

In [630]:
arr2 = np.array([1,2,3], dtype=np.int32)
arr2

array([1, 2, 3])

In [631]:
arr2.dtype

dtype('int32')

In [632]:
# We can explicitly convert or cast an array from one dtype to another using ndarray’s astype method
arr = np.array([1,2,3,4,5])
arr.dtype

dtype('int32')

In [637]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [636]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [638]:
# Casting floating-point values to integers truncates the decimal part (toward zero)
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])

In [641]:
# If we have an array of strings representing numbers, we can use astype to convert them to numeric form:
numeric_string = np.array(['1.25', '-9.6', '42'])
numeric_string.astype(np.float64)

array([ 1.25, -9.6 , 42.  ])

In [643]:
# We can also use another array’s dtype attribute
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype = np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization.
In hardware and compiler terminology, vectorization usually refers to SIMD (single instruction, multiple data) execution; NumPy's usage is broader and simply means replacing explicit Python loops with whole-array operations.
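To make the idea concrete, here is a small sketch comparing an explicit Python loop with the equivalent vectorized expression. The formula `values ** 2 + 1` is an arbitrary choice for illustration; both versions compute the same numbers, but the vectorized form runs the loop inside NumPy's compiled code.

```python
import numpy as np

values = np.arange(5, dtype=np.float64)

# Explicit Python loop, one element at a time
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] ** 2 + 1

# Vectorized: one expression applied to the whole array
vectorized = values ** 2 + 1

print(vectorized.tolist())                 # [1.0, 2.0, 5.0, 10.0, 17.0]
print(np.array_equal(looped, vectorized))  # True
```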

In [645]:
# Arithmetic operations between equal-size arrays apply the operation element-wise
arr = np.array([[1,2,3], [4,5,6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [646]:
arr*arr

array([[ 1,  4,  9],
       [16, 25, 36]])

In [647]:
arr-arr

array([[0, 0, 0],
       [0, 0, 0]])

In [648]:
# Arithmetic operations with scalars propagate the scalar argument to each element in the array
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [649]:
arr**.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [650]:
# Comparisons between arrays of the same size yield boolean arrays
arr2 = np.array([[0, 4, 1], [7, 2, 12]])
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

Basic Indexing and Slicing

In [653]:
# One-dimensional arrays are simple; on the surface they act similarly to Python lists
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [657]:
arr[5] 

5

In [658]:
arr[5:8]

array([5, 6, 7])

In [660]:
# Assigning a scalar to a slice propagates (broadcasts) it to the entire selection
arr[5:8] = 21
arr

array([ 0,  1,  2,  3,  4, 21, 21, 21,  8,  9])

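An important distinction from Python's built-in lists is that array slices are views on the
original array. The data is not copied, and any modifications to the view will be
reflected in the source array: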

In [661]:
arr_slice = arr[5:8]
arr_slice

array([21, 21, 21])

In [664]:
# Now, when we change values in arr_slice, the mutations are reflected in the original array arr
arr_slice[1] = 4444
arr

array([   0,    1,    2,    3,    4,   21, 4444,   21,    8,    9])

In [665]:
# The “bare” slice [:] will assign to all values in an array
arr_slice[:] = 66
arr

array([ 0,  1,  2,  3,  4, 66, 66, 66,  8,  9])

If we want a copy of a slice of an ndarray instead of a view, we need to copy it explicitly, for example with
arr[5:8].copy().
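The difference between a view and a copy can be checked directly with np.shares_memory. A sketch on a fresh array (names are illustrative):

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]             # basic slicing returns a view
arr_copy = arr[5:8].copy()  # an independent copy of the same values

print(np.shares_memory(arr, view))      # True
print(np.shares_memory(arr, arr_copy))  # False

arr_copy[:] = -1            # changing the copy leaves arr untouched
print(arr[5:8].tolist())    # [5, 6, 7]

view[:] = 0                 # changing the view writes through to arr
print(arr[5:8].tolist())    # [0, 0, 0]
```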

In [666]:
# In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays
arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr2d[2]

array([7, 8, 9])

Thus, individual elements can be accessed recursively. But that is a bit too much
work, so we can pass a comma-separated list of indices to select individual elements.
So these are equivalent:

In [668]:
arr2d[0][2]

3

In [669]:
arr2d[0,2]

3

Multidimensional arrays


In [685]:
arr3d = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [686]:
arr3d.shape

(2, 2, 3)

In [687]:
# arr3d[0] is a 2 × 3 array
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [688]:
# Both scalar values and arrays can be assigned to arr3d[0]
old_values = arr3d[0].copy()

In [690]:
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [692]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [697]:
# arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array
arr3d[1,0]

array([7, 8, 9])

In [698]:
arr3d[1][0]

array([7, 8, 9])

Indexing with slices


In [699]:
#Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax
arr

array([ 0,  1,  2,  3,  4, 66, 66, 66,  8,  9])

In [700]:
arr[1:6]

array([ 1,  2,  3,  4, 66])

In [701]:
#Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [729]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# We can pass multiple slices just like you can pass multiple indexes

In [741]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [753]:
# By mixing integer indexes and slices, you get lower dimensional slices
# I can select the second row but only the first two columns like so
arr2d[1, :2]

array([4, 5])

In [755]:
# Similarly, I can select the third column but only the first two rows like so
arr2d[:2, 2:]

array([[3],
       [6]])

In [775]:
# A colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing
arr2d[:, :1]

array([[1],
       [4],
       [7]])

In [778]:
# assigning to a slice expression assigns to the whole selection
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

#### Transposing Arrays and Swapping Axes


Transposing is a special form of reshaping that similarly returns a view on the
underlying data without copying anything. Arrays have the transpose method and also the
special T attribute:

In [786]:
arr = np.arange(15).reshape((3,5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [788]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
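Because .T is a view, it shares memory with the original array. The other operation in this section's title, swapaxes, takes a pair of axis numbers; for a two-dimensional array it gives the same result as .T, and transpose also accepts an explicit axis order for higher dimensions. A sketch:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

t = arr.T
print(np.shares_memory(arr, t))  # True: no data was copied
print(t.shape)                   # (5, 3)

# swapaxes(0, 1) exchanges the two axes; for 2-D it matches .T
print(np.array_equal(arr.swapaxes(0, 1), arr.T))  # True

# For higher dimensions, transpose accepts an explicit axis order
arr3 = np.arange(24).reshape((2, 3, 4))
print(arr3.transpose((1, 0, 2)).shape)  # (3, 2, 4)
```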

When doing matrix computations, we may do this very often, for example when
computing the inner matrix product X^T X using np.dot:

In [790]:
arr = np.random.randn(6, 3)
arr

array([[ 0.2513882 , -0.62837604, -1.14365496],
       [-1.40883585, -2.56080106,  0.88833589],
       [ 0.28292531,  1.26919245,  0.2132615 ],
       [ 0.2703769 ,  1.20618323, -0.66370371],
       [ 0.17659617,  0.2450555 ,  0.62723447],
       [-0.35572495,  1.2717041 ,  0.17594318]])

In [791]:
arr.shape

(6, 3)

In [792]:
arr.T.shape

(3, 6)

In [795]:
np.dot(arr.T, arr)

array([[ 2.35889134,  3.72589177, -1.60995406],
       [ 3.72589177, 11.69556949, -1.70862957],
       [-1.60995406, -1.70862957,  3.00744948]])
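Since Python 3.5, the @ operator performs matrix multiplication on NumPy arrays, so `X.T @ X` computes the same product as `np.dot(X.T, X)`. Because the data above is random, this sketch uses a seeded generator (the seed 0 is arbitrary) and checks that both forms agree and that X^T X is symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is reproducible
X = rng.standard_normal((6, 3))

gram_dot = np.dot(X.T, X)
gram_at = X.T @ X

print(gram_at.shape)                    # (3, 3)
print(np.allclose(gram_dot, gram_at))   # True
print(np.allclose(gram_at, gram_at.T))  # True: X^T X is symmetric
```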

Universal Functions: Fast Element-Wise Array Functions


In [823]:
# A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays.
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [824]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [825]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [826]:
x = np.random.randn(8)
y = np.random.randn(8)

In [831]:
x

array([-0.70746362, -0.63409305, -0.13368054, -1.3610901 , -1.30022534,
        0.01004113, -1.05368742, -0.91878414])

In [838]:
y

array([ 0.00852071,  1.01270555, -1.87120247,  0.85779193, -0.93330128,
        0.26973425, -0.91206319, -0.87171282])

In [841]:
np.maximum(x,y)

array([ 0.00852071,  1.01270555, -0.13368054,  0.85779193, -0.93330128,
        0.26973425, -0.91206319, -0.87171282])

Mathematical and Statistical Methods


In [845]:
# Here I generate some normally distributed random data and compute some aggregate statistics
arr = np.random.randn(5,3)
arr

array([[-1.22953124, -2.69690397, -2.28717305],
       [ 0.28321478,  0.80487385, -0.63143446],
       [-0.26565433,  0.76640794, -0.34941098],
       [ 0.69218827, -0.91758427, -0.69132444],
       [-1.01201689, -0.2156499 , -0.01935525]])

In [848]:
arr.mean()

-0.5179569286801139

In [849]:
np.mean(arr)

-0.5179569286801139

In [850]:
arr.sum()

-7.769353930201709

In [851]:
# Functions like mean and sum take an optional axis argument that computes the statistic over the given axis
arr.mean(axis=1)

array([-2.07120275,  0.15221806,  0.05044754, -0.30557348, -0.41567401])

In [852]:
arr.sum(axis=0)

array([-1.5317994 , -2.25885635, -3.97869818])

Here, arr.mean(1) means “compute mean across the columns” where arr.sum(0)
means “compute sum down the rows.”
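Checking the result shapes makes the axis convention easier to remember: reducing over an axis removes that axis from the result. A sketch using a 5 × 3 array of ones so the values are easy to predict:

```python
import numpy as np

arr = np.ones((5, 3))

row_means = arr.mean(axis=1)   # one value per row
col_sums = arr.sum(axis=0)     # one value per column

print(row_means.shape)    # (5,)
print(col_sums.shape)     # (3,)
print(col_sums.tolist())  # [5.0, 5.0, 5.0]
```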


In [883]:
arr = np.array([0,1,2,3,4,5,6,7])
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

Sorting

In [885]:
arr = np.random.randn(6)
arr

array([-0.46640699,  1.35298762,  1.73627223,  0.15845135, -0.08080524,
       -0.36337495])

In [888]:
arr.sort()
arr

array([-0.46640699, -0.36337495, -0.08080524,  0.15845135,  1.35298762,
        1.73627223])

We can sort each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to sort:

In [889]:
arr = np.random.randn(5,3)
arr

array([[-0.19121804,  1.61115322, -0.5416413 ],
       [-1.17096857,  0.42248754, -0.73910971],
       [ 1.33971843,  0.67446905,  0.63638934],
       [ 1.29883749,  0.66926649,  1.40051026],
       [ 1.24630695,  0.53883581, -0.64360128]])

In [891]:
arr.sort(1)
arr

array([[-0.5416413 , -0.19121804,  1.61115322],
       [-1.17096857, -0.73910971,  0.42248754],
       [ 0.63638934,  0.67446905,  1.33971843],
       [ 0.66926649,  1.29883749,  1.40051026],
       [-0.64360128,  0.53883581,  1.24630695]])
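The sort method works in place and returns None. The top-level np.sort function, by contrast, returns a sorted copy and leaves the original array unchanged. A sketch with a small integer array chosen for illustration:

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8]])

sorted_copy = np.sort(arr, axis=1)  # returns a new array
unchanged = arr.tolist()            # arr itself is not modified
print(sorted_copy.tolist())  # [[1, 2, 3], [7, 8, 9]]
print(unchanged)             # [[3, 1, 2], [9, 7, 8]]

arr.sort(axis=1)             # the method sorts in place
print(arr.tolist())          # [[1, 2, 3], [7, 8, 9]]
```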

In [892]:
# A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank
large_arr = np.random.randn(1000)
large_arr.sort()

In [896]:
large_arr[int(0.05 * len(large_arr))] # 5% quantile

-1.5759345937598173

In [897]:
len(large_arr)

1000

In [900]:
1000*0.05

50.0
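NumPy also provides np.quantile, which by default interpolates linearly between neighboring ranks, so it can differ slightly from the quick-and-dirty index lookup. A sketch on evenly spaced data (chosen for illustration) where the difference is easy to see:

```python
import numpy as np

data = np.arange(1000, dtype=np.float64)  # already sorted: 0.0, 1.0, ..., 999.0

quick = data[int(0.05 * len(data))]       # value at rank 50
interpolated = np.quantile(data, 0.05)    # default linear interpolation

print(quick)         # 50.0
print(interpolated)  # about 49.95, between ranks 49 and 50
```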

Unique and Other Set Logic

In [901]:
names = np.array(['bob', 'joe', 'will', 'bob', 'will', 'joe', 'joe'])
np.unique(names)

array(['bob', 'joe', 'will'], dtype='<U4')

In [902]:
ints = np.array([3,3,3,2,2,1,1,4,4])
np.unique(ints)

array([1, 2, 3, 4])

In [903]:
# Contrast np.unique with the pure Python alternative
sorted(set(names))

['bob', 'joe', 'will']

In [904]:
values = np.array([6,0,0,3,2,5,6])
# np.isin tests membership element-wise (it supersedes np.in1d, which NumPy 2.0 deprecates)
np.isin(values, [2,3,6])

array([ True, False, False,  True,  True, False,  True])
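NumPy provides several other set functions for one-dimensional arrays; like np.unique, they return sorted unique values. A sketch with two small arrays chosen for illustration:

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])

common = np.intersect1d(x, y)  # elements in both arrays
combined = np.union1d(x, y)    # elements in either array
only_x = np.setdiff1d(x, y)    # elements in x but not in y
mask = np.isin(x, y)           # element-wise membership test

print(common.tolist())    # [2, 3, 6]
print(combined.tolist())  # [0, 2, 3, 5, 6, 7]
print(only_x.tolist())    # [0, 5]
print(mask.tolist())      # [True, False, False, True, True, False, True]
```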

####  Linear Algebra

In [906]:
x = np.array([[1,2,3], [4,5,6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])

In [907]:
x

array([[1, 2, 3],
       [4, 5, 6]])

In [908]:
y

array([[ 6, 23],
       [-1,  7],
       [ 8,  9]])

In [909]:
x.dot(y)

array([[ 28,  64],
       [ 67, 181]])

In [910]:
# x.dot(y) is equivalent to np.dot(x, y):
np.dot(x, y)

array([[ 28,  64],
       [ 67, 181]])

In [911]:
# A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array

In [913]:
np.ones(3)

array([1., 1., 1.])

In [914]:
np.dot(x, np.ones(3))

array([ 6., 15.])