A brief digression into dicts:
Recall the Python type hierarchy: scalars (int, float, bool, complex, ...) and sequences (we've seen two: list and tuple). dict is another built-in container, but it is a mapping rather than a sequence.

Technically, it is an 'associative array'. The easiest way to think of a dictionary is as a collection of 'key-value' pairs, where indexing is done using the keys rather than a numeric position.


In [3]:
import numpy as np
import pandas as pd

In [4]:
print(pd.__version__)

0.22.0


In [20]:
d = {0:1, np.pi/2:0}
print(d)
print(type(d))

{0: 1, 1.5707963267948966: 0}
<class 'dict'>


In [22]:
print(d[0])
#print(d[1])  # would raise KeyError: 1 is not a key in d
print(d[np.pi/2])

1
0


In [24]:
flag = np.pi in d
print(flag) 

False


In [27]:
d[np.pi] = -1

In [28]:
for k, v in d.items():
    print("key {}: value: {}".format(k, v))

key 0: value: 1
key 1.5707963267948966: value: 0
key 3.141592653589793: value: -1


In [29]:
print(d.keys())
print(d.values())
print(d.items())

dict_keys([0, 1.5707963267948966, 3.141592653589793])
dict_values([1, 0, -1])
dict_items([(0, 1), (1.5707963267948966, 0), (3.141592653589793, -1)])


Incidentally, this is one of the key reasons for using tuples: dict keys must be hashable, so we can't use lists as keys, but tuples are just fine!
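
A quick sketch of the tuple-versus-list point. The dict name `grid` and its contents are made up for illustration:

```python
# Tuples are hashable, so they can serve as dict keys
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(1, 0)])

# Lists are mutable and therefore unhashable: using one as a key raises TypeError
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    print("TypeError:", err)
    list_key_failed = True
else:
    list_key_failed = False
```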

In [48]:
#and of course, we can do dict comprehensions!
d = {str(x):np.cos(x) for x in np.linspace(0, np.pi, 5)}
print(d)

{'0.0': 1.0, '0.7853981633974483': 0.7071067811865476, '1.5707963267948966': 6.123233995736766e-17, '2.356194490192345': -0.7071067811865475, '3.141592653589793': -1.0}


In [51]:
d['0.0']

1.0

Pandas Overview

Basic structures:
- Series
- DataFrame
- Index
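
As a rough sketch of how the three structures relate (the column names `cos` and `sin` are chosen here only for illustration): a DataFrame can be thought of as a set of Series columns that share one Index.

```python
import numpy as np
import pandas as pd

x = np.linspace(0, np.pi, 5)

# Two columns of values, both labelled by the same index x
df = pd.DataFrame({"cos": np.cos(x), "sin": np.sin(x)}, index=x)

# Selecting one column gives back a Series ...
col = df["cos"]
print(type(df).__name__, type(col).__name__)

# ... which carries the same Index as the DataFrame it came from
print(col.index.equals(df.index))
```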

In [9]:
p = pd.Series([np.cos(x) for x in np.linspace(0, np.pi, 16)])
print(p)
print(type(p), id(p))
print(p.values)
print(type(p.values))
print(p.index)
print(type(p.index))

0     1.000000
1     0.978148
2     0.913545
3     0.809017
4     0.669131
5     0.500000
6     0.309017
7     0.104528
8    -0.104528
9    -0.309017
10   -0.500000
11   -0.669131
12   -0.809017
13   -0.913545
14   -0.978148
15   -1.000000
dtype: float64
<class 'pandas.core.series.Series'> 1579864845224
[ 1.          0.9781476   0.91354546  0.80901699  0.66913061  0.5
  0.30901699  0.10452846 -0.10452846 -0.30901699 -0.5        -0.66913061
 -0.80901699 -0.91354546 -0.9781476  -1.        ]
<class 'numpy.ndarray'>
RangeIndex(start=0, stop=16, step=1)
<class 'pandas.core.indexes.range.RangeIndex'>


In [5]:
p = pd.Series([np.cos(x) for x in np.linspace(0, np.pi, 16)],
             index=[np.linspace(0, np.pi, 16)])
print(p)
print(p.index)

0.000000    1.000000
0.209440    0.978148
0.418879    0.913545
0.628319    0.809017
0.837758    0.669131
1.047198    0.500000
1.256637    0.309017
1.466077    0.104528
1.675516   -0.104528
1.884956   -0.309017
2.094395   -0.500000
2.303835   -0.669131
2.513274   -0.809017
2.722714   -0.913545
2.932153   -0.978148
3.141593   -1.000000
dtype: float64
MultiIndex(levels=[[0.0, 0.20943951023931953, 0.41887902047863906, 0.6283185307179586, 0.8377580409572781, 1.0471975511965976, 1.2566370614359172, 1.4660765716752366, 1.6755160819145563, 1.8849555921538759, 2.0943951023931953, 2.3038346126325147, 2.5132741228718345, 2.722713633111154, 2.9321531433504733, 3.141592653589793]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]])
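
The MultiIndex in the output above comes from wrapping the array in a list: pandas reads a list of arrays as the levels of a MultiIndex. If a plain one-level index is what you want, passing the array itself should do it. A minimal sketch (the names `flat` and `nested` are just for this example):

```python
import numpy as np
import pandas as pd

x = np.linspace(0, np.pi, 16)

# Array passed directly: a plain, single-level index
flat = pd.Series(np.cos(x), index=x)

# Array wrapped in a list, as in the cell above: a one-level MultiIndex
nested = pd.Series(np.cos(x), index=[x])

print(type(flat.index).__name__)
print(type(nested.index).__name__)
```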


Note that we can access values in a pd.Series using either an integer position (as if it were an ndarray) or a label (as if it were a dict)!
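
One way to make the two access styles explicit is `.loc` for labels and `.iloc` for positions. A small sketch with a made-up three-element Series `s`:

```python
import pandas as pd

s = pd.Series([1.0, 0.0, -1.0], index=["zero", "pi/2", "pi"])

by_label = s.loc["pi/2"]    # label-based, dict-like
by_position = s.iloc[1]     # position-based, ndarray-like
print(by_label, by_position)
```

Spelling out `.loc` or `.iloc` removes any ambiguity about which kind of lookup is meant, which matters most when the labels are themselves integers.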

In [43]:
p = pd.Series((np.cos(x) for x in np.linspace(0, np.pi, 5)),
              index=['zero','pi/4','pi/2','3pi/4','pi'])

In [44]:
print(p)

zero     1.000000e+00
pi/4     7.071068e-01
pi/2     6.123234e-17
3pi/4   -7.071068e-01
pi      -1.000000e+00
dtype: float64


In [6]:
p['pi/3']

KeyError: 'pi/3'

In [7]:
## create a series from a dict. The dict is created using the dict comprehension above
p = pd.Series({str(x):np.cos(x) for x in np.linspace(0, np.pi, 5)})

In [8]:
print(p)

0.0                   1.000000e+00
0.7853981633974483    7.071068e-01
1.5707963267948966    6.123234e-17
2.356194490192345    -7.071068e-01
3.141592653589793    -1.000000e+00
dtype: float64


In [9]:
print(p[3])  # integer on a string-labelled Series: treated as a position; p.iloc[3] is the explicit form

-0.7071067811865475


In [60]:
print(p['2.356194490192345'])

-0.7071067811865475


In [63]:
v = p.get('2.35619449019234')  # last digit missing, so this label is not in the index
print(v)

None
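
Like a dict, `Series.get` accepts a fallback value, and `in` tests membership against the labels. A short sketch with a hypothetical two-element Series `s`:

```python
import pandas as pd

s = pd.Series({"a": 1.0, "b": 2.0})

missing = s.get("c")                 # no such label: returns None, no exception
fallback = s.get("c", default=0.0)   # ... or a default of our choosing
has_c = "c" in s                     # membership checks the index labels, not the values

print(missing, fallback, has_c)
```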


In [11]:
## do ndarray like stuff: let's try our favourite sin^2 + cos^2
cos_s = pd.Series({str(x):np.cos(x) for x in np.linspace(0, np.pi, 5)})
sin_s = pd.Series({str(x):np.sin(x) for x in np.linspace(0, np.pi, 5)})

cos_sq_s = cos_s * cos_s
sin_sq_s = sin_s * sin_s

eqn = cos_sq_s + sin_sq_s
print(eqn)

0.0                   1.0
0.7853981633974483    1.0
1.5707963267948966    1.0
2.356194490192345     1.0
3.141592653589793     1.0
dtype: float64


In [12]:
## ... and do it with just parts of the array: the slices are positional, but the arithmetic aligns on labels, so labels missing from either side give NaN
cos_sq_s = cos_s[1:] * cos_s[:-1]
sin_sq_s = sin_s[1:] * sin_s[:-1]

eqn = cos_sq_s + sin_sq_s
print(eqn)
print(cos_sq_s)

0.0                   NaN
0.7853981633974483    1.0
1.5707963267948966    1.0
2.356194490192345     1.0
3.141592653589793     NaN
dtype: float64
0.0                            NaN
0.7853981633974483    5.000000e-01
1.5707963267948966    3.749399e-33
2.356194490192345     5.000000e-01
3.141592653589793              NaN
dtype: float64


In [13]:
print(eqn.values)

[nan  1.  1.  1. nan]
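
For contrast, here is a sketch of what happens if we drop the labels first. Working on `.values` pairs elements purely by position, so the shifted slices really do combine neighbouring points and no NaN appears. By the identity cos(a)cos(b) + sin(a)sin(b) = cos(a - b), every entry should then equal the cosine of the grid spacing, pi/4. (The name `shifted` is just for this example.)

```python
import numpy as np
import pandas as pd

x = np.linspace(0, np.pi, 5)
labels = [str(v) for v in x]
cos_s = pd.Series(np.cos(x), index=labels)
sin_s = pd.Series(np.sin(x), index=labels)

# .values strips the labels, so element i of one slice meets element i of the other
shifted = (cos_s.values[1:] * cos_s.values[:-1]
           + sin_s.values[1:] * sin_s.values[:-1])
print(shifted)
```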
