## <center> Pandas Cheat-sheet


To add: pandas DataFrame ```map()```, ```transform()```, ```apply()```, ```merge()```, ```join()```, ```concat()``` etc. (see Goncalves lecture)


### <center> BUILT-IN DATA TYPES:
- ```int```
- ```float```
- ```bool```
- ```str```
- ```tuple```
- ```list```
- ```dict```
- ```set```


### <center> STRING FORMATTING
e.g.
```{:.4f}``` -> Floating point, 4 dp.
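
A minimal sketch of float format specs, using an arbitrary value ```x``` chosen for illustration. The spec goes after the colon inside the braces and works the same way in f-strings and ```str.format()```:

```python
x = 3.14159265  # illustrative value, not from the notes above

four_dp = f"{x:.4f}"             # fixed-point, 4 decimal places
padded = f"{x:08.3f}"            # zero-padded to total width 8, 3 decimal places
via_format = "{:.4f}".format(x)  # same spec with str.format()

print(four_dp, padded, via_format)
```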

### <center> REGULAR EXPRESSIONS
```import re```

etc
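
As a rough sketch of the most common ```re``` calls (the sample text and patterns are made up for illustration):

```python
import re

text = "Order 66 shipped on 2019-05-04, order 7 on 2019-06-11"  # made-up sample

# findall() returns every non-overlapping match as a list of strings
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# search() returns the first match object (or None); groups capture sub-parts
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year = match.group(1)

# sub() replaces every match of the pattern
cleaned = re.sub(r"\s+", " ", "too    many   spaces")

print(dates, year, cleaned)
```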

### <center> DATA STRUCTURES
 
TUPLE () -> Immutable, only a few methods

LIST -> Mutable, lots of methods

DICT -> key-value pairs

SET -> un-ordered set of unique items
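
Sets do not get their own section below, so here is a short sketch of typical use (the values are arbitrary):

```python
s = {3, 1, 3, 2, 1}     # duplicates are dropped on construction
unique_sorted = sorted(s)  # sets have no order, so sort to get a stable list

a = {1, 2, 3}
b = {3, 4}
union = a | b           # items in either set
intersection = a & b    # items in both sets
difference = a - b      # items in a but not in b

print(unique_sorted, union, intersection, difference)
```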

### <center> TUPLES ```()``` IMMUTABLE

#### Constructing

In [3]:
t = (1, 2.5, 'data')

# or

t = 1, 2.5, 'data'

#### Indexing

In [4]:
t[2] # return third entry.  Here, 'data'

'data'

In [5]:
type(t[2]) # type of 3rd entry. Here, string

str

#### Useful Methods

Only two:

In [6]:
t.count('data') # number of instances of a particular object
t.index('data') # index of the first instance of a particular object

2
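
A small sketch of two other things worth knowing about tuples: they can be unpacked into names, and item assignment fails because they are immutable. The variable names are illustrative only:

```python
t = (1, 2.5, 'data')

# Unpacking: one name per entry
number, value, label = t

# Assignment to an entry raises TypeError
try:
    t[0] = 99
except TypeError as err:
    error_message = str(err)

print(label, error_message)
```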

### <center> LISTS ```[]``` MUTABLE ORDERED SORTABLE

- flexible, powerful

#### Constructing

In [7]:
l = [1, 2.5, 'data']

#### Indexing

In [8]:
l[2] # return 3rd entry.  Here, 'data'

'data'

In [9]:
l[2:5] # return items 3 to 5 (indices 2, 3, 4); here the list stops at index 2

['data']

#### Useful Methods

In [10]:
# append()
l.append([4, 3])
l

[1, 2.5, 'data', [4, 3]]

In [11]:
# extend()
l.extend([1.0, 1.5, 2.0])
l

[1, 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]

#### See also:

```insert()```

```remove()```

```reverse()```
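
A quick sketch of these three methods on a fresh list (named ```items``` here to avoid clobbering ```l```). All three modify the list in place and return ```None```:

```python
items = [1, 2.5, 'data']

items.insert(1, 'new')  # insert 'new' before index 1
after_insert = list(items)

items.remove(2.5)       # remove the first occurrence of 2.5
after_remove = list(items)

items.reverse()         # reverse in place
print(after_insert, after_remove, items)
```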

### Using Lists in a Control Structure

Looping is generally done over a list object, e.g.:

```
for element in l[2:5]:
    print(element)
```

#### List Comprehension

In [12]:
m = [i**2 for i in range(5)]
m

[0, 1, 4, 9, 16]
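
Comprehensions can also filter, and the same syntax builds dicts. A minimal sketch with illustrative names:

```python
# Keep only the even numbers before squaring
even_squares = [i**2 for i in range(10) if i % 2 == 0]

# Dict comprehension: key: value pairs inside braces
square_by_number = {i: i**2 for i in range(4)}

print(even_squares, square_by_number)
```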

### <center> DICTS ```{}``` MUTABLE INSERTION-ORDERED (PYTHON 3.7+)

Dictionary data, retrieved by key. Also known as key-value stores

#### Constructing

In [13]:
d= {
    'Name': 'Angela Merkel',
    'Country': 'DE',
    'Job': 'Chancellor',
    'Age': 64
}
d

{'Name': 'Angela Merkel', 'Country': 'DE', 'Job': 'Chancellor', 'Age': 64}

#### Indexing

In [14]:
print(d['Name'], d['Age'])

Angela Merkel 64


#### Useful Methods

In [15]:
# keys() - Just the keys
d.keys()

dict_keys(['Name', 'Country', 'Job', 'Age'])

In [16]:
# values() - just the values
d.values()

dict_values(['Angela Merkel', 'DE', 'Chancellor', 64])

In [17]:
# items() - the (key, value) pairs
d.items()

dict_items([('Name', 'Angela Merkel'), ('Country', 'DE'), ('Job', 'Chancellor'), ('Age', 64)])

You can also do maths:

In [18]:
d['Age'] +=1
d['Age']

65

#### Iterating

In [19]:
for item in d.items():
    print(item)

('Name', 'Angela Merkel')
('Country', 'DE')
('Job', 'Chancellor')
('Age', 65)
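
Each item is a ```(key, value)``` tuple, so it can be unpacked directly in the loop. The sketch below uses a smaller dict called ```person``` for illustration, and also shows ```get()``` with a default for a missing key:

```python
person = {'Name': 'Angela Merkel', 'Country': 'DE'}

lines = []
for key, value in person.items():  # unpack each (key, value) pair
    lines.append(f"{key}: {value}")

# get() returns the default instead of raising KeyError
age = person.get('Age', 'unknown')

print(lines, age)
```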


#### See also

In [None]:
d.clear()
d.copy()

### <center> NUMPY

### ```ndarray``` MUTABLE, FIXED SIZE

(Note that the Python standard library also has an ```array``` module, distinct from numpy)


In [20]:
import numpy as np


#### Constructing

In [33]:
# Different ways to construct
a = np.array([0., 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0], dtype=np.float64)
a

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

#### Indexing

In [34]:
a[5:] # get element 6 to the end

array([2.5, 3. , 3.5, 4. ])

In [35]:
a[:2] # get the first two elements (items 0, 1, and not including 2)

array([0. , 0.5])

#### Useful Methods

In [37]:
a.sum() # sum
a.std() # standard deviation
a.cumsum() # cumulative sum
# + more...

```np.diff()``` - calculates the difference between consecutive elements of the array

Caution - be careful not to confuse with Pandas ```DataFrame.diff()```

In [43]:
jim = np.array([1, 5, 6, 12])
np.diff(jim)

array([4, 1, 6])

#### Vectorised Operations

In [44]:
a = np.array([0., 0.5, 1.0, 1.5, 3., 5.])
2 * a

array([ 0.,  1.,  2.,  3.,  6., 10.])

In [45]:
a**2

array([ 0.  ,  0.25,  1.  ,  2.25,  9.  , 25.  ])

In [46]:
a**a

array([1.00000000e+00, 7.07106781e-01, 1.00000000e+00, 1.83711731e+00,
       2.70000000e+01, 3.12500000e+03])

#### Universal Functions

These are ```numpy``` functions that operate element-wise on ```ndarrays```; they also accept plain Python scalars and sequences.  Note, they are SLOW on non-```numpy``` objects compared with the ```math``` module equivalents

In [47]:
np.exp(a) # i.e. e^a

array([  1.        ,   1.64872127,   2.71828183,   4.48168907,
        20.08553692, 148.4131591 ])

In [48]:
np.sqrt(a)

array([0.        , 0.70710678, 1.        , 1.22474487, 1.73205081,
       2.23606798])

In [49]:
# and also work on non-numpy objects
np.sqrt(2.5)

1.5811388300841898
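
A short sketch of what happens with non-```numpy``` input (the list ```values``` is illustrative): a Python list is converted to an ```ndarray``` first, and for a single scalar ```math.sqrt``` gives the same number without that overhead.

```python
import math
import numpy as np

values = [0.0, 1.0, 4.0, 9.0]   # plain Python list

from_list = np.sqrt(values)     # converted to an ndarray, result is an ndarray
scalar_np = np.sqrt(2.5)        # works, returns a numpy float
scalar_math = math.sqrt(2.5)    # plain Python float, usually faster for scalars

print(from_list, scalar_np, scalar_math)
```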

#### Multiple Dimensional Arrays

In [50]:
b = np.array([a, a**2]) # 2-dimensional
b

array([[ 0.  ,  0.5 ,  1.  ,  1.5 ,  3.  ,  5.  ],
       [ 0.  ,  0.25,  1.  ,  2.25,  9.  , 25.  ]])

In mathematical form:

$$b = \begin{pmatrix}
a \\
a^2 \\
\end{pmatrix}
=
\begin{pmatrix}
0.0 & 0.5 & 1.0 & 1.5 & 3.0 & 5.0 \\
0.0 & 0.25 & 1.0 & 2.25 & 9.0 & 25.0 \\
\end{pmatrix}$$

#### Indexing

In [51]:
b[0] # Get the entire first row

array([0. , 0.5, 1. , 1.5, 3. , 5. ])

In [52]:
b[0,2] # third element on the first row

1.0

In [53]:
b[:,1] # second column (use b[:,-1] for the final column)

array([0.5 , 0.25])

In [56]:
b[-1, -1] # final value on the final row

25.0

#### Useful Methods

In [57]:
b.sum()

48.5

In [58]:
b.sum(axis=0) # sum along the FIRST axis (down the rows): one total per column

array([ 0.  ,  0.75,  2.  ,  3.75, 12.  , 30.  ])

In [59]:
b.sum(axis=1) # sum along the SECOND axis (across the columns): one total per row

array([11. , 37.5])
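
The ```axis``` argument names the axis that is collapsed. A minimal sketch on a smaller 2x3 array (values chosen for illustration), including ```keepdims=True```, which keeps the collapsed axis with length 1:

```python
import numpy as np

small = np.array([[0., 0.5, 1.0],
                  [0., 0.25, 1.0]])   # shape (2, 3)

col_totals = small.sum(axis=0)                 # axis 0 collapsed -> shape (3,)
row_totals = small.sum(axis=1)                 # axis 1 collapsed -> shape (2,)
row_totals_2d = small.sum(axis=1, keepdims=True)  # shape (2, 1)

print(col_totals, row_totals, row_totals_2d.shape)
```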

#### Constructing Multi-dimensional Arrays

If you already know all the elements to go in, construct as previously:

In [61]:
b = np.array(
    [
        [0., 0.5, 1.0, 1.5, 3., 5.], 
        [0., 0.25, 1., 2.25, 9., 25.]
    ]
)

In [63]:
# Or can pre-populate with zeros or ones:
c = np.zeros((2, 3), dtype = 'i', order = 'c')
c

# similarly with np.ones(...)

array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [None]:
# what does the "order" parameter mean?  It sets the memory layout:
# C-style (row-major, rows stored contiguously) = 'c'
# Fortran-style (column-major, columns stored contiguously) = 'f'
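
A small sketch of how the two layouts differ in practice (array names are illustrative). The layout does not change indexing, only how the data sits in memory and the order in which it is read when flattened:

```python
import numpy as np

c_layout = np.zeros((2, 3), order='C')  # row-major
f_layout = np.zeros((2, 3), order='F')  # column-major

# The flags attribute reports the layout
c_is_c = c_layout.flags['C_CONTIGUOUS']
f_is_f = f_layout.flags['F_CONTIGUOUS']

# Reading the same 2x3 array out in each order
m = np.arange(6).reshape((2, 3))
row_major = m.flatten(order='C')
col_major = m.flatten(order='F')

print(c_is_c, f_is_f, row_major, col_major)
```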

In [65]:
# Can also clone the shape of an existing array:
d = np.zeros_like(c)
d

# again, similarly np.ones_like(...)

array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [66]:
# Or as an empty matrix
# NOTE!  values are UNINITIALISED (whatever was already in memory), not zeros
e = np.empty((2, 3))
e

# and similarly empty_like(...)

array([[ 0.  ,  0.75,  2.  ],
       [ 3.75, 12.  , 30.  ]])

#### Linspace

In [67]:
g = np.linspace(5, 15, 12) # 12 equally spaced points from 5 to 15 inclusive (i.e. 11 steps)
g

array([ 5.        ,  5.90909091,  6.81818182,  7.72727273,  8.63636364,
        9.54545455, 10.45454545, 11.36363636, 12.27272727, 13.18181818,
       14.09090909, 15.        ])
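
For comparison, a sketch of ```linspace``` next to ```arange```: ```linspace``` takes a number of points and includes the end value, while ```arange``` takes a step size and excludes the stop value. ```retstep=True``` also returns the spacing:

```python
import numpy as np

# 12 points means 11 intervals, so the step is (15 - 5) / 11
points, step = np.linspace(5, 15, 12, retstep=True)

# arange: fixed step of 2.5, stop value 15 is NOT included
half_open = np.arange(5, 15, 2.5)

print(len(points), step, half_open)
```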

#### Meta Info

In [69]:
# Note - these are attributes, not methods, so no brackets
g.size
g.itemsize
g.ndim
g.shape

(12,)

#### Reshaping & Resizing

An ```ndarray``` has a fixed number of elements once created (its values can still be changed in place), but you can reshape & resize it

In [70]:
# e.g. start with 15 elements:
g = np.arange(15)

# and reshape to a 3x5 matrix:
g.reshape((3, 5))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
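
A brief sketch of two details of ```reshape()```, under the assumption that the array is contiguous as it is here: it returns a new array object and leaves the shape of ```g``` alone, and elements can be assigned in place. Passing ```-1``` lets numpy infer one dimension:

```python
import numpy as np

g = np.arange(15)
h = g.reshape((3, 5))        # g itself keeps shape (15,)

h[0, 0] = 100                # element assignment works in place
shares = np.shares_memory(g, h)  # here h is a view onto g's data

g_auto = g.reshape((5, -1))  # -1: infer the second dimension (3)

print(g.shape, h.shape, g[0], shares, g_auto.shape)
```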

In [73]:
# Can also transpose
g.reshape((3, 5)).T

# or
g.reshape((3, 5)).transpose()

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

#### Stacking (Horizontal & Vertical)

In [78]:
h = g.reshape((3,5))

# h at the top, 2*h below
np.vstack((h, 2*h))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

In [79]:
# side by side
np.hstack((h, 2*h))

array([[ 0,  1,  2,  3,  4,  0,  2,  4,  6,  8],
       [ 5,  6,  7,  8,  9, 10, 12, 14, 16, 18],
       [10, 11, 12, 13, 14, 20, 22, 24, 26, 28]])

#### Flattening

Flatten an n-dimensional array to 1D.  You can choose to flatten row-wise (C-style) or column-wise (Fortran-style)

In [80]:
h.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [81]:
h.flatten(order = 'f')

array([ 0,  5, 10,  1,  6, 11,  2,  7, 12,  3,  8, 13,  4,  9, 14])

In [82]:
# can also use ravel(), which returns a view where possible instead of a copy
h.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
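
The practical difference between the two, as a small sketch: ```flatten()``` always returns a copy, whereas ```ravel()``` returns a view when the memory layout allows it (as it does for this array), so writing to the result can change the original.

```python
import numpy as np

h = np.arange(15).reshape((3, 5))

flat_copy = h.flatten()
flat_view = h.ravel()

flat_copy[0] = -1   # h is unaffected
flat_view[1] = -2   # h[0, 1] changes too, because the data is shared

print(h[0, 0], h[0, 1])
```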

### Boolean matrices

Comparing an array with a value gives a boolean array, which is useful for filtering and selection

In [83]:
# if h is as above, then h >= 3 creates 
# a boolean matrix of which values are >= 3
h >= 3

array([[False, False, False,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

In [84]:
# You can use this for indexing and data selection too:
h[ h>=3 ]

array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [86]:
# everything >4 and <=12
h[(h > 4) & (h <= 12)]

# similarly "or"
h[(h > 4) | (h <= 12)]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [87]:
# See also np.where() for similar behaviour
np.where( h>7, 1, 0) # where values in h are > 7, then 1 else 0

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1]])

In [88]:
np.where(h % 2 == 0, "even", "odd")

array([['even', 'odd', 'even', 'odd', 'even'],
       ['odd', 'even', 'odd', 'even', 'odd'],
       ['even', 'odd', 'even', 'odd', 'even']], dtype='<U4')
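
Called with only a condition, ```np.where()``` returns the indices where it holds, one array per axis. A minimal sketch using the same 3x5 layout as ```h```:

```python
import numpy as np

h = np.arange(15).reshape((3, 5))

rows, cols = np.where(h > 11)  # row and column indices of the matches
selected = h[rows, cols]       # index back into h with them

print(rows, cols, selected)
```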

### ```Numpy``` vectorisation

You can easily add together two ```ndarrays```

In [89]:
r = np.arange(12).reshape((4,3))
s = np.arange(12).reshape((4,3))*0.5

r+s

array([[ 0. ,  1.5,  3. ],
       [ 4.5,  6. ,  7.5],
       [ 9. , 10.5, 12. ],
       [13.5, 15. , 16.5]])
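
"Broadcasting" also allows you to combine objects of different shapes, e.g. a scalar and a matrix.  It works for two arrays of differing shapes only when their dimensions are compatible (compared from the last axis backwards, each pair must be equal or one of them must be 1), so be careful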

In [90]:
2*r

array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22]])
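
A sketch of broadcasting between arrays of different shapes (```row``` and ```col``` are illustrative names). Shapes are compared from the last axis backwards, and each pair of dimensions must be equal or one of them must be 1:

```python
import numpy as np

r = np.arange(12).reshape((4, 3))

row = np.array([10, 20, 30])          # shape (3,): added to every row
col = np.array([[1], [2], [3], [4]])  # shape (4, 1): added to every column

plus_row = r + row
plus_col = r + col

# Shape (4,) does not line up with the last axis of (4, 3)
try:
    r + np.arange(4)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(plus_row[0], plus_col[-1], broadcast_failed)
```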

Often a function will work just as well with a vector as it will with a scalar, e.g.:

```
def f(x):
    return 3 * x + 5
```
You can call this equally with an ```ndarray``` as with a scalar
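
For example, a sketch calling ```f``` both ways. This works because ```*``` and ```+``` are defined element-wise for arrays; a function built on the ```math``` module (the hypothetical ```g``` below) only accepts scalars, so use the ```numpy``` equivalent instead:

```python
import math
import numpy as np

def f(x):
    return 3 * x + 5

scalar_result = f(2)
array_result = f(np.array([0, 1, 2]))

def g(x):  # hypothetical scalar-only function for contrast
    return math.sin(x)

try:
    g(np.array([0.0, 1.0]))
    math_failed = False
except TypeError:
    math_failed = True

vectorised = np.sin(np.array([0.0, 1.0]))  # numpy version handles arrays

print(scalar_result, array_result, math_failed)
```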

### <center> PANDAS
