# 4 NumPy and Pandas

## Introduction

After getting used to how Python works, it is time to get our hands dirty with data analysis. We will study two packages. [NumPy](http://www.numpy.org/) is the fundamental package for numerical computing and linear algebra in Python, and it already allows for decent data analysis. We will learn it not only for data analysis but, more importantly, because as scientists we will find it in the `import` section of almost every script we write. After NumPy we will move on to [Pandas](http://pandas.pydata.org/), a dedicated data analysis package with many more features than NumPy, which makes data manipulation and visualization much easier. The full power of Pandas will be unleashed in Section 5, where we will see how to visualize information in plots.

As usual, we begin by importing the necessary packages:

In [1]:
import numpy as np
print('NumPy:', np.__version__)

NumPy: 1.14.3


## Numeric Python (NumPy)

NumPy is an open-source add-on module to Python that provides common mathematical and numerical routines as pre-compiled, fast functions. These routines are growing into highly mature packages whose functionality meets, or perhaps exceeds, that of common commercial software like MATLAB. The [NumPy](http://www.numpy.org/) (Numeric Python) package provides basic routines for manipulating large arrays and matrices of numeric data. The main object NumPy works with is the *homogeneous multidimensional array*. Despite its intimidating name, this is nothing but a table of elements (usually numbers) of the same type, each indexed by a tuple of non-negative integers.

We will now explore some capabilities of NumPy that will prove very useful not only for data analysis, but throughout all our life with Python.

### Creating Arrays

As mentioned before, the main object in NumPy is the *array*. Creating one from a list is as easy as calling `np.array`:

In [2]:
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]
my1darr = np.array(mylist)
my1darr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
type(my1darr)

numpy.ndarray

The same applies to multidimensional arrays, which are built from nested lists:

In [4]:
my2darr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my2darr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [5]:
my3darr = np.array([  [ [1], [2], [3] ], [ [4], [5], [6] ], [ [7], [8], [9] ]  ])
my3darr

array([[[1],
        [2],
        [3]],

       [[4],
        [5],
        [6]],

       [[7],
        [8],
        [9]]])

A NumPy array has a number of dimensions (or *axes*). The `shape` attribute gives the number of axes and the size of each of them, as a tuple. For 2-dimensional arrays (matrices), the order is (rows, columns).

There are two ways of obtaining the shape: the function `np.shape(arr)` or the attribute `arr.shape`. This is not the only command that works in both forms, and we will find more along the way.

In [6]:
print(my1darr.shape)
print(my2darr.shape)
print(my3darr.shape)

print(np.shape(my1darr))
print(np.shape(my2darr))
print(np.shape(my3darr))

(9,)
(3, 3)
(3, 3, 1)
(9,)
(3, 3)
(3, 3, 1)


There is one important difference with respect to lists: a list can hold elements of different types, but all the elements of an array must have the same type, so NumPy converts them automatically.

In [7]:
lst = [1., 'cat']
print(lst[0])

arr = np.array(lst)
print(arr[0])  # looks the same, but it is now the string '1.0'

print(type(lst[0]) == type(arr[0]))

print(type(lst[0]))
print(type(arr[0]))

1.0
1.0
False
<class 'float'>
<class 'numpy.str_'>


(We will go deeper into indexing in a while)
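As a minimal sketch of these conversion rules (the variable names are only illustrative), NumPy picks a single dtype that can hold every element: integers mixed with floats become floats, anything mixed with a string becomes a string, and an explicit `dtype` argument forces a conversion.

```python
import numpy as np

# Integers mixed with a float: every element is upcast to float64
nums = np.array([1, 2, 3.5])
print(nums.dtype)            # float64

# A number mixed with a string: every element becomes a unicode string
mixed = np.array([1., 'cat'])
print(mixed.dtype.kind)      # U (unicode string)

# An explicit dtype converts the data on creation (floats are truncated)
ints = np.array([1.7, 2.2, 3.9], dtype=int)
print(ints)                  # [1 2 3]
```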

### Special Arrays

Now we review some built-in functions that create commonly used arrays. `ones` and `zeros` return arrays of a given shape and type (the default type is `float64`), filled with ones or zeros, respectively.

In [8]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [9]:
np.ones((3, 2, 4), dtype=np.int8)

array([[[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int8)

`eye(d)` returns a 2-D, $d \times d$ array with ones on the diagonal and zeros elsewhere (the identity matrix).

In [10]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

`eye` can also place the ones on a diagonal above or below the main one. To do so, call `eye(d, d, k)`, where $k$ selects the diagonal (positive above the main diagonal, negative below), or equivalently `eye(d, k=num)`:

In [11]:
print(np.eye(5, 5, 2))
np.eye(5, k=-1)

[[0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


array([[0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])

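`diag`, depending on the input, either constructs a diagonal array (if the input is a 1-D array) or extracts a diagonal from a matrix (if the input is a 2-D array). An optional second argument `k` selects a diagonal above (`k > 0`) or below (`k < 0`) the main one, just as in `eye`; this is why `np.diag(my1darr, 1)` below returns a $10 \times 10$ matrix.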

In [12]:
np.diag(my1darr, 1)

array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 3, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 5, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 6, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 8, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [13]:
np.diag(my2darr)

array([1, 5, 9])

In [14]:
np.diag(np.diag(my2darr))

array([[1, 0, 0],
       [0, 5, 0],
       [0, 0, 9]])
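To see the role of the second argument `k` in `diag` more clearly, here is a small sketch on an illustrative 3x3 matrix `M`; it also checks that building a superdiagonal with `diag` gives the same result as `eye` with the same `k`:

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)   # [[1 2 3], [4 5 6], [7 8 9]]

# 2-D input: k selects which diagonal to extract
print(np.diag(M, 1))                 # [2 6]  (first diagonal above the main one)
print(np.diag(M, -1))                # [4 8]  (first diagonal below the main one)

# 1-D input: k selects where to place the values
upper = np.diag(np.ones(4), 1)       # 5x5 matrix with ones just above the diagonal
print(np.array_equal(upper, np.eye(5, k=1)))   # True
```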

`arange(begin, end, step)` returns evenly spaced values within a given interval. Note that the starting point is included, but the end point is not.

In [15]:
myrange = np.arange(0, 16, 2) # start at 0 count up by 2, stop before 16
myrange

array([ 0,  2,  4,  6,  8, 10, 12, 14])

**Exercise 1**: Create an array of the first million odd numbers, both with `arange` and with a loop (for instance, a list comprehension). Time both methods with `%timeit` to see which one is faster.

In [16]:
%timeit np.arange(1, 2000000, 2)

%timeit [i for i in range(2000000) if i % 2 == 1]

5.69 ms ± 692 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
330 ms ± 29.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Similarly, `linspace(begin, end, points)` returns evenly spaced numbers over a specified interval, but instead of the step you specify the number of points you want. Unlike `arange`, `linspace` includes the end of the interval by default.

In [17]:
myline = np.linspace(0, 14, 8) # start at 0, stop at 14 (included) and get 8 evenly spaced points
myline

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14.])

In [18]:
print(len(myrange))
print(len(myline))

8
8
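`linspace` also accepts a couple of keyword arguments that are handy in practice. In this sketch, `endpoint=False` drops the end point (reproducing `myrange`), and `retstep=True` additionally returns the spacing between consecutive points:

```python
import numpy as np

# endpoint=False excludes the end point, reproducing np.arange(0, 16, 2)
half_open = np.linspace(0, 16, 8, endpoint=False)
print(half_open)              # [ 0.  2.  4.  6.  8. 10. 12. 14.]

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 14, 8, retstep=True)
print(step)                   # 2.0
```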


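`reshape` changes the shape of an array, but not its data, so the new shape must contain the same number of elements. Like `shape`, it can be called either as a function, `np.reshape(arr, newshape)`, or as a method, `arr.reshape(newshape)`: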

In [19]:
print(np.reshape(myrange, (2, 4)))
print(myrange.reshape(4, 2))

print(np.reshape(myrange, (1, 8)))
print(myrange.reshape(8, 1))

[[ 0  2  4  6]
 [ 8 10 12 14]]
[[ 0  2]
 [ 4  6]
 [ 8 10]
 [12 14]]
[[ 0  2  4  6  8 10 12 14]]
[[ 0]
 [ 2]
 [ 4]
 [ 6]
 [ 8]
 [10]
 [12]
 [14]]


Note, however, that `reshape` returns a new array and leaves the original one untouched. To keep the reshaped version, assign it to a variable:

In [20]:
myrange  # The reshapings above did not modify the original array

array([ 0,  2,  4,  6,  8, 10, 12, 14])

In [21]:
mynewrange = myrange.reshape(2, 4)
mynewrange  # The new variable holds the reshaped array

array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14]])
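When reshaping, one dimension may be given as `-1`, and NumPy then infers it from the total number of elements. A short sketch, which also shows that the new shape must contain exactly as many elements as the original (otherwise a `ValueError` is raised):

```python
import numpy as np

myrange = np.arange(0, 16, 2)        # 8 elements

# -1 lets NumPy infer that dimension from the total number of elements
print(myrange.reshape(2, -1).shape)  # (2, 4)
print(myrange.reshape(-1, 1).shape)  # (8, 1), a column
print(myrange.reshape(2, 4).reshape(-1).shape)   # (8,), flattened again

# The new shape must hold exactly the same number of elements (8 != 9)
try:
    myrange.reshape(3, 3)
except ValueError as err:
    print('ValueError:', err)
```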

### Combining Arrays

The most general command for combining arrays is `concatenate(arrs, d)`. It takes a list of arrays and joins them along axis $d$. Recall our 3-D array:

In [22]:
print(my3darr)
print(my3darr.shape)

[[[1]
  [2]
  [3]]

 [[4]
  [5]
  [6]]

 [[7]
  [8]
  [9]]]
(3, 3, 1)


In [23]:
conc_along0 = np.concatenate([my3darr, 10 * my3darr], 0)
print(conc_along0)
print(conc_along0.shape)

[[[ 1]
  [ 2]
  [ 3]]

 [[ 4]
  [ 5]
  [ 6]]

 [[ 7]
  [ 8]
  [ 9]]

 [[10]
  [20]
  [30]]

 [[40]
  [50]
  [60]]

 [[70]
  [80]
  [90]]]
(6, 3, 1)


In [24]:
conc_along1 = np.concatenate([my3darr, 10 * my3darr], 1)
print(conc_along1)
print(conc_along1.shape)

[[[ 1]
  [ 2]
  [ 3]
  [10]
  [20]
  [30]]

 [[ 4]
  [ 5]
  [ 6]
  [40]
  [50]
  [60]]

 [[ 7]
  [ 8]
  [ 9]
  [70]
  [80]
  [90]]]
(3, 6, 1)


In [25]:
conc_along2 = np.concatenate([my3darr, 10 * my3darr], 2)
print(conc_along2)
print(conc_along2.shape)

[[[ 1 10]
  [ 2 20]
  [ 3 30]]

 [[ 4 40]
  [ 5 50]
  [ 6 60]]

 [[ 7 70]
  [ 8 80]
  [ 9 90]]]
(3, 3, 2)
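The rule behind these examples is that all the arrays must have the same size along every axis except the one you concatenate along. A minimal sketch with two illustrative arrays `a` and `b`:

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((4, 3))

# The shapes agree on every axis except axis 0, so they can be joined along axis 0
stacked = np.concatenate([a, b], axis=0)
print(stacked.shape)     # (6, 3)

# Along axis 1 the row counts (2 vs 4) differ, so NumPy refuses
try:
    np.concatenate([a, b], axis=1)
except ValueError as err:
    print('ValueError:', err)
```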


For common combinations, however, there are dedicated commands: `vstack` stacks arrays vertically (row-wise) and `hstack` stacks them horizontally (column-wise).

In [26]:
print(my1darr)
print(my1darr.shape)

[1 2 3 4 5 6 7 8 9]
(9,)


In [27]:
print(np.vstack([my1darr, 10 * my1darr]))
print('\n')
print(np.hstack([my1darr, 10 * my1darr]))

[[ 1  2  3  4  5  6  7  8  9]
 [10 20 30 40 50 60 70 80 90]]


[ 1  2  3  4  5  6  7  8  9 10 20 30 40 50 60 70 80 90]


In [28]:
print(my2darr)
print(my2darr.shape)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
(3, 3)


In [29]:
print(np.vstack([my2darr, 10 * my2darr]))
print('\n')
print(np.hstack([my2darr, 10 * my2darr]))

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 20 30]
 [40 50 60]
 [70 80 90]]


[[ 1  2  3 10 20 30]
 [ 4  5  6 40 50 60]
 [ 7  8  9 70 80 90]]
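The difference between the 1-D and 2-D cases above can be summarized in a short sketch: for 1-D inputs `vstack` first turns each array into a row of shape `(1, n)`, while for 2-D inputs `vstack` and `hstack` are simply `concatenate` along axes 0 and 1 (the arrays `v`, `w` and `A` are illustrative):

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# For 1-D inputs, vstack builds rows while hstack joins the arrays end to end
print(np.vstack([v, w]).shape)   # (2, 3)
print(np.hstack([v, w]).shape)   # (6,)

# For 2-D inputs they are plain concatenations along axis 0 and axis 1
A = np.arange(9).reshape(3, 3)
print(np.array_equal(np.vstack([A, A]), np.concatenate([A, A], axis=0)))  # True
print(np.array_equal(np.hstack([A, A]), np.concatenate([A, A], axis=1)))  # True
```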


### Operations

You can easily perform element-wise operations on arrays of any shape. Use the usual symbols `+`, `-`, `*`, `/` and `**` for element-wise addition, subtraction, multiplication, division and power.

In [30]:
x = np.array([1, 2, 3])
print(x)
print(x + 10)
print(3 * x)
print(1 / x)
print(x ** (-2 / 3))
print(2 ** x)

[1 2 3]
[11 12 13]
[3 6 9]
[1.         0.5        0.33333333]
[1.         0.62996052 0.48074986]
[2 4 8]


These symbols can, of course, also be used between two arrays. The arrays must have the same shape, or shapes that NumPy can *broadcast* to a common one (as happened with the scalar in `x + 10`). For arrays of the same shape the operations are again element-wise:

In [31]:
y = np.arange(4, 7, 1)
print(x + y)     # [1+4, 2+5, 3+6]
print(x * y)     # [1*4, 2*5, 3*6]
print(x / y)     # [1/4, 2/5, 3/6]
print(x ** y)    # [1**4, 2**5, 3**6]

[5 7 9]
[ 4 10 18]
[0.25 0.4  0.5 ]
[  1  32 729]
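Arrays of different shapes can also be combined when NumPy can *broadcast* them: comparing shapes from the last axis backwards, two sizes are compatible if they are equal or one of them is 1, and size-1 (or missing) axes are stretched to match. The scalar in `x + 10` is the simplest case. A small sketch with illustrative arrays:

```python
import numpy as np

col = np.array([[0], [10], [20]])    # shape (3, 1)
row = np.array([1, 2, 3])            # shape (3,), treated as (1, 3)

# Each size-1 axis is stretched to match the other operand
table = col + row                    # shape (3, 3)
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Shapes (3,) and (4,) cannot be broadcast together
try:
    row + np.arange(4)
except ValueError as err:
    print('ValueError:', err)
```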


For the dot product of vectors and for matrix multiplication, use `dot`:

In [32]:
print(x)
print(y)
x.dot(y)  # 1*4 + 2*5 + 3*6

[1 2 3]
[4 5 6]


32

Since Python 3.5, matrix multiplication has its own operator, `@`:

In [33]:
print(x)
print(y)
x @ y  # 1*4 + 2*5 + 3*6

[1 2 3]
[4 5 6]


32

In [34]:
X = np.array([[i + j for i in range(3, 6)] for j in range(3)])
Y = np.diag([1, 1], 1) + np.diag([1], -2)

print(X)
print(Y)

[[3 4 5]
 [4 5 6]
 [5 6 7]]
[[0 1 0]
 [0 0 1]
 [1 0 0]]


The operator `*` returns the element-wise product of arrays with the same shape:

In [35]:
X * Y

array([[0, 4, 0],
       [0, 0, 6],
       [5, 0, 0]])

The operator `@` returns the matrix product of arrays with compatible shapes (the number of columns of the first must equal the number of rows of the second):

In [36]:
X @ Y

array([[5, 3, 4],
       [6, 4, 5],
       [7, 5, 6]])
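`Y` is a permutation matrix (a single 1 in every row and column), so multiplying by it on the right only reorders the columns of `X`. The sketch below rebuilds both matrices and checks this, together with the fact that `@` and `dot` agree for 2-D arrays:

```python
import numpy as np

X = np.array([[i + j for i in range(3, 6)] for j in range(3)])
Y = np.diag([1, 1], 1) + np.diag([1], -2)

# Column j of X @ Y is X @ Y[:, j]; here that picks columns 2, 0 and 1 of X
print(np.array_equal(X @ Y, X[:, [2, 0, 1]]))   # True

# For 2-D arrays, @ and dot compute the same matrix product
print(np.array_equal(X @ Y, X.dot(Y)))          # True
```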

**Exercise 2**: Take a 10x2 matrix representing $(x_1, x_2)$ coordinates and transform them into polar coordinates $(r, \theta)$.

*Hint 1: the inverse transformation is given by $x_1 = r\cos\theta$, $x_2 = r\sin\theta$*

*Hint 2: generate random numbers with the functions in `numpy.random`*

In [37]:
z = np.random.random((10, 2))
x1, x2 = z[:, 0], z[:, 1]
R = np.sqrt(x1 ** 2 + x2 ** 2)
T = np.arctan2(x2, x1)
print(R)
print(T)

[0.76204999 1.28328499 1.04841484 1.0235242  0.79018177 0.76184656
 0.9989484  1.07118975 0.84365048 0.73135688]
[1.49213664 0.78953915 0.86629082 1.03912044 0.32902248 1.46562262
 0.42149847 0.67319124 0.85020763 0.56076331]
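A quick way to check the solution is to apply the inverse transformation from Hint 1 and verify that the original points come back. This sketch uses `np.hypot`, which computes $\sqrt{x_1^2 + x_2^2}$, in place of the explicit square root; `np.arctan2(x2, x1)` returns the angle in radians, between $-\pi$ and $\pi$, in the correct quadrant:

```python
import numpy as np

z = np.random.random((10, 2))
x1, x2 = z[:, 0], z[:, 1]

# Forward transformation to polar coordinates
r = np.hypot(x1, x2)              # same as np.sqrt(x1**2 + x2**2)
theta = np.arctan2(x2, x1)

# Inverse transformation from Hint 1, which should recover the original points
back = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
print(np.allclose(back, z))       # True
```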


#### Transposing

Transposition is a very important operation in linear algebra. A 1-D array has no orientation (it is neither a row nor a column vector, and transposing it returns the same array), so NumPy computes matrix-vector products correctly regardless of how you think of the vector. For products of matrices, however, the orientation matters:

In [38]:
Z = np.arange(0, 12, 1).reshape((4, 3))

In [39]:
print(Z)
print(y)
Z @ y

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[4 5 6]


array([ 17,  62, 107, 152])

In [40]:
print(Z)
print(y.T)
Z @ y.T

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[4 5 6]


array([ 17,  62, 107, 152])
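Since `y` is a 1-D array, `y.T` is exactly the same array, which is why both products above coincide. To work with an actual column vector you need a 2-D array of shape `(3, 1)`, as in this sketch:

```python
import numpy as np

Z = np.arange(0, 12, 1).reshape((4, 3))
y = np.array([4, 5, 6])

# A 1-D array has a single axis, so .T returns it unchanged
print(y.shape, y.T.shape)        # (3,) (3,)

# An explicit column vector is a 2-D array of shape (3, 1)
y_col = y.reshape(-1, 1)
print(y_col.T.shape)             # (1, 3), now transposing does something
print((Z @ y_col).shape)         # (4, 1), a column instead of a 1-D result
```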

In [41]:
print(Z)
print(X)
Z @ X  # (4 rows and 3 cols) @ (3 rows and 3 cols) yields (4 rows and 3 cols)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[3 4 5]
 [4 5 6]
 [5 6 7]]


array([[ 14,  17,  20],
       [ 50,  62,  74],
       [ 86, 107, 128],
       [122, 152, 182]])

In [42]:
print(Z.T)
print(X)
# Z.T @ X  # (3 rows, 4 cols) @ (3 rows, 3 cols): inner dimensions differ, raises a ValueError

[[ 0  3  6  9]
 [ 1  4  7 10]
 [ 2  5  8 11]]
[[3 4 5]
 [4 5 6]
 [5 6 7]]
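The commented-out product fails because the inner dimensions do not match (4 columns against 3 rows). This sketch rebuilds `Z` and `X`, catches the resulting `ValueError`, and shows a product whose shapes do match:

```python
import numpy as np

Z = np.arange(0, 12, 1).reshape((4, 3))
X = np.array([[i + j for i in range(3, 6)] for j in range(3)])

# (3 x 4) @ (3 x 3): the inner dimensions 4 and 3 differ
try:
    Z.T @ X
except ValueError as err:
    print('ValueError:', err)

# (3 x 4) @ (4 x 3) is fine and gives a 3 x 3 result
print((Z.T @ Z).shape)           # (3, 3)
```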


#### Array Methods

To save you from converting back and forth between NumPy arrays and lists, arrays come with methods that compute common properties of their contents. In fact, there are more of them than the corresponding built-in functions in standard Python.

In [43]:
a = np.array([-4, -2, 1, 3, 5])
print(a.max())
print(a.min())
print(a.sum())
print(a.mean())
print(a.std())

5
-4
3
0.6
3.2619012860600183
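Two details are worth knowing. `std` divides by $n$ by default (`ddof=0`, the population standard deviation); pass `ddof=1` to divide by $n - 1$ (the sample standard deviation, which is what pandas' `std` uses by default). And on multidimensional arrays these methods accept an `axis` argument. A short sketch (the array `m` is illustrative):

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])

# Population (ddof=0, the default) versus sample (ddof=1) standard deviation
print(a.std())             # 3.2619...
print(a.std(ddof=1))       # 3.6469...

# axis selects the direction of the reduction on a 2-D array
m = np.arange(6).reshape(2, 3)    # [[0 1 2], [3 4 5]]
print(m.sum())             # 15
print(m.sum(axis=0))       # [3 5 7], one value per column
print(m.sum(axis=1))       # [ 3 12], one value per row
```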


Some interesting functions are `argmax` and `argmin`, which return the index of the maximum and minimum values in the array.

In [44]:
print(a.argmax())
print(a.argmin())

4
0
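On a multidimensional array, `argmax` and `argmin` return an index into the *flattened* array. In this sketch (with an illustrative matrix `m`), `np.unravel_index` converts that index back into row and column, and the `axis` argument gives one index per column or per row:

```python
import numpy as np

m = np.array([[3, 9, 2],
              [7, 1, 8]])

# On a 2-D array, argmax returns a position in the flattened array...
flat = m.argmax()
print(flat)                 # 1
# ...which unravel_index converts back into (row, column) indices
row, col = np.unravel_index(flat, m.shape)
print(row, col)             # 0 1

# With axis, you get one index per column (axis=0) or per row (axis=1)
print(m.argmax(axis=0))     # [1 0 1]
print(m.argmax(axis=1))     # [1 2]
```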


### Indexing/Slicing

We have already briefly seen that individual elements are accessed with the bracket notation `array[ax_0, ax_1, ...]`, where `ax_i` is the index along the `i`-th axis. The same notation can be used to assign new values to elements.

In [45]:
r = np.array([4, 5, 6, 7])
print(r[2])
# Reassign the value stored at index 0
r[0] = 198
r

6


array([198,   5,   6,   7])

To select a range of elements along an axis, use a colon `:`, and a second `:` to set the step size: `array[start:stop:step]`. As with lists, `start` is included but `stop` is not. If you leave `start` (`stop`) blank, the selection goes from the very beginning (until the very end) of the array:

In [46]:
s = np.arange(13)**2
print(s)

# Starting at index 3, until index 10 (excluded), pick every element
print(s[3:10])

# Starting at index 2, until index 10 (excluded), pick every 3rd element
print(s[2:10:3])

# Starting at index -5, until the end, pick every 2nd element
print(s[-5::2])

# Starting at index -5, going backwards to the beginning, pick every 2nd element
print(s[-5::-2])

[  0   1   4   9  16  25  36  49  64  81 100 121 144]
[ 9 16 25 36 49 64 81]
[ 4 25 64]
[ 64 100 144]
[64 36 16  4  0]


The same applies to matrices and higher-dimensional arrays:

In [47]:
r = np.arange(36).reshape((6, 6))
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [48]:
# Pick rows 2 to 5, and cols 1 to 3 (remember, row 5 and col 3 not included)
r[2:5, 1:3]

array([[13, 14],
       [19, 20],
       [25, 26]])

You can also select specific rows or columns by passing a list of indices:

In [49]:
# Pick rows 1, 3 and 4, and cols 1 to 3 (remember, col 3 not included)
r[[1, 3, 4], 1:3]

array([[ 7,  8],
       [19, 20],
       [25, 26]])
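Be careful when passing lists for *both* axes: NumPy pairs them element by element instead of taking all combinations. To get a full submatrix from lists of rows and columns, `np.ix_` can be used, as in this sketch on a fresh copy of `r`:

```python
import numpy as np

r = np.arange(36).reshape((6, 6))

# Two index lists are paired element by element: picks r[1, 1] and r[3, 2]
pairs = r[[1, 3], [1, 2]]
print(pairs)                        # [ 7 20]

# np.ix_ builds the full submatrix of rows 1, 3, 4 and columns 1, 2 instead
sub = r[np.ix_([1, 3, 4], [1, 2])]
print(sub)
# [[ 7  8]
#  [19 20]
#  [25 26]]
```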

A very useful tool is *conditional indexing* (also called boolean indexing), which applies a selection, an assignment, etc. only to the elements of an array that satisfy some condition. Here every element greater than 30 is set to 30:

In [50]:
r[r > 30] = 30
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
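Under the hood, `r > 30` is a boolean array with the same shape as `r`, and indexing with it selects the elements where it is `True`. A sketch on a fresh copy of the original `r`, also showing how to combine conditions with `&` (and), `|` (or) and `~` (not), where each condition needs its own parentheses:

```python
import numpy as np

r = np.arange(36).reshape((6, 6))

mask = r > 30                 # boolean array with the same shape as r
print(mask.sum())             # 5 elements satisfy the condition
print(r[mask])                # [31 32 33 34 35], always returned as a 1-D array

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
both = r[(r > 5) & (r % 7 == 0)]
print(both)                   # [ 7 14 21 28 35]
```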

**Exercise 3**: Create a random 1-dimensional array and find the element closest to 0.7.

In [51]:
Z = np.random.uniform(0,1,100)
z = 0.7
m = Z[np.abs(Z - z).argmin()]
print(m)

0.708889504266799


#### Copying Data

**Be very careful when copying and modifying arrays in NumPy!** You will see why right now. Let's begin by defining `r2` as a slice of `r`:

In [52]:
r2 = r[:3,:3]
r2

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])

And now let's set all its elements to zero

In [53]:
r2[:] = 0
r2

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

When looking at `r`, we see that it has also been changed!

In [54]:
r

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

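This happens because slicing returns a *view*: a new array object that shares its data with the original one. To work on a selection without modifying the original array, make an explicit copy with the `copy` method: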

In [55]:
r_copy = r.copy()
r_copy

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

Now we can safely modify `r_copy` without affecting `r`.

In [56]:
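r_copy[:] = 10
print(f'{r_copy}\n')
r

[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]]


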
array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
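You can check whether two arrays share data with `np.shares_memory`. In this sketch, a basic slice is a view, while both `.copy()` and indexing with a list of indices (so-called fancy indexing) produce independent arrays:

```python
import numpy as np

r = np.arange(36).reshape((6, 6))

view = r[:3, :3]                  # basic slicing returns a view
independent = r[:3, :3].copy()    # .copy() allocates new memory
fancy = r[[0, 1, 2]]              # indexing with a list also returns a copy

print(np.shares_memory(r, view))         # True
print(np.shares_memory(r, independent))  # False
print(np.shares_memory(r, fancy))        # False
```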

### Iterating Over Arrays

Finally, you can iterate over arrays in the same way as you iterate over lists:

In [57]:
test = np.random.randint(0, 10, (4,3))
test

array([[4, 5, 8],
       [0, 9, 8],
       [5, 1, 5],
       [0, 9, 0]])

You can iterate by row:

In [58]:
for row in test:
    print(row)

[4 5 8]
[0 9 8]
[5 1 5]
[0 9 0]


Or by row index:

In [59]:
for i in range(len(test)):
    print(test[i])

[4 5 8]
[0 9 8]
[5 1 5]
[0 9 0]


Or by both index and row, using `enumerate`:

In [60]:
for i, row in enumerate(test):
    print(f'Row {i} is {row}')

Row 0 is [4 5 8]
Row 1 is [0 9 8]
Row 2 is [5 1 5]
Row 3 is [0 9 0]


In the same way as with lists, you can use `zip` to iterate over multiple iterables.

In [61]:
test2 = test**2
test2

array([[16, 25, 64],
       [ 0, 81, 64],
       [25,  1, 25],
       [ 0, 81,  0]], dtype=int32)

In [62]:
for i, j in zip(test, test2):
    print(f'{i} + {j} = {i+j}')

[4 5 8] + [16 25 64] = [20 30 72]
[0 9 8] + [ 0 81 64] = [ 0 90 72]
[5 1 5] + [25  1 25] = [30  2 30]
[0 9 0] + [ 0 81  0] = [ 0 90  0]


**Exercise 4**: Create a function that iterates over the columns of a 2-dimensional array

In [63]:
def iterate(df):
    for i, row in enumerate(df):
        shp = row.shape
        row.shape = shp + (1,)
        print(f'Column {i} is {row}')

iterate(test.T)

Column 0 is [[4]
 [0]
 [5]
 [0]]
Column 1 is [[5]
 [9]
 [1]
 [9]]
Column 2 is [[8]
 [8]
 [5]
 [0]]


### Loading and Saving Data

To load and save data, NumPy has the `loadtxt` and `savetxt` commands. However, they only work for one- and two-dimensional arrays. Note also that `loadtxt` reads the values back as floats by default:

In [64]:
np.savetxt('numpytest.txt', test)
np.loadtxt('numpytest.txt')

array([[4., 5., 8.],
       [0., 9., 8.],
       [5., 1., 5.],
       [0., 9., 0.]])
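For arrays with more than two dimensions, or to keep the exact dtype, NumPy's binary `.npy` format via `np.save` and `np.load` is an option. A sketch that writes to a temporary directory (the file name `cube.npy` is just an example):

```python
import os
import tempfile

import numpy as np

cube = np.arange(24).reshape(2, 3, 4)     # 3-D, which savetxt cannot handle

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cube.npy')  # example file name
    np.save(path, cube)                   # binary format keeps shape and dtype
    loaded = np.load(path)

print(loaded.shape, loaded.dtype == cube.dtype)   # (2, 3, 4) True
print(np.array_equal(loaded, cube))               # True
```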

## Pandas

When dealing with numeric matrices and vectors in Python, NumPy makes life a lot easier. For more complex data, however, it leaves a bit to be desired. For those used to working with dedicated languages like R, doing data analysis directly with NumPy feels like a step back. Fortunately, some nice folks have written the Python Data Analysis Library (a.k.a. [pandas](http://pandas.pydata.org/)). Pandas provides an R-like DataFrame, produces high-quality plots with matplotlib, and integrates nicely with other libraries that expect NumPy arrays.

Pandas works with `Series` of data, which are then arranged into `DataFrame`s. A dataframe is the object closest to an Excel spreadsheet that you will see throughout the course (but, being integrated in Python and combinable with so many other packages, a dataframe is much more powerful than an Excel spreadsheet). The data in a series can be either qualitative or quantitative. Creating a series is as easy as creating a NumPy array from a one-dimensional list.

In [65]:
import pandas as pd
print('Pandas:', pd.__version__)

Pandas: 0.23.0


In [66]:
animals = ['Tiger', 'Bear', 'Moose']
pd.Series(animals)

0    Tiger
1     Bear
2    Moose
dtype: object

In [67]:
numbers = [1, 2, 3]
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

Notice that by default the series is indexed by integers. You can use other labels by creating the series from a dictionary instead of a list: the keys become the index.

In [68]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object
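Labels can also be passed explicitly through the `index` argument, and in both cases elements are accessed by label with brackets. A short sketch (the labels `'one'`, `'two'`, `'three'` are illustrative):

```python
import pandas as pd

# Explicit labels through the index argument
numbers = pd.Series([1, 2, 3], index=['one', 'two', 'three'])
print(numbers['two'])          # 2

# With a dictionary, the keys are the labels
sports = pd.Series({'Archery': 'Bhutan', 'Golf': 'Scotland',
                    'Sumo': 'Japan', 'Taekwondo': 'South Korea'})
print(sports['Golf'])          # Scotland
print(list(sports.index))      # ['Archery', 'Golf', 'Sumo', 'Taekwondo']
```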

`DataFrame`s, on the other hand, can be built from two-dimensional arrays, with labelled columns and a custom row index:

In [69]:
# Sample a 1000 x 6 array from the standard normal distribution and wrap it in a DataFrame
u = pd.DataFrame(np.random.randn(1000, 6),
                 index=np.arange(0, 3000, 3),
                 columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(type(u))

u

<class 'pandas.core.frame.DataFrame'>


,A,B,C,D,E,F
0,-0.662239,-0.860211,-0.693941,0.180686,0.522984,-3.488931
3,-1.856639,-0.741630,-0.659115,0.480230,1.427569,-1.390488
6,-0.622770,0.576408,-1.418174,0.234617,-2.298688,-1.664684
9,0.144419,-1.442894,0.542897,0.913752,-0.719872,0.172177
12,-2.008801,-0.105574,-0.123445,-0.103497,0.330018,1.025630
15,-0.648124,0.921015,-0.042529,-1.105254,-2.051500,0.686010
18,0.390465,0.191777,0.093593,-1.178423,-0.874159,0.755770
21,0.532867,1.231472,0.203445,-1.304901,1.376622,0.936794
24,1.406641,0.075832,-0.586003,1.272955,-0.002525,0.303566
27,2.423388,0.549470,1.237121,-0.966267,-0.169228,0.365107
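A `DataFrame` can also be built from a dictionary of lists, where each key becomes a column label and, unlike in a NumPy array, every column may have its own type. A sketch reusing the `animals` and `numbers` lists from above (the row labels are hypothetical):

```python
import pandas as pd

animals = ['Tiger', 'Bear', 'Moose']
numbers = [1, 2, 3]

# Each dictionary key becomes a column label and each list becomes a column
df = pd.DataFrame({'animal': animals, 'number': numbers},
                  index=['a', 'b', 'c'])   # hypothetical row labels
print(df)
print(df.dtypes)   # each column keeps its own type (object and int64)
```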


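As you might have noticed, large dataframes are a bit unwieldy to display. There are, however, some methods that give a quick idea of the data in a frame: `head` and `tail` show the first and last five rows, `info` summarizes the index, the columns and their types, and `describe` computes summary statistics of each numeric column.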

In [70]:
u.head()

,A,B,C,D,E,F
0,-0.662239,-0.860211,-0.693941,0.180686,0.522984,-3.488931
3,-1.856639,-0.74163,-0.659115,0.48023,1.427569,-1.390488
6,-0.62277,0.576408,-1.418174,0.234617,-2.298688,-1.664684
9,0.144419,-1.442894,0.542897,0.913752,-0.719872,0.172177
12,-2.008801,-0.105574,-0.123445,-0.103497,0.330018,1.02563


In [71]:
u.tail()

,A,B,C,D,E,F
2985,0.683408,0.421711,-0.220345,0.266733,1.363646,1.69447
2988,-0.550528,-0.965207,1.137873,1.457458,-0.443835,-1.209277
2991,1.242121,0.273902,0.267935,0.701894,-1.348973,-0.487914
2994,0.414032,-0.06176,-0.582928,0.18152,0.624948,0.618543
2997,-0.543866,-0.115203,-0.379113,2.51803,-0.155285,1.529563


In [72]:
u.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 2997
Data columns (total 6 columns):
A    1000 non-null float64
B    1000 non-null float64
C    1000 non-null float64
D    1000 non-null float64
E    1000 non-null float64
F    1000 non-null float64
dtypes: float64(6)
memory usage: 54.7 KB


In [73]:
u.describe()

,A,B,C,D,E,F
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.043603,-0.02589,-0.004746,-0.036378,0.038894,-0.039051
std,0.999383,0.978131,1.001992,1.000319,1.020122,1.020924
min,-2.486541,-3.525007,-3.143087,-3.113421,-2.952576,-3.488931
25%,-0.67428,-0.682204,-0.693943,-0.704353,-0.651564,-0.715428
50%,0.050285,-0.041296,-0.016096,-0.037925,-0.015326,-0.077394
75%,0.717496,0.62836,0.67594,0.647556,0.77289,0.643107
max,3.298376,3.073995,3.452285,2.76927,2.781209,3.584851


You can also change the maximum number of rows that are displayed:

In [74]:
pd.set_option('display.max_rows', 15)

u

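,A,B,C,D,E,F
0,-0.662239,-0.860211,-0.693941,0.180686,0.522984,-3.488931
3,-1.856639,-0.741630,-0.659115,0.480230,1.427569,-1.390488
6,-0.622770,0.576408,-1.418174,0.234617,-2.298688,-1.664684
9,0.144419,-1.442894,0.542897,0.913752,-0.719872,0.172177
12,-2.008801,-0.105574,-0.123445,-0.103497,0.330018,1.025630
15,-0.648124,0.921015,-0.042529,-1.105254,-2.051500,0.686010
18,0.390465,0.191777,0.093593,-1.178423,-0.874159,0.755770
...,...,...,...,...,...,...
2979,0.418268,0.899213,1.318929,-0.665430,-0.668137,0.401558
2982,-0.485414,-0.663386,-0.639191,0.926521,0.407382,0.348006
2985,0.683408,0.421711,-0.220345,0.266733,1.363646,1.694470
2988,-0.550528,-0.965207,1.137873,1.457458,-0.443835,-1.209277
2991,1.242121,0.273902,0.267935,0.701894,-1.348973,-0.487914
2994,0.414032,-0.061760,-0.582928,0.181520,0.624948,0.618543
2997,-0.543866,-0.115203,-0.379113,2.518030,-0.155285,1.529563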


### Indexing/Slicing in Pandas

The easiest way of accessing information in a Pandas dataframe, equivalent to the way used in NumPy, is the `iloc` indexer, which selects by integer position. With it you can also set specific values, do conditional indexing... everything we saw before in section 2.4.

In [75]:
# Slice rows at positions 125 to 132 (132 excluded, as in NumPy) from the columns at positions 0, 2 and 5
u.iloc[125:132, [0, 2, 5]]

,A,C,F
375,0.175071,0.564787,-0.977051
378,0.574764,-1.439122,-0.379785
381,-0.010161,0.097404,0.851237
384,-1.207635,-0.896451,0.507183
387,1.719935,-1.512461,0.811479
390,-0.328843,-0.356548,-2.912631
393,-0.155489,0.053797,-0.493593


You can also select rows by their index labels with `loc`. Unlike positional slicing, a label slice includes its end point:

In [76]:
# Slice rows with labels 375 to 393 (393 included!) from columns A, C and F
u.loc[375:393, ['A', 'C', 'F']]

,A,C,F
375,0.175071,0.564787,-0.977051
378,0.574764,-1.439122,-0.379785
381,-0.010161,0.097404,0.851237
384,-1.207635,-0.896451,0.507183
387,1.719935,-1.512461,0.811479
390,-0.328843,-0.356548,-2.912631
393,-0.155489,0.053797,-0.493593


The usual `[]` with an integer slice selects rows by their position (row number), like `iloc`:

In [77]:
# Slice rows at positions 125 to 132 (132 excluded), then columns A, C and F
u[125:132][['A', 'C', 'F']]

,A,C,F
375,0.175071,0.564787,-0.977051
378,0.574764,-1.439122,-0.379785
381,-0.010161,0.097404,0.851237
384,-1.207635,-0.896451,0.507183
387,1.719935,-1.512461,0.811479
390,-0.328843,-0.356548,-2.912631
393,-0.155489,0.053797,-0.493593
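The difference between the end points of `iloc` and `loc` is easy to miss with 1000 rows. This sketch uses a tiny illustrative frame whose labels (0, 3, 6, ...) differ from the positions (0, 1, 2, ...):

```python
import pandas as pd

# Labels 0, 3, 6, 9, 12, 15 at positions 0 to 5
df = pd.DataFrame({'x': range(6)}, index=range(0, 18, 3))

by_position = df.iloc[1:3].index.tolist()
by_label = df.loc[3:9].index.tolist()

print(by_position)   # [3, 6]     positions 1 and 2; the end is excluded
print(by_label)      # [3, 6, 9]  labels 3 to 9; the end is included
```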


There are, however, other ways of accessing the data in a Pandas dataframe that have a more "direct" connection with its actual content. Individual columns, or sets of columns, can be accessed by their names: choosing a single column gives a `Series`, while choosing two or more produces a `DataFrame`:

In [78]:
u['A'].head()

0    -0.662239
3    -1.856639
6    -0.622770
9     0.144419
12   -2.008801
Name: A, dtype: float64

In [79]:
u[['A', 'D']].head()

,A,D
0,-0.662239,0.180686
3,-1.856639,0.48023
6,-0.62277,0.234617
9,0.144419,0.913752
12,-2.008801,-0.103497


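You can even access a single column as an attribute, without brackets (this only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method):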

In [80]:
u.A.head()

0    -0.662239
3    -1.856639
6    -0.622770
9     0.144419
12   -2.008801
Name: A, dtype: float64

Or you can select just the rows that satisfy some condition:

In [81]:
u[u.D > 2]

,A,B,C,D,E,F
354,-1.190963,2.016324,-0.091702,2.751617,0.380951,0.169527
447,1.092366,-0.957381,0.160103,2.424202,0.461510,1.396946
480,1.557415,-2.231186,-1.307552,2.151007,1.596920,-1.338196
549,0.748195,-0.331447,-0.684377,2.290553,-1.891842,-0.171652
684,-1.495301,1.348423,0.879932,2.387653,-0.956090,0.008358
783,-1.064988,-0.934688,-1.154213,2.205864,-0.257541,-0.010576
1182,-0.361148,0.847534,0.017291,2.741245,1.432786,-0.161272
...,...,...,...,...,...,...
2373,-0.054766,0.107712,-0.260757,2.650631,-0.642320,-0.590547
2469,0.148001,0.588214,-1.115547,2.010975,-0.245087,-0.427692


In [82]:
u[~(u.D > 2)]  # For the inverse of u.D > 2

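,A,B,C,D,E,F
0,-0.662239,-0.860211,-0.693941,0.180686,0.522984,-3.488931
3,-1.856639,-0.741630,-0.659115,0.480230,1.427569,-1.390488
6,-0.622770,0.576408,-1.418174,0.234617,-2.298688,-1.664684
9,0.144419,-1.442894,0.542897,0.913752,-0.719872,0.172177
12,-2.008801,-0.105574,-0.123445,-0.103497,0.330018,1.025630
15,-0.648124,0.921015,-0.042529,-1.105254,-2.051500,0.686010
18,0.390465,0.191777,0.093593,-1.178423,-0.874159,0.755770
...,...,...,...,...,...,...
2973,-0.439616,-1.109086,-0.939723,1.892993,-0.334205,-0.031487
2979,0.418268,0.899213,1.318929,-0.665430,-0.668137,0.401558
2982,-0.485414,-0.663386,-0.639191,0.926521,0.407382,0.348006
2985,0.683408,0.421711,-0.220345,0.266733,1.363646,1.694470
2988,-0.550528,-0.965207,1.137873,1.457458,-0.443835,-1.209277
2991,1.242121,0.273902,0.267935,0.701894,-1.348973,-0.487914
2994,0.414032,-0.061760,-0.582928,0.181520,0.624948,0.618543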


`DataFrame.query` serves the same purpose. Although it is less flexible than boolean indexing, it is often shorter to write (especially when the DataFrame's name is longer than just `u`) and can be faster on large DataFrames:

In [83]:
u.query('D > 2')

,A,B,C,D,E,F
354,-1.190963,2.016324,-0.091702,2.751617,0.380951,0.169527
447,1.092366,-0.957381,0.160103,2.424202,0.461510,1.396946
480,1.557415,-2.231186,-1.307552,2.151007,1.596920,-1.338196
549,0.748195,-0.331447,-0.684377,2.290553,-1.891842,-0.171652
684,-1.495301,1.348423,0.879932,2.387653,-0.956090,0.008358
783,-1.064988,-0.934688,-1.154213,2.205864,-0.257541,-0.010576
1182,-0.361148,0.847534,0.017291,2.741245,1.432786,-0.161272
...,...,...,...,...,...,...
2373,-0.054766,0.107712,-0.260757,2.650631,-0.642320,-0.590547
2469,0.148001,0.588214,-1.115547,2.010975,-0.245087,-0.427692
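Combining several conditions shows the difference in syntax more clearly. In this sketch (with a freshly generated `u` and an illustrative `threshold`), boolean indexing needs `&` and parentheses, while `query` accepts `and`, and `@` refers to a Python variable:

```python
import numpy as np
import pandas as pd

u = pd.DataFrame(np.random.randn(1000, 6), columns=list('ABCDEF'))
threshold = 2

# Boolean indexing: each condition in parentheses, combined with &
mask_version = u[(u.A > 0) & (u.D > threshold)]

# query: the same filter as a string; @ refers to a Python variable
query_version = u.query('A > 0 and D > @threshold')

print(mask_version.equals(query_version))   # True
```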


### Reshaping `DataFrame`s

In [84]:
df1 = pd.DataFrame()

df1['sample'] = ['A', 'A', 'A', 'B', 'B', 'B']
df1['replicate'] = ['01', '02', '03', '01', '02', '03']
df1['protein'] = 'P02768'
df1['value1'] = np.random.randn(6)

df1

,sample,replicate,protein,value1
0,A,01,P02768,-0.799373
1,A,02,P02768,-0.455
2,A,03,P02768,-0.226954
3,B,01,P02768,-0.0043
4,B,02,P02768,0.605822
5,B,03,P02768,1.044721


In [150]:
pivot_df1 = df1.pivot(index='replicate', columns='sample', values='value1')

pivot_df1.head()

sample,A,B
replicate,,
01,-0.799373,-0.0043
02,-0.455,0.605822
03,-0.226954,1.044721
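`pivot` turns a "long" table into a "wide" one: the values of `index` become the row labels, those of `columns` become the column labels, and `values` fills the cells. `melt` goes the other way. A sketch on a small illustrative table:

```python
import pandas as pd

long_df = pd.DataFrame({'sample': ['A', 'A', 'B', 'B'],
                        'replicate': ['01', '02', '01', '02'],
                        'value1': [1.0, 2.0, 3.0, 4.0]})    # illustrative values

# Long -> wide: one row per replicate, one column per sample
wide = long_df.pivot(index='replicate', columns='sample', values='value1')
print(wide)

# Wide -> long: melt stacks the sample columns back into rows
back = wide.reset_index().melt(id_vars='replicate', var_name='sample',
                               value_name='value1')
print(back)
```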


### Computing With `DataFrames`

You can calculate with `DataFrame`s or their columns (which are `Series`) the same way you could with arrays:

In [86]:
df1['value2'] = 1 / df1['value1']
df1.head()

,sample,replicate,protein,value1,value2
0,A,01,P02768,-0.799373,-1.25098
1,A,02,P02768,-0.455,-2.197804
2,A,03,P02768,-0.226954,-4.406174
3,B,01,P02768,-0.0043,-232.575816
4,B,02,P02768,0.605822,1.650651


In [151]:
df1[['value1', 'value2']].mean()  # restrict to the numeric columns

value1     0.027486
value2   -39.637155
dtype: float64

You can apply functions to the whole dataset or to specific columns with the `apply` method. `apply` acts on a whole column at a time (i.e. a Pandas `Series`), so you can compute things that depend on several values of the column, for instance the mean value. To apply a function truly element by element, use `Series.apply` or `DataFrame.applymap` (renamed `DataFrame.map` in pandas 2.1).

In [152]:
def mn(col):
    return sum(col) / len(col)

df1[['value1', 'value2']].apply(mn)

value1     0.027486
value2   -39.637155
dtype: float64

While many such computations have built-in methods (the mean, for instance, is simply `.mean()`), `apply` also works on columns of strings or categorical data, where no arithmetic is defined. The limit is your imagination.
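The difference between column-wise and element-wise application can be sketched on a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'sample': ['A', 'A', 'B'],
                   'value1': [1.0, -2.0, 4.0]})    # illustrative values

# DataFrame.apply passes each whole column (a Series) to the function
spread = df[['value1']].apply(lambda col: col.max() - col.min())
print(spread)

# Series.apply calls the function once for every element
lower = df['sample'].apply(str.lower)
print(lower.tolist())      # ['a', 'a', 'b']
```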

### Combining `DataFrames`

Something we will do quite often as scientists is combining data from different sources into one single source. Pandas offers different commands for this, depending on the goal.

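To begin with, new rows of data can be appended with the `append` method, as in the pandas 0.23 used here. Be aware that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so with current versions use `concat`, shown below, instead.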

In [153]:
df2 = pd.DataFrame()

df2['sample'] = ['A', 'A', 'A', 'B', 'B', 'B']
df2['replicate'] = ['01', '02', '03', '01', '02', '03']
df2['protein'] = 'P69892'
df2['value1'] = np.random.randn(6)
df2['value2'] = 1 / df2['value1']

df2

,sample,replicate,protein,value1,value2
0,A,01,P69892,0.300071,3.332542
1,A,02,P69892,0.192804,5.186606
2,A,03,P69892,-2.00234,-0.499416
3,B,01,P69892,1.063775,0.940049
4,B,02,P69892,0.910545,1.098243
5,B,03,P69892,-0.509781,-1.961625


In [154]:
df1.append(df2, ignore_index=True)

,sample,replicate,protein,value1,value2
0,A,01,P02768,-0.799373,-1.25098
1,A,02,P02768,-0.455,-2.197804
2,A,03,P02768,-0.226954,-4.406174
3,B,01,P02768,-0.0043,-232.575816
4,B,02,P02768,0.605822,1.650651
5,B,03,P02768,1.044721,0.957193
6,A,01,P69892,0.300071,3.332542
7,A,02,P69892,0.192804,5.186606
8,A,03,P69892,-2.00234,-0.499416
9,B,01,P69892,1.063775,0.940049
10,B,02,P69892,0.910545,1.098243
11,B,03,P69892,-0.509781,-1.961625


The same result can be obtained with `concat`, which also works in current versions of pandas:

In [155]:
df = pd.concat([df1, df2], ignore_index=True)

df

,sample,replicate,protein,value1,value2
0,A,01,P02768,-0.799373,-1.25098
1,A,02,P02768,-0.455,-2.197804
2,A,03,P02768,-0.226954,-4.406174
3,B,01,P02768,-0.0043,-232.575816
4,B,02,P02768,0.605822,1.650651
5,B,03,P02768,1.044721,0.957193
6,A,01,P69892,0.300071,3.332542
7,A,02,P69892,0.192804,5.186606
8,A,03,P69892,-2.00234,-0.499416
9,B,01,P69892,1.063775,0.940049
10,B,02,P69892,0.910545,1.098243
11,B,03,P69892,-0.509781,-1.961625
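Since `DataFrame.append` no longer exists in pandas 2.0 and later, adding a single row is also done with `concat`, by wrapping the row in a one-row `DataFrame`. A sketch with illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'sample': ['A', 'B'], 'value1': [0.5, 1.5]})

# The new row, as a one-row DataFrame with the same columns
new_row = pd.DataFrame({'sample': ['C'], 'value1': [2.5]})

# ignore_index=True renumbers the rows 0, 1, 2 instead of keeping 0, 1, 0
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```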


### Grouping Data

In [156]:
df.groupby('protein')[['value1', 'value2']].agg('sum')

,value1,value2
protein,,
P02768,0.164916,-237.82293
P69892,-0.044926,8.096399

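`agg` is not limited to a single function. Below is a minimal sketch of computing several statistics at once with named aggregation, which has been available since Pandas 0.25. It runs on a small `DataFrame` with illustrative values, and the output column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'protein': ['P02768'] * 3 + ['P69892'] * 3,
                   'value1': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})

# Each keyword defines an output column as (input column, aggregation)
summary = df.groupby('protein').agg(mean_value1=('value1', 'mean'),
                                    max_value1=('value1', 'max'),
                                    n=('value1', 'count'))
print(summary)
```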

In [157]:
df.groupby(['protein', 'sample'])[['value1', 'value2']].agg('sum')

| protein | sample | value1 | value2 |
|---|---|---|---|
| P02768 | A | -1.481327 | -7.854958 |
| P02768 | B | 1.646243 | -229.967973 |
| P69892 | A | -1.509464 | 8.019732 |
| P69892 | B | 1.464538 | 0.076667 |


In [158]:
df.groupby(['protein', 'sample', 'replicate'])[['value1', 'value2']].agg('sum')

| protein | sample | replicate | value1 | value2 |
|---|---|---|---|---|
| P02768 | A | 01 | -0.799373 | -1.25098 |
| P02768 | A | 02 | -0.455 | -2.197804 |
| P02768 | A | 03 | -0.226954 | -4.406174 |
| P02768 | B | 01 | -0.0043 | -232.575816 |
| P02768 | B | 02 | 0.605822 | 1.650651 |
| P02768 | B | 03 | 1.044721 | 0.957193 |
| P69892 | A | 01 | 0.300071 | 3.332542 |
| P69892 | A | 02 | 0.192804 | 5.186606 |
| P69892 | A | 03 | -2.00234 | -0.499416 |
| P69892 | B | 01 | 1.063775 | 0.940049 |
| P69892 | B | 02 | 0.910545 | 1.098243 |
| P69892 | B | 03 | -0.509781 | -1.961625 |


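While `agg` returns one row per group, `transform` returns an object with the same index as the original `DataFrame`, in which every row receives the value computed for its group:

In [161]: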
df.groupby('protein').transform(np.mean)

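|   | replicate | value1 | value2 |
|---|---|---|---|
| 0 | 1700502000.0 | 0.027486 | -39.637155 |
| 1 | 1700502000.0 | 0.027486 | -39.637155 |
| 2 | 1700502000.0 | 0.027486 | -39.637155 |
| 3 | 1700502000.0 | 0.027486 | -39.637155 |
| 4 | 1700502000.0 | 0.027486 | -39.637155 |
| 5 | 1700502000.0 | 0.027486 | -39.637155 |
| 6 | 1700502000.0 | -0.007488 | 1.3494 |
| 7 | 1700502000.0 | -0.007488 | 1.3494 |
| 8 | 1700502000.0 | -0.007488 | 1.3494 |
| 9 | 1700502000.0 | -0.007488 | 1.3494 |
| 10 | 1700502000.0 | -0.007488 | 1.3494 |
| 11 | 1700502000.0 | -0.007488 | 1.3494 |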


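Note that the string column `replicate` has been "averaged" into a meaningless number and `sample` has been dropped. Recent versions of Pandas raise an error here instead. It is safer to select the numeric columns first:

In [114]:
df.groupby('protein')[['value1', 'value2']].transform('mean')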

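|   | value1 | value2 |
|---|---|---|
| 0 | 0.027486 | -39.637155 |
| 1 | 0.027486 | -39.637155 |
| 2 | 0.027486 | -39.637155 |
| 3 | 0.027486 | -39.637155 |
| 4 | 0.027486 | -39.637155 |
| 5 | 0.027486 | -39.637155 |
| 6 | 0.258036 | -1.046025 |
| 7 | 0.258036 | -1.046025 |
| 8 | 0.258036 | -1.046025 |
| 9 | 0.258036 | -1.046025 |
| 10 | 0.258036 | -1.046025 |
| 11 | 0.258036 | -1.046025 |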

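Because `transform` keeps the original index, its result can be combined row by row with the original columns. A typical use is centering each measurement on the mean of its group. The sketch below shows this on illustrative values.

```python
import pandas as pd

df = pd.DataFrame({'protein': ['P02768', 'P02768', 'P69892', 'P69892'],
                   'value1': [1.0, 3.0, 10.0, 14.0]})

# Mean of each protein, broadcast back to every row of that protein
group_mean = df.groupby('protein')['value1'].transform('mean')

# The subtraction aligns on the index, so each row loses its own group's mean
df['value1_centered'] = df['value1'] - group_mean
print(df)
```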

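Iterating over a `groupby` object yields, for each group, its key (a tuple when grouping by several columns) and a `DataFrame` with the rows of that group:

In [162]: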
for g, g_df in df.groupby(['protein', 'sample']):
    print(g_df)
    print(f"{g} --> mean value1: {np.mean(g_df['value1'])}")
    print(f"      mean value2: {np.mean(g_df['value2'])}\n")

  sample replicate protein    value1    value2
0      A        01  P02768 -0.799373 -1.250980
1      A        02  P02768 -0.455000 -2.197804
2      A        03  P02768 -0.226954 -4.406174
('P02768', 'A') --> mean value1: -0.49377574651694944
      mean value2: -2.618319192097318

  sample replicate protein    value1      value2
3      B        01  P02768 -0.004300 -232.575816
4      B        02  P02768  0.605822    1.650651
5      B        03  P02768  1.044721    0.957193
('P02768', 'B') --> mean value1: 0.5487477471544414
      mean value2: -76.65599087675166

  sample replicate protein    value1    value2
6      A        01  P69892  0.300071  3.332542
7      A        02  P69892  0.192804  5.186606
8      A        03  P69892 -2.002340 -0.499416
('P69892', 'A') --> mean value1: -0.5031547793839836
      mean value2: 2.673244002156377

   sample replicate protein    value1    value2
9       B        01  P69892  1.063775  0.940049
10      B        02  P69892  0.910545  1.098243
11      B

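When only one group is needed, a loop is not necessary: `get_group` returns the rows of a single group given its key. A small sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'protein': ['P02768', 'P02768', 'P69892'],
                   'sample': ['A', 'B', 'A'],
                   'value1': [-0.8, 0.6, 0.3]})

grouped = df.groupby(['protein', 'sample'])

# With several grouping columns, the key is a tuple with one entry per column
group = grouped.get_group(('P02768', 'B'))
print(group)
```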
`describe` computes the usual summary statistics for every group at once:

In [163]:
df.groupby(['protein', 'sample']).describe()

| protein | sample | value1 count | value1 mean | value1 std | value1 min | value1 25% | value1 50% | value1 75% | value1 max | value2 count | value2 mean | value2 std | value2 min | value2 25% | value2 50% | value2 75% | value2 max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P02768 | A | 3.0 | -0.493776 | 0.288173 | -0.799373 | -0.627186 | -0.455 | -0.340977 | -0.226954 | 3.0 | -2.618319 | 1.619085 | -4.406174 | -3.301989 | -2.197804 | -1.724392 | -1.25098 |
| P02768 | B | 3.0 | 0.548748 | 0.526834 | -0.0043 | 0.300761 | 0.605822 | 0.825271 | 1.044721 | 3.0 | -76.655991 | 135.030975 | -232.575816 | -115.809312 | 0.957193 | 1.303922 | 1.650651 |
| P69892 | A | 3.0 | -0.503155 | 1.29944 | -2.00234 | -0.904768 | 0.192804 | 0.246438 | 0.300071 | 3.0 | 2.673244 | 2.899779 | -0.499416 | 1.416563 | 3.332542 | 4.259574 | 5.186606 |
| P69892 | B | 3.0 | 0.488179 | 0.867649 | -0.509781 | 0.200382 | 0.910545 | 0.98716 | 1.063775 | 3.0 | 0.025556 | 1.722765 | -1.961625 | -0.510788 | 0.940049 | 1.019146 | 1.098243 |

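The result of `describe` has two levels of column labels: the original column, then the statistic. A single statistic can therefore be selected with a tuple. A sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'sample': ['A', 'A', 'B', 'B'],
                   'value1': [1.0, 3.0, 5.0, 9.0],
                   'value2': [0.5, 0.5, 2.0, 4.0]})

stats = df.groupby('sample').describe()

# Column labels are (column, statistic) pairs, e.g. ('value1', 'mean')
print(stats[('value1', 'mean')])
```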

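Finally, `pivot_table` reshapes the data into a spreadsheet-like table. The unique values of `index` become the rows and those of `columns` become the columns. The entries of the `values` columns that fall into each cell are then combined with `aggfunc`:

In [164]:
df.pivot_table(index='protein',
               columns='sample',
               values=['value1', 'value2'],
               aggfunc='mean')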

| protein | value1 (sample A) | value1 (sample B) | value2 (sample A) | value2 (sample B) |
|---|---|---|---|---|
| P02768 | -0.493776 | 0.548748 | -2.618319 | -76.655991 |
| P69892 | -0.503155 | 0.488179 | 2.673244 | 0.025556 |


`aggfunc` can also be a dictionary that assigns a different aggregation to each column:

In [165]:
df.pivot_table(index='protein',
               columns='sample',
               aggfunc={'value1': 'min',
                        'value2': 'max'})

| protein | value1 (sample A) | value1 (sample B) | value2 (sample A) | value2 (sample B) |
|---|---|---|---|---|
| P02768 | -0.799373 | -0.0043 | -1.25098 | 1.650651 |
| P69892 | -2.00234 | -0.509781 | 5.186606 | 1.098243 |

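`pivot_table` can be seen as a `groupby` over the `index` and `columns` variables, followed by reshaping. The sketch below illustrates that equivalence for a single column on illustrative data. It uses `unstack` to move the inner group level (`sample`) from the rows to the columns.

```python
import pandas as pd

df = pd.DataFrame({'protein': ['P02768', 'P02768', 'P02768', 'P69892', 'P69892', 'P69892'],
                   'sample': ['A', 'A', 'B', 'A', 'B', 'B'],
                   'value1': [1.0, 3.0, 5.0, 2.0, 4.0, 8.0]})

pivot = df.pivot_table(index='protein', columns='sample', values='value1', aggfunc='mean')

# Same numbers: mean per (protein, sample), then 'sample' moved from the rows to the columns
grouped = df.groupby(['protein', 'sample'])['value1'].mean().unstack()

print(pivot)
print(grouped)
```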

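### Loading and Saving `DataFrames`

To save and load Pandas `DataFrames` we will use the `to_csv` method and the `read_csv` function. Since `to_csv` also writes the index as the first column of the file, we pass `index_col=0` to `read_csv` to use that column as the index again: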

In [143]:
df.to_csv('test.csv')
pd.read_csv('test.csv', index_col=0)

|   | sample | replicate | protein | value1 | value2 |
|---|---|---|---|---|---|
| 0 | A | 1 | P02768 | -0.799373 | -1.25098 |
| 1 | A | 2 | P02768 | -0.455 | -2.197804 |
| 2 | A | 3 | P02768 | -0.226954 | -4.406174 |
| 3 | B | 1 | P02768 | -0.0043 | -232.575816 |
| 4 | B | 2 | P02768 | 0.605822 | 1.650651 |
| 5 | B | 3 | P02768 | 1.044721 | 0.957193 |
| 6 | A | 1 | P69892 | -0.282452 | -3.540424 |
| 7 | A | 2 | P69892 | -0.631005 | -1.584773 |
| 8 | A | 3 | P69892 | 0.200593 | 4.98523 |
| 9 | B | 1 | P69892 | 1.725466 | 0.579554 |

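A CSV file stores only text, so `read_csv` has to guess the type of each column. As a result, strings such as `'01'` in `replicate` are read back as the integer `1`. Below is a minimal sketch of keeping them as strings with the `dtype` argument of `read_csv`. An in-memory buffer from `io.StringIO` stands in for `test.csv`.

```python
import io
import pandas as pd

df = pd.DataFrame({'sample': ['A', 'A'],
                   'replicate': ['01', '02'],
                   'value1': [-0.799373, -0.455]})

buffer = io.StringIO()
df.to_csv(buffer)   # the index is written as the first, unnamed column
buffer.seek(0)

# Default behaviour: the type of 'replicate' is guessed (integers)
guessed = pd.read_csv(buffer, index_col=0)
buffer.seek(0)

# Forcing 'replicate' to be read as text keeps the leading zeros
typed = pd.read_csv(buffer, index_col=0, dtype={'replicate': str})

print(guessed['replicate'].tolist())  # [1, 2]
print(typed['replicate'].tolist())    # ['01', '02']
```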

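Pandas also has dedicated commands to save and load Excel spreadsheets (yay!). To use them with `.xlsx` files you will need the `openpyxl` package. Older setups also used `xlrd` for reading, but recent versions of `xlrd` only support the legacy `.xls` format.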

In [144]:
df.to_excel('test.xlsx', sheet_name='My sheet')
pd.read_excel('test.xlsx', 'My sheet', index_col=0)

|   | sample | replicate | protein | value1 | value2 |
|---|---|---|---|---|---|
| 0 | A | 1 | P02768 | -0.799373 | -1.25098 |
| 1 | A | 2 | P02768 | -0.455 | -2.197804 |
| 2 | A | 3 | P02768 | -0.226954 | -4.406174 |
| 3 | B | 1 | P02768 | -0.0043 | -232.575816 |
| 4 | B | 2 | P02768 | 0.605822 | 1.650651 |
| 5 | B | 3 | P02768 | 1.044721 | 0.957193 |
| 6 | A | 1 | P69892 | -0.282452 | -3.540424 |
| 7 | A | 2 | P69892 | -0.631005 | -1.584773 |
| 8 | A | 3 | P69892 | 0.200593 | 4.98523 |
| 9 | B | 1 | P69892 | 1.725466 | 0.579554 |


**Exercise 5**: Download [this dataset](https://raw.githubusercontent.com/ChihChengLiang/pokemongor/master/data-raw/pokemons.csv) and load it, using the first column as the index. Take a look at it, and do the following things:
- Choose the columns 'Identifier', 'BaseStamina', 'BaseAttack', 'BaseDefense', 'Type1' and 'Type2' 
- Create a function that lowercases strings and apply it to 'Type1' and 'Type2' (*Extra: just capitalize the strings, i.e., leave the first letter uppercase and lowercase the rest*)
- Create a function that returns a Boolean value (don't be afraid of this: it is simply a function that returns either `True` or `False`) telling whether a Pokémon has high stamina (BaseStamina > 170) or not. Store this information in a new column and show the list of Pokémon with high stamina
- Show the instructor the last 15 rows of your dataset

In [145]:
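df = pd.read_csv('https://raw.githubusercontent.com/ChihChengLiang/pokemongor/master/data-raw/pokemons.csv',
                 index_col=0)

# .copy() makes df an independent DataFrame and avoids a SettingWithCopyWarning below
df = df[['Identifier', 'BaseStamina', 'BaseAttack', 'BaseDefense', 'Type1', 'Type2']].copy()

def capitalize(st):
    # leave missing values (e.g. Pokémon without a second type) untouched
    return st.capitalize() if isinstance(st, str) else st

for col in ['Type1', 'Type2']:
    df[col] = df[col].apply(capitalize)

def highstamina(x):
    # the comparison already evaluates to True or False
    return x > 170

df['HighStamina'] = df['BaseStamina'].apply(highstamina)

# a Boolean column can be used directly as a mask
print(df[df['HighStamina']]['Identifier'])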

df.tail(15)

PkMn
31      Nidoqueen
36       Clefable
39     Jigglypuff
40     Wigglytuff
59       Arcanine
62      Poliwrath
68        Machamp
          ...    
143       Snorlax
144      Articuno
145        Zapdos
146       Moltres
149     Dragonite
150        Mewtwo
151           Mew
Name: Identifier, Length: 26, dtype: object


| PkMn | Identifier | BaseStamina | BaseAttack | BaseDefense | Type1 | Type2 | HighStamina |
|---|---|---|---|---|---|---|---|
| 137 | Porygon | 130 | 156 | 158 | Normal |  | False |
| 138 | Omanyte | 70 | 132 | 160 | Rock | Water | False |
| 139 | Omastar | 140 | 180 | 202 | Rock | Water | False |
| 140 | Kabuto | 60 | 148 | 142 | Rock | Water | False |
| 141 | Kabutops | 120 | 190 | 190 | Rock | Water | False |
| 142 | Aerodactyl | 160 | 182 | 162 | Rock | Flying | False |
| 143 | Snorlax | 320 | 180 | 180 | Normal |  | True |
| 144 | Articuno | 180 | 198 | 242 | Ice | Flying | True |
| 145 | Zapdos | 180 | 232 | 194 | Electric | Flying | True |
| 146 | Moltres | 180 | 242 | 194 | Fire | Flying | True |

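The solution above uses `apply` with our own functions, as the exercise asks. For comparison, here is a minimal sketch of the same two steps with vectorized Pandas operations. The `.str` accessor applies string methods element-wise and skips missing values. Comparing a column with a number already returns a Boolean column. The rows are a few Pokémon from the output above, with hypothetical uppercase type names so that the capitalization has an effect.

```python
import numpy as np
import pandas as pd

# A few rows from the output above; the uppercase types are hypothetical input
pokemon = pd.DataFrame({'Identifier': ['Porygon', 'Snorlax', 'Articuno'],
                        'BaseStamina': [130, 320, 180],
                        'Type1': ['NORMAL', 'NORMAL', 'ICE'],
                        'Type2': [np.nan, np.nan, 'FLYING']},
                       index=pd.Index([137, 143, 144], name='PkMn'))

# .str.capitalize() works element-wise and leaves NaN untouched
for col in ['Type1', 'Type2']:
    pokemon[col] = pokemon[col].str.capitalize()

# The comparison itself produces a Boolean Series, so no helper function is needed
pokemon['HighStamina'] = pokemon['BaseStamina'] > 170

print(pokemon[pokemon['HighStamina']]['Identifier'].tolist())  # ['Snorlax', 'Articuno']
```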