# Querying Pandas Data Series

In [5]:
import pandas as pd

animals = pd.Series(['Tiger', 'Lion', 'Zebra'], index=['Asia', 'Africa', 'North Korea'])

In [6]:
animals.iloc[1]

'Lion'

In [7]:
animals.Africa

'Lion'

In [8]:
animals[1]

'Lion'

In [9]:
stores = pd.Series(['Mac', 'Apple', 'Gucci'])

In [10]:
stores

0      Mac
1    Apple
2    Gucci
dtype: object

In [12]:
stores[1]

'Apple'

Trying to index on a number indexed list is not possible

In [32]:
stores = pd.Series({
    99:'Mac',
    100:'Apple',
    101:'Gucci'
})

In [33]:
stores[0]

KeyError: 0

We have to specifically index using the df.iloc[] notation

In [35]:
stores.iloc[1]

'Apple'

Applying a given function on all items in a series

In [37]:
# Numpy supports vectorization for highly efficient computations
import numpy as np

s = [1,2,3,4,5,6]
total = np.sum(s)
total

21

In [60]:
# Testing the speed of the implementations
s = pd.Series(np.random.randint(1,1000, 10000))

In [49]:
%%timeit
result = 0
for i in s:
    result += s
result

10 loops, best of 3: 80.9 ms per loop


In [50]:
%%timeit -n 100
s.sum()

100 loops, best of 3: 10.2 µs per loop


In [51]:
%%timeit -n 100
np.sum(s)

100 loops, best of 3: 12.6 µs per loop


In [52]:
%%timeit -n 100
s.sum()

100 loops, best of 3: 9.39 µs per loop


We see that the Series Implementation in Pandas is actually the fastest, and that the implementation of sum uses caching to further reduce the necessary computation time for already computed results

#### Increase values

In [70]:
%%timeit -n 10

s = pd.Series(np.random.randint(0,1000, 10000))
for label, value in s.iteritems():
    s.loc[label] = value+2


10 loops, best of 3: 1.23 s per loop


In [71]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000, 10000))
s += 2

10 loops, best of 3: 384 µs per loop


Assign Values

In [72]:
animals = pd.Series(range(1,10))

In [77]:
animals.loc[1] = 'Animal'
animals

0         1
1    Animal
2         3
3         4
4         5
5         6
6         7
7         8
8         9
dtype: object

### Appending existing Series

Enables to create new objects from the concatenation of existing series.

In [83]:
new = animals.append(s).head(15)
new

0         1
1    Animal
2         3
3         4
4         5
5         6
6         7
7         8
8         9
0       511
1       221
2       196
3       963
4        70
5       728
dtype: object

In [84]:
# This does not change objects in place
animals

0         1
1    Animal
2         3
3         4
4         5
5         6
6         7
7         8
8         9
dtype: object