# Series
A pandas Series is a one-dimensional labeled array, and a faster alternative to a Python list.

In [2]:
# pandas is for data manipulation
# NumPy is for numerical computation

!pip3 install pandas
import pandas as pd

!pip3 install numpy
import numpy as np



In [3]:
animals = ['Tiger', 'Bear', 'Moose']
pd.Series(animals)

0    Tiger
1     Bear
2    Moose
dtype: object

In [4]:
numbers = [1,2,3]
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

## Type conversion for Series


In [5]:
# animals with None value
animals = ['Tiger', 'Bear', None]
pd.Series(animals)

0    Tiger
1     Bear
2     None
dtype: object

In [6]:
# numbers with None value
numbers = [1,2,None]
pd.Series(numbers)
# pandas chose float64 and converted None to NaN (not a number)


0    1.0
1    2.0
2    NaN
dtype: float64

NaN is not equal to itself, so test for it with `np.isnan()` rather than `==`:

In [None]:
np.nan == np.nan  # Returns False
np.isnan(np.nan)  # Returns True
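
The same check can be applied to a whole Series. The following is a minimal sketch, assuming the `numbers` list from the earlier cell; `isna()` is the pandas method for flagging missing entries, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the earlier cell: None becomes NaN in a float64 Series
numbers = pd.Series([1, 2, None])

# NaN never compares equal to anything, including itself
nan_equals_itself = (np.nan == np.nan)

# isna() returns a boolean Series marking the missing entries
missing_mask = numbers.isna()
missing_count = int(missing_mask.sum())

print(nan_equals_itself)
print(missing_mask.tolist())
print(missing_count)
```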


## Dictionary to Series

In [16]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo':'Japan',
          'Taekwondo':'South Korea'}
s = pd.Series(sports)
s


Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [17]:
s.index


Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')

### Explicit use of index names


In [18]:
s1 = pd.Series(['Tiger', 'Golf', 'Taekwondo'],
               index=['India', 'America', 'Canada'])
s1

India          Tiger
America         Golf
Canada     Taekwondo
dtype: object
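
An explicit index can also be combined with a dictionary. The sketch below reuses the `sports` dictionary from above; the label `'Hockey'` is a hypothetical key added for illustration. Keys that are not listed in `index` are dropped, and listed labels that are missing from the dictionary get NaN.

```python
import pandas as pd

sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}

# 'Hockey' is a hypothetical label that is not a key in the dictionary
s2 = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])
print(s2)
```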

## Querying a Series
You can query a Series by index position or by index label.

- to query by index position, **starting at zero**, use the `iloc` attribute
- to query by index label, use the `loc` attribute

Both are attributes that take square brackets, not methods. Be explicit about whether you mean an index position **or** a label.


In [22]:
s.iloc[3]

# or you can use s[3], although recent pandas versions deprecate
# this positional fallback, so prefer iloc
s[3]

'South Korea'

In [23]:
s.loc['Golf']

# or you can use
s['Golf']

'Scotland'
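
Being explicit matters most when the labels are themselves integers. A small sketch, assuming a hypothetical Series `codes` whose integer labels do not match the positions:

```python
import pandas as pd

# Hypothetical Series: integer labels that differ from the positions
codes = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_position = codes.iloc[0]  # first element, by position
by_label = codes.loc[10]     # element whose label is 10

# With an integer index, codes[0] is a label lookup, and label 0 does not exist
try:
    codes[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_position, by_label, lookup_failed)
```

Here `iloc` and `loc` return the same element only because position 0 happens to carry label 10; the plain `codes[0]` form fails because pandas reads it as a label.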

## Vectorization
NumPy functions, such as `np.sum()`, work directly on a Series.
Vectorized operations run in optimized compiled code, so they are typically faster than an equivalent Python loop.

### Calculate the sum with `np.sum()`

In [26]:
s = pd.Series([100,120,110,3])
total = np.sum(s)
total

333
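
A Series also has its own `sum()` method, which should give the same result as `np.sum()`. A minimal sketch using the same values:

```python
import numpy as np
import pandas as pd

s = pd.Series([100, 120, 110, 3])

total_np = np.sum(s)     # NumPy function applied to the Series
total_method = s.sum()   # equivalent Series method

print(total_np, total_method)
```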

### Show the first few values of the Series with `.head()`

In [37]:
s = pd.Series(np.random.randint(0,1000,10000))
s.head()

0    588
1    866
2    419
3    458
4    214
dtype: int64

In [30]:
# Get the length of the Series with `len()`
len(s)

10000

## Vectorization Speed Test
Use `timeit`, a Jupyter magic function, to measure how long a computation takes.

A line magic is prefixed with a single `%`. Here we use `%%`, which marks a cell magic that applies to the whole cell.

As the timings below show, the vectorized version is faster.

### Without vectorization

In [42]:
%%timeit -n 100
total = 0
for i in s:
    total += i

1.67 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### With vectorization

In [44]:
%%timeit -n 100
total = np.sum(s)
total

195 µs ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
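
Outside of Jupyter, a similar comparison can be sketched with the standard-library `timeit` module. The helper `loop_sum` is a hypothetical name introduced here, and the absolute timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 1000, 10000))

def loop_sum(series):
    """Sum the values one at a time with a plain Python loop."""
    total = 0
    for value in series:
        total += value
    return total

# Run each version 100 times, mirroring `-n 100` in the cells above
loop_time = timeit.timeit(lambda: loop_sum(s), number=100)
vector_time = timeit.timeit(lambda: np.sum(s), number=100)

print(f"loop:       {loop_time:.4f} s for 100 runs")
print(f"vectorized: {vector_time:.4f} s for 100 runs")
```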


## Broadcasting
Broadcasting applies an operation to every value in the Series at once.

In [45]:
s += 2
s.head()


0    590
1    868
2    421
3    460
4    216
dtype: int64
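
For comparison, here is a sketch of what broadcasting replaces: an explicit loop that updates each value by label. It uses a small Series rather than the random one above so the result is easy to check; `looped` and `broadcast` are illustrative names.

```python
import pandas as pd

s = pd.Series([100, 120, 110, 3])

# Broadcasting: one expression adds 2 to every value
broadcast = s + 2

# Equivalent explicit loop: read each (label, value) pair and write it back
looped = s.copy()
for label, value in s.items():
    looped.loc[label] = value + 2

print(broadcast.tolist())
print(broadcast.equals(looped))
```

The loop produces the same values, but it is longer to write and, as the speed test above suggests, it is likely to be slower on large Series.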