# Series Introduction

### A Series is used to model one-dimensional data.
 - It has a single `axis`: the index

In [1]:
series = {
    'index': [0,1,2,3],
    'data': [145,142,38,13],
    'name': 'songs'
}

In [4]:
series

{'index': [0, 1, 2, 3], 'data': [145, 142, 38, 13], 'name': 'songs'}

A `get` function to extract the data by index label:

In [5]:
def get(series, idx):
    # Find the position of the label in the index, then read the data at that position
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

In [6]:
get(series,1)

142

### 1. The index abstraction

Many of the operations performed on a Series operate directly on the index or work by index lookup.

In [8]:
songs = {
    'index': ['Paul','John','George','Ringo'],
    'data': [145,142,38,13],
    'name': 'counts'
}

In [9]:
get(songs,'John')

142
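To illustrate how an operation can work purely through index lookup, here is a minimal sketch on the same dictionary model. The `select` helper is hypothetical (it is not part of the original notebook or of pandas); it only reuses the lookup idea from `get`.

```python
# Same dictionary model as in the cells above
songs = {
    'index': ['Paul', 'John', 'George', 'Ringo'],
    'data': [145, 142, 38, 13],
    'name': 'counts',
}

def select(series, labels):
    """Hypothetical helper: build a new series-like dict for the given labels."""
    # Translate each label into its position in the index
    positions = [series['index'].index(label) for label in labels]
    return {
        'index': list(labels),
        'data': [series['data'][pos] for pos in positions],
        'name': series['name'],
    }

subset = select(songs, ['Ringo', 'Paul'])
print(subset)
```

As with `get`, a label that is not in the index makes `list.index` raise a `ValueError`.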

### 2. The Pandas Series

Create a Series object from a list

In [10]:
import pandas as pd
songs2 = pd.Series([145, 142, 38, 13],
                   name='counts')

In [11]:
songs2

0    145
1    142
2     38
3     13
Name: counts, dtype: int64

- The values of the index (0, 1, 2, 3) are called axis labels.
- The data (145, 142, 38 and 13) are also called the values of the series.

- The values of a Series can be strings, floats, booleans or arbitrary Python objects.

In [12]:
songs2.index

RangeIndex(start=0, stop=4, step=1)

Note:
- The default index is a `RangeIndex` of monotonically increasing integers starting at 0.
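The default index can be replaced with our own labels, which mirrors the dictionary model from section 1. A minimal sketch, assuming the same Beatles labels as before (the name `songs3` is only for illustration):

```python
import pandas as pd

# Pass explicit labels through the index parameter
songs3 = pd.Series(
    [145, 142, 38, 13],
    index=['Paul', 'John', 'George', 'Ringo'],
    name='counts',
)

# Look up a value by its label, like get(songs, 'John') earlier
john_count = songs3['John']
print(songs3.index)
print(john_count)
```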

-> We can insert Python objects into a series

In [13]:
class Foo:
    pass

ringo = pd.Series(
    ['Richard', 'Starkey', 13, Foo()],
    name='ringo'
)

In [14]:
ringo

0                                 Richard
1                                 Starkey
2                                      13
3    <__main__.Foo object at 0x10be11d30>
Name: ringo, dtype: object
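The `object` dtype means each entry is stored as an ordinary Python object. A small sketch to check this, rebuilding the same series under the illustrative name `mixed`:

```python
import pandas as pd

class Foo:
    pass

mixed = pd.Series(['Richard', 'Starkey', 13, Foo()], name='ringo')

# Each element keeps its own Python type inside the object-dtype series
element_types = [type(value).__name__ for value in mixed]
print(mixed.dtype)
print(element_types)
```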

### 3. The NaN value

- When pandas determines that a series holds numeric values but cannot find a number to represent an entry, it will use `NaN`.
- This value stands for `Not a Number`. Aggregations such as `.sum` and `.mean` skip it by default, while element-wise arithmetic on a `NaN` entry yields `NaN` again. (Similar to `NULL` in SQL)
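A short sketch of how `NaN` behaves in practice, using made-up values. It relies on the `skipna` parameter of `Series.sum`, which defaults to `True`:

```python
import numpy as np
import pandas as pd

values = pd.Series([2, np.nan, 4])

# Element-wise arithmetic: the missing entry stays missing
plus_one = values + 1

# Aggregation: NaN is skipped by default
total = values.sum()

# Opting out of skipping makes the result NaN as well
strict_total = values.sum(skipna=False)

print(plus_one)
print(total, strict_total)
```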

In [15]:
import numpy as np

In [16]:
nan_series = pd.Series([2, np.nan],
                       index=['Ono', 'Clapton'])

In [17]:
nan_series

Ono        2.0
Clapton    NaN
dtype: float64

- `float64` supports `NaN`
- `int64` does not support `NaN`

- The `.count` method, which counts the number of values in a series, disregards NaN

In [18]:
nan_series.count()

np.int64(1)

In [19]:
nan_series.size

2
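The points above can be checked directly. This sketch, with illustrative names, shows that integers are stored as `float64` once a `NaN` is present, and that the gap between `.size` and `.count()` equals the number of missing entries:

```python
import numpy as np
import pandas as pd

ints = pd.Series([2, 3])           # no missing data
with_gap = pd.Series([2, np.nan])  # one missing entry

print(ints.dtype)      # integer dtype
print(with_gap.dtype)  # float dtype, because int64 cannot hold NaN

# .isna() marks the missing entries; summing the booleans counts them
missing = with_gap.isna().sum()
print(with_gap.size - with_gap.count(), missing)
```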

### 4. Optional Integer Support for `NaN`

- The `int64` type does not support missing data.

-> When we create a series, we can pass in `dtype='Int64'` (note the capital `I`), the nullable integer type

In [20]:
nan_series2 = pd.Series([2, np.nan],
                        index=['Ono', 'Clapton'],
                        dtype='Int64')

In [21]:
nan_series2

Ono           2
Clapton    <NA>
dtype: Int64

- Operations on these series, such as `.count`, still ignore `NaN` or `<NA>`.

In [22]:
nan_series2.count()

np.int64(1)
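A brief sketch of how a nullable integer series behaves, under the assumption that we rebuild the same data under the illustrative name `nullable`. The values stay integers and the missing entry shows up as `<NA>`:

```python
import numpy as np
import pandas as pd

nullable = pd.Series([2, np.nan],
                     index=['Ono', 'Clapton'],
                     dtype='Int64')

# Detect the missing entry
is_missing = nullable.isna().tolist()

# Aggregations skip <NA> by default
total = nullable.sum()

# Element-wise arithmetic keeps the Int64 dtype and the <NA> entry
doubled = nullable * 2

print(is_missing)
print(total)
print(doubled)
```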

### 5. Similar to NumPy

The Series object behaves similarly to a NumPy array.

In [23]:
numpy_ser = np.array([145,142,38,13])

In [24]:
songs2[1]

np.int64(142)

In [25]:
numpy_ser[1]

np.int64(142)

In [26]:
numpy_ser.mean()

np.float64(84.5)

In [27]:
songs2.mean()

np.float64(84.5)

- They have many methods in common.
- Both also have a notion of a boolean array.
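To make the boolean array idea concrete, here is a minimal sketch that filters both objects with a comparison. The threshold of 100 is arbitrary and chosen only for illustration:

```python
import numpy as np
import pandas as pd

songs2 = pd.Series([145, 142, 38, 13], name='counts')
numpy_ser = np.array([145, 142, 38, 13])

# A comparison yields one boolean per element
mask = songs2 > 100

# Indexing with the boolean array keeps only the True positions
series_hits = songs2[mask]
array_hits = numpy_ser[numpy_ser > 100]

print(mask)
print(series_hits)
print(array_hits)
```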

### 6. Categorical Data

- When we load data, we can indicate that the data is categorical.
- Categorical values have a few benefits:
    - Use less memory than strings
    - Improve performance
    - Can have an ordering
    - Can perform operations on categories
    - Enforce membership on values

-> To create a category, we pass `dtype="category"` into the Series constructor

In [29]:
s = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')

In [30]:
s

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['l', 'm', 's', 'xl', 'xs']

In [31]:
s.cat.ordered

False
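The memory benefit is easiest to see on a series with many repeated labels. A rough sketch, assuming a made-up series of 10,000 size labels; the exact byte counts depend on the pandas version, so none are shown:

```python
import pandas as pd

# 10,000 strings drawn from only five distinct labels
sizes = pd.Series(['m', 'l', 'xs', 's', 'xl'] * 2000)
as_category = sizes.astype('category')

# deep=True includes the memory used by the string objects themselves
string_bytes = sizes.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(string_bytes, category_bytes)
```

The categorical version stores each distinct label once and keeps small integer codes for the rows, which is why it comes out smaller here.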

To convert a non-categorical series to an ordered category, we can create a type with the `CategoricalDtype` constructor and the appropriate parameters.

In [32]:
s2 = pd.Series(['m', 'l', 'xs', 's', 'xl'])
size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True
)

In [33]:
s3 = s2.astype(size_type)

In [34]:
s3

0      m
1      l
2    NaN
3      s
4    NaN
dtype: category
Categories (3, object): ['s' < 'm' < 'l']
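Note that `xs` and `xl` are not among the declared categories, so they become `NaN`. With an ordered category we can also compare and sort by the declared order rather than alphabetically. A minimal sketch, repeating the cells above so that it runs on its own:

```python
import pandas as pd

s2 = pd.Series(['m', 'l', 'xs', 's', 'xl'])
size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True
)
s3 = s2.astype(size_type)

# Comparisons follow the declared order s < m < l
bigger_than_s = s3 > 's'

# Sorting uses the category order, not alphabetical order
sorted_sizes = s3.sort_values()

# min and max are available because the category is ordered
smallest, largest = s3.min(), s3.max()

print(bigger_than_s)
print(sorted_sizes)
print(smallest, largest)
```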