# Series in pandas
A set of examples that exhibit some of the core features of the [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) data type in the `pandas` module.

## import

In [1]:
import numpy as np
import pandas as pd

## Create a Series

In [2]:
# simple series with automatic numeric indices
x = pd.Series([22, 44, 66, 88])
x

0    22
1    44
2    66
3    88
dtype: int64

In [3]:
# get value with numeric index
x[2]

66

In [51]:
# with custom indices
y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y

a    22
b    44
c    66
d    88
dtype: int64

In [54]:
# get value with custom index
y['c']

66

In [55]:
# from a Python dictionary
z = pd.Series({'a': 22, 'b': 44, 'c': 66, 'd': 88})
z

a    22
b    44
c    66
d    88
dtype: int64

In [7]:
# from a scalar
a = pd.Series(5, index=[0, 1, 2, 3, 4, 5])
a

0    5
1    5
2    5
3    5
4    5
5    5
dtype: int64

## Data types
Unlike a typical numeric `numpy` ndarray, a single `pandas` Series can hold values of several different types. When it does, its dtype is reported as `object`.

In [8]:
s = pd.Series( ['hello', 44, True, 3.14, [1, 2, 3] ] )
s

0        hello
1           44
2         True
3         3.14
4    [1, 2, 3]
dtype: object

Convert data types with `astype()`.

In [9]:
# convert to string
s = s.astype(str)
s

0        hello
1           44
2         True
3         3.14
4    [1, 2, 3]
dtype: object
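As a rough illustration of how dtypes behave, the sketch below compares a Series of integers with a mixed Series and converts one with `astype()`. The variable names (`nums`, `mixed`, `element_types`, `as_float`) are made up for this example.

```python
import pandas as pd

# a Series holding only integers gets a numeric dtype
nums = pd.Series([22, 44, 66, 88])
print(nums.dtype)

# mixing types falls back to the generic "object" dtype
mixed = pd.Series(['hello', 44, True])
print(mixed.dtype)

# each element of an object Series keeps its own Python type
element_types = [type(v).__name__ for v in mixed]
print(element_types)

# astype() returns a new Series; the original is left unchanged
as_float = nums.astype(float)
print(as_float.dtype)
```

Note that `astype()` does not modify the Series in place, which is why the notebook reassigns the result back to `s`.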

## Lambda expressions

In [56]:
# run a function to transform each value in the series
s.apply(lambda x: 'hello ' + x)

0        hello hello
1           hello 44
2         hello True
3         hello 3.14
4    hello [1, 2, 3]
dtype: object

In [57]:
s

0        hello
1           44
2         True
3         3.14
4    [1, 2, 3]
dtype: object
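`apply()` also accepts a regular named function, and it always returns a new Series rather than changing the original. A minimal sketch, where `prices`, `add_tax`, and the 10% rate are hypothetical and chosen only for illustration:

```python
import pandas as pd

prices = pd.Series([22, 44, 66, 88])

def add_tax(value):
    # hypothetical helper: add a flat 10% to a single value
    return value * 1.1

# apply a named function to each value
taxed = prices.apply(add_tax)

# apply a lambda that returns a different type than the input
labels = prices.apply(lambda v: 'big' if v > 50 else 'small')

print(taxed)
print(labels)
print(prices)  # the original Series is unchanged
```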

## Naming Series
Series can be given a custom name.

In [11]:
# create a Series with a custom name
x = pd.Series([22, 44, 66, 88], name='non-anonymous Series')
x

0    22
1    44
2    66
3    88
Name: non-anonymous Series, dtype: int64

In [12]:
# get the name
x.name

'non-anonymous Series'
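The name matters mostly when a Series is combined with other data. As a small sketch (the names `scores` and `points` are arbitrary), `rename()` with a scalar returns a copy with a new name, and `to_frame()` uses the name as the column label:

```python
import pandas as pd

x = pd.Series([22, 44, 66, 88], name='scores')

# rename() with a scalar returns a new Series with a different name
renamed = x.rename('points')
print(renamed.name)

# when converted to a DataFrame, the Series name becomes the column label
frame = x.to_frame()
print(list(frame.columns))
```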

## Indexing
Accessing values from a `pandas` Series.

In [58]:
# an example Series, with string labels
x = pd.Series({'foo': 22, 'bar': 44, 'baz': 66, 'bum': 88})

In [59]:
x

foo    22
bar    44
baz    66
bum    88
dtype: int64

In [14]:
# access by label
x['bar']

44

In [15]:
# access by position (even with a custom-labeled Series)
# note: recent pandas versions deprecate positional access with plain brackets; prefer .iloc
x[1]

44

In [16]:
# access by integer index
x.iloc[1]

44

In [17]:
# access by label index
x.loc['bar']

44

In [60]:
# accessing a subset by positions
# note: recent pandas versions deprecate positional access with plain brackets; prefer .iloc
x[ [0, 1, 2] ]

foo    22
bar    44
baz    66
dtype: int64

In [19]:
# accessing a subset by integer indices
x.iloc[ [0, 1, 2] ]

foo    22
bar    44
baz    66
dtype: int64

In [20]:
# accessing a subset by label indices
x.loc[ ['foo', 'bar', 'baz'] ]

foo    22
bar    44
baz    66
dtype: int64
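The difference between `loc` and `iloc` is easiest to see when the labels are themselves integers that do not match the positions. The following sketch uses a made-up Series `s` with labels 10, 20, 30, 40:

```python
import pandas as pd

# integer labels that differ from the positions 0..3
s = pd.Series([22, 44, 66, 88], index=[10, 20, 30, 40])

# loc looks up by label
by_label = s.loc[20]

# iloc looks up by position
by_position = s.iloc[1]

# both refer to the same element here
print(by_label, by_position)

# there is no label 0, so loc raises a KeyError
try:
    s.loc[0]
    missing_label_found = True
except KeyError:
    missing_label_found = False

print(missing_label_found)
```

Using `loc` and `iloc` explicitly avoids any ambiguity about whether an integer is meant as a label or a position.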

## Slicing
Unlike `numpy` ndarrays, slicing a `pandas` Series will also slice the index.

In [21]:
# slice an automatically-indexed Series
x = pd.Series([22, 44, 66, 88])
x[2 : ]

2    66
3    88
dtype: int64

In [22]:
# the same thing, using iloc
x.iloc[2 : ]

2    66
3    88
dtype: int64

In [23]:
# slice a custom-indexed Series
y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y[2 : ]

c    66
d    88
dtype: int64

Slice syntax within the brackets, `[` and `]`, generally works the same way as regular Python list slices and `numpy` slices.
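One exception worth knowing: slicing by label with `loc` includes the end label, whereas positional slicing with `iloc` excludes the end position, as Python lists do. A short sketch:

```python
import pandas as pd

y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])

# positional slice: the end position (2) is excluded
by_position = y.iloc[0:2]
print(by_position)

# label slice: the end label ('c') is included
by_label = y.loc['a':'c']
print(by_label)
```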

## Sorting

In [24]:
# the Series before sorting (it happens to be in ascending order already)
y

a    22
b    44
c    66
d    88
dtype: int64

In [25]:
# sorted by index
y.sort_index(ascending=False)

d    88
c    66
b    44
a    22
dtype: int64

In [26]:
# sorted by value
y.sort_values(ascending=True)

a    22
b    44
c    66
d    88
dtype: int64
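Because `y` is already in ascending order, the effect of sorting is easier to see on shuffled data. The sketch below uses a hypothetical Series `u` whose values and labels are both out of order; both methods return a new Series and leave the original untouched.

```python
import pandas as pd

# values and labels are both out of order
u = pd.Series([66, 22, 88, 44], index=['b', 'd', 'a', 'c'])

# sort by value; each label travels with its value
by_value = u.sort_values()
print(by_value)

# sort by label instead
by_index = u.sort_index()
print(by_index)

# the original Series keeps its order
print(u)
```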

## Introspection
Accessing some metadata about a Series.

In [27]:
# the data type of the Series
x = pd.Series([22, 44, 66, 88])
x.dtype

dtype('int64')

In [28]:
# the shape of the Series... in this case a one-dimensional array with 4 values
y = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y.shape

(4,)

In [29]:
# get the name of a named series
x = pd.Series([22, 44, 66, 88], name="non-anonymous Series")
x.name

'non-anonymous Series'

## Simple math operations

In [61]:
# add a scalar to all values in a Series
x = pd.Series([22, 44, 66, 88])
y = x + 2
y

0    24
1    46
2    68
3    90
dtype: int64

In [31]:
# subtract a scalar from all values in a Series
x = pd.Series([22, 44, 66, 88])
x - 2

0    20
1    42
2    64
3    86
dtype: int64

In [32]:
# divide all values in a Series by a scalar
x = pd.Series([22, 44, 66, 88])
x / 11

0    2.0
1    4.0
2    6.0
3    8.0
dtype: float64

... and so on. Comparison operators also work element-wise and return a boolean Series.

In [33]:
x > 50

0    False
1    False
2     True
3     True
dtype: bool

In [34]:
x != 44

0     True
1    False
2     True
3     True
dtype: bool
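A boolean Series produced by comparisons like these can be used as a mask to select values. A minimal sketch, with `mask`, `big`, and `middle` as illustrative names:

```python
import pandas as pd

x = pd.Series([22, 44, 66, 88])

# a boolean Series marking which values exceed 50
mask = x > 50

# indexing with the mask keeps only the rows where it is True
big = x[mask]
print(big)

# combine conditions with & (and) or | (or); the parentheses are required
middle = x[(x > 30) & (x < 80)]
print(middle)
```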

In [35]:
# add two series together
x = pd.Series([22, 44, 66, 88])
x + x

0     44
1     88
2    132
3    176
dtype: int64

In [36]:
# add two series whose labels are in a different order; values are matched by label
x = pd.Series([22, 44, 66, 88], index=['a', 'b', 'c', 'd'])
y = pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])
x + y

a    25
b    48
c    68
d    89
dtype: int64

## Math operations and the alignment of labels
Unlike `numpy` ndarrays, operations on Series automatically align by labels.

In [37]:
# for example, take two Series with the same set of labels, but in different orders
a = pd.Series({'foo': 22, 'bar': 44, 'baz': 66, 'bum': 88})
b = pd.Series({'bum': 1, 'baz': 2, 'bar': 3, 'foo': 4, })

In [38]:
# math operations will be performed on values that share the same label
a + b

bar    47
baz    68
bum    89
foo    26
dtype: int64

Besides this difference, all the basic math operations (+, -, *, /) between two Series work the same way as in `numpy` ndarrays.
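Alignment also determines what happens when the two Series do not share all of their labels: any label missing from one side yields `NaN`. The sketch below uses two small made-up Series and shows `add()` with `fill_value` as one way to treat missing labels as zero.

```python
import pandas as pd

a = pd.Series({'foo': 22, 'bar': 44})
b = pd.Series({'bar': 1, 'baz': 2})

# only 'bar' exists in both; 'foo' and 'baz' become NaN
total = a + b
print(total)

# treat a label missing from one side as 0 instead
filled = a.add(b, fill_value=0)
print(filled)
```

Because `NaN` is a floating-point value, the result is a `float64` Series even though both inputs hold integers.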

## Heads and tails
When dealing with large amounts of data, it's sometimes useful to see a sample of the data without viewing the entire data set. The `head()`, `tail()`, and `sample()` methods can help with this.

In [62]:
# first, let's generate a large Series

import numpy as np # import numpy for convenience in generating a lot of sample data

# make a really big Series from a random numpy ndarray
x = pd.Series( np.random.random(5000) ) 

In [40]:
# display x with the default output (a long Series is truncated in the middle)
x

0       0.053586
1       0.623813
2       0.085380
3       0.818867
4       0.093050
          ...   
4995    0.267786
4996    0.938746
4997    0.301619
4998    0.193766
4999    0.185180
Length: 5000, dtype: float64

In [64]:
# get the head... the first few values
x.head(5)

0    0.053586
1    0.623813
2    0.085380
3    0.818867
4    0.093050
dtype: float64

In [42]:
# get the tail... the last few values
x.tail()

4995    0.267786
4996    0.938746
4997    0.301619
4998    0.193766
4999    0.185180
dtype: float64

In [43]:
# get a sample of a few random values
x.sample(5)

3319    0.337011
1945    0.383846
3598    0.724559
1708    0.172642
3350    0.913176
dtype: float64
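Since the data here is random, outputs differ from run to run. If reproducible results are wanted, one option is to seed the generator and pass `random_state` to `sample()`. The seed values 0 and 42 below are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

# a seeded generator produces the same random data on every run
rng = np.random.default_rng(seed=0)
x = pd.Series(rng.random(5000))

# head() and tail() take the number of rows to return (5 if omitted)
first_three = x.head(3)
last_three = x.tail(3)

# the same random_state gives the same sample each time
s1 = x.sample(5, random_state=42)
s2 = x.sample(5, random_state=42)

print(first_three)
print(last_three)
print(s1.equals(s2))
```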

## Basic statistics
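Basic statistical methods, like `mean()`, `median()`, `min()`, `max()`, and `std()`, work much like their `numpy` equivalents, with two differences worth noting: `std()` computes the sample standard deviation by default (`ddof=1`, where `numpy` uses `ddof=0`), and missing values (`NaN`) are skipped by default.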

In [44]:
# make a linearly-spaced series of 50 values from 1 to 100
x = pd.Series( np.linspace(1, 100, 50) ) 
x.head()

0    1.000000
1    3.020408
2    5.040816
3    7.061224
4    9.081633
dtype: float64

In [45]:
# get an overview of most common stats
x.describe()

count     50.000000
mean      50.500000
std       29.452257
min        1.000000
25%       25.750000
50%       50.500000
75%       75.250000
max      100.000000
dtype: float64

In [46]:
# calculate the mean value of the entire Series
x.mean()

50.5

In [47]:
# calculate the mean value for only those values in the Series that are less than 5
x[ x < 5 ].mean()

2.010204081632653

The other statistics functions - `min()`, `max()`, `median()`, `std()`, `count()` - work similarly.
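When comparing results against `numpy`, two defaults can produce different numbers: `Series.std()` uses `ddof=1` while `np.std()` uses `ddof=0`, and pandas skips `NaN` values while plain `numpy` functions propagate them. A brief sketch, where `with_gap` is a made-up Series for illustration:

```python
import numpy as np
import pandas as pd

x = pd.Series(np.linspace(1, 100, 50))

# pandas defaults to the sample standard deviation (ddof=1)
pandas_std = x.std()

# numpy defaults to the population standard deviation (ddof=0)
numpy_std = np.std(x.to_numpy())

# passing ddof=0 to pandas reproduces the numpy result
matched = x.std(ddof=0)
print(pandas_std, numpy_std, matched)

# pandas skips NaN by default; numpy propagates it
with_gap = pd.Series([1.0, np.nan, 3.0])
pandas_mean = with_gap.mean()
numpy_mean = np.mean(with_gap.to_numpy())
print(pandas_mean, numpy_mean)
```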

In [48]:
x

0       1.000000
1       3.020408
2       5.040816
3       7.061224
4       9.081633
5      11.102041
6      13.122449
7      15.142857
8      17.163265
9      19.183673
10     21.204082
11     23.224490
12     25.244898
13     27.265306
14     29.285714
15     31.306122
16     33.326531
17     35.346939
18     37.367347
19     39.387755
20     41.408163
21     43.428571
22     45.448980
23     47.469388
24     49.489796
25     51.510204
26     53.530612
27     55.551020
28     57.571429
29     59.591837
30     61.612245
31     63.632653
32     65.653061
33     67.673469
34     69.693878
35     71.714286
36     73.734694
37     75.755102
38     77.775510
39     79.795918
40     81.816327
41     83.836735
42     85.857143
43     87.877551
44     89.897959
45     91.918367
46     93.938776
47     95.959184
48     97.979592
49    100.000000
dtype: float64