# Series

The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have **axis labels**, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data: it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
!pip install numpy

Collecting numpy
  Downloading numpy-1.23.4-cp310-cp310-win_amd64.whl (14.6 MB)
Installing collected packages: numpy
Successfully installed numpy-1.23.4


In [3]:
!pip install pandas

Collecting pandas
  Downloading pandas-1.5.1-cp310-cp310-win_amd64.whl (10.4 MB)
Collecting pytz>=2020.1
  Downloading pytz-2022.5-py2.py3-none-any.whl (500 kB)
Installing collected packages: pytz, pandas
Successfully installed pandas-1.5.1 pytz-2022.5


In [5]:
import numpy as np
import pandas as pd

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [7]:
labels = ['a', 'b', 'c']
number_list = [10, 15, 20]
a = np.array(number_list)
d = {'a':10, 'b':15, 'c':25}

In [9]:
d['a']

10

**Using Lists**

In [10]:
pd.Series(data=number_list)

0    10
1    15
2    20
dtype: int64

In [11]:
pd.Series(data=number_list, index=labels)

a    10
b    15
c    20
dtype: int64
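When `index` is supplied, its length must match the data. A quick check with the same made-up values suggests pandas refuses a mismatch rather than silently truncating or padding:

```python
import pandas as pd

mismatch_error = None
try:
    # Three values but only two labels
    pd.Series(data=[10, 15, 20], index=['a', 'b'])
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)
```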

In [12]:
pd.Series(data=['hamburger','french fries','nugget'], index=['AA','BB','CC'])

AA       hamburger
BB    french fries
CC          nugget
dtype: object

**NumPy Arrays**

In [15]:
pd.Series(a, labels)

a    10
b    15
c    20
dtype: int32
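The `int32` above most likely reflects the platform rather than pandas: on Windows, NumPy versions before 2.0 default to 32-bit integers, while the list example was stored as `int64`. If the same dtype is needed everywhere, one option (a sketch) is to pass `dtype` explicitly:

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
a = np.array([10, 15, 20])

# Force 64-bit integers regardless of the platform's default integer size
s = pd.Series(a, index=labels, dtype='int64')
print(s.dtype)  # int64
```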

**Dictionary**

In [16]:
pd.Series(d)

a    10
b    15
c    25
dtype: int64
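When a dictionary is combined with an explicit `index`, the index selects and orders the keys. A label with no matching key becomes `NaN`, which upcasts the integers to `float64`. A small sketch reusing the same dictionary (the label `'z'` is made up):

```python
import pandas as pd

d = {'a': 10, 'b': 15, 'c': 25}

# 'c' and 'a' are taken from d in this order, 'b' is dropped,
# and 'z' is not a key in d, so it gets NaN
s = pd.Series(d, index=['c', 'a', 'z'])
print(s)
```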

### Data in a Series

A pandas Series can hold a variety of object types:

In [17]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [18]:
# Even functions (although you are unlikely to use this)
pd.Series([sum, print, len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
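Mixing types in one Series is allowed as well; pandas then falls back to the generic `object` dtype, as it did for the functions above. A brief sketch with arbitrary values:

```python
import pandas as pd

# An int, a str and a float in the same Series
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)     # object
print(type(mixed[1]))  # <class 'str'>
```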

## Using an Index

The key to using a Series is understanding its index. pandas uses these index labels or numbers to allow fast lookups of information (the index works like a hash table or dictionary).

Let's see some examples of how to grab information from a Series. Let's create two Series, ser1 and ser2:

In [19]:
ser1 = pd.Series([1,2,3,4], index=['Bangkok', 'Korat', 'Phuket', 'Chiang Mai'])              

In [20]:
ser1

Bangkok       1
Korat         2
Phuket        3
Chiang Mai    4
dtype: int64

In [22]:
ser1['Chiang Mai']          

4
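Because lookups go through the index, membership tests and safe lookups also check labels rather than values. A short sketch with the same `ser1` data (the label `'Pattaya'` is just an example of a missing key):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['Bangkok', 'Korat', 'Phuket', 'Chiang Mai'])

print('Korat' in ser1)                 # True: 'in' checks the index labels
print(4 in ser1)                       # False: 4 is a value, not a label
print(ser1.get('Pattaya', default=0))  # 0: .get returns a default for missing labels
```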

In [23]:
ser2 = pd.Series([10,11,12,13], index=['Bangkok','Korat','Teparuk', 'Hua Hin'])

In [24]:
ser2

Bangkok    10
Korat      11
Teparuk    12
Hua Hin    13
dtype: int64

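Operations are aligned on the index: values are combined label by label, a label that appears in only one Series produces `NaN`, and the result is upcast to `float64` to hold those `NaN` values: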

In [25]:
ser1 + ser2

Bangkok       11.0
Chiang Mai     NaN
Hua Hin        NaN
Korat         13.0
Phuket         NaN
Teparuk        NaN
dtype: float64
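If a label missing from one side should count as zero instead of producing `NaN`, the method form `Series.add` accepts a `fill_value` (a label missing from both sides would still be `NaN`). A sketch reusing `ser1` and `ser2`:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['Bangkok', 'Korat', 'Phuket', 'Chiang Mai'])
ser2 = pd.Series([10, 11, 12, 13], index=['Bangkok', 'Korat', 'Teparuk', 'Hua Hin'])

# Labels present in only one Series are treated as 0 on the other side
total = ser1.add(ser2, fill_value=0)
print(total)
```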

# Group of 7

"[The Group of Seven](https://en.wikipedia.org/wiki/Group_of_Seven)" is a political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom and the United States.

In [26]:
# In millions
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

In [27]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

A Series can have a `name` that documents its purpose:

In [28]:
g7_pop.name = 'G7 Population in millions'

In [29]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

Series are pretty similar to NumPy arrays; they expose a `dtype`, their underlying `values`, and an `index`:

In [30]:
g7_pop.dtype

dtype('float64')

In [31]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])

In [32]:
g7_pop.index

RangeIndex(start=0, stop=7, step=1)
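Since the values are a plain NumPy array underneath, NumPy functions also work on the Series directly and keep the index. A sketch (multiplying by 2 is arbitrary; `to_numpy()` is the method form of `.values`):

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   name='G7 Population in millions')

arr = g7_pop.to_numpy()           # plain ndarray, no index attached
doubled = np.multiply(g7_pop, 2)  # a NumPy ufunc applied to a Series returns a Series
print(type(arr).__name__, type(doubled).__name__)  # ndarray Series
```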

A Series has an index that's similar to the automatic index assigned to Python's lists:

In [33]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

In [35]:
g7_pop[5]

64.511

In [36]:
g7_pop.index

RangeIndex(start=0, stop=7, step=1)
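`g7_pop[5]` works here because the `RangeIndex` labels happen to equal the positions. When integer labels are not in order, `[]` looks up labels, so the explicit accessors `.loc` (by label) and `.iloc` (by position) remove the ambiguity. A sketch with a made-up index:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[2, 0, 1])

print(s.loc[0])   # 20: the value whose label is 0
print(s.iloc[0])  # 10: the value at position 0
```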

The index can also be defined explicitly:

In [37]:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]

In [40]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

A Series can also be created directly from a dictionary, or from a list plus an explicit `index`:

In [None]:
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')

In [None]:
pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
           'United States'],
    name='G7 Population in millions')
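Both cells should build the same Series. One way to confirm this, as a sketch, is `Series.equals`, which compares the values and the index; the name is compared separately here:

```python
import pandas as pd

from_dict = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523,
}, name='G7 Population in millions')

from_list = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

# equals() checks values and index; the name is checked with a plain comparison
print(from_dict.equals(from_list), from_dict.name == from_list.name)  # True True
```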

Values can be selected by index label with `[]`, and by numeric position with the `iloc` attribute:

In [42]:
g7_pop['Canada']

35.467

In [45]:
g7_pop.iloc[-1]

318.523

In [46]:
g7_pop[['Italy', 'France']]

Italy     60.665
France    63.951
Name: G7 Population in millions, dtype: float64

In [47]:
g7_pop.iloc[[1,3]]

France    63.951
Italy     60.665
Name: G7 Population in millions, dtype: float64

In [48]:
g7_pop['Germany':'Japan']

Germany     80.940
Italy       60.665
Japan      127.061
Name: G7 Population in millions, dtype: float64

In [49]:
g7_pop.iloc[1:5]

France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
Name: G7 Population in millions, dtype: float64
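Note the asymmetry above: a label slice such as `'Germany':'Japan'` includes both endpoints, while a positional slice with `iloc` excludes the stop position, like a Python list. A sketch using `.loc` for the label form:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'])

by_label = g7_pop.loc['Germany':'Japan']  # stop label is included
by_position = g7_pop.iloc[2:4]            # stop position is excluded
print(list(by_label.index))     # ['Germany', 'Italy', 'Japan']
print(list(by_position.index))  # ['Germany', 'Italy']
```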

## Conditional selection (boolean arrays)

In [51]:
g7_pop > 70

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [52]:
g7_pop[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [53]:
g7_pop.mean()

107.30257142857144

In [56]:
g7_pop[(g7_pop > g7_pop.mean() - g7_pop.min()) & (g7_pop > g7_pop.mean() + g7_pop.std())]

United States    318.523
Name: G7 Population in millions, dtype: float64

- `|` (or): `False` only if both the left and right sides are `False`; otherwise `True`.
- `&` (and): `True` only if both the left and right sides are `True`; otherwise `False`.
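Python's own `and`/`or` cannot work element-wise on a Series, which is why `&` and `|` are used. Each comparison also needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`, and `~` inverts a mask. A sketch (the thresholds are arbitrary):

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'])

# Parentheses are required: & and | have higher precedence than > and <
either_end = g7_pop[(g7_pop < 40) | (g7_pop > 300)]
not_large = g7_pop[~(g7_pop > 70)]  # ~ inverts the boolean mask

and_error = None
try:
    (g7_pop > 40) and (g7_pop < 100)  # 'and' needs a single True/False value
except ValueError as exc:
    and_error = exc
    print('ValueError:', exc)
```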

In [61]:
g7_pop[g7_pop > 80]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [62]:
g7_pop[g7_pop < 40]

Canada    35.467
Name: G7 Population in millions, dtype: float64

In [65]:
g7_pop[g7_pop < 200]

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
Name: G7 Population in millions, dtype: float64

In [64]:
g7_pop[(g7_pop > 80) & (g7_pop < 200)]

Germany     80.940
Japan      127.061
Name: G7 Population in millions, dtype: float64
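The same range filter can also be written with `Series.between`. It includes both bounds by default, so `inclusive='neither'` (available in pandas 1.3 and later) is used in this sketch to match the strict `>` and `<` above:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'])

# inclusive='neither' makes both bounds strict, like (> 80) & (< 200)
mid_sized = g7_pop[g7_pop.between(80, 200, inclusive='neither')]
print(mid_sized)
```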

## Operations and methods

In [59]:
g7_pop.sum()

751.118

In [58]:
(g7_pop/g7_pop.sum()) * 100

Canada             4.721895
France             8.514108
Germany           10.775937
Italy              8.076627
Japan             16.916250
United Kingdom     8.588664
United States     42.406519
Name: G7 Population in millions, dtype: float64
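Other reductions follow the same pattern. For example, `idxmax` and `idxmin` return the label of the largest or smallest value rather than the value itself. A sketch:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'])

largest = g7_pop.idxmax()   # label of the maximum value
smallest = g7_pop.idxmin()  # label of the minimum value
print(largest, smallest)    # United States Canada
```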

## Modifying a Series

In [66]:
g7_pop['Canada'] = 99

In [67]:
g7_pop

Canada             99.000
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [68]:
g7_pop.iloc[-1] = 600

In [69]:
g7_pop

Canada             99.000
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     600.000
Name: G7 Population in millions, dtype: float64

In [70]:
g7_pop[g7_pop < 70] = 80

In [71]:
g7_pop

Canada             99.000
France             80.000
Germany            80.940
Italy              80.000
Japan             127.061
United Kingdom     80.000
United States     600.000
Name: G7 Population in millions, dtype: float64
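Assignments such as `g7_pop[g7_pop < 70] = 80` change the Series in place. If the original values should be kept, one option (a sketch) is `Series.mask`, which returns a new Series with the replacements and leaves the source untouched:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'])

# mask() replaces values where the condition is True, in a new Series
capped = g7_pop.mask(g7_pop < 70, other=80)
print(capped)
print(g7_pop['Canada'])  # 35.467: the original is unchanged
```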