In [2]:
import numpy as np
import pandas as pd

## pandas data Series

The first main data type we will cover in pandas is the Series. With pandas imported above, let's explore the Series object.

A Series is very similar to a NumPy array; in fact, it is built on top of the NumPy array object. What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data: it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [3]:

my_index = ['USA', 'Canada', 'Mexico']
my_data  = [1776, 1867, 1890]

In [4]:
my_ser = pd.Series(data = my_data)
my_ser

0    1776
1    1867
2    1890
dtype: int64

In [5]:
my_ser = pd.Series(data = my_data, index = my_index)
my_ser

USA       1776
Canada    1867
Mexico    1890
dtype: int64

In [14]:
my_ser.iloc[0]

np.int64(1776)

In [7]:
my_ser['USA']

np.int64(1776)
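
The two lookups above can also be written with the explicit `.loc` (label) and `.iloc` (position) accessors. The sketch below rebuilds the same data under the illustrative name `founded` and shows one difference worth knowing: a label slice includes its end label, while a positional slice excludes its end position.

```python
import pandas as pd

# Same data as above, under an illustrative name
founded = pd.Series([1776, 1867, 1890], index=['USA', 'Canada', 'Mexico'])

by_label = founded.loc['Canada']      # label-based lookup
by_position = founded.iloc[1]         # position-based lookup (same element)

label_slice = founded.loc['USA':'Canada']   # end label is included -> 2 rows
position_slice = founded.iloc[0:1]          # end position is excluded -> 1 row

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```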

In [13]:
py_dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
pd.Series(py_dict)

1    Geeks
2      For
3    Geeks
dtype: object
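
When a Series is built from a dictionary, the keys become the index and the values become the data. As a small sketch (the dictionary and variable names here are made up for illustration), passing an explicit `index` alongside a dictionary selects and orders the keys, and any label not found in the dictionary is filled with `NaN`:

```python
import pandas as pd

# Hypothetical data for illustration
stock = {'apples': 3, 'pears': 5}

# Keys become the index labels
plain = pd.Series(stock)

# An explicit index picks keys in the given order;
# 'plums' is not in the dictionary, so its value becomes NaN
picked = pd.Series(stock, index=['pears', 'plums'])

print(plain)
print(picked)
```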

In [16]:
ran_data = np.random.randint(0,100,4)
ran_data

array([70, 86, 82, 60])

In [17]:
names = ['Andrew','Bobo','Claire','David']
ages = pd.Series(ran_data,names)
ages

Andrew    70
Bobo      86
Claire    82
David     60
dtype: int64

# Key Ideas of a Series

## Named Index

In [19]:
# Imaginary Sales Data for 1st and 2nd Quarters for Global Company
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100,'China': 500, 'India': 210,'USA': 260}

In [20]:
# Convert into Pandas Series
sales_Q1 = pd.Series(q1)
sales_Q2 = pd.Series(q2)

In [21]:
# Look up a value by its index label
sales_Q1['Japan']

np.int64(80)

In [23]:
# Integer-based position lookup still works through .iloc
sales_Q1.iloc[0]

np.int64(80)

## Operations

In [24]:
# Grab just the index keys
sales_Q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')
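
`keys()` returns the same object as the `index` attribute. A minimal sketch, using a fresh Series named `sales` for illustration, of pulling the labels and the underlying values apart:

```python
import pandas as pd

sales = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

labels = sales.index          # the same Index that keys() returns
values = sales.to_numpy()     # underlying data as a NumPy array

print(list(labels))
print(values)
print('Japan' in sales)       # membership tests check the index labels, not the values
```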

In [25]:
# Arithmetic operations are broadcast across the entire Series
sales_Q1 * 2

Japan    160
China    900
India    400
USA      500
dtype: int64

In [26]:
sales_Q1 * 10

Japan     800
China    4500
India    2000
USA      2500
dtype: int64

In [27]:
sales_Q2 / 100

Brazil    1.0
China     5.0
India     2.1
USA       2.6
dtype: float64
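
Broadcasting is not limited to arithmetic. Comparisons are applied element by element as well and return a boolean Series, which can then be used to filter. The sketch below assumes an arbitrary threshold of 200 purely for illustration:

```python
import pandas as pd

sales = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

above = sales > 200          # element-wise comparison -> boolean Series
big_markets = sales[above]   # keep only the entries where the mask is True

print(above)
print(big_markets)
```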

## Between Series

In [28]:
# Labels that appear in only one Series produce NaN in the result
sales_Q1 + sales_Q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64
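
The `NaN` entries come from index alignment: pandas matches values by label rather than by position, and the result carries the union of both indexes. Because `NaN` is a floating-point value, the result is `float64` even though both inputs were integers. A short sketch, rebuilding the two Series under the names `q1` and `q2`, of finding which labels failed to match:

```python
import pandas as pd

q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

total = q1 + q2                         # aligned on labels, not on position

missing = total[total.isna()].index     # labels present in only one Series
complete = total.dropna()               # keep only labels found in both

print(list(missing))
print(complete)
```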

In [30]:
(sales_Q1 + sales_Q2).info()

<class 'pandas.core.series.Series'>
Index: 5 entries, Brazil to USA
Series name: None
Non-Null Count  Dtype  
--------------  -----  
3 non-null      float64
dtypes: float64(1)
memory usage: 80.0+ bytes


In [32]:
# Use fill_value to stand in for a label that is missing from one of the Series
sales_Q1.add(sales_Q2,fill_value=0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64
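
The other arithmetic methods, such as `sub`, `mul`, and `div`, accept `fill_value` in the same way. As a sketch, the quarter-over-quarter change can be computed while treating a missing quarter as 0 sales; that choice of 0 is an assumption about what a missing entry means in this made-up data:

```python
import pandas as pd

q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# Change from Q1 to Q2, treating a label missing from one quarter as 0
change = q2.sub(q1, fill_value=0)

print(change)
```

Whether 0 is a sensible stand-in depends on the data: it suits "no sales recorded", but would be misleading if a missing label really means "unknown".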