<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Series

The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

## Imports

In [1]:
import numpy as np

In [3]:
import pandas as pd

## Creating a Series from Python Objects

In [4]:
help(pd.Series)

Help on class Series in module pandas.core.series:

class Series(pandas.core.base.IndexOpsMixin, pandas.core.generic.NDFrame)
 |  Series(data=None, index=None, dtype: 'Dtype | None' = None, name=None, copy: 'bool' = False, fastpath: 'bool' = False)
 |  
 |  One-dimensional ndarray with axis labels (including time series).
 |  
 |  Labels need not be unique but must be a hashable type. The object
 |  supports both integer- and label-based indexing and provides a host of
 |  methods for performing operations involving the index. Statistical
 |  methods from ndarray have been overridden to automatically exclude
 |  missing data (currently represented as NaN).
 |  
 |  Operations between Series (+, -, /, *, **) align values based on their
 |  associated index values-- they need not be the same length. The result
 |  index will be the sorted union of the two indexes.
 |  
 |  Parameters
 |  ----------
 |  data : array-like, Iterable, dict, or scalar value
 |      Contains data stored in Ser

### Index and Data Lists

We can create a Series from Python lists (or from NumPy arrays).

In [5]:
myindex = ['USA', 'Canada', 'Mexico']

In [6]:
mydata = [1776, 1867, 1821]

In [7]:
myser = pd.Series(data=mydata)

In [10]:
myser  # by default, pandas assigns a numeric index

0    1776
1    1867
2    1821
dtype: int64

In [9]:
type(myser)

pandas.core.series.Series

In [12]:
# pandas also lets us supply our own index labels, so we pass them when creating the Series
myser = pd.Series(data=mydata, index=myindex)

In [13]:
myser

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [14]:
myser[0]  # access by integer position (newer pandas versions prefer myser.iloc[0])

1776

In [15]:
myser['USA']  # access by our own index label

1776
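
The two lookups above go through the same `[]` operator, once by position and once by label. A minimal sketch of the explicit accessors, `.iloc` for position and `.loc` for label, on the same data (the variable name `founding` is only illustrative and does not appear elsewhere in this notebook):

```python
import pandas as pd

# Same data and labels as myser above
founding = pd.Series(data=[1776, 1867, 1821], index=['USA', 'Canada', 'Mexico'])

by_position = founding.iloc[0]   # first element, whatever its label is
by_label = founding.loc['USA']   # element whose label is 'USA'

print(by_position, by_label)
```

Being explicit about which kind of lookup is intended avoids ambiguity, especially when the index labels are themselves integers.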

#### Using Random Numbers to Create a Series

In [19]:
rand_data = np.random.randint(0, 100, 4)

In [20]:
rand_data

array([86, 75, 85, 76])

In [21]:
names = ['Andrew', 'Bobo', 'Claire', 'David']

In [23]:
ages = pd.Series(data=rand_data, index=names)

In [24]:
ages

Andrew    86
Bobo      75
Claire    85
David     76
dtype: int32

### From a Dictionary

In [25]:
# we can convert a dictionary directly into a pandas Series: keys become the index, values become the data

In [26]:
ages = {'Sam': 5, 'Frank': 10, 'Spike': 7}

In [28]:
ages

{'Sam': 5, 'Frank': 10, 'Spike': 7}

In [27]:
pd.Series(ages)

Sam       5
Frank    10
Spike     7
dtype: int64
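
As a small extension of the cell above, a dictionary can also be combined with an explicit `index` argument, in which case pandas pulls out the values for the requested labels in the requested order. This is a sketch only; the name `subset` is hypothetical:

```python
import pandas as pd

ages = {'Sam': 5, 'Frank': 10, 'Spike': 7}

# Only the listed keys are kept, in the order given
subset = pd.Series(ages, index=['Spike', 'Sam'])

print(subset)
```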

# Key Ideas of a Series

## Named Index

In [29]:
# Imaginary Sales Data for 1st and 2nd Quarters for Global Company
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260}

In [39]:
# Convert into Pandas Series
sales_Q1 = pd.Series(q1)
sales_Q2 = pd.Series(q2)

In [40]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [41]:
sales_Q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [46]:
# Call values based on Named Index
sales_Q1['Japan']

80

In [47]:
# Integer-based location information is also retained!
sales_Q1[0]

80

#### Getting the Index Keys

In [45]:
# Grab just the index keys
sales_Q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')

**Be careful with potential errors!**

In [37]:
# Wrong Name
# sales_Q1['France']

In [38]:
# Accidental Extra Space
# sales_Q1['USA ']

In [39]:
# Capitalization Mistake
# sales_Q1['usa']
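
Each of the commented-out lookups above fails with a `KeyError`, because labels must match exactly. A minimal sketch of how such a failure can be caught or avoided, assuming the same Q1 data (the names `sales`, `lookup_failed`, `fallback`, `has_usa`, and `has_padded` are illustrative):

```python
import pandas as pd

sales = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# A label that is not in the index raises KeyError
try:
    sales['usa']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# .get() returns a default value instead of raising
fallback = sales.get('France', 0)

# The `in` operator checks the index labels, not the values
has_usa = 'USA' in sales
has_padded = 'USA ' in sales  # trailing space, so this is a different label

print(lookup_failed, fallback, has_usa, has_padded)
```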

## Operations

Suppose we want to multiply each value in a list by 2.
With a plain Python list, `*` just repeats the list:

In [48]:
[1, 2] * 2

[1, 2, 1, 2]

With NumPy, the multiplication is broadcast to every element, which is what we wanted:

In [49]:
np.array([1,2]) * 2

array([2, 4])

Luckily, a pandas Series is built on top of the NumPy array, so we get the same behavior.

In [50]:
# Can Perform Operations Broadcasted across entire Series
sales_Q1 * 2

Japan    160
China    900
India    400
USA      500
dtype: int64

In [51]:
sales_Q1 / 100

Japan    0.8
China    4.5
India    2.0
USA      2.5
dtype: float64

## Between Series

If we want the total for the two quarters, we can add the two Series.

pandas aligns the two Series on their index labels and sums the values for labels that are present in both.

For a label that is present in only one of the Series, there is nothing to add it to, so the result is `NaN`.

In [52]:
# Notice how Pandas informs you of mismatch with NaN
sales_Q1 + sales_Q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64

`Brazil` and `Japan` are reported as `NaN` because each label appears in only one of the two Series, so by default pandas has no value to pair it with.
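
To list the mismatched labels programmatically rather than reading them off the output, one possible approach is to filter the sum with `isna()`. This is a sketch using the same quarterly data; the names `total` and `unmatched` are illustrative:

```python
import pandas as pd

q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

total = q1 + q2

# Labels whose sum is NaN were present in only one of the two Series
unmatched = total[total.isna()].index.tolist()

print(unmatched)
```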

However, we can supply a **default value** to use whenever a label is missing from one of the Series.

In [54]:
# You can fill these with any value you want
sales_Q1.add(sales_Q2, fill_value=0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64

Notice that the result is now float. `NaN` is a floating-point value, so combining Series with mismatched labels produces `float64`, even when `fill_value` fills the gaps.

In [58]:
sales_Q1.dtype

dtype('int64')

In [60]:
first_half = sales_Q1.add(sales_Q2, fill_value=0)
first_half.dtype

dtype('float64')
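
If integer values are wanted after the fill, one option is to cast the result back with `astype`. A minimal sketch, assuming no `NaN` remains in the result (the name `first_half_int` is illustrative):

```python
import pandas as pd

q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

first_half = q1.add(q2, fill_value=0)

# Safe here because fill_value=0 left no NaN behind
first_half_int = first_half.astype('int64')

print(first_half_int.dtype)
```

The cast would raise an error if any `NaN` were still present, so it only makes sense after the gaps have been filled.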

That is all we need to know about Series. Up next: DataFrames!