In [None]:
%pip install pandas

# Series

The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

## Imports

In [1]:
import numpy as np
import pandas as pd


## Creating a Series from Python Objects

In [None]:
help(pd.Series)

### Index and Data Lists

We can create a Series from Python lists or from NumPy arrays.

In [2]:
myindex = ['USA','Canada','Mexico']

In [3]:
mydata = [1776,1867,1821]

In [4]:
myser = pd.Series(data=mydata)

In [5]:
myser

0    1776
1    1867
2    1821
dtype: int64

In [6]:
pd.Series(data=mydata,index=myindex)

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [7]:
ran_data = np.random.randint(0,100,4)

In [8]:
ran_data

array([78, 42, 14,  8])

In [9]:
names = ['Andrew','Bobo','Claire','David']

In [10]:
ages = pd.Series(ran_data,names)

In [11]:
ages

Andrew    78
Bobo      42
Claire    14
David      8
dtype: int64
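The introduction noted that a Series is built on NumPy and can hold arbitrary Python objects. A minimal sketch of both points follows; the mixed values are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
import pandas as pd

# A Series of mixed Python objects (illustrative values);
# pandas stores these with the generic "object" dtype
mixed = pd.Series(data=[1776, 'Ottawa', 3.5], index=['USA', 'Canada', 'Mexico'])
print(mixed.dtype)

# A numeric Series can hand back its values as a plain NumPy array
years = pd.Series(data=[1776, 1867, 1821], index=['USA', 'Canada', 'Mexico'])
values = years.to_numpy()
print(type(values))
print(values)
```

The labels live in the Series index; the NumPy array returned by `to_numpy()` carries only the values.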

### From a Dictionary

In [12]:
ages = {'Sammy':5,'Frank':10,'Spike':7}

In [13]:
ages

{'Sammy': 5, 'Frank': 10, 'Spike': 7}

In [14]:
pd.Series(ages)

Sammy     5
Frank    10
Spike     7
dtype: int64
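When a dictionary is combined with an explicit `index`, pandas looks up each index label among the dictionary keys. The sketch below assumes a hypothetical extra label, `'Rex'`, that is not in the dictionary, to show what happens to a missing key.

```python
import pandas as pd

ages = {'Sammy': 5, 'Frank': 10, 'Spike': 7}

# The index selects and orders the dictionary keys.
# 'Rex' is a hypothetical label with no matching key, so it becomes NaN,
# and the presence of NaN makes the whole Series float64.
selected = pd.Series(ages, index=['Spike', 'Sammy', 'Rex'])
print(selected)
```

Note that `'Frank'` is dropped because it does not appear in the requested index.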

# Key Ideas of a Series

## Named Index

In [15]:
# Imaginary Sales Data for 1st and 2nd Quarters for Global Company
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100,'China': 500, 'India': 210,'USA': 260}

In [16]:
# Convert into Pandas Series
sales_Q1 = pd.Series(q1)
sales_Q2 = pd.Series(q2)

In [17]:
sales_Q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [18]:
# Call values based on Named Index
sales_Q1['Japan']

80

In [19]:
# Integer-based position is also retained; use .iloc for positional access
sales_Q1.iloc[0]

80
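Label-based and position-based lookups can be made explicit with the `.loc` and `.iloc` accessors. This is a small sketch on the same first-quarter data, not the only way to index a Series.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

by_label = sales_Q1.loc['China']            # label-based lookup
by_position = sales_Q1.iloc[1]              # position-based lookup (second entry)
first_two = sales_Q1.iloc[:2]               # positional slice, end excluded
label_slice = sales_Q1.loc['China':'USA']   # label slice, end included

print(by_label, by_position)
print(first_two)
print(label_slice)
```

Being explicit avoids ambiguity when an index itself contains integers.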

**Be careful with potential errors!**

In [20]:
# Wrong Name
# sales_Q1['France']

In [21]:
# Accidental Extra Space
# sales_Q1['USA ']

In [22]:
# Capitalization Mistake
# sales_Q1['usa']
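Each of the commented-out lookups above raises a `KeyError`, because labels must match exactly. A minimal sketch of how such a lookup fails, and two ways to guard against it, might look like this:

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# A label that does not match exactly raises KeyError
try:
    sales_Q1['usa']
except KeyError as err:
    print('KeyError:', err)

# Membership tests check the index labels, not the values
has_france = 'France' in sales_Q1
print(has_france)

# .get returns a default instead of raising when the label is missing
fallback = sales_Q1.get('France', 0)
print(fallback)
```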

## Operations

In [23]:
# Grab just the index keys
sales_Q1.keys()

Index(['Japan', 'China', 'India', 'USA'], dtype='object')

In [24]:
# Operations are broadcast across the entire Series
sales_Q1 * 2

Japan    160
China    900
India    400
USA      500
dtype: int64

In [25]:
sales_Q2 / 100

Brazil    1.0
China     5.0
India     2.1
USA       2.6
dtype: float64
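Two details about these operations are worth a quick check: broadcasting returns a new Series and leaves the original untouched, and comparisons broadcast in the same way as arithmetic. The threshold of 200 below is an arbitrary value chosen for illustration.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})

# Arithmetic returns a new Series; sales_Q1 itself is unchanged
doubled = sales_Q1 * 2
print(sales_Q1['Japan'], doubled['Japan'])

# keys() returns the same labels as the .index attribute
same_labels = sales_Q1.keys().equals(sales_Q1.index)
print(same_labels)

# Comparisons are broadcast too, giving a boolean Series usable as a filter
above_200 = sales_Q1 > 200
print(sales_Q1[above_200])
```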

## Between Series

In [26]:
# Labels present in only one Series produce NaN in the result
sales_Q1 + sales_Q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64

In [27]:
# fill_value stands in for a label missing from one Series before adding
sales_Q1.add(sales_Q2,fill_value=0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64
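The results above are `float64` even though both inputs hold integers. This is because labels are aligned before adding, and `NaN` is a floating-point value. The sketch below counts the unmatched labels and, as one possible follow-up, casts the filled result back to integers.

```python
import pandas as pd

sales_Q1 = pd.Series({'Japan': 80, 'China': 450, 'India': 200, 'USA': 250})
sales_Q2 = pd.Series({'Brazil': 100, 'China': 500, 'India': 210, 'USA': 260})

# Plain addition aligns on labels; unmatched labels become NaN
raw_sum = sales_Q1 + sales_Q2
missing_count = raw_sum.isna().sum()
print(missing_count)

# With fill_value=0 no NaN remains, so casting to an integer dtype is safe
total = sales_Q1.add(sales_Q2, fill_value=0)
total_int = total.astype(int)
print(total_int)
```

Casting with `astype(int)` would raise an error if any `NaN` were still present, so it should follow the fill rather than precede it.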

That is all we need to know about Series. Up next: DataFrames!