In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.Series()

Series([], dtype: object)

In [3]:
ice_cream_flavors = [
    "Chocolate",
    "Vanilla",
    "Strawberry",
    "Rum Raisin"
]

In [4]:
ice_cream_flavors

['Chocolate', 'Vanilla', 'Strawberry', 'Rum Raisin']

In [6]:
pd.Series(data=ice_cream_flavors)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

## Index position / index label

In addition to an index position, we can assign each Series value an index label. Index labels can be of any immutable data type: strings, tuples, datetimes, and more. This flexibility makes a Series powerful: we can reference a value by its order or by a key/label. In a sense, each value has two identifiers.

If we do not pass an argument to the index parameter, pandas defaults to a numeric index starting from 0. With this type of index, the label and the position identifiers are one and the same.

We can pass objects of different data types to the data and index parameters, but they must have the same length so that pandas can associate their values.
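As a small sketch of the length requirement, the cell below deliberately passes three values and only two labels. The names flavors, days, and error_message are made up for this illustration; the expectation is that pandas raises a ValueError rather than guessing how to pair them.

```python
import pandas as pd

flavors = ["Chocolate", "Vanilla", "Strawberry"]
days = ("Monday", "Tuesday")  # one label short on purpose

try:
    pd.Series(data=flavors, index=days)
    error_message = None
except ValueError as error:
    # pandas refuses to pair 3 values with 2 labels
    error_message = str(error)

print(error_message)
```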

In [7]:
days_of_week = ("Monday", "Tuesday", "Wednesday", "Thursday")

In [9]:
cho_ser = pd.Series(
    data=ice_cream_flavors,
    index=days_of_week,
)

In [10]:
cho_ser

Monday        Chocolate
Tuesday         Vanilla
Wednesday    Strawberry
Thursday     Rum Raisin
dtype: object

## Point

Even though the index consists of string labels, pandas still assigns each Series value an index position. In other words, we can access the value "Vanilla" either by the index label "Tuesday" or by index position 1.

In [12]:
cho_ser["Tuesday"]

'Vanilla'

In [14]:
cho_ser.iloc[1]

'Vanilla'

In [15]:
cho_ser.loc["Tuesday"]

'Vanilla'

## Duplicate index

The index permits duplicates, a detail that distinguishes a Series from a Python dictionary.

Although pandas permits duplicates, it is ideal to avoid them whenever possible, because a unique index allows the library to locate index labels more quickly.

In [16]:
pd.Series(
    data=ice_cream_flavors,
    index=("Monday", "Tuesday", "Wednesday", "Tuesday"),
)

Monday        Chocolate
Tuesday         Vanilla
Wednesday    Strawberry
Tuesday      Rum Raisin
dtype: object
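One practical consequence of duplicate labels, sketched below with a hypothetical schedule Series: looking up a unique label returns a single value, while looking up a duplicated label returns a Series containing every match. The is_unique attribute on the index is a quick way to check for this situation.

```python
import pandas as pd

schedule = pd.Series(
    data=["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"],
    index=("Monday", "Tuesday", "Wednesday", "Tuesday"),
)

unique_lookup = schedule["Monday"]      # a single string
duplicate_lookup = schedule["Tuesday"]  # a Series with two rows

print(type(unique_lookup).__name__)
print(list(duplicate_lookup))
print(schedule.index.is_unique)
```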

For most data types, pandas will display a predictable type (such as bool, float, or int). For strings and more-complex objects (such as nested data structures), pandas will show dtype: object.

In [17]:
pd.Series([True, False, False])

0     True
1    False
2    False
dtype: bool

In [18]:
pd.Series(
    [985.32, 950.44],
    index=("Open", "Close"),
)

Open     985.32
Close    950.44
dtype: float64

In [19]:
pd.Series([4, 8, 16, 32])

0     4
1     8
2    16
3    32
dtype: int64

In [21]:
pd.Series(
    [4, 8, 16, 32],
    dtype="float",
)

0     4.0
1     8.0
2    16.0
3    32.0
dtype: float64

## Missing value

When pandas sees a missing value during a file import, the library substitutes NumPy's nan object. The acronym nan is short for not a number and is a catch-all term for an undefined value. In other words, nan is a placeholder object that represents nullness or absence.

In [22]:
temperatures = [94, 88, np.nan, 91]

pd.Series(data=temperatures)

0    94.0
1    88.0
2     NaN
3    91.0
dtype: float64

Notice that the Series dtype is float64. Pandas automatically converts numeric values from integers to floating-point numbers when it spots a nan value; this internal technical requirement allows the library to store numeric values and missing values in the same homogeneous Series.
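To see the conversion side by side, here is a minimal comparison of the same temperatures with and without the missing entry. The variable names complete and with_gap are only for this sketch; isna is used to count the missing values.

```python
import numpy as np
import pandas as pd

complete = pd.Series([94, 88, 91])
with_gap = pd.Series([94, 88, np.nan, 91])

print(complete.dtype)          # an integer dtype
print(with_gap.dtype)          # float64, because of the nan
print(with_gap.isna().sum())   # number of missing values
```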

In [23]:
calorie_info = {
    "Cereal": 125,
    "Chocolate Bar": 406,
    "Ice Cream Sundae": 342,
}

diet = pd.Series(data=calorie_info)
diet

Cereal              125
Chocolate Bar       406
Ice Cream Sundae    342
dtype: int64

In [24]:
random_data = np.random.randint(1, 101, 10)

random_data

array([32, 92, 55, 11, 10, 24, 99, 61, 77, 96])

In [25]:
pd.Series(random_data)

0    32
1    92
2    55
3    11
4    10
5    24
6    99
7    61
8    77
9    96
dtype: int64

## Series Attribute

1. values
2. dtype
3. index
4. shape
5. size
6. is_unique
7. is_monotonic_increasing
8. is_monotonic_decreasing

In [26]:
diet

Cereal              125
Chocolate Bar       406
Ice Cream Sundae    342
dtype: int64

In [27]:
diet.values

array([125, 406, 342])

In [28]:
diet.dtype

dtype('int64')

In [29]:
type(diet.values)

numpy.ndarray

In [30]:
diet.index

Index(['Cereal', 'Chocolate Bar', 'Ice Cream Sundae'], dtype='object')

In [31]:
type(diet.index)

pandas.core.indexes.base.Index

In [32]:
diet.shape

(3,)

In [33]:
diet.size

3

In [34]:
diet.is_unique

True

In [35]:
pd.Series(data=[1, 2, 3, 2]).is_unique

False

In [36]:
pd.Series(data=[1, 2, 5, 10]).is_monotonic_increasing

True

In [37]:
pd.Series(data=[1, 2, 5, 10]).is_monotonic_decreasing

False

In [38]:
pd.Series(data=[1, 9, 5, 10]).is_monotonic_increasing

False

In [39]:
pd.Series(data=[1, 9, 5, 10]).is_monotonic_decreasing

False

## Series Methods

1. head
2. tail
3. sum
4. product
5. cumsum
6. pct_change
7. mean
8. median
9. max
10. min
11. describe


In [40]:
nums_5 = pd.Series(data=range(0, 500, 5))

nums_5

0       0
1       5
2      10
3      15
4      20
     ... 
95    475
96    480
97    485
98    490
99    495
Length: 100, dtype: int64

In [43]:
nums_5.head(n=3)

0     0
1     5
2    10
dtype: int64

In [44]:
nums_5.tail(4)

96    480
97    485
98    490
99    495
dtype: int64

In [45]:
nums_5.head()

0     0
1     5
2    10
3    15
4    20
dtype: int64

In [47]:
numbers = pd.Series([1, 2, 3, np.nan, 4, 5])

numbers

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    5.0
dtype: float64

In [48]:
numbers.count()

np.int64(5)

### Caution:

Most mathematical methods ignore missing values by default. We can pass an argument of False to the skipna parameter to force the inclusion of missing values.

In [51]:
numbers.sum(skipna=False)

np.float64(nan)

In [52]:
numbers.sum(skipna=True)

np.float64(15.0)

In [60]:
numbers.sum(min_count=1)

np.float64(15.0)

In [58]:
numbers.sum(min_count=6)

np.float64(nan)

In [59]:
numbers.sum(min_count=7)

np.float64(nan)
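The min_count results above follow from one rule: min_count is the minimum number of valid (non-missing) values that must be present, otherwise the method returns nan. The sketch below restates that with a hypothetical readings Series, and also shows the default behavior for a Series made up entirely of missing values.

```python
import numpy as np
import pandas as pd

readings = pd.Series([1, 2, 3, np.nan, 4, 5])
valid_count = readings.count()  # 5 non-missing values

enough = readings.sum(min_count=valid_count)       # threshold met
too_few = readings.sum(min_count=valid_count + 1)  # threshold not met
all_missing = pd.Series([np.nan, np.nan]).sum()    # default min_count is 0

print(enough, too_few, all_missing)
```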

In [61]:
numbers.product()

np.float64(120.0)

In [62]:
numbers.product(skipna=False)

np.float64(nan)

In [63]:
numbers.product(min_count=3)

np.float64(120.0)

In [65]:
numbers.cumsum(skipna=True)

0     1.0
1     3.0
2     6.0
3     NaN
4    10.0
5    15.0
dtype: float64

In [66]:
numbers.cumsum(skipna=False)

0    1.0
1    3.0
2    6.0
3    NaN
4    NaN
5    NaN
dtype: float64

In [67]:
numbers.pct_change()

0         NaN
1    1.000000
2    0.500000
3    0.000000
4    0.333333
5    0.250000
dtype: float64
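pct_change reports the fractional change from the previous value, that is (current - previous) / previous. The 0.000000 at position 3 above appears consistent with pandas filling the NaN forward before computing the change, a behavior that depends on the pandas version, so the sketch below sticks to a hypothetical prices Series with no missing values and checks the formula by hand.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

builtin = prices.pct_change()
previous = prices.shift(1)               # each value's predecessor
manual = (prices - previous) / previous  # the same calculation by hand

# The first row has no predecessor, so compare from the second row onward.
matches = np.allclose(builtin.iloc[1:], manual.iloc[1:])
print(matches)
```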

In [68]:
numbers.mean(), numbers.median(), numbers.std(), numbers.max(), numbers.min()

(np.float64(3.0),
 np.float64(3.0),
 np.float64(1.5811388300841898),
 np.float64(5.0),
 np.float64(1.0))

In [69]:
numbers.describe()

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64

### Caution

The sample method selects a random assortment of values from the Series, so the order of values can differ between the new Series and the original Series. In the next examples, notice that the result is a Series of floats even when NaN is not among the selected values: numbers already has dtype float64 because of its NaN, and sample keeps the dtype of the original Series.

In [70]:
numbers.sample(2)

5    5.0
0    1.0
dtype: float64

In [72]:
# numbers is already float64, so every sample is float64 as well
numbers.sample(3)  

1    2.0
2    3.0
0    1.0
dtype: float64
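Because sample is random, rerunning the cells above will usually give different rows. When a repeatable draw is wanted, one option is the random_state parameter; the seed 42 and the names below are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3, np.nan, 4, 5])

# The same seed produces the same selection on every call.
first_draw = values.sample(n=3, random_state=42)
second_draw = values.sample(n=3, random_state=42)

print(first_draw.equals(second_draw))
print(first_draw.dtype)
```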

In [73]:
authors = pd.Series(
    ["Hemingway", "Orwell", "Dostoevsky", "Fitzgerald", "Orwell"]
)

authors.unique()

array(['Hemingway', 'Orwell', 'Dostoevsky', 'Fitzgerald'], dtype=object)

In [76]:
authors.nunique()

4

In [77]:
authors.nunique() == len(authors.unique())

True

## Arithmetic operations

In [80]:
s1 = pd.Series(data=[5, np.nan, 15], index=tuple("ABC"))

In [81]:
s1

A     5.0
B     NaN
C    15.0
dtype: float64

In [82]:
s1 + 3

A     8.0
B     NaN
C    18.0
dtype: float64

In [84]:
s1.add(3)

A     8.0
B     NaN
C    18.0
dtype: float64

In [83]:
s1 * 2

A    10.0
B     NaN
C    30.0
dtype: float64

In [86]:
# the original s1 is not changed; each operation returns a new Series
s1

A     5.0
B     NaN
C    15.0
dtype: float64

In [87]:
s1 - 5

A     0.0
B     NaN
C    10.0
dtype: float64

In [88]:
s1.sub(5)

A     0.0
B     NaN
C    10.0
dtype: float64

In [89]:
s1 * 2

A    10.0
B     NaN
C    30.0
dtype: float64

In [90]:
s1.mul(2)

A    10.0
B     NaN
C    30.0
dtype: float64

In [97]:
s1.multiply(2)

A    10.0
B     NaN
C    30.0
dtype: float64

In [92]:
s1 % 3

A    2.0
B    NaN
C    0.0
dtype: float64

In [98]:
# When we use the + operator with two Series as operands, pandas aligns the values by index label (not by position) before adding them
s2 = pd.Series([1, 2, 3], index=list("ABC"))

s3 = pd.Series([4, 5, 6], index=list("CBA"))

s2 + s3

A    7
B    7
C    7
dtype: int64
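Since every sum in the example above happens to be 7, it cannot tell us whether pandas matched values by label or by position. The sketch below uses hypothetical values chosen so the two would give different answers; the result shows the pairing follows the labels.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=list("ABC"))
right = pd.Series([10, 20, 30], index=list("CBA"))

# Label alignment pairs A with A (1 + 30), B with B (2 + 20), C with C (3 + 10).
# Positional pairing would have produced 11, 22, 33 instead.
total = left + right
print(total.to_dict())
```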

In [99]:
# Note that pandas considers a nan value to be unequal to another nan; 
# it cannot assume that an absent value is equal to another absent value.
s4 = pd.Series([3, 6, np.nan, 12])

s5 = pd.Series([3, 6, np.nan, 12])

s4.eq(s5)

0     True
1     True
2    False
3     True
dtype: bool

In [100]:
s4 == s5

0     True
1     True
2    False
3     True
dtype: bool
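When the goal is to ask whether two Series are the same as a whole, missing values included, the equals method is one option: it treats NaNs in the same location as equal. A minimal sketch with illustrative names:

```python
import numpy as np
import pandas as pd

first = pd.Series([3, 6, np.nan, 12])
second = pd.Series([3, 6, np.nan, 12])

elementwise_all = (first == second).all()  # False: nan is unequal to nan
whole_series = first.equals(second)        # True: matching NaNs count as equal

print(elementwise_all, whole_series)
```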

In [101]:
s6 = pd.Series(data = [5, 10, 15], index = ["A", "B", "C"])
s7 = pd.Series(data = [4, 8, 12, 14], index = ["B", "C", "D", "E"])

s6 + s7

A     NaN
B    14.0
C    23.0
D     NaN
E     NaN
dtype: float64
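The NaN results above appear wherever a label exists in only one of the two Series. If a missing partner should instead count as a default value, the add method accepts a fill_value argument. The sketch below repeats the example under that assumption, using 0 as the stand-in.

```python
import pandas as pd

s_left = pd.Series([5, 10, 15], index=["A", "B", "C"])
s_right = pd.Series([4, 8, 12, 14], index=["B", "C", "D", "E"])

# A label found in only one Series is paired with 0 instead of producing NaN.
filled = s_left.add(s_right, fill_value=0)
print(filled.to_dict())
```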

## Python's built-in functions

In [102]:
cities = pd.Series(data = ["San Francisco", "Los Angeles", "Las Vegas", np.nan])

In [104]:
# The len function returns the number of rows in a Series, including missing values (NaNs); the count method excludes them
len(cities), cities.count()

(4, np.int64(3))

In [105]:
type(cities)

pandas.core.series.Series

In [106]:
dir(cities)

['T',
 '_AXIS_LEN',
 '_AXIS_ORDERS',
 '_AXIS_TO_AXIS_NUMBER',
 '_HANDLED_TYPES',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_ufunc__',
 '__bool__',
 '__class__',
 '__column_consortium_standard__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__firstlineno__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pandas_priority__',
 '__pos__',
 '__pow__',
 '_

In [107]:
list(cities)

['San Francisco', 'Los Angeles', 'Las Vegas', nan]

In [108]:
dict(cities)

{0: 'San Francisco', 1: 'Los Angeles', 2: 'Las Vegas', 3: nan}

In [109]:
cities

0    San Francisco
1      Los Angeles
2        Las Vegas
3              NaN
dtype: object

In [110]:
"Las Vegas" in cities

False

In [111]:
cities.values

array(['San Francisco', 'Los Angeles', 'Las Vegas', nan], dtype=object)

In [112]:
"Las Vegas" in cities.values

True

In [116]:
2 in cities, 2 in cities.index

(True, True)

In [117]:
100 not in cities, "Paris" not in cities.values

(True, True)
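The checks above show that the in keyword looks at the index labels of a Series, not its values. The sketch below collects the options in one place, and adds the isin method for an element-wise test against several candidates; "Reno" is just a made-up value that is absent from the data.

```python
import numpy as np
import pandas as pd

places = pd.Series(["San Francisco", "Los Angeles", "Las Vegas", np.nan])

in_index = 2 in places                      # checks the index labels
in_values = "Las Vegas" in places.values    # checks the stored values
mask = places.isin(["Las Vegas", "Reno"])   # one boolean per row

print(in_index, in_values)
print(mask.tolist())
```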

## Coding Challenge

In [119]:
superheroes = [
    "Batman",
    "Superman",
    "Spider-Man",
    "Iron Man",
    "Captain America",
    "Wonder Woman",
]
strength_levels = (100, 120, 90, 95, 110, 120)

In [121]:
h1 = pd.Series(data=superheroes)

l1 = pd.Series(data=strength_levels)

In [122]:
heros = pd.Series(data=strength_levels, index=superheroes)

In [123]:
heros

Batman             100
Superman           120
Spider-Man          90
Iron Man            95
Captain America    110
Wonder Woman       120
dtype: int64

In [124]:
heros.head(n=2)

Batman      100
Superman    120
dtype: int64

In [125]:
heros.tail(4)

Spider-Man          90
Iron Man            95
Captain America    110
Wonder Woman       120
dtype: int64

In [127]:
heros.nunique(), len(heros.unique())

(5, 5)

In [128]:
heros.mean()

np.float64(105.83333333333333)

In [129]:
heros.max(), heros.min()

(np.int64(120), np.int64(90))

In [130]:
heros * 2

Batman             200
Superman           240
Spider-Man         180
Iron Man           190
Captain America    220
Wonder Woman       240
dtype: int64

In [131]:
dict(heros)

{'Batman': np.int64(100),
 'Superman': np.int64(120),
 'Spider-Man': np.int64(90),
 'Iron Man': np.int64(95),
 'Captain America': np.int64(110),
 'Wonder Woman': np.int64(120)}