# Series

A pandas series is the building block of a data frame, and it is one of the most basic and important data types defined in pandas. This notebook demonstrates the following series operations:
- Create a series from a Python list, a NumPy array, and a Python dictionary
- Check built-in attributes such as the length and memory usage of a series
- Operate on a series using built-in methods such as `.count()`, which counts the number of non-null elements
- Use aggregating functions to get a summary of the series
- Sorting
- Series duplication
- Access items in a series
- Define element-wise operations using `.apply()` and `.map()`
- Convert data types of series using `.astype()`

In [1]:
import numpy as np
import pandas as pd

## Create Series from Scratch

An earlier section showed how to obtain a series from a data frame column. This section introduces creating a series from scratch. The data sources used here are a Python list, a NumPy array, and a Python dictionary. Use `pandas.Series()` to create a series as follows.


### From Python List

In [2]:
series_example = pd.Series(["a", "b", "c", "def"])
series_example

0      a
1      b
2      c
3    def
dtype: object

In [3]:
series_example = pd.Series([123, 456, 789])
series_example

0    123
1    456
2    789
dtype: int64

In [4]:
series_example = pd.Series([True, False, True, False])
series_example

0     True
1    False
2     True
3    False
dtype: bool

In [5]:
series_example = pd.Series(["a", 123, True])
series_example

0       a
1     123
2    True
dtype: object

In [6]:
series_example = pd.Series(["a", 123, True])
print(type(series_example[0]))
print(type(series_example[1]))
print(type(series_example[2]))
print(series_example.dtype)

<class 'str'>
<class 'int'>
<class 'bool'>
object


Values of different data types can be stored in the same series, and each value retains its own type while stored there. The series as a whole still has a single data type, given by the `.dtype` attribute, which pandas chooses so that it can hold every element. For mixed values such as those above, it falls back to `object`.

A summary of commonly used data types is given below. Although pandas tries to infer the data type automatically from the input, it sometimes cannot do so and then stores everything under the generic `object` type.

| Data Type | Description |
|:--------- |:----------- |
| int64 | 64-bit integer |
| int32 | 32-bit integer |
| float64 | 64-bit floating-point number |
| float32 | 32-bit floating-point number |
| object | Arbitrary Python objects, most commonly strings |
| bool | Boolean values (True/False) |
| datetime64\[ns\] | Date and time values |
| category | Categorical variables |
| NaN | Missing or null values (a floating-point marker rather than a data type of its own) |
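
The inference rules described above can be checked on a few small series. This is a minimal sketch with made-up values, and the variable names are only illustrative. It shows integers staying integers, a single float promoting the whole series to `float64`, mixed values falling back to `object`, and a missing value forcing integers to `float64` so that `NaN` can be stored.

```python
import pandas as pd

# Hypothetical sample data, chosen only to trigger different inferred dtypes.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1, 2.5, 3])          # one float promotes every element
mixed = pd.Series(["a", 123, True])      # no common numeric type
with_missing = pd.Series([1, None, 3])   # None is stored as NaN

for name, s in [("ints", ints), ("floats", floats),
                ("mixed", mixed), ("with_missing", with_missing)]:
    print(f"{name}: {s.dtype}")
```

When the inferred type is not the one wanted, it can be set explicitly with the `dtype` argument of `pd.Series()` or changed afterwards with `.astype()`.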

If no index is specified, an auto-incrementing integer index is used. An index can be specified as follows.

In [7]:
index = ["first", "second", "third"]
values = [100, 200, 300]
series_example = pd.Series(data=values, index=index)
series_example

first     100
second    200
third     300
dtype: int64

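When an index is specified, the items can be accessed both by position and by the specified label. Plain square brackets with an integer fall back to position only when the index is not made of integers, and recent pandas versions deprecate that fallback, so `.iloc[]` is used for positional access here.

In [8]:
index = ["first", "second", "third"]
values = [100, 200, 300]
series_example = pd.Series(data=values, index=index)
print(series_example.iloc[0])   # access by position
print(series_example["first"])  # access by label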

100
100
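
Label-based and position-based access can be kept apart explicitly with `.loc[]` and `.iloc[]`. The following is a minimal sketch that reuses the labels and values from the example above. The variable names `by_label`, `by_position`, `last`, and `subset` are illustrative.

```python
import pandas as pd

series_example = pd.Series([100, 200, 300], index=["first", "second", "third"])

by_label = series_example.loc["first"]            # look up by index label
by_position = series_example.iloc[0]              # look up by integer position
last = series_example.iloc[-1]                    # negative positions count from the end
subset = series_example.loc[["first", "third"]]   # a list of labels returns a series

print(by_label, by_position, last)
print(subset)
```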


### From Numpy Array

In [9]:
series_example = pd.Series(np.random.randn(5))
series_example

0    1.974779
1   -0.745118
2   -0.180925
3    1.449197
4    0.176615
dtype: float64
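
The random values above change on every run. As a hedged variation, the sketch below uses a seeded NumPy generator so that the series is reproducible, and it also shows that a series built from a NumPy array keeps the array's dtype. The seed value and variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" series repeatable between runs.
rng = np.random.default_rng(seed=0)
random_series = pd.Series(rng.standard_normal(5))

# The dtype of the source array is carried over to the series.
small_ints = pd.Series(np.arange(3, dtype=np.int32))

print(random_series)
print(small_ints.dtype)
```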

### From Python Dictionary

When creating a pandas series from a Python dictionary, the keys of the dictionary are automatically used as the index of the series.

In [10]:
dictionary = {
    "first": 100,
    "second": 200,
    "third": 300,
}
series_example = pd.Series(dictionary)
series_example

first     100
second    200
third     300
dtype: int64
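
A dictionary can also be combined with an explicit `index` argument. In that case pandas picks the values in the order of the given labels and fills labels that are missing from the dictionary with `NaN`. The sketch below reuses the dictionary from above. The label `"fourth"` is a made-up key added only to show the missing-value case.

```python
import pandas as pd

dictionary = {"first": 100, "second": 200, "third": 300}

# Only the listed labels are kept, in the listed order.
# "fourth" is not a key of the dictionary, so its value becomes NaN.
selected = pd.Series(dictionary, index=["third", "first", "fourth"])
print(selected)
```

Because `NaN` is a floating-point value, the resulting series has dtype `float64` rather than `int64`.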

## Series Attributes

Commonly used series attributes are summarized in the table below.

| Attribute Name | Description |
|:------ |:----------- |
| .size | The length of the series. |
| .shape | The shape (length in each dimension) of the series. |
| .is_unique | A flag that is True when the series contains no duplicate values. |
| .hasnans | A flag indicating whether there is `NaN` in the series. |
| .values | The values of the series. |
| .index | The index of the series. |
| .axes | A list of the indexes that label each dimension of the data set. For a series it has a single entry, which is the same as `.index`. |
| .dtype | The data type of the series. |

Examples to demonstrate the attributes are given below.

In [11]:
series_example = pd.read_csv("best-selling-video-games.csv", index_col = "Title")["Sales"]
print(series_example)
print("size: {}".format(series_example.size))
print("shape: {}".format(series_example.shape))
print("is_unique: {}".format(series_example.is_unique))
print("hasnans: {}".format(series_example.hasnans))
print("values: {}".format(series_example.values))
print("index: {}".format(series_example.index))
print("axes: {}".format(series_example.axes))
print("dtype: {}".format(series_example.dtype))

Title
Minecraft                                           238000000
Grand Theft Auto V                                  175000000
Tetris (EA)                                         100000000
Wii Sports                                           82900000
PUBG: Battlegrounds                                  75000000
Mario Kart 8 / Deluxe                                60460000
Super Mario Bros.                                    58000000
Red Dead Redemption 2                                50000000
Pokémon Red / Green / Blue / Yellow                  47520000
Terraria                                             44500000
Wii Fit / Plus                                       43800000
Tetris (1989)                                        43000000
Pac-Man                                              42071635
Animal Crossing: New Horizons                        41590000
Human: Fall Flat                                     40000000
The Witcher 3 / Hearts of Stone / Blood and Wine     40000000
Ma
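
The example above depends on a local CSV file. As a self-contained alternative, the same attributes can be inspected on a small in-memory series. This is a minimal sketch that copies the first three titles and sales figures from the output above. The memory-related members `.nbytes` and `.memory_usage()` are not listed in the table and are included here as one way to check memory usage.

```python
import pandas as pd

sales = pd.Series(
    [238000000, 175000000, 100000000],
    index=["Minecraft", "Grand Theft Auto V", "Tetris (EA)"],
    name="Sales",
    dtype="int64",
)

print("size:", sales.size)            # number of elements
print("shape:", sales.shape)          # one-element tuple for a series
print("is_unique:", sales.is_unique)  # True when no value repeats
print("hasnans:", sales.hasnans)      # True when any value is NaN
print("dtype:", sales.dtype)
print("nbytes:", sales.nbytes)        # bytes used by the values only
print("memory_usage:", sales.memory_usage())  # values plus index
```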

## Series Methods

Commonly used series methods are summarized in the table below.

| Method Name | Description |
|:------ |:----------- |
| .head(n=5), .tail(n=5) | Display the first and last n items in the series. Default value for n is 5. |
| .info() | A summary of the series including index and data type. |
| .count() | The number of non-NA/null items in the series. |
| .nunique() | Count the number of unique non-NA/null values in the series. |
| .sum() | Sum of all items (applicable to numeric data types). |
| .product() | Product of all items (applicable to numeric data types). |
| .mean() | Mean of all items (applicable to numeric data types). |
| .std() | Standard deviation of all items (applicable to numeric data types). |
| .median() | Median of all items (applicable to numeric data types). |
| .mode() | The mode of all items (applicable to any data type). |
| .min(), .max() | Min and max of all items (applicable to any ordered data type). |
| .value_counts() | Group the values and count the number in each group. Useful when the values are of categories (applicable to any data type). |
| .sort_values(ascending = True, inplace = False) | Sort the values (the indices move along with the values). 'ascending' is a boolean parameter; 'inplace' updates the series in place if set to True. |
| .sort_index(ascending = True, inplace = False) | Sort the index (the values move along with the indices). 'ascending' is a boolean parameter; 'inplace' updates the series in place if set to True. |
| .copy() | Duplicate a series decoupled from its origin. |
| .get(key) | Get a particular item from the series using the index key. |
| .dropna() | Remove rows with NaN (applicable to series with missing data). |
| .fillna(value) | Replace NaN with the specified value (applicable to series with missing data). |
| .apply(func) | Apply a function 'func' to each item in a series. |
| .map(mapping) | Map the value of each item in a series using an association (dictionary, function, or series) defined by 'mapping'. |
| .astype(dtype) | Change data type to the specified 'dtype'. |
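
A few methods in the table, namely `.get()`, `.copy()`, `.dropna()`, and `.fillna()`, can be tried on a small in-memory series. The following is a minimal sketch with made-up values. The series `scores` and its labels are hypothetical.

```python
import numpy as np
import pandas as pd

scores = pd.Series([10.0, np.nan, 30.0], index=["a", "b", "c"])

# .get() returns a default (None unless specified) instead of raising on a missing key.
present = scores.get("a")
absent = scores.get("z")
absent_with_default = scores.get("z", default=0.0)

# .copy() produces an independent series; changing it leaves the original untouched.
backup = scores.copy()
backup["a"] = -1.0

# .dropna() removes missing entries, .fillna() replaces them.
without_missing = scores.dropna()
filled = scores.fillna(0.0)

print(present, absent, absent_with_default)
print(without_missing)
print(filled)
```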


In [12]:
series_example = pd.read_csv("best-selling-video-games.csv", index_col = "Title")["Sales"]
print("#----------")
print(".head(): \n {}".format(series_example.head()))
print("#----------")
print(".tail(): \n {}".format(series_example.tail()))
series_example = series_example.head() # for simplicity
print("#----------")
print(".info(): \n {}".format(series_example.info()))
print("#----------")
print(".count(): \n {}".format(series_example.count()))
print("#----------")
print(".nunique(): \n {}".format(series_example.nunique()))
print("#----------")
print(".sum(): \n {}".format(series_example.sum()))
print("#----------")
print(".product(): \n {}".format(series_example.product()))
print("#----------")
print(".mean(): \n {}".format(series_example.mean()))
print("#----------")
print(".std(): \n {}".format(series_example.std()))
print("#----------")
print(".median(): \n {}".format(series_example.median()))
print("#----------")
print(".mode(): \n {}".format(series_example.mode()))
print("#----------")
print(".min(): \n {}".format(series_example.min()))
print("#----------")
print(".max(): \n {}".format(series_example.max()))
print("#----------")
print(".sort_values(): \n {}".format(series_example.sort_values()))
print("#----------")
print(".sort_index(): \n {}".format(series_example.sort_index()))

#----------
.head(): 
 Title
Minecraft              238000000
Grand Theft Auto V     175000000
Tetris (EA)            100000000
Wii Sports              82900000
PUBG: Battlegrounds     75000000
Name: Sales, dtype: int64
#----------
.tail(): 
 Title
New Super Mario Bros. U / Luigi U / Deluxe    23640000
Mario Kart DS                                 23600000
Pokémon Ruby / Sapphire / Emerald             23280000
God of War                                    23000000
Red Dead Redemption                           23000000
Name: Sales, dtype: int64
#----------
<class 'pandas.core.series.Series'>
Index: 5 entries, Minecraft to PUBG: Battlegrounds
Series name: Sales
Non-Null Count  Dtype
--------------  -----
5 non-null      int64
dtypes: int64(1)
memory usage: 80.0+ bytes
.info(): 
 None
#----------
.count(): 
 5
#----------
.nunique(): 
 5
#----------
.sum(): 
 670900000
#----------
.product(): 
 7537240835054632960
#----------
.mean(): 
 134180000.0
#----------
.std(): 
 70258536.84784505


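Other methods, such as `.value_counts()`, are introduced later. Note that `.info()` prints its summary itself and returns `None`, which is why the summary appears before the `.info():` label and `None` after it in the output above.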
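
The `.product()` value printed above is not the true product. The five sales figures multiply to roughly 2.59e40, which is far beyond the range of `int64`, so the integer result silently wraps around. The sketch below shows two possible workarounds under that assumption: converting to `float64` for an approximate result, or using Python's arbitrary-precision integers for an exact one. The variable names are illustrative.

```python
import math
import pandas as pd

# The five sales figures shown by .head() above.
top_sales = pd.Series(
    [238000000, 175000000, 100000000, 82900000, 75000000], dtype="int64"
)

wrapped = top_sales.product()                      # int64 arithmetic overflows silently
approx = top_sales.astype("float64").product()     # approximate, but the right magnitude
exact = math.prod(int(v) for v in top_sales)       # exact, using Python integers

print("int64 product:  ", wrapped)
print("float64 product:", approx)
print("exact product:  ", exact)
```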

### Aggregating Methods

Aggregating methods, such as counting the number of elements in a series or calculating its maximum, minimum, and mean values, have already been introduced in the example above. An example that introduces `.value_counts()` is given below. It shows how many games each publisher has on the list. The return value is also a pandas series.

In [13]:
series_example = pd.read_csv("best-selling-video-games.csv", index_col = "Title")["Publisher(s)"]
publisher_games_counts = series_example.value_counts()
print(publisher_games_counts)
print(type(publisher_games_counts))

Nintendo                          22
Rockstar Games                     5
Activision                         4
Nintendo / The Pokémon Company     4
Xbox Game Studios                  2
Re-Logic / 505 Games               1
Namco                              1
Curve Digital                      1
CD Projekt                         1
PUBG Corporation                   1
Blizzard Entertainment             1
Bethesda Softworks                 1
Telltale Games                     1
2K Games                           1
Electronic Arts                    1
EA Sports                          1
Sega                               1
Sony Interactive Entertainment     1
Name: Publisher(s), dtype: int64
<class 'pandas.core.series.Series'>
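
`.value_counts()` can also report shares instead of raw counts through its `normalize` parameter. The following is a minimal sketch on a short, made-up list of publisher names, so that it runs without the CSV file. The data does not reflect the real data set.

```python
import pandas as pd

# Hypothetical toy data, not taken from the CSV file.
publishers = pd.Series(
    ["Nintendo", "Nintendo", "Rockstar Games", "Nintendo", "Activision"]
)

counts = publishers.value_counts()                 # sorted from most to least frequent
shares = publishers.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```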


### The `.apply()` Method

The `.apply()` method runs a Python function on each and every element of the series in an element-wise manner. It saves the trouble of writing an explicit for-loop. An example is given below. The return value is also a pandas series.

In [14]:
def evaluate_sales(total_sales: int):
    if total_sales > 50000000:
        return "high"
    return "normal"

series_example = pd.read_csv("best-selling-video-games.csv", index_col = "Title")["Sales"]
evaluate_result = series_example.apply(evaluate_sales)
evaluate_result

Title
Minecraft                                             high
Grand Theft Auto V                                    high
Tetris (EA)                                           high
Wii Sports                                            high
PUBG: Battlegrounds                                   high
Mario Kart 8 / Deluxe                                 high
Super Mario Bros.                                     high
Red Dead Redemption 2                               normal
Pokémon Red / Green / Blue / Yellow                 normal
Terraria                                            normal
Wii Fit / Plus                                      normal
Tetris (1989)                                       normal
Pac-Man                                             normal
Animal Crossing: New Horizons                       normal
Human: Fall Flat                                    normal
The Witcher 3 / Hearts of Stone / Blood and Wine    normal
Mario Kart Wii                                    
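
For a simple threshold such as the one in `evaluate_sales`, a vectorized expression can give the same labels without calling a Python function per element. The sketch below compares `.apply()` with `numpy.where` on three values copied from the output above. Whether the vectorized form is preferable depends on the size of the data and the complexity of the rule, so treat it as one option rather than the recommended approach.

```python
import numpy as np
import pandas as pd

sales = pd.Series(
    [238000000, 50000000, 47520000],
    index=["Minecraft", "Red Dead Redemption 2",
           "Pokémon Red / Green / Blue / Yellow"],
)

def evaluate_sales(total_sales: int) -> str:
    if total_sales > 50000000:
        return "high"
    return "normal"

via_apply = sales.apply(evaluate_sales)

# Same rule expressed on the whole series at once; the index is re-attached by hand.
via_where = pd.Series(
    np.where(sales > 50000000, "high", "normal"), index=sales.index
)

print(via_apply)
print(via_where)
```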

### The `.map()` Method

The `.map()` method is closely related to the `.apply()` method. It is also an element-wise operation and returns a new pandas series. Each element in the new series is obtained from a predefined association, which can be a dictionary, a function, or another series. An example is given below.

In [15]:
def evaluate_sales(total_sales: int):
    if total_sales > 50000000:
        return "high"
    return "normal"

series_example = pd.read_csv("best-selling-video-games.csv", index_col = "Title")["Sales"]
evaluate_result = series_example.apply(evaluate_sales)

evaluate_result_update_dic = {
    "high": "very high",
    "normal": "high",
}

evaluate_result = evaluate_result.map(evaluate_result_update_dic)
evaluate_result

Title
Minecraft                                           very high
Grand Theft Auto V                                  very high
Tetris (EA)                                         very high
Wii Sports                                          very high
PUBG: Battlegrounds                                 very high
Mario Kart 8 / Deluxe                               very high
Super Mario Bros.                                   very high
Red Dead Redemption 2                                    high
Pokémon Red / Green / Blue / Yellow                      high
Terraria                                                 high
Wii Fit / Plus                                           high
Tetris (1989)                                            high
Pac-Man                                                  high
Animal Crossing: New Horizons                            high
Human: Fall Flat                                         high
The Witcher 3 / Hearts of Stone / Blood and Wine         high
Ma
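
One detail worth knowing when mapping with a dictionary is that values without a matching key become `NaN`. The following is a minimal sketch with made-up labels. The value `"low"` is a hypothetical extra label that has no entry in the dictionary. The sketch also shows one way to keep the original value in that case, and that `.map()` accepts a function as well.

```python
import pandas as pd

labels = pd.Series(["high", "normal", "low"])
relabel = {"high": "very high", "normal": "high"}

mapped = labels.map(relabel)                 # "low" has no key, so it becomes NaN
mapped_filled = mapped.fillna(labels)        # fall back to the original label
lengths = labels.map(len)                    # a function works as a mapping too

print(mapped)
print(mapped_filled)
print(lengths)
```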

### The `.astype()` Method
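
A minimal sketch of `.astype()`, using made-up values rather than the video game data: numeric strings are converted to integers, integers to floats, and repeated labels to the `category` type. The variable names are illustrative.

```python
import pandas as pd

text_numbers = pd.Series(["1", "2", "3"])

as_int = text_numbers.astype("int64")      # strings holding digits become integers
as_float = as_int.astype("float64")        # integers become floating-point numbers
as_category = pd.Series(["high", "normal", "high"]).astype("category")

print(as_int.dtype, as_float.dtype, as_category.dtype)
print(as_category.cat.categories)
```

`.astype()` returns a new series and leaves the original unchanged, so the result has to be assigned to a variable to be kept.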