## Pandas - Series

`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. A `Series` can be created from a list or dictionary. 

In [322]:
import pandas as pd
import numpy as np

Let's start by analyzing the `Group of Seven` (G7) countries. The G7 is an intergovernmental organization consisting of Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States. It was formed in 1975 to discuss and coordinate economic policy among the world's largest advanced economies.

We will analyze the population of these countries, in millions, with a `pandas.Series` object.

In [323]:
g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    # dtype=np.float64,
    # index=[
    #     "Canada",
    #     "France",
    #     "Germany",
    #     "Italy",
    #     "Japan",
    #     "United Kingdom",
    #     "United States",
    # ],
)
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

A reader might not know that these values represent population in millions of inhabitants. A Series can have a `name` to better document its purpose:

In [324]:
g7_pop.name = "G7 Population in millions"

In [325]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

Series are quite similar to NumPy arrays:

In [326]:
g7_pop.dtype    

dtype('float64')

In [327]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])

They're actually backed by NumPy arrays:

In [328]:
type(g7_pop.values)

numpy.ndarray
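As a small aside, the pandas documentation recommends `Series.to_numpy()` over the `.values` attribute when a NumPy array is needed. The following is a minimal sketch that rebuilds the same Series so it can run on its own; the variable name `arr` is only illustrative.

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

# to_numpy() returns the Series data as a NumPy array,
# like the .values attribute used above.
arr = g7_pop.to_numpy()

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.dtype)   # float64
```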

At first glance they look like simple Python lists or NumPy arrays, but they're actually more similar to Python dicts.

A Series has an `index`, that's similar to the automatic index assigned to Python's lists:

In [329]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

In [330]:
g7_pop[2] # Accessing by index

80.94

In [331]:
g7_pop.index

RangeIndex(start=0, stop=7, step=1)

In contrast to lists, we can explicitly define the index, which is like giving a label to each element. This is similar to defining the keys of a Python dictionary. The labels can then be used to access the elements of the Series. The index can be set when the Series is created or assigned later, and it can hold integers (as in the default `RangeIndex`) or other values such as strings.

In [332]:
# Assigning labels to the index property of the Series object

g7_pop.index = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
]
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
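The cell above assigns the index after the Series already exists. As a sketch of the other option mentioned earlier, the labels can be passed through the `index` parameter at creation time. The name `g7_pop_labeled` is hypothetical and used only to avoid overwriting `g7_pop`.

```python
import pandas as pd

# Same data as before, but with labels and a name supplied up front.
g7_pop_labeled = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=[
        "Canada",
        "France",
        "Germany",
        "Italy",
        "Japan",
        "United Kingdom",
        "United States",
    ],
    name="G7 Population in millions",
)

print(g7_pop_labeled["Japan"])  # 127.061
```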

We can say that Series look like "ordered dictionaries". We can actually create Series out of dictionaries, with keys and values, and the keys will be used as the index of the Series: 

In [333]:
pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

You can also create a Series out of another Series by specifying the index labels you want. Here `Spain` is not in the original Series, so its value is filled with `NaN` (Not a Number), the special floating-point value pandas uses to represent missing data, much as `None` is often used in plain Python:

In [334]:
pd.Series(g7_pop, index=["France", "Germany", "Italy", "Spain"])

France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
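Once missing values appear, it helps to be able to detect them. The sketch below uses `Series.reindex`, which should give the same result as passing `index=` to the constructor, together with `isna()` and `dropna()`. The variable names `subset` and `missing` are illustrative only.

```python
import pandas as pd

g7_pop = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# Labels that are absent from the original Series come back as NaN.
subset = g7_pop.reindex(["France", "Germany", "Italy", "Spain"])

# isna() builds a boolean Series marking the missing entries.
missing = subset.isna()
print(subset[missing].index.tolist())  # ['Spain']

# dropna() returns a new Series without the missing entries.
print(subset.dropna())
```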

---

### Indexing and Slicing


Indexing works similarly to lists and dictionaries: you use the index label of the element you're looking for:

In [335]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [336]:
g7_pop["Canada"]

35.467

In [337]:
g7_pop["Japan"]

127.061

Numeric positions can also be used, with the `iloc` attribute:

In [338]:
g7_pop.iloc[0]

35.467

In [339]:
g7_pop.iloc[-1]

318.523

Selecting multiple elements at once:

In [340]:
g7_pop[["Italy", "France"]]

Italy     60.665
France    63.951
Name: G7 Population in millions, dtype: float64

In [341]:
g7_pop.iloc[[0, 2]]

Canada     35.467
Germany    80.940
Name: G7 Population in millions, dtype: float64
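Just as `iloc` makes position-based access explicit, the `loc` attribute makes label-based access explicit. The following is a minimal sketch on the same data; it is an optional, more explicit spelling of the bracket lookups shown above.

```python
import pandas as pd

g7_pop = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# Single label: returns a scalar.
print(g7_pop.loc["Canada"])  # 35.467

# List of labels: returns a Series in the requested order.
print(g7_pop.loc[["Italy", "France"]])
```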

In [342]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

Slicing also works, with one important difference from Python lists: when slicing by label in pandas, the upper limit is included:

In [343]:
g7_pop[:"Germany"]

Canada     35.467
France     63.951
Germany    80.940
Name: G7 Population in millions, dtype: float64

In [344]:
g7_pop["France":"Japan"]

France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
Name: G7 Population in millions, dtype: float64
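To make the difference concrete, the sketch below compares a label-based slice with a position-based slice through `iloc`, which follows the usual Python rule of excluding the stop position. The names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

g7_pop = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# Label slice: both endpoints are included (France, Germany, Italy, Japan).
by_label = g7_pop["France":"Japan"]

# Position slice: the stop position 4 (Japan) is excluded.
by_position = g7_pop.iloc[1:4]

print(len(by_label))     # 4
print(len(by_position))  # 3
```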

---

### Conditional selection (boolean arrays or Series)

The same boolean array techniques we saw applied to NumPy arrays can be used for pandas `Series`:

In [345]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [346]:
g7_pop > 70

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [347]:
g7_pop[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [348]:
g7_pop.mean()

107.30257142857144

In [349]:
g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [350]:
g7_pop.std()

97.24996987121581

In [351]:
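# Note: the second condition is redundant here, because any value above
# mean + std/2 is also above mean - std/2.
g7_pop[
    (g7_pop > g7_pop.mean() - g7_pop.std() / 2)
    | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)
]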

France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
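A common variation on the condition above is to select the values that fall outside a band around the mean, that is, below the lower bound or above the upper bound. This is a sketch of that alternative, not the original cell's logic; `lower`, `upper`, and `outside_band` are hypothetical names.

```python
import pandas as pd

g7_pop = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# Band of half a standard deviation on each side of the mean.
lower = g7_pop.mean() - g7_pop.std() / 2
upper = g7_pop.mean() + g7_pop.std() / 2

# Keep only the values below the band or above it.
outside_band = g7_pop[(g7_pop < lower) | (g7_pop > upper)]
print(outside_band)
```

With this data, only Canada falls below the band and only the United States falls above it.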

---

### Operations and methods

Series also support vectorized operations and aggregation functions, just like NumPy arrays:

In [352]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [353]:
g7_pop * 1_000_000  # Converting from millions to absolute number of inhabitants

Canada             35467000.0
France             63951000.0
Germany            80940000.0
Italy              60665000.0
Japan             127061000.0
United Kingdom     64511000.0
United States     318523000.0
Name: G7 Population in millions, dtype: float64

In [354]:
g7_pop.mean()

107.30257142857144

In [355]:
np.log(g7_pop) # Logarithm of the population

Canada            3.568603
France            4.158117
Germany           4.393708
Italy             4.105367
Japan             4.844667
United Kingdom    4.166836
United States     5.763695
Name: G7 Population in millions, dtype: float64

In [356]:
g7_pop["France":"Italy"].mean()  # Mean of the slice from France to Italy (inclusive)

68.51866666666666
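Two points from the cells above are worth spelling out: a vectorized operation returns a new Series and leaves the original untouched, and aggregations reduce the Series to a single value. The sketch below illustrates both with a few standard methods; the variable names are illustrative.

```python
import pandas as pd

g7_pop = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# Vectorized arithmetic builds a new Series; g7_pop itself is unchanged.
in_units = g7_pop * 1_000_000
print(g7_pop["Canada"])  # 35.467

# Aggregations collapse the Series into one value.
total = g7_pop.sum()        # combined population, in millions
largest = g7_pop.max()      # largest single value
who = g7_pop.idxmax()       # label of the largest value
print(who)                  # United States
```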

---

### Boolean arrays

Boolean arrays work the same way as in NumPy:


In [357]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [358]:
g7_pop > 80

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [359]:
g7_pop[g7_pop > 80]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [360]:
g7_pop[(g7_pop > 80) | (g7_pop < 40)]

Canada            35.467
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [361]:
g7_pop[(g7_pop > 80) & (g7_pop < 200)]

Germany     80.940
Japan      127.061
Name: G7 Population in millions, dtype: float64
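The conditions above are combined with `|` and `&`, each wrapped in parentheses, rather than with Python's `or` and `and`. The sketch below shows the element-wise negation operator `~` and what happens when `and` is used instead. The names `big`, `not_big`, and `raised` are illustrative.

```python
import pandas as pd

g7_pop = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# ~ flips every element of a boolean Series.
big = g7_pop > 80
not_big = g7_pop[~big]
print(not_big.index.tolist())

# `and` asks for a single truth value of a whole Series, which pandas
# refuses to guess, so it raises a ValueError.
raised = False
try:
    g7_pop[(g7_pop > 80) and (g7_pop < 200)]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```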

---

### Modifying series

In [362]:
g7_pop["Canada"] = 40.5

In [363]:
g7_pop

Canada             40.500
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [364]:
g7_pop.iloc[-1] = 500

In [365]:
g7_pop

Canada             40.500
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     500.000
Name: G7 Population in millions, dtype: float64

In [366]:
g7_pop[g7_pop < 70]

Canada            40.500
France            63.951
Italy             60.665
United Kingdom    64.511
Name: G7 Population in millions, dtype: float64

In [367]:
g7_pop[g7_pop < 70] = 100

In [368]:
g7_pop

Canada            100.000
France            100.000
Germany            80.940
Italy             100.000
Japan             127.061
United Kingdom    100.000
United States     500.000
Name: G7 Population in millions, dtype: float64
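All the assignments in this section change `g7_pop` in place. When the original values need to be preserved, one option is to work on a copy. This is a minimal sketch, with `original` and `modified` as hypothetical names.

```python
import pandas as pd

original = pd.Series(
    {
        "Canada": 35.467,
        "France": 63.951,
        "Germany": 80.94,
        "Italy": 60.665,
        "Japan": 127.061,
        "United Kingdom": 64.511,
        "United States": 318.523,
    },
    name="G7 Population in millions",
)

# copy() creates an independent Series, so edits do not touch `original`.
modified = original.copy()
modified[modified < 70] = 100

print(original["Canada"])  # 35.467
print(modified["Canada"])  # 100.0
```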