![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118512-dfa1cc1a-46e9-11e8-9547-093d4532451e.png"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Vectorized Operations and Methods on Pandas Series

Series also support vectorized operations and aggregation functions, just like NumPy arrays. In this lecture we'll look at the most common ones.

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.options.display.float_format = '{:,.2f}'.format

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

The first thing we'll do is recreate the `Series` from our previous lecture:

In [3]:
g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, dtype=np.float64, name='G7 Population in millions')  # np.float was removed in NumPy 1.24

In [4]:
g7_pop

Canada            35.47
France            63.95
Germany           80.94
Italy             60.66
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64

In [5]:
gdp = pd.Series(
    [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    index=['Canada', 'France', 'Germany', 'Italy',
            'Japan', 'United Kingdom', 'United States'],
    dtype=np.float64,  # np.float was removed in NumPy 1.24
    name='G7 GDP in millions')

In [6]:
gdp

Canada            1,785,387.00
France            2,833,687.00
Germany           3,874,437.00
Italy             2,167,744.00
Japan             4,602,367.00
United Kingdom    2,950,039.00
United States    17,348,075.00
Name: G7 GDP in millions, dtype: float64

In [7]:
g7_pop.head(3)

Canada    35.47
France    63.95
Germany   80.94
Name: G7 Population in millions, dtype: float64

In [8]:
g7_pop.tail(3)

Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## `Series` vectorized operations

In [9]:
g7_pop * 1_000_000

Canada            35,467,000.00
France            63,951,000.00
Germany           80,940,000.00
Italy             60,665,000.00
Japan            127,061,000.00
United Kingdom    64,511,000.00
United States    318,523,000.00
Name: G7 Population in millions, dtype: float64

In [10]:
g7_pop + 1_000_000

Canada           1,000,035.47
France           1,000,063.95
Germany          1,000,080.94
Italy            1,000,060.67
Japan            1,000,127.06
United Kingdom   1,000,064.51
United States    1,000,318.52
Name: G7 Population in millions, dtype: float64

In [11]:
gdp * 1_000_000

Canada            1,785,387,000,000.00
France            2,833,687,000,000.00
Germany           3,874,437,000,000.00
Italy             2,167,744,000,000.00
Japan             4,602,367,000,000.00
United Kingdom    2,950,039,000,000.00
United States    17,348,075,000,000.00
Name: G7 GDP in millions, dtype: float64

**Operations between Series:**

In [12]:
gdp / g7_pop

Canada           50,339.39
France           44,310.28
Germany          47,868.01
Italy            35,733.03
Japan            36,221.71
United Kingdom   45,729.24
United States    54,464.12
dtype: float64

In [13]:
(gdp * 1_000_000) / (g7_pop * 1_000_000)

Canada           50,339.39
France           44,310.28
Germany          47,868.01
Italy            35,733.03
Japan            36,221.71
United Kingdom   45,729.24
United States    54,464.12
dtype: float64
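One detail worth noting about operations between Series: pandas pairs elements by index label rather than by position. The sketch below uses two small made-up Series (`left` and `right` are hypothetical names, not part of the lecture data) to illustrate this. A label that appears in only one operand produces `NaN`.

```python
import pandas as pd

# Hypothetical toy data; the labels are deliberately in a different order
left = pd.Series({'Canada': 10.0, 'France': 20.0})
right = pd.Series({'France': 2.0, 'Canada': 5.0, 'Japan': 4.0})

# Division matches values by label, not by position
ratio = left / right

# 'Japan' exists only in `right`, so its result is NaN
print(ratio)
```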

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Using _Universal Functions (Ufuncs)_ to obtain statistical info

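We can apply any NumPy _Universal Function_ (such as `np.log`) to a `Series`. A `Series` also provides its own statistical methods, such as `max`, `min`, `mean`, `std` and `quantile`.

A good starting point is `describe`, which gives you a quick statistical summary of the `Series`. After that, we'll explore the individual methods in more detail: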

In [14]:
g7_pop.describe()

count     7.00
mean    107.30
std      97.25
min      35.47
25%      62.31
50%      64.51
75%     104.00
max     318.52
Name: G7 Population in millions, dtype: float64

In [15]:
g7_pop.max()

318.523

In [16]:
g7_pop.min()

35.467

In [17]:
g7_pop.mean()

107.30257142857144

In [18]:
g7_pop.std()

97.24996987121581

In [19]:
g7_pop.quantile(.2)

61.3222

In [20]:
g7_pop.quantile(.8)

117.83680000000004
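The value returned by `quantile(.2)` is not one of the original data points. With its default settings, `quantile` interpolates linearly between the two nearest sorted values. The sketch below reproduces that calculation by hand on a copy of the lecture data (`population` and the helper variables are illustrative names only).

```python
import numpy as np
import pandas as pd

# Copy of the lecture data under an illustrative name
population = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523
})

q = 0.2
ordered = population.sort_values().to_numpy()

# Fractional position of the quantile within the sorted values
position = q * (len(ordered) - 1)
lower = int(np.floor(position))
fraction = position - lower

# Linear interpolation between the two neighbouring values
manual = ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])

print(manual, population.quantile(q))
```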

In [21]:
np.log(g7_pop)

Canada           3.57
France           4.16
Germany          4.39
Italy            4.11
Japan            4.84
United Kingdom   4.17
United States    5.76
Name: G7 Population in millions, dtype: float64
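A universal function such as `np.log` works element by element and, when given a `Series`, returns a new `Series` that keeps the original index. A minimal sketch with a two-element sample (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

sample = pd.Series({'Canada': 35.467, 'Japan': 127.061})

# The ufunc returns a Series, not a bare NumPy array
logged = np.log(sample)

# Applying the inverse ufunc recovers the original values
restored = np.exp(logged)

print(type(logged).__name__)
print(list(logged.index))
```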

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


# Conditional Selection & Filtering on Pandas Series

In conditional selection (also known as **boolean selection**), we will select subsets of data based on the actual values of the data in the Series by using a boolean vector to filter the data.

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [1]:
import pandas as pd
import numpy as np

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

The first thing we'll do is recreate the `Series` from our previous lecture:

In [2]:
data_dic = {
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}

g7_pop = pd.Series(data_dic,
                   name='G7 Population in millions')

In [3]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

Summary of selection (from previous lesson):

In [4]:
g7_pop['France']

63.951

In [5]:
g7_pop.loc['France']

63.951

In [6]:
g7_pop.iloc[0]

35.467

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Conditional selection (boolean arrays)

The same boolean array techniques we saw applied to NumPy arrays can be used for Pandas `Series`.

In the previous lecture we saw that we can index our `Series` using a list of boolean values:

In [7]:
g7_pop[[False, True,  True, True,  False, False,  False]]

France     63.951
Germany    80.940
Italy      60.665
Name: G7 Population in millions, dtype: float64

The same selection, with each value annotated:

In [8]:
g7_pop[[
    False, # Canada
    True,  # France
    True,  # Germany
    True,  # Italy
    False, # Japan
    False, # United Kingdom
    False  # United States
]]

France     63.951
Germany    80.940
Italy      60.665
Name: G7 Population in millions, dtype: float64

Now we'll go a step further and use a real condition to generate this list of boolean values:

In [9]:
condition = g7_pop > 70

condition

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [10]:
g7_pop[condition]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [11]:
g7_pop.loc[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [12]:
g7_pop.mean()

107.30257142857144

In [13]:
g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [14]:
g7_pop.loc[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [15]:
g7_pop.loc[g7_pop > g7_pop.mean()].size

2
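Because a condition is itself a boolean `Series`, its matches can also be counted directly, without filtering first. The sketch below works on a copy of the lecture data (`population` and the other variable names are illustrative) and shows both approaches giving the same count.

```python
import pandas as pd

# Copy of the lecture data under an illustrative name
population = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523
})

above_mean = population > population.mean()

# Option 1: filter, then ask for the size of the result
count_by_filtering = population[above_mean].size

# Option 2: True counts as 1, so summing the mask counts the matches
count_by_summing = int(above_mean.sum())

print(count_by_filtering, count_by_summing)
```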

### Operators

#### `or`

In [16]:
g7_pop[(g7_pop > 70) | (g7_pop < 40)]

Canada            35.467
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

#### `and`

In [17]:
g7_pop[(g7_pop > 80) & (g7_pop < 200)]

Germany     80.940
Japan      127.061
Name: G7 Population in millions, dtype: float64

#### `not`

In [18]:
g7_pop.loc[~(g7_pop > 80)]

Canada            35.467
France            63.951
Italy             60.665
United Kingdom    64.511
Name: G7 Population in millions, dtype: float64

In [19]:
g7_pop.loc[g7_pop > 80]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [20]:
g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [21]:
g7_pop.std()

97.24996987121581

In [22]:
g7_pop[(g7_pop > g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]

France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
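The operators above are `|`, `&` and `~` rather than Python's `or`, `and` and `not` keywords. The keywords try to reduce a whole `Series` to a single truth value, which pandas refuses to do. The parentheses matter too, because `&` and `|` bind more tightly than comparisons such as `>`. The sketch below, using a copy of the lecture data under the illustrative name `population`, shows the failure and the working form side by side.

```python
import pandas as pd

# Copy of the lecture data under an illustrative name
population = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523
})

# The `and` keyword asks for the truth value of a whole Series
try:
    population[(population > 80) and (population < 200)]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# The element-wise operator works as intended
in_range = population[(population > 80) & (population < 200)]

print(error_message)
print(in_range)
```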

### Indexing with isin

Consider the `isin()` method of `Series`, which returns a boolean vector that is true wherever the Series elements exist in the passed list. This allows you to select the elements whose values, or whose index labels, appear in a list you provide:

In [23]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [24]:
g7_pop[g7_pop.isin([80, 80.940, 60.451, 35.467])]

Canada     35.467
Germany    80.940
Name: G7 Population in millions, dtype: float64

In [25]:
g7_pop[g7_pop.index.isin(['Canada', 'Italy'])]

Canada    35.467
Italy     60.665
Name: G7 Population in millions, dtype: float64
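Since `isin` returns a boolean vector, it can be combined with the `~` operator to select everything that is not in the list. A short sketch, again on a copy of the lecture data (`population` and `others` are illustrative names):

```python
import pandas as pd

# Copy of the lecture data under an illustrative name
population = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523
})

# Negate the mask to keep every country except Canada and Italy
others = population[~population.index.isin(['Canada', 'Italy'])]

print(others)
```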

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Modifying series using conditional selection

In [26]:
g7_pop[g7_pop < 70] = 99.99

g7_pop

Canada             99.990
France             99.990
Germany            80.940
Italy              99.990
Japan             127.061
United Kingdom     99.990
United States     318.523
Name: G7 Population in millions, dtype: float64

We can also combine conditional selection with the `+=`, `-=` and `*=` operators to modify values.

Let's add 5 million to the countries with a population above 100 million:

In [27]:
g7_pop[g7_pop > 100] += 5

g7_pop

Canada             99.990
France             99.990
Germany            80.940
Italy              99.990
Japan             132.061
United Kingdom     99.990
United States     323.523
Name: G7 Population in millions, dtype: float64
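Assigning through a boolean selection changes the `Series` in place. When the original values need to be kept, one alternative (not used in this lecture) is `Series.mask`, which returns a new `Series` with the matching entries replaced. A minimal sketch on a small sample (`original` and `adjusted` are illustrative names):

```python
import pandas as pd

original = pd.Series({'Canada': 35.467, 'Germany': 80.94, 'Japan': 127.061})

# Replace values below 70 in a new Series; `original` is left untouched
adjusted = original.mask(original < 70, 99.99)

print(original)
print(adjusted)
```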

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


# Pandas Series - Sorting

In many use cases `Series` values need to be sorted.

Sorting in Pandas is straightforward. Two methods, available on both `Series` and `DataFrame`, take care of the job: `sort_values` and `sort_index`.

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.options.display.float_format = '{:,.2f}'.format

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

The first thing we'll do is recreate the `Series` from our previous lecture:

In [3]:
g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, dtype=np.float64, name='G7 Population in millions')  # np.float was removed in NumPy 1.24

In [4]:
g7_pop

Canada            35.47
France            63.95
Germany           80.94
Italy             60.66
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64

In [5]:
gdp = pd.Series(
    [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    index=['Canada', 'France', 'Germany', 'Italy',
            'Japan', 'United Kingdom', 'United States'],
    dtype=np.float64,  # np.float was removed in NumPy 1.24
    name='G7 GDP in millions')

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Sorting values

In [6]:
g7_pop

Canada            35.47
France            63.95
Germany           80.94
Italy             60.66
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64

In [7]:
g7_pop.sort_values()

Canada            35.47
Italy             60.66
France            63.95
United Kingdom    64.51
Germany           80.94
Japan            127.06
United States    318.52
Name: G7 Population in millions, dtype: float64

As you can see, sorting is as simple as invoking the `sort_values` method. By default, values are sorted in ascending order, which you can customize with the `ascending` parameter.

In [8]:
g7_pop.sort_values(ascending=False)

United States    318.52
Japan            127.06
Germany           80.94
United Kingdom    64.51
France            63.95
Italy             60.66
Canada            35.47
Name: G7 Population in millions, dtype: float64
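A common use of descending order is picking the top few entries. The sketch below chains `sort_values` with `head` and compares the result with `Series.nlargest`, a pandas method not covered in this lecture. It runs on a copy of the lecture data under the illustrative name `population`.

```python
import pandas as pd

# Copy of the lecture data under an illustrative name
population = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523
})

# Sort everything in descending order, then keep the first three rows
top3_sorted = population.sort_values(ascending=False).head(3)

# Ask directly for the three largest values
top3_nlargest = population.nlargest(3)

print(top3_sorted)
```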

In [9]:
g7_pop

Canada            35.47
France            63.95
Germany           80.94
Italy             60.66
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64

In [10]:
g7_pop.sort_values(ascending=False, inplace=True)

In [11]:
g7_pop

United States    318.52
Japan            127.06
Germany           80.94
United Kingdom    64.51
France            63.95
Italy             60.66
Canada            35.47
Name: G7 Population in millions, dtype: float64
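Note that with `inplace=True` the method changes the existing `Series` and returns `None`, so its result should not be assigned to a variable. A small sketch on sample data (`sample` and `returned` are illustrative names):

```python
import pandas as pd

sample = pd.Series({'Canada': 35.467, 'Japan': 127.061, 'Italy': 60.665})

# In-place sorting mutates `sample` and returns None
returned = sample.sort_values(ascending=False, inplace=True)

print(returned)
print(list(sample.index))
```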

### Sorting index

`sort_index` works exactly the same way, but orders the `Series` by its index labels instead of its values:

In [12]:
g7_pop.sort_index()

Canada            35.47
France            63.95
Germany           80.94
Italy             60.66
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64
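`sort_index` accepts the same `ascending` parameter as `sort_values`. A brief sketch on sample data (`sample` and `by_label_desc` are illustrative names) that sorts the labels in reverse alphabetical order:

```python
import pandas as pd

sample = pd.Series({'Japan': 127.061, 'Canada': 35.467, 'Italy': 60.665})

# Order by index label, from Z to A
by_label_desc = sample.sort_index(ascending=False)

print(by_label_desc)
```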

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)