# Pandas

1. Series: how a pandas Series differs from a NumPy array
2. Creating a pandas Series: from a list or array, from a dictionary, from another Series
3. Indexing: by label, by position (`iloc`), and with boolean masks
4. Operations and methods: arithmetic and statistical methods

In [1]:
import pandas as pd
import numpy as np

## Series

A pandas Series has a few features that a NumPy array lacks:
1. The index is displayed to the left of the values.
2. The index labels can be set manually (for example, country names instead of 0, 1, 2, ...).
3. A Series can have a name.

Similarity:
1. The values of a Series are stored in a NumPy array under the hood.
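A minimal sketch of these differences, using three of the G7 values from this notebook: the NumPy array is addressed only by position, while the Series adds index labels and a name on top of an ndarray of values.

```python
import numpy as np
import pandas as pd

values = [35.467, 63.951, 80.940]

# A NumPy array is addressed by position only
arr = np.array(values)

# A Series holds the same values, plus index labels and a name
s = pd.Series(values, index=["Canada", "France", "Germany"],
              name="G7 Population in millions")

print(arr[1])              # position-based access on the array
print(s["France"])         # label-based access on the Series
print(s.index.tolist())    # the labels shown on the left when the Series is displayed
print(s.name)
print(type(s.to_numpy()))  # the values live in a NumPy ndarray
```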

In [2]:
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64
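When no index is given, as in the cell above, pandas numbers the values itself. A short sketch (three of the values above) showing that this default index is a `RangeIndex` starting at 0:

```python
import pandas as pd

s = pd.Series([35.467, 63.951, 80.940])

# Without an explicit index, pandas assigns a RangeIndex: 0, 1, 2, ...
print(s.index)
print(s.index.tolist())
```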

In [3]:
# Give the series a name.

g7_pop.name = 'G7 Population in millions'
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64
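The name can also be set without mutating the Series. A sketch, assuming a two-value excerpt of the data: pass `name=` to the constructor, or call `Series.rename` with a scalar, which returns a copy with the new name and leaves the original as it was. The label "Population (millions)" is only an illustrative string.

```python
import pandas as pd

# The name can be supplied when the Series is created...
s = pd.Series([35.467, 63.951], index=["Canada", "France"],
              name="G7 Population in millions")

# ...and rename() with a scalar returns a copy carrying a new name
renamed = s.rename("Population (millions)")

print(s.name)        # the original name is unchanged
print(renamed.name)
```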

In [6]:
# Get the dtype of the elements, the type of the underlying values, and the type of the series itself.

print(g7_pop.dtype)
print(type(g7_pop.values))
print(type(g7_pop))

float64
<class 'numpy.ndarray'>
<class 'pandas.core.series.Series'>
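The `float64` dtype above is inferred from the data. A small sketch with made-up toy lists showing how the inferred dtype changes with the input: whole numbers give an integer dtype, mixing in a float gives `float64`, and a missing value (`None`, stored as `NaN`) also forces a float dtype.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, 2.5, 3])
with_missing = pd.Series([1, None, 3])

print(ints.dtype)             # integers only -> integer dtype
print(mixed.dtype)            # int and float mixed -> float64
print(with_missing.dtype)     # None becomes NaN, which needs a float dtype
print(with_missing.isna().tolist())
```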


In [8]:
# Assign an index label to each value.

g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
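Assigning to `.index` requires exactly one label per value. A sketch on a three-value excerpt: a list of the wrong length is rejected with a `ValueError`, and the existing index is left in place.

```python
import pandas as pd

s = pd.Series([35.467, 63.951, 80.940])
s.index = ["Canada", "France", "Germany"]  # one label per value: accepted

# Two labels for three values is rejected
try:
    s.index = ["Canada", "France"]
    error = None
except ValueError as exc:
    error = exc

print(error)
print(s.index.tolist())  # still the three labels assigned earlier
```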

### Creating a Pandas Series

In [10]:
# Create a series from a dictionary: the keys become the index labels.

g7_pop2 = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')
g7_pop2

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
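A sketch of how the dictionary maps onto the Series, assuming a recent pandas on Python 3.7+ (where dictionaries keep insertion order): the keys become the index in the order they were inserted, not sorted, and `to_dict()` converts back. The three entries are taken from the cell above in a deliberately non-alphabetical order.

```python
import pandas as pd

# Keys deliberately not in alphabetical order
data = {"Japan": 127.061, "Canada": 35.467, "France": 63.951}

s = pd.Series(data, name="G7 Population in millions")

# The keys become the index labels, in the dictionary's insertion order
print(s.index.tolist())  # ['Japan', 'Canada', 'France']

# Series.to_dict converts back to a plain {label: value} dictionary
print(s.to_dict() == data)  # True
```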

In [11]:
# Create a series from a list, passing the index labels and the name explicitly.

g7_pop3 = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
           'United States'],
    name='G7 Population in millions')

g7_pop3

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
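The dictionary and list constructions produce the same Series. A sketch that checks this on three of the values with `pd.testing.assert_series_equal`, which compares values, index, dtype and name, and raises `AssertionError` on any difference.

```python
import pandas as pd

countries = ["Canada", "France", "Germany"]
values = [35.467, 63.951, 80.94]
name = "G7 Population in millions"

from_dict = pd.Series(dict(zip(countries, values)), name=name)
from_list = pd.Series(values, index=countries, name=name)

# Raises AssertionError if anything differs; returns None otherwise
pd.testing.assert_series_equal(from_dict, from_list)
print("identical")
```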

In [14]:
# Create a new series from an existing one, keeping only the listed index labels.
# A label absent from the original ('Spain') gets the value NaN.

g7_pop_select = pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])
g7_pop_select

France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
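Passing an existing Series together with `index=` conforms it to the new labels, which is what `Series.reindex` does explicitly. A sketch on a three-country excerpt showing that both routes give the same result, with `NaN` only for the new label:

```python
import pandas as pd

g7_pop = pd.Series(
    [63.951, 80.940, 60.665],
    index=["France", "Germany", "Italy"],
    name="G7 Population in millions",
)

labels = ["France", "Germany", "Italy", "Spain"]
via_constructor = pd.Series(g7_pop, index=labels)
via_reindex = g7_pop.reindex(labels)  # explicit method for the same operation

print(via_reindex)
print(via_reindex.isna())  # True only for 'Spain'
```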

### Indexing

In [15]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [19]:
g7_pop[['Canada','Italy']]

Canada    35.467
Italy     60.665
Name: G7 Population in millions, dtype: float64
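Square brackets with labels, as above, can also be written with `.loc`, which makes the label-based intent explicit. A sketch on a four-country excerpt: a list of labels returns a Series, while a single label returns a scalar.

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665],
    index=["Canada", "France", "Germany", "Italy"],
    name="G7 Population in millions",
)

pair = g7_pop.loc[["Canada", "Italy"]]  # list of labels -> Series
canada = g7_pop.loc["Canada"]           # single label -> scalar

print(pair)
print(canada)
```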

In [17]:
g7_pop.iloc[0:3]

Canada     35.467
France     63.951
Germany    80.940
Name: G7 Population in millions, dtype: float64
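Positional and label slices treat their endpoint differently. A sketch on a four-country excerpt: `iloc[0:3]` excludes position 3, like a Python list slice, while `loc["Canada":"Germany"]` includes `'Germany'`, so both return the same three rows. Negative positions count from the end.

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665],
    index=["Canada", "France", "Germany", "Italy"],
)

by_position = g7_pop.iloc[0:3]             # stop position 3 is excluded
by_label = g7_pop.loc["Canada":"Germany"]  # stop label 'Germany' is included
last = g7_pop.iloc[-1]                     # negative positions count from the end

print(by_position)
print(by_label)
print(last)
```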

### Conditional Selection using boolean arrays

In [20]:
g7_pop > 70

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool
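The comparison returns a boolean Series aligned with the original index. Because `True` counts as 1 and `False` as 0, a sketch like this one can count the matches with `sum()` or get their share with `mean()`:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=["Canada", "France", "Germany", "Italy", "Japan",
           "United Kingdom", "United States"],
)

mask = g7_pop > 70

print(mask.sum())   # number of countries above 70 million
print(mask.mean())  # fraction of countries above 70 million
```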

In [21]:
g7_pop[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [26]:
g7_pop[(g7_pop > 80) | (g7_pop < 40)]

Canada            35.467
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
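Conditions are combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<` in Python. A sketch on the same data: `Series.between` is a shortcut for a range test (inclusive on both ends by default), and Python's `and`/`or` fail because they try to turn a whole Series into a single `True`/`False`.

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=["Canada", "France", "Germany", "Italy", "Japan",
           "United Kingdom", "United States"],
)

mid_sized = g7_pop[(g7_pop > 60) & (g7_pop < 70)]  # both conditions
not_large = g7_pop[~(g7_pop > 70)]                 # negation
also_mid = g7_pop[g7_pop.between(60, 70)]          # inclusive range test

# `and` calls bool() on a Series, which is ambiguous and raises ValueError
try:
    (g7_pop > 60) and (g7_pop < 70)
    error = None
except ValueError as exc:
    error = exc

print(mid_sized)
print(not_large)
print(error)
```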

### Operations and Methods

In [22]:
g7_pop * 1000000

Canada             35467000.0
France             63951000.0
Germany            80940000.0
Italy              60665000.0
Japan             127061000.0
United Kingdom     64511000.0
United States     318523000.0
Name: G7 Population in millions, dtype: float64
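Multiplying by a scalar applies to every value. When two Series are combined, pandas matches values by index label rather than by position, and labels present in only one of them produce `NaN`. A sketch, where the `change` values are hypothetical numbers chosen only to illustrate alignment; `add(..., fill_value=0)` treats a missing label as 0 instead.

```python
import pandas as pd

pop = pd.Series({"Canada": 35.467, "France": 63.951, "Germany": 80.940})

# Hypothetical values, in a different label order and with one extra label
change = pd.Series({"Germany": 0.5, "Canada": 1.0, "Spain": 2.0})

total = pop + change                      # aligned on labels; France and Spain -> NaN
filled = pop.add(change, fill_value=0)    # a missing side counts as 0

print(total)
print(filled)
```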

In [23]:
g7_pop.mean()

107.30257142857144
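`mean()` is one of several statistical methods on a Series. A sketch with a few others on the same data: `idxmax()` returns the label of the largest value rather than the value itself, and `describe()` bundles count, mean, standard deviation, min, quartiles and max.

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=["Canada", "France", "Germany", "Italy", "Japan",
           "United Kingdom", "United States"],
    name="G7 Population in millions",
)

print(g7_pop.sum())
print(g7_pop.median())
print(g7_pop.idxmax())   # label of the largest value
print(g7_pop.describe())
```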

In [24]:
np.log(g7_pop)

Canada            3.568603
France            4.158117
Germany           4.393708
Italy             4.105367
Japan             4.844667
United Kingdom    4.166836
United States     5.763695
Name: G7 Population in millions, dtype: float64
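NumPy functions such as `np.log` work element-wise on a Series and return a Series with the same index and name. A sketch on the same data: `np.exp` reverses the log up to floating-point rounding, which `np.allclose` checks.

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=["Canada", "France", "Germany", "Italy", "Japan",
           "United Kingdom", "United States"],
    name="G7 Population in millions",
)

logged = np.log(g7_pop)    # element-wise; result is still a Series
restored = np.exp(logged)  # exp undoes log, up to rounding

print(type(logged).__name__)
print(logged.index.equals(g7_pop.index))
print(np.allclose(restored, g7_pop))
```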