# Pandas
___
[pandas official](https://pandas.pydata.org/)

<img src="https://i1.wp.com/numfocus.org/wp-content/uploads/2016/07/pandas-logo-300.png?fit=300%2C300&ssl=1" width="200">

## Hello Pandas
***
Pandas is a Python module that builds on top of numpy, harvesting its numerical efficiency while enabling us to work with heterogeneous data. It does so by wrapping ndarrays in its own objects: the pandas DataFrame and the pandas Series, which we will discuss after the import.

In [2]:
import numpy as np  # It is always advised to import numpy when working with pandas
import pandas as pd # The official alias of pandas is pd.

In [2]:
pd.__version__

'0.24.1'

## Pandas Objects
Pandas supplies 3 objects for us to work with : 
1. __Index__
1. __Series__
1. __DataFrame__  
They are listed in this order because each Series object contains an Index, and each DataFrame contains both Series objects and an Index object. We will, however, discuss them in a different order : Series, DataFrame, Index.

### Series
***
<img src="https://www.straightpokersupplies.com/media/catalog/product/cache/1/image/1200x1200/9df78eab33525d08d6e5fb8d27136e95/w/o/world-series-of-poker-playing-cards-modiano-2015-16.jpg" width="200">

The first object we are going to discuss is the pandas Series. A series is a "wrapped" __1d__ numpy array. Let's start with creating our very first pandas Series :

In [3]:
s = pd.Series([10, 20, 30, 40], name='random stuff') 

In [4]:
s

0    10
1    20
2    30
3    40
Name: random stuff, dtype: int64

In [5]:
s.name

'random stuff'

In [6]:
s[0], s[1], s[2], s[3]

(10, 20, 30, 40)

In [7]:
s[1:3] # Slicing works as well

1    20
2    30
Name: random stuff, dtype: int64

Series can only be __1 dimensional__!

In [8]:
pd.Series(np.zeros((2, 2))) # this throws an exception

Exception: Data must be 1-dimensional

Ok, so when we print our series we see the values it contains plus the index of each value.  
Let's jump back to numpy for a second:

In [9]:
a = np.array([10, 20, 30, 40])

In [10]:
a

array([10, 20, 30, 40])

In [11]:
a[0], a[1], a[2], a[3]

(10, 20, 30, 40)

Well, at first glance it looks like the only difference between a series and a numpy array is the fact that the index is printed for us. And the truth is, that this is the major difference - the index.  
A series object contains 2 objects as attributes: 
1. __values__ - a numpy array of values.
1. __index__ - a pandas Index object. (guess what holds the values under the hood? a numpy array)  

Let's explore:

In [12]:
s_values, s_index = s.values, s.index

In [13]:
s_values, s_index

(array([10, 20, 30, 40]), RangeIndex(start=0, stop=4, step=1))

In [14]:
type(s_values), type(s_index)

(numpy.ndarray, pandas.core.indexes.range.RangeIndex)

Ok, so we see that the values attribute is a numpy array and that the index is a RangeIndex object (a subclass of the pandas Index object, which we will talk about soon).  
This means that, unlike numpy, which only has an implicit positional index, a series lets us set an explicit index. Let's see this in action:

In [14]:
s = pd.Series([10, 20, 30, 40], index=list('abcd'))

In [15]:
s.name

In [16]:
s

a    10
b    20
c    30
d    40
dtype: int64

Now we understand why printing the index makes sense. What's more, we can use the index to access the values in our series

In [17]:
s['a'], s['b'], s['c']

(10, 20, 30)

Slicing works here as well! Note that, unlike positional slicing, a label slice includes its end label.

In [18]:
s['a':'c']

a    10
b    20
c    30
dtype: int64

But, we did not lose our previous capabilities and we can still access our elements using a numeric index. This comes with a caveat we will soon see.

In [19]:
s[0]

10

__Danger__. Be careful when mixing explicit and implicit indices, as in some cases this can cause confusion. The worst scenario is when you set a different numeric index on your series, because then you can no longer use implicit indexing with the bracket notation.

In [20]:
s = pd.Series([10, 20, 30, 30], index=range(1, 5))
s

1    10
2    20
3    30
4    30
dtype: int64

In [21]:
s[0] # This throws an error 

KeyError: 0

In [22]:
s[1] # Some might expect this to give the 2nd value in the array while it will give the first

10

One way to look at a pandas series is as a sort of python dictionary, where the index holds the keys and the values are, well, the values.  
The similarity is so strong that you can use python dictionaries to build pandas series.

In [23]:
d = {
    'dollar' : 3.14,
    'euro'   : 3.5,
    'pound'  : 4.29
}
s = pd.Series(d)

In [24]:
s

dollar    3.14
euro      3.50
pound     4.29
dtype: float64

In [25]:
s['dollar']

3.14

In [26]:
s['euro':]

euro     3.50
pound    4.29
dtype: float64

In [27]:
d.keys(), s.index

(dict_keys(['dollar', 'euro', 'pound']),
 Index(['dollar', 'euro', 'pound'], dtype='object'))

__Summing up__ : A pandas series is an enhanced 1d numpy array which lets us set an explicit index.

***
## Exercise
***

__Create a series of size 10 with random values and an implicit index__

In [5]:
# Your code here
s = pd.Series(np.random.rand(10))
s


# 0    0.925635
# 1    0.322262
# 2    0.899349
# 3    0.604293
# 4    0.634586
# 5    0.420359
# 6    0.095573
# 7    0.109929
# 8    0.003602
# 9    0.694641
# dtype: float64

0    0.014032
1    0.502292
2    0.350171
3    0.793361
4    0.439339
5    0.490455
6    0.450276
7    0.288705
8    0.285841
9    0.797059
dtype: float64

__Create the following series: (index on left, values on the right)__
```py
2     1
4     3
6     5
8     7
10    9
```

In [6]:
# Your code here
s = pd.Series(np.arange(1,10,2),index=np.arange(2,11,2))
s

2     1
4     3
6     5
8     7
10    9
dtype: int32

__Use slicing to access the values with index 4 and 8.__

In [17]:
# Your code here
s[1:4:2]

4    3
8    7
dtype: int32

__Create the following series(index on the left, values on the right)__
```py
2squared     4
3squared     9
4squared    16
5squared    25
```

In [32]:
# Your code here
data = np.arange(2, 6) ** 2
# Build plain string labels; np.char.array with b'squared' yields bytes labels such as b'2squared'
idxs = [str(i) + 'squared' for i in range(2, 6)]
pd.Series(data, index=idxs)


2squared     4
3squared     9
4squared    16
5squared    25
dtype: int32

## DataFrame
***

<img src="https://www.tutorialspoint.com/python_pandas/images/structure_table.jpg" width="300">

A pandas dataframe is basically $n$ pandas series placed side by side, each one forming a column. Think of a dataframe as an excel sheet where you can name your columns. Another option is to think of it as an enhanced 2d numpy array, where you can access the columns and the rows in a "fancy" way.  
Let's explore :

In [38]:
data = np.random.randint(low=10, high=50, size=(15, 3))
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

In [39]:
df

Unnamed: 0,A,B,C
0,31,45,46
1,48,14,41
2,16,11,35
3,44,11,23
4,15,29,25
5,37,17,16
6,20,22,39
7,36,30,25
8,20,28,32
9,33,13,13


We can access each column by its name :

In [40]:
df['A']

0     31
1     48
2     16
3     44
4     15
5     37
6     20
7     36
8     20
9     33
10    13
11    32
12    13
13    38
14    29
Name: A, dtype: int64

And each column is a pandas series as we mentioned above:

In [41]:
type(df['A']), type(df['B']), type(df['C'])

(pandas.core.series.Series,
 pandas.core.series.Series,
 pandas.core.series.Series)

And as mentioned above each of these series contains a numpy array.

In [42]:
type(df['A'].values), type(df['B'].values), type(df['C'].values)

(numpy.ndarray, numpy.ndarray, numpy.ndarray)

We mentioned earlier that we can view a pandas series as a dictionary, mapping from index to value. We can also think of a dataframe as a dictionary mapping from a column name to a series.  
This helps in remembering the following: when we use the square bracket notation with a column name, we get back a Series. (Some might expect to get a row, so don't be that person.)

The dataframe object also contains yet another Index object, which maps to the dataframe rows. We will see how to use it to access the rows later on.  

In [34]:
type(df.index), type(df.columns), type(df.values[0, 0])

(pandas.core.indexes.range.RangeIndex,
 pandas.core.indexes.base.Index,
 numpy.int64)
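
The dictionary view can be made concrete by building a dataframe from a dictionary of series. This is only a minimal sketch: the names `apples`, `pears` and `fruit`, and all the numbers, are made up for illustration.

```python
import pandas as pd

# Hypothetical data: two series that share the same index labels
apples = pd.Series([3, 5, 2], index=['mon', 'tue', 'wed'])
pears = pd.Series([1, 0, 4], index=['mon', 'tue', 'wed'])

# Each dictionary key becomes a column name, each series becomes a column
fruit = pd.DataFrame({'apples': apples, 'pears': pears})

column = fruit['apples']   # bracket access with a column name returns a Series
row_labels = fruit.index   # the row Index shared by all the columns
```

Here the dataframe behaves like the dictionary it was built from: the key `'apples'` gives back a series, and that series is still indexed by the row labels.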

***
## Exercise
***

__Create the following dataframe__
```py
	A	B	C	D	E
0	0	1	2	3	4
1	5	6	7	8	9
2	10	11	12	13	14
3	15	16	17	18	19
4	20	21	22	23	24
```

In [6]:
# Your Code here

# A	B	C	D	E
# 0	0	1	2	3	4
# 1	5	6	7	8	9
# 2	10	11	12	13	14
# 3	15	16	17	18	19
# 4	20	21	22	23	24

data = np.arange(0,25,1).reshape(5,5)
df = pd.DataFrame(data, columns=['A', 'B', 'C','D','E'])
df

Unnamed: 0,A,B,C,D,E
0,0,1,2,3,4
1,5,6,7,8,9
2,10,11,12,13,14
3,15,16,17,18,19
4,20,21,22,23,24


__Create the following dataframe__
```py
	0	1	2	3	4
0	A	B	C	D	E
```

In [8]:
# Your Code here

data = np.array(['A', 'B', 'C','D','E'])
df = pd.DataFrame(data).T
df

Unnamed: 0,0,1,2,3,4
0,A,B,C,D,E


__Create the following dataframe__:
```py
	Israel	USA	Japan
first	0	1	2
second	3	4	5
```

In [10]:
# Your Code here 
# Israel	USA	Japan
# first	0	1	2
# second	3	4	5

data = np.arange(0,6,1).reshape(2,3)
columns = np.array(['Israel', 'USA', 'Japan'])
index = np.array(['first', 'second'])
df = pd.DataFrame(data,columns = columns,index = index)
df

Unnamed: 0,Israel,USA,Japan
first,0,1,2
second,3,4,5


## Index
*** 

<img src="https://ecuinc.biz/wp-content/uploads/2017/11/index.jpg" width="200">

We just saw that each of our objects uses the pandas Index. We can think of the pandas Index as an immutable numpy array: we can perform on it many of the operations we used on numpy arrays, __BUT__ we can't change any of its values (which makes sense for an Index).

In [11]:
idx = pd.Index([1, 2, 3, 4, 5, 6, 7, 8])

In [12]:
idx.shape, idx.ndim, idx.size, idx.dtype

((8,), 1, 8, dtype('int64'))

And we can use a lot of the same techniques to access elements in the array

In [48]:
idx[0], idx[1:5], idx[::2]

(1,
 Int64Index([2, 3, 4, 5], dtype='int64'),
 Int64Index([1, 3, 5, 7], dtype='int64'))

But, since this is an immutable object we can not change the values

In [49]:
idx[0] = 12

TypeError: Index does not support mutable operations

And guess what the Index object contains under the hood? (You guessed it! a numpy array)

In [50]:
type(idx.values)

numpy.ndarray

A great feature of the Index object is that it supports set operations. We can perform various set operations between indices to create new indices.

A short reminder of set operations:  

* __Union__        : $A \cup B = \{a | a\in A ~or~ a\in B \}$ All the elements which are either in A or in B.  
* __Intersection__ : $A \cap B = \{a | a\in A ~and~ a\in B \}$ All the elements which are both in A and in B.  
* __Symmetric difference__ : $A \triangle B = \{a | a \in A ~or~ a\in B~ but ~not ~both \}$

And the operators in python:  
* __Union__ - |  
* __Intersection__ - &  
* __Symmetric Difference__ - ^  

Let's see that in action:

In [51]:
ind_1 = pd.Index(np.arange(10))
ind_2 = pd.Index(np.arange(5, 12))
ind_1, ind_2

(Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64'),
 Int64Index([5, 6, 7, 8, 9, 10, 11], dtype='int64'))

In [52]:
ind_1 | ind_2

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype='int64')

In [53]:
ind_1 & ind_2

Int64Index([5, 6, 7, 8, 9], dtype='int64')

In [54]:
ind_1 ^ ind_2

Int64Index([0, 1, 2, 3, 4, 10, 11], dtype='int64')
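
The operator forms above reflect the pandas version used in this notebook (0.24.1). In pandas 2.0 and later, `|`, `&` and `^` on Index objects perform element-wise logical operations instead of set operations, so the named methods are the more portable spelling. A minimal sketch with the same two indices (the result variable names are ours):

```python
import numpy as np
import pandas as pd

ind_1 = pd.Index(np.arange(10))
ind_2 = pd.Index(np.arange(5, 12))

in_either = ind_1.union(ind_2)                      # labels in ind_1 or ind_2
in_both = ind_1.intersection(ind_2)                 # labels in ind_1 and ind_2
in_one_only = ind_1.symmetric_difference(ind_2)     # labels in exactly one of them
only_in_first = ind_1.difference(ind_2)             # labels in ind_1 but not in ind_2
```

Besides being stable across versions, the method names also cover plain set difference, which has no operator of its own in the list above.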

In most scenarios you will encounter, there is no need to construct an Index object outside of a Series or a DataFrame.

## Exercise
***

__Get all the values which are in ind_1 but not in ind_2__

In [15]:
ind_1 = pd.Index(np.arange(10))
ind_2 = pd.Index(np.arange(5, 12))

# Your code here
# Int64Index([0, 1, 2, 3, 4], dtype='int64')
(ind_1 ^ ind_2) & ind_1


Int64Index([0, 1, 2, 3, 4], dtype='int64')

__Get all the indices which are in either `ind_1` or `ind_2` but not in `ind_3`__

In [31]:
ind_1 = pd.Index(np.arange(0, 10, 2))
print("ind_1= ",ind_1)
ind_2 = pd.Index(np.arange(1, 11, 2))
print("ind_2= ",ind_2)
ind_3 = pd.Index(np.arange(0, 11, 3))
print("ind_3= ", ind_3)

# Your code here
# 0 1 2 3 4 5 6 7 8 9
# 1 2 4 5 7 8 

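union = ind_1 | ind_2
print(union)
# difference() only removes ind_3's members; ^ would also add labels found only in ind_3
print(union.difference(ind_3))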

ind_1=  Int64Index([0, 2, 4, 6, 8], dtype='int64')
ind_2=  Int64Index([1, 3, 5, 7, 9], dtype='int64')
ind_3=  Int64Index([0, 3, 6, 9], dtype='int64')
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
Int64Index([1, 2, 4, 5, 7, 8], dtype='int64')


## Accessing elements
***

### Series
We talked about the fact that a series is similar to 2 different objects : a 1d numpy array and a dictionary.  
So we can access elements in the same way we access elements in each of those objects.  
But there are some nuances we should pay attention to:

In [61]:
s = pd.Series(np.arange(1, 5), list('bcde'))

In [62]:
s

b    1
c    2
d    3
e    4
dtype: int64

Accessing like a numpy array :

In [63]:
s[1], s[2:4]

(2, d    3
 e    4
 dtype: int64)

Accessing like a dictionary:

In [64]:
s['e'], s['b':'d'] # Notice this does take the last element!

(4, b    1
 c    2
 d    3
 dtype: int64)

By the way, slicing does not work for python dictionaries.

In [65]:
d = {l:i for i, l in zip(range(5), list('abcde'))}

In [66]:
d['a':'d']

TypeError: unhashable type: 'slice'

And we can start getting 'fancy' with our Series, using much of what we saw for numpy.  
We can use boolean indexing:

In [62]:
np.random.seed(2611)
s = pd.Series(np.random.randint(low=0, high=20, size=10))
s

0    18
1    18
2     1
3     6
4    16
5    15
6    19
7     0
8    17
9     9
dtype: int64

In [63]:
x = s[s>6]
x

0    18
1    18
4    16
5    15
6    19
8    17
9     9
dtype: int64

In [64]:
s[(s<6) | (s > 17)]

0    18
1    18
2     1
6    19
7     0
dtype: int64
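
Boolean masks can be combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `<` and `>`. The sketch below types in the same ten values as the series above, so it does not depend on the random seed; the variable names are ours.

```python
import pandas as pd

s = pd.Series([18, 18, 1, 6, 16, 15, 19, 0, 17, 9])

between = s[(s >= 6) & (s <= 16)]   # both conditions must hold
not_above_six = s[~(s > 6)]         # ~ negates a mask
picked = s[s.isin([0, 19])]         # membership test against a list of values
```

In every case the surviving values keep their original index labels, exactly as in the outputs above.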

And we can use fancy indexing as well:

In [33]:
np.random.seed(2611)
index_lettters = [chr(ord('a') + i) for i in range(10)]
s = pd.Series(np.random.randint(low=0, high=20, size=10), index=index_lettters)
s

a    18
b    18
c     1
d     6
e    16
f    15
g    19
h     0
i    17
j     9
dtype: int32

In [34]:
s[['a', 'd', 'e']]

a    18
d     6
e    16
dtype: int32

In [35]:
s['a']

18

In [73]:
s[[0, 3, 4]]

a    18
d     6
e    16
dtype: int64

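Don't forget you can use both the defined index and numeric positions when your defined index is not numeric. In recent pandas versions, positional access through plain brackets on a labelled series is deprecated, so `iloc` (introduced next) is the safer choice.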

#### loc and iloc 
As you might have noticed, the fact that we can use both the implicit and the explicit index can cause confusion.  
If you want to make sure that both you and the person reading your code know which index you are referring to, you can use the __loc__ and __iloc__ indexers.
* __iloc__ - refers to the implicit, position-based index.
* __loc__ - refers to the explicit index.  
If your explicit index is the same as the implicit index, these 2 indexers return the same result for a single element. Slices still differ: __loc__ includes the end label while __iloc__ excludes the end position.

In [68]:
np.random.seed(2611)
index_lettters = [chr(ord('a') + i) for i in range(10)]
s = pd.Series(np.random.randint(low=0, high=20, size=10), index=index_lettters)
s

a    18
b    18
c     1
d     6
e    16
f    15
g    19
h     0
i    17
j     9
dtype: int64

In [69]:
s.iloc[[1, 2, 3]]

b    18
c     1
d     6
dtype: int64

In [70]:
s.loc['a':'c']

a    18
b    18
c     1
dtype: int64

In [71]:
# This will cause an exception.
s.loc[:4]

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [4] of <class 'int'>

In [72]:
# This will cause an exception.
s.iloc['a':'d']

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [a] of <class 'str'>
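
The difference between the two indexers is easiest to see on a series whose labels are integers that do not start at 0. A small sketch (the variable names are ours):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])

by_label = s.loc[1]        # label 1    -> the first value
by_position = s.iloc[1]    # position 1 -> the second value

label_slice = s.loc[1:3]      # labels 1, 2 and 3: the end label is included
position_slice = s.iloc[1:3]  # positions 1 and 2: the end position is excluded
```

Because the indexer is spelled out, a reader never has to guess whether `1` means a label or a position.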

### DataFrame
As we saw earlier, it's easy to think of a DataFrame as a dictionary mapping column names to columns, and we can access a column with dictionary-like bracket notation:

In [36]:
np.random.seed(1010)
index_lettters = [chr(ord('a') + i) for i in range(10)]
data = np.random.randint(low=0, high=100, size=(10, 3))
df = pd.DataFrame(data, columns=['A', 'B', 'C'], index=index_lettters)
df

Unnamed: 0,A,B,C
a,36,72,74
b,18,78,67
c,22,42,76
d,98,81,53
e,64,22,90
f,29,24,97
g,95,80,49
h,58,71,70
i,90,45,78
j,35,88,99


In [39]:
type(df['A']['a'])

numpy.int32

In [37]:
df['A']

a    36
b    18
c    22
d    98
e    64
f    29
g    95
h    58
i    90
j    35
Name: A, dtype: int32

So the question arises - how do we access rows in our dataframes? The answer is through the loc and iloc indexers. When given a single argument, they refer to the rows. 

In [81]:
# Accessing the rows at positions 2 and 3 (the 3rd and 4th rows)
df.iloc[[2, 3]]

Unnamed: 0,A,B,C
c,22,42,76
d,98,81,53


In [82]:
df.iloc[1:5]

Unnamed: 0,A,B,C
b,18,78,67
c,22,42,76
d,98,81,53
e,64,22,90


Let's take a look at a loc example as well:


In [83]:
df.loc['a']

A    36
B    72
C    74
Name: a, dtype: int64

In [84]:
df.loc[['a', 'b', 'c']]

Unnamed: 0,A,B,C
a,36,72,74
b,18,78,67
c,22,42,76
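
Both indexers also accept a second entry, after a comma, that selects columns, so a single expression can pick rows and columns together. The sketch below uses a small made-up frame rather than the random one above, so the values are easy to check by eye.

```python
import numpy as np
import pandas as pd

# Illustrative frame: rows a..d, columns A..C, values 0..11
small = pd.DataFrame(np.arange(12).reshape(4, 3),
                     columns=['A', 'B', 'C'], index=list('abcd'))

cell_by_label = small.loc['b', 'C']              # row 'b', column 'C'
cell_by_position = small.iloc[1, 2]              # the same cell, by position
label_block = small.loc['a':'c', ['A', 'C']]     # rows a..c (inclusive), two columns
position_block = small.iloc[0:2, 0:2]            # first two rows and first two columns
```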

Ok, so you think you are starting to get the hang of it? Now let's get you confused.  
If you use slicing or a boolean mask with the bracket notation, it will work on the rows.

<img src="https://media0.giphy.com/media/3oz8xZvvOZRmKay4xy/giphy.gif?cid=790b7611c7f9d5575bd5514b74f3bc3d9757e4daf8ae570f&rid=giphy.gif" width="300">

So slicing works on the rows:

In [85]:
df[1:3]

Unnamed: 0,A,B,C
b,18,78,67
c,22,42,76


Boolean masking works on the rows:

In [86]:
mask = df['A'] > 50 # Get rows where the A value is bigger than 50
df[mask]

Unnamed: 0,A,B,C
d,98,81,53
e,64,22,90
g,95,80,49
h,58,71,70
i,90,45,78


Oh, fancy indexing does work on the columns :

In [87]:
df[['A', 'B']]

Unnamed: 0,A,B
a,36,72
b,18,78
c,22,42
d,98,81
e,64,22
f,29,24
g,95,80
h,58,71
i,90,45
j,35,88


To summarize:

If you want to access the columns:
- Standard dictionary-like access works.
- Fancy indexing with a list of column names works as well.

If you want to access the rows:
- use iloc for the implicit index.
- use loc for the explicit index.
- use slicing with the bracket notation.
- use boolean masking with the bracket notation.
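
The whole summary fits in one short sketch. The frame `demo` and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(12).reshape(4, 3),
                    columns=['A', 'B', 'C'], index=list('wxyz'))

# Columns
one_column = demo['B']             # dictionary-like access returns a Series
two_columns = demo[['A', 'C']]     # a list of names returns a DataFrame

# Rows
rows_by_position = demo.iloc[[0, 2]]   # implicit index
rows_by_label = demo.loc[['x', 'z']]   # explicit index
row_slice = demo[1:3]                  # bracket slicing works on rows
masked_rows = demo[demo['A'] > 3]      # a boolean mask works on rows
```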

***
## Exercise
***

In [40]:
np.random.seed(1000)
index_lettters = [chr(ord('a') + i) for i in range(20)]
df = pd.DataFrame(np.random.randint(10, 100, size=(20, 4)), 
                  columns=['House', 'Garden', 'Shed', 'Basement'],
                  index=index_lettters)
df

Unnamed: 0,House,Garden,Shed,Basement
a,61,97,81,74
b,11,71,10,99
c,55,50,46,70
d,52,68,51,30
e,40,98,40,38
f,40,87,92,38
g,95,20,40,24
h,71,79,86,21
i,62,85,42,62
j,60,31,36,17


__Display the *House* and *Garden* Columns__

In [43]:
# Your Code here
df[['House','Garden']]

Unnamed: 0,House,Garden
a,61,97
b,11,71
c,55,50
d,52,68
e,40,98
f,40,87
g,95,20
h,71,79
i,62,85
j,60,31


__Display all the even rows__

In [46]:
# Your Code here
# House	Garden	Shed	Basement
# a	61	97	81	74
# c	55	50	46	70
# e	40	98	40	38
# g	95	20	40	24
# i	62	85	42	62
# k	26	97	53	69
# m	11	21	56	37
# o	79	64	39	21
# q	82	28	12	87
# s	45	61	63	57
df.index
df[::2]

Unnamed: 0,House,Garden,Shed,Basement
a,61,97,81,74
c,55,50,46,70
e,40,98,40,38
g,95,20,40,24
i,62,85,42,62
k,26,97,53,69
m,11,21,56,37
o,79,64,39,21
q,82,28,12,87
s,45,61,63,57


__Display all the rows where the *Garden* value is bigger than 50.__

In [47]:
# Your code here
# House	Garden	Shed	Basement
# a	61	97	81	74
# b	11	71	10	99
# d	52	68	51	30
# e	40	98	40	38
# f	40	87	92	38
# h	71	79	86	21
# i	62	85	42	62
# k	26	97	53	69
# n	57	97	89	27
# o	79	64	39	21
# r	69	51	16	20
# s	45	61	63	57

df[df.Garden> 50]

Unnamed: 0,House,Garden,Shed,Basement
a,61,97,81,74
b,11,71,10,99
d,52,68,51,30
e,40,98,40,38
f,40,87,92,38
h,71,79,86,21
i,62,85,42,62
k,26,97,53,69
n,57,97,89,27
o,79,64,39,21


__Display all the rows where either the *Shed* value is bigger than 50 or the *Basement* value is lower than 50, but not both.__

In [53]:
# Your code here
df[(df.Shed > 50) ^ (df.Basement < 50)]  # ^ is element-wise XOR: exactly one condition holds

Unnamed: 0,House,Garden,Shed,Basement
a,61,97,81,74
e,40,98,40,38
g,95,20,40,24
j,60,31,36,17
k,26,97,53,69
l,10,41,40,28
o,79,64,39,21
p,22,15,65,98
r,69,51,16,20
s,45,61,63,57


__Display the rows at positions 4, 8 and 10 (don't use letters).__

In [55]:
# Your code here
df.iloc[[4,8,10]]

Unnamed: 0,House,Garden,Shed,Basement
e,40,98,40,38
i,62,85,42,62
k,26,97,53,69


## Universal Functions
***
We saw that numpy universal functions (ufuncs) are the magic sauce which gives us an amazing speed-up when performing element-wise operations. Since pandas uses numpy under the hood, we want to leverage this functionality with pandas as well.
When dealing with series, numpy ufuncs work as you would expect, with the "side effect" that the index is preserved as is.

In [81]:
s = pd.Series([1, 2, 3, 4])
s

0    1
1    2
2    3
3    4
dtype: int64

In [82]:
s**2

0     1
1     4
2     9
3    16
dtype: int64

We can see that each element was squared and our index stayed the same.  
If you have a homogeneous dataframe you can perform the same operations on dataframes as well:

In [83]:
df = pd.DataFrame(np.arange(20).reshape(4, 5))
df * 10

Unnamed: 0,0,1,2,3,4
0,0,10,20,30,40
1,50,60,70,80,90
2,100,110,120,130,140
3,150,160,170,180,190


Again, the index stays the same while each element in the matrix is multiplied by 10.  

If one of the columns holds strings, an operation that strings do not support will fail.

In [84]:
df_with_str = pd.concat((df, pd.Series(['a', 'b', 'c', 'd'])), axis=1)
df_with_str

Unnamed: 0,0,1,2,3,4,0.1
0,0,1,2,3,4,a
1,5,6,7,8,9,b
2,10,11,12,13,14,c
3,15,16,17,18,19,d


In [90]:
df_with_str[0]

Unnamed: 0,0,0.1
0,0,a
1,5,b
2,10,c
3,15,d


In [85]:
df_with_str ** 2 # This fails since we have a string column in our dataframe.

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
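
One way around this, sketched below, is to restrict the operation to the numeric columns with `select_dtypes` before applying the ufunc. The frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame({'x': [1, 2, 3],
                      'y': [4.0, 5.0, 6.0],
                      'label': ['a', 'b', 'c']})

numeric = mixed.select_dtypes(include='number')  # keeps 'x' and 'y', drops 'label'
squared = numeric ** 2                            # element-wise, index preserved
roots = np.sqrt(numeric)                          # numpy ufuncs work on the result too
```

The string column is left out of the computation rather than converted, so nothing in `mixed` itself changes.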

## Operating on two pandas object
If you try to perform a binary operation between 2 pandas objects, the operation is aligned on the index, and pandas fills the result with NaN wherever an index label appears in only one of the objects.

In [91]:
ser_a = pd.Series(np.arange(5), index=list('abcde'))
ser_b = pd.Series(np.arange(5, 10), index=list('gfedc'))

ser_a, ser_b

(a    0
 b    1
 c    2
 d    3
 e    4
 dtype: int64, g    5
 f    6
 e    7
 d    8
 c    9
 dtype: int64)

In [92]:
ser_a / ser_b

a         NaN
b         NaN
c    0.222222
d    0.375000
e    0.571429
f         NaN
g         NaN
dtype: float64

The result is a Series whose index is the union of both indices. Where both series had a value we get a float, but where only one of them had a value we get NaN, which is pandas' way of indicating a missing value.  
If you want to avoid these missing values, you can use the explicit method call and pass in the fill_value parameter, which stands in for a value that is missing on one side of the operation.

In [101]:
ser_a.div(ser_b, fill_value=1)  # If one of the values is missing, perform the operation with 1.

a    0.000000
b    1.000000
c    0.222222
d    0.375000
e    0.571429
f    0.166667
g    0.200000
dtype: float64

If you try to perform an action between 2 dataframes the same logic takes place only this time the elements will be aligned on both the column index and the row index.

In [102]:
A = pd.DataFrame(np.arange(18).reshape(6,3), columns=['A', 'B', 'C'])
A

Unnamed: 0,A,B,C
0,0,1,2
1,3,4,5
2,6,7,8
3,9,10,11
4,12,13,14
5,15,16,17


In [103]:
B = pd.DataFrame(np.arange(4).reshape(2, 2) * 10, columns=['D', 'E'])
B

Unnamed: 0,D,E
0,0,10
1,20,30


In [104]:
A + B

Unnamed: 0,A,B,C,D,E
0,,,,,
1,,,,,
2,,,,,
3,,,,,
4,,,,,
5,,,,,


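If we try to perform an operation between a series and a dataframe, the alignment will be on the dataframe's columns: the series index is matched against the column names, and the operation is then applied to every row.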

In [105]:
ser_a = pd.Series([1, 2, 3], index=list('ABC'))
ser_a

A    1
B    2
C    3
dtype: int64

In [106]:
A - ser_a

Unnamed: 0,A,B,C
0,-1,-1,-1
1,2,2,2
2,5,5,5
3,8,8,8
4,11,11,11
5,14,14,14
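
To align a series with the row index instead, the explicit method form accepts an `axis` argument. The sketch below subtracts each row's mean from that row; the frame and the variable names are ours.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

row_means = table.mean(axis=1)              # one value per row, indexed like the rows
centered = table.sub(row_means, axis=0)     # match the series against the row index
```

Writing `table - row_means` would instead try to match the labels 0, 1, 2 against the column names `A` and `B`, which is not what we want here.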


***
## Exercise
***

In [56]:
np.random.seed(1002)
df_1 = pd.DataFrame(np.random.randint(0, 50, size=(20, 3)), columns=['Sunday', 'Monday', 'Tuesday'])
df_2 = pd.DataFrame(np.random.randint(0, 50, size=(10, 3)), columns=['Sunday', 'Monday', 'Tuesday'])

In [57]:
df_1

Unnamed: 0,Sunday,Monday,Tuesday
0,39,1,46
1,40,20,48
2,26,13,2
3,40,6,9
4,5,42,35
5,38,42,25
6,29,38,39
7,32,39,45
8,22,37,8
9,40,43,7


In [58]:
df_2

Unnamed: 0,Sunday,Monday,Tuesday
0,35,33,4
1,16,19,17
2,17,32,41
3,7,17,1
4,46,43,6
5,6,45,9
6,10,35,7
7,2,17,3
8,39,36,11
9,0,35,31


__Get all the values from `df_1` Sunday column squared__

In [111]:
# Your code here


__Get the value of `df_2` Monday + `df_1` Tuesday. Complete missing values on either side with 9.__

In [112]:
# Your code here


__Get all the rows from `df_2` where the *Monday* value is bigger than the *Monday* value of `df_1`__

In [584]:
indicies = (df_2.index) & (df_1.index)
mask = df_2.iloc[indicies]['Monday'] > df_1.iloc[indicies]['Monday']

In [113]:
# Your code here


## Concatenate and Append
Again, going back to numpy, we can concatenate series and dataframes using the pandas concat function :

In [114]:
ser_a = pd.Series(np.arange(5))
ser_b = pd.Series(np.arange(5, 10), index=np.arange(5, 10))
ser_a, ser_b

(0    0
 1    1
 2    2
 3    3
 4    4
 dtype: int64, 5    5
 6    6
 7    7
 8    8
 9    9
 dtype: int64)

In [115]:
pd.concat([ser_a, ser_b])

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

What's important to know is that when concatenating objects, pandas concat keeps the indices of the original objects. This can produce duplicate index labels, so pay attention to it.

In [116]:
ser_a = pd.Series(np.arange(5))
ser_b = pd.Series(np.arange(5, 10))
pd.concat([ser_a, ser_b])

0    0
1    1
2    2
3    3
4    4
0    5
1    6
2    7
3    8
4    9
dtype: int64

If you do want the new object to have a new "organized" index you can use the ignore_index flag.

In [117]:
pd.concat([ser_a, ser_b], ignore_index=True)

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
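
If duplicate labels should be treated as an error rather than silently kept or renumbered, `pd.concat` also accepts `verify_integrity=True`. A brief sketch (the names `clashing`, `lookup` and `raised` are ours):

```python
import numpy as np
import pandas as pd

ser_a = pd.Series(np.arange(5))
ser_b = pd.Series(np.arange(5, 10))

clashing = pd.concat([ser_a, ser_b])
has_duplicates = clashing.index.has_duplicates   # the labels 0..4 appear twice
lookup = clashing.loc[0]                          # a duplicated label returns two values

try:
    pd.concat([ser_a, ser_b], verify_integrity=True)
    raised = False
except ValueError:
    raised = True   # overlapping index labels are rejected
```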

We can concatenate dataframes as well:

In [118]:
df_1 = pd.DataFrame(np.zeros((3, 2)), columns=['A', 'B'])
df_2 = pd.DataFrame(np.ones((4, 2)), columns=['A', 'B'])
pd.concat([df_1, df_2], axis=0)

Unnamed: 0,A,B
0,0.0,0.0
1,0.0,0.0
2,0.0,0.0
0,1.0,1.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0


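We can also use append, as we saw early on with lists. Unlike `list.append`, it returns a new object instead of modifying the original. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer versions use `pd.concat` instead.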

In [119]:
df_1.append(df_2)

Unnamed: 0,A,B
0,0.0,0.0
1,0.0,0.0
2,0.0,0.0
0,1.0,1.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0


# References

- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) A thorough tour of NumPy and pandas.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)