### Create DataFrames 

Since new concepts are being introduced, it helps to explore them first on small, simple DataFrames. Once you understand how these operations behave and what they can do, you can apply them to real data as and when needed. 
 

In [1]:
import pandas as pd

In [2]:
df_1 = {"col1":[1,2,3,4], "col2": [5,6,7,8]}
df_2 = {"col1":[11,12,13,14], "col2": [15,16,17,18]}

In [3]:
df1 = pd.DataFrame(df_1)
df2 = pd.DataFrame(df_2)

In [4]:
df1

Unnamed: 0,col1,col2
0,1,5
1,2,6
2,3,7
3,4,8


In [5]:
df2

Unnamed: 0,col1,col2
0,11,15
1,12,16
2,13,17
3,14,18


### Concatenation 

Concatenation is used when you want to stick two DataFrames together along an axis without any consideration given to matching elements. In contrast, the `merge` command uses a key to stitch two DataFrames together. 

If the shapes of the DataFrames being concatenated do not match, `NaN` values are added to fill the missing positions and make the dimensions uniform. 


In [7]:
pd.concat([df1, df2], axis = 0)

# Axis 0 represents row wise concatenation

Unnamed: 0,col1,col2
0,1,5
1,2,6
2,3,7
3,4,8
0,11,15
1,12,16
2,13,17
3,14,18


**NOTE**

- Rows in df2 get added below the rows of df1.
- Indexes of df2 remain the same as they were before the concatenation. 
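If the repeated index labels are not wanted, `pd.concat` accepts an `ignore_index` argument. The following is a minimal sketch that rebuilds the same two DataFrames and compares the index with and without that option; the variable names `stacked` and `renumbered` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})
df2 = pd.DataFrame({"col1": [11, 12, 13, 14], "col2": [15, 16, 17, 18]})

# Default behaviour: each DataFrame keeps its own index labels,
# so the labels 0..3 appear twice in the result.
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())
```

Keeping the original labels is useful when the index carries meaning; renumbering is usually more convenient when it is just a row counter.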

In [8]:
pd.concat([df1, df2], axis = 1)

# Axis 1 represents column wise concatenation

Unnamed: 0,col1,col2,col1.1,col2.1
0,1,5,11,15
1,2,6,12,16
2,3,7,13,17
3,4,8,14,18


In [10]:
df1["col3"] = df1["col1"] + df1["col2"]

# After this operation df1 will have 3 columns while df2 has only 2. 

In [11]:
pd.concat([df1, df2], axis = 0)

Unnamed: 0,col1,col2,col3
0,1,5,6.0
1,2,6,8.0
2,3,7,10.0
3,4,8,12.0
0,11,15,
1,12,16,
2,13,17,
3,14,18,


Since there is one extra column in df1, the corresponding values for the df2 rows become `NaN` or null values. 
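If the `NaN` padding is not desired, one option is the `join` argument of `pd.concat`. The sketch below, which assumes the same `df1` (with `col3`) and `df2` as above, compares the default `join="outer"` with `join="inner"`, which keeps only the columns present in both DataFrames.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})
df1["col3"] = df1["col1"] + df1["col2"]
df2 = pd.DataFrame({"col1": [11, 12, 13, 14], "col2": [15, 16, 17, 18]})

# Default (join="outer"): all columns are kept, missing cells become NaN.
outer = pd.concat([df1, df2], axis=0)
print(outer)

# join="inner": only the columns shared by both DataFrames are kept.
inner = pd.concat([df1, df2], axis=0, join="inner")
print(inner)
```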

### Arithmetic Operators on DataFrames

You can perform element-wise operations on DataFrames as well. These are very similar to the operations you performed on NumPy arrays. 

For example, if you want to add all the elements of `df1` to the corresponding elements of `df2`, you can use the '+' operator. 

In [12]:
df1 + df2 

Unnamed: 0,col1,col2,col3
0,12,20,
1,14,22,
2,16,24,
3,18,26,


As you saw, all the elements in `df1` got added to the corresponding elements in `df2`.
 
But `df1` had three columns while `df2` had two. Pandas aligns the two DataFrames by their row and column labels before operating, so there is nothing in `df2` to add to the third column, which is why you see null values in the result. This is the most significant difference between using operators in pandas and in NumPy; the same operation would have thrown an error if it was executed on NumPy arrays of these shapes.  
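The contrast with NumPy can be checked directly. This is a small sketch, assuming the same three-column `df1` and two-column `df2`; it converts both to arrays with `to_numpy()` and captures the error NumPy raises for the mismatched shapes.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})
df1["col3"] = df1["col1"] + df1["col2"]
df2 = pd.DataFrame({"col1": [11, 12, 13, 14], "col2": [15, 16, 17, 18]})

# pandas aligns on labels: col3 has no partner in df2, so it becomes NaN.
result = df1 + df2
print(result)

# NumPy works on positions and shapes only: (4, 3) + (4, 2) cannot be broadcast.
try:
    df1.to_numpy() + df2.to_numpy()
    numpy_error = None
except ValueError as exc:
    numpy_error = exc

print(type(numpy_error).__name__, numpy_error)
```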

The same result can be achieved with the `add()` method.

In [13]:
df1.add(df2)

Unnamed: 0,col1,col2,col3
0,12,20,
1,14,22,
2,16,24,
3,18,26,


Along with normal addition, the `add()` method also accepts extra arguments that the '+' operator cannot take. You can read about them [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html).
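One of those extra options is `fill_value`, which substitutes a value for cells that are missing in one of the two DataFrames before the addition is carried out. A minimal sketch, assuming the same three-column `df1` and two-column `df2` used above:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})
df1["col3"] = df1["col1"] + df1["col2"]
df2 = pd.DataFrame({"col1": [11, 12, 13, 14], "col2": [15, 16, 17, 18]})

# Without fill_value, col3 is NaN because df2 has no col3.
plain = df1.add(df2)

# With fill_value=0, the missing df2 values are treated as 0,
# so col3 keeps the values it had in df1.
filled = df1.add(df2, fill_value=0)
print(filled)
```

If a cell is missing in both DataFrames, the result for that cell is still missing.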

Similar to the '+' operator and the `add()` method, there are other operators with equivalent methods as well: 

- `sub()`: ' - '
- `mul()`: ' * '
- `div()`: ' / '
- `floordiv()`: ' // '
- `mod()`: ' % ' 
- `pow()`: ' ** '
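The pairing between methods and operators listed above can be verified with a short check. This sketch assumes two DataFrames with matching shapes and labels; the dictionary `pairs` is just a hypothetical helper for looping over the comparisons.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})
df2 = pd.DataFrame({"col1": [11, 12, 13, 14], "col2": [15, 16, 17, 18]})

# Each entry holds (result of the operator, result of the matching method).
pairs = {
    "sub": (df2 - df1, df2.sub(df1)),
    "mul": (df2 * df1, df2.mul(df1)),
    "div": (df2 / df1, df2.div(df1)),
    "floordiv": (df2 // df1, df2.floordiv(df1)),
    "mod": (df2 % df1, df2.mod(df1)),
    "pow": (df2 ** df1, df2.pow(df1)),
}

for name, (with_operator, with_method) in pairs.items():
    # DataFrame.equals compares values and dtypes of both results.
    print(name, with_operator.equals(with_method))
```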

In [17]:
# Recreating the DataFrames so that the dimensions match. 

df_1 = {"col1":[1,2,3,4], "col2": [5,6,7,8]}
df_2 = {"col1":[11,12,13,14], "col2": [15,16,17,18]}

df1 = pd.DataFrame(df_1)
df2 = pd.DataFrame(df_2)

print (df1)
print (df2)

   col1  col2
0     1     5
1     2     6
2     3     7
3     4     8
   col1  col2
0    11    15
1    12    16
2    13    17
3    14    18


In [18]:
df2 - df1

Unnamed: 0,col1,col2
0,10,10
1,10,10
2,10,10
3,10,10


In [19]:
df2 ** df1

Unnamed: 0,col1,col2
0,11,759375
1,144,16777216
2,2197,410338673
3,38416,11019960576


In [20]:
# Recreating the DataFrames so that the dimensions do NOT match: df2 has only col1. 

df_1 = {"col1":[1,2,3,4], "col2": [5,6,7,8]}
df_2 = {"col1":[11,12,13,14]}

df1 = pd.DataFrame(df_1)
df2 = pd.DataFrame(df_2)

print (df1)
print (df2)

   col1  col2
0     1     5
1     2     6
2     3     7
3     4     8
   col1
0    11
1    12
2    13
3    14


In [25]:
df1 + df2

Unnamed: 0,col1,col2
0,12,
1,14,
2,16,
3,18,


One of the advantages of a pandas DataFrame is that it can hold data of different data types. 
 
This leads us to the question: what would happen if operators were used on DataFrames which have "non-numerical" data types?

In [21]:
df_1 = {"col1":[1,2,3,4], "col2": [5,6,7,8], "col3": [True,False,False,True], "col4": ["a","b","c","d"] }
df_2 = {"col1":[11,12,13,14], "col2": [15,16,17,18], "col3": [True,False,True,False], "col4": ["e","f","g","h"]}

df1 = pd.DataFrame(df_1)
df2 = pd.DataFrame(df_2)

print (df1)
print (df2)

   col1  col2   col3 col4
0     1     5   True    a
1     2     6  False    b
2     3     7  False    c
3     4     8   True    d
   col1  col2   col3 col4
0    11    15   True    e
1    12    16  False    f
2    13    17   True    g
3    14    18  False    h


In [22]:
df1 + df2

  f"evaluating in Python space because the {repr(op_str)} "


Unnamed: 0,col1,col2,col3,col4
0,12,20,True,ae
1,14,22,False,bf
2,16,24,True,cg
3,18,26,True,dh


Something very interesting has happened. 
 
Pandas was smart enough to recognise the different data types and use the operators accordingly. 
 
- For int data type, it performed addition 
- For boolean, it performed OR operation
- For string, it performed concatenation 

In [24]:
df1 - df2

TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

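This throws an error because subtraction is not defined for every data type in these DataFrames. As the message shows, NumPy does not support '-' on booleans (`col3`), and there is no '-' for strings (`col4`) either, so pandas cannot figure out what to do. 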
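One way around this, shown here only as a sketch, is to restrict the subtraction to the numeric columns with `select_dtypes` and to handle the boolean column with the `^` (XOR) operator that the error message itself suggests. The names `numeric_cols`, `diff`, and `xor` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8],
                    "col3": [True, False, False, True],
                    "col4": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"col1": [11, 12, 13, 14], "col2": [15, 16, 17, 18],
                    "col3": [True, False, True, False],
                    "col4": ["e", "f", "g", "h"]})

# "number" selects integer and float columns; bool and string columns are excluded.
numeric_cols = df1.select_dtypes(include="number").columns

# Subtraction is well defined for the numeric columns.
diff = df1[numeric_cols] - df2[numeric_cols]
print(diff)

# For booleans, XOR marks the rows where the two values differ.
xor = df1["col3"] ^ df2["col3"]
print(xor)
```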

In [25]:
df_1 = {"col1":[1,2,3,4], "col2": [5,6,7,8], "col3": [True,False,False,True], "col4": ["a","b","c","d"] }
df_2 = {"col1": [True,False,True,False], "col2": ["e","f","g","h"], "col3":[11,12,13,14], "col4": [15,16,17,18] }

df1 = pd.DataFrame(df_1)
df2 = pd.DataFrame(df_2)

print (df1)
print (df2)

   col1  col2   col3 col4
0     1     5   True    a
1     2     6  False    b
2     3     7  False    c
3     4     8   True    d
    col1 col2  col3  col4
0   True    e    11    15
1  False    f    12    16
2   True    g    13    17
3  False    h    14    18


In [26]:
df1 + df2

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Since the data types of corresponding columns do not match (for example, `col2` holds integers in `df1` but strings in `df2`), pandas throws a `TypeError`. 
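Before applying an operator, it can help to compare the column data types of the two DataFrames. The sketch below is one possible check, not a built-in pandas feature; the list `mismatched` is a hypothetical helper that collects the columns whose dtypes differ.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8],
                    "col3": [True, False, False, True],
                    "col4": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"col1": [True, False, True, False],
                    "col2": ["e", "f", "g", "h"],
                    "col3": [11, 12, 13, 14],
                    "col4": [15, 16, 17, 18]})

# Collect the shared columns whose dtype differs between the two DataFrames.
mismatched = [col for col in df1.columns
              if col in df2.columns and df1[col].dtype != df2[col].dtype]

print(mismatched)
```

A differing dtype does not always cause an error (integers and floats mix fine, for instance), so treat the list as a prompt to inspect those columns rather than a strict rule.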

### Summary

##### 1. `Concatenation` : Used when you want to stitch two DataFrames together without any regard to the values. 
a. Even if the shapes do not match, the operation is performed, filling null values wherever necessary. 
##### 2. `operators` : Can perform element-wise operations on pandas DataFrames. 
a. You can use the operators themselves, such as '+', or the equivalent methods, such as `add()`, for the same result.  
b. If the shapes do not match, null values are added. 
c. They can work with different data types as well, as long as the operation is defined for that data type. 

In [27]:
import numpy as np 
import pandas as pd

# Defining the three dataframes indicating the gold, silver, and bronze medal counts
# of different countries
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                         'Medals': [15, 13, 9]}
                    )
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                        'Medals': [29, 20, 16]}
                    )
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                        'Medals': [40, 28, 27]}
                    )
print(gold)
print(silver)

  Country  Medals
0     USA      15
1  France      13
2  Russia       9
   Country  Medals
0      USA      29
1  Germany      20
2   Russia      16


In [31]:
gold_silver = gold.add(silver, fill_value=0 )
gold_silver

Unnamed: 0,Country,Medals
0,USAUSA,44
1,FranceGermany,33
2,RussiaRussia,25


In [36]:
df1 = gold.merge(silver, how='outer')
df2 = df1.merge(bronze, how='outer')
df2

Unnamed: 0,Country,Medals
0,USA,15
1,France,13
2,Russia,9
3,USA,29
4,Germany,20
5,Russia,16
6,France,40
7,USA,28
8,UK,27


In [39]:
df2.groupby('Country').sum().sort_values('Medals', ascending=False)

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
USA,72
France,53
UK,27
Russia,25
Germany,20
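The same totals can also be reached with the `add()` method if `Country` is first made the index, so that pandas aligns the rows by country name instead of by row position. This is an alternative sketch rather than the approach used in the cells above; the names `g`, `s`, `b`, and `total` are illustrative.

```python
import pandas as pd

gold = pd.DataFrame({"Country": ["USA", "France", "Russia"],
                     "Medals": [15, 13, 9]})
silver = pd.DataFrame({"Country": ["USA", "Germany", "Russia"],
                       "Medals": [29, 20, 16]})
bronze = pd.DataFrame({"Country": ["France", "USA", "UK"],
                       "Medals": [40, 28, 27]})

# With Country as the index, add() lines rows up by country name.
g = gold.set_index("Country")
s = silver.set_index("Country")
b = bronze.set_index("Country")

# fill_value=0 treats a country missing from one table as having 0 medals there.
total = g.add(s, fill_value=0).add(b, fill_value=0)
total = total.sort_values("Medals", ascending=False)
print(total)
```

This also explains the `USAUSA` result seen earlier: with the default integer index, `add()` pairs rows by position and concatenates the `Country` strings instead of matching countries.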
