# ```Series``` and ```DataFrame``` Functions

## Reindexing

The ```reindex``` function changes the order of the rows and columns in a ```Series``` or a ```DataFrame``` so that they match a new list of labels.
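
The cells in this section use a ```DataFrame```, so here is a minimal sketch of the same idea on a ```Series```. The variable names and values (`s`, `reordered`) are made up for illustration only.

```python
import pandas as pd

# Hypothetical Series used only for illustration
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Ask for the existing labels in a different order
reordered = s.reindex(['c', 'a', 'b'])
print(reordered)
```

The values travel with their labels, so only the order changes.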

In [1]:
import numpy as np

import pandas as pd

# Create DataFrame

raw_data = {'first_name': ['Jason','Molly','Tina','Jake','Amy'], 'last_name': ['Miller','Jacobson','Alison','Milner','Cooze'], 
           'age': [42,52,36,24,73], 'preTestScore': [4,24,31,2,3], 'postTestScore': [25,94,57,62,70]}

df = pd.DataFrame(raw_data)
print (df)

   age first_name last_name  postTestScore  preTestScore
0   42      Jason    Miller             25             4
1   52      Molly  Jacobson             94            24
2   36       Tina    Alison             57            31
3   24       Jake    Milner             62             2
4   73        Amy     Cooze             70             3


In [2]:
# reindex or change the order of rows

df.reindex ([3,1,4,0,2])

Unnamed: 0,age,first_name,last_name,postTestScore,preTestScore
3,24,Jake,Milner,62,2
1,52,Molly,Jacobson,94,24
4,73,Amy,Cooze,70,3
0,42,Jason,Miller,25,4
2,36,Tina,Alison,57,31


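Note: If we reindex a ```Series``` or ```DataFrame``` with a list containing a label that is not in the original index, the new row is filled with NaN (missing values). Because NaN is a floating-point value, integer columns are converted to float, which is why the output below shows 24.0 instead of 24.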

In [3]:
# reindex with labels (7 and 6) that are not in the original index

df.reindex ([3,1,7,4,0,2,6])

Unnamed: 0,age,first_name,last_name,postTestScore,preTestScore
3,24.0,Jake,Milner,62.0,2.0
1,52.0,Molly,Jacobson,94.0,24.0
7,,,,,
4,73.0,Amy,Cooze,70.0,3.0
0,42.0,Jason,Miller,25.0,4.0
2,36.0,Tina,Alison,57.0,31.0
6,,,,,
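
If NaN rows are not wanted, ```reindex``` also accepts a `fill_value` argument. The sketch below uses a small made-up frame (`scores`) rather than the tutorial data, and it assumes that 0 is a sensible placeholder, which may not hold for real data.

```python
import pandas as pd

# Hypothetical frame used only for illustration
scores = pd.DataFrame({'preTestScore': [4, 24, 31],
                       'postTestScore': [25, 94, 57]})

# Label 5 is not in the index, so its row is created and filled with 0 instead of NaN
filled = scores.reindex([2, 0, 5], fill_value=0)
print(filled)
```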


In [4]:
# reindex or change the order of columns

columnsTitles = ['first_name','last_name','age','preTestScore', 'postTestScore']

df.reindex (columns = columnsTitles)

Unnamed: 0,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,25
1,Molly,Jacobson,52,24,94
2,Tina,Alison,36,31,57
3,Jake,Milner,24,2,62
4,Amy,Cooze,73,3,70


##### Practice: Define a new ```columnsTitles``` list that includes a column name that is not in the ```DataFrame```, reindex with it, and see the result.

In [5]:
# One more example:

Score = {'student1': pd.Series([100, 93, 87, 100], index=['score1', 'score2', 'score3', 'score4']),
         'student2': pd.Series([93, 96, 79, 98], index=['score1', 'score2', 'score3', 'score4']),
         'student3': pd.Series([100, 99, 96, 89], index=['score1', 'score2', 'score3', 'score4'])}

df = pd.DataFrame(Score)
print (df)

        student1  student2  student3
score1       100        93       100
score2        93        96        99
score3        87        79        96
score4       100        98        89


In [6]:
df.reindex (['score2', 'score4', 'score1', 'score5'])


Unnamed: 0,student1,student2,student3
score2,93.0,96.0,99.0
score4,100.0,98.0,89.0
score1,100.0,93.0,100.0
score5,,,


## How to select multiple rows and columns from a ```DataFrame```

#### Using the integer-position indexer ```.iloc``` and the label-based indexer ```.loc```, you can select multiple rows and columns from a ```DataFrame```
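
The difference between the two is easiest to see when the index labels are not simply 0, 1, 2. The following is a minimal sketch with a made-up frame (`people`) whose labels are 10, 20 and 30.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
people = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina']}, index=[10, 20, 30])

by_position = people.iloc[1]   # second row, whatever its label is
by_label = people.loc[20]      # row whose index label is 20

print(by_position['name'], by_label['name'])
```

Both lines return the same row here, but `people.loc[1]` would raise a `KeyError` because no row carries the label 1.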

### ```.iloc``` function

In [7]:
raw_data = {'first_name': ['Jason','Molly','Tina','Jake','Amy'], 'last_name': ['Miller','Jacobson','Alison','Milner','Cooze'], 
           'age': [42,52,36,24,73], 'preTestScore': [4,24,31,2,3], 'postTestScore': [25,94,57,62,70]}

df = pd.DataFrame(raw_data)
print (df)

   age first_name last_name  postTestScore  preTestScore
0   42      Jason    Miller             25             4
1   52      Molly  Jacobson             94            24
2   36       Tina    Alison             57            31
3   24       Jake    Milner             62             2
4   73        Amy     Cooze             70             3


In [8]:
# A single integer position returns one row as a Series
df.iloc[3]

age                  24
first_name         Jake
last_name        Milner
postTestScore        62
preTestScore          2
Name: 3, dtype: object

To get the result as a ```DataFrame```, we can pass the position in a list:

In [9]:
df.iloc[[3]]

Unnamed: 0,age,first_name,last_name,postTestScore,preTestScore
3,24,Jake,Milner,62,2


In [10]:
df.iloc[[-1]]

Unnamed: 0,age,first_name,last_name,postTestScore,preTestScore
4,73,Amy,Cooze,70,3
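
The return type is what changes between a bare position and a one-element list. As a small check, assuming a made-up two-row frame (`df_demo`):

```python
import pandas as pd

# Hypothetical frame used only for illustration
df_demo = pd.DataFrame({'age': [42, 52], 'first_name': ['Jason', 'Molly']})

single = df_demo.iloc[1]     # scalar position: the row comes back as a Series
framed = df_demo.iloc[[1]]   # list of positions: a DataFrame with one row

print(type(single).__name__)
print(type(framed).__name__)
print(framed.shape)
```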


In [11]:
# Select more than one row using .iloc
df.iloc[[0,2]]

Unnamed: 0,age,first_name,last_name,postTestScore,preTestScore
0,42,Jason,Miller,25,4
2,36,Tina,Alison,57,31


In [12]:
# Everything to the left of the comma selects rows; everything to the right selects columns.

df.iloc[[0,2],[1]]

Unnamed: 0,first_name
0,Jason
2,Tina


In [13]:
df.iloc[0:3,1:3]

Unnamed: 0,first_name,last_name
0,Jason,Miller
1,Molly,Jacobson
2,Tina,Alison


### ```.loc``` function

```.loc``` selects rows and columns by their labels rather than by their integer positions.

In [14]:
# Example: create a DataFrame with labeled rows

Score = {'student1': pd.Series([100, 93, 87, 100], index=['score1', 'score2', 'score3', 'score4']),
         'student2': pd.Series([93, 96, 79, 98], index=['score1', 'score2', 'score3', 'score4']),
         'student3': pd.Series([100, 99, 96, 89], index=['score1', 'score2', 'score3', 'score4'])}

df = pd.DataFrame(Score)
print (df)

        student1  student2  student3
score1       100        93       100
score2        93        96        99
score3        87        79        96
score4       100        98        89


In [15]:
df.loc['score3']

student1    87
student2    79
student3    96
Name: score3, dtype: int64

To get the result as a ```DataFrame```, we can pass the label in a list:

In [16]:
df.loc[['score3']]

Unnamed: 0,student1,student2,student3
score3,87,79,96


In [17]:
# Everything to the left of the comma selects rows; everything to the right selects columns.

df.loc[['score2','score3'],['student2']]

Unnamed: 0,student2
score2,96
score3,79


In [18]:
df.loc['score1':'score2','student2':'student3']

Unnamed: 0,student2,student3
score1,93,100
score2,96,99
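
One detail worth noting when slicing: ```.iloc``` follows the usual Python rule and excludes the stop position, while ```.loc``` includes the stop label. A minimal sketch with a made-up one-column frame (`scores`):

```python
import pandas as pd

# Hypothetical frame used only for illustration
scores = pd.DataFrame({'student1': [100, 93, 87, 100]},
                      index=['score1', 'score2', 'score3', 'score4'])

by_position = scores.iloc[0:2]            # positions 0 and 1; position 2 is excluded
by_label = scores.loc['score1':'score3']  # 'score3' is included

print(list(by_position.index))
print(list(by_label.index))
```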


## Arithmetic Operations

#### ```add()```

#### ```sub()```

#### ```mul()```

#### ```div()```


In [19]:
# example of applying arithmetic operations

df = pd.DataFrame ({'first': pd.Series(np.random.randn(4), index = ['a','b','c','d']), 
                    'second': pd.Series(np.random.randn(4), index = ['a','b','c','d']), 
                    'third': pd.Series(np.random.randn(4), index = ['a','b','c','d'])})

print (df)

      first    second     third
a  0.938745  0.569126 -0.291770
b -1.917139 -0.005395  0.908787
c  0.662299  0.564005  0.128514
d -0.394100  1.383906  0.359810


In [20]:
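# Take the last row ('d') as a Series indexed by column name
row = df.iloc[3]

# axis=1 matches the Series labels to the columns, so the row is added to every row
df.add(row, axis=1)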

Unnamed: 0,first,second,third
a,0.544645,1.953033,0.06804
b,-2.31124,1.378511,1.268597
c,0.268199,1.947911,0.488324
d,-0.788201,2.767813,0.71962
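
The `axis` argument decides which labels the ```Series``` is matched against. The sketch below uses fixed made-up numbers (`df_demo`) instead of random ones so the arithmetic is easy to follow; it is an illustration, not part of the original run.

```python
import pandas as pd

# Hypothetical frame with fixed values, used only for illustration
df_demo = pd.DataFrame({'first': [1.0, 2.0], 'second': [10.0, 20.0]},
                       index=['a', 'b'])

row = df_demo.iloc[0]    # Series indexed by column names: first, second
col = df_demo['first']   # Series indexed by row labels: a, b

by_columns = df_demo.sub(row, axis=1)  # match Series labels to the columns
by_rows = df_demo.sub(col, axis=0)     # match Series labels to the row index

print(by_columns)
print(by_rows)
```

With `axis=1` the first row is subtracted from every row; with `axis=0` the `first` column is subtracted from every column.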


In [21]:
df.sub(row, axis = 1)

Unnamed: 0,first,second,third
a,1.332845,-0.81478,-0.651579
b,-1.523039,-1.389302,0.548977
c,1.0564,-0.819902,-0.231296
d,0.0,0.0,0.0


In [22]:
df.mul(row, axis = 1)

Unnamed: 0,first,second,third
a,-0.36996,0.787618,-0.104982
b,0.755545,-0.007467,0.32699
c,-0.261012,0.78053,0.046241
d,0.155315,1.915197,0.129463
