# DataFrames
- We can think of a DataFrame as a collection of Series that share the same index.
- Compare it with a table in a relational database (DBMS).
- The `pd.DataFrame` constructor takes three main arguments: `data`, `index` and `columns`.
- `data` and `index` work the same way as for a Series; `columns` specifies the names of the individual columns.
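To make the "Series sharing an index" idea concrete, here is a minimal sketch that builds a DataFrame from a dict of two Series. The labels and values are illustrative only. The dict keys become column names and the shared Series index becomes the row labels.

```python
import pandas as pd

# Two Series that share the same index labels
w = pd.Series([1, 2, 3], index=["A", "B", "C"])
x = pd.Series([10, 20, 30], index=["A", "B", "C"])

# A dict of Series becomes a DataFrame:
# dict keys -> column names, shared Series index -> row labels
demo = pd.DataFrame({"W": w, "X": x})

print(demo.columns.tolist())  # ['W', 'X']
print(demo.index.tolist())    # ['A', 'B', 'C']
print(demo.loc["B", "X"])     # 20
```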

## In this notebook:
- Creating a DataFrame
- Creating a new column
- Selecting rows and columns
- Deleting rows and columns

## Commonly used functions:
- `pd.DataFrame()` : used to create a DataFrame
- `drop()` : used to remove a row or column
- `loc[]` : used to select rows (and columns) by label
- `iloc[]` : used to select rows (and columns) by integer position

In [1]:
import numpy as np
import pandas as pd

In [2]:
from numpy.random import randn

In [3]:
np.random.seed(101) # to ensure same random numbers are generated each time

### Creating a DataFrame
- Three key arguments for a DataFrame:
    - `data` : Data to put in a DataFrame
    - `index` : The names of the rows
    - `columns` : The names of the columns

In [22]:
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
# Creates a 5 x 4 table with A,B,C,D,E as row names and W,X,Y,Z as column names

In [5]:
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
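Of the three arguments, only `data` is strictly required. A small sketch (with a hand-made integer array rather than random numbers, so the results are predictable) showing what happens when `index` and `columns` are omitted versus supplied:

```python
import numpy as np
import pandas as pd

data = np.arange(6).reshape(3, 2)  # 3 rows x 2 columns: [[0, 1], [2, 3], [4, 5]]

# Only `data` is given: rows and columns are labelled 0, 1, 2, ...
default_demo = pd.DataFrame(data)
print(default_demo.index.tolist())    # [0, 1, 2]
print(default_demo.columns.tolist())  # [0, 1]

# Supplying `index` and `columns` gives the rows and columns names
named_demo = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["c1", "c2"])
print(named_demo.shape)               # (3, 2)
print(named_demo.loc["r2", "c2"])     # 3
```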


### Selecting a column
`Syntax`
```
    df[columnName] where columnName is the name of the column to be selected
```

In [6]:
df['W'] # displays 'W' column

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
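Selecting a single column returns a Series whose `name` is the column label, as the `Name: W` line in the output above shows. A short sketch on illustrative data that checks this, and also shows attribute-style access (which only works when the column name is a valid Python identifier, so bracket notation is the safer general form):

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2, 3], "X": [4, 5, 6]}, index=["A", "B", "C"])

col = demo["W"]                   # single label -> Series
print(type(col).__name__)         # Series
print(col.name)                   # W
print(col.tolist())               # [1, 2, 3]

# Attribute access returns the same column for identifier-like names
print(demo.W.equals(demo["W"]))   # True
```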

### Selecting multiple columns
`Syntax`
```
    df[listOfColumns] where listOfColumns is a list of the names of the columns to be selected
```

In [9]:
df[['W','X']]

|   | W | X |
|---|---|---|
| A | 2.70685 | 0.628133 |
| B | 0.651118 | -0.319318 |
| C | -2.018168 | 0.740122 |
| D | 0.188695 | -0.758872 |
| E | 0.190794 | 1.978757 |
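The double brackets matter: the outer pair is the indexing operator and the inner pair is a Python list, so the result is a DataFrame rather than a Series. A minimal sketch on illustrative data, including the one-element-list case:

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2], "X": [3, 4], "Y": [5, 6]}, index=["A", "B"])

subset = demo[["W", "Y"]]            # list of labels -> DataFrame
print(type(subset).__name__)         # DataFrame
print(subset.columns.tolist())       # ['W', 'Y']

# Even a one-element list returns a DataFrame, not a Series
print(type(demo[["W"]]).__name__)    # DataFrame
```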


### Creating a new column
```
    df[newColumn] = data , where data is a Series with the same index as the DataFrame (or a list/array of matching length)
```

In [24]:
df['A'] = pd.Series(np.random.randn(5), index='A B C D E'.split())

In [25]:
df

|   | W | X | Y | Z | A |
|---|---|---|---|---|---|
| A | 0.484752 | -0.116773 | 1.901755 | 0.238127 | 1.02481 |
| B | 1.996652 | -0.993263 | 0.1968 | -1.136645 | -0.925874 |
| C | 0.000366 | 1.025984 | -0.156598 | -0.031579 | 1.862864 |
| D | 0.649826 | 2.154846 | -0.610259 | -0.755325 | -1.133817 |
| E | -0.346419 | 0.147027 | -0.479448 | 0.558769 | 0.610478 |
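A new column is often computed from existing ones. When the assigned value is a Series, pandas matches it to the DataFrame by index label, not by position, and any row label missing from the Series gets `NaN`. A small sketch on illustrative data (the column names `W_plus_X` and `P` are made up for this example):

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2, 3], "X": [10, 20, 30]}, index=["A", "B", "C"])

# New column computed element-wise from existing columns
demo["W_plus_X"] = demo["W"] + demo["X"]
print(demo["W_plus_X"].tolist())  # [11, 22, 33]

# Assigning a Series aligns on index labels, not on position:
# "C" is missing from `partial`, so row C gets NaN
partial = pd.Series([100, 200], index=["B", "A"])
demo["P"] = partial
print(demo["P"].tolist())         # [200.0, 100.0, nan]
```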


### Removing a row or column
- drop() is used to drop a row or a column
`Syntax`
```
    df.drop(rowName) , Drop a row (axis = 0 is the default)
    df.drop(columnName,axis = 1) , Drop a column
```

- `drop()` returns a new DataFrame and leaves `df` unchanged; to delete from `df` itself instead of just viewing the result, pass `inplace=True`
#### Note that axis = 1 is what tells drop() to remove a column

In [26]:
df.drop('E')

|   | W | X | Y | Z | A |
|---|---|---|---|---|---|
| A | 0.484752 | -0.116773 | 1.901755 | 0.238127 | 1.02481 |
| B | 1.996652 | -0.993263 | 0.1968 | -1.136645 | -0.925874 |
| C | 0.000366 | 1.025984 | -0.156598 | -0.031579 | 1.862864 |
| D | 0.649826 | 2.154846 | -0.610259 | -0.755325 | -1.133817 |


In [27]:
df.drop('A',axis = 1,inplace=True) # drop the newly created column INPLACE

In [28]:
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.484752 | -0.116773 | 1.901755 | 0.238127 |
| B | 1.996652 | -0.993263 | 0.1968 | -1.136645 |
| C | 0.000366 | 1.025984 | -0.156598 | -0.031579 |
| D | 0.649826 | 2.154846 | -0.610259 | -0.755325 |
| E | -0.346419 | 0.147027 | -0.479448 | 0.558769 |
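The `drop()` variants in this section can be collected into one sketch on a small illustrative DataFrame. It also shows the `index=` / `columns=` keyword forms of `drop()`, which avoid having to remember which axis number means what, and that `inplace=True` returns `None`:

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2, 3], "X": [4, 5, 6]}, index=["A", "B", "C"])

no_c = demo.drop("C")             # axis=0 (rows) is the default
print(no_c.index.tolist())        # ['A', 'B']
print(demo.index.tolist())        # ['A', 'B', 'C']  (original unchanged)

no_x = demo.drop("X", axis=1)     # axis=1 targets columns
print(no_x.columns.tolist())      # ['W']

# Equivalent keyword forms
same_rows = demo.drop(index="C")
same_cols = demo.drop(columns="X")
print(same_rows.equals(no_c), same_cols.equals(no_x))  # True True

# inplace=True modifies demo itself and returns None
result = demo.drop("X", axis=1, inplace=True)
print(result, demo.columns.tolist())  # None ['W']
```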


### Selecting a row
- `loc[]` is used to select a row by its label
- `iloc[]` is used to select a row by its integer position (counting from 0)
`Syntax`
```
    df.loc[rowName] , where rowName is the label of the row to be retrieved
    df.iloc[position] , where position is the integer position of the row to be retrieved
```

In [30]:
df.loc['C'] # selecting row C

W    0.000366
X    1.025984
Y   -0.156598
Z   -0.031579
Name: C, dtype: float64

In [31]:
df.iloc[2] # also selects row C, this time by integer position (0-based)

W    0.000366
X    1.025984
Y   -0.156598
Z   -0.031579
Name: C, dtype: float64
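The difference between label-based `loc` and position-based `iloc` is most visible when slicing: `loc` includes the end label, while `iloc` excludes the end position, like an ordinary Python slice. A short sketch on illustrative data:

```python
import pandas as pd

demo = pd.DataFrame({"W": [10, 20, 30, 40]}, index=["A", "B", "C", "D"])

print(demo.loc["C", "W"])                  # 30  (label-based)
print(demo.iloc[2, 0])                     # 30  (third row, first column)

# loc slices include the end label; iloc slices exclude the end position
print(demo.loc["A":"C"].index.tolist())    # ['A', 'B', 'C']
print(demo.iloc[0:2].index.tolist())       # ['A', 'B']

# A list of labels selects several rows at once
print(demo.loc[["B", "D"], "W"].tolist())  # [20, 40]
```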

### Selecting a single cell
`Syntax`
```
    df.loc[row,column]
```

In [32]:
df.loc['C','Y'] # value at row C, column Y

-0.1565979042889875
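The same cell can be reached in several ways. A minimal sketch on illustrative data comparing `loc`, `iloc` and the scalar-only accessors `at` / `iat`, and showing that passing lists for both the row and column positions returns a sub-DataFrame instead of a single value:

```python
import pandas as pd

demo = pd.DataFrame(
    {"W": [1, 2, 3], "X": [4, 5, 6], "Y": [7, 8, 9]},
    index=["A", "B", "C"],
)

print(demo.loc["C", "Y"])   # 9  row label, column label
print(demo.iloc[2, 2])      # 9  row position, column position
print(demo.at["C", "Y"])    # 9  scalar access by label
print(demo.iat[2, 2])       # 9  scalar access by position

# Lists on both axes select a block of cells
block = demo.loc[["A", "B"], ["W", "Y"]]
print(block.values.tolist())  # [[1, 7], [2, 8]]
```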