#### Pandas _Dataframe_

- A DataFrame represents a **rectangular table of data containing an ordered collection of columns**,
  each of which can hold a different value type (numeric, string, boolean, etc.).
- A DataFrame **has both a row index and a column index**, and **data can be selected through either of them**.
- It can also be thought of as **a dictionary of Series that all share the same index**.
- Internally, the data is stored as **one or more two-dimensional blocks** rather than as a list, dict, or some other collection of one-dimensional arrays.

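To make the row index, the column index and the "dict of Series" view concrete, here is a small sketch with made-up data (the names `df`, `a`, `b` and the labels `r1` to `r3` are illustrative only):

```python
import pandas as pd

# Hypothetical example data with a custom row index
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']}, index=['r1', 'r2', 'r3'])

print(df.index.tolist())    # ['r1', 'r2', 'r3']  (row index)
print(df.columns.tolist())  # ['a', 'b']          (column index)

# Selecting one column returns a Series that shares the DataFrame's row index
col = df['a']
print(type(col).__name__)          # Series
print(col.index.equals(df.index))  # True

# Selecting one row by label returns a Series indexed by the column labels
row = df.loc['r2']
print(row.index.tolist())  # ['a', 'b']
```
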
##### Creating a dataframe

```python
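# index=None gives a default RangeIndex (0, 1, ..., n-1);
# columns=None takes the names from the data (e.g. dict keys) or also uses a RangeIndex
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)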
```
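
A short sketch of the main constructor parameters, using a hypothetical 2x2 array: without `index`, pandas assigns a default `RangeIndex`, and `dtype` forces one type onto every column.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2], [3, 4]])  # hypothetical input

# No index given: pandas builds a default RangeIndex 0..n-1
df_default = pd.DataFrame(data, columns=['x', 'y'])
print(df_default.index)  # RangeIndex(start=0, stop=2, step=1)

# Explicit row labels plus a forced dtype for all columns
df_float = pd.DataFrame(data, index=['a', 'b'], columns=['x', 'y'], dtype=float)
print(df_float.loc['b', 'y'])  # 4.0
```
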

A dataframe can be created from:
- Lists (of lists or of dicts)
- Dicts (of lists, arrays or Series)
- Series
- NumPy ndarrays
- Another dataframe
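
The list above also mentions a single Series and an existing DataFrame as sources. A minimal sketch with hypothetical data: a named Series becomes a one-column DataFrame, and passing another DataFrame together with `columns` keeps only the listed columns.

```python
import pandas as pd

# A named Series becomes a single column named after the Series
s = pd.Series([10, 20, 30], name='score')
df_from_series = pd.DataFrame(s)
print(df_from_series.columns.tolist())  # ['score']

# Building from another DataFrame; `columns` selects which columns to keep
base = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_from_df = pd.DataFrame(base, columns=['b'])
print(df_from_df.columns.tolist())      # ['b']
```
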

```python
import pandas as pd
import numpy as np
```

```python
# From a NumPy array
data = np.arange(10, 100, 10)  # 10, 20, ..., 90
df1 = pd.DataFrame(data)
df1
```

|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |
| 5 | 60 |
| 6 | 70 |
| 7 | 80 |
| 8 | 90 |

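The example above uses a one-dimensional array, which yields a single column labelled `0`. A 2-D array maps array rows to DataFrame rows; a sketch, with column names chosen only for illustration:

```python
import numpy as np
import pandas as pd

# A 2-D array: each row of the array becomes one row of the DataFrame
data2d = np.arange(1, 7).reshape(3, 2)  # [[1, 2], [3, 4], [5, 6]]
df_2d = pd.DataFrame(data2d, columns=['left', 'right'])

print(df_2d.shape)              # (3, 2)
print(df_2d['right'].tolist())  # [2, 4, 6]
```
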

```python
# From a list of lists: each inner list becomes one row
marks = [[1, 'sara', 10],
         [2, 'rose', 9],
         [3, 'tim', 10],
         [4, 'john', 8.5]]

# Column names are passed separately via `columns`
df2 = pd.DataFrame(marks, columns=['Roll No', 'Name', 'Mark'])
df2
```

|   | Roll No | Name | Mark |
|---|---|---|---|
| 0 | 1 | sara | 10.0 |
| 1 | 2 | rose | 9.0 |
| 2 | 3 | tim | 10.0 |
| 3 | 4 | john | 8.5 |

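The `Mark` column is displayed as `10.0`, `9.0`, and so on because it mixes integers with `8.5`, and each column holds a single dtype. A quick check (this re-creates `df2` so the snippet runs on its own):

```python
import pandas as pd

marks = [[1, 'sara', 10], [2, 'rose', 9], [3, 'tim', 10], [4, 'john', 8.5]]
df2 = pd.DataFrame(marks, columns=['Roll No', 'Name', 'Mark'])

# 'Mark' mixes ints with the float 8.5, so the whole column is upcast to float64
print(df2['Mark'].dtype)  # float64

# 'Roll No' contains only ints, so it keeps an integer dtype
print(pd.api.types.is_integer_dtype(df2['Roll No']))  # True
```
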

```python
# From a dictionary:
# keys become column names; values must be array-likes (lists, ndarrays, ...) of the same length
employee_data = {
    'Name': ['ash', 'bill', 'cynthia', 'doe', 'erin', 'fred'],
    'Salary': [1000, 1200, 1200, 1300, 1350, 1250],
}

df_emp = pd.DataFrame(employee_data)
df_emp
```

|   | Name | Salary |
|---|---|---|
| 0 | ash | 1000 |
| 1 | bill | 1200 |
| 2 | cynthia | 1200 |
| 3 | doe | 1300 |
| 4 | erin | 1350 |
| 5 | fred | 1250 |

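Two behaviours of the dict constructor worth checking, sketched with the same `employee_data`: `columns` controls (and can reorder) the column order, and value lists of different lengths raise a `ValueError`. The second dict is hypothetical.

```python
import pandas as pd

employee_data = {
    'Name': ['ash', 'bill', 'cynthia', 'doe', 'erin', 'fred'],
    'Salary': [1000, 1200, 1200, 1300, 1350, 1250],
}

# Column order follows the `columns` list, not the dict's key order
df_reordered = pd.DataFrame(employee_data, columns=['Salary', 'Name'])
print(df_reordered.columns.tolist())  # ['Salary', 'Name']

# Value lists of unequal length are rejected
caught = None
try:
    pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2]})
except ValueError as err:
    caught = err
    print('ValueError:', err)
```
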

Adding a custom index

```python
df_emp2 = pd.DataFrame(employee_data, index=['r1', 'r2', 'r3', 'r4', 'r5', 'r6'])
df_emp2
```

|    | Name | Salary |
|----|---|---|
| r1 | ash | 1000 |
| r2 | bill | 1200 |
| r3 | cynthia | 1200 |
| r4 | doe | 1300 |
| r5 | erin | 1350 |
| r6 | fred | 1250 |

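The custom index must supply exactly one label per row; otherwise pandas raises a `ValueError`. A sketch with a trimmed-down version of `employee_data` (three rows, for brevity):

```python
import pandas as pd

employee_data = {'Name': ['ash', 'bill', 'cynthia'], 'Salary': [1000, 1200, 1200]}

df = pd.DataFrame(employee_data, index=['r1', 'r2', 'r3'])
print(df.loc['r2', 'Name'])  # bill

# Two labels for three rows: rejected
caught = None
try:
    pd.DataFrame(employee_data, index=['r1', 'r2'])
except ValueError as err:
    caught = err
    print('ValueError:', err)
```
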

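```python
# From a list of dicts: each dict is one row and its keys become column names.
# 'clark' has no 'Age' key, so that cell is filled with NaN.
employee_data2 = [
    {
        'Name': 'jane',
        'Salary': 50000,
        'Age': 23
    },
    {
        'Name': 'clark',
        'Salary': 50000,
    },
    {
        'Name': 'kent',
        'Salary': 35000,
        'Age': 22
    }
]

df_emp3 = pd.DataFrame(employee_data2)
df_emp3
```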

|   | Name | Salary | Age |
|---|---|---|---|
| 0 | jane | 50000 | 23.0 |
| 1 | clark | 50000 | NaN |
| 2 | kent | 35000 | 22.0 |

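Missing keys are why `Age` is shown as `23.0` and `22.0` above: the NaN for `clark` forces the column to float64. A minimal sketch with two hypothetical records:

```python
import pandas as pd

records = [{'Name': 'jane', 'Age': 23}, {'Name': 'clark'}]  # hypothetical data
df = pd.DataFrame(records)

# clark has no 'Age' key: that cell is NaN and the column becomes float64
print(df['Age'].isna().tolist())  # [False, True]
print(df['Age'].dtype)            # float64
```
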

```python
# Passing `columns` keeps only the listed columns; 'Address' is not a key
# in any record, so it becomes a column of NaN
df_emp4 = pd.DataFrame(employee_data2, columns=['Name', 'Salary', 'Address'])
df_emp4
```

|   | Name | Salary | Address |
|---|---|---|---|
| 0 | jane | 50000 | NaN |
| 1 | clark | 50000 | NaN |
| 2 | kent | 35000 | NaN |

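A sketch of the same `columns` behaviour on hypothetical records: keys that are not listed (`Salary`, `Age`) are dropped, and a listed name that no record contains (`Address`) becomes an all-NaN column.

```python
import pandas as pd

records = [{'Name': 'jane', 'Salary': 50000, 'Age': 23},
           {'Name': 'kent', 'Salary': 35000, 'Age': 22}]
df = pd.DataFrame(records, columns=['Name', 'Address'])

print(df.columns.tolist())         # ['Name', 'Address']
print(df['Address'].isna().all())  # True (no record has an 'Address' key)
```
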

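```python
# From a dict of Series: the result's index is the union of the Series' indexes,
# so 'one' (only 3 values) gets NaN in row 3
df5 = pd.DataFrame({'one': pd.Series([1, 2, 3]), 'two': pd.Series([1, 2, 3, 4])})
df5
```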

|   | one | two |
|---|---|---|
| 0 | 1.0 | 1 |
| 1 | 2.0 | 2 |
| 2 | 3.0 | 3 |
| 3 | NaN | 4 |

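Alignment happens on index labels, not positions. A sketch with two hypothetical Series whose labels only partly overlap:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([10, 30], index=['a', 'c'])

# The DataFrame index is the union of both Series' labels
df = pd.DataFrame({'s1': s1, 's2': s2})

print(df.index.tolist())         # ['a', 'b', 'c']
print(df['s2'].isna().tolist())  # [False, True, False]
```
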

```python
# Adding a column: the new Series is aligned on the DataFrame's index
df5['three'] = pd.Series([2, 4, 8, 16])
df5
```

|   | one | two | three |
|---|---|---|---|
| 0 | 1.0 | 1 | 2 |
| 1 | 2.0 | 2 | 4 |
| 2 | 3.0 | 3 | 8 |
| 3 | NaN | 4 | 16 |

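Assigning a Series aligns it on the index, while assigning a plain list works by position and must match the number of rows. A sketch with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3]})

# A Series is aligned on index labels; row 1 has no match, so it gets NaN
df['two'] = pd.Series([10, 20], index=[0, 2])
print(df['two'].tolist())  # [10.0, nan, 20.0]

# A plain list is assigned by position and needs exactly one value per row
df['three'] = [7, 8, 9]

caught = None
try:
    df['four'] = [1, 2]  # only 2 values for 3 rows
except ValueError as err:
    caught = err
    print('ValueError:', err)
```
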

```python
# Creating a column from existing ones; NaN + 16 gives NaN in row 3
df5['four'] = df5['one'] + df5['three']
df5
```

|   | one | two | three | four |
|---|---|---|---|---|
| 0 | 1.0 | 1 | 2 | 3.0 |
| 1 | 2.0 | 2 | 4 | 6.0 |
| 2 | 3.0 | 3 | 8 | 11.0 |
| 3 | NaN | 4 | 16 | NaN |

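If a missing value should count as zero instead of propagating, `Series.add` accepts a `fill_value`. A minimal sketch, using two hypothetical rows that mirror `one` and `three` above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, np.nan], 'three': [2, 16]})

# The + operator propagates NaN
plain = df['one'] + df['three']
print(plain.tolist())   # [3.0, nan]

# Series.add replaces a missing operand with fill_value before adding
filled = df['one'].add(df['three'], fill_value=0)
print(filled.tolist())  # [3.0, 16.0]
```
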

Deleting a column

```python
# pop removes the column from df5 in place and returns it as a Series
df5.pop('four')
```

```
0     3.0
1     6.0
2    11.0
3     NaN
Name: four, dtype: float64
```

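`pop` is one of several ways to remove columns. A sketch comparing it with `del` and `drop(columns=...)` on a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6], 'four': [7, 8]})

removed = df.pop('four')               # in place; returns the removed column
del df['three']                        # in place; returns nothing
df_smaller = df.drop(columns=['two'])  # returns a new DataFrame; df is unchanged

print(removed.tolist())             # [7, 8]
print(df.columns.tolist())          # ['one', 'two']
print(df_smaller.columns.tolist())  # ['one']
```

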
Selecting a row

```python
df_emp2.loc['r3']  # By label; by position: df_emp2.iloc[2]
```

```
Name      cynthia
Salary       1200
Name: r3, dtype: object
```

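`loc` selects by label and `iloc` by integer position; for `df_emp2` both return the same row. A sketch using the first three rows of `df_emp2`:

```python
import pandas as pd

df_emp2 = pd.DataFrame(
    {'Name': ['ash', 'bill', 'cynthia'], 'Salary': [1000, 1200, 1200]},
    index=['r1', 'r2', 'r3'],
)

by_label = df_emp2.loc['r3']    # select by index label
by_position = df_emp2.iloc[2]   # select by integer position
print(by_label.equals(by_position))  # True

# loc also takes a column label to pick a single cell
print(df_emp2.loc['r3', 'Name'])     # cynthia
```

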
```python
# A range of rows by label; unlike a positional slice, the end label 'r4' is included
df_emp2['r2':'r4']
```

|    | Name | Salary |
|----|---|---|
| r2 | bill | 1200 |
| r3 | cynthia | 1200 |
| r4 | doe | 1300 |

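A sketch contrasting label-based and positional slicing on a hypothetical salary table labelled `r1` to `r5`:

```python
import pandas as pd

df = pd.DataFrame({'Salary': [1000, 1200, 1200, 1300, 1350]},
                  index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Label-based slicing includes the end label
print(df.loc['r2':'r4'].index.tolist())  # ['r2', 'r3', 'r4']

# Positional slicing excludes the end position, as in plain Python
print(df.iloc[1:3].index.tolist())       # ['r2', 'r3']
```
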

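Adding a row

```python
df6 = pd.DataFrame([{"one": 1, "two": 2, "three": 3}])
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
# The new row keeps its own index label (0), so df5 now has two rows labelled 0.
df5 = pd.concat([df5, df6])
```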

```python
df5
```

|   | one | two | three |
|---|---|---|---|
| 0 | 1.0 | 1 | 2 |
| 1 | 2.0 | 2 | 4 |
| 2 | 3.0 | 3 | 8 |
| 3 | NaN | 4 | 16 |
| 0 | 1.0 | 2 | 3 |

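Because `concat` keeps the original labels, the appended row is labelled `0` again. A sketch, with hypothetical data, of `ignore_index=True` to renumber the rows, and of `.loc` enlargement for adding a single row:

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0], 'two': [1, 2]})
new_row = pd.DataFrame([{'one': 5.0, 'two': 6}])

# ignore_index=True renumbers rows 0..n-1 instead of keeping duplicate labels
combined = pd.concat([df, new_row], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2]

# For a single row, assigning to a new label with .loc also works
df.loc[len(df)] = [7.0, 8]
print(df.shape)  # (3, 2)
```
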

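Deleting a row

```python
# Drops every row labelled 0: here both the original row 0 and the appended row.
# To drop columns instead: df.drop([...columns to drop], axis=1, inplace=True)
df5.drop(0)
```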

|   | one | two | three |
|---|---|---|---|
| 1 | 2.0 | 2 | 4 |
| 2 | 3.0 | 3 | 8 |
| 3 | NaN | 4 | 16 |