### pandas.DataFrame

A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
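
As a minimal sketch of how the five parameters fit together, the cell below passes each one by keyword. The values and the labels `r1`, `r2`, `x`, `y` are illustrative and do not come from the original examples.

```python
import pandas as pd

# data: the values; index: row labels; columns: column labels;
# dtype: one dtype to force on the data; copy: whether to copy the input.
df_params = pd.DataFrame(
    data=[[1, 2], [3, 4]],
    index=['r1', 'r2'],
    columns=['x', 'y'],
    dtype=float,
    copy=True,
)
print(df_params)
```

Only `data` is usually required; the other parameters have defaults.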

### Create DataFrame
A pandas DataFrame can be created using various inputs like:

1. Lists
2. Dicts
3. Series
4. NumPy ndarrays
5. Another DataFrame

In the subsequent sections of this chapter, we will see how to create a DataFrame using these inputs.
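
Lists and dicts get their own sections below. For the remaining three inputs (Series, NumPy ndarrays, another DataFrame), a short sketch with made-up values and variable names might look like this:

```python
import numpy as np
import pandas as pd

# From a Series: the Series name becomes the column label.
s = pd.Series([10, 20, 30], name='score')
df_from_series = pd.DataFrame(s)

# From a 2-D NumPy ndarray: rows and columns follow the array shape.
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['a', 'b'])

# From another DataFrame: copy=True makes the new frame independent.
df_from_df = pd.DataFrame(df_from_array, copy=True)
df_from_df.loc[0, 'a'] = 100  # df_from_array is left unchanged

print(df_from_series)
print(df_from_array)
print(df_from_df)
```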

In [1]:
# Create an empty DataFrame
# Import the pandas library under the alias pd
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []
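
To test for an empty frame in code rather than by reading the printed output, one option is the `empty` attribute together with `shape`. This is a small sketch; the name `empty_df` is only illustrative.

```python
import pandas as pd

empty_df = pd.DataFrame()

# empty is True when the frame holds no elements.
print(empty_df.empty)

# shape is a (rows, columns) tuple.
print(empty_df.shape)
```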


### Create a DataFrame from Lists
The DataFrame can be created using a single list or a list of lists.

In [2]:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5


In [3]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
print(data)
print("\n")
df = pd.DataFrame(data,columns=['Name','Roll No'])
print(df)

[['Alex', 10], ['Bob', 12], ['Clarke', 13]]


     Name  Roll No
0    Alex       10
1     Bob       12
2  Clarke       13


In [3]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
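# Passing dtype=float to the constructor raises an error on the string
# column in pandas 2.x, so convert only the numeric column.
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})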
print(df)

     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0


### Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, the index defaults to **range(n)**, where n is the array length.

In [4]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42


Let us now create an indexed DataFrame using arrays.

In [5]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4   Ricky   42
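
The length rules stated at the start of this section can be checked with a short sketch. The variable names `df_default`, `index_error`, and `length_error` are illustrative; the exact error messages vary between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# Without an index, the row labels run from 0 to n-1.
df_default = pd.DataFrame(data)
print(list(df_default.index))

# An index whose length differs from the arrays is rejected.
index_error = None
try:
    pd.DataFrame(data, index=['rank1', 'rank2'])
except ValueError as exc:
    index_error = exc
    print('ValueError:', exc)

# Arrays of unequal length are rejected as well.
length_error = None
try:
    pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34, 29]})
except ValueError as exc:
    length_error = exc
    print('ValueError:', exc)
```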


### Create a DataFrame from List of Dicts

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are used as column names, and a key that is missing from a dictionary produces NaN in that row.

In [7]:
import pandas as pd
data = [{'a': 1, 'b': 2,'z':10},{'a': 5, 'b': 10, 'c': 20,'z':100}]
df = pd.DataFrame(data)
print(df)

   a   b    z     c
0  1   2   10   NaN
1  5  10  100  20.0


In [8]:
#The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [15]:
## The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.

import pandas as pd
data = [{'a': 1, 'b': 2, },{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data,columns=['b', 'a'],index=['first', 'second'])

#Without a columns argument, every dictionary key becomes a column
df2 = pd.DataFrame(data, index=['first', 'second'])
print (df1)
print("\n")
print (df2)

         b  a
first    2  1
second  10  5


        a   b     c
first   1   2   NaN
second  5  10  20.0
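
A related case is a column name that matches no dictionary key. In the sketch below, `b1` is a hypothetical name chosen only to show the effect: pandas keeps the column and fills it with NaN.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# 'b1' matches no dictionary key, so the whole column is NaN.
df3 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df3)
```

The `columns` argument therefore acts as both a selector and an ordering: keys not listed are dropped, and listed names without data appear as empty columns.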


### Addition of Rows

Add new rows to a DataFrame using the pandas.concat function, which returns a new DataFrame with the rows of the second frame placed after those of the first. (The older DataFrame.append method was removed in pandas 2.0.)

In [5]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])
print(df)

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
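
The output above keeps the original row labels, so 0 and 1 each appear twice. If unique labels are wanted, one option is `pandas.concat` with `ignore_index=True`, sketched below; the name `combined` is illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels and renumbers the rows.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```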


### Deletion of Rows

Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, multiple rows are dropped.

Notice that the labels in the example above are duplicated. Let us drop a label and see how many rows are dropped.

In [7]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])

print(df)

# Drop rows with label 1
df = df.drop(1)

print(df)

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
   a  b
0  1  2
0  5  6
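
When only one of the duplicated rows should go, a possible approach is to renumber the rows first so that every label is unique. The sketch below builds the same four-row frame directly; the names `dropped_all` and `dropped_one` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                  columns=['a', 'b'], index=[0, 1, 0, 1])

# drop removes every row that carries the label 0.
dropped_all = df.drop(0)
print(dropped_all)

# Renumber the rows, then drop exactly one of them.
dropped_one = df.reset_index(drop=True).drop(2)
print(dropped_one)
```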


### Slice Rows

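Multiple rows can be selected using the ':' operator. The slice is positional, so df[2:4] returns the rows at positions 2 and 3 and excludes position 4.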

In [11]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)
print("\n")
print(df[2:4])

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4


   one  two
c  3.0    3
d  NaN    4


### Row Selection

Addition and deletion of rows were shown above. This section covers row selection, beginning with selection by label: a row can be selected by passing its label to loc.

In [19]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)
print("\n")
print (df.loc['d'])

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4


one    NaN
two    4.0
Name: d, dtype: float64
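
Beyond a single label, loc also accepts a list of labels or a row and column pair. The sketch below shows what each form returns; the names `row`, `rows`, and `cell` are illustrative.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

row = df.loc['d']            # one label: a Series indexed by column name
rows = df.loc[['a', 'd']]    # list of labels: a DataFrame
cell = df.loc['d', 'two']    # row label and column label: a single value

print(type(row).__name__, type(rows).__name__, cell)
```

A single row comes back as one Series with one dtype, which is why the integer 4 in column two is printed as 4.0 in the output above.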


In [9]:
## Selection by integer location
## Rows can be selected by passing an integer position to iloc
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.iloc[3])

one    NaN
two    4.0
Name: d, dtype: float64
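
iloc follows Python's positional conventions, so negative positions and lists of positions work as well. A short sketch, again using a frame rebuilt from plain lists and illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, None], 'two': [1, 2, 3, 4]},
                  index=['a', 'b', 'c', 'd'])

last_row = df.iloc[-1]      # negative positions count from the end
picked = df.iloc[[0, 2]]    # list of positions: a DataFrame

print(last_row.name)
print(list(picked.index))
```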

