## DataFrame
A DataFrame represents a rectangular table of data and contains an ordered collection
of columns, each of which can be a different value type (numeric, string,
boolean, etc.). The DataFrame has both a row and column index; it can be thought of
as a dict of Series all sharing the same index. Under the hood, the data is stored as one
or more two-dimensional blocks rather than a list, dict, or some other collection of
one-dimensional arrays.
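
The "dict of Series sharing the same index" picture can be checked directly. The following is a minimal sketch; the Series names, labels, and values are made up for illustration and are not part of the dataset used in the rest of this section.

```python
import pandas as pd

# Two Series with the same index labels (illustrative values).
population = pd.Series([1.5, 1.7, 3.6], index=['a', 'b', 'c'])
year = pd.Series([2000, 2001, 2002], index=['a', 'b', 'c'])

# Each dict key becomes a column; the shared labels become the row index.
frame_from_series = pd.DataFrame({'pop': population, 'year': year})

# Selecting a column gives back a Series that carries the frame's index.
pop_column = frame_from_series['pop']
print(type(pop_column).__name__)
print(pop_column.index.equals(frame_from_series.index))

# Each column keeps its own value type.
print(frame_from_series.dtypes)
```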

There are many ways to construct a DataFrame, though one of the most common is
from a dict of equal-length lists or NumPy arrays:

In [2]:
import pandas as pd
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

frame = pd.DataFrame(data)
frame

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2


The head method selects only the first rows of a DataFrame, and the tail method selects only the last rows:


In [10]:
frame.head()  # with no argument, returns the first five rows; pass n to choose how many

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9


In [12]:
frame.tail(3)

    state  year  pop
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
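
To make the role of the row-count argument concrete, here is a small sketch on a cut-down copy of the data (the variable names are chosen for this example only). It also shows two edge cases: a negative count and a count larger than the frame.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
                      'year': [2000, 2001, 2002, 2001, 2002, 2003]})

first_two = frame.head(2)      # rows at positions 0 and 1
last_two = frame.tail(2)       # rows at positions 4 and 5
all_but_last = frame.head(-1)  # negative n: every row except the last |n|
everything = frame.head(100)   # n larger than the frame returns all rows

print(first_two.index.tolist(), last_two.index.tolist())
print(len(all_but_last), len(everything))
```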


If you specify a sequence of columns, the DataFrame’s columns will be arranged in
that order:

In [15]:
pd.DataFrame(data, columns=['state', 'pop', 'year'])

    state  pop  year
0    Ohio  1.5  2000
1    Ohio  1.7  2001
2    Ohio  3.6  2002
3  Nevada  2.4  2001
4  Nevada  2.9  2002
5  Nevada  3.2  2003
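
The columns argument does more than reorder. As a brief sketch using the same data (the 'debt' name is introduced here only for illustration): listing fewer columns drops the rest, and listing a name that is not a key in the dict produces a column of missing values.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

# Only the listed keys are kept, in the listed order.
subset = pd.DataFrame(data, columns=['state', 'year'])
print(subset.columns.tolist())

# 'debt' is not a key in data, so that column is filled with NaN.
with_missing = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])
print(with_missing['debt'].isna().all())
```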


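The default integer index can be replaced by passing custom labels through the index argument. Index labels need not be unique; here '765' and '897' each appear twice: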

In [4]:

pd.DataFrame(data, columns=['state', 'pop', 'year'], index=['245', '244', '765', '897', '765', '897'])




      state  pop  year
245    Ohio  1.5  2000
244    Ohio  1.7  2001
765    Ohio  3.6  2002
897  Nevada  2.4  2001
765  Nevada  2.9  2002
897  Nevada  3.2  2003


In [11]:
# Access a column by attribute or by dict-like notation
Frame1 = pd.DataFrame(data, columns=['state', 'pop', 'year'], index=['245', '244', '765', '897', '765', '897'])
# Equivalent: Frame1['state']
Frame1.state

245      Ohio
244      Ohio
765      Ohio
897    Nevada
765    Nevada
897    Nevada
Name: state, dtype: object
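
Attribute access is convenient but has a catch that this dataset happens to expose: the column named 'pop' collides with the DataFrame.pop method, so only bracket notation reaches that column. A short sketch on the same data:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

# For a name with no conflict, both spellings return the same Series.
state_attr = frame.state
state_item = frame['state']
print(state_attr.equals(state_item))

# frame.pop is the DataFrame.pop method, not the 'pop' column.
print(callable(frame.pop))

# Bracket notation works for any column name.
pop_column = frame['pop']
print(pop_column.tolist())
```

Bracket notation is also required for column names that are not valid Python identifiers, such as names containing spaces.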

In [15]:
# Rows can be retrieved by label with the special loc attribute (use iloc for integer position)
Frame1.loc['245']

state    Ohio
pop       1.5
year     2000
Name: 245, dtype: object
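
Because this index contains repeated labels, the shape of what loc returns depends on the label. The sketch below, built on the same data, contrasts a unique label, a repeated label, and positional selection with iloc.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
labels = ['245', '244', '765', '897', '765', '897']
frame = pd.DataFrame(data, columns=['state', 'pop', 'year'], index=labels)

by_label = frame.loc['245']   # unique label: a single row, returned as a Series
by_position = frame.iloc[0]   # first row by integer position, also a Series
repeated = frame.loc['765']   # repeated label: every matching row, as a DataFrame

print(type(by_label).__name__, type(repeated).__name__)
print(len(repeated))
print(frame.index.is_unique)
```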

Columns can be modified by assignment. For example, a new 'debt' column
can be created by assigning a scalar value or an array of values:

In [46]:
import numpy as np

Frame1 = pd.DataFrame(data, columns=['state', 'pop', 'year'], index=['245', '244', '765', '897', '765', '897'])

Frame1.index.name = "country code"  # name the index
Frame1['debt'] = np.arange(6.)  # add a new column from an array
Frame1

               state  pop  year  debt
country code
245             Ohio  1.5  2000   0.0
244             Ohio  1.7  2001   1.0
765             Ohio  3.6  2002   2.0
897           Nevada  2.4  2001   3.0
765           Nevada  2.9  2002   4.0
897           Nevada  3.2  2003   5.0
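
The different kinds of column assignment can be compared side by side. This is a sketch on a small frame with unique labels ('a', 'b', 'c' are made up for the example): a scalar is broadcast to every row, an array or list must match the frame's length, and a Series is aligned on index labels, with NaN where a label is absent.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Nevada'],
                      'pop': [1.5, 1.7, 2.4]},
                     index=['a', 'b', 'c'])

# A scalar is broadcast to every row.
frame['debt'] = 16.5
scalar_values = frame['debt'].tolist()

# An array must have the same length as the frame.
frame['debt'] = np.arange(3.)
array_values = frame['debt'].tolist()

# A list of the wrong length is rejected.
try:
    frame['debt'] = [1.0, 2.0]
    length_error = False
except ValueError:
    length_error = True

# A Series is aligned by label; 'b' has no entry, so it becomes NaN.
frame['debt'] = pd.Series([-1.2, -1.5], index=['a', 'c'])
print(frame)
```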
