# Elements of a Pandas DataFrame

In [1]:
import pandas as pd
import numpy as np

## Components of a DataFrame 
A DataFrame has:
1. column labels
2. row labels
3. values

In [2]:
columns = ['Col1', 'Col2', 'Col3']
index = ['Row1', 'Row2', 'Row3']
data = [[0, 1, 2],
        [10, 11, 12],
        [20, 21, 22]]

df = pd.DataFrame(data=data, columns=columns, index=index)
df

|      | Col1 | Col2 | Col3 |
|------|------|------|------|
| Row1 | 0    | 1    | 2    |
| Row2 | 10   | 11   | 12   |
| Row3 | 20   | 21   | 22   |


In [3]:
# column labels as list
df.columns.tolist()

['Col1', 'Col2', 'Col3']

In [4]:
# row labels as list
df.index.tolist()

['Row1', 'Row2', 'Row3']

In [5]:
# data as nested list
df.values.tolist()

[[0, 1, 2], [10, 11, 12], [20, 21, 22]]
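These three pieces are enough to describe the example table, so the frame can be rebuilt from them. The following is a minimal sketch using the same example data; the names `col_labels`, `row_labels`, `values`, and `rebuilt` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    columns=['Col1', 'Col2', 'Col3'],
    index=['Row1', 'Row2', 'Row3'],
)

# Pull the three components apart ...
col_labels = df.columns.tolist()
row_labels = df.index.tolist()
values = df.to_numpy().tolist()

# ... and put them back together (rebuilt is an illustrative name)
rebuilt = pd.DataFrame(data=values, columns=col_labels, index=row_labels)

# equals() compares labels, values, and dtypes
print(rebuilt.equals(df))
```

Here `df.to_numpy()` is used in place of `df.values`; both return the underlying values as a NumPy array for this all-integer frame.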

## DataFrame Column and Row Labels

Column labels and row labels are each instances of pd.Index (or one of its subclasses).

As they are instances of the same class, the same methods can be applied to each.

df.columns contains the column labels.  
df.index contains the row labels.  
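As a small illustration of the point that both label sets share one class, the sketch below calls the same two `pd.Index` methods, `get_loc` and `isin`, on `df.columns` and on `df.index`. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    columns=['Col1', 'Col2', 'Col3'],
    index=['Row1', 'Row2', 'Row3'],
)

# get_loc maps a label to its integer position, on either axis
col_pos = df.columns.get_loc('Col2')
row_pos = df.index.get_loc('Row3')

# isin returns a boolean NumPy array marking which labels match
col_mask = df.columns.isin(['Col1', 'Col3'])
row_mask = df.index.isin(['Row2'])

print(col_pos, row_pos)
print(col_mask.tolist(), row_mask.tolist())
```

With this data, `'Col2'` sits at position 1 among the columns and `'Row3'` at position 2 among the rows.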

In [6]:
# column index
df.columns

Index(['Col1', 'Col2', 'Col3'], dtype='object')

In [7]:
isinstance(df.columns, pd.Index)

True

In [8]:
# the index is iterable
[col for col in df.columns]

['Col1', 'Col2', 'Col3']

In [9]:
# row index
df.index

Index(['Row1', 'Row2', 'Row3'], dtype='object')

In [10]:
isinstance(df.index, pd.Index)

True

In [11]:
# the index is iterable
[row for row in df.index]

['Row1', 'Row2', 'Row3']

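## DataFrame Values

Each column of a DataFrame is a pd.Series containing a single data type.

Pandas DataFrames and Series are built on top of NumPy. Like a NumPy array, a pd.Series has exactly one dtype; when the values are of mixed types, pandas picks a dtype that can hold all of them (for example float64 for a mix of ints and floats, or the general-purpose object dtype).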
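To see the single-dtype rule in action, the sketch below builds a few small Series and inspects the dtype pandas chooses for each. The Series and the column names `n` and `x` are made up for illustration.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # all integers
mixed_numbers = pd.Series([1, 2.5])  # int and float: upcast to float64
mixed_objects = pd.Series([1, 'a'])  # no common numeric type: object

print(ints.dtype, mixed_numbers.dtype, mixed_objects.dtype)

# A DataFrame may hold several dtypes, but only one per column
# ('n' and 'x' are hypothetical column names)
frame = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})
print(frame.dtypes)
```

The DataFrame as a whole can mix types across columns, while each individual column keeps one dtype.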

In [12]:
# single dataframe column
s_col1 = df['Col1']
type(s_col1)

pandas.core.series.Series

In [13]:
# repeated access can return the same cached Series object (may vary by pandas version)
# note: `is` checks object identity, not whether the underlying data is shared
s_col1 is df['Col1']

True

In [14]:
# easy to make an explicit copy
s_col1 = df['Col1'].copy()
s_col1 is df['Col1']

False
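The `is` operator only compares object identity. A more direct check that a copy is independent is to test whether the underlying buffers overlap and whether a change to the copy reaches the DataFrame. This is a minimal sketch; `col_copy` and `shares` are illustrative names, and `np.shares_memory` is NumPy's overlap test.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    columns=['Col1', 'Col2', 'Col3'],
    index=['Row1', 'Row2', 'Row3'],
)

col_copy = df['Col1'].copy()  # copy() is a deep copy by default

# The copy owns its own buffer, separate from the frame's column
shares = np.shares_memory(col_copy.to_numpy(), df['Col1'].to_numpy())
print(shares)

# Changing the copy leaves the DataFrame untouched
col_copy.loc['Row1'] = 99
print(col_copy.loc['Row1'], df.loc['Row1', 'Col1'])
```

After the assignment, the copy holds 99 at `Row1` while the DataFrame still holds the original 0.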

In [15]:
s_col1.dtype

dtype('int64')

In [16]:
s_col1.shape

(3,)

## Take Care to Distinguish Between pd.Index and df.index

**pd.Index:** This is a class.  
**df.columns:** This is an instance of pd.Index (or one of its subclasses).  
**df.index:** This is an instance of pd.Index (or one of its subclasses).
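The distinction between the class and its instances, and the "or one of its subclasses" caveat, can be checked directly. In this sketch `default_df` is a hypothetical frame created without explicit row labels, which gives it a `pd.RangeIndex`.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    columns=['Col1', 'Col2', 'Col3'],
    index=['Row1', 'Row2', 'Row3'],
)

# No index supplied, so pandas creates a RangeIndex
default_df = pd.DataFrame({'a': [1, 2, 3]})

# String labels are held in a plain pd.Index
print(type(df.index) is pd.Index)

# RangeIndex is a subclass of pd.Index, so isinstance still succeeds
print(type(default_df.index).__name__)
print(issubclass(pd.RangeIndex, pd.Index))
print(isinstance(default_df.index, pd.Index))
```

This is why `isinstance(obj, pd.Index)` is the safer test: it succeeds for a plain `pd.Index` and for subclasses such as `pd.RangeIndex`.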