# Pandas DataFrames

The Pandas `DataFrame` is similar to R's `data.frame` object: it is defined as a two-dimensional labeled data structure with columns of potentially different types.

In general, the Pandas `DataFrame` consists of three main components: the data, the index, and the columns.
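As a minimal sketch of these three components, the snippet below builds a small frame and reads each part back. The name `frame` and the labels `r1`, `r2`, `x`, `y` are made up for illustration, and `to_numpy()` assumes a reasonably recent pandas (0.24 or later).

```python
import pandas as pd

# Illustrative frame; the labels are invented for this sketch.
frame = pd.DataFrame(
    data=[[1, 2.5], [3, 4.5]],
    index=["r1", "r2"],
    columns=["x", "y"],
)

values = frame.to_numpy()          # the data, as a 2-D NumPy array
row_labels = list(frame.index)     # the index (row labels)
col_labels = list(frame.columns)   # the columns (column labels)

print(values)
print(row_labels)
print(col_labels)
```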

A DataFrame can contain data that is:
* a Pandas DataFrame
* a Pandas Series: a one-dimensional labeled array capable of holding any data type, whose axis labels are called the index. A single column of a DataFrame is an example of a Series.
* a NumPy ndarray, which can be a record array or a structured array
* a two-dimensional ndarray
* dictionaries of one-dimensional ndarrays, lists, dictionaries or Series.

Note that `np.ndarray` is the actual data type, while `np.array()` is a function to make arrays from other data structures.
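The distinction can be checked directly. This is a small sketch; the variable name `arr` is only illustrative.

```python
import numpy as np

# np.array() is a function that builds an array from another structure (here a list).
arr = np.array([1, 2, 3])

# The object it returns is an instance of the np.ndarray type.
print(type(arr))
print(isinstance(arr, np.ndarray))
```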

*Structured arrays* allow data to be manipulated by named fields: in the first example below, a structured array of three tuples is created. The first element of each tuple will be called ‘foo’ and will be of type int, while the second element will be named ‘bar’ and will be a float.

*Record arrays* extend structured arrays: they let you access the fields by attribute rather than by index. In the example below, the ‘foo’ values are accessed as an attribute of the `my_array2` record array.

In [8]:
import numpy as np
import pandas as pd

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [10]:
# A structured array
my_array = np.ones(3, dtype=([('foo', int), ('bar', float)]))
my_array['foo']

# A record array
my_array2 = my_array.view(np.recarray)
my_array2.foo

array([1, 1, 1])

array([1, 1, 1])
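A structured array like the one above can also be passed straight to `pd.DataFrame()`, in which case the field names become the column labels. The sketch below rebuilds the array so that it runs on its own; the names `structured` and `frame` are illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as the structured array above: an int field and a float field.
structured = np.ones(3, dtype=[("foo", int), ("bar", float)])

# The field names of the structured array are used as column labels.
frame = pd.DataFrame(structured)

print(frame.columns.tolist())
print(frame.dtypes)
```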

Besides `data`, you can also specify the `index` and `columns` arguments. `index` supplies the row labels and `columns` supplies the column labels. These labels are what make selecting and manipulating data in a `DataFrame` convenient.

## Creating DataFrames

We can either create a `DataFrame` from scratch or convert other data structures. We will start with the latter approach.

To make a `DataFrame` from a `NumPy` array, simply pass it to `pandas.DataFrame()` in the `data` argument:

In [17]:
data = np.array([['','Col1','Col2'],
                ['Row1',1,2],
                ['Row2',3,4]])
                
df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
df

| | Col1 | Col2 |
|---|---|---|
| Row1 | 1 | 2 |
| Row2 | 3 | 4 |
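One caveat with this example: because the NumPy array mixes strings and integers, NumPy stores every element as a string, so the resulting columns hold text rather than numbers. The sketch below shows this and converts the values with `astype(int)`; the names `df_str` and `df_int` are illustrative, and the exact dtype printed for the string columns depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Mixing strings and integers makes NumPy store everything as strings.
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])

df_str = pd.DataFrame(data=data[1:, 1:],
                      index=data[1:, 0],
                      columns=data[0, 1:])
print(df_str.dtypes)           # text columns, not integers

# Convert the text values to integers before doing arithmetic.
df_int = df_str.astype(int)
print(df_int.dtypes)
print(df_int["Col1"].sum())
```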


The same approach works for the other structures that `DataFrame()` accepts as input, such as dictionaries, `Series`, and other `DataFrame`s.

In [14]:
# 2D array to DataFrame 
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
pd.DataFrame(my_2darray)

# dictionary to DataFrame 
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
pd.DataFrame(my_dict)

# list to DataFrame
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4), columns=['A'])
# DataFrame to DataFrame: passing an existing DataFrame also works
pd.DataFrame(my_df)

# Series to DataFrame
my_series = pd.Series({"United Kingdom":"London", "India":"New Delhi", "United States":"Washington", "Belgium":"Brussels"})
pd.DataFrame(my_series)
# Note: the output below lists the dictionary keys in sorted order, as older pandas versions did.
# Recent pandas versions (0.23+ on Python 3.6+) keep the dictionary's insertion order instead.

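| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |

| | 1 | 2 | 3 |
|---|---|---|---|
| 0 | 1 | 1 | 2 |
| 1 | 3 | 2 | 4 |

| | A |
|---|---|
| 0 | 4 |
| 1 | 5 |
| 2 | 6 |
| 3 | 7 |

| | 0 |
|---|---|
| Belgium | Brussels |
| India | New Delhi |
| United Kingdom | London |
| United States | Washington |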
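When a `Series` is converted this way, the single column gets the default label `0`. One way to give it a meaningful name is `Series.to_frame(name=...)`. This is a sketch of that option; the label `"Capital"` and the variable names are chosen only for illustration.

```python
import pandas as pd

capitals = pd.Series({"United Kingdom": "London", "India": "New Delhi"})

# to_frame() converts the Series to a one-column DataFrame;
# name= sets the label of that column instead of the default 0.
frame = capitals.to_frame(name="Capital")
print(frame)
```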


## Interrogating the shape of a DataFrame

Use the `shape` property or the `len()` function in combination with the `index` property.

These two options return slightly different information: 

* `shape` returns the DataFrame's dimensions in the format (number of rows, number of columns).
* `len()` in combination with the `index` property returns the number of rows.

To get the names of a DataFrame's columns as a list, use `list(my_dataframe.columns.values)`.

In [21]:
df
df.shape
len(df.index)
list(df.columns.values)

| | Col1 | Col2 |
|---|---|---|
| Row1 | 1 | 2 |
| Row2 | 3 | 4 |


(2, 2)

2

['Col1', 'Col2']
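The relationship between these calls can be sketched on a standalone frame. The frame below is rebuilt from a dictionary so that the snippet runs on its own, and the names `frame`, `n_rows`, and `n_cols` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"Col1": [1, 3], "Col2": [2, 4]}, index=["Row1", "Row2"])

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = frame.shape
print(n_rows, n_cols)

# len() of the index (or of the frame itself) gives the row count only.
print(len(frame.index) == n_rows)
print(len(frame) == n_rows)

# Iterating over the columns yields the column labels.
print(list(frame.columns))
```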

## Selecting from a DataFrame