## DataFrames

A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. You can think of it as a spreadsheet, a SQL table, or a dict of Series objects that share one index.
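
To make the "dict of Series" picture concrete, here is a minimal sketch. The names `ages`, `countries`, and `demo_df`, and the lowercase row labels, are made up for illustration and are not part of the example file used below.

```python
import pandas as pd

# Two Series with different dtypes that share the same index labels
ages = pd.Series([24, 23, 22], index=["bob", "alice", "malcolm"])
countries = pd.Series(
    ["United States", "Canada", "England"],
    index=["bob", "alice", "malcolm"],
)

# Each dict key becomes a column; the shared index becomes the row labels
demo_df = pd.DataFrame({"age": ages, "Country": countries})

print(demo_df.dtypes)   # one dtype per column
print(demo_df["age"])   # selecting a single column gives back a Series
```

Each column keeps its own dtype, which is what separates a DataFrame from a plain 2-D NumPy array.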

In [1]:
import numpy as np
import pandas as pd

In [2]:
people_df = pd.read_csv("people-example.csv")

people_df

Unnamed: 0,First Name,Last Name,Country,age
0,Bob,Smith,United States,24
1,Alice,Williams,Canada,23
2,Malcolm,Jone,England,22
3,Felix,Brown,USA,23
4,Alex,Cooper,Poland,23
5,Tod,Campbell,United States,22
6,Derek,Ward,Switzerland,25


In [3]:
people_df.head() # Shows the first 5 rows (the default)

Unnamed: 0,First Name,Last Name,Country,age
0,Bob,Smith,United States,24
1,Alice,Williams,Canada,23
2,Malcolm,Jone,England,22
3,Felix,Brown,USA,23
4,Alex,Cooper,Poland,23


In [4]:
people_df.tail() # Shows the last 5 rows (the default)

Unnamed: 0,First Name,Last Name,Country,age
2,Malcolm,Jone,England,22
3,Felix,Brown,USA,23
4,Alex,Cooper,Poland,23
5,Tod,Campbell,United States,22
6,Derek,Ward,Switzerland,25


In [5]:
people_df.head(2) # Shows the first 2 rows; tail() takes the same argument

Unnamed: 0,First Name,Last Name,Country,age
0,Bob,Smith,United States,24
1,Alice,Williams,Canada,23
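
A short sketch of how the row count passed to `head()` and `tail()` behaves. The frame `sample_df` is a stand-in built from the ages shown above, so the snippet runs without the CSV file.

```python
import pandas as pd

# Stand-in frame with seven rows, like people_df
sample_df = pd.DataFrame({"age": [24, 23, 22, 23, 23, 22, 25]})

first_two = sample_df.head(2)          # rows with index 0 and 1
last_two = sample_df.tail(2)           # rows with index 5 and 6
all_but_last_two = sample_df.head(-2)  # a negative n drops rows from the end

print(first_two)
print(last_two)
print(len(all_but_last_two))
```

Both methods return a new DataFrame, so the result can be stored or chained rather than only displayed.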


In [6]:
# Let's get the column names
people_df.columns

Index(['First Name', 'Last Name', 'Country', 'age'], dtype='object')

In [8]:
people_df["First Name"] # One way of getting a column: bracket notation

0        Bob
1      Alice
2    Malcolm
3      Felix
4       Alex
5        Tod
6      Derek
Name: First Name, dtype: object

In [9]:
people_df.Country # A second way: attribute notation

0    United States
1           Canada
2          England
3              USA
4           Poland
5    United States
6      Switzerland
Name: Country, dtype: object
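
The two access styles are not fully interchangeable. The sketch below uses a small stand-in frame (`sample_df`, built from two rows of the output above) to show where they differ.

```python
import pandas as pd

# Small stand-in for people_df, using two rows from the output above
sample_df = pd.DataFrame({
    "First Name": ["Bob", "Alice"],
    "Country": ["United States", "Canada"],
    "age": [24, 23],
})

# Bracket access works for any column name, including names with spaces
first_names = sample_df["First Name"]

# Attribute access needs a name that is a valid Python identifier,
# so sample_df.First Name would be a syntax error
countries = sample_df.Country

# Passing a list of names returns a DataFrame instead of a Series
subset = sample_df[["First Name", "age"]]

print(type(first_names).__name__, type(subset).__name__)
```

Bracket notation is the safer default, since attribute access can also be shadowed by DataFrame methods that share a column's name.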

In [10]:
# We can create a new DataFrame from a subset of the columns of our main DataFrame
names_df = pd.DataFrame(people_df, columns = ["First Name", "Last Name"])

names_df

Unnamed: 0,First Name,Last Name
0,Bob,Smith
1,Alice,Williams
2,Malcolm,Jone
3,Felix,Brown
4,Alex,Cooper
5,Tod,Campbell
6,Derek,Ward


In [11]:
# If we ask for a column that does not exist yet, pandas adds it and fills it with NaN
names_df = pd.DataFrame(people_df, columns = ["Title", "First Name", "Last Name"])

names_df

Unnamed: 0,Title,First Name,Last Name
0,,Bob,Smith
1,,Alice,Williams
2,,Malcolm,Jone
3,,Felix,Brown
4,,Alex,Cooper
5,,Tod,Campbell
6,,Derek,Ward
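
As an alternative sketch of the same idea, `reindex(columns=...)` also creates missing columns filled with NaN. The frame `sample_df` here is an illustrative stand-in, not the notebook's `people_df`.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "First Name": ["Bob", "Alice"],
    "Last Name": ["Smith", "Williams"],
})

# reindex keeps the existing columns and creates missing ones filled with NaN
with_title = sample_df.reindex(columns=["Title", "First Name", "Last Name"])

print(with_title)
print(with_title["Title"].isna().all())
```

The original frame is left unchanged; `reindex` returns a new DataFrame.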


In [12]:
# We can also retrieve a row by its index label (.ix was removed in pandas 1.0; use .loc)
people_df.loc[4]

First Name      Alex
Last Name     Cooper
Country       Poland
age               23
Name: 4, dtype: object
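
Row lookup can go by label or by position. The sketch below contrasts `.loc` and `.iloc` on a stand-in frame whose index labels deliberately do not start at 0; the older `.ix` indexer mixed both behaviours and was removed in pandas 1.0.

```python
import pandas as pd

sample_df = pd.DataFrame(
    {"First Name": ["Felix", "Alex", "Tod"], "age": [23, 23, 22]},
    index=[3, 4, 5],  # labels deliberately do not start at 0
)

by_label = sample_df.loc[4]      # row whose index label is 4
by_position = sample_df.iloc[0]  # first row, whatever its label is

print(by_label["First Name"])
print(by_position["First Name"])
```

With the default 0-based index of `people_df` the two give the same row, which is why the difference is easy to miss.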

In [15]:
names_df["Title"] = "Dr."

names_df

Unnamed: 0,Title,First Name,Last Name
0,Dr.,Bob,Smith
1,Dr.,Alice,Williams
2,Dr.,Malcolm,Jone
3,Dr.,Felix,Brown
4,Dr.,Alex,Cooper
5,Dr.,Tod,Campbell
6,Dr.,Derek,Ward


In [16]:
titles = ["Mr.", "Mrs.", "Dr.", "Er.", "Mrs.", "Mr.", "Mr."]

names_df["Title"] = titles

names_df

Unnamed: 0,Title,First Name,Last Name
0,Mr.,Bob,Smith
1,Mrs.,Alice,Williams
2,Dr.,Malcolm,Jone
3,Er.,Felix,Brown
4,Mrs.,Alex,Cooper
5,Mr.,Tod,Campbell
6,Mr.,Derek,Ward


In [19]:
# We can also assign the titles from a Series

names_df["Title"] = "" # Let's clear all the titles first

titles = pd.Series(["Mr.", "Mrs."], index=[2, 4])

names_df["Title"] = titles

names_df
# Notice how the two titles landed on index labels 2 and 4; every other row gets NaN

Unnamed: 0,Title,First Name,Last Name
0,,Bob,Smith
1,,Alice,Williams
2,Mr.,Malcolm,Jone
3,,Felix,Brown
4,Mrs.,Alex,Cooper
5,,Tod,Campbell
6,,Derek,Ward
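
Assigning a Series aligns on index labels rather than on position. A minimal sketch with a stand-in frame (`sample_df` is illustrative) shows the alignment and one way to replace the resulting NaN values:

```python
import pandas as pd

sample_df = pd.DataFrame({"First Name": ["Bob", "Alice", "Malcolm"]})

# Only index label 2 has a title; labels 0 and 1 find no match
sample_df["Title"] = pd.Series(["Mr."], index=[2])

# True where no label matched and the value is NaN
missing = sample_df["Title"].isna()

# Replace the NaN entries with empty strings
sample_df["Title"] = sample_df["Title"].fillna("")

print(missing.tolist())
print(sample_df)
```

Assigning a plain list, by contrast, is positional and must match the number of rows exactly.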


In [20]:
# We can also delete a column
del names_df["Title"]

names_df

Unnamed: 0,First Name,Last Name
0,Bob,Smith
1,Alice,Williams
2,Malcolm,Jone
3,Felix,Brown
4,Alex,Cooper
5,Tod,Campbell
6,Derek,Ward
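
`del` modifies the DataFrame in place. As a sketch of two alternatives, `drop` returns a new frame and `pop` removes a column while handing it back; `sample_df` is an illustrative stand-in.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Title": ["Mr.", "Mrs."],
    "First Name": ["Bob", "Alice"],
    "Last Name": ["Smith", "Williams"],
})

# drop returns a new DataFrame and leaves sample_df untouched
without_title = sample_df.drop(columns=["Title"])

# pop removes the column in place and returns it as a Series
last_names = sample_df.pop("Last Name")

print(list(without_title.columns))
print(list(sample_df.columns))
```

`drop` is the usual choice when the original frame is still needed afterwards.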


In [21]:
# We can also create a DataFrame from a dictionary of equal-length lists
fruits = { "Apple":[5, 8, 3], "Pear":[23, 49, 72], "Mango":[9, 6, 2] }

fruits_df = pd.DataFrame(fruits)

fruits_df

Unnamed: 0,Apple,Mango,Pear
0,5,9,23
1,8,6,49
2,3,2,72
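
The alphabetical column order in the output above likely reflects an older pandas release; recent versions keep the dictionary's insertion order. To avoid depending on either behaviour, the order can be stated explicitly. In this sketch the row labels `Mon`, `Tue`, `Wed` are hypothetical and only illustrate the `index` argument.

```python
import pandas as pd

fruits = {"Apple": [5, 8, 3], "Pear": [23, 49, 72], "Mango": [9, 6, 2]}

fruits_df = pd.DataFrame(
    fruits,
    columns=["Apple", "Mango", "Pear"],  # explicit column order
    index=["Mon", "Tue", "Wed"],         # hypothetical row labels
)

print(fruits_df)
print(fruits_df.loc["Tue", "Pear"])
```

Each list must have as many entries as there are index labels, otherwise the constructor raises a `ValueError`.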
