The following code helps you experiment with the pandas DataFrame.

You can think of a DataFrame as a table with rows and columns. It is
similar to a spreadsheet, a database table, or R's data.frame object.

This playground is inspired by Greg Reda's post, "Intro to Pandas Data Structures":
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

In [2]:
import numpy as np
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
# Display the result of every expression in a cell, not only the last one
InteractiveShell.ast_node_interactivity = "all"


In [3]:
'''
To create a DataFrame, pass a dictionary of lists to the DataFrame constructor:
1) Each dictionary key becomes a column name.
2) The associated list supplies the values in that column.
'''
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
type(data)
# Build the DataFrame from the dict
football = pd.DataFrame(data)
# Display it; the leftmost column is the row index, which starts at 0
football



dict

Unnamed: 0,losses,team,wins,year
0,5,Bears,11,2010
1,8,Bears,8,2011
2,6,Bears,10,2012
3,1,Packers,15,2011
4,5,Packers,11,2012
5,10,Lions,6,2010
6,6,Lions,10,2011
7,12,Lions,4,2012
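
In the output above the columns appear in alphabetical order, which is probably how the pandas version used here ordered dictionary keys; recent versions keep the dictionary's insertion order. A minimal sketch of fixing the order explicitly with the constructor's `columns` argument follows. It uses only the three Bears rows from the data above, and the name `football_ordered` is made up for this example.

```python
import pandas as pd

# A subset of the data above (the three Bears rows)
data = {'year': [2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears'],
        'wins': [11, 8, 10],
        'losses': [5, 8, 6]}

# `columns` selects the dict keys to use and fixes their order
football_ordered = pd.DataFrame(
    data, columns=['year', 'team', 'wins', 'losses'])

print(list(football_ordered.columns))
print(football_ordered.shape)  # (number of rows, number of columns)
```

Passing `columns` makes the layout independent of how a given pandas version orders dictionary keys.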


In [4]:
'''
pandas also provides attributes and methods that summarize basic
information about a DataFrame, including:
1) dtypes: the data type of each column
2) describe: basic statistics for the DataFrame's numeric columns
3) head: the first five rows by default
4) tail: the last five rows by default
'''
football.dtypes

football.describe()

football.head()

football.tail()

losses     int64
team      object
wins       int64
year       int64
dtype: object

Unnamed: 0,losses,wins,year
count,8.0,8.0,8.0
mean,6.625,9.375,2011.125
std,3.377975,3.377975,0.834523
min,1.0,4.0,2010.0
25%,5.0,7.5,2010.75
50%,6.0,10.0,2011.0
75%,8.5,11.0,2012.0
max,12.0,15.0,2012.0


Unnamed: 0,losses,team,wins,year
0,5,Bears,11,2010
1,8,Bears,8,2011
2,6,Bears,10,2012
3,1,Packers,15,2011
4,5,Packers,11,2012


Unnamed: 0,losses,team,wins,year
3,1,Packers,15,2011
4,5,Packers,11,2012
5,10,Lions,6,2010
6,6,Lions,10,2011
7,12,Lions,4,2012
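
The inspection helpers above also accept a few variations. The sketch below rebuilds the same football data and shows `head` and `tail` with an explicit row count, the `shape` attribute, and `describe` applied to the text column. The variable names are chosen for this example only.

```python
import pandas as pd

football = pd.DataFrame({
    'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
    'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers',
             'Lions', 'Lions', 'Lions'],
    'wins': [11, 8, 10, 15, 11, 6, 10, 4],
    'losses': [5, 8, 6, 1, 5, 10, 6, 12]})

first_three = football.head(3)   # first 3 rows instead of the default 5
last_two = football.tail(2)      # last 2 rows
n_rows, n_cols = football.shape  # (rows, columns)

# On a text column, describe() reports count, unique, top and freq
team_summary = football['team'].describe()

print(first_three)
print(last_two)
print(n_rows, n_cols)
print(team_summary)
```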


## Column selection

You can think of a DataFrame as a group of Series that share an index.
This makes it easy to select specific columns that you want from the DataFrame. 

A couple of pointers:
- Selecting a single column from the DataFrame will return a **Series**
- Selecting multiple columns from the DataFrame will return a **DataFrame**
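
To make the "group of Series that share an index" idea concrete, here is a small sketch that builds a DataFrame from two Series. The values are the first three wins and losses from the football data; the index labels `a`, `b`, `c` and the variable names are invented for illustration.

```python
import pandas as pd

# Two Series defined over the same (hypothetical) index labels
idx = ['a', 'b', 'c']
wins = pd.Series([11, 8, 10], index=idx, name='wins')
losses = pd.Series([5, 8, 6], index=idx, name='losses')

# A dict of Series becomes a DataFrame aligned on the shared index
df = pd.DataFrame({'wins': wins, 'losses': losses})

single = df['wins']               # one label          -> Series
one_col_frame = df[['wins']]      # one-element list   -> DataFrame
several = df[['wins', 'losses']]  # list of labels     -> DataFrame

print(type(single).__name__)
print(type(one_col_frame).__name__)
print(type(several).__name__)
print(single.index.equals(df.index))  # the column keeps the frame's index
```

Note that a one-element list still returns a DataFrame; it is the list, not the number of columns, that decides the return type.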

In [8]:

football['year']  # return a Series (a column)
type(football['year'])

football.year  # use dot '.', shorthand for football['year']
football[['year', 'wins', 'losses']]  # return a DataFrame

0    2010
1    2011
2    2012
3    2011
4    2012
5    2010
6    2011
7    2012
Name: year, dtype: int64

pandas.core.series.Series

0    2010
1    2011
2    2012
3    2011
4    2012
5    2010
6    2011
7    2012
Name: year, dtype: int64

Unnamed: 0,year,wins,losses
0,2010,11,5
1,2011,8,8
2,2012,10,6
3,2011,15,1
4,2012,11,5
5,2010,6,10
6,2011,10,6
7,2012,4,12
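
One caveat about the dot shorthand used above: it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The sketch below uses a hypothetical frame with two such awkward names, `count` and `win pct`, to show why bracket access is the safer default.

```python
import pandas as pd

# Hypothetical columns: 'count' clashes with the DataFrame.count method,
# and 'win pct' contains a space, so neither works well with dot access
df = pd.DataFrame({'count': [1, 2, 3], 'win pct': [0.5, 0.6, 0.7]})

by_bracket = df['count']  # the column, as a Series
by_dot = df.count         # the bound method, not the column

print(type(by_bracket).__name__)
print(callable(by_dot))
print(df['win pct'].tolist())  # bracket access handles spaces
```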


## Row selection

Rows can be selected in several ways, and each selection shown here returns a **DataFrame**.

Some of the basic and common methods are:
- Slicing
- An individual index (through the iloc or loc indexers)
- Boolean indexing

You can also combine multiple conditions with the boolean operators
& (and) and | (or), wrapping each condition in parentheses.
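
As a sketch of combining conditions, the example below rebuilds the football data and applies `|` (or) and `~` (not). The variable names `strong_or_packers` and `not_lions` are made up for this example.

```python
import pandas as pd

football = pd.DataFrame({
    'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
    'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers',
             'Lions', 'Lions', 'Lions'],
    'wins': [11, 8, 10, 15, 11, 6, 10, 4],
    'losses': [5, 8, 6, 1, 5, 10, 6, 12]})

# Each comparison sits in parentheses because | and & bind
# more tightly than > or ==
strong_or_packers = football[(football.wins > 10) |
                             (football.team == 'Packers')]

# ~ negates a boolean mask
not_lions = football[~(football.team == 'Lions')]

print(strong_or_packers)
print(not_lions)
```

The Python keywords `and` and `or` do not work element-wise on Series, which is why the `&` and `|` operators are used instead.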


In [6]:
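football.iloc[[0]]  # first row, selected by integer position
football.loc[[0]]   # row whose index label is 0 (also the first row here)
football[3:5]       # slice [3, 5): rows at positions 3 and 4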
football[football.wins > 10]
football[(football.wins > 10) & (football.team == "Packers")]

Unnamed: 0,losses,team,wins,year
0,5,Bears,11,2010


Unnamed: 0,losses,team,wins,year
0,5,Bears,11,2010


Unnamed: 0,losses,team,wins,year
3,1,Packers,15,2011
4,5,Packers,11,2012


Unnamed: 0,losses,team,wins,year
0,5,Bears,11,2010
3,1,Packers,15,2011
4,5,Packers,11,2012


Unnamed: 0,losses,team,wins,year
3,1,Packers,15,2011
4,5,Packers,11,2012
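
In the cell above, `iloc[[0]]` and `loc[[0]]` return the same row only because the default index labels happen to match the row positions. The sketch below uses a small frame with hypothetical labels 10, 20 and 30 to show where the two indexers differ.

```python
import pandas as pd

# Hypothetical index labels that do not match the row positions
small = pd.DataFrame({'team': ['Bears', 'Packers', 'Lions'],
                      'wins': [11, 15, 6]},
                     index=[10, 20, 30])

by_position = small.iloc[[0]]  # first row, by integer position
by_label = small.loc[[10]]     # row whose index label is 10

# No row is labelled 0, so a label lookup for 0 fails
try:
    small.loc[[0]]
    label_zero_found = True
except KeyError:
    label_zero_found = False

pos_slice = small.iloc[0:1]     # position slice, end excluded
label_slice = small.loc[10:20]  # label slice, end included

print(by_position.equals(by_label))
print(label_zero_found)
print(list(pos_slice.index), list(label_slice.index))
```

Position slices follow the usual Python half-open convention, whereas label slices with `loc` include both endpoints.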
