# DataFrames in pandas
A set of examples that exhibit some of the core features of the [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) data type in the `pandas` module.

In [23]:
import numpy as np
import pandas as pd

## Basic concept
A DataFrame is a two-dimensional tabular data structure. It is easily visualized as a spreadsheet, with rows and columns.

In [25]:
# create a DataFrame from a dictionary containing labeled pandas Series
df = pd.DataFrame({
    'name': pd.Series( ['Foo', 'Bar', 'Baz'] ),
    'email': pd.Series( ['fo1258@foo.edu', 'br9876@foo.edu', 'bz2292@foo.edu'] ),
    'midterm exam': pd.Series( [99, 64, 87] ),
    'final exam': pd.Series( [94, 72, 81] )
})
df

   name           email  midterm exam  final exam
0   Foo  fo1258@foo.edu            99          94
1   Bar  br9876@foo.edu            64          72
2   Baz  bz2292@foo.edu            87          81
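
To make the "rows and columns" picture concrete, the sketch below rebuilds the same table and inspects its dimensions and labels. It passes plain Python lists rather than Series (a simplification made here for brevity, not how the example above builds it), and the variable names `n_rows`, `n_cols`, `column_names` and `row_labels` are only illustrative.

```python
import pandas as pd

# rebuild the example table; plain lists are used here instead of Series for brevity
df = pd.DataFrame({
    'name': ['Foo', 'Bar', 'Baz'],
    'email': ['fo1258@foo.edu', 'br9876@foo.edu', 'bz2292@foo.edu'],
    'midterm exam': [99, 64, 87],
    'final exam': [94, 72, 81],
})

n_rows, n_cols = df.shape          # (number of rows, number of columns)
column_names = list(df.columns)    # column labels, in insertion order
row_labels = list(df.index)        # default row labels: 0, 1, 2

print(n_rows, n_cols)
print(column_names)
print(row_labels)
```

Because no index was supplied, pandas assigns the default integer labels 0, 1, 2 to the rows.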


### Columns as Series
Each column is a named `pandas` Series.

In [10]:
df['midterm exam']

0    99
1    64
2    87
Name: midterm exam, dtype: int64

In [20]:
# prove that a column of a DataFrame is a Series
type( df['midterm exam'] )

pandas.core.series.Series
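
Since a column is a Series, it carries its column label as its `name` and supports the usual Series methods. A minimal sketch, using a trimmed-down copy of the table (the `email` and `final exam` columns are left out only to keep it short):

```python
import pandas as pd

# trimmed-down copy of the example table
df = pd.DataFrame({
    'name': ['Foo', 'Bar', 'Baz'],
    'midterm exam': [99, 64, 87],
})

midterm = df['midterm exam']
column_label = midterm.name    # the Series is named after its column
highest = midterm.max()        # Series methods work directly on the column
average = midterm.mean()

print(column_label, highest, average)
```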

### Rows as Series
Selecting a single row also returns a `pandas` Series, whose index holds the column names.

In [30]:
# get a row by its index label
df.loc[1]

name                       Bar
email           br9876@foo.edu
midterm exam                64
final exam                  72
Name: 1, dtype: object

In [31]:
# prove that a row of a DataFrame is a Series
type( df.loc[1] )

pandas.core.series.Series

In [32]:
# get a row by its integer position
df.iloc[2]

name                       Baz
email           bz2292@foo.edu
midterm exam                87
final exam                  81
Name: 2, dtype: object
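
With the default index, labels and positions happen to coincide, so `loc` and `iloc` look interchangeable. The sketch below uses a hypothetical variant of the table, `scores`, indexed by student name (not part of the original example), to show that `loc` looks rows up by label while `iloc` looks them up by position.

```python
import pandas as pd

# hypothetical variant of the example table, indexed by name instead of 0, 1, 2
scores = pd.DataFrame(
    {'midterm exam': [99, 64, 87], 'final exam': [94, 72, 81]},
    index=['Foo', 'Bar', 'Baz'],
)

by_label = scores.loc['Bar']     # label-based: the row whose index label is 'Bar'
by_position = scores.iloc[1]     # position-based: the second row, whatever its label

print(by_label.equals(by_position))
```

Here `scores.loc[1]` would raise a `KeyError`, because `1` is not one of the index labels.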

## Filtering rows


In [33]:
# match a criterion
df[ df['name'] == 'Bar' ]

   name           email  midterm exam  final exam
1   Bar  br9876@foo.edu            64          72


In [36]:
# match multiple criteria using the & (and) or | (or) operators; wrap each condition in parentheses
df[ (df['name'] != 'Bar') & (df['midterm exam'] > 50) ]

   name           email  midterm exam  final exam
0   Foo  fo1258@foo.edu            99          94
2   Baz  bz2292@foo.edu            87          81
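
Each comparison above produces a boolean Series, and indexing the DataFrame with it keeps the rows marked `True`. As a further sketch, the conditions can also be combined with `|` (or), negated with `~`, or built with `isin()`. The thresholds and names used here are arbitrary illustrations.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Foo', 'Bar', 'Baz'],
    'midterm exam': [99, 64, 87],
    'final exam': [94, 72, 81],
})

# | keeps rows where either condition holds
either = df[(df['midterm exam'] > 90) | (df['final exam'] < 75)]

# ~ negates a condition; isin() tests membership in a list of values
not_listed = df[~df['name'].isin(['Foo', 'Bar'])]

print(either)
print(not_listed)
```

The parentheses around each condition matter: `&` and `|` bind more tightly than comparison operators in Python, so omitting them changes the meaning of the expression.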


## Filtering columns

Extracting a **single column** is straightforward with square bracket syntax.

In [51]:
# fetch the 'name' column - this returns a Series
df['name']

0    Foo
1    Bar
2    Baz
Name: name, dtype: object

The easiest way to extract **multiple columns** from a DataFrame is by supplying a list of column names.

In [50]:
# fetch the 'name' and 'final exam' columns - this returns a DataFrame
df[ ['name', 'final exam'] ]

   name  final exam
0   Foo          94
1   Bar          72
2   Baz          81
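
The return type depends on what goes inside the brackets, not on how many columns come back. A short sketch of the difference, where `as_series` and `as_frame` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Foo', 'Bar', 'Baz'],
    'final exam': [94, 72, 81],
})

as_series = df['name']      # a single label returns a Series
as_frame = df[['name']]     # a list with one label returns a one-column DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame.shape)
```

Passing a one-element list is handy when later code expects a DataFrame rather than a Series.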


## Filtering rows and columns

It is possible to chain two sets of brackets so that a row filter and a column filter are applied in one expression: the first set selects rows, and the second selects columns from that result.

In [57]:
# find one row by its index, and fetch one column from the results - this returns a single value
df.loc[2]['final exam']

81

In [52]:
# filter rows by criteria, and fetch one column from the results - this returns a Series
df[ df['name'] != 'Baz']['midterm exam']

0    99
1    64
Name: midterm exam, dtype: int64

In [49]:
# filter rows, and fetch multiple columns from the results - this returns a DataFrame
df[ df['name'] != 'Baz'][ ['name', 'midterm exam'] ] 

   name  midterm exam
0   Foo            99
1   Bar            64
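
The same three selections can also be written as a single `.loc[rows, columns]` call instead of two chained sets of brackets. The sketch below is one possible rewrite, with `not_baz` as an illustrative name for the row filter. The pandas documentation generally favors this form, particularly when assigning values, because chained indexing may operate on a temporary copy.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Foo', 'Bar', 'Baz'],
    'email': ['fo1258@foo.edu', 'br9876@foo.edu', 'bz2292@foo.edu'],
    'midterm exam': [99, 64, 87],
    'final exam': [94, 72, 81],
})

not_baz = df['name'] != 'Baz'    # boolean row filter, reused below

single_value = df.loc[2, 'final exam']                # one row label, one column: a single value
one_column = df.loc[not_baz, 'midterm exam']          # row filter, one column: a Series
several = df.loc[not_baz, ['name', 'midterm exam']]   # row filter, list of columns: a DataFrame

print(single_value)
print(one_column)
print(several)
```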
