## Pandas for the NBA Lover
## Introduction to the DataFrame

Welcome to the tutorial! 

For our first introduction to pandas, we will be examining how the NBA has changed over the past 20 years. By doing so, we will learn about the **DataFrame**, the core data structure of the pandas library. 

By the end of this tutorial, you will be familiar with the basics of a DataFrame. You will also be familiar with the two pandas objects that help power the DataFrame, as well as some terms, definitions, and rules that relate to how DataFrames are used.

---

### 2.1 What is a DataFrame?

First things first, let's import the pandas library.

In [1]:
import pandas as pd

A DataFrame is a pandas data structure that makes it easy to work with tabular data.

Data can be loaded into a DataFrame in a variety of ways, some of which we will use in later sections. Let's start by loading a `.csv` file containing average statistics from the past 20 NBA regular seasons. The first few rows of this file look like this:

In [2]:
!head -n 6 ../data/season_avg.csv

Season,FG,FGA,3P,3PA,FT,FTA,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,FG%,3P%,Pace
2018-19,40.7,88.7,11.0,31.2,17.7,23.2,10.4,34.5,44.8,23.9,7.8,5.1,14.2,21.6,110.2,0.459,0.352,99.7
2017-18,39.6,86.1,10.5,29.0,16.6,21.7,9.7,33.8,43.5,23.2,7.7,4.8,14.3,19.9,106.3,0.46,0.362,97.3
2016-17,39.0,85.4,9.7,27.0,17.8,23.1,10.1,33.4,43.5,22.6,7.7,4.7,14.0,19.9,105.6,0.457,0.358,96.4
2015-16,38.2,84.6,8.5,24.1,17.7,23.4,10.4,33.3,43.8,22.3,7.8,5.0,14.4,20.3,102.7,0.452,0.354,95.8
2014-15,37.5,83.6,7.8,22.4,17.1,22.8,10.9,32.4,43.3,22.0,7.7,4.8,14.4,20.2,100.0,0.449,0.35,93.9


Loading it into a DataFrame is easy:

In [3]:
df = pd.read_csv("../data/season_avg.csv")
type(df)

pandas.core.frame.DataFrame
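If you do not have `season_avg.csv` on disk, the same call can be tried on in-memory text. The sketch below is only an illustration: it reuses three of the rows shown above, trimmed to a handful of columns, and the names `csv_text` and `sample` are made up for this example.

```python
import io

import pandas as pd

# A few rows and columns copied from the file preview above.
csv_text = """Season,FG,FGA,3P,3PA,PTS,Pace
2018-19,40.7,88.7,11.0,31.2,110.2,99.7
2017-18,39.6,86.1,10.5,29.0,106.3,97.3
2016-17,39.0,85.4,9.7,27.0,105.6,96.4
"""

# read_csv accepts a file-like object as well as a path,
# so wrapping the text in StringIO is enough.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))
print(sample.dtypes)  # one dtype per column, inferred from the text
```

The first line of the text becomes the column names, and pandas infers a type for each column from the values beneath it.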

#### 2.1.1 Quick Peek into Functionality

The `.head()` method returns the DataFrame's first 5 rows by default, and the notebook displays the result:

In [4]:
df.head()

,Season,FG,FGA,3P,3PA,FT,FTA,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,FG%,3P%,Pace
0,2018-19,40.7,88.7,11.0,31.2,17.7,23.2,10.4,34.5,44.8,23.9,7.8,5.1,14.2,21.6,110.2,0.459,0.352,99.7
1,2017-18,39.6,86.1,10.5,29.0,16.6,21.7,9.7,33.8,43.5,23.2,7.7,4.8,14.3,19.9,106.3,0.46,0.362,97.3
2,2016-17,39.0,85.4,9.7,27.0,17.8,23.1,10.1,33.4,43.5,22.6,7.7,4.7,14.0,19.9,105.6,0.457,0.358,96.4
3,2015-16,38.2,84.6,8.5,24.1,17.7,23.4,10.4,33.3,43.8,22.3,7.8,5.0,14.4,20.3,102.7,0.452,0.354,95.8
4,2014-15,37.5,83.6,7.8,22.4,17.1,22.8,10.9,32.4,43.3,22.0,7.7,4.8,14.4,20.2,100.0,0.449,0.35,93.9
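`.head()` also takes an optional row count, and `.tail()` is its counterpart for the end of the table. A minimal sketch, assuming a small hand-built DataFrame (the name `sample` and the choice of columns are illustrative; the values come from the rows above):

```python
import pandas as pd

# Five seasons and two statistics taken from the output above.
sample = pd.DataFrame({
    "Season": ["2018-19", "2017-18", "2016-17", "2015-16", "2014-15"],
    "PTS": [110.2, 106.3, 105.6, 102.7, 100.0],
    "Pace": [99.7, 97.3, 96.4, 95.8, 93.9],
})

first_two = sample.head(2)  # first 2 rows instead of the default 5
last_two = sample.tail(2)   # last 2 rows

print(first_two)
print(last_two)
```

Both methods return a new DataFrame and keep the original row labels, so `last_two` is still labelled 3 and 4.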


The `.shape` attribute returns a `(rows, columns)` tuple. Our DataFrame has 20 rows and 19 columns:

In [5]:
df.shape

(20, 19)
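Because `.shape` is an ordinary tuple, it can be unpacked directly. The sketch below uses a small stand-in DataFrame (the name `sample` is an assumption for illustration) to show how `.shape` relates to `len()` and `.size`:

```python
import pandas as pd

# Small stand-in table: 5 rows, 3 columns.
sample = pd.DataFrame({
    "Season": ["2018-19", "2017-18", "2016-17", "2015-16", "2014-15"],
    "PTS": [110.2, 106.3, 105.6, 102.7, 100.0],
    "Pace": [99.7, 97.3, 96.4, 95.8, 93.9],
})

n_rows, n_cols = sample.shape          # unpack the (rows, columns) tuple

print(n_rows, n_cols)
print(len(sample) == n_rows)           # len() counts rows
print(len(sample.columns) == n_cols)   # .columns holds the column labels
print(sample.size == n_rows * n_cols)  # .size is the total number of cells
```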

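The `.describe()` method returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column. Because we round to one decimal place here, the percentage columns `FG%` and `3P%` lose most of their detail: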

In [6]:
df.describe().round(decimals=1)

,FG,FGA,3P,3PA,FT,FTA,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,FG%,3P%,Pace
count,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0
mean,37.2,82.2,6.9,19.3,18.2,24.1,11.2,31.2,42.4,21.8,7.6,4.9,14.5,21.2,99.5,0.5,0.4,92.9
std,1.4,2.4,1.8,5.1,1.0,1.3,0.7,1.5,1.0,0.8,0.3,0.2,0.4,1.1,4.2,0.0,0.0,2.6
min,35.0,79.0,4.8,13.7,16.6,21.7,9.7,29.8,41.0,20.6,7.2,4.6,14.0,19.6,93.4,0.4,0.3,90.1
25%,36.1,80.8,5.5,15.6,17.7,23.2,10.9,30.3,41.8,21.3,7.3,4.8,14.3,20.3,96.8,0.4,0.4,91.2
50%,37.1,81.4,6.4,18.1,18.4,24.3,11.1,30.6,42.2,21.8,7.7,4.9,14.4,21.0,99.2,0.5,0.4,92.0
75%,37.7,83.2,7.7,21.7,18.8,24.9,12.0,32.0,43.0,22.2,7.8,5.1,14.7,21.9,100.6,0.5,0.4,93.9
max,40.7,88.7,11.0,31.2,19.7,26.3,12.4,34.5,44.8,23.9,7.9,5.3,15.5,23.3,110.2,0.5,0.4,99.7
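To see what `.describe()` is computing, it can help to compare a couple of its entries with the equivalent single-column methods. This is a rough sketch on a five-season subset, so the numbers will differ from the 20-season table above; the names `sample` and `summary` are made up for the example.

```python
import pandas as pd

sample = pd.DataFrame({
    "Season": ["2018-19", "2017-18", "2016-17", "2015-16", "2014-15"],
    "PTS": [110.2, 106.3, 105.6, 102.7, 100.0],
    "Pace": [99.7, 97.3, 96.4, 95.8, 93.9],
})

summary = sample.describe()

# By default only numeric columns are summarized, so "Season" is skipped.
print(list(summary.columns))

# Each row of the summary matches a method available on the column itself.
print(summary.loc["mean", "PTS"], sample["PTS"].mean())
print(summary.loc["50%", "PTS"], sample["PTS"].median())
print(summary.loc["std", "PTS"], sample["PTS"].std())
```

The result of `.describe()` is itself a DataFrame, which is why `.round()` and `.loc[...]` can be applied to it.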


The cells above demonstrate that we can loosely define a DataFrame as a structure equipped with numerous methods for working with tabular data. For a more formal definition, let's now look at two of the objects that power a DataFrame. Becoming familiar with these two objects will lay the groundwork for using pandas effectively.  

---

### 2.2 DataFrame Concepts

**Formal Definition**: A DataFrame is a collection of one or more `Series` objects that share a common `Index`. To understand what this means, let's start by examining one of the columns in our DataFrame.

To select a single column in a DataFrame, we can treat the DataFrame as if it were a dictionary, using the column name as a key.

In [17]:
col = df['Season']
col

0     2018-19
1     2017-18
2     2016-17
3     2015-16
4     2014-15
5     2013-14
6     2012-13
7     2011-12
8     2010-11
9     2009-10
10    2008-09
11    2007-08
12    2006-07
13    2005-06
14    2004-05
15    2003-04
16    2002-03
17    2001-02
18    2000-01
19    1999-00
Name: Season, dtype: object

In [16]:
type(col)

pandas.core.series.Series
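The formal definition can be checked directly: every column pulled out of a DataFrame is a `Series`, and each one carries the same `Index` as the DataFrame itself. A minimal sketch, assuming a small stand-in DataFrame (the names `sample`, `season`, and `pts` are illustrative only):

```python
import pandas as pd

sample = pd.DataFrame({
    "Season": ["2018-19", "2017-18", "2016-17"],
    "PTS": [110.2, 106.3, 105.6],
})

season = sample["Season"]  # a single key returns a Series
pts = sample["PTS"]

print(season.name)                        # the column label travels with the Series
print(season.index)                       # the row labels
print(season.index.equals(sample.index))  # same labels as the DataFrame
print(pts.index.equals(season.index))     # and the same as every other column

# A list of keys returns a DataFrame instead, even with one column.
one_col_frame = sample[["Season"]]
print(type(one_col_frame))
```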


<div class="alert"> <sup>[1]</sup> Handles like the ones that make a box easier to carry, not the filthy ones that Kyrie has</div>