# Introduction to Pandas


Now that you guys are Python masters, we're going to move straight into a specific Python library that is hugely popular with data scientists and analysts around the world.

### What is it (a high-level overview)?

You can probably guess what it's called from the title (HINT: it's called Pandas). Pandas is powerful because it lets you work with data without writing a bunch of the conditionals and loops you guys learned about earlier. Instead, Pandas reads your data into labeled, table-like objects that come with built-in methods for selecting, filtering, and summarizing it!
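To make the "no loops" idea concrete, here is a small sketch comparing a plain-Python loop with a Pandas equivalent. The carrier codes and delay counts are made up for illustration, and `groupby` is just one of many Pandas methods that can stand in for a hand-written loop.

```python
import pandas as pd

# Hypothetical flight records: one carrier code and one delay count per row.
carriers = ["AA", "DL", "UA", "AA", "DL"]
delays = [12, 7, 20, 5, 9]

# Plain Python: a loop plus a conditional to total the delays per carrier.
totals_loop = {}
for carrier, delay in zip(carriers, delays):
    if carrier not in totals_loop:
        totals_loop[carrier] = 0
    totals_loop[carrier] += delay

# Pandas: put the data in a DataFrame and let groupby do the looping for us.
df = pd.DataFrame({"carrier": carriers, "delays": delays})
totals_pandas = df.groupby("carrier")["delays"].sum()

print(totals_loop)    # {'AA': 17, 'DL': 16, 'UA': 20}
print(totals_pandas)  # a Series indexed by carrier with the same totals
```

Both give the same totals, but the Pandas version states *what* we want (sum the delays per carrier) rather than *how* to loop over the rows.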

Here's an overview of some of Pandas' features:

* Labeled array types, the main ones being Series (1-dim arrays) and DataFrame (2-dim arrays); a time series is simply a Series or DataFrame indexed by dates
* Index objects that support both simple and hierarchical (multi-level) indexing on either axis
* Easy tools for combining (concatenating, merging, joining) and transforming datasets
* Date range generation and custom date offsets
* Input/Output tools: loading data from CSVs and other flat files into DataFrames, plus reading and writing HDF5 files through the PyTables library
* Moving-window statistics such as rolling mean and rolling standard deviation
* Static and rolling regression + analysis (built into early versions of Pandas; today this lives in the statsmodels library, which works directly with DataFrames)
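Here are a few of these features in action, as a minimal sketch on made-up weather numbers (the column names `temp` and `rain_mm` are hypothetical): a Series and a DataFrame indexed by `pd.date_range`, a `MonthEnd` date offset, and rolling statistics over a 3-row window.

```python
import pandas as pd

# Five consecutive days, generated instead of typed out by hand.
dates = pd.date_range(start="2024-01-01", periods=5, freq="D")

# A Series: a 1-dim labeled array (here, labeled by date). Made-up temperatures.
temps = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], index=dates)

# A DataFrame: a 2-dim table whose columns share one index.
weather = pd.DataFrame({"temp": temps, "rain_mm": [0.0, 2.5, 0.0, 1.0, 0.0]})

# Rolling statistics over a 3-day window; the first 2 rows are NaN
# because a full window isn't available yet.
weather["temp_roll_mean"] = weather["temp"].rolling(window=3).mean()
weather["temp_roll_std"] = weather["temp"].rolling(window=3).std()

# A date offset: roll a timestamp forward to the end of its month.
month_end = pd.Timestamp("2024-01-15") + pd.offsets.MonthEnd()

print(weather)
print(month_end)  # 2024-01-31 00:00:00
```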

## Let's get started!
<img src="https://cdn-images-1.medium.com/max/1200/1*tiFm2E0nCXp4Bc1Rk8OhdA.jpeg" width="300" height="300">


### Imports

We're going to start by importing the libraries we need to practice with Pandas. Besides Pandas itself, we'll import NumPy and Matplotlib: [NumPy](http://www.numpy.org/) is a Python library used here for its powerful array objects, which make matrix math fast and easy, while [Matplotlib](https://matplotlib.org/) is used to visualize our data, whether raw or transformed. You can click the links to learn more about them, but we won't go into detail for now.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Inputting Data

As mentioned earlier, Pandas can read data from many kinds of sources; one common format is the CSV (comma-separated values) file.
Let's read a CSV file containing data about different airlines into a DataFrame (we'll talk more about what that means later).

In [None]:
airlines_df = pd.read_csv('airlines.csv')
airlines_df
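`read_csv` accepts a file path like `'airlines.csv'` or any file-like object. The sketch below assumes nothing about the real file: it feeds a tiny, made-up CSV string through `io.StringIO` so you can see the parsing without the dataset on disk. `usecols` is an optional `read_csv` parameter that keeps only the listed columns.

```python
import io
import pandas as pd

# A tiny, made-up CSV; these columns are NOT taken from airlines.csv.
csv_text = """Airport,Month Name,Flights Delayed
ATL,January,120
BOS,January,45
ATL,February,98
"""

# read_csv accepts a path or any file-like object; StringIO wraps a string as one.
df = pd.read_csv(io.StringIO(csv_text))

# usecols keeps only the listed columns while reading.
delays_only = pd.read_csv(io.StringIO(csv_text), usecols=["Airport", "Flights Delayed"])

print(df)
print(df.dtypes)  # column types are inferred from the data
```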

### Data Overview

We can get a quick overview of a dataset using the built-in methods and attributes below. Each cell says what it does, so try them out!

In [None]:
print("head(n): Printing first 5 rows of dataset...")
airlines_df.head() #n=5 is the default

In [None]:
print("tail(n): Printing last 3 rows of dataset...")
airlines_df.tail(3)

In [None]:
print("describe(): Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns...")
airlines_df.describe()

In [None]:
print("columns: lists the columns within the dataset...")
airlines_df.columns

In [None]:
print("index: lists the row labels (the index) of the dataset...")
airlines_df.index
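If you don't have `airlines.csv` handy, here is a sketch of the same overview tools on a small made-up DataFrame (the carriers and numbers are hypothetical). `shape` isn't covered above, but it's a handy companion: it gives (rows, columns).

```python
import pandas as pd

# A small, made-up dataset to try the overview tools on.
df = pd.DataFrame({
    "carrier": ["AA", "DL", "UA", "WN", "B6", "AS"],
    "flights": [100, 80, 120, 150, 60, 40],
    "delays": [10, 4, 15, 30, 6, 2],
})

first_rows = df.head()   # first 5 rows (n=5 by default)
last_rows = df.tail(2)   # last 2 rows
summary = df.describe()  # count, mean, std, min, quartiles, max of numeric columns

print(df.columns)  # the column labels
print(df.index)    # the row labels: RangeIndex(start=0, stop=6, step=1)
print(df.shape)    # (6, 3): 6 rows, 3 columns
```

Notice that `describe()` skips the text column `carrier` by default, since statistics like the mean only make sense for numbers.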

### Data Selection / "Slicing"

If we want to look at a subset of our dataset, say only certain columns, a few rows, or some combination of the two, we can easily grab that specific "slice" using what you learned earlier about array indexing and slicing. (If you don't remember, don't worry! The comments below briefly explain what's going on.)

### Rows

In [None]:
airlines_df[2:5] #rows at positions 2, 3 and 4 (the stop value, 5, is excluded)

In [None]:
airlines_df[6:] #slice taking rows 6 through end (4407, to be exact)
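Here is a quick sketch of how these square-bracket row slices behave, on a hypothetical 8-row DataFrame of letters: the stop position is excluded, negative positions count back from the end, and a step skips rows.

```python
import pandas as pd

# A hypothetical 8-row DataFrame with the default 0..7 row positions.
df = pd.DataFrame({"letter": list("abcdefgh")})

middle = df[2:5]       # positions 2, 3, 4 (the stop, 5, is excluded)
rest = df[6:]          # position 6 through the end
last_two = df[-2:]     # negative positions count back from the end
every_third = df[::3]  # a step: positions 0, 3, 6

print(middle)
```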

##### Rows - by location

In [None]:
airlines_df.iloc[np.r_[2:5, 7:len(airlines_df)]] #What do you think this means? Hint - slices can't go inside a list, so np.r_ stitches the ranges 2:5 and 7:end into one array of row positions.

In [None]:
airlines_df.iloc[[2,5,9]] #What do you think this means? Hint - we are accessing specific rows, not ranges.
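`iloc` always selects by position, while `loc` selects by label. The sketch below gives a made-up DataFrame letter labels so that positions and labels differ, and shows `np.r_` combining several ranges into one array of positions.

```python
import numpy as np
import pandas as pd

# Made-up values with letter labels, so labels and positions are different.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]},
                  index=list("abcdefghij"))

picked = df.iloc[[2, 5, 9]]              # rows at positions 2, 5 and 9
ranges = df.iloc[np.r_[2:5, 7:len(df)]]  # positions 2-4 plus 7 to the end
by_label = df.loc[["c", "f", "j"]]       # the same rows as `picked`, by label

print(picked)
print(ranges)
```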

### Columns

In [None]:
airlines_df['# of Delays.Late Aircraft'] #selects the single column '# of Delays.Late Aircraft' (returned as a Series)
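Single versus double brackets matter: one label returns a Series, while a list of labels returns a DataFrame. The sketch below uses made-up values under the original column names; a name like `'# of Delays.Late Aircraft'` can only be reached with brackets, because it isn't a valid Python identifier.

```python
import pandas as pd

# Made-up values under two column names from the airlines data.
df = pd.DataFrame({
    "Month Name": ["January", "February", "March"],
    "# of Delays.Late Aircraft": [120, 95, 140],
})

one_col = df["# of Delays.Late Aircraft"]       # single label -> Series
one_col_df = df[["# of Delays.Late Aircraft"]]  # list of labels -> DataFrame

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
```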

##### Multiple columns by label

In [None]:
airlines_df.loc[:,['# of Delays.Late Aircraft','Month Name']] #all rows (:), only the two listed columns
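`loc` takes a row selector and a column selector, both by label. This sketch on made-up monthly numbers (reusing the column names above) shows three common forms. Note that label slices such as `0:2` include the end label, unlike positional slices.

```python
import pandas as pd

# Made-up monthly numbers under the column names used above.
df = pd.DataFrame({
    "Month Name": ["January", "February", "March", "April"],
    "# of Delays.Late Aircraft": [120, 95, 140, 80],
})

# All rows (:), two columns, in the order listed.
subset = df.loc[:, ["# of Delays.Late Aircraft", "Month Name"]]

# Label slices include BOTH endpoints: rows labeled 0, 1 and 2.
first_three = df.loc[0:2, ["Month Name"]]

# A boolean mask keeps only the rows where the condition is True.
busy_months = df.loc[df["# of Delays.Late Aircraft"] > 100, "Month Name"]

print(busy_months.tolist())  # ['January', 'March']
```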