# `pandas` Part 2: this notebook is a 2nd lesson on `pandas`
## The main objective of this tutorial is to slice up some DataFrames using `pandas`
>- Reading data into DataFrames is step 1
>- But most of the time we will want to select specific pieces of data from our datasets 

# Learning Objectives
## By the end of this tutorial you will be able to:
1. Select specific data from a pandas DataFrame
2. Insert data into a DataFrame

## Files Needed for this lesson: `winemag-data-130k-v2.csv`
>- Download this csv from Canvas prior to the lesson

## The general steps to working with pandas:
1. import pandas as pd
>- Note the `as pd` is optional but is a common alias used for pandas and makes writing the code a bit easier
2. Create or load data into a pandas DataFrame or Series
>- In practice, you will likely be loading more datasets than creating but we will learn both
3. Reading data with `pd.read_`
>- Excel files: `pd.read_excel('fileName.xlsx')`
>- Csv files: `pd.read_csv('fileName.csv')`
4. After steps 1-3 you will want to check out your DataFrame
>- Use `shape` to see how many records and columns are in your DataFrame
>- Use `head()` to show the first few records in your DataFrame (5 by default; pass a number such as `head(10)` to see more)
5. Then you will likely want to slice up your data into smaller subset datasets
>- This step is the focus of this lesson

Narrated type-along videos are available:

- Part 1: https://youtu.be/uA96V-u8wkE
- Part 2: https://youtu.be/fsc0G77c5Kc

# First, check your working directory
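One way to check the working directory is with the standard `os` module. This is only a sketch; the list of csv files it prints depends on what is in your own folder.

```python
import os

# Folder that Python is currently running in
cwd = os.getcwd()
print(cwd)

# csv files in that folder, to confirm the data file was saved in the right place
csv_files = [f for f in os.listdir(cwd) if f.endswith(".csv")]
print(csv_files)
```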

# Step 1: Import pandas and give it an alias

# Step 2 Read Data Into a DataFrame
>- Knowing how to create your own data can be useful
>- However, most of the time we will read data into a DataFrame from a csv or Excel file

## File Needed: `winemag-data-130k-v2.csv`
>- Make sure you download this file from Canvas and place it in your working directory

### Read the csv file with `pd.read_csv('fileName.csv')`
>- Set the index to column 0
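The sketch below shows what `index_col=0` does. So that it runs without the download, it reads a few made-up rows from a text string (`csv_text` and `sample` are hypothetical names); the commented line at the end shows how the call would likely look for the real file.

```python
import io
import pandas as pd

# Made-up stand-in for the csv file: the first, unnamed column holds the row numbers
csv_text = """,country,points,price
0,Italy,87,
1,Portugal,87,15.0
2,US,87,14.0
"""

# index_col=0 tells pandas to use column 0 as the index instead of creating a new one
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample)

# With the real file, the call would look like this:
# wine_reviews = pd.read_csv('winemag-data-130k-v2.csv', index_col=0)
```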

### Check how many rows/records and columns are in the `wine_reviews` DataFrame
>- Use `shape`
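A minimal sketch of `shape`, using a tiny made-up DataFrame in place of the real `wine_reviews` data (the values are invented for illustration).

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "US"],
    "points": [87, 87, 87, 90],
    "price": [None, 15.0, 14.0, 13.0],
})

# shape is an attribute, not a method, so it has no parentheses
print(wine_reviews.shape)

# It is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = wine_reviews.shape
print(n_rows, "rows and", n_cols, "columns")
```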

### Check a couple of rows of data
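A quick sketch of `head()`, again on a small made-up stand-in for `wine_reviews`.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "US", "US", "Spain", "Italy"],
    "points": [87, 87, 87, 87, 87, 87, 87],
})

# head() with no argument returns the first 5 rows
first_five = wine_reviews.head()
print(first_five)

# Pass a number to change how many rows come back
first_two = wine_reviews.head(2)
print(first_two)
```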

### Now we can access columns in the DataFrame using syntax similar to how we access values in a dictionary
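As a sketch on a made-up sample, a column can be pulled out with square brackets (like a dictionary key) or as an attribute. Both return the same pandas Series.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
})

# Dictionary-style access with the column name as the key
by_key = wine_reviews["country"]

# Attribute-style access (only works when the column name is a valid Python name)
by_attribute = wine_reviews.country

print(by_key)
```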

### To get a single value...
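One way to do this, sketched on a made-up sample, is to select the column first and then the row label.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
})

# Select the column, then the row labeled 0 in the index
first_country = wine_reviews["country"][0]
print(first_country)
```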

### Using the indexing operator and attribute selection like we did above should seem familiar
>- We have accessed data like this using dictionaries
>- However, pandas also has its own selection/access operators, `loc` and `iloc`
>- For basic operations, we can use the familiar dictionary syntax
>- As we get more advanced, we should use `loc` and `iloc`
>- It might help to think of `loc` as "label-based location" and `iloc` as "integer-position-based location"

### Both `loc` and `iloc` start with the row then the column
#### Use `iloc` for position-based location, similar to how we index lists
#### Use `loc` for label-based location. This uses the index labels and column names, rather than positions, to retrieve the data we want.

# First, let's look at index based selection using `iloc`

## As we work these examples, remember we specify row first then column

### Selecting the first row using `iloc`
>- For the wine reviews dataset this is the first review record; the header row holds the column names and is not counted as data
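A minimal sketch on a made-up sample: `iloc[0]` returns the first record as a Series, with the column names as its labels.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 88, 90],
})

# Position 0 is the first record in the DataFrame
first_row = wine_reviews.iloc[0]
print(first_row)
```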

### To return all the rows of a particular column with `iloc`
>- To get everything, just put a `:` for row and/or column
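As a sketch on a made-up sample, a bare `:` in the row position means "all rows", and `0` in the column position means the first column.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 88, 90],
})

# All rows (:) of the first column (position 0)
first_column = wine_reviews.iloc[:, 0]
print(first_column)
```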

### To return the first three rows of the first column...
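A sketch on a made-up sample. As with list slicing, the stop position is not included, so `:3` gives positions 0, 1, and 2.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Spain"],
    "points": [87, 88, 90, 91],
})

# Rows at positions 0, 1, 2 of the first column
first_three = wine_reviews.iloc[:3, 0]
print(first_three)
```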

### To return the second and third rows...
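A sketch on a made-up sample, here limited to the first column. Positions start at 0, so the second and third rows are positions 1 and 2, written as the slice `1:3`.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Spain"],
    "points": [87, 88, 90, 91],
})

# Slice 1:3 covers positions 1 and 2 (the stop position 3 is excluded)
second_and_third = wine_reviews.iloc[1:3, 0]
print(second_and_third)
```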

### We can also pass a list for the rows to get specific values
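A sketch on a made-up sample: a list of positions picks out exactly those rows, and they do not need to be next to each other.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Spain"],
    "points": [87, 88, 90, 91],
})

# Rows at positions 0 and 3 of the first column
picked_rows = wine_reviews.iloc[[0, 3], 0]
print(picked_rows)
```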

### Can we pass lists for both rows and columns...?
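Yes. The sketch below, on a made-up sample, passes one list of row positions and one list of column positions; the result is a smaller DataFrame.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Spain"],
    "points": [87, 88, 90, 91],
    "price": [20.0, 15.0, 14.0, 28.0],
})

# Rows at positions 0 and 2, columns at positions 0 and 1
subset = wine_reviews.iloc[[0, 2], [0, 1]]
print(subset)
```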

### We can also go from the end of the rows just like we did with lists
>- The following gets the last 5 records for country in the dataset
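A sketch on a made-up sample, assuming `country` is the first column (position 0) as it is in this stand-in.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Spain", "France", "Germany", "Chile"],
    "points": [87, 88, 90, 91, 92, 89, 86],
})

# -5: means "from the fifth-from-last row to the end"; 0 is the country column
last_five_countries = wine_reviews.iloc[-5:, 0]
print(last_five_countries)
```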

### To get the last 5 records for all columns...
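A sketch on a made-up sample: leaving out the column part returns every column for the selected rows.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Spain", "France", "Germany", "Chile"],
    "points": [87, 88, 90, 91, 92, 89, 86],
})

# Last 5 rows, all columns
last_five = wine_reviews.iloc[-5:]
print(last_five)
```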

# Label-Based Selection with `loc`
## With `loc`, we use the names of the columns to retrieve data

### Get all the records for the following fields/columns using `loc`:
>- taster_name
>- taster_twitter_handle
>- points
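A sketch on a made-up sample (the taster names and handles are invented). With `loc`, the `:` still means "all rows", and the columns are given by name in a list.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "France"],
    "points": [87, 90, 92],
    "taster_name": ["Taster A", "Taster B", "Taster C"],
    "taster_twitter_handle": ["@taster_a", "@taster_b", "@taster_c"],
})

# All rows, three columns selected by name
tasters = wine_reviews.loc[:, ["taster_name", "taster_twitter_handle", "points"]]
print(tasters)
```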

# Notice we have been using the default index so far
## We can change the index with `set_index`
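A sketch of `set_index` on a made-up sample. The `title` column used here is an assumption for illustration; any column with useful labels could serve as the index.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "country": ["Italy", "US", "France"],
    "points": [87, 96, 92],
})

# set_index returns a new DataFrame; the original keeps its default index
by_title = wine_reviews.set_index("title")
print(by_title)

# Rows can now be looked up with loc using the title as the label
wine_b_points = by_title.loc["Wine B", "points"]
print(wine_b_points)
```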

# Conditional Selection
>- Suppose we only want to analyze data for one country, reviewer, etc... 
>- Or we want to pull the data only for points and/or prices above a certain threshold

## Which wines are from the US with 95 or greater points?
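A sketch on a made-up sample. Each condition goes in its own parentheses and the two are joined with `&` for "and"; the parentheses are needed because `&` binds more tightly than `==` and `>=`.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "US", "France", "US", "Italy"],
    "points": [87, 96, 95, 94, 90, 97],
})

# Keep only rows where both conditions are True
top_us = wine_reviews.loc[(wine_reviews.country == "US") & (wine_reviews.points >= 95)]
print(top_us)
```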

# Some notes on our previous example:
>- We just quickly took a dataset that has almost 130K rows and reduced it to one that has 993
>- This tells us that less than 1% of the wines are from the US and have ratings of 95 or higher
>- With some simple slicing using pandas we already have a decent start to an analytics project

# Q: What are all the wines from Italy or that have a rating higher than 95?
>- To return the results for an "or" question use the pipe `|` between your conditions  
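A sketch on a made-up sample: with `|`, a row is kept when at least one of the conditions is True.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "US", "France", "US", "Italy"],
    "points": [87, 96, 95, 94, 90, 97],
})

# Keep rows that are from Italy OR have more than 95 points
italy_or_high = wine_reviews.loc[(wine_reviews.country == "Italy") | (wine_reviews.points > 95)]
print(italy_or_high)
```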

# Q: What are all the wines from Italy or France? 
>- We can do this with an or statement or the `isin()` selector
>- Note: if you know SQL, this is the same thing as the IN () statement 
>- Using `isin()` replaces multiple "or" statements and makes your code a little shorter
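A sketch on a made-up sample showing both versions side by side; they select the same rows.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "US", "France", "US", "Italy"],
    "points": [87, 96, 95, 94, 90, 97],
})

# Version 1: two conditions joined with "or"
with_or = wine_reviews.loc[(wine_reviews.country == "Italy") | (wine_reviews.country == "France")]

# Version 2: isin() checks each country against a list of allowed values
with_isin = wine_reviews.loc[wine_reviews.country.isin(["Italy", "France"])]
print(with_isin)
```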

# Q: What are all the wines without prices?
>- Here we can use the `isnull()` method to show when values are not entered for a particular column
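A sketch on a made-up sample: `isnull()` returns True for each missing price, and that True/False Series is used to filter the rows.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data (None marks a missing price)
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "France", "Spain"],
    "price": [None, 65.0, 14.0, None],
})

# Keep only the rows where price is missing
no_price = wine_reviews.loc[wine_reviews.price.isnull()]
print(no_price)
```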

# Q: What are all the wines with prices?
>- Use `notnull()`
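A sketch on a made-up sample: `notnull()` is the opposite check, True wherever a price was entered.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data (None marks a missing price)
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "France", "Spain"],
    "price": [None, 65.0, 14.0, None],
})

# Keep only the rows where price has a value
has_price = wine_reviews.loc[wine_reviews.price.notnull()]
print(has_price)
```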

# We can also add columns/fields to our DataFrames
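A sketch on a made-up sample of two common ways to add a column: assigning one constant value to every row, or assigning a sequence with one value per row. The new column names `critic` and `index_backwards` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Made-up sample standing in for the real wine_reviews data
wine_reviews = pd.DataFrame({
    "country": ["Italy", "US", "France"],
    "points": [87, 96, 92],
})

# A single value is repeated for every row
wine_reviews["critic"] = "everyone"

# A sequence must have exactly one value per row
wine_reviews["index_backwards"] = range(len(wine_reviews), 0, -1)

print(wine_reviews)
```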