# Pandas Basics (Learn Enough to be Dangerous!)

In [0]:
# Conventional way to import pandas
import pandas as pd

# 1. What is Pandas?

Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools.

Python has long been great for data munging and preparation, but less so for data analysis and modeling. Pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain-specific language like R.



# 2. Best way to Install Pandas

If you use Anaconda Python:


```
conda install pandas
```

or, if you use pip:

```
pip install pandas
```


# 3. How do I read a tabular data file into Pandas?



In [0]:
# Read a dataset 
data = pd.read_csv('sample_data/california_housing_train.csv')

In [0]:
# Examine the first 5 rows

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.82 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.925 | 65500.0 |
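The rows above are what `head()` returns by default. A minimal sketch, assuming the same five rows are supplied as an inline CSV string (a stand-in for `sample_data/california_housing_train.csv`, which is only present in some notebook environments):

```python
import io
import pandas as pd

# Stand-in for the CSV file: the five rows shown above, held in memory
csv_text = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
-114.47,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.82,80100.0
-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.925,65500.0
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
data = pd.read_csv(io.StringIO(csv_text))

first_rows = data.head()    # first 5 rows by default
first_two = data.head(2)    # pass a number to change how many rows are returned
print(first_rows)
```

With a real file, only the argument to `pd.read_csv` changes; the `head()` calls stay the same.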


# 4. How can I see the documentation for a certain function?

In [0]:
? pd.read_csv
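The `?` syntax is an IPython/Jupyter convenience. As a sketch of an alternative that works in any Python interpreter, the built-in `help()` function and the `__doc__` attribute expose the same docstring:

```python
import pandas as pd

# help() prints the docstring of any function, method, or class
help(pd.DataFrame.head)

# The raw docstring is also available as a plain string
doc = pd.read_csv.__doc__
print(doc[:300])  # show only the beginning, since the full text is long
```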

# 5. How do I select a column from a DataFrame?

In [0]:
# Get column names


In [0]:
# Select Population column

In [0]:
# Or equivalently, use dot notation

Bracket notation will always work, whereas dot notation has limitations:

* Dot notation doesn't work if there are **spaces** in the Series name.
* Dot notation doesn't work if the Series has the same name as a **DataFrame method or attribute** (like 'head' or 'shape'); in that case it returns the method or attribute, not the column.
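The difference between the two notations can be sketched with a small DataFrame. The column names and values here are hypothetical, chosen only to trigger each limitation:

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({
    "population": [1015.0, 1129.0, 333.0],
    "median income": [1.4936, 1.82, 1.6509],  # name contains a space
    "shape": ["a", "b", "c"],                  # same name as DataFrame.shape
})

print(df.columns)               # get the column names

by_bracket = df["population"]   # bracket notation
by_dot = df.population          # dot notation returns the same Series

income = df["median income"]    # brackets required: `df.median income` is a syntax error
shape_col = df["shape"]         # the column named 'shape'
shape_attr = df.shape           # the attribute: a (rows, columns) tuple, not the column
```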


# 6. Why do some Pandas commands end with parentheses (and others don't)?

# Let's load a dataset of top-rated IMDb movies into a DataFrame
# Note that here we are loading the data from a CSV file hosted on a website
import pandas as pd
movies = pd.read_csv('http://bit.ly/imdbratings')


movies.shape

**Methods** end with parentheses because they are called like functions, while **attributes** don't.

In [0]:
# Combine data with print statements

There are 979 movies in the dataset and we have 6 columns
There are 979 movies in the dataset


# Example method: show the first 10 rows.  (Note that we use parentheses)
movies.head(10)

In [0]:
# Another example method: calculate summary statistics


In [0]:
# Now we will get an attribute (note that it doesn't need parentheses):
# the number of rows and columns of our DataFrame

In [0]:
# Another attribute: data type of each column
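The cells in this section can be sketched as follows. To avoid a network download, this assumes a three-row stand-in for the movies dataset, with values taken from the sample output shown in this notebook, so the counts differ from the real 979-row file:

```python
import pandas as pd

# Stand-in for the movies dataset (three rows only)
movies = pd.DataFrame({
    "star_rating": [9.3, 9.2, 9.1],
    "title": ["The Shawshank Redemption", "The Godfather", "The Godfather: Part II"],
    "duration": [142, 175, 200],
})

# Methods are called with parentheses and may take arguments
top_two = movies.head(2)
summary = movies.describe()      # summary statistics for the numeric columns

# Attributes are read without parentheses
n_rows, n_cols = movies.shape    # number of rows and columns
col_types = movies.dtypes        # data type of each column

# Combine data with a print statement
message = f"There are {n_rows} movies in the dataset and we have {n_cols} columns"
print(message)
```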


# 7. How do I rename columns in a Pandas DataFrame?

In [0]:
# read a dataset of UFO reports into a DataFrame
ufo = pd.read_csv('http://bit.ly/uforeports')

In [0]:
# Examine the data:


In [0]:
# Examine the column names:

In [0]:
# rename two of the columns by using the 'rename' method

? ufo.rename

In [0]:
# replace all of the column names by overwriting the 'columns' attribute
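Both renaming approaches can be sketched on a one-row stand-in for the UFO data. The column names and values below are assumptions for illustration, not read from the real file:

```python
import pandas as pd

# Hypothetical stand-in for the UFO reports dataset
ufo = pd.DataFrame({
    "City": ["Springfield"],
    "Colors Reported": [None],
    "Shape Reported": ["TRIANGLE"],
    "State": ["NY"],
})

# rename() takes a mapping of old name -> new name and returns a new DataFrame
renamed = ufo.rename(columns={"Colors Reported": "Colors_Reported",
                              "Shape Reported": "Shape_Reported"})
print(renamed.columns.tolist())

# Overwriting the 'columns' attribute replaces every name at once;
# the list must have exactly one entry per column
ufo.columns = ["city", "colors_reported", "shape_reported", "state"]
print(ufo.columns.tolist())
```

`rename` is convenient when only a few names change; assigning to `columns` is shorter when all of them do.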


# 8. How do I remove columns from a Pandas DataFrame?

In [0]:
# read a dataset of UFO reports into a DataFrame
ufo = pd.read_csv('http://bit.ly/uforeports')


In [0]:

# remove a single column (axis=1 refers to columns)


In [0]:
# remove multiple columns at once


In [0]:
# remove multiple rows at once (axis=0 refers to rows)
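The three removals above might look like this with the `drop` method. The DataFrame is a hypothetical stand-in with placeholder values:

```python
import pandas as pd

# Hypothetical stand-in for the UFO reports dataset
ufo = pd.DataFrame({
    "City": ["A", "B", "C"],
    "Colors Reported": ["RED", None, "GREEN"],
    "State": ["NY", "NJ", "CO"],
    "Time": ["t0", "t1", "t2"],
})

# Remove a single column (axis=1 refers to columns)
no_colors = ufo.drop("Colors Reported", axis=1)

# Remove multiple columns at once by passing a list of names
city_only = ufo.drop(["Colors Reported", "State", "Time"], axis=1)

# Remove multiple rows at once by index label (axis=0 refers to rows)
fewer_rows = ufo.drop([0, 1], axis=0)
```

`drop` returns a new DataFrame and leaves `ufo` unchanged, so assign the result back if the removal should persist.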


# 9. How do I sort a DataFrame?

In [0]:
# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('http://bit.ly/imdbratings')


In [0]:
# sort the 'title' Series in ascending order (returns a Series)

In [0]:
# sort in descending order instead
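A minimal sketch of sorting with `sort_values`, assuming a three-row stand-in whose titles and durations come from the sample output shown in this notebook:

```python
import pandas as pd

# Stand-in for the movies dataset (three rows only)
movies = pd.DataFrame({
    "title": ["The Godfather", "Pulp Fiction", "Inception"],
    "duration": [175, 154, 148],
})

# Sort the 'title' Series in ascending order (returns a Series)
titles_asc = movies["title"].sort_values()

# Sort in descending order instead
titles_desc = movies["title"].sort_values(ascending=False)

# Sort the whole DataFrame by one column (returns a DataFrame)
by_duration = movies.sort_values("duration", ascending=False)
```

None of these calls modify `movies`; each returns a sorted copy.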



# 10. How do I filter rows of a Pandas DataFrame by column value?

In [0]:
# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('http://bit.ly/imdbratings')


In [0]:
# examine the number of rows and columns

***Goal***: Filter the DataFrame rows to only show movies with a 'duration' of at least 200 minutes.

In [0]:
# create a list in which each element refers to a DataFrame row: True if the row satisfies the condition, False otherwise


In [0]:
# confirm that the list has the same length as the DataFrame


In [0]:
# examine the first five list elements


In [0]:
# convert the list to a Series

In [0]:
# use bracket notation with the boolean Series to tell the DataFrame which rows to display
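The long-form steps above can be sketched as follows, assuming a three-row stand-in for the movies dataset (values taken from the sample output shown in this notebook):

```python
import pandas as pd

# Stand-in for the movies dataset (three rows only)
movies = pd.DataFrame({
    "title": ["The Godfather", "The Godfather: Part II", "The Dark Knight"],
    "genre": ["Crime", "Crime", "Action"],
    "duration": [175, 200, 152],
})

# Build a list with one boolean per row: True if the duration is at least 200
booleans = []
for length in movies["duration"]:
    booleans.append(length >= 200)

# Confirm that the list has the same length as the DataFrame
print(len(booleans) == len(movies))

# Examine the first five list elements
print(booleans[:5])

# Convert the list to a Series, then use it to select rows
is_long = pd.Series(booleans)
long_movies = movies[is_long]
print(long_movies)
```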


In [0]:
# simplify the steps above: no need to write a for loop to create 'is_long' since pandas will broadcast the comparison

# or equivalently, write it in one line (no need to create the 'is_long' object)


In [0]:
# select the 'genre' Series from the filtered DataFrame


# or equivalently, use the 'loc' indexer
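A sketch of the shorter forms, again assuming a three-row stand-in for the movies dataset. Comparing a Series with a number applies the comparison to every element and returns a boolean Series directly:

```python
import pandas as pd

# Stand-in for the movies dataset (three rows only)
movies = pd.DataFrame({
    "title": ["The Godfather", "The Godfather: Part II", "The Dark Knight"],
    "genre": ["Crime", "Crime", "Action"],
    "duration": [175, 200, 152],
})

# No loop needed: the comparison is applied element by element
is_long = movies["duration"] >= 200
long_movies = movies[is_long]

# Or in one line, without creating 'is_long'
long_movies_inline = movies[movies["duration"] >= 200]

# Select the 'genre' Series from the filtered DataFrame
genres = movies[movies["duration"] >= 200]["genre"]

# Equivalent selection with loc: row condition first, then column label
genres_loc = movies.loc[movies["duration"] >= 200, "genre"]
```

The `loc` form selects rows and a column in a single step, which avoids chaining two bracket lookups.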


# 11. How do I apply multiple filter criteria to a Pandas DataFrame?

In [0]:
# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('http://bit.ly/imdbratings')


In [0]:
# filter the DataFrame to only show movies with a 'duration' of at least 200 minutes


Understanding **logical operators**:

* **and**: True only if **both sides** of the operator are True
* **or**: True if **either side** of the operator is True

In [0]:
# demonstration of the 'and' operator


In [0]:
# demonstration of the 'or' operator
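Both operators can be demonstrated in plain Python before moving to pandas. The single `duration` and `genre` values below are an illustrative example:

```python
# 'and' is True only when both sides are True
print(True and True)     # True
print(True and False)    # False

# 'or' is True when at least one side is True
print(True or False)     # True
print(False or False)    # False

# The same logic applied to one movie's values
duration, genre = 200, "Crime"
both = duration >= 200 and genre == "Drama"     # False: only the first side is True
either = duration >= 200 or genre == "Drama"    # True: one True side is enough
print(both, either)
```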


Rules for specifying **multiple filter criteria** in pandas:

* use **&** instead of **and**
* use **|** instead of **or**
* add **parentheses** around each condition, because **&** and **|** bind more tightly than comparison operators such as **>=** and **==**



**Goal:** Further filter the DataFrame of long movies (duration >= 200) to only show movies which also have a 'genre' of 'Drama'

In [0]:
# Use the '&' operator to specify that both conditions are required
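A minimal sketch of the combined filter. The first two rows come from the sample output shown in this notebook; the third is a hypothetical row added so that the filter has something to match:

```python
import pandas as pd

# Stand-in for the movies dataset; the last row is hypothetical
movies = pd.DataFrame({
    "title": ["The Godfather: Part II", "12 Angry Men", "Hypothetical Long Drama"],
    "genre": ["Crime", "Drama", "Drama"],
    "duration": [200, 96, 210],
})

# Each condition is wrapped in parentheses; '&' requires both to be True
long_dramas = movies[(movies["duration"] >= 200) & (movies["genre"] == "Drama")]
print(long_dramas)

# For comparison, '|' keeps rows where either condition is True
long_or_drama = movies[(movies["duration"] >= 200) | (movies["genre"] == "Drama")]
```

Writing `and` or `or` between two Series raises a `ValueError`, because Python cannot reduce a whole Series to a single True or False.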

**Goal:** Filter the original DataFrame to show movies with a 'genre' of 'Crime' or 'Drama' or 'Action'



In [0]:
# use the '|' operator to specify that a row can match any of the three criteria


# or equivalently, use the 'isin' method


| | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 0 | 9.3 | The Shawshank Redemption | R | Crime | 142 | [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... |
| 1 | 9.2 | The Godfather | R | Crime | 175 | [u'Marlon Brando', u'Al Pacino', u'James Caan'] |
| 2 | 9.1 | The Godfather: Part II | R | Crime | 200 | [u'Al Pacino', u'Robert De Niro', u'Robert Duv... |
| 3 | 9.0 | The Dark Knight | PG-13 | Action | 152 | [u'Christian Bale', u'Heath Ledger', u'Aaron E... |
| 4 | 8.9 | Pulp Fiction | R | Crime | 154 | [u'John Travolta', u'Uma Thurman', u'Samuel L.... |
| 5 | 8.9 | 12 Angry Men | NOT RATED | Drama | 96 | [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... |
| 9 | 8.9 | Fight Club | R | Drama | 139 | [u'Brad Pitt', u'Edward Norton', u'Helena Bonh... |
| 11 | 8.8 | Inception | PG-13 | Action | 148 | [u'Leonardo DiCaprio', u'Joseph Gordon-Levitt'... |
| 12 | 8.8 | Star Wars: Episode V - The Empire Strikes Back | PG | Action | 124 | [u'Mark Hamill', u'Harrison Ford', u'Carrie Fi... |
| 13 | 8.8 | Forrest Gump | PG-13 | Drama | 142 | [u'Tom Hanks', u'Robin Wright', u'Gary Sinise'] |
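The two equivalent genre filters can be sketched on a small stand-in. The first three rows come from the output above; the 'Comedy' row is hypothetical and is included only to show a row being excluded:

```python
import pandas as pd

# Stand-in for the movies dataset; the last row is hypothetical
movies = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Dark Knight",
              "12 Angry Men", "Hypothetical Comedy"],
    "genre": ["Crime", "Action", "Drama", "Comedy"],
})

# Chain '|' so that a row can match any of the three criteria
with_or = movies[(movies["genre"] == "Crime")
                 | (movies["genre"] == "Drama")
                 | (movies["genre"] == "Action")]

# isin() expresses the same membership test with a list of allowed values
with_isin = movies[movies["genre"].isin(["Crime", "Drama", "Action"])]
print(with_isin)
```

`isin` tends to stay readable as the list of allowed values grows, whereas the chained form needs one more parenthesized condition per value.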
