# Creating Pandas DataFrames & Selecting Data


## Pandas DataFrames

Pandas has two core data structures:
+ A table with multiple columns is a **DataFrame.**
+ A single column of a DataFrame, or any one-dimensional, list-like sequence of values with an index, is a **Series.**
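Here is a minimal sketch of how the two structures relate, using a small DataFrame whose values are made up for illustration (they are not taken from the Watsi data). Selecting one column returns a Series, and a Series can also be built directly from a list.

```python
import pandas as pd

# A tiny, hypothetical DataFrame with two columns
df = pd.DataFrame({
    "user_id": ["USER A", "USER B", "USER C"],
    "platform": ["Desktop", "Mobile", "Desktop"],
})

# Selecting a single column gives back a Series
platforms = df["platform"]
print(type(df).__name__)         # DataFrame
print(type(platforms).__name__)  # Series

# A Series can also be created directly from a list
counts = pd.Series([3, 1, 4])
print(counts.tolist())           # [3, 1, 4]
```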

DataFrames can be built from many **different data structures and files**, including lists and dictionaries, CSV files, Excel files, and database query results (more on that [here](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe)).
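As a minimal sketch, the same small table can be built from three of these sources: a dictionary of lists, a list of lists with explicit column names, and CSV text. The CSV text is wrapped in `io.StringIO` so the example runs without a file on disk; the values are made up for illustration.

```python
import io
import pandas as pd

# 1. From a dictionary: keys become column names, lists become column values
from_dict = pd.DataFrame({
    "url": ["https://watsi.org/", "https://watsi.org/gift-cards"],
    "platform": ["Desktop", "Mobile"],
})

# 2. From a list of lists: each inner list is one row, so column names are passed separately
from_list = pd.DataFrame(
    [["https://watsi.org/", "Desktop"],
     ["https://watsi.org/gift-cards", "Mobile"]],
    columns=["url", "platform"],
)

# 3. From CSV text: read_csv accepts any file-like object, not just a file path
csv_text = "url,platform\nhttps://watsi.org/,Desktop\nhttps://watsi.org/gift-cards,Mobile\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# All three approaches produce the same DataFrame
print(from_dict.equals(from_list) and from_dict.equals(from_csv))  # True
```

Excel files and database queries follow the same pattern through `pd.read_excel` and `pd.read_sql`, which need an Excel engine or a database connection respectively.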

### Loading Data into a Python Notebook

> In this and the next few lessons, we will be analyzing web traffic data from Watsi, a nonprofit that lets donors fund medical treatments for people around the world.

```python
import pandas as pd

# Load the Watsi pageview data; change this path to wherever your copy of the CSV is saved
data = pd.read_csv('/users/bm/downloads/python-for-data-analysis/clone_of_python_tutorial.csv')
```

```python
# Display the first 5 rows of the DataFrame
data.head()
```

| | referrer | timestamp | title | url | user_agent | user_id | referrer_domain | website_section | platform |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.google.com/ | 2016-02-05 00:48:23 | Watsi \| Fund medical treatments for people aro... | https://watsi.org/ | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4... | CHAROLETTE S | google | | Desktop |
| 1 | https://themeteorchef.com/snippets/making-use-... | 2016-02-24 23:12:10 | Watsi \| The Meteor Chef | https://watsi.org/team/the-meteor-chef | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | WARREN Q | themeteorchef.com | team | Desktop |
| 2 | https://watsi.org/ | 2015-12-25 17:59:35 | Watsi \| Give the gift of health with a Watsi G... | https://watsi.org/gift-cards | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1... | MITCHEL O | watsi.org | gift-cards | Desktop |
| 3 | | 2016-02-05 21:19:30 | Watsi \| Fund medical treatments for people aro... | https://watsi.org/ | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2... | MICHEL O | | | Desktop |
| 4 | https://watsi.org/fund-treatments | 2016-02-14 19:30:08 | Watsi \| Fund medical treatments for people aro... | https://watsi.org/ | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2... | ANDREE N | watsi.org | | Desktop |
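`head()` shows the first five rows by default, but it accepts an optional number of rows, and `tail()` does the same from the end of the DataFrame. Here is a quick sketch on a small stand-in DataFrame (hypothetical data, since the CSV path above points to one particular machine):

```python
import pandas as pd

# Hypothetical stand-in for the pageview data: 10 rows, one column
df = pd.DataFrame({"pageview": range(10)})

print(len(df.head()))   # 5: head() returns 5 rows by default
print(df.head(2))       # the first 2 rows
print(df.tail(3))       # the last 3 rows
print(df.shape)         # (10, 1): (number of rows, number of columns)
```

On the Watsi data, `data.shape` is a quick way to confirm how many pageviews were loaded; it should report 5000 rows, matching the `Length: 5000` reported for the `url` column later in this lesson.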


#### Prepping a DataFrame

```python
# Clean up the text data: fillna('') replaces every missing value (NaN) with an empty string
data = data.fillna('')
```
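To see what `fillna('')` changes, here is a minimal sketch on a hypothetical two-row DataFrame with missing values (pandas treats `None` as missing). `isna().sum()` counts the missing values in each column before and after the cleanup.

```python
import pandas as pd

# Hypothetical pageviews where some fields are missing
df = pd.DataFrame({
    "referrer": ["https://www.google.com/", None],
    "website_section": [None, "team"],
})

print(df.isna().sum().tolist())       # [1, 1]: one missing value per column

# Replace every missing value with an empty string
cleaned = df.fillna("")
print(cleaned.isna().sum().tolist())  # [0, 0]
print(cleaned["referrer"].tolist())   # ['https://www.google.com/', '']
```

`fillna` returns a new DataFrame and leaves the original untouched, which is why the lesson assigns the result back to `data`.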

#### About this dataset

In this lesson you'll be working with web traffic data from a nonprofit called **Watsi**. Every row in this dataset corresponds to a person visiting a watsi.org page (this is known as a **pageview**). The general flow of pageviews is referred to as **web traffic**.

Every pageview (row in the dataset) is composed of:

+ `referrer` The URL that referred the user to the site (if available). For example, if someone arrived at the page through a Facebook link, `referrer` would be https://www.facebook.com

+ `timestamp` The date and time the pageview occurred

+ `title` The title of the page the user visited on the Watsi website

+ `url` The URL the user visited. For example, https://watsi.org/team/the-meteor-chef

+ `user_agent` The software the user used to access the site, including platform, browser, and extensions

+ `user_id` A unique ID for each user (normally these would be numbers; we've replaced them with anonymized names instead)

+ `referrer_domain` The domain of the URL that referred the user to the site. For example, "facebook.com"

+ `website_section` The section of the website visited. For example, the section of https://watsi.org/team/the-meteor-chef is "team"

+ `platform` The device platform the user visited from. Possible values are "Desktop" and "Mobile"
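The `website_section` example suggests that the section is the first segment of the URL path. The dataset already provides this column, and how it was originally computed is not documented here, so the sketch below only illustrates that relationship using Python's standard `urllib.parse`. The helper name `section_from_url` is hypothetical.

```python
from urllib.parse import urlparse

def section_from_url(url: str) -> str:
    """Hypothetical helper: return the first path segment of a URL, or '' for the home page."""
    path = urlparse(url).path            # e.g. '/team/the-meteor-chef'
    return path.strip("/").split("/")[0]

print(section_from_url("https://watsi.org/team/the-meteor-chef"))  # team
print(section_from_url("https://watsi.org/gift-cards"))            # gift-cards
print(repr(section_from_url("https://watsi.org/")))                # ''
```

Under this assumption, the home page `https://watsi.org/` has an empty section, which is consistent with the blank `website_section` values in the first rows of the data.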

Just as you select a value from a list or dictionary with brackets, you can use brackets and a column name to select a column from a DataFrame.

```python
data['url']
```

```
0                                 https://watsi.org/
1             https://watsi.org/team/the-meteor-chef
2                       https://watsi.org/gift-cards
3                                 https://watsi.org/
4                                 https://watsi.org/
                            ...                     
4995               https://watsi.org/fund-treatments
4996                              https://watsi.org/
4997                              https://watsi.org/
4998    https://watsi.org/profile/6705ce017f7e-sarah
4999        https://watsi.org/fund-treatments?page=4
Name: url, Length: 5000, dtype: object
```

The `url` column you got back is a Series. The numbers on its left are the **index**, which labels each row of the DataFrame; by default, pandas numbers the rows 0, 1, 2, and so on. You will use the index to select individual rows, much as you select items from a list by position.
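Here is a minimal sketch of the index on a small hypothetical DataFrame. The DataFrame and any column selected from it share the same index, so a single value can be looked up in the column by its index label. On the default integer index, a row's label and its position are the same number.

```python
import pandas as pd

df = pd.DataFrame({"url": ["https://watsi.org/",
                           "https://watsi.org/team/the-meteor-chef",
                           "https://watsi.org/gift-cards"]})
urls = df["url"]

# By default pandas numbers the rows 0, 1, 2, ...
print(list(df.index))               # [0, 1, 2]

# The column Series carries the same index as the DataFrame
print(urls.index.equals(df.index))  # True

# Look up one value in the Series by its index label
print(urls[1])                      # https://watsi.org/team/the-meteor-chef
```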

### Selecting rows in a DataFrame

```python
# Select the first three rows of the DataFrame
data[:3]
```

| | referrer | timestamp | title | url | user_agent | user_id | referrer_domain | website_section | platform |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.google.com/ | 2016-02-05 00:48:23 | Watsi \| Fund medical treatments for people aro... | https://watsi.org/ | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4... | CHAROLETTE S | google | | Desktop |
| 1 | https://themeteorchef.com/snippets/making-use-... | 2016-02-24 23:12:10 | Watsi \| The Meteor Chef | https://watsi.org/team/the-meteor-chef | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | WARREN Q | themeteorchef.com | team | Desktop |
| 2 | https://watsi.org/ | 2015-12-25 17:59:35 | Watsi \| Give the gift of health with a Watsi G... | https://watsi.org/gift-cards | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1... | MITCHEL O | watsi.org | gift-cards | Desktop |


<i>To select rows at positions 4 through 6 (as with list slices, the end position 7 is excluded):</i>
```python
data[4:7]
```
<i>To select rows "from index 4997 onward":</i>
```python
data[4997:]
```
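Row slices follow the same rules as Python list slices: the start is included, the end is excluded, and negative numbers count from the end. Here is a quick check on a hypothetical 10-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"row": range(10)})

print(df[4:7]["row"].tolist())   # [4, 5, 6]: position 7 is excluded
print(df[7:]["row"].tolist())    # [7, 8, 9]: from position 7 to the end
print(df[-3:]["row"].tolist())   # [7, 8, 9]: the last three rows

# The same slice on a plain Python list selects the same positions
print(list(range(10))[4:7])      # [4, 5, 6]
```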

#### Selecting a specific row

```python
# Select a single row by its integer position (position 1 is the second row)
data.iloc[1]
```

```
referrer           https://themeteorchef.com/snippets/making-use-...
timestamp                                        2016-02-24 23:12:10
title                                        Watsi | The Meteor Chef
url                           https://watsi.org/team/the-meteor-chef
user_agent         Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
user_id                                                     WARREN Q
referrer_domain                                    themeteorchef.com
website_section                                                 team
platform                                                     Desktop
Name: 1, dtype: object
```
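`iloc` selects rows by integer *position*, while the related `loc` selects them by index *label*. On the default index the two agree, so the difference only shows up once labels and positions diverge, for example after sorting. Here is a sketch on a hypothetical three-row DataFrame that reuses three user IDs from the output above:

```python
import pandas as pd

df = pd.DataFrame({"user_id": ["CHAROLETTE S", "WARREN Q", "MITCHEL O"]})

# Sorting reorders the rows but keeps each row's original index label
by_name = df.sort_values("user_id")
print(list(by_name.index))          # [0, 2, 1]

print(by_name.iloc[1]["user_id"])   # MITCHEL O  (second row by position)
print(by_name.loc[1]["user_id"])    # WARREN Q   (row whose label is 1)
```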

### Selecting rows and columns in a DataFrame

```python
# Select the first three rows of the 'title' column
data['title'][:3]
```

```
0    Watsi | Fund medical treatments for people aro...
1                              Watsi | The Meteor Chef
2    Watsi | Give the gift of health with a Watsi G...
Name: title, dtype: object
```
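`loc` and `iloc` can also select rows and columns in a single step, without chaining two separate bracket operations. Here is a minimal sketch on a hypothetical three-row DataFrame (the titles are placeholders). Note that `loc` slices by label and *includes* the end label, while `iloc` slices by position and excludes the end, as list slices do.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Home page", "Team page", "Gift cards page"],
    "platform": ["Desktop", "Desktop", "Mobile"],
    "user_id": ["CHAROLETTE S", "WARREN Q", "MITCHEL O"],
})

# Rows labeled 0 through 1 (end label included) of the 'title' column
print(df.loc[0:1, "title"].tolist())   # ['Home page', 'Team page']

# Rows at positions 0 and 1 (end position excluded) of the first column
print(df.iloc[0:2, 0].tolist())        # ['Home page', 'Team page']

# Several rows and several columns at once: pass a list for each
subset = df.loc[[0, 2], ["title", "platform"]]
print(subset.shape)                    # (2, 2)
```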