# Hands-On With Python

#### In this walk-through, we cover four core techniques to get you comfortable manipulating data with Python.

### Overview:
1. Reading data into a pandas data frame (and previewing it)
2. Selecting/manipulating data within a data frame
3. Aggregating data
4. Plotting data (histogram & box plot)

### 1. Reading data into a data frame
First, we import the pandas package under its conventional alias, `pd`.

```python
import pandas as pd
```

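Next, we read the rentals.csv file into a data frame with `pd.read_csv` and preview its first five rows with `rentals_df.head()`. The `encoding='latin-1'` argument tells pandas how to decode the file's bytes, presumably because the file contains characters that are not valid UTF-8 (the default encoding).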

```python
rentals_df = pd.read_csv('rentals.csv', encoding='latin-1')

rentals_df.head()
```

| | Trip id | Starttime | Stoptime | Bikeid | Tripduration | From station id | From station name | To station id | To station name | Usertype |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27193394 | 4/1/2017 0:59 | 4/1/2017 1:43 | 70024 | 2659 | 1012 | North Shore Trail & Fort Duquesne Bridge | 1045 | S 27th St & Sidney St. (Southside Works) | Customer |
| 1 | 27193815 | 4/1/2017 1:43 | 4/1/2017 1:48 | 70353 | 284 | 1037 | Frew St & Schenley Dr | 1038 | Boulevard of the Allies & Parkview Ave | Subscriber |
| 2 | 27194749 | 4/1/2017 2:35 | 4/1/2017 3:01 | 70067 | 1576 | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | Customer |
| 3 | 27201194 | 4/1/2017 5:05 | 4/1/2017 7:46 | 70191 | 9695 | 1005 | Forbes Ave & Grant St | 1043 | Coltart Ave & Forbes Ave | Customer |
| 4 | 27203670 | 4/1/2017 5:35 | 4/1/2017 5:54 | 70353 | 1164 | 1038 | Boulevard of the Allies & Parkview Ave | 1047 | S 22nd St & E Carson St | Subscriber |
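This minimal sketch shows what the `encoding` argument changes. It builds a tiny hypothetical CSV in memory instead of reading rentals.csv. The station name `Café Corner` is made up. Its `é` becomes the single byte `0xE9` under Latin-1, and that byte is not valid UTF-8 at this position. As a result, pandas' default UTF-8 decoding fails, while `encoding='latin-1'` reads the text back correctly.

```python
import io

import pandas as pd

# Hypothetical CSV content, encoded as Latin-1 bytes ("é" -> single byte 0xE9)
raw = "Trip id,From station name\n1,Café Corner\n2,Main St\n".encode("latin-1")

# pandas decodes CSV data as UTF-8 by default, so these bytes are rejected
try:
    pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError as err:
    print("UTF-8 decoding failed:", err)

# Declaring the actual encoding lets pandas decode the bytes correctly
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(df)
```

If a file's encoding is unknown, decoding errors like the first one are a common sign that it is not UTF-8.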


The `.shape` attribute of a data frame returns a tuple of (number of rows, number of columns). Here, each row is one rental record.

```python
rentals_df.shape
```

```
(24423, 10)
```

```python
print("There are {} records and {} columns in the rentals.csv file.".format(rentals_df.shape[0], rentals_df.shape[1]))
```

```
There are 24423 records and 10 columns in the rentals.csv file.
```
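Because `.shape` is an ordinary tuple, it can also be unpacked into two names, which some readers find clearer than indexing with `[0]` and `[1]`. This is a minimal sketch on a small hypothetical frame standing in for `rentals_df`. The names `n_rows` and `n_cols` are illustrative, and `len(df)` appears as an equivalent way to get the row count.

```python
import pandas as pd

# Small hypothetical frame standing in for rentals_df
df = pd.DataFrame({"Trip id": [27193394, 27193815, 27194749], "Bikeid": [70024, 70353, 70067]})

# .shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df.shape

# len() on a data frame also returns the number of rows
assert n_rows == len(df)

print(f"There are {n_rows} records and {n_cols} columns.")
```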


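Below, we check which data type pandas inferred for each column using `rentals_df.dtypes`. Note that `Starttime` and `Stoptime` are `object` columns (plain text), not dates. We convert `Starttime` to a proper date time in the next section.

```python
rentals_df.dtypes
```

```
Trip id               int64
Starttime            object
Stoptime             object
Bikeid                int64
Tripduration          int64
From station id       int64
From station name    object
To station id         int64
To station name      object
Usertype             object
dtype: object
```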
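The inferred types determine which operations are possible on each column. This minimal sketch uses a few hypothetical rows with the same kinds of values as rentals.csv. It confirms that a date-like text column is not a datetime type, and it uses `select_dtypes` to keep only the numeric columns.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical rows with the same kinds of values as rentals.csv
df = pd.DataFrame({
    "Trip id": [27193394, 27193815],
    "Starttime": ["4/1/2017 0:59", "4/1/2017 1:43"],
    "Tripduration": [2659, 284],
})

print(df.dtypes)

# The date-like column is still plain text, not a datetime type
print(is_datetime64_any_dtype(df["Starttime"]))  # False

# select_dtypes keeps only the columns of the requested kind
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['Trip id', 'Tripduration']
```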

### 2. Selecting and manipulating data
Here, we will see how to select columns, create new ones from calculations, and drop the ones we no longer need.

The data set shows us the trip duration (in seconds), but we would like to see the duration in minutes.

To convert the trip duration from seconds to minutes, we create a new column, `Tripduration_mins`, by dividing the values in `Tripduration` by 60. Then we use the `.tail()` method to check the results in the last five rows.

Note that we do **not** loop through the rows ourselves. Pandas applies the division by 60 to every element of the column at once (a vectorized operation).

```python
rentals_df['Tripduration_mins'] = rentals_df['Tripduration'] / 60

rentals_df.tail()
```

| | Trip id | Starttime | Stoptime | Bikeid | Tripduration | From station id | From station name | To station id | To station name | Usertype | Tripduration_mins |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 24418 | 33295197 | 6/30/2017 23:38 | 6/30/2017 23:58 | 70240 | 1212 | 1023 | Liberty Ave & Baum Blvd | 1028 | Penn Ave & Putnam St (Bakery Square) | Customer | 20.2 |
| 24419 | 33295206 | 6/30/2017 23:38 | 6/30/2017 23:56 | 70178 | 1030 | 1023 | Liberty Ave & Baum Blvd | 1028 | Penn Ave & Putnam St (Bakery Square) | Customer | 17.166667 |
| 24420 | 33295257 | 6/30/2017 23:45 | 7/1/2017 0:07 | 70490 | 1351 | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | 1019 | 42nd St & Butler St | Customer | 22.516667 |
| 24421 | 33295318 | 6/30/2017 23:55 | 7/1/2017 0:02 | 70019 | 424 | 1049 | S 12th St & E Carson St | 1045 | S 27th St & Sidney St. (Southside Works) | Customer | 7.066667 |
| 24422 | 33295336 | 6/30/2017 23:58 | 7/1/2017 0:15 | 70400 | 1008 | 1021 | Taylor St & Liberty Ave | 1024 | S Negley Ave & Baum Blvd | Customer | 16.8 |
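This minimal sketch shows that the vectorized division matches an explicit Python loop. It uses three durations taken from the first three rows of the `head()` preview. The loop appears only for comparison. On large columns the vectorized form avoids the per-element Python overhead.

```python
import pandas as pd

# Durations in seconds taken from the first three rows of the head() preview
durations = pd.Series([2659, 284, 1576])

# Vectorized: a single expression divides every element by 60
vectorized = durations / 60

# Explicit Python loop, shown only for comparison
looped = pd.Series([seconds / 60 for seconds in durations])

# Both approaches produce identical values
print(vectorized.equals(looped))  # True
print(vectorized.tolist())
```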


Next, we want a column that holds only the date part of `Starttime`, so that we can count how many rides start on a given day.

To do this, we first parse the `Starttime` strings with `pd.to_datetime`, passing the column and a `format` string that describes its layout (`'%m/%d/%Y %H:%M'`). The result is a date time column, which we can then render as text in any layout with `.dt.strftime`. Here we keep only the month, day, and year.

```python
# Parse the date time object from a string
rentals_df['Starttime_dt'] = pd.to_datetime(rentals_df['Starttime'], format='%m/%d/%Y %H:%M')

# Format the date time object as a string
rentals_df['Startdate'] = rentals_df['Starttime_dt'].dt.strftime('%m/%d/%Y')

rentals_df.head()
```

| | Trip id | Starttime | Stoptime | Bikeid | Tripduration | From station id | From station name | To station id | To station name | Usertype | Tripduration_mins | Starttime_dt | Startdate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27193394 | 4/1/2017 0:59 | 4/1/2017 1:43 | 70024 | 2659 | 1012 | North Shore Trail & Fort Duquesne Bridge | 1045 | S 27th St & Sidney St. (Southside Works) | Customer | 44.316667 | 2017-04-01 00:59:00 | 04/01/2017 |
| 1 | 27193815 | 4/1/2017 1:43 | 4/1/2017 1:48 | 70353 | 284 | 1037 | Frew St & Schenley Dr | 1038 | Boulevard of the Allies & Parkview Ave | Subscriber | 4.733333 | 2017-04-01 01:43:00 | 04/01/2017 |
| 2 | 27194749 | 4/1/2017 2:35 | 4/1/2017 3:01 | 70067 | 1576 | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | Customer | 26.266667 | 2017-04-01 02:35:00 | 04/01/2017 |
| 3 | 27201194 | 4/1/2017 5:05 | 4/1/2017 7:46 | 70191 | 9695 | 1005 | Forbes Ave & Grant St | 1043 | Coltart Ave & Forbes Ave | Customer | 161.583333 | 2017-04-01 05:05:00 | 04/01/2017 |
| 4 | 27203670 | 4/1/2017 5:35 | 4/1/2017 5:54 | 70353 | 1164 | 1038 | Boulevard of the Allies & Parkview Ave | 1047 | S 22nd St & E Carson St | Subscriber | 19.4 | 2017-04-01 05:35:00 | 04/01/2017 |
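This minimal sketch applies the same parse-then-format steps to three hypothetical `Starttime` strings (the third is made up). It then counts rides per day, which is the reason for creating the new column. It also shows `.dt.normalize()`, an alternative not used in this walk-through, which sets the time to midnight but keeps a real datetime type.

```python
import pandas as pd

# Hypothetical Starttime strings in the same layout as rentals.csv
starts = pd.Series(["4/1/2017 0:59", "4/1/2017 1:43", "4/2/2017 9:15"])

# Parse: the format string describes how each text value is laid out
parsed = pd.to_datetime(starts, format="%m/%d/%Y %H:%M")

# Format: turn the datetimes back into text holding only the date
as_text = parsed.dt.strftime("%m/%d/%Y")

# Alternative: keep a datetime type but set every time to midnight
as_midnight = parsed.dt.normalize()

# Count how many rides start on each day
rides_per_day = as_text.value_counts().sort_index()
print(rides_per_day)
```

One reason to consider `normalize()` on longer data sets: text dates in `MM/DD/YYYY` form sort alphabetically, which matches calendar order only within a single year, whereas datetime values always sort chronologically.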


Our data frame is getting a bit crowded, and we don't need all of these columns for the rest of this analysis.

Let's drop the columns we used in calculations and no longer need.

```python
# axis=1 drops columns rather than rows; inplace=True modifies rentals_df directly
rentals_df.drop(['Starttime', 'Stoptime', 'Starttime_dt', 'Tripduration'], axis=1, inplace=True)

rentals_df.head()
```

| | Trip id | Bikeid | From station id | From station name | To station id | To station name | Usertype | Tripduration_mins | Startdate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 27193394 | 70024 | 1012 | North Shore Trail & Fort Duquesne Bridge | 1045 | S 27th St & Sidney St. (Southside Works) | Customer | 44.316667 | 04/01/2017 |
| 1 | 27193815 | 70353 | 1037 | Frew St & Schenley Dr | 1038 | Boulevard of the Allies & Parkview Ave | Subscriber | 4.733333 | 04/01/2017 |
| 2 | 27194749 | 70067 | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | 1010 | 10th St & Penn Ave (David L. Lawrence Conventi... | Customer | 26.266667 | 04/01/2017 |
| 3 | 27201194 | 70191 | 1005 | Forbes Ave & Grant St | 1043 | Coltart Ave & Forbes Ave | Customer | 161.583333 | 04/01/2017 |
| 4 | 27203670 | 70353 | 1038 | Boulevard of the Allies & Parkview Ave | 1047 | S 22nd St & E Carson St | Subscriber | 19.4 | 04/01/2017 |
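The same drop can also be written with the `columns=` keyword, which makes it explicit that columns (not rows) are removed. Without `inplace=True`, `drop` returns a new data frame and leaves the original untouched. This is a minimal sketch on a one-row hypothetical frame.

```python
import pandas as pd

# One hypothetical row with a few of the rentals_df columns
df = pd.DataFrame({
    "Starttime": ["4/1/2017 0:59"],
    "Tripduration": [2659],
    "Tripduration_mins": [2659 / 60],
})

# Same result as df.drop(['Starttime', 'Tripduration'], axis=1), returned as a new frame
trimmed = df.drop(columns=["Starttime", "Tripduration"])

print(trimmed.columns.tolist())  # ['Tripduration_mins']
print(df.columns.tolist())       # ['Starttime', 'Tripduration', 'Tripduration_mins']
```

Reassigning the result, as in `rentals_df = rentals_df.drop(columns=[...])`, is a common alternative to `inplace=True`.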


### 3. Aggregating data
Here, we will see how to group data in preparation for plotting.
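This minimal sketch shows one possible grouping, not necessarily the one this walk-through uses. It runs `groupby` on a few hypothetical rows shaped like the trimmed `rentals_df`, with durations rounded from the preview and some dates made up. It counts rides per `Startdate` and uses named aggregation to compute statistics for each `Usertype`.

```python
import pandas as pd

# Hypothetical rows with some of the columns of the trimmed rentals_df
rides = pd.DataFrame({
    "Usertype": ["Customer", "Subscriber", "Customer", "Customer", "Subscriber"],
    "Startdate": ["04/01/2017", "04/01/2017", "04/01/2017", "04/02/2017", "04/02/2017"],
    "Tripduration_mins": [44.3, 4.7, 26.3, 161.6, 19.4],
})

# Number of rides that start on each day
rides_per_day = rides.groupby("Startdate").size()

# Several statistics per user type, each given an explicit output name
by_user = rides.groupby("Usertype").agg(
    rides=("Tripduration_mins", "count"),
    mean_mins=("Tripduration_mins", "mean"),
)

print(rides_per_day)
print(by_user)
```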