# <span style='color:#3b748a'>The rental data for San Francisco is TOO large to upload to GitHub.</span>

<img src="../images/bikes_banner.jpg" width="1000" />

## <span style="color:#37535e">Bicycle Share Usage</span>

##  <span style='color:#3b748a'>Cleaning San Francisco Ford GoBike data</span>

<span style='color:#4095b5'>This notebook loads and cleans 12 months (September 2017 - August 2018) of data from the San Francisco Ford GoBike bicycle share. The published data goes back to 2017.</span>

<span style='color:#4095b5'>Each row (observation) describes one bike rental. Each rental includes a starting place and time, an ending place and time, as well as duration, user, and bike information.</span>

## <span style='color:#3b748a'>Table of contents</span>
* <span style='color:#4095b5'>I.  <a href="#checking"><span style='color:#4095b5'>Data checking functions.</span></a></span>
* <span style='color:#4095b5'>II. <a href="#cleaning"><span style='color:#4095b5'>Data cleaning functions.</span></a></span>
* <span style='color:#4095b5'>III. <a href="#convert"><span style='color:#4095b5'>Extract July-Dec 2017 from 2017 data.</span></a></span>
* <span style='color:#4095b5'>IV. <a href="#import"><span style='color:#4095b5'>Import all data.</span></a></span>
* <span style='color:#4095b5'>V. <a href="#clean"><span style='color:#4095b5'>Clean all data.</span></a></span>
* <span style='color:#4095b5'>VI. <a href="#merge"><span style='color:#4095b5'>Merge the dataframes into 1 big one.</span></a></span>
* <span style='color:#4095b5'>VII. <a href="#explore"><span style='color:#4095b5'>Explore the data.</span></a></span>
* <span style='color:#4095b5'>VIII. <a href="#write"><span style='color:#4095b5'>Write the full dataframe to a csv file.</span></a></span>

## <span style='color:#3b748a'>External data required</span>
<ul>
    <li><span style='color:#4095b5'>../data/sf/2018xx-fordgobike-tripdata.csv for each month in 2018; NOT available in GitHub</span></li>
    <li><span style='color:#4095b5'>../data/sf/2017-fordgobike-tripdata.csv NOT available in GitHub</span></li>
</ul>

## <span style='color:#3b748a'>Links</span>
<ul>
    <li><a href="https://s3.amazonaws.com/fordgobike-data/index.html"><span style='color:#4095b5'>San Francisco Ford GoBike data</span></a></li>
    <li><a href="plot_sf.ipynb"><span style='color:#4095b5'>Plotting SF data.</span></a></li>
</ul>
<hr>

In [1]:
# Let's get the administrative stuff done first
# import all the libraries and set up the plotting

import pandas as pd
import numpy as np
from datetime import datetime,timedelta
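# geopy 2.0 removed vincenty; fall back to geodesic under the same name
try:
    from geopy.distance import vincenty
except ImportError:
    from geopy.distance import geodesic as vincenty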

# Global variables to track the number of rows dropped
trivial_duration = 0
trivial_distance = 0
outliers_latlon = 0
outliers_duration = 0
outliers_distance = 0

# GnBu_d
colors = ['#37535e', '#3b748a', '#4095b5', '#52aec9', '#72bfc4', '#93d0bf']

<hr>
<a name="checking"> </a>
## <span style='color:#3b748a'>I. Data checking functions</span>

In [2]:
# Report dtype, values (or min/max range), and NaN count for every object and numeric column

def check_cols(df):
    cols = df.select_dtypes([object]).columns
    for col in cols:
        print("{} is {} and values are {}.".format(col,df[col].dtype,df[col].unique()))
        n_nan = df[col].isnull().sum()
        if n_nan > 0:
            print("{} has {} NaNs".format(col,n_nan))
            
    cols = df.select_dtypes([np.int64,np.float64,np.uint64]).columns
    for col in cols:
        print("{} is {} and values are {} to {}.".format(col,df[col].dtype,df[col].min(),df[col].max()))
        n_nan = df[col].isnull().sum()
        if n_nan > 0:
            print("{} has {} NaNs".format(col,n_nan))
    return

In [3]:
# Report the shape and total NaN count, and list the columns that contain NaNs

def check_data(df):
    s = df.shape

    # Check for null values
    null_data = df.isnull().sum()
    null_data_count = sum(df.isnull().sum())
    print("Rows: {}\t Cols: {}\t NaNs: {}".format(s[0],s[1],null_data_count))
    if  null_data_count > 0:
        print("Columns with NaN: {}".format(list(null_data[null_data > 0].index)))

    return
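As a minimal sketch of what the null check reports, the same counting logic can be run on a toy frame. The values below are made up for illustration; only the column names follow the raw GoBike schema.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values (not real trip data)
toy = pd.DataFrame({
    "start_station_name": ["A St", None, "C St"],
    "duration_sec": [300, 120, np.nan],
    "bike_id": [1, 2, 3],
})

null_counts = toy.isnull().sum()          # NaNs per column
total_nulls = int(null_counts.sum())      # NaNs in the whole frame
cols_with_nulls = list(null_counts[null_counts > 0].index)

print("Rows: {}\t Cols: {}\t NaNs: {}".format(toy.shape[0], toy.shape[1], total_nulls))
print("Columns with NaN: {}".format(cols_with_nulls))
```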

<hr>
<a name="cleaning"></a>
## <span style='color:#3b748a'> II. Data cleaning functions</span>

<span style='color:#4095b5'>These functions clean the trip data.</span>

### <span style='color:#4095b5'>Drop columns *NOT* in Atlanta data.</span>
<span style='color:#52aec9'>I might want to add some back at some point.</span>

In [4]:
def drop_columns(df):
    cols_drop = ['start_station_id', 'end_station_id', 'user_type', 'member_birth_year', 
                 'member_gender', 'bike_share_for_all_trip']

    # Can't drop a column that isn't there
    cols_drop = list(set(df.columns) & set(cols_drop))
    df.drop(cols_drop, axis=1, inplace=True)

    return df

### <span style='color:#4095b5'>Rename columns to match Atlanta data names.</span>

In [5]:
def rename_columns(df):
    df.rename(columns={'start_station_name' : 'Start Hub', 
                       'start_station_latitude' : 'Start Latitude',
                       'start_station_longitude' : 'Start Longitude',
                       'start_time' : 'Start Time', 
                       'end_station_name' : 'End Hub', 
                       'end_station_latitude' :'End Latitude', 
                       'end_station_longitude' : 'End Longitude', 
                       'end_time' : 'End Time', 
                       'bike_id' :'Bike Name',
                       'duration_sec' : 'Duration'
                      }, inplace=True)
    return df


### <span style='color:#4095b5'>Merge with hub data.</span>
<span style='color:#52aec9'>We may have to use the start/end hubs to get start/end lat/long.</span>

In [6]:
def calc_latlong(df, df_hubs):
    return df

### <span style='color:#4095b5'>Handle rows with nulls.</span>
<span style='color:#52aec9'>Some hubs are null. Rather than dropping these rows, we fill the missing hub names with "Unknown".</span>

In [7]:
def drop_nans(df):
    hub = "Unknown"
    
    df['Start Hub'] = df['Start Hub'].fillna(hub)
    df['End Hub'] = df['End Hub'].fillna(hub)

    return df
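A small sketch of the same idea on made-up rows: filling only the hub columns keeps every trip in the frame. The hub names and the `hub_cols` helper list are illustrative assumptions.

```python
import pandas as pd

# Made-up trips, each missing one hub name
trips = pd.DataFrame({
    "Start Hub": ["Market St", None],
    "End Hub": [None, "Berry St"],
    "Duration": [400, 500],
})

hub_cols = ["Start Hub", "End Hub"]
trips[hub_cols] = trips[hub_cols].fillna("Unknown")  # no rows are dropped

print(trips)
```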

### <span style='color:#4095b5'>Use appropriate datatypes.</span>
<span style='color:#52aec9'>For example, fix Date/Time objects and cast Latitude and Longitude to floats.</span>

In [8]:
def clean_datatypes(df):
    df['Start Latitude'] = df['Start Latitude'].astype(float)
    df['Start Longitude'] = df['Start Longitude'].astype(float)
    df['End Latitude'] = df['End Latitude'].astype(float)
    df['End Longitude'] = df['End Longitude'].astype(float)

    # Convert times to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['End Time'] = pd.to_datetime(df['End Time'])

    # Create date-only columns from the timestamps
    df['Start Date'] = df['Start Time'].dt.date
    df['End Date'] = df['End Time'].dt.date

    # Fix the durations
    df['Duration'] = pd.to_timedelta(df['Duration'], unit='s')
    
    return df
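To illustrate the conversions, here is a sketch on a single illustrative row. The timestamp and the second count are example values chosen for the demonstration, not a claim about the raw file.

```python
import pandas as pd

# One illustrative row: a timestamp string and a duration in seconds
raw = pd.DataFrame({
    "Start Time": ["2017-09-30 19:14:38.382"],
    "Duration": [81526],
})

raw["Start Time"] = pd.to_datetime(raw["Start Time"])          # string -> datetime64
raw["Start Date"] = raw["Start Time"].dt.date                  # datetime64 -> datetime.date
raw["Duration"] = pd.to_timedelta(raw["Duration"], unit="s")   # seconds -> timedelta64

print(raw.dtypes)
print(raw)
```

Note that `.dt.date` yields plain Python `date` objects, so the date columns have `object` dtype rather than `datetime64`.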

### <span style='color:#4095b5'>Calculate distances.</span>
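<span style='color:#52aec9'>Poor approximation: this is the point-to-point distance between the start and end stations, not the distance actually ridden. If a bike was checked out and returned to the same station, the distance will be trivial.</span>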

In [9]:
def distance_calc(row):
    start = (row['Start Latitude'], row['Start Longitude'])
    stop = (row['End Latitude'], row['End Longitude'])

    return vincenty(start, stop).miles

In [10]:
def calc_distances(df):
    df['Distance [Miles]'] = df.apply(distance_calc, axis=1)
    return df
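Row-wise `apply` can be slow on more than a million trips. One possible alternative, sketched here, is a vectorized haversine (great-circle) distance. This is a different formula from the Vincenty calculation used in the notebook, so results differ slightly. The function name `haversine_miles` and the radius constant are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles for scalars, arrays, or Series of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# Two illustrative trips; the second starts and ends at the same station
trips = pd.DataFrame({
    "Start Latitude": [37.77865, 37.79014],
    "Start Longitude": [-122.41823, -122.242373],
    "End Latitude": [37.778742, 37.79014],
    "End Longitude": [-122.392741, -122.242373],
})

trips["Distance [Miles]"] = haversine_miles(
    trips["Start Latitude"], trips["Start Longitude"],
    trips["End Latitude"], trips["End Longitude"],
)
print(trips["Distance [Miles]"])
```

The whole column is computed in one pass over NumPy arrays, at the cost of treating the Earth as a sphere.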

### <span style='color:#4095b5'>Reorder columns.</span>
<span style='color:#52aec9'>Make order same as Atlanta data.</span>

In [11]:
def reorder_cols(df):
    columns = ['Start Hub', 'Start Latitude', 'Start Longitude', 'Start Date',
       'Start Time', 'End Hub', 'End Latitude', 'End Longitude', 'End Date',
       'End Time', 'Bike Name', 'Distance [Miles]', 'Duration']

    df = df.reindex(columns=columns)
    return df

### <span style='color:#4095b5'>Drop trivial trips.</span>
<span style='color:#52aec9'>Trivial trips last less than 3 minutes. We do not drop trips for trivial distance, because the distance is computed from the station coordinates rather than measured.</span>

In [12]:
def drop_trivial_trips_distance(df):
    df = df[df["Distance [Miles]"] > 0.02].copy()
    return df

In [13]:
def drop_trivial_trips_duration(df):
    df = df[df["Duration"] >= pd.to_timedelta('00:03:00')].copy()
    return df

In [14]:
def drop_trivial_trips(df):
    global trivial_duration
    global trivial_distance

    rows = df.shape[0]
    df = drop_trivial_trips_duration(df)
    rows_duration = df.shape[0]
    trivial_duration += rows-rows_duration

    # Calculated distance, don't drop
    # df = drop_trivial_trips_distance(df)
    rows_distance = df.shape[0]
    trivial_distance += rows_duration-rows_distance

    return df
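The bookkeeping can also be done without global counters. The sketch below returns the number of dropped rows alongside the filtered frame; `drop_short_trips` and its `min_duration` parameter are hypothetical names, and the durations are made up.

```python
import pandas as pd

def drop_short_trips(df, min_duration="00:03:00"):
    """Return (filtered copy, number of rows dropped) for trips below min_duration."""
    keep = df["Duration"] >= pd.to_timedelta(min_duration)
    return df[keep].copy(), int((~keep).sum())

# Made-up durations: 1 minute, exactly 3 minutes, 15 minutes
trips = pd.DataFrame({"Duration": pd.to_timedelta([60, 180, 900], unit="s")})

kept, n_dropped = drop_short_trips(trips)
print("Kept: {}\t Dropped: {}".format(len(kept), n_dropped))
```

Returning the count makes it per-call, so a caller can report per-month totals or accumulate them as needed.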

### <span style='color:#4095b5'>Drop outliers.</span>
<ul>
    <li><span style='color:#52aec9'>Only use trips near San Francisco.</span></li> 
    <li><span style='color:#52aec9'>Don't keep trips 24 hours or longer.</span></li> 
     <li><span style='color:#52aec9'>Don't keep trips further than 100 miles.</span></li> 
</ul>

In [38]:
def drop_outliers_latlon(df):
    df = df[df["Start Latitude"] < 39].copy()
    df = df[df["End Latitude"] < 39].copy()
#     df = df[df["Start Latitude"] > 33.5].copy()
#     df = df[df["End Latitude"] > 33.5].copy()

#     df = df[df["Start Longitude"] < -83.0].copy()
#     df = df[df["End Longitude"] < -83.0].copy()

    return df

In [16]:
def drop_outliers_duration(df):
    df = df[df["Duration"] < pd.to_timedelta('24:00:00')].copy()
    return df

In [17]:
def drop_outliers_distance(df):
    df_temp = df[df["Distance [Miles]"] >= 100.0]
    if df_temp.shape[0]:
        print("Long trip: ", df_temp[['Start Latitude','Start Longitude', 'Start Time', 
                                     'End Latitude', 'End Longitude', 'End Time', 
                                     'Distance [Miles]', 'Duration']])
    df = df[df["Distance [Miles]"] < 100.0].copy()
    return df

In [18]:
def drop_outliers(df):
    global outliers_latlon
    global outliers_duration
    global outliers_distance
    
    rows = df.shape[0]
    df = drop_outliers_latlon(df)
    rows_latlon = df.shape[0]
    outliers_latlon += rows - rows_latlon
    
    df = drop_outliers_duration(df)
    rows_duration = df.shape[0]
    outliers_duration += rows_latlon - rows_duration
    
    df = drop_outliers_distance(df)
    rows_distance = df.shape[0]
    outliers_distance += rows_duration - rows_distance
    
    return df
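For comparison, the three outlier rules can be combined into one boolean mask. This is only a sketch: it uses the same thresholds as the functions above, but it does not count rows per rule, and `outlier_mask` with its parameters is a hypothetical helper. The toy rows are made up.

```python
import pandas as pd

def outlier_mask(df, max_lat=39.0, max_duration="24:00:00", max_miles=100.0):
    """Return True for rows that pass every outlier rule."""
    return (
        (df["Start Latitude"] < max_lat)
        & (df["End Latitude"] < max_lat)
        & (df["Duration"] < pd.to_timedelta(max_duration))
        & (df["Distance [Miles]"] < max_miles)
    )

# Made-up rows: one normal trip, one with a far-north start, one lasting 25 hours
trips = pd.DataFrame({
    "Start Latitude": [37.78, 45.0, 37.80],
    "End Latitude": [37.77, 37.78, 37.79],
    "Duration": pd.to_timedelta(["00:10:00", "00:12:00", "25:00:00"]),
    "Distance [Miles]": [1.2, 0.8, 2.0],
})

kept = trips[outlier_mask(trips)].copy()
print(kept)
```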

### <span style='color:#4095b5'>Pull all of the cleaning together.</span>

In [19]:
def clean_df(df, df_hubs=None):
    global trivial_duration
    global trivial_distance
    global outliers_latlon
    global outliers_duration
    global outliers_distance

    df = drop_columns(df)
    df = rename_columns(df)
    df = calc_latlong(df, df_hubs)
    df = drop_nans(df)
    df = clean_datatypes(df)
    df = calc_distances(df)
    df = reorder_cols(df)
    df = drop_trivial_trips(df)
    df = drop_outliers(df)

    # Information about rows dropped
    print("Trivial dur: {} dist: {}".format(trivial_duration,
                                            trivial_distance))
    print("Outlier loc: {} dur: {} dist: {}".format(outliers_latlon,
                                                     outliers_duration,
                                                     outliers_distance))
    return df

<hr>
<a name="convert"></a>
## <span style='color:#3b748a'> III. Extract July-Dec 2017 from 2017 data.</span>
<ul>
    <li><span style='color:#4095b5'>Trip data is quarterly.</span></li>
    <li><span style='color:#4095b5'>The full file is too large to work with easily.</span></li>
    <li><span style='color:#4095b5'>This should be generalized.</span></li>
</ul>

In [20]:
if False:
    d = "2017"
    df = pd.read_csv("../data/sf/"+str(d)+"-fordgobike-tripdata.csv")

    # Convert times to datetime
    df['Start Time'] = pd.to_datetime(df['start_time'])
    df['End Time'] = pd.to_datetime(df['end_time'])

    # 2017Q3 = 2017-07-01, 2017-08-01, 2017-09-01
    df7 = df[df['Start Time'] < datetime.strptime('2017-08-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df7 = df7[df7['Start Time'] >= datetime.strptime('2017-07-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df7.drop(['Start Time', 'End Time'], axis=1, inplace=True)
    print(df7.shape)
    df7.to_csv('../data/sf/201707-fordgobike-tripdata.csv', index=False)
    
    # 2017Q3 = 2017-07-01, 2017-08-01, 2017-09-01
    df8 = df[df['Start Time'] < datetime.strptime('2017-09-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df8 = df8[df8['Start Time'] >= datetime.strptime('2017-08-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df8.drop(['Start Time', 'End Time'], axis=1, inplace=True)
    print(df8.shape)
    df8.to_csv('../data/sf/201708-fordgobike-tripdata.csv', index=False)
    
    # 2017Q3 = 2017-07-01, 2017-08-01, 2017-09-01
    df9 = df[df['Start Time'] < datetime.strptime('2017-10-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df9 = df9[df9['Start Time'] >= datetime.strptime('2017-09-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df9.drop(['Start Time', 'End Time'], axis=1, inplace=True)
    print(df9.shape)
    df9.to_csv('../data/sf/201709-fordgobike-tripdata.csv', index=False)
    
    # 2017Q4 = 2017-10-01, 2017-11-01, 2017-12-01
    df10 = df[df['Start Time'] < datetime.strptime('2017-11-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df10 = df10[df10['Start Time'] >= datetime.strptime('2017-10-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df10.drop(['Start Time', 'End Time'], axis=1, inplace=True)
    print(df10.shape)
    df10.to_csv('../data/sf/201710-fordgobike-tripdata.csv', index=False)
    
    # 2017Q4 = 2017-10-01, 2017-11-01, 2017-12-01
    df11 = df[df['Start Time'] < datetime.strptime('2017-12-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df11 = df11[df11['Start Time'] >= datetime.strptime('2017-11-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df11.drop(['Start Time', 'End Time'], axis=1, inplace=True)
    print(df11.shape)
    df11.to_csv('../data/sf/201711-fordgobike-tripdata.csv', index=False)
    
    # 2017Q4 = 2017-10-01, 2017-11-01, 2017-12-01
    df12 = df[df['Start Time'] >= datetime.strptime('2017-12-01 00:00:00', '%Y-%m-%d %H:%M:%S')].copy()
    df12.drop(['Start Time', 'End Time'], axis=1, inplace=True)
    print(df12.shape)
    df12.to_csv('../data/sf/201712-fordgobike-tripdata.csv', index=False)
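One way the monthly split could be generalized is to group on a year-month key instead of writing one block per month. This is a sketch under assumptions: `split_by_month` is a hypothetical helper, and the toy rows are made up.

```python
import pandas as pd

def split_by_month(df, time_col="start_time"):
    """Return {'YYYYMM': DataFrame}, keyed by the month of time_col."""
    months = pd.to_datetime(df[time_col]).dt.strftime("%Y%m")
    # The original columns are left untouched; the key is only used for grouping
    return {month: part for month, part in df.groupby(months)}

# Made-up rows spanning two months
toy = pd.DataFrame({
    "start_time": ["2017-07-15 08:00:00", "2017-07-31 23:59:59", "2017-12-01 00:00:00"],
    "duration_sec": [300, 450, 600],
})

parts = split_by_month(toy)
for month, part in parts.items():
    print(month, part.shape)
    # Each part could then be written out, for example:
    # part.to_csv("../data/sf/" + month + "-fordgobike-tripdata.csv", index=False)
```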

<hr>
<a name="import"></a>
## <span style='color:#3b748a'> IV. Import all data from San Francisco.</span>


In [21]:
# San Francisco data is monthly for 2018, and then one big csv for 2017
# For now, just load one year
# 201801-fordgobike-tripdata.csv
trip_data = ['201709',
             '201710', '201711', '201712',
             '201801', '201802', '201803',
             '201804', '201805', '201806',
             '201807', '201808']
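The month keys do not have to be typed by hand. As a sketch, the same twelve `YYYYMM` strings can be generated with `pd.period_range`; the variable name `months` is an assumption.

```python
import pandas as pd

# Monthly periods from September 2017 through August 2018, formatted as YYYYMM
months = pd.period_range("2017-09", "2018-08", freq="M").strftime("%Y%m").tolist()
print(months)
```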

In [22]:
# Dictionary of DataFrames, one for each month
df_data = dict()
for d in trip_data:
    df_data[d] = pd.read_csv("../data/sf/"+str(d)+"-fordgobike-tripdata.csv")

<hr>
<a name="clean"></a>

## <span style='color:#3b748a'>V. Clean all data from San Francisco.</span>
<ul>
    <li><span style='color:#4095b5'>Reform the data to match Atlanta data.</span></li>
    <li><span style='color:#4095b5'>Drop the trivial trips.</span></li>
    <li><span style='color:#4095b5'>Drop the outliers.</span></li>
    <li><span style='color:#4095b5'>Use appropriate column types.</span></li>
</ul>

In [23]:
# For each month, clean the DataFrame
print("Cleaning the data:")
for d in trip_data:
    print("Month: {} \nRows: {}\t Cols: {}\t NaNs: {}".format(d, 
                                                    df_data[d].shape[0], 
                                                    df_data[d].shape[1], 
                                                    sum(df_data[d].isnull().sum())))
    df_data[d] = clean_df(df_data[d])
    check_data(df_data[d])

Cleaning the data:
Month: 201709 
Rows: 98558	 Cols: 15	 NaNs: 26242
Trivial dur: 3483 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 95075	 Cols: 13	 NaNs: 0
Month: 201710 
Rows: 108937	 Cols: 15	 NaNs: 27222
Trivial dur: 7723 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 104697	 Cols: 13	 NaNs: 0
Month: 201711 
Rows: 95612	 Cols: 15	 NaNs: 19005
Trivial dur: 11747 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 91588	 Cols: 13	 NaNs: 0
Month: 201712 
Rows: 86539	 Cols: 15	 NaNs: 16586
Trivial dur: 15427 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 82859	 Cols: 13	 NaNs: 0
Month: 201801 
Rows: 94802	 Cols: 16	 NaNs: 15640
Trivial dur: 19559 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 90670	 Cols: 13	 NaNs: 0
Month: 201802 
Rows: 106718	 Cols: 16	 NaNs: 16148
Trivial dur: 24418 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 101859	 Cols: 13	 NaNs: 0
Month: 201803 
Rows: 111382	 Cols: 16	 NaNs: 18032
Trivial dur: 29483 dist: 0
Outlier loc: 0 dur: 0 dist: 0
Rows: 106317	 Cols: 13	 NaNs: 0
Month: 2018

<hr>
<a name="merge"></a>

## <span style='color:#3b748a'> VI. Merge the DataFrames into 1 big DataFrame</span>


In [24]:
n_rows = 0
frames = []
for d in trip_data:
    n_rows += df_data[d].shape[0]
    frames.append(df_data[d])

# DataFrame.append was removed in pandas 2.0; concatenate all months at once
df = pd.concat(frames)

if n_rows != df.shape[0]:
    print("There is a problem with the DataFrame merge!")
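The merge and its row-count check can be sketched on toy frames. Here `ignore_index=True` is an optional choice that gives the merged frame a fresh, unique index; the `monthly` dictionary and its values are made up.

```python
import pandas as pd

# Made-up monthly frames standing in for the cleaned trip data
monthly = {
    "201709": pd.DataFrame({"Duration": [1, 2]}),
    "201710": pd.DataFrame({"Duration": [3]}),
}

merged = pd.concat(list(monthly.values()), ignore_index=True)
expected_rows = sum(len(part) for part in monthly.values())

if len(merged) != expected_rows:
    print("There is a problem with the DataFrame merge!")
print(merged)
```

Without `ignore_index=True`, each month keeps its own row labels, so the merged index contains duplicates.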

<hr>
<a name="explore"></a>

## <span style='color:#3b748a'> VII. Explore the data.</span>

In [25]:
df.head()    

| | Start Hub | Start Latitude | Start Longitude | Start Date | Start Time | End Hub | End Latitude | End Longitude | End Date | End Time | Bike Name | Distance [Miles] | Duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | San Francisco City Hall (Polk St at Grove St) | 37.77865 | -122.41823 | 2017-09-30 | 2017-09-30 19:14:38.382 | 3rd St at Townsend St | 37.778742 | -122.392741 | 2017-10-01 | 2017-10-01 17:53:24.600 | 2757 | 1.395296 | 22:38:46 |
| 1 | The Embarcadero at Sansome St | 37.80477 | -122.403234 | 2017-09-30 | 2017-09-30 18:12:21.667 | Berry St at 4th St | 37.77588 | -122.39317 | 2017-10-01 | 2017-10-01 16:53:48.361 | 2371 | 2.067208 | 22:41:26 |
| 2 | San Francisco Public Library (Grove St at Hyde... | 37.778768 | -122.415929 | 2017-09-30 | 2017-09-30 16:50:35.182 | Laguna St at Hayes St | 37.776435 | -122.426244 | 2017-10-01 | 2017-10-01 14:29:55.132 | 3195 | 0.587119 | 21:39:19 |
| 3 | San Antonio Park | 37.79014 | -122.242373 | 2017-09-30 | 2017-09-30 19:16:34.261 | San Antonio Park | 37.79014 | -122.242373 | 2017-10-01 | 2017-10-01 13:54:31.463 | 736 | 0.0 | 18:37:57 |
| 4 | 2nd St at Townsend St - Coming Soon | 37.780526 | -122.390288 | 2017-09-30 | 2017-09-30 15:57:38.683 | Webster St at Grove St | 37.777053 | -122.429558 | 2017-10-01 | 2017-10-01 11:41:13.690 | 73 | 2.162976 | 19:43:35 |


In [26]:
df.shape

(1522374, 13)

In [27]:
df.columns

Index(['Start Hub', 'Start Latitude', 'Start Longitude', 'Start Date',
       'Start Time', 'End Hub', 'End Latitude', 'End Longitude', 'End Date',
       'End Time', 'Bike Name', 'Distance [Miles]', 'Duration'],
      dtype='object')

In [28]:
df.describe()

| | Start Latitude | Start Longitude | End Latitude | End Longitude | Bike Name | Distance [Miles] | Duration |
|---|---|---|---|---|---|---|---|
| count | 1522374.0 | 1522374.0 | 1522374.0 | 1522374.0 | 1522374.0 | 1522374.0 | 1522374 |
| mean | 37.76801 | -122.3538 | 37.76818 | -122.3532 | 2094.845 | 1.025729 | 0 days 00:15:55.454902 |
| std | 0.09913807 | 0.1179655 | 0.0990363 | 0.1174223 | 1159.709 | 0.6318636 | 0 days 00:46:05.212341 |
| min | 37.31285 | -122.4737 | 37.28 | -122.4737 | 10.0 | 0.0 | 0 days 00:03:00 |
| 25% | 37.77106 | -122.4117 | 37.77166 | -122.4103 | 1126.0 | 0.5852783 | 0 days 00:06:25 |
| 50% | 37.78127 | -122.3983 | 37.78165 | -122.3974 | 2157.0 | 0.8816876 | 0 days 00:09:48 |
| 75% | 37.79539 | -122.2948 | 37.79539 | -122.2994 | 3040.0 | 1.3239 | 0 days 00:15:12 |
| max | 37.88022 | -121.84 | 37.88022 | -121.84 | 4466.0 | 42.37529 | 0 days 23:59:29 |


In [29]:
if False:
    check_data(df)
    check_cols(df)

In [36]:
# Trips whose start or end hub is unknown
df_unknown = df[(df["Start Hub"] == "Unknown") | (df["End Hub"] == "Unknown")].copy()
df_unknown.shape

(7650, 13)

In [37]:
df_unknown.describe()

| | Start Latitude | Start Longitude | End Latitude | End Longitude | Bike Name | Distance [Miles] | Duration |
|---|---|---|---|---|---|---|---|
| count | 7650.0 | 7650.0 | 7650.0 | 7650.0 | 7650.0 | 7650.0 | 7650 |
| mean | 37.403454 | -121.939016 | 37.402868 | -121.938736 | 4172.396209 | 0.66971 | 0 days 00:28:40.141176 |
| std | 0.010805 | 0.013237 | 0.01168 | 0.013528 | 85.166273 | 0.729365 | 0 days 01:34:34.690294 |
| min | 37.32 | -121.99 | 37.28 | -122.0 | 3745.0 | 0.0 | 0 days 00:03:00 |
| 25% | 37.4 | -121.95 | 37.4 | -121.95 | 4123.0 | 0.0 | 0 days 00:07:22 |
| 50% | 37.4 | -121.94 | 37.4 | -121.94 | 4172.0 | 0.550182 | 0 days 00:13:14 |
| 75% | 37.41 | -121.93 | 37.41 | -121.93 | 4245.0 | 0.88223 | 0 days 00:26:42 |
| max | 37.45 | -121.84 | 37.44 | -121.84 | 4293.0 | 9.988703 | 0 days 23:46:14 |


In [30]:
# Check dates (Sep 2017 - Aug 2018 has 365 days)
print("Min start date: {}".format(df['Start Date'].min()))
print("Min end date: {}".format(df['End Date'].min()))
print("Max start date: {}".format(df['Start Date'].max()))
print("Max end date: {}".format(df['End Date'].max()))
print("Number of days: {}".format(len(set(df['Start Date']))))

Min start date: 2017-09-01
Min end date: 2017-09-01
Max start date: 2018-08-31
Max end date: 2018-09-01
Number of days: 365
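Counting distinct start dates confirms coverage only when the count matches the span. A sketch of a more explicit check compares the observed dates against a full daily calendar. The dates below are made up, with one day deliberately missing.

```python
import pandas as pd

# Made-up start dates; 2017-09-03 is deliberately absent
start_dates = pd.Series(
    pd.to_datetime(["2017-09-01", "2017-09-02", "2017-09-02", "2017-09-04"]).date
)

days = pd.to_datetime(start_dates)                       # date objects -> datetime64
expected = pd.date_range(days.min(), days.max(), freq="D")
missing = expected.difference(pd.DatetimeIndex(days.drop_duplicates()))

print("Expected days: {}".format(len(expected)))
print("Missing days: {}".format(list(missing.strftime("%Y-%m-%d"))))
```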


#### <span style='color:#4095b5'>Fewest rentals</span>
<ul>
    <li><span style='color:#4095b5'>26 Nov 2017 - ??</span></li>
</ul>

#### <span style='color:#4095b5'>Most rentals</span>
<ul>
    <li><span style='color:#4095b5'>25 Jul 2018 - Phish?</span></li>
</ul>

#### <span style='color:#4095b5'>Outliers on upper side of total or average duration per day of week</span>
<ul>
    <li><span style='color:#4095b5'>3 Sep 2017 - Labor Day weekend</span></li>
    <li><span style='color:#4095b5'>23-24 Nov 2017 - Thanksgiving</span></li>
    <li><span style='color:#4095b5'>25 Dec 2017</span></li>
    <li><span style='color:#4095b5'>1 Jan 2018</span></li>
</ul>
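Days like these can be located by counting rentals per start date. The sketch below uses made-up dates and counts purely to show the mechanics; it does not reproduce the results listed above.

```python
import pandas as pd

# Made-up trips: 1 on Jan 1, 3 on Jan 2, 2 on Jan 3
trips = pd.DataFrame({
    "Start Date": pd.to_datetime([
        "2018-01-01",
        "2018-01-02", "2018-01-02", "2018-01-02",
        "2018-01-03", "2018-01-03",
    ]).date
})

per_day = trips.groupby("Start Date").size()   # rentals per calendar day
fewest_day = per_day.idxmin()
busiest_day = per_day.idxmax()

print("Fewest rentals: {} ({})".format(fewest_day, per_day.min()))
print("Most rentals: {} ({})".format(busiest_day, per_day.max()))
```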


<hr>
<a name="write"></a>

## <span style='color:#3b748a'>VIII. Write the full DataFrame to a csv file.</span>

In [31]:
df.to_csv('../data/sf/trips_all.csv', index=False)