# 2016 US Bike Share Activity Snapshot

# Update to be Used for Pandas Frame to Prepare Tableau Geo Tag Analysis


<a id='intro'></a>
## Introduction

> **Tip**: Quoted sections like this will provide helpful instructions on how to navigate and use a Jupyter notebook.

Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles for short trips, typically 30 minutes or less. Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, you will perform an exploratory analysis on data provided by [Motivate](https://www.motivateco.com/), a bike-share system provider for many major cities in the United States. You will compare the system usage between three large cities: New York City, Chicago, and Washington, DC. You will also see if there are any differences within each system for those users that are registered, regular users and those users that are short-term, casual users.

<a id='wrangling'></a>
## Data Collection and Wrangling

Now it's time to collect and explore our data. In this project, we will focus on the record of individual trips taken in 2016 from our selected cities: New York City, Chicago, and Washington, DC. Each of these cities has a page where we can freely download the trip data:

- New York City (Citi Bike): [Link](https://www.citibikenyc.com/system-data)
- Chicago (Divvy): [Link](https://www.divvybikes.com/system-data)
- Washington, DC (Capital Bikeshare): [Link](https://www.capitalbikeshare.com/system-data)

If you visit these pages, you will notice that each city has a different way of delivering its data. Chicago publishes new data twice a year, Washington, DC quarterly, and New York City monthly. **However, you do not need to download the data yourself.** The data has already been collected for you in the `/data/` folder of the project files. While the original data for 2016 is spread among multiple files for each city, the files in the `/data/` folder collect all of the trip data for the year into one file per city. Some data wrangling of inconsistencies in timestamp format within each city has already been performed for you. In addition, a random 2% sample of the original data has been taken to make the exploration more manageable.

**Question 2**: However, there is still a lot of data for us to investigate, so it's a good idea to start off by looking at one entry from each of the cities we're going to analyze. Run the first code cell below to load some packages and functions that you'll be using in your analysis. Then, complete the second code cell to print out the first trip recorded from each of the cities (the second line of each data file).

> **Tip**: You can run a code cell the same way you rendered the Markdown cells above: click on the cell and use the keyboard shortcut **Shift** + **Enter** or **Shift** + **Return**. Alternatively, a code cell can be executed using the **Play** button in the toolbar after selecting it. While the cell is running, you will see an asterisk in the message to the left of the cell, i.e. `In [*]:`. The asterisk will change into a number to show that execution has completed, e.g. `In [1]:`. If there is output, it will show up as `Out[1]:`, with an appropriate number to match the "In" number.

In [459]:
## import all necessary packages and functions.
import csv # read and write csv files
import pandas as pd
from datetime import datetime # operations to parse dates
from pprint import pprint # use to print data structures like dictionaries in
                          # a nicer way than the base print function.

In [460]:
def print_first_point(filename):
    """
    This function prints and returns the first data point (second row) from
    a csv file that includes a header row.
    """
    # print city name for reference
    city = filename.split('-')[0].split('/')[-1]
    print('\nCity: {}'.format(city))

    
    with open(filename, 'r') as f_in:
        ## TODO: Use the csv library to set up a DictReader object. ##
        ## see https://docs.python.org/3/library/csv.html           ##
        trip_reader = csv.DictReader(f_in)
        
        ## TODO: Use a function on the DictReader object to read the     ##
        ## first trip from the data file and store it in a variable.     ##
        ## see https://docs.python.org/3/library/csv.html#reader-objects ##
        first_trip = next(trip_reader)

        ## TODO: Use the pprint library to print the first trip. ##
        ## see https://docs.python.org/3/library/pprint.html     ##
        pprint(first_trip)
        
    # output city name and first trip for later testing
    return (city, first_trip)

# list of files for each city
data_files = ['NYC-CitiBike-2016.csv',
              'Chicago-Divvy-2016.csv',
              'Washington-CapitalBikeshare-2016.csv',]

# print the first trip from each file, store in dictionary
example_trips = {}
for data_file in data_files:
    city, first_trip = print_first_point(data_file)
    example_trips[city] = first_trip


City: NYC
OrderedDict([('tripduration', '839'),
             ('starttime', '1/1/2016 00:09:55'),
             ('stoptime', '1/1/2016 00:23:54'),
             ('start station id', '532'),
             ('start station name', 'S 5 Pl & S 4 St'),
             ('start station latitude', '40.710451'),
             ('start station longitude', '-73.960876'),
             ('end station id', '401'),
             ('end station name', 'Allen St & Rivington St'),
             ('end station latitude', '40.72019576'),
             ('end station longitude', '-73.98997825'),
             ('bikeid', '17109'),
             ('usertype', 'Customer'),
             ('birth year', ''),
             ('gender', '0')])

City: Chicago
OrderedDict([('trip_id', '9080545'),
             ('starttime', '3/31/2016 23:30'),
             ('stoptime', '3/31/2016 23:46'),
             ('bikeid', '2295'),
             ('tripduration', '926'),
             ('from_station_id', '156'),
             ('from_station_name', 'Clar

If everything has been filled out correctly, you should see each city name (which has been parsed from the data file name) printed out, followed by the first trip of that city parsed in the form of a dictionary. When you set up a `DictReader` object, the first row of the data file is normally interpreted as column names. Every other row in the data file will use those column names as keys, as a dictionary is generated for each row.

This will be useful since we can refer to quantities by an easily-understandable label instead of just a numeric index. For example, if we have a trip stored in the variable `row`, then we would rather get the trip duration from `row['duration']` instead of `row[0]`.
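As a small illustration of the point above, the sketch below runs `csv.DictReader` over an in-memory sample instead of a data file. The two sample rows and the `sample_csv` name are made up for this example and are not taken from the project data.

```python
import csv
import io

# Hypothetical two-column sample; not taken from the project data files.
sample_csv = "duration,user_type\n13.98,Customer\n11.43,Subscriber\n"

reader = csv.DictReader(io.StringIO(sample_csv))
row = next(reader)  # first data row; the header row supplies the keys

# Values are looked up by column name, and every value is read as a string.
print(row['duration'])
print(reader.fieldnames)
```

Because the values arrive as strings, any arithmetic on them needs an explicit conversion such as `float(row['duration'])`.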

<a id='condensing'></a>
### Condensing the Trip Data

It should also be observable from the above printout that each city provides different information. Even where the information is the same, the column names and formats are sometimes different. To make things as simple as possible when we get to the actual exploration, we should trim and clean the data. Cleaning the data makes sure that the data formats across the cities are consistent, while trimming focuses only on the parts of the data we are most interested in to make the exploration easier to work with.

You will generate new data files with five values of interest for each trip: trip duration, starting month, starting hour, day of the week, and user type. Each of these may require additional wrangling depending on the city:

- **Duration**: This has been given to us in seconds (New York, Chicago) or milliseconds (Washington). Minutes are a more natural unit for this analysis, so all trip durations will be converted to minutes.
- **Month**, **Hour**, **Day of Week**: Ridership volume is likely to change based on the season, time of day, and whether it is a weekday or weekend. Use the start time of the trip to obtain these values. The New York City data includes the seconds in their timestamps, while Washington and Chicago do not. The [`datetime`](https://docs.python.org/3/library/datetime.html) package will be very useful here to make the needed conversions.
- **User Type**: It is possible that users who are subscribed to a bike-share system will have different patterns of use compared to users who only have temporary passes. Washington divides its users into two types: 'Registered' for users with annual, monthly, and other longer-term subscriptions, and 'Casual', for users with 24-hour, 3-day, and other short-term passes. The New York and Chicago data uses 'Subscriber' and 'Customer' for these groups, respectively. For consistency, you will convert the Washington labels to match the other two.
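The conversions described in the list above can be sketched on single values before they are wrapped in helper functions. The sample values are copied from trip printouts in this notebook; the variable names, including `user_type_map`, are introduced here only for illustration.

```python
from datetime import datetime

# Duration: seconds (NYC, Chicago) or milliseconds (Washington) to minutes.
nyc_minutes = int('839') / 60               # NYC 'tripduration', in seconds
was_minutes = int('427387') / (1000 * 60)   # Washington 'Duration (ms)'

# Start time: NYC timestamps carry seconds; Chicago and Washington do not.
nyc_start = datetime.strptime('1/1/2016 00:09:55', '%m/%d/%Y %H:%M:%S')
chi_start = datetime.strptime('3/31/2016 23:30', '%m/%d/%Y %H:%M')

# weekday() counts from Monday = 0, so Friday is 4 and Thursday is 3.
print(nyc_start.month, nyc_start.hour, nyc_start.weekday())
print(chi_start.month, chi_start.hour, chi_start.weekday())

# User type: map the Washington labels onto the NYC/Chicago labels.
user_type_map = {'Registered': 'Subscriber', 'Casual': 'Customer'}
print(user_type_map['Registered'])
```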

In [461]:
def duration_in_mins(datum, city):
    """
    Takes as input a dictionary containing info about a single trip (datum) and
    its origin city (city) and returns the trip duration in units of minutes.
    
    Remember that Washington is in terms of milliseconds while Chicago and NYC
    are in terms of seconds. 
    
    HINT: The csv module reads in all of the data as strings, including numeric
    values. You will need a function to convert the strings into an appropriate
    numeric type when making your transformations.
    see https://docs.python.org/3/library/functions.html
    """

    # Washington reports milliseconds; NYC and Chicago report seconds.
    if city == 'Washington':
        return int(datum['Duration (ms)'])/(1000*60)
    else:
        return int(datum['tripduration'])/60
    

# Some tests to check that your code works. There should be no output if all of
# the assertions pass. The `example_trips` dictionary was obtained from when
# you printed the first trip from each of the original data files.
tests = {'NYC': 13.9833,
         'Chicago': 15.4333,
         'Washington': 7.1231}

for city in tests:
    assert abs(duration_in_mins(example_trips[city], city) - tests[city]) < .001
    

In [462]:
def time_of_trip(datum, city):
    """
    Takes as input a dictionary containing info about a single trip (datum) and
    its origin city (city) and returns the month, hour, and day of the week in
    which the trip was made.
    
    Remember that NYC includes seconds, while Washington and Chicago do not.
    
    HINT: You should use the datetime module to parse the original date
    strings into a format that is useful for extracting the desired information.
    see https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
    """
    
    # Weekday names indexed by datetime.weekday() (Monday = 0).
    day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
                 'Saturday', 'Sunday']

    # Column name and timestamp format differ by city; NYC includes seconds.
    # An explicit '%H:%M:%S' is used instead of the locale-dependent '%X'.
    if city == 'NYC':
        start = datetime.strptime(datum['starttime'], '%m/%d/%Y %H:%M:%S')
    elif city == 'Chicago':
        start = datetime.strptime(datum['starttime'], '%m/%d/%Y %H:%M')
    elif city == 'Washington':
        start = datetime.strptime(datum['Start date'], '%m/%d/%Y %H:%M')
    else:
        raise ValueError('Unknown city: {}'.format(city))

    return (start.month, start.hour, day_names[start.weekday()])

    
# Some tests to check that your code works. There should be no output if all of
# the assertions pass. The `example_trips` dictionary was obtained from when
# you printed the first trip from each of the original data files.
tests = {'NYC': (1, 0, 'Friday'),
         'Chicago': (3, 23, 'Thursday'),
         'Washington': (3, 22, 'Thursday')}

for city in tests:
    assert time_of_trip(example_trips[city], city) == tests[city]

In [463]:
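def type_of_user(datum, city):
    """
    Takes as input a dictionary containing info about a single trip (datum) and
    its origin city (city) and returns the type of system user that made the
    trip.

    Washington uses 'Registered' and 'Casual'; these are converted to the
    'Subscriber' and 'Customer' labels used by NYC and Chicago.
    """
    if city == 'Washington':
        washington_labels = {'Registered': 'Subscriber', 'Casual': 'Customer'}
        # Any other label is passed through unchanged instead of failing.
        member_type = datum['Member Type']
        return washington_labels.get(member_type, member_type)

    return datum['usertype']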

tests = {'NYC': 'Customer',
         'Chicago': 'Subscriber',
         'Washington': 'Subscriber'}

for city in tests:
    assert type_of_user(example_trips[city], city) == tests[city]

**Question 3b**: Now, use the helper functions you wrote above to create a condensed data file for each city consisting only of the data fields indicated above. In the `/examples/` folder, you will see an example datafile from the [Bay Area Bike Share](http://www.bayareabikeshare.com/open-data) before and after conversion. Make sure that your output is formatted to be consistent with the example file.
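To make the target format concrete, here is a minimal sketch that writes one condensed row with `csv.DictWriter`. It writes to an in-memory buffer, which is an assumption made for this example so that nothing on disk is touched; the row values match the first NYC trip printed earlier in this notebook, and the `buffer` and `lines` names are hypothetical.

```python
import csv
import io

out_colnames = ['duration', 'month', 'hour', 'day_of_week', 'user_type']

# In-memory stand-in for an output file (illustration only).
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=out_colnames)
writer.writeheader()
writer.writerow({'duration': 839 / 60, 'month': 1, 'hour': 0,
                 'day_of_week': 'Friday', 'user_type': 'Customer'})

# One header line followed by one data line, in the column order above.
lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```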

In [464]:
def condense_data(in_file, out_file, city):
    """
    This function takes full data from the specified input file
    and writes the condensed data to a specified output file. The city
    argument determines how the input file will be parsed.
    
    HINT: See the cell below to see how the arguments are structured!
    """
    
    # newline='' lets the csv module control line endings, as its docs advise
    with open(out_file, 'w', newline='') as f_out, open(in_file, 'r', newline='') as f_in:
        # set up csv DictWriter object - writer requires column names for the
        # first row as the "fieldnames" argument
        out_colnames = ['duration', 'month', 'hour', 'day_of_week', 'user_type']        
        trip_writer = csv.DictWriter(f_out, fieldnames = out_colnames)
        trip_writer.writeheader()
        
        ## TODO: set up csv DictReader object ##
        trip_reader = csv.DictReader(f_in)
        
        # collect data from and process each row
        for row in trip_reader:
            # parse the start time once and unpack the three values
            month, hour, day_of_week = time_of_trip(row, city)

            # set up a dictionary to hold the values for the cleaned and
            # trimmed data point; the keys match the DictWriter column names
            new_point = {'duration': duration_in_mins(row, city),
                         'month': month,
                         'hour': hour,
                         'day_of_week': day_of_week,
                         'user_type': type_of_user(row, city)}

            # write the processed information to the output file
            trip_writer.writerow(new_point)

In [465]:
# Run this cell to check your work
city_info = {'Washington': {'in_file': 'Washington-CapitalBikeshare-2016.csv',
                            'out_file': 'Washington-2016-Summary.csv'},
             'Chicago': {'in_file': 'Chicago-Divvy-2016.csv',
                         'out_file': 'Chicago-2016-Summary.csv'},
             'NYC': {'in_file': 'NYC-CitiBike-2016.csv',
                     'out_file': 'NYC-2016-Summary.csv'}}

for city, filenames in city_info.items():
    condense_data(filenames['in_file'], filenames['out_file'], city)
    print_first_point(filenames['out_file'])


City: Washington
OrderedDict([('duration', '7.123116666666666'),
             ('month', '3'),
             ('hour', '22'),
             ('day_of_week', 'Thursday'),
             ('user_type', 'Subscriber')])

City: Chicago
OrderedDict([('duration', '15.433333333333334'),
             ('month', '3'),
             ('hour', '23'),
             ('day_of_week', 'Thursday'),
             ('user_type', 'Subscriber')])

City: NYC
OrderedDict([('duration', '13.983333333333333'),
             ('month', '1'),
             ('hour', '0'),
             ('day_of_week', 'Friday'),
             ('user_type', 'Customer')])


In [466]:
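# creating dataframes from the condensed summary files
# NOTE: the WAS_* and CHI_* names are swapped relative to their contents:
# WAS_addon holds the Chicago summary and CHI_addon the Washington summary.

NYC_addon = pd.read_csv('NYC-2016-Summary.csv')
WAS_addon = pd.read_csv('Chicago-2016-Summary.csv')
CHI_addon = pd.read_csv('Washington-2016-Summary.csv')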

In [467]:
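# NOTE: as with the summary frames, WAS holds the raw Chicago trips and
# CHI holds the raw Washington trips.
NYC = pd.read_csv('NYC-CitiBike-2016.csv')
WAS = pd.read_csv('Chicago-Divvy-2016.csv')
CHI = pd.read_csv('Washington-CapitalBikeshare-2016.csv')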

In [468]:
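# Join each raw trip table with its condensed columns side by side. This
# relies on both files listing the trips in the same row order, which holds
# because condense_data writes one output row per input row.
frames_NYC = [NYC, NYC_addon]
NYC_long = pd.concat(frames_NYC, axis=1)
frames_WAS = [WAS, WAS_addon]
WAS_long = pd.concat(frames_WAS, axis=1)
frames_CHI = [CHI, CHI_addon]
CHI_long = pd.concat(frames_CHI, axis=1)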

In [469]:
NYC_long.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276798 entries, 0 to 276797
Data columns (total 20 columns):
tripduration               276798 non-null int64
starttime                  276798 non-null object
stoptime                   276798 non-null object
start station id           276798 non-null int64
start station name         276798 non-null object
start station latitude     276798 non-null float64
start station longitude    276798 non-null float64
end station id             276798 non-null int64
end station name           276798 non-null object
end station latitude       276798 non-null float64
end station longitude      276798 non-null float64
bikeid                     276798 non-null int64
usertype                   276081 non-null object
birth year                 245137 non-null float64
gender                     276798 non-null int64
duration                   276798 non-null float64
month                      276798 non-null int64
hour                       276798 non-n

In [470]:
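# creating new columns for origin/destination

# Stack the frame on itself so every trip appears twice: the first copy will
# describe the origin, the second the destination. DataFrame.append was
# removed in pandas 2.0, so pd.concat is used; the index labels repeat.
NYC_long = pd.concat([NYC_long, NYC_long])
NYC_long['Origin-Destination'] = 'Origin'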


In [471]:
NYC_long.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 553596 entries, 0 to 276797
Data columns (total 21 columns):
tripduration               553596 non-null int64
starttime                  553596 non-null object
stoptime                   553596 non-null object
start station id           553596 non-null int64
start station name         553596 non-null object
start station latitude     553596 non-null float64
start station longitude    553596 non-null float64
end station id             553596 non-null int64
end station name           553596 non-null object
end station latitude       553596 non-null float64
end station longitude      553596 non-null float64
bikeid                     553596 non-null int64
usertype                   552162 non-null object
birth year                 490274 non-null float64
gender                     553596 non-null int64
duration                   553596 non-null float64
month                      553596 non-null int64
hour                       553596 non-n

In [472]:
NYC_long.head()

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,...,bikeid,usertype,birth year,gender,duration,month,hour,day_of_week,user_type,Origin-Destination
0,839,1/1/2016 00:09:55,1/1/2016 00:23:54,532,S 5 Pl & S 4 St,40.710451,-73.960876,401,Allen St & Rivington St,40.720196,...,17109,Customer,,0,13.983333,1,0,Friday,Customer,Origin
1,686,1/1/2016 00:21:17,1/1/2016 00:32:44,3143,5 Ave & E 78 St,40.776829,-73.963888,3132,E 59 St & Madison Ave,40.763505,...,23514,Subscriber,1960.0,1,11.433333,1,0,Friday,Subscriber,Origin
2,315,1/1/2016 00:33:11,1/1/2016 00:38:26,3164,Columbus Ave & W 72 St,40.777057,-73.978985,3178,Riverside Dr & W 78 St,40.784145,...,14536,Subscriber,1971.0,1,5.25,1,0,Friday,Subscriber,Origin
3,739,1/1/2016 00:40:51,1/1/2016 00:53:11,223,W 13 St & 7 Ave,40.737815,-73.999947,276,Duane St & Greenwich St,40.717488,...,24062,Subscriber,1969.0,1,12.316667,1,0,Friday,Subscriber,Origin
4,1253,1/1/2016 00:44:16,1/1/2016 01:05:09,484,W 44 St & 5 Ave,40.755003,-73.980144,151,Cleveland Pl & Spring St,40.722104,...,16380,Customer,,0,20.883333,1,0,Friday,Customer,Origin


In [473]:
NYC_long.iloc[0]

tripduration                                   839
starttime                        1/1/2016 00:09:55
stoptime                         1/1/2016 00:23:54
start station id                               532
start station name                 S 5 Pl & S 4 St
start station latitude                     40.7105
start station longitude                   -73.9609
end station id                                 401
end station name           Allen St & Rivington St
end station latitude                       40.7202
end station longitude                       -73.99
bikeid                                       17109
usertype                                  Customer
birth year                                     NaN
gender                                           0
duration                                   13.9833
month                                            1
hour                                             0
day_of_week                                 Friday
user_type                      

In [474]:
NYC_long.iloc[276798]

tripduration                                   839
starttime                        1/1/2016 00:09:55
stoptime                         1/1/2016 00:23:54
start station id                               532
start station name                 S 5 Pl & S 4 St
start station latitude                     40.7105
start station longitude                   -73.9609
end station id                                 401
end station name           Allen St & Rivington St
end station latitude                       40.7202
end station longitude                       -73.99
bikeid                                       17109
usertype                                  Customer
birth year                                     NaN
gender                                           0
duration                                   13.9833
month                                            1
hour                                             0
day_of_week                                 Friday
user_type                      

In [475]:
# Tag the second copy of each trip as the destination. A single positional
# .iloc call avoids chained assignment on a column slice.
od_col = NYC_long.columns.get_loc('Origin-Destination')
NYC_long.iloc[276798:, od_col] = 'Destination'
NYC_long.iloc[276798]


tripduration                                   839
starttime                        1/1/2016 00:09:55
stoptime                         1/1/2016 00:23:54
start station id                               532
start station name                 S 5 Pl & S 4 St
start station latitude                     40.7105
start station longitude                   -73.9609
end station id                                 401
end station name           Allen St & Rivington St
end station latitude                       40.7202
end station longitude                       -73.99
bikeid                                       17109
usertype                                  Customer
birth year                                     NaN
gender                                           0
duration                                   13.9833
month                                            1
hour                                             0
day_of_week                                 Friday
user_type                      

In [476]:
NYC_long['Origin-Destination'].value_counts()

Origin         276798
Destination    276798
Name: Origin-Destination, dtype: int64

In [477]:
# Merge location information in one latitude and one longitude column
import numpy as np

# Blank the coordinates that do not apply to each half: destination rows lose
# their start coordinates and origin rows lose their end coordinates. Both
# slices split at row 276798, so no origin row keeps its end coordinates.
start_cols = ['start station latitude', 'start station longitude']
end_cols = ['end station latitude', 'end station longitude']
NYC_long.iloc[276798:, NYC_long.columns.get_indexer(start_cols)] = np.nan
NYC_long.iloc[:276798, NYC_long.columns.get_indexer(end_cols)] = np.nan
NYC_long.iloc[0]


tripduration                                   839
starttime                        1/1/2016 00:09:55
stoptime                         1/1/2016 00:23:54
start station id                               532
start station name                 S 5 Pl & S 4 St
start station latitude                     40.7105
start station longitude                   -73.9609
end station id                                 401
end station name           Allen St & Rivington St
end station latitude                           NaN
end station longitude                          NaN
bikeid                                       17109
usertype                                  Customer
birth year                                     NaN
gender                                           0
duration                                   13.9833
month                                            1
hour                                             0
day_of_week                                 Friday
user_type                      

In [478]:
NYC_long.iloc[276798]

tripduration                                   839
starttime                        1/1/2016 00:09:55
stoptime                         1/1/2016 00:23:54
start station id                               532
start station name                 S 5 Pl & S 4 St
start station latitude                         NaN
start station longitude                        NaN
end station id                                 401
end station name           Allen St & Rivington St
end station latitude                       40.7202
end station longitude                       -73.99
bikeid                                       17109
usertype                                  Customer
birth year                                     NaN
gender                                           0
duration                                   13.9833
month                                            1
hour                                             0
day_of_week                                 Friday
user_type                      

In [479]:
NYC_long.head()

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,...,bikeid,usertype,birth year,gender,duration,month,hour,day_of_week,user_type,Origin-Destination
0,839,1/1/2016 00:09:55,1/1/2016 00:23:54,532,S 5 Pl & S 4 St,40.710451,-73.960876,401,Allen St & Rivington St,,...,17109,Customer,,0,13.983333,1,0,Friday,Customer,Origin
1,686,1/1/2016 00:21:17,1/1/2016 00:32:44,3143,5 Ave & E 78 St,40.776829,-73.963888,3132,E 59 St & Madison Ave,,...,23514,Subscriber,1960.0,1,11.433333,1,0,Friday,Subscriber,Origin
2,315,1/1/2016 00:33:11,1/1/2016 00:38:26,3164,Columbus Ave & W 72 St,40.777057,-73.978985,3178,Riverside Dr & W 78 St,,...,14536,Subscriber,1971.0,1,5.25,1,0,Friday,Subscriber,Origin
3,739,1/1/2016 00:40:51,1/1/2016 00:53:11,223,W 13 St & 7 Ave,40.737815,-73.999947,276,Duane St & Greenwich St,,...,24062,Subscriber,1969.0,1,12.316667,1,0,Friday,Subscriber,Origin
4,1253,1/1/2016 00:44:16,1/1/2016 01:05:09,484,W 44 St & 5 Ave,40.755003,-73.980144,151,Cleveland Pl & Spring St,,...,16380,Customer,,0,20.883333,1,0,Friday,Customer,Origin


In [480]:
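# Replace the blanked coordinates with 0 so that start + end yields the one
# latitude that applies to each row.
NYC_long['start station latitude'] = NYC_long['start station latitude'].fillna(0)
NYC_long['end station latitude'] = NYC_long['end station latitude'].fillna(0)

NYC_lat = NYC_long['start station latitude'] + NYC_long['end station latitude']

NYC_lat = NYC_lat.to_frame(name='latitude')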

NYC_lat.head()

Unnamed: 0,latitude
0,40.710451
1,40.776829
2,40.777057
3,40.737815
4,40.755003


In [481]:
NYC_lat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 553596 entries, 0 to 276797
Data columns (total 1 columns):
latitude    553596 non-null float64
dtypes: float64(1)
memory usage: 8.4 MB


In [482]:
# Same approach for longitude: fill the blanked values with 0, then add.
NYC_long['start station longitude'] = NYC_long['start station longitude'].fillna(0)
NYC_long['end station longitude'] = NYC_long['end station longitude'].fillna(0)

NYC_lon = NYC_long['start station longitude'] + NYC_long['end station longitude']

NYC_lon = NYC_lon.to_frame(name='longitude')

NYC_lon.head()

Unnamed: 0,longitude
0,-73.960876
1,-73.963888
2,-73.978985
3,-73.999947
4,-73.980144


In [483]:
NYC_lon.tail()

Unnamed: 0,longitude
276793,-73.972826
276794,-74.003664
276795,-73.976206
276796,-73.941342
276797,-73.993722


In [484]:
NYC_lon.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 553596 entries, 0 to 276797
Data columns (total 1 columns):
longitude    553596 non-null float64
dtypes: float64(1)
memory usage: 8.4 MB


In [485]:
NYC_long.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 553596 entries, 0 to 276797
Data columns (total 21 columns):
tripduration               553596 non-null int64
starttime                  553596 non-null object
stoptime                   553596 non-null object
start station id           553596 non-null int64
start station name         553596 non-null object
start station latitude     553596 non-null float64
start station longitude    553596 non-null float64
end station id             553596 non-null int64
end station name           553596 non-null object
end station latitude       553596 non-null float64
end station longitude      553596 non-null float64
bikeid                     553596 non-null int64
usertype                   552162 non-null object
birth year                 490274 non-null float64
gender                     553596 non-null int64
duration                   553596 non-null float64
month                      553596 non-null int64
hour                       553596 non-n

In [486]:
NYC_lon.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 553596 entries, 0 to 276797
Data columns (total 1 columns):
longitude    553596 non-null float64
dtypes: float64(1)
memory usage: 8.4 MB


In [487]:
NYC_geo = NYC_long

NYC_geo['latitude'] = NYC_lat
NYC_geo['longitude'] = NYC_lon

NYC_geo.head()

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,...,birth year,gender,duration,month,hour,day_of_week,user_type,Origin-Destination,latitude,longitude
0,839,1/1/2016 00:09:55,1/1/2016 00:23:54,532,S 5 Pl & S 4 St,40.710451,-73.960876,401,Allen St & Rivington St,0.0,...,,0,13.983333,1,0,Friday,Customer,Origin,40.710451,-73.960876
1,686,1/1/2016 00:21:17,1/1/2016 00:32:44,3143,5 Ave & E 78 St,40.776829,-73.963888,3132,E 59 St & Madison Ave,0.0,...,1960.0,1,11.433333,1,0,Friday,Subscriber,Origin,40.776829,-73.963888
2,315,1/1/2016 00:33:11,1/1/2016 00:38:26,3164,Columbus Ave & W 72 St,40.777057,-73.978985,3178,Riverside Dr & W 78 St,0.0,...,1971.0,1,5.25,1,0,Friday,Subscriber,Origin,40.777057,-73.978985
3,739,1/1/2016 00:40:51,1/1/2016 00:53:11,223,W 13 St & 7 Ave,40.737815,-73.999947,276,Duane St & Greenwich St,0.0,...,1969.0,1,12.316667,1,0,Friday,Subscriber,Origin,40.737815,-73.999947
4,1253,1/1/2016 00:44:16,1/1/2016 01:05:09,484,W 44 St & 5 Ave,40.755003,-73.980144,151,Cleveland Pl & Spring St,0.0,...,,0,20.883333,1,0,Friday,Customer,Origin,40.755003,-73.980144


In [488]:
NYC_geo['latitude'].tail()

276793    40.752554
276794    40.743174
276795    40.775794
276796    40.698617
276797    40.756458
Name: latitude, dtype: float64
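The cells above turn each trip into two rows, one tagged 'Origin' and one tagged 'Destination', each with a single latitude/longitude pair. As a cross-check of that logic, here is a minimal sketch of the same reshaping on a two-trip toy frame. It picks the coordinates by column label instead of blanking and summing columns, which is a swapped-in approach and not what the notebook cells do. The coordinates are made up, and the `toy`, `origin`, `destination`, and `toy_geo` names are hypothetical.

```python
import pandas as pd

# Made-up coordinates for two trips (illustration only).
toy = pd.DataFrame({
    'start station latitude': [40.71, 40.78],
    'start station longitude': [-73.96, -73.96],
    'end station latitude': [40.72, 40.76],
    'end station longitude': [-73.99, -73.97],
})

# One copy per role, each carrying only the coordinates for that role.
origin = toy.assign(**{'Origin-Destination': 'Origin',
                       'latitude': toy['start station latitude'],
                       'longitude': toy['start station longitude']})
destination = toy.assign(**{'Origin-Destination': 'Destination',
                            'latitude': toy['end station latitude'],
                            'longitude': toy['end station longitude']})

# Stack the two halves; ignore_index gives every row a unique label.
toy_geo = pd.concat([origin, destination], ignore_index=True)
print(toy_geo[['Origin-Destination', 'latitude', 'longitude']])
```

Selecting by label leaves the original start and end columns untouched, so there is no need to write NaN or 0 placeholders into them.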

In [489]:
NYC_geo.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 553596 entries, 0 to 276797
Data columns (total 23 columns):
tripduration               553596 non-null int64
starttime                  553596 non-null object
stoptime                   553596 non-null object
start station id           553596 non-null int64
start station name         553596 non-null object
start station latitude     553596 non-null float64
start station longitude    553596 non-null float64
end station id             553596 non-null int64
end station name           553596 non-null object
end station latitude       553596 non-null float64
end station longitude      553596 non-null float64
bikeid                     553596 non-null int64
usertype                   552162 non-null object
birth year                 490274 non-null float64
gender                     553596 non-null int64
duration                   553596 non-null float64
month                      553596 non-null int64
hour                       553596 non-n

In [490]:
WAS_long.head()

Unnamed: 0,trip_id,starttime,stoptime,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear,duration,month,hour,day_of_week,user_type
0,9080545,3/31/2016 23:30,3/31/2016 23:46,2295,926,156,Clark St & Wellington Ave,166,Ashland Ave & Wrightwood Ave,Subscriber,Male,1990.0,15.433333,3,23,Thursday,Subscriber
1,9080521,3/31/2016 22:59,3/31/2016 23:02,3439,198,259,California Ave & Francis Pl,276,California Ave & North Ave,Subscriber,Male,1974.0,3.3,3,22,Thursday,Subscriber
2,9080479,3/31/2016 22:24,3/31/2016 22:26,4337,124,344,Ravenswood Ave & Lawrence Ave,242,Damen Ave & Leland Ave,Subscriber,Female,1992.0,2.066667,3,22,Thursday,Subscriber
3,9080475,3/31/2016 22:22,3/31/2016 22:41,3760,1181,318,Southport Ave & Irving Park Rd,458,Broadway & Thorndale Ave,Subscriber,Female,1979.0,19.683333,3,22,Thursday,Subscriber
4,9080443,3/31/2016 22:08,3/31/2016 22:19,1270,656,345,Lake Park Ave & 56th St,426,Ellis Ave & 60th St,Subscriber,Female,1997.0,10.933333,3,22,Thursday,Subscriber


In [491]:
CHI_long.head()

Unnamed: 0,Duration (ms),Start date,End date,Start station number,Start station,End station number,End station,Bike number,Member Type,duration,month,hour,day_of_week,user_type
0,427387,3/31/2016 22:57,3/31/2016 23:04,31602,Park Rd & Holmead Pl NW,31207,Georgia Ave and Fairmont St NW,W20842,Registered,7.123117,3,22,Thursday,Subscriber
1,587551,3/31/2016 22:46,3/31/2016 22:56,31105,14th & Harvard St NW,31266,11th & M St NW,W21385,Registered,9.792517,3,22,Thursday,Subscriber
2,397979,3/31/2016 22:46,3/31/2016 22:53,31634,3rd & Tingey St SE,31108,4th & M St SW,W00773,Registered,6.632983,3,22,Thursday,Subscriber
3,444282,3/31/2016 22:42,3/31/2016 22:50,31200,Massachusetts Ave & Dupont Circle NW,31201,15th & P St NW,W21397,Registered,7.4047,3,22,Thursday,Subscriber
4,780875,3/31/2016 22:21,3/31/2016 22:34,31203,14th & Rhode Island Ave NW,31604,3rd & H St NW,W00213,Registered,13.014583,3,22,Thursday,Subscriber


In [492]:
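NYC_geo.to_csv('NYC_bikeshare_master.csv')
# WAS_long holds the Chicago trips and CHI_long the Washington trips (see the
# head() outputs above), so each frame is written to the file named for the
# city it actually contains.
WAS_long.to_csv('CHI_bikeshare_master.csv')
CHI_long.to_csv('WAS_bikeshare_master.csv')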