# Explore US Bikeshare Data

In this project, I used Python to explore data related to bike share systems for three major cities in the Unitied States - Chicago, New York City, and Washington.
I wrote code to import the data and answering interesting questions about it by computing descriptive statistics.
I also wrote a script that takes in raw input to create an interactive experience in the terminal to present these statistics.

### What Software I used? 

To complete this project, the following software requirements apply:
* You should have Python 3, Numpy, and pandas installed using Anaconda.
* A text editor, like Sublime or Atom(which I used).
* A terminal application (Terminal on Mac and Linux or Windows)

<img src = 'divvy.jpg' alt='alt text' title='title'/>

## Bike Share Data overview

Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, I used data provided by motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. I compared the system usage between three large cities: Chicago, New York City, and Washington, DC.

### The Datasets

Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:
* Start Time (e.g., 2017-01-01 00:07:57)
* End Time (e.g., 2017-01-01 00:20:53)
* Trip Duration (in seconds - e.g., 776)
* Start Station (e.g., Broadway & Barry Ave)
* End Station (e.g., Sedgwick St & North Ave)
* User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:
* Gender
* Birth Year

In [79]:
import time
import pandas as pd
import numpy as np
from datetime import datetime

In [80]:
CITY_DATA={ 'chicago': 'chicago.csv',
            'new york city': 'new_york_city.csv',
            'washington': 'washington.csv' }

In [81]:
months=['january', 'february', 'march', 'april', 'may', 'june',
      'july', 'august', 'september', 'october', 'november', 'december']
days=['saturday', 'sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday']
cities=['chicago', 'new york city', 'washington']

In [82]:
month_dic={1:'January',
           2:'February',
           3:'March',
           4:'April',
           5:'May',
           6:'June',
           7:'July',
           8:'August',
           9:'September',
           10:'October',
           11:'November',
           12:'December'}

In [83]:
def convert_hours(hrs):
    '''
    Function Usage: Convert the input hours to 12 hours based system
    PARAMETER: the hours we want to convert
    '''
    s = str(hrs)
    if hrs>12:
        return str(hrs-12)+' pm'
    elif hrs==12:
        return s+' pm'
    else:
        return s+' am'

In [84]:
def user_input(num):
    '''
    Function Usage : To get & validate user input
    PARAMETER: num >> used for specifying which input we are using(e.g. city, day, month)
    '''
    input_dict={0: [cities, 'city'],
                1: [days, 'day'],
                2: [months, 'month']}
    while True:
        input_str = input("Please enter {} name: ".format(input_dict[num][1])).lower()
        if input_str in input_dict[num][0] or input_str == 'all':
            return input_str
        else:
            print("Invalid input please enter the {} name".format(input_dict[num][1]))

In [85]:
def get_filters():
    print('Hello! Let\'s explore some US bikeshare data!')
    input_city = user_input(0)
    input_weekday = user_input(1)
    input_month = user_input(2)
    print('-'*40)

    return input_city, input_month, input_weekday

In [86]:
def load_data(city, month, day):
    data = pd.read_csv(CITY_DATA[city])
    data['Start Time'] = pd.to_datetime(data['Start Time'])
    data['month'] = data['Start Time'].dt.month
    data['day_of_week'] = data['Start Time'].dt.day
    #Checking whether the use want to display all (months, days) or specified ones
    if month != 'all':
        month = months.index(month) + 1
        data = data[data['month'] ==  month]
    #all months
    else:
        month = np.arange(0, 13)
        data = data[data['month'].isin(month)]

    if day != 'all':
        day = days.index(day) + 1
        data = data[data['day_of_week'] == day]
    #all days
    else:
        day = np.arange(0, 8)
        data=data[data['day_of_week'].isin(day)]

    return data

In [87]:
def time_stats(df):
    """
    Function Usage: Displays statistics on the most frequent times of travel.
    """
    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()
    # display the most common month
    print("the most common month is: ", month_dic[df.month.mode()[0]])
    # display the most common day of week
    print("the most common day of week is: ", df.day_of_week.mode()[0])
    # display the most common start hour
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    print("the most common start hour is: ", convert_hours(df['Start Time'].dt.hour.mode()[0]))
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)

In [88]:
def station_stats(df):
    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # TO DO: display most commonly used start station
    print("The most common start station is: ",df['Start Station'].mode()[0])
    # TO DO: display most commonly used end station
    print("The most common end station is: ",df['End Station'].mode()[0])
    # TO DO: display most frequent combination of start station and end station trip
    print("The most frequent start and end station is: ",
        " / ".join(df.groupby(['Start Station', 'End Station']).size().idxmax()))
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


In [89]:
def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""
    print('\nCalculating Trip Duration...\n')
    start_time = time.time()
    # TO DO: display total travel time
    print("Total travel time is: ",df['Trip Duration'].sum())
    # TO DO: display mean travel time
    print("Average travel time is: ",df['Trip Duration'].mean())
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


In [90]:
def user_stats(df):
    """Displays statistics on bikeshare users."""
    print('\nCalculating User Stats...\n')
    start_time = time.time()
    # TO DO: Display counts of user types
    print("Counts of user types", df['User Type'].value_counts())
    # TO DO: Display counts of gender
    print("\n")
    if 'Gender' and 'Birth Year' in df:
        print("Counts of user gender", df['Gender'].value_counts())
        print("\n")
        # TO DO: Display earliest, most recent, and most common year of birth
        print("Earliest year of birth", df['Birth Year'].min())
        print("Most recent year of birth",df['Birth Year'].max())
        print("Most common year of birth",df['Birth Year'].mode())
        print("\nThis took %s seconds." % (time.time() - start_time))
        print('-'*40)
    else:
        

SyntaxError: unexpected EOF while parsing (<ipython-input-90-b0f9f0884425>, line 19)

In [91]:
def main():
    cnt = 5
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)
        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)
        
        #While loop to check if the user wants to display raw data rows
        while True:
            answer = input("\n Would you like to see a few rows of raw data? Enter yes or no. \n")
            if answer.lower() == 'yes':
                print(df[:cnt])
                cnt+=5
                #Condition to check if we have reached the end of the dataframe
                if cnt > len(df):
                    print("Sorry, there is no more data to view! \n")
                    break
            elif answer.lower() == 'no':
                break
            else:
                print("Please enter a valid input (yes or no).\n")
        #Condition to check if the user want to start over again 
        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


In [92]:
if __name__ == "__main__":
	main()

Hello! Let's explore some US bikeshare data!
Please enter city name: chicago
Please enter day name: all
Please enter month name: all
----------------------------------------

Calculating The Most Frequent Times of Travel...

the most common month is:  June
the most common day of week is:  3
the most common start hour is:  5 pm

This took 0.016361713409423828 seconds.
----------------------------------------

Calculating The Most Popular Stations and Trip...

The most common start station is:  Streeter Dr & Grand Ave
The most common end station is:  Streeter Dr & Grand Ave
The most frequent start and end station is:  Lake Shore Dr & Monroe St / Streeter Dr & Grand Ave

This took 0.022796630859375 seconds.
----------------------------------------

Calculating Trip Duration...

Total travel time is:  51268592
Average travel time is:  870.493615865254

This took 0.0009129047393798828 seconds.
----------------------------------------

Calculating User Stats...

Counts of user types Subscrib