# Explore US Bikeshare Data

## Project Detail

### Bike Share Data
Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, you will use data provided by [Motivate](https://www.motivateco.com/), a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

### The Datasets
Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

> * Start Time (e.g., 2017-01-01 00:07:57)
> * End Time (e.g., 2017-01-01 00:20:53)
> * Trip Duration (in seconds - e.g., 776)
> * Start Station (e.g., Broadway & Barry Ave)
> * End Station (e.g., Sedgwick St & North Ave)
> * User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

> * Gender
> * Birth Year


### Statistics Computed

You will learn about bike share use in Chicago, New York City, and Washington by computing a variety of descriptive statistics. In this project, you'll write code to provide the following information:

#1 Popular times of travel (i.e., occurs most often in the start time)

* most common month
* most common day of week
* most common hour of day

#2 Popular stations and trip

* most common start station
* most common end station
* most common trip from start to end (i.e., most frequent combination of start station and end station)
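
One possible way to find the most frequent start/end combination is to group on both station columns and count the rows in each group. The sketch below uses a made-up five-row frame; the station names and the `trips` variable are illustrative, not taken from the real files.

```python
import pandas as pd

# Toy data standing in for a city file; station names are invented.
trips = pd.DataFrame({
    'Start Station': ['A St', 'A St', 'B Ave', 'A St', 'A St'],
    'End Station':   ['B Ave', 'B Ave', 'A St', 'C Blvd', 'B Ave'],
})

# Count the rows for every (start, end) pair.
pair_counts = trips.groupby(['Start Station', 'End Station']).size()

# idxmax() returns the index label of the largest count, here a (start, end) tuple.
most_common_trip = pair_counts.idxmax()
trip_count = int(pair_counts.max())

print(most_common_trip, trip_count)
```

Grouping keeps the two station names separate, which avoids the ambiguity of gluing them into one string.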

#3 Trip duration

* total travel time
* average travel time
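
Because Trip Duration is stored in seconds, totals are easier to read once they are split into hours, minutes, and seconds. A minimal sketch using the built-in `divmod()`; the helper name `format_duration` is an assumption made for this example.

```python
def format_duration(total_seconds):
    """Split a number of seconds into whole hours, minutes, and seconds."""
    total_seconds = int(total_seconds)
    # divmod returns (quotient, remainder) in one step.
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{} hours {} minutes {} seconds'.format(hours, minutes, seconds)

print(format_duration(776))   # the example trip duration from the dataset description
print(format_duration(3661))
```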

#4 User info

* counts of each user type
* counts of each gender (only available for NYC and Chicago)
* earliest, most recent, most common year of birth (only available for NYC and Chicago)
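
Since Gender and Birth Year exist only in the NYC and Chicago files, the user statistics need to cope with a missing column. The sketch below checks `df.columns` before touching either one; `describe_users` and the two toy frames are hypothetical names used only for illustration.

```python
import pandas as pd

def describe_users(df):
    """Return user statistics as a dict, skipping columns the file lacks."""
    stats = {'user_types': df['User Type'].value_counts().to_dict()}
    if 'Gender' in df.columns:
        stats['genders'] = df['Gender'].value_counts().to_dict()
    if 'Birth Year' in df.columns:
        years = df['Birth Year'].dropna()
        if not years.empty:
            stats['earliest_birth_year'] = int(years.min())
            stats['latest_birth_year'] = int(years.max())
            stats['most_common_birth_year'] = int(years.mode()[0])
    return stats

# A frame shaped like the Washington file (no Gender or Birth Year).
washington_like = pd.DataFrame({'User Type': ['Subscriber', 'Customer', 'Subscriber']})

# A frame shaped like the Chicago file, including one missing gender value.
chicago_like = washington_like.copy()
chicago_like['Gender'] = ['Male', 'Female', None]
chicago_like['Birth Year'] = [1992.0, 1985.0, 1992.0]

print(describe_users(washington_like))
print(describe_users(chicago_like))
```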

### The Files

To answer these questions using Python, you will need to write a Python script. To help guide your work in this project, a template with helper code and comments is provided in a bikeshare.py file, and you will write your code there. You will also need the three city dataset files:

* chicago.csv
* new_york_city.csv
* washington.csv

All four of these files are zipped up in the Bikeshare file in the resource tab in the sidebar on the left side of this page. You may download and open up that zip file to do your project work on your local machine.

Some versions of this project also include a Project Workspace page in the classroom where the bikeshare.py file and the city dataset files are all included, and you can do all your work with them there.

### Understanding the Data
Let's use pandas to better understand the bike share data!

* What columns are in this dataset?
* Are there any missing values?
* What are the different types of values in each column?


Some useful pandas methods:

* df.head()
* df.columns
* df.describe()
* df.info()
* df['column_name'].value_counts()
* df['column_name'].unique()
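
As a rough illustration of how these methods answer the three questions above, the sketch below reads a three-row stand-in for a city file from an in-memory string. The rows are invented for the example (only the first mirrors the sample values in the dataset description), so the real files will give different results.

```python
import io
import pandas as pd

# A tiny stand-in for a city CSV; the second row has no Gender or Birth Year.
sample_csv = io.StringIO(
    "Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year\n"
    "2017-01-01 00:07:57,2017-01-01 00:20:53,776,Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber,Male,1992\n"
    "2017-01-02 08:15:00,2017-01-02 08:30:00,900,Clark St & Randolph St,Wood St & Taylor St,Customer,,\n"
    "2017-01-03 17:45:10,2017-01-03 18:00:10,900,Broadway & Barry Ave,Clark St & Randolph St,Subscriber,Female,1985\n"
)
df = pd.read_csv(sample_csv)

# What columns are in this dataset?
columns = list(df.columns)
print(columns)

# Are there any missing values? Count the nulls in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# What are the different types of values in each column?
print(df.dtypes)
user_type_counts = df['User Type'].value_counts()
print(user_type_counts)
```

A column that contains missing numbers, such as Birth Year here, is read as floating point, which is why birth years print as values like 1992.0.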


### Practice Problem #1: Compute the Most Popular Start Hour
Use pandas to load chicago.csv into a dataframe, and find the most frequent hour when people start traveling. There isn't an hour column in this dataset, but you can create one by extracting the hour from the "Start Time" column. To do this, convert "Start Time" to the datetime datatype with the pandas to_datetime() function, then read properties such as the hour through the column's .dt accessor.

Hint: Another way to describe the most common value in a column is the mode.

In [114]:
import pandas as pd

# load data file into a dataframe (read_csv already returns a DataFrame)
df = pd.read_csv('chicago.csv')

# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'], yearfirst=True)

# extract hour from the Start Time column to create an hour column
df['hour'] = df['Start Time'].dt.hour

# find the most common hour (from 0 to 23)
popular_hour = df['hour'].mode()[0]

print('Most Frequent Start Hour:', popular_hour)
print(df.head())

Most Frequent Start Hour: 17
   Unnamed: 0          Start Time             End Time  Trip Duration  \
0     1423854 2017-06-23 15:09:32  2017-06-23 15:14:53            321   
1      955915 2017-05-25 18:19:03  2017-05-25 18:45:53           1610   
2        9031 2017-01-04 08:27:49  2017-01-04 08:34:45            416   
3      304487 2017-03-06 13:49:38  2017-03-06 13:55:28            350   
4       45207 2017-01-17 14:53:07  2017-01-17 15:02:01            534   

                   Start Station                   End Station   User Type  \
0           Wood St & Hubbard St       Damen Ave & Chicago Ave  Subscriber   
1            Theater on the Lake  Sheffield Ave & Waveland Ave  Subscriber   
2             May St & Taylor St           Wood St & Taylor St  Subscriber   
3  Christiana Ave & Lawrence Ave  St. Louis Ave & Balmoral Ave  Subscriber   
4         Clark St & Randolph St  Desplaines St & Jackson Blvd  Subscriber   

   Gender  Birth Year  hour  
0    Male      1992.0    15  
1  

### Practice Problem #2: Display a Breakdown of User Types
There are different types of users specified in the "User Type" column. Find how many there are of each type and store the counts in a pandas Series in the user_types variable.

Hint: What pandas function returns a Series with the counts of each unique value in a column?

In [87]:
# print value counts for each user type
user_types = df['User Type'].value_counts()


print(user_types)

Subscriber    330
Customer       70
Name: User Type, dtype: int64


### Practice Problem #3: Load and Filter the Dataset
This is a bit of a bigger task, which involves choosing a dataset to load and filtering it based on a specified month and day. In the quiz below, you'll implement the load_data() function, which you can use directly in your project. There are four steps:

1. Load the dataset for the specified city. Index the global CITY_DATA dictionary object to get the corresponding filename for the given city name.

2. Create month and day_of_week columns. Convert the "Start Time" column to datetime and extract the month number and weekday name into separate columns using the pandas .dt accessor.

3. Filter by month. Since the month parameter is given as the name of the month, you'll need to first convert this to the corresponding month number. Then, select rows of the dataframe that have the specified month and reassign this as the new dataframe.

4. Filter by day of week. Select rows of the dataframe that have the specified day of week and reassign this as the new dataframe. (Note: Capitalize the day parameter with the title() method to match the title case used in the day_of_week column!)
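
Steps 2 to 4 can be tried out on a small in-memory frame before touching the CSV files. The sketch below is one way to do it; `filter_trips`, `MONTHS`, and the four timestamps are assumptions made for this example, and it uses `dt.day_name()` for the weekday name.

```python
import pandas as pd

MONTHS = ['january', 'february', 'march', 'april', 'may', 'june']

def filter_trips(df, month='all', day='all'):
    """Add month and day_of_week columns, then filter by month and/or day name."""
    df = df.copy()
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()
    if month != 'all':
        # Position in the list plus one gives the month number.
        df = df[df['month'] == MONTHS.index(month.lower()) + 1]
    if day != 'all':
        # day_name() yields title case, e.g. 'Friday'.
        df = df[df['day_of_week'] == day.title()]
    return df

trips = pd.DataFrame({'Start Time': [
    '2017-03-24 15:35:55',  # a Friday in March
    '2017-03-31 08:25:53',  # a Friday in March
    '2017-03-06 13:49:38',  # a Monday in March
    '2017-01-17 14:53:07',  # a Tuesday in January
]})

march_fridays = filter_trips(trips, 'march', 'friday')
print(march_fridays)
```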


In [123]:
import pandas as pd

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - pandas DataFrame containing city data filtered by month and day
    """
    
    # load data file into a dataframe (read_csv already returns a DataFrame)
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'], yearfirst=True)

    # extract month and day of week from Start Time to create new columns
    # (dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0)
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()

    
    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) +1
    
        # filter by month to create the new dataframe
        df = df[df['month']==month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    
    return df
    
df = load_data('chicago', 'march', 'friday')
print(df.head())


     Unnamed: 0          Start Time             End Time  Trip Duration  \
37       395803 2017-03-24 15:35:55  2017-03-24 15:46:10            615   
93       395735 2017-03-24 15:32:04  2017-03-24 15:52:53           1249   
175      395402 2017-03-24 15:10:29  2017-03-24 15:19:44            555   
190      393400 2017-03-24 12:29:30  2017-03-24 12:48:56           1166   
198      427496 2017-03-31 08:25:53  2017-03-31 08:39:09            796   

                      Start Station                      End Station  \
37            Dearborn St & Erie St          State St & Van Buren St   
93        Sedgwick St & Webster Ave      Western Ave & Winnebago Ave   
175         Franklin St & Monroe St          Aberdeen St & Monroe St   
190  Southport Ave & Wellington Ave       Lake Shore Dr & North Blvd   
198       Clinton St & Jackson Blvd  Racine Ave (May St) & Fulton St   

      User Type  Gender  Birth Year  month day_of_week  
37   Subscriber    Male      1989.0      3      Friday  
93

In [102]:
CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

CITY_DATA['chicago']

'chicago.csv'

In [None]:
import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
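    print('Hello! Let\'s explore some US bikeshare data!')

    # get user input for city; keep asking until the answer is a key of CITY_DATA
    while True:
        city = input("Would you like to see data for Chicago, New York City, or Washington? : ").strip().lower()
        if city in CITY_DATA:
            break
        print("Sorry, that is not one of the available cities.")

    # ask which time filter to apply; keep asking until the answer is recognized
    while True:
        date_type = input("Would you like to filter the data by month, day, or not at all? (Type 'none' for no time filter) : ").strip().lower()
        if date_type in ('month', 'day', 'none'):
            break
        print("Please type month, day, or none.")

    # default to no filter, then keep asking for the chosen filter until it is valid
    months = ['january', 'february', 'march', 'april', 'may', 'june']
    days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
    month = 'all'
    day = 'all'
    if date_type == 'month':
        while month not in months:
            month = input("Which month - January, February, March, April, May, or June? : ").strip().lower()
    elif date_type == 'day':
        while day not in days:
            day = input("Which day - Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday? : ").strip().lower()

    print('-'*40)
    return city, month, day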


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city.lower()])

    # convert Start Time to datetime and extract month, weekday name, and hour
    # (dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0)
    df['Start Time'] = pd.to_datetime(df['Start Time'], yearfirst=True)
    df['s_month'] = df['Start Time'].dt.month
    df['s_day_of_week'] = df['Start Time'].dt.day_name()
    df['s_hour'] = df['Start Time'].dt.hour

    # do the same for End Time
    df['End Time'] = pd.to_datetime(df['End Time'], yearfirst=True)
    df['e_month'] = df['End Time'].dt.month
    df['e_day_of_week'] = df['End Time'].dt.day_name()
    df['e_hour'] = df['End Time'].dt.hour


    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month.lower()) + 1
    
        # filter by start month to create the new dataframe
        df = df[df['s_month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['s_day_of_week'] == day.title()]

    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month
    print("The most popular month is {}.".format(df.s_month.mode()[0]))
    
    # display the most common day of week
    print("The most popular day is {}.".format(df.s_day_of_week.mode()[0]))

    # display the most common start hour
    print("The most popular start hour is {}.".format(df.s_hour.mode()[0]))

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    print("The most popular start station is {}.".format(df['Start Station'].mode()[0]))

    # display most commonly used end station
    print("The most popular end station is {}.".format(df['End Station'].mode()[0]))

    # display most frequent combination of start station and end station trip
    # (join the two names with a separator so the pair stays readable)
    trips = df['Start Station'] + ' to ' + df['End Station']
    print("The most popular trip is {}.".format(trips.mode()[0]))
    
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time (Trip Duration is stored in seconds)
    total_time = int(np.sum(df['Trip Duration']))
    hours, remainder = divmod(total_time, 3600)
    minutes, seconds = divmod(remainder, 60)
    print("The total travel time is {} hours {} minutes {} seconds.".format(hours, minutes, seconds))

    # display mean travel time
    mean_time = int(np.mean(df['Trip Duration']))
    hours, remainder = divmod(mean_time, 3600)
    minutes, seconds = divmod(remainder, 60)
    print("The mean travel time is {} hours {} minutes {} seconds.".format(hours, minutes, seconds))
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types
    user_types = df['User Type'].value_counts()
    print("User Type")
    print(user_types)
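    # Display counts of gender (only available for NYC and Chicago)
    if 'Gender' in df.columns:
        gender_types = df['Gender'].value_counts()
        print("Gender")
        print(gender_types)
    else:
        print("Gender data is not available for this city.")

    # Display earliest, most recent, and most common year of birth
    # (only available for NYC and Chicago)
    if 'Birth Year' in df.columns:
        earliest = int(df['Birth Year'].min())
        most_recent = int(df['Birth Year'].max())
        common_year = int(df['Birth Year'].mode()[0])
        print("The earliest year of birth is {}.".format(earliest))
        print("The most recent year of birth is {}.".format(most_recent))
        print("The most common year of birth is {}.".format(common_year))
    else:
        print("Birth year data is not available for this city.")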
          
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()
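
The template's hint about using a while loop for invalid inputs can be sketched as a small reusable helper. The function name `ask_choice` and its `input_fn` parameter are assumptions for this example; passing the input function in makes the loop easy to try with scripted replies instead of a keyboard.

```python
def ask_choice(prompt, valid_choices, input_fn=input):
    """Keep asking until the reply (ignoring case and spaces) is a valid choice."""
    while True:
        reply = input_fn(prompt).strip().lower()
        if reply in valid_choices:
            return reply
        print('Please enter one of: {}'.format(', '.join(valid_choices)))

# Demonstration with scripted replies: the first is rejected, the second accepted.
scripted = iter(['Boston', '  CHICAGO '])
city = ask_choice(
    'Would you like to see data for Chicago, New York City, or Washington? : ',
    ['chicago', 'new york city', 'washington'],
    input_fn=lambda _prompt: next(scripted),
)
print(city)
```

In the script itself the default `input_fn=input` would read from the keyboard, and the same helper could serve the city, month, and day prompts.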
