# Table of Contents

* [Introduction](#Introduction)
* [Ask](#Ask)
    * [Guiding questions](#ask-guiding)
    * [Key tasks](#ask-key)
    * [Deliverable](#Deliverable)
* [Prepare](#Prepare)
    * [Guiding questions](#prepare-guiding)
    * [Key tasks](#prepare-key)
    * [Deliverable](#prepare-deliverable)
* [Process](#process)
    * [Loading data](#process-code-loading)
        * [Data cleaning](#process-code-datacleaning)
        * [Manipulating the data](#process-code-manipulating)
    * [Guiding questions](#process-guiding)
    * [Key tasks](#process-key)
    * [Deliverable](#process-deliverable)
* [Analyze](#analyze)
    * [Type Of Ride](#analyze-ridetype)
    * [Year](#analyze-year)
    * [Month](#analyze-month)
    * [Week](#analyze-week)
    * [Day](#analyze-day)
    * [Casual vs Members](#analyze-casvmem)
    * [Guiding questions](#analyze-guiding)
    * [Key tasks](#analyze-key)
    * [Deliverable](#analyze-deliverable)
* [Share](#share)
    * [Guiding questions](#share-guiding)
    * [Key tasks](#share-key)
    * [Deliverable](#share-deliverable)
* [Act](#act)
    * [Guiding questions](#act-guiding)
    * [Key tasks](#act-key)
    * [Deliverable](#act-deliverable)
* [Conclusion](#conclusion)


## Introduction

Welcome to my version of the Google Data Analytics Capstone Project - Case Study 1.

You can find the complete information about the Capstone Project from [Google Data Analytics Capstone: Complete a Case Study](https://www.coursera.org/learn/google-data-analytics-capstone).

To answer the key business questions, I followed the steps of the data analysis process:
* Ask
* Prepare
* Process
* Analyze
* Share
* Act

<a id="Ask"></a>
## The Ask Phase

For the Ask step, let's first get some context from the Cyclistic document:

    Scenario
    
    You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
    
    Characters and teams
    Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
    Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
    Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
    Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.
    


<a id="ask-guiding"></a>
#### Guiding questions

* **What is the problem you are trying to solve?**

Identify how annual members and casual riders use Cyclistic bikes differently, and find the best strategies for converting casual riders into members.

* **How can your insights drive business decisions?**

The insights will help maximize the annual memberships by converting casual riders into annual members.

<a id="ask-key"></a>
#### Key Tasks
1. Identify the business task.
2. Consider key stakeholders.

#### Deliverable
* A clear statement of the business task

Increase the number of annual memberships by developing a new marketing approach that converts casual riders into annual members, based on an analysis of how annual members and casual riders use Cyclistic bicycles differently and of how digital media could affect the marketing tactics.

<a id="Prepare"></a>
## The Prepare Phase

The project will use the [data](https://divvy-tripdata.s3.amazonaws.com/index.html) provided by Google.

<a id="prepare-guiding"></a>
#### Guiding questions

* **Where is your data located?** 

The [data](https://divvy-tripdata.s3.amazonaws.com/index.html) is provided by Google.

* **How is the data organized?** 

The data is separated into months for each year and saved in csv format.

* **Are there issues with bias or credibility in this data? Does your data ROCCC?** 

There are no issues with bias or credibility, and the data meets the ROCCC requirements (Reliable, Original, Comprehensive, Current, Cited).

* **How are you addressing licensing, privacy, security, and accessibility?** 

Through the [license agreement](https://www.divvybikes.com/data-license-agreement), which makes the data public for use at your own risk.

* **How did you verify the data’s integrity?**

All the files have consistent columns, and each column has the correct data type. The [license agreement](https://www.divvybikes.com/data-license-agreement) also supports the reliability of the data and confirms its source.

* **How does it help you answer your question?** 

Provides Cyclistic’s historical trip data to analyze and identify trends.

* **Are there any problems with the data?**

The goal is to convert casual riders into annual members, yet the data focuses less on the rider and more on the company. Much more information about the riders would be needed to properly understand their thought processes. Apart from that, there is no problem with the data.

<a id="prepare-key"></a>
#### Key Tasks

1. Download data and store it appropriately.  
2. Identify how it’s organized.  
3. Sort and filter the data.  
4. Determine the credibility of the data.

<a id="prepare-deliverable"></a>
#### Deliverable
* A description of all data sources used

Twelve months of data (July 2020 to June 2021) were downloaded from a reliable open source whose license agreement allows the data to be used freely. The data is stored in csv format, with one file per month.

<a id="process"></a>
## The Process Phase

This step prepares the data for analysis. All the csv files will be merged into one data frame to improve the workflow.

In [None]:
# First load all the necessary libraries.

library("tidyverse")
library("ggplot2")
library("lubridate")
library("geosphere")
library("gridExtra") 
library("ggmap")

In [None]:
# Loading the data into their respective variables

tripdata_2020_07 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202007-divvy-tripdata.csv")
tripdata_2020_08 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202008-divvy-tripdata.csv")
tripdata_2020_09 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202009-divvy-tripdata.csv")
tripdata_2020_10 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202010-divvy-tripdata.csv")
tripdata_2020_11 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202011-divvy-tripdata.csv")
tripdata_2020_12 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202012-divvy-tripdata.csv")
tripdata_2021_01 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202101-divvy-tripdata.csv")
tripdata_2021_02 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202102-divvy-tripdata.csv")
tripdata_2021_03 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202103-divvy-tripdata.csv")
tripdata_2021_04 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202104-divvy-tripdata.csv")
tripdata_2021_05 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202105-divvy-tripdata.csv")
tripdata_2021_06 <- read.csv("../input/cyclistic-bike-share-data-202007-202106/202106-divvy-tripdata.csv")
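
As an alternative to one `read.csv()` call per month, the files can be discovered by their naming pattern and read in a loop. The sketch below is only an illustration in base R: the folder `tripdata_demo`, the two tiny files, and the station id `TA1307000039` are made-up stand-ins, not the real dataset.

```r
# Hypothetical demo folder with two tiny monthly files (not the real Divvy data)
data_dir <- file.path(tempdir(), "tripdata_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), start_station_id = c(101, 102)),
          file.path(data_dir, "202007-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", start_station_id = "TA1307000039"),
          file.path(data_dir, "202012-divvy-tripdata.csv"), row.names = FALSE)

# Find every monthly file by its naming pattern instead of listing paths by hand
files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)

# Read each file into a list and name every element after its yyyymm prefix
monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(monthly) <- substr(basename(files), 1, 6)

# Row count per month, a quick sanity check before merging
rows_per_month <- sapply(monthly, nrow)
print(rows_per_month)
```

Keeping the months in a named list makes it easy to inspect each file before deciding how to merge them.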

In [None]:
# Create a small helper to make it easier to change figure sizes throughout the notebook.
fig <- function(width, height){
     options(repr.plot.width = width, repr.plot.height = height)
}
# Setting the plot size
fig(16,8)

<a id="process-code-datacleaning"></a>
#### Data Cleaning

 Note : 

While exploring the data after importing it, I found that the files from July 2020 through November 2020 store start_station_id and end_station_id as integers, whereas the files from December 2020 through June 2021 store those two columns as characters. This mismatch caused an error when merging the datasets.

In [None]:
#Join the data from July to November 2020 and change the type of two columns (start_station_id,end_station_id) to character to match the later files:
tripdata_int <- bind_rows(tripdata_2020_07,tripdata_2020_08,tripdata_2020_09,tripdata_2020_10,
                          tripdata_2020_11)

tripdata_int <- mutate(tripdata_int, start_station_id = as.character(start_station_id),
                        end_station_id = as.character(end_station_id))

tripdata_char <- bind_rows(tripdata_2020_12,tripdata_2021_01,tripdata_2021_02,
                           tripdata_2021_03,tripdata_2021_04,tripdata_2021_05,tripdata_2021_06)

#Then join all the data into one variable
final_tripdata <- bind_rows(tripdata_int, tripdata_char)
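
A type mismatch like the one described in the note can be spotted before merging by comparing column classes across the data frames. This is a minimal base R sketch; `trips_nov` and `trips_dec` are hypothetical one-row stand-ins for two monthly files.

```r
# Hypothetical stand-ins for two monthly data frames with different id types
trips_nov <- data.frame(ride_id = "A1", start_station_id = 101L,
                        stringsAsFactors = FALSE)
trips_dec <- data.frame(ride_id = "B1", start_station_id = "TA1307000039",
                        stringsAsFactors = FALSE)
frames <- list(nov = trips_nov, dec = trips_dec)

# One row per data frame, one column per variable, each cell holding the class
class_table <- t(sapply(frames, function(df) sapply(df, class)))
print(class_table)

# Columns whose class is not identical across all data frames
differs <- apply(class_table, 2, function(x) length(unique(x)) > 1)
mismatched <- colnames(class_table)[differs]

# Convert the mismatched id column to character everywhere, then merge
frames <- lapply(frames, function(df) {
  df$start_station_id <- as.character(df$start_station_id)
  df
})
merged <- do.call(rbind, frames)
```

Converting to character rather than integer is the safer direction here, since alphanumeric ids cannot be represented as integers.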

In [None]:
# Let's check the data:
print("Glimpse Of The Dataset")
glimpse(final_tripdata)

print("Summary Of The Dataset")
summary(final_tripdata)

**Dropping NA Values**

Note :

As a rule of thumb, one way to deal with missing values in a large dataset is to delete the affected rows, provided that no more than 5% of the total number of rows is removed.

For this dataset, 5% of 2,861,693 rows is about 143,085 rows. Since the number of rows with NA values is below that limit, we can drop them from the dataset.
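
The 5% check can be computed directly instead of being read off `summary()`. The sketch below uses a made-up 100-row data frame; the column names mirror the trip data, but the values are illustrative only. It also counts blank strings separately, because `read.csv()` keeps empty text cells as `""` rather than `NA`, so `drop_na()` would not remove them.

```r
# Hypothetical 100-row sample: two missing coordinates and one blank station name
trips <- data.frame(
  ride_id = sprintf("R%03d", 1:100),
  end_lat = c(rep(41.9, 98), NA, NA),
  end_station_name = c("", rep("Clark St", 99)),
  stringsAsFactors = FALSE
)

# Rows that contain at least one NA, as a count and as a share of all rows
rows_with_na <- sum(!complete.cases(trips))
na_share <- rows_with_na / nrow(trips) * 100

# TRUE when dropping those rows stays within the 5% rule of thumb
within_limit <- na_share <= 5

# Blank strings are not NA, so they have to be counted on their own
blank_names <- sum(trips$end_station_name == "")
```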

In [None]:
# Dropping the NA values
final_tripdata <- drop_na(final_tripdata)

# Viewing the dataset after dropping the NA values
summary(final_tripdata)

**Removing Duplicates**

In [None]:
# Checking for duplicate rows
summary(distinct(final_tripdata))

Note :

There are no duplicate rows present in the dataset.
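
A more direct way to confirm this is to count duplicates with `duplicated()`, both for whole rows and for the ride identifier alone. This is a small base R sketch on made-up rows, where `A2` is deliberately repeated.

```r
# Hypothetical sample in which ride A2 appears twice
trips <- data.frame(
  ride_id = c("A1", "A2", "A2", "A3"),
  rideable_type = c("docked_bike", "classic_bike", "classic_bike", "docked_bike"),
  stringsAsFactors = FALSE
)

# Number of rows that repeat an earlier row exactly
n_duplicate_rows <- sum(duplicated(trips))

# Number of ride ids that repeat an earlier ride id
n_duplicate_ids <- sum(duplicated(trips$ride_id))

# Keep only the first occurrence of each row
trips_unique <- trips[!duplicated(trips), ]
```

On the real data, a result of zero for both counts would back up the statement that there are no duplicates.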

**Parsing DateTime Columns**

In [None]:
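# Pass the pattern as `format`; the second positional argument of as.POSIXct() is `tz`
final_tripdata$started_at <- as.POSIXct(final_tripdata$started_at, format = "%Y-%m-%d %H:%M:%S")
final_tripdata$ended_at   <- as.POSIXct(final_tripdata$ended_at, format = "%Y-%m-%d %H:%M:%S")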
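
After parsing, it is worth checking how many timestamps failed to convert, since `as.POSIXct()` returns `NA` for strings that do not match the given format. The sketch below uses three made-up strings, and the `tz = "UTC"` choice is an assumption made only to keep the example reproducible.

```r
# Two well-formed timestamps and one deliberately malformed value
started_raw <- c("2020-07-09 15:22:02", "2020-07-24 23:56:30", "not a timestamp")

# Name the format argument explicitly and fix the time zone for reproducibility
started_at <- as.POSIXct(started_raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Values that could not be parsed become NA
n_failed <- sum(is.na(started_at))

# The hour can then be extracted from the parsed values
first_hour <- format(started_at[1], "%H")
```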

<a id="process-code-manipulating"></a>
#### Manipulating The Data

Adding derived columns now makes the later calculations simpler and faster.

In [None]:
# Viewing the dataset
head(final_tripdata,2)

# Adding the column ride_length_m, which stores the total time of a bike ride in minutes.
# difftime() with explicit units avoids relying on the unit that `-` picks automatically.
final_tripdata <- final_tripdata %>%
    mutate(ride_length_m = as.numeric(difftime(ended_at, started_at, units = "mins")))
summary(final_tripdata$ride_length_m)

# Adding the columns date, month, day, year, day of the week and start hour to the dataset
final_tripdata$date <- as.Date(final_tripdata$started_at) #The default format is yyyy-mm-dd
final_tripdata$month <- format(as.Date(final_tripdata$date), "%m")
final_tripdata$day <- format(as.Date(final_tripdata$date), "%d")
final_tripdata$year <- format(as.Date(final_tripdata$date), "%Y")
final_tripdata$day_of_week <- format(as.Date(final_tripdata$date), "%A")
final_tripdata$start_hour <- format(final_tripdata$started_at, format = "%H")
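
The ride length calculation can be tried on a couple of made-up timestamps first. Subtracting two POSIXct vectors returns a difftime whose unit (seconds, minutes, hours or days) is chosen automatically from the data, so this sketch asks `difftime()` for minutes explicitly. The timestamps and the `tz = "UTC"` setting are assumptions for illustration.

```r
# Hypothetical rides: one of 12.5 minutes and one of exactly two days
started_at <- as.POSIXct(c("2021-06-01 08:00:00", "2021-06-01 09:00:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2021-06-01 08:12:30", "2021-06-03 09:00:00"), tz = "UTC")

# Request minutes explicitly so the result does not depend on the automatic unit
ride_length_m <- as.numeric(difftime(ended_at, started_at, units = "mins"))

# Hour of departure as a two-digit string, matching the start_hour column
start_hour <- format(started_at, "%H")
```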



**Remove "bad" data**

The dataframe includes a few thousand entries where ride_length_m is negative.

We will create a new version of the dataframe (v2) since data is being removed.

In [None]:
# Check how many ride lengths are negative (TRUE) versus zero or positive (FALSE)
table(final_tripdata$ride_length_m < 0)

# Creating final_tripdatav2 as the new version of the dataframe
# The filter also drops zero-duration entries, as they wouldn't be useful for the analysis.
final_tripdatav2 <- final_tripdata %>%
                      filter(ride_length_m > 0)
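
Because the filter removes zero-length rides as well as negative ones, it can help to count the three cases separately before dropping anything. A minimal base R sketch on made-up durations:

```r
# Hypothetical ride lengths in minutes, including one negative and one zero value
ride_length_m <- c(-3.2, 0, 0.5, 14, 55944.15)

# sign() maps each value to -1, 0 or 1, which the factor labels make readable
ride_sign <- factor(sign(ride_length_m), levels = c(-1, 0, 1),
                    labels = c("negative", "zero", "positive"))
counts <- table(ride_sign)
print(counts)

# Keep strictly positive durations only, as the filter above does
kept <- ride_length_m[ride_length_m > 0]
```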



<a id="process-guiding"></a>
#### Guiding questions
* **What tools are you choosing and why?**

For this project, I'm choosing R for two reasons: it can handle the enormous dataset, and I want to gain experience with the language.

* **Have you ensured your data’s integrity?**

By ensuring that all the files have consistent columns and each column has the correct data type.

* **What steps have you taken to ensure that your data is clean?**

Checking and dropping duplicates and null values, as well as ensuring that each column has the correct data type and adding a few extra columns to make the analysis process easier later on.

* **How can you verify that your data is clean and ready to analyze?**

The steps taken and recorded in this notebook verify this.

* **Have you documented your cleaning process so you can review and share those results?**

Yes, and it's documented in this notebook.

<a id="process-key"></a>
#### Key tasks
1. Check the data for errors.
2. Choose your tools.
3. Transform the data so you can work with it effectively.
4. Document the cleaning process.

<a id="process-deliverable"></a>
#### Deliverable
* Documentation of any cleaning or manipulation of data.

<a id="analyze"></a>
## The Analyze Phase


<a id="analyze-ridetype"></a>
#### Rideable Type

In [None]:
# Percentage of the total number of ride entries accounted for each type of ride
final_tripdatav2 %>%
    group_by(rideable_type) %>%
    summarize(count = n(),percentage = ( (count / nrow(final_tripdatav2)) *100)) %>%
    arrange(count)


<a id="analyze-year"></a>
#### Yearly Analysis

In [None]:
# Analyzing the total number of rides for each year
final_tripdatav2 %>%
   group_by(year) %>%
   summarise(number_of_rides = n(), percentage = ( number_of_rides / nrow(final_tripdatav2)*100))

# Analyzing the total duration rides for each year
final_tripdatav2 %>%
   group_by(year) %>%
   summarise(duration = sum(ride_length_m))

final_tripdatav2 %>%
   group_by(year,rideable_type) %>%
   summarise(number_of_rides = n(), percentage = ( number_of_rides / nrow(final_tripdatav2)*100)) %>%
   arrange(year,number_of_rides)

final_tripdatav2 %>%
   group_by(member_casual,rideable_type) %>%
   summarise(number_of_rides = n(), percentage = ( number_of_rides / nrow(final_tripdatav2)*100)) %>%
   arrange(member_casual,number_of_rides)

final_tripdatav2 %>%
     group_by(year,rideable_type) %>%
     summarise(number_of_rides = n()) %>%
     ggplot(aes(rideable_type,number_of_rides,fill = year)) +
     geom_col(position = 'dodge') + 
     labs(title = 'Distribution by ride type',x = 'Type of ride',y = 'Number of rides')


**Key Points**

* The data covers six months from each year, but the second half of 2020 has more rides than the first half of 2021.
* The second half of 2020 also had a higher total ride duration than the first half of 2021.
* Both members and casual riders tend to prefer docked bikes.
* Docked bikes accounted for nearly 50% of the rides.
* Docked bike rides dropped by 41% from the second half of 2020 to the first half of 2021.
* Except for docked bikes, every bike type showed an increase in the number of rides from the second half of 2020 to the first half of 2021.
* Classic bike rides increased by 26.3% from 2020 to 2021.
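
The percentage changes quoted above follow the usual formula (new - old) / old * 100. The sketch below shows the arithmetic with made-up ride counts chosen to give round numbers; they are not the real totals from the dataset.

```r
# Percentage change from an old count to a new count
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# Hypothetical counts: a fall from 1000 to 590 rides is a 41% decrease
docked_change <- pct_change(old = 1000, new = 590)

# Hypothetical counts: a rise from 1000 to 1263 rides is a 26.3% increase
classic_change <- pct_change(old = 1000, new = 1263)
```

A negative result means a drop and a positive one an increase, so the sign should be reported along with the magnitude.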

<a id="analyze-month"></a>
#### Monthly Analysis

In [None]:
# Which month of the year has the most rides
final_tripdatav2 %>%
     group_by(month) %>%
     summarise(number_of_rides = length(ride_id),
               '%' = number_of_rides / nrow(final_tripdatav2)*100,
               'Casual_p' = sum(member_casual == 'casual')/number_of_rides*100,
               'Member_p' = sum(member_casual == 'member')/number_of_rides*100,
               'Member Casual Per Diff' = abs(Member_p - Casual_p))

final_tripdatav2 %>%
     ggplot(aes(month,fill = member_casual)) +
     geom_bar() +
     labs(title = 'Distribution by month',x = 'Month',y = 'Number of rides')


**Key Points**
 
* Out of all the months, June has the largest percentage of rides.
* In June, casual riders and annual members account for almost equal shares of the rides, differing by just 1.5%.
* There is an uptrend in the volume of rides from March to June.
* There is a downtrend in the volume of rides from June to February.
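
When reading month-by-month trends, note that the `month` column holds only `"%m"`, so a plot sorts it from 01 to 12 even though the data runs from July 2020 to June 2021. A combined year-month key keeps the months in chronological order. A small sketch with made-up dates:

```r
# Hypothetical ride dates spanning both years of the dataset
started <- as.Date(c("2020-07-15", "2020-12-01", "2021-01-20", "2021-06-30"))

# Month alone sorts January first, which breaks the July-to-June timeline
month_only <- format(started, "%m")
print(sort(unique(month_only)))

# A year-month key sorts chronologically, and a factor keeps that order in plots
year_month <- factor(format(started, "%Y-%m"))
print(levels(year_month))
```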

<a id="analyze-week"></a>
#### Weekday Analysis

How are the rides distributed by weekday?

In [None]:
final_tripdatav2 %>%
     ggplot(aes(day_of_week,fill = member_casual)) +
     geom_bar() +
     labs(title = 'Distribution by weekday',x = 'Day of the week',y = 'Number of rides')

**Key Points**

* Out of all the days, Saturday accounts for the largest percentage of rides.
* On Saturdays, casual riders lead annual members by 7.9%.
* There is a small uptrend in the volume of rides beginning on Monday, which spikes and peaks on Saturday.
* There is a downtrend in the volume of rides from Saturday to Monday.

<a id="analyze-day"></a>
#### Daily Analysis

In [None]:
# Which hour of the day has the most rides
final_tripdatav2 %>%
     group_by(start_hour) %>%
     summarise(number_of_rides = length(ride_id),
               '%' = number_of_rides / nrow(final_tripdatav2)*100,
               'Casual_p' = sum(member_casual == 'casual')/number_of_rides*100,
               'Member_p' = sum(member_casual == 'member')/number_of_rides*100,
               'Member Casual Per Diff' = abs(Member_p - Casual_p))

final_tripdatav2 %>%
     ggplot(aes(start_hour,fill = member_casual)) +
     geom_bar() +
     labs(title = 'Distribution by hour of the day',x = 'Hour',y = 'Number of rides')

**Key Points**

* There is a large volume of rides from 9 am to 7 pm.
* The volume of rides starts rising at 5 am and starts falling at 6 pm.
* 9 am to 5 pm accounts for a large volume of both annual members and casual riders, but the volume of members is lower.
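
The casual and member shares per hour can also be obtained from a two-way table, which is a compact cross-check on the `summarise()` output. This base R sketch uses six made-up rides; the column names match the trip data.

```r
# Hypothetical rides: two at 08h and four at 17h
trips <- data.frame(
  start_hour = c("08", "08", "17", "17", "17", "17"),
  member_casual = c("member", "member", "casual", "casual", "member", "casual"),
  stringsAsFactors = FALSE
)

# Rides per hour (rows) and rider type (columns)
counts <- table(trips$start_hour, trips$member_casual)

# Row-wise percentages: each hour sums to 100
shares <- prop.table(counts, margin = 1) * 100

# Absolute gap between the member and casual share for every hour
share_gap <- abs(shares[, "member"] - shares[, "casual"])
print(round(shares, 1))
```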

In [None]:
final_tripdatav2 %>%
   ggplot(aes(start_hour,fill = member_casual)) +
   geom_bar() +
   labs(title = 'Distribution by hour of the day divided by weekday',x = 'Hour',y = 'Number of rides') +
   facet_wrap(~day_of_week)

**Key Points**

* On all seven days of the week, the volume of rides starts rising at 5 am.
* On all seven days of the week, the volume of rides starts falling at 5 pm.

In [None]:
final_tripdatav2 %>%
    mutate(type_of_weekday = ifelse(day_of_week == 'Saturday' | day_of_week == 'Sunday','weekend',
                                    'midweek')) %>%
    ggplot(aes(start_hour, fill = member_casual)) +
    labs(x="Hour of the day", title="Distribution by hour of the day: midweek vs weekend") +
    geom_bar() +
    facet_wrap(~type_of_weekday)

**Key Points**

* The weekend curve is smooth, whereas the midweek curve has steep peaks.
* The volume of rides on weekends is lower than on weekdays.
* One explanation for this finding is that on weekdays people follow their daily routines, such as going to work, getting back from work, or other regular travel.
* 5 am is the kick-off point for the volume of rides for both midweek and weekend.
* The midweek volume of rides starts falling at 5 pm.
* The weekend volume of rides starts falling at 3 pm.
* In both categories, casual riders contribute more rides than annual members.
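
When comparing midweek and weekend totals, keep in mind that midweek covers five days and the weekend only two, so a per-day average can tell a different story from the raw totals. The numbers below are made up purely to show the arithmetic.

```r
# Hypothetical total rides per category (not taken from the dataset)
total_rides <- c(midweek = 500, weekend = 300)

# Number of calendar days that each category covers in a week
days_in_category <- c(midweek = 5, weekend = 2)

# Average rides per day puts both categories on the same footing
rides_per_day <- total_rides / days_in_category
print(rides_per_day)
```

In this made-up case the midweek total is larger, yet the weekend has more rides per day.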

<a id="analyze-casvmem"></a>
#### Casual Riders vs Annual Members

In [None]:
# Comparing the rides of casual riders and annual members
final_tripdatav2 %>%
   group_by(member_casual) %>%
   summarise(number_of_rides = n(), percentage = ( number_of_rides / nrow(final_tripdatav2)*100))

# For ride_length_m (all figures in minutes)
summary(final_tripdatav2$ride_length_m)

# Comparing members and casual users
aggregate(final_tripdatav2$ride_length_m ~ final_tripdatav2$member_casual,FUN = mean)
aggregate(final_tripdatav2$ride_length_m ~ final_tripdatav2$member_casual,FUN = median)
aggregate(final_tripdatav2$ride_length_m ~ final_tripdatav2$member_casual,FUN = max)
aggregate(final_tripdatav2$ride_length_m ~ final_tripdatav2$member_casual,FUN = min)

# Store the day of week in right order
final_tripdatav2$day_of_week <- ordered(final_tripdatav2$day_of_week,levels=c("Sunday","Monday",
                                        "Tuesday","Wednesday","Thursday","Friday","Saturday"))

# See the average ride time by each day for members vs casual users
aggregate(final_tripdatav2$ride_length_m ~ final_tripdatav2$member_casual + final_tripdatav2$day_of_week
          ,FUN = mean)

# Analyzing ridership data by type and weekday
final_tripdatav2 %>%
     mutate(weekday = wday(started_at,label = TRUE)) %>%
     group_by(member_casual,weekday) %>%
     summarise(number_of_rides = n(),
               average_duration = mean(ride_length_m)) %>%
     arrange(member_casual,weekday)
# Let's visualize the number of rides by rider type
final_tripdatav2 %>%
     mutate(weekday = wday(started_at,label = TRUE)) %>%
     group_by(member_casual,weekday) %>%
     summarise(number_of_rides = n(),
               average_duration = mean(ride_length_m)) %>%
     arrange(member_casual,weekday) %>%
     ggplot(aes(x = weekday,y = number_of_rides,fill = member_casual)) +
     geom_col(position = "dodge") +
     labs(title = 'Distribution of casual riders and annual members by weekday',
          x = 'Day of the week',y = 'Number of rides')

# Let's visualize the average duration
final_tripdatav2 %>%
     mutate(weekday = wday(started_at,label = TRUE)) %>%
     group_by(member_casual,weekday) %>%
     summarise(number_of_rides = n(),
               average_duration = mean(ride_length_m)) %>%
     arrange(member_casual,weekday) %>%
     ggplot(aes(x = weekday,y = average_duration,fill = member_casual)) +
     geom_col(position = "dodge") +
     labs(title = 'Average ride duration of casual riders and annual members by weekday',
          x = 'Day of the week',y = 'Average duration of the ride (in min)')
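
Because a handful of extremely long rides can pull the mean upward, it may help to read the mean and the median side by side for each rider type. The sketch below uses ten made-up durations, with one extreme value placed in the casual group to show the effect.

```r
# Hypothetical ride lengths in minutes; the casual group holds one extreme ride
ride_length_m <- c(8, 12, 15, 20, 55944.15, 6, 9, 11, 14, 30)
member_casual <- rep(c("casual", "member"), each = 5)

# Mean per rider type is sensitive to the single extreme ride
mean_by_type <- tapply(ride_length_m, member_casual, mean)

# Median per rider type is barely affected by it
median_by_type <- tapply(ride_length_m, member_casual, median)

print(round(mean_by_type, 1))
print(median_by_type)
```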



<a id="analyze-guiding"></a>
#### Guiding questions
* **How should you organize your data to perform analysis on it?**

The data has been organized into a single data frame by concatenating all the files from the dataset.

* **Has your data been properly formatted?**

Yes, all the data has been properly formatted to perform all the necessary statistical operations.


* **What surprises did you discover in the data?**

A few of the notable discoveries are:
1. The maximum ride duration among all riders is 55944.15 minutes, by a casual rider, which is about 39 days!
2. Casual riders completely dominate the ride duration category on all seven days of the week!
3. The maximum ride duration among annual members is 33421.37 minutes, which is about 23 days!
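
The conversion from minutes to days behind these figures is a division by 60 * 24 = 1440 minutes per day. A short sketch of the arithmetic, where `minutes_to_days` is a helper name introduced here for illustration:

```r
# Convert a duration in minutes to days (60 minutes * 24 hours = 1440 per day)
minutes_to_days <- function(minutes) {
  minutes / (60 * 24)
}

# The two maximum ride durations quoted above, in minutes
max_minutes <- c(casual = 55944.15, member = 33421.37)
max_days <- minutes_to_days(max_minutes)
print(round(max_days, 1))
```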

* **What trends or relationships did you find in the data?**
 * On average, the ride duration for casual riders is much longer than for annual members.
 * Despite annual members taking more rides than casual riders on every day except Saturday and Sunday, casual riders contribute more ride duration than annual members on all seven days of the week.
 * There are more members than casuals in the dataset.
 * There are more data points in the second half of 2020.
 * There is a huge difference between the flow of members/casual from midweek to weekends.
 * Both members and casual riders tend to prefer docked bikes.

* **How will these insights help answer your business questions?**

This shows that casual riders currently lead annual members in certain statistical categories, and converting them into annual members can boost the company's revenue.

<a id="analyze-key"></a>
#### Key tasks
1. Aggregate your data so it’s useful and accessible.
2. Organize and format your data.
3. Perform calculations.
4. Identify trends and relationships.

<a id="analyze-deliverable"></a>
#### Deliverable
* A summary of your analysis.

<a id="share"></a>
## The Share Phase

The share phase is usually done by building a presentation. But for Kaggle, the best representation of the analysis and conclusions is its own notebook.

Let us look at our findings from the data so far to arrive at a conclusion.

So far, we have learned from the data that:
* Despite data being taken equally from each year, 2020 has 8.61% more rides than 2021.
* The second half of 2020 also had a higher total ride duration than the first half of 2021.
* The riders' favorite bike type is the docked bike, which accounted for 47.2% of the rides.
* Classic bike rides increased by 26.3% from 2020 to 2021.
* There are more members than casuals in the dataset.
* Out of all the months, June has the largest percentage of rides.
* Out of all the days, Saturday accounts for the largest percentage of rides.
* There is a large volume of rides from 9 am to 7 pm.
* The volume of rides on weekends is lower than on weekdays.

Now for how members differ from casuals:
* There are more members than casuals in the dataset.
* In June, casual riders and annual members account for almost equal shares of the rides, differing by just 1.5%.
* On Saturdays, casual riders lead annual members by 7.9%.
* 9 am to 5 pm accounts for a large volume of both annual members and casual riders, but the volume of members is relatively lower.
* In both the midweek and weekend categories, casual riders contribute more rides than annual members.
* Casual riders completely dominate the ride duration category on all seven days of the week.
* On average, the ride duration for casual riders is much longer than for annual members.


Concluding:

* Annual members appear to follow a predefined schedule and fixed routes, which lets them spend less time travelling.
* Casual riders may not be satisfied with the current membership services, which could be why they hesitate to convert to annual members.
* It can be assumed that the majority of riders use bikes for their jobs, as indicated by the large increase in the volume of rides from 9 am to 5 pm.
* The volume of rides varies according to the season.

<a id="share-guiding"></a>
#### Guiding questions
* **Were you able to answer the question of how annual members and casual riders use Cyclistic bikes differently?**

Yes, some very interesting discoveries were made, and the data points to several differences between casuals and members.

* **What story does your data tell?**

Although there are more annual members, our data shows that casual riders spend more time travelling. There may be reasons why they do not convert to members, such as high subscription prices or fewer offers and services, even though converting could benefit them, for example by lowering the cost of rides through packages or deals offered by the company. The lower ride duration suggests that annual members may have fixed schedules and predefined routes that they use for daily travel.

* **How do your findings relate to your original question?**

The findings prompt us to consider the fundamental differences between casual and annual riders, as well as what types of bikes they ride and why they do so, in order to determine "How digital media could impact them."

* **Who is your audience? What is the best way to communicate with them?**

The main target audience is my Cyclistic marketing analytics team and Lily Moreno. The best way to communicate is through a slide presentation of the findings.

* **Can data visualization help you share your findings?**

Yes, data visualization is at the heart of the findings.

* **Is your presentation accessible to your audience?**

Yes, the plots and labels were created using bold and distinct colors.

<a id="share-key"></a>
#### Key tasks
1. Determine the best way to share your findings.
2. Create effective data visualizations.
3. Present your findings.
4. Ensure your work is accessible.

<a id="share-deliverable"></a>
#### Deliverable
* Supporting visualizations and key findings.

<a id="act"></a>
## The Act Phase

The act phase would be done by the marketing team of the company. The main takeaway will be the top three recommendations for the marketing.

<a id="act-guiding"></a>
#### Guiding questions
* **What is your final conclusion based on your analysis?**

Annual and casual riders have different habits when using the bikes. The conclusion is stated in more detail in the Share phase.

* **How could your team and business apply your insights?**

The insights can be applied to the digital marketing campaign to maximize the conversion of casual riders to annual members, which in turn would boost the company's revenue.

* **What next steps would you or your stakeholders take based on your findings?**

My findings give stakeholders a clear picture of the ride history of casual and annual riders, which would enable them to research further what is preventing casual riders from buying the annual membership.

* **Is there additional data you could use to expand on your findings?**

The data here places more emphasis on boosting the company's revenue than on understanding the customers (the riders, in this case). More detailed information about the riders would help in understanding their interests, tasks, and difficulties, which would enable the company to come up with solutions for them. That would lead to customer satisfaction, which in turn would help the company reach its goal of boosting revenue. Better climate data could also be useful.

<a id="act-key"></a>
#### Key tasks
1. Create your portfolio.
2. Add your case study.
3. Practice presenting your case study to a friend or family member.

<a id="act-deliverable"></a>
#### Deliverable
* Your top three recommendations based on your analysis.
 *  Develop a marketing strategy that demonstrates how bicycles are healthy modes of transportation not only for riders but also for the environment, while emphasizing the advantages of annual membership.
 *  Offer discounts or coupons to combat the low volume of rides during the winter months.
 *  Enable free trips on weekends twice a month to reward customers, because "happy customers are loyal customers," as the phrase goes. As a result, the company's reputation would improve, resulting in more riders using its service, increased yearly membership conversion, and more income.

<a id="conclusion"></a>
## Conclusion

The Google Data Analytics Professional Certificate Program has given me an understanding of what a data analyst does on a daily basis, and the Capstone Project allowed me to get a taste of it. This assignment required me to use the R programming language, which was challenging but also enjoyable because the language was new to me and the project gave me a chance to put my R skills into practice.