# Case Study-1: How Does A Bike-Share Navigate Speedy Success?

## Introduction:
#### Google Data Analytics Capstone project - Case Study 1.
This case study focuses on Cyclistic, a bike-share company in Chicago.

### Ask

##### Background context on the case study:

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
##### Characters and teams:

● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals, as well as how you, as a junior data analyst, can help Cyclistic achieve them.

● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

#### Guiding questions
1. What is the problem you are trying to solve?
Our primary goal is to analyse the profiles of annual members and casual riders and use that information to develop marketing strategies to help casual riders become annual members. 
2. How can your insights drive business decisions?
This information can help the marketing team increase the number of annual members.

##### Key tasks
1. Identify the business task - Completed
2. Consider key stakeholders - Completed

##### Deliverable
• A clear statement of the business task
Find the difference between annual members and casual riders, and identify the marketing strategy to use to increase annual members.

### Prepare

Cyclistic’s historical trip data is available [here](https://divvy-tripdata.s3.amazonaws.com/index.html) for analyzing and identifying trends. Download the previous 12 months of Cyclistic trip data.

#### Guiding questions

1. Where is your data located?
The data is stored on my local machine as a dataset named "casestudy".

2. How is the data organized?
The datasets contain trip details from January to December 2021. I later combined all of the CSV files into one CSV file.

3. Are there issues with bias or credibility in this data? Does your data ROCCC?
The data comes from a first-party source, the company's own data storage, so the chance of bias is low and the credibility is high. The data also meets the ROCCC criteria: it is reliable, original, comprehensive, current, and cited.

4. How are you addressing licensing, privacy, security, and accessibility?
The data is open: anyone can access it, and the company provides it, although its use is covered by a license. The data does not include any personal details about the riders, which protects their privacy.

5. How did you verify the data’s integrity?
During the analysis, the data types and the columns (count and names) were found to be consistent.

6. How does it help you answer your question?
Thoroughly reviewing the data for annual members and casual riders shows whether the two groups differ in their rides, bike usage, and needs.

7. Are there any problems with the data?
Additional information about the units of measure, stations, and riders would add to the data’s value.
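
The consistency check mentioned under question 5 can be sketched as follows. This is a minimal illustration rather than the check actually run for this case study: it relies on `compare_df_cols()` and `compare_df_cols_same()` from the `janitor` package, and the two small data frames `jan` and `feb` are hypothetical stand-ins for two of the downloaded files.

```r
library(janitor)

# Hypothetical stand-ins for two monthly trip files (not the real data).
jan <- data.frame(ride_id = "A1", rideable_type = "electric_bike",
                  start_lat = 41.9, stringsAsFactors = FALSE)
feb <- data.frame(ride_id = "B1", rideable_type = "classic_bike",
                  start_lat = 41.8, stringsAsFactors = FALSE)

# One row per column name, showing the column's class in each data frame.
compare_df_cols(jan, feb)

# TRUE only when the data frames can be row-bound without type conflicts.
columns_consistent <- compare_df_cols_same(jan, feb)
columns_consistent
```

If a column had a different type in one file (for example a numeric station ID in one month and a character ID in another), the comparison table would show the mismatch before the files are combined.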

##### Key tasks
1. Download data and store it appropriately. - Completed
2. Identify how it’s organized. - Completed
3. Sort and filter the data. - Completed
4. Determine the credibility of the data. - Completed

##### Deliverable

• A description of all data sources used
The data source consists of 10 CSV files. Each month starting with April is an individual file. The period starts in January 2021 and runs until December 2021.
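
One way the separate files could be combined into a single data frame is sketched below. It is an assumption about the approach, not the code used for this case study: `combine_trip_files()` is a hypothetical helper, the folder and file names are made up for the demo, and every column is read as character so that files with differing column types still stack cleanly.

```r
library(readr)
library(dplyr)

# Hypothetical helper (not from the original notebook): read every CSV in a
# folder with the same column types and stack the results into one tibble.
combine_trip_files <- function(data_dir) {
  csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  monthly <- lapply(csv_files, read_csv,
                    col_types = cols(.default = col_character()))
  bind_rows(monthly)
}

# Demo on two tiny stand-in files written to a temporary folder.
demo_dir <- file.path(tempdir(), "trip_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(demo_dir, "202101-demo.csv"))
write_csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202102-demo.csv"))

all_trips <- combine_trip_files(demo_dir)
nrow(all_trips)
```

On the real download, the helper would be pointed at the folder holding the CSV files, and the result could be saved with `write_csv()`. Because this sketch reads everything as character, the columns would still need converting to their proper types afterwards, for example with `readr::type_convert()`.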

### Process

##### Guiding questions
1. What tools are you choosing and why?
2. Have you ensured your data’s integrity?
3. What steps have you taken to ensure that your data is clean?
4. How can you verify that your data is clean and ready to analyze?
5. Have you documented your cleaning process so you can review and share those results?

##### Key tasks
1. Check the data for errors.
2. Choose your tools.
3. Transform the data so you can work with it effectively.
4. Document the cleaning process.

##### Deliverable

Documentation of any cleaning or manipulation of data

In [1]:
library(tidyverse)
library(janitor)
library(lubridate)
library(ggplot2)

── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: 'janitor'


The following objects are masked from 'package:stats':

    chisq.test, fisher.test


Loading required package: timechange


Attaching package: 'lubridate'


The following objects are masked from 'package:base':

    date, intersect, setdi

In [2]:
df <- read_csv('caseStudy.csv')

"[1m[22mOne or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)"
Rows: 5595074 Columns: 13
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
dbl  (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [8]:
# head(df)
dim(df)

In [4]:
str(df)

spc_tbl_ [5,595,074 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:5595074] "E19E6F1B8D4C42ED" "DC88F20C2C55F27F" "EC45C94683FE3F27" "4FA453A75AE377DB" ...
 $ rideable_type     : chr [1:5595074] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : POSIXct[1:5595074], format: "2021-01-23 16:14:19" "2021-01-27 18:43:08" ...
 $ ended_at          : POSIXct[1:5595074], format: "2021-01-23 16:24:44" "2021-01-27 18:47:12" ...
 $ start_station_name: chr [1:5595074] "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" ...
 $ start_station_id  : chr [1:5595074] "17660" "17660" "17660" "17660" ...
 $ end_station_name  : chr [1:5595074] NA NA NA NA ...
 $ end_station_id    : chr [1:5595074] NA NA NA NA ...
 $ start_lat         : num [1:5595074] 41.9 41.9 41.9 41.9 41.9 ...
 $ start_lng         : num [1:5595074] -87.7 -87.7 -87.7 -87.7 -87.7 ...
 $ end_lat           : nu