# About Company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of **5,824** bicycles that
are geotracked and locked into a network of **692** stations across Chicago. The bikes can be unlocked from one station and
returned to any other station in the system anytime.<br />
  **Cyclistic** classifies its users into two categories:<br />
  1. **Casual riders**: Customers who purchase single-ride or full-day passes.<br />
  2. **Members**: Customers who purchase an **annual membership**. <br />

# Stakeholders in the Scenario
**Lily Moreno**: The director of marketing and your manager. Moreno is responsible for the development of campaigns
and initiatives to promote the bike-share program. These may include email, social media, and other channels.<br/>
**Cyclistic marketing analytics team**: A team of data analysts who are responsible for collecting, analyzing, and
reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy
learning about Cyclistic’s mission and business goals, as well as how you, as a junior data analyst, can help Cyclistic
achieve them.<br/>
**Cyclistic executive team**: The notoriously detail-oriented executive team will decide whether to approve the
recommended marketing program.<br/>

# Business task
**Three questions will guide the future marketing program:**<br/>
1. How do annual members and casual riders use Cyclistic bikes differently?<br/>
2. Why would casual riders buy Cyclistic annual memberships? <br/>
3. How can Cyclistic use digital media to influence casual riders to become members? <br/>
**This analysis aims to answer the first question**: How do annual members and casual riders use Cyclistic bikes
differently? <br/>

# Loading packages.

In [None]:
library(tidyverse)
library(lubridate)
library(tidyr)
library(dplyr)

# Importing the dataset
For this project I will use Cyclistic's historical trip data for 2022
([data](https://divvy-tripdata.s3.amazonaws.com/index.html)), made available under this [license](https://www.divvybikes.com/data-license-agreement).

In [None]:
tripdata_202201 <- read_csv("202201-divvy-tripdata.csv")
tripdata_202202 <- read_csv("202202-divvy-tripdata.csv")
tripdata_202203 <- read_csv("202203-divvy-tripdata.csv")
tripdata_202204 <- read_csv("202204-divvy-tripdata.csv")
tripdata_202205 <- read_csv("202205-divvy-tripdata.csv")
tripdata_202206 <- read_csv("202206-divvy-tripdata.csv")
tripdata_202207 <- read_csv("202207-divvy-tripdata.csv")
tripdata_202208 <- read_csv("202208-divvy-tripdata.csv")
tripdata_202209 <- read_csv("202209-divvy-tripdata.csv")
tripdata_202210 <- read_csv("202210-divvy-tripdata.csv")
tripdata_202211 <- read_csv("202211-divvy-tripdata.csv")
tripdata_202212 <- read_csv("202212-divvy-tripdata.csv")

# Combining all the dataframes in one dataframe

In [None]:
total_tripdata_2022 <- rbind(tripdata_202201,tripdata_202202,
           tripdata_202203,tripdata_202204,
           tripdata_202205,tripdata_202206,
           tripdata_202207,tripdata_202208,
           tripdata_202209,tripdata_202210,
           tripdata_202211,tripdata_202212)
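
The twelve separate reads followed by `rbind()` can also be written as a loop over file names. The following is a minimal sketch in base R, assuming the monthly files follow the `YYYYMM-divvy-tripdata.csv` naming used above and all share the same columns. The temporary directory and the two tiny files written into it are stand-ins for the real downloads, and `demo_dir`, `files`, `monthly` and `combined` are hypothetical names.

```r
# Stand-in data: write two tiny "monthly" files to a temporary directory.
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("a1", "a2"),
                     member_casual = c("member", "casual")),
          file.path(demo_dir, "202201-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "member"),
          file.path(demo_dir, "202202-divvy-tripdata.csv"), row.names = FALSE)

# Find every 2022 monthly file and read each one into a list of data frames.
files <- list.files(demo_dir,
                    pattern = "^2022[0-9]{2}-divvy-tripdata\\.csv$",
                    full.names = TRUE)
monthly <- lapply(files, read.csv)

# Stack the monthly data frames into a single data frame.
combined <- do.call(rbind, monthly)
nrow(combined)
```

With the tidyverse loaded, `read_csv()` could replace `read.csv()` in the same pattern; the sketch uses base R only to stay dependency-free.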

# Showing the data

In [None]:
summary(total_tripdata_2022)

# Data preparation
## Showing the data limitations

In [None]:
nrow(total_tripdata_2022)
sum(is.na(total_tripdata_2022))

There are a lot of missing values, so I explored the dataset's columns to find where they occur. After some digging, I found that the following columns contain the missing values.

In [None]:
sum(is.na(total_tripdata_2022$start_station_name))
sum(is.na(total_tripdata_2022$start_station_id))
sum(is.na(total_tripdata_2022$end_station_name))
sum(is.na(total_tripdata_2022$end_station_id))
sum(is.na(total_tripdata_2022$end_lat))
sum(is.na(total_tripdata_2022$end_lng))
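
The per-column counts can also be obtained in one call, which avoids having to know in advance which columns to check. This is a small sketch on a toy data frame; `trips` and `na_counts` are hypothetical names, and the values are made up for illustration.

```r
# Toy data frame standing in for total_tripdata_2022.
trips <- data.frame(
  ride_id = c("a1", "a2", "a3"),
  start_station_name = c("Clark St", NA, NA),
  end_lat = c(41.9, NA, 41.8)
)

# Count the missing values in every column at once.
na_counts <- colSums(is.na(trips))

# Show only the columns that actually contain missing values.
na_counts[na_counts > 0]
```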

# Data Cleaning
I decided to exclude the rows with missing values from my analysis.

In [None]:
total_tripdata_2022 <- drop_na(total_tripdata_2022)
nrow(total_tripdata_2022)

I also decided to add more columns to the data frame to help with the analysis: `ride_length`, `month`, `day_of_week` and `hours`.

In [None]:
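# Ride duration in seconds; explicit units avoid difftime()'s automatic unit choice
total_tripdata_2022$ride_length <- difftime(total_tripdata_2022$ended_at, total_tripdata_2022$started_at, units = "secs")
# Abbreviated month name and full weekday name (both depend on the session locale)
total_tripdata_2022$month <- format(as.Date(total_tripdata_2022$started_at), "%b")
total_tripdata_2022$day_of_week <- weekdays(total_tripdata_2022$started_at)
# Hour of the day as a two-digit string, "00" to "23"
total_tripdata_2022$hours <- format(as.POSIXct(total_tripdata_2022$started_at), format = "%H")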
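
When no `units` argument is given, `difftime()` picks the units itself based on the size of the differences, so the same code can return minutes on one dataset and seconds on another. The sketch below shows the effect on a single made-up 30-minute ride; `ride_start`, `ride_end`, `auto_units` and `fixed_units` are hypothetical names.

```r
# A made-up ride that lasts exactly 30 minutes.
ride_start <- as.POSIXct("2022-01-01 10:00:00", tz = "UTC")
ride_end   <- as.POSIXct("2022-01-01 10:30:00", tz = "UTC")

# Units chosen automatically from the size of the difference.
auto_units <- difftime(ride_end, ride_start)

# Units stated explicitly, so the result is always in seconds.
fixed_units <- difftime(ride_end, ride_start, units = "secs")

units(auto_units)        # "mins" for this 30-minute gap
as.numeric(fixed_units)  # 1800
```

Stating the units explicitly keeps later summaries such as `mean()` and `max()` comparable across runs.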

Putting the days of the week and the months in the right order:

In [None]:
total_tripdata_2022$day_of_week <- ordered(total_tripdata_2022$day_of_week,
                                           levels=c("Sunday", "Monday", "Tuesday", 
                                                     "Wednesday", "Thursday", "Friday", "Saturday"))
total_tripdata_2022$month <- ordered(total_tripdata_2022$month,
                                           levels=c("Jan","Feb","Mar",
                                                     "Apr","May","Jun",
                                                     "Jul","Aug","Sep",
                                                     "Oct","Nov","Dec"))
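
The reason for the ordered factors is that plain character values sort alphabetically, which would scramble the days and months in tables and on chart axes. A small sketch with made-up values (`days`, `day_levels` and `days_ordered` are hypothetical names):

```r
days <- c("Monday", "Sunday", "Friday")

# Character vectors sort alphabetically: Friday, Monday, Sunday.
sort(days)

# An ordered factor sorts by the position of each value in `levels`.
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
days_ordered <- factor(days, levels = day_levels, ordered = TRUE)

# Calendar order: Sunday, Monday, Friday.
as.character(sort(days_ordered))
```

Any value that does not match one of the levels exactly (for example a weekday name produced under a non-English locale) becomes `NA`, so it is worth checking for new missing values after this step.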

The data frame includes a few hundred entries where bikes were taken out of docks and checked for quality by Divvy, or where ride_length was negative.
So we are going to create a new data frame that excludes those values.

In [None]:
tripdata_2022 <-  total_tripdata_2022[!(total_tripdata_2022$start_station_name == "HQ QR" | total_tripdata_2022$ride_length<0),]

# Analysis
**Descriptive analysis on the ride length**

In [None]:
mean(tripdata_2022$ride_length) # mean ride length
median(tripdata_2022$ride_length) # median ride length
max(tripdata_2022$ride_length) # maximum ride length
min(tripdata_2022$ride_length) # minimum ride length

**Compare members and casual users**

In [None]:
aggregate(tripdata_2022$ride_length ~ tripdata_2022$member_casual, FUN = mean)
aggregate(tripdata_2022$ride_length ~ tripdata_2022$member_casual, FUN = median)
aggregate(tripdata_2022$ride_length ~ tripdata_2022$member_casual, FUN = max)
aggregate(tripdata_2022$ride_length ~ tripdata_2022$member_casual, FUN = min)

<br/>
From the above calculations I found that casual riders spend on average 51.8% more time per ride than annual members.
<br/>
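
A percentage difference like this one is computed from the two group means, and the result depends on which group is used as the base. The sketch below uses hypothetical mean ride lengths, not the values from this dataset, and `mean_casual`, `mean_member`, `pct_longer` and `pct_shorter` are illustrative names.

```r
# Hypothetical mean ride lengths in seconds (NOT the values from this dataset).
mean_casual <- 1500
mean_member <- 1000

# How much longer casual rides are, relative to member rides.
pct_longer <- (mean_casual - mean_member) / mean_member * 100

# How much shorter member rides are, relative to casual rides.
pct_shorter <- (mean_casual - mean_member) / mean_casual * 100

pct_longer   # 50
pct_shorter  # about 33.3
```

The two figures describe the same gap but are not interchangeable: "X% longer" for one group does not translate into "X% shorter" for the other.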

**See the average ride time by each day for members vs casual users**

In [None]:
aggregate(tripdata_2022$ride_length ~ tripdata_2022$member_casual + tripdata_2022$day_of_week, FUN = mean)

**Analyze ridership data by type and weekday**

In [None]:
tripdata_2022 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarize(number_of_rides = n(),
            average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)

**Showing the maximum ride length at every hour in the day for each user category**

In [None]:
aggregate(tripdata_2022$ride_length ~ tripdata_2022$member_casual + tripdata_2022$hours, FUN = max)

# Visualizations
**Visualize the relation between customer type and rideable type using ggplot**
**to see each customer type's bike preferences and which rideable type is used more often** 

In [None]:
tripdata_2022 %>% 
  group_by(member_casual,rideable_type) %>% 
  summarize(number_of_rides=n()) %>% 
  ggplot(aes(x=member_casual,y=number_of_rides,fill=rideable_type))+
  geom_col(position = "dodge")+ggtitle("Member type vs rideable_type")

<br/>**Visualizing the number of rides each customer type made every month**

In [None]:
tripdata_2022 %>% 
  group_by(member_casual, month) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, month)  %>% 
  ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +ggtitle("The number of rides by rider type during each month")

<br/>**Visualizing the number of rides each customer type made every day**

In [None]:
tripdata_2022 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")+ggtitle("The number of rides by rider type during each day")

<br/>**Visualizing the number of rides each customer type made at every hour of the day**

In [None]:
tripdata_2022 %>%   
  group_by(member_casual,hours) %>% 
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>% 
  ggplot(aes(x = hours, y = number_of_rides, fill = member_casual))+
  geom_col(position = "dodge")+ggtitle("The number of rides by rider type during the day")

<br/>**Visualizing the average ride length of each customer type by month, day and hour**

In [None]:
tripdata_2022 %>% 
  group_by(member_casual, month) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, month)  %>% 
  ggplot(aes(x = month, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") + ggtitle("Average ride length by rider type during each month")

In [None]:
tripdata_2022 %>% 
  group_by(member_casual, day_of_week) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, day_of_week)  %>% 
  ggplot(aes(x = day_of_week, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") + ggtitle("Average ride length by rider type during each day")

In [None]:
tripdata_2022 %>%   
  group_by(member_casual,hours) %>% 
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>% 
  ggplot(aes(x = hours, y = average_duration, fill = member_casual))+
  geom_col(position = "dodge") + ggtitle("Average ride length by rider type during the day")

# Conclusion 
1. The **docked_bike** rideable type is used only by casual users. <br/>
2. The number of trips by both user types increases during Chicago's warm season (**May** to **September**). <br/>
3. Annual members make more trips on weekdays, while casual users
make more trips on weekends.<br/>
4. Casual riders spend 51.8% more time per ride than annual members.<br/>
5. The morning rush hours are from 7 am to 9 am, peaking at 8 am.
<br/>
6. The evening rush hours are from 5 pm to 7 pm, peaking at 6 pm.<br/>

# Recommendations
1. The Chicago warm season should be the focus of the marketing strategy, since it is the high season for bike riders. <br/>
2. Place more ads around the docked bikes to reach casual riders.<br/>
3. Highlight in the marketing campaign that annual members' rides are on average about 34% shorter than casual riders' rides (the same gap as the 51.8% figure, measured relative to casual rides).<br/>
4. Send motivational messages to annual members during rush hours to encourage them to ride.<br/>
5. We could build a reward system based on ride streaks: a rider who uses a bike every day for a set period, such as a month, would get a discount on the next annual subscription. That would encourage annual members to use our service more regularly.<br/>

# Considerations 
**More data points should be collected to broaden the scope of further analysis**:<br/>
**More information about the users**: gender, age and physical health, to better understand our target demographic group.<br/>