## About the Company

---

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. A bike can be unlocked from one station and returned to any other station in the system at any time.

<aside>
💡 **Cyclistic:** A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike.

</aside>

## Business Context

---

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

> **Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders**. Although the pricing flexibility helps Cyclistic attract more customers, Moreno, the director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members.

## Business Task

---

Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends. Moreno has set a clear goal: ***Design marketing strategies aimed at converting casual riders into annual members***. In order to do that, however, the marketing analyst team needs to better understand:

1. How do annual members and casual riders differ?
2. Why would casual riders buy a membership?
3. How could digital media affect their marketing tactics?

## Scenario

---

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. Your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members.

The key stakeholders in this project are:

- **Lily Moreno:** The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
- **Cyclistic marketing analytics team:** A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
- **Cyclistic executive team:** The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

## Available Data

---

You will use Cyclistic’s historical trip data to analyze and identify trends. This is public data that you can use to explore how different customer types are using Cyclistic bikes. The data has been made available by Motivate International Inc. under this **[license](https://ride.divvybikes.com/data-license-agreement)**, and you can download it [**here**](https://divvy-tripdata.s3.amazonaws.com/index.html).

| Column | Description |
| --- | --- |
| ride_id | Identification number of a ride from a start station to an end station at a given time. |
| rideable_type | Bike type. There are three types: classic bike, docked bike, and electric bike. |
| started_at | Date and time when the ride began. |
| ended_at | Date and time when the ride ended. |
| day | Name of the day on which the ride began. |
| start_station_name | Name of the station where the ride began. |
| start_station_id | ID of the start station. |
| end_station_name | Name of the station where the ride ended. |
| end_station_id | ID of the end station. |
| member_casual | Rider type: `member` (annual member) or `casual`. |
| start_lat | Latitude of the start station. |
| start_lng | Longitude of the start station. |
| end_lat | Latitude of the end station. |
| end_lng | Longitude of the end station. |
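
The cells below read a pre-cleaned file and use columns such as `day_of_week`, `time_hour`, and `Time_minutes` that are not part of the raw columns listed above. The cleaning step is not shown in this notebook, so the following is only a minimal sketch of how such columns could be derived from `started_at` and `ended_at`. The sample rows are made up, and the derivations are assumptions rather than the author's actual cleaning code.

```python
import pandas as pd

# Tiny hand-made sample in the shape of the raw trip data (values are invented).
raw = pd.DataFrame({
    "ride_id": ["A1", "B2"],
    "started_at": pd.to_datetime(["2023-07-01 08:15:00", "2023-07-02 17:40:00"]),
    "ended_at": pd.to_datetime(["2023-07-01 08:45:00", "2023-07-02 18:10:30"]),
})

# Assumed derivations: only the resulting column names appear in the notebook.
raw["day_of_week"] = raw["started_at"].dt.day_name()   # e.g. "Saturday"
raw["time_hour"] = raw["started_at"].dt.hour           # hour of day, 0-23
raw["Time_minutes"] = (
    (raw["ended_at"] - raw["started_at"]).dt.total_seconds() / 60
)                                                      # ride duration in minutes

print(raw[["ride_id", "day_of_week", "time_hour", "Time_minutes"]])
```

Computing the duration from the timestamp difference keeps fractional minutes, which matters if the median ride time is later compared between rider types.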

In [None]:
#Import library
import pandas as pd
import numpy as np
import altair as alt
print(alt.__version__)

alt.data_transformers.disable_max_rows()

In [None]:
df= pd.read_csv('data_clean.csv', parse_dates=['started_at','ended_at'])
df.sample(10)

1. How do annual members and casual riders differ?

* Proportion of members and casual riders

In [None]:
df_count= df.groupby('member_casual')['ride_id'].count().to_frame().reset_index()
df_count['percentage']= round(df_count['ride_id']/df_count['ride_id'].sum()*100,2)
df_count

In [None]:
chart = alt.Chart(data=df_count)

# Display the bar chart with percentage labels above each bar
base= chart.encode(
    x=alt.X('member_casual',axis=alt.Axis(labelAngle=0), sort=['member','casual']),
    y=alt.Y('percentage'),
    text=alt.Text('percentage', format='0.2f'),
    color=alt.Color('member_casual')
).properties(
    title=alt.Title(
        "23% of Rides Are Taken by Casual Riders",
        subtitle='Percentage of Rides by Rider Type',
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=450, height=300,
)

# The color encoding already sets the bar colors, so no mark-level color is needed
base.mark_bar() + base.mark_text(align="center", baseline="middle", dy=-8)

In [None]:
# Rideable type by rider type
# (named bike_type to avoid shadowing Python's built-in type)
bike_type= df.groupby(['member_casual','rideable_type']).agg(func={'ride_id':'count'}).reset_index()
# Total rides per rider type, aligned row by row, so percentages sum to 100 within each rider type
total_rides = bike_type.groupby('member_casual')['ride_id'].transform('sum')
bike_type['percentage'] = round((bike_type['ride_id'] / total_rides) * 100,2)
bike_type.head()

In [None]:
chart = alt.Chart(data=bike_type)

# Display the stacked bar chart
base= chart.encode(
    x=alt.X('member_casual',axis=alt.Axis(labelAngle=0), sort=['member','casual']),
    y=alt.Y('percentage'),
    color=alt.Color('rideable_type')
).properties(
    title=alt.Title(
        "61% of Casual Rides Use Electric Bikes",
        subtitle='Percentage of Rides by Bike Type and Rider Type',
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=450, height=300,
)

base.mark_bar()

In [11]:
color_scale = alt.Scale(
        domain=['member', 'casual'], 
        range=['#ff7f0e', '#5778a4'])

In [None]:
chart = alt.Chart(data=df[df['member_casual']=='member'])

# Display the bar chart of member trips per day of the week
base= chart.encode(
    x=alt.X('day_of_week',axis=alt.Axis(labelAngle=0), sort=['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']),
    y=alt.Y('count(ride_id)'),
    color=alt.Color('member_casual', scale=color_scale)
).properties(
    title=alt.Title(
        "Members Take More Trips on Weekdays",
        subtitle='Trip Count per Day',
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=400, height=200,
)

base.mark_bar()

In [None]:
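# Ride count and median ride duration per weekday for members, ordered Sunday to Saturday
day_order = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']
a= (
    df[df['member_casual']=='member']
    .groupby('day_of_week')
    .agg({'ride_id':'count','Time_minutes':'median'})
    .reindex(day_order)
    .reset_index()
)
a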

In [None]:
chart = alt.Chart(data=df[df['member_casual']=='casual'])

# Display the bar chart of casual trips per day of the week
base= chart.encode(
    x=alt.X('day_of_week',axis=alt.Axis(labelAngle=0), sort=['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']),
    y=alt.Y('count(ride_id)'),
    color=alt.Color('member_casual', scale=color_scale)
).properties(
    title=alt.Title(
        "Casual Riders Take More Trips on Sunday",
        subtitle='Trip Count per Day',
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=400, height=200,
)

base.mark_bar()
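
The two day-of-week charts are compared by eye. As a complement, the weekday versus weekend split can be put into a single table. The sketch below uses a small invented sample in place of `df`, and the `is_weekend` column and the `weekday_pct` / `weekend_pct` labels are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Invented sample standing in for df; only the two columns used here are included.
trips = pd.DataFrame({
    "member_casual": ["member", "member", "member", "casual", "casual", "casual"],
    "day_of_week": ["Monday", "Tuesday", "Saturday", "Saturday", "Sunday", "Wednesday"],
})

# Flag weekend rides (hypothetical helper column).
trips["is_weekend"] = trips["day_of_week"].isin(["Saturday", "Sunday"])

# Row-normalized crosstab: each rider type's rides sum to 100%.
share = pd.crosstab(trips["member_casual"], trips["is_weekend"], normalize="index") * 100
share = share.rename(columns={False: "weekday_pct", True: "weekend_pct"}).round(2)

print(share)
```

Normalizing by row rather than over the whole table keeps the comparison fair even though members account for far more rides than casual riders.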

In [None]:
source = df[df['member_casual']=='casual']

base = alt.Chart(source).encode(
    alt.X('time_hour', axis=alt.Axis(title=None))
).properties(width=400, height=200)

bar = base.mark_bar(color='#5778a4').encode(
    alt.Y('count(ride_id)',
          axis=alt.Axis(title='Count of Ride', titleColor='black')),
)

line = base.mark_line(stroke='red', interpolate='monotone').encode(
    alt.Y('median(Time_minutes)',
          axis=alt.Axis(title='Rent Time (minutes)', titleColor='black'))
)

alt.layer(bar, line).resolve_scale(
    y = 'independent'
)

In [None]:
source = df[df['member_casual']=='member']

base = alt.Chart(source).encode(
    alt.X('time_hour', axis=alt.Axis(title=None))
).properties(width=400, height=200)

bar = base.mark_bar(color='#ff7f0e').encode(
    alt.Y('count(ride_id)',
          axis=alt.Axis(title='Count of Ride', titleColor='black')),
)

line = base.mark_line(stroke='red', interpolate='monotone').encode(
    alt.Y('median(Time_minutes)',
          axis=alt.Axis(title='Rent Time (minutes)', titleColor='black'))
)

alt.layer(bar, line).resolve_scale(
    y = 'independent'
)
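
The hourly charts above aggregate inside Altair, so the underlying numbers are never displayed. A rough sketch of the same aggregation in pandas follows, which may help when quoting peak hours in a report. The sample data is invented, and the output column names `rides` and `median_minutes` are illustrative assumptions.

```python
import pandas as pd

# Invented sample with the columns the hourly charts rely on.
trips = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "member_casual": ["member", "member", "member", "casual", "casual", "casual", "casual"],
    "time_hour": [8, 8, 17, 14, 14, 14, 10],
    "Time_minutes": [10.0, 12.0, 15.0, 25.0, 30.0, 35.0, 20.0],
})

# Ride count and median duration per rider type and start hour.
hourly = (
    trips.groupby(["member_casual", "time_hour"])
    .agg(rides=("ride_id", "count"), median_minutes=("Time_minutes", "median"))
    .reset_index()
)

# Busiest start hour for each rider type.
peak = hourly.loc[hourly.groupby("member_casual")["rides"].idxmax()]

print(hourly)
print(peak[["member_casual", "time_hour", "rides"]])
```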

In [None]:
start_station_member= df[df['member_casual']=='member'].groupby('start_station').agg(func={'ride_id':'count'}).nlargest(10,'ride_id').reset_index()
start_station_member

In [None]:
source = start_station_member

# Display the bar chart
alt.Chart(source).mark_bar(color='#ff7f0e').encode(
    x=alt.X('ride_id', title='Count of Ride').axis(format=',.0f'),
    y=alt.Y('start_station', title=None, sort='-x'),
).properties(
    title=alt.Title(
        "Top 10 Start Stations for Member Users",
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=400, height=200,
)

In [None]:
stop_station_member= df[df['member_casual']=='member'].groupby('stop_station').agg(func={'ride_id':'count'}).nlargest(10,'ride_id').reset_index()
stop_station_member

In [None]:
source = stop_station_member

# Display the bar chart
alt.Chart(source).mark_bar(color='#ff7f0e').encode(
    x=alt.X('ride_id', title='Count of Ride').axis(format=',.0f'),
    y=alt.Y('stop_station', title=None, sort='-x'),
).properties(
    title=alt.Title(
        "Top 10 End Stations for Member Users",
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=400, height=200,
)

In [None]:
start_station_casual= df[df['member_casual']=='casual'].groupby('start_station').agg(func={'ride_id':'count'}).nlargest(10,'ride_id').reset_index()
start_station_casual

In [None]:
source = start_station_casual

# Display the bar chart
alt.Chart(source).mark_bar(color='#5778a4').encode(
    x=alt.X('ride_id', title='Count of Ride').axis(format=',.0f'),
    y=alt.Y('start_station', title=None, sort='-x'),
).properties(
    title=alt.Title(
        "Top 10 Start Stations for Casual Users",
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=400, height=200,
)

In [None]:
stop_station_casual= df[df['member_casual']=='casual'].groupby('stop_station').agg(func={'ride_id':'count'}).nlargest(10,'ride_id').reset_index()
stop_station_casual

In [None]:
source = stop_station_casual

# Display the bar chart
alt.Chart(source).mark_bar(color='#5778a4').encode(
    x=alt.X('ride_id', title='Count of Ride').axis(format=',.0f'),
    y=alt.Y('stop_station', title=None, sort='-x'),
).properties(
    title=alt.Title(
        "Top 10 End Stations for Casual Users",
        anchor='start',
        font='Calibri',
        fontSize=18,
        offset=20,
    ),
    width=400, height=200,
)

In [None]:
df.head()

In [None]:
station_member= df.loc[df['member_casual']=='member'].groupby('stop_station').agg(func={'ride_id':'count'}).nlargest(10,'ride_id').reset_index()
station_member

In [28]:
station= pd.read_csv('station_database.csv')

In [None]:
station

In [None]:
station_member= pd.merge(start_station_member,station, how='left', left_on='start_station', right_on='station_name')
station_member

In [None]:
station_casual= pd.merge(start_station_casual,station, how='left', left_on='start_station', right_on='station_name')
station_casual
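
Both merges above are left joins on the station name, so any top-10 station whose name is missing from `station_database.csv` ends up with empty coordinates, and placing a marker with missing coordinates can fail. The following is a minimal sketch of one way to check for that before mapping. The frames and station names are invented, and the `lat` / `lng` / `station_name` columns are assumed to match those used in the mapping cell.

```python
import pandas as pd

# Invented stand-ins for a top-10 table and the station database.
top_stations = pd.DataFrame({
    "start_station": ["Station A", "Station B", "Station C"],
    "ride_id": [120, 95, 80],
})
station_lookup = pd.DataFrame({
    "station_name": ["Station A", "Station B"],
    "lat": [41.88, 41.89],
    "lng": [-87.62, -87.63],
})

# indicator=True adds a _merge column showing where each row was found.
merged = top_stations.merge(
    station_lookup,
    how="left",
    left_on="start_station",
    right_on="station_name",
    indicator=True,
)

# Stations present in the top-10 list but absent from the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "start_station"].tolist()
print("Stations without coordinates:", unmatched)

# Keep only rows that can actually be placed on a map.
mappable = merged.dropna(subset=["lat", "lng"])
```

Listing the unmatched names first makes it easier to spot spelling differences between the trip data and the station database instead of silently dropping stations.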

In [34]:
import folium 

In [None]:
chicago_map = folium.Map(location=[41.881832, -87.623177], zoom_start=15)

# Add a marker for each top-10 start station: orange for members, blue for casual riders
for stations, marker_color in [(station_member, 'orange'), (station_casual, 'blue')]:
    # Skip stations that found no match in the station database (missing coordinates)
    for row in stations.dropna(subset=['lat', 'lng']).itertuples():
        folium.Marker(
            location=[row.lat, row.lng],
            popup=row.station_name,
            icon=folium.Icon(color=marker_color, icon='info-sign'),
        ).add_to(chicago_map)

chicago_map