# Cyclistic Bike Sharing Analysis
## Case Study - How does a Bike-Share company make quick success possible?

### Introduction
This notebook is a case study that **analyzes bike sharing** at a fictional company called _Cyclistic_. The company has two types of customers: **casual** and **member**. It offers three types of plans: **single ride**, **all day ride** and **annual member**.

The **objective** is to understand how this company actually works and how it can be improved. The **goal** is to find out the most important factors that affect the success of a campaign and to answer the following questions:

### Ask
- Three questions will guide the future marketing program:
1. **How do annual members and casual riders use Cyclistic bikes differently?**
2. **Why would casual riders buy Cyclistic annual memberships?**
3. **How can Cyclistic use digital media to influence casual riders to become members?**

The final result is a report that contains the following information:
1. **A clear statement of the business task**    
2. **A description of all data sources used**
3. **A documentation of any cleaning or manipulation of data**
4. **A summary of my analysis**
5. **Supporting visualizations and my key findings**
6. **My three main recommendations based on my analysis**

### Prepare
I will use Cyclistic’s historical trip data to analyze and identify trends. The Cyclistic trip data starts in 2013.

> 💡: I don't need the older data, because it doesn't represent the current state of the company.

For this reason, I will use the data from 2022 to 2023. The data is available at this link: https://divvy-tripdata.s3.amazonaws.com/index.html

(Note: The data has been made available by Motivate International Inc. under this license)

In [1]:
# I chose polars instead of pandas because it is faster and more memory-efficient. I will use it to read the CSV files and to do the data analysis.
%pip install polars



In [2]:
import polars as pl

In [3]:
%ls ./datasets/

2022/                            202210-divvy-tripdata.csv
202201-divvy-tripdata.csv        202211-divvy-tripdata.csv
202202-divvy-tripdata.csv        202212-divvy-tripdata.csv
202203-divvy-tripdata.csv        2023/
202204-divvy-tripdata.csv        202301-divvy-tripdata.csv
202205-divvy-tripdata.csv        202302-divvy-tripdata.csv
202206-divvy-tripdata.csv        202303-divvy-tripdata.csv
202207-divvy-tripdata.csv        202304-divvy-tripdata.csv
202208-divvy-tripdata.csv        202305-divvy-tripdata.csv
202209-divvy-publictripdata.csv  202306-divvy-tripdata.csv
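
Since the analysis depends on having every month from January 2022 to June 2023, the listing can also be checked programmatically. The following is a minimal sketch using only the standard library; the helper names `expected_months` and `missing_months` are hypothetical, and the file names are copied from the listing above.

```python
import re
from datetime import date


def expected_months(start: date, end: date) -> list[str]:
    """Return YYYYMM strings for every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def missing_months(file_names: list[str], start: date, end: date) -> list[str]:
    """List the months in the range that have no matching CSV file."""
    # Match on the leading YYYYMM prefix only, so irregular names such as
    # 202209-divvy-publictripdata.csv are still recognised.
    found = {
        m.group(1)
        for name in file_names
        if (m := re.match(r"(\d{6})-divvy-.*\.csv$", name))
    }
    return [month for month in expected_months(start, end) if month not in found]


# File names as they appear in the directory listing.
csv_files = (
    [f"2022{m:02d}-divvy-tripdata.csv" for m in range(1, 13) if m != 9]
    + ["202209-divvy-publictripdata.csv"]
    + [f"2023{m:02d}-divvy-tripdata.csv" for m in range(1, 7)]
)

print(missing_months(csv_files, date(2022, 1, 1), date(2023, 6, 1)))
```

An empty list means that every month in the range has a CSV file.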


In [4]:
%ls ./datasets/2022

202201-divvy-tripdata.zip  202205-divvy-tripdata.zip  202209-divvy-tripdata.zip
202202-divvy-tripdata.zip  202206-divvy-tripdata.zip  202210-divvy-tripdata.zip
202203-divvy-tripdata.zip  202207-divvy-tripdata.zip  202211-divvy-tripdata.zip
202204-divvy-tripdata.zip  202208-divvy-tripdata.zip  202212-divvy-tripdata.zip


In [5]:
%ls ./datasets/2023

202301-divvy-tripdata.zip  202303-divvy-tripdata.zip  202305-divvy-tripdata.zip
202302-divvy-tripdata.zip  202304-divvy-tripdata.zip  202306-divvy-tripdata.zip


Now that I have all the data I need, I have to extract the zip files, then clean and transform the data so that I can analyze it.

In [6]:
from os import listdir
import zipfile

# Set to True to (re-)extract the monthly zip archives into ./datasets/
has_to_download_the_files = False

if has_to_download_the_files:
    # Get all zip files from both year folders
    my_path = "./datasets/"
    onlyfiles = [f for f in (listdir(f"{my_path}2022") + listdir(f"{my_path}2023")) if f.endswith(".zip")]

    # Extract all zip files
    for file_name in onlyfiles:
        # File names start with YYYYMM, so the year prefix selects the subfolder
        path = my_path + ("2022/" if file_name.startswith("2022") else "2023/") + file_name
        with zipfile.ZipFile(path, 'r') as zip_ref:
            zip_ref.extractall(my_path)

In [7]:
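import polars as pl

# listdir returns names in arbitrary order, so sort them to make all_files[0] the earliest month
all_files = sorted(f"./datasets/{f}" for f in listdir("./datasets") if f.endswith(".csv"))

# Start with a single month to explore the schema
df = pl.scan_csv(all_files[0]).collect()
df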

ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
str,str,str,str,str,str,str,str,f64,f64,f64,f64,str
"""C2F7DD78E82EC8…","""electric_bike""","""2022-01-13 11:…","""2022-01-13 12:…","""Glenwood Ave &…","""525""","""Clark St & Tou…","""RP-007""",42.0128,-87.665906,42.01256,-87.674367,"""casual"""
"""A6CF8980A652D2…","""electric_bike""","""2022-01-10 08:…","""2022-01-10 08:…","""Glenwood Ave &…","""525""","""Clark St & Tou…","""RP-007""",42.012763,-87.665967,42.01256,-87.674367,"""casual"""
"""BD0F91DFF741C6…","""classic_bike""","""2022-01-25 04:…","""2022-01-25 04:…","""Sheffield Ave …","""TA1306000016""","""Greenview Ave …","""TA1307000001""",41.925602,-87.653708,41.92533,-87.6658,"""member"""
"""CBB80ED4191054…","""classic_bike""","""2022-01-04 00:…","""2022-01-04 00:…","""Clark St & Bry…","""KA1504000151""","""Paulina St & M…","""TA1309000021""",41.983593,-87.669154,41.961507,-87.671387,"""casual"""
"""DDC963BFDDA51E…","""classic_bike""","""2022-01-20 01:…","""2022-01-20 01:…","""Michigan Ave &…","""TA1309000002""","""State St & Ran…","""TA1305000029""",41.87785,-87.62408,41.884621,-87.627834,"""member"""
"""A39C6F6CC0586C…","""classic_bike""","""2022-01-11 18:…","""2022-01-11 18:…","""Wood St & Chic…","""637""","""Honore St & Di…","""TA1305000034""",41.895634,-87.672069,41.903119,-87.673935,"""member"""
"""BDC4AB637EDF98…","""classic_bike""","""2022-01-30 18:…","""2022-01-30 18:…","""Oakley Ave & I…","""KA1504000158""","""Broadway & She…","""13323""",41.954341,-87.68608,41.952833,-87.649993,"""member"""
"""81751A3186E59A…","""classic_bike""","""2022-01-22 12:…","""2022-01-22 12:…","""Sheffield Ave …","""TA1306000016""","""Damen Ave & Cl…","""13271""",41.925602,-87.653708,41.931931,-87.677856,"""member"""
"""154222B86A338A…","""electric_bike""","""2022-01-17 07:…","""2022-01-17 08:…","""Racine Ave & 1…","""13304""","""Clinton St & W…","""WL-012""",41.861251,-87.6565,41.88338,-87.64117,"""member"""
"""72DC25B2DD467E…","""classic_bike""","""2022-01-28 15:…","""2022-01-28 15:…","""LaSalle St & J…","""TA1309000004""","""Clinton St & W…","""WL-012""",41.878166,-87.631929,41.88338,-87.64117,"""member"""


## Exploring

In [8]:
# It's necessary to know the number of each rideable type and the number of each user type.
# The number of each rideable type.
df_rideable_type = df['rideable_type'].value_counts()
df_rideable_type

rideable_type,counts
str,u32
"""docked_bike""",961
"""classic_bike""",55067
"""electric_bike""",47742


In [48]:
# The number of each user type.
df_user_type = df['member_casual'].value_counts()
df_user_type

member_casual,counts
str,u32
"""casual""",18520
"""member""",85250


Each row of the file corresponds to a single trip and contains information such as the following:
1. **ride_id**: A unique identifier for each trip
2. **rideable_type**: The type of bike used for the trip
    - There are three types of bike: **classic_bike**, **docked_bike** and **electric_bike**.
3. **started_at**: The date/time when the trip started, in UTC
4. **ended_at**: The date/time when the trip ended, in UTC
5. **start_station_name**: The station name where the trip originated
6. **start_station_id**: A unique identifier for the station where the trip started
7. **end_station_name**: The station name where the trip terminated
8. **end_station_id**: A unique identifier for the station where the trip ended
9. **start_lat**: The latitude of the station where the trip started
10. **start_lng**: The longitude of the station where the trip started
11. **end_lat**: The latitude of the station where the trip ended
12. **end_lng**: The longitude of the station where the trip ended
13. **member_casual**: Whether the ride was taken by a member or a casual user
    - This field has two values, **member** and **casual**, as already mentioned above.
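
The `started_at` and `ended_at` columns are read as strings, so ride length and day of week have to be derived from them. The following is a minimal sketch using only the standard library. It assumes the timestamps follow the `YYYY-MM-DD HH:MM:SS` layout suggested by the preview; the sample values and the helper names `ride_minutes` and `weekday_name` are hypothetical.

```python
from datetime import datetime

# Assumed layout of started_at / ended_at (the preview truncates the values).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def ride_minutes(started_at: str, ended_at: str) -> float:
    """Ride length in minutes, computed from the two timestamp strings."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 60


def weekday_name(started_at: str) -> str:
    """Name of the weekday on which the ride started."""
    return WEEKDAYS[datetime.strptime(started_at, TIMESTAMP_FORMAT).weekday()]


# Hypothetical timestamps, not taken from the dataset.
print(ride_minutes("2022-01-13 11:59:47", "2022-01-13 12:02:44"))
print(weekday_name("2022-01-13 11:59:47"))
```

A negative result from `ride_minutes` would indicate a row whose end time precedes its start time, which is worth checking during cleaning.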

In [10]:
df.shape

(103770, 13)
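
The counts from the two `value_counts()` outputs add up to these 103770 rows, so they can be turned into percentages to compare the groups more easily. The following is a minimal sketch in plain Python; the helper name `shares` is hypothetical, and the counts are copied from the outputs above.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert absolute counts into percentages rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Counts copied from the value_counts() outputs for this file.
user_counts = {"casual": 18520, "member": 85250}
bike_counts = {"docked_bike": 961, "classic_bike": 55067, "electric_bike": 47742}

print(shares(user_counts))
print(shares(bike_counts))
```

In this file, members account for roughly four out of five rides, and docked bikes for less than 1% of rides.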

In [11]:
df.describe()

describe,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
str,str,str,str,str,str,str,str,str,f64,f64,f64,f64,str
"""count""","""103770""","""103770""","""103770""","""103770""","""103770""","""103770""","""103770""","""103770""",103770.0,103770.0,103770.0,103770.0,"""103770"""
"""null_count""","""0""","""0""","""0""","""0""","""16260""","""16260""","""17927""","""17927""",0.0,0.0,86.0,86.0,"""0"""
"""mean""",,,,,,,,,41.89685,-87.648622,41.89695,-87.648964,
"""std""",,,,,,,,,0.049664,0.053199,0.0484,0.031342,
"""min""","""00010C6E382D64…","""classic_bike""","""2022-01-01 00:…","""2022-01-01 00:…","""2112 W Peterso…","""13001""","""2112 W Peterso…","""13001""",41.65,-87.83,41.648501,-87.83,"""casual"""
"""max""","""FFFE5FA260E982…","""electric_bike""","""2022-01-31 23:…","""2022-02-01 01:…","""Yates Blvd & 9…","""WL-012""","""Yates Blvd & 9…","""WL-012""",45.635034,-73.796477,42.07,-87.52,"""member"""
"""median""",,,,,,,,,41.894877,-87.644098,41.895501,-87.644098,
"""25%""",,,,,,,,,41.879255,-87.664169,41.879344,-87.664358,
"""75%""",,,,,,,,,41.925602,-87.629912,41.925602,-87.629912,

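The `max` row shows a start position of (45.635034, -73.796477), which is far from the rest of the coordinates, while the quartiles all sit close to 41.9 and -87.6. One simple way to flag such rows is a bounding-box check. The following is a minimal sketch; the box limits and the helper name `in_service_area` are assumptions chosen to loosely cover the other minimum and maximum values in the summary, not official service-area boundaries.

```python
from typing import Optional

# Hypothetical bounding box, chosen to loosely cover the remaining
# min/max coordinates reported by describe().
LAT_RANGE = (41.6, 42.1)
LNG_RANGE = (-87.9, -87.5)


def in_service_area(lat: Optional[float], lng: Optional[float]) -> bool:
    """Return True when the coordinate lies inside the assumed bounding box."""
    if lat is None or lng is None:
        # Missing coordinates (for example the 86 null end positions) are not valid.
        return False
    return LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LNG_RANGE[0] <= lng <= LNG_RANGE[1]


print(in_service_area(41.89685, -87.648622))   # mean start position
print(in_service_area(45.635034, -73.796477))  # maximum start position
print(in_service_area(None, None))             # missing end position
```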

In [12]:
%pip install plotly
%pip install pandas
%pip install pyarrow
%pip install nbformat



In [49]:
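import plotly.express as px

# Plot the start coordinates of the first 50 rides that have no start station.
# is_null() is the explicit null test in polars.
missing_start = df.filter(pl.col('start_station_id').is_null()).head(50)
fig = px.scatter_geo(missing_start.to_pandas(), lat='start_lat', lon='start_lng',
                     hover_name='ride_id',
                     title='Start locations of rides without a start station')
fig.show()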

In [61]:
empty_start_station_data = df.filter(pl.col('start_station_id').is_null())
empty_start_station_data

ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
str,str,str,str,str,str,str,str,f64,f64,f64,f64,str
"""857B71104B4375…","""electric_bike""","""2022-01-04 17:…","""2022-01-04 17:…",,,"""Lockwood Ave &…","""312""",41.93,-87.76,41.93,-87.76,"""casual"""
"""565EEF32A9B650…","""electric_bike""","""2022-01-11 21:…","""2022-01-11 21:…",,,"""Ashland Ave & …","""13319""",41.95,-87.65,41.950687,-87.6687,"""member"""
"""C1C1910260144C…","""electric_bike""","""2022-01-05 03:…","""2022-01-05 03:…",,,"""Ashland Ave & …","""13319""",41.92,-87.69,41.950687,-87.6687,"""member"""
"""A3CE9212720037…","""electric_bike""","""2022-01-18 07:…","""2022-01-18 07:…",,,"""Southport Ave …","""TA1309000030""",41.91,-87.69,41.920771,-87.663712,"""member"""
"""A285AF99096A99…","""electric_bike""","""2022-01-14 11:…","""2022-01-14 11:…",,,"""Southport Ave …","""TA1309000030""",41.91,-87.69,41.920771,-87.663712,"""member"""
"""80F29B80E0DF90…","""electric_bike""","""2022-01-03 16:…","""2022-01-03 16:…",,,"""Wallace St & 3…","""TA1308000045""",41.83,-87.62,41.831014,-87.641184,"""member"""
"""5D1029D15EA45D…","""electric_bike""","""2022-01-08 13:…","""2022-01-08 13:…",,,"""Woodlawn Ave &…","""KA1503000065""",41.8,-87.59,41.814093,-87.597005,"""member"""
"""14AAFBC4ACE693…","""electric_bike""","""2022-01-02 14:…","""2022-01-02 15:…",,,"""Damen Ave & Wa…","""20.0""",41.91,-87.69,41.91,-87.68,"""member"""
"""F8F5A55A2318BA…","""electric_bike""","""2022-01-08 16:…","""2022-01-08 16:…",,,"""Damen Ave & Wa…","""20.0""",41.94,-87.7,41.91,-87.68,"""member"""
"""09BBB0D56ADEF6…","""electric_bike""","""2022-01-01 22:…","""2022-01-01 23:…",,,"""Kilpatrick Ave…","""358""",41.9,-87.67,41.93,-87.74,"""casual"""

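In the rows shown, the start coordinates of rides without a start station have only two decimal places, so they are approximate positions. To get a feeling for what that precision means, the haversine formula gives the great-circle distance between two coordinates. The following is a minimal sketch using only the standard library; the helper name `haversine_km` is hypothetical, and the coordinates are copied from the first two rows of the table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# First row: start and end coordinates are identical.
print(haversine_km(41.93, -87.76, 41.93, -87.76))
# Second row: start at a rounded position, end at a station.
print(haversine_km(41.95, -87.65, 41.950687, -87.6687))
# Size of a 0.01 degree step in latitude, for scale.
print(haversine_km(41.93, -87.76, 41.94, -87.76))
```

A step of 0.01 degrees in latitude is a little over one kilometre, so distances computed from these rounded positions should be read as rough estimates.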

In [50]:
%pip install geopy



In [59]:
from geopy.geocoders import Nominatim

# Reverse-geocode the start coordinates of the first ride without a start station
geolocator = Nominatim(user_agent="coordinateconverter")
location = geolocator.reverse("41.93, -87.76")
print(location.address.split(", "))

['2701-2727', 'North Long Avenue', 'Belmont Cragin', 'Chicago', 'Jefferson Township', 'Cook County', 'Illinois', '60639', 'United States']
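
Because the rounded start coordinates repeat across many rides, it makes sense to look up each distinct coordinate only once instead of sending one request per row. The following is a minimal sketch of that idea. The helper `reverse_unique` is hypothetical, and `fake_reverse` is a stand-in for the real lookup so that the example runs offline; the coordinates are copied from the first five rows without a start station.

```python
from typing import Callable, Iterable

Coordinate = tuple[float, float]


def reverse_unique(
    coords: Iterable[Coordinate], reverse: Callable[[str], str]
) -> dict[Coordinate, str]:
    """Look up each distinct coordinate once and return a coordinate -> address map."""
    addresses: dict[Coordinate, str] = {}
    for lat, lng in coords:
        key = (lat, lng)
        if key not in addresses:
            addresses[key] = reverse(f"{lat}, {lng}")
    return addresses


# Stand-in for the real geocoder, recording every query it receives.
calls: list[str] = []


def fake_reverse(query: str) -> str:
    calls.append(query)
    return f"address for {query}"


start_points = [
    (41.93, -87.76),
    (41.95, -87.65),
    (41.92, -87.69),
    (41.91, -87.69),
    (41.91, -87.69),  # repeated coordinate, looked up only once
]

lookup = reverse_unique(start_points, fake_reverse)
print(len(start_points), "rows ->", len(calls), "lookups")
```

In the notebook, the stand-in could be replaced by something like `lambda q: geolocator.reverse(q).address`. The public Nominatim service limits request rates, so the real calls should also be throttled, for example with geopy's `RateLimiter` helper.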
