# DIVVY'IN UP DATA

## Overview

This project analyzes anonymized data from Divvy, a bike-share service in Chicago, IL. We chose to examine data from the third quarter of 2019, which includes the summer months and a high volume of riders compared to other quarters.

We wanted to uncover interesting patterns in the data. **We asked ourselves:**
* Are there routes and stations that are more popular than others?
* Are there any specific bikes that were rented a surprising amount?
* Are there any relationships between age and/or gender and rider type (one-time customers versus yearly subscribers)?
* How long, on average, are trips made by Divvy bike?

## Observations

## Importing & Cleaning Data

**EDIT LATER** Text here about how we decided to clean up the data. 


In [None]:
# Import Dependencies
from matplotlib import pyplot as plt
from scipy.stats import linregress
import numpy as np
from sklearn import datasets
import pandas as pd
import requests
import gmaps
import os

# Import API key
# from api_keys import g_key

In [None]:
# Import data file
divvy_df = pd.read_csv('resources/Divvy_Trips_2019_Q3.csv')

In [None]:
# Remove null rows (if needed)
# dropna() returns a new dataframe, so the result must be reassigned to take effect
divvy_df = divvy_df.dropna()
divvy_df.shape
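
Before dropping rows, it can help to see which columns actually contain nulls. The sketch below uses a small made-up dataframe in place of `divvy_df`; the `gender` column and all values are assumptions for illustration only, not taken from the Divvy file.

```python
import pandas as pd

# Toy stand-in for divvy_df; the 'gender' column and all values are made up
sample_df = pd.DataFrame({
    'from_station_name': ['A', 'B', 'C', 'A'],
    'to_station_name': ['B', 'C', 'A', 'C'],
    'gender': ['Male', None, 'Female', None],
})

# Count missing values per column before deciding what to drop
null_counts = sample_df.isna().sum()

# dropna() returns a new dataframe; sample_df itself is left unchanged
cleaned_df = sample_df.dropna()

# Restricting dropna() to selected columns keeps rows with gaps elsewhere
station_complete_df = sample_df.dropna(subset=['from_station_name', 'to_station_name'])
```

Dropping on a subset of columns may be preferable when only the station columns are needed, since rows that are missing other fields would otherwise be discarded too.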

In [None]:
# Display sample of dataframe
divvy_df.head()

In [None]:
# Find column names
divvy_df.columns

## Analysis

### Bike Usage

### Popular Stations: Top 25

In [None]:
# Identify the most popular starting stations
top_routes = divvy_df['from_station_name'].value_counts()
total_trips_df = pd.DataFrame({
    'total trips': top_routes
})

# Reduce the results to stations with at least 10,000 trips
total_trips_reduced = total_trips_df.loc[total_trips_df['total trips'] >= 10000]
total_trips_reduced

In [1]:
# Display a bar chart of the 25 most popular stations
total_trips_reduced.plot(kind='bar', figsize=(20,3))
plt.title('Most Popular Stations')
plt.xlabel('Stations')
plt.ylabel('Trips')
# Adjust the layout before rendering; calling tight_layout() after show() has no effect
plt.tight_layout()
plt.show()
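
The cells above select stations with a trip-count threshold, which yields 25 stations only if the cutoff happens to line up. As a minimal sketch of an alternative, the top N stations can be taken directly from `value_counts()`, which already sorts in descending order. The toy data and the `top_n` name are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for divvy_df; station names are made up
sample_df = pd.DataFrame({
    'from_station_name': ['A', 'A', 'A', 'B', 'B', 'C'],
})

# Hypothetical parameter; the notebook's heading suggests 25
top_n = 2

# value_counts() sorts descending, so head(top_n) keeps the busiest stations
top_stations = sample_df['from_station_name'].value_counts().head(top_n)
top_stations_df = top_stations.to_frame(name='total trips')
```

This keeps the result at exactly `top_n` rows regardless of how trip volume changes between quarters.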

### Popular Routes: Top 25

In [None]:
# Create a dataframe that counts the rides taken for each starting/ending station pair
divvy_rides = {'from_station': divvy_df['from_station_name'],
               'to_station': divvy_df['to_station_name']
              }

divvy_rides_df = pd.DataFrame(divvy_rides, columns=['from_station', 'to_station'])
dup_to_from = divvy_rides_df.pivot_table(index=['from_station', 'to_station'], aggfunc='size')

In [2]:
# Reconfigure dataframe to display data side by side
# pivot_table(aggfunc='size') returns a Series with no columns to relabel,
# so reset_index() alone moves both station levels into regular columns
dup_to_from_II = dup_to_from.reset_index()

In [None]:
# Rename the count column (named 0 after reset_index) to something descriptive
to_from_III = dup_to_from_II.rename(columns={0: 'rides_taken'})
to_from_III

In [None]:
# Remove round trips that start and end at the same station
to_from_removed_dups = to_from_III[to_from_III['from_station'] != to_from_III['to_station']]

In [None]:
# Keep the most popular destination for each starting station,
# sort by rides taken, and identify the 25 most popular routes
most_pop_rides = to_from_removed_dups.loc[to_from_removed_dups.groupby('from_station')['rides_taken'].idxmax()]
most_popular_routes = most_pop_rides.sort_values(by='rides_taken', ascending=False).reset_index(drop=True)
most_popular_reduced = most_popular_routes.head(25)
most_popular_reduced
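
Because the selection above keeps only one destination per starting station, a busy origin can appear at most once in the result. If the goal is the busiest routes overall, one possible variant is to count every origin/destination pair and rank them together. This is a sketch on made-up data, not the notebook's method; `sample_rides_df` and its values are assumptions.

```python
import pandas as pd

# Toy stand-in for divvy_rides_df; station names are made up
sample_rides_df = pd.DataFrame({
    'from_station': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'to_station':   ['B', 'B', 'B', 'C', 'C', 'A', 'B', 'A'],
})

# Drop round trips that start and end at the same station
one_way = sample_rides_df[sample_rides_df['from_station'] != sample_rides_df['to_station']]

# Count rides per origin/destination pair
route_counts = (
    one_way.groupby(['from_station', 'to_station'])
    .size()
    .reset_index(name='rides_taken')
)

# Rank all routes together; a single origin may appear more than once
top_routes_overall = route_counts.nlargest(2, 'rides_taken').reset_index(drop=True)
```

In this toy example both top routes start at station `A`, which the per-origin selection would not allow.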

In [None]:
# Create a bar chart of the 25 most popular routes
plt.bar(most_popular_reduced['from_station'], most_popular_reduced['rides_taken'], color='darkblue', alpha=.5, align='center')
plt.xticks(most_popular_reduced['from_station'], rotation='vertical')
plt.title('25 Most Popular Divvy Routes (origin)')
plt.xlabel('Divvy Locations')
plt.ylabel('Number of Rides')
# Adjust the layout before rendering; calling tight_layout() after show() has no effect
plt.tight_layout()
plt.show()

### Comparing Subscribers & Customers

### Use by Gender

### Use by Age Group