# PyBer Challenge
**Objective**: Determine whether there is a correlation between the average fare and the total number of rides for each city type in the individual scatter plots, and whether there is a statistically significant difference between the city types in each box-and-whisker plot.

## Tasks
1. Create a PyBer Summary DataFrame for each city type:
    * Total Rides
    * Total Drivers
    * Total Fares
    * Average Fare per Ride
    * Average Fare per Driver
2. Create a Multiple-Line Plot for the sum of the fares for each city type.

### Notebook Setup

In [1]:
# Add Matplotlib inline magic command
%matplotlib inline

# Set up dependencies
import matplotlib.pyplot as plt
import pandas as pd
import os

In [2]:
# Data files to load
city_data_to_load = os.path.join("Resources", "city_data.csv")
ride_data_to_load = os.path.join("Resources", "ride_data.csv")

# Read data files and store them in a Pandas DataFrame
city_data_df = pd.read_csv(city_data_to_load)
ride_data_df = pd.read_csv(ride_data_to_load)

### Data Cleaning

In [3]:
city_data_df.count()

city            120
driver_count    120
type            120
dtype: int64

In [4]:
city_data_df.isnull().sum()

city            0
driver_count    0
type            0
dtype: int64

In [5]:
city_data_df.dtypes

city            object
driver_count     int64
type            object
dtype: object

In [6]:
city_data_df["type"].unique()

array(['Urban', 'Suburban', 'Rural'], dtype=object)

In [10]:
ride_data_df.count()

city       2375
date       2375
fare       2375
ride_id    2375
dtype: int64

In [11]:
ride_data_df.isnull().sum()

city       0
date       0
fare       0
ride_id    0
dtype: int64

In [12]:
ride_data_df.dtypes

city        object
date        object
fare       float64
ride_id      int64
dtype: object
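The `date` column above is stored as `object` (plain strings). A minimal sketch of parsing it into real timestamps with `pd.to_datetime` is shown below. It assumes the `YYYY-MM-DD HH:MM:SS` layout visible in the merged output further down and uses two hypothetical toy rows rather than the real CSV.

```python
import pandas as pd

# Hypothetical toy rows that mirror the columns of ride_data.csv
rides = pd.DataFrame({
    "city": ["Lake Jonathanshire", "Rodneyfort"],
    "date": ["2019-01-14 10:14:22", "2019-02-10 23:22:03"],
    "fare": [13.83, 23.44],
    "ride_id": [5739410935873, 5149245426178],
})

# Parse the string timestamps; with the default errors="raise",
# a value that does not match the format raises instead of passing silently
rides["date"] = pd.to_datetime(rides["date"], format="%Y-%m-%d %H:%M:%S")

print(rides.dtypes)
```

After parsing, the `.dt` accessor (for example `rides["date"].dt.month`) becomes available, which is convenient for time-based grouping.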

### Merge DataFrames

In [115]:
# Combine the data into a single DataFrame (one row per ride, with its city's details)
pyber_data_df = pd.merge(ride_data_df, city_data_df, how="left", on="city")

pyber_data_df

|      | city               | date                | fare  | ride_id       | driver_count | type  |
|------|--------------------|---------------------|-------|---------------|--------------|-------|
| 0    | Lake Jonathanshire | 2019-01-14 10:14:22 | 13.83 | 5739410935873 | 5            | Urban |
| 1    | South Michelleport | 2019-03-04 18:24:09 | 30.24 | 2343912425577 | 72           | Urban |
| 2    | Port Samanthamouth | 2019-02-24 04:29:00 | 33.44 | 2005065760003 | 57           | Urban |
| 3    | Rodneyfort         | 2019-02-10 23:22:03 | 23.44 | 5149245426178 | 34           | Urban |
| 4    | South Jack         | 2019-03-06 04:28:35 | 34.58 | 3908451377344 | 46           | Urban |
| ...  | ...                | ...                 | ...   | ...           | ...          | ...   |
| 2370 | Michaelberg        | 2019-04-29 17:04:39 | 13.38 | 8550365057598 | 6            | Rural |
| 2371 | Lake Latoyabury    | 2019-01-30 00:05:47 | 20.76 | 9018727594352 | 2            | Rural |
| 2372 | North Jaime        | 2019-02-10 21:03:50 | 11.11 | 2781339863778 | 1            | Rural |
| 2373 | West Heather       | 2019-05-07 19:22:15 | 44.94 | 4256853490277 | 4            | Rural |
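A left merge keeps every ride even when its city is missing from `city_data_df`, and silently duplicates rides if a city appears twice there. The sketch below is one way to guard against both using the real `pd.merge` parameters `validate` and `indicator`; the two small DataFrames (including the city `Nowhereville`) are hypothetical stand-ins, not PyBer data.

```python
import pandas as pd

# Hypothetical toy versions of city_data_df and ride_data_df
city_df = pd.DataFrame({
    "city": ["Rodneyfort", "North Jaime"],
    "driver_count": [34, 1],
    "type": ["Urban", "Rural"],
})
ride_df = pd.DataFrame({
    "city": ["Rodneyfort", "Rodneyfort", "North Jaime", "Nowhereville"],
    "fare": [23.44, 12.00, 11.11, 9.99],
})

# validate="many_to_one" raises MergeError if a city appears more than once in city_df;
# indicator=True adds a "_merge" column recording where each row was found
merged = pd.merge(ride_df, city_df, how="left", on="city",
                  validate="many_to_one", indicator=True)

# Rides whose city had no match in city_df
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["city", "fare"]])
```

With the real data, an empty `unmatched` frame and a row count equal to `ride_data_df` (2375) would suggest the merge lost or duplicated nothing.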


## Task 1: Create PyBer Summary DataFrame

In [14]:
# Set up Rural Cities DF, Suburban Cities DF, and Urban Cities DF
rural_cities_df = pyber_data_df[pyber_data_df["type"] == "Rural"]
urban_cities_df = pyber_data_df[pyber_data_df["type"] == "Urban"]
suburban_cities_df = pyber_data_df[pyber_data_df["type"] == "Suburban"]
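The three boolean filters above each scan the full DataFrame. As an alternative sketch (not the notebook's approach), `groupby` can split the data once into a dictionary keyed by city type. The toy DataFrame here is hypothetical.

```python
import pandas as pd

# Hypothetical toy version of the merged data
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "type": ["Urban", "Rural", "Urban", "Suburban"],
})

# One pass over the data: map each city type to its sub-DataFrame.
# Grouping by a single column name (not a list) yields plain string keys.
frames_by_type = {city_type: group for city_type, group in df.groupby("type")}

print(sorted(frames_by_type))
print(frames_by_type["Urban"]["fare"].sum())
```

Either form works; the dictionary is handy when the same calculation is repeated for every type in a loop.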

#### Calculate the total number of rides in each City Type

In [101]:
total_rides_by_city_type = pyber_data_df.groupby("type")["ride_id"].count()
total_rides_by_city_type

type
Rural        125
Suburban     625
Urban       1625
Name: ride_id, dtype: int64

In [102]:
# Get the total number of rides in Rural Cities
rural_total_rides = rural_cities_df["ride_id"].count()
rural_total_rides

125

In [103]:
# Get the total number of rides in Suburban Cities
suburban_total_rides = suburban_cities_df["ride_id"].count()
suburban_total_rides

625

In [104]:
# Get the total number of rides in Urban Cities
urban_total_rides = urban_cities_df["ride_id"].count()
urban_total_rides

1625
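Because every row of the merged data is one ride, counting rows per city type gives the ride totals. The sketch below shows `value_counts` producing the same numbers as the `groupby(...).count()` form, using a hypothetical six-ride toy DataFrame.

```python
import pandas as pd

# Hypothetical toy data: one row per ride
df = pd.DataFrame({
    "ride_id": [1, 2, 3, 4, 5, 6],
    "type": ["Urban", "Urban", "Urban", "Suburban", "Suburban", "Rural"],
})

# Count rows per type, sorted alphabetically to match groupby's key order
rides_per_type = df["type"].value_counts().sort_index()

# Equivalent groupby form (counts non-null ride_id values per type)
rides_per_type_gb = df.groupby("type")["ride_id"].count()

print(rides_per_type)
print(rides_per_type_gb)
```

The two Series hold the same values but may carry different `name` attributes depending on the pandas version, so compare values rather than whole objects.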

#### Calculate the total number of drivers in each City Type

In [133]:
total_drivers_by_city_type = city_data_df.groupby("type")["driver_count"].sum()
total_drivers_by_city_type

type
Rural         78
Suburban     490
Urban       2405
Name: driver_count, dtype: int64
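The driver totals come from `city_data_df` because, in the merged data, each city's `driver_count` is repeated on every one of its rides. The hypothetical toy example below shows how summing or counting that column on merged data gives wrong totals, and two ways to get the right one.

```python
import pandas as pd

# Hypothetical toy data: city "A" has 5 drivers and 3 rides, city "B" has 2 drivers and 1 ride
city_df = pd.DataFrame({"city": ["A", "B"], "driver_count": [5, 2], "type": ["Rural", "Rural"]})
ride_df = pd.DataFrame({"city": ["A", "A", "A", "B"], "fare": [10.0, 12.0, 14.0, 20.0]})
merged = pd.merge(ride_df, city_df, how="left", on="city")

# Wrong: driver_count appears once per ride, so this is 5*3 + 2*1 = 17
overcounted = merged["driver_count"].sum()

# Wrong: count() counts non-null rows, i.e. rides (4), not drivers
ride_rows = merged["driver_count"].count()

# Right: sum over the city-level table, where each city appears once (5 + 2 = 7)
drivers = city_df.loc[city_df["type"] == "Rural", "driver_count"].sum()

# Also right: collapse the merged data to one row per city before summing
drivers_from_merged = merged.groupby("city")["driver_count"].first().sum()

print(overcounted, ride_rows, drivers, drivers_from_merged)
```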

In [164]:
# Get the total number of drivers in Rural Cities
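# driver_count repeats on every ride row of the merged data, so sum it from city_data_df instead
rural_total_drivers = city_data_df.loc[city_data_df["type"] == "Rural", "driver_count"].sum()
rural_total_drivers

78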

In [155]:
# Get the total number of drivers in Suburban Cities
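# driver_count repeats on every ride row of the merged data, so sum it from city_data_df instead
suburban_total_drivers = city_data_df.loc[city_data_df["type"] == "Suburban", "driver_count"].sum()
suburban_total_drivers

490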

In [127]:
# Get the total number of drivers in Urban Cities
# driver_count repeats on every ride row of the merged data, so sum it from city_data_df instead
urban_total_drivers = city_data_df.loc[city_data_df["type"] == "Urban", "driver_count"].sum()
urban_total_drivers

2405

#### Calculate the total fares in each City Type

In [77]:
total_fares_by_city_type = pyber_data_df.groupby("type")["fare"].sum()
total_fares_by_city_type

type
Rural        4327.93
Suburban    19356.33
Urban       39854.38
Name: fare, dtype: float64

In [70]:
# Get the total fares in Rural Cities
rural_total_fares = sum(rural_cities_df.groupby(["city"]).sum()["fare"])
rural_total_fares

4327.93

In [72]:
# Get the total fares in Suburban Cities
suburban_total_fares = sum(suburban_cities_df.groupby(["city"]).sum()["fare"])
suburban_total_fares

19356.33

In [73]:
# Get the total fares in Urban Cities
urban_total_fares = sum(urban_cities_df.groupby(["city"]).sum()["fare"])
urban_total_fares

39854.380000000005
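The Urban total prints as `39854.380000000005` rather than `39854.38` because binary floating point cannot store most decimal fractions exactly, and small errors accumulate across many additions. A short sketch of the effect, and of rounding only for display, follows; the values are generic examples, not PyBer data.

```python
import math

# Binary floating point cannot represent 0.1 or 0.2 exactly
total = 0.1 + 0.2
print(total)                  # 0.30000000000000004

# Round or format for display; the stored value is unchanged
print(round(total, 2))        # 0.3
print(f"${total:,.2f}")       # $0.30

# math.fsum tracks exact partial sums, avoiding the accumulated error of sum()
print(sum([0.1] * 10))        # 0.9999999999999999
print(math.fsum([0.1] * 10))  # 1.0
```

For a currency summary, formatting with two decimals at display time is usually enough; the underlying error is far below one cent.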

#### Calculate the average fare per ride in each City Type

In [148]:
avgfares_perride_by_city_type = total_fares_by_city_type / total_rides_by_city_type
avgfares_perride_by_city_type

type
Rural       34.623440
Suburban    30.970128
Urban       24.525772
dtype: float64

In [149]:
rural_avgfare_perride = rural_total_fares / rural_total_rides
rural_avgfare_perride

34.62344

In [150]:
suburban_avgfare_perride = suburban_total_fares / suburban_total_rides
suburban_avgfare_perride

30.970128000000003

In [151]:
urban_avgfare_perride = urban_total_fares / urban_total_rides
urban_avgfare_perride

24.52577230769231
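Dividing total fares by total rides per type is the same as the mean fare per type, so `groupby(...).mean()` can compute it in one step. The toy DataFrame below is hypothetical and only checks that the two forms agree.

```python
import pandas as pd

# Hypothetical toy data: two Rural rides and three Urban rides
df = pd.DataFrame({
    "fare": [10.0, 20.0, 30.0, 40.0, 50.0],
    "type": ["Rural", "Rural", "Urban", "Urban", "Urban"],
})

# Average fare per ride as total fares / total rides for each type
by_division = df.groupby("type")["fare"].sum() / df.groupby("type")["fare"].count()

# The mean computes the same quantity directly
by_mean = df.groupby("type")["fare"].mean()

print(by_division)
print(by_mean)
```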

#### Calculate the average fare per driver in each City Type

In [165]:
avgfare_perdriver_by_city_type = total_fares_by_city_type / total_drivers_by_city_type
avgfare_perdriver_by_city_type

type
Rural       55.486282
Suburban    39.502714
Urban       16.571468
dtype: float64
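The division above works even though the fares Series comes from `pyber_data_df` and the drivers Series from `city_data_df`, because pandas aligns Series arithmetic on index labels rather than on position. A small sketch with hypothetical numbers:

```python
import pandas as pd

# Series arithmetic aligns on index labels, not on position
fares = pd.Series({"Rural": 100.0, "Suburban": 300.0, "Urban": 600.0})
drivers = pd.Series({"Urban": 60, "Rural": 4, "Suburban": 10})  # deliberately different order

per_driver = fares / drivers
print(per_driver)

# A label present in only one Series produces NaN in the result
partial = fares / pd.Series({"Rural": 4})
print(partial)
```

If a city type were missing from one of the inputs, the NaN in the result would make the gap visible rather than silently pairing the wrong rows.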

In [34]:
# Rural: total fares / total drivers (matches avgfare_perdriver_by_city_type["Rural"] up to float rounding)
rural_avgfare_perdriver = rural_total_fares / total_drivers_by_city_type["Rural"]

In [35]:
# Suburban: total fares / total drivers (matches avgfare_perdriver_by_city_type["Suburban"] up to float rounding)
suburban_avgfare_perdriver = suburban_total_fares / total_drivers_by_city_type["Suburban"]

In [36]:
# Urban: total fares / total drivers (matches avgfare_perdriver_by_city_type["Urban"] up to float rounding)
urban_avgfare_perdriver = urban_total_fares / total_drivers_by_city_type["Urban"]

#### Build DataFrame using calculated series
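One possible way to assemble the Task 1 summary is to pass the per-type Series to `pd.DataFrame` as a dictionary, letting pandas align them on the city-type index, and then format a copy for display. This is a sketch under assumptions: the three input Series below are hypothetical stand-ins for `total_rides_by_city_type`, `total_drivers_by_city_type`, and `total_fares_by_city_type`, and the currency formatting is a presentation choice, not something the notebook specifies.

```python
import pandas as pd

# Hypothetical per-type Series standing in for the ones computed above
total_rides = pd.Series({"Rural": 2, "Suburban": 3, "Urban": 5})
total_drivers = pd.Series({"Rural": 1, "Suburban": 2, "Urban": 4})
total_fares = pd.Series({"Rural": 70.0, "Suburban": 90.0, "Urban": 100.0})

# Columns named after the Task 1 list; rows align on the city-type index
summary = pd.DataFrame({
    "Total Rides": total_rides,
    "Total Drivers": total_drivers,
    "Total Fares": total_fares,
    "Average Fare per Ride": total_fares / total_rides,
    "Average Fare per Driver": total_fares / total_drivers,
})

# Format a copy for display so the numeric summary stays usable for further math
display_df = summary.copy()
display_df["Total Rides"] = display_df["Total Rides"].map("{:,}".format)
display_df["Total Drivers"] = display_df["Total Drivers"].map("{:,}".format)
for col in ["Total Fares", "Average Fare per Ride", "Average Fare per Driver"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

print(display_df)
```

Keeping the numeric `summary` separate from the string-formatted `display_df` avoids having to parse dollar strings back into numbers later.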

In [37]:
summary_df = pd.DataFrame()