# PyBer Challenge
**Objective**: Determine whether the average fare correlates with the total number of rides for each city type (individual scatter plots), and whether the differences between city types shown in each box-and-whisker plot are statistically significant.

## Tasks
1. Create a PyBer Summary DataFrame for each city type:
    * Total Rides
    * Total Drivers
    * Total Fares
    * Average Fare per Ride
    * Average Fare per Driver
2. Create a Multiple-Line Plot for the sum of the fares for each city type.
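
The summary in Task 1 can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the notebook's final solution: the two small DataFrames and the names `city_df`, `ride_df`, `merged`, and `summary` are hypothetical stand-ins for the real CSV data. One design choice worth noting is that drivers are summed from the city-level table, so each city's `driver_count` is counted once rather than once per ride.

```python
import pandas as pd

# Hypothetical toy data standing in for city_data.csv and ride_data.csv
city_df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "driver_count": [10, 5, 2],
    "type": ["Urban", "Suburban", "Rural"],
})
ride_df = pd.DataFrame({
    "city": ["A", "A", "B", "C"],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "ride_id": [1, 2, 3, 4],
})

merged = ride_df.merge(city_df, on="city", how="left")

# Rides and fares come from the ride-level (merged) table
total_rides = merged.groupby("type")["ride_id"].count()
total_fares = merged.groupby("type")["fare"].sum()
# Drivers come from the city-level table, so each city is counted once
total_drivers = city_df.groupby("type")["driver_count"].sum()

# All three Series share the same "type" index, so they align row by row
summary = pd.DataFrame({
    "Total Rides": total_rides,
    "Total Drivers": total_drivers,
    "Total Fares": total_fares,
    "Average Fare per Ride": total_fares / total_rides,
    "Average Fare per Driver": total_fares / total_drivers,
})
print(summary)
```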

### Notebook Setup

In [1]:
# Add Matplotlib inline magic command
%matplotlib inline

# Setup dependencies
import matplotlib.pyplot as plt
import pandas as pd
import os

In [2]:
# Data files to load
city_data_to_load = os.path.join("Resources", "city_data.csv")
ride_data_to_load = os.path.join("Resources", "ride_data.csv")

# Read data files and store them in a Pandas DataFrame
city_data_df = pd.read_csv(city_data_to_load)
ride_data_df = pd.read_csv(ride_data_to_load)

### Data Cleaning

In [7]:
city_data_df.count()

city            120
driver_count    120
type            120
dtype: int64

In [6]:
city_data_df.isnull().sum()

city            0
driver_count    0
type            0
dtype: int64

In [9]:
city_data_df.dtypes

city            object
driver_count     int64
type            object
dtype: object

In [10]:
city_data_df["type"].unique()

array(['Urban', 'Suburban', 'Rural'], dtype=object)

In [11]:
sum(city_data_df["type"]=="Urban")

66

In [12]:
sum(city_data_df["type"]=="Suburban")

36

In [13]:
sum(city_data_df["type"]=="Rural")

18
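
The three per-type counts above can also be obtained in a single call. The sketch below uses `Series.value_counts()` on a small hypothetical frame (`cities` is an illustrative stand-in for `city_data_df`, not the real data).

```python
import pandas as pd

# Hypothetical stand-in for city_data_df with only the "type" column
cities = pd.DataFrame({"type": ["Urban", "Urban", "Suburban", "Rural", "Urban"]})

# Count how many cities fall under each type in one call
type_counts = cities["type"].value_counts()
print(type_counts)
```

On the real data, the same call on `city_data_df["type"]` should reproduce the counts shown above for Urban, Suburban, and Rural.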

In [14]:
ride_data_df.count()

city       2375
date       2375
fare       2375
ride_id    2375
dtype: int64

In [16]:
ride_data_df.isnull().sum()

city       0
date       0
fare       0
ride_id    0
dtype: int64

In [17]:
ride_data_df.dtypes

city        object
date        object
fare       float64
ride_id      int64
dtype: object
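
The `date` column is stored as `object` (strings). For the multiple-line plot in Task 2 it will probably need to be converted to a datetime type before any resampling by date. A minimal sketch, assuming the strings look like `YYYY-MM-DD HH:MM:SS` (the sample values and the `rides` frame below are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows; the timestamp format is an assumption
rides = pd.DataFrame({
    "date": ["2019-01-14 10:14:22", "2019-03-04 18:24:09"],
    "fare": [13.83, 30.24],
})

# Parse the strings into datetime64 values
rides["date"] = pd.to_datetime(rides["date"])
print(rides.dtypes)
```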

### Merge DataFrames

In [65]:
# Combine the data into a single DataFrame
pyber_data_df = pd.merge(ride_data_df, city_data_df, how="left", on="city")

pyber_data_df.dtypes

city             object
date             object
fare            float64
ride_id           int64
driver_count      int64
type             object
dtype: object
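
A left merge silently keeps rides whose city has no match, and duplicates rides if a city appears twice in the city table. The sketch below shows one way to guard against both, using the `validate` and `indicator` parameters of `pd.merge`. The toy frames and the names `checked` and `unmatched` are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames standing in for ride_data_df and city_data_df
ride_df = pd.DataFrame({"city": ["A", "A", "B"], "fare": [10.0, 20.0, 30.0]})
city_df = pd.DataFrame({"city": ["A", "B"], "type": ["Urban", "Rural"]})

# validate="many_to_one" raises MergeError if a city is duplicated in city_df;
# indicator=True adds a "_merge" column recording where each row came from
checked = pd.merge(
    ride_df, city_df, how="left", on="city",
    validate="many_to_one", indicator=True,
)

# Rides whose city was not found in city_df would be marked "left_only"
unmatched = int((checked["_merge"] != "both").sum())
print(len(checked), unmatched)
```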

## Task 1: Create PyBer Summary DataFrame

In [69]:
# Count the rides for each city type
total_rides = pyber_data_df.groupby(["type"]).count()["ride_id"]

In [70]:
total_rides

type
Rural        125
Suburban     625
Urban       1625
Name: ride_id, dtype: int64

In [90]:
# A GroupBy object cannot be compared to a list; select "Rural" from the grouped counts instead
rural_total_rides = pyber_data_df.groupby(["type"]).count()["ride_id"]["Rural"]

In [87]:
rural_total_rides

125

In [104]:
# Filter the rows to Rural rides with a boolean mask, then count the ride IDs
rural_total_rides = pyber_data_df[pyber_data_df["type"] == "Rural"].count()["ride_id"]

In [105]:
rural_total_rides

125

In [120]:
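# Count the rides for each city. Note: despite the variable name, this covers all 120 cities, not only Rural ones
rural_cities = pyber_data_df.groupby(["city"]).count()["ride_id"]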

In [121]:
rural_cities

city
Amandaburgh         18
Barajasview         22
Barronchester       16
Bethanyland         18
Bradshawfurt        10
                    ..
West Robert         31
West Samuelburgh    25
Williamsonville     14
Williamsstad        23
Williamsview        20
Name: ride_id, Length: 120, dtype: int64
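
The result above has `Length: 120`, so it includes every city rather than only Rural ones. If per-city ride counts for Rural cities alone are wanted, one option is to filter by type before grouping. A minimal sketch on hypothetical toy data (`data`, `rural_only`, and `rural_rides_per_city` are illustrative names):

```python
import pandas as pd

# Hypothetical toy data standing in for pyber_data_df
data = pd.DataFrame({
    "city": ["A", "A", "B", "C", "C", "C"],
    "type": ["Rural", "Rural", "Urban", "Rural", "Rural", "Rural"],
    "ride_id": [1, 2, 3, 4, 5, 6],
})

# Keep only Rural rides, then count the rides in each remaining city
rural_only = data[data["type"] == "Rural"]
rural_rides_per_city = rural_only.groupby("city")["ride_id"].count()
print(rural_rides_per_city)
```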