In [1]:
# Add the magic command so Matplotlib plots render inline in the notebook.
%matplotlib inline

In [2]:
# Import the pandas and Matplotlib dependencies.
import matplotlib.pyplot as plt
import pandas as pd

In [3]:
# Files to load
city_data_to_load = "Resources/city_data.csv"
ride_data_to_load = "Resources/ride_data.csv"

We can also use os.path.join() to build the paths to the CSV files, but the os module must first be imported with the other dependencies, like this: import os
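As a minimal sketch of that alternative, assuming the same Resources folder layout used above: os.path.join() only assembles the path string with the correct separator for the operating system, and pd.read_csv() still does the actual loading.

```python
import os

# Build the file paths in a platform-independent way.
# The folder and file names are the ones used in the cell above.
city_data_to_load = os.path.join("Resources", "city_data.csv")
ride_data_to_load = os.path.join("Resources", "ride_data.csv")

print(city_data_to_load)
print(ride_data_to_load)
```

The resulting strings can be passed to pd.read_csv() exactly like the hard-coded paths.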

In [4]:
# Read the city data file and store it in a pandas DataFrame.
city_data_df = pd.read_csv(city_data_to_load)
city_data_df.head(10)

            city  driver_count   type
0    Richardfort            38  Urban
1   Williamsstad            59  Urban
2    Port Angela            67  Urban
3     Rodneyfort            34  Urban
4    West Robert            39  Urban
5   West Anthony            70  Urban
6    West Angela            48  Urban
7  Martinezhaven            25  Urban
8      Karenberg            22  Urban
9    Barajasview            26  Urban


In [5]:
# Read the ride data file and store it in a pandas DataFrame.
ride_data_df = pd.read_csv(ride_data_to_load)
ride_data_df.head(10)

                 city                 date   fare        ride_id
0  Lake Jonathanshire  2019-01-14 10:14:22  13.83  5739410935873
1  South Michelleport  2019-03-04 18:24:09  30.24  2343912425577
2  Port Samanthamouth  2019-02-24 04:29:00  33.44  2005065760003
3          Rodneyfort  2019-02-10 23:22:03  23.44  5149245426178
4          South Jack  2019-03-06 04:28:35  34.58  3908451377344
5        South Latoya  2019-03-11 12:26:48   9.52  1994999424437
6       New Paulville  2019-02-27 11:17:56  43.25   793208410091
7        Simpsonburgh  2019-04-26 00:43:24  35.98   111953927754
8     South Karenland  2019-01-08 03:28:48  35.09  7995623208694
9       North Jasmine  2019-03-09 06:26:29  42.81  5327642267789


## Inspect the City Data DataFrame
For the city_data_df DataFrame:
1. Check whether any rows contain null values.
2. Make sure the driver_count column has an integer data type.
3. Find out how many data points there are for each type of city.

In [6]:
# Get the column names and the number of rows that are not null in each column.
# Use the df.count() method, which returns both.

city_data_df.count()

city            120
driver_count    120
type            120
dtype: int64

In [8]:
# Confirm there are no null values
city_data_df.isnull().sum()

city            0
driver_count    0
type            0
dtype: int64
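The counts above show that no values are missing. If a column did report nulls, the affected rows could be pulled out with a boolean mask. The following is a sketch on a small hypothetical sample (the missing driver_count is invented for illustration and does not occur in city_data_df).

```python
import pandas as pd

# Hypothetical sample with one missing driver_count, for illustration only.
sample_df = pd.DataFrame({
    "city": ["Richardfort", "Williamsstad", "Port Angela"],
    "driver_count": [38, None, 67],
    "type": ["Urban", "Urban", "Urban"],
})

# isnull() marks each missing cell; any(axis=1) flags rows with at least one.
rows_with_nulls = sample_df[sample_df.isnull().any(axis=1)]
print(rows_with_nulls)
```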

In [9]:
# To check that the driver_count column has a numerical data type, so we can perform calculations on it, use the dtypes attribute of the DataFrame.
# Get the data types of each column.
city_data_df.dtypes

city            object
driver_count     int64
type            object
dtype: object

In [10]:
# To see how many data points there are for each type of city, use the sum() function on the type column of city_data_df where the condition equals each city type.
# First, use the unique() method on the column, which returns an array of all the unique values in that column.

# Get the unique values of the type of city.
city_data_df["type"].unique()

array(['Urban', 'Suburban', 'Rural'], dtype=object)

In [14]:
# Use the sum() function on the type column of city_data_df where the condition equals Urban, Suburban, or Rural.

# Get the number of data points from the Urban cities.
sum(city_data_df["type"]=="Urban")

66
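The same sum() call can be repeated for Suburban and Rural. As an alternative sketch, value_counts() returns the count for every city type in one call. The sample below is hypothetical and much smaller than the 120-row city_data_df.

```python
import pandas as pd

# Hypothetical sample, for illustration only.
sample_df = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "type": ["Urban", "Urban", "Urban", "Suburban", "Suburban", "Rural"],
})

# Count the rows for every city type at once.
type_counts = sample_df["type"].value_counts()
print(type_counts)

# The boolean-mask approach used above gives the same number for one type.
urban_count = sum(sample_df["type"] == "Urban")
print(urban_count)
```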

## Inspect Ride Data DataFrame
For the ride_data_df DataFrame:
1. Check whether any rows contain null values.
2. Make sure the fare and ride_id columns are numerical data types.

In [15]:
# Get the column names and the number of rows that are not null in each column.
ride_data_df.count()

city       2375
date       2375
fare       2375
ride_id    2375
dtype: int64

In [16]:
# Confirm there are no null values
ride_data_df.isnull().sum()

city       0
date       0
fare       0
ride_id    0
dtype: int64

In [17]:
# Get the data types of each column.
ride_data_df.dtypes

city        object
date        object
fare       float64
ride_id      int64
dtype: object
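Reading the dtypes output by eye works for a few columns. As a sketch, the check can also be made programmatic with is_numeric_dtype from pandas.api.types. The two-row sample below reuses values from the ride data preview and stands in for ride_data_df.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Small stand-in for ride_data_df, for illustration only.
sample_df = pd.DataFrame({
    "city": ["Rodneyfort", "South Jack"],
    "fare": [23.44, 34.58],
    "ride_id": [5149245426178, 3908451377344],
})

# Check each column we expect to be numerical.
numeric_check = {
    column: is_numeric_dtype(sample_df[column])
    for column in ["fare", "ride_id"]
}
print(numeric_check)
```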

## Merge DataFrames
When we merge two DataFrames, we merge on a column that has the same data and the same column name in both DataFrames. We use the following syntax to do that:
new_df = pd.merge(leftdf, rightdf, on="common_column")

We may also have to set the how= parameter to left, right, inner, or outer, depending on how we want to merge the DataFrames. The default is inner.
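To make the difference between the how= options concrete, here is a sketch on two small hypothetical DataFrames. The city "Nowhere" and the extra fares are invented so that each DataFrame has a key the other lacks.

```python
import pandas as pd

# Hypothetical data: "Nowhere" has no match in cities,
# and "Karenberg" has no match in rides.
rides = pd.DataFrame({
    "city": ["Rodneyfort", "Rodneyfort", "Nowhere"],
    "fare": [23.44, 10.00, 5.00],
})
cities = pd.DataFrame({
    "city": ["Rodneyfort", "Karenberg"],
    "driver_count": [34, 22],
})

# inner (the default): only keys present in both DataFrames.
inner_df = pd.merge(rides, cities, on="city")
# left: every row of rides, with NaN where cities has no match.
left_df = pd.merge(rides, cities, on="city", how="left")
# outer: every key from either DataFrame.
outer_df = pd.merge(rides, cities, on="city", how="outer")

print(len(inner_df), len(left_df), len(outer_df))
```

A right merge mirrors the left merge: it keeps every row of the second DataFrame instead of the first.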

The only column the two DataFrames have in common is city. Therefore, merge the two DataFrames on the city column with how="left", which keeps every row of ride_data_df and adds the city_data_df columns to the end of it.

In [18]:
# Combine the data into a single dataset
pyber_data_df = pd.merge(ride_data_df, city_data_df, how="left", on="city")

# Display the DataFrame
pyber_data_df.head()

                 city                 date   fare        ride_id  driver_count   type
0  Lake Jonathanshire  2019-01-14 10:14:22  13.83  5739410935873             5  Urban
1  South Michelleport  2019-03-04 18:24:09  30.24  2343912425577            72  Urban
2  Port Samanthamouth  2019-02-24 04:29:00  33.44  2005065760003            57  Urban
3          Rodneyfort  2019-02-10 23:22:03  23.44  5149245426178            34  Urban
4          South Jack  2019-03-06 04:28:35  34.58  3908451377344            46  Urban


In the pyber_data_df DataFrame, all the columns from ride_data_df are the first four columns after the index. The driver_count and type columns from city_data_df are added at the end.
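One way to sanity-check a merge like this is sketched below on a small stand-in for the two DataFrames (the third ride and its fare are hypothetical). The validate="many_to_one" argument asks pandas to raise an error if a city appears more than once in the right DataFrame, and the row count should match the left DataFrame after a left merge on a unique key.

```python
import pandas as pd

# Small stand-ins for ride_data_df and city_data_df, for illustration only.
rides = pd.DataFrame({
    "city": ["Rodneyfort", "South Jack", "Rodneyfort"],
    "fare": [23.44, 34.58, 12.00],
})
cities = pd.DataFrame({
    "city": ["Rodneyfort", "South Jack"],
    "driver_count": [34, 46],
    "type": ["Urban", "Urban"],
})

# Many rides map to one city; validate raises MergeError otherwise.
merged = pd.merge(rides, cities, how="left", on="city", validate="many_to_one")

# Left columns come first, followed by the columns added from cities.
print(list(merged.columns))
print(len(merged) == len(rides))
```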