# Data Workout - Parking Tickets

## Parking Data

This activity uses a sample of the New York City Parking Violations Dataset. Imagine this data was collected by police officers, parking inspectors, or other individuals. This means the data might have some missing or incorrect information.

## Importing the Tools

* Import the `pandas` library with the `pd` alias.


In [None]:
## Begin Solution


## End Solution

### Exercise 1 - Loading the Data

In this exercise, you'll start by loading the parking violation data into a DataFrame and selecting the columns we'll be working with. 

__Your Task__

1. __Create a DataFrame__:

    * Create a DataFrame named `parking_df` from the file located at:
       
        * `../data/nyc-parking-violation-sample.csv`

2. __Select Specific Columns__:

    * From the loaded data, we only need a few specific pieces of information.  Create a new DataFrame (you can name it `parking_df` again, overwriting the previous one, or use a new name) that includes only the following columns:

        * `Plate ID`
        * `Registration State`
        * `Vehicle Make`
        * `Vehicle Color`
        * `Violation Time`
        * `Street Name`

3. __Get to Know Your Data__:

    * Now, let's take a look at the structure of your DataFrame. Output the information about the `parking_df` DataFrame.  This will help you understand what you're working with.  Make sure to include:

        * The name of each column
        * The number of non-null entries in each column
        * The data type of each column (e.g., text, numbers, dates)
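
The three steps above can be sketched on a tiny, made-up dataset. This is only an illustration, not the exercise solution: the CSV text, the `Issue Date` column, and the names `toy_csv`, `toy_df`, and `columns_to_keep` are all assumptions invented for this example.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk.
toy_csv = io.StringIO(
    "Plate ID,Registration State,Issue Date,Vehicle Make\n"
    "ABC123,NY,2017-01-05,FORD\n"
    "XYZ789,NJ,2017-01-06,\n"
)

# pd.read_csv accepts a file path or any file-like object.
toy_df = pd.read_csv(toy_csv)

# Passing a list of column names returns a DataFrame with only those columns.
columns_to_keep = ["Plate ID", "Registration State", "Vehicle Make"]
toy_df = toy_df[columns_to_keep]

# info() prints each column's name, non-null count, and dtype.
toy_df.info()
```

Note that `info()` reports the non-null count per column, so a column with missing values shows a smaller count than the total number of rows.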

In [None]:
## Begin Solution
data = "../data/nyc-parking-violation-sample.csv"

# Import Data


# Print Info (Before removing columns)


# Specify Columns to Keep


# Filter columns


# Output the Resulting Info


## End Solution

### Exercise 2 - Removing `NaN`

In this exercise, you'll learn how to handle missing data, a common issue in real-world datasets. We'll remove rows with missing values and then analyze the impact of this data cleaning step.

__Your Task__

1. __Clean the Data:__

    * Create a new DataFrame named `cleaned_parking_df`.

        * Remove any rows from the `parking_df` DataFrame (from Exercise 1) that have missing data (represented as `NaN` values).
  


2. __Analyze the Cleaned Data:__

    * Determine the number of rows in `cleaned_parking_df`.  In other words, how many rows are left after removing the rows with missing data?

3. __Calculate Avoided Fines (Hypothetical):__

    * For the sake of this exercise, let's imagine that each parking ticket carries a $100 fine.
    * Also, imagine that if a ticket has any missing information, it can be successfully contested, and the fine is waived.
    * Based on the rows you removed in step 1, calculate the total amount of fines that New York City citizens hypothetically avoided due to missing data.

4. __Important Notes:__

    * The idea that missing data automatically voids a ticket is a simplified scenario created for this exercise to make it more engaging. It is not based on actual legal information.
    * The purpose of this exercise is to illustrate the impact of data cleaning.
    * This exercise is for educational purposes only. I am not a lawyer, and this should not be taken as legal advice. For legal advice, please consult a qualified professional.
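
As a rough illustration of the idea, here is a minimal sketch on made-up data. The rows and the names `toy_df`, `cleaned_toy_df`, `rows_removed`, and `fine_per_ticket` are assumptions for this example only; the real dataset will give different numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical tickets: two of the four rows have a missing value.
toy_df = pd.DataFrame({
    "Plate ID": ["ABC123", np.nan, "XYZ789", "JKL456"],
    "Vehicle Color": ["WHITE", "BLACK", np.nan, "RED"],
})

# dropna() with no arguments drops every row that has at least one NaN.
cleaned_toy_df = toy_df.dropna()

# The removed rows are the difference between the two row counts.
rows_removed = len(toy_df) - len(cleaned_toy_df)

# Assumed flat fine per ticket, as in the scenario above.
fine_per_ticket = 100
fines_avoided = rows_removed * fine_per_ticket

print(len(cleaned_toy_df), rows_removed, fines_avoided)  # 2 2 200
```

`dropna()` returns a new DataFrame and leaves the original untouched, which is why the two lengths can still be compared afterwards.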

In [None]:
## Begin Solution

# Drop Null Rows


# Count Rows



# Calculate Fine




## End Solution

### Exercise 3 - Missing Data

Let's switch up the removal criteria. A ticket can only be dismissed if the license plate, registration state, or street name is missing.

__Your Task:__

1. __Clean the Data:__

    * Create a new DataFrame named `improved_parking_df`.

    * Remove rows from the `parking_df` DataFrame (from Exercise 1) that have missing data (represented as NaN values) in any of the following columns:

        * `Plate ID`
        * `Registration State`
        * `Street Name`

2. __Analyze the Cleaned Data:__

    * Determine the number of rows in `improved_parking_df`. In other words, how many rows are left after removing the rows with missing data?

3. __Calculate Avoided Fines (Hypothetical):__

    * For the sake of this exercise, let's imagine that each parking ticket carries a $100 fine.
    * Also, imagine that if a ticket has missing information in the `Plate ID`, `Registration State`, or `Street Name` columns, it can be successfully contested, and the fine is waived.
    * Based on the rows you removed in step 1, calculate the total amount of fines that New York City citizens hypothetically avoided due to missing data.
    * The result should be a more realistic value than in the previous exercise, because missing values in the other columns no longer count.
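
One way to restrict the check to certain columns is the `subset` parameter of `dropna`. The sketch below uses made-up rows, and the names `toy_df`, `strict_df`, and `targeted_df` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical tickets with gaps in different columns.
toy_df = pd.DataFrame({
    "Plate ID": ["ABC123", None, "XYZ789", "JKL456"],
    "Street Name": ["BROADWAY", "5TH AVE", "MAIN ST", None],
    "Vehicle Color": ["WHITE", "BLACK", None, None],
})

# Dropping on any missing value keeps only the fully populated row.
strict_df = toy_df.dropna()

# subset= limits the NaN check to the listed columns, so a row that is
# missing only its Vehicle Color is kept.
targeted_df = toy_df.dropna(subset=["Plate ID", "Street Name"])

print(len(strict_df), len(targeted_df))  # 1 2
```

Comparing the two counts shows how much the choice of columns changes the number of removed rows.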



In [None]:
## Begin Solution

# Drop Rows containing null values in certain columns




# Get Row Count


# Calculate Avoided Fines




## End Solution

### Exercise 4 - Missing License Plates

In data cleaning, we often deal with not just missing data (like `NaN` values), but also data that, while present, is invalid. This exercise focuses on identifying and removing invalid data.

__Your Task:__

Consider a new scenario where a parking ticket can be contested and dismissed if the Plate ID is recorded as `BLANKPLATE`.


1. __Clean the Data:__

    * Create a new DataFrame, `blank_plates_df`.
    * Start with the original DataFrame, `parking_df` (from Exercise 1).
    * Remove all rows where the `Plate ID` column contains the value `BLANKPLATE`.

2. __Analyze the Cleaned Data:__

    * Determine how many rows were removed from the original DataFrame (`parking_df`) in the previous step.

3. __Calculate Avoided Fines (Hypothetical):__

    * Based on the scenario where a `BLANKPLATE` entry allows a ticket to be successfully contested, calculate the total amount in fines that NYC citizens could have potentially avoided. Assume each fine is $100.
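
Because `BLANKPLATE` is an ordinary string rather than `NaN`, `dropna` will not catch it; a boolean mask is one common alternative. The sketch below runs on made-up plates, and the names `toy_df`, `is_blank`, `valid_df`, and `rows_removed` are assumptions for this example.

```python
import pandas as pd

# Hypothetical plates: two placeholder entries and one truly missing value.
toy_df = pd.DataFrame({
    "Plate ID": ["ABC123", "BLANKPLATE", None, "BLANKPLATE", "XYZ789"],
})

# The comparison yields a boolean Series; missing values compare as False.
is_blank = toy_df["Plate ID"] == "BLANKPLATE"

# ~ inverts the mask, keeping every row that is NOT a BLANKPLATE entry.
valid_df = toy_df[~is_blank]

# Summing a boolean Series counts its True values.
rows_removed = int(is_blank.sum())

print(len(valid_df), rows_removed, rows_removed * 100)  # 3 2 200
```

The row with the missing plate survives this filter, which highlights the difference between invalid data and missing data.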

In [None]:
## Begin Solution

# Create Mask to Isolate BLANKPLATE (not null values!)


# Apply Filter


# Calculate Fines Avoided





## End Solution

## Bonus - Vehicle Colors

Inspect and clean the `Vehicle Color` column of the `parking_df` DataFrame.

What do you notice?

How will you go about cleaning this data?
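
As a starting point, the sketch below shows one possible way to inspect and normalise a text column. The colour values and the `color_map` abbreviations are invented for illustration and may not match what the real column contains.

```python
import pandas as pd

# Hypothetical colour entries with inconsistent case, spacing, and abbreviations.
toy_colors = pd.Series(
    [" white", "WHITE", "Wh", "BLK", "black", None],
    name="Vehicle Color",
)

# value_counts() lists each distinct value; dropna=False also counts missing ones.
print(toy_colors.value_counts(dropna=False))

# Strip surrounding whitespace and upper-case everything.
normalised = toy_colors.str.strip().str.upper()

# Assumed abbreviation mapping; build a real one from what you observe.
color_map = {"WH": "WHITE", "BLK": "BLACK"}
normalised = normalised.replace(color_map)

print(normalised.value_counts())
```

After normalisation the five non-missing entries collapse into two categories, `WHITE` and `BLACK`.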

In [None]:
## Your Solution

















## Freestyle

What information can you and your group gather from the dataset on your own?

Use the rest of the notebook to explore and discover new patterns.

In [None]:
## Your Code














