# Pandas Recap

A recap of the options we've learned for reading, cleaning up, exploring, and summarizing data using a DataFrame.

#### Import Dependencies

In [None]:
import pandas as pd
import os

#### Load the provided csv from the *Resources* folder

In [None]:
# Create a reference to the desired CSV file
csv_path = os.path.join("Resources","ufoSightings.csv")

# Read the CSV into a Pandas DataFrame
ufo_df = pd.read_csv(csv_path)

# Print the first five rows of data to the screen
ufo_df.head()

<hr>

## 1. Clean up and filter the data

#### Check to see if there are any rows with missing data

In [None]:
ufo_df.count()
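
`count()` reports the number of non-null values in each column, so missing data only shows up indirectly as columns with smaller counts. As a minimal sketch, `isna().sum()` counts the missing values directly. The `toy_df` DataFrame and its values are made up for illustration and are not taken from the UFO data.

```python
import pandas as pd

# Hypothetical toy data; these values are not from ufoSightings.csv
toy_df = pd.DataFrame({
    "city": ["austin", "reno", None, "salem"],
    "state": ["tx", "nv", "ca", None],
    "shape": ["disk", None, "light", None],
})

# count() gives the number of non-null values in each column
non_null_counts = toy_df.count()

# isna().sum() gives the number of missing values in each column
missing_counts = toy_df.isna().sum()
print(missing_counts)
```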

#### Remove the rows with missing data

**IN THE REAL WORLD, MAKE SURE THAT YOU'VE INVESTIGATED THE NULLS FIRST.**

In [None]:
clean_ufo_df = ufo_df.dropna(how="any")
clean_ufo_df.count()
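
One possible way to investigate the nulls before dropping anything is to look at the affected rows and compare how many rows different `dropna()` settings would keep. This is only a sketch on made-up data; `toy_df` and the choice of `subset` columns are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; these values are not from ufoSightings.csv
toy_df = pd.DataFrame({
    "city": ["austin", "reno", None, "salem"],
    "state": ["tx", "nv", "ca", None],
    "country": ["us", None, "us", "us"],
})

# Rows that have at least one missing value
rows_with_nulls = toy_df[toy_df.isna().any(axis=1)]

# how="any" drops a row if any of its columns is missing
strict_df = toy_df.dropna(how="any")

# subset= only considers the listed columns when deciding what to drop
lenient_df = toy_df.dropna(subset=["city", "state"])

print(len(toy_df), len(strict_df), len(lenient_df))
```

If only a few columns matter for the analysis, the `subset` form can keep rows that `how="any"` would throw away.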

#### Create a new DataFrame that only contains US data and drops the latitude and longitude columns

In [None]:
# Collect a list of the columns that we want to keep
columns = [
    "datetime",
    "city",
    "state",
    "country",
    "shape",
    "duration (seconds)",
    "duration (hours/min)",
    "comments",
    "date posted"
]

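# Filter the data so that only those sightings in the US are in a DataFrame
# Also include the list of columns that we want to keep
# .copy() gives us an independent DataFrame, so that modifying a column later
# does not trigger a SettingWithCopyWarning
usa_ufo_df = clean_ufo_df.loc[clean_ufo_df["country"] == "us", columns].copy()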
usa_ufo_df.head()
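
The `.loc[rows, columns]` pattern used above takes a boolean mask for the rows and a list of labels for the columns. A small sketch on made-up data (the `toy_df` name and values are hypothetical) shows both parts separately:

```python
import pandas as pd

# Hypothetical toy data; these values are not from ufoSightings.csv
toy_df = pd.DataFrame({
    "city": ["austin", "leeds", "reno"],
    "country": ["us", "gb", "us"],
    "latitude": [30.27, 53.80, 39.53],
})

# Boolean mask: True for the rows we want to keep
is_us = toy_df["country"] == "us"

# Select the matching rows and only the listed columns.
# .copy() makes the result independent of toy_df.
us_only_df = toy_df.loc[is_us, ["city", "country"]].copy()
print(us_only_df)
```

Note that the original index labels are kept, so the filtered rows are still labeled 0 and 2.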

<hr>

## 2. Convert a column's datatype for further analysis

#### We want to add up the number of seconds that UFOs are seen, but there is a problem.

Displaying the datatype of each of our columns shows that they are stored as `object`, which generally means that pandas is holding the values as strings. Strings cannot be summed as numbers, so the column has to be converted first.

In [None]:
usa_ufo_df.dtypes

#### Using `astype()`, we can convert a column's data into floats

In [None]:
usa_ufo_df["duration (seconds)"] = usa_ufo_df["duration (seconds)"].astype("float")
usa_ufo_df.dtypes
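
`astype("float")` raises a `ValueError` if any value in the column cannot be parsed as a number. If that happens, one alternative is `pd.to_numeric()` with `errors="coerce"`, which turns unparseable values into `NaN` instead of failing. The sketch below uses a made-up Series (`raw_durations` is hypothetical) rather than the real column:

```python
import pandas as pd

# Hypothetical toy data; one value is deliberately not a number
raw_durations = pd.Series(["120", "30.5", "about 5 min", "600"])

# astype("float") would raise a ValueError on "about 5 min".
# errors="coerce" converts unparseable values to NaN instead.
numeric_durations = pd.to_numeric(raw_durations, errors="coerce")

# Count how many values could not be converted
n_unparsed = int(numeric_durations.isna().sum())

# sum() skips NaN values by default
total_seconds = numeric_durations.sum()
print(n_unparsed, total_seconds)
```

Coercing silently discards information, so it is worth checking how many values became `NaN` before relying on the total.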

#### Now it is possible to find the sum of seconds

In [None]:
usa_ufo_df["duration (seconds)"].sum()

<hr>

## 3. Count how many sightings have occurred within each state

In this cell, we're storing the output of the `value_counts()` method in a variable so that we can use it in the next cells.

In [None]:
state_counts = usa_ufo_df["state"].value_counts()
state_counts

#### Convert the state_counts Series into a DataFrame
The output of the `value_counts()` method that we ran above is returned as a Series. We can pass it to `pd.DataFrame()` to turn it into a DataFrame, which lets us add columns or perform any other DataFrame manipulation that we need.

In [None]:
state_ufo_counts_df = pd.DataFrame(state_counts)
state_ufo_counts_df.head()
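
The column name that `pd.DataFrame(state_counts)` produces depends on the pandas version: since pandas 2.0, `value_counts()` names its result `count`, while older versions reuse the original column name (`state` here). If the column is not called what you expect, one option is to name it explicitly. This is a sketch on made-up data; `toy_states` and `toy_counts_df` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data; these values are not from ufoSightings.csv
toy_states = pd.Series(["ca", "tx", "ca", "wa", "ca"], name="state")
toy_counts = toy_states.value_counts()

# Name the column explicitly instead of relying on the Series name,
# so the result is the same regardless of the pandas version
toy_counts_df = pd.DataFrame({"state": toy_counts})
print(toy_counts_df)
```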

#### Add a column to display each state's percentage of the total US sightings

In [None]:
# Calculate the total number of sightings in our state-level DataFrame
state_total_sightings = state_ufo_counts_df['state'].sum()

# Create a new column and set it equal to the output of the calculation that we'd like to perform
# Note: this stores a fraction between 0 and 1; multiply by 100 for a true percentage
state_ufo_counts_df['Percent of Total Sightings'] = state_ufo_counts_df['state'] / state_total_sightings
state_ufo_counts_df.head()
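
Dividing each count by the total yields a share between 0 and 1. As a sketch of an alternative, `value_counts(normalize=True)` computes the same shares in one step, and multiplying by 100 converts them to percentages. The `toy_states` Series is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; these values are not from ufoSightings.csv
toy_states = pd.Series(["ca", "tx", "ca", "wa"], name="state")

# Manual calculation: each count divided by the total
counts = toy_states.value_counts()
manual_share = counts / counts.sum()

# Built-in equivalent: normalize=True returns shares instead of counts
built_in_share = toy_states.value_counts(normalize=True)

# Multiply by 100 to express the shares as percentages
percent = manual_share * 100
print(percent)
```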

#### Rename the "state" column to "Sum of Sightings"

In [None]:
state_ufo_counts_df = state_ufo_counts_df.rename(columns={"state": "Sum of Sightings"})
state_ufo_counts_df.head()