# UFO Sightings

#### The objective of this assignment is for you to explain what is happening in each cell in clear, understandable language. 

#### _There is no need to code._ The code is there for you, and it already runs. Your task is only to explain what each line in each cell does.

#### Each placeholder cell should describe what happens in the cell below it.

**Example**: The cell below imports `pandas` as a dependency because `pandas` functions will be used throughout the program, such as the Pandas `DataFrame` as well as the `read_csv` function.

In [None]:
import pandas as pd

The cell below has three commands. The first stores the path to the source file (where it is located) in `csv_path`. The second reads that file into a pandas DataFrame; here the source file, `ufoSightings.csv`, is located in the `Resources` subdirectory. The third displays the DataFrame's contents using `.head()`, which limits the display to the first five rows by default.

In [None]:
csv_path = "Resources/ufoSightings.csv"

ufo_df = pd.read_csv(csv_path)

ufo_df.head()
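
As a small illustration of the load-and-preview pattern, the sketch below reads CSV text from an in-memory buffer instead of a file on disk. The sample rows and the names `sample_csv`, `sample_df`, and `preview` are made up for illustration and are not taken from `ufoSightings.csv`.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk (hypothetical data).
sample_csv = io.StringIO(
    "city,state,country\n"
    "austin,tx,us\n"
    "toronto,on,ca\n"
    "denver,co,us\n"
)

# read_csv accepts a path or any file-like object and returns a DataFrame.
sample_df = pd.read_csv(sample_csv)

# head(n) returns the first n rows; with no argument it returns the first 5.
preview = sample_df.head(2)
print(preview)
```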

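The cell below counts the non-null values in each column of the UFO DataFrame. The result is one count per column rather than a single row total, so columns with missing values show lower counts.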

In [None]:
ufo_df.count()
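
To make the difference between a per-column count and a row total concrete, here is a minimal sketch on a made-up DataFrame (`demo_df` and its values are hypothetical, not from the assignment data).

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "state" column has one missing value.
demo_df = pd.DataFrame({
    "city": ["austin", "toronto", "denver"],
    "state": ["tx", np.nan, "co"],
})

# count() returns the number of non-null values in each column.
per_column = demo_df.count()

# len() returns the total number of rows, missing values included.
total_rows = len(demo_df)

print(per_column)
print(total_rows)
```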

The cell below creates a new DataFrame from the original by dropping every row that contains at least one NA value (`how="any"`); columns are not dropped. It then displays the count of non-null values in each column of the new DataFrame, which are now all equal.

In [None]:
clean_ufo_df = ufo_df.dropna(how="any")
clean_ufo_df.count()
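
The effect of the `how` argument can be seen in a short sketch. The data below is made up for illustration; `how="all"` is shown only for contrast and is not used in the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is partly missing, row 2 is entirely missing.
demo_df = pd.DataFrame({
    "city": ["austin", np.nan, np.nan],
    "state": ["tx", "on", np.nan],
})

# how="any" drops a row if at least one of its values is missing.
any_dropped = demo_df.dropna(how="any")

# how="all" drops a row only if every one of its values is missing.
all_dropped = demo_df.dropna(how="all")

print(len(any_dropped), len(all_dropped))
```

In both cases the columns are left in place, because `dropna` works on rows unless `axis` says otherwise.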

The cell below first defines a list of the column names to keep. It then creates a new DataFrame with `.loc`, which takes a row selector followed by a column selector. The row selector, `clean_ufo_df["country"] == "us"`, is a Boolean condition that keeps only the rows whose "country" value is "us". The column selector is the `columns` list, which keeps only the listed columns. The result is a DataFrame of U.S. sightings containing those columns.

Lastly, the cell displays the first few rows of the new DataFrame.


In [None]:
columns = [
    "datetime",
    "city",
    "state",
    "country",
    "shape",
    "duration (seconds)",
    "duration (hours/min)",
    "comments",
    "date posted"
]

usa_ufo_df = clean_ufo_df.loc[clean_ufo_df["country"] == "us", columns]
usa_ufo_df.head()
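
The two-part `.loc[rows, columns]` selection can be broken into separate steps. This is a minimal sketch on made-up data; the names `demo_df`, `is_us`, and `keep` are hypothetical.

```python
import pandas as pd

# Hypothetical data with one non-US row.
demo_df = pd.DataFrame({
    "city": ["austin", "toronto", "denver"],
    "country": ["us", "ca", "us"],
    "shape": ["disk", "light", "triangle"],
})

# Step 1: the comparison yields one True/False value per row.
is_us = demo_df["country"] == "us"

# Step 2: the list names the columns to keep.
keep = ["city", "country"]

# Step 3: .loc keeps rows where the mask is True and only the listed columns.
us_only = demo_df.loc[is_us, keep]
print(us_only)
```

Note that the original row labels (0 and 2 here) are preserved in the filtered result.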

This cell counts the number of UFO sightings in each state by calling `.value_counts()` on the "state" column, which returns how many times each unique value appears, sorted from most to least frequent. The result is stored in the `state_counts` variable and then displayed.


In [None]:
state_counts = usa_ufo_df["state"].value_counts()
state_counts
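
A short sketch on made-up values shows how `.value_counts()` differs from a count of unique values (`.nunique()` is included only for contrast and does not appear in the assignment).

```python
import pandas as pd

# Hypothetical state values.
states = pd.Series(["tx", "ca", "tx", "wa", "ca", "tx"])

# value_counts() returns a Series: unique values as the index,
# occurrences as the values, sorted from most to least frequent.
counts = states.value_counts()

# nunique() returns a single number: how many distinct values exist.
distinct = states.nunique()

print(counts)
print(distinct)
```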


The cell below converts the `state_counts` Series returned by `.value_counts()` into a new DataFrame, with the states as the index and the sighting counts as the single column, and then displays the first few rows. The counts are integers (dtype `int64`). This DataFrame provides state-level sighting information.

In [None]:
state_ufo_counts_df = pd.DataFrame(state_counts)
state_ufo_counts_df.head()

The cell below renames the column labeled "state" in the state UFO count DataFrame. As a result, the count column has the more descriptive name "Sum of Sightings".

In [None]:
state_ufo_counts_df = state_ufo_counts_df.rename(
    columns={"state": "Sum of Sightings"})
state_ufo_counts_df.head()
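
One caveat worth knowing: the name of the column produced from a `.value_counts()` result depends on the pandas version ("state" before pandas 2.0, "count" from 2.0 onward), so a rename keyed on "state" may silently do nothing on newer versions. The sketch below is one possible alternative, not the assignment's method: it names the column directly with `Series.to_frame(name=...)`. The sample values are made up.

```python
import pandas as pd

# Hypothetical state values in a Series named "state".
states = pd.Series(["tx", "ca", "tx"], name="state")
counts = states.value_counts()

# to_frame(name=...) converts the Series to a one-column DataFrame and
# sets the column label explicitly, whatever the Series was called.
counts_df = counts.to_frame(name="Sum of Sightings")
print(counts_df)
```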

The cell below lists the data type of each column in the DataFrame.

In [None]:
usa_ufo_df.dtypes

The cell below converts every value in the "duration (seconds)" column from the "object" data type to a floating-point (decimal) number, then displays the column data types to confirm the change.

In [None]:
usa_ufo_df.loc[:, "duration (seconds)"] = usa_ufo_df["duration (seconds)"].astype("float")
usa_ufo_df.dtypes

The cell below calculates the sum of the "duration (seconds)" column using the `.sum()` method and displays the value. The cell above made this possible by converting the column's values to floating-point numbers.

In [None]:
# Now it is possible to find the sum of seconds
usa_ufo_df["duration (seconds)"].sum()
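
To see why the conversion matters, the sketch below sums a made-up column twice: once while the values are still text and once after `astype("float")`. The `durations` values are hypothetical and the Series is created with `dtype=object` on purpose to mimic the unconverted column.

```python
import pandas as pd

# Hypothetical durations stored as text in an object-dtype Series.
durations = pd.Series(["10", "20.5"], dtype=object)

# Summing text joins the strings end to end instead of adding numbers.
text_total = durations.sum()

# astype("float") converts each value to a floating-point number.
numeric = durations.astype("float")

# Summing floats performs ordinary arithmetic.
numeric_total = numeric.sum()

print(repr(text_total))
print(numeric.dtype, numeric_total)
```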

The cell below groups the DataFrame by state and city, then counts the records in each group by counting the non-null values in the first column ("datetime").

In [None]:
grouped_data = usa_ufo_df.groupby(['state', 'city'])

# Hint: If you are counting records, you can use any column and get the same result. Try it.
grouped_data['datetime'].count()
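
The hint in the code comment can be checked on a small made-up DataFrame. The sketch below assumes, as in the cleaned data, that no values are missing; with missing values, counts could differ from column to column. All names and rows here are hypothetical.

```python
import pandas as pd

# Hypothetical sightings with no missing values.
demo_df = pd.DataFrame({
    "datetime": ["1/1/2010", "1/2/2010", "1/3/2010", "1/4/2010"],
    "state": ["tx", "tx", "tx", "ca"],
    "city": ["austin", "austin", "dallas", "fresno"],
    "shape": ["disk", "light", "disk", "oval"],
})

# Group rows that share the same (state, city) pair.
grouped = demo_df.groupby(["state", "city"])

# Count non-null values per group using two different columns.
by_datetime = grouped["datetime"].count()
by_shape = grouped["shape"].count()

print(by_datetime)
```

The result is a Series indexed by both state and city, and counting either column gives the same numbers.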