# Inspecting a DataFrame Object

## About the Data
In this notebook, we will be working with earthquake data from September 18, 2018 through October 13, 2018, obtained from the US Geological Survey (USGS) using the [USGS API](https://earthquake.usgs.gov/fdsnws/event/1/).

## Setup
We will be working with the `data/earthquakes.csv` file again, so we need to handle our imports and read it in.

In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('data/earthquakes.csv')

## Examining dataframes
### Is it empty?

In [None]:
df.empty

### What are the dimensions?

In [None]:
df.shape
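
To make the two attributes above concrete, here is a minimal sketch on a small, made-up dataframe. The `toy` frame and its values are hypothetical and are not taken from the earthquake data:

```python
import pandas as pd

# hypothetical toy data, not the earthquake dataset
toy = pd.DataFrame({'mag': [1.2, 4.5, 2.3], 'place': ['A', 'B', 'C']})

print(toy.empty)  # False, because the frame has rows
print(toy.shape)  # (3, 2): 3 rows and 2 columns

# a frame that keeps its columns but has no rows is considered empty
no_rows = toy.iloc[0:0]
print(no_rows.empty, no_rows.shape)  # True (0, 2)
```

Note that `shape` is a `(rows, columns)` tuple, so `toy.shape[1]` gives the column count on its own.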

### What columns do we have?
We know there are 26 columns, but what are they? Let's use the `columns` attribute to see:

In [None]:
df.columns

### What does the data look like?
View rows from the top with `head()`:

In [None]:
df.head()

View rows from the bottom with `tail()`. Let's view 2 rows:

In [None]:
df.tail(2)

*Tip: we can modify the display options in order to see more columns:*

```python
# check the max columns setting
>>> pd.get_option('display.max_columns')
20

# set the max columns to show when printing the dataframe to 26
>>> pd.set_option('display.max_columns', 26)
# OR
>>> pd.options.display.max_columns = 26

# reset the option
>>> pd.reset_option('display.max_columns')

# get information on all display settings
>>> pd.describe_option('display')
```

*More information can be found in the [pandas options documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html).*
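
If we only want a wider display for a single cell, one option is `pd.option_context()`, which changes a setting inside a `with` block and restores the previous value afterwards. The sketch below only reads the option back, so it does not depend on any particular dataframe:

```python
import pandas as pd

# remember whatever the current setting is
before = pd.get_option('display.max_columns')

# temporarily raise the limit to 26 columns
with pd.option_context('display.max_columns', 26):
    inside = pd.get_option('display.max_columns')
    print(inside)  # 26

# the previous value is back once the block ends
after = pd.get_option('display.max_columns')
print(after == before)  # True
```

This avoids having to remember to call `pd.reset_option()` later.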

### What data types do we have?

In [None]:
df.dtypes

### Getting extra info and finding nulls

In [None]:
df.info()
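
`info()` reports the non-null count per column. When we want the number of nulls directly, a common approach is to combine `isna()` with `sum()`. Here is a sketch on a hypothetical toy frame (the values are invented for illustration):

```python
import numpy as np
import pandas as pd

# hypothetical toy data with some missing values
toy = pd.DataFrame({
    'mag': [1.2, np.nan, 2.3, 0.9],
    'alert': [None, 'green', None, None],
})

# isna() marks missing entries as True; sum() counts them per column
null_counts = toy.isna().sum()
print(null_counts)

# count() gives the complementary number of non-null entries
non_null_counts = toy.count()
print(non_null_counts)
```

The two results add up to the number of rows for every column.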

## Describing and Summarizing
### Get summary statistics

In [None]:
df.describe()

Specify the 5<sup>th</sup> and 95<sup>th</sup> percentiles:

In [None]:
df.describe(percentiles=[0.05, 0.95])
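
To see how the requested percentiles show up in the result, here is a small sketch on a hypothetical `Series` holding the integers 1 through 100 (not earthquake data). The percentiles appear as rows labeled `5%` and `95%`:

```python
import pandas as pd

# hypothetical values 1..100, chosen so the percentiles are easy to check
s = pd.Series(range(1, 101))

summary = s.describe(percentiles=[0.05, 0.95])
print(summary)

# the requested percentiles are available by label
low, high = summary['5%'], summary['95%']
print(low, high)
```

The values match what `s.quantile([0.05, 0.95])` returns, since both use linear interpolation by default.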

Describe specific data types:

In [None]:
# np.object was removed in NumPy 1.24; the built-in object works in its place
df.describe(include=object)

Or describe all of them:

In [None]:
df.describe(include='all')

`describe()` also works on a single column (a `Series`):

In [None]:
df.felt.describe()
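
The statistics returned for a single column depend on its data type. The sketch below uses a hypothetical toy frame (its column names only loosely mirror the earthquake data) to compare a numeric column with a text column:

```python
import pandas as pd

# hypothetical toy data, not the earthquake dataset
toy = pd.DataFrame({
    'mag': [1.2, 4.5, 2.3, 4.5],
    'magType': ['ml', 'mb', 'ml', 'ml'],
})

# numeric column: count, mean, std, min, percentiles, max
numeric_summary = toy['mag'].describe()
print(numeric_summary)

# text column: count, unique, top (most common value), freq (its count)
text_summary = toy['magType'].describe()
print(text_summary)
```

Here `top` is `'ml'` with a `freq` of 3, since that value occurs three times out of four.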

There are methods for specific statistics as well. Here is a sampling of them:

| Method | Description | Data types |
| --- | --- | --- |
| `count()` | The number of non-null observations | Any |
| `nunique()` | The number of unique values | Any |
| `sum()` | The total of the values | Numerical or Boolean |
| `mean()` | The average of the values | Numerical or Boolean |
| `median()` | The median of the values | Numerical |
| `min()` | The minimum of the values | Numerical |
| `idxmin()` | The index where the minimum value occurs | Numerical |
| `max()` | The maximum of the values | Numerical |
| `idxmax()` | The index where the maximum value occurs | Numerical |
| `abs()` | The absolute values of the data | Numerical |
| `std()` | The standard deviation | Numerical |
| `var()` | The variance |  Numerical |
| `cov()` | The covariance between two `Series`, or a covariance matrix for all column combinations in a `DataFrame` | Numerical |
| `corr()` | The correlation between two `Series`, or a correlation matrix for all column combinations in a `DataFrame` | Numerical |
| `quantile()` | Calculates a specific quantile | Numerical |
| `cumsum()` | The cumulative sum | Numerical or Boolean |
| `cummin()` | The cumulative minimum | Numerical |
| `cummax()` | The cumulative maximum | Numerical |
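
A few of the methods in the table, sketched on a hypothetical toy frame with a string index so that the difference between a value and its index label is visible:

```python
import pandas as pd

# hypothetical toy data, not the earthquake dataset
toy = pd.DataFrame(
    {'mag': [1.2, 4.5, 2.3, 0.9], 'tsunami': [False, True, False, True]},
    index=['a', 'b', 'c', 'd'],
)

print(toy['mag'].max())     # 4.5, the largest value
print(toy['mag'].idxmax())  # 'b', the index label where it occurs
print(toy['mag'].idxmin())  # 'd'

# Booleans are treated as 0/1, so sum() counts the True values
# and mean() gives the fraction of True values
n_true = toy['tsunami'].sum()
frac_true = toy['tsunami'].mean()
print(n_true, frac_true)

median_mag = toy['mag'].quantile(0.5)        # same as toy['mag'].median()
running_max = toy['mag'].cummax().tolist()   # [1.2, 4.5, 4.5, 4.5]
print(median_mag, running_max)
```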

For example, finding the unique values in the `alert` column:

In [None]:
df.alert.unique()

We can then use `value_counts()` to see how many of each unique value we have:

In [None]:
df.alert.value_counts()
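
Keep in mind that `value_counts()` leaves out missing values by default. The sketch below uses a hypothetical `alert`-like `Series` (the values are made up) to show the `dropna` and `normalize` parameters:

```python
import pandas as pd

# hypothetical alert levels with some missing entries
alert = pd.Series(['green', None, 'green', 'yellow', None, None], name='alert')

# default: missing values are excluded from the counts
counts = alert.value_counts()
print(counts)

# dropna=False adds a row for the missing values
counts_with_na = alert.value_counts(dropna=False)
print(counts_with_na)

# normalize=True returns proportions of the non-null values instead of counts
shares = alert.value_counts(normalize=True)
print(shares)
```

With `dropna=False`, the counts add up to the full length of the `Series`.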

Note that `Index` objects also have several methods to help describe and summarize our data:

| Method | Description |
| --- | --- |
| `argmax()`/`argmin()` | Find the location of the maximum/minimum value in the index |
| `equals()` | Compare the index to another `Index` object for equality |
| `isin()` | Check if the index values are in a list of values and return an array of Booleans |
| `max()`/`min()` | Find the maximum/minimum value in the index |
| `nunique()` | Get the number of unique values in the index |
| `to_series()` | Create a `Series` object from the index |
| `unique()` | Find the unique values of the index |
| `value_counts()`| Create a frequency table for the unique values in the index |
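
As a brief sketch of several of these methods, here they are applied to a small, hypothetical `Index` built by hand rather than taken from the earthquake dataframe:

```python
import pandas as pd

# hypothetical index values, including a duplicate
idx = pd.Index([3, 1, 4, 1, 5])

print(idx.max(), idx.argmax())  # 5 4: the maximum value and its position
print(idx.min(), idx.argmin())  # 1 1: the first minimum is at position 1

mask = idx.isin([1, 5])         # Boolean array, one entry per index value
print(mask.tolist())            # [False, True, False, True, True]

print(idx.nunique())            # 4 distinct values
print(idx.unique().tolist())    # [3, 1, 4, 5]

same = idx.equals(pd.Index([3, 1, 4, 1, 5]))
freq = idx.value_counts()       # frequency table; the value 1 appears twice
as_series = idx.to_series()     # Series whose index and values are both idx
print(same, freq.loc[1], len(as_series))
```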
