## Data Analysis and Visualization

Analyzing and visualizing data using pandas.

`import pandas as pd`

`from matplotlib import pyplot as plt`

`%matplotlib inline`

### Create a DataFrame

Create a DataFrame from the 2019 San Francisco 311 case reports. Set the index column to `"CaseID"`.

`cases = pd.read_csv("311_Cases2019.csv", index_col="CaseID")`
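To see what `index_col` does without the full file, here is a minimal sketch that reads a tiny in-memory CSV. The three rows and their values are made up for illustration and only mimic the shape of the real data.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for 311_Cases2019.csv.
csv_text = """CaseID,Opened,Category
101,01/01/2019 08:15:00 AM,Noise Report
102,01/01/2019 09:30:00 AM,Graffiti
103,01/02/2019 10:45:00 AM,Noise Report
"""

sample = pd.read_csv(io.StringIO(csv_text), index_col="CaseID")

# CaseID is now the row index, so it is no longer a regular column.
print(sample.index.name)
print(list(sample.columns))

# Rows can be looked up directly by their case ID.
print(sample.loc[102, "Category"])
```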

### Selecting Data by Column

From `cases`, select all rows from columns `"Opened"` through `"Source"` (column indices 0-17).

`cases = cases.loc[:, "Opened":"Source"]`

or

`cases = cases.iloc[:, 0:18]`
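The two forms differ in how they treat the end of the slice: a label slice with `loc` includes the end label, while a position slice with `iloc` stops before the end position. A small sketch on a made-up four-column frame (the `"Extra"` column is hypothetical):

```python
import pandas as pd

# Hypothetical one-row frame; only the column order matters here.
sample = pd.DataFrame(
    {
        "Opened": ["01/01/2019"],
        "Category": ["Graffiti"],
        "Source": ["Phone"],
        "Extra": ["dropped by both slices"],
    },
    index=pd.Index([101], name="CaseID"),
)

by_label = sample.loc[:, "Opened":"Source"]  # label slice includes "Source"
by_position = sample.iloc[:, 0:3]            # position slice stops before index 3

print(list(by_label.columns))
print(by_label.equals(by_position))
```

This is why columns 0-17 are written as `0:18` in the `iloc` version.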

### Sorting Data

Use `sort_values` to sort by a specific column. `axis=0` sorts the rows, and `ascending=False` reverses the sort order.

`cases = cases.sort_values("Request Type", axis=0, ascending=True)`
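A quick sketch of both sort directions on a made-up frame. Note that `sort_values` returns a new DataFrame and keeps each row's original index label attached to it.

```python
import pandas as pd

# Hypothetical sample values for illustration.
sample = pd.DataFrame(
    {"Request Type": ["Graffiti", "Abandoned Vehicle", "Noise"]},
    index=pd.Index([101, 102, 103], name="CaseID"),
)

ascending = sample.sort_values("Request Type")
descending = sample.sort_values("Request Type", ascending=False)

print(list(ascending["Request Type"]))
print(list(descending.index))  # index labels travel with their rows
```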

### Replacing NaN Values

Use the `fillna` method to replace NaN values with another value.

`cases = cases.fillna("None")`
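Filling every column with the same string also touches numeric columns such as coordinates. If that is not wanted, one option is to pass `fillna` a dictionary that maps column names to fill values, so only the listed columns change. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample with one missing text value and one missing number.
sample = pd.DataFrame(
    {
        "Neighborhood": ["Mission", None],
        "Latitude": [37.76, float("nan")],
    }
)

# Only "Neighborhood" is filled; "Latitude" keeps its NaN and stays numeric.
filled = sample.fillna({"Neighborhood": "None"})

print(filled["Neighborhood"].tolist())
print(filled["Latitude"].isna().sum())
```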

### Value Counts

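Use `value_counts()` to count how often each value occurs in a column. The result is sorted from most to least frequent, so taking the first five entries gives the 5 most common categories.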

`cat_counts = cases["Category"].value_counts()[:5]`

`nhood_counts = cases["Neighborhood"].value_counts()`

`nhood_counts`
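As an illustration on made-up data, the sketch below takes the top entries with `head()` (equivalent to the `[:5]` slice) and uses `normalize=True` to get shares instead of raw counts.

```python
import pandas as pd

# Hypothetical category values for illustration.
categories = pd.Series(
    ["Noise", "Graffiti", "Noise", "Encampments", "Noise", "Graffiti"],
    name="Category",
)

counts = categories.value_counts()                # sorted, most common first
top_two = counts.head(2)                          # same idea as counts[:2]
shares = categories.value_counts(normalize=True)  # fractions that sum to 1

print(top_two)
print(shares["Noise"])
```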

### Grouping Data

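Use `groupby` to split the rows into groups by `"Category"`. `get_group` returns the rows of a single group, and `size()` counts the rows in each group.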

`gc = cases.groupby("Category")`

`gc.get_group("Street and Sidewalk Cleaning").head()`

`gc = cases.groupby("Category").size()`
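The same calls on a small made-up frame show what each one returns: `get_group` gives back a DataFrame of matching rows, while `size()` gives a Series indexed by group key.

```python
import pandas as pd

# Hypothetical sample rows for illustration.
sample = pd.DataFrame(
    {
        "Category": ["Noise", "Graffiti", "Noise"],
        "Neighborhood": ["Mission", "SoMa", "Sunset"],
    }
)

grouped = sample.groupby("Category")

noise_rows = grouped.get_group("Noise")  # DataFrame with the two Noise rows
sizes = grouped.size()                   # Series: rows per category

print(noise_rows)
print(sizes)
```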

### Analyzing Case Data by Weekday

Copy four columns of `cases` into a new DataFrame called `weekdays`. Use the `to_datetime` function to parse the `"Opened"` column and build a `DatetimeIndex` from it.

`weekdays = cases[["Opened", "Category", "Neighborhood","Request Type"]].copy()`

`date_index = pd.DatetimeIndex(pd.to_datetime(weekdays["Opened"]))`

Create a new column for `Weekday` and fill it from `date_index` using the `day_name()` method. The older `weekday_name` attribute was removed in pandas 1.0.

`weekdays["Weekday"] = date_index.day_name()`

`weekdays["Opened"] = date_index`
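A minimal sketch of deriving weekdays from date strings. It assumes pandas 1.0 or later and assumes the timestamps look like `MM/DD/YYYY hh:mm:ss AM/PM`; the three sample values are made up, and the explicit `format` string is an assumption about the file rather than something the dataset guarantees.

```python
import pandas as pd

# Hypothetical "Opened" values; the real column's format may differ.
opened = pd.Series(
    [
        "01/01/2019 08:15:00 AM",
        "01/05/2019 11:00:00 PM",
        "01/07/2019 12:30:00 PM",
    ]
)

date_index = pd.DatetimeIndex(
    pd.to_datetime(opened, format="%m/%d/%Y %I:%M:%S %p")
)

names = date_index.day_name()  # weekday names as strings
numbers = date_index.weekday   # Monday=0 ... Sunday=6, handy for ordering

print(list(names))
print(list(numbers))
```

The numeric `weekday` values are useful when the results should be sorted in calendar order rather than alphabetically.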

### Visualizing Data

Plot a bar graph of category counts

`cat_counts.plot(kind="bar")`

`plt.suptitle("SF 311 Cases (2019)")`

`plt.xlabel("Category")`

`plt.ylabel("Cases")`
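The same chart can also be built through the `Axes` object that `plot` returns, which keeps the title and labels attached to one specific plot. A sketch with made-up counts, assuming matplotlib is installed:

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical counts standing in for cat_counts.
cat_counts = pd.Series({"Noise": 3, "Graffiti": 2, "Encampments": 1})

ax = cat_counts.plot(kind="bar")  # returns a matplotlib Axes
ax.set_title("SF 311 Cases (2019)")
ax.set_xlabel("Category")
ax.set_ylabel("Cases")
plt.tight_layout()                # keep rotated tick labels inside the figure
```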

### Plotting Data on a Map

Use the `Latitude` and `Longitude` columns to plot case locations on a map. First, inspect the coordinate data using `describe()`.

`cases[['Latitude', 'Longitude']].describe()`

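Note that some min and max values are 0, which is not a valid coordinate for San Francisco.

Create a subset that excludes these bad values by keeping only rows with a latitude above 30 and a longitude below -120: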

`valid_latlon = cases[(cases.Latitude > 30) & (cases.Longitude < -120)]`

`valid_latlon[['Latitude', 'Longitude']].describe()`
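A tighter filter is possible with `between`, which checks both ends of a range at once. The sketch below uses made-up coordinates, and the bounding box values are rough assumptions about the San Francisco area, not figures taken from the dataset.

```python
import pandas as pd

# Hypothetical coordinates; the middle row mimics a bad 0/0 entry.
sample = pd.DataFrame(
    {
        "Latitude": [37.77, 0.0, 37.80],
        "Longitude": [-122.42, 0.0, -122.41],
    },
    index=pd.Index([101, 102, 103], name="CaseID"),
)

# Assumed rough bounding box around San Francisco (inclusive on both ends).
in_box = sample["Latitude"].between(37.6, 37.9) & sample["Longitude"].between(
    -122.6, -122.3
)

valid = sample[in_box]
print(valid)
```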

Get all cases from the `"Noise Report"` Category:

`categories = valid_latlon.groupby("Category")`

`noise = categories.get_group("Noise Report")`

Plot the data on a map
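As a rough stand-in for a real map, the sketch below draws a plain scatter plot of longitude against latitude. It has no basemap or projection, the coordinates are made up, and it is not necessarily the mapping approach intended here; it only shows how the two columns translate to x and y positions.

```python
import pandas as pd

# Hypothetical coordinates standing in for the noise subset.
noise = pd.DataFrame(
    {
        "Latitude": [37.77, 37.80, 37.75],
        "Longitude": [-122.42, -122.41, -122.45],
    }
)

# Longitude goes on the x-axis and latitude on the y-axis.
ax = noise.plot.scatter(x="Longitude", y="Latitude", s=10, alpha=0.5)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Noise Report cases")
```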