# Python, Pandas, and Visualizing Data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Enable inline plotting
%matplotlib inline

## Reading in Data

We'll read in a dataset containing information about cars. It is stored in a CSV file named mtcars.csv.

In [None]:
# Path to the CSV file, relative to the notebook's working directory
location = 'mtcars.csv'
df = pd.read_csv(location)
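
To see what read_csv produces without needing the file on disk, here is a minimal sketch that parses CSV text from memory. The three rows, the `Car A`-style names, and the `sample` variable are made up for illustration; only the column names mirror ones used later in this notebook.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for mtcars.csv (values are illustrative only)
csv_text = """model,mpg,cyl,hp,wt
Car A,21.0,6,110,2.6
Car B,22.8,4,93,2.3
Car C,18.7,8,175,3.4
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # column types inferred by pandas
```

Checking the shape and dtypes right after loading is a quick way to confirm that the file parsed the way we expected.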

We can use the head() method of the data frame to look at the first 5 records. Five is the default; passing a number, as in head(10), returns that many rows instead.

In [None]:
df.head()

Similarly, we can use the tail() method to look at the last 5 records.

In [None]:
df.tail()
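
Both methods accept an optional row count. The sketch below uses a made-up ten-row frame called `demo` (not the cars data) so the selected rows are easy to verify by eye.

```python
import pandas as pd

# Hypothetical frame with ten rows, used only to show row selection
demo = pd.DataFrame({'n': range(10)})

first_three = demo.head(3)  # the first three rows
last_two = demo.tail(2)     # the last two rows

print(first_three)
print(last_two)
```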

## Changing Values

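If we want to set a value in the data frame, we can use the _loc_ attribute, which selects rows and columns by label. The row selector can be a single index label or a boolean condition: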

In [None]:
df.loc[29, 'cyl'] = 4
df.loc[df.model == 'Lotus Europa', 'cyl'] = 12
df.tail()
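
The two forms of row selection can be tried on a small frame. This is a sketch with made-up data: the `cars` frame and its values are assumptions for illustration, not rows from mtcars.csv.

```python
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({
    'model': ['Car A', 'Car B', 'Car C'],
    'cyl': [6, 4, 8],
})

# Select one cell by row label and column name
cars.loc[0, 'cyl'] = 4

# Select by boolean mask: every row where the condition is True is updated
cars.loc[cars['model'] == 'Car C', 'cyl'] = 12

print(cars)
```

With the default index, row labels are the integers 0, 1, 2, and so on, which is why a number such as 29 works as a label in the cell above.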

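If we want to add a new column, we can use _assign_. It returns a new data frame rather than modifying the original, so we store the result back in df.

In [None]:
hp_cyl_ratio = df['hp'] / df['cyl']

df = df.assign(ratio=hp_cyl_ratio)
df.head()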
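
Here is a minimal sketch comparing two ways to add a column: assign, which leaves the original frame untouched and returns a copy, and bracket assignment, which modifies the frame in place. The `cars` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({'hp': [110, 93, 175], 'cyl': [6, 4, 8]})

# assign builds a new DataFrame that includes the extra column
with_ratio = cars.assign(ratio=cars['hp'] / cars['cyl'])

# Record whether the original frame gained the column (it should not have)
original_changed = 'ratio' in cars.columns

# Bracket assignment adds the column to the existing frame instead
cars['ratio'] = cars['hp'] / cars['cyl']

print(original_changed)
print(with_ratio)
```

Which form to use is mostly a matter of style: assign chains well with other methods, while bracket assignment is shorter when the frame is meant to be updated.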

## Visualizing Data

We can plot a histogram showing the distribution of mpg (miles per gallon, or fuel efficiency) for all cars.

In [None]:
plt.figure()
df['mpg'].plot.hist()
plt.xlabel('Miles per Gallon')
plt.show()
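
The number of bins strongly affects how a histogram reads. As a sketch, the example below draws a histogram with an explicit bin count on an Axes object we create ourselves. The `mpg` values are made up, and the choice of 5 bins is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical fuel-efficiency values, for illustration only
mpg = pd.Series([21.0, 22.8, 18.7, 30.4, 15.2, 19.2], name='mpg')

# Create the figure and axes explicitly, then tell pandas where to draw
fig, ax = plt.subplots()
mpg.plot.hist(bins=5, ax=ax)
ax.set_xlabel('Miles per Gallon')
ax.set_ylabel('Number of Cars')
```

Passing `ax` makes it explicit which axes the labels belong to, which helps once a notebook contains several figures.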

We can create a boxplot, showing the distribution of mpg based on number of cylinders.

In [None]:
plt.figure()
df.boxplot(column='mpg', by='cyl', grid=False)
# The x-axis shows the cylinder groups; mpg is on the y-axis
plt.xlabel('Number of Cylinders')
plt.ylabel('Miles per Gallon')
plt.suptitle("")
plt.title("Boxplot Example")
plt.show()
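
Each box in a boxplot spans the first to third quartile of its group, with a line at the median. The sketch below computes those numbers directly with groupby and describe, using a made-up `cars` frame rather than the real dataset.

```python
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({
    'cyl': [4, 4, 4, 6, 6, 8, 8, 8],
    'mpg': [22.8, 30.4, 26.0, 21.0, 19.2, 18.7, 15.2, 13.3],
})

# Summary statistics of mpg for each cylinder group
summary = cars.groupby('cyl')['mpg'].describe()

# Keep the columns that correspond to what a boxplot draws
print(summary[['min', '25%', '50%', '75%', 'max']])
```

Looking at the table next to the plot is a useful check that the plot is grouped the way we intended.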

A barplot could also be used to compare those two variables for each car.

In [None]:
plt.figure()
df[['mpg', 'cyl']].plot(kind='bar')
plt.show()
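
By default the bars are labeled with the row index (0, 1, 2, ...). One option, sketched below on made-up data, is to set the model name as the index first so each group of bars is labeled with its car. The `cars` frame and its values are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({
    'model': ['Car A', 'Car B', 'Car C'],
    'mpg': [21.0, 22.8, 18.7],
    'cyl': [6, 4, 8],
})

# Use the model name as the index so it appears on the x-axis
ax = cars.set_index('model')[['mpg', 'cyl']].plot(kind='bar')
ax.set_xlabel('Model')
```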

A scatterplot is useful for seeing a relationship between two variables. 

In [None]:
plt.figure()
df.plot.scatter(x='hp', y='mpg')
plt.xlabel('horsepower')
plt.ylabel('miles per gallon')
plt.show()
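
A scatterplot shows the relationship visually; a correlation coefficient puts a single number on it. This sketch uses the corr method of a pandas Series, which computes the Pearson correlation by default. The data is made up, chosen so that mpg falls as horsepower rises.

```python
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({
    'hp': [93, 110, 175, 245],
    'mpg': [22.8, 21.0, 18.7, 14.3],
})

# Pearson correlation between the two columns, in the range -1 to 1
r = cars['hp'].corr(cars['mpg'])
print(round(r, 2))
```

A value near -1 or 1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship.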

## Grouping Data

We can group data in various ways. For example, we can group the cars by number of cylinders.

In [None]:
grouped = df.groupby('cyl')
# numeric_only=True skips text columns such as 'model', which cannot be averaged
grouped.mean(numeric_only=True)
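
Beyond a single mean, a grouped column can be summarized with several statistics at once using agg. The following is a sketch on made-up data; the `cars` frame, its values, and the choice of statistics are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({
    'model': ['Car A', 'Car B', 'Car C', 'Car D'],
    'cyl': [4, 4, 6, 8],
    'mpg': [22.0, 30.0, 21.0, 15.0],
    'hp': [90, 110, 110, 200],
})

# Mean of every numeric column per group; the text column is left out
means = cars.groupby('cyl').mean(numeric_only=True)

# Several statistics for a single column
mpg_stats = cars.groupby('cyl')['mpg'].agg(['mean', 'min', 'max', 'count'])

print(means)
print(mpg_stats)
```

The count column is worth including, since a mean over one or two cars says much less than a mean over a dozen.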

We can also plot a separate histogram of weight (wt) for each cylinder group.

In [None]:
plt.figure()
df.hist(column='wt', by='cyl')
plt.show()