# Creating Visuals With Seaborn and Matplotlib

In this notebook, we will create simple visuals with `seaborn` and `matplotlib`.

In [None]:
#data analysis libraries
import pandas as pd
import numpy as np

#visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
#create a sample array of data points (the cubes of -10 through 9)
sample_1 = np.array([x**3 for x in range(-10,10,1)])
print(sample_1)

## Line Plot

In [None]:
#create a line plot
plt.plot(sample_1)

#set title for the chart
plt.title('A Simple Line Plot')

## Box Plot

The image below explains the various parts of a box plot.
![image.png](attachment:image.png)
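
As a rough companion to the image, the numbers behind a box plot can be worked out by hand. The sketch below assumes the usual convention (seaborn's default `whis=1.5`) that the whiskers stop at the last data point within 1.5 times the interquartile range of the box, and that anything beyond that is drawn as an individual outlier. The names `lower_fence`, `upper_fence` and `outliers` are ours, chosen for illustration.

```python
import numpy as np

#same values as sample_1 in this notebook
sample_1 = np.array([x**3 for x in range(-10, 10, 1)])

#quartiles: the two edges of the box and the line inside it
q1, median, q3 = np.percentile(sample_1, [25, 50, 75])

#interquartile range: the height (or width) of the box
iqr = q3 - q1

#fences at 1.5 * IQR beyond the box; whiskers extend to the
#furthest data points that still fall inside these fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

#points outside the fences are plotted individually as outliers
outliers = sample_1[(sample_1 < lower_fence) | (sample_1 > upper_fence)]

print(f'Q1: {q1}, median: {median}, Q3: {q3}, IQR: {iqr}')
print(f'Fences: {lower_fence} to {upper_fence}')
print(f'Outliers: {outliers}')
```

Comparing the printed outliers with the points drawn beyond the whiskers in the box plots is a quick way to check your reading of the chart.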

In [None]:
#create a box plot using the values in sample_1 as the y axis
sns.boxplot(y=sample_1)

In [None]:
#create a box plot using the values in sample_1 as the x axis
sns.boxplot(x=sample_1)

## Histogram

In [None]:
#create 20 random ages between 0 and 59 using numpy's random module
age_distribution = np.random.randint(60, size=20)
print(f'The list of ages is: {age_distribution}')

In [None]:
#plot the list of ages
plt.hist(age_distribution)

#set title and labels for x and y axis
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')

## Pie Chart

In [None]:
#create sample data
categories = ['Fish','Meat','Eggs','Bread','Vegetables']
prices = [4000,8000,2000,1000,1500]

#plot the pie chart
plt.pie(x=prices,labels = categories)
plt.show()

In [None]:
#plot pie chart showing percentage contribution of each category
#'%%' in the format string prints a literal percent sign after each value
plt.pie(x=prices,labels = categories,autopct='%.2f%%')

#set chart title
plt.title('Pie Chart with Percentage Displayed')


plt.show()

## Bar Chart

Using the food prices data above, we'll create a bar chart.

In [None]:
#create sample data
categories = ['Fish','Meat','Eggs','Bread','Vegetables']
prices = [4000,8000,2000,1000,1500]

#plot the bar chart
plt.bar(x=categories,height = prices)

#specify titles for x and y axis as well as the entire chart
plt.xlabel('Food Items')
plt.ylabel('Prices')
plt.title('Cost of Food Items')


plt.show()

# Let's create charts using data in a pandas dataframe

`seaborn` ships with sample datasets, and we will load one of them into a pandas DataFrame for the next few steps in this notebook. We can access these datasets with seaborn's `load_dataset` function.

In [None]:
#create a dataframe from the taxis dataset in seaborn
df_taxis = sns.load_dataset('taxis')
df_taxis.head()

In [None]:
#create a histogram with the passengers column
plt.hist(df_taxis.passengers)

plt.title('Passenger distribution per trip')

plt.ylabel('Frequency')
plt.xlabel('Number of Passengers')

Most rides have only 1 passenger. However, a ride can have up to 6 passengers, or as few as none.

Why would there be a ride with no passengers? A delivery, perhaps? What other explanations are there?
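
One way to look into this is to count the rides for each passenger number and then pull out the zero-passenger rows for a closer look. The sketch below uses a small made-up DataFrame called `rides` as a stand-in for `df_taxis`, so the values are purely illustrative. In the notebook, the same two lines would be applied to `df_taxis` itself.

```python
import pandas as pd

#hypothetical stand-in for df_taxis; the values are made up for illustration
rides = pd.DataFrame({
    'passengers': [1, 1, 2, 0, 1, 6, 0, 3],
    'distance': [1.6, 0.8, 2.4, 3.1, 1.2, 7.5, 0.5, 4.0],
    'total': [12.3, 8.8, 15.0, 18.6, 9.4, 33.2, 6.1, 21.7],
})

#count how many rides there are for each passenger number
passenger_counts = rides['passengers'].value_counts().sort_index()
print(passenger_counts)

#keep only the rides that recorded no passengers
zero_passenger_rides = rides[rides['passengers'] == 0]
print(zero_passenger_rides)
```

Looking at the distance and total paid for those rows may hint at whether they are deliveries, data entry errors, or something else.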

In [None]:
#create a bar plot to compare how much is paid in total by each payment method

#one way is to first create a series that contains the total sum paid grouped by payment method
payment_sums = df_taxis.groupby('payment').total.sum()
print(payment_sums)

In [None]:
#plot the bar chart
plt.bar(x=payment_sums.index,height=payment_sums)

plt.title('Total Amount Paid Per Payment Method')
plt.ylabel('Total ($)')


In [None]:
#plot payment sums with a pie chart

plt.pie(x=payment_sums,labels=payment_sums.index,autopct='%.2f%%')

plt.title('Total Amount Paid Per Payment Method')

plt.show()

Credit cards account for 77.55% of the total amount paid.
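
The percentages on a pie chart are each slice's share of the overall total, so they can also be calculated directly. The sketch below uses a small made-up DataFrame called `rides` in place of `df_taxis`, so the figures it prints are illustrative and will not match the chart. The name `payment_share` is ours.

```python
import pandas as pd

#hypothetical stand-in for df_taxis; the values are made up for illustration
rides = pd.DataFrame({
    'payment': ['credit card', 'cash', 'credit card', 'cash', 'credit card'],
    'total': [20.0, 10.0, 30.0, 10.0, 30.0],
})

#total amount paid per payment method
payment_sums = rides.groupby('payment')['total'].sum()

#each method's share of the overall amount, as a percentage
payment_share = (payment_sums / payment_sums.sum() * 100).round(2)
print(payment_share)
```

Running the last two lines against the `payment_sums` series from the taxis data should give the same percentages that the pie chart displays.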

In [None]:
#compare distance travelled with fare paid
plt.scatter(x=df_taxis.distance, y=df_taxis.fare)

plt.title('Distance vs Fare')
plt.xlabel('Distance')
plt.ylabel('Fare')

It's safe to say that in general, longer distances attract higher fares.
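
To put a number on that impression, we could compute the Pearson correlation between the two columns. This is a minimal sketch on a small made-up DataFrame called `rides`, not the taxis data, so the value it prints is only illustrative. In the notebook, the equivalent call would be `df_taxis.distance.corr(df_taxis.fare)`.

```python
import pandas as pd

#hypothetical stand-in for df_taxis; the values are made up for illustration
rides = pd.DataFrame({
    'distance': [1.0, 2.0, 3.0, 4.0, 5.0],
    'fare': [5.0, 7.5, 10.0, 12.0, 16.0],
})

#Pearson correlation: close to 1 means fare rises steadily with distance,
#close to 0 means there is no linear relationship
correlation = rides['distance'].corr(rides['fare'])
print(f'Correlation between distance and fare: {correlation:.2f}')
```

A value near 1 would support what the scatter plot shows, although correlation only measures how linear the relationship is.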