## LSE Career Accelerator
# LSE Data Analysis Using Python

## LSE DA201: Week 4 (optional) challenge activity

In the previous challenge, you answered specific business questions about the data sets, such as 'What is the daily average price of gold and oil, and how do they compare?', to help investors decide which is the more stable investment. This week, you will apply your Seaborn and Matplotlib knowledge to understand the data better and solve some specific problems for Investgenics. You will answer the following business questions:

- What is the distribution of the data?
- Which performed better in December 2015, gold or oil?
- Are there any outliers in the opening value of both gold and oil?
- What happened to gold and oil on the stock market during June 2016?

## Prepare your workstation

In [None]:
# Prepare your workstation.
# Import libraries.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Import CSV files.
oil = pd.read_csv('oil_price.csv')
gold = pd.read_csv('gold_stocks_price.csv')

# View the DataFrames.
print(oil.shape)
print(oil.dtypes)
print(oil.head())

print(gold.shape)
print(gold.dtypes)
print(gold.head())

In [None]:
# Subset gold DataFrame.
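# .copy() makes gold_plot independent of gold, which avoids a
# SettingWithCopyWarning when the Date column is converted later.
gold_plot = gold[['Date', 'Open', 'High', 'Low']].copy()

# View gold_plot.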
print(gold_plot.shape)
print(gold_plot.dtypes)
print(gold_plot.head())

In [None]:
# Subset the oil DataFrame to the relevant columns.
# .copy() avoids a SettingWithCopyWarning when Date is converted later.
oil_plot = oil[['Date', 'Open', 'High', 'Low']].copy()

# View oil_plot.
print(oil_plot.columns)
print(oil_plot.dtypes)
print(oil_plot.head())

In [None]:
# Convert the Date column of each DataFrame to datetime.
gold_plot['Date'] = pd.to_datetime(gold_plot['Date'])
oil_plot['Date'] = pd.to_datetime(oil_plot['Date'])

# Check the data types of both DataFrames.
print(gold_plot.dtypes)
print(gold_plot.head())
print(oil_plot.dtypes)
print(oil_plot.head())
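As an alternative to converting the column after loading, pandas can parse dates while reading the CSV through the `parse_dates` argument of `pd.read_csv`. The sketch below is a minimal illustration that uses a few hypothetical rows held in a string (not the real `oil_price.csv` contents); `csv_text` and `oil_sample` are illustrative names. It then confirms the conversion by checking the dtype and counting missing (`NaT`) dates.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for oil_price.csv (illustrative values only).
csv_text = """Date,Open,High,Low
2015-12-01,40.0,41.0,39.0
2015-12-02,41.0,42.0,40.0
2015-12-03,42.0,43.0,41.0
"""

# parse_dates converts the Date column while the CSV is read,
# so no separate pd.to_datetime() call is needed afterwards.
oil_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Check the conversion: Date should be datetime64 with no missing (NaT) values.
print(oil_sample.dtypes)
print(oil_sample['Date'].isna().sum())
```

With the real files, the same idea would be `pd.read_csv('oil_price.csv', parse_dates=['Date'])`.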

# Question 1: What is the distribution of the data?

In [None]:
# Plot pairplot for gold subset with KDE.
sns.pairplot(gold_plot, diag_kind='kde', height=2);

In [None]:
# Plot pairplot for oil subset with KDE.
sns.pairplot(oil_plot, diag_kind='kde', height=2);
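The pair plots show the distributions visually; a numeric summary can support what the KDE curves suggest. The sketch below uses a small hypothetical DataFrame in place of `gold_plot` or `oil_plot`. `describe()` reports count, mean, standard deviation, min, quartiles and max for each column, and `skew()` indicates asymmetry (positive means a longer right tail).

```python
import pandas as pd

# Hypothetical stand-in for gold_plot / oil_plot (illustrative values only).
sample = pd.DataFrame({
    'Open': [100.0, 101.0, 102.0, 103.0, 120.0],
    'High': [101.0, 102.0, 103.0, 104.0, 121.0],
    'Low':  [99.0, 100.0, 101.0, 102.0, 119.0],
})

# Count, mean, std, min, quartiles and max for each numeric column.
summary = sample.describe()

# Skewness per column: > 0 means a longer right tail, < 0 a longer left tail.
skewness = sample.skew()

print(summary)
print(skewness)
```

On the real data, `gold_plot[['Open', 'High', 'Low']].describe()` gives the equivalent summary.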

# Question 2: Which performed better in December 2015, gold or oil?

In [None]:
# Filter both data sets to December 2015 (including 1 and 31 December), and save them as filtered_gold and filtered_oil.
filtered_gold = gold_plot[(gold_plot['Date'] >= '2015-12-01') & (gold_plot['Date'] <= '2015-12-31')]
filtered_oil = oil_plot[(oil_plot['Date'] >= '2015-12-01') & (oil_plot['Date'] <= '2015-12-31')]

print(filtered_gold.head())
print(filtered_oil.head())
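Strict `>` and `<` comparisons exclude the boundary dates themselves, so a filter written that way drops the first and last day of the month. One way to select a whole calendar month without thinking about boundaries is to compare the year and month through the `.dt` accessor. The sketch below uses a hypothetical DataFrame `df` to show the difference.

```python
import pandas as pd

# Hypothetical dates spanning late November 2015 to early January 2016.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-11-30', '2015-12-01', '2015-12-15',
                            '2015-12-31', '2016-01-04']),
    'Open': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Strict comparisons drop 1 and 31 December.
strict = df[(df['Date'] > '2015-12-01') & (df['Date'] < '2015-12-31')]

# Matching on year and month keeps every December 2015 row.
december = df[(df['Date'].dt.year == 2015) & (df['Date'].dt.month == 12)]

print(len(strict), len(december))
```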

In [None]:
# Plots for gold.
# Specify plot size.
plt.figure(figsize=(20, 6))

# Create a horizontal bar plot of the daily High value.
sns.barplot(x='High', y='Date', data=filtered_gold)
plt.title("Daily High Value: Gold (December 2015)");

In [None]:
# Plots for oil.
# Specify plot size.
plt.figure(figsize=(20, 6))

# Create a horizontal bar plot of the daily High value.
sns.barplot(x='High', y='Date', data=filtered_oil)
plt.title("Daily High Value: Oil (December 2015)");
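Gold and oil trade at very different price levels, so comparing raw High values across the two bar charts does not directly show which performed better. Percentage change over the month puts both on a common scale. The sketch below is one possible measure, assuming performance is judged by the change in the opening value from the first to the last trading day of the filtered month; `month_pct_change`, `gold_dec` and `oil_dec` are hypothetical names, and the prices are illustrative only.

```python
import pandas as pd


def month_pct_change(df, column='Open'):
    """Percentage change in `column` from the first to the last row by Date.

    Assumes df has already been filtered to a single month.
    """
    ordered = df.sort_values('Date')
    first = ordered[column].iloc[0]
    last = ordered[column].iloc[-1]
    return (last - first) / first * 100


# Hypothetical December rows (illustrative values only).
gold_dec = pd.DataFrame({'Date': pd.to_datetime(['2015-12-01', '2015-12-31']),
                         'Open': [100.0, 105.0]})
oil_dec = pd.DataFrame({'Date': pd.to_datetime(['2015-12-01', '2015-12-31']),
                        'Open': [40.0, 36.0]})

gold_change = month_pct_change(gold_dec)
oil_change = month_pct_change(oil_dec)
print(f"Gold: {gold_change:.1f}%  Oil: {oil_change:.1f}%")
```

On the notebook's data, the same function would be called as `month_pct_change(filtered_gold)` and `month_pct_change(filtered_oil)`; the asset with the larger percentage change performed better over the month by this measure.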

# Question 3: Are there any outliers in the opening value of both gold and oil?

In [None]:
# Plot gold.
# Fig size modification. 
plt.figure(figsize=(8, 6))
plt.title("Opening Value: Gold")

sns.boxplot(x=gold_plot['Open']);

In [None]:
# Plot oil.
# Fig size modification. 
plt.figure(figsize=(8, 6))
plt.title("Opening Value: Oil")

sns.boxplot(x=oil_plot['Open']);
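By default, the box plot whiskers extend to the furthest data points within 1.5 times the interquartile range (IQR) of the quartiles, and points beyond that are drawn individually as outliers. The sketch below applies the same 1.5 × IQR rule numerically so the outlying values can be listed rather than read off the chart. `iqr_outliers` and `opens` are hypothetical names, and the values are illustrative only.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Hypothetical opening values; 150.0 is deliberately far from the rest.
opens = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 150.0])
outliers = iqr_outliers(opens)
print(outliers)
```

Applied to the notebook's data, `iqr_outliers(gold_plot['Open'])` and `iqr_outliers(oil_plot['Open'])` would list the opening values flagged by this rule.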

In [None]:
# Plot gold.
# Fig size modification. 
plt.figure(figsize=(8, 6))
plt.title("Opening Value: Gold")

sns.histplot(x=gold_plot['Open'], bins=20);

In [None]:
# Plot oil.
# Fig size modification. 
plt.figure(figsize=(8, 6))
plt.title("Opening Value: Oil")

sns.histplot(x=oil_plot['Open'], bins=20);

# Question 4: What happened to gold and oil on the stock market during June 2016?

In [None]:
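# Plot a line plot of High and Low for the gold subset for June 2016.
plt.figure(figsize=(12, 6))

filtered_gold = gold_plot[(gold_plot['Date'] >= '2016-06-01') & (gold_plot['Date'] <= '2016-06-30')]

# Put Date on the x-axis so each line runs in time order.
sns.lineplot(data=filtered_gold, x='Date', y='High', label='High')
sns.lineplot(data=filtered_gold, x='Date', y='Low', label='Low')
plt.title("High and Low Value: Gold (June 2016)")
plt.ylabel("Value");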

In [None]:
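# Plot a line plot of High and Low for the oil subset for June 2016.
plt.figure(figsize=(12, 6))
filtered_oil = oil_plot[(oil_plot['Date'] >= '2016-06-01') & (oil_plot['Date'] <= '2016-06-30')]

sns.lineplot(data=filtered_oil, x='Date', y='High', label='High')
sns.lineplot(data=filtered_oil, x='Date', y='Low', label='Low')
plt.title("High and Low Value: Oil (June 2016)")
plt.ylabel("Value");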

In [None]:
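# Customise plots: scatter the daily High value of gold against Date for June 2016.
filtered_gold = gold_plot[(gold_plot['Date'] >= '2016-06-01') & (gold_plot['Date'] <= '2016-06-30')]

g = sns.relplot(data=filtered_gold, x='Date', y='High')

g.set_axis_labels("Date", "High")
g.fig.autofmt_xdate()  # Rotate the date labels so they do not overlap.
g.fig.suptitle("High Value: Gold (June 2016)", y=1.02, fontsize=16);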

In [None]:
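# Scatter the daily High value of oil against Date for June 2016.
filtered_oil = oil_plot[(oil_plot['Date'] >= '2016-06-01') & (oil_plot['Date'] <= '2016-06-30')]

g = sns.relplot(data=filtered_oil, x='Date', y='High')

g.set_axis_labels("Date", "High")
g.fig.autofmt_xdate()  # Rotate the date labels so they do not overlap.
g.fig.suptitle("High Value: Oil (June 2016)", y=1.02, fontsize=16);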
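Besides the level of the High value, the gap between High and Low on each day gives a rough sense of how volatile trading was. The sketch below expresses that daily range as a percentage of the Low value (one simple choice among several) and finds the day with the widest swing. `daily_range_pct`, `RangePct`, `sample` and `widest` are hypothetical names, and the values are illustrative only.

```python
import pandas as pd


def daily_range_pct(df):
    """Add a 'RangePct' column: (High - Low) as a percentage of Low."""
    out = df.copy()
    out['RangePct'] = (out['High'] - out['Low']) / out['Low'] * 100
    return out


# Hypothetical rows; on the real data the input would be filtered_gold or filtered_oil.
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2016-06-01', '2016-06-02']),
    'High': [102.0, 110.0],
    'Low': [100.0, 100.0],
})
ranged = daily_range_pct(sample)

# Date of the widest intraday swing.
widest = ranged.loc[ranged['RangePct'].idxmax(), 'Date']
print(ranged)
print(widest)
```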