# Composing Masterpieces

As a data analyst for a video streaming service, you have been asked by your manager to compare subscriber data across several international regions over the last several years.

Use your knowledge of compose and overlay plots to build the visualizations your manager has requested.

In [24]:
# Import the required libraries and dependencies
import pandas as pd
import hvplot.pandas
from pathlib import Path

## Step 1: Prepare the Subscriber Numbers DataFrame

In [25]:
# Read subscriber_numbers.csv file into a Pandas DataFrame
subscribers_df = pd.read_csv(Path('../Resources/subscriber_numbers.csv'))

# Review the DataFrame
subscribers_df.head()

Unnamed: 0,Area,Years,Subscribers
0,United States and Canada,Q1 - 2018,60909000
1,"Europe, Middle East and Africa",Q1 - 2018,29339000
2,Latin America,Q1 - 2018,21260000
3,Asia-Pacific,Q1 - 2018,7394000
4,United States and Canada,Q2 - 2018,61870000
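The `Unnamed: 0` column in the output above usually means the CSV file stores its old row index as a first column with a blank header. The sketch below assumes that is the case here and rebuilds the five rows shown above in memory (the `csv_text` string is a stand-in for the real file) to show how `index_col=0` would absorb that column into the index.

```python
import io

import pandas as pd

# Stand-in for subscriber_numbers.csv, built from the five rows shown above.
# Assumption: the real file has a blank header for its first column.
csv_text = """,Area,Years,Subscribers
0,United States and Canada,Q1 - 2018,60909000
1,"Europe, Middle East and Africa",Q1 - 2018,29339000
2,Latin America,Q1 - 2018,21260000
3,Asia-Pacific,Q1 - 2018,7394000
4,United States and Canada,Q2 - 2018,61870000
"""

# index_col=0 uses the first column as the row index instead of
# keeping it as a data column named "Unnamed: 0"
demo_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo_df.columns.tolist())
```

With the stray column gone, later plots have only one numeric column to draw.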


## Step 2: Using both a compose plot and an overlay plot, compare the subscriber numbers for each region for the periods "Q1 - 2018", "Q1 - 2019", and "Q1 - 2020".

In [30]:
# Create a DataFrame that slices the subscriber data for Q1 - 2018.
q1_2018 = subscribers_df[subscribers_df['Years']=='Q1 - 2018']

# Set the index of the DataFrame to Area
q1_2018 = q1_2018.set_index('Area')

# Create a bar chart of the Q1 - 2018 data, plotting only the Subscribers column
q1_2018_plot = q1_2018.hvplot.bar(y='Subscribers', title='Subscriber distribution - Q1 2018', label='Q1 2018')

# Show the bar chart
q1_2018_plot

In [31]:
# Create a DataFrame that slices the subscriber data for Q1 - 2019.
q1_2019 = subscribers_df[subscribers_df['Years']=='Q1 - 2019']

# Set the index of the DataFrame to Area
q1_2019 = q1_2019.set_index('Area')

# Create a bar chart of the Q1 - 2019 data, plotting only the Subscribers column
q1_2019_plot = q1_2019.hvplot.bar(y='Subscribers', title='Subscriber distribution - Q1 2019', label='Q1 2019')

# Show the bar chart
q1_2019_plot

In [32]:
# Create a DataFrame that slices the subscriber data for Q1 - 2020.
q1_2020 = subscribers_df[subscribers_df['Years']=='Q1 - 2020']

# Set the index of the DataFrame to Area
q1_2020 = q1_2020.set_index('Area')

# Create a bar chart of the Q1 - 2020 data, plotting only the Subscribers column
q1_2020_plot = q1_2020.hvplot.bar(y='Subscribers', title='Subscriber distribution - Q1 2020', label='Q1 2020')

# Show the bar chart
q1_2020_plot

## Create a compose plot to visualize the Q1 data

In [33]:
# Create a compose plot to visualize the Q1 data side-by-side
q1_2018_plot + q1_2019_plot + q1_2020_plot

## Create an overlay plot for the Q1 data

> Hint: Does the order of the plots change the visualization?

In [34]:
# Create an overlay plot to visualize the Q1 data
q1_2020_plot * q1_2019_plot * q1_2018_plot

**Question:** How does the rate of growth compare across each region over the time periods being analyzed?

**Answer:** EMEA has the highest growth rate over the periods observed. The US/Canada region has the most subscribers, but it has also experienced the slowest growth.
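A growth comparison read off a chart can be cross-checked numerically. The sketch below assumes the same `Area`, `Years`, and `Subscribers` columns as `subscribers_df`. The helper `q1_growth` is hypothetical, and the `demo` values are placeholders chosen to make the arithmetic easy to follow, not real subscriber numbers.

```python
import pandas as pd


def q1_growth(df, start="Q1 - 2018", end="Q1 - 2020"):
    """Return percentage growth per region between two period labels."""
    # One row per region, one column per period
    wide = df.pivot(index="Area", columns="Years", values="Subscribers")
    growth = (wide[end] - wide[start]) / wide[start] * 100
    return growth.sort_values(ascending=False)


# Placeholder values for illustration only
demo = pd.DataFrame(
    {
        "Area": ["Region A", "Region B", "Region A", "Region B"],
        "Years": ["Q1 - 2018", "Q1 - 2018", "Q1 - 2020", "Q1 - 2020"],
        "Subscribers": [100, 50, 120, 100],
    }
)

growth_pct = q1_growth(demo)
print(growth_pct)
```

Calling `q1_growth(subscribers_df)` on the real data would rank all four regions by growth between the first and last period.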

## Step 3: Using both a compose plot and an overlay plot, compare the time series trends in subscriber numbers for the two largest regions detailed in the dataset, the "United States and Canada" region and the "Europe, Middle East and Africa" region.

In [35]:
# Create a DataFrame that slices the subscriber data for the United States and Canada
us_canada_subscribers = subscribers_df[subscribers_df['Area']=='United States and Canada']

# Set the index of the DataFrame to Years
us_canada_subscribers = us_canada_subscribers.set_index('Years')

# Create a line plot of the US and Canada data, plotting only the Subscribers column
us_canada_plot = us_canada_subscribers.hvplot.line(y='Subscribers', title='Subscriber distribution - US/CAN', label='US/CAN')

# Show the plot
us_canada_plot

In [36]:
# Create a DataFrame that slices the subscriber data for the Europe, Middle East and Africa area
emea_subscribers = subscribers_df[subscribers_df['Area']=='Europe, Middle East and Africa']

# Set the index of the DataFrame to Years
emea_subscribers = emea_subscribers.set_index('Years')

# Create a line plot of the Europe, Middle East and Africa data, plotting only the Subscribers column
emea_plot = emea_subscribers.hvplot.line(y='Subscribers', title='Subscriber distribution - EMEA', label='EMEA')

# Show the plot
emea_plot

## Create a compose plot to visualize the subscriber time series data for the two regions

In [37]:
# Create a compose plot to visualize the subscriber time series data for the two regions
us_canada_plot + emea_plot

## Create an overlay plot for the subscriber time series data

In [38]:
# Create an overlay plot to visualize the subscriber time series data for the two regions
us_canada_plot * emea_plot

**Question:** How does the rate of growth over the time series compare across the two regions?

**Answer:** EMEA has a significantly higher subscriber growth rate than the US/CAN region.
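Reading a trend from a time series depends on the quarters being in chronological order, and labels such as "Q2 - 2018" sort incorrectly as plain strings. The sketch below shows one possible way to handle this. It assumes every label follows the "Qn - YYYY" pattern seen in the `Years` column. The helper `add_period` is hypothetical, and the `demo` values are placeholders rather than real data.

```python
import pandas as pd


def add_period(df, column="Years"):
    """Add a quarterly Period column parsed from 'Qn - YYYY' labels and sort by it."""
    parts = df[column].str.extract(r"Q(?P<quarter>[1-4])\s*-\s*(?P<year>\d{4})")
    out = df.copy()
    # Build labels such as "2018Q1", which pandas parses as quarterly periods
    out["Period"] = pd.PeriodIndex(parts["year"] + "Q" + parts["quarter"], freq="Q")
    return out.sort_values("Period")


# Placeholder values for illustration only, deliberately out of order
demo = pd.DataFrame(
    {
        "Years": ["Q2 - 2018", "Q1 - 2019", "Q1 - 2018"],
        "Subscribers": [110, 130, 100],
    }
)

ordered = add_period(demo)

# Quarter-over-quarter percentage change, valid only once the rows are in order
ordered["Change (%)"] = ordered["Subscribers"].pct_change() * 100
print(ordered)
```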

## Step 4: Answer the following question:

**Question:** Given the information in these visualizations, toward what region would you recommend that advertising dollars be focused?

**Answer:** It depends on the goal. If the goal is to increase the number of subscribers, I would recommend spending in the US/CAN region to push the numbers up. If the goal is instead to capitalize on the fastest-growing market, which will soon become the dominant one, I would recommend spending marketing funds in EMEA.