# Challenge

You are working as a Data Analyst for a video game developer, and the company is deciding which game to focus its efforts on.

You are provided with a dataset of all video game sales and are tasked with answering the questions below: 

1. Which genre should we focus our next game on?
2. Which platform should we focus our next game on? 
3. Which platform and genre should we focus our next game on? 
4. Which year was the best year to have sold video games? 
5. Which video games are outliers?


In [2]:
# !pip install plotly 

In [3]:
import pandas as pd
import plotly.express as px # import plotly express as px

In [4]:
# read in the dataset
df = pd.read_csv("https://intro-to-python-asdaf.s3.ap-southeast-2.amazonaws.com/vgsales.csv")

In [5]:
# show the df 
df.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


### Which genre should we focus our next game on? 

In [6]:
# create a bar plot of global sales by genre
# sum only Global_Sales so that text columns (Name, Publisher, ...) are not aggregated
df_grouped_genre = df.groupby("Genre", as_index=False)["Global_Sales"].sum()
px.bar(df_grouped_genre.sort_values("Global_Sales", ascending=False), x="Genre", y="Global_Sales")
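
The bar chart ranks genres visually; the same ranking can also be read off as numbers. The following is a minimal sketch, assuming a hypothetical `sample` frame built from the five rows shown by `df.head()` above (not the full dataset), so the result only illustrates the technique:

```python
import pandas as pd

# Hypothetical sample: the five rows shown by df.head(), not the full dataset
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "Genre": ["Sports", "Platform", "Racing", "Sports", "Role-Playing"],
    "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37],
})

# Total global sales per genre, largest first
genre_sales = (
    sample.groupby("Genre")["Global_Sales"]
    .sum()
    .sort_values(ascending=False)
)

# The first index label is the best-selling genre in this sample
top_genre = genre_sales.index[0]

# Each genre's share of total sales (fractions that sum to 1)
share = genre_sales / genre_sales.sum()

print(top_genre)
print(share.round(3))
```

Replacing `sample` with `df` should give the figures behind the chart. The share column can help judge whether the leading genre is far ahead or only marginally so.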

### Which platform should we focus our next game on? 

In [7]:
# create a bar plot of global sales by platform
# sum only Global_Sales so that text columns (Name, Publisher, ...) are not aggregated
df_grouped_platform = df.groupby("Platform", as_index=False)["Global_Sales"].sum()
px.bar(df_grouped_platform.sort_values("Global_Sales", ascending=False), x="Platform", y="Global_Sales")

### Which platform and genre should we focus our next game on? 

In [8]:
# create a bar plot of global sales by genre and platform.
# tip: when creating the bar plot, place genre into colors
# sum only Global_Sales so that text columns (Name, Publisher, ...) are not aggregated
df_grouped_genre_platform = df.groupby(["Genre", "Platform"], as_index=False)["Global_Sales"].sum()
px.bar(df_grouped_genre_platform.sort_values("Global_Sales", ascending=False), x="Platform", y="Global_Sales", color="Genre")
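
A stacked bar chart makes it hard to compare individual platform and genre pairs, so a table can be a useful companion. The following is a minimal sketch, again assuming a hypothetical `sample` frame built from the five rows shown by `df.head()` rather than the full dataset:

```python
import pandas as pd

# Hypothetical sample: the five rows shown by df.head(), not the full dataset
sample = pd.DataFrame({
    "Platform": ["Wii", "NES", "Wii", "Wii", "GB"],
    "Genre": ["Sports", "Platform", "Racing", "Sports", "Role-Playing"],
    "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37],
})

# Total global sales for every (Platform, Genre) pair
combo = sample.groupby(["Platform", "Genre"])["Global_Sales"].sum()

# idxmax on a MultiIndex Series returns the (Platform, Genre) tuple with the largest total
best_platform, best_genre = combo.idxmax()

# Wide table: one row per platform, one column per genre, 0 where no sales exist
pivot = combo.unstack(fill_value=0)

print(best_platform, best_genre)
print(pivot)
```

On the full `df`, the same `combo.idxmax()` call should name the single best-selling pair, and `pivot` gives the numbers behind each coloured segment of the chart.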

### Which year was the best year to have sold video games? 

In [9]:
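# create a line plot of global sales by year
# which year had the highest global sales?
# sum only Global_Sales so that text columns (Name, Publisher, ...) are not aggregated
df_grouped_year = df.groupby("Year", as_index=False)["Global_Sales"].sum()
px.line(df_grouped_year, x="Year", y="Global_Sales")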
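
The peak of the line plot can also be found directly with `idxmax`. `Year` is stored as a float (for example `2006.0`), which in pandas usually indicates missing values in the column, so this sketch drops them first. The `sample` frame is hypothetical: the five rows from `df.head()` plus one invented row with a missing year, added only to show the handling.

```python
import pandas as pd

# Hypothetical sample: five rows from df.head() plus one invented row with no year
sample = pd.DataFrame({
    "Year": [2006.0, 1985.0, 2008.0, 2009.0, 1996.0, None],
    "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37, 1.0],
})

# Drop rows without a year, then total global sales per year
yearly = (
    sample.dropna(subset=["Year"])
    .groupby("Year")["Global_Sales"]
    .sum()
)

# idxmax returns the index label (the year) with the largest total
best_year = int(yearly.idxmax())
best_sales = yearly.max()

print(best_year, best_sales)
```

A high total for a year may partly reflect how many titles the dataset records for that year, so comparing `yearly` against a per-year game count is a reasonable sanity check.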

### Which video games are outliers?

In [14]:
# create a scatter plot to observe any outliers in the data
# refer to scatter plot examples here: https://plotly.com/python/line-and-scatter/
# tip: include the hover_data argument 
px.scatter(df, x="Year", y="Global_Sales", hover_data=["Name"], color="Platform")
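
Hovering over the scatter plot identifies outliers by eye. One common numeric rule of thumb, which is an assumption here and not something the challenge prescribes, is to flag values more than 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal sketch on a hypothetical `sample` frame built from the five rows shown by `df.head()`:

```python
import pandas as pd

# Hypothetical sample: the five rows shown by df.head(), not the full dataset
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37],
})

sales = sample["Global_Sales"]

# Quartiles and interquartile range of global sales
q1 = sales.quantile(0.25)
q3 = sales.quantile(0.75)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule (the 1.5 multiplier is a convention, not a law)
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Rows whose sales fall outside the fences
outliers = sample[(sales < lower) | (sales > upper)]

print(outliers)
```

On the full `df`, sales figures are likely to be heavily skewed, so this rule may flag many titles. Raising the multiplier or inspecting only the top few rows of `outliers.sort_values("Global_Sales", ascending=False)` are possible adjustments.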