
## **Data-Driven Product Management: Conducting a Market Analysis**
You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

### The Data

You are provided with several CSV files in the "Files/data" folder. They contain international and national-level Google Trends data for keyword searches on fitness and related products.

### workout.csv

| Column     | Description              |
|------------|--------------------------|
| `'month'` | Month when the data was measured. |
| `'workout_worldwide'` | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |

### three_keywords.csv

| Column     | Description              |
|------------|--------------------------|
| `'month'` | Month when the data was measured. |
| `'home_workout_worldwide'` | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| `'gym_workout_worldwide'` | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| `'home_gym_worldwide'` | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |

### workout_geo.csv

| Column     | Description              |
|------------|--------------------------|
| `'country'` | Country where the data was measured. |
| `'workout_2018_2023'` | Index representing the popularity of the keyword 'workout' during the 5-year period from 2018 to 2023. |

### three_keywords_geo.csv

| Column     | Description              |
|------------|--------------------------|
| `'country'` | Country where the data was measured. |
| `'home_workout_2018_2023'` | Index representing the popularity of the keyword 'home workout' during the 5-year period from 2018 to 2023. |
| `'gym_workout_2018_2023'` | Index representing the popularity of the keyword 'gym workout' during the 5-year period from 2018 to 2023. |
| `'home_gym_2018_2023'` | Index representing the popularity of the keyword 'home gym' during the 5-year period from 2018 to 2023. |


## **Task**
Help the fitness studio explore interest in workouts at a global and national level.

1. When was global search interest in 'workout' at its peak? Save the year of peak interest as a string named `year_str` in the format "yyyy".

2. Of the keywords available, which was the most popular during the COVID-19 pandemic, and which is the most popular now? Save your answers as variables called `peak_covid` and `current` respectively.

3. Which country has the highest interest in workouts among the following: United States, Australia, or Japan? Save your answer as `top_country`.

4. You are interested in expanding your virtual home workout offering to either the Philippines or Malaysia. Which of the two countries has the higher interest in home workouts? Identify the country and save it as `home_workout_geo`.
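
Questions 3 and 4 both reduce to the same operation: restrict a table to a short list of candidate countries, then pick the row with the largest index value. The sketch below illustrates that idea with a hypothetical helper, `top_country_among`, and made-up country names and values (none of them come from the CSV files). It assumes the country names sit in an ordinary column rather than in the index.

```python
import pandas as pd

def top_country_among(df, countries, column, country_col="country"):
    """Return the country from `countries` with the largest value in `column`.

    Hypothetical helper for illustration; rows with missing values are ignored.
    """
    subset = df[df[country_col].isin(countries)].dropna(subset=[column])
    if subset.empty:
        raise ValueError("none of the requested countries have data")
    return subset.loc[subset[column].idxmax(), country_col]

# Made-up values for illustration only; not taken from workout_geo.csv.
toy_geo = pd.DataFrame({
    "country": ["Aland", "Borduria", "Carpania", "Delmar"],
    "workout_2018_2023": [90.0, 40.0, None, 65.0],
})

best = top_country_among(
    toy_geo, ["Borduria", "Carpania", "Delmar"], "workout_2018_2023"
)
print(best)
```

Filtering before taking the maximum matters: in the toy table the overall leader is not among the candidates, so a maximum over the whole table would answer a different question.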

In [16]:
# Load necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Peak interest for workout
workout = pd.read_csv('/content/drive/MyDrive/colab_notebooks/Data-Driven_Product_Management:_Conducting_a_Market_Analysis/workout.csv')
max_interest = workout['workout_worldwide'].max()
year_str = pd.to_datetime(workout.loc[workout['workout_worldwide'] == max_interest, 'month'].values[0]).strftime('%Y')
print(f"Answer 1. Peak interest for 'workout' was in {year_str}")

# Most popular keyword during the COVID-19 pandemic and nowadays
three_keywords = pd.read_csv('/content/drive/MyDrive/colab_notebooks/Data-Driven_Product_Management:_Conducting_a_Market_Analysis/three_keywords.csv')
three_keywords_melt = pd.melt(three_keywords,
                              id_vars=['month'],
                              value_vars=['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide'])
three_keywords_melt.columns = ['month', 'keyword', 'interest']
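# Keyword holding the single highest index value; this assumes the overall peak falls within the pandemic period
peak_covid = three_keywords_melt.loc[three_keywords_melt['interest'] == three_keywords_melt['interest'].max(), 'keyword'].values[0]
# Top keyword in the most recent month: sort newest month first, then by interest, and take the first row
current = three_keywords_melt.sort_values(by=['month', 'interest'], ascending=[False, False])['keyword'].iloc[0]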

print(f"Answer 2a. Most popular keyword during the covid pandemic: {peak_covid}")
print(f"Answer 2b. Most popular keyword now: {current}")

# Highest interest for workout among: United States, Australia, Japan
workout_geo = pd.read_csv('/content/drive/MyDrive/colab_notebooks/Data-Driven_Product_Management:_Conducting_a_Market_Analysis/workout_geo.csv')
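# Restrict to the three candidate countries before taking the maximum
candidates = ['United States', 'Australia', 'Japan']
workout_geo_subset = workout_geo[workout_geo['country'].isin(candidates)]
top_country = workout_geo_subset.loc[workout_geo_subset['workout_2018_2023'].idxmax(), 'country']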
print(f"Answer 3. Country with highest interest for workout: {top_country}")

# Highest interest for home workout between: Philippines and Malaysia
three_keywords_geo = pd.read_csv('/content/drive/MyDrive/colab_notebooks/Data-Driven_Product_Management:_Conducting_a_Market_Analysis/three_keywords_geo.csv', index_col=0)
countries_homeworkout = ['Philippines', 'Malaysia']
home_workout_geo_list = three_keywords_geo.loc[countries_homeworkout, ['home_workout_2018_2023']].sort_values(by='home_workout_2018_2023', ascending=False)
home_workout_geo = home_workout_geo_list.index[0]
print(f"Answer 4. Country with highest interest for home workout: {home_workout_geo}")

Answer 1. Peak interest for 'workout' was in 2020
Answer 2a. Most popular keyword during the covid pandemic: home_workout_worldwide
Answer 2b. Most popular keyword now: gym_workout_worldwide
Answer 3. Country with highest interest for workout: United States
Answer 4. Country with highest interest for home workout: Philippines
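
The time-based lookups above can be sanity-checked without the Drive files. The sketch below uses a small, made-up table in the same wide layout as three_keywords.csv; the values, the `toy` and `wide` names, and the choice of 2020 to 2021 as the pandemic window are all assumptions for illustration. It is one possible way to express the three lookups, not necessarily how the notebook's answers should be computed.

```python
import pandas as pd

# Made-up monthly values in the same wide layout as three_keywords.csv.
toy = pd.DataFrame({
    "month": ["2020-03", "2020-04", "2023-02", "2023-03"],
    "home_workout_worldwide": [30, 100, 15, 14],
    "gym_workout_worldwide": [20, 10, 22, 25],
    "home_gym_worldwide": [15, 40, 12, 13],
})
toy["month"] = pd.to_datetime(toy["month"])

# Index by month so that every remaining column is a numeric keyword series.
wide = toy.set_index("month")

# Year in which a single series peaks (question 1 style).
peak_month = wide["home_workout_worldwide"].idxmax()
peak_year = peak_month.strftime("%Y")

# Leading keyword in the most recent month (the "now" half of question 2).
latest_top = wide.loc[wide.index.max()].idxmax()

# Leading keyword inside an assumed pandemic window of 2020 to 2021.
covid_rows = wide[wide.index.year.isin([2020, 2021])]
covid_top = covid_rows.max().idxmax()

print(peak_year, latest_top, covid_top)
```

Working on the wide table avoids grouping altogether: the latest month is selected directly by its timestamp, so the result does not depend on how groups are ordered.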
