![gym](gym.png)


You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

### The Data

You are provided with several CSV files in the "Files/data" folder containing worldwide and country-level Google Trends data on searches for keywords related to fitness and fitness products.

### workout.csv

| Column     | Description              |
|------------|--------------------------|
| `'month'` | Month when the data was measured. |
| `'workout_worldwide'` | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |

### three_keywords.csv

| Column     | Description              |
|------------|--------------------------|
| `'month'` | Month when the data was measured. |
| `'home_workout_worldwide'` | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| `'gym_workout_worldwide'` | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| `'home_gym_worldwide'` | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |

### workout_geo.csv

| Column     | Description              |
|------------|--------------------------|
| `'country'` | Country where the data was measured. |
| `'workout_2018_2023'` | Index representing the popularity of the keyword 'workout' during the 5-year period. |

### three_keywords_geo.csv

| Column     | Description              |
|------------|--------------------------|
| `'country'` | Country where the data was measured. |
| `'home_workout_2018_2023'` | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| `'gym_workout_2018_2023'` | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| `'home_gym_2018_2023'` | Index representing the popularity of the keyword 'home gym' during the 5-year period. |

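The 0 to 100 indexes above are relative, not absolute search counts. Google Trends divides each data point by the total searches for its region and time range, then scales the results so that the highest point equals 100. A minimal sketch of that scaling on made-up counts (`raw`, `term_searches`, and `total_searches` are hypothetical names; Google's sampling and rounding details are not reproduced):

```python
import pandas as pd

# Hypothetical monthly figures for one keyword and for all searches in the same region
raw = pd.DataFrame(
    {
        "term_searches": [120, 300, 450, 200],
        "total_searches": [10_000, 10_000, 9_000, 10_000],
    },
    index=["2020-01", "2020-02", "2020-03", "2020-04"],
)

# Step 1: share of all searches that used the keyword in each month
share = raw["term_searches"] / raw["total_searches"]

# Step 2: rescale so the peak month equals 100, then round to whole numbers
trend_index = (share / share.max() * 100).round().astype(int)
print(trend_index.tolist())  # [24, 60, 100, 40]
```

Because each file is scaled on its own, a value of 50 in `workout.csv` is not necessarily comparable with a value of 50 in `three_keywords.csv`.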
```python
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Load the CSV files (from data/)
workout = pd.read_csv("data/workout.csv")                 # monthly interest for "workout" worldwide
workout_geo = pd.read_csv("data/workout_geo.csv")         # country interest for "workout"
three_kw = pd.read_csv("data/three_keywords.csv")         # monthly interest for 3 keywords
three_kw_geo = pd.read_csv("data/three_keywords_geo.csv") # country interest for the 3 keywords
```

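Before cleaning, it can help to confirm that each file has the columns the data dictionary lists, since a mismatch such as `country` vs `Country` otherwise only surfaces later as a `KeyError`. A minimal sketch; `missing_columns` is a hypothetical helper and the demo frame is made up:

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, expected: list) -> list:
    """Return the names in `expected` that are not columns of `df` (case-sensitive)."""
    return [col for col in expected if col not in df.columns]


# Hypothetical frame whose header differs from the data dictionary only in capitalisation
demo = pd.DataFrame({"Country": ["Philippines"], "home_workout_2018_2023": [50]})

print(missing_columns(demo, ["country", "home_workout_2018_2023"]))  # ['country']
```

The same check could be run on each loaded frame with the column names from the tables above, before any column is accessed.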

```python
# Make sure numeric columns are numeric (non-numeric entries become NaN)
workout["workout_worldwide"] = pd.to_numeric(workout["workout_worldwide"], errors="coerce")
for col in ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]:
    three_kw[col] = pd.to_numeric(three_kw[col], errors="coerce")
workout_geo["workout_2018_2023"] = pd.to_numeric(workout_geo["workout_2018_2023"], errors="coerce")
for col in ["home_workout_2018_2023", "gym_workout_2018_2023", "home_gym_2018_2023"]:
    three_kw_geo[col] = pd.to_numeric(three_kw_geo[col], errors="coerce")

# Parse month columns as real dates (expects "yyyy-mm" strings)
workout["month"] = pd.to_datetime(workout["month"], format="%Y-%m", errors="coerce")
three_kw["month"] = pd.to_datetime(three_kw["month"], format="%Y-%m", errors="coerce")

# Clean country names (strip leading/trailing spaces)
# The code expects "Country" (capital C) in three_keywords_geo.csv
workout_geo["country"] = workout_geo["country"].astype(str).str.strip()
three_kw_geo["Country"] = three_kw_geo["Country"].astype(str).str.strip()
```

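`errors="coerce"` turns every value it cannot parse into `NaN` without warning. Google Trends exports can contain the string `"<1"` for very low interest, and those entries would be lost. A minimal sketch on made-up values that counts the coerced entries and, as one possible choice (an assumption, not part of the analysis above), maps `"<1"` to 0 before converting:

```python
import pandas as pd

# Hypothetical raw values as they might appear in a Google Trends export
raw = pd.Series(["45", "<1", "100", ""])

# errors="coerce" turns anything non-numeric (including "<1" and "") into NaN
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.isna().sum())  # 2

# Option: treat "<1" as 0 so those months are kept; the empty string still becomes NaN
cleaned = pd.to_numeric(raw.replace("<1", "0"), errors="coerce")
print(cleaned.tolist())  # [45.0, 0.0, 100.0, nan]
```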

```python
# 1) When was global search for "workout" at its peak?
#    Save year as a string "yyyy" -> year_str

peak_row = workout.loc[workout["workout_worldwide"].idxmax()]
year_str = str(peak_row["month"].year)
print("year_str:", year_str)
```

```
year_str: 2020
```

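`idxmax` answers the question with the single highest month. If "peak year" is read instead as the year with the highest average interest, the answer can differ. A sketch on made-up values (not from `workout.csv`) where the two readings disagree:

```python
import pandas as pd

# Hypothetical monthly index values spanning two years
demo = pd.DataFrame(
    {
        "month": pd.to_datetime(
            ["2019-01", "2019-06", "2019-12", "2020-04", "2020-08", "2020-12"], format="%Y-%m"
        ),
        "workout_worldwide": [70, 68, 66, 100, 40, 35],
    }
)

# Peak month: the single highest observation
peak_month = demo.loc[demo["workout_worldwide"].idxmax(), "month"]
print(peak_month.strftime("%Y-%m"))  # 2020-04

# Peak year by average: mean of all months in each calendar year
yearly_mean = demo.groupby(demo["month"].dt.year)["workout_worldwide"].mean()
print(yearly_mean.idxmax())  # 2019
```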

```python
# 2) Most popular keyword during COVID and "now"
#    COVID window: Mar 2020 to Dec 2021

keyword_cols = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]
covid_mask = (three_kw["month"] >= "2020-03-01") & (three_kw["month"] <= "2021-12-31")

# During COVID
covid_avgs = three_kw.loc[covid_mask, keyword_cols].mean(numeric_only=True)
peak_covid = covid_avgs.idxmax().replace("_worldwide", "")  # "home_workout", "gym_workout", or "home_gym"
print("peak_covid:", peak_covid)

# "Now"
latest_idx = three_kw["month"].idxmax()
latest_vals = three_kw.loc[latest_idx, keyword_cols].astype(float)
current = latest_vals.idxmax().replace("_worldwide", "")
print("current:", current)
```

```
peak_covid: home_workout
current: gym_workout
```

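The "now" answer above rests on a single month, so one unusual month can flip it. One alternative, sketched here on made-up numbers, averages the most recent `n_months` (a hypothetical parameter; 3 is an arbitrary choice) before picking the top keyword:

```python
import pandas as pd

keyword_cols = ["home_workout_worldwide", "gym_workout_worldwide", "home_gym_worldwide"]

# Hypothetical last four months of data
demo = pd.DataFrame(
    {
        "month": pd.to_datetime(["2022-09", "2022-10", "2022-11", "2022-12"], format="%Y-%m"),
        "home_workout_worldwide": [30, 31, 29, 28],
        "gym_workout_worldwide": [45, 44, 46, 43],
        "home_gym_worldwide": [50, 40, 38, 50],
    }
)

# Single latest month (the approach used above)
latest_kw = demo.loc[demo["month"].idxmax(), keyword_cols].astype(float).idxmax()

# Average over the most recent n_months
n_months = 3
recent_avgs = demo.sort_values("month").tail(n_months)[keyword_cols].mean()
recent_kw = recent_avgs.idxmax()

print(latest_kw.replace("_worldwide", ""))  # home_gym
print(recent_kw.replace("_worldwide", ""))  # gym_workout
```

A longer window smooths out one-off spikes but reacts more slowly to a genuine shift in interest.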

```python
# 3) Highest interest for workouts among US, Australia, Japan

subset_countries = ["United States", "Australia", "Japan"]
subset = workout_geo[workout_geo["country"].isin(subset_countries)].dropna(subset=["workout_2018_2023"])
top_country = subset.loc[subset["workout_2018_2023"].idxmax(), "country"] if not subset.empty else None
print("top_country:", top_country)
```

```
top_country: United States
```

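`isin` drops any requested name that does not match the data exactly, so a different spelling of a country would silently remove it from the comparison. A sketch on a made-up table (`demo_geo` and its values are hypothetical) that reports unmatched names and ranks the remaining countries instead of returning only the top one:

```python
import pandas as pd

# Hypothetical geo table in which the US appears as "USA" instead of "United States"
demo_geo = pd.DataFrame(
    {"country": ["USA", "Australia", "Japan"], "workout_2018_2023": [100, 80, 20]}
)

requested = ["United States", "Australia", "Japan"]
found = demo_geo[demo_geo["country"].isin(requested)]

# Requested names that matched no row (isin drops them without warning)
not_found = sorted(set(requested) - set(found["country"]))
print("not found:", not_found)  # not found: ['United States']

# Rank the matched countries from highest to lowest interest
ranking = found.sort_values("workout_2018_2023", ascending=False)["country"].tolist()
print(ranking)  # ['Australia', 'Japan']
```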

```python
# 4) Higher interest in "home workout": Philippines vs Malaysia

ph_my = three_kw_geo[three_kw_geo["Country"].isin(["Philippines", "Malaysia"])][
    ["Country", "home_workout_2018_2023"]
].dropna()
home_workout_geo = ph_my.loc[ph_my["home_workout_2018_2023"].idxmax(), "Country"] if not ph_my.empty else None

print("home_workout_geo:", home_workout_geo)
```

```
home_workout_geo: Philippines
```

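The cell above compares the two countries on a single keyword. The same question can be asked across all three keyword columns at once with `DataFrame.idxmax`; a sketch on made-up values (the numbers below are not from `three_keywords_geo.csv`):

```python
import pandas as pd

# Hypothetical index values for two countries
demo = pd.DataFrame(
    {
        "Country": ["Philippines", "Malaysia"],
        "home_workout_2018_2023": [52, 47],
        "gym_workout_2018_2023": [38, 40],
        "home_gym_2018_2023": [10, 15],
    }
).set_index("Country")

# For each keyword: which country has the higher index
by_keyword = demo.idxmax()
print(by_keyword)

# For each country: which keyword has the highest index
by_country = demo.idxmax(axis=1)
print(by_country)
```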