In [47]:
# Import pandas, which the code below uses as pd
import pandas as pd

# Read in the data
info = pd.read_csv("info.csv")
finance = pd.read_csv("finance.csv")
reviews = pd.read_csv("reviews.csv")
traffic = pd.read_csv("traffic.csv")
brands = pd.read_csv("brands.csv")

# Merge the data
merged_df = info.merge(finance, on="product_id", how="outer")
merged_df = merged_df.merge(reviews, on="product_id", how="outer")
merged_df = merged_df.merge(traffic, on="product_id", how="outer")
merged_df = merged_df.merge(brands, on="product_id", how="outer")

# Drop null values
merged_df.dropna(inplace=True)

# Add price labels based on listing_price quartiles
merged_df["price_label"] = pd.qcut(merged_df["listing_price"], 4, labels=["Budget", "Average", "Expensive", "Elite"])

# Group by brand and price_label to get volume and mean revenue
adidas_vs_nike = merged_df.groupby(["brand", "price_label"]).agg({"price_label": "count", "revenue": "mean"})

# Bin edges for description length, measured in characters
lengths = [0, 99, 199, 299, 399, 499, 599, 699]

# Labels naming the upper edge of each bin
labels = ["99", "199", "299", "399", "499", "599", "699"]

# Store the character length of each description
merged_df["word_limit"] = merged_df["description"].str.len()

# Cut the lengths into bins
merged_df["word_limit"] = pd.cut(merged_df["word_limit"], bins=lengths, labels=labels)

# Group by the bins
descriptions = merged_df.groupby("word_limit", as_index=False).agg({"rating": "mean", "reviews": "count"})

# Regex of footwear keywords. Note that "*" applies only to the preceding
# character, so "shoe*" also matches "sho" (as in "shorts").
footwear_pattern = "shoe*|trainer*|foot*"

# Filter for footwear products; boolean indexing returns a new DataFrame,
# so merged_df itself is left unchanged
shoes = merged_df[merged_df["description"].str.contains(footwear_pattern)]

# Filter for clothing products: every product_id that is not in shoes
clothing = merged_df[~merged_df["product_id"].isin(shoes["product_id"])]

# Create product_types DataFrame
product_types = pd.DataFrame({"clothing_products": len(clothing), 
                              "clothing_revenue": clothing["revenue"].median(), 
                              "footwear_products": len(shoes), 
                              "footwear_revenue": shoes["revenue"].median()}, 
                              index=[0])





In [48]:
# Share your results in this format
revenue_analysis = {"brand_analysis": adidas_vs_nike,
                    "description_analysis": descriptions,
                    "product_analysis": product_types}

# Print the results
print(revenue_analysis)

{'brand_analysis':                     price_label      revenue
brand  price_label                          
Adidas Budget               538  2050.966580
       Average              599  2982.297429
       Expensive            707  4599.578600
       Elite                533  8424.178574
Nike   Budget               321  1664.329595
       Average                8   675.592500
       Expensive             43   472.739070
       Elite                124  1418.420484, 'description_analysis':   word_limit    rating  reviews
0         99  1.866667        6
1        199  3.188937      461
2        299  3.287108     1660
3        399  3.313765      603
4        499  3.396460      113
5        599  3.120000       15
6        699  3.653333       15, 'product_analysis':    clothing_products  clothing_revenue  footwear_products  footwear_revenue
0                439            683.73               2434            3073.3}


![trainers in a store](trainers.jpg)

Sports clothing and athleisure attire is a huge industry, worth approximately [$193 billion in 2021](https://www.statista.com/statistics/254489/total-revenue-of-the-global-sports-apparel-market/) with a strong growth forecast over the next decade! 

In this notebook, you will take on the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. You will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.

You've been provided with five datasets to investigate:
* `info.csv`
* `finance.csv`
* `reviews.csv`
* `traffic.csv`
* `brands.csv`
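
As a minimal sketch of how the five files might be combined, assuming each one shares a `product_id` column (as the merges in the code cell do): the small in-memory frames below are made-up stand-ins for the CSVs, and every column other than `product_id` is hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the five CSV files; the real columns may differ
info = pd.DataFrame({"product_id": ["A1", "B2", "C3"],
                     "description": ["Running shoe", "Cotton tee", None]})
finance = pd.DataFrame({"product_id": ["A1", "B2", "C3"],
                        "listing_price": [80.0, 20.0, 35.0],
                        "revenue": [1200.0, 300.0, 150.0]})
reviews = pd.DataFrame({"product_id": ["A1", "B2", "C3"],
                        "rating": [4.5, 3.0, 2.5],
                        "reviews": [40, 12, 3]})
traffic = pd.DataFrame({"product_id": ["A1", "B2", "C3"],
                        "last_visited": ["2021-01-01", "2021-02-01", "2021-03-01"]})
brands = pd.DataFrame({"product_id": ["A1", "B2", "C3"],
                       "brand": ["Adidas", "Nike", "Nike"]})

# Outer-merge the tables one after another on the shared key
tables = [info, finance, reviews, traffic, brands]
merged = reduce(
    lambda left, right: left.merge(right, on="product_id", how="outer"),
    tables,
)

# Drop every row that has at least one missing value
complete = merged.dropna()
print(complete)
```

Here product `C3` has no description, so it survives the outer merge but is removed by `dropna()`.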

The company has asked you to answer the following questions:

## What is the volume of products and average revenue for Adidas and Nike products based on price quartiles?

* Label products priced up to quartile 1 as `"Budget"`, quartile 2 as `"Average"`, quartile 3 as `"Expensive"`, and quartile 4 as `"Elite"`.
* Store as a `pandas` DataFrame called `adidas_vs_nike` containing the following columns: `"brand"`, `"price_label"`, `"count"`, and `"revenue"`.
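
One way to arrive at exactly the four requested columns is named aggregation followed by `reset_index()`. The sketch below runs on made-up data (the `products` frame and its values are assumptions, not the project data) and counts `product_id` per group to produce `"count"`.

```python
import pandas as pd

# Made-up data for illustration only
products = pd.DataFrame({
    "product_id": ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"],
    "brand": ["Adidas"] * 4 + ["Nike"] * 4,
    "listing_price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    "revenue": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0],
})

# Split listing_price into four equally populated bins (quartiles)
products["price_label"] = pd.qcut(
    products["listing_price"],
    q=4,
    labels=["Budget", "Average", "Expensive", "Elite"],
)

# Named aggregation gives each output column an explicit name;
# observed=True keeps only brand/label combinations present in the data
adidas_vs_nike = (
    products.groupby(["brand", "price_label"], observed=True)
    .agg(count=("product_id", "count"), revenue=("revenue", "mean"))
    .reset_index()
)
print(adidas_vs_nike)
```

With this approach the brand and price label become ordinary columns instead of a row index, and the product count is labeled `"count"` rather than reusing the name `"price_label"`.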

## Does a product's mean rating differ with the word count of its description?

* Store the results as a `pandas` DataFrame called `description_lengths` containing the following columns: `"description_length"`, `"rating"`, `"reviews"`.
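
The question mentions word count, whereas `str.len()` on a text column measures characters. The sketch below, on made-up data, shows both measures and then bins by character length; the bin edges and the choice of characters over words are assumptions made for illustration.

```python
import pandas as pd

# Made-up data for illustration only
products = pd.DataFrame({
    "description": ["Short text", "x" * 150, "y" * 180, "z" * 250],
    "rating": [2.0, 3.0, 4.0, 5.0],
    "reviews": [5, 10, 20, 30],
})

# Character count per description (what str.len() measures)
char_len = products["description"].str.len()

# Word count per description (whitespace-separated tokens)
products["word_count"] = products["description"].str.split().str.len()

# Bin the character lengths; each label names the upper edge of its bin
bins = [0, 100, 200, 300]
labels = ["100", "200", "300"]
products["description_length"] = pd.cut(char_len, bins=bins, labels=labels)

# Mean rating and number of non-null review entries per bin
description_lengths = (
    products.groupby("description_length", as_index=False, observed=True)
    .agg(rating=("rating", "mean"), reviews=("reviews", "count"))
)
print(description_lengths)
```

To bin by words instead, `products["word_count"]` could be passed to `pd.cut` with edges suited to word counts.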

## How does the volume of products and median revenue vary between clothing and footwear?

* Store as a `pandas` DataFrame called `product_types` containing the following columns: `"clothing_products"`, `"clothing_revenue"`, `"footwear_products"`, `"footwear_revenue"`.
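
How products are split into footwear and clothing depends heavily on the keyword pattern. In a regular expression, `*` repeats only the preceding character, so `shoe*` matches `sho` followed by any number of `e`s and therefore also matches "shorts". The sketch below compares that pattern with a plain alternation on made-up data; the stricter pattern is an illustrative alternative, not the one behind the results printed in this notebook.

```python
import pandas as pd

# Made-up data for illustration only
products = pd.DataFrame({
    "product_id": ["A1", "B2", "C3", "D4"],
    "description": [
        "Lightweight running shoes",
        "Classic trainers",
        "Training shorts",
        "Cotton tee",
    ],
    "revenue": [3000.0, 2000.0, 500.0, 300.0],
})

# "shoe*" means "sho" plus zero or more "e", so "shorts" matches too
loose = products["description"].str.contains("shoe*|trainer*|foot*")

# Plain alternation matches only the literal keywords
strict = products["description"].str.contains("shoe|trainer|foot")

footwear = products[strict]
clothing = products[~products["product_id"].isin(footwear["product_id"])]

product_types = pd.DataFrame({
    "clothing_products": len(clothing),
    "clothing_revenue": clothing["revenue"].median(),
    "footwear_products": len(footwear),
    "footwear_revenue": footwear["revenue"].median(),
}, index=[0])
print(product_types)
```

On this toy data the loose pattern flags "Training shorts" as footwear, while the strict pattern does not.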

## Completing the project

* Create a dictionary called `revenue_analysis` containing the following key-value pairs:
    - `"brand_analysis"`: `adidas_vs_nike` DataFrame.
    - `"description_analysis"`: `description_lengths` DataFrame.
    - `"product_analysis"`: `product_types` DataFrame.
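
Before sharing the dictionary, it may help to check that each DataFrame carries the column names requested above. The helper below, `missing_columns`, is a hypothetical convenience function written for this sketch and is not part of the project; the demo frames are empty placeholders.

```python
import pandas as pd

# Column names requested for each entry of revenue_analysis
expected_columns = {
    "brand_analysis": ["brand", "price_label", "count", "revenue"],
    "description_analysis": ["description_length", "rating", "reviews"],
    "product_analysis": ["clothing_products", "clothing_revenue",
                         "footwear_products", "footwear_revenue"],
}


def missing_columns(results):
    """Return, per key, the expected columns absent from the DataFrame."""
    report = {}
    for key, columns in expected_columns.items():
        frame = results.get(key)
        present = list(frame.columns) if isinstance(frame, pd.DataFrame) else []
        report[key] = [name for name in columns if name not in present]
    return report


# Empty placeholder frames; only the column names matter for the check
demo = {
    "brand_analysis": pd.DataFrame(columns=["brand", "price_label", "count", "revenue"]),
    "description_analysis": pd.DataFrame(columns=["word_limit", "rating", "reviews"]),
    "product_analysis": pd.DataFrame(columns=expected_columns["product_analysis"]),
}

report = missing_columns(demo)
print(report)
```

In the demo, the description frame uses `word_limit` as its first column, so the report lists `description_length` as missing for that key.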