# 👗 FashionTrendScraper – Analyzing Fashion Trends via Web Scraping

This project uses Python and web scraping techniques to collect and analyze fashion product data from an e-commerce website. It extracts product names, prices, and ratings to uncover insights such as average pricing, rating distribution, and keyword trends in product titles.

## 🔧 Technologies Used
- Python
- BeautifulSoup (for parsing HTML)
- Requests (for HTTP requests)
- Pandas (for data handling)
- Matplotlib & Seaborn (for visualization)
- WordCloud (for keyword analysis)

## 🚀 How to Run
1. Clone this repository.
2. Install the required packages:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the notebook:
   ```bash
   jupyter notebook FashionTrendScraper.ipynb
   ```

## 📌 Note
This project is for educational purposes. Please ensure compliance with the website's `robots.txt` and terms of service before scraping.
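
As a rough illustration of the `robots.txt` check mentioned above, the sketch below uses Python's standard `urllib.robotparser`. The `robots.txt` content, the `shop.example` URLs, the `FashionTrendScraper` user-agent string, and the `is_allowed` helper are all hypothetical and inlined so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain
catalogue_ok = is_allowed(
    SAMPLE_ROBOTS_TXT, "FashionTrendScraper", "https://shop.example/catalogue/page-1.html"
)
checkout_ok = is_allowed(
    SAMPLE_ROBOTS_TXT, "FashionTrendScraper", "https://shop.example/checkout/cart"
)
print(catalogue_ok, checkout_ok)
```

Against a live site, the same parser can load the file itself with `parser.set_url(...)` followed by `parser.read()`. Passing this check does not replace reading the site's terms of service.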

---


# 🧵 Fashion Trend Analysis via Web Scraping

**Objective:** Scrape product names, prices, and ratings from an e-commerce website and analyze fashion trends.

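> ⚠️ This example uses `http://books.toscrape.com`, a sandbox site built for scraping practice, so the "products" here are books. To analyze real fashion data, swap in the target site's URL and update the HTML selectors to match its markup.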

In [None]:
# 📦 Import Libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

# Notebook settings
sns.set_theme()  # apply seaborn's default theme (newer Matplotlib releases no longer accept the 'seaborn' style name)
%matplotlib inline

In [None]:
# 🌐 Scrape Product Data
base_url = "http://books.toscrape.com/catalogue/page-{}.html"
headers = {"User-Agent": "Mozilla/5.0"}

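product_data = []

# Map the second CSS class of the star-rating element (e.g. "Three") to a number
rating_map = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

for page in range(1, 6):
    print(f"Scraping page {page}...")
    response = requests.get(base_url.format(page), headers=headers, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.content, "html.parser")

    products = soup.find_all("article", class_="product_pod")

    for product in products:
        title = product.h3.a["title"]
        price = product.find("p", class_="price_color").text.replace("£", "").strip()
        # Look up the rating element by class rather than relying on it being the first <p>
        rating_class = product.find("p", class_="star-rating")["class"][1]
        rating = rating_map.get(rating_class)  # None if the class is unrecognized

        product_data.append({
            "Product Name": title,
            "Price": float(price),
            "Rating": rating
        })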
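
The loop above strips a literal `£` before calling `float()`, which works for this demo site but would fail on prices with a different currency symbol, a thousands separator, or a mis-decoded symbol such as `Â£`. A minimal sketch of a more tolerant parser follows. The `parse_price` helper is hypothetical, and it assumes dot-decimal prices with optional comma thousands separators, so formats like `1.299,00` would need different handling.

```python
import re
from typing import Optional

# Matches the first number in a string: digits, optional comma separators, optional decimals.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(raw: str) -> Optional[float]:
    """Extract a numeric price from text such as '£51.77'; return None if no number is found."""
    match = _PRICE_RE.search(raw)
    if match is None:
        return None
    # Drop thousands separators before converting (assumes dot-decimal notation)
    return float(match.group().replace(",", ""))
```

Returning `None` for unparseable text lets the later `dropna` step decide what to do with such rows, rather than aborting the whole scrape.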

In [None]:
# 📊 Create DataFrame
df = pd.DataFrame(product_data)
df.head()

In [None]:
# 🧹 Basic Data Cleaning
print("Checking for missing values:")
print(df.isnull().sum())

df = df.dropna(subset=["Rating"]).reset_index(drop=True)

In [None]:
# 📈 Rating Distribution
plt.figure(figsize=(7, 4))
sns.countplot(x="Rating", data=df, palette="Set2")
plt.title("Distribution of Product Ratings")
plt.xlabel("Rating (1-5)")
plt.ylabel("Count")
plt.show()

In [None]:
# 💰 Average Price by Rating
plt.figure(figsize=(8, 5))
sns.barplot(x="Rating", y="Price", data=df, palette="coolwarm", errorbar=None)  # seaborn >= 0.12; older versions use ci=None
plt.title("Average Price by Rating")
plt.xlabel("Rating")
plt.ylabel("Average Price (£)")
plt.show()

In [None]:
# 🔝 Top 10 Most Expensive Products
df.sort_values(by="Price", ascending=False).head(10)

In [None]:
# 🔤 Word Cloud for Product Titles
text = " ".join(df['Product Name'])
wordcloud = WordCloud(background_color='white', width=800, height=400, colormap='plasma').generate(text)

plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.title("Common Words in Product Titles", fontsize=14)
plt.show()
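
A word cloud shows relative prominence but not exact counts. As a complement, the sketch below tallies title keywords with the standard library's `collections.Counter`. The `top_keywords` helper, the stop-word list, and the sample titles are hypothetical, and the stop words are a small illustrative set rather than the list WordCloud uses.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; extend it for real data.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "with"}


def top_keywords(titles, n=10):
    """Return the n most common non-stop-words across the titles, with their counts."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep runs of letters and apostrophes as words
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(n)


# Hypothetical titles for illustration
sample_titles = ["The Red Dress", "Red Leather Boots", "A Dress for Summer"]
print(top_keywords(sample_titles, n=2))
```

With the notebook's DataFrame, the call would be `top_keywords(df["Product Name"])`.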

In [None]:
# 💾 Export Data
df.to_csv("fashion_trend_data.csv", index=False)
print("Data exported to 'fashion_trend_data.csv'")

---

## 📈 Future Improvements
- Implement sentiment analysis on user reviews.
- Add category-wise trend comparison (e.g., shoes vs. tops).
- Visualize price changes over time using historical scraping.
- Automate the scraper with scheduling (e.g., using cron jobs or Airflow).
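
One possible starting point for the price-history idea is to keep the rows from each scrape and diff two snapshots. The sketch below assumes rows shaped like the dictionaries built in the scraping cell (keys `"Product Name"` and `"Price"`). The `price_changes` helper and the sample data are hypothetical.

```python
def price_changes(old_rows, new_rows):
    """Compare two scrapes and return {product name: new price - old price}.

    Only products present in both snapshots whose price changed are included.
    """
    old_prices = {row["Product Name"]: row["Price"] for row in old_rows}
    changes = {}
    for row in new_rows:
        name = row["Product Name"]
        if name in old_prices and row["Price"] != old_prices[name]:
            # Round to 2 decimals to avoid floating-point noise in currency values
            changes[name] = round(row["Price"] - old_prices[name], 2)
    return changes


# Hypothetical snapshots from two different days
monday = [
    {"Product Name": "Denim Jacket", "Price": 60.0},
    {"Product Name": "Silk Scarf", "Price": 25.0},
]
friday = [
    {"Product Name": "Denim Jacket", "Price": 48.0},
    {"Product Name": "Silk Scarf", "Price": 25.0},
    {"Product Name": "Wool Coat", "Price": 120.0},
]
print(price_changes(monday, friday))
```

Rows in this shape can be obtained from a saved DataFrame with `df.to_dict("records")`. Matching on product name is a simplification; a stable product ID or URL would be a safer key if the site exposes one.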
