# Ceneo Scraper

## Structure of a single opinion on Ceneo.pl

|Component|Selector|Variable|
|---------|--------|--------|
|opinion ID|["data-entry-id"]|opinion_id|
|author|span.user-post__author-name|author|
|recommendation|span.user-post__author-recomendation > em|recommendation|
|star rating|span.user-post__score-count|rating|
|content|div.user-post__text|content|
|list of pros|div.review-feature__title--positives ~ div.review-feature__item|pros|
|list of cons|div.review-feature__title--negatives ~ div.review-feature__item|cons|
|number of users who found it helpful|span[id^="votes-yes"]|useful|
|number of users who found it unhelpful|span[id^="votes-no"]|useless|
|date posted|span.user-post__published > time:nth-child(1)["datetime"]|post_date|
|date of purchase|span.user-post__published > time:nth-child(2)["datetime"]|purchase_date|
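
As a rough illustration of how the selector notation in the table maps onto BeautifulSoup calls, the sketch below parses a hand-written HTML fragment. The fragment is an assumption that only imitates the class names listed above; it is not copied from Ceneo.pl, and the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that only imitates the structure described in the table.
SAMPLE_HTML = """
<div class="js_product-review" data-entry-id="123">
  <span class="user-post__author-name"> Jan </span>
  <span class="user-post__published">
    <time datetime="2023-01-02 10:00:00"></time>
    <time datetime="2022-12-20 09:00:00"></time>
  </span>
  <div class="review-feature__col">
    <div class="review-feature__title review-feature__title--positives">Pros</div>
    <div class="review-feature__item">low price</div>
    <div class="review-feature__item">good quality</div>
  </div>
</div>
"""

opinion = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one("div.js_product-review")

# ["data-entry-id"] alone means: read an attribute of the opinion element itself.
opinion_id = opinion["data-entry-id"]
# A plain selector means: take the text of the first matching descendant.
author = opinion.select_one("span.user-post__author-name").get_text().strip()
# selector["datetime"] means: find the element, then read its attribute.
post_date = opinion.select_one("span.user-post__published > time:nth-child(1)")["datetime"]
# The ~ combinator matches every later sibling of the title, so select() yields a list.
pros = [
    item.get_text().strip()
    for item in opinion.select("div.review-feature__title--positives ~ div.review-feature__item")
]

print(opinion_id, author, post_date, pros)
```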

## Libraries

In [None]:
import os
import json
import requests
from bs4 import BeautifulSoup

In [None]:
product_id = "114700014"
url = f"https://www.ceneo.pl/{product_id}/opinie-1"
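
The opinions URL is built from the product ID, and the "next page" links that the scraper follows later are site-relative paths. A small sketch of both steps, using `urllib.parse.urljoin` as one possible alternative to string concatenation. The `BASE_URL` name and the `opinie-2` path are illustrative assumptions; the real next-page path comes from the downloaded page.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.ceneo.pl"  # site root, as used in the notebook
product_id = "114700014"

# First page of opinions for the product.
first_page = urljoin(BASE_URL, f"/{product_id}/opinie-1")

# Hypothetical href of a "next page" link; the real value is read from the page.
next_href = f"/{product_id}/opinie-2"
# urljoin resolves the site-relative path against the current page URL.
next_page = urljoin(first_page, next_href)

print(first_page)
print(next_page)
```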


In [None]:
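all_opinions = []
# Keep following the "next page" link until there are no more opinion pages.
while url:
    # A timeout keeps the loop from hanging forever on an unresponsive server.
    response = requests.get(url, timeout=10)
    page_dom = BeautifulSoup(response.text, "html.parser")
    # Each opinion on the page is a div with the js_product-review class.
    opinions = page_dom.select("div.js_product-review")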
    for opinion in opinions:
        try:
            single_opinion = {
                "opinion_id": opinion["data-entry-id"],
                "author": opinion.select_one("span.user-post__author-name").get_text().strip(),
                "recommendation": opinion.select_one("span.user-post__author-recomendation > em").get_text().strip(),
                "rating": opinion.select_one("span.user-post__score-count").get_text().strip(),
                "content": opinion.select_one("div.user-post__text").get_text().strip(),
                "pros": [p.get_text().strip() for p in opinion.select("div.review-feature__title--positives ~ div.review-feature__item")],
                "cons": [c.get_text().strip() for c in opinion.select("div.review-feature__title--negatives ~ div.review-feature__item")],
                "useful": opinion.select_one("span[id^='votes-yes']").get_text().strip(),
                "useless": opinion.select_one("span[id^='votes-no']").get_text().strip(),
                "post_date": opinion.select_one("span.user-post__published > time:nth-child(1)")["datetime"].strip(),
                "purchase_date": opinion.select_one("span.user-post__published > time:nth-child(2)")["datetime"].strip()
            }
            all_opinions.append(single_opinion)
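        except (AttributeError, TypeError):
            # select_one() returned None for some field, so this opinion is skipped.
            pass
    try:
        # On the last page there is no "next" link: select_one() returns None
        # and indexing it raises TypeError, which ends the loop.
        url = "https://www.ceneo.pl"+page_dom.select_one("a.pagination__next")["href"]
    except TypeError:
        url = None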
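
Because the whole dictionary is built inside a single `try` block, an opinion that lacks any one element (for example, one without a recommendation or a purchase date) is dropped entirely. If keeping such opinions is preferable, one possible approach is a small helper that returns `None` for a missing element. The `extract` function and the sample HTML below are hypothetical and not part of the notebook.

```python
from bs4 import BeautifulSoup

def extract(ancestor, selector, attribute=None):
    """Return the text (or an attribute) of the first match, or None if absent."""
    element = ancestor.select_one(selector)
    if element is None:
        return None
    if attribute is not None:
        value = element.get(attribute)  # None when the attribute is missing
        return value.strip() if isinstance(value, str) else value
    return element.get_text().strip()

# Made-up opinion without a recommendation or a purchase date.
html = """
<div class="js_product-review" data-entry-id="1">
  <span class="user-post__author-name">Anna</span>
  <span class="user-post__published"><time datetime="2023-05-01 12:00:00"></time></span>
</div>
"""
opinion = BeautifulSoup(html, "html.parser").select_one("div.js_product-review")

partial_opinion = {
    "opinion_id": opinion["data-entry-id"],
    "author": extract(opinion, "span.user-post__author-name"),
    "recommendation": extract(opinion, "span.user-post__author-recomendation > em"),
    "post_date": extract(opinion, "span.user-post__published > time:nth-child(1)", "datetime"),
    "purchase_date": extract(opinion, "span.user-post__published > time:nth-child(2)", "datetime"),
}
print(partial_opinion)
```

With this approach the opinion is kept and the missing fields are stored as `None`, which `json.dump` writes as `null`.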

In [None]:
# Create the output directory if it does not exist yet.
os.makedirs("opinions", exist_ok=True)
with open(f"opinions/{product_id}.json", "w", encoding="UTF-8") as jf:
    json.dump(all_opinions, jf, indent=4, ensure_ascii=False)
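
The `ensure_ascii=False` argument matters because opinions contain Polish characters; with the default setting, `json.dump` writes them as `\uXXXX` escapes. A minimal sketch of the difference and of reading the data back, assuming a made-up one-element list and a temporary directory instead of the real `opinions` folder:

```python
import json
import os
import tempfile

sample = [{"author": "Jan", "content": "Świetna jakość"}]  # made-up data

# Default settings escape every non-ASCII character.
escaped = json.dumps(sample)
# ensure_ascii=False keeps the text human-readable.
readable = json.dumps(sample, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="UTF-8") as jf:
        json.dump(sample, jf, indent=4, ensure_ascii=False)
    # Read the file back with the same encoding to check the round trip.
    with open(path, encoding="UTF-8") as jf:
        loaded = json.load(jf)

print(escaped)
print(readable)
```

Both forms load back to the same Python objects; the difference is only in how readable the file is on disk.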

In [None]:
len(all_opinions)