# Ceneo Scraper

## Structure of a single opinion
|Component|Selector|Variable|
|---------|--------|--------|
|opinion ID|`["data-entry-id"]` (attribute of the review element)|`opinion_id`|
|author|`.user-post__author-name`|`author`|
|recommendation|`.user-post__author-recomendation`|`recommendation`|
|star rating|`.user-post__score-count`|`stars`|
|content|`.user-post__text`|`content`|
|list of pros|`.review-feature__title--positives ~ .review-feature__item`|`pros`|
|list of cons|`.review-feature__title--negatives ~ .review-feature__item`|`cons`|
|number of "helpful" votes|`.vote-yes > span`|`helpful`|
|number of "unhelpful" votes|`.vote-no > span`|`unhelpful`|
|publication date|`.user-post__published > time:nth-child(1)` (`datetime` attribute)|`publish_date`|
|purchase date|`.user-post__published > time:nth-child(2)` (`datetime` attribute)|`purchase_date`|
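The table can also be read as a record type. The sketch below expresses one opinion as a Python dataclass whose field names match the Variable column. It assumes every scraped value is kept as a plain string, as the scraping code does, and that the recommendation and purchase date may be missing for some opinions. The `Opinion` class and the sample values are hypothetical and not part of the scraper.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Opinion:
    """One Ceneo opinion (hypothetical type); field names follow the Variable column."""
    opinion_id: str
    author: str
    recommendation: Optional[str]  # assumed optional: not every opinion has one
    stars: str                     # raw text of the score, not converted to a number
    content: str
    helpful: str                   # vote counts kept as scraped text
    unhelpful: str
    publish_date: str              # value of the <time> element's datetime attribute
    purchase_date: Optional[str] = None  # assumed optional
    pros: List[str] = field(default_factory=list)
    cons: List[str] = field(default_factory=list)


# Hypothetical sample values, for illustration only
example = Opinion(
    opinion_id="1",
    author="Anna",
    recommendation=None,
    stars="5/5",
    content="Sample text",
    helpful="3",
    unhelpful="0",
    publish_date="2021-01-01 12:00:00",
    pros=["sample pro"],
)
record = asdict(example)  # plain dict with the same keys as single_opinion
```

Converting with `asdict` yields a dictionary that `json.dump` can write directly, so such a type could replace the literal dictionary without changing the saved file format.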

## Importing libraries

```python
import os
import json
import requests
from bs4 import BeautifulSoup
```

## URL of the first page of product opinions

```python
product_id = "138331381"
url = f"https://www.ceneo.pl/{product_id}#tab=reviews"
```
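The part after `#` is a URL fragment. HTTP clients do not send the fragment to the server (Requests builds the request line from the path and query only), so `#tab=reviews` only tells a browser which tab to show; the page downloaded by `requests.get(url)` is the same as for the URL without it. A quick standard-library check of how this URL splits, reusing the product ID from above:

```python
from urllib.parse import urldefrag

product_id = "138331381"
url = f"https://www.ceneo.pl/{product_id}#tab=reviews"

# Split off the fragment: only the first part identifies the resource on the server
page_url, fragment = urldefrag(url)
print(page_url)   # https://www.ceneo.pl/138331381
print(fragment)   # tab=reviews
```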

## Extracting all opinions from the page HTML


```python
from urllib.parse import urljoin

def extract(ancestor, selector, attribute=None):
    """Return the stripped text (or attribute value) of the first element
    matching `selector`, or None if it is missing, e.g. an opinion
    without a recommendation or a purchase date."""
    element = ancestor.select_one(selector)
    if element is None:
        return None
    if attribute:
        return element[attribute].strip()
    return element.text.strip()

all_opinions = []
while url:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    page_dom = BeautifulSoup(response.text, "html.parser")
    opinions = page_dom.select("div.js_product-review")
    for opinion in opinions:
        single_opinion = {
            "opinion_id": opinion["data-entry-id"],
            "author": extract(opinion, ".user-post__author-name"),
            "recommendation": extract(opinion, ".user-post__author-recomendation"),
            "stars": extract(opinion, ".user-post__score-count"),
            "content": extract(opinion, ".user-post__text"),
            "pros": [p.text.strip() for p in opinion.select(".review-feature__title--positives ~ .review-feature__item")],
            "cons": [c.text.strip() for c in opinion.select(".review-feature__title--negatives ~ .review-feature__item")],
            "helpful": extract(opinion, ".vote-yes > span"),
            "unhelpful": extract(opinion, ".vote-no > span"),
            "publish_date": extract(opinion, ".user-post__published > time:nth-child(1)", "datetime"),
            "purchase_date": extract(opinion, ".user-post__published > time:nth-child(2)", "datetime"),
        }
        all_opinions.append(single_opinion)
    # The last page has no "next" link, which ends the loop;
    # urljoin avoids a double slash when the href starts with "/"
    next_page = page_dom.select_one("a.pagination__next")
    url = (urljoin("https://www.ceneo.pl/", next_page["href"].strip())
           if next_page else None)
```
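The loop keeps requesting pages until a page has no `a.pagination__next` link. A minimal sketch of the same control flow as a generator, with downloading and link extraction passed in as functions so it can run offline; `iter_pages`, the fake two-page site and the relative href format `/138331381/opinie-2` are hypothetical. It also shows why `urllib.parse.urljoin` is safer than string concatenation here: appending a root-relative href like the assumed one to `"https://www.ceneo.pl/"` would produce a double slash.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

BASE_URL = "https://www.ceneo.pl/"


def iter_pages(first_url: str,
               fetch: Callable[[str], str],
               next_href: Callable[[str], Optional[str]]) -> Iterator[str]:
    """Yield the HTML of each page, following 'next' links until none is found."""
    url: Optional[str] = first_url
    seen = set()  # guard against pagination that points back to a visited page
    while url and url not in seen:
        seen.add(url)
        html = fetch(url)
        yield html
        href = next_href(html)
        # urljoin resolves a root-relative href against the base URL
        url = urljoin(BASE_URL, href) if href else None


# Hypothetical offline "site": page 1 links to page 2, page 2 has no next link
fake_site = {
    "https://www.ceneo.pl/138331381": "page 1|/138331381/opinie-2",
    "https://www.ceneo.pl/138331381/opinie-2": "page 2|",
}


def fake_next_href(html: str) -> Optional[str]:
    # The text after "|" plays the role of the a.pagination__next href
    href = html.split("|", 1)[1]
    return href or None


pages = list(iter_pages("https://www.ceneo.pl/138331381", fake_site.__getitem__, fake_next_href))
print(len(pages))  # 2
```

Injecting `fetch` and `next_href` keeps the termination logic testable without network access; in the notebook they would wrap `requests.get` and `page_dom.select_one("a.pagination__next")`.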

## Saving opinions to a file

```python
# Create the output directory if needed and write all opinions as UTF-8 JSON
os.makedirs("opinions", exist_ok=True)
with open(f"opinions/{product_id}.json", "w", encoding="UTF-8") as jf:
    json.dump(all_opinions, jf, indent=4, ensure_ascii=False)
```
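To check the saved file, it can be read back with `json.load`. The sketch below uses a hypothetical one-element list in place of real scraped data and a temporary directory instead of `opinions/`. It also shows the effect of `ensure_ascii=False`: Polish characters are written as-is rather than as `\uXXXX` escapes, and the data loads back unchanged.

```python
import json
import os
import tempfile

# Hypothetical sample record; real data comes from the scraping loop
sample_opinions = [{"opinion_id": "1", "author": "Łukasz", "pros": ["świetna bateria"]}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "opinions", "138331381.json")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="UTF-8") as jf:
        json.dump(sample_opinions, jf, indent=4, ensure_ascii=False)

    # Read the raw text to inspect the encoding, then parse it back
    with open(path, encoding="UTF-8") as jf:
        raw_text = jf.read()
    loaded = json.loads(raw_text)

print("Łukasz" in raw_text)       # True: the letter is stored directly, not escaped
print(loaded == sample_opinions)  # True
```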