# Ceneo Scraper

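## Structure of a single opinion

Each opinion on the page is a `div.js_product-review` element. The selectors below are relative to that element, and a name in square brackets is an HTML attribute rather than an element.
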
|Component|Selector|Variable|
|--------|--------|-------|
|opinion ID|["data-entry-id"]|opinion_id|
|author|span.user-post__author-name|author|
|recommendation|span.user-post__author-recomendation|recommendation|
|star rating|span.user-post__score-count|stars|
|content|div.user-post__text|content|
|list of pros|div.review-feature__title--positives ~ div.review-feature__item|pros|
|list of cons|div.review-feature__title--negatives ~ div.review-feature__item|cons|
|number of "helpful" votes|button.vote-yes > span|helpful|
|number of "unhelpful" votes|button.vote-no > span|unhelpful|
|publication date|span.user-post__published > time:nth-child(1)["datetime"]|publish_date|
|purchase date|span.user-post__published > time:nth-child(2)["datetime"]|purchase_date|
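
All scraped values are strings. The sketch below shows one way to turn a raw record into typed values. It assumes the star score text looks like `4,5/5` (Polish decimal comma), that the vote counters hold plain integers, and that the `datetime` attributes use the `YYYY-MM-DD HH:MM:SS` form. The `Opinion` dataclass and the `parse_*` helpers are hypothetical names, not part of the scraper, and the sample record is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Opinion:
    """Hypothetical typed model for the fields listed in the table."""
    opinion_id: str
    author: Optional[str]
    recommendation: Optional[str]
    stars: Optional[float]
    content: Optional[str]
    pros: List[str] = field(default_factory=list)
    cons: List[str] = field(default_factory=list)
    helpful: int = 0
    unhelpful: int = 0
    publish_date: Optional[datetime] = None
    purchase_date: Optional[datetime] = None


def parse_stars(text: Optional[str]) -> Optional[float]:
    """Convert a score such as '4,5/5' to 4.5 (assumed format)."""
    if not text:
        return None
    score = text.split("/")[0].strip()
    return float(score.replace(",", "."))


def parse_date(text: Optional[str]) -> Optional[datetime]:
    """Parse a 'YYYY-MM-DD HH:MM:SS' attribute value (assumed format)."""
    return datetime.fromisoformat(text) if text else None


def parse_opinion(raw: dict) -> Opinion:
    """Build a typed Opinion from one dict produced by the scraping loop."""
    return Opinion(
        opinion_id=raw["opinion_id"],
        author=raw.get("author"),
        recommendation=raw.get("recommendation"),
        stars=parse_stars(raw.get("stars")),
        content=raw.get("content"),
        pros=list(raw.get("pros", [])),
        cons=list(raw.get("cons", [])),
        helpful=int(raw.get("helpful") or 0),
        unhelpful=int(raw.get("unhelpful") or 0),
        publish_date=parse_date(raw.get("publish_date")),
        purchase_date=parse_date(raw.get("purchase_date")),
    )


# Made-up sample record in the shape produced by the scraper.
raw = {
    "opinion_id": "12345",
    "author": "Jan",
    "recommendation": "Polecam",
    "stars": "4,5/5",
    "content": "Działa bez zarzutu.",
    "pros": ["cena"],
    "cons": [],
    "helpful": "3",
    "unhelpful": "0",
    "publish_date": "2023-05-12 14:33:21",
    "purchase_date": None,
}
opinion = parse_opinion(raw)
print(opinion.stars, opinion.helpful, opinion.publish_date.year)  # 4.5 3 2023
```

Missing values map to `None` (or `0` for vote counts), so a later analysis step does not have to special-case absent fields.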

## Importing the libraries

```python
import os
import json
import requests
from bs4 import BeautifulSoup
```

## URL of the first page of product opinions

```python
product_id = '138331381'
# The '#tab=reviews' fragment is handled by the browser and is not sent to the server.
url = f'https://www.ceneo.pl/{product_id}#tab=reviews'
```
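
The `#tab=reviews` part of the URL is a fragment, which HTTP clients do not include in the request, so the scraper actually fetches the plain product page. This small standard-library sketch separates the fragment and resolves a relative pagination link of the kind found in `a.pagination__next`. The `/138331381/opinie-2` path is an assumed example, not taken from a real response.

```python
from urllib.parse import urldefrag, urljoin

BASE_URL = "https://www.ceneo.pl"
product_id = "138331381"
url = f"{BASE_URL}/{product_id}#tab=reviews"

# Only the part before '#' is requested from the server.
requested_url, fragment = urldefrag(url)
print(requested_url)  # https://www.ceneo.pl/138331381
print(fragment)       # tab=reviews

# A relative href (assumed shape) resolved against the current page URL.
next_href = "/138331381/opinie-2"
print(urljoin(url, next_href))  # https://www.ceneo.pl/138331381/opinie-2
```

`urljoin` also copes with hrefs that are already absolute, which makes it a slightly more robust alternative to prefixing `https://www.ceneo.pl` by hand.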


## Fetching all opinions about the product

```python
def extract(ancestor, selector=None, attribute=None):
    """Return the stripped text (or attribute value) of the first match, or None if it is missing."""
    element = ancestor.select_one(selector) if selector else ancestor
    if element is None:
        return None
    if attribute:
        value = element.get(attribute)
        return value.strip() if value else None
    return element.text.strip()


all_opinions = []
while url:
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    page_dom = BeautifulSoup(response.text, 'html.parser')
    opinions = page_dom.select("div.js_product-review")

    for opinion in opinions:
        # Fields missing from a given opinion (e.g. no purchase date) become None instead of raising.
        single_opinion = {
            "opinion_id": extract(opinion, attribute="data-entry-id"),
            "author": extract(opinion, "span.user-post__author-name"),
            "recommendation": extract(opinion, "span.user-post__author-recomendation"),
            "stars": extract(opinion, "span.user-post__score-count"),
            "content": extract(opinion, "div.user-post__text"),
            "pros": [p.text.strip() for p in opinion.select("div.review-feature__title--positives ~ div.review-feature__item")],
            "cons": [c.text.strip() for c in opinion.select("div.review-feature__title--negatives ~ div.review-feature__item")],
            "helpful": extract(opinion, "button.vote-yes > span"),
            "unhelpful": extract(opinion, "button.vote-no > span"),
            "publish_date": extract(opinion, "span.user-post__published > time:nth-child(1)", "datetime"),
            "purchase_date": extract(opinion, "span.user-post__published > time:nth-child(2)", "datetime"),
        }
        all_opinions.append(single_opinion)

    # Follow the "next page" link; on the last page it is absent and the loop ends.
    next_page = page_dom.select_one("a.pagination__next")
    url = 'https://www.ceneo.pl' + next_page["href"].strip() if next_page else None
```
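
The loop above mixes fetching, parsing and pagination. The sketch below isolates the "follow the next link until there is none" pattern as a generator, with the network call injected as a function so the logic can be checked offline. `iter_pages`, `fetch_page` and the `max_pages` guard are hypothetical names, and the fake pages stand in for real Ceneo responses.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urljoin

# fetch_page(url) -> (items found on that page, href of the next page or None)
FetchPage = Callable[[str], Tuple[List[dict], Optional[str]]]


def iter_pages(start_url: str, fetch_page: FetchPage, max_pages: int = 100) -> Iterator[dict]:
    """Yield items page by page, following relative 'next' links."""
    url: Optional[str] = start_url
    visited = set()
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)  # guards against a link that points back to a seen page
        items, next_href = fetch_page(url)
        yield from items
        # Relative hrefs are resolved against the page they came from.
        url = urljoin(url, next_href) if next_href else None


# Offline stand-in for requests.get + BeautifulSoup parsing.
fake_site = {
    "https://www.ceneo.pl/1": ([{"opinion_id": "a"}, {"opinion_id": "b"}], "/1/opinie-2"),
    "https://www.ceneo.pl/1/opinie-2": ([{"opinion_id": "c"}], None),
}

demo_opinions = list(iter_pages("https://www.ceneo.pl/1", fake_site.__getitem__))
print([o["opinion_id"] for o in demo_opinions])  # ['a', 'b', 'c']
```

In the real scraper, `fetch_page` would wrap `requests.get`, the `div.js_product-review` parsing and the `a.pagination__next` lookup; the `visited` set and `max_pages` limit keep a malformed pagination link from causing an endless loop.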


## Saving the product opinions to a JSON file

```python
# Create the output directory if it does not exist yet.
os.makedirs("opinions", exist_ok=True)
with open(f"opinions/{product_id}.json", 'w', encoding='utf-8') as jf:
    json.dump(all_opinions, jf, indent=4, ensure_ascii=False)
```
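
Because the file is written with UTF-8 encoding and `ensure_ascii=False`, Polish characters are stored as-is rather than as `\u` escapes. A minimal round-trip sketch, using a temporary directory so it does not touch the real `opinions/` folder; `save_opinions` and `load_opinions` are hypothetical helper names and the sample record is made up.

```python
import json
import os
import tempfile


def save_opinions(opinions: list, product_id: str, directory: str = "opinions") -> str:
    """Write opinions to <directory>/<product_id>.json and return the file path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{product_id}.json")
    with open(path, "w", encoding="utf-8") as jf:
        json.dump(opinions, jf, indent=4, ensure_ascii=False)
    return path


def load_opinions(product_id: str, directory: str = "opinions") -> list:
    """Read back the opinions written by save_opinions."""
    path = os.path.join(directory, f"{product_id}.json")
    with open(path, encoding="utf-8") as jf:
        return json.load(jf)


sample = [{"opinion_id": "1", "content": "Świetny produkt, polecam!", "pros": ["jakość"]}]
with tempfile.TemporaryDirectory() as tmp:
    path = save_opinions(sample, "demo", tmp)
    with open(path, encoding="utf-8") as jf:
        raw_text = jf.read()
    restored = load_opinions("demo", tmp)

print("Świetny" in raw_text)  # True: stored without \u escapes
print(restored == sample)     # True
```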