First, open your terminal and install Scrapy if you haven't already:

````
pip install scrapy
````

Next, change into the directory where you want to create your project and run:

````
scrapy startproject AmazonScrap
````

Now create an “amazon_scraping.py” file in the project's spiders directory and start coding.

In [None]:
import scrapy
from scrapy import Request
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

Create a Python class that defines all the fields we want to scrape:

In [None]:
class MyItem(scrapy.Item):
    names = scrapy.Field()
    reviewerLink = scrapy.Field()
    reviewTitles = scrapy.Field()
    reviewBody = scrapy.Field()
    verifiedPurchase = scrapy.Field()
    postDate = scrapy.Field()
    starRating = scrapy.Field()
    helpful = scrapy.Field()
    nextPage = scrapy.Field(default = 'null')

Create the main spider class that Scrapy will use to scrape the data:

In [None]:
class ReviewspiderSpider(scrapy.Spider):
    name = 'reviewspider'
    allowed_domains = ["amazon.com"]
    start_urls = ["<Any product URL you want to scrape>"]

In the same class, define a function that scrapes the product URL given above to find the “see all reviews” link on the Amazon page:

In [None]:
def parse(self, response):
  # Get the href of the "see all reviews" link on the product page.
  all_reviews = response.xpath('//div[@data-hook="reviews-medley-footer"]//a[@data-hook="see-all-reviews-link-foot"]/@href').extract_first()
  # Follow the link to the all-reviews page for further scraping.
  # response.follow resolves relative URLs itself, so no domain prefix is needed.
  # Skip the request if the link was not found, instead of failing on None.
  if all_reviews is not None:
    yield response.follow(all_reviews, callback=self.parse_page)
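
The href extracted from the product page is typically a site-relative path. As a minimal sketch using only the standard library, the block below shows how such a path resolves against the page URL with `urllib.parse.urljoin`, which is the kind of resolution `response.follow` performs for relative links. The product URL and href values are made-up placeholders, not real Amazon links.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration only.
page_url = "https://www.amazon.com/dp/EXAMPLEASIN"
relative_href = "/product-reviews/EXAMPLEASIN?pageNumber=1"
absolute_href = "https://www.amazon.com/product-reviews/EXAMPLEASIN?pageNumber=1"


def resolve(base, href):
    """Resolve href against base; an absolute href is returned unchanged."""
    return urljoin(base, href)


resolved_relative = resolve(page_url, relative_href)
resolved_absolute = resolve(page_url, absolute_href)
print(resolved_relative)
print(resolved_absolute)
```

Both calls give the same absolute URL, whereas prefixing the domain by string concatenation only works when the href is relative.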

Scrapy is now on the “all reviews” page of Amazon, so next we write a function that scrapes that page for all the items listed above and yields them to be stored in a JSON file:

In [None]:
def parse_page(self, response):

  # The next-page URL is the same for every review on this page, so look it up once.
  next_url = response.css('.a-last > a::attr(href)').extract_first()

  # Extract the fields relative to each review block, so that a review without a
  # verified-purchase badge or helpful votes cannot shift the other fields.
  for review in response.xpath('//div[@data-hook="review"]'):

    # Store the details of one review in a MyItem object; Scrapy writes each yielded item to the JSON file.
    yield MyItem(
      names=review.xpath('.//span[@class="a-profile-name"]/text()').extract_first(),
      reviewerLink=review.xpath('.//a[@class="a-profile"]/@href').extract_first(),
      reviewTitles=review.xpath('.//a[@data-hook="review-title"]/span/text()').extract_first(),
      reviewBody=review.xpath('normalize-space(.//span[@data-hook="review-body"]/span)').extract_first(),
      verifiedPurchase=review.xpath('.//span[@data-hook="avp-badge"]/text()').extract_first(),
      postDate=review.xpath('.//span[@data-hook="review-date"]/text()').extract_first(),
      starRating=review.xpath('.//i[@data-hook="review-star-rating"]/span[@class="a-icon-alt"]/text()').extract_first(),
      helpful=review.xpath('.//span[@class="cr-vote"]//span[@data-hook="helpful-vote-statement"]/text()').extract_first(),
      nextPage=next_url,
    )
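
The scraped `starRating` and `helpful` values are plain strings. If numbers are more convenient later on, a small post-processing step can convert them. The sketch below assumes the strings look like "4.0 out of 5 stars" and "12 people found this helpful" (or "One person found this helpful"); those formats and the helper names `parse_star_rating` and `parse_helpful_count` are assumptions for illustration, so check them against your own output first.

```python
import re


def parse_star_rating(text):
    """Return the leading number of a string like '4.0 out of 5 stars', or None."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text or "")
    return float(match.group(1)) if match else None


def parse_helpful_count(text):
    """Return the vote count from a string like '12 people found this helpful'.

    A missing or unrecognized value counts as 0 votes.
    """
    text = (text or "").strip()
    if text.lower().startswith("one "):
        return 1
    match = re.match(r"(\d[\d,]*)", text)
    return int(match.group(1).replace(",", "")) if match else 0


print(parse_star_rating("4.0 out of 5 stars"))
print(parse_helpful_count("1,234 people found this helpful"))
```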


All the items on this page have now been yielded for the JSON file, so it is time to tell Scrapy to go to the next page and repeat the process. Add the following at the end of parse_page:

In [None]:
  # Continues the body of parse_page: get the next page URL.
  next_page = response.css('.a-last > a::attr(href)').extract_first()
  # If there is a next page, run parse_page again on it.
  # response.follow resolves relative URLs itself, so no domain prefix is needed.
  if next_page is not None:
    yield response.follow(next_page, callback=self.parse_page)

### It's time to run the Code

Go into the AmazonScrap directory and run the following command in the terminal:

````
scrapy crawl reviewspider -o outputfile.json
````

After this, you should see a file named ````outputfile.json```` in the AmazonScrap folder containing all the scraped data.
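
To check the result, the output file can be loaded with the standard `json` module. The sketch below assumes the file holds a single JSON array of review objects with the field names defined in `MyItem`; the helper name `summarize_reviews` is hypothetical.

```python
import json
from collections import Counter


def summarize_reviews(path):
    """Load the JSON file written by the crawl and count reviews per star rating."""
    with open(path, encoding="utf-8") as handle:
        reviews = json.load(handle)  # expected: a JSON array of item dicts
    ratings = Counter(review.get("starRating") for review in reviews)
    return len(reviews), ratings


if __name__ == "__main__":
    total, ratings = summarize_reviews("outputfile.json")
    print(total, "reviews")
    for rating, count in ratings.most_common():
        print(rating, count)
```

Note that `-o` appends to an existing file, so running the crawl twice into the same JSON file leaves it invalid; delete the old file before re-running.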

You may see 503 Service Unavailable errors in the terminal. This may be because we are putting too much load on the servers. To resolve it, go to AmazonScrap/settings.py, add the following settings, and then run the crawl again:

````
DOWNLOAD_TIMEOUT = 540
DOWNLOAD_DELAY = 5
DEPTH_LIMIT = 10
EXTENSIONS = {
    'scrapy.extensions.telnet.TelnetConsole': None,
    'scrapy.extensions.closespider.CloseSpider': 1
}
````
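
For a rough feel of what these settings mean for run time: by default Scrapy randomizes the wait between requests to the same site to between 0.5 and 1.5 times `DOWNLOAD_DELAY`, and `DEPTH_LIMIT` caps how many links deep the spider follows from the start URL. The sketch below estimates the total waiting time under the simplifying assumptions that there is one request per depth level and that the pages themselves respond instantly; the function names are hypothetical.

```python
DOWNLOAD_DELAY = 5   # seconds, as in settings.py above
DEPTH_LIMIT = 10     # as in settings.py above


def delay_bounds(delay):
    """Range of the randomized wait between two requests, in seconds."""
    return 0.5 * delay, 1.5 * delay


def estimated_wait(delay, depth_limit):
    """Rough total wait, assuming one request per depth level after the start URL."""
    low, high = delay_bounds(delay)
    return low * depth_limit, high * depth_limit


print(delay_bounds(DOWNLOAD_DELAY))
print(estimated_wait(DOWNLOAD_DELAY, DEPTH_LIMIT))
```

With the values above, the spider waits somewhere between 25 and 75 seconds in total, so a slow crawl with these settings is expected.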