## Splash Basics

Scrapy alone can't scrape JavaScript-driven websites because it does not execute JavaScript, so we pair it with Splash, a JavaScript rendering service. We will keep writing our spiders in Python/Scrapy, but we will also write a small Splash script that returns the rendered HTML of the JavaScript website we wish to scrape.

For this section we'll need Docker.  
Create a Docker account [here](https://hub.docker.com/).  
[Download Docker Desktop](https://www.docker.com/products/docker-desktop/).


Open Docker Desktop on your PC first, then pull the Splash image with `docker pull scrapinghub/splash`.  
Next, start the Splash container and publish it on port 8050 so we can reach it from the browser:  
`docker run -it -p 8050:8050 scrapinghub/splash`  
Then go to `http://localhost:8050/` in your browser.

![splash](../images/splash.png)
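
Besides opening the page in a browser, we can check from Python that the container is reachable. The sketch below is only an illustration: it assumes Splash is published on `localhost:8050` as in the `docker run` command above and calls Splash's `_ping` health-check endpoint. The helper names `ping_url` and `splash_is_up` are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumption: Splash was started with `-p 8050:8050` on this machine.
SPLASH_URL = "http://localhost:8050"


def ping_url(base_url: str = SPLASH_URL) -> str:
    """Build the URL of Splash's `_ping` health-check endpoint."""
    return base_url.rstrip("/") + "/_ping"


def splash_is_up(base_url: str = SPLASH_URL, timeout: float = 3.0) -> bool:
    """Return True if a Splash instance answers the ping with status "ok"."""
    try:
        with urllib.request.urlopen(ping_url(base_url), timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return False
    return isinstance(data, dict) and data.get("status") == "ok"
```

Calling `splash_is_up()` should return `True` while the container is running and `False` once it is stopped.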

Splash scripts are written in the Lua programming language.

Splash script that returns a screenshot of the website:
```
function main(splash, args)
    local url = args.url -- URL of the page we want to render
    splash:go(url)
    return splash:png()
end
```

```
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
```
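This second script returns the rendered HTML, a PNG screenshot, and the HAR network log in a single table. Wrapping `splash:go` and `splash:wait` in `assert` makes the script stop with an error message when one of those steps fails.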
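
The same kind of script can also be run without the browser UI by posting it to Splash's `/execute` HTTP endpoint. The following is a minimal sketch, assuming Splash listens on `localhost:8050`; the helper names `build_execute_request` and `run_script` are hypothetical, and the script is a shortened copy of the one above (without the HAR log).

```python
import base64
import json
import urllib.request

# Shortened copy of the Lua script above, sent to Splash as plain text.
LUA_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
  }
end
"""


def build_execute_request(target_url, splash_url="http://localhost:8050"):
    """Build a POST request for the /execute endpoint (nothing is sent yet)."""
    # Every key in the JSON body is visible to the Lua script through `args`.
    payload = {"lua_source": LUA_SCRIPT, "url": target_url}
    return urllib.request.Request(
        splash_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_script(target_url, timeout=60.0):
    """Send the script to a running Splash instance; return (html, png_bytes)."""
    request = build_execute_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        result = json.load(resp)
    # Binary values inside a returned Lua table arrive base64-encoded in the JSON.
    return result["html"], base64.b64decode(result["png"])
```

With the container running, `html, png = run_script("https://example.com")` would give the rendered HTML as a string and the screenshot as bytes that can be written to a `.png` file.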

## Splash in action
Execute this [code](paste amazon lua) in Splash and see the result.

Remember that, unlike in Selenium, we can't use XPath inside a Splash script. We can only use CSS selectors (for example with `splash:select`).

## Scrape JavaScript website using Splash & Scrapy

We use a new project for this:  
`scrapy startproject splash_scrapy`  
We also need to install the `scrapy-splash` library with `pip install scrapy-splash`.  
Then paste the following into the Scrapy project's `settings.py`:  
```
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```

Then paste this [Lua script](https://github.com/sahaavi/Web-Scraping/blob/main/Splash/splash_scrapy.lua) inside the [spider script](https://github.com/sahaavi/Web-Scraping/blob/main/Splash/splash_scrapy/splash_scrapy/spiders/adamchoi.py).

Now parse the [website](https://github.com/sahaavi/Web-Scraping/blob/main/Splash/splash_scrapy/splash_scrapy/spiders/adamchoi_parse.py).

To change the user agent, set it inside the Splash [code](https://github.com/sahaavi/Web-Scraping/blob/main/Splash/splash_scrapy/splash_scrapy/spiders/splash_scrapy_change_user_agent.lua).
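
One possible way to do this is sketched below. It is not the content of the linked file. It assumes the Lua script calls `splash:set_user_agent` before loading the page and reads the value from `args.user_agent`. The function name `build_splash_args` and the `user_agent` key are choices made for this example. The resulting dictionary could be passed as `args=` to a `SplashRequest` with `endpoint='execute'`, or sent as the JSON body of an `/execute` request.

```python
import json

# Lua script that applies a custom user agent before the page is requested.
USER_AGENT_SCRIPT = """
function main(splash, args)
  -- args.user_agent is supplied by the caller (hypothetical key name)
  splash:set_user_agent(args.user_agent)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return splash:html()
end
"""


def build_splash_args(target_url, user_agent):
    """Collect the arguments the Lua script reads through `args`."""
    return {
        "lua_source": USER_AGENT_SCRIPT,
        "url": target_url,
        "user_agent": user_agent,
    }


if __name__ == "__main__":
    # Print the JSON body that would be posted to the /execute endpoint.
    example = build_splash_args("https://example.com", "Mozilla/5.0 (example)")
    print(json.dumps(example, indent=2))
```

Passing the user agent as an argument rather than hard-coding it in the Lua source keeps one script reusable across spiders.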