# 1. Creating a new Scrapy project


## Create scaffolding for a new project

```Bash
scrapy startproject flightRadar

# scrapy       - the Scrapy command-line tool (CLI)
# startproject - subcommand that creates a new project
# flightRadar  - name of the project
```

The command above creates the following scaffolding:

![Scaffolding](image.png)

---

***Open the project in PyCharm***

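```Bash
# macOS only: define a shortcut that opens a directory in PyCharm
alias pycharm="open -a /Applications/PyCharm*.app"
cd to-project   # replace with the path to your project
pycharm .
```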

---
# 2. Writing a spider to crawl a site and extract data

Spiders are classes that you define and that Scrapy uses to scrape information from a website.

Create ***flights_spider.py*** inside the ***spiders*** directory.

**Spider attributes and methods**
1. **name:** identifies the Spider. It must be unique within a project, that is, you can’t set the same name for different Spiders.
2. **start_requests():** must return an iterable of Requests (you can return a list of requests or write a generator function) which the Spider will begin to crawl from. Subsequent requests will be generated successively from these initial requests.
3. **parse():** a method that will be called to handle the response downloaded for each of the requests made. The response parameter is an instance of TextResponse that holds the page content and has further helpful methods to handle it.


Example:

```python
import scrapy

class FlightsSpider(scrapy.Spider):
    name = "flightsSpider"

    def start_requests(self):
        urls = [
            'https://www.ryanair.com/pl/pl/'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # response.body is bytes, so write it in binary mode
        filename = './flight.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
```

---
***Run the spider***

```Bash
scrapy crawl flightsSpider
```

---
# 3. Using spider arguments

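The `-o` option exports the items yielded by the spider to a file, so the file stays empty unless `parse()` yields items. Spider arguments themselves are passed with `-a name=value`.

```Bash
scrapy crawl flightsSpider -o flights.json
```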