# Scrapy CSS vs XPath Selectors Cheatsheet

## Basic Scrapy Usage

```python
# CSS
response.css('selector').get()           # First match
response.css('selector').getall()        # All matches
response.css('selector::text').get()     # Text content
response.css('selector::attr(href)').get()  # Attribute value

# XPath
response.xpath('//xpath').get()          # First match
response.xpath('//xpath').getall()       # All matches
response.xpath('//xpath/text()').get()   # Text content
response.xpath('//xpath/@href').get()    # Attribute value
```

## Element Selection

| Task | CSS | XPath |
|------|-----|-------|
| **Select by tag** | `div` | `//div` |
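| **Select by class** | `.classname` | `//*[contains(concat(' ', normalize-space(@class), ' '), ' classname ')]` or, in Scrapy, `//*[has-class('classname')]` |
| **Select by ID** | `#myid` | `//*[@id='myid']` |
| **Select specific tag + class** | `div.content` | `//div[contains(concat(' ', normalize-space(@class), ' '), ' content ')]` or `//div[has-class('content')]` |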
| **Select specific tag + ID** | `div#header` | `//div[@id='header']` |

## Attribute Selection

| Task | CSS | XPath |
|------|-----|-------|
| **Has attribute** | `[href]` | `//*[@href]` |
| **Exact attribute value** | `[href="url"]` | `//*[@href='url']` |
| **Attribute contains** | `[href*="partial"]` | `//*[contains(@href, 'partial')]` |
| **Attribute starts with** | `[href^="start"]` | `//*[starts-with(@href, 'start')]` |
| **Attribute ends with** | `[href$="end"]` | `//*[substring(@href, string-length(@href) - string-length('end') + 1) = 'end']` |
| **Multiple attributes** | `[class="x"][id="y"]` | `//*[@class='x' and @id='y']` |

## Hierarchy & Relationships

| Task | CSS | XPath |
|------|-----|-------|
| **Any descendant** | `div p` | `//div//p` |
| **Direct child** | `div > p` | `//div/p` |
| **Next sibling** | `h1 + p` | `//h1/following-sibling::*[1][self::p]` |
| **All following siblings** | `h1 ~ p` | `//h1/following-sibling::p` |
| **Parent** | *Not possible* | `//p/parent::div` or `//p/..` |
| **Ancestor** | *Not possible* | `//p/ancestor::section` |

## Position-Based Selection

| Task | CSS | XPath |
|------|-----|-------|
| **First child** | `:first-child` | `*[1]` |
| **Last child** | `:last-child` | `*[last()]` |
| **Nth child** | `:nth-child(3)` | `*[3]` |
| **Odd children** | `:nth-child(odd)` | `*[position() mod 2 = 1]` |
| **Even children** | `:nth-child(even)` | `*[position() mod 2 = 0]` |
| **First of type** | `:first-of-type` | `tagname[1]` |
| **Last of type** | `:last-of-type` | `tagname[last()]` |

## Text Content

| Task | CSS | XPath |
|------|-----|-------|
| **Get text** | `::text` | `/text()` |
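| **Get all text (including nested)** | `*::text` (e.g. `div *::text`) | `//text()` |
| **Text contains** | `:contains('word')` (cssselect extension; matches the element's full string value) | `[contains(., 'word')]`, or `[contains(text(), 'word')]` to test only the first text node |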
| **Exact text match** | *Not possible* | `[text()='exact']` |
| **Text starts with** | *Not possible* | `[starts-with(text(), 'start')]` |
| **Normalize whitespace** | *Not possible* | `[normalize-space(text())='clean']` |

## Multiple Conditions

| Task | CSS | XPath |
|------|-----|-------|
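| **AND conditions** | `div.class1.class2` | `//div[has-class('class1', 'class2')]` |
| **OR conditions** | `div.class1, div.class2` | `//div[has-class('class1') or has-class('class2')]` |
| **NOT condition** | `div:not(.classname)` | `//div[not(has-class('classname'))]` |

`has-class()` is an XPath extension function registered by parsel, the selector library Scrapy uses. Like CSS class selectors, it matches whitespace-separated class names rather than the whole `class` attribute string.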

## Common Scrapy Examples

### Product Listings
```python
# CSS
titles = response.css('h2.product-title::text').getall()
prices = response.css('.price::text').getall()
links = response.css('a.product-link::attr(href)').getall()

# XPath
titles = response.xpath('//h2[@class="product-title"]/text()').getall()
prices = response.xpath('//*[@class="price"]/text()').getall()
links = response.xpath('//a[@class="product-link"]/@href').getall()
```

### Table Data
```python
# CSS - Get 2nd column data
data = response.css('table tr td:nth-child(2)::text').getall()

# XPath - 2nd <td> in each row (differs from the CSS above when rows contain <th> cells)
data = response.xpath('//table//tr/td[2]/text()').getall()
```

### Nested Content
```python
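# CSS - Direct text of <p> elements inside div.content
text = response.css('div.content p::text').getall()

# CSS - All descendant text nodes (Scrapy/parsel extension of ::text)
all_text = response.css('div.content *::text').getall()

# XPath - All descendant text nodes
text = response.xpath('//div[@class="content"]//text()').getall()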
```

## Advanced Patterns

### Chaining Selectors
```python
# CSS
products = response.css('div.product')
for product in products:
    name = product.css('h3::text').get()
    price = product.css('.price::text').get()

# XPath
products = response.xpath('//div[@class="product"]')
for product in products:
    # Leading "." scopes the search to this product; "//h3" would search the whole page
    name = product.xpath('.//h3/text()').get()
    price = product.xpath('.//*[@class="price"]/text()').get()
```

### Complex Selections
```python
# CSS - Limited complex selection
items = response.css('div.item:not(.sold-out) .title::text').getall()

# XPath - has-class() (parsel extension) matches class names the way CSS does
items = response.xpath('//div[has-class("item") and not(has-class("sold-out"))]//*[has-class("title")]/text()').getall()
```

### Fallback Selection
```python
# CSS with fallback
title = response.css('h1::text').get() or response.css('h2::text').get()

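# XPath union in a single expression: `|` yields nodes in document order,
# so [1] picks whichever heading appears first, not necessarily the <h1>
title = response.xpath('(//h1/text() | //h2/text())[1]').get()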
```

## Quick Reference Card

### CSS Selectors
```text
tag                    # Element
.class                 # Class
#id                    # ID
[attr]                 # Has attribute
[attr="val"]           # Attribute value
parent > child         # Direct child
parent child           # Descendant
::text                 # Text content (Scrapy extension)
::attr(name)           # Attribute value (Scrapy extension)
:first-child           # First child
:nth-child(n)          # Nth child
:not(selector)         # Not matching
```

### XPath Expressions
```xpath
//tag                  # Element at any depth
//*[@class="name"]     # Exact class attribute value
//*[@id="name"]        # Any with ID
//tag[@attr]           # Has attribute
//tag[@attr="val"]     # Attribute value
//parent/child         # Direct child
//parent//child        # Descendant
/text()                # Text content
/@attribute            # Attribute value
/*[1]                  # First child
/*[position()=n]       # Nth child
//tag[not(@class)]     # Negation (here: no class attribute)
contains(@attr,"val")  # Attribute contains
starts-with(text(),"x") # Text starts with
```

## When to Use Which?

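**Use CSS when:**
- Selecting by tag, class, or ID
- Matching elements that carry several classes (`div.a.b` handles class names natively)
- Readability matters more than expressive power (Scrapy translates CSS to XPath before evaluating it, so CSS is not inherently faster)
- You are already familiar with CSS syntax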

**Use XPath when:**
- Need parent/ancestor selection
- Complex text filtering required
- Working with XML documents, or markup without convenient classes and IDs
- Need powerful conditional logic