# XPaths and Selectors
Leverage XPath syntax to explore Scrapy selectors. Together, these two concepts move you toward being able to scrape an HTML document.

# 1. XPath
## 1.1 Where am I?
In this exercise, you will navigate to a specific element using your new knowledge of XPath notation.

Consider the HTML code:

```python
html = """
<html>
  <body>
    <div>
      <p>Good Luck!</p>
      <p>Not here...</p>
    </div>
    <div>
      <p>Where am I?</p>
    </div>
  </body>
</html>
"""
```

Your job is to create an XPath string, __using only single forward-slashes__, that navigates to the paragraph `p` element containing the text "Where am I?".

### Instructions:
* __Using only single forward-slashes to move between generations__, assign a string to the variable `xpath` that directs to the paragraph element containing "Where am I?". Do not include any blank spaces in the string you assign to `xpath`.

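```python
import re
from scrapy.http import TextResponse

# Strip blank spaces from an XPath string (used to check the "no spaces" rule)
def space_remover(expr):
    return re.sub(pattern="[ ]+", repl="", string=expr)

# Wrap the HTML string in a response object so it can be queried with XPath
response = TextResponse(url='http://datacamp.com', body=html, encoding='utf-8')

# Print every match as a list of strings
def check_xpath(xpath):
    print(response.xpath(xpath).extract())

# Print only the first match (or None if nothing matches)
def check_xpath_1(xpath):
    print(response.xpath(xpath).extract_first())
```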

```python
# Fill in the blank
xpath = '/html/body/div[2]/p[1]'

check_xpath_1(xpath)
```

```
<p>Where am I?</p>
```


## 1.2 It's Time to P
In the lecture, we learned how to use double forward-slashes to navigate to all future generations (every descendant, not just direct children). In this exercise, you will select all paragraph `p` elements within the HTML. Because the goal is to reach __all__ paragraph elements, you don't need to know what the HTML code looks like: a simple XPath string using the __double forward-slash__ notation you have learned is enough.

### Instructions:
* Using double forward-slash notation, assign to the variable `xpath` a simple XPath string navigating to __all__ paragraph `p` elements within any HTML code.

```python
# Fill in the blank
xpath = '//p'

check_xpath(xpath)
```

```
['<p>Good Luck!</p>', '<p>Not here...</p>', '<p>Where am I?</p>']
```


## 1.3 Body Appendages
We have loaded the HTML from a secret website and used it to create a function, `how_many_elements()`. You pass it an XPath string, and it prints the number of elements that XPath selects. For example, running `how_many_elements('//*')` in the console prints the total number of elements in the HTML document (try it!).

Your job in this exercise is to create an XPath string that directs to all child elements of the `body` element, regardless of tag type. If you wish, you can first test your solution with `how_many_elements()` to find the total number of children of the body element.

Note that the exercises in this chapter may take some time to load.

### Instructions:
* Assign to the variable `xpath` an XPath string which directs to all child elements of the body element. There is only one body element in this HTML document and it is a child of the root `html` element.

```python
# Note: ideally this cell would use the same pre-saved DataCamp course-directory HTML
# instead of a live request; with a live request, the counts below can change with the page
from scrapy import Selector
import requests

html = requests.get('https://www.datacamp.com/courses/q:introduction').content

sel = Selector(text=html)

# Print how many elements the given XPath selects
def how_many_elements(xpath):
    print(len(sel.xpath(xpath)))
```

```python
# Create an XPath string to direct to children of body element
xpath = '/html/body/*'

# Print out the number of elements selected
how_many_elements(xpath)
```

```
25
```


You were able to direct to all children of the body element! 

## 1.4 Choose DataCamp!
In this exercise, you will create your own XPath string for a specific task: selecting the paragraph element that contains the text "Choose DataCamp!".

Consider the following HTML:

```html
<html>
  <body>
    <div>
      <p>Hello World!</p>
      <div>
        <p>Choose DataCamp!</p>
      </div>
    </div>
    <div>
      <p>Thanks for Watching!</p>
    </div>
  </body>
</html>
```

We have created the function `print_element_text()` for you, which prints the text contained in your element (if it contains any). Feel free to use this function to check whether your solution is correct!

### Instructions:
* Assign to the variable `xpath` an XPath string to direct to the paragraph element containing the phrase: "Choose DataCamp!".

```python
from scrapy import Selector

html = '''
<html>
  <body>
    <div>
      <p>Hello World!</p>
      <div>
        <p>Choose DataCamp!</p>
      </div>
    </div>
    <div>
      <p>Thanks for Watching!</p>
    </div>
  </body>
</html>
'''

sel = Selector(text=html)

# Print the text nodes directly inside the selected element(s), joined by spaces
def print_element_text(xpath):
    text = ' '.join(sel.xpath(xpath).xpath('./text()').extract())
    print(text)
```

```python
# Create an XPath string to the desired paragraph element
xpath = '/html/body/div/div/p'
# xpath = '/html/body/div[1]/div/p' # alternative code

# Print out the element text
print_element_text(xpath)
```

```
Choose DataCamp!
```
