## Counting Elements in the Wild

<p>Below are a few statements about counting the number of elements an XPath string selects. Choose the one that is <strong>incorrect</strong>!</p>

<ul>
<li>A single forward-slash only looks forward one generation, whereas a double forward-slash looks forward to all future generations.</li>
</ul>

## Body Appendages

In [None]:
# WE WANT TO USE THE SAME DATACAMP COURSE DIRECTORY PRE-SAVED HTML CODE HERE INSTEAD OF REQUESTS
from scrapy import Selector
import requests
html = requests.get( 'https://www.datacamp.com/courses/q:introduction' ).content

sel = Selector( text = html )


def how_many_elements( xpath ):
  print( len(sel.xpath( xpath )) )

In [None]:
# Create an XPath string to direct to children of body element
xpath = '/html/body/*'

# Print out the number of elements selected
how_many_elements( xpath )

<p>We have loaded the HTML from a secret website and have used it to create a function <code>how_many_elements()</code>. You pass this function an XPath string, and it prints out the number of elements that XPath selects. For example, running the code <code>how_many_elements('//*')</code> in the console prints out the total number of elements in the HTML document (try it!). </p>
<p>Your job in this exercise is to create an XPath string which directs to all child elements of the <code>body</code> element (regardless of tag type). Note that you can first test your solution with <code>how_many_elements()</code> to find the total number of children of the body element, if you wish.</p>
<p><strong>Note that the exercises in this chapter may take some time to load.</strong></p>

<ul>
<li>Assign to the variable <code>xpath</code> an XPath string which directs to all child elements of the body element. There is only one body element in this HTML document and it is a child of the root <code>html</code> element.</li>
</ul>

<ul>
<li>Remember that a single forward slash moves down one generation, and a wildcard ignores tag type.</li>
</ul>

## Choose DataCamp!

from scrapy import Selector

html = '''
<html>
  <body>
    <div>
      <p>Hello World!</p>
      <div>
        <p>Choose DataCamp!</p>
      </div>
    </div>
    <div>
      <p>Thanks for Watching!</p>
    </div>
  </body>
</html>
'''

sel = Selector( text = html )

def print_element_text( xpath ):
  text = ' '.join( sel.xpath( xpath ).xpath( './text()' ).extract() )
  print( text )

In [None]:
# Create an XPath string to the desired paragraph element
xpath = '/html/body/div[1]/div/p'

# Print out the element text
print_element_text( xpath )

<p>In this exercise, we want to give you the opportunity to create your own XPath string to achieve a certain task; the task is to select the paragraph element containing the text "Choose DataCamp!". </p>
<p>Consider the following HTML:</p>
<pre><code class="html language-html">&lt;html&gt;
  &lt;body&gt;
    &lt;div&gt;
      &lt;p&gt;Hello World!&lt;/p&gt;
      &lt;div&gt;
        &lt;p&gt;Choose DataCamp!&lt;/p&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div&gt;
      &lt;p&gt;Thanks for Watching!&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>We have created the function <code>print_element_text()</code> for you, which will print the text contained in your element (if it contains any). Feel free to use this function to check if your solution is correct!</p>

<ul>
<li>Assign to the variable <code>xpath</code> an XPath string to direct to the paragraph element containing the phrase: "Choose DataCamp!".</li>
</ul>

<ul>
<li>Pay attention to the fact that there is one <code>div</code> element nested within another!</li>
</ul>

## Where it's @

from scrapy import Selector

html = '''
<html>
  <body>
    <div id="div1" class="class-1">
      <p class="class-1 class-2">Hello World!</p>
      <div id="div2">
        <p id="p2" class="class-2">Choose DataCamp!</p>
      </div>
    </div>
    <div id="div3" class="class-2">
      <p class="class-2">Thanks for Watching!</p>
    </div>
  </body>
</html>
'''

sel = Selector( text = html )

def print_element_text( xpath ):
  text = ' '.join( sel.xpath( xpath ).xpath( './text()' ).extract() )
  print( text )

In [None]:
# Create an XPath string to select desired p element
xpath = '//*[@id="div3"]/p'

# Print out selection text
print_element_text( xpath )

<p>In this exercise, you'll begin to write an XPath string using attributes to achieve a certain task; that task is to select the paragraph element containing the text "Thanks for Watching!". We've already created most of the XPath string for you.</p>
<p>Consider the following HTML:</p>
<pre><code class="html language-html">&lt;html&gt;
  &lt;body&gt;
    &lt;div id="div1" class="class-1"&gt;
      &lt;p class="class-1 class-2"&gt;Hello World!&lt;/p&gt;
      &lt;div id="div2"&gt;
        &lt;p id="p2" class="class-2"&gt;Choose DataCamp!&lt;/p&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div id="div3" class="class-2"&gt;
      &lt;p class="class-2"&gt;Thanks for Watching!&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>We have created the function <code>print_element_text()</code> for you, which will print any text contained in your element.</p>

<ul>
<li>Fill in the blanks in the XPath string to select the paragraph element containing the phrase: "Thanks for Watching!".</li>
</ul>

<ul>
<li>The blank in the XPath string you are given should identify the element (by <code>id</code> attribute) whose child is the paragraph element you are being asked to select.</li>
</ul>

## Check your Class

from scrapy import Selector

html = '''
<html>
  <body>
    <div id="div1" class="class-1">
      <p class="class-1 class-2">Hello World!</p>
      <div id="div2">
        <p id="p2" class="class-2">Choose DataCamp!</p>
      </div>
    </div>
    <div id="div3" class="class-2">
      <p class="class-2">Thanks for Watching!</p>
    </div>
  </body>
</html>
'''

sel = Selector( text = html )

def print_element_text( xpath ):
  text = ' '.join( sel.xpath( xpath ).xpath( './text()' ).extract() )
  print( text )

In [None]:
# Create an XPath string to select p element by class
xpath = '//p[@class="class-1 class-2"]'

# Print out selection text
print_element_text( xpath )

<p>This exercise emphasizes that when you use an XPath to select an element by its class attribute without using the <code>contains()</code> function, you must match the class value exactly. Your job is to fill in the blank below and finish the variable <code>xpath</code> so that it directs to the specified element.</p>
<p>Consider the following HTML:</p>
<pre><code class="html language-html">&lt;html&gt;
  &lt;body&gt;
    &lt;div id="div1" class="class-1"&gt;
      &lt;p class="class-1 class-2"&gt;Hello World!&lt;/p&gt;
      &lt;div id="div2"&gt;
        &lt;p id="p2" class="class-2"&gt;Choose DataCamp!&lt;/p&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div id="div3" class="class-2"&gt;
      &lt;p class="class-2"&gt;Thanks for Watching!&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<ul>
<li>Fill in the blanks in the xpath below to select the paragraph element containing the phrase: "Hello World!".</li>
</ul>

<ul>
<li>Be careful to use the double-quotation marks within the XPath (i.e., it has the form <code>xpath = '//p[@class="____"]'</code>).</li>
<li>Make sure to match the class attribute value exactly, including any spaces.</li>
</ul>

## Hyper(link) Active

from scrapy import Selector

html = '''
<html>
  <body>
    <div id="div1" class="class-1">
      <p class="class-1 class-2">Hello World!</p>
      <div id="div2">
        <p id="p2" class="class-2">Choose 
            <a href="http://datacamp.com">DataCamp!</a>!
        </p>
      </div>
    </div>
    <div id="div3" class="class-2">
      <p class="class-2">Thanks for Watching!</p>
    </div>
  </body>
</html>
'''

sel = Selector( text = html )

def print_attribute( xpath ):
  print( "You have selected:" )
  for i,el in enumerate(sel.xpath( xpath ).extract()):
    print( "%d) %s" % (i+1, el) )

In [None]:
# Create an xpath to the href attribute
xpath = '//p[@id="p2"]/a/@href'

# Print out the selection(s); there should be only one
print_attribute( xpath )

<p>One of the most important attributes to extract for "web-crawling" is the hyperlink URL (<code>href</code> attribute) within an <code>a</code> tag. Here, you will extract such a hyperlink! We have created the function <code>print_attribute()</code> to print out the data extracted from your XPath, so you can test your XPath strings in the console, if you like.</p>
<p>The exercise refers to the following HTML source code:</p>
<pre><code class="html language-html">&lt;html&gt;
  &lt;body&gt;
    &lt;div id="div1" class="class-1"&gt;
      &lt;p class="class-1 class-2"&gt;Hello World!&lt;/p&gt;
      &lt;div id="div2"&gt;
        &lt;p id="p2" class="class-2"&gt;Choose 
            &lt;a href="http://datacamp.com"&gt;DataCamp!&lt;/a&gt;!
        &lt;/p&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div id="div3" class="class-2"&gt;
      &lt;p class="class-2"&gt;Thanks for Watching!&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<ul>
<li>Fill in the blanks to complete the variable <code>xpath</code> below to select the <code>href</code> attribute value from the DataCamp hyperlink.</li>
</ul>

<ul>
<li>Remember to use the double-quotation marks within the XPath attribute value. </li>
<li>Use the <code>@</code> symbol to refer to attributes.</li>
</ul>

## Secret Links

In [None]:
# WE WANT TO USE THE SAME DATACAMP COURSE DIRECTORY PRE-SAVED HTML CODE HERE INSTEAD OF REQUESTS
from scrapy import Selector
import requests
html = requests.get( 'https://www.datacamp.com/courses/q:introduction' ).content

sel = Selector( text = html )

def how_many_elements( xpath ):
  print( "You've selected %d elements" % len(sel.xpath( xpath )) )
  
def preview( xpath ):
  els = sel.xpath( xpath ).extract()
  # Slicing past the end of a list is safe, so no length check is needed
  for i,el in enumerate( els[:4] ):
    print( "Element %d: %s" % (i+1,el) )

In [None]:
# Create an xpath to the href attributes
xpath = '//a[contains(@class,"course-block")]/@href'

# Print out how many elements are selected
how_many_elements( xpath )
# Preview the selected elements
preview( xpath )

<p>We have loaded the HTML from a secret website and have used it to create the functions <code>how_many_elements()</code> and <code>preview()</code>. The function <code>how_many_elements()</code> allows you to pass in an XPath string and it will print out the number of elements the XPath you wrote has selected. The function <code>preview()</code> allows you to pass in an XPath string and it will print out the first few elements you've selected. </p>
<p>Your job in this exercise is to create an XPath which directs to all <code>href</code> attribute values of the hyperlink <code>a</code> elements whose class attributes contain the string <code>"course-block"</code>. If you do it correctly, you should find that you have selected 40 elements with your XPath string and that it previews links (with some repetition).</p>

<ul>
<li>Fill in the blanks below to assign an XPath string to the variable <code>xpath</code> which directs to all <code>href</code> attribute values of the hyperlink <code>a</code> elements whose class attributes contain the string <code>"course-block"</code>. Remember that we use the <code>contains</code> call within the XPath string to check if an attribute value contains a particular string.</li>
</ul>

<ul>
<li>Remember that the format for <code>contains(____,____)</code> is <code>contains(@attr, "string")</code> to check that the <code>attr</code> attribute value contains the string <code>"string"</code>. You need to decide what to write for <code>@attr</code> and <code>"string"</code>. </li>
<li>The final blank is left to reference the <code>href</code> attribute value you want to point to; don't forget to include the <code>@</code> symbol (i.e., you will write <code>@href</code>).</li>
</ul>

## XPath Chaining

from scrapy import Selector

html = '''
<html>
<body>
<div>HELLO</div>
<div><p>GOODBYE</p></div>
<div><span><p>NOPE</p><p>ALMOST</p><p>YOU GOT IT!</p></span></div>
</body>
</html>
'''

sel = Selector( text = html )

<p><code>Selector</code> and <code>SelectorList</code> objects allow for <em>chaining</em> when using the <code>xpath</code> method. This means you can call <code>xpath</code> again on the result of a previous <code>xpath</code> call. For example, if <code>sel</code> is the name of our <code>Selector</code>, then </p>
<pre><code class="python language-python">sel.xpath('/html/body/div[2]')
</code></pre>
<p>
is the same as </p>
<pre><code class="python language-python">sel.xpath('/html').xpath('./body/div[2]')
</code></pre>
<p>or is the same as </p>
<pre><code class="python language-python">sel.xpath('/html').xpath('./body').xpath('./div[2]')
</code></pre>
<p>
The only catch is that you need to "glue together" the XPath pieces by using a period at the start of each subsequent XPath string (notice the periods we added to the XPath strings in our examples).</p>

<ul>
<li>Fill in the blank below to chain together two <code>xpath</code> calls which result in the same selection as</li>
</ul>
<pre><code class="python language-python">sel.xpath('//div/span/p[3]')
</code></pre>

<ul>
<li>Don't forget to use a period <code>.</code> as "glue" when chaining the <code>xpath</code> function!</li>
</ul>

## Divvy Up This Exercise

html = '''
<html>
<body>
<div>Div 1: <p>paragraph 1</p></div>
<div>Div 2: <p>paragraph 2</p> <p>paragraph 3</p> </div>
<div>Div 3: <p>paragraph 4</p> <p>paragraph 5</p> <p>paragraph 6</p></div>
<div>Div 4: <p>paragraph 7</p></div>
<div>Div 5: <p>paragraph 8</p></div>
</body>
</html>
'''

from scrapy import Selector
divs = Selector( text = html ).xpath( '//div' )

<p>We have pre-loaded an HTML document into the string variable <code>html</code>. In this two-part problem, you will use this <code>html</code> variable to set up a <code>Selector</code> object and create a <code>SelectorList</code> which selects all <code>div</code> elements; then, you will check your understanding of what happens within the <code>SelectorList</code>.</p>

## Course Class by Inspection

<p>In the lesson, you had a brief glimpse of the following screenshot taken when "inspecting the element" of the DataCamp course title for the course <em>Introduction to R</em>:</p>
<p><img src="https://assets.datacamp.com/production/repositories/2560/datasets/59b60e887fddcea07b8de46b1c899562a2a0603d/ElementSource.png" alt="ElementSource.png"></p>
<p>By looking at the source (HTML) code provided in this image, choose which of the following matches the <code>class</code> attribute for the <code>h4</code> element containing the text for the title of the selected course.</p>

<ul>
<li>Look at the highlighted <code>h4</code> element in the bottom panel of source code given in the image and find the class attribute.</li>
</ul>

## Requesting a Selector

In [None]:
url = 'https://assets.datacamp.com/production/repositories/2560/datasets/19a0a26daa8d9db1d920b5d5607c19d6d8094b3b/all_short'

In [None]:
# Import a scrapy Selector
from scrapy import Selector

# Import requests
import requests

# Create the string html containing the HTML source
html = requests.get( url ).content

# Create the Selector object sel from html
sel = Selector( text = html )

# Print out the number of elements in the HTML document
print( "There are 1020 elements in the HTML document.")
print( "You have found: ", len( sel.xpath('//*') ) )

<p>We have pre-loaded the URL for a particular website in the string variable <code>url</code> and used the requests library to put the content from the website into the string variable <code>html</code>. Your task is to create a <code>Selector</code> object <code>sel</code> using the HTML source code stored in <code>html</code>.</p>

<ul>
<li>Fill in the two blanks below to create the <code>Selector</code> object <code>sel</code>, which uses the string <code>html</code> as its input text.</li>
</ul>

<ul>
<li>The comment above the import from scrapy is a huge hint.</li>
<li>Remember that you want to pass the <code>html</code> variable as the text for the <code>Selector</code> object.</li>
</ul>