# CSS Locators, Chaining, and Responses
Learn CSS Locator syntax and begin playing with the idea of chaining together CSS Locators and XPath. We also introduce Response objects, which behave like Selectors but give us extra tools for extending our scraping efforts across multiple websites.
# 1. From XPath to CSS
## 1.1 The (X)Path to CSS Locators
Many people prefer using CSS Locator notation to XPath notation. As we will see later, it often makes attribute selection very easy. To help get you more comfortable going back and forth between XPath and CSS Locator strings, we give you a chance in this exercise to do some direct "translation" between the two.

_Note that the exercises in this chapter may take some time to load._

### Instructions 1/2:
* Assign to the variable `css_locator` a CSS Locator string which is equivalent to the XPath string given.

In [1]:
# The given XPath string
xpath = '/html/body/span[1]//a'
# Create the CSS Locator string equivalent to the XPath
css_locator = 'html > body > span:nth-of-type(1) a'

### Instructions 2/2:
* Assign to the variable `xpath` an XPath string which is equivalent to the CSS Locator string given.

In [2]:
# Create the XPath string equivalent to the CSS Locator
xpath = '//div[@id="uid"]/span//h4'
# The given CSS Locator string
css_locator = 'div#uid > span h4'
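
Both translations follow a few mechanical rules: `/` becomes `>`, `//` becomes a space, `[@id="x"]` becomes `#x`, and a position such as `span[1]` becomes `span:nth-of-type(1)`. The sketch below applies those rules to a deliberately small XPath subset. `xpath_to_css` is a hypothetical helper written for illustration and is not part of Scrapy. CSS has no direct counterpart to the leading `/` or `//`, so the helper simply drops it.

```python
import re

def xpath_to_css(xpath):
    """Translate a small XPath subset into a CSS Locator (hypothetical helper).

    Supported: paths starting with / or //, steps that are a tag name or *,
    each optionally followed by one [n] position or one [@id="..."] predicate.
    """
    # Split the path into (separator, step) pairs, e.g. ('//', 'a').
    parts = re.findall(r'(//?)([^/]+)', xpath)
    if "".join(sep + step for sep, step in parts) != xpath:
        raise ValueError(f"unsupported path: {xpath!r}")

    pieces = []
    for sep, step in parts:
        m = re.fullmatch(r'(\w+|\*)(?:\[(\d+)\]|\[@id="([^"]+)"\])?', step)
        if m is None:
            raise ValueError(f"unsupported step: {step!r}")
        tag, pos, uid = m.groups()
        piece = tag
        if uid:
            piece += "#" + uid
        if pos:
            # span[1] counts only <span> siblings; *[1] counts every element sibling
            piece += f":nth-child({pos})" if tag == "*" else f":nth-of-type({pos})"
        if pieces:
            # "/" = direct child (" > "), "//" = any descendant (a space);
            # the separator before the first step is dropped
            pieces.append(" > " if sep == "/" else " ")
        pieces.append(piece)
    return "".join(pieces)

print(xpath_to_css('/html/body/span[1]//a'))      # html > body > span:nth-of-type(1) a
print(xpath_to_css('//div[@id="uid"]/span//h4'))  # div#uid > span h4
```

The helper reproduces both answers above. It rejects anything outside its subset instead of guessing.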

## 1.2 Get an "a" in this Course
We have loaded the HTML from a secret website into the variable `html`, which you will use to set up a `Selector` object. We have also defined a function `how_many_elements()`: when you pass it a CSS Locator string, it prints the number of elements that your CSS Locator selects.

In the second part of this problem, we want you to create a CSS Locator string that selects the following collection of elements: the hyperlink (`a` element) children of all `div` elements belonging to the class `"course-block"` (that is, any `div` element with a class attribute in which `"course-block"` is one of the assigned classes). There are 11 such elements, so you can check your solution with `how_many_elements` if you choose.
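
The phrase "one of the assigned classes" marks the key difference from a naive XPath. `div.course-block` checks whether `course-block` is one of the whitespace-separated tokens in the `class` attribute. `//div[@class="course-block"]` instead requires the whole attribute to equal that string. The two hypothetical helpers below mimic each test in plain Python on made-up class values. No Scrapy is involved.

```python
def css_class_match(class_attr, name):
    """Like the CSS '.name' test: is `name` one of the whitespace-separated classes?"""
    return name in class_attr.split()

def xpath_equals_match(class_attr, name):
    """Like the XPath [@class='name'] test: does the whole attribute equal `name`?"""
    return class_attr == name

# Made-up class attribute values
for attr in ["course-block", "course-block featured", "course-block-wide"]:
    print(attr, "| .course-block:", css_class_match(attr, "course-block"),
          "| @class equality:", xpath_equals_match(attr, "course-block"))
# course-block | .course-block: True | @class equality: True
# course-block featured | .course-block: True | @class equality: False
# course-block-wide | .course-block: False | @class equality: False
```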

### Instructions:
* Fill in the blank below to create the `Selector` object `sel` using the string `html` as the text input.
* Assign to the variable `css_locator` a CSS Locator string which directs to the hyperlink (`a` element) children of all `div` elements belonging to the class `"course-block"`.

In [3]:
from scrapy import Selector
import requests
# html = requests.get( 'https://assets.datacamp.com/production/repositories/2560/datasets/0f78aa6961422247398f079e099e179f6bf4aec9/all_long' ).content
html = requests.get('https://assets.datacamp.com/production/repositories/2560/datasets/19a0a26daa8d9db1d920b5d5607c19d6d8094b3b/all_short').content

def how_many_elements( css ):
    sel = Selector( text = html )
    print( len(sel.css( css )) )

In [4]:
# Create a selector from the html (of a secret website)
sel = Selector( text = html )

# Fill in the blank
css_locator = "div.course-block > a"

# Print the number of selected elements.
how_many_elements( css_locator )

11


## 1.3 The CSS Wildcard
You can use the wildcard `*` in CSS Locators too! It works much as it does in XPath: use it when you want to ignore the tag type. For example:

* The CSS Locator string `'*'` selects all elements in the HTML document.
* The CSS Locator string `'*.class-1'` selects all elements which belong to `class-1`, but this is unnecessary since the string `'.class-1'` will also do the same job.
* The CSS Locator string `'*#uid'` selects the element with `id` attribute equal to `uid`, but this is unnecessary since the string `'#uid'` will also do the same job.

In this exercise, we want you to work by analogy with the wildcard character you know from XPath notation to discover how to select all the children of a certain element in CSS Locator notation.

### Instructions:
* Assign to the variable `css_locator` a CSS Locator string which will select all children (regardless of tag-type) of the unique element in the HTML document that has its `id` attribute equal to `uid`.

In [5]:
# Create the CSS Locator to all children of the element whose id is uid
css_locator = "#uid > *"
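
By analogy with XPath, `#uid > *` corresponds to `//*[@id="uid"]/*`. As a quick sanity check outside Scrapy, Python's built-in `xml.etree.ElementTree` supports that much XPath, written relative to the root as `.//*[@id='uid']/*`. The sample document below is made up, and it has to be well-formed XML for this parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document
doc = ET.fromstring("""
<html><body>
  <div id="uid">
    <p>first</p>
    <span>second <b>nested</b></span>
    <a href="#">third</a>
  </div>
  <p>outside</p>
</body></html>
""")

# XPath analogue of the CSS Locator '#uid > *': every child, whatever its tag.
# The grandchild <b> is not a child of #uid, so it is not selected.
children = doc.findall(".//*[@id='uid']/*")
print([child.tag for child in children])  # ['p', 'span', 'a']
```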

# 2. CSS Attributes and Text Selection
## 2.1 You've been `href`ed
In a previous exercise, you created a CSS Locator string to select the hyperlink (`a` element) children of all `div` elements belonging to the class `"course-block"`. Here we have created a `SelectorList` called `course_as` having selected those hyperlink children.

Now, we want you to fill in the blank below to extract the `href` attribute values from these elements. This is another example of chaining, as we've seen in a previous exercise.

The point here is that we can chain together calls to the methods `css` and `xpath`, and even mix them! To nudge you in the right direction, we give you the solution that chains a second call to the `css` method.

### Instructions:
* Set up the `Selector` object `sel` using the string `html` as the text input.
* Assign to the variable `hrefs_from_xpath` the `href` attribute values from the elements in `course_as`. Your solution should match `hrefs_from_css`!

In [6]:
# Create a selector object from a secret website
sel = Selector( text = html )

# Select all hyperlinks of div elements belonging to class "course-block"
course_as = sel.css( 'div.course-block > a' )

# Selecting all href attributes chaining with css
hrefs_from_css = course_as.css( '::attr(href)' )

# Selecting all href attributes chaining with xpath
hrefs_from_xpath = course_as.xpath( './@href' )
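
The same two-step chain can be sketched with the standard library's `xml.etree.ElementTree` on a made-up page: first select elements, then read an attribute from each one. This is an analogue for illustration, not Scrapy's API. Note also that in the Scrapy cell above, `hrefs_from_css` and `hrefs_from_xpath` are still `SelectorList` objects. Calling `.extract()` on either one turns it into a plain list of strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical course listing (well-formed, so ElementTree can parse it)
doc = ET.fromstring("""
<html><body>
  <div class="course-block featured"><a href="/courses/intro-to-r">Intro</a></div>
  <div class="course-block">
    <a href="/courses/intermediate-r">Next</a>
    <p><a href="/elsewhere">Not a direct child</a></p>
  </div>
  <div class="sidebar"><a href="/about">About</a></div>
</body></html>
""")

# Step 1, like sel.css('div.course-block > a'): direct <a> children of matching divs
course_as = [
    a
    for div in doc.iter("div")
    if "course-block" in div.get("class", "").split()
    for a in div.findall("a")
]

# Step 2, like course_as.css('::attr(href)') or course_as.xpath('./@href'):
# read the attribute from each element selected in step 1
hrefs = [a.get("href") for a in course_as]
print(hrefs)  # ['/courses/intro-to-r', '/courses/intermediate-r']
```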

## 2.2 Top Level Text
This exercise will have you write an XPath string and a CSS Locator string that direct to the text of a specific paragraph `p` element. The `p` element in the HTML is uniquely defined by its `id` attribute, which is `"p3"`. With this small piece of information, you should be able to create the desired strings. If you want to peruse the HTML to which this element belongs, we have preloaded it into the variable `html`.

In this exercise, you will only select the text directly within the element, which "does not include" the text in future generations (descendants) of the element. We have created a function `print_results` for you to compare which elements your strings direct to.

### Instructions:
* Assign to the variable `xpath` an XPath string directing to the text within the paragraph `p` element with `id` equal to `p3`, which __does not include__ the text of future generations of this `p` element.
* Assign to the variable `css_locator` a CSS Locator string directing to this same text.

In [7]:
html = '''
<html>
<body>
<div id="this-div">
<p id="p1" class="class-1">This is not the element you are looking for</p>
<p id="p2" class="class-12">
<a href="https://www.google.com">Google</a> is linked to here, but this isn't the link you are looking for. 
</p>
<p id="p3" class="class-1 class-12">
Here is the <a href="https://www.datacamp.com" id="a-exercise">DataCamp</a> link you want!
</p>
</div>
</body>
</html>
'''

from scrapy.http import TextResponse
res = TextResponse( url = "https://www.DataCamp.com", body = html, encoding = 'utf-8' )

def our_xpath( xpath ):
    xextr = res.xpath( xpath ).extract()
    return xextr
  
def our_css( css ):
    cextr = res.css( css ).extract()
    return cextr


def print_results( xpath, css_locator ):
    print( "Your XPath extracts the following:")
    print( our_xpath(xpath) )
    print("_________________\n")
    print( "Your CSS Locator extracts the following:")
    print( our_css(css_locator) )
    return None

In [8]:
# Create an XPath string to the desired text.
xpath = '//p[@id="p3"]/text()'

# Create a CSS Locator string to the desired text.
css_locator = 'p#p3::text'

# Print the text from our selections
print_results( xpath, css_locator )

Your XPath extracts the following:
['\nHere is the ', ' link you want!\n']
_________________

Your CSS Locator extracts the following:
['\nHere is the ', ' link you want!\n']


## 2.3 All Level Text
This exercise is similar to the previous one, except that this time you will select text from multiple generations of a given element.

You will write XPath and CSS Locator strings that direct to the text of a specific paragraph `p` element. The `p` element in the HTML is uniquely defined by its `id` attribute, which is `"p3"`. With this small piece of information, you should be able to create the desired strings. If you want to peruse the HTML to which this element belongs, we have preloaded it into the variable `html`.

In this exercise, you will select all the text within the element, including the text within its future generations. We have created a function `print_results` for you to compare which elements your strings direct to.

### Instructions:
* Assign to the variable `xpath` an XPath string directing to the text within the paragraph `p` element with `id` equal to `p3`, which includes the text of future generations of this `p` element.
* Assign to the variable `css_locator` a CSS Locator string directing to this same text.

In [9]:
# Create an XPath string to the desired text.
xpath = '//p[@id="p3"]//text()'

# Create a CSS Locator string to the desired text.
css_locator = 'p#p3 ::text'

# Print the text from our selections
print_results( xpath, css_locator )

Your XPath extracts the following:
['\nHere is the ', 'DataCamp', ' link you want!\n']
_________________

Your CSS Locator extracts the following:
['\nHere is the ', 'DataCamp', ' link you want!\n']
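
You can also see the difference between the two exercises with the standard library. `xml.etree.ElementTree` stores an element's own text in `.text` and the text that follows each child in that child's `.tail`. Collecting those mirrors `//p[@id="p3"]/text()` and `p#p3::text`. `.itertext()` mirrors `//p[@id="p3"]//text()` and `p#p3 ::text`. This sketch reparses only the `p3` paragraph from the exercise HTML, as an illustration rather than Scrapy code:

```python
import xml.etree.ElementTree as ET

# Only the p3 paragraph from the exercise HTML, parsed with the standard library
p = ET.fromstring(
    '<p id="p3" class="class-1 class-12">\n'
    'Here is the <a href="https://www.datacamp.com" id="a-exercise">DataCamp</a>'
    ' link you want!\n'
    '</p>'
)

# Like //p[@id="p3"]/text() or p#p3::text: text nodes that are direct children of <p>,
# which ElementTree keeps in p.text and in each child's .tail
own_text = [t for t in [p.text] + [child.tail for child in p] if t]

# Like //p[@id="p3"]//text() or p#p3 ::text: text from <p> and all its descendants
all_text = list(p.itertext())

print(own_text)  # ['\nHere is the ', ' link you want!\n']
print(all_text)  # ['\nHere is the ', 'DataCamp', ' link you want!\n']
```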


# 3. Respond Please!
## 3.1 Reveal By Response
We have pre-loaded a `Response` object, named `response`, with the content from a secret website. Your job is to figure out the URL and the title of the website using the `response` variable. You learned how to find the URL in the last lesson. To find the website title, all you need to know is:

* The title is the __text__ from the `title` element
* The `title` element is a child of the `head` element, which is a child of the `html` root element.

Note that the `html` root element has only one child `head` element, and the `head` element has only one child `title` element.

### Instructions:
* Assign to the variable `this_url` the URL used to load the `response` variable.
* Assign to the variable `this_title` the title of the website used to load the `response` variable. Since we only want the text from the single element we will select, we use the `extract_first()` method to extract the text.
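
`extract_first()` is handy here because it also copes with an empty selection. It returns `None` (or a `default` you pass it), whereas `extract()[0]` would raise an `IndexError`. Below is a plain-Python analogue of that behaviour. It uses a hypothetical helper and made-up data rather than Scrapy itself:

```python
def first_or_default(values, default=None):
    """Hypothetical helper mimicking SelectorList.extract_first(): first item, else a default."""
    return next(iter(values), default)

found = ["Example Page Title"]   # made-up result of .extract() when <title> exists
not_found = []                   # what .extract() returns when nothing matched

print(first_or_default(found))                  # Example Page Title
print(first_or_default(not_found))              # None
print(first_or_default(not_found, "untitled"))  # untitled

try:
    not_found[0]                 # the extract()[0] approach
except IndexError:
    print("extract()[0] raises IndexError on an empty selection")
```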

In [10]:
import requests
from scrapy.http import TextResponse

html = requests.get('https://assets.datacamp.com/production/repositories/2560/datasets/19a0a26daa8d9db1d920b5d5607c19d6d8094b3b/all_short').content

response = TextResponse( url = 'https://www.datacamp.com/courses/all', 
                         body = html, 
                         encoding = 'utf-8' )

def print_url_title( url, title ):
    print( "Here is what you found:" )
    print( "\t-URL: %s" % url )
    print( "\t-Title: %s" % title )

In [11]:
# Get the URL to the website loaded in response
this_url = response.url

# Get the title of the website loaded in response
this_title = response.xpath('//title/text()').extract_first()

# Print out our findings
print_url_title( this_url, this_title )

Here is what you found:
	-URL: https://www.datacamp.com/courses/all
	-Title: Data Science Courses: R & Python Analysis Tutorials | DataCamp


## 3.2 Responding with Selectors
One point worth emphasizing about the relationship between `Selector` and `Response` objects is that __both__ return a `SelectorList` when you use the `xpath` or `css` methods to direct to elements. In this exercise, we'll prove it to you. You will find all hyperlink elements belonging to the class `course-block__link` (notice the double underscore!) and look at the object that is produced.

We have pre-loaded both a `Response` object named `response` and a `Selector` object named `sel` with the content from the same "secret" website. Once you have created the CSS Locator, you will compare the output from `response.css` with the output from `sel.css` and see that they are effectively the same!

### Instructions:
* Assign to the variable `css_locator` a CSS Locator string which directs to all hyperlink `a` elements belonging to the class `course-block__link`.
* Assign to the variable `response_as` the output of passing the `css_locator` variable to the `css` method in `response`.
* Assign to the variable `sel_as` the output of passing the `css_locator` variable to the `css` method in `sel`.

In [12]:
# Create a CSS Locator string to the desired hyperlink elements
css_locator = 'a.course-block__link'

# Select the hyperlink elements from response and sel
response_as = response.css(css_locator)
sel_as = sel.css(css_locator)

# Examine similarity
nr = len( response_as )
ns = len( sel_as )
for i in range( min(nr, ns, 2) ):
    print( "Element %d from response: \n%s" % (i+1, response_as[i]) )
    print( "Element %d from sel: \n%s" % (i+1, sel_as[i]) )
    print( "" )

Element 1 from response: 
<Selector xpath="descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' course-block__link ')]" data='<a class="course-block__link ds-snowplow'>
Element 1 from sel: 
<Selector xpath="descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' course-block__link ')]" data='<a class="course-block__link ds-snowplow'>

Element 2 from response: 
<Selector xpath="descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' course-block__link ')]" data='<a class="course-block__link ds-snowplow'>
Element 2 from sel: 
<Selector xpath="descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' course-block__link ')]" data='<a class="course-block__link ds-snowplow'>
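
The `xpath` shown in each printed `Selector` is the XPath 1.0 expression that Scrapy generated from `a.course-block__link`. XPath 1.0 has no function for splitting a string into tokens. The translation therefore normalizes the whitespace in `@class`, pads the result with a space on each side, and looks for the class name with a space on each side. The following plain-Python re-creation shows why the padding matters. Its function names are hypothetical and its class values are made up:

```python
def xpath_style_has_class(class_attr, name):
    # normalize-space(@class): trim and collapse runs of whitespace to single spaces
    normalized = " ".join(class_attr.split())
    # concat(' ', ..., ' ') then contains(..., ' name '): padding both sides means
    # only a whole class token can match
    return f" {name} " in f" {normalized} "

def naive_contains(class_attr, name):
    # A tempting shortcut, like contains(@class, 'name') without any padding
    return name in class_attr

attr = "course-block__link-disabled ds-snowplow"   # made-up class list
print(naive_contains(attr, "course-block__link"))         # True (a false positive)
print(xpath_style_has_class(attr, "course-block__link"))  # False
print(xpath_style_has_class("  course-block__link\tds-snowplow ", "course-block__link"))  # True
```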



## 3.3 Selecting from a Selection
In this exercise, you will find the text from an `h4` element within a particular `div` element. The task proceeds in two steps. First, you select a family of `div` elements. Second, you narrow in on the first one, from which you grab the `h4` element text. This process of progressively narrowing in on elements (e.g., first to the `div` elements, then to the `h4` element) is another example of "chaining", even if it doesn't look exactly like what we've seen before.

Along the way in this exercise, there is a variable `first_div` set up for you to use. Think carefully about what type of object `first_div` is!

### Instructions:
* Assign to the variable `divs` a `SelectorList` which selects all `div` elements belonging to the class `course-block`.
* Assign to the variable `h4_text` the text from the __only__ `h4` element within the content selected in `first_div`. Since we only want the text from the single element we will select, we use the `extract_first()` method to extract the text.

In [13]:
# Select all desired div elements
divs = response.css('div.course-block')

# Take the first div element
first_div = divs[0]

# Extract the text from the (only) h4 element in first_div
h4_text = first_div.css('h4::text').extract_first()

# Print out the text
print( "The text from the h4 element is:", h4_text )

The text from the h4 element is: Introduction to R
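
Chaining like this works because a query run on `first_div` is scoped to that element. There is one caveat when chaining with `xpath` rather than `css`. A path that starts with `//` is absolute, so `first_div.xpath('//h4/text()')` would search the whole document again. The relative form `first_div.xpath('.//h4/text()')` stays inside the `div`. The standard library's `xml.etree.ElementTree` shows the same scoping idea on a made-up page. It serves here only as an analogue, not as Scrapy:

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two course blocks
page = ET.fromstring("""
<body>
  <div class="course-block"><h4>First course</h4></div>
  <div class="course-block"><h4>Second course</h4></div>
</body>
""")

# Step 1: select the family of div elements (like response.css('div.course-block'))
divs = [d for d in page.iter("div") if "course-block" in d.get("class", "").split()]
first_div = divs[0]

# Step 2: search inside the first div only; the leading '.' keeps the path relative
print(first_div.find(".//h4").text)         # First course

# Searching from the whole page instead would see every h4
print([h4.text for h4 in page.iter("h4")])  # ['First course', 'Second course']
```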


# 4. Survey
## 4.1 Titular
As in the previous lesson, you will use a pre-loaded `Response` object, named `response`, to scrape the course titles from the (shortened version of the) DataCamp course directory https://www.datacamp.com/courses/all. To do so, you only need to know the following:

* The course titles __are the text__ from all the `h4` elements within the HTML document.

Extract these course titles here.

### Instructions:
* Using `response`, assign to the variable `crs_title_els` a `SelectorList` of the selected course titles.
* Assign to the variable `crs_titles` a list created by extracting the course titles from `crs_title_els`.

In [14]:
# Create a SelectorList of the course titles
crs_title_els = response.css('h4::text')

# Extract the course titles 
crs_titles = crs_title_els.extract()

# Print out the course titles 
for el in crs_titles:
    print( ">>", el )

>> Introduction to R
>> Data Analysis in R, the data.table Way
>> Data Manipulation in R with dplyr
>> Data Visualization in R with ggvis
>> Reporting with R Markdown
>> Intermediate R
>> Introduction to Machine Learning
>> Cleaning Data in R
>> Intro to Python for Data Science
>> Intermediate R - Practice
>> Predicting Customer Churn in Python


## 4.2 Scraping with Children
In the lesson, we used a neat trick to count the children of one of the `div` elements belonging to the class `course-block`. Here we ask you to find the number of children of a mystery element (already stored within a `Selector` object, so you can use the `xpath` or `css` method).

To be explicit, we have created the `Selector` object `mystery` in the following way:

* We first loaded a `Response` variable using a secret website as the input.
* Then we used a call to the `xpath` method to create a `SelectorList` of elements (but we won't say which ones).
* Finally, we let `mystery` be the first `Selector` object of this `SelectorList`.

### Instructions:
* Fill in the blank below to chain on a call to `xpath` so that we can calculate the number of children of the mystery element; we assign this number to the variable `how_many_kids`.

In [15]:
from scrapy.http import TextResponse
import requests

_url = 'https://assets.datacamp.com/production/repositories/2560/datasets/19a0a26daa8d9db1d920b5d5607c19d6d8094b3b/all_short'
_html = requests.get( _url ).content
_response = TextResponse( url = _url, body = _html, encoding = 'utf-8' )
_as = _response.xpath('//body')

mystery = _as[0]

In [16]:
# Calculate the number of children of the mystery element
how_many_kids = len( mystery.xpath( './*' ) )

# Print out the number
print( "The number of elements you selected was:", how_many_kids )

The number of elements you selected was: 23
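
`./*` counts only element children. It does not count text between tags or deeper descendants. The standard-library sketch below uses `xml.etree.ElementTree` on a made-up stand-in for the mystery element, not the real one. It shows the difference between `./*` and `.//*`:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the mystery element
body = ET.fromstring("""
<body>
  <h1>Courses</h1>
  <div><p>one</p><p>two</p></div>
  text between elements is not counted
  <footer/>
</body>
""")

print(len(body.findall("./*")))   # 3: like mystery.xpath('./*'), element children only
print(len(body))                  # 3: an Element's length is its number of children
print(len(body.findall(".//*")))  # 5: './/*' counts every descendant instead
```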
