## Day 43 of 100DaysOfCode 🐍
### Web Scraping - CSS Locators, Chaining, Responses, and Intro to Beautiful Soup

#### **CSS Locators 🎯📏**

CSS locators are patterns used in web development to target and style specific HTML elements, and in web scraping to select them. They provide a simple, concise way to identify elements by tag, attributes, classes, or IDs, or by their relationships to other elements.

#### **Exercise - The (X)Path to CSS Locators**

#### Instructions 1/2
> Assign to the variable css_locator a CSS Locator string which is equivalent to the XPath string given.

In [None]:
# The given XPath string
xpath = '/html/body/span[1]//a'

# The equivalent CSS Locator: XPath's span[1] corresponds to CSS span:nth-of-type(1)
css_locator = 'html > body > span:nth-of-type(1) a'

#### Instructions 2/2
> Assign to the variable xpath an XPath string which is equivalent to the CSS Locator string given.

In [None]:
# Create the XPath string equivalent to the given CSS Locator
xpath = '//div[@id="uid"]/span//h4'

# The given CSS Locator string
css_locator = 'div#uid > span h4'

#### **Exercise - Get an "a" in this Course**

- Fill in the blank below to create the Selector object sel using the string html as the text input.

- Assign to the variable css_locator a CSS Locator string that selects the hyperlink (a element) children of all div elements belonging to the class "course-block".

In [None]:
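# Instructions 1/2
from scrapy import Selector

# Create a Selector from the html string (of a secret website, pre-loaded in the exercise)
sel = Selector(text=html)

# Instructions 2/2: select the a elements that are direct children of div.course-block
css_locator = 'div.course-block > a'

# Print the number of selected elements (how_many_elements is a helper pre-defined in the exercise)
how_many_elements(css_locator)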


#### **Exercise - The CSS Wildcard**

- Assign to the variable css_locator a CSS Locator string that selects all children (regardless of tag type) of the unique element in the HTML document whose id attribute equals uid.

In [None]:
# Create the CSS Locator to all children of the element whose id is uid
css_locator = '#uid > *'

#### **Exercise - Top Level Text**

- Assign to the variable xpath an XPath string directing to the text within the paragraph p element with id equal to p3, excluding the text of its descendants (elements nested inside this p element).

- Assign to the variable css_locator a CSS Locator string directing to this same text.

In [None]:
# Create an XPath string to the desired text.
xpath = '//p[@id="p3"]/text()'

# Create a CSS Locator string to the desired text.
css_locator = 'p#p3::text'

# Print the text from our selections (print_results is a helper pre-defined in the exercise)
print_results(xpath, css_locator)

#### **Exercise - All Level Text**

- Assign to the variable xpath an XPath string directing to the text within the paragraph p element with id equal to p3, including the text of all its descendants (elements nested inside this p element).

- Assign to the variable css_locator a CSS Locator string directing to this same text.

In [None]:
# Create an XPath string to the desired text.
xpath = '//p[@id="p3"]//text()'

# Create a CSS Locator string to the desired text.
css_locator = 'p#p3 ::text'

# Print the text from our selections (print_results is a helper pre-defined in the exercise)
print_results(xpath, css_locator)

#### **Chaining 🔗🔗**

Chaining refers to combining multiple calls in sequence, where the output of one becomes the input of the next. In Scrapy, calling xpath() or css() on a Selector returns a SelectorList, which has the same xpath() and css() methods. Selections can therefore be chained, and XPath and CSS can be mixed within one chain.

#### **Responses 📦📨**

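In Scrapy, a Response object represents a downloaded web page, that is, the HTTP reply to a request. It has the same xpath() and css() methods as a Selector, so the same locators work on it. It also has attributes such as url (the address the page was loaded from) and methods such as follow() for following links to other pages.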

#### **Exercise - Reveal By Response**

- Assign to the variable this_url the URL used to load the response variable.
- Assign to the variable this_title the title of the website used to load the response variable. Since we only want the text from the single element we will select, we use the extract_first() method to extract the text.

- Regardless of whether you use xpath or css, make sure that you are selecting the text within the title element, and not just the title itself.



In [None]:
# Get the URL to the website loaded in response
this_url = response.url

# Get the title of the website loaded in response (using XPath)
this_title = response.xpath('//title/text()').extract_first()

# or

# Get the title of the website loaded in response (using CSS)
this_title = response.css('title::text').extract_first()

# Print out our findings (print_url_title is a helper pre-defined in the exercise)
print_url_title(this_url, this_title)

#### **Exercise - Titular**

> Similar to the work in the previous lesson, you will use a pre-loaded Response object named response to scrape the course titles from the (shortened version of the) DataCamp course directory https://www.datacamp.com/courses/all. To do so, you only need to know one thing: the course titles are the text of all the h4 elements within the HTML document.



- Using response, assign to the variable crs_title_els a SelectorList of the selected course titles.
- Assign to the variable crs_titles a list created by extracting the course titles from crs_title_els.


In [None]:
# response is the pre-loaded Scrapy Response object for the course directory page.
# Do not reassign it to a URL string: a str has no .xpath() or .css() method.

# Create a SelectorList of the course titles (using XPath)
crs_title_els = response.xpath('//h4/text()')

# or

# Create a SelectorList of the course titles (using CSS)
crs_title_els = response.css('h4::text')

# Extract the course titles as a list of strings
crs_titles = crs_title_els.extract()

# Print out the course titles
for el in crs_titles:
    print(">>", el)

#### **Intro to Beautiful Soup 🌟🍜**

Beautiful Soup is a Python library used for web scraping and parsing HTML and XML documents. It simplifies the extraction of data from web pages by providing easy-to-use methods to navigate and manipulate the document's elements.

In [None]:
# Installing the beautifulsoup4 library
!pip install beautifulsoup4

In [None]:
# Import Beautiful Soup in the notebook
from bs4 import BeautifulSoup
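
The sketch below shows Beautiful Soup doing the same kinds of selections as the Scrapy examples. It assumes beautifulsoup4 is installed, as above. It uses Python's built-in "html.parser" backend and hypothetical HTML. CSS locators go through select(), and tag-name searches go through find_all().

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, reusing the course-block structure from the exercises
html = (
    '<html><head><title>Demo Page</title></head><body>'
    '<div class="course-block"><h4>Course One</h4><a href="/c1">Go</a></div>'
    '<div class="course-block"><h4>Course Two</h4><a href="/c2">Go</a></div>'
    '</body></html>'
)

# Parse with the parser that ships with Python (no extra install needed)
soup = BeautifulSoup(html, 'html.parser')

title = soup.title.string                                   # text inside <title>
titles = [h4.get_text() for h4 in soup.find_all('h4')]      # every h4's text
links = [a['href'] for a in soup.select('div.course-block > a')]  # CSS locator

print(title, titles, links)
```

Here select() accepts the same CSS locator used earlier with Scrapy. Attribute values are read with dictionary-style indexing on each tag.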