## Select for CSS Selectors
Another way to capture your desired elements with the soup object is to use CSS selectors. The `.select()` method accepts the CSS selectors you would normally use in a `.css` file and returns a list of every element that matches.

In [5]:
a = """<h1 class='results'>Search Results for: <span class='searchTerm'>Funfetti</span></h1>
<div class='recipeLink'><a href="spaghetti.html">Funfetti Spaghetti</a></div>
<div class='recipeLink' id="selected"><a href="lasagna.html">Lasagna de Funfetti</a></div>
<div class='recipeLink'><a href="cupcakes.html">Funfetti Cupcakes</a></div>
<div class='recipeLink'><a href="pie.html">Pecan Funfetti Pie</a></div>"""


If we wanted to select all of the elements that have the class `'recipeLink'`, we could use the command:

`soup.select(".recipeLink")`

If we wanted to select the element that has the id `'selected'`, we could use the command:

`soup.select("#selected")`
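As a minimal sketch of the two selectors above, the snippet below parses the sample HTML from this section and runs both queries. The variable names (`html`, `recipe_divs`, `selected`, `selected_one`) are chosen for illustration only, and `select_one()` is shown as an optional convenience when a single match is expected.

```python
from bs4 import BeautifulSoup

# Sample HTML from this section
html = """<h1 class='results'>Search Results for: <span class='searchTerm'>Funfetti</span></h1>
<div class='recipeLink'><a href="spaghetti.html">Funfetti Spaghetti</a></div>
<div class='recipeLink' id="selected"><a href="lasagna.html">Lasagna de Funfetti</a></div>
<div class='recipeLink'><a href="cupcakes.html">Funfetti Cupcakes</a></div>
<div class='recipeLink'><a href="pie.html">Pecan Funfetti Pie</a></div>"""

soup = BeautifulSoup(html, "html.parser")

# .select() always returns a list, even when only one element matches
recipe_divs = soup.select(".recipeLink")
selected = soup.select("#selected")

# .select_one() returns the first match directly (or None if nothing matches)
selected_one = soup.select_one("#selected")

print(len(recipe_divs))
print(selected_one.get_text())
```

Because `.select()` returns a list, the id query still needs an index (`selected[0]`) before you can work with the element itself.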

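Let’s say we wanted to loop through all of the links to these Funfetti recipes that we found from our search.

`for link in soup.select(".recipeLink > a"):`

  `webpage = requests.get(link["href"])`
  
  `new_soup = BeautifulSoup(webpage.content, "html.parser")`
  
This loop goes through each link in each `.recipeLink` div and creates a soup object out of the webpage it links to. So, it would first make soup out of the page behind `<a href="spaghetti.html">Funfetti Spaghetti</a>`, then the page behind `<a href="lasagna.html">Lasagna de Funfetti</a>`, and so on. Note that `requests.get()` needs the URL stored in the tag's `href` attribute, not the tag itself, and that the `href` values in this example are relative paths, so in practice each one has to be joined to the site's base URL before it can be fetched.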
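The link-following idea can be sketched without touching the network by building the full URL for each recipe first. The base URL below (`https://example.com/recipes/`) is a made-up placeholder, not a real site from this lesson; `urljoin` from the standard library combines it with each relative `href`.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical base URL, used only for illustration
base_url = "https://example.com/recipes/"

html = """<div class='recipeLink'><a href="spaghetti.html">Funfetti Spaghetti</a></div>
<div class='recipeLink' id="selected"><a href="lasagna.html">Lasagna de Funfetti</a></div>
<div class='recipeLink'><a href="cupcakes.html">Funfetti Cupcakes</a></div>
<div class='recipeLink'><a href="pie.html">Pecan Funfetti Pie</a></div>"""

soup = BeautifulSoup(html, "html.parser")

# Read the href attribute from each <a> that is a direct child of a .recipeLink div
recipe_urls = [
    urljoin(base_url, link["href"])
    for link in soup.select(".recipeLink > a")
]

for url in recipe_urls:
    print(url)
```

Each entry in `recipe_urls` could then be passed to `requests.get()`, and the response's `.content` handed to `BeautifulSoup`, as in the loop above.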

## Reading Text

When we use BeautifulSoup to select HTML elements, we often want to grab the text inside of the element, so that we can analyze it. We can use `.get_text()` to retrieve the text inside of whatever tag we want to call it on.


In [6]:
"""<h1 class="results">Search Results for: <span class='searchTerm'>Funfetti</span></h1>"""

If this is the HTML that has been used to create the soup object, we can make the call:

`soup.get_text()`

Which will return:

**'Search Results for: Funfetti'**

Notice that this combined the text inside of the outer `h1` tag with the text contained in the `span` tag inside of it! With `get_text()`, both of these strings come back as one longer string. If we wanted to separate the text from different tags, we could pass a separator string. This command uses a `|` character as the separator:

`soup.get_text('|')`

Now, the command returns:

**'Search Results for: |Funfetti'**
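To see both calls side by side, here is a small sketch that builds a soup object from the `h1` snippet above. The variable names are illustrative; the last line shows one way to get only the inner `span` text by selecting that tag first.

```python
from bs4 import BeautifulSoup

html = """<h1 class="results">Search Results for: <span class='searchTerm'>Funfetti</span></h1>"""
soup = BeautifulSoup(html, "html.parser")

# All text in the document, joined with no separator
plain = soup.get_text()

# The same text, with '|' inserted between the pieces from different tags
separated = soup.get_text("|")

# Text of just the inner <span>
term = soup.select_one(".searchTerm").get_text()

print(plain)
print(separated)
print(term)
```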

## Exercises

1. After the loop, print out turtle_data. We have been storing the names as the whole p tag containing the name. <br>
Instead, let’s call `get_text()` on the `turtle_name` element and store the result as the key of our dictionary instead.


2. Instead of associating each turtle with an empty list, let’s have each turtle associated with a list of the stats that are available on their page. <br>
It looks like each piece of information is in a `li` element on the turtle’s page.
Get the `ul` element on the page, and get all of the text in it, separated by a `'|'` character so that we can easily split out each attribute later.<br>
Store the resulting string in `turtle_data[turtle_name]` instead of storing an empty list there.


3. When we store the list of info in each `turtle_data[turtle_name]`, separate out each list element again by splitting on `'|'`.
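The three steps can be tried offline on a small stand-in page before running them against the real site. The HTML below is a hypothetical approximation of one turtle's page (a `.name` paragraph and a `ul` of stats), assumed for illustration. The sketch also shows `strip=True`, an optional `get_text()` argument that drops the whitespace-only pieces between `li` elements.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a single turtle page
html = """<p class="name">Aesop</p>
<ul>
  <li>AGE: 7 Years Old</li>
  <li>WEIGHT: 6 lbs</li>
  <li>SEX: Female</li>
</ul>"""

turtle = BeautifulSoup(html, "html.parser")
turtle_data = {}

# Exercise 1: use the text of the name element as the dictionary key
turtle_name = turtle.select_one(".name").get_text()

# Exercises 2 and 3: join the ul text with '|', then split it back into a list
stats = turtle.find("ul")
raw_stats = stats.get_text("|").split("|")

# With strip=True the newline-and-indent pieces between the li tags are skipped
clean_stats = stats.get_text("|", strip=True).split("|")

turtle_data[turtle_name] = clean_stats
print(turtle_data)
```

Without `strip=True`, the list also contains the whitespace between tags, which is where the `\n` columns in a DataFrame built from the unstripped data come from.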

In [6]:
import requests
from bs4 import BeautifulSoup
import pandas as pd


In [5]:
prefix = "https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/"
webpage_response = requests.get('https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/shellter.html')

webpage = webpage_response.content
soup = BeautifulSoup(webpage, "html.parser")

turtle_links = soup.find_all("a")
links = []
# Go through all of the a tags and get the links associated with them
for a in turtle_links:
    links.append(prefix+a["href"])
    
# Define turtle_data:
turtle_data = {}

# Follow each link:
for link in links:
  webpage = requests.get(link)
  turtle = BeautifulSoup(webpage.content, "html.parser")
  turtle_name = turtle.select(".name")[0].get_text()
  
  stats = turtle.find("ul")
  stats_text = stats.get_text("|")
  turtle_data[turtle_name] = stats_text.split("|")
  

turtle_df = pd.DataFrame.from_dict(turtle_data,orient ="index")
print(turtle_df.head())
print(turtle_df.columns)
print(turtle_df.dtypes)
turtle_df = turtle_df.drop(turtle_df.columns[-1], axis=1)

print(turtle_df.head())

turtle_df2 = turtle_df.drop(turtle_df.columns[8],axis=1)

print(turtle_df2.head())
print(turtle_df2.dtypes)

turtle_df3 = turtle_df2.drop(turtle_df2.columns[0],axis=1)
pd.set_option('display.max_columns', None)

print(turtle_df3.head())
turtle_df4 = turtle_df3.drop(turtle_df3.columns[1],axis=1).reset_index()

print(turtle_df4.head())

turtle_df5 = turtle_df4.drop(turtle_df4.columns[3],axis=1)
print(turtle_df5.head())

turtle_df6 = turtle_df5.drop(turtle_df5.columns[4],axis=1)
print(turtle_df6.head())

final_df = turtle_df6
print(final_df)

final_df.columns= ['name','age','weight','sex','breed','source']
print(final_df)
age_split_df = final_df['age'].str.split(':', expand=True)
print(age_split_df)
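# Split on whitespace; the capturing group keeps the separators, so the number ends up in column 2
age_split_2 = age_split_df.get(1).str.split(r'(\s)', expand = True)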
age_split_2 = pd.to_numeric(age_split_2.get(2))
age_split_2
final_df['age'] = age_split_2
print(final_df)

wt_split_df = final_df['weight'].str.split(':', expand=True)
print(wt_split_df)

# Same whitespace split as for age; the numeric weight is in column 2
wt_split_2 = wt_split_df.get(1).str.split(r'(\s)', expand = True)
print(wt_split_2)


final_df['weight'] = pd.to_numeric(wt_split_2.get(2))
final_df
sex_split = final_df['sex'].str.split(':', expand=True)
breed_split = final_df['breed'].str.split(':', expand=True)
source_split = final_df['source'].str.split(':', expand=True)

print(sex_split)
print(breed_split)
source_split
final_df['sex'] = sex_split.get(1)
final_df['breed'] = breed_split.get(1)
final_df['source'] = source_split.get(1)

final_df


        0                 1   2              3   4            5   6   \
Aesop   \n  AGE: 7 Years Old  \n  WEIGHT: 6 lbs  \n  SEX: Female  \n   
Caesar  \n  AGE: 2 Years Old  \n  WEIGHT: 4 lbs  \n    SEX: Male  \n   
Sulla   \n   AGE: 1 Year Old  \n   WEIGHT: 1 lb  \n    SEX: Male  \n   
Spyro   \n  AGE: 6 Years Old  \n  WEIGHT: 3 lbs  \n  SEX: Female  \n   
Zelda   \n  AGE: 3 Years Old  \n  WEIGHT: 2 lbs  \n  SEX: Female  \n   

                                            7   8   \
Aesop   BREED: African Aquatic Sideneck Turtle  \n   
Caesar                   BREED: Greek Tortoise  \n   
Sulla   BREED: African Aquatic Sideneck Turtle  \n   
Spyro                    BREED: Greek Tortoise  \n   
Zelda                BREED: Eastern Box Turtle  \n   

                                  9   10  
Aesop     SOURCE: found in Lake Erie  \n  
Caesar      SOURCE: hatched in house  \n  
Sulla     SOURCE: found in Lake Erie  \n  
Spyro       SOURCE: hatched in house  \n  
Zelda   SOURCE: surrendered

Unnamed: 0,name,age,weight,sex,breed,source
0,Aesop,7.0,6.0,Female,African Aquatic Sideneck Turtle,found in Lake Erie
1,Caesar,2.0,4.0,Male,Greek Tortoise,hatched in house
2,Sulla,1.0,1.0,Male,African Aquatic Sideneck Turtle,found in Lake Erie
3,Spyro,6.0,3.0,Female,Greek Tortoise,hatched in house
4,Zelda,3.0,2.0,Female,Eastern Box Turtle,surrendered by owner
5,Bandicoot,2.0,2.0,Male,African Aquatic Sideneck Turtle,hatched in house
6,Hal,1.0,1.5,Female,Eastern Box Turtle,surrendered by owner
7,Mock,10.0,10.0,Male,Greek Tortoise,surrendered by owner
8,Sparrow,1.5,4.5,Female,African Aquatic Sideneck Turtle,found in Lake Erie
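
The cleanup cell above reaches the numeric and text values through several intermediate split frames. As an alternative sketch, not the approach used in the notebook, the same values can be pulled out with `str.extract` and a single split per text column. The small frame below reuses the first three rows of the printed output; the regular expression is an assumption about the "LABEL: value" layout shown there.

```python
import pandas as pd

# First three rows of the scraped data, in the "LABEL: value" form shown in the output above
df = pd.DataFrame({
    "name": ["Aesop", "Caesar", "Sulla"],
    "age": ["AGE: 7 Years Old", "AGE: 2 Years Old", "AGE: 1 Year Old"],
    "weight": ["WEIGHT: 6 lbs", "WEIGHT: 4 lbs", "WEIGHT: 1 lb"],
    "sex": ["SEX: Female", "SEX: Male", "SEX: Male"],
})

# Pull the first integer or decimal number out of each string and convert it
for col in ["age", "weight"]:
    number = df[col].str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    df[col] = pd.to_numeric(number)

# Keep only the part after the first ':' and trim the surrounding whitespace
df["sex"] = df["sex"].str.split(":", n=1).str[1].str.strip()

print(df)
```

The `.str.strip()` call removes the leading space that is otherwise left behind after splitting on `':'`.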
