# Introduction to Web Scraping
__Created by Sarah Pugachev  
Last Updated on 2019-02-15__

* This work is shared with a [CC-BY-SA 2.0 license](https://creativecommons.org/licenses/by-sa/2.0/). Please feel free to share and adapt this material. Check out the license for more details!
* This tutorial was created for an in-person workshop, but it can also be used as a standalone resource. 
* The material in this notebook is designed to be introductory and covered in 1-2 hours. For more in-depth information on Web Scraping, I recommend the material created by [Dr. Brian C. Keegan for his five week mini-course at University of Colorado Boulder](https://github.com/CU-ITSS/Web-Data-Scraping-S2019). 
* If you have any questions about this tutorial, please contact the [University of Oklahoma Libraries' Digital Scholarship Lab](https://libraries.ou.edu/content/digital-scholarship-laboratory). 

__This tutorial uses Python, but you definitely don't need to know Python to follow along. Go ahead and run the code even if you don't understand every line. One of the best ways to learn to program is to start by modifying code. Check out our [Carpentries workshops](https://libraries.ou.edu/content/software-and-data-carpentry) if you want a full introduction to Python!__

## Why Web Scraping?
* More and more material is available online, but it isn't always easy to get it in a usable format.
* Web scraping empowers you to collect information in a more automated fashion. 

## Legal and Ethical Implications
Web scraping can get a bad rap. It is important to be aware of the ethical and legal issues before you scrape!

### Legal: Just because you can technically scrape, doesn't mean you legally can. ###
* First consider copyright law in the country you are in and where the content was published. 
    * [Here are some guidelines for US Copyright Law](https://www.copyright.gov/circs/circ01.pdf). 
    * The Doctrine of Fair Use may also be relevant for researchers in the US. [Here is more information about Fair Use](https://en.wikipedia.org/wiki/Fair_use).

* Then, look at the site's Terms of Use or other use policies. These terms sit on top of copyright law, so they can be more restrictive, and using the site generally means agreeing to them. 
    
We are going to use Craigslist as our data source for this lesson. Let's take a look at its Terms of Use. [Here is the full policy.](https://www.craigslist.org/about/terms.of.use) Under the license heading, you will find this condition:
> You agree not to display, "frame," make derivative works, distribute, license, or sell, content from CL, excluding postings you create. You grant us a perpetual, irrevocable, unlimited, worldwide, fully paid/sublicensable license to use, copy, display, distribute, and make derivative works from content you post.
   
* Does anything surprise you in this statement?
* Are we allowed to scrape information from the site?
* What can we do with the data we scrape?
   
**Note: Some sites also put up technical barriers to prevent you from violating legal ones. For example, scraping journals that you access through a library subscription is a good way to get the whole campus locked out. Please don't do it.**


### Ethical: Just because you can legally and technically scrape, doesn't mean you ethically should. ###
* Before scraping data, ask yourself some of the questions listed below to ensure that you are not putting anyone or any group at risk by collecting and/or sharing this information. 
    * Are there any privacy concerns in the data you will scrape?  
    * Did the creators know their information would be publicly available (think social media)?
    * What are your questions?


## Technical Explanation
* We are going to use the programming language Python and the Python library Beautiful Soup to get data out of webpages. It will help us target specific information within a page, extract the data, and remove the HTML markup.
* We need to take machine-readable data (i.e. HTML) and make it human-readable. For example, we may want to pull out the text between `<p></p>` HTML tags.
* Once you start to understand how web pages are built, you will be able to spot patterns that help you extract data more easily. 
* This notebook requires Python 3.7 and a number of packages (additional code libraries it needs to run). If you are accessing this via [mybinder](https://mybinder.org), you will not have to worry about installing software. Otherwise, you will need to install Python 3 and the relevant libraries. I recommend installing via the [Anaconda installer](https://www.anaconda.com/distribution), which should also install all of the packages needed for this lesson. 
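As a minimal sketch of the idea above, the snippet below parses a small, made-up HTML string (not a real page) with Beautiful Soup and keeps only the text between the `<p></p>` tags. It assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# A tiny, hypothetical HTML document used only for illustration
html = "<html><body><h1>Listings</h1><p>2br apartment</p><p>Studio near campus</p></body></html>"

# Parse the markup with Python's built-in HTML parser
demo_soup = BeautifulSoup(html, "html.parser")

# find_all("p") returns every <p> element; get_text() drops the tags and keeps the text
paragraph_texts = [p.get_text() for p in demo_soup.find_all("p")]
print(paragraph_texts)  # ['2br apartment', 'Studio near campus']
```

The `<h1>` text is left out because we only asked for `<p>` elements.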

## Let's Look at Our Data Source
We will be using OKC Craigslist apartment/housing listings as our data source. 
1. Go to https://oklahomacity.craigslist.org/d/apts-housing-for-rent/search/apa in Google Chrome. (We recommend Google Chrome because it has a handy way to see the underlying HTML.)  

2. Right click on the first link. Select inspect. ![selecting inspect](images/image_01.png "Inspecting the First Link")  

3. After selecting Inspect, a box should appear showing you the underlying HTML for the website. It will highlight the HTML element you right-clicked.   

4. Let's break down the HTML we see in the Inspect Element box. ![the inspect an element box](images/image_02.png) 
    * HTML uses tags (`< >`) to mark different elements. There is an opening tag (e.g. `<a>`) and a closing tag (`</a>`).
        * In the highlighted text above, the tag is `a`. The `a` tag is used for links. 
        * Other tags you should see in the Inspect screen are `span`, `time`, `p`, and `li`.  
    * Inside the opening tag, there may be attributes (`attribute_name="value"`). There can be multiple attributes, and different tags have different attribute options. 
        * In the highlighted text, you should see the attribute href. This is used for the url you want to link to. You should also see the class attribute. This is used to mark similar elements so they can be styled the same way. We will use the class attribute to pull titles from the page.  
        * Other attributes you may see include role, title, datetime.
    * In between the opening and closing tag, you have the text you want to appear on the website. 
        * In the highlighted text, this is Free Rent Special! Get Your 1st Month Free On Us!
        * You can see other text, Jan 25, in the time element. 
5. Let's return to our question. Can we extract information from these apartment listings? We will need to find a tag and/or attribute that will pull only the information we need. Let's start with the postings' titles.
    * What do you see that might work?
    * It looks like `class="result-title hdrlnk"` will return the titles. 
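The parts of the highlighted element map directly onto what Beautiful Soup exposes in Python. Here is a small sketch using a simplified copy of one title link (the `href` is a placeholder, not a real listing URL). Note that Beautiful Soup treats `class` as a multi-valued attribute, so it comes back as a list of class names.

```python
from bs4 import BeautifulSoup

# A simplified, hypothetical version of one Craigslist title link
html = '<a class="result-title hdrlnk" href="https://example.com/listing.html">Free Rent Special!</a>'

# .a returns the first <a> tag in the parsed document
link = BeautifulSoup(html, "html.parser").a

print(link.name)        # the tag name: a
print(link["href"])     # an attribute value: https://example.com/listing.html
print(link["class"])    # class is multi-valued: ['result-title', 'hdrlnk']
print(link.get_text())  # the text between the tags: Free Rent Special!
```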
        

## Now, let's start programming! ##

### Important: in all the cells with code, press __shift + enter__ to execute each block of code. ###

There are comments throughout (indicated by `#`) that explain what each piece of code does. More complete instructions and explanations are in separate cells so they stand out. There will also be challenges throughout for you to practice what we have learned.


Let's start by importing our libraries. Libraries are chunks of code written by others that have been made available for anyone to use. They are great because they keep everyone from rewriting the same code over and over again. In this tutorial we will use the libraries requests and BeautifulSoup. Requests will let us pull down HTML from the web, and BeautifulSoup will allow us to parse that HTML. We will also use pandas to structure our data in a dataframe, which is a nice way to work with tabular data.

___

```python
#Import requests
import requests

#Import BeautifulSoup
from bs4 import BeautifulSoup

#Import pandas
import pandas as pd
```

___
Notice the different methods for importing libraries. Why do you think there are 3 different syntaxes?

Now, let's pull down our webpage using requests.

___

```python
#Create a variable to store the URL from where we will pull data
url = "https://oklahomacity.craigslist.org/d/apts-housing-for-rent/search/apa"
```

```python
# Use the get function from requests to pull down our URL
r = requests.get(url)

# Check to make sure our pull was successful - Response Code 200 means it was
print(r)

#Print out the text of the page we just pulled down
print(r.text)
```

```
<Response [200]>
<!DOCTYPE html>
<html class="no-js"><head>
    <title>oklahoma city apts/housing for rent  - craigslist</title>

    <meta name="description" content="oklahoma city apts/housing for rent  - craigslist">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge"/>
    <link rel="canonical" href="https://oklahomacity.craigslist.org/search/apa">
    <link rel="alternate" type="application/rss+xml" href="https://oklahomacity.craigslist.org/search/apa?format=rss" title="RSS feed for craigslist | oklahoma city apts/housing for rent  - craigslist">
        <link rel="next" href="https://oklahomacity.craigslist.org/search/apa?s=120">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link type="text/css" rel="stylesheet" media="all" href="//www.craigslist.org/styles/cl.css?v=93aa8b9183682fd14ca12ff38c0d801c">
    <link type="text/css" rel="stylesheet" media="all" href="//www.craigslist.org/styles/search.css?v=84cf86bc094026e12fa066bbbab154ac">
    <lin
```

___
Let's transform the html into a BeautifulSoup object. This should look nearly identical to the r.text object, but we can do much more with it through the BeautifulSoup library!
___

```python
#Parse the data using BeautifulSoup. This will allow us to pull data by html tags and attributes
soup = BeautifulSoup(r.text, 'html.parser')
```

```python
#Print out the BeautifulSoup object. 
print(soup)
```

```
<!DOCTYPE html>

<html class="no-js"><head>
<title>oklahoma city apts/housing for rent  - craigslist</title>
<meta content="oklahoma city apts/housing for rent  - craigslist" name="description"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible">
<link href="https://oklahomacity.craigslist.org/search/apa" rel="canonical"/>
<link href="https://oklahomacity.craigslist.org/search/apa?format=rss" rel="alternate" title="RSS feed for craigslist | oklahoma city apts/housing for rent  - craigslist" type="application/rss+xml"/>
<link href="https://oklahomacity.craigslist.org/search/apa?s=120" rel="next"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="//www.craigslist.org/styles/cl.css?v=93aa8b9183682fd14ca12ff38c0d801c" media="all" rel="stylesheet" type="text/css"/>
<link href="//www.craigslist.org/styles/search.css?v=84cf86bc094026e12fa066bbbab154ac" media="all" rel="stylesheet" type="text/css"/>
<link href="//www.craigslist.org/styles/jquery-ui-clcustom.
```
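Unlike `r.text`, which is one long string, `soup` is an object you can navigate and search. Here is a small sketch of that difference, run on a hand-copied fragment of the `<head>` printed above instead of a live request (the name `demo_soup` just avoids overwriting `soup`):

```python
from bs4 import BeautifulSoup

# A hand-copied fragment of the <head> printed above (no network request)
html = """<html><head>
<title>oklahoma city apts/housing for rent  - craigslist</title>
<meta content="oklahoma city apts/housing for rent  - craigslist" name="description"/>
</head><body></body></html>"""

demo_soup = BeautifulSoup(html, "html.parser")

# The raw markup is a plain str; the parsed version is a BeautifulSoup object
print(type(html).__name__, type(demo_soup).__name__)  # str BeautifulSoup

# Jump straight to the <title> tag and read its text
print(demo_soup.title.get_text())

# Find a <meta> tag by its name attribute, then read its content attribute
description = demo_soup.find("meta", attrs={"name": "description"})
print(description["content"])
```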

___
The printed 'soup' object looks a lot like the Inspect Element window we opened in Chrome. The power of Beautiful Soup is that it lets us select parts of the HTML page using tags or attributes. Let's run a function called `findAll` using the attribute for titles that we identified earlier.
___  
  

```python
titles = soup.findAll(attrs = {'class': 'result-title hdrlnk'})
```
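`findAll` is the older Beautiful Soup 3 name, which Beautiful Soup 4 still accepts alongside `find_all`. One subtlety worth knowing: passing the whole string `'result-title hdrlnk'` matches elements whose `class` value is exactly that string, while `class_='result-title'` matches any element that has `result-title` among its classes. A sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links share "result-title", only one also has "hdrlnk"
html = """
<a class="result-title hdrlnk" href="#1">Spacious 2 bedroom apartment!</a>
<a class="result-title" href="#2">Fitness Center</a>
<span class="result-price">$815</span>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# Exact match on the whole class string (the approach used in this notebook)
exact = demo_soup.findAll(attrs={"class": "result-title hdrlnk"})

# find_all is the Beautiful Soup 4 name for the same method;
# class_ matches any element that has "result-title" among its classes
any_title = demo_soup.find_all(class_="result-title")

print(len(exact), len(any_title))  # 1 2
```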

___
Let's print out the titles we just found.
___

```python
print(titles)
```

```
[<a class="result-title hdrlnk" data-id="6815368320" href="https://oklahomacity.craigslist.org/apa/d/oklahoma-city-spacious-2-bedroom/6815368320.html">Spacious 2 bedroom apartment!  Call for specials today!</a>, <a class="result-title hdrlnk" data-id="6813852414" href="https://oklahomacity.craigslist.org/apa/d/weatherford-move-in-in-june-for-99/6813852414.html">Move-in in June for $99!</a>, <a class="result-title hdrlnk" data-id="6819840298" href="https://oklahomacity.craigslist.org/apa/d/oklahoma-city-fitness-center/6819840298.html">Fitness Center</a>, <a class="result-title hdrlnk" data-id="6814146412" href="https://oklahomacity.craigslist.org/apa/d/oklahoma-city-washer-dryer-in-unit-and/6814146412.html">Washer/Dryer In Unit and New Appliances Available</a>, <a class="result-title hdrlnk" data-id="6823725855" href="https://oklahomacity.craigslist.org/apa/d/wheatland-nice-single-family-2-bedroom/6823725855.html">Nice single family 2 bedroom house for rent</a>, <a class="result-title h
```

___
The data printed out as a list (indicated by the `[ ]`, with the items separated by commas). With lists in Python, we can pull out individual elements using the syntax **name_of_the_list[item_number]**.

Notice that we pulled the full HTML `a` tag, not just the title text. Luckily, BeautifulSoup has a function called **get_text()** that will allow us to extract only the title text. Run the code below to compare the first item in the list with and without the get_text() function added.
___

```python
#print the first title - Note python starts counting at 0 not 1 -- also added a new line at the end for visibility
print('Print first title without get_text():\n', titles[0], '\n')

#Now we will try get_text()
print('Print first title with get_text():\n', titles[0].get_text())
```

```
Print first title without get_text():
 <a class="result-title hdrlnk" data-id="6815368320" href="https://oklahomacity.craigslist.org/apa/d/oklahoma-city-spacious-2-bedroom/6815368320.html">Spacious 2 bedroom apartment!  Call for specials today!</a> 

Print first title with get_text():
 Spacious 2 bedroom apartment!  Call for specials today!
```
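The same indexing works on any Python list, and Python also supports negative indexes (counting back from the end) and slices (a range of items). A sketch on a hand-made list of strings, named `sample_titles` so it doesn't overwrite the scraped `titles`:

```python
# A hand-made list standing in for the scraped titles
sample_titles = ["Spacious 2 bedroom apartment!", "Move-in in June for $99!", "Fitness Center"]

print(sample_titles[0])    # first item: Spacious 2 bedroom apartment!
print(sample_titles[2])    # third item: Fitness Center
print(sample_titles[-1])   # negative indexes count from the end: Fitness Center
print(sample_titles[0:2])  # a slice: items 0 and 1 (the end position is excluded)
```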


___
Loops are a great way to go through a BeautifulSoup list. Let's run a 'for' loop to print out each title.

For loops are structured as follows. Indentation matters in Python.

```
for variable in collection:
    do this
```
    
___

```python
for title in titles:
    print(title.get_text())
```

```
Spacious 2 bedroom apartment!  Call for specials today!
Move-in in June for $99!
Fitness Center
Washer/Dryer In Unit and New Appliances Available
Nice single family 2 bedroom house for rent
Affordable Nightly,Weekly, and Extended Stay for a Spacious Studio
Two Sparkling Pools, Two Laundry Care Centers, Courtyard Picnic Areas
Nice single family 2 bedroom house for rent
2 bedroom 1 bathroom
Tenemos apartamentos como te gustan a ti
Pool, New State of the Art Fitness Center, New Community Center
FREE RENT UNTIL  - GREAT SPECIALS
2 bedroom home for rent
2-bedroom duplex
4-bedroom house for rent
Limited Access Community, Playground, Minutes From Quail Springs Mall
SHORTEN YOUR COMMUTE IN THIS DEL CITY STUDIO!
Just Reduced! Now Only $699
Don't Miss Out On Our Amazing 1 MONTH FREE Special!
Spacious closets, Central heat and air, Fire place in some units
2br - 2 Bedroom Townhome with Garage in oklahoma city
Dishwasher, Air Conditioning, New Owners & Management
֍ Oklahoma City Apartment Living, 
```
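The same `for` pattern works on any list. As a small, optional extension, Python's built-in `enumerate()` hands you a running count alongside each item, which is handy for numbering output. A sketch on a hand-made list of strings, so it doesn't touch the scraped `titles`:

```python
sample_titles = ["Fitness Center", "2-bedroom duplex", "4-bedroom house for rent"]

numbered = []
# enumerate() yields (position, item) pairs; start=1 makes the count begin at 1
for number, text in enumerate(sample_titles, start=1):
    line = f"{number}. {text}"
    print(line)
    numbered.append(line)
```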

___
Now, let's pull some other relevant data from the Craigslist page. Below, I have created two new variables to collect pricing and size data.
___ 

```python
prices = soup.findAll(attrs = {'class': 'result-price'})
sizes = soup.findAll(attrs = {'class': 'housing'})
```

___
### Challenge 1
Complete the code below to also collect neighborhood data.
___

```python
neighborhoods = soup.findAll(attrs = {'class': '____-____'})
```

___
### Challenge 2
Now, complete the for loop to print out all the prices. Remember, we want to print out the prices and not the full html element.

*Bonus: Write another loop to print out either the sizes or the neighborhoods.*
___

```python
for price in ______:
    print(____._____())
```

___
Did you notice that the outputs for titles and prices seem to be different lengths? 

There is a function we can use called **len( )** that will return the length of an object like a list. Run the code below to see the different lengths of the two lists.
___

```python
print('Number of titles returned:', len(titles))
print('Number of prices returned:', len(prices))
```

```
Number of titles returned: 120
Number of prices returned: 239
```


___
Why are the two lists drastically different lengths? 

Do the lengths returned give you any clues?

Go back to the original website to try to figure out what is happening!



### Challenge 3
Finish the code below to check the length of neighborhoods and sizes lists.

Why would these be different lengths?

___

```python
print('Number of neighborhoods returned:', len(___________))
print('Number of sizes returned:', ________)
```


___
If we want to match up titles, prices, and neighborhoods, we are going to need a different approach than searching the whole page for each attribute separately. Let's go back to Inspect Element on the Craigslist page to see our options. Remember that web scraping depends on how structured the HTML is. You can't pull out structured data where there isn't any (or at least it is going to be a lot more work)!

Notice the __`<p class = "result-info">`__ tag. Each listing's details sit inside one of these, so it seems like a good place to dive in and extract the information we want systematically. That is, we will be able to match each listing's title, price, neighborhood, and size (if those characteristics are included in the listing).
___
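Why does searching inside each `result-info` element help? A sketch on made-up markup with two listings, only one of which has a neighborhood: searching the whole page gives lists of different lengths, while searching within each listing keeps every field paired with its own title (a missing field simply comes back as `None`). The `demo_` names avoid overwriting the notebook's variables.

```python
from bs4 import BeautifulSoup

# Two hypothetical listings; only the first one has a neighborhood
html = """
<p class="result-info">
  <a class="result-title hdrlnk">Love Where You Live</a>
  <span class="result-price">$599</span>
  <span class="result-hood">(Yukon)</span>
</p>
<p class="result-info">
  <a class="result-title hdrlnk">Fitness Center</a>
  <span class="result-price">$844</span>
</p>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# Page-wide searches: the two lists no longer line up one-to-one
page_titles = demo_soup.find_all(class_="result-title")
page_hoods = demo_soup.find_all(class_="result-hood")
print(len(page_titles), len(page_hoods))  # 2 1

# Searching inside each listing keeps each field paired with its title
pairs = []
for info in demo_soup.find_all("p", class_="result-info"):
    demo_title = info.find(class_="result-title")
    demo_hood = info.find(class_="result-hood")  # None when the listing has no neighborhood
    hood_text = demo_hood.get_text() if demo_hood is not None else None
    pairs.append((demo_title.get_text(), hood_text))
print(pairs)  # [('Love Where You Live', '(Yukon)'), ('Fitness Center', None)]
```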


___
## Diving Deeper! ##
If you have made it this far, you have learned some of the basics of working with BeautifulSoup! Congrats!

This next section will dive deeper and use loops and conditions to extract more structured data that we could use as inputs for other code or tools like Tableau or Excel.



### Challenge 4
Complete the code below to pull out all of the `<p class = "result-info">` elements.
___

```python
#Pulling out all listing data
listings = soup.findAll(attrs = {__________________})
```

___
### Challenge 5
Print out the fifth listing from the list you just created. Hint: python starts counting at zero so the first listing would be item 0.
___

```python
print(_______[_])
```

___
Let's use this fifth listing as a test case for the rest of our code! 
* First, we will find all the attributes we are interested in pulling. 
* Then, we will print out each item.
___

```python
title = listings[4].find(attrs = {"class": "result-title hdrlnk"})
price = listings[4].find(attrs = {"class": "result-price"})
size = listings[4].find(attrs = {"class": "housing"})
neighborhood = listings[4].find(attrs = {"class": "result-hood"})
```

___
### Challenge 6
Print out each of the attributes you just pulled down. Please print out the text only not including the html data like tags and attributes.
___

```python
#Print out each attribute in the fifth listing

print('Title:', _________)
print('Price:', ___________)
print('Size:', _______________)
print('Neighborhood:', ___________)
```

___
Did you get an error message when you ran the above code? If so, what do you think it means?

Whether you get an error depends on what the fifth listing on Craigslist was when you pulled the URL. If you did get one, the error message was most likely **`'NoneType' object has no attribute 'get_text'`**.
 * What do you think that error message means?
 * How can we get around it?

If you didn't get an error message in the code above, you might want to go back to the code before Challenge 6 and try out some different listings until you see the error message for yourself.


We can use what are called **conditional statements** to handle these missing-data cases. Using the **if/else** structure, we first check whether a value is `None`. If it isn't, we print the value that is there; if it is, we print "None" instead.

Run the code for the if/else statement for prices below. 
___

```python
#Checking on price
if price is not None:
    print('Price:', price.get_text())
else:
    print('Price:', 'None')
```

```
Price: $815
```
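The check is needed because `.find()` returns `None` when nothing matches, and `None` has no `get_text()` method. A minimal sketch on a made-up listing with no price; the helper name `text_or_none` is hypothetical, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

# A hypothetical listing that has a title but no price element
demo_listing = BeautifulSoup(
    '<p class="result-info"><a class="result-title hdrlnk">Fitness Center</a></p>',
    "html.parser",
)

missing_price = demo_listing.find(attrs={"class": "result-price"})
print(missing_price)  # None: nothing matched

def text_or_none(element):
    """Hypothetical helper: return the element's text, or the string 'None' if it is missing."""
    if element is not None:
        return element.get_text()
    return "None"

print(text_or_none(missing_price))  # None (the string placeholder, as in the cell above)
print(text_or_none(demo_listing.find(attrs={"class": "result-title hdrlnk"})))  # Fitness Center
```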


___
### Challenge 7
Complete the rest of the code to check the size and neighborhood variables. 

Why didn't we need to check the title variable?
___

```python
#Writing conditional statements to handle missing data. 
print('Title:', title.get_text())

#Checking on price
if price is not None:
    print('Price:', price.get_text())
else:
    print('Price:', 'None')

# Checking on size
if ________________:
    print('Size:', size.get_text())
else: 
    print('Size:', 'None')

#Checking on neighborhood
_______________________
```

```
Title: Apply TODAY Get FEBRUARY FREE + a $500 VISA Gift Card!
Price: $849
Size: 
                    2br -
                    920ft2 -
                
Neighborhood: None
```


___
Now we can pull the data for each listing, even noting missing data, but this doesn't seem very useful by itself, and it certainly isn't much faster than looking at the website and transcribing the data. 

One of the big benefits of programming is that we can automate our processes to save time. This is definitely true with web scraping. Let's use a Python technique we learned above, the **for loop**, to iterate through each listing and pull out the data we want. 

### Challenge 8 
Adapt code we have already written above into a for loop that will print out the data for each listing in the cell below. I've provided comments to guide you. 

Here is the code that should be adapted for your for loop:

```python
title = listings[4].find(attrs = {"class": "result-title hdrlnk"})
price = listings[4].find(attrs = {"class":"result-price"})
size = listings[4].find(attrs = {"class": "housing"})
neighborhood = listings[4].find(attrs = {"class": "result-hood"})
```

And 

```python
print('Title:', title.get_text())
if price is not None:
    print('Price:', price.get_text())
else:
    print('Price:', 'None')
if size is not None:
    print('Size:', size.get_text())
else:
    print('Size:', 'None')
if neighborhood is not None:
    print('Neighborhood:', neighborhood.get_text())
else:
    print('Neighborhood:', 'None')
```
___

```python
for listing in listings:
    #Pull info for each listing
    
    
    
    
    
    #Print out info for each listing

    
    
    
```

___
How long did it take for your code to run? How long would it have taken you to manually collect all that data? 

Now you should be starting to see how web scraping can really save you time. The next step is to get the data into a format that is usable. So instead of printing out each of the values, let's save them to a variable. A list will work well to hold the data until we export it to another format. 

Let's try it out with just the titles. Luckily we can modify code we have already written to achieve this goal. Read over the code and comments below to see if you can figure out what is happening. Run the code to see what happens. 
___

```python
#Start with an empty list to hold the data
titles = []

#Run a for loop to pull all the titles
for listing in listings: 
    title = listing.find(attrs = {"class": "result-title hdrlnk"})
    
    #Append each title to the list.
    titles.append(title.get_text())
```

___
What happened? 

Well, it is hard to know since we didn't have anything set to print to our screen. 

I like to have print statements in my code to show progress as a loop runs, and a final print statement to let me know what happened. 

### Challenge 9
Edit the code below to print out each title as the loop runs. Also, add code to print out the final list of titles. I've added comments to help you place these statements. 

Notice what is inside and outside of the loop. ***Indentation marks inclusion in loops.***
___

```python
#Start with an empty list to hold the data
titles = []

#Run a for loop to pull all the titles
for listing in listings: 
    title = listing.find(attrs = {"class": "result-title hdrlnk"})
    #print out each title

    
    #Append each title to the list.
    titles.append(title.get_text())

#print full list of titles

```

___
Now that we know what our code produces, let's take a step back and take a closer look at it.

* __Start with an empty variable.__ We need to have a place to store our output. We need to put this outside the loop so it doesn't reset every time we run the loop. __[ ]__ is the symbol to indicate an empty list. 
* __The append function lets us add values to a list.__ Append adds the new value to the end of the list. The syntax is __list.append(value)__, where list is an existing list and value is what you want to add. 
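To see why the empty list has to be created outside the loop, compare the two placements in this small sketch (plain strings, no scraping):

```python
words = ["studio", "1br", "2br"]

# List created once, before the loop: it keeps growing
kept = []
for word in words:
    kept.append(word)
print(kept)  # ['studio', '1br', '2br']

# List created inside the loop: it is reset to empty on every pass
for word in words:
    reset = []
    reset.append(word)
print(reset)  # ['2br'], only the last value survives
```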

### Challenge 10
Take the code you have already written (provided with slight modifications in the cell below) and expand it to include the price, size, and neighborhood values. Don't forget to include your conditional statements. 
___

```python
#Start with empty lists to hold the data
titles = []



#Run a for loop to pull all the relevant data
for listing in listings: 
    title = listing.find(attrs = {"class": "result-title hdrlnk"})
    
    
    
    #print out each title to show progress
    print(title.get_text())

    #Append each value to the relevant list
    titles.append(title.get_text())
    
    
    
    
    

#Print a message when the loop has finished
print('Done')
```

___
Now that we have all the data we want in individual lists, we can combine them into one data structure. 

We will create a dataframe to store the data. Dataframes work very well for tabular/flat data. They are also very easy to export as files to use with other programs like Tableau, Excel, or other code you have written in a different language. 

To use dataframes in python, we rely on the pandas library, which we have imported as pd. Run the code below to create your dataframe. 
___

```python
data = pd.DataFrame({'Neighborhood': neighborhoods,
                     'Size': sizes,
                     'Title': titles,
                     'Price': prices
                    })
```
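`pd.DataFrame` lines the lists up by position, so every list needs exactly one entry per listing, which is why missing values need a placeholder rather than being skipped. A short sketch with hand-made lists (the `demo_` names are only for illustration) shows the resulting columns, how `to_csv` produces CSV text for export, and what happens when the lengths don't match:

```python
import pandas as pd

# Hand-made example lists, one entry per listing
demo_titles = ["Love Where You Live", "WE LOVE PETS!!"]
demo_prices = ["$599", "$710"]
demo_hoods = ["(Yukon)", "(Norman)"]

demo_df = pd.DataFrame({"Title": demo_titles, "Price": demo_prices, "Neighborhood": demo_hoods})
print(demo_df)

# Without a file path, to_csv returns the CSV text instead of writing a file
csv_text = demo_df.to_csv(index=False)
print(csv_text)

# Lists of different lengths are rejected with a ValueError
length_error = None
try:
    pd.DataFrame({"Title": demo_titles, "Price": ["$599"]})
except ValueError as err:
    length_error = err
    print("ValueError:", err)
```

To write an actual file you could open in Excel or Tableau, pass a path, for example `data.to_csv('listings.csv', index=False)`; the file name is only an example.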

___
A handy way to check if our data did what we want is to print out the first few lines of our dataframe. We can do this using the **head(num_of_rows)** function. Run the code below to see the first 15 rows of your dataframe.
___


```python
data.head(15)
```

|   | Neighborhood | Price | Size | Title |
|---|---|---|---|---|
| 0 |  | \$1005 | `\n 3br -\n ...` | Dishwasher, Private Patio, Heat |
| 1 |  | \$835 | `\n 2br -\n ...` | 24-Hour Availability, BBQ Area, On-site Mainte... |
| 2 |  | \$849 | `\n 2br -\n ...` | Apply TODAY Get FEBRUARY FREE + a \$500 VISA Gi... |
| 3 | (Yukon) | \$695 | `\n 2br -\n ...` | Our Residents Love Us So Will You!! |
| 4 | (Norman) | \$710 | `\n 2br -\n ...` | WE LOVE PETS!! |
| 5 | (Midwest City) | \$844 |  | PRE-LEASE YOUR 2 BEDROOM FOR APRIL TODAY!! |
| 6 | (Midwest City) | \$844 |  | PRE-LEASE YOUR 2 BEDROOM FOR APRIL TODAY!! |
| 7 | (Midwest City) | \$844 |  | PRE-LEASE YOUR 2 BEDROOM FOR APRIL TODAY!! |
| 8 | (Yukon) | \$599 | `\n 1br -\n ...` | Love Where You Live |
| 9 | (SWOSU) | \$539 | `\n 2br -\n ...` | Computer Lounge, Roommate matching services, H... |


___
Is there anything you don't like from this output?

I don't like what I get in my size category. I have two issues with this column. 
* First, the newline characters (`\n`) at the beginning and end of each value.
    * We can fix this by indexing the character string in the size column. This is similar to what we did with our listings list to pull the fifth item earlier in the lesson. Remember `listings[4]`.
    * Character strings work similarly, except the values in the `[]` refer to individual characters. For example, to start with the second character of a string, use `string[1:]`. To get everything but the last 2 characters, use `string[:-2]`.
* Second, I am bothered by the fact that the size variable has two distinct pieces of information in it. That isn't very tidy data!
    * To fix this, we need to find a pattern in the text. You may have noticed that the number of bedrooms and the square footage are separated by a **-**.
    * Thanks to this handy pattern, we can use a function called **split** to separate out the two distinct pieces of information. The syntax of split is **item_to_be_split.split('separator')**.
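Here is a small sketch of `split()` and slicing on a simplified size string (the real text has many extra spaces, as the Challenge 7 output shows, so the exact slice positions differ). It also shows `strip()`, a built-in string method that removes whitespace, including `\n`, from both ends without counting characters:

```python
# Simplified version of the text inside one "housing" element
size_text = "\n2br -\n920ft2 -\n"

# split() returns a list of the pieces between each '-'
parts = size_text.split("-")
print(parts)  # ['\n2br ', '\n920ft2 ', '\n']

# First index picks a piece from the list; the slice then trims that string:
# [1:] drops the leading '\n', [:-3] drops the trailing 'br '
bedrooms_text = parts[0][1:-3]
print(bedrooms_text)  # 2

# strip() removes surrounding spaces and newlines instead of counting characters
print(parts[1].strip())  # 920ft2
```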
    
I have fixed these issues using the code below. Read through it to see if you can understand each piece. Why did I need to use two indexes after the split? 

After you have a good understanding, run the code. 
___

```python
titles = []
prices = []
sizes = []
neighborhoods = []
bedrooms = []
sqfts = []

for listing in listings:
    #Pull info for each listing
    title = listing.find(attrs = {"class": "result-title hdrlnk"})
    price = listing.find(attrs = {"class": "result-price"})
    size = listing.find(attrs = {"class": "housing"})
    neighborhood = listing.find(attrs = {"class": "result-hood"})
    
    #Print out Title to show progress
    print('Title:', title.get_text())
    
    #Append each value to its list, using 'None' as a placeholder for missing data
    titles.append(title.get_text())
    if price is not None:
        prices.append(price.get_text())
    else: 
        prices.append('None')
    if size is not None:
        bedroom = size.get_text().split('-')[0][1:-3]
        bedrooms.append(bedroom)
        sizes.append(size.get_text())
        
        sqft = size.get_text().split('-')[1][2:-4]
        sqfts.append(sqft)
    else: 
        bedrooms.append('None')
        sizes.append('None')
        sqfts.append('None')
    if neighborhood is not None:
        neighborhoods.append(neighborhood.get_text())    
    else: 
        neighborhoods.append('None')
```

```
Title: Spacious 2 bedroom apartment!  Call for specials today!
Title: Move-in in June for $99!
Title: Fitness Center
Title: Washer/Dryer In Unit and New Appliances Available
Title: Nice single family 2 bedroom house for rent
Title: Affordable Nightly,Weekly, and Extended Stay for a Spacious Studio
Title: Two Sparkling Pools, Two Laundry Care Centers, Courtyard Picnic Areas
Title: Nice single family 2 bedroom house for rent
Title: 2 bedroom 1 bathroom
Title: Tenemos apartamentos como te gustan a ti
Title: Pool, New State of the Art Fitness Center, New Community Center
Title: FREE RENT UNTIL  - GREAT SPECIALS
Title: 2 bedroom home for rent
Title: 2-bedroom duplex
Title: 4-bedroom house for rent
Title: Limited Access Community, Playground, Minutes From Quail Springs Mall
Title: SHORTEN YOUR COMMUTE IN THIS DEL CITY STUDIO!
Title: Just Reduced! Now Only $699
Title: Don't Miss Out On Our Amazing 1 MONTH FREE Special!
Title: Spacious closets, Central heat and air, Fire place in some units
Ti
```
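Slicing by position assumes every housing string looks exactly like `2br - 920ft2`. If a listing gives only one of the two values, the positions shift. Here is a sketch of a more forgiving alternative using Python's built-in `re` (regular expression) module. The function name `parse_housing` is hypothetical, and the patterns assume the `<number>br` and `<number>ft2` formats seen above:

```python
import re

def parse_housing(text):
    """Hypothetical helper: pull bedrooms and square feet out of a housing string.

    Returns a (bedrooms, sqft) tuple of strings, using 'None' for any value
    that is missing, to match the placeholder used in the loop above.
    """
    bedrooms_match = re.search(r"(\d+)br", text)  # digits followed by "br"
    sqft_match = re.search(r"(\d+)ft2", text)     # digits followed by "ft2"
    bedrooms_value = bedrooms_match.group(1) if bedrooms_match else "None"
    sqft_value = sqft_match.group(1) if sqft_match else "None"
    return bedrooms_value, sqft_value

print(parse_housing("\n    2br -\n    920ft2 -\n  "))  # ('2', '920')
print(parse_housing("\n    1br -\n  "))                # ('1', 'None')
print(parse_housing("\n    660ft2 -\n  "))             # ('None', '660')
```

Each value still comes back as text; converting to numbers, for example with `pd.to_numeric(..., errors='coerce')` (which turns `'None'` into `NaN`), would be a separate step.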

___

### Challenge 11
Now, transform these lists (including the two new ones) into a dataframe. 
___

```python
# Create your dataframe below. 
data =
```

___
Now, run the code below to print out your new dataframe. 
___

```python
data.head(15)
```

|   | Bedrooms | Neighborhood | Price | SqFt | Title |
|---|---|---|---|---|---|
| 0 | 3.0 |  | \$1005 | 1062.0 | Dishwasher, Private Patio, Heat |
| 1 | 2.0 |  | \$835 | 822.0 | 24-Hour Availability, BBQ Area, On-site Mainte... |
| 2 | 2.0 |  | \$849 | 920.0 | Apply TODAY Get FEBRUARY FREE + a \$500 VISA Gi... |
| 3 | 2.0 | (Yukon) | \$695 | 920.0 | Our Residents Love Us So Will You!! |
| 4 | 2.0 | (Norman) | \$710 | 876.0 | WE LOVE PETS!! |
| 5 |  | (Midwest City) | \$844 |  | PRE-LEASE YOUR 2 BEDROOM FOR APRIL TODAY!! |
| 6 |  | (Midwest City) | \$844 |  | PRE-LEASE YOUR 2 BEDROOM FOR APRIL TODAY!! |
| 7 |  | (Midwest City) | \$844 |  | PRE-LEASE YOUR 2 BEDROOM FOR APRIL TODAY!! |
| 8 | 1.0 | (Yukon) | \$599 | 660.0 | Love Where You Live |
| 9 | 2.0 | (SWOSU) | \$539 | 700.0 | Computer Lounge, Roommate matching services, H... |


That concludes the introduction to web scraping workshop. I hope it was helpful! Take a few moments to think about what you learned and how you could apply it to your own project. What are the next steps? Are there other concepts you still need to know? [Get in touch if you need additional guidance](https://libraries.ou.edu/content/digital-scholarship-laboratory). 