### How to Run
1. Click "Run All Cells"

### Step by Step Algorithm
1. import `requests`
2. import `pandas`
3. import `BeautifulSoup` from the `bs4` package
4. send a GET request to a specific Stack Overflow question page, for example: <br>
`'https://stackoverflow.com/questions/419163/what-does-if-name-main-do'` <br>
5. parse the response text with BeautifulSoup using Python's built-in `'html.parser'`
6. find all `div` elements whose `class` attribute is `'s-prose js-post-body'` and save them in a list called `posts`
7. the first post is the question itself
8. the second post is the highest-scored answer (under Stack Overflow's default sort order)
9. the remaining posts are the other answers, in decreasing order of score
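Steps 6 to 9 can be tried offline on a small hand-written page. The HTML below is a hypothetical, heavily simplified stand-in for a real Stack Overflow page (only the `s-prose js-post-body` class comes from this notebook), and the `role`/`rank` column names are illustrative choices, not anything Stack Overflow provides.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page: one question followed by two answers,
# already in the order the page would show them (highest score first).
html = """
<div class="s-prose js-post-body"><p>What does the guard do?</p></div>
<div class="s-prose js-post-body"><p>Top answer.</p></div>
<div class="s-prose js-post-body"><p>Second answer.</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
posts = soup.find_all("div", {"class": "s-prose js-post-body"})

# Step 7: the first post is the question; steps 8-9: the rest are answers in page order
question, answers = posts[0], posts[1:]

# Tabulate the posts; rank 0 marks the question, 1 the top answer, and so on
df = pd.DataFrame({
    "role": ["question"] + ["answer"] * len(answers),
    "rank": range(len(posts)),
    "text": [post.get_text(strip=True) for post in posts],
})
print(df)
```

Keeping the page order as an explicit `rank` column means the ordering survives later filtering or sorting of the table.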

### Next Steps
1. The post contents still contain HTML tags, so they need cleaning and preprocessing before further data wrangling can be carried out.
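One possible first cleaning pass, sketched under the assumption that plain text is the goal: BeautifulSoup's `get_text()` drops the tags, and code samples can be pulled out of `<pre>` blocks first so they stay separate from the prose. The helper name `clean_post` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

def clean_post(post):
    """Split one post body into (prose, code_blocks) with all HTML tags removed.

    Note: this mutates `post`, because each <pre> block is extracted from the tree.
    """
    # Remove each <pre> block from the tree so its code is not mixed into the prose
    code_blocks = [pre.extract().get_text() for pre in post.find_all("pre")]
    # Join the remaining text nodes with single spaces, trimming each one
    prose = post.get_text(" ", strip=True)
    return prose, code_blocks

# Hypothetical post body shaped like the question scraped below
html = (
    '<div class="s-prose js-post-body">'
    '<p>What does the <code>if __name__ == "__main__":</code> do?</p>'
    '<pre><code>print("hi")\n</code></pre>'
    '</div>'
)
post = BeautifulSoup(html, "html.parser").div
prose, code = clean_post(post)
print(prose)
print(code)
```

Applied to the scraped data, this would be called once per element of `posts`; work on a copy first if the original tags are still needed.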

In [1]:
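import pandas as pd  # listed in step 2; not used in the cells shown here
import requests
from bs4 import BeautifulSoup

url = 'https://stackoverflow.com/questions/419163/what-does-if-name-main-do'

# Fetch the question page; raise an error on a non-2xx status instead of parsing an error page
response = requests.get(url=url, timeout=30)
response.raise_for_status()

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(response.text, "html.parser")

# Every post body (question and answers) is a div with this exact class string
posts = soup.find_all('div', {'class': 's-prose js-post-body'})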
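Matching `{'class': 's-prose js-post-body'}` compares against the exact class string, so it would miss a post whose classes appear in a different order or include an extra class. A CSS selector via `soup.select` checks each class independently. A minimal offline comparison, using hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div lists the same two classes in another order
html = """
<div class="s-prose js-post-body">question</div>
<div class="js-post-body s-prose">answer</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the class attribute: order-sensitive
exact = soup.find_all("div", {"class": "s-prose js-post-body"})

# CSS selector: requires both classes, in any order, extra classes allowed
robust = soup.select("div.s-prose.js-post-body")

print(len(exact), len(robust))
```

On the live page both forms should return the same posts as long as Stack Overflow keeps the class string exactly as it is today; the selector is simply less brittle if that markup changes.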

In [2]:
# posts[0] is the question; .contents lists its child nodes (tags and whitespace strings)
question = posts[0].contents
question

['\n',
 <p>Given the following code, what does the <code>if __name__ == "__main__":</code> do?</p>,
 '\n',
 <pre><code># Threading example
 import time, thread
 
 def myfunction(string, sleeptime, lock, *args):
     while True:
         lock.acquire()
         time.sleep(sleeptime)
         lock.release()
         time.sleep(sleeptime)
 
 if __name__ == "__main__":
     lock = thread.allocate_lock()
     thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
     thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))
 </code></pre>,
 '\n']

In [3]:
# posts[1] is the highest-scored answer (step 8 above)
popular_answer = posts[1].contents
popular_answer

['\n',
 <h1>Short Answer</h1>,
 '\n',
 <p>It's boilerplate code that protects users from accidentally invoking the script when they didn't intend to. Here are some common problems when the guard is omitted from a script:</p>,
 '\n',
 <ul>
 <li><p>If you import the guardless script in another script (e.g. <code>import my_script_without_a_name_eq_main_guard</code>), then the second script will trigger the first to run <em>at import time</em> and <em>using the second script's command line arguments</em>. This is almost always a mistake.</p>
 </li>
 <li><p>If you have a custom class in the guardless script and save it to a pickle file, then unpickling it in another script will trigger an import of the guardless script, with the same problems outlined in the previous bullet.</p>
 </li>
 </ul>,
 '\n',
 <h1>Long Answer</h1>,
 '\n',
 <p>To better understand why and how this matters, we need to take a step back to understand how Python initializes scripts and how this interacts with its module 

In [4]:
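# Collect each post's raw contents for later cleaning, printing them as we go
text = []
for post in posts:
    text.append(post.contents)
    print(post.contents)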

['\n', <p>Given the following code, what does the <code>if __name__ == "__main__":</code> do?</p>, '\n', <pre><code># Threading example
import time, thread

def myfunction(string, sleeptime, lock, *args):
    while True:
        lock.acquire()
        time.sleep(sleeptime)
        lock.release()
        time.sleep(sleeptime)

if __name__ == "__main__":
    lock = thread.allocate_lock()
    thread.start_new_thread(myfunction, ("Thread #: 1", 2, lock))
    thread.start_new_thread(myfunction, ("Thread #: 2", 2, lock))
</code></pre>, '\n']
['\n', <h1>Short Answer</h1>, '\n', <p>It's boilerplate code that protects users from accidentally invoking the script when they didn't intend to. Here are some common problems when the guard is omitted from a script:</p>, '\n', <ul>
<li><p>If you import the guardless script in another script (e.g. <code>import my_script_without_a_name_eq_main_guard</code>), then the second script will trigger the first to run <em>at import time</em> and <em>using the se