# [8 of the best spring break party destinations](https://abcnews.go.com/GMA/Travel/best-spring-break-party-destinations/story?id=52904679)

### Web Scraping

##### Imports

In [1]:
from bs4 import BeautifulSoup
import requests

##### URLs

In [2]:
url = "https://abcnews.go.com/GMA/Travel/best-spring-break-party-destinations/story?id=52904679"

##### Requesting the Page

In [3]:
r = requests.get(url)

##### Get Text

In [4]:
t = r.text

##### Close the request

In [5]:
r.close()

##### Make BeautifulSoup

In [6]:
# name the parser explicitly; otherwise bs4 picks one for you and emits a warning
soup = BeautifulSoup(t, 'html.parser')

##### What is Soup?

In [7]:
type(t)

str

In [8]:
str?

In [9]:
type(soup)

bs4.BeautifulSoup

In [10]:
BeautifulSoup?

##### Soup can `find` things

In [11]:
# find all hyperlinks
# 'a' stands for anchor tag, href stands for Hypertext REFerence
for link in soup.find_all(name='a', href=True):
    print(link['href'])

#
/
https://abcnews.go.com/Video
https://abcnews.go.com/US
https://abcnews.go.com/International
https://abcnews.go.com/Politics
https://abcnews.go.com/Lifestyle
https://abcnews.go.com/Entertainment
https://abcnews.go.com/VR
https://abcnews.go.com/Health
https://abcnews.go.com/Technology
https://abcnews.go.com/Sports
https://abcnews.go.com/alerts/weather
https://fivethirtyeight.com
#
https://www.goodmorningamerica.com
https://abcnews.go.com/WN
https://abcnews.go.com/Nightline
https://abcnews.go.com/2020
https://abcnews.go.com/ThisWeek
https://abcnews.go.com/TheView
https://abcnews.go.com/WhatWouldYouDo
#
https://abcnews.go.com/live/video/special-live-11
https://abcnews.go.com/live/video/special-live-2
https://abcnews.go.com/live/video/special-live-09
https://abcnews.go.com/live/video/special-live-07
https://abcnews.go.com/live/video/special-live-5
https://abcnews.go.com/live/video/special-live-06
#
http://disneyprivacycenter.com
https://disneyprivacycenter.com/notice-to-california-resid
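The output above mixes absolute URLs with `#` placeholders and a relative `/`. A minimal sketch of one way to tidy such a list, using only the standard library; the helper name `absolute_links` and the sample list are assumptions for illustration, not part of the original notebook:

```python
from urllib.parse import urljoin

# Base URL of the scraped page (same value as `url` above).
BASE_URL = "https://abcnews.go.com/GMA/Travel/best-spring-break-party-destinations/story?id=52904679"


def absolute_links(hrefs, base_url=BASE_URL):
    """Drop '#' fragment links and resolve relative hrefs against base_url."""
    cleaned = []
    for href in hrefs:
        if href.startswith("#"):
            # in-page fragment or placeholder link, not a separate page
            continue
        cleaned.append(urljoin(base_url, href))
    return cleaned


# A few hrefs copied from the output above.
sample = ["#", "/", "https://abcnews.go.com/Video", "https://fivethirtyeight.com"]
print(absolute_links(sample))
```

With the real page, the input could be built as `[link['href'] for link in soup.find_all(name='a', href=True)]`.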

# Get the List of Vacation Spots

In [12]:
# soup.find returns the first matching tag in the document, or None if nothing matches (see docs)
soup.find(name='strong')

<strong>1. Cancun</strong>

In [13]:
# store the result
result = soup.find(name='strong')

# display the type of the `result`
type(result)

bs4.element.Tag

In [14]:
# access the text in `result`
result.text

'1. Cancun'

In [15]:
# find all locations in the html
soup.find_all(name='strong')

[<strong>1. Cancun</strong>,
 <strong>2. Las Vegas</strong>,
 <strong>3. Jamaica</strong>,
 <strong>4. Miami</strong>,
 <strong>5. Dominican Republic</strong>,
 <strong>6. South Padre Island</strong>,
 <strong>7. Puerto Vallarta</strong>,
 <strong>8. Bahamas</strong>]

In [16]:
# store all results
results = soup.find_all(name='strong')

In [17]:
# display the type of `results`
type(results)

bs4.element.ResultSet
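A `ResultSet` can be treated much like a regular list. The sketch below illustrates this on a small stand-in HTML string (an assumption for illustration, not the real article markup), and also shows that `find` returns `None` when nothing matches:

```python
from bs4 import BeautifulSoup

# Small stand-in for the article's HTML (assumed for illustration).
html = "<p><strong>1. Cancun</strong></p><p><strong>2. Las Vegas</strong></p>"
demo_soup = BeautifulSoup(html, "html.parser")

demo_results = demo_soup.find_all(name="strong")
print(len(demo_results))               # number of matching tags
print(demo_results[0].text)            # index into it like a list
print(isinstance(demo_results, list))  # ResultSet is a list subclass

# `find` returns None when nothing matches, so check before using `.text`
missing = demo_soup.find(name="em")
print(missing is None)
```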

In [18]:
# print the text for each result in `results`
for result in results:
    print(result.text)

1. Cancun
2. Las Vegas
3. Jamaica
4. Miami
5. Dominican Republic
6. South Padre Island
7. Puerto Vallarta
8. Bahamas


In [19]:
# make a list of each result's text
# define result_texts as an empty list
result_texts = []
# loop through the results...
for result in results:
    # and append them to the result_texts list
    result_texts.append(result.text)

# show result_texts
result_texts

['1. Cancun',
 '2. Las Vegas',
 '3. Jamaica',
 '4. Miami',
 '5. Dominican Republic',
 '6. South Padre Island',
 '7. Puerto Vallarta',
 '8. Bahamas']

In [20]:
# print the location names without the index
for location in result_texts:
    print(location[3:])

Cancun
Las Vegas
Jamaica
Miami
Dominican Republic
South Padre Island
Puerto Vallarta
Bahamas


In [21]:
# problem: what if the list had 10 or more locations?
# a two-digit index such as '10.' is one character longer, so the [3:] slice above would leave a leading space in front of the name.
# use `split` to separate the index from the name instead
for location in result_texts:
    print(location.split()[1:])

['Cancun']
['Las', 'Vegas']
['Jamaica']
['Miami']
['Dominican', 'Republic']
['South', 'Padre', 'Island']
['Puerto', 'Vallarta']
['Bahamas']


In [22]:
# store the location names without indices in a new list
no_index = []
for location in result_texts:
    no_index.append(location.split()[1:])

# show `no_index`
no_index

[['Cancun'],
 ['Las', 'Vegas'],
 ['Jamaica'],
 ['Miami'],
 ['Dominican', 'Republic'],
 ['South', 'Padre', 'Island'],
 ['Puerto', 'Vallarta'],
 ['Bahamas']]

In [23]:
# `no_index` is a list of lists
# want to join the strings in each list together to get a list of strings
for l in no_index:
    print(''.join(l))

Cancun
LasVegas
Jamaica
Miami
DominicanRepublic
SouthPadreIsland
PuertoVallarta
Bahamas


In [24]:
# it worked, but the join didn't put a space between the words
# try it again, but include a space
for l in no_index:
    print(' '.join(l))

Cancun
Las Vegas
Jamaica
Miami
Dominican Republic
South Padre Island
Puerto Vallarta
Bahamas
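The split-then-join steps can also be collapsed into a single split. This is a minimal sketch, assuming every entry looks like `'<index>. <name>'`; the helper name `strip_index` is hypothetical:

```python
def strip_index(text):
    """Remove a leading 'N.' style index by splitting once on whitespace."""
    # maxsplit=1 keeps everything after the first whitespace run as one string
    parts = text.split(maxsplit=1)
    # fall back to the original text if there was nothing to split off
    return parts[1] if len(parts) == 2 else text


print(strip_index("2. Las Vegas"))
print(strip_index("6. South Padre Island"))
```

Because the split happens only once, the name comes back as a single string and no `join` is needed, regardless of how many digits the index has.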


In [25]:
# much better
# let's save these locations in another list
locations = []
for l in no_index:
    locations.append(' '.join(l))

# show locations
locations

['Cancun',
 'Las Vegas',
 'Jamaica',
 'Miami',
 'Dominican Republic',
 'South Padre Island',
 'Puerto Vallarta',
 'Bahamas']

In [26]:
# check the `len`gth of locations (should == 8)
len(locations)

8

In [27]:
# that seemed like a lot of work just to extract the names from our results...
# well here's a one-liner
[' '.join(i.text.split()[1:]) for i in soup.find_all(name='strong')]

['Cancun',
 'Las Vegas',
 'Jamaica',
 'Miami',
 'Dominican Republic',
 'South Padre Island',
 'Puerto Vallarta',
 'Bahamas']

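This bit of code is called a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions). It's very handy and is often somewhat faster than an equivalent for-loop that calls `append`, although the gain is usually modest and matters most when the number of items is large.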

In [28]:
%%timeit -n 1000
result_texts = []
for i in soup.find_all(name='strong'):
    result_texts.append(' '.join(i.text.split()[1:]))

1.53 ms ± 512 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [29]:
%%timeit -n 1000
[' '.join(i.text.split()[1:]) for i in soup.find_all(name='strong')]

1.2 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As a precautionary side note, don't use a list comprehension if you don't understand it or if the loop is too complex. The results can be surprising, especially when the comprehension contains multiple loops.
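The point about multiple loops is easiest to see side by side. In this small sketch (the `words` list is made up for illustration), the `for` clauses of the comprehension appear in the same order as the nested loops, which is the part that is easy to get backwards:

```python
words = [["Las", "Vegas"], ["Puerto", "Vallarta"]]

# nested for-loops: outer loop first, inner loop second
flat_loop = []
for name_parts in words:
    for part in name_parts:
        flat_loop.append(part)

# equivalent comprehension: the `for` clauses keep the same outer-to-inner order
flat_comp = [part for name_parts in words for part in name_parts]

print(flat_loop == flat_comp)
```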

# Get Information on the Locations

### Wikipedia

In [30]:
# wikipedia url
wiki_url = "https://en.wikipedia.org/wiki/"

In [31]:
# loop through the locations
for l in locations:
    # add the wiki url to the location name and replace spaces with underscores
    print(f"{wiki_url}{l.replace(' ', '_')}")

https://en.wikipedia.org/wiki/Cancun
https://en.wikipedia.org/wiki/Las_Vegas
https://en.wikipedia.org/wiki/Jamaica
https://en.wikipedia.org/wiki/Miami
https://en.wikipedia.org/wiki/Dominican_Republic
https://en.wikipedia.org/wiki/South_Padre_Island
https://en.wikipedia.org/wiki/Puerto_Vallarta
https://en.wikipedia.org/wiki/Bahamas
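Replacing spaces with underscores is enough for these eight names. If a name contained accented or other special characters, percent-encoding would be a safer choice. A minimal sketch using the standard library's `urllib.parse.quote`; the helper name `wiki_page_url` is hypothetical:

```python
from urllib.parse import quote

WIKI_URL = "https://en.wikipedia.org/wiki/"


def wiki_page_url(name, base=WIKI_URL):
    """Build a Wikipedia URL: spaces become underscores, other unsafe characters are percent-encoded."""
    return base + quote(name.replace(" ", "_"))


print(wiki_page_url("South Padre Island"))
print(wiki_page_url("Cancún"))
```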


In [32]:
from time import sleep

In [33]:
# scrape each page and store in a dictionary
# define empty dictionary
location_wiki = {}
# loop through the locations...
for l in locations:
    # request the page from wikipedia
    r = requests.get(f"{wiki_url}{l.replace(' ', '_')}")
    # get the soup
    soup = BeautifulSoup(r.text, 'html.parser')
    # close the request
    r.close()
    # store the soup in the dictionary
    location_wiki[l] = soup
    # wait 2 seconds before the next request (a courtesy to Wikipedia's servers that also lowers the chance of getting blocked)
    sleep(2)
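The loop above stops at the first request that raises an error. One possible way to make it more forgiving is sketched below. The function `fetch_all` and its `fetch` and `sleep` parameters are assumptions for illustration; `fetch` is any callable that takes a URL and returns the page text, so the sketch can run offline with a stand-in:

```python
import time


def fetch_all(urls_by_name, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each entry, pausing between requests; failures are stored as None."""
    pages = {}
    for position, (name, page_url) in enumerate(urls_by_name.items()):
        if position > 0:
            # pause only between requests, not before the first one
            sleep(delay_seconds)
        try:
            pages[name] = fetch(page_url)
        except Exception as error:
            # broad catch keeps the sketch short; narrow it for real use
            print(f"skipping {name}: {error}")
            pages[name] = None
    return pages


# stand-in fetcher so the sketch runs without network access
demo = fetch_all(
    {"Miami": "https://en.wikipedia.org/wiki/Miami"},
    fetch=lambda page_url: f"<html>{page_url}</html>",
    delay_seconds=0,
)
print(demo["Miami"])
```

With `requests`, the fetcher could be something like `lambda page_url: requests.get(page_url, timeout=10).text`, and each returned string could then be passed to `BeautifulSoup` as in the cell above.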

In [38]:
for k,v in location_wiki.items():
    print(k)
    for p in v.find_all(name='p')[:5]:
        print(p.text)
    print('---------------------------------------------------------------------------------------------------------------')

Cancun


Cancún (/kænˈkuːn/ or /kɑːn-/;[2] Spanish pronunciation: [kaŋˈkun]) is a city in southeast Mexico on the northeast coast of the Yucatán Peninsula in the Mexican state of Quintana Roo. It is a significant tourist destination in Mexico[3] and the seat of the municipality of Benito Juárez. The city is on the Caribbean Sea and is one of Mexico's easternmost points. 

Cancún is just north of Mexico's Caribbean coast resort band known as the Riviera Maya. In older English-language documents, the city’s name is sometimes spelled "Cancoon," an attempt to convey the sound of the name.[4]

There are two possible translations of Cancún, based on the Mayan pronunciation kaan kun. The first translation is "nest of snakes". The second version and less accepted is "place of the gold snake".[5]

The shield of the municipality of Benito Juárez, which represents the city of Cancún, was designed by the Mexican-American artist Joe Vera.[6]
It is divided into three parts: the color blue symbolizes