Lab | Web Scraping Multiple Pages

Business goal:

Check the case-study-gnod.md file.
Make sure you've understood the big picture of your project:
- the goal of the company (Gnod),
- their current product (Gnoosic),
- their strategy, and
- how your project fits into this context.
Re-read the business case and the e-mail from the CTO, take a look at the flowchart, and create an initial Trello board with the tasks you think you'll have to accomplish.

Instructions

Prioritize the MVP

In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to go back there.

Expand the project

If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs
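Scraping several lists into one pool usually comes down to looping over the pages and then removing duplicates. The following is a minimal sketch of that idea, not a reference solution. The names `scrape_pages` and `merge_song_lists` are hypothetical, and the `fetch` and `parse` callables are assumptions you would supply yourself (for example, a function that downloads a page and one that extracts `(title, artist)` pairs from its HTML).

```python
import time

def scrape_pages(urls, fetch, parse, delay=1.0):
    """Fetch each URL, parse it into songs, and return one list per page.

    `fetch(url)` must return the page content and `parse(content)` must
    return a list of (title, artist) tuples. Both are supplied by the caller.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause between requests to be polite to the server
        pages.append(parse(fetch(url)))
    return pages

def merge_song_lists(song_lists):
    """Combine several song lists, dropping case-insensitive duplicates."""
    seen = set()
    merged = []
    for songs in song_lists:
        for title, artist in songs:
            key = (title.strip().lower(), artist.strip().lower())
            if key not in seen:
                seen.add(key)
                merged.append((title.strip(), artist.strip()))
    return merged

# Demo with made-up in-memory "pages" instead of real HTTP requests.
fake_pages = {
    "page-a": "Song One|Artist A\nSong Two|Artist B",
    "page-b": "song one|artist a\nSong Three|Artist C",
}
demo_parse = lambda text: [tuple(line.split("|")) for line in text.splitlines()]
pool = merge_song_lists(scrape_pages(fake_pages, fake_pages.get, demo_parse, delay=0))
print(pool)
```

Passing `fetch` and `parse` in as arguments keeps the loop independent of any particular site, so the same code can be reused for each new list you decide to scrape.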

Practice web scraping

As you've seen, scraping the internet is a skill that can get you all sorts of information. Here are some little challenges that you can try to gain more experience in the field:

- Retrieve the Wikipedia page for "Python" and create a list of the links on that page: url = 'https://en.wikipedia.org/wiki/Python'
- Find the number of titles that have changed in the United States Code since its last release point: url = 'http://uscode.house.gov/download/download.shtml'
- Create a Python list with the names on the FBI's Ten Most Wanted list: url = 'https://www.fbi.gov/wanted/topten'
- Display information on the 20 latest earthquakes reported by the EMSC (date, time, latitude, longitude, and region name) as a pandas DataFrame: url = 'https://www.emsc-csem.org/Earthquake/'
- List all language names and the number of related articles, in the order they appear on wikipedia.org: url = 'https://www.wikipedia.org/'
- Create a list of the different kinds of datasets available on data.gov.uk: url = 'https://data.gov.uk/'
- Display the top 10 languages by number of native speakers in a pandas DataFrame: url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'
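As a starting point for the first challenge, here is one possible sketch of collecting the links from a page. It uses only the Python standard library so that it runs without extra installs; a parsing library such as BeautifulSoup would work equally well. The names `LinkCollector`, `extract_links`, and `fetch_html`, as well as the User-Agent string, are hypothetical choices made for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and in-page fragment links such as "#top".
        if href and not href.startswith("#"):
            self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    """Return all absolute link URLs found in the given HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

def fetch_html(url):
    """Download a page and return its body as text (requires network access)."""
    request = Request(url, headers={"User-Agent": "scraping-lab-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

# Offline demo on a small hand-written HTML snippet.
sample = '<p><a href="/wiki/Monty_Python">Monty Python</a> <a href="#top">Top</a></p>'
print(extract_links(sample, "https://en.wikipedia.org/wiki/Python"))
```

To run it against the real page, something like `extract_links(fetch_html(url), url)` should work, assuming the site allows the request. Keeping the parsing separate from the download makes it easy to test the extraction on saved HTML first.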
