A small web scraping and RegEx exercise to mine Japanese hardware and software video game sales.

ordovas/vg_jp_sales

Japanese Video Game Sales - A Web scraping & RegEx exercise in Python

I present here a small, fun exercise with a high dose of nerdiness, done to practice web scraping techniques and regular expressions. The objective: to build an interesting database of weekly hardware sales and the weekly top 30 software sales for video games in Japan.

Here is the premise. Each week, the user Chris1964 on the ResetEra forums collects and posts the sales figures that appear in Famitsu magazine (previously Media Create as well, but they stopped publishing these numbers). The publication lists the weekly top 30 video game sales and the weekly console sales. All the weekly posts are listed in another post.

These posts all follow the same style, which makes them a nice web scraping & RegEx exercise.
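Because every post follows one style, a single regular expression can turn each ranking line into a record. The sketch below is only an illustration: the line layout (`rank./last rank. [platform] title (publisher) {release date} - weekly / total`), the sample values, and the names `LINE_RE` and `parse_software_line` are assumptions made for this example, not the exact format of the forum posts or the code in the notebooks.

```python
import re

# Hypothetical line layout, assumed for illustration only:
#   01./00. [NSW] Example Game (Example Publisher) {2020.03.20} - 123.456 / NEW
LINE_RE = re.compile(
    r"""
    ^(?P<rank>\d+)\./(?P<last_rank>\d+)\.\s+   # rank this week / rank last week
    \[(?P<platform>[^\]]+)\]\s+                # platform tag in square brackets
    (?P<title>.+?)\s+                          # game title (lazy match)
    \((?P<publisher>[^)]+)\)\s+                # publisher in parentheses
    \{(?P<release>\d{4}\.\d{2}\.\d{2})\}\s+    # release date in curly braces
    -\s+(?P<weekly>[\d.]+)\s+/\s+              # weekly sales, dots as thousands separators
    (?P<total>[\d.]+|NEW)\s*$                  # lifetime sales, or NEW for a debut
    """,
    re.VERBOSE,
)


def parse_software_line(line):
    """Return a dict for one ranking line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["rank"] = int(row["rank"])
    row["last_rank"] = int(row["last_rank"])
    row["weekly"] = int(row["weekly"].replace(".", ""))
    # For a debut week the lifetime total equals the weekly figure.
    if row["total"] == "NEW":
        row["total"] = row["weekly"]
    else:
        row["total"] = int(row["total"].replace(".", ""))
    return row


# Made-up sample line, not real sales data.
sample = "01./00. [NSW] Example Game (Example Publisher) {2020.03.20} - 123.456 / NEW"
print(parse_software_line(sample))
```

Returning `None` for lines that do not match keeps headers, comments, and blank lines in a post from breaking the loop that collects the records.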

In this repo I show all the steps to create functions that run the data mining over all the weekly posts using BeautifulSoup and RegEx. After these steps you will have weekly software and hardware datasets to play with. I divided the exercise into two notebooks: one to obtain the software data and another to mine the hardware data.
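The overall flow described above (read the index post, follow each weekly link, pull the text out of each post) might look roughly like the following sketch. It is not the notebooks' code: the function names, the `/threads/` link filter, and the `fetch` callable (any function that returns the HTML for a URL, for example a thin wrapper around an HTTP client) are assumptions chosen for illustration.

```python
import re

from bs4 import BeautifulSoup


def weekly_post_links(index_html, pattern=r"/threads/"):
    """Collect unique hrefs from the index post whose URL matches `pattern`.

    The default pattern is a guess at how thread URLs look; adjust as needed.
    """
    soup = BeautifulSoup(index_html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if re.search(pattern, href) and href not in links:
            links.append(href)
    return links


def post_lines(post_html):
    """Return the non-empty text lines of a post, one per visual line."""
    soup = BeautifulSoup(post_html, "html.parser")
    text = soup.get_text("\n")
    return [line.strip() for line in text.splitlines() if line.strip()]


def scrape_all(index_html, fetch):
    """Map every weekly post URL to its text lines.

    `fetch` is supplied by the caller and must return HTML for a given URL.
    """
    return {url: post_lines(fetch(url)) for url in weekly_post_links(index_html)}


# Tiny offline demo with made-up HTML instead of a real forum page.
index = (
    '<div><a href="https://example.com/threads/week-1">Week 1</a>'
    '<a href="https://example.com/about">About</a>'
    '<a href="https://example.com/threads/week-2">Week 2</a></div>'
)
fake_pages = {
    "https://example.com/threads/week-1": "<div>first line<br>second line</div>",
    "https://example.com/threads/week-2": "<div>only line</div>",
}
print(scrape_all(index, fake_pages.get))
```

Passing `fetch` in as a parameter keeps the parsing functions testable offline; the lines they return can then be fed to the regular expressions for the software or hardware tables.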
