kaggle dataset: 25k+ matches, players & teams attributes for European Professional Football

knbknb/football-data-collection

 
 


knbknb 20171202:

How to use:

See my instructions

Collecting football data

Welcome!

This is an open source project that aims to provide tools for collecting and formatting large sets of data about football matches and players. The project is essentially a crawler written in Python and relies on two sources.

Using Scrapy

To make crawling easier, I use an open source Python library called Scrapy. Have a look at the tutorials on their webpage if you're not already familiar with the library.

Collection process

  • 1: collect the match stats and team lineups using the Match Crawler
  • 2: build a list of unique player names
  • 3: loop over this list with the Player Crawler. Build a list of the players that were not crawled successfully and run step 3 again on that list, adjusting the crawling parameters. Repeat until you've got all the players you need.
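The retry loop in steps 2 and 3 can be sketched roughly as follows. This is only an illustration, not the project's actual code: `unique_player_names`, `crawl_until_done`, the `"lineup"` key, the `fuzzy` parameter and the `fake_crawl` stand-in for the Player Crawler are all hypothetical names chosen for the example.

```python
def unique_player_names(matches):
    """Step 2: collect distinct player names, keeping first-seen order."""
    seen = {}
    for match in matches:
        # "lineup" is an assumed key; adapt it to the Match Crawler output.
        for name in match.get("lineup", []):
            cleaned = name.strip()
            if cleaned:
                seen.setdefault(cleaned, None)
    return list(seen)


def crawl_until_done(names, crawl_player, param_sets):
    """Step 3: try each parameter set in turn, retrying only the failures."""
    found = {}
    remaining = list(names)
    for params in param_sets:
        if not remaining:
            break
        failed = []
        for name in remaining:
            result = crawl_player(name, params)
            if result is None:
                failed.append(name)
            else:
                found[name] = result
        remaining = failed
    return found, remaining


# Stand-in for the Player Crawler (hypothetical behaviour for the demo):
# "Player A" is found right away, "Player B" only with fuzzy matching,
# and "Player C" is never found.
def fake_crawl(name, params):
    if name == "Player C":
        return None
    if name == "Player A" or params["fuzzy"]:
        return {"name": name}
    return None


matches = [
    {"lineup": ["Player A", "Player B"]},
    {"lineup": ["Player B ", "Player C"]},
]
names = unique_player_names(matches)
found, missing = crawl_until_done(
    names, fake_crawl, [{"fuzzy": False}, {"fuzzy": True}]
)
```

Anything left in `missing` after the last parameter set is a candidate for a manual lookup or for the search engine approach described in the next section.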

Using Search Engines

Sometimes a player name is rather complicated or not consistent across different sources. To help identify a player, the algorithm can be parameterized to make use of search engines. Google is a prime choice thanks to its large database and its tolerance of misspelled player names. Unfortunately, the Google API has a limited usage rate per day. Hence I suggest you use Yahoo or Bing first and only use Google for those players you struggle to find.
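One way to picture this "cheap engines first, Google last" strategy is the sketch below. It is a minimal illustration under stated assumptions and does not call any real search API: `make_limited`, `resolve_player`, `QuotaExceeded` and the two fake search functions are hypothetical, and the daily limit is an arbitrary number for the demo.

```python
class QuotaExceeded(Exception):
    """Raised when a rate-limited engine has used up its daily budget."""


def make_limited(search, daily_limit):
    """Wrap a search function so it refuses calls past daily_limit."""
    calls = 0

    def limited(query):
        nonlocal calls
        if calls >= daily_limit:
            raise QuotaExceeded(query)
        calls += 1
        return search(query)

    return limited


def resolve_player(name, engines):
    """Try engines in order; return (engine_label, url) or None."""
    for label, search in engines:
        try:
            url = search(name)
        except QuotaExceeded:
            continue  # budget spent, fall through to the next engine
        if url:
            return label, url
    return None


# Fake engines for the demo (assumptions, not real APIs): the first one
# only knows exact names, the second tolerates misspellings.
def fake_bing(query):
    return "https://example.com/player/1" if query == "Player A" else None


def fake_google(query):
    return "https://example.com/player/" + query.replace(" ", "-").lower()


engines = [
    ("bing", fake_bing),
    ("google", make_limited(fake_google, daily_limit=1)),
]
first = resolve_player("Player A", engines)
second = resolve_player("Playr B", engines)
third = resolve_player("Playr C", engines)
```

Listing the rate-limited engine last means its budget is only spent on names the other engines could not resolve.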
