mgerasimidis/Web-Scraping-Python

Web Scraping with Python

This project scrapes data about each game from the official website of the Greek Super League, covering the seasons 2006-2007 to 2021-2022. My purpose is to do an exploratory data analysis on this data at a later point.

Prerequisites

  • Requests
  • BeautifulSoup
  • pandas

Walk-through of the project

On the schedule page, there are two things to select first:

Seasons:

seasons_1 until ... seasons_2

Super League games (and not play-offs / play-outs):

league

This gives a list of links, one per season, covering only the Super League games:

sl_seasons
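
The list of season links can be generated rather than typed out. The following is a minimal sketch, not the project's actual code: `BASE_URL` and its `season` and `phase` parameters are hypothetical placeholders, since the site's real address scheme is not given here.

```python
# Hypothetical URL template (assumption); the real site's scheme may differ.
BASE_URL = "https://example.com/schedule?season={season}&phase={phase}"


def season_labels(first_start=2006, last_start=2021):
    """Return labels such as '2006-2007' for every season in the range."""
    return [f"{year}-{year + 1}" for year in range(first_start, last_start + 1)]


def season_links(phase="league"):
    """Build one schedule link per season for the given phase."""
    return [BASE_URL.format(season=label, phase=phase) for label in season_labels()]


sl_seasons = season_links()
print(len(sl_seasons))  # 16 seasons, 2006-2007 through 2021-2022
```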

On each of these season pages, the circled arrows lead to the individual game pages, so these are the links we need to collect. We should also keep track of the day on which each game was played.

get_inside_games
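
Collecting the game links together with their day might look like the sketch below. The markup in `SAMPLE_HTML` is invented for illustration: the `matchday` and `game-link` class names and the `data-date` attribute are assumptions, not the site's real structure.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for one season page (all names are assumptions).
SAMPLE_HTML = """
<div class="matchday" data-date="2021-09-11">
  <a class="game-link" href="/games/1001">go</a>
  <a class="game-link" href="/games/1002">go</a>
</div>
<div class="matchday" data-date="2021-09-12">
  <a class="game-link" href="/games/1003">go</a>
</div>
"""


def extract_games(html):
    """Return (date, href) pairs for every game link, grouped by day block."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    for day in soup.find_all("div", class_="matchday"):
        date = day.get("data-date")
        for link in day.find_all("a", class_="game-link"):
            games.append((date, link["href"]))
    return games


games = extract_games(SAMPLE_HTML)
```

On the real site, the HTML would come from `requests.get(url).text` for each season link, and the selectors would need to match the actual page.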

Each game page contained several tables, but I only wanted the statistics table, as shown in the picture below. To reach it, I had to append the string "/statistics" to the end of each game link.

statistics_1
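
Appending the suffix is a one-line string operation. In this small sketch, the `statistics_url` helper and the example links are hypothetical; only the "/statistics" suffix comes from the walk-through.

```python
def statistics_url(game_link):
    """Append '/statistics' to a game link, avoiding a doubled slash."""
    return game_link.rstrip("/") + "/statistics"


# Example links are placeholders, not real game pages.
game_links = ["https://example.com/games/1001", "https://example.com/games/1002/"]
stats_links = [statistics_url(link) for link in game_links]
```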

On each page, I used the following sources of information (the specific fixture and the statistics).

fixture statistics

From the statistics table, I got the information about the teams and the game stats.

team_names statistics_part

Within the game stats, each "trow" corresponds to a specific category (goals, assists, etc.).

each_trow_is_category

Inside each "trow" there was info about the category, the home_team statistic, and the away_team statistic.

inside_trow
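
Reading the three values out of one row could be sketched as follows. The markup in `SAMPLE_ROWS` is invented: treating "trow" as a CSS class, and the `category`, `home`, and `away` class names, are all assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup for two statistic rows (class names are assumptions).
SAMPLE_ROWS = """
<div class="trow">
  <div class="home">2</div>
  <div class="category">Goals</div>
  <div class="away">1</div>
</div>
<div class="trow">
  <div class="home">1</div>
  <div class="category">Assists</div>
  <div class="away">0</div>
</div>
"""


def parse_trow(row):
    """Return (category, home_value, away_value) as strings for one row."""
    def text_of(cls):
        return row.find("div", class_=cls).get_text(strip=True)
    return text_of("category"), text_of("home"), text_of("away")


soup = BeautifulSoup(SAMPLE_ROWS, "html.parser")
stats = [parse_trow(row) for row in soup.find_all("div", class_="trow")]
```

Because every row shares the same structure, a single helper like `parse_trow` can be reused for all categories.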

Then I noticed that, for each statistic, the "pattern" of the HTML code was the same.

For goals

equal_class_1

For headers

equal_class_2(headers)

The final step before scraping the information was to create an empty dataframe with the appropriate columns.

empty_df
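
Creating the empty dataframe and filling it one game at a time can be sketched as below. The column names and the sample row are hypothetical; the project's real columns are the ones shown in the picture above.

```python
import pandas as pd

# Hypothetical column names (assumption); the real ones come from the stats table.
COLUMNS = ["season", "home_team", "away_team", "home_goals", "away_goals"]

# Empty dataframe with the columns fixed up front.
df = pd.DataFrame(columns=COLUMNS)

# Append one scraped game as a new row at the next integer index.
df.loc[len(df)] = ["2021-2022", "Team A", "Team B", 2, 1]
```

For a large number of games, collecting the rows in a plain list and building the dataframe once at the end is usually faster than growing it row by row.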

The code

coding

The obtained dataframe

DATAFRAME
