In [None]:
# default_exp scraper

# The Guardian Scraper

> Scraping Premier League Previews from the Guardian.

<div style="font-size: 200px">
    
| Issues | Solutions |
|--------|-----------|
| Previews come in 4 possible formats (old format, new format, cup format and one particular format) | Select the appropriate HTML tags for each format |
| Preview titles are not consistent (we can find "Squad sheets" or "match preview") | Keep only the team names and remove the rest |
| The date of the match is not always available | Use the preview's publication date instead |
| The order of the elements and labels varies between previews | Use regex patterns to extract the information |
| Betting odds are sometimes missing | Handle the general case separately and set up specific regex patterns for these particular cases |
| The odds format varies | Handle the general case separately and set up specific regex patterns for these particular cases |
| Odds can contain non-numeric values such as "Evens", "evens" or "Evs" | Replace evens with 1-1 |
| Some previews have no author and no text | Set the missing text and author to 'n/a' (not available) |
| The series also contains previews for the FA Cup, Carabao Cup, Champions League and World Cup | Filter previews by title, link, topic, aside HTML section and preview text, keeping only Premier League previews |
| Team names may not match those used by Opta | Set up a dictionary, or check manually, to map teams to their IDs |
| Sending many requests can get your IP address blocked, because the Guardian's server may interpret the traffic as a DDoS attack | Sleep for a random number of seconds between requests, or change your IP address with a rotating proxy |
</div >
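The random-sleep workaround in the last row can be sketched as a small wrapper around any fetch function. This is a minimal sketch, not code from this notebook: `polite_get` and its `min_delay`/`max_delay` parameters are hypothetical names, and the default bounds of 1 to 5 seconds are arbitrary assumptions rather than a limit documented by the Guardian. The fetch function (for example `session.get` from `requests_html.HTMLSession`) and the sleep function are passed in, so the delay logic can be checked without network access.

```python
import random
import time
from typing import Any, Callable


def polite_get(
    get: Callable[[str], Any],
    url: str,
    min_delay: float = 1.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Wait a random number of seconds, then fetch `url` with `get`.

    Hypothetical helper: the default bounds are assumptions, not values
    taken from the Guardian or from this notebook.
    """
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    # A random pause makes the request pattern less regular than a fixed delay
    delay = random.uniform(min_delay, max_delay)
    sleep(delay)
    return get(url)


# Usage with the notebook's session (this performs a real request):
# from requests_html import HTMLSession
# session = HTMLSession()
# response = polite_get(session.get, "https://www.theguardian.com/football/series/match-previews?page=1")
```

Injecting `sleep` is only a testing convenience; in normal use the default `time.sleep` applies.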


### Import Libraries and Modules

In [None]:
# hide
from nbdev.showdoc import *

In [None]:
# export
import re
import dateparser
import pandas as pd
from typing import *
from requests_html import HTMLSession
from bs4 import BeautifulSoup
from time import sleep
from datetime import datetime

### Parser Class

##### This class is used to parse pages and has two methods:

1- <b>parse_page</b>: requests a given web page and returns its parsed HTML.

2- <b>get_next_page</b>: retrieves the link to the next page (or to the current page, if it is the last one) and tells whether the current page is the last page of previews, so that scraping can stop.
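The two methods are meant to be chained until `get_next_page` reports the last page. The loop below is a minimal sketch of that control flow, not the notebook's own "Scraping all pages" code: `crawl` is a hypothetical name, and `fetch`, `extract` and `next_page` are stand-ins for `Parser.parse_page`, `ScrapingTheGuardian.extract_previews` and `Parser.get_next_page`. Because `get_next_page` returns the current page's URL together with `last_page=True`, the last page has to be processed before the loop stops. The `max_pages` guard is an added assumption to avoid looping forever.

```python
from typing import Any, Callable, List, Tuple


def crawl(
    start_url: str,
    fetch: Callable[[str], Any],
    extract: Callable[[Any], List[Any]],
    next_page: Callable[[Any], Tuple[str, bool]],
    max_pages: int = 1000,
) -> List[Any]:
    """Follow pagination from `start_url` and collect the extracted items."""
    results: List[Any] = []
    url = start_url
    for _ in range(max_pages):
        page = fetch(url)
        # Extract first: the last page still holds previews
        results.extend(extract(page))
        url, last_page = next_page(page)
        if last_page:
            break
    return results
```

With the notebook's classes this could be called as `crawl(url, lambda u: Parser.parse_page(u, scraper.session), scraper.extract_previews, Parser.get_next_page)`, ideally with a random delay between requests.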


In [None]:
# export
class Parser:
    """
    A class to represent a parser for preview pages.

    ...

    Methods
    -------
    parse_page(page_url, session)
        returns the html format of the page.
    get_next_page(page)
        returns the link of the following page and whether it is the last page.
    """

    @staticmethod
    def parse_page(page_url: str, session: HTMLSession) -> BeautifulSoup:
        """
        returns the html format of the page.

        Parameters
        ----------
        page_url : str
            the url of the page
        session : requests_html.HTMLSession
            the scraper session

        Returns
        -------
        page : bs4.BeautifulSoup
              the html format of the page

        """
        # Request the url
        request = session.get(page_url)
        # Get the html document of the page
        page = BeautifulSoup(request.text, "html.parser")
        return page

    @staticmethod
    def get_next_page(page: BeautifulSoup) -> Tuple[str, bool]:
        """
        returns the link of the following page and whether it is the last page.

        Parameters
        ----------
        page : bs4.BeautifulSoup
            the html format of the page

        Returns
        -------
        url: str
          the url of the next page
        last_page: bool
          True if it's the last page, False otherwise.

        """
        # last_page becomes True once we reach the last page of previews
        last_page = False
        next_link = page.find("a", {"rel": "next"})
        # If we don't find the "next" button, we are on the last page
        if not next_link:
            # We pick up the number of the current page and return its link
            html_location = {"aria-label": "Current page"}
            page_number = page.find("span", html_location).text
            url = (
                "https://www.theguardian.com/football/series/match-previews?page="
                + page_number
            )
            last_page = True
        # If it's not the last page, we pick up the link of the following page
        else:
            url = next_link["href"]
        return url, last_page

### PageExtractor Class

##### This class has five methods for extracting data from a given football preview:

1- <b>get_values_matching_regex</b> returns the values that match a regex expression.

&emsp;Because the Guardian website uses two possible page formats, we look for the p tags containing the information with two possible selectors: the "dcr-bixwrd" class for the new format and "div > p" for the old one.<br>
&emsp;We go through each p tag and return the values from the first one that matches the pattern; if no paragraph matches, None is returned.<br>
&emsp;"re.findall" returns a list of tuples, one per match, with one element per capturing group. Groups that belong to the unmatched alternatives of an <b>OR</b> ("|") pattern come back as empty strings, so we keep only the first match and drop its empty strings.
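The effect of the OR alternatives on `re.findall` can be checked without any HTML. The sketch below is a stdlib-only stand-in for `get_values_matching_regex` that works on plain paragraph strings instead of BeautifulSoup tags; `first_match_values` is a hypothetical name, and the odds pattern is copied from `ScrapingTheGuardian.ODDS_REGEX`. It shows that a preview with only two odds yields a tuple whose first three groups are empty strings, and that "Evens" becomes 1-1 before matching.

```python
import re
from typing import Iterable, List, Optional

# Copied from ScrapingTheGuardian.ODDS_REGEX (three odds | two odds)
ODDS_REGEX = (
    r"Odds[\s]*.*[\s]+(\d{1,3}-\d{1,3})[\s]*.*[\s]+(\d{1,3}-\d{1,3})[\s]*.*[\s]+(\d{1,3}-\d{1,3})"
    r"|Odds[\s]*.*[\s]+(\d{1,3}-\d{1,3})[\s]*.*[\s]+(\d{1,3}-\d{1,3})"
)
EVENS = re.compile("Evens|Evs", re.IGNORECASE)


def first_match_values(paragraphs: Iterable[str], regex: str) -> Optional[List[str]]:
    """Return the non-empty groups of the first match in the first matching paragraph."""
    pattern = re.compile(regex, re.IGNORECASE)
    for paragraph in paragraphs:
        # "Evens" is rewritten as the fraction 1-1 before matching
        matches = pattern.findall(EVENS.sub("1-1", paragraph))
        if matches:
            # With several groups, each match is a tuple; unmatched groups are ""
            return [group for group in matches[0] if group]
    return None


print(re.findall(ODDS_REGEX, "Odds H 4-6 A 43-10"))  # [('', '', '', '4-6', '43-10')]
print(first_match_values(["Intro text", "Odds H Evens A 11-10 D 23-10"], ODDS_REGEX))  # ['1-1', '11-10', '23-10']
```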

2- <b> extract_teams_names </b> returns the names of the two teams in a football preview.

&emsp;The team names appear in the preview title.
 <br>&emsp;Examples:
          &emsp;&emsp;{{Squad Sheets: Team A v Team B}} 
         or &emsp;&emsp;{{Team A v Team B: match preview}} 
         or &emsp;&emsp;{{Team A v Team B: Squad Sheets}}
<br>&emsp;Our strategy is therefore to delete the text preceding or following the names, then split the title on " v " to recover each name.
<br>&emsp;If we succeed in obtaining the names, they are returned in a Python dictionary; otherwise, the values are 'n/a' (not available).
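The title-cleaning steps can be sketched with the standard library alone. The pattern below is copied from `extract_teams_names`; `split_teams` is a hypothetical helper that uses `str.partition` on " v " and, as an assumption matching the "otherwise 'n/a'" rule above, returns 'n/a' for both names when no separator is found.

```python
import re
from typing import Dict

# Copied from extract_teams_names: text that surrounds the team names
TITLE_NOISE = re.compile(
    r"Squad Sheets:|: Squad[\s]sheets|Squad sheets|Squad sheet:|: match preview",
    re.IGNORECASE,
)


def split_teams(title: str) -> Dict[str, str]:
    """Return {'home': ..., 'away': ...} from a preview title, or 'n/a' values."""
    cleaned = TITLE_NOISE.sub("", title).strip()
    home, separator, away = cleaned.partition(" v ")
    if not separator:
        return {"home": "n/a", "away": "n/a"}
    # Some titles end with a tab followed by the date
    return {"home": home, "away": away.split("\t")[0]}


print(split_teams("Squad sheets: Liverpool v Queens Park Rangers"))
# {'home': 'Liverpool', 'away': 'Queens Park Rangers'}
```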

3- <b>extract_text_authors</b> returns the text and author of a football preview.

&emsp;It is difficult to locate the text reliably, but it is usually the paragraph with the most characters.
<br>&emsp;We therefore measure the length of each paragraph and take the longest one.
<br>&emsp;As a safeguard, we only accept it as the text if it is longer than 160 characters, because some football previews have no text or author.
<br>&emsp;The author information is located inside the text section, more specifically in a strong tag, so if the text is missing, the author is missing as well.
<br>&emsp;If we succeed in obtaining the text and the author, they are returned in a Python dictionary; otherwise, the values are 'n/a' (not available).
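The longest-paragraph heuristic can be sketched without BeautifulSoup. In this hypothetical sketch each paragraph is a plain dict with a "text" key and an optional "strong" key, a simplified stand-in for a p tag and its strong child; `pick_text_and_author` is an illustrative name and the 160-character threshold is the one used by `extract_text_authors`. Passing `default=None` to `max` keeps the function from failing on a page with no paragraphs.

```python
from typing import Dict, List, Optional

MIN_TEXT_LENGTH = 160  # threshold used by extract_text_authors


def pick_text_and_author(paragraphs: List[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Pick the longest paragraph as the text and its bold part, if any, as the author.

    Each paragraph is {"text": ..., "strong": ...}, a hypothetical stand-in
    for a <p> tag and its <strong> child.
    """
    longest = max(paragraphs, key=lambda p: len(p["text"]), default=None)
    if longest is None or len(longest["text"]) <= MIN_TEXT_LENGTH:
        return {"text": "n/a", "author": "n/a"}
    # A missing <strong> tag means the author is not available
    author = longest.get("strong") or "n/a"
    return {"text": longest["text"], "author": author}
```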

4- <b>extract_preview_date</b> returns the publication date of a football preview.

&emsp;A preview page carries two dates: the publication date and the date of the most recent modification.
<br>&emsp;We go through the section where both dates are located, keep only the first one, and use 'dateparser' to convert the string into a date object (displayed in "yyyy-mm-dd" format).
<br>&emsp;If we succeed in obtaining the date, it is returned; otherwise, the value is 'n/a' (not available).
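The notebook relies on `dateparser`, which copes with many date formats. As a stdlib-only illustration of the "first date wins, otherwise 'n/a'" rule, the sketch below swaps in `datetime.strptime` instead. It assumes the publication string looks like `Fri 17 May 2013 10.00 BST` (day name, day, month abbreviation, year, time, zone); that format, the name `first_publication_date`, and an English (C) locale for `%b` are all assumptions.

```python
from datetime import date, datetime
from typing import Iterable, Union


def first_publication_date(date_strings: Iterable[str]) -> Union[date, str]:
    """Parse only the first string (the publication date); fall back to 'n/a'."""
    for raw in date_strings:
        # Assumed format: keep "17 May 2013" from "Fri 17 May 2013 10.00 BST"
        day_month_year = " ".join(raw.split()[1:4])
        try:
            return datetime.strptime(day_month_year, "%d %b %Y").date()
        except ValueError:
            return "n/a"
    # No date string at all
    return "n/a"
```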

5- <b>extract_match_infos</b> returns the match information (venue, referee, odds).

&emsp;Here, we call the first method, <b>get_values_matching_regex</b>, with one regex expression per item to retrieve this information.<br>&emsp;If the venue or the referee is not available, its value is 'n/a'; if the odds are not found, the value is None.
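The venue and referee patterns are designed to work even when all the information sits on one line. The sketch below applies the two patterns copied from `ScrapingTheGuardian` to a single sample line; `first_value` is a hypothetical helper that mirrors the "first non-empty group, stripped, else 'n/a'" logic of `extract_match_infos`, and the sample line is made up for illustration.

```python
import re

# Copied from ScrapingTheGuardian
VENUE_REGEX = "Venue(.*)Tickets|Venue(.*),|Venue(.*)"
REFEREE_REGEX = "Referee(.*)This season's|Referee(.*)Last season's|Referee(.*)Odds|Referee(.*)|Ref(.*)Odds"


def first_value(text: str, regex: str) -> str:
    """Return the first non-empty group of the first match, stripped, or 'n/a'."""
    matches = re.findall(regex, text, re.IGNORECASE)
    if not matches:
        return "n/a"
    # Drop the empty groups of the alternatives that did not match
    groups = [group for group in matches[0] if group]
    return groups[0].strip() if groups else "n/a"


line = "Venue Anfield Tickets Sold out Referee Martin Atkinson This season's G10 Y33 R1"
print(first_value(line, VENUE_REGEX), "|", first_value(line, REFEREE_REGEX))
# Anfield | Martin Atkinson
```

Note that the greedy `(.*)` relies on the label that follows ("Tickets", "This season's", ...) to stop the capture; without such a label, the last alternative captures the rest of the line.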

In [None]:
# export
class PageExtractor:
    """
    A class to represent an information extractor from a football preview.

    ...

    Methods
    -------
    get_values_matching_regex(page, regex)
        returns all matched patterns from a preview page.
    extract_teams_names(title)
        returns team names from the preview title.
    extract_text_authors(page)
        returns the text and author of the preview.
    extract_preview_date(page)
        returns the publication date of the preview.
    extract_match_infos(page, venue_regex, referee_regex, odds_regex)
        returns the match information (venue, referee, odds).
    """

    @staticmethod
    def get_values_matching_regex(
        page: BeautifulSoup, regex: str
    ) -> Union[List[str], None]:
        """
        returns all matched patterns from a preview page.

        Parameters
        ----------
        page: bs4.BeautifulSoup
            the html format of the page
        regex: str
            the regex expression

        Returns
        -------
        result: list of str
          matched values of the regex expression, None otherwise

        """
        # All the information is located in the page's p tags.
        # Some previews from 2009 use different html tags and classes,
        # so we fall back to the old format when the new one is absent.
        all_p_tags_new_formats = page.find_all("p", {"class": "dcr-bixwrd"})
        if all_p_tags_new_formats:
            paragraphs = all_p_tags_new_formats
        else:
            paragraphs = page.select("div > p")

        # For "odds" information, "Evens" or "Evs" are replaced by 1-1
        pattern_odds = re.compile("Evens|Evs", re.IGNORECASE)
        # Compile the requested pattern once, ignoring case
        pattern_returned_values = re.compile(regex, re.IGNORECASE)
        for paragraph in paragraphs:
            section = pattern_odds.sub("1-1", paragraph.text)
            matching_result = pattern_returned_values.findall(section)
            # Return the values of the first paragraph that matches
            if matching_result:
                first_match = matching_result[0]
                # With zero or one group, findall returns plain strings
                if isinstance(first_match, str):
                    first_match = (first_match,)
                # With several groups it returns tuples, for example
                # [('12-5', '11-10', '23-10', '', '')]; groups from unmatched
                # OR alternatives are empty strings, so we remove them
                result = [element for element in first_match if element]
                return result
        # No paragraph matched the pattern
        return None

    @staticmethod
    def extract_teams_names(title: str) -> Dict[str, str]:
        """
        returns team names from the preview title.

        Parameters
        ----------
        title: str
            the title of the preview

        Returns
        -------
        names: dict of str

        """
        # 3 possible formats for preview titles, for example:
        # {{Squad Sheets: Team A v Team B}} or
        # {{Team A v Team B: match preview}} or
        # {{Team A v Team B: Squad sheets}}
        # We remove the text before or after the team names
        # (raw string, so the regex escapes are not treated as string escapes)
        pattern = re.compile(
            r"Squad Sheets:|: Squad[\s]sheets|Squad sheets|Squad sheet:|: match preview",
            re.IGNORECASE,
        )
        preview_title = pattern.sub("", title).strip()
        # Names are located in the title of the preview: "Home v Away"
        teams = preview_title.split(" v ")
        if len(teams) >= 2:
            home_team = teams[0]
            # For some previews we find "Team A v Team B \t date"
            away_team = teams[1].split("\t")[0]
        else:
            # No " v " separator: the names are not available
            home_team = "n/a"
            away_team = "n/a"
        # we return names
        names = dict({"home": home_team, "away": away_team})
        return names

    @staticmethod
    def extract_text_authors(page: BeautifulSoup) -> Dict[str, str]:
        """
        returns the text and author of the preview.

        Parameters
        ----------
        page: bs4.BeautifulSoup
            the html format of the page

        Returns
        -------
        preview_text_author: dict of str

        """
        # A preview may have no text and no author,
        # so we initialize both to 'n/a' (not available)
        author = "n/a"
        text = "n/a"
        # All items are stored in p tags.
        # Some previews from 2009 use different html tags and classes.
        all_p_tags_new_formats = page.find_all("p", {"class": "dcr-bixwrd"})
        all_p_tags_old_format = page.select("div > p")
        # Use the new format if it exists
        if all_p_tags_new_formats:
            all_p_tags = all_p_tags_new_formats
        else:
            all_p_tags = all_p_tags_old_format

        # It's quite difficult to determine which paragraph is the text,
        # but the text is usually the longest one.
        # default=None avoids a ValueError when the page has no p tags.
        possible_text_section = max(
            all_p_tags, key=lambda p: len(p.text), default=None
        )
        # We double-check and only accept texts longer than 160 characters
        if (
            possible_text_section is not None
            and len(possible_text_section.text) > 160
        ):
            text_section = possible_text_section
            text = text_section.text
            # The author name is located inside the text section,
            # in the strong tag
            possible_author_section = text_section.find("strong")
            # For some previews the author is not found:
            # if it's available we take it, else it stays 'n/a'
            if possible_author_section is not None:
                author = possible_author_section.text

        preview_text_author = dict({"text": text, "author": author})
        return preview_text_author

    @staticmethod
    def extract_preview_date(page: BeautifulSoup) -> Union[datetime, str]:
        """
        returns the publication date of the preview.

        Parameters
        ----------
        page: bs4.BeautifulSoup
            the html format of the page

        Returns
        -------
        preview_date: datetime.date
          if not found 'n/a'

        """
        # There are 2 dates for the preview:
        # the first is the date of publication,
        # the second is the date of the last modification, which is hidden.
        # We pick only the first one and default to 'n/a' (not available),
        # which also covers pages where no date string is found.
        preview_date = "n/a"
        try:
            # Some previews from 2009 have different html tags and classes
            html_new_location = {"class": "dcr-km9fgb"}
            html_old_location = {"itemprop": "datePublished"}
            dates_section_new_format = page.find("div", html_new_location)
            dates_section_old_format = page.find("time", html_old_location)
            if dates_section_new_format:
                dates_section = dates_section_new_format.strings
            else:
                dates_section = dates_section_old_format.strings

            for date_string in dates_section:
                preview_date = dateparser.parse(date_string).date()
                break
        except Exception:
            preview_date = "n/a"

        return preview_date

    @staticmethod
    def extract_match_infos(
        page: BeautifulSoup, venue_regex: str, referee_regex: str, odds_regex: str
    ) -> Dict[str, str]:
        """
        returns the match information (venue, referee, odds).

        Parameters
        ----------
        page: bs4.BeautifulSoup
            the html format of the page
        venue_regex: str
            venue regex expression
        referee_regex: str
            referee regex expression
        odds_regex: str
            odds regex expression

        Returns
        -------
        match_infos: dict of str

        """
        # Extract venue, referee and odds values
        try:
            venue = PageExtractor.get_values_matching_regex(page, venue_regex)[
                0
            ].strip()
        except Exception as e:
            venue = "n/a"
        try:
            referee = PageExtractor.get_values_matching_regex(page, referee_regex)[
                0
            ].strip()
        except Exception as e:
            referee = "n/a"

        odds = PageExtractor.get_values_matching_regex(page, odds_regex)

        match_infos = dict({"venue": venue, "referee": referee, "odds": odds})
        return match_infos

### ScrapingTheGuardian Class

##### This class represents a scraper for the Guardian website and has three methods:

1- <b>calculate_betting_odds</b> returns decimal odds.

&emsp;In this method, we convert the fractional odds quoted in the football preview into decimal odds.
<br>&emsp;Consider the following example (home, away, draw):
<br>&emsp;&emsp; ["9-20","29-5","6-5"] 
<br>&emsp;&emsp;Each fractional odd "a-b" is converted separately with the formula a/b + 1:
<br>&emsp;&emsp;&emsp; home = (9/20) + 1 = 1.45
<br>&emsp;&emsp;&emsp; away = (29/5) + 1 = 6.8
<br>&emsp;&emsp;&emsp; draw = (6/5) + 1 = 2.2
<br>&emsp;If we succeed in obtaining decimal odds, they are returned in a Python dictionary; otherwise, the values are 'n/a' (not available).
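The conversion can be written once and reused for the three outcomes. The sketch below is a compact alternative to `calculate_betting_odds`, not the notebook's code: `fractional_to_decimal` and `decimal_odds` are hypothetical names, and `fractions.Fraction` is swapped in so that a zero denominator raises `ZeroDivisionError` just like the integer division in the original. Odds are assumed to arrive in home, away, draw order, as in the regex output.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Union


def fractional_to_decimal(odds: str) -> float:
    """Convert fractional odds written as "a-b" into decimal odds a/b + 1."""
    numerator, denominator = (int(part) for part in odds.split("-"))
    # Fraction raises ZeroDivisionError when the denominator is 0
    return float(Fraction(numerator, denominator) + 1)


def decimal_odds(odds: Optional[List[str]]) -> Dict[str, Union[float, str]]:
    """Map [home, away, draw] fractional odds to decimal odds, 'n/a' if missing."""
    labels = ["odds_home", "odds_away", "odds_draw"]
    result: Dict[str, Union[float, str]] = {label: "n/a" for label in labels}
    # zip stops early, so a list without the draw odds leaves "odds_draw" as 'n/a'
    for label, value in zip(labels, odds or []):
        try:
            result[label] = fractional_to_decimal(value)
        except ZeroDivisionError:
            pass
    return result


print(decimal_odds(["9-20", "29-5", "6-5"]))
# {'odds_home': 1.45, 'odds_away': 6.8, 'odds_draw': 2.2}
```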


2- <b>extract_preview_items</b> returns the entire contents of a football preview.

&emsp;This method calls the methods defined in the PageExtractor class and returns a Python dictionary containing all of this information.
<br>&emsp;It also uses the <b>calculate_betting_odds</b> method to compute the decimal odds for a home win, an away win and a draw.

The returned keys are "home_team", "away_team", "text", "author", "venue", "referee", "odds", "odds_home_team", "odds_away_team", "odds_draw" and "preview_date".

3- <b>extract_previews</b> returns the information of all the previews on a page.

&emsp;For a given page, we retrieve all the previews and go through them one by one, collecting the link, title, topic and aside section of each.
<br>&emsp;If none of these contains "cup", "Champions League" or "champions-league", and the preview text does not contain "FA Cup – Kick-off", we process the preview;
<br>&emsp;otherwise, we move on to the next preview.
<br>&emsp;The previous method, <b>extract_preview_items</b>, is called here to extract the information from each preview, which is then stored in a list called <b>all_previews_information</b>.
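The keyword filter can be expressed as a single predicate over plain strings. This is a hypothetical sketch (`is_premier_league_preview` is an illustrative name, and the preview text is passed as one string rather than checked paragraph by paragraph); the keywords and the "FA Cup – Kick-off" label are copied from `extract_previews`. Because "cup" is matched as a plain substring, it also rejects any field that merely contains those three letters.

```python
import re
from typing import Iterable

# Keywords copied from extract_previews
ELIMINATED_MATCHES = ["Champions League", "champions-league", "cup"]


def is_premier_league_preview(fields: Iterable[str], text: str = "") -> bool:
    """Return False if any keyword appears in any field (case-insensitive)
    or if the text contains the "FA Cup – Kick-off" label."""
    fields = list(fields)
    for word in ELIMINATED_MATCHES:
        if any(re.search(word, field, re.IGNORECASE) for field in fields):
            return False
    return re.search("FA Cup – Kick-off", text, re.IGNORECASE) is None
```

Here `fields` would hold the title, link, topic and aside strings of one preview.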

In [None]:
# export
class ScrapingTheGuardian:
    """
    A class to represent a scraper from the "Guardian" website.

    ...

    Attributes
    ----------
    session : requests_html.HTMLSession
        a web session
    VENUE_REGEX : str
        venue regex expression
    REFEREE_REGEX : str
        referee regex expression
    ODDS_REGEX : str
        odds regex expression

    Methods
    -------
    calculate_betting_odds(odds)
        returns decimal odds.
    extract_preview_items(page,title)
        returns all information of a football preview
    extract_previews(self,page)
        returns all the information of all browsed previews
    """

    # venue, referee, odds pattern regex
    # in some previews, all of the information is on the same line.
    VENUE_REGEX = "Venue(.*)Tickets|Venue(.*),|Venue(.*)"
    REFEREE_REGEX = "Referee(.*)This season's|Referee(.*)Last season's|Referee(.*)Odds|Referee(.*)|Ref(.*)Odds"
    # {Odds H 11-8 A 11-8 D 11-8}
    # {Odds Liverpool 11-8 Aston Villa 11-8 Draw 11-8}
    # missing label {Odds H 11-8 11-8 D 11-8}
    # missing value {Odds H 11-8 A 11-8}
    ODDS_REGEX = r"Odds[\s]*.*[\s]+(\d{1,3}-\d{1,3})[\s]*.*[\s]+(\d{1,3}-\d{1,3})[\s]*.*[\s]+(\d{1,3}-\d{1,3})|Odds[\s]*.*[\s]+(\d{1,3}-\d{1,3})[\s]*.*[\s]+(\d{1,3}-\d{1,3})"

    def __init__(self):

        # Initialize session to start scraping
        self.session = HTMLSession()

    @staticmethod
    def calculate_betting_odds(odds: list) -> Dict[str, float]:
        """
        returns decimal odds.

        Parameters
        ----------
        odds: list of str
            odds values

        Returns
        -------
        betting_odds: dict of float

        """
        # Initialize betting odds to n/a (not available)
        # Some previews may not include odds
        odds_home = "n/a"
        odds_away = "n/a"
        odds_draw = "n/a"

        if odds is not None:  # If odds exist
            # example of odds:
            # {H 4-6 A 43-10 D 3-1}
            # {liverpool 4-6 Tottenham 43-10 Draw 3-1}
            # {H 4-6 43-10 D 3-1}
            # {H 4-6 A 43-10}
            # The formula will be (4/6)+1 , (43/10)+1 , (3/1)+1
            # Home team odds
            betting_odds_home = odds[0]
            try:
                odds_home = (
                    int(betting_odds_home.split("-")[0])
                    / int(betting_odds_home.split("-")[1])
                ) + 1
            except ZeroDivisionError:
                pass
            # Away team odds
            betting_odds_away = odds[1]
            try:
                odds_away = (
                    int(betting_odds_away.split("-")[0])
                    / int(betting_odds_away.split("-")[1])
                ) + 1
            except ZeroDivisionError:
                pass
            # if we have the normal format of odds
            # we will have 3 parts(odds_home,odds_away,odds_draw)
            if len(odds) == 3:
                # Draw odds
                betting_odds_draw = odds[2]
                try:
                    odds_draw = (
                        int(betting_odds_draw.split("-")[0])
                        / int(betting_odds_draw.split("-")[1])
                    ) + 1
                except ZeroDivisionError:
                    pass

        betting_odds = dict(
            {"odds_home": odds_home, "odds_away": odds_away, "odds_draw": odds_draw}
        )
        return betting_odds

    @staticmethod
    def extract_preview_items(page: BeautifulSoup, title: str) -> Dict[str, object]:
        """
        returns all information of a football preview

        Parameters
        ----------
        page: bs4.BeautifulSoup
            the html format of the page
        title: str
            the title of the preview

        Returns
        -------
        preview_items: dict of object

        """
        # meth1: extract team names
        names = PageExtractor.extract_teams_names(title)
        # Home team and  Away Team
        home_team = names["home"]
        away_team = names["away"]
        # meth2: extract match infos (venue,referee,odds)
        match_infos = PageExtractor.extract_match_infos(
            page,
            ScrapingTheGuardian.VENUE_REGEX,
            ScrapingTheGuardian.REFEREE_REGEX,
            ScrapingTheGuardian.ODDS_REGEX,
        )
        venue = match_infos["venue"]
        referee = match_infos["referee"]
        odds = match_infos["odds"]
        # meth3: extract text and author of the preview
        text_author = PageExtractor.extract_text_authors(page)
        text = text_author["text"]
        author = text_author["author"]
        # meth4: extract preview date
        preview_date = PageExtractor.extract_preview_date(page)
        # meth5: calculate betting odds
        betting_odds = ScrapingTheGuardian.calculate_betting_odds(odds)
        # Home team betting odds
        odds_home_team = betting_odds["odds_home"]
        # Away team betting odds
        odds_away_team = betting_odds["odds_away"]
        # Draw betting odds
        odds_draw = betting_odds["odds_draw"]
        # Return preview items
        preview_items = dict(
            {
                "home_team": home_team,
                "away_team": away_team,
                "text": text,
                "author": author,
                "venue": venue,
                "referee": referee,
                "odds": odds,
                "odds_home_team": odds_home_team,
                "odds_away_team": odds_away_team,
                "odds_draw": odds_draw,
                "preview_date": preview_date,
            }
        )
        return preview_items

    def extract_previews(self, page: BeautifulSoup) -> List[Dict[str, object]]:
        """
        returns all the information of all browsed previews

        Parameters
        ----------
        page: bs4.BeautifulSoup
            the html format of the page

        Returns
        -------
        preview_items: List of (dict of object)

        """
        all_previews_information = []
        # We pick all of the match previews on the webpage.
        previews = page.find_all("div", {"class": "fc-item__content"})
        # for each preview we extract its information
        for preview in previews:
            # Pick up the preview link
            preview_link = preview.find("a")["href"]
            # Pick up the match preview page
            preview_page = Parser.parse_page(preview_link, self.session)
            # We need only Premier League Previews
            # To filter previews we need to Find the title of the preview
            # Champions league and Cups are not allowed
            preview_title = preview_page.find("h1").text
            # Check if "cup" or "Champions league" exists in:
            # title, link, preview topic section,preview aside section
            # we pick preview topic
            try:
                preview_topic = preview_page.find("div", {"class": "dcr-lwa3gj"}).text
            except Exception as e:
                # some previews in 2009 have different html tags
                preview_topic = preview_page.find("div", {"class": "submeta "}).text
            # we pick preview_aside
            try:
                preview_aside = preview_page.find(
                    "aside", {"data-gu-name": "title"}
                ).text
            except Exception as e:
                # some previews in 2009 have different html tags
                preview_aside = preview_page.find(
                    "div", {"class": "content__labels"}
                ).text
            # If the preview is neither a cup match nor a Champions League
            # match, we proceed with the extraction.
            # We look for the eliminated keywords in the title, link,
            # topic section and aside section of the preview.
            eliminated_matches = ["Champions League", "champions-league", "cup"]
            searched_fields = [preview_title, preview_link, preview_topic, preview_aside]
            not_premier_league_found = any(
                re.search(word, field, re.IGNORECASE)
                for word in eliminated_matches
                for field in searched_fields
            )
            # some previews include the type of competition in the text
            # we find FA Cup – Kick-off
            # so we want to eliminate these previews
            cup_in_text = PageExtractor.get_values_matching_regex(
                preview_page, "FA Cup – Kick-off"
            )

            if not not_premier_league_found and not cup_in_text:
                preview_items = ScrapingTheGuardian.extract_preview_items(
                    preview_page, preview_title
                )
                all_previews_information.append(preview_items)

        return all_previews_information

## USE CASE
#### Scraping a single page

In [None]:
url = "https://www.theguardian.com/football/series/match-previews?page=140"
scraper = ScrapingTheGuardian()
previews_infos = []
page = Parser.parse_page(url, scraper.session)
previews_infos = scraper.extract_previews(page)

### Visualize data

In [None]:
data = pd.DataFrame(previews_infos)

In [None]:
data

| | home_team | away_team | text | author | venue | referee | odds | odds_home_team | odds_away_team | odds_draw | preview_date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Liverpool | Queens Park Rangers | Try telling Jamie Carragher that nothing is ri... | Andy Hunter | Anfield | Martin Atkinson | [1-4, 12-1, 11-2] | 1.25 | 13.0 | 6.5 | 2013-05-17 |
| 1 | Chelsea | Everton | This will be the last hurrah for both Rafael B... | Dominic Fifield | Stamford Bridge | Anthony Taylor | [7-10, 9-2, 3-1] | 1.7 | 5.5 | 4.0 | 2013-05-17 |
| 2 | Manchester City | Norwich City | Now that Roberto Mancini has been sacked by Ma... | Jamie Jackson | Carrow Road | Mark Halsey | [4-11, 11-1, 9-2] | 1.363636 | 12.0 | 5.5 | 2013-05-17 |
| 3 | Manchester United | Swansea City | Playing away at Manchester United in the Premi... | Rich Flower | Old Trafford | Jon Moss | [1-3, 11-1, 19-4] | 1.333333 | 12.0 | 5.75 | 2013-05-10 |
| 4 | Stoke City | Tottenham Hotspur | Tottenham travel to the Britannia Stadium to f... | Alex Sutch | Britannia Stadium | Kevin Friend | [7-2, 10-11, 13-5] | 4.5 | 1.909091 | 3.6 | 2013-05-10 |
| 5 | Fulham | Liverpool | Four defeats in a row, the most recent of whic... | Sachin Nakrani | Craven Cottage | Mark Halsey | [3-1, 10-11, 13-5] | 4.0 | 1.909091 | 3.6 | 2013-05-10 |
| 6 | Norwich City | West Bromwich Albion | Norwich are not in turmoil, according to their... | Rich Flower | Carrow Road | Howard Webb | [6-5, 5-2, 5-2] | 2.2 | 3.5 | 3.5 | 2013-05-10 |
| 7 | Everton | West Ham United | David Moyes should receive a rousing and emoti... | Andy Hunter | Goodison Park | Mike Jones | [8-15, 13-2, 10-3] | 1.533333 | 7.5 | 4.333333 | 2013-05-10 |
| 8 | Queens Park Rangers | Newcastle United | Newcastle United felt a little sore that their... | David Hytner | Loftus Road | Lee Probert | [12-5, 13-10, 12-5] | 3.4 | 2.3 | 3.4 | 2013-05-10 |
| 9 | Aston Villa | Chelsea | With Chelsea chasing a top-four finish and Ast... | Stuart James | Villa Park | Lee Mason | [7-2, 10-11, 14-5] | 4.5 | 1.909091 | 3.8 | 2013-05-10 |


### Scraping all pages