# Bite 266. Composition, Inheritance, Abstract Base Class, what?

It’s not as bad as it sounds, really. If you don’t know the difference between composition and inheritance, I recommend reading up on it at Real Python. As with most of their articles, the subject is covered thoroughly!
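
To make the distinction concrete, here is a minimal sketch using hypothetical `Engine`, `Car`, `Vehicle`, and `Truck` classes, which are not part of the bite. Composition means one object *has* another and delegates to it. Inheritance means one class *is a* specialised version of another. In this bite, `Web` holding a `File` is composition, and `NYTimes` deriving from `Site` is inheritance.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    horsepower: int


@dataclass
class Car:
    # Composition: a Car *has an* Engine and delegates work to it.
    engine: Engine

    def power(self) -> int:
        return self.engine.horsepower


class Vehicle:
    def wheels(self) -> int:
        return 4


class Truck(Vehicle):
    # Inheritance: a Truck *is a* Vehicle and overrides one of its methods.
    def wheels(self) -> int:
        return 6


car = Car(Engine(150))
print(car.power())                   # 150
print(Truck().wheels())              # 6
print(isinstance(Truck(), Vehicle))  # True
```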

__The scenario__

   So you’ve been tasked with scraping some presidential polling sites. The plan is to create a scraper that can be used on multiple sites.

   I’ve already created the following namedtuples, but you will need to add type hints to them:

   `Candidate`, `LeaderBoard`, and `Poll`

   I’ve never tried to add type hinting to namedtuples, so it was a great learning experience for me and I hope to pass that experience along to you.
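
The following is a minimal sketch of the two ways to type-hint a namedtuple with `typing.NamedTuple`. The field types (`str` for `name`, `float` for `votes`) are assumptions based on the sample RealClearPolitics output. The bite leaves the actual choice to you.

```python
from typing import NamedTuple


# Class syntax: every field carries a type annotation.
class Candidate(NamedTuple):
    name: str
    votes: float


# Equivalent functional syntax: a list of (field name, type) pairs.
CandidateAlt = NamedTuple("CandidateAlt", [("name", str), ("votes", float)])

biden = Candidate("Biden", 214.0)
print(biden)          # Candidate(name='Biden', votes=214.0)
print(biden._fields)  # ('name', 'votes')
```

Both forms still produce ordinary tuples, so unpacking and indexing behave exactly as they do with `collections.namedtuple`.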

   The plan is to create the following core classes:

- File:\
    Variables:
        - name: str
        - path: Path
    Methods:
        - data -> Optional[str]
- Web:\
    Variables:
        - url: str
        - file: File
    Methods:
        - data -> Optional[str]
        - soup -> Soup
- Site(ABC):\
    Variables:
        - web: Web
    Methods:
        - find_table -> str
        - parse_rows -> Union[List[LeaderBoard], List[Poll]]
        - polls -> Union[List[LeaderBoard], List[Poll]]
        - stats

`Site` is an abstract base class that decorates some of its methods with `@abstractmethod`!

If you are not familiar with abstract base classes, read up on them in the Python documentation for `ABC`.

__Adding new parsers__

After creating the core of the application, you will have to create parsers for The New York Times and RealClearPolitics. Don’t be scared by all the data; we’re only interested in the Current State of the Race table from NYTimes and the third table from RCP. Since the tables on RCP all have pretty much the same layout, your parser should work with any of them, but that won’t be checked.

The parsers should derive from the `Site` class. While coding this, I was able to get the `find_table` method to work for all sites, so that one is not decorated with `@abstractmethod`. The other methods, however, need to be overridden; otherwise the subclasses cannot be instantiated.
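
Here is a minimal sketch of that arrangement, using a hypothetical `Parser` base class rather than the bite's `Site`. The concrete `first_table` method is inherited unchanged. A subclass that forgets to override one abstract method stays abstract, and instantiating it raises `TypeError`.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Parser(ABC):
    # Concrete method: written once, shared by every subclass.
    def first_table(self, tables: List[str], loc: int = 0) -> str:
        return tables[loc]

    # Abstract methods: every concrete subclass must override both.
    @abstractmethod
    def parse_rows(self, table: str) -> List[str]:
        ...

    @abstractmethod
    def stats(self) -> str:
        ...


class CsvParser(Parser):
    def parse_rows(self, table: str) -> List[str]:
        return table.split(",")

    def stats(self) -> str:
        return "stats"


class HalfDoneParser(Parser):
    # stats() is not overridden, so this class is still abstract.
    def parse_rows(self, table: str) -> List[str]:
        return []


parser = CsvParser()
print(parser.parse_rows(parser.first_table(["a,b", "c,d"], loc=1)))  # ['c', 'd']

error: Optional[TypeError] = None
try:
    HalfDoneParser()
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
```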

- RealClearPolitics(Site):\
    Variables:
        - web: Web
    Methods:
        - find_table -> str
        - parse_rows -> List[Poll]
        - polls -> List[Poll]
        - stats
- NYTimes(Site):\
    Variables:
        - web: Web
    Methods:
        - find_table -> str
        - parse_rows -> List[LeaderBoard]
        - polls -> List[LeaderBoard]
        - stats

__The output format__

Each of the two parsers produces its own output format. We’ll keep it simple, but there are some rules to adhere to. First, here is what sample output from each should look like.

   __NYTimes__

```
NYTimes
=================================

                   Pete Buttigieg
---------------------------------
National Polling Average: 10%
       Pledged Delegates: 25
Individual Contributions: $76.2m
    Weekly News Coverage: 3

                   Bernie Sanders
---------------------------------
    etc..
    Weekly News Coverage: 3
```

__Things to note about this output:__

- Starts and ends with blank lines
- There is a similar output for each of the remaining candidates:
    - Bernie Sanders
    - Joseph R. Biden Jr.
    - Tulsi Gabbard
- Not shown here, but there is a blank line between each candidate
- There are 33 equal (=) signs and hyphens (-) in the dividers
- The name of the candidate is right aligned
- The data lines are all right aligned
- The values from the data lines are all left aligned
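
The alignment rules above map directly onto the format-spec alignment in f-strings, where `>` right-aligns a value within a given width. The following minimal sketch uses the Pete Buttigieg values from the sample. Deriving the label width from the longest label (24 characters) is an assumption. In the bite, the values would come from the parsed `LeaderBoard` rows.

```python
WIDTH = 33  # length of the dividers, per the rules above

# Sample values copied from the NYTimes output above.
name = "Pete Buttigieg"
rows = {
    "National Polling Average": "10%",
    "Pledged Delegates": "25",
    "Individual Contributions": "$76.2m",
    "Weekly News Coverage": "3",
}

# Assumption: labels are right-aligned to the longest label (24 characters).
label_width = max(len(label) for label in rows)

lines = ["", "NYTimes", "=" * WIDTH, ""]
lines.append(f"{name:>{WIDTH}}")  # candidate name, right-aligned
lines.append("-" * WIDTH)
for label, value in rows.items():
    # The label is right-aligned; the value follows ": " and is left-aligned.
    lines.append(f"{label:>{label_width}}: {value}")
lines.append("")

print("\n".join(lines))
```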

`etc..` is just a placeholder, indicating that not all data is shown.

__RealClearPolitics__

```
RealClearPolitics
=================
    Biden: 214.0
  Sanders: 142.0
  Gabbard: 6.0
```

__Some things to note here:__

- Starts and ends with blank lines
- This is the whole output for this scraper
- There are as many equal (=) signs as the length of the title
- The candidate last names are right aligned
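
Here is a minimal sketch of the RealClearPolitics layout using the sample values. The `=` underline is simply `"=" * len(title)`. The fixed name width of 9 is an assumption chosen because it reproduces the sample alignment. The bite does not state the rule.

```python
title = "RealClearPolitics"

# Sample (last name, value) pairs copied from the RealClearPolitics output above.
results = [("Biden", 214.0), ("Sanders", 142.0), ("Gabbard", 6.0)]

# Assumption: a fixed name width of 9 reproduces the sample's alignment.
NAME_WIDTH = 9

lines = ["", title, "=" * len(title)]  # as many "=" as the title is long
lines += [f"{name:>{NAME_WIDTH}}: {value}" for name, value in results]
lines.append("")

print("\n".join(lines))
```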


__Time to start coding__

That's it. I've scattered plenty of docstrings throughout the code to make it as explicit as possible and help you out. When you complete this bite, you will have learned about, or gained more experience with:

- Object Oriented Programming
- Dataclasses
- Inheritance
- Composition
- Abstract Base Classes
    - The `@abstractmethod` decorator
- Type hinting namedtuples
- String formatting
- Web scraping with BeautifulSoup

from abc import ABC, abstractmethod
from collections import namedtuple
from dataclasses import dataclass
from datetime import date
from os import getenv
from pathlib import Path
from typing import Any, List, Optional
from urllib.request import urlretrieve

from bs4 import BeautifulSoup as Soup  # type: ignore

TMP = getenv("TMP", "/tmp")
TODAY = date.today()
Candidate = namedtuple("Candidate", "name votes")
LeaderBoard = namedtuple(
    "LeaderBoard", "Candidate Average Delegates Contributions Coverage"
)
Poll = namedtuple(
    "Poll",
    "Poll Date Sample Sanders Biden Gabbard Spread",
)


@dataclass
class File:
    """File represents a filesystem path.

    Variables:
        name: str -- The filename that will be created on the filesystem.
        path: Path -- Path object created from the name passed in.

    Methods:
        [property]
        data: -> Optional[str] -- If the file exists, it returns its contents.
            If it does not exist, it returns None.
    """
    name: str
    path: Path = Path(TMP)
    
    def __post_init__(self):
        name_with_date = str(TODAY) + '_' + self.name
        self.path = self.path / name_with_date
    
    @property
    def data(self) -> Optional[str]:
        
        if self.path.exists():
            with open(self.path, 'r') as file:
                content = file.read()
            return content
        else:
            return None
    
@dataclass
class Web:
    """Web object.

    Web is an object that downloads the page from the url that is passed
    to it and stores it in the File instance that is passed to it. If the
    File already exists, it just reads the file, otherwise it downloads it
    and stores it in File.

    Variables:
        url: str -- The url of the web page.
        file: File -- The File object to store the page data into.

    Methods:
        [property]
        data: -> Optional[str] -- Reads the text from File or retrieves it from the
            web if it does not exist.

        [property]
        soup: -> Soup -- Parses the data from File and turns it into a BeautifulSoup
            object.
    """
    url: str
    file: File
        
    @property
    def data(self) -> Optional[str]:
        """Reads the data from the File object.

        First it checks if the File object has any data. If it doesn't, it retrieves
        it and saves it to the File. It then reads it from the File and returns it.

        Returns:
            Optional[str] -- The string data from the File object.
        """
        pass

    @property
    def soup(self) -> Soup:
        """Converts string data from File into a BeautifulSoup object.

        Returns:
            Soup -- BeautifulSoup object created from the File.
        """
        pass
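
The `Web` docstring describes a download-once cache: read the file if it exists; otherwise fetch the page and save it. The following is a minimal sketch of that logic as a standalone, hypothetical `cached_download` helper. It is not the bite's required API. In `Web.data`, the path would come from `self.file.path` and the URL from `self.url`.

```python
import tempfile
from pathlib import Path
from typing import Optional
from urllib.request import urlretrieve


def cached_download(url: str, path: Path) -> Optional[str]:
    """Hypothetical helper: return the page text, downloading it only once."""
    if not path.exists():
        # Cache miss: save the page to `path` (this is the only network call).
        urlretrieve(url, path)
    # Optional[str] mirrors the bite's signature: None if nothing was saved.
    return path.read_text() if path.exists() else None


# Demo: the cache file already exists, so no request is made to the URL.
cache = Path(tempfile.mkdtemp()) / "example.html"
cache.write_text("<html><body>cached</body></html>")
text = cached_download("https://example.com/never-fetched", cache)
print(text)  # <html><body>cached</body></html>
```

With the text in hand, `Web.soup` could then build its object with `Soup(text, "html.parser")`.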


Working with type hints in Python:

- __Optional__: `Optional[X]` means the value is either of type `X` or `None`.
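
Here is a small sketch of `Optional` in a return type, using a hypothetical `first_word` function. `Optional[str]` is shorthand for `Union[str, None]`, which is why `File.data` and `Web.data` can legitimately return `None`.

```python
from typing import Optional, Union


def first_word(text: str) -> Optional[str]:
    """Return the first word of `text`, or None if it has no words."""
    words = text.split()
    return words[0] if words else None


# Optional[str] is shorthand for Union[str, None].
assert Optional[str] == Union[str, None]

print(first_word("hello world"))  # hello
print(first_word("   "))          # None
```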


file = File('teste.html')

file.path
# WindowsPath('C:/Users/b5bd/AppData/Local/Temp/2022-04-19_teste.html')

@dataclass
class Web:
    """Web object.

    Web is an object that downloads the page from the url that is passed
    to it and stores it in the File instance that is passed to it. If the
    File already exists, it just reads the file, otherwise it downloads it
    and stores it in File.

    Variables:
        url: str -- The url of the web page.
        file: File -- The File object to store the page data into.

    Methods:
        [property]
        data: -> Optional[str] -- Reads the text from File or retrieves it from the
            web if it does not exist.

        [property]
        soup: -> Soup -- Parses the data from File and turns it into a BeautifulSoup
            object.
    """
    url: str
    file: File

    @property
    def data(self) -> Optional[str]:
        """Reads the data from the File object.

        First it checks if the File object has any data. If it doesn't, it retrieves
        it and saves it to the File. It then reads it from the File and returns it.

        Returns:
            Optional[str] -- The string data from the File object.
        """
        pass

    @property
    def soup(self) -> Soup:
        """Converts string data from File into a BeautifulSoup object.

        Returns:
            Soup -- BeautifulSoup object created from the File.
        """
        pass


@dataclass
class Site(ABC):
    """Site Abstract Base Class.

    Defines the structure for the objects based on this class and defines the interfaces
    that should be implemented in order to work properly.

    Variables:
        web: Web -- The web object stores the information needed to process
            the data.

    Methods:
        find_table: -> str -- Parses the Web object for table elements and
            returns the first one that it finds unless an integer representing
            the required table is passed.

        [abstractmethod]
        parse_rows: -> Union[List[LeaderBoard], List[Poll]] -- Parses a BeautifulSoup
            table element and returns the text found in the td elements as
            namedtuples.

        [abstractmethod]
        polls: -> Union[List[LeaderBoard], List[Poll]] -- Does the parsing of the table
            and rows for you. It takes the table index number if given, otherwise
            parses table 0.

        [abstractmethod]
        stats: -- Formats the results from polls into a more user friendly
            representation.
    """
    web: Web

    def find_table(self, loc: int = 0) -> str:
        """Finds the table elements from the Soup object

        Keyword Arguments:
            loc {int} -- Parses the Web object for table elements and
                returns the first one that it finds unless an integer representing
                the required table is passed. (default: {0})

        Returns:
            str -- The html table
        """
        pass

    @abstractmethod
    def parse_rows(self, table: Soup) -> List[Any]:
        """Abstract Method
        
        Parses the row data from the html table.

        Arguments:
            table {Soup} -- Parses a BeautifulSoup table element and
                returns the text found in the td elements as NamedTuple.

        Returns:
            List[NamedTuple] -- List of NamedTuple that were created from the
                table data.
        """
        pass

    @abstractmethod
    def polls(self, table: int = 0) -> List[Any]:
        """Abstract Method

        Parses the data

        The find_table and parse_rows methods are called for you and the table index
        that is passed to it is used to get the correct table from the soup object.

        Keyword Arguments:
            table {int} -- Does the parsing of the table and rows for you.
                It takes the table index number if given, otherwise parses table 0.
                (default: {0})

        Returns:
            List[NamedTuple] -- List of NamedTuple that were created from the
                table data.
        """
        pass

    @abstractmethod
    def stats(self, loc: int = 0):
        """Abstract Method

        Produces the stats from the polls.

        Keyword Arguments:
            loc {int} -- Index of the table whose polls are formatted into a
                more user-friendly representation. (default: {0})
        """
        pass


@dataclass
class RealClearPolitics(Site):
    """RealClearPolitics object.

    RealClearPolitics is a custom class to parse a Web instance from the
    realclearpolitics website.

    Variables:
        web: Web -- The web object stores the information needed to process
            the data.

    Methods:
        find_table: -> str -- Parses the Web object for table elements and
            returns the first one that it finds unless an integer representing
            the required table is passed.

        parse_rows: -> List[Poll] -- Parses a BeautifulSoup table element and
            returns the text found in the td elements as Poll namedtuples.

        polls: -> List[Poll] -- Does the parsing of the table and rows for you.
            It takes the table index number if given, otherwise parses table 0.

        stats: -- Formats the results from polls into a more user friendly
            representation:

            Example:

            RealClearPolitics
            =================
                Biden: 214.0
              Sanders: 142.0
              Gabbard: 6.0

    """

    web: Web

    def parse_rows(self, table: Soup) -> List[Poll]:
        """Parses the row data from the html table.

        Arguments:
            table {Soup} -- Parses a BeautifulSoup table element and
                returns the text found in the td elements as Poll namedtuples.

        Returns:
            List[Poll] -- List of Poll namedtuples that were created from the
                table data.
        """
        pass

    def polls(self, table: int = 0) -> List[Poll]:
        """Parses the data

        The find_table and parse_rows methods are called for you and the table index
        that is passed to it is used to get the correct table from the soup object.

        Keyword Arguments:
            table {int} -- Does the parsing of the table and rows for you.
                It takes the table index number if given, otherwise parses table 0.
                (default: {0})

        Returns:
            List[Poll] -- List of Poll namedtuples that were created from the
                table data.
        """
        pass

    def stats(self, loc: int = 0):
        """Produces the stats from the polls.

        Keyword Arguments:
            loc {int} -- Index of the table whose polls are formatted into a
                more user-friendly representation. (default: {0})

        """
        pass


@dataclass
class NYTimes(Site):
    """NYTimes object.

    NYTimes is a custom class to parse a Web instance from the nytimes website.

    Variables:
        web: Web -- The web object stores the information needed to process
            the data.

    Methods:
        find_table: -> str -- Parses the Web object for table elements and
            returns the first one that it finds unless an integer representing
            the required table is passed.

        parse_rows: -> List[LeaderBoard] -- Parses a BeautifulSoup table element and
            returns the text found in the td elements as LeaderBoard namedtuples.

        polls: -> List[LeaderBoard] -- Does the parsing of the table and rows for you.
            It takes the table index number if given, otherwise parses table 0.

        stats: -- Formats the results from polls into a more user friendly
            representation:

            Example:

            NYTimes
            =================================

                               Pete Buttigieg
            ---------------------------------
            National Polling Average: 10%
                   Pledged Delegates: 25
            Individual Contributions: $76.2m
                Weekly News Coverage: 3

    """

    web: Web

    def parse_rows(self, table: Soup) -> List[LeaderBoard]:
        """Parses the row data from the html table.

        Arguments:
            table {Soup} -- Parses a BeautifulSoup table element and
                returns the text found in the td elements as LeaderBoard namedtuples.

        Returns:
            List[LeaderBoard] -- List of LeaderBoard namedtuples that were created from
            the table data.
        """
        pass

    def polls(self, table: int = 0) -> List[LeaderBoard]:
        """Parses the data

        The find_table and parse_rows methods are called for you and the table index
        that is passed to it is used to get the correct table from the soup object.

        Keyword Arguments:
            table {int} -- Does the parsing of the table and rows for you.
                It takes the table index number if given, otherwise parses table 0.
                (default: {0})

        Returns:
            List[LeaderBoard] -- List of LeaderBoard namedtuples that were created from
                the table data.
        """
        pass

    def stats(self, loc: int = 0):
        """Produces the stats from the polls.

        Keyword Arguments:
            loc {int} -- Index of the table whose polls are formatted into a
                more user-friendly representation. (default: {0})
        """
        pass


def gather_data():
    rcp_file = File("realclearpolitics.html")
    rcp_url = (
        "https://bites-data.s3.us-east-2.amazonaws.com/2020-03-10_realclearpolitics.html"
    )
    rcp_web = Web(rcp_url, rcp_file)
    rcp = RealClearPolitics(rcp_web)
    rcp.stats(3)

    nyt_file = File("nytimes.html")
    nyt_url = (
        "https://bites-data.s3.us-east-2.amazonaws.com/2020-03-10_nytimes.html"
    )
    nyt_web = Web(nyt_url, nyt_file)
    nyt = NYTimes(nyt_web)
    nyt.stats()


if __name__ == "__main__":
    gather_data()