## HTML Structure

* Each activity (such as a like, dislike, or subscription) is contained in a `div` whose `class` attribute is "content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1".
* Inside this `div`, the activity type (e.g. "Liked", "Subscribed to") comes first, followed by the video or channel title, the channel name, and the date.
* The video/channel title and the channel name are wrapped in `<a>` tags whose `href` attributes contain the URLs.
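To make this structure concrete, here is a minimal sketch that parses a single activity cell with Python's standard-library `html.parser`, used here instead of BeautifulSoup so it runs without extra packages. The sample markup, titles, URLs and date format are hypothetical, modelled on the description above rather than copied from a real export, and `ActivityEntryParser` is an illustrative name.

```python
from html.parser import HTMLParser

# Hypothetical sample cell: titles, channel, URLs and date are invented
# to mirror the structure described above, not taken from a real export.
SAMPLE_ENTRY = (
    '<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">'
    'Liked <a href="https://www.youtube.com/watch?v=abc123">Example video</a><br>'
    '<a href="https://www.youtube.com/channel/UC_example">Example channel</a><br>'
    'Jun 1, 2024, 10:00:00 AM BST'
    '</div>'
)


class ActivityEntryParser(HTMLParser):
    """Collects bare text fragments and (text, href) links inside one cell."""

    def __init__(self):
        super().__init__()
        self.texts = []  # text outside <a> tags, in document order
        self.links = []  # (link text, href) pairs
        self._current_href = None  # href of the <a> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace-only fragments
        if self._current_href is not None:
            self.links.append((text, self._current_href))
        else:
            self.texts.append(text)


parser = ActivityEntryParser()
parser.feed(SAMPLE_ENTRY)
parser.close()

activity_type, date = parser.texts[0], parser.texts[-1]
print(activity_type)  # Liked
print(parser.links)   # title link first, then channel link
print(date)
```

The first bare text node is the activity type and the last is the date, while the two links carry the title and channel URLs.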

## `div` Elements

* In HTML, `div` elements are generic containers used to group content into sections. Here, each `div` holds a single activity entry.
* The classes applied to these `div`s (`mdl-cell`, `mdl-cell--6-col`, etc.) suggest that the page is styled with the Material Design Lite (MDL) framework.
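Because `class` holds a whitespace-separated list, the attribute value above is really four separate classes. The sketch below uses two hypothetical helpers, `class_tokens` and `has_classes`, to contrast exact-string comparison with an order-insensitive token check. BeautifulSoup's `class_` argument, when given a string containing spaces, matches the exact attribute string, so the classes must appear in that order; a token check, or a CSS selector such as `soup.select("div.content-cell.mdl-typography--body-1")`, does not depend on order.

```python
# Class attribute value used by the activity cells, as one string.
CLASS_ATTR = "content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1"


def class_tokens(class_attr: str) -> set[str]:
    """Split a class attribute into its individual class names."""
    return set(class_attr.split())


def has_classes(class_attr: str, required: str) -> bool:
    """Return True if every class in `required` is present, in any order."""
    return class_tokens(required) <= class_tokens(class_attr)


print(sorted(class_tokens(CLASS_ATTR)))

# Same classes in a different order: exact comparison fails, token check passes.
reordered = "mdl-cell content-cell mdl-typography--body-1 mdl-cell--6-col"
print(reordered == CLASS_ATTR)             # False
print(has_classes(reordered, CLASS_ATTR))  # True
```

The trade-off is that a token check also matches cells carrying extra classes, so it can select more elements than the exact-string match.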


In [1]:
from collections import Counter

from utility.logging import setup_logging
from constants import FileDirectory, Google
from utility.file_manager import FileManager

logger = setup_logging()

In [4]:
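def analyse_activity_types(soup):
    """Count the user's activities by type and return a dict mapping each activity type to its count."""
    activity_elements = soup.find_all(
        "div", class_="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1"
    )
    activity_types = Counter()

    for element in activity_elements:
        # The activity type ("Liked", "Subscribed to", ...) is the cell's first text node.
        # Skip empty cells and cells whose first child is a tag rather than text.
        first_child = element.contents[0] if element.contents else None
        if isinstance(first_child, str) and first_child.strip():
            activity_types[first_child.strip()] += 1

    return dict(activity_types)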
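For readers without BeautifulSoup at hand, the same counting idea can be sketched with the standard library: count the leading text of each target `div`, then rank the results with `Counter.most_common()`. This is an illustrative re-implementation, not the notebook's code; `ActivityTypeCounter` and the sample page are hypothetical, and it assumes the activity type is the first text node before any child tag.

```python
from collections import Counter
from html.parser import HTMLParser

# Exact class string used by the notebook's find_all call.
TARGET_CLASS = "content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1"


class ActivityTypeCounter(HTMLParser):
    """Counts the leading text of each target div (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._awaiting_type = False  # True between a target <div> and its first text

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("class") == TARGET_CLASS:
            self._awaiting_type = True
        else:
            # A tag before any text means the cell does not start with an
            # activity-type string, so nothing is counted for it.
            self._awaiting_type = False

    def handle_data(self, data):
        if self._awaiting_type:
            text = data.strip()
            if text:
                self.counts[text] += 1
            self._awaiting_type = False


# Hypothetical page with five activity cells (titles and dates are placeholders).
SAMPLE_PAGE = "".join(
    f'<div class="{TARGET_CLASS}">{kind} <a href="#">Title</a><br>Date</div>'
    for kind in ["Liked", "Liked", "Subscribed to", "Disliked", "Liked"]
)

counter = ActivityTypeCounter()
counter.feed(SAMPLE_PAGE)
counter.close()
print(counter.counts.most_common())
# [('Liked', 3), ('Subscribed to', 1), ('Disliked', 1)]
```

`most_common()` orders ties by first appearance, which is why "Subscribed to" precedes "Disliked" here.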

In [5]:
file_manager = FileManager()
soup = file_manager.load_file(FileDirectory.MANUAL_EXPORT_PATH, Google.RAW_HTML_DATA)

activity_types = analyse_activity_types(soup)
logger.info(f"Activity types found: {activity_types}")

[2024-06-28 12:19:30] [INFO]  Successfully loaded file from /Users/hadid/Library/Mobile Documents/com~apple~CloudDocs/Shared/ETL/MyActivity.html
[2024-06-28 12:19:30] [INFO]  Activity types found: {'Liked': 9568, 'Subscribed to': 668, 'Disliked': 265, 'Voted on': 194, 'Saved': 2, 'Voted on a post that is no longer available': 2}
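The logged counts can be turned into proportions with a few lines of arithmetic. The dictionary below is copied from the log output above; the name `reported_counts` is arbitrary. Likes dominate, at roughly 89% of the 10,699 logged activities.

```python
# Counts copied verbatim from the "Activity types found" log line.
reported_counts = {
    "Liked": 9568,
    "Subscribed to": 668,
    "Disliked": 265,
    "Voted on": 194,
    "Saved": 2,
    "Voted on a post that is no longer available": 2,
}

total = sum(reported_counts.values())

# Print each activity type, largest first, with its share of all activities.
for activity, count in sorted(reported_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{activity:<45} {count:>6}  {count / total:6.1%}")
print(f"{'Total':<45} {total:>6}")
```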
