# G.2 - Additional Explorations - Validation of Goals against Top 101

This notebook contains further explorations of our network and data. Here, we assess the validity of our sample by comparing the 101 most popular goals in our network (measured by the 'want to do it' count) against the website's own list of its 101 most popular goals, which is ranked by the same measure. The conclusions drawn from this analysis are used in the discussion section of our paper.

In [1]:
import pandas as pd
import networkx as nx
import requests as rq
import json
import re
import pickle
from pathlib import Path
import urllib.parse
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

## Retrieving the Top 101 Most Popular Goals by 'Want to Do'

To scrape this part of the website, we need to log in again.

In [3]:
# creating cookies needed
driver = webdriver.Chrome()
driver.get("https://dayzeroproject.com")

# pause here so you can log in manually
input("Log in in the browser, then press Enter here...")

# after login, save cookies
cookies = driver.get_cookies()
cookie_file = Path("cookies.json")
cookie_file.write_text(json.dumps(cookies, indent=2))

driver.quit()
print("Saved cookies to cookies.json")

Log in in the browser, then press Enter here... 


Saved cookies to cookies.json


After creating and saving the cookies to simulate a login, we can use Selenium to scrape the page we need.

In [4]:
# -----------------------------
# 1. Setup Selenium
# -----------------------------
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=chrome_options)

# -----------------------------
# 2. Go to the site first
# -----------------------------
driver.get("https://dayzeroproject.com")

# -----------------------------
# 3. Add cookies for login
# -----------------------------
# path to your saved cookie file
cookie_file = Path("cookies.json")  # adjust path if needed

# load cookies from JSON file
with cookie_file.open("r", encoding="utf-8") as f:
    cookies = json.load(f)

# add cookies to the Selenium driver
for cookie in cookies:
    driver.add_cookie(cookie)

# Refresh to apply cookies
driver.refresh()
time.sleep(2)


# -----------------------------
# 4. Go to the Top 101 goals page
# -----------------------------
driver.get("https://dayzeroproject.com/top101/todo")
time.sleep(3)

# -----------------------------
# 5. Extract ONLY goals in <span class="top101-goalname2">
# -----------------------------
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # the HTML is captured, so the browser can be closed

goal_spans = soup.select("span.top101-goalname2")

top101_goals = []
for span in goal_spans:
    link_el = span.find("a")
    if not link_el:
        continue

    title = link_el.get_text(strip=True)

    href = link_el.get("href", "")              # e.g. "/goal/ZASkQdso"
    goal_id = href.split("/")[-1] if href else None

    top101_goals.append({
        "title": title,
        "goal_id": goal_id
    })

print(f"Collected {len(top101_goals)} items.")


Collected 101 items.
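The goal ID above is taken as the last piece of the link after splitting on `/`. That works for links shaped like `/goal/ZASkQdso`, but it would return an empty string if a link ever ended in a slash or carried a query string. The following is a minimal sketch of a more defensive variant, using `urllib.parse`, which the notebook already imports. The helper name `goal_id_from_href` and the trailing-slash and query-string inputs are hypothetical illustrations; we have not observed such links on the page.

```python
from urllib.parse import urlparse


def goal_id_from_href(href):
    """Return the last non-empty path segment of a goal link, or None.

    Hypothetical helper: it assumes goal links look like "/goal/<id>",
    optionally followed by a trailing slash, query string, or fragment.
    """
    if not href:
        return None
    path = urlparse(href).path  # keeps only the path, dropping ?query and #fragment
    segments = [segment for segment in path.split("/") if segment]
    return segments[-1] if segments else None


# The first input follows the link format seen on the page;
# the second is an assumed variant used only to show the difference.
print(goal_id_from_href("/goal/ZASkQdso"))
print(goal_id_from_href("/goal/ZASkQdso/?ref=top101"))
```

In the extraction loop, this could replace the `href.split("/")[-1]` expression without changing the result for the links collected here.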


Next, we assign each scraped goal a rank based on its position on the page, so that we can later compare it with the ranking in our network.

In [6]:
for idx, item in enumerate(top101_goals, start=1):
    item["rank"] = idx

## Comparing This to the Most Popular Goals (most want to do) in Our Network

In [None]:
url = "https://raw.githubusercontent.com/nicosrp/The-Architecture-of-Aspiration-A-Network-Perspective-on-Human-Goals/main/Networks/dayzero_network.pkl"
response = rq.get(url)
response.raise_for_status()  # stop here if the download failed
G = pickle.loads(response.content)

In [9]:
len(G.nodes())

2890

In [13]:
# Collect (node_id, wants_to_do, title) for all nodes
nodes_data = []
for n in G.nodes():
    wants = G.nodes[n].get("wants_to_do", 0)
    title = G.nodes[n].get("title", None)
    nodes_data.append((n, wants, title))

# Sort by wants_to_do descending
sorted_nodes = sorted(nodes_data, key=lambda x: x[1], reverse=True)

# Take top 101 and add rank
top101_goals_network = []
for idx, (goal_id, wants, title) in enumerate(sorted_nodes[:101], start=1):
    top101_goals_network.append({
        "rank": idx,
        "goal_id": goal_id,
        "title": title,
        "wants_to_do": wants
    })

To compare the two ranked lists, we place them side by side in a DataFrame and export it to Excel for manual investigation.

In [17]:
import pandas as pd

# Convert each list of dicts to a DataFrame
df_network = pd.DataFrame(top101_goals_network)
df_scrape = pd.DataFrame(top101_goals)

# Add prefixes so columns are easy to distinguish
df_network = df_network.add_prefix("network_")
df_scrape = df_scrape.add_prefix("scrape_")

# Concatenate side by side
df_side_by_side = pd.concat([df_network, df_scrape], axis=1)

# Export to Excel
df_side_by_side.to_excel("top101_side_by_side2.xlsx", index=False)

df_side_by_side.head()



| | network_rank | network_goal_id | network_title | network_wants_to_do | scrape_title | scrape_goal_id | scrape_rank |
|---|---|---|---|---|---|---|---|
| 0 | 1 | ZASkQdso | Donate blood | 12150 | Donate blood | ZASkQdso | 1 |
| 1 | 2 | CsM74nCv | Get a tattoo | 10458 | Get a tattoo | CsM74nCv | 2 |
| 2 | 3 | nnEnjr7O | Leave an inspirational note inside a book for ... | 9856 | Answer the "50 Questions That Will Free Your M... | CgdCnAZ5 | 3 |
| 3 | 4 | AkkEn2a2 | See the Northern Lights | 9451 | Write a letter to myself to open in 10 years | Ar7Czx43 | 4 |
| 4 | 5 | dMFs7rSe | Make a new friend | 7325 | Sleep under the stars | 2xEGAx77 | 5 |


Based on a manual review of the Excel file, we conclude that although only around 36% of the goals in the website’s top 101 list are found in our network, the two most popular goals are identical in both lists, and all of our network’s top ten goals appear within the website’s top 101. Our sample is therefore not perfectly representative, but the alignment at the top of the ranking is sufficient to indicate that it still captures the platform’s most salient goals, which supports its use as a broadly valid representation of the website’s overall goal landscape.
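The comparison above was carried out by hand in the spreadsheet. As a rough cross-check, the same kind of figures could be computed by matching the two lists on `goal_id`. The sketch below is not the procedure used for the paper; the function name `compare_top_lists` and the demo IDs (`g1`, `g2`, ...) are hypothetical, and it assumes each list is a list of dicts with `goal_id` and `rank` keys, as built earlier in this notebook.

```python
def compare_top_lists(network, scraped, top_k=10):
    """Summarise how two ranked goal lists overlap, matching on goal_id.

    Hypothetical helper: both inputs are lists of dicts that contain
    at least the keys "goal_id" and "rank".
    """
    network_rank = {goal["goal_id"]: goal["rank"] for goal in network}
    scraped_rank = {goal["goal_id"]: goal["rank"] for goal in scraped}
    shared = network_rank.keys() & scraped_rank.keys()

    # Share of the scraped list that also appears in the network list
    share = len(shared) / len(scraped_rank) if scraped_rank else 0.0

    # Network goals ranked within top_k that are absent from the scraped list
    missing_top_k = [
        goal["goal_id"]
        for goal in network
        if goal["rank"] <= top_k and goal["goal_id"] not in scraped_rank
    ]

    # (network rank, scraped rank, goal_id) for shared goals, by network rank
    rank_pairs = sorted(
        (network_rank[gid], scraped_rank[gid], gid) for gid in shared
    )

    return {
        "n_shared": len(shared),
        "share_of_scraped": share,
        "missing_top_k": missing_top_k,
        "rank_pairs": rank_pairs,
    }


# Toy input with made-up IDs, only to show the shape of the result
network_demo = [
    {"goal_id": "g1", "rank": 1},
    {"goal_id": "g2", "rank": 2},
    {"goal_id": "g3", "rank": 3},
]
scraped_demo = [
    {"goal_id": "g1", "rank": 1},
    {"goal_id": "g3", "rank": 2},
    {"goal_id": "g9", "rank": 3},
    {"goal_id": "g8", "rank": 4},
]
summary = compare_top_lists(network_demo, scraped_demo, top_k=2)
print(summary)
```

With the notebook's own variables, the call would be `compare_top_lists(top101_goals_network, top101_goals)`. Matching on `goal_id` rather than on title avoids mismatches caused by truncated or slightly reworded titles.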