# presidents

This dataset comes from the [Sleep and Dream Database (SDDb)](https://sleepanddreamdatabase.org). It merges multiple datasets collected through YouGov surveys that asked respondents about their dreams of US presidents or presidential candidates. See the krank [GitHub Issue](https://github.com/remrama/krank/issues/) and the [presidents docs page](https://remrama.github.io/krank/corpora/presidents) for more info.

The SDDb source file comes from a [Zenodo archive](https://doi.org/10.5281/zenodo.18076716). See the description there for details on how the data was initially collected.

## Important context

- Dream reports for dreams about specific presidents.
- Collected from a YouGov survey.

## Known issues

- Duplicated participant IDs are treated as separate participants.
- Some entries may not be dream reports at all (for example, non-responses or political opinions).
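
Both issues could be inspected with a sketch like the one below. It assumes the cleaned `author` and `report` columns produced later in this notebook; the toy rows and the five-word threshold are made up for illustration and are not part of the original pipeline.

```python
import pandas as pd

# Toy rows standing in for the cleaned table (hypothetical values).
toy = pd.DataFrame(
    {
        "author": ["a:1", "a:1", "a:2", "a:3"],
        "report": [
            "I dreamt the president visited my school and gave a speech.",
            "We were on a boat together and he steered it.",
            "None",
            "I never dream about politicians.",
        ],
    }
)

# Participant IDs that appear on more than one row.
dup_authors = toy.loc[toy["author"].duplicated(keep=False), "author"].unique().tolist()

# Crude flag for likely non-reports: very short responses (arbitrary threshold).
MIN_WORDS = 5
n_words = toy["report"].str.split().str.len()
short_reports = toy[n_words < MIN_WORDS]
```

A word-count filter only catches terse non-responses; political opinions written at length would still need manual review.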

## Setup

### Load required packages

In [4]:
import hashlib
import os
import re
import sys
import textwrap
from copy import copy
from datetime import datetime, timezone

import pandas as pd
import pooch

from IPython.display import display, IFrame

sys.path.append(os.path.abspath(".."))
import utils_dreambank

Identify (most) constants.

Identify the SDDb dataset IDs of the datasets that will be included.

In [25]:
DATASETS_IDS = [
    "Barack Obama Dreams 2008",
    "Hillary Clinton Dreams 2008",
    "Trump Dreams 2016-2017",
]

In [6]:
CACHE_DIR = pooch.os_cache("pooch").joinpath("krank").joinpath("sddb")
DOWNLOAD_URL = "https://zenodo.org/records/18076716/files/dream_search_2025-12-28T15_50_30.592Z.csv?download=1"
KNOWN_HASH = "md5:4ecfb8cd2a83eabe75d2b6537ca846b6"
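
The `md5:` prefix in `KNOWN_HASH` tells pooch which algorithm to use when verifying the download. As a rough illustration of what that check amounts to, the sketch below hashes a file with `hashlib`; the helper name `md5_of_file` and the temporary demo file are assumptions for this example only.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return an 'md5:<hex>' string for a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "md5:" + digest.hexdigest()


# Demo on a throwaway file rather than the real SDDb download.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_bytes(b"hello")
    demo_hash = md5_of_file(demo)
```

Comparing `md5_of_file(fname)` against `KNOWN_HASH` would confirm that a manually downloaded copy matches the archived file.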

## Load

### Download source data (all of SDDb)

Use pooch to download and cache the source SDDb file; `pooch.retrieve` returns the path to the cached file.

In [9]:
# Download on first run, then reuse the cached copy (pooch verifies the hash).
fname = pooch.retrieve(url=DOWNLOAD_URL, known_hash=KNOWN_HASH, path=CACHE_DIR)

Use pandas to read in the full SDDb file.

In [None]:
df = pd.read_csv(fname, parse_dates=["Dream Date"], encoding="utf-8")


Keep only the rows that belong to the datasets listed in `DATASETS_IDS`.

In [26]:
df = df[df["Survey Name"].isin(DATASETS_IDS)]
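
Because `isin` silently ignores names that match nothing, a misspelled survey name would drop a whole dataset without any error. A minimal sanity check, shown here on toy data with a hypothetical `wanted` list standing in for `DATASETS_IDS`, might look like this:

```python
import pandas as pd

# Stand-in for DATASETS_IDS.
wanted = [
    "Barack Obama Dreams 2008",
    "Hillary Clinton Dreams 2008",
    "Trump Dreams 2016-2017",
]

# Toy frame in which one of the wanted surveys is absent.
toy = pd.DataFrame(
    {
        "Survey Name": [
            "Barack Obama Dreams 2008",
            "Trump Dreams 2016-2017",
            "Some Other Survey",
        ]
    }
)

subset = toy[toy["Survey Name"].isin(wanted)]

# Any requested name that matched no rows ends up here.
missing = sorted(set(wanted) - set(subset["Survey Name"]))
```

An empty `missing` list on the real data would indicate that every requested dataset was found.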

Drop the columns that are not needed here (see the SDDb [prepare.ipynb](../collections/sddb/prepare.ipynb) file for details).

Note that the `Dream Date` column does vary within these datasets, but it most likely records when the survey was submitted rather than when the dream occurred, so it is not worth keeping.
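
One way to look at that variation before dropping the column is to summarize the date range per survey. The sketch below uses made-up dates in a toy frame; only the column names `Survey Name` and `Dream Date` come from the source file.

```python
import pandas as pd

# Toy data with hypothetical dates.
toy = pd.DataFrame(
    {
        "Survey Name": ["A", "A", "B"],
        "Dream Date": pd.to_datetime(["2008-01-05", "2008-02-10", "2016-11-09"]),
    }
)

# Earliest date, latest date, and number of distinct dates per survey.
date_ranges = toy.groupby("Survey Name")["Dream Date"].agg(["min", "max", "nunique"])
```

Dates clustered in a narrow window around each survey's fielding period would be consistent with submission timestamps rather than dream dates.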

In [None]:
df = df.drop(
    columns=[
        "Dream Report ID",
        "Title",
        "Gender",
        "Age",
        "Word Count",
        "Dream Date",
        "Survey ID",
        "Question",
        "Categories",
    ],
)

In [None]:
df = df.rename(
    columns={
        "Survey Name": "dataset",
        "Participant ID": "author",
        "Dream Text": "report",
    },
)
df = df.reindex(columns=["dataset", "author", "report"])

In [31]:
display(df)

| | dataset | author | report |
|---|---|---|---|
| 9620 | Trump Dreams 2016-2017 | td2017:td156 | I dreamt that I was hosting a party/conference... |
| 9621 | Trump Dreams 2016-2017 | td2017:td157 | Donald Trump is here. My impression is that I ... |
| 9622 | Trump Dreams 2016-2017 | td2017:td158 | I was walking down an avenue in Midtown Manhat... |
| 9623 | Trump Dreams 2016-2017 | td2017:td159 | I was sitting in a bench in Manhattan on a par... |
| 9624 | Trump Dreams 2016-2017 | td2017:td160 | I had a dream where he became a professor at m... |
| ... | ... | ... | ... |
| 42011 | Hillary Clinton Dreams 2008 | dreambank_hilary:9 | I was sitting in a church, listening while the... |
| 42012 | Hillary Clinton Dreams 2008 | dreambank_hilary:10 | I was at a seedy mall with a friend. These men... |
| 42013 | Hillary Clinton Dreams 2008 | dreambank_hilary:11 | Hillary Clinton and George W. Bush were having... |
| 42014 | Hillary Clinton Dreams 2008 | dreambank_hilary:12 | I had an awful dream last night that Hillary C... |
