# Scraping former President Trump's 'desk'

### Import Python tools and Jupyter configuration

In [1]:
%load_ext lab_black

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import datetime as dt
import tweepy

In [3]:
import altair as alt
import altair_latimes as lat
import matplotlib.pyplot as plt

In [4]:
alt.themes.register("latimes", lat.theme)
alt.themes.enable("latimes")

ThemeRegistry.enable('latimes')

In [5]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
alt.data_transformers.disable_max_rows()
pd.options.display.max_colwidth = None

---

### Read the page

In [6]:
r = requests.get("https://www.donaldjtrump.com/desk")
soup = BeautifulSoup(r.text, "html.parser")

### Grab everything from each post div

In [7]:
rows = soup.find_all("div", class_="ftdli-main ftd-d")

In [8]:
data = []
for row in rows:  # "row" keeps the loop from overwriting the response object r
    if row.find("img") is not None:
        image = row.find("img")["src"]
    else:
        image = ""
    post_url = row.find("div", class_="title ftd-d").get("onclick")
    post = row.find("p", class_="ftd-post-text").text
    author = row.find("h2").text
    date = row.find("div", class_="date ftd-d").text
    data.append(
        dict(
            date=date,
            url=post_url,
            author=author,
            post=post,
            image=image,
        )
    )

### First item from the list of post dictionaries

In [9]:
data[0]

{'date': '\n1:41pm May 7, 2021\n',
 'url': "location.href='/desk/desk-z9jmnjza6f/';",
 'author': 'Donald J. Trump',
 'post': 'Josh Hawley, our fantastic Senator from the beautiful and great State of Missouri, has a fantastic new book, just out this week, about the terrible Big Tech companies and their attempt to ruin our Country. It’s called The Tyranny of Big Tech—it has my Full and Complete Endorsement. Buy it now!',
 'image': ''}

### Clean up before importing as a dataframe

In [10]:
for d in data:
    d["date"] = d["date"].replace("\n", "")
    d["url"] = (
        d["url"]
        .replace("location.href='", "https://www.donaldjtrump.com")
        .replace("/';", "")
    )

In [11]:
src = pd.DataFrame(data)
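
The chained `.replace()` calls in the cleanup step depend on the `onclick` value having exactly the shape `location.href='/desk/.../';`. As a hedged alternative, a regular expression can pull the path out and return an empty string when the attribute is missing or shaped differently. The names `BASE_URL`, `ONCLICK_PATH` and `onclick_to_url` are hypothetical and are not part of the original notebook.

```python
import re

# Assumption: same base URL the notebook prepends in its cleanup step.
BASE_URL = "https://www.donaldjtrump.com"

# Capture whatever sits between the single quotes after location.href=
ONCLICK_PATH = re.compile(r"location\.href='([^']+)'")


def onclick_to_url(onclick, base_url=BASE_URL):
    """Return the absolute post URL, or "" if the onclick value does not match."""
    match = ONCLICK_PATH.search(onclick or "")
    if match is None:
        return ""
    # Drop the trailing slash so the result matches the notebook's URLs.
    return base_url + match.group(1).rstrip("/")


print(onclick_to_url("location.href='/desk/desk-z9jmnjza6f/';"))
# https://www.donaldjtrump.com/desk/desk-z9jmnjza6f
```

Returning an empty string rather than raising keeps one malformed post from stopping the whole scrape, although such rows would then need to be filtered out before deduplicating on `url`.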

---

### Pull in early posts

In [12]:
archive_df = pd.read_csv("input/archive.csv")

In [13]:
archive_df.drop(["video"], axis=1, inplace=True)

In [14]:
df = pd.concat([src, archive_df]).drop_duplicates(subset="url", keep="first")

### How many posts total? 

In [15]:
len(df)

49

### Five most recent posts

In [16]:
df.head(5)

index,date,url,author,post,image
0,"1:41pm May 7, 2021",https://www.donaldjtrump.com/desk/desk-z9jmnjza6f,Donald J. Trump,"Josh Hawley, our fantastic Senator from the beautiful and great State of Missouri, has a fantastic new book, just out this week, about the terrible Big Tech companies and their attempt to ruin our Country. It’s called The Tyranny of Big Tech—it has my Full and Complete Endorsement. Buy it now!",
1,"10:11am May 7, 2021",https://www.donaldjtrump.com/desk/desk-sv5wzzsjev,Donald J. Trump,"At 6:31 in the morning on November 4th, a dump of 149,772 votes came in to the State of Michigan. Biden received 96% of those votes and the State miraculously went to him. Has the Michigan State Senate started their review of the Fraudulent Presidential Election of 2020 yet, or are they about to start? If not, they should be run out of office. Likewise, at 3:42 in the morning, a dump of 143,379 votes came in to the state of Wisconsin, also miraculously, given to Biden. Where did these “votes” come from? Both were State Election changing events, and that is on top of the other corruption without even including the fact that neither state got Legislative approval, which is required under the United States Constitution.",
2,"9:03am May 7, 2021",https://www.donaldjtrump.com/desk/desk-w2ur8p3wzk,Donald J. Trump,"The Federal Election Commission in Washington, D.C., has totally dropped the phony case against me concerning payments to women relative to the 2016 Presidential Election. It was a case built on lies from Michael Cohen, a corrupt and convicted lawyer, a lawyer in fact who was so corrupt he was sentenced to three years in jail for lying to Congress and many other things having nothing to do with me. I thank the Commission for their decision, ending this chapter of Fake News. Between two sleazebag lawyers, Michael Avenatti and Michael Cohen, we were all able to witness law and justice in our Country at its lowest!",
3,"11:57am May 6, 2021",https://www.donaldjtrump.com/desk/desk-hgnfasttbf,Donald J. Trump,"The Fake News Media, working in close conjunction with Big Tech and the Radical Left Democrats, is doing everything they can to perpetuate the term “The Big Lie” when speaking of 2020 Presidential Election Fraud. They are right in that the 2020 Presidential Election was a Big Lie, but not in the way they mean. The 2020 Election, which didn’t even have Legislative approvals from many States (which is required under the U.S. Constitution), and was also otherwise corrupt, was indeed The Big Lie. So when they try to sell the American people the term The Big Lie, which they do in unison and coordination, think of it instead as the greatest Fraud in the history of our Country! An even greater Hoax than Russia, Russia, Russia, Mueller, Mueller, Mueller, Impeachment Hoax #1, Impeachment Hoax #2, or any of the other many scams the Democrats pulled!",
4,"9:48am May 6, 2021",https://www.donaldjtrump.com/desk/desk-dqvrd5gscw,Donald J. Trump,"Congratulations to the great Patriots of Windham, New Hampshire for their incredible fight to seek out the truth on the massive Election Fraud which took place in New Hampshire and the 2020 Presidential Election. The spirit for transparency and justice is being displayed all over the Country by media outlets which do not represent Fake News. People are watching in droves as these Patriots work tirelessly to reveal the real facts of the most tainted and corrupt Election in American history. Congratulations Windham—look forward to seeing the results.",


### How many mention 'election'?

In [17]:
# One regex covers both spellings the original two-part filter matched
election = df[df["post"].str.contains("election|Election")]

In [18]:
len(election)

28
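
The filter above matches only the spellings `election` and `Election`, so an all-caps `ELECTION` would be missed. A case-insensitive match is one way to close that gap. The sketch below uses plain Python on made-up sample strings; `ELECTION`, `count_mentions` and `sample_posts` are hypothetical names. In pandas, passing `case=False` to `str.contains` should have the same effect.

```python
import re

# Hypothetical pattern: any capitalization of "election", including "ELECTION".
ELECTION = re.compile(r"election", re.IGNORECASE)


def count_mentions(posts, pattern=ELECTION):
    """Count the posts whose text matches the pattern at least once."""
    return sum(1 for post in posts if pattern.search(post))


# Made-up strings for illustration, not the scraped data.
sample_posts = [
    "The 2020 Presidential Election",
    "election results",
    "ELECTION FRAUD",
    "A new book, just out this week",
]
print(count_mentions(sample_posts))  # 3
```

Like the original filter, this is a substring match, so words such as "Elections" also count.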

### Clean up the dates

In [19]:
df["fulldate"] = pd.to_datetime(df["date"])
df["date"] = df["fulldate"].dt.date
df["time"] = df["fulldate"].dt.time
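
`pd.to_datetime` infers the layout of strings such as `1:41pm May 7, 2021` on its own. Spelling the layout out makes an unexpected value fail loudly instead of being guessed at. The sketch below does this with the standard library; `DESK_FORMAT` and `parse_desk_date` are hypothetical names, and the format string assumes English month names and am/pm markers (the default C locale). The same format string could be passed to `pd.to_datetime` through its `format` argument.

```python
import datetime as dt

# Assumed layout of the scraped strings, e.g. "1:41pm May 7, 2021".
DESK_FORMAT = "%I:%M%p %B %d, %Y"


def parse_desk_date(raw):
    """Parse one scraped timestamp; raises ValueError if the layout differs."""
    # strip() removes the newlines that surround the raw scraped value.
    return dt.datetime.strptime(raw.strip(), DESK_FORMAT)


parsed = parse_desk_date("\n1:41pm May 7, 2021\n")
print(parsed.isoformat())  # 2021-05-07T13:41:00
```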

In [20]:
post_urls = list(df["url"])

---

### Posts per day 

In [21]:
daily = df.groupby(["date"])["author"].count().reset_index(name="count")

In [22]:
daily["date"] = pd.to_datetime(daily["date"])

In [23]:
daily.sort_values("count", ascending=False).head()

index,date,count
23,2021-05-03,4
25,2021-05-05,4
27,2021-05-07,3
19,2021-04-27,3
8,2021-04-07,3


### Chart it!

In [24]:
lines = (
    alt.Chart(
        daily,
        title="Trump posts to the 'desk' since it launched",
    )
    .mark_bar(size=10)
    .encode(
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
        y=alt.Y(
            "count:Q",
            scale=alt.Scale(domain=(0, 5)),
            axis=alt.Axis(
                gridColor="#dddddd",
                offset=6,
                tickSize=0,
                domainOpacity=0,
                tickCount=3,
                title="Daily post count and mean",
            ),
        ),
    )
)

rule = alt.Chart(daily).mark_rule(color="red").encode(y="mean(count):Q")

# Label the rule with the mean value. TODO: prefix the label with "Average: "
text = rule.mark_text(
    align="center",
    baseline="middle",
    dx=220,
    dy=10,
    fontWeight="bold",
).encode(text=alt.Text("mean(count):Q", format=".2"))

(lines + rule + text).properties(height=350, width=600).configure_view(strokeOpacity=0)

In [25]:
(lines + rule + text).properties(height=350, width=600).configure_view(
    strokeOpacity=0
).save("visuals/daily_posts.png")
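
Because `daily` comes from a `groupby`, it has rows only for dates with at least one post, so the red `mean(count)` rule averages over active days rather than over every calendar day. The plain-Python sketch below contrasts the two averages on made-up dates; `daily_means` and the sample values are hypothetical and are not taken from the scraped data.

```python
import datetime as dt
from collections import Counter


def daily_means(post_dates):
    """Return (mean over days with posts, mean over every calendar day in range)."""
    counts = Counter(post_dates)
    total_posts = sum(counts.values())
    # Inclusive number of calendar days between the first and last post.
    span_days = (max(counts) - min(counts)).days + 1
    return total_posts / len(counts), total_posts / span_days


# Made-up sample: 4 posts on May 3, none on May 4, 2 posts on May 5.
sample = [dt.date(2021, 5, 3)] * 4 + [dt.date(2021, 5, 5)] * 2
active_mean, calendar_mean = daily_means(sample)
print(active_mean, calendar_mean)  # 3.0 2.0
```

Which average fits the chart depends on the question being asked: posts per active day, or posts per day since launch.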

---

### Exports

In [26]:
today = dt.date.today().strftime("%m-%d-%Y")

In [27]:
df.to_csv(f"archive/posts_{today}.csv", index=False)
df.to_csv("output/allposts.csv", index=False)