# Scraping former President Trump's 'desk'

### Import Python tools and Jupyter configuration

In [1]:
%load_ext lab_black

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import datetime as dt
import tweepy

In [3]:
import altair as alt
import altair_latimes as lat
import matplotlib.pyplot as plt

In [4]:
alt.themes.register("latimes", lat.theme)
alt.themes.enable("latimes")

ThemeRegistry.enable('latimes')

In [5]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
alt.data_transformers.disable_max_rows()
pd.options.display.max_colwidth = None

---

### Read the page

In [6]:
r = requests.get("https://www.donaldjtrump.com/desk")
soup = BeautifulSoup(r.text, "html.parser")

### Grab everything from each post div

In [7]:
rows = soup.find_all("div", class_="ftdli-main ftd-d")

In [8]:
data = []
for row in rows:
    # Use the image URL when the post has one; otherwise leave it blank
    img = row.find("img")
    image = img["src"] if img is not None else ""
    post_url = row.find("div", class_="title ftd-d").get("onclick")
    post = row.find("p", class_="ftd-post-text").text
    author = row.find("h2").text
    date = row.find("div", class_="date ftd-d").text
    data.append(
        dict(
            date=date,
            url=post_url,
            author=author,
            post=post,
            image=image,
        )
    )
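The loop above assumes every post div contains each of the tags it looks up; a missing tag would make `.find(...)` return `None` and the `.text` access fail. A minimal sketch of a more forgiving lookup follows. The helper name `text_or_blank` and the sample markup are hypothetical, written only to mirror the class names used in this notebook.

```python
from bs4 import BeautifulSoup


def text_or_blank(parent, name, class_name=None):
    # Return the stripped text of the first matching tag, or "" when it is absent
    if class_name is None:
        tag = parent.find(name)
    else:
        tag = parent.find(name, class_=class_name)
    return tag.get_text(strip=True) if tag is not None else ""


# Hypothetical markup: a post div with a date and an author but no post text
sample_html = """
<div class="ftdli-main ftd-d">
  <div class="title ftd-d" onclick="location.href='/desk/desk-example/';">
    <h2>Donald J. Trump</h2>
  </div>
  <div class="date ftd-d">
7:11pm May 16, 2021
</div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_row = sample_soup.find("div", class_="ftdli-main ftd-d")

sample_post = text_or_blank(sample_row, "p", "ftd-post-text")  # tag is missing
sample_date = text_or_blank(sample_row, "div", "date ftd-d")
sample_author = text_or_blank(sample_row, "h2")
```

Because the helper strips whitespace, the newline cleanup on the date field would no longer be needed if it were used in the loop.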

### First item in the list of dictionaries

In [9]:
data[0]

{'date': '\n7:11pm May 16, 2021\n',
 'url': "location.href='/desk/desk-q7wsgfnztk/';",
 'author': 'Donald J. Trump',
 'post': 'Breaking News! New polling by CBS News on the state of the Republican Party (which is very strong!). “President Trump has a strong hold on the GOP.” 80% of Republicans agree with the removal of Liz Cheney from GOP Leadership and only 20% disagree. The poll also showed that 67% of Republicans said that they do not consider Sleepy Joe Biden to be the legitimate winner of the 2020 Presidential Election. I agree with them 100%, just look at the facts and the data—there is no way he won the 2020 Presidential Election!',
 'image': ''}

### Clean up before importing as a dataframe

In [10]:
for d in data:
    d["date"] = d["date"].replace("\n", "")
    d["url"] = (
        d["url"]
        .replace("location.href='", "https://www.donaldjtrump.com")
        .replace("/';", "")
    )
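The chained `replace` calls above depend on the `onclick` value having exactly the shape shown in the first item. As a hedged alternative, the sketch below extracts the path with a regular expression (the notebook already imports `re`). The names `BASE_URL` and `onclick_to_url` are illustrative and do not appear in the original notebook.

```python
import re

BASE_URL = "https://www.donaldjtrump.com"  # same host the notebook hard-codes


def onclick_to_url(onclick):
    # Capture whatever sits between the single quotes after location.href=
    match = re.search(r"location\.href='([^']+)'", onclick)
    if match is None:
        return ""  # assumption: an unexpected onclick value maps to a blank URL
    # Drop the trailing slash so the result matches the URLs used elsewhere here
    return BASE_URL + match.group(1).rstrip("/")


example_url = onclick_to_url("location.href='/desk/desk-q7wsgfnztk/';")
print(example_url)
```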

In [11]:
src = pd.DataFrame(data)

---

### Pull in early posts

In [12]:
archive_df = pd.read_csv("input/archive.csv")

In [13]:
archive_df.drop(["video"], axis=1, inplace=True)

In [14]:
df = pd.concat([src, archive_df]).drop_duplicates(subset="url", keep="first")
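To illustrate what `keep="first"` does in the cell above: because the fresh scrape comes first in the `concat`, its copy of any overlapping URL wins over the archived copy. The rows below are made up for the example, and the `reset_index` call is an optional addition that is not in the original cell.

```python
import pandas as pd

# Made-up rows: the scrape and the archive overlap on one URL
fresh = pd.DataFrame(
    {"url": ["/desk/b", "/desk/a"], "post": ["new b", "fresh copy of a"]}
)
archived = pd.DataFrame(
    {"url": ["/desk/a", "/desk/z"], "post": ["archived copy of a", "old z"]}
)

combined = pd.concat([fresh, archived]).drop_duplicates(subset="url", keep="first")

# Optional: concat keeps each frame's own index labels, so renumber them
combined = combined.reset_index(drop=True)
print(combined)
```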

### How many posts total? 

In [15]:
len(df)

54

### Five most recent posts

In [16]:
df.head(5)

Unnamed: 0,date,url,author,post,image
0,"7:11pm May 16, 2021",https://www.donaldjtrump.com/desk/desk-q7wsgfnztk,Donald J. Trump,"Breaking News! New polling by CBS News on the state of the Republican Party (which is very strong!). “President Trump has a strong hold on the GOP.” 80% of Republicans agree with the removal of Liz Cheney from GOP Leadership and only 20% disagree. The poll also showed that 67% of Republicans said that they do not consider Sleepy Joe Biden to be the legitimate winner of the 2020 Presidential Election. I agree with them 100%, just look at the facts and the data—there is no way he won the 2020 Presidential Election!",
1,"6:20pm May 15, 2021",https://www.donaldjtrump.com/desk/desk-rdqw9ezkfx,Donald J. Trump,Congratulations to Drew McKissick on a great win today in his re-elect as Chairman of the Republican Party of South Carolina. It was a great win against a strong and talented opponent. The Republican Party of South Carolina is in good hands and we will continue to go on to victory as we have had in the past two Presidential Elections!,
2,"2:12pm May 15, 2021",https://www.donaldjtrump.com/desk/desk-4yeh37peju,Donald J. Trump,"The entire Database of Maricopa County in Arizona has been DELETED! This is illegal and the Arizona State Senate, who is leading the Forensic Audit, is up in arms. Additionally, seals were broken on the boxes that hold the votes, ballots are missing, and worse. Mark Brnovich, the Attorney General of Arizona, will now be forced to look into this unbelievable Election crime. Many Radical Left Democrats and weak Republicans are very worried about the fact that this has been exposed. The DELETION of an entire Database and critical Election files of Maricopa County is unprecedented. Many other States to follow. The Mainstream Media and Radical Left Democrats want to stay as far away as possible from the Presidential Election Fraud, which should be one of the biggest stories of our time. Fox News is afraid to cover it—there is rarely a mention. Likewise, Newsmax has been virtually silent on this subject because they are intimidated by threats of lawsuits. One America News (OAN), one of the fastest growing networks on television, and the “hottest”, is doing a magnificent job of exposing the massive fraud that took place. The story is only getting bigger and at some point it will be impossible for the weak and/or corrupt media not to cover. Thank you to OAN and other brave American Patriots. It is all happening quickly!",
3,"12:35pm May 15, 2021",https://www.donaldjtrump.com/desk/desk-m9xwwg56fp,Donald J. Trump,"Wall Street Journal has reported (they finally got something right), that 2020 was the “Worst Presidential Poll Miss in 40 Years.” The public opinion surveys ahead of the 2020 Presidential Election were the most inaccurate ever, according to a major polling panel. This was done purposely. The polls were a joke. I won States in a landslide that I was predicted to lose days before the election. Other states had me purposely so far down that it would force people, even fans, to say “Let's stay home Darling. We love our President, but he can’t win.” And then I would win those states or at least come very close. In one state that I actually won, but the results were rigged, ABC and the Washington Post had me down by 17 points. Even the rigged final result was extremely close. It’s called SUPPRESSION POLLING and it should be illegal. These are crooked, disgusting, and very dishonest media outlets and they know exactly what they are doing. The 2020 Presidential Election was, by far, the greatest Election Fraud in the history of our Country. The good news is, the American people get it and the truth is rapidly coming out! Had Mike Pence had the courage to send the Electoral College vote back to states for recertification, and had Mitch McConnell fought for us instead of being the weak and pathetic leader he is, we would right now have a Republican President who would be VETOING the horrific Socialistic Bills that are rapidly going through Congress, including Open Borders, High Taxes, Massive Regulations, and so much else!",
4,"12:07pm May 15, 2021",https://www.donaldjtrump.com/desk/desk-qt8ayyzxe7,Donald J. Trump,"As our Country is being destroyed, both inside and out, the Presidential Election of 2020 will go down as THE CRIME OF THE CENTURY!",


### How many mention 'election'?

In [17]:
df["election"] = df["post"].str.contains("election") | df["post"].str.contains(
    "Election"
)
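The two `str.contains` calls above can be collapsed into one case-insensitive check. This is a sketch on made-up strings, not the notebook's data. Note that it is slightly broader than the original: it also matches all-caps spellings such as "ELECTION", and `na=False` treats missing posts as non-matches, so the count could differ.

```python
import pandas as pd

# Made-up posts, including an all-caps mention and a missing value
sample_posts = pd.Series(
    [
        "the 2020 Presidential Election",
        "ELECTION fraud",
        "a great win today",
        None,
    ]
)

mentions = sample_posts.str.contains("election", case=False, na=False)
mention_count = int(mentions.sum())
print(mention_count)
```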

In [18]:
len(df[df["election"] == True])

31

### Clean up the dates

In [19]:
df["fulldate"] = pd.to_datetime(df["date"])
df["date"] = df["fulldate"].dt.date
df["time"] = df["fulldate"].dt.time
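The `pd.to_datetime` call above lets pandas infer the date layout. Spelling the layout out makes the assumption explicit. The format string below is inferred from the single example "7:11pm May 16, 2021" and is an assumption about the site's date text, not something the source confirms. The sketch uses the standard library; `pd.to_datetime` also accepts a `format` argument.

```python
import datetime as dt

# Assumed layout, inferred from "7:11pm May 16, 2021"
DESK_FORMAT = "%I:%M%p %B %d, %Y"

parsed = dt.datetime.strptime("7:11pm May 16, 2021", DESK_FORMAT)
print(parsed.isoformat())
```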

In [20]:
post_urls = list(df["url"])

---

### Posts per day 

In [21]:
election = df.groupby(["date", "election"]).agg({"author": "size"}).reset_index()

In [22]:
election.head()

| | date | election | author |
|---|---|---|---|
| 0 | 2021-03-24 | True | 1 |
| 1 | 2021-03-26 | True | 1 |
| 2 | 2021-03-30 | True | 1 |
| 3 | 2021-04-02 | True | 2 |
| 4 | 2021-04-03 | True | 1 |


In [23]:
daily = df.groupby(["date"])["author"].count().reset_index(name="count")

In [24]:
daily.rename(columns={"author": "count"}, inplace=True)

In [25]:
daily["seven-day-avg"] = daily["count"].rolling(7).mean()
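One caveat about the cell above: `rolling(7)` averages over the previous seven rows, and `daily` only has rows for days with at least one post, so the window can span more than seven calendar days. A minimal sketch of a calendar-based version follows, using made-up counts. It fills the missing days with zero before averaging, which is a swapped-in technique and would change the values shown in this notebook.

```python
import pandas as pd

# Made-up daily counts with gaps on days without posts
sample_daily = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-05-01", "2021-05-03", "2021-05-08"]),
        "count": [2, 4, 1],
    }
)

# Reindex to one row per calendar day, treating missing days as zero posts
calendar = sample_daily.set_index("date")["count"].asfreq("D", fill_value=0)

# Now a 7-row window really is a 7-day window
seven_day_avg = calendar.rolling(7).mean()
print(seven_day_avg)
```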

In [26]:
daily["date"] = pd.to_datetime(daily["date"])

In [27]:
daily.sort_values("count", ascending=False).head()

| | date | count | seven-day-avg |
|---|---|---|---|
| 28 | 2021-05-15 | 4 | 2.714286 |
| 27 | 2021-05-14 | 4 | 2.285714 |
| 25 | 2021-05-05 | 4 | 2.142857 |
| 23 | 2021-05-03 | 4 | 1.857143 |
| 9 | 2021-04-08 | 3 | 1.857143 |


### Chart it!

In [28]:
bars = (
    alt.Chart(
        daily,
        title="Trump posts to the 'desk' since it launched",
    )
    .mark_bar(size=10)
    .encode(
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
        y=alt.Y(
            "count:Q",
            # Cap the axis at the busiest day, not at the length of the column name
            scale=alt.Scale(domain=(0, int(daily["count"].max()))),
            axis=alt.Axis(
                gridColor="#dddddd",
                offset=6,
                tickSize=0,
                domainOpacity=0,
                tickCount=3,
                title="Daily post count and seven-day average",
            ),
        ),
    )
)

rolling = (
    alt.Chart(daily)
    .mark_line(color="red")
    .encode(
        y="seven-day-avg",
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
    )
)

(bars + rolling).properties(height=350, width=600).configure_view(strokeOpacity=0)

In [29]:
election["date"] = pd.to_datetime(election["date"])

In [30]:
bars_elex = (
    alt.Chart(
        election,
        title="Trump posts to the 'desk' re: election",
    )
    .mark_bar(size=10)
    .encode(
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
        y=alt.Y(
            "author:Q",
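            # Bars stack by color, so cap the axis at the largest per-day total
            scale=alt.Scale(
                domain=(0, int(election.groupby("date")["author"].sum().max()))
            ),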
            axis=alt.Axis(
                gridColor="#dddddd",
                offset=6,
                tickSize=0,
                domainOpacity=0,
                tickCount=3,
                title="Daily post count",
            ),
        ),
        color=alt.Color(
            "election",
            title="About election?",
            # The column holds booleans, so the domain uses booleans rather than strings
            scale=alt.Scale(domain=[True, False], range=["#f1a340", "#998ec3"]),
        ),
    )
)

(bars_elex).properties(height=350, width=600).configure_view(strokeOpacity=0)

In [31]:
(bars + rolling).properties(height=350, width=600).configure_view(strokeOpacity=0).save(
    "visuals/daily_posts.png"
)

In [32]:
(bars_elex).properties(height=350, width=600).configure_view(strokeOpacity=0).save(
    "visuals/daily_posts_re_election.png"
)

---

### Exports

In [33]:
today = dt.date.today().strftime("%m-%d-%Y")

In [34]:
df.to_csv(f"archive/posts_{today}.csv", index=False)
df.to_csv("output/allposts.csv", index=False)
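A possible refinement for the dated archive file, offered as a sketch and not as the notebook's method: an ISO-style date (`YYYY-MM-DD`) in the filename sorts chronologically when files are listed by name, which `%m-%d-%Y` does not across years. The helper name `dated_csv_path` is hypothetical.

```python
import datetime as dt
from pathlib import Path


def dated_csv_path(folder, day):
    # Build a path such as archive/posts_2021-05-16.csv
    return Path(folder) / f"posts_{day.isoformat()}.csv"


example_path = dated_csv_path("archive", dt.date(2021, 5, 16))
print(example_path.as_posix())
```

Calling `example_path.parent.mkdir(parents=True, exist_ok=True)` before `to_csv` would also guard against the output folder not existing yet.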