# Scraping former President Trump's 'desk'

### Import Python tools and Jupyter configuration

In [1]:
%load_ext lab_black

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import datetime as dt

In [3]:
import altair as alt
import altair_latimes as lat
import matplotlib.pyplot as plt

In [4]:
alt.themes.register("latimes", lat.theme)
alt.themes.enable("latimes")

ThemeRegistry.enable('latimes')

In [5]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
alt.data_transformers.disable_max_rows()
pd.options.display.max_colwidth = None

---

### Read the page

In [6]:
r = requests.get("https://www.donaldjtrump.com/desk", timeout=30)
r.raise_for_status()  # fail loudly if the page didn't load
soup = BeautifulSoup(r.text, "html.parser")

### Grab everything from each post div

In [7]:
rows = soup.find_all("div", class_="ftdli-main ftd-d")

In [8]:
data = []
for row in rows:
    # Not every post has an image; fall back to an empty string
    img = row.find("img")
    image = img["src"] if img is not None else ""
    # The post link lives in an onclick handler, e.g. "location.href='/desk/...';"
    post_url = row.find("div", class_="title ftd-d").get("onclick")
    post = row.find("p", class_="ftd-post-text").text
    author = row.find("h2").text
    date = row.find("div", class_="date ftd-d").text
    data.append(
        dict(
            date=date,
            url=post_url,
            author=author,
            post=post,
            image=image,
        )
    )

### First post in the scraped list

In [9]:
data[0]

{'date': '\n6:17pm May 21, 2021\n',
 'url': "location.href='/desk/desk-cc9k2cjkge/';",
 'author': 'Donald J. Trump',
 'post': 'Many people have asked about the beautiful Boeing 757 that became so iconic during the Trump rallies. It was effectively kept in storage in Upstate New York in that I was not allowed to use it during my presidency. It is now being fully restored and updated and will be put back into service sometime prior to the end of the year. It will soon be brought to a Louisiana service facility for the completion of work, inspection and updating of Rolls-Royce engines, and a brand new paint job. When completed, it will be better than ever, and again used at upcoming rallies!',
 'image': ''}

### Clean up before importing as a dataframe

In [10]:
for d in data:
    d["date"] = d["date"].replace("\n", "")
    d["url"] = (
        d["url"]
        .replace("location.href='", "https://www.donaldjtrump.com")
        .replace("/';", "")
    )
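
The string replacements above assume every `onclick` value looks exactly like `location.href='/desk/...';`. A slightly more defensive sketch, assuming that same pattern, pulls the quoted path out with a regular expression and joins it to the site root with `urllib.parse.urljoin`. `onclick_to_url` is a hypothetical helper, not part of the original notebook.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.donaldjtrump.com"

# Captures the quoted path in a handler like "location.href='/desk/desk-cc9k2cjkge/';"
ONCLICK_PATH = re.compile(r"location\.href='([^']+)'")


def onclick_to_url(onclick):
    """Turn an onclick handler into an absolute post URL, or "" if no path is found."""
    match = ONCLICK_PATH.search(onclick or "")
    if match is None:
        return ""
    # urljoin handles the leading slash; rstrip drops the trailing one
    # so the result matches the URLs produced by the replace() calls
    return urljoin(BASE_URL, match.group(1)).rstrip("/")


print(onclick_to_url("location.href='/desk/desk-cc9k2cjkge/';"))
# prints https://www.donaldjtrump.com/desk/desk-cc9k2cjkge
```

Returning an empty string instead of raising keeps one malformed post from stopping the whole scrape, at the cost of having to check for blank URLs later.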

In [11]:
src = pd.DataFrame(data)

---

### Pull in earlier posts from the archive

In [12]:
archive_df = pd.read_csv("input/archive.csv")

In [13]:
archive_df.drop(["video"], axis=1, inplace=True)

In [14]:
df = pd.concat([src, archive_df]).drop_duplicates(subset="url", keep="first")
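
Because `src` (the fresh scrape) comes before `archive_df` in `pd.concat`, `keep="first"` means a post that appears in both keeps its freshly scraped version. The same first-one-wins rule, sketched in plain Python on hypothetical records:

```python
def dedupe_by_url(*sources):
    """Merge lists of post dicts, keeping the first record seen for each URL."""
    seen = {}
    for source in sources:
        for post in source:
            # setdefault only stores the post if its URL hasn't been seen yet
            seen.setdefault(post["url"], post)
    # dicts preserve insertion order, so earlier sources stay first
    return list(seen.values())


# Hypothetical records: the scraped copy of /desk/a should win over the archived one
scraped = [{"url": "/desk/a", "post": "scraped"}]
archived = [{"url": "/desk/a", "post": "archived"}, {"url": "/desk/b", "post": "old"}]
merged = dedupe_by_url(scraped, archived)
print([p["post"] for p in merged])  # ['scraped', 'old']
```

Swapping the order of the arguments (or using `keep="last"` in pandas) would make the archived copy win instead.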

### How many posts total? 

In [15]:
len(df)

54

### Five most recent posts

In [16]:
df.head(5)

,date,url,author,post,image
0,"6:17pm May 21, 2021",https://www.donaldjtrump.com/desk/desk-cc9k2cjkge,Donald J. Trump,"Many people have asked about the beautiful Boeing 757 that became so iconic during the Trump rallies. It was effectively kept in storage in Upstate New York in that I was not allowed to use it during my presidency. It is now being fully restored and updated and will be put back into service sometime prior to the end of the year. It will soon be brought to a Louisiana service facility for the completion of work, inspection and updating of Rolls-Royce engines, and a brand new paint job. When completed, it will be better than ever, and again used at upcoming rallies!",
1,"12:41pm May 20, 2021",https://www.donaldjtrump.com/desk/desk-ecxupbx2p2,Donald J. Trump,"See, 35 wayward Republicans—they just can’t help themselves. We have much better policy and are much better for the Country, but the Democrats stick together, the Republicans don’t. They don’t have the Romney’s, Little Ben Sasse’s, and Cheney’s of the world. Unfortunately, we do. Sometimes there are consequences to being ineffective and weak. The voters understand!",
2,"9:31pm May 19, 2021",https://www.donaldjtrump.com/desk/desk-jtag3nem59,Donald J. Trump,"A loan of $1.2 billion has closed on the asset known as the Bank of America Building (555 California Street) in San Francisco, CA. The interest rate is approximately 2%. Thank you!",
3,"7:58pm May 19, 2021",https://www.donaldjtrump.com/desk/desk-bc3qjwctre,Donald J. Trump,"Stick with Kirstie Alley! She is a great actress, loved by so many people, and a true original. She is also strong and smart. Many millions of people greatly appreciate her support of our Country. Thank you Kirstie, you are truly appreciated!",
4,"12:02pm May 19, 2021",https://www.donaldjtrump.com/desk/desk-w2snhrwejb,Donald J. Trump,"I have just learned, through leaks in the mainstream media, that after being under investigation from the time I came down the escalator 5 ½ years ago, including the fake Russia Russia Russia Hoax, the 2 year, $48M, No Collusion Mueller Witch Hunt, Impeachment Hoax #1, Impeachment Hoax #2, and others, that the Democrat New York Attorney General has “informed” my organization that their “investigation” is no longer just a civil matter but also potentially a “criminal” investigation working with the Manhattan District Attorney’s Office.\nThere is nothing more corrupt than an investigation that is in desperate search of a crime. But, make no mistake, that is exactly what is happening here.\nThe Attorney General of New York literally campaigned on prosecuting Donald Trump even before she knew anything about me. She said that if elected, she would use her office to look into “every aspect” of my real estate dealings. She swore that she would “definitely sue” me. She boasted on video that she would be, and I quote, “a real pain in the ass.” She declared, “just wait until I’m in the Attorney General’s office,” and, ”I’ve got my eyes on Trump Tower.” She also promised that, if elected, she would “join with law enforcement and other Attorney Generals across this nation in removing this President from office,” and, “It’s important that everyone understand that the days of Donald Trump are coming to an end.”\nThe Attorney General made each of these statements, not after having had an opportunity to actually look at the facts, but BEFORE she was even elected, BEFORE she had seen even a shred of evidence. This is something that happens in failed third world countries, not the United States. 
If you can run for a prosecutor’s office pledging to take out your enemies, and be elected to that job by partisan voters who wish to enact political retribution, then we are no longer a free constitutional democracy.\nLikewise, the District Attorney’s office has been going after me for years based on a lying, discredited low life, who was not listened to or given credibility by other prosecutorial offices, and sentenced to 3 years in prison for lying and other events unrelated to me.\nThese investigations have also been going on for years with members and associates of the Trump Organization being viciously attacked, harassed, and threatened, in order to say anything bad about the 45th President of the United States. This would include having to make up false stories. Numerous documents, all prepared by large and prestigious law and accounting firms, have been examined, and many hours of testimony have been taken from many people, some of whom I have not seen in years.\nThese Democrat offices are consumed with this political and partisan Witch Hunt at a time when crime is up big in New York City, shootings are up 97%, murders are up 45%, a rate not seen in 40 years, drugs and criminals are pouring into our Country in record numbers from our now unprotected Southern Border, and people are fleeing New York for other much safer locations to live. But the District Attorney and Attorney General are possessed, at an unprecedented level, with destroying the political fortunes of President Donald J. Trump and the almost 75 million people who voted for him, by far the highest number ever received by a sitting President.\nThat is what these investigations are all about—a continuation of the greatest political Witch Hunt in the history of the United States. Working in conjunction with Washington, these Democrats want to silence and cancel millions of voters because they don’t want “Trump” to run again. 
As people are being killed on the sidewalks of New York at an unprecedented rate, as drugs and crime of all kinds are flowing through New York City at record levels, with absolutely nothing being done about it, all they care about is taking down Trump.\nOur movement, which started with the Great Election Win of 2016, is perhaps the biggest and most powerful in the history of our Country. But the Democrats want to cancel the Make America Great Again movement, not by Making America First, but by Making America Last. No President has been treated the way I have. With all of the crime and corruption you read about with others, nothing happens, they only go after Donald Trump.\nAfter prosecutorial efforts the likes of which nobody has ever seen before, they failed to stop me in Washington, so they turned it over to New York to do their dirty work. This is what I have been going through for years. It’s a very sad and dangerous tale for our Country, but it is what it is, and we will overcome together.\nI have built a great company, employed thousands of people, and all I do is get unfairly attacked and abused by a corrupt political system. It would be so wonderful if the effort used against President Donald J. Trump, who lowered taxes and regulations, rebuilt our military, took care of our Veterans, created Space Force, fixed our border, produced our vaccine in record-setting time (years ahead of what was anticipated), and made our Country great and respected again, and so much more, would be focused on the ever more dangerous sidewalks and streets of New York.\nIf these prosecutors focused on real issues, crime would be obliterated, and New York would be great and free again!",


### How many mention 'election'?

In [17]:
# Matches "election" or "Election" anywhere in the text; missing posts count as False
df["election"] = df["post"].str.contains("[Ee]lection", na=False)
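
The `election` flag above matches "election" and "Election" but not an all-caps "ELECTION", and because it matches substrings it also catches words such as "reelection" or "selection". A small sketch with Python's `re` module, on made-up sample phrases, shows how the choice of pattern changes what gets flagged:

```python
import re

samples = [
    "The Election is over",  # capitalized
    "the ELECTION results",  # all caps
    "a reelection bid",  # substring inside a related word
    "a wide selection of options",  # substring inside an unrelated word
]

# Same matching rule as the notebook's "election"/"Election" check
two_spellings = re.compile(r"[Ee]lection")
# Alternative (not what the notebook uses): whole word, any capitalization
whole_word_any_case = re.compile(r"\belections?\b", re.IGNORECASE)

flags_two = [bool(two_spellings.search(s)) for s in samples]
flags_word = [bool(whole_word_any_case.search(s)) for s in samples]
print(flags_two)  # [True, False, True, True]
print(flags_word)  # [True, True, False, False]
```

In pandas the second rule would be `df["post"].str.contains(r"\belections?\b", case=False, na=False)`. Note that it also drops "reelection", so neither pattern is strictly better; it depends on what should count as being about the election.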

In [18]:
df["election"].sum()

26

### Clean up the dates

In [19]:
df["fulldate"] = pd.to_datetime(df["date"])
df["date"] = df["fulldate"].dt.date
df["time"] = df["fulldate"].dt.time
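
`pd.to_datetime` infers the layout of the date strings on its own. If you'd rather spell it out, the live-page timestamps appear to follow the pattern below. This quick check with the standard library's `strptime` assumes the archive rows use the same layout, which isn't verified here.

```python
from datetime import datetime

# Timestamp layout seen on the live page, e.g. "6:17pm May 21, 2021".
# %p and %B are locale-dependent; this assumes an English (or C) locale.
DESK_FORMAT = "%I:%M%p %B %d, %Y"

evening = datetime.strptime("6:17pm May 21, 2021", DESK_FORMAT)
noon = datetime.strptime("12:41pm May 20, 2021", DESK_FORMAT)

print(evening)  # 2021-05-21 18:17:00
print(noon)  # 2021-05-20 12:41:00
```

The same pattern can be passed to pandas as `pd.to_datetime(df["date"], format=DESK_FORMAT)`. That call raises an error on any row that doesn't match, which doubles as a check that the archive and the live page agree.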

In [20]:
post_urls = list(df["url"])

---

### Posts per day 

In [21]:
election = df.groupby(["date", "election"]).agg({"author": "size"}).reset_index()
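
`.agg({"author": "size"})` counts the rows in each (date, election) group. `author` is just the column the count gets attached to, so the `author` column in `election` holds post counts, not names. A plain-Python sketch of the same tally, using hypothetical posts:

```python
from collections import Counter
from datetime import date

# Hypothetical (post date, mentions election) pairs, one per post
posts = [
    (date(2021, 4, 1), True),
    (date(2021, 4, 1), False),
    (date(2021, 4, 1), True),
    (date(2021, 4, 2), True),
]

# Tally posts per (date, flag) pair, like groupby([...]).size()
per_group = Counter(posts)

for (day, mentions_election), n_posts in sorted(per_group.items()):
    print(day, mentions_election, n_posts)
# 2021-04-01 False 1
# 2021-04-01 True 2
# 2021-04-02 True 1
```

As with `groupby`, only combinations that actually occur get a count. There is no zero row for April 2 with `False`.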

In [22]:
election.head()

|   | date       | election | author |
|---|------------|----------|--------|
| 0 | 2021-03-24 | True     | 1      |
| 1 | 2021-03-26 | True     | 1      |
| 2 | 2021-03-30 | True     | 1      |
| 3 | 2021-04-02 | True     | 2      |
| 4 | 2021-04-03 | True     | 1      |


In [23]:
daily = df.groupby(["date"])["author"].count().reset_index(name="count")

In [25]:
daily["seven-day-avg"] = daily["count"].rolling(7).mean()

In [26]:
daily["date"] = pd.to_datetime(daily["date"])

In [27]:
daily.sort_values("count", ascending=False).head()

|    | date       | count | seven-day-avg |
|----|------------|-------|---------------|
| 25 | 2021-05-05 | 4     | 2.142857      |
| 23 | 2021-05-03 | 4     | 1.857143      |
| 28 | 2021-05-19 | 4     | 2.571429      |
| 8  | 2021-04-07 | 3     | 1.571429      |
| 9  | 2021-04-08 | 3     | 1.857143      |
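
`rolling(7)` averages the last seven *rows* of `daily`, and `daily` only has rows for days with at least one post. After a quiet stretch, the window can therefore reach back more than a week. Below is a sketch of a calendar-based alternative, assuming days with no row should count as zero posts: fill in every date in the range, then average seven consecutive days. `calendar_rolling_mean` is a hypothetical helper, and the counts are made up.

```python
from datetime import date, timedelta


def calendar_rolling_mean(counts, window=7):
    """Return (day, average) pairs over consecutive calendar days.

    `counts` maps date -> post count; missing days are treated as 0.
    Like pandas' rolling(window), the first window-1 days get None.
    """
    start, end = min(counts), max(counts)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    filled = [counts.get(d, 0) for d in days]
    result = []
    for i, day in enumerate(days):
        if i + 1 < window:
            result.append((day, None))
        else:
            result.append((day, sum(filled[i + 1 - window : i + 1]) / window))
    return result


# Hypothetical counts with a gap: nothing was posted Jan. 3 through Jan. 7
counts = {date(2021, 1, 1): 2, date(2021, 1, 2): 1, date(2021, 1, 8): 3}
averages = calendar_rolling_mean(counts)
print(averages[-1])  # (datetime.date(2021, 1, 8), 0.5714285714285714)
```

In pandas, the equivalent would be to set `date` as the index of `daily`, reindex it on `pd.date_range(daily["date"].min(), daily["date"].max())` with `fill_value=0`, and only then call `rolling(7).mean()`.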


### Chart it!

In [28]:
bars = (
    alt.Chart(
        daily,
        title="Trump posts to the 'desk' since it launched",
    )
    .mark_bar(size=10)
    .encode(
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
        y=alt.Y(
            "count:Q",
            # Leave one post of headroom above the busiest day
            scale=alt.Scale(domain=(0, int(daily["count"].max()) + 1)),
            axis=alt.Axis(
                gridColor="#dddddd",
                offset=6,
                tickSize=0,
                domainOpacity=0,
                tickCount=3,
                title="Daily post count and seven-day average",
            ),
        ),
    )
)

rolling = (
    alt.Chart(daily)
    .mark_line(color="red")
    .encode(
        y="seven-day-avg",
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
    )
)

(bars + rolling).properties(height=350, width=600).configure_view(strokeOpacity=0)

In [29]:
election["date"] = pd.to_datetime(election["date"])

In [30]:
bars_elex = (
    alt.Chart(
        election,
        title="Trump posts to the 'desk' re: election",
    )
    .mark_bar(size=10)
    .encode(
        x=alt.X(
            "date:T",
            axis=alt.Axis(grid=False, title="", tickCount=5, format=("%B %-d")),
        ),
        y=alt.Y(
            "author:Q",
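            # Leave one post of headroom above the busiest day (bars are stacked)
            scale=alt.Scale(
                domain=(0, int(election.groupby("date")["author"].sum().max()) + 1)
            ),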
            axis=alt.Axis(
                gridColor="#dddddd",
                offset=6,
                tickSize=0,
                domainOpacity=0,
                tickCount=3,
                title="Daily post count",
            ),
        ),
        color=alt.Color(
            "election",
            title="About election?",
            scale=alt.Scale(domain=[True, False], range=["#f1a340", "#998ec3"]),
        ),
    )
)

(bars_elex).properties(height=350, width=600).configure_view(strokeOpacity=0)

In [31]:
(bars + rolling).properties(height=350, width=600).configure_view(strokeOpacity=0).save(
    "visuals/daily_posts.png"
)

In [32]:
(bars_elex).properties(height=350, width=600).configure_view(strokeOpacity=0).save(
    "visuals/daily_posts_re_election.png"
)

---

### Exports

In [33]:
today = dt.date.today().strftime("%m-%d-%Y")

In [34]:
df.to_csv(f"archive/posts_{today}.csv", index=False)
df.to_csv("output/allposts.csv", index=False)
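
`to_csv` fails if the `archive/` folder doesn't exist yet. The sketch below assumes you want the notebook to create that folder on its first run. It builds the same dated filename with an f-string and makes the folder with `pathlib`. `dated_archive_path` is a hypothetical helper.

```python
import datetime as dt
import tempfile
from pathlib import Path


def dated_archive_path(folder, day=None):
    """Return folder/posts_MM-DD-YYYY.csv, creating the folder if it's missing."""
    day = day or dt.date.today()
    folder = Path(folder)
    # parents=True and exist_ok=True make this safe to call on every run
    folder.mkdir(parents=True, exist_ok=True)
    # Same MM-DD-YYYY stamp the notebook builds with strftime
    return folder / f"posts_{day:%m-%d-%Y}.csv"


# Example run inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    example = dated_archive_path(Path(tmp) / "archive", dt.date(2021, 5, 21))
    example_name = example.name
    folder_created = example.parent.is_dir()

print(example_name)  # posts_05-21-2021.csv
```

In the notebook this would be used as `df.to_csv(dated_archive_path("archive"), index=False)`. `to_csv` accepts a `Path` as well as a string.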