## **Twitter Data Lake**

### **1. Setup**

In [None]:
!pip install pandas==1.1.3

In [None]:
import json
import pandas as pd
from urllib import request

### **2. Date**

In [None]:
date = "2020-10-10"

### **3. Data**

**3.1. Tweets dataset**

In [None]:
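# Daily tweets file for the selected date
url = "https://raw.githubusercontent.com/alexlitel/congresstweets/master/data/{date}.json".format(date=date)
# Close the HTTP connection as soon as the body has been read
with request.urlopen(url=url) as response:
    data = json.loads(response.read().decode())
# One row per tweet
tweets = pd.DataFrame(data)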

```
{
   "id":"1315057755430977539",
   "screen_name":"RepJacobs",
   "user_id":"1276232539510919168",
   "time":"2020-10-10T18:32:56-04:00",
   "link":"https://www.twitter.com/SPECNewsBuffalo/statuses/1315057019246841857",
   "text":"RT @SPECNewsBuffalo The Challenger Learning Center in Lockport uses space exploration as a theme to get kids interested in science, technology, engineering, and math. http://specne.ws/VY1cpH?cid=twitter_SPECNewsBuffalo",
   "source":"Twitter for iPhone"
}
```
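The tweets URL above is built by substituting `date` into a template, so a malformed date only surfaces later as an HTTP error. A minimal sketch of validating the date before building the URL follows; the names `BASE_URL` and `build_tweets_url` are hypothetical helpers, not part of the notebook.

```python
from datetime import date as Date

# Same URL template as the notebook cell, kept as a constant (hypothetical name)
BASE_URL = "https://raw.githubusercontent.com/alexlitel/congresstweets/master/data/{date}.json"


def build_tweets_url(day: str) -> str:
    """Return the daily tweets URL, rejecting impossible dates early."""
    # fromisoformat raises ValueError for dates such as "2020-13-40"
    parsed = Date.fromisoformat(day)
    # isoformat() always yields YYYY-MM-DD, the form used in the file names
    return BASE_URL.format(date=parsed.isoformat())


url = build_tweets_url("2020-10-10")
```

Failing on the date itself keeps the error close to its cause instead of leaving it to the download step.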



**3.2. Users dataset**

In [None]:
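# Account metadata: name, chamber, party, state and Twitter accounts
url = "https://raw.githubusercontent.com/alexlitel/congresstweets-automator/master/data/users.json"
# Close the HTTP connection as soon as the body has been read
with request.urlopen(url=url) as response:
    data = json.loads(response.read().decode())
# One row per record; "accounts" holds a list of account dicts
users = pd.DataFrame(data)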



```
{
   "name":"Paul Cook",
   "chamber":"house",
   "type":"member",
   "party":"R",
   "accounts":[
      {
         "account_type":"campaign",
         "screen_name":"joinpaulcook",
         "id":"57177310"
      },
      {
         "id":"1074412920",
         "screen_name":"RepPaulCook",
         "account_type":"office"
      }
   ],
   "id":{
      "bioguide":"C001094",
      "govtrack":412513
   },
   "state":"CA"
}
```



In [None]:
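# One row per Twitter account; drop the nested "id" dict (bioguide, govtrack)
users = users.explode(column="accounts").reset_index(drop=True).drop(columns=["id"])
# Keep only "screen_name" from each account dict; it is the key shared with the tweets dataset
users = pd.concat([users.drop(columns=["accounts"]), users["accounts"].apply(pd.Series)["screen_name"]], axis=1)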



```
{
   "name":"Paul Cook",
   "chamber":"house",
   "type":"member",
   "party":"R",
   "state":"CA",
   "screen_name":"joinpaulcook"
}
```

```
{
   "name":"Paul Cook",
   "chamber":"house",
   "type":"member",
   "party":"R",
   "state":"CA",
   "screen_name":"RepPaulCook"
}
```
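The two records above are what the flattening step produces from a single nested user record: one row per account, each carrying the member's metadata. As an alternative sketch (not the notebook's method), `pd.json_normalize` can do the same flattening in one call on the raw JSON. The sample record is copied from the users dataset shown earlier; the variable names `records`, `meta` and `flat` are assumptions made for this example.

```python
import pandas as pd

# Sample record copied from the users dataset shown earlier
records = [
    {
        "name": "Paul Cook",
        "chamber": "house",
        "type": "member",
        "party": "R",
        "accounts": [
            {"account_type": "campaign", "screen_name": "joinpaulcook", "id": "57177310"},
            {"id": "1074412920", "screen_name": "RepPaulCook", "account_type": "office"},
        ],
        "id": {"bioguide": "C001094", "govtrack": 412513},
        "state": "CA",
    }
]

# Top-level fields to repeat on every account row
meta = ["name", "chamber", "type", "party", "state"]

# record_path expands the "accounts" list into rows; meta copies the parent fields
flat = pd.json_normalize(records, record_path="accounts", meta=meta)

# Keep the same columns as the notebook's flattened users table
flat = flat[meta + ["screen_name"]]
```

This works on the parsed JSON list (`data` in the notebook) rather than on the DataFrame, so it would replace both the `pd.DataFrame(data)` call and the explode step.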



**3.3. Merged dataset**

In [None]:
# Inner join on the handle: tweets whose screen_name is missing from `users` are dropped
dataset = tweets.merge(users, on=["screen_name"], how="inner")



```
{
   "id":"1315057755430977539",
   "screen_name":"RepJacobs",
   "user_id":"1276232539510919168",
   "time":"2020-10-10T18:32:56-04:00",
   "link":"https://www.twitter.com/SPECNewsBuffalo/statuses/1315057019246841857",
   "text":"RT @SPECNewsBuffalo The Challenger Learning Center in Lockport uses space exploration as a theme to get kids interested in science, technology, engineering, and math. http://specne.ws/VY1cpH?cid=twitter_SPECNewsBuffalo",
   "source":"Twitter for iPhone",
   "name":"Chris Jacobs",
   "chamber":"house",
   "type":"member",
   "party":"R",
   "state":"NY"
}
```
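Because the merge is an inner join, any tweet whose `screen_name` has no match in the users table silently disappears from `dataset`. A small sketch of checking for such rows with the `indicator` option of `DataFrame.merge` follows. The miniature inputs are made up for illustration, and `SomeOtherAccount` is a hypothetical handle.

```python
import pandas as pd

# Hypothetical miniature inputs; "SomeOtherAccount" is invented for illustration
tweets = pd.DataFrame({
    "id": ["1", "2", "3"],
    "screen_name": ["RepJacobs", "RepPaulCook", "SomeOtherAccount"],
})
users = pd.DataFrame({
    "name": ["Chris Jacobs", "Paul Cook"],
    "screen_name": ["RepJacobs", "RepPaulCook"],
})

# A left join keeps every tweet; "_merge" records whether a user row was found
checked = tweets.merge(users, on=["screen_name"], how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

# Inner join for comparison: the unmatched tweet is gone
dataset = tweets.merge(users, on=["screen_name"], how="inner")
```

Comparing `len(unmatched)` with `len(tweets)` gives a quick sense of how much data the inner join discards for a given day.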

