# DSCI 511: Data acquisition and pre-processing <br>Chapter 3: Acquiring Data from the Internet

## 3.1 APIs
The entire infrastructure of the internet uses three basic ideas as building blocks: clients, servers, and requests. When you visit a website, your web browser is the _client_. In order to display the web page to you, your browser makes a _request_ to a remote _server_. The server then processes the request and sends back the page, which is essentially an HTML document. Your browser can interpret this document, which contains all kinds of information (including styling and presentation-related information and hyperlinks), and display it on your screen.

### 3.1.1 What is an API?
An _Application Programming Interface_ (API) allows you to make a request to a remote server to obtain data instead of a web page. So instead of an HTML document, an API request will usually return data in a format like JSON, CSV, or XML. To review how to load data from these formats, see Chapter 1.

### 3.1.2 Accessing APIs
When you access a web page, your browser makes a request to the remote server. The browser uses a URL address to send the request. Similarly, URLs are also used to send requests to APIs. Usually, constructing the URL needed to send the request is an important early step in working with APIs.
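
As a minimal sketch of URL construction, the helper below assembles a request URL from a base address, some path segments, and optional query parameters using Python's standard `urllib.parse` module. The function name `build_url` is hypothetical (it is not part of any library), and the example addresses are the ones used later in this chapter.

```python
from urllib.parse import quote, urlencode

def build_url(base, path_parts=(), params=None):
    """Join a base address, path segments, and query parameters into one URL."""
    # percent-encode each path segment so spaces and other special characters are safe
    path = "/".join(quote(str(part), safe="") for part in path_parts)
    url = base.rstrip("/") + ("/" + path if path else "")
    if params:
        # urlencode turns a dictionary into "key=value&key=value"
        url += "?" + urlencode(params)
    return url

# path-style API: station name and number of trains are part of the path
print(build_url("http://www3.septa.org/hackathon/Arrivals", ["30th Street Station", 5]))

# query-style API: options are passed after the "?"
print(build_url("https://nominatim.openstreetmap.org/search.php",
                params={"q": "Philadelphia Pennsylvania", "format": "json"}))
```

Building URLs this way keeps the variable parts (station, search terms) separate from the fixed address, which makes it easier to loop over many requests.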
#### 3.1.2.1 Making Requests and handling JSON responses
When writing Python code to get data from APIs, we'll use the `requests` module to make these requests. The `requests.get()` function takes a URL and returns a "response" object. When the response body is JSON, the object's very convenient `.json()` method de-serializes it into a Python dictionary (or list), so we don't necessarily have to use the `json` module on the response's text. For example, we'll use the GitHub API to grab some data about a user.

In [11]:
import requests
from pprint import pprint

# format: "https://api.github.com/users/USERNAME"
response = requests.get("https://api.github.com/users/jakerylandwilliams")

pprint(response.json())

{'avatar_url': 'https://avatars2.githubusercontent.com/u/4721029?v=4',
 'bio': None,
 'blog': '',
 'company': None,
 'created_at': '2013-06-17T18:27:22Z',
 'email': None,
 'events_url': 'https://api.github.com/users/jakerylandwilliams/events{/privacy}',
 'followers': 7,
 'followers_url': 'https://api.github.com/users/jakerylandwilliams/followers',
 'following': 4,
 'following_url': 'https://api.github.com/users/jakerylandwilliams/following{/other_user}',
 'gists_url': 'https://api.github.com/users/jakerylandwilliams/gists{/gist_id}',
 'gravatar_id': '',
 'hireable': None,
 'html_url': 'https://github.com/jakerylandwilliams',
 'id': 4721029,
 'location': None,
 'login': 'jakerylandwilliams',
 'name': 'Jake Williams',
 'node_id': 'MDQ6VXNlcjQ3MjEwMjk=',
 'organizations_url': 'https://api.github.com/users/jakerylandwilliams/orgs',
 'public_gists': 0,
 'public_repos': 4,
 'received_events_url': 'https://api.github.com/users/jakerylandwilliams/received_events',
 'repos_url': 'https://api.gi

GitHub's API doesn't require an "access token" to return this information about a user. There are thousands of public APIs available on the web that can help you get useful data, and many of them can be used as simply as this GitHub example.
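
Requests do not always succeed. For example, a misspelled username makes GitHub answer with a 404 status code instead of the user's data. The sketch below is one possible way to guard against this before using the result. The helper name `safe_json` is hypothetical, and it only assumes that the object passed in has a `status_code` attribute and a `.json()` method, as the response objects returned by `requests.get()` do.

```python
def safe_json(response):
    """Return the decoded JSON body of a response, or None if the request failed."""
    # anything other than 200 means the server did not return the data we asked for
    if response.status_code != 200:
        print("Request failed with status code", response.status_code)
        return None
    try:
        return response.json()
    except ValueError:
        # the body was not valid JSON (for example, an HTML error page)
        print("Response body is not valid JSON")
        return None

# usage with the request from the cell above (requires an internet connection):
# user = safe_json(requests.get("https://api.github.com/users/jakerylandwilliams"))
# if user is not None:
#     print(user["login"], user["public_repos"])
```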
#### 3.1.2.2 A more local example of an API
The Southeastern Pennsylvania Transportation Authority (SEPTA) [makes a few APIs available](http://www3.septa.org/hackathon/). Some of these APIs can be used to access realtime data about SEPTA transit (trains, buses, trolleys). For example, we can request data about the next trains to arrive at a given station.

In [99]:
# format: "http://www3.septa.org/hackathon/Arrivals/*STATION_NAME*/*NUMBER_OF_TRAINS*"
arrivals_response = requests.get("http://www3.septa.org/hackathon/Arrivals/30th Street Station/5")

arrivals_dict = arrivals_response.json()
pprint(arrivals_dict)

{'30th Street Station Departures: August 22, 2018, 7:15 pm': [{'Northbound': [{'depart_time': '2018-08-22 '
                                                                                              '19:17:00.000',
                                                                               'destination': 'West '
                                                                                              'Trenton',
                                                                               'direction': 'N',
                                                                               'line': 'West '
                                                                                       'Trenton',
                                                                               'next_station': '30th '
                                                                                               'St',
                                                                               'o
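
Deeply nested output like this is hard to read when pretty-printed in full. As a rough aid for inspecting structure, the sketch below lists the keys and value types at each level, looking only at the first item of every list. The function name `outline` and the small `sample` dictionary are hypothetical; the sample only imitates the shape of the response above.

```python
def outline(obj, indent=0, max_items=1):
    """Return lines describing the keys and value types of a nested structure."""
    lines = []
    pad = "  " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key!r}: {type(value).__name__}")
            lines.extend(outline(value, indent + 1, max_items))
    elif isinstance(obj, list):
        # only look inside the first few items, since list items usually share a shape
        for item in obj[:max_items]:
            lines.append(f"{pad}[item]: {type(item).__name__}")
            lines.extend(outline(item, indent + 1, max_items))
    return lines

# hypothetical sample imitating the shape of the Arrivals response
sample = {
    "30th Street Station Departures": [
        {"Northbound": [{"line": "West Trenton", "direction": "N"}]}
    ]
}
print("\n".join(outline(sample)))
```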

#### 3.1.2.3 Exercise: processing a JSON response
Make a request to the SEPTA Arrivals API to get data on the next 10 trains to arrive at Suburban Station. Store this JSON-format data into a dictionary. Inspect the dictionary structure. Then, write code to create a list containing 10 dictionaries, one for each train. These new dictionaries should look like this:

In [141]:
# example of train dictionary format
train_dict = {
    'direction': 'S',
    'line': 'Media/Elwyn',
    'sched_time': '2018-08-22 17:31:01.000',
    'status': 'On Time',
    'track': '6'
}

pprint(train_dict)

{'direction': 'S',
 'line': 'Media/Elwyn',
 'sched_time': '2018-08-22 17:31:01.000',
 'status': 'On Time',
 'track': '6'}


In [90]:
# code goes here

# arrivals_response =
# arrivals_dict =

# trains = []

# for station in arrivals_dict.values():
#    

#### 3.1.2.4 Open geo-location data
Geolocation data can also be found in JSON format. The [OpenStreetMap (OSM) API](https://wiki.openstreetmap.org/wiki/Main_Page) can be used to request geographic data; the request below goes to Nominatim, OSM's search service. Usually, map data is stored in the form of polygons: shapes whose vertices are latitude-longitude points. For example, we can obtain the polygon for Philadelphia from OSM like this:

In [14]:
response = requests.get("https://nominatim.openstreetmap.org/search.php?q=Philadelphia+Pennsylvania&polygon_geojson=1&format=json")

pprint(response.json())

[{'boundingbox': ['39.867005', '40.1379593', '-75.2802977', '-74.9558314'],
  'class': 'place',
  'display_name': 'Philadelphia, Philadelphia County, Pennsylvania, USA',
  'geojson': {'coordinates': [[[-75.2802977, 39.9750019],
                               [-75.2802246, 39.9748885],
                               [-75.280192, 39.974835],
                               [-75.28013, 39.974735],
                               [-75.280085, 39.974624],
                               [-75.280044, 39.97454],
                               [-75.280027, 39.974502],
                               [-75.279955, 39.974424],
                               [-75.279805, 39.97432],
                               [-75.2796, 39.974185],
                               [-75.279465, 39.974122],
                               [-75.279359, 39.974072],
                               [-75.279242, 39.974031],
                               [-75.27917, 39.973986],
                               [-75.279123, 39.9
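
Note that GeoJSON stores each point as `[longitude, latitude]`, while the `boundingbox` field above appears to list latitudes first. As a small sketch of working with these coordinates, the function below computes a bounding box from one ring of polygon vertices. The name `bounding_box` is hypothetical, the three points are copied from the start of the output above, and the sketch assumes the geometry is a single polygon whose outer ring is `geojson["coordinates"][0]`.

```python
def bounding_box(ring):
    """Return (min_lat, max_lat, min_lon, max_lon) for a list of [lon, lat] points."""
    lons = [lon for lon, lat in ring]
    lats = [lat for lon, lat in ring]
    return min(lats), max(lats), min(lons), max(lons)

# first three vertices from the Philadelphia polygon shown above
ring = [
    [-75.2802977, 39.9750019],
    [-75.2802246, 39.9748885],
    [-75.280192, 39.974835],
]
print(bounding_box(ring))

# with the full response, assuming a single polygon:
# ring = response.json()[0]["geojson"]["coordinates"][0]
```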

### 3.1.3 Working with CSV responses
Sometimes API responses can be in CSV format, too. For example, the schedule data API for Center City regional arrivals by SEPTA returns CSVs. Since a CSV file is really just a text file, we can read the text from the response using a CSV reader. The output below shows the schedule data displayed [here](http://www3.septa.org/ccstations/30th/) in CSV format. Notice that the first line and the last two lines are not part of the table, but rather messages and timestamps. Further, the CSV does not appear to have been written in the usual one-entry-per-line format. Rather, entries for trains on the same line are joined together into a single row. Irregularities like this are easy to miss, but can end up breaking your code.

In [109]:
import csv

# format: "http://www3.septa.org/ccstations/STATION/sched_data.csv", acceptable values for STATION are "me", "ss", and "30th"
schedule_response = requests.get("http://www3.septa.org/ccstations/30th/sched_data.csv")
schedule_text = schedule_response.text.strip().split("\n") # removing leading and trailing spaces and splitting lines
schedule_reader = csv.reader(schedule_text)
schedule = list(schedule_reader)
pprint(schedule)

[["EMG=' No Emg Message"],
 ['R4S=07:29',
  'Airport',
  '6',
  'ON TIME',
  'LOCAL                    ',
  '461   ',
  '<_NEXT_MSG>07:59',
  'Airport',
  '6',
  'ON TIME',
  'LOCAL                    ',
  '463   ',
  '<_NEXT_MSG>08:29',
  'Airport',
  '6',
  'ON TIME',
  'LOCAL                    ',
  '465   ',
  '<_NEXT_MSG>08:59',
  'Airport',
  '6',
  'ON TIME',
  'LOCAL                    ',
  '467   ',
  ''],
 ['R4N=07:30',
  'Glenside',
  '5',
  'ON TIME',
  'LOCAL                    ',
  '458   ',
  '<_NEXT_MSG>08:00',
  'Warminster',
  '5',
  'ON TIME',
  'LOCAL                    ',
  '460   ',
  '<_NEXT_MSG>08:30',
  'Glenside',
  '5',
  'ON TIME',
  'LOCAL                    ',
  '462   ',
  '<_NEXT_MSG>09:00',
  'Warminster',
  '5',
  'ON TIME',
  'LOCAL                    ',
  '464   ',
  ''],
 ['R2S=07:44',
  'Wilmington',
  '6',
  ' 1 LATE',
  'LOCAL                    ',
  '4269  ',
  '<_NEXT_MSG>08:10',
  '30th St',
  '',
  'ON TIME',
  'LOCAL                    ',
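
One possible way to untangle rows like these is to start a new train record whenever a field begins with the `<_NEXT_MSG>` marker. The sketch below does this for a single row. The function name `split_trains` and the key names (`time`, `destination`, `service`, `train_no`, and so on) are guesses made for illustration, since the CSV itself carries no column headers. The sample row is shortened from the output above.

```python
MARKER = "<_NEXT_MSG>"
KEYS = ["time", "destination", "track", "status", "service", "train_no"]  # assumed labels

def split_trains(row):
    """Split one joined schedule row into a list of dictionaries, one per train."""
    # the first field looks like "R4S=07:29": a line code, then the first train's time
    line_code, first_time = row[0].split("=", 1)
    fields = [first_time] + [field.strip() for field in row[1:]]
    trains, current = [], []
    for field in fields:
        if field.startswith(MARKER):
            # the marker begins the next train; its time follows the marker
            trains.append(current)
            current = [field[len(MARKER):]]
        else:
            current.append(field)
    trains.append(current)
    # zip() stops at the shorter sequence, which drops the empty trailing field
    return [dict(zip(KEYS, train), line=line_code) for train in trains]

row = ['R4S=07:29', 'Airport', '6', 'ON TIME', 'LOCAL                    ', '461   ',
       '<_NEXT_MSG>07:59', 'Airport', '6', 'ON TIME', 'LOCAL                    ', '463   ', '']
for train in split_trains(row):
    print(train)
```

Rows that are not train rows, such as the `EMG` message on the first line, would need to be skipped before calling a function like this, for example by checking the row's length.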
  

### 3.1.4 Working with XML Responses

Wikipedia's Special:Export page returns results in XML format. Processing an XML response into dictionary form is more complex than JSON. We'll use a module named `xmltodict` (install using `pip3 install xmltodict`), discussed in Chapter 1. First, let's request the article for Philadelphia (along with the one for Pennsylvania) by constructing an export query.

In [2]:
response = requests.get("https://en.wikipedia.org/w/index.php?title=Special:Export&pages=" + 
                        "%0A".join(["Philadelphia", "Pennsylvania"])) # string.join() creates the search query "Philadelphia%0APennsylvania"
    
pprint(response.text)

('<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" '
 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
 'xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ '
 'http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">\n'
 '  <siteinfo>\n'
 '    <sitename>Wikipedia</sitename>\n'
 '    <dbname>enwiki</dbname>\n'
 '    <base>https://en.wikipedia.org/wiki/Main_Page</base>\n'
 '    <generator>MediaWiki 1.32.0-wmf.19</generator>\n'
 '    <case>first-letter</case>\n'
 '    <namespaces>\n'
 '      <namespace key="-2" case="first-letter">Media</namespace>\n'
 '      <namespace key="-1" case="first-letter">Special</namespace>\n'
 '      <namespace key="0" case="first-letter" />\n'
 '      <namespace key="1" case="first-letter">Talk</namespace>\n'
 '      <namespace key="2" case="first-letter">User</namespace>\n'
 '      <namespace key="3" case="first-letter">User talk</namespace>\n'
 '      <namespace key="4" case="first-letter">Wikipedia</namespa

Next, we'll convert this XML response into a dictionary. As you'll see, in order to get the desired data out of the XML, some studying of its structure is necessary.

In [7]:
import xmltodict

parsed = xmltodict.parse(response.text)
article = parsed["mediawiki"]["page"][0]
article_dict = {
            "id": article["id"],
            "title": article["title"],
            "text": article["revision"]["text"]["#text"]
        }
pprint(article_dict)

{'id': '50585',
 'text': '{{About|the Pennsylvania city}}\n'
         '{{Redirect|Philly}}\n'
         '{{pp-semi-indef}}\n'
         '{{pp-move-indef}}\n'
         '{{Use mdy dates|date=October 2017}}\n'
         '{{Short description|Largest city in Pennsylvania}}\n'
         '{{Infobox settlement\n'
         '| name                            = Philadelphia, Pennsylvania\n'
         '| official_name                   = City of Philadelphia\n'
         '| settlement_type                 = [[Consolidated city-county]]\n'
         '| etymology                       = [[Ancient Greek]]: '
         "[[wikt:φίλος|φίλος]] ''phílos'' (beloved, dear) and "
         "[[wikt:ἀδελφός|ἀδελφός]] ''adelphós'' (brother, brotherly)\n"
         '| image_skyline                   = {{Photomontage\n'
         '| photo1a                   = Philadelphia skyline from the '
         'southwest 2015.jpg{{!}}Philadelphia skyline (2015)\n'
         '| photo2a                   = Independence National Historic
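
One detail worth knowing about `xmltodict`: an element that appears once is parsed into a dictionary, while a repeated element is parsed into a list of dictionaries. Indexing `["page"][0]` works above because two pages were requested; with a single page, `"page"` would be a dictionary instead. The sketch below handles both cases. The helper names `as_list` and `extract_articles` are hypothetical, and the sample dictionaries are hand-written stand-ins for the output of `xmltodict.parse()`.

```python
def as_list(value):
    """Wrap a single item in a list; leave lists alone; turn None into an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def extract_articles(parsed):
    """Collect id, title, and text for every page in a parsed export."""
    pages = as_list(parsed["mediawiki"].get("page"))
    return [
        {
            "id": page["id"],
            "title": page["title"],
            "text": page["revision"]["text"]["#text"],
        }
        for page in pages
    ]

# hand-written stand-in for a parsed export containing only one page
one_page = {"mediawiki": {"page": {
    "id": "50585",
    "title": "Philadelphia",
    "revision": {"text": {"#text": "{{About|the Pennsylvania city}}"}},
}}}
print(extract_articles(one_page))
```

`xmltodict.parse()` also accepts a `force_list` argument that can make chosen elements always parse as lists, which is an alternative to wrapping values after the fact.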

## 3.2 API authentication
The GitHub and SEPTA examples we've looked at so far are APIs that don't require any authentication to access. Anyone can send a request to these APIs and receive a response. There are, however, quite a few APIs that require the user to have some authentication. This authentication usually takes the form of an access token that needs to be obtained from the API provider before making requests.

### 3.2.1 Example: Sportradar
As an example, we'll take a look at one of the [Sportradar APIs](https://developer.sportradar.com). Sportradar has APIs for a number of different sports.

In order to use any of their APIs, Sportradar requires you to open a developer account and register an app. Only then are you granted an access token or "API key", which you must include in any requests you make.

The steps to obtain an API key from Sportradar are:
1. Sign up as a developer [here](https://developer.sportradar.com/member/register)
2. Sign in to your account and go to your [account page](https://developer.sportradar.com/member/my-account)
3. Go to your [applications page](https://developer.sportradar.com/apps/myapps)
4. Register a new application and select the API keys you need

We'll use the Sportradar Soccer API to obtain the match schedule for an English soccer team, Manchester City.

First, we'll construct the request address using the API key.

In [153]:
soccer_key = ""

In [198]:
# format: "https://api.sportradar.us/soccer-xt3/eu/en/teams/TEAM_ID/schedule.json?api_key=API_KEY"
address = "https://api.sportradar.us/soccer-xt3/eu/en/teams/sr:competitor:17/schedule.json?api_key=" + soccer_key
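
API keys are secrets, so it is usually safer not to type them into a notebook that might be shared. As a sketch of one common alternative, the function below reads the key from an environment variable and builds the same schedule address. The function name `schedule_url` and the variable name `SPORTRADAR_API_KEY` are assumptions made for this example, not something Sportradar prescribes.

```python
import os

def schedule_url(team_id, api_key=None):
    """Build the team schedule address, reading the key from the environment if not given."""
    # SPORTRADAR_API_KEY is a hypothetical variable name chosen for this example
    key = api_key or os.environ.get("SPORTRADAR_API_KEY", "")
    if not key:
        raise ValueError("no API key found; pass api_key or set SPORTRADAR_API_KEY")
    return ("https://api.sportradar.us/soccer-xt3/eu/en/teams/"
            + team_id + "/schedule.json?api_key=" + key)

# placeholder key, shown only to illustrate the resulting address
print(schedule_url("sr:competitor:17", api_key="YOUR_API_KEY"))
```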

In [199]:
resp = requests.get(address)

In [200]:
result = resp.json()

In [201]:
pprint(result)

{'generated_at': '2018-08-23T14:03:58+00:00',
 'schedule': [{'competitors': [{'abbreviation': 'WOL',
                                'country': 'England',
                                'country_code': 'ENG',
                                'id': 'sr:competitor:3',
                                'name': 'Wolverhampton Wanderers',
                                'qualifier': 'home'},
                               {'abbreviation': 'MCI',
                                'country': 'England',
                                'country_code': 'ENG',
                                'id': 'sr:competitor:17',
                                'name': 'Manchester City',
                                'qualifier': 'away'}],
               'id': 'sr:match:14736299',
               'scheduled': '2018-08-25T11:30:00+00:00',
               'season': {'end_date': '2019-05-13',
                          'id': 'sr:season:54571',
                          'name': 'Premier League 18/19',
                

Inspecting the results, we can see that the schedule can be found under the "schedule" key to the top-level dictionary. It is a list of matches, with each match encoded in a dictionary.

In [207]:
schedule = result["schedule"]
print(type(schedule))
print(len(schedule))

<class 'list'>
36


Each match looks like this:

In [210]:
pprint(schedule[0])

{'competitors': [{'abbreviation': 'WOL',
                  'country': 'England',
                  'country_code': 'ENG',
                  'id': 'sr:competitor:3',
                  'name': 'Wolverhampton Wanderers',
                  'qualifier': 'home'},
                 {'abbreviation': 'MCI',
                  'country': 'England',
                  'country_code': 'ENG',
                  'id': 'sr:competitor:17',
                  'name': 'Manchester City',
                  'qualifier': 'away'}],
 'id': 'sr:match:14736299',
 'scheduled': '2018-08-25T11:30:00+00:00',
 'season': {'end_date': '2019-05-13',
            'id': 'sr:season:54571',
            'name': 'Premier League 18/19',
            'start_date': '2018-08-10',
            'tournament_id': 'sr:tournament:17',
            'year': '18/19'},
 'start_time_tbd': False,
 'status': 'not_started',
 'tournament': {'category': {'country_code': 'ENG',
                             'id': 'sr:category:1',
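
The `scheduled` field in each match is a timestamp string in ISO 8601 format with a UTC offset. As a small sketch, Python's standard `datetime.fromisoformat()` can turn it into a `datetime` object that is easier to sort, compare, or reformat. The function name `kickoff` is hypothetical, and the sample dictionary holds only two fields copied from the match above.

```python
from datetime import datetime

def kickoff(match):
    """Parse a match's 'scheduled' timestamp into a timezone-aware datetime."""
    return datetime.fromisoformat(match["scheduled"])

# two fields copied from the first match in the schedule above
match = {"id": "sr:match:14736299", "scheduled": "2018-08-25T11:30:00+00:00"}
when = kickoff(match)
print(when.strftime("%A %d %B %Y, %H:%M"))

# datetimes compare naturally, so a schedule could be sorted like this:
# schedule.sort(key=kickoff)
```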
                         

#### 3.2.1.1 Exercise: accessing a soccer schedule

Make a request to the Sportradar Soccer schedule API to obtain the match schedule for Liverpool FC (team_id = sr:competitor:44). Then, from the obtained schedule, make a simple list of fixtures. Your output should be a list with strings as elements. The strings should be of the format "HOME_TEAM vs AWAY_TEAM".

In [None]:
# code goes here

## 3.3 Big Tech APIs

Large tech companies make a variety of APIs available for use. Most of these require authentication and can be used to access some very useful data. 

### 3.3.1 Facebook

Facebook has an API called the Graph API that allows a developer to access data about posts, comments, users and more. However, one of the major sticking points of working with APIs from companies like Facebook is that these APIs change very frequently, and sometimes the changes can break a developer's code. Facebook changed their policy towards applications some time ago, and as a result, a developer simply working on a research project is not allowed access to any data from the Graph API. In order to gain data access, all developers must register their application and go through a review process with Facebook. Only then is data collection allowed through the API. These barriers make it hard to discuss and work with the Graph API. 

### 3.3.2 Google

Similarly, Google Maps has a very powerful API that can be used to obtain a large variety of data. However, it is meant to be used as a licensed resource for which developers pay Google a fee. Some of Google's tools may be used for free by obtaining a \$200 API credit, but this still requires setting up billing. If you are interested in these tools, take a look [here](https://cloud.google.com/maps-platform/pricing/).

An important consideration when working with these APIs is that companies usually enforce a "rate limit". This means a developer is only allowed to make a limited number of calls in a given period of time. When you are working with an API from a commercial entity, make sure to check on their rate limit.
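
A simple way to stay under a rate limit is to pause between requests. The sketch below is one possible approach, not a feature of any particular API: a small hypothetical `Throttle` class that makes sure at least `min_interval` seconds pass between consecutive calls. The `clock` and `sleep` arguments exist only so the class can be tested without actually waiting.

```python
import time

class Throttle:
    """Enforce a minimum number of seconds between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # too soon since the last call: pause for the rest of the interval
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# usage: call wait() right before each request
throttle = Throttle(min_interval=0.01)  # a real limit would likely be much longer
for team_id in ["sr:competitor:17", "sr:competitor:44"]:
    throttle.wait()
    print("would request the schedule for", team_id)
```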

### 3.3.3 Twitter

A big-tech API we can demonstrate and play around with is the Twitter API. While the API can be accessed in a barebones way using tools such as the `requests` module, we can also use an API client, which is a third-party library that allows us to easily work with an API by automating and simplifying low-level tasks.

For Twitter, we'll use a client called Twython (`pip3 install Twython` to install). First, we'll need to follow these steps to obtain API access and authentication: 

1. Sign up for a Twitter account
2. Sign in to [https://apps.twitter.com](https://apps.twitter.com)
3. Create an app
4. Go to the API Keys section and click "Generate ACCESS TOKEN".

The resulting keys are:
- "oauth_access_token"
- "oauth_access_token_secret"
- "consumer_key"
- "consumer_secret"

We'll have to save these values in some variables that we'll need:

In [1]:
access_token = ""
access_token_secret = ""
consumer_key = ""
consumer_secret = ""

Now, we'll create a `twitter` object using our consumer key and secret and use this object to download a [list of tweets](https://www.buzzfeed.com/danieldalton/epic-tweet-bro?utm_term=.shgZJEe8V#.lywBJEpjG) using their unique IDs.

In [2]:
from twython import Twython

twitter = Twython(consumer_key, consumer_secret)

IDlist = [
    "1121915133", 
    "64780730286358528", 
    "64877790624886784", 
    "20", 
    "467192528878329856", 
    "474971393852182528",
    "475071400466972672",
    "475121451511844864",
    "440322224407314432",
    "266031293945503744",
    "3109544383",
    "1895942068",
    "839088619",
    "8062317551",
    "232348380431544320",
    "286910551899127808",
    "286948264236945408",
    "27418932143",
    "786571964",
    "467896522714017792",
    "290892494152028160",
    "470571408896962560"
]

for ID in IDlist:
    status = twitter.show_status(id = ID)
    print(status["text"])
    print()

http://twitpic.com/135xa - There's a plane in the Hudson. I'm on the ferry going to pick up the people. Crazy.

Helicopter hovering above Abbottabad at 1AM (is a rare event).

So I'm told by a reputable person they have killed Osama Bin Laden. Hot damn.

just setting up my twttr

India has won! भारत की विजय। अच्छे दिन आने वाले हैं।

We can neither confirm nor deny that this is our first tweet.

Thank you for the @Twitter welcome! We look forward to sharing great #unclassified content with you.

@CIA We look forward to sharing great classified info about you http://t.co/QcdVxJfU4X https://t.co/kcEwpcitHo More: https://t.co/PEeUpPAt7F

If only Bradley's arm was longer. Best photo ever. #oscars http://t.co/C9U5NOtGap

Four more years. http://t.co/bAJE6Vom

Facebook turned me down. It was a great opportunity to connect with some fantastic people. Looking forward to life's next adventure.

Got denied by Twitter HQ. That's ok. Would have been a long commute.

Are you ready to celebrate?  Wel

Next, we'll grab a user's timeline (Drexel University in this case) and print out the last 10 tweets by them:

In [15]:
drexel_twitter = twitter.get_user_timeline(screen_name = "drexeluniv")

for tweet in drexel_twitter[:10]:
    print(tweet["text"])
    print()

Good luck with #FinalsWeek, Dragons! 🐉 https://t.co/qfvPwxKYbO

Philly’s first-pay-what-you-can restaurant, @theEATcafe, which launched out of @HungerFreeCtr in @drexelpubhealth i… https://t.co/1DnAidIjss

Happy #LaborDay Dragons! 🇺🇸 https://t.co/waH0MAOVxz

RT @DrexelNow: It's the Friday before Labor Day! 

Photo: Espresso, one of @DrexelUniv's therapy dogs, is very excited. From @Drexelsdogs.…

RT @gabby_frost: big thanks to @DrexelUniv for interviewing me for @DrexelNow! I'm so lucky to go to a school that supports me not only wit…

Haven’t eaten lunch yet? Lashing out at coworkers? Are you just not you when you're hungry? Dr. Michael Lowe from… https://t.co/vBicyXV2nz

Imagine a computer keyboard knit from yarn, embedded with touch sensors and Bluetooth, so soft and flexible it coul… https://t.co/ddhb9k0Nm8

We've all heard of 30-day challenges, but this is definitely the craziest one so far - the McDonald's 30 Day Challe… https://t.co/kxJkwBMOJv

We got the best #MondayMotivation 

#### 3.3.3.1 Exercise: access some accidental haikus from Twitter's REST API
Create your Twitter API keys and download the last 15 tweets by @accidental575 (the hilarious Accidental Haiku Bot).

In [None]:
# code goes here

#### 3.3.3.2 Complex queries with filtering
We can use Twitter's Search API to grab some tweets about a particular topic:

In [5]:
for tweet in twitter.search(q = '"data science"')["statuses"]:
    print(tweet["text"])
    print()

Data Science for Fundraising: Build Data-Driven Solutions Using R https://t.co/jvG3q6rMTj

RT @IainLJBrown: Dive into Data Science: An Intro to Big Data, Advanced Analytics and AI/ML #MeetUp #AI #DataScience #MachineLearning #Deep…

Analizando el #bigdata y sus aplicaciones https://t.co/3OMIk6EHYe via @JoseA_Blanco by @PiperLab_es #sociosCEL referencia en el #DataScience

RT @kierisi: I'm giving a short presentation at work tomorrow about the data science department. it's going to be a great opportunity for m…

RT @Ronald_vanLoon: An Introduction to Key #DataScience Concepts [#INFOGRAPHIC]
 by @dataiku |

https://t.co/9clxyEJpRW

#PredictiveAnalyti…

Data Science Introductory Session: https://t.co/pLn2Cz94Qw via @YouTube

Aakkam @ PSGR college for women to present one-day FDP program on "IOT and Data Science in IOT using Python" https://t.co/6yHxhogmmA

RT @IainLJBrown: Dive into Data Science: An Intro to Big Data, Advanced Analytics and AI/ML #MeetUp #AI #DataScience #MachineLearning 

#### 3.3.3.3 Twitter's streaming APIs
The Streaming API lets us collect a portion of all the tweets currently being posted, in real time. This is more complicated than collecting old tweets, because it means essentially downloading an endless stream of data. Let's say we want to collect 30 tweets and stop. We need to write a class based on `TwythonStreamer` that has this failsafe built in, while also only collecting English tweets:

In [17]:
from twython import TwythonStreamer

tweets = []

class Streamer(TwythonStreamer):
    
    def on_success(self, data):
        
        if data.get("lang") == "en":  # .get() avoids a KeyError on stream messages that have no "lang" field
            tweets.append(data)
            print("Received tweet #" + str(len(tweets)))
            
        if len(tweets) >= 30:
            self.disconnect()
            
    def on_error(self, status_code, data):
        print(status_code, data)
        self.disconnect()

Next, we need to create a `Streamer` object and use it to collect tweets. Let's say we want to collect tweets with the keyword "science":

In [18]:
stream = Streamer(consumer_key, consumer_secret, access_token, access_token_secret)

stream.statuses.filter(track = "science")

Received tweet #1
Received tweet #2
Received tweet #3
Received tweet #4
Received tweet #5
Received tweet #6
Received tweet #7
Received tweet #8
Received tweet #9
Received tweet #10
Received tweet #11
Received tweet #12
Received tweet #13
Received tweet #14
Received tweet #15
Received tweet #16
Received tweet #17
Received tweet #18
Received tweet #19
Received tweet #20
Received tweet #21
Received tweet #22
Received tweet #23
Received tweet #24
Received tweet #25
Received tweet #26
Received tweet #27
Received tweet #28
Received tweet #29
Received tweet #30


Let's take a look at these 30 tweets, collected in real time:

In [20]:
for tweet in tweets:
    print(tweet["text"])
    print()

RT @MBLScience: MBL Fellows to focus scholarship on #regeneration: its history, philosophy, and science @MBLHistory https://t.co/XArGvYEL09…

RT @Nina_Jensen: Important step towards protecting the high seas from threats like deep-sea mining and over-fishing! #SaveOurOcean #UN #SDG…

RT @biyolokum: I feel so lucky that History and Philosophy of Science is thriving at the MBL around the time I am here, so I get to be part…

Diaphaneity not diaphancity. See, it doesn't even have a meaning in the dictionary. Tas pinagtutulakan mo pang sa S… https://t.co/VMRx7n1Uv5

RT @chidzhazenberry: “The British people didn’t vote to be outside the single market”

Excuse you?

The single market is a function of the…

RT @sewell7: “Truly, Terrific, Entertainment” 10/10 @SFFAudio
“Great science fiction fun!” @BlogtorWho
“This audio revival of Dan Dare is a…

@Maidenist85 @MennoPP @Jimispr @darrenrovell Climate change is liberal hysteria. Yes, it held science back. But if… https://t.co/5ibOC6qS9T

RT @_Mig

The Streaming API is a very powerful data source, but it can also generate quite a bit of data in a short period of time. So, whenever you're collecting tweets from the stream, make sure to design the collection in such a way that you don't run out of storage!
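
One way to keep storage under control is to save only the fields you need and to write each tweet to disk as it arrives, rather than holding everything in memory. The sketch below appends trimmed tweets to a JSON Lines file (one JSON object per line). The helper names `slim` and `append_jsonl`, the choice of fields in `KEEP`, and the sample tweet are all assumptions made for illustration.

```python
import json
import os
import tempfile

KEEP = ("id_str", "created_at", "lang", "text")  # assumed subset of tweet fields

def slim(tweet, keep=KEEP):
    """Return a copy of the tweet containing only the fields listed in keep."""
    return {key: tweet[key] for key in keep if key in tweet}

def append_jsonl(path, record):
    """Append one record to a file as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# hypothetical tweet with an extra field we do not want to store
tweet = {"id_str": "20", "lang": "en", "text": "just setting up my twttr",
         "user": {"screen_name": "jack"}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tweets.jsonl")
    append_jsonl(path, slim(tweet))
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

In a streamer class, a call like `append_jsonl(path, slim(data))` could take the place of `tweets.append(data)`, with a counter used to decide when to disconnect.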