# Script tests

This notebook tests the modules written to collect data from Twitch and Twitter.

## Imports

In [1]:
import json
import sys
from pymongo import MongoClient
from bson.objectid import ObjectId

Add the folder containing the Python scripts to the module search path:

In [2]:
sys.path.insert(0, '../scripts')
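
The relative path '../scripts' is resolved against the current working directory, so the imports below only work when the kernel is started from the notebook's own folder. A minimal sketch, assuming the notebook sits one level below a `scripts` folder as the original cell implies, resolves the path once to an absolute path and avoids adding duplicates when the cell is re-run:

```python
import os
import sys

# Resolve '../scripts' against the current working directory (assumed to be
# the notebook's folder) and turn it into an absolute path.
scripts_dir = os.path.abspath(os.path.join(os.getcwd(), '..', 'scripts'))

# Insert it only once, so re-running the cell does not keep growing sys.path.
if scripts_dir not in sys.path:
    sys.path.insert(0, scripts_dir)
```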

# Twitch Script Test

A Twitch Client ID is needed to run the script; one can be obtained by registering an application at https://dev.twitch.tv/console/apps/create

In [3]:
import twitch_collect_schedule

In [4]:
# Read the API credentials from keys.json
filename = "../keys.json"
with open(filename) as file:
    keys = json.load(file)
twitch_client_ID = keys['Twitch']['Client-ID']

# Request headers for version 5 of the Twitch API
header_v5 = {
    'Accept': 'application/vnd.twitchtv.v5+json',
    'Client-ID': twitch_client_ID,
}
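
This notebook reads `keys['Twitch']['Client-ID']` and, later, four Twitter credentials from the same file. A minimal sketch of a loader that fails early with a clear message when an entry is missing; `load_keys` and `REQUIRED_KEYS` are hypothetical names, and the field names are only the ones used in this notebook. The usage example writes placeholder values to a temporary file and stores the result in `example_keys`, so it does not overwrite the real `keys`:

```python
import json
import os
import tempfile

# Fields this notebook reads from keys.json, grouped by service.
REQUIRED_KEYS = {
    'Twitch': ['Client-ID'],
    'Twitter': ['consumer_key', 'consumer_secret', 'access_token', 'access_secret'],
}


def load_keys(path):
    """Load the credentials file and check that every required field exists."""
    with open(path) as file:
        keys = json.load(file)
    missing = [
        f'{service}.{field}'
        for service, fields in REQUIRED_KEYS.items()
        for field in fields
        if field not in keys.get(service, {})
    ]
    if missing:
        raise KeyError(f'Missing entries in {path}: {", ".join(missing)}')
    return keys


# Usage with placeholder values written to a temporary file.
placeholders = {
    'Twitch': {'Client-ID': 'YOUR_CLIENT_ID'},
    'Twitter': {name: 'PLACEHOLDER' for name in REQUIRED_KEYS['Twitter']},
}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump(placeholders, tmp)
example_keys = load_keys(tmp.name)
os.remove(tmp.name)
```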

The Twitch script, twitch_collect_schedule.py, collects data about the top games on Twitch. By default it collects data every 30 seconds, but the interval can be changed.

The collected data can be saved to disk as a JSON file, sent to a MongoDB collection, or both. Here we only send it to a local test MongoDB instance.
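
The "JSON file, MongoDB, or both" choice can be sketched as a single function. This only illustrates the idea and is not the code of twitch_collect_schedule.py: `store_snapshot` and its parameters are hypothetical, and `collection` can be anything with a pymongo-style `insert_one` method whose result exposes `inserted_id`. An in-memory stand-in replaces MongoDB so the example runs without a server:

```python
import json
import os
import tempfile
from types import SimpleNamespace


def store_snapshot(data, collection=None, save_local=False, path='snapshot.json'):
    """Save one run's data to a JSON file, a MongoDB collection, or both."""
    document = {'data': data}
    inserted_id = None
    if save_local:
        # Write the file first: pymongo's insert_one adds a non-JSON '_id' to the dict.
        with open(path, 'w') as file:
            json.dump(document, file)
    if collection is not None:
        # insert_one returns a result object exposing inserted_id.
        inserted_id = collection.insert_one(document).inserted_id
    return inserted_id


class FakeCollection:
    """In-memory stand-in for a pymongo collection (usage example only)."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)
        return SimpleNamespace(inserted_id=len(self.documents))


fake_games = FakeCollection()
local_path = os.path.join(tempfile.mkdtemp(), 'snapshot.json')
new_id = store_snapshot([{'game_name': 'Dota 2', 'viewers': 138786}],
                        collection=fake_games, save_local=True, path=local_path)
```

With a real pymongo collection, the returned value would be the document's ObjectId rather than a counter.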

Connect to the local MongoDB server:

In [5]:
client = MongoClient('localhost', 27017)

Select the database 'twitchtest':

In [6]:
db = client.twitchtest

Select the collection 'games':

In [7]:
games = db.games

We now start the script, which collects data from Twitch and sends it to the MongoDB collection. Once started, the script must be interrupted manually (in Jupyter, the interrupt can take a while to take effect).

For testing, we use a 20-second interval so that data is collected more quickly.
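
The keyword arguments `trigger='interval', seconds=20` suggest that the script hands scheduling to a library such as APScheduler; that is an assumption not confirmed in this notebook. To show what an interval trigger does, and how a bounded number of runs avoids interrupting the kernel by hand, here is a minimal standard-library sketch (`run_every` and `max_runs` are hypothetical names):

```python
import time


def run_every(job, seconds, max_runs):
    """Call job() every `seconds` seconds, max_runs times, then return the results.

    Start times are computed from a fixed origin, so a slow job does not make
    later runs drift (as long as each run takes less than the interval).
    """
    start = time.monotonic()
    results = []
    for run in range(max_runs):
        # Sleep until the scheduled start of this run; the first run is immediate.
        delay = start + run * seconds - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        results.append(job())
    return results


# Usage: three quick runs with a 0.1-second interval instead of 20 seconds.
timestamps = run_every(time.monotonic, seconds=0.1, max_runs=3)
```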

In [8]:
twitch_collect_schedule.twitch_collector_scheduler(header_v5, games, trigger='interval', seconds=20,
                                                   print_progress=True, save_local=False)

Script started at: 2019-07-08 13:02:00.079552

Press Ctrl+Break to exit
Job started at 2019-07-08 13:02:00.129418

Games processed: 199
Games processed: 299
Games processed: 399
Games processed: 499
Games processed: 599
Games processed: 699
Games processed: 799
Games processed: 899
Games processed: 999
Games processed: 1099
Games processed: 1199
Games processed: 1299
Games processed: 1399
Games processed: 1499
Games processed: 1598
Games processed: 1697
Games processed: 1711
Games processed: 1719
Done collecting!
Sent to Mongo: 5d233ed8f0bd06244c4e3d0b
Job completed at 2019-07-08 13:02:16.044579



If print_progress=True, the script also prints the ObjectId of each document it sends to MongoDB; in our case, '5d233ed8f0bd06244c4e3d0b'.
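
The first 4 bytes of a MongoDB ObjectId are a big-endian Unix timestamp (in seconds) of its creation, so a printed id also tells when the document was created. pymongo exposes this as `ObjectId.generation_time`; the sketch below decodes it with the standard library only (`objectid_time` is a hypothetical helper):

```python
from datetime import datetime, timezone


def objectid_time(oid_hex):
    """Return the UTC creation time encoded in a 24-character ObjectId hex string."""
    # The first 8 hex digits (4 bytes) hold the seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_time('5d233ed8f0bd06244c4e3d0b')
# created == datetime(2019, 7, 8, 13, 2, 16, tzinfo=timezone.utc)
```

The decoded time falls in the same second as the "Job completed" line in the log above.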

After stopping the script, we can retrieve the document '5d233ed8f0bd06244c4e3d0b' as follows:

In [9]:
test = list(games.find({'_id' : ObjectId('5d233ed8f0bd06244c4e3d0b')}))

In [10]:
test[0]['data'][0]

{'channels': 645,
 'game__id': 29595,
 'game_box_large': 'https://static-cdn.jtvnw.net/ttv-boxart/Dota%202-272x380.jpg',
 'game_giantbomb_id': 32887,
 'game_logo_large': 'https://static-cdn.jtvnw.net/ttv-logoart/Dota%202-240x144.jpg',
 'game_name': 'Dota 2',
 'game_norm_name': 'dota ii',
 'game_popularity': 120175,
 'viewers': 138786}
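
Each stored document keeps the list of games under 'data', with one dict per game as shown above. A minimal sketch that ranks a snapshot by current viewers and adds viewers per channel; `top_by_viewers` is a hypothetical helper, and the second record in the usage example is made-up placeholder data (with the notebook's objects, the call would be `top_by_viewers(test[0])`):

```python
def top_by_viewers(snapshot, n=5):
    """Return (name, viewers, viewers per channel) for the n most-watched games."""
    ranked = sorted(snapshot['data'], key=lambda game: game['viewers'], reverse=True)
    return [
        (game['game_name'], game['viewers'],
         round(game['viewers'] / game['channels'], 1) if game['channels'] else 0.0)
        for game in ranked[:n]
    ]


# 'Dota 2' is the record printed above; 'Placeholder Game' is invented for the example.
snapshot = {'data': [
    {'game_name': 'Placeholder Game', 'viewers': 1000, 'channels': 10},
    {'game_name': 'Dota 2', 'viewers': 138786, 'channels': 645},
]}
ranking = top_by_viewers(snapshot, n=2)
```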

# Twitter Script Test

To test the script that downloads data from Twitter, we store the data in the database "twittertest", in the collection "games". The test focuses on the module *collect_and_store_tweets.py*; in the project, this module is imported by the script *download_top50_tweets.py*, so that the download logic can be reused.

In [11]:
db = client.twittertest

In [12]:
games = db.games

In [13]:
games.estimated_document_count()

129

In [14]:
games.find_one()

{'_id': ObjectId('5d233cbff0bd060e8c4fb12b'),
 'query': 'Assetto Corsa',
 'text': 'Assetto Corsa(AC) New lap record 01:45.779 on AudiTTcup session at Zandvoort - 23:00 GMT  (2019-07-06 23:40:105)',
 'language': 'en',
 'date': datetime.datetime(2019, 7, 6, 23, 41, 15),
 'username': 'Sim Racing System',
 'user_followers': 1514,
 'user_location': '',
 'retweets': 0,
 'likes': 0}
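
The stored tweets share a flat schema (query, text, language, date, username, user_followers, user_location, retweets, likes). A minimal sketch of how such a document could be built from a Tweepy `Status` object, assuming the library maps the standard Tweepy attributes (`full_text`/`text`, `lang`, `created_at`, `user.name`, `user.followers_count`, `user.location`, `retweet_count`, `favorite_count`); the actual mapping in collect_and_store_tweets.py may differ. `tweet_to_document` is a hypothetical name, and a `SimpleNamespace` stands in for a real `Status` so the example runs offline:

```python
from datetime import datetime
from types import SimpleNamespace


def tweet_to_document(query, status):
    """Flatten a Tweepy-like status object into the document layout shown above."""
    return {
        'query': query,
        # Extended tweets carry full_text; classic ones only have text.
        'text': getattr(status, 'full_text', None) or status.text,
        'language': status.lang,
        'date': status.created_at,
        'username': status.user.name,
        'user_followers': status.user.followers_count,
        'user_location': status.user.location,
        'retweets': status.retweet_count,
        'likes': status.favorite_count,
    }


# Offline stand-in built from the document printed above.
status = SimpleNamespace(
    text='Assetto Corsa(AC) New lap record 01:45.779 on AudiTTcup session at '
         'Zandvoort - 23:00 GMT  (2019-07-06 23:40:105)',
    lang='en',
    created_at=datetime(2019, 7, 6, 23, 41, 15),
    user=SimpleNamespace(name='Sim Racing System', followers_count=1514, location=''),
    retweet_count=0,
    favorite_count=0,
)
document = tweet_to_document('Assetto Corsa', status)
```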

In [15]:
from collect_and_store_tweets import download_and_store_tweets_on_mongodb, twitter_api_setup

Set up the Tweepy API:

In [16]:
tweepy_api = twitter_api_setup(keys['Twitter']['consumer_key'],
                               keys['Twitter']['consumer_secret'],
                               keys['Twitter']['access_token'],
                               keys['Twitter']['access_secret'])

We want to download the tweets about "Assetto Corsa" and "Project Cars 2" for the date range from 2019-07-06 to 2019-07-07.
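
In Twitter's standard search, the `since:` operator is inclusive and `until:` is exclusive (tweets created before that date), and the standard search index only reaches back about 7 days. If the library passes the two dates through in that way (an assumption; its internals are not shown here), this range fetches tweets from 2019-07-06 only. A small sketch with hypothetical helpers that makes the covered days and the resulting query explicit:

```python
from datetime import date, timedelta


def days_covered(since, until):
    """List the ISO days included by since (inclusive) and until (exclusive)."""
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    return [(start + timedelta(days=offset)).isoformat()
            for offset in range((end - start).days)]


def build_search_query(game, since, until):
    """Combine a game name with Twitter's since:/until: search operators."""
    return f'{game} since:{since} until:{until}'


covered = days_covered('2019-07-06', '2019-07-07')
search_query = build_search_query('Project Cars 2', '2019-07-06', '2019-07-07')
```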

In [17]:
list_of_games = ['Assetto Corsa', "Project Cars 2"]

In [18]:
download_and_store_tweets_on_mongodb(tweepy_api, list_of_games, "2019-07-06", "2019-07-07", games)

Let's see if everything worked correctly:

In [19]:
games.find_one()

{'_id': ObjectId('5d233cbff0bd060e8c4fb12b'),
 'query': 'Assetto Corsa',
 'text': 'Assetto Corsa(AC) New lap record 01:45.779 on AudiTTcup session at Zandvoort - 23:00 GMT  (2019-07-06 23:40:105)',
 'language': 'en',
 'date': datetime.datetime(2019, 7, 6, 23, 41, 15),
 'username': 'Sim Racing System',
 'user_followers': 1514,
 'user_location': '',
 'retweets': 0,
 'likes': 0}
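
`find_one()` returns the same first document as before the download, so on its own it does not show that new tweets were stored. A more direct check counts the documents for each game inside the requested window. The sketch builds the filter in plain Python; `tweet_filter` is a hypothetical helper, and it assumes 'date' holds naive datetimes, as in the documents above:

```python
from datetime import datetime


def tweet_filter(game, since, until):
    """MongoDB filter for one game's tweets with since <= date < until."""
    return {
        'query': game,
        'date': {
            '$gte': datetime.strptime(since, '%Y-%m-%d'),
            '$lt': datetime.strptime(until, '%Y-%m-%d'),
        },
    }


assetto_filter = tweet_filter('Assetto Corsa', '2019-07-06', '2019-07-07')
# With the notebook's collection: games.count_documents(assetto_filter)
```

Calling `games.count_documents(tweet_filter(game, "2019-07-06", "2019-07-07"))` for each entry of `list_of_games` before and after the download shows how many matching documents were added. Unlike `estimated_document_count()`, which relies on collection metadata, `count_documents` applies the filter and returns an exact count.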