# Script tests

This notebook tests the modules written to schedule the collection of data from Twitch, store it on disk, and/or send it to MongoDB.

## Imports

In [1]:
from ast import literal_eval
import json
import os
import sys

from bson.objectid import ObjectId
from bson.son import SON
from pymongo import MongoClient

Add the folder containing the Python scripts to the path:

In [2]:
sys.path.insert(0, '../scripts')

# Twitch Script Test

A Twitch Client ID is needed to run the script: https://dev.twitch.tv/console/apps/create

In [3]:
import twitch_collect_schedule

In [4]:
# Load the API credentials from a local JSON file kept outside the notebook
filename = "../keys.json"
with open(filename) as file:
    keys = json.load(file)

twitch_client_ID = keys['Twitch']['Client-ID']

header_v5 = {
    'Accept': 'application/vnd.twitchtv.v5+json',
    'Client-ID': twitch_client_ID,
}
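
The layout of keys.json is not shown in this notebook, but it can be inferred from the lookups used: `keys['Twitch']['Client-ID']` here and four `keys['Twitter'][...]` entries further down. As a minimal sketch under that assumption, the helper below (the names `REQUIRED_KEYS` and `missing_keys` are hypothetical, not part of the project's scripts) reports which expected entries are absent before any API call is made.

```python
import json

# Entries this notebook reads from keys.json (inferred from the lookups it uses).
REQUIRED_KEYS = {
    'Twitch': ['Client-ID'],
    'Twitter': ['consumer_key', 'consumer_secret', 'access_token', 'access_secret'],
}


def missing_keys(keys):
    """Return the 'Section.name' entries that are absent from the loaded keys."""
    missing = []
    for section, names in REQUIRED_KEYS.items():
        for name in names:
            if name not in keys.get(section, {}):
                missing.append(f'{section}.{name}')
    return missing


# Placeholder values only; real credentials stay in keys.json.
example = json.loads('{"Twitch": {"Client-ID": "xxx"}, "Twitter": {"consumer_key": "xxx"}}')
print(missing_keys(example))
# ['Twitter.consumer_secret', 'Twitter.access_token', 'Twitter.access_secret']
```

Calling `missing_keys(keys)` right after loading the file gives an early, readable error instead of a `KeyError` in a later cell.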

The Twitch script, twitch_collect_schedule.py, collects data about the top games on Twitch. By default it collects every 30 seconds, but the interval can be changed.
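
The idea of running a collection job at a fixed interval can be pictured with a standard-library sketch. This is not the implementation in twitch_collect_schedule.py, whose internals are not shown in this notebook; `run_every`, `collect_once`, and `max_runs` are hypothetical names used only for illustration.

```python
import time


def run_every(job, seconds, max_runs=None):
    """Call job() repeatedly, sleeping `seconds` between runs.

    max_runs is a hypothetical safety limit for testing; None means
    "run until interrupted" (Ctrl+C or a kernel interrupt in Jupyter).
    """
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            job()
            runs += 1
            # Skip the final sleep once the run limit has been reached.
            if max_runs is None or runs < max_runs:
                time.sleep(seconds)
    except KeyboardInterrupt:
        pass  # a manual interrupt ends the loop cleanly
    return runs


calls = []


def collect_once():
    # Stand-in for one collection pass; it only records when it ran.
    calls.append(time.time())


completed = run_every(collect_once, seconds=0.01, max_runs=3)
print(completed)  # 3
```

Note that this simple loop waits `seconds` after each job finishes, so the effective period is the job duration plus the interval. A dedicated scheduler library typically fires on a fixed clock instead.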

The collected data can be saved on disk as a JSON file, sent to a MongoDB collection, or both. Here we only send it to a local test MongoDB instance.

Load the MongoDB client:

In [6]:
client = MongoClient('localhost', 27017)

Use the db 'twitchtest':

In [7]:
db = client.twitchtest

And the collection 'games':

In [8]:
games = db.games

We now start the script, which collects data from Twitch and sends it to the Mongo collection. Once started, the script has to be interrupted manually (which takes some time in Jupyter).

For testing purposes, we use an interval of 20 seconds so that data is collected more quickly.

In [9]:
twitch_collect_schedule.twitch_collector_scheduler(header_v5, games, trigger='interval', seconds=20,
                                                   print_progress=True, save_local=False)

Script started at: 2019-07-08 12:51:59.162631

Press Ctrl+Break to exit
Job started at 2019-07-08 12:51:59.259441

Games processed: 199
Games processed: 299
Games processed: 399
Games processed: 499
Games processed: 599
Games processed: 699
Games processed: 799
Games processed: 899
Games processed: 999
Games processed: 1099
Games processed: 1199
Games processed: 1299
Games processed: 1399
Games processed: 1499
Games processed: 1596
Games processed: 1681
Games processed: 1682
Done collecting!
Sent to Mongo: 5d233c7cf0bd060e8c4fb12a
Job completed at 2019-07-08 12:52:12.409857



With print_progress=True, the script also prints the id of each document sent to Mongo. In this run, the id is '5d233c7cf0bd060e8c4fb12a'.
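
A MongoDB ObjectId stores its creation time, in seconds since the Unix epoch, in its first four bytes. As a small sketch, the helper below (the name `objectid_timestamp` is hypothetical) decodes that time from the hex string using only the standard library. With bson installed, `ObjectId(...).generation_time` exposes the same information.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    # The first 8 hex characters (4 bytes) are a big-endian Unix timestamp.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp('5d233c7cf0bd060e8c4fb12a')
print(created)  # 2019-07-08 12:52:12+00:00
```

The decoded time agrees to the second with the "Job completed at" line in the log above, assuming the log timestamps are in UTC.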

After stopping the script, we can retrieve that document by its id:

In [10]:
test = list(games.find({'_id' : ObjectId('5d233c7cf0bd060e8c4fb12a')}))

In [11]:
test[0]['data'][0]

{'channels': 645,
 'game__id': 29595,
 'game_box_large': 'https://static-cdn.jtvnw.net/ttv-boxart/Dota%202-272x380.jpg',
 'game_giantbomb_id': 32887,
 'game_logo_large': 'https://static-cdn.jtvnw.net/ttv-logoart/Dota%202-240x144.jpg',
 'game_name': 'Dota 2',
 'game_norm_name': 'dota ii',
 'game_popularity': 131086,
 'viewers': 133648}
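
Each stored document holds a `data` list of per-game records like the one above. As a sketch of how such a list could be inspected, the hypothetical helper `top_by_viewers` ranks records by their `viewers` field. Only the Dota 2 values come from the output above; the other two records are made up for illustration.

```python
def top_by_viewers(data, n=3):
    """Return the n records with the most viewers, highest first."""
    return sorted(data, key=lambda game: game['viewers'], reverse=True)[:n]


sample = [
    {'game_name': 'Example Game A', 'channels': 10, 'viewers': 500},    # made up
    {'game_name': 'Dota 2', 'channels': 645, 'viewers': 133648},        # from the output above
    {'game_name': 'Example Game B', 'channels': 20, 'viewers': 2500},   # made up
]

names = [game['game_name'] for game in top_by_viewers(sample, n=2)]
print(names)  # ['Dota 2', 'Example Game B']
```

In the notebook, the same call would be `top_by_viewers(test[0]['data'])`.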

# Twitter Script Test

To test the script that downloads data from Twitter, we store the data in the db "twittertest", using the collection "games". The test focuses on the library *collect_and_store_tweets.py*. In the project, this library is imported by the script *download_top50_tweets.py*, so that the download code stays reusable.

In [12]:
db = client.twittertest

In [13]:
games = db.games

In [14]:
games.estimated_document_count()

0

In [15]:
games.find_one()

In [16]:
from collect_and_store_tweets import download_and_store_tweets_on_mongodb, twitter_api_setup

Set up the Tweepy API:

In [17]:
tweepy_api = twitter_api_setup(keys['Twitter']['consumer_key'],
                               keys['Twitter']['consumer_secret'],
                               keys['Twitter']['access_token'],
                               keys['Twitter']['access_secret'])

We are interested in downloading the tweets for "Assetto Corsa" and "Project Cars 2" between 6 July 2019 and 7 July 2019:

In [18]:
list_of_games = ['Assetto Corsa', 'Project Cars 2']

In [19]:
download_and_store_tweets_on_mongodb(tweepy_api, list_of_games, "2019-07-06", "2019-07-07", games)

Let's see if everything worked correctly:

In [20]:
games.find_one()

{'_id': ObjectId('5d233cbff0bd060e8c4fb12b'),
 'query': 'Assetto Corsa',
 'text': 'Assetto Corsa(AC) New lap record 01:45.779 on AudiTTcup session at Zandvoort - 23:00 GMT  (2019-07-06 23:40:105)',
 'language': 'en',
 'date': datetime.datetime(2019, 7, 6, 23, 41, 15),
 'username': 'Sim Racing System',
 'user_followers': 1514,
 'user_location': '',
 'retweets': 0,
 'likes': 0}
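
Since each stored tweet carries the `query` it was downloaded for, a quick follow-up check is to count tweets per game. The sketch below only builds a MongoDB aggregation pipeline (the function name `tweets_per_query_pipeline` is hypothetical); it assumes every document has the `query` and `likes` fields shown above. Running it requires the live `games` collection, so that call is left as a comment.

```python
def tweets_per_query_pipeline():
    """Build an aggregation pipeline that counts tweets and likes per query."""
    return [
        # One output row per distinct value of the 'query' field.
        {'$group': {
            '_id': '$query',
            'tweets': {'$sum': 1},
            'likes': {'$sum': '$likes'},
        }},
        # Most-tweeted game first.
        {'$sort': {'tweets': -1}},
    ]


pipeline = tweets_per_query_pipeline()

# With the collection from this notebook:
# for row in games.aggregate(pipeline):
#     print(row)
```

If one of the two games is missing from the result, no tweets were stored for it in the requested date range.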