# Twitch script test

Just a test of the modules written to schedule the collection of data from Twitch, storing it on disk, and/or sending it to Mongo.

## Imports

In [1]:
from ast import literal_eval
from bson.son import SON
import json
import os
from pymongo import MongoClient
from bson.objectid import ObjectId

Also import the modules with the functions for collection (the names are provisional)

List of python modules needed:
1. apscheduler
2. ast
3. datetime
4. json
5. os
6. pymongo
7. requests

In [2]:
import twitch
import uniformer
import collect_send_mongo
import collect_store

In [3]:
filename = "keys.json"
with open(filename) as file:
    keys = json.loads(file.read())
    twitch_client_ID = keys['Twitch']['Client-ID']

header_v5 = {
    'Accept': 'application/vnd.twitchtv.v5+json',
    'Client-ID': twitch_client_ID,
}

## Collecting the data from twitch and saving it on disk

This is done using the 'collect_store' module.

In case of errors, stackexchange suggests:
1. pip install --upgrade setuptools
2. pip install --ignore-installed apscheduler

By default the collection from twitch happens every 30 seconds (just to make collecting data for examples quicker).

The function needs to be stopped manually. (On jupyter clicking interrupt it takes some time, it's faster otherwhise).

In [5]:
collect_store.twitch_collector_scheduler(header_v5, trigger = 'interval',  seconds=30, lightweight=True, print_progress=True)

Script started at: 2019-05-17 11:21:46.589581

Press Ctrl+Break to exit
Job started at 2019-05-17 11:21:46.635710

Games processed: 199
Games processed: 299
Games processed: 399
Games processed: 499
Games processed: 599
Games processed: 699
Games processed: 799
Games processed: 899
Games processed: 999
Games processed: 1099
Games processed: 1199
Games processed: 1298
Games processed: 1396
Games processed: 1494
Games processed: 1508
Done collecting!
Job completed at 2019-05-17 11:21:53.464297



This creates two files, in my case '20190517_1050_data.json' and '20190517_1050_names.text'

### Opening the files

The first file contains the data collected from twitch. It has a document for each line, so it can be read line by line.

In [7]:
with open('20190517_1050_data.json', encoding = 'utf8') as file:
    collected_data = []
    for line in file:
        data = json.loads(line)
        collected_data.append(data)

In [8]:
len(collected_data)

5

Each dictionary consists of a timestamp recording when the data has been collected, and the actual data.

In [9]:
collected_data[0].keys()

dict_keys(['timestamp', 'data'])

In [10]:
collected_data[0]['timestamp']

'2019-05-17 10:50:16.123830'

In [12]:
collected_data[0]['data'][0]

{'channels': 1680,
 'game': {'_id': 21779,
  'box': {'large': 'https://static-cdn.jtvnw.net/ttv-boxart/League%20of%20Legends-272x380.jpg'},
  'giantbomb_id': 24024,
  'localized_name': 'League of Legends',
  'logo': {'large': 'https://static-cdn.jtvnw.net/ttv-logoart/League%20of%20Legends-240x144.jpg'},
  'name': 'League of Legends',
  'norm_name': 'league of legends',
  'popularity': 332896,
  'unary_name': 'league of legends'},
 'viewers': 362784}

Opening the file containing the names of all collected games.

In [13]:
with open('20190517_1050_names.txt', encoding = 'utf8') as file:
    games_names = file.read()

In [14]:
games_names = list(literal_eval(games_names))
games_names

['final fantasy tactics',
 'sunset overdrive',
 'welcome to the game ii',
 'sins of a solar empire rebellion',
 'rage ii',
 'dissidia final fantasy opera omnia',
 'paper mario the thousand-year door',
 'silent hill ii',
 'skyforge',
 'the sims iv',
 'dance dance revolution universe ii',
 'ashen',
 'call of duty world at war',
 'mad father',
 'fire emblem mobile',
 'just chatting',
 'maplestory ii',
 'need for speed undercover',
 'djmax respect',
 'dark age of camelot',
 'total war saga thrones of britannia',
 'mlb the show xviii',
 'rogue trip vacation mmxii',
 'pro evolution soccer mmxix',
 'for the king',
 'diesel brothers the game',
 'auto chess mobile',
 'age of empires iii the asian dynasties',
 'assetto corsa competizione',
 'nether the untold chapter',
 'mystical ninja starring goemon',
 'fish!',
 'the walking dead',
 'puppy love',
 'overlord ii',
 'shadow of the tomb raider',
 "uncharted 4 a thief's end",
 'starcraft',
 'warcraft iii the frozen throne',
 'heroes of might and ma

## Sending data to Mongo from the file

Loading Mongo's client

In [15]:
client = MongoClient('localhost', 27017)

Using the db 'twitchtest'(not sure if this needs to already exist or not).

In [17]:
db = client.twitchtest

And the collection 'games'

In [18]:
games = db.games

In [19]:
games

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'twitchtest'), 'games')

Example of sending data to MongoDB from the file of data collected from the script.
(mongodb's server needs to be running)

Each line of the file is a document, so we can send them line by line (so even if the file is very big, this should not use much memory, if I have read correctly).

In [20]:
with open('20190517_1050_data.json', "r", encoding = 'utf8') as json_file:
    inserted_ids = []
    for line in json_file:
        data = json.loads(line)
        post_id = games.insert_one(data).inserted_id
        inserted_ids.append(post_id)

I have kept track of the collected ids just for testing

In [21]:
inserted_ids 

[ObjectId('5cde9672f0bd0620ec9c2a07'),
 ObjectId('5cde9672f0bd0620ec9c2a08'),
 ObjectId('5cde9672f0bd0620ec9c2a09'),
 ObjectId('5cde9672f0bd0620ec9c2a0a'),
 ObjectId('5cde9672f0bd0620ec9c2a0b')]

In [22]:
list(games.find({'_id' : inserted_ids[0]}, {"timestamp" : 1}))

[{'_id': ObjectId('5cde9672f0bd0620ec9c2a07'),
  'timestamp': '2019-05-17 10:50:16.123830'}]

## Sending data to mongo in real time

This can be done using the 'collect_send_mongo' module

Using this module we can also store the data on disk, but this is not done by default.

Again, the script needs to be interrupted manually (which takes too long on jupyter).

In [26]:
collect_send_mongo.mongo_twitch_collector_scheduler(header_v5, games, trigger = 'interval',  seconds=30,
                                 lightweight=True, print_progress=True, store_files=False)

Script started at: 2019-05-17 11:12:59.846196

Press Ctrl+Break to exit
Job started at 2019-05-17 11:12:59.847188

Games processed: 199
Games processed: 299
Games processed: 399
Games processed: 499
Games processed: 599
Games processed: 699
Games processed: 799
Games processed: 899
Games processed: 999
Games processed: 1099
Games processed: 1197
Games processed: 1297
Games processed: 1394
Games processed: 1488
Done collecting!
Sent to Mongo: 5cde9742f0bd0620ec9c2a0c
Job completed at 2019-05-17 11:13:06.855826

Job started at 2019-05-17 11:13:29.855395

Games processed: 199
Games processed: 299
Games processed: 399
Games processed: 499
Games processed: 599
Games processed: 699
Games processed: 799
Games processed: 899
Games processed: 999
Games processed: 1099
Games processed: 1197
Games processed: 1297
Games processed: 1394
Games processed: 1488
Games processed: 1489
Done collecting!
Sent to Mongo: 5cde975bf0bd0620ec9c2a0d
Job completed at 2019-05-17 11:13:31.907193

Job started at 201

If print_progress = True, the script also prints the ids of the data sent. E.g., in my case, '5cde9742f0bd0620ec9c2a0c'.

We can look for it as follows.

In [27]:
list(games.find({'_id' : ObjectId('5cde9742f0bd0620ec9c2a0c')}, {"timestamp" : 1}))

[{'_id': ObjectId('5cde9742f0bd0620ec9c2a0c'),
  'timestamp': '2019-05-17 11:12:59.847188'}]

## Querying MongoDB

Some sources: 

1. https://techbrij.com/mongodb-query-elemmatch-dot-notation
2. http://api.mongodb.com/python/current/examples/aggregation.html
3. http://api.mongodb.com/python/current/tutorial.html

Finding games collected after 10:50 of May 17, group by name and sort by count and name. (-1 and 1 are for descending/ascending)

In [37]:
pipeline = [{"$unwind": "$data"},
            { "$match" : {'timestamp' : {'$gt': '2019-05-17 10:50'}}} ,
            { "$group": {"_id": {"name": "$data.game.name"},
                         "count": { "$sum": 1 }}},
           {"$sort": SON([("count", 1), ("_id", 1)])}]

list(db.games.aggregate(pipeline))

[{'_id': {'name': 'Asphalt 9: Legends '}, 'count': 1},
 {'_id': {'name': 'BioShock Infinite'}, 'count': 1},
 {'_id': {'name': 'Borderlands: The Handsome Collection'}, 'count': 1},
 {'_id': {'name': 'Brown Dust'}, 'count': 1},
 {'_id': {'name': 'Contra'}, 'count': 1},
 {'_id': {'name': 'Counter-Strike: Source'}, 'count': 1},
 {'_id': {'name': 'Cultures'}, 'count': 1},
 {'_id': {'name': 'ELEX'}, 'count': 1},
 {'_id': {'name': 'Fear the Wolves'}, 'count': 1},
 {'_id': {'name': 'Guilty Gear 20th Anniversary Pack'}, 'count': 1},
 {'_id': {'name': 'Human: Fall Flat'}, 'count': 1},
 {'_id': {'name': 'Jak II'}, 'count': 1},
 {'_id': {'name': 'Just Cause 3'}, 'count': 1},
 {'_id': {'name': 'KeyForge'}, 'count': 1},
 {'_id': {'name': 'Maniac Mansion: Day of the Tentacle'}, 'count': 1},
 {'_id': {'name': 'Mario Party 7'}, 'count': 1},
 {'_id': {'name': 'Mortal Kombat'}, 'count': 1},
 {'_id': {'name': 'MotoGP 18'}, 'count': 1},
 {'_id': {'name': 'Network'}, 'count': 1},
 {'_id': {'name': "Primera'

Finding games with more than 50000 viewers, 5000 popularity between 8:30 and 8:42 of May 16

In [38]:
pipeline = [{"$unwind": "$data"},
            { "$match" : {'$and' :
                          [{'data.viewers' : {'$gt' : 50000}},
                           {'data.game.popularity' : {'$gt' : 5000} },
                          {'timestamp' : {'$gt': '2019-05-16 08:39'}},
                          {'timestamp' : {'$lt': '2019-05-16 08:42'}}] } },
            {"$project":
             {'channels' : '$data.channels',
              '_id':1,
              'name' : '$data.game.name',
              'popularity' : '$data.game.popularity',
              'viewers' : '$data.viewers',
              'time':'$timestamp'} }]

In [39]:
list(db.games.aggregate(pipeline))

[{'_id': ObjectId('5cdd2e7137c21e79bdbc2cb8'),
  'channels': 438,
  'name': 'World of Warcraft',
  'popularity': 92890,
  'time': '2019-05-16 08:39:55.556686',
  'viewers': 94270},
 {'_id': ObjectId('5cdd2e7137c21e79bdbc2cb8'),
  'channels': 1714,
  'name': 'League of Legends',
  'popularity': 74325,
  'time': '2019-05-16 08:39:55.556686',
  'viewers': 77863},
 {'_id': ObjectId('5cdd2e7137c21e79bdbc2cb8'),
  'channels': 3203,
  'name': 'Fortnite',
  'popularity': 73389,
  'time': '2019-05-16 08:39:55.556686',
  'viewers': 71728},
 {'_id': ObjectId('5cdd2e7137c21e79bdbc2cb8'),
  'channels': 480,
  'name': 'Grand Theft Auto V',
  'popularity': 66849,
  'time': '2019-05-16 08:39:55.556686',
  'viewers': 50497},
 {'_id': ObjectId('5cdd2e7237c21e79bdbc2cb9'),
  'channels': 438,
  'name': 'World of Warcraft',
  'popularity': 92890,
  'time': '2019-05-16 08:40:25.543130',
  'viewers': 94270},
 {'_id': ObjectId('5cdd2e7237c21e79bdbc2cb9'),
  'channels': 1714,
  'name': 'League of Legends',
  '

Finding games with more than 1000 viewers between 8:39 and 9 og May 16, group by name, count the occurrences and sourt by count, descending and game name, ascending.

In [10]:
pipeline = [{"$unwind": "$data"},
            { "$match" : {'$and' :
                          [{'data.viewers' : {'$gt' : 1000}},
                          {'timestamp' : {'$gt': '2019-05-16 08:39'}},
                          {'timestamp' : {'$lt': '2019-05-16 9:00'}}] } },
            { "$group": {"_id": {"name": "$data.game.name"},
                         "count": { "$sum": 1 }}},
           {"$sort": SON([("count", -1), ("_id", 1)])}]

In [11]:
list(db.games.aggregate(pipeline))

[{'_id': {'name': 'A Plague Tale: Innocence'}, 'count': 72},
 {'_id': {'name': 'ASMR'}, 'count': 72},
 {'_id': {'name': 'Apex Legends'}, 'count': 72},
 {'_id': {'name': 'Art'}, 'count': 72},
 {'_id': {'name': 'Auto Chess'}, 'count': 72},
 {'_id': {'name': 'Black Desert Online'}, 'count': 72},
 {'_id': {'name': 'Call of Duty: Black Ops 4'}, 'count': 72},
 {'_id': {'name': 'Chess'}, 'count': 72},
 {'_id': {'name': 'Counter-Strike: Global Offensive'}, 'count': 72},
 {'_id': {'name': 'DayZ'}, 'count': 72},
 {'_id': {'name': 'Dead by Daylight'}, 'count': 72},
 {'_id': {'name': 'Destiny 2'}, 'count': 72},
 {'_id': {'name': 'Dota 2'}, 'count': 72},
 {'_id': {'name': 'Escape From Tarkov'}, 'count': 72},
 {'_id': {'name': 'FIFA 19'}, 'count': 72},
 {'_id': {'name': 'FINAL FANTASY XIV Online'}, 'count': 72},
 {'_id': {'name': 'Fortnite'}, 'count': 72},
 {'_id': {'name': 'Grand Theft Auto V'}, 'count': 72},
 {'_id': {'name': 'Hearthstone'}, 'count': 72},
 {'_id': {'name': 'Heroes of the Storm'}, 

Find games streamed between May 16 and May 17, group by name, compute average viewers, sort by average viewers, descending.

In [14]:
pipeline = [{"$unwind": "$data"},
            { "$match" : {'$and' : [
                {'timestamp' : {'$gt' : '2019-05-16'}},
                {'timestamp' : {'$lt' : '2019-05-17'}}
            ]}},
            { "$group": {"_id": {"name": "$data.game.name"},
                         "avgViewers": { "$avg": '$data.viewers'}}},
           {"$sort": SON([("avgViewers", -1), ("_id", -1)])}]

In [15]:
list(db.games.aggregate(pipeline))

[{'_id': {'name': 'League of Legends'}, 'avgViewers': 99282.22222222222},
 {'_id': {'name': 'World of Warcraft'}, 'avgViewers': 96331.04166666667},
 {'_id': {'name': 'Fortnite'}, 'avgViewers': 93188.27777777778},
 {'_id': {'name': 'Dota 2'}, 'avgViewers': 69667.31944444444},
 {'_id': {'name': 'Grand Theft Auto V'}, 'avgViewers': 68391.97222222222},
 {'_id': {'name': 'Just Chatting'}, 'avgViewers': 54825.15277777778},
 {'_id': {'name': 'Hearthstone'}, 'avgViewers': 30174.722222222223},
 {'_id': {'name': 'Overwatch'}, 'avgViewers': 28826.01388888889},
 {'_id': {'name': 'Counter-Strike: Global Offensive'},
  'avgViewers': 23758.180555555555},
 {'_id': {'name': "PLAYERUNKNOWN'S BATTLEGROUNDS"},
  'avgViewers': 21956.930555555555},
 {'_id': {'name': 'Apex Legends'}, 'avgViewers': 13560.5},
 {'_id': {'name': "Tom Clancy's The Division 2"},
  'avgViewers': 11504.138888888889},
 {'_id': {'name': 'Auto Chess'}, 'avgViewers': 11153.944444444445},
 {'_id': {'name': 'Paca Plus'}, 'avgViewers': 108

Similar to before

In [16]:
pipeline = [{"$unwind": "$data"},
            { "$match" :  {'timestamp' : {'$gt' : '2019-05-16 09'}}},
            { "$group": {"_id": {"name": "$data.game.name"},
                         "count": { "$sum": 1 }}},
           {"$sort": SON([("count", -1), ("_id", 1)])}]

In [17]:
list(db.games.aggregate(pipeline))

[{'_id': {'name': 'Secret of Mana'}, 'count': 41},
 {'_id': {'name': 'Board Games'}, 'count': 40},
 {'_id': {'name': 'Honkai Impact 3'}, 'count': 40},
 {'_id': {'name': 'Microsoft Solitaire Collection'}, 'count': 39},
 {'_id': {'name': 'The Walking Dead'}, 'count': 39},
 {'_id': {'name': 'X-Plane 11'}, 'count': 39},
 {'_id': {'name': 'Identity V'}, 'count': 38},
 {'_id': {'name': 'Outlast'}, 'count': 38},
 {'_id': {'name': 'Beauty & Body Art'}, 'count': 37},
 {'_id': {'name': "Donkey Kong Country 2: Diddy's Kong Quest"}, 'count': 37},
 {'_id': {'name': 'Fish!'}, 'count': 37},
 {'_id': {'name': 'Forza Motorsport 6: Apex'}, 'count': 37},
 {'_id': {'name': 'Gothic'}, 'count': 37},
 {'_id': {'name': 'Half-Life'}, 'count': 37},
 {'_id': {'name': 'Hand of Fate 2'}, 'count': 37},
 {'_id': {'name': 'Hue'}, 'count': 37},
 {'_id': {'name': 'Jump Force'}, 'count': 37},
 {'_id': {'name': 'NBA'}, 'count': 37},
 {'_id': {'name': 'TV Calibration'}, 'count': 37},
 {'_id': {'name': 'Tales from the Bord

Similar to before

In [240]:
pipeline = [{"$unwind": "$data"},
            { "$match" :  {'timestamp' : {'$gt' : '2019-05-16 15:50'}}},
            { "$group": {"_id": {"name": "$data.game.name"},
                         "count": { "$sum": 1 }}},
           {"$sort": SON([("count", 1), ("_id", 1)])}]

In [241]:
list(db.games.aggregate(pipeline))

[{'_id': {'name': '300 Heroes'}, 'count': 2},
 {'_id': {'name': '3D Creation Station'}, 'count': 2},
 {'_id': {'name': '60 Seconds!'}, 'count': 2},
 {'_id': {'name': '7 Days to Die'}, 'count': 2},
 {'_id': {'name': 'A Dance of Fire and Ice'}, 'count': 2},
 {'_id': {'name': 'A Hat in Time'}, 'count': 2},
 {'_id': {'name': 'A Plague Tale: Innocence'}, 'count': 2},
 {'_id': {'name': 'A Story About My Uncle'}, 'count': 2},
 {'_id': {'name': 'A Way Out'}, 'count': 2},
 {'_id': {'name': 'A.V.A: Dog Tag'}, 'count': 2},
 {'_id': {'name': 'AO Tennis'}, 'count': 2},
 {'_id': {'name': 'APB Reloaded'}, 'count': 2},
 {'_id': {'name': 'ARK'}, 'count': 2},
 {'_id': {'name': 'ASMR'}, 'count': 2},
 {'_id': {'name': 'ATLAS'}, 'count': 2},
 {'_id': {'name': 'Aaero'}, 'count': 2},
 {'_id': {'name': 'Abyss Horizon'}, 'count': 2},
 {'_id': {'name': 'Abyss: The Wraiths of Eden'}, 'count': 2},
 {'_id': {'name': 'Ace Combat 7'}, 'count': 2},
 {'_id': {'name': 'Adventure Quest 3D'}, 'count': 2},
 {'_id': {'name

Similar to before

In [18]:
pipeline = [{"$unwind": "$data"},
            { "$match" :  {'timestamp' : {'$gt' : '2019-05-16 17:00'}}},
            { "$group": {"_id": {"name": "$data.game.name"},
                         "count": { "$sum": 1 }}},
           {"$sort": SON([("count", 1), ("_id", 1)])}]

In [19]:
list(db.games.aggregate(pipeline))

[{'_id': {'name': 'Stream Avatars'}, 'count': 1},
 {'_id': {'name': '11-11: Memories Retold'}, 'count': 2},
 {'_id': {'name': '60 Seconds!'}, 'count': 2},
 {'_id': {'name': 'AFK Arena'}, 'count': 2},
 {'_id': {'name': 'Abyss: The Wraiths of Eden'}, 'count': 2},
 {'_id': {'name': 'Afro Samurai'}, 'count': 2},
 {'_id': {'name': 'Age of Empires Online'}, 'count': 2},
 {'_id': {'name': 'Age of Empires: Definitive Edition'}, 'count': 2},
 {'_id': {'name': 'Agony'}, 'count': 2},
 {'_id': {'name': "Alan Wake's American Nightmare"}, 'count': 2},
 {'_id': {'name': 'Alchemist'}, 'count': 2},
 {'_id': {'name': 'Alicemare'}, 'count': 2},
 {'_id': {'name': 'Alliance of Valiant Arms'}, 'count': 2},
 {'_id': {'name': 'Anime Music Quiz'}, 'count': 2},
 {'_id': {'name': 'Anno 1404'}, 'count': 2},
 {'_id': {'name': 'Anno 2205'}, 'count': 2},
 {'_id': {'name': 'Another Eden'}, 'count': 2},
 {'_id': {'name': 'Araya'}, 'count': 2},
 {'_id': {'name': 'Arc Rise Fantasia'}, 'count': 2},
 {'_id': {'name': 'Arc

In [191]:
pipeline = [{"$unwind": "$data"},
            { "$match" :  {'timestamp' : {'$gt' : '2019-05-16 20:00'}}},
            { "$group": {"_id": {"name": "$data.game.name"},
                         "count": { "$sum": 1 }}},
           {"$sort": SON([("count", -1), ("_id", 1)])}]

In [192]:
list(db.games.aggregate(pipeline))

[{'_id': {'name': '!WOW!'}, 'count': 5},
 {'_id': {'name': '#killallzombies'}, 'count': 5},
 {'_id': {'name': '.hack//G.U. Last Recode'}, 'count': 5},
 {'_id': {'name': '.hack//INFECTION - Part 1'}, 'count': 5},
 {'_id': {'name': '007: The World is Not Enough'}, 'count': 5},
 {'_id': {'name': '100% Orange Juice'}, 'count': 5},
 {'_id': {'name': '39 Days to Mars'}, 'count': 5},
 {'_id': {'name': '3D Creation Station'}, 'count': 5},
 {'_id': {'name': '428: In the Blocked City, Shibuya'}, 'count': 5},
 {'_id': {'name': '7 Days to Die'}, 'count': 5},
 {'_id': {'name': '8 Ball Pool'}, 'count': 5},
 {'_id': {'name': "A Bastard's Tale"}, 'count': 5},
 {'_id': {'name': 'A Chair in a Room: Greenwater'}, 'count': 5},
 {'_id': {'name': 'A Dance of Fire and Ice'}, 'count': 5},
 {'_id': {'name': 'A Hat in Time'}, 'count': 5},
 {'_id': {'name': 'A Plague Tale: Innocence'}, 'count': 5},
 {'_id': {'name': 'A Way Out'}, 'count': 5},
 {'_id': {'name': 'A.V.A: Dog Tag'}, 'count': 5},
 {'_id': {'name': 'A