<div style="text-align:left;"><img src="images/elden_ring.jpeg" style="display:inline-block;"/></div>

# Elden Ring

Elden Ring is a video game released in February 2022 by FromSoftware Inc. The company is known for creating very difficult role-playing games such as Demon's Souls, Dark Souls 1-3, Bloodborne and Sekiro. Elden Ring follows this tradition of throwing bosses in your way that aim to humble you.

Using the Twitter API, I will extract tweets that mention Elden Ring in their text, store the data in MongoDB, clean it and analyze it. Well-known streamers will without a doubt have mentioned Elden Ring in their tweets before, during and after the release of the game. Using the `users` data, which connects the tweets to their authors, I will analyze some of what these streamers said.

## Summary, Overview


<img src="images/mongodb.svg" style="height:400px;">



##  Requirements & Configuration

In [2]:
#!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-0.20.0-py3-none-any.whl (17 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-0.20.0


In [6]:
import pymongo
from pprint import pprint
import pandas as pd
import requests
import json
import os
import time

import numpy as np
from dotenv import dotenv_values

The Twitter API is special in several ways. It is not possible to simply send a request to a URL, at least not to get the data I want. With standard access, Twitter only allows searching tweets from the last seven days. However, I have an academic account, which gives me full-archive access. Because Twitter returns at most 500 tweets per request, we have to call the API repeatedly until no more tweets matching our search criteria are returned. To support this, Twitter includes a `next_token` in each response, which lets us continue where we left off. We will look at tweets from 11 June 2019, around when Elden Ring was first announced, up to 10 May 2022.
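The token-based paging described above can be sketched without touching the network. This is only an illustration of the control flow: `fetch_all`, `fetch_page` and `fake_fetch_page` are hypothetical names, and the fake two-page response stands in for the real API.

```python
def fetch_all(fetch_page, params):
    """Yield every tweet by following next_token until it is absent.

    fetch_page is a hypothetical callable that takes a params dict and
    returns the decoded JSON response as a dict.
    """
    params = dict(params)  # work on a copy so the caller's dict is untouched
    while True:
        page = fetch_page(params)
        # a page without matches has no 'data' key
        yield from page.get("data", [])
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break  # no token means this was the last page
        params["next_token"] = next_token


def fake_fetch_page(params):
    """Fake two-page API used only to demonstrate the loop."""
    if "next_token" not in params:
        return {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "t1"}}
    return {"data": [{"id": "3"}], "meta": {}}


tweets = list(fetch_all(fake_fetch_page, {"query": "elden ring"}))
print([t["id"] for t in tweets])
```

Passing the page fetcher in as an argument keeps the paging logic testable without credentials.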

To access the Twitter API as well as our MongoDB database, we need several credentials. It is good practice not to include passwords and usernames in the code. The dotenv library provides an easy way to load these credentials from outside this notebook. The following code block reads the credentials we have saved in a .env file and stores them in variables.

In [7]:
config = dotenv_values(".env")
USER = config['USER']
PASSWORD = config['PASSWORD']
BEARER_TOKEN = config['BEARER_TOKEN']
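A missing entry in the .env file otherwise only shows up as a bare `KeyError` on one of the lookups above. As a minimal sketch, a small check can report all missing entries at once. `require_keys` is a hypothetical helper, and the dictionary below is a placeholder standing in for the result of `dotenv_values(".env")`.

```python
def require_keys(config, keys):
    """Return the values for `keys`, raising if any is missing or empty."""
    missing = [k for k in keys if not config.get(k)]
    if missing:
        raise KeyError(f"Missing entries in .env: {', '.join(missing)}")
    return tuple(config[k] for k in keys)


# Stand-in for dotenv_values(".env"); the values are placeholders.
example_config = {"USER": "alice", "PASSWORD": "s3cret", "BEARER_TOKEN": "abc"}
user_name, password, bearer_token = require_keys(
    example_config, ["USER", "PASSWORD", "BEARER_TOKEN"]
)
```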

In [8]:
# API and Database details
API_URL = "https://api.twitter.com/2/tweets/search/all"
CNX_STR = "mongodb+srv://" + USER + ":" + PASSWORD + "@cluster0.tbqzv.mongodb.net"
DB_NAME = "elden_ring"
COLL_TWEETS = "tweets"
COLL_USERS = "users"
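Plain string concatenation works as long as the username and password contain no reserved characters such as `@`, `:` or `/`. If they might, the credentials need to be percent-escaped before they go into the connection string. A minimal sketch, where `build_mongo_uri`, the example credentials and the host name are assumptions for illustration:

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host):
    """Build a mongodb+srv URI with percent-escaped credentials."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}"


# Placeholder credentials and host, not the ones used in this project.
uri = build_mongo_uri("alice", "p@ss:word/1", "cluster0.example.mongodb.net")
print(uri)
```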

In [9]:
# connection to MongoDB
client = pymongo.MongoClient(CNX_STR)
db = client[DB_NAME]
twitter = db[COLL_TWEETS]
users = db[COLL_USERS]

## ETL

### Remove all existing documents -> Reset collection

IMPORTANT: I will not provide my bearer token with this project. Only reset the collections if you have access to the full archive search from Twitter. 

In [31]:
#twitter.drop()
#twitter.count_documents({})

0

In [13]:
#users.drop()
#users.count_documents({})

0

### Define Query Parameters

In [44]:
# define query parameters 
query = "elden ring lang:en -is:retweet -is:reply"  # returns every tweet containing the words elden and ring which have been classified as english, excluding retweets and replies
start_time = "2019-06-11T00:00:00.000Z"  # Elden Ring announcement date (use date close to end_time for testing purposes)
end_time = "2022-05-10T23:59:59.000Z"
max_results = "500"
tweet_fields = "created_at,author_id,geo,in_reply_to_user_id,lang,public_metrics" # https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet
user_fields = 'username,location,public_metrics' 
file_counter = 0
expansions = 'author_id'

# put query parameters in a dictionary
query_params = {
    'query': query,
    'tweet.fields': tweet_fields,
    'user.fields': user_fields,
    'start_time': start_time,
    'end_time': end_time,
    'max_results': max_results,
    'expansions': expansions,
}



headers = {"Authorization": "Bearer " + BEARER_TOKEN}

###  Fetch data

In [45]:
tweet = []
user = []
while True:
    # request one page of results for the current query parameters
    response = requests.request("GET", API_URL, headers=headers, params=query_params)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)

    json_response = response.json()

    # a page without matches has no 'data' key, so only store non-empty pages
    if 'data' in json_response:
        page_tweets = json_response['data']
        page_users = json_response.get('includes', {}).get('users', [])

        # keep the page in memory
        tweet.extend(page_tweets)
        user.extend(page_users)

        # write the page into the MongoDB collections
        twitter.insert_many(page_tweets)
        if page_users:
            users.insert_many(page_users)

    # continue with the next page if Twitter returned a next_token
    next_token = json_response.get('meta', {}).get('next_token')
    if next_token is None:
        # no more pages: clear the token so the cell can be re-run from the start
        query_params.pop('next_token', None)
        break
    query_params['next_token'] = next_token
    time.sleep(5)  # pause between requests

There is a limit of 10 million tweets I can access with an academic Twitter account. It also took several hours to complete the request. I will save the data in a txt file so that I don't have to run the code again in case something goes wrong.

In [33]:
# open file in write mode
with open('data/tweets.txt', 'w', encoding='utf-8') as fp:
    for item in tweet:
        # insert_many() adds a MongoDB '_id' (ObjectId) that JSON cannot represent, so drop it
        record = {k: v for k, v in item.items() if k != '_id'}
        # write each tweet as one JSON document per line
        fp.write(json.dumps(record) + "\n")
        
    print('Done')

Done


In [12]:
# open file in write mode
with open('data/users.txt', 'w', encoding='utf-8') as fp:
    for item in user:
        # drop the MongoDB '_id' for the same reason as above
        record = {k: v for k, v in item.items() if k != '_id'}
        # write each user as one JSON document per line
        fp.write(json.dumps(record) + "\n")
        
    print('Done')

Done


Executing the next cells loads the data back into memory.

In [37]:
tweet = []

# open file and parse each line back into a dictionary
with open('data/tweets.txt', 'r', encoding='utf-8') as fp:
    for line in fp:
        # each line holds one JSON document; json.loads ignores the trailing linebreak
        tweet.append(json.loads(line))
    

In [3]:
user = []

# open file and parse each line back into a dictionary
with open('data/users.txt', 'r', encoding='utf-8') as fp:
    for line in fp:
        # each line holds one JSON document; json.loads ignores the trailing linebreak
        user.append(json.loads(line))

### Insert into MongoDB

In [30]:
twitter.insert_many(tweet);


In [46]:
# count number of documents inserted
twitter.count_documents({})

2457


In [47]:
c = twitter.aggregate([
      {"$limit": 5},
])

pd.DataFrame(c)

| | _id | id | public_metrics | text | lang | created_at | author_id |
|---|---|---|---|---|---|---|---|
| 0 | 627f5ac7e12173451122df69 | 1524177426024128513 | {'retweet_count': 0, 'reply_count': 0, 'like_c... | I won 6 achievements in Elden Ring for 110 poi... | en | 2022-05-10T23:59:50.000Z | 1657944079 |
| 1 | 627f5ac7e12173451122df6a | 1524177422928986113 | {'retweet_count': 0, 'reply_count': 3, 'like_c... | So uh I guess my boyfriend is reddit viral for... | en | 2022-05-10T23:59:50.000Z | 3116155202 |
| 2 | 627f5ac7e12173451122df6b | 1524177395380473856 | {'retweet_count': 0, 'reply_count': 0, 'like_c... | WORLDWIDE GIVEAWAY! \n\nEnter for a chance to ... | en | 2022-05-10T23:59:43.000Z | 823286174173175808 |
| 3 | 627f5ac7e12173451122df6c | 1524177351503974400 | {'retweet_count': 0, 'reply_count': 0, 'like_c... | Check out my friend IckyMarshmallow playing so... | en | 2022-05-10T23:59:33.000Z | 520580934 |
| 4 | 627f5ac7e12173451122df6d | 1524177327646814210 | {'retweet_count': 1, 'reply_count': 0, 'like_c... | ✧Stomach Flu &amp; Stream✧ \| Elden Ring First ... | en | 2022-05-10T23:59:27.000Z | 1405357750339702787 |


### Transform
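One possible transform, sketched here as an assumption rather than as the step actually used in this project, is to use the tweet `id` as the MongoDB `_id`, so that the same tweet cannot be stored twice. The helper `use_tweet_id_as_key` and the sample document are hypothetical. The `$project` stage is shown only as data and is not executed.

```python
def use_tweet_id_as_key(doc):
    """Return a copy of doc whose _id is the tweet id, without the id field."""
    out = {k: v for k, v in doc.items() if k not in ("id", "_id")}
    out["_id"] = doc["id"]
    return out


# The same idea as an aggregation stage for twitter.aggregate([...]).
project_stage = {
    "$project": {
        "_id": "$id",
        "text": 1,
        "lang": 1,
        "created_at": 1,
        "author_id": 1,
        "public_metrics": 1,
    }
}

# Made-up sample document in the shape returned by the API.
sample = {"_id": "placeholder", "id": "100", "text": "example text", "author_id": "7"}
transformed = use_tweet_id_as_key(sample)
print(transformed)
```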


## Data analysis
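As a starting point for connecting tweets to their authors, the following sketch joins the two datasets on `author_id` in plain Python and counts tweets per username. The function names and the sample records are made up for illustration. They only mirror the field names requested from the API and are not results from the collected data.

```python
from collections import Counter


def attach_usernames(tweets, users):
    """Add the author's username to each tweet (None if the user is unknown)."""
    by_id = {u["id"]: u for u in users}
    joined = []
    for t in tweets:
        author = by_id.get(t["author_id"], {})
        joined.append({**t, "username": author.get("username")})
    return joined


def tweets_per_user(joined):
    """Count how many tweets each username posted."""
    return Counter(t["username"] for t in joined)


# Made-up sample records, not real data.
sample_users = [{"id": "1", "username": "streamer_a"}, {"id": "2", "username": "streamer_b"}]
sample_tweets = [
    {"id": "10", "author_id": "1", "text": "elden ring tonight"},
    {"id": "11", "author_id": "1", "text": "elden ring again"},
    {"id": "12", "author_id": "2", "text": "first elden ring boss"},
]

joined = attach_usernames(sample_tweets, sample_users)
counts = tweets_per_user(joined)
print(counts.most_common())
```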


## Conclusions


## Remarks

### Learnings


In [4]:
%%HTML
<style>
/* display:none  -> hide In/Out column */
/* display:block -> show In/Out column */
div.prompt {display:none}
</style>