# MLT Crypto Working Session

This notebook does not follow a specific order. It simply shows some of the work we did in the afternoon during the meetup.

Big thanks to the organizers!

# Our team's idea

The team's idea was inspired by the following slide we saw during the introduction. Credit: Ashley Smart from Bloomberg.

![Trend/event price prediction model](img/4.png)

We wanted to build a model that captures the price trend and then uses Twitter sentiment to predict future values more accurately, on the assumption that Twitter sentiment correlates with cryptocurrency trading.
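A first check of that assumption would be to measure how strongly daily sentiment and daily price changes move together. The sketch below computes a Pearson correlation in plain Python. The function name `pearson` is ours, and both input series are made-up placeholder values, not data collected at the meetup.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT real measurements)
daily_sentiment = [0.10, 0.20, 0.15, 0.05, 0.00, -0.05, 0.10]
daily_price_change_pct = [0.5, 1.2, 0.8, -0.1, -0.6, -1.0, 0.4]

r = pearson(daily_sentiment, daily_price_change_pct)
print("correlation: {:.2f}".format(r))
```

A value near 0 on real data would suggest that sentiment adds little on top of the trend. A week of daily points is in any case too short to draw conclusions.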

We only had time to collect the tweets, compute their sentiment for the past week, and get the per-minute Bitcoin price for the same week.

Work to do: create the models!

# Download CSV file with tweets into Colab from Google Drive

This section shows how to download a file from Google Drive into Colab once the file is public and you have its share ID.

This file was created by another member of the group and contains 20 tweets for each of the past 7 days. Note that the Twitter search API only serves tweets from the past 7 days.
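The share ID is the value of the `id` query parameter in a share link of the form `https://drive.google.com/open?id=...`. As a small sketch, it can be pulled out with the standard library. The helper name `extract_drive_id` is ours, and links in other forms are not handled.

```python
from urllib.parse import urlparse, parse_qs


def extract_drive_id(share_url):
    """Return the value of the `id` query parameter from a Drive share URL."""
    query = parse_qs(urlparse(share_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' parameter found in: " + share_url)
    return ids[0]


# Placeholder URL in the same form as the share link used in this notebook
share_url = "https://drive.google.com/open?id=XXXXXXXXXXXXXXXXXXXXXX"
print(extract_drive_id(share_url))  # XXXXXXXXXXXXXXXXXXXXXX
```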

In [0]:
import requests

def download_file_from_google_drive(id, destination):
    def get_confirm_token(response):
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value

        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768

        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)

In [0]:
# share URL -> https://drive.google.com/open?id=XXXXXXXXXXXXXXXXXXXXXX
# TAKE ID FROM SHAREABLE LINK
file_tweets = 'XXXXXXXXXXXXXXXXXXXXXX'
# DESTINATION FILE ON YOUR DISK
destination = './twitter.csv'

# Download the public file into the Colab working directory
download_file_from_google_drive(file_tweets, destination)

In [27]:
!ls -la

total 48
drwxr-xr-x 1 root root  4096 Sep  2 02:27 .
drwxr-xr-x 1 root root  4096 Sep  2 01:25 ..
drwxr-xr-x 4 root root  4096 Aug 30 21:29 .config
-rw-r--r-- 1 root root  2325 Sep  2 02:27 gcp.json
drwxr-xr-x 2 root root  4096 Aug 30 21:39 sample_data
-rw-r--r-- 1 root root   254 Sep  2 02:20 twitter_credentials.json
-rw-r--r-- 1 root root 22286 Sep  2 04:42 twitter.csv


In [28]:
!head twitter.csv

Fri Aug 31 23:59:57 +0000 2018,Haha @Eminem dropped that new album and name dropped $btc in it. Legit #bitcoin #notalike
Fri Aug 31 23:59:51 +0000 2018,"RT @coingecko: Have you tried comparing coins side by side?
Which coin would you compare?
https://t.co/YaqydA8HMh
.
#cryptocurrency #coins…"
Fri Aug 31 23:59:47 +0000 2018,RT @cryptocomicon: Chris DeRose spends an 86 minutes explaining why he likes #Monero in a world dominated by bitcoin maximalist ideologues.…
Fri Aug 31 23:59:45 +0000 2018,RT @BitcoinDood: DNA: The Safest Way to Store your Bitcoin Fortune https://t.co/VxyAjj9Dy2
Fri Aug 31 23:59:42 +0000 2018,"RT @CharlieShrem: As bitcoin grows, more and more people will “claim” to be Satoshi and amass cult like following. Even if Satoshi “returne…"
Fri Aug 31 23:59:38 +0000 2018,"Recover your loss by copy this guy. Deposit Bitcoin


In [32]:
# the file is a two-column csv: date, tweet

import pandas as pd

tweets = pd.read_csv ("./twitter.csv", header=None)

tweets.head()

| | 0 | 1 |
|---|---|---|
| 0 | Fri Aug 31 23:59:57 +0000 2018 | Haha @Eminem dropped that new album and name d... |
| 1 | Fri Aug 31 23:59:51 +0000 2018 | RT @coingecko: Have you tried comparing coins ... |
| 2 | Fri Aug 31 23:59:47 +0000 2018 | RT @cryptocomicon: Chris DeRose spends an 86 m... |
| 3 | Fri Aug 31 23:59:45 +0000 2018 | RT @BitcoinDood: DNA: The Safest Way to Store ... |
| 4 | Fri Aug 31 23:59:42 +0000 2018 | RT @CharlieShrem: As bitcoin grows, more and m... |


In [31]:
print (tweets.shape)

(140, 2)
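The first column holds timestamps as plain strings, so grouping tweets by day requires parsing them first. A minimal sketch using only the standard library follows. The format string is our reading of the sample rows above, and the helper name `parse_tweet_date` is ours.

```python
from datetime import datetime

# Format of the timestamps seen in twitter.csv, e.g. "Fri Aug 31 23:59:57 +0000 2018"
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweet_date(raw):
    """Convert a Twitter-style timestamp string into a timezone-aware datetime."""
    return datetime.strptime(raw, TWITTER_DATE_FORMAT)


parsed = parse_tweet_date("Fri Aug 31 23:59:57 +0000 2018")
print(parsed.date())       # 2018-08-31
print(parsed.isoformat())  # 2018-08-31T23:59:57+00:00
```

On the DataFrame loaded above, something like `tweets[0].apply(parse_tweet_date)` should convert the whole column.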


# Get tweets using Twython

In this section we show how to get tweets using the Twython library.

In [33]:
!pip install Twython



In [0]:
import json
from twython import Twython

# Enter your keys/secrets as strings in the following fields
credentials = {}  
credentials['CONSUMER_KEY'] = 'XXXXXXXXXXXXXXXX'
credentials['CONSUMER_SECRET'] = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'  
credentials['ACCESS_TOKEN'] = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx'
credentials['ACCESS_SECRET'] = 'XXXXXXXXXXXXXXXXXXXXX'

# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:  
    json.dump(credentials, file)

# Load credentials from json file
with open("twitter_credentials.json", "r") as file:  
    creds = json.load(file)

python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])

In [35]:
query = {
    'q': 'bitcoin',
    'result_type': 'popular',
    'count': 10,
    'lang': 'en',
    'until': '2018-08-28',
}

dict_ = {'user': [], 'date': [],'text': [] }  

for status in python_tweets.search(**query)['statuses']:  
    dict_['user'].append(status['user']['screen_name'])
    dict_['date'].append(status['created_at'])
    dict_['text'].append(status['text'])

# Structure data in a pandas DataFrame for easier manipulation
df = pd.DataFrame(dict_)  
df.head(5)  

| | date | text | user |
|---|---|---|---|
| 0 | Thu Aug 23 21:04:19 +0000 2018 | BREAKING: The U.S. Securities and Exchange Com... | coindesk |
| 1 | Mon Aug 27 16:28:15 +0000 2018 | Warren Buffett is reportedly exploring the ide... | APompliano |
| 2 | Thu Aug 23 10:55:42 +0000 2018 | We didn't need Silk Road\nWe didn't need Mt Go... | alistairmilne |
| 3 | Sat Aug 25 20:05:00 +0000 2018 | It's "HODL" and it's the most important acrony... | CNBC |
| 4 | Thu Aug 23 21:16:27 +0000 2018 | FULL STORY: SEC Says It Will 'Review' Bitcoin ... | coindesk |
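The `until` parameter asks for tweets created before the given date, so a week of tweets can be collected with one query per day. The sketch below only builds the parameter dictionaries. It is one way a file like `twitter.csv` could be assembled, not necessarily how our teammate built it, and the helper name `build_daily_queries` and its defaults are ours.

```python
from datetime import date, timedelta


def build_daily_queries(term, today, days=7, count=20):
    """Build one search-parameter dict per day, newest first.

    Each query sets `until` one day earlier than the previous one, so with
    result_type 'recent' it should return tweets from the end of the day
    just before that date.
    """
    queries = []
    for offset in range(days):
        until = today - timedelta(days=offset)
        queries.append({
            "q": term,
            "result_type": "recent",
            "count": count,
            "lang": "en",
            "until": until.isoformat(),  # YYYY-MM-DD
        })
    return queries


queries = build_daily_queries("bitcoin", today=date(2018, 9, 1))
for q in queries:
    print(q["until"])
# Each dict could then be passed on as python_tweets.search(**q)
```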


# Get tweets using python-twitter

In this section we show how to get tweets using the python-twitter library, a Python wrapper around the Twitter API.

In [36]:
!pip install python-twitter



In [0]:
import twitter

api = twitter.Api(consumer_key='XXXXXXXXXXXXXXXXXXXXX',
  consumer_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  access_token_key='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  access_token_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')

In [38]:
search = api.GetSearch("bitcoin") # Replace "bitcoin" with your search term
for tweet in search:
    print(tweet.id, tweet.text)

(1035530107862102017, u'\u201dBitcoin would greatly simplify the movement of global money, it has the potential to become the first worldwide cu\u2026 https://t.co/yrl3m2a9iZ')
(1035543074758447104, u"The rapper Eminem has given bitcoin's public awareness a boost with a mention of the cryptocurrency on his just-lau\u2026 https://t.co/ykgTwkihnb")
(1035824009110323200, u'JUST IN: A new version of Bitcoin Core is coming, including a new key language and more. CoinDesk assembles the ear\u2026 https://t.co/O6ulmwGHFX')
(1036113799030304768, u'RT @kubitx: How can I access my money if I buy Crypto?\n\nCheck out our solutions for major headache most Crypto lovers often ask\n\n#Crypto #C\u2026')
(1036113781783494657, u'RT @Pink_About_it: The Democrat party is dying---- \n\nAm I the only one who sees the irony in them holding a political rally at a funeral?\u2026')
(1036113776364466178, u'RT @CryptoSouk: Crypto Souk - Creating a Cryptocurrency Trading Can Be Accessed By Everyone\n\nhttps://t.co

# Sentiment analysis using GCP

Google Cloud Platform (GCP) offers many services. One of them is sentiment analysis, part of the Cloud Natural Language API. In this section we use it to get the sentiment of the tweets we collected.

The API returns a sentiment score for the content we send. In our case, the content is the text of each tweet.
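The score ranges from -1.0 (negative) to 1.0 (positive). As a small sketch of how such a score might be turned into a coarse label, the function below uses a cutoff of 0.25. That threshold and the name `label_sentiment` are arbitrary choices of ours, not part of the API.

```python
def label_sentiment(score, threshold=0.25):
    """Map a sentiment score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary choice for illustration.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1.0, 1.0]")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


for s in (0.9, 0.1, -0.8):
    print(s, label_sentiment(s))
```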

In [0]:
#https://drive.google.com/open?id=XXXXXXXXXXXXXXXXXXXXXXXXXXX
# TAKE ID FROM SHAREABLE LINK
gcp_credentials = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
# DESTINATION FILE ON YOUR DISK
destination = './gcp_credentials.json'

# Download the public credentials file into the Colab working directory
download_file_from_google_drive(gcp_credentials, destination)

In [40]:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/content/gcp_credentials.json"
!echo $GOOGLE_APPLICATION_CREDENTIALS

/content/gcp_credentials.json


In [0]:
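from google.cloud import language_v1
# Note: the `enums` submodule is only available in google-cloud-language releases before 2.0
from google.cloud.language_v1 import enums
import six


def sample_analyze_sentiment(content):
    '''
    Return the sentiment score of `content`, e.g. 'Hello, world!'.
    '''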
    client = language_v1.LanguageServiceClient()

    if isinstance(content, six.binary_type):
        content = content.decode('utf-8')

    type_ = enums.Document.Type.PLAIN_TEXT
    document = {'type': type_, 'content': content}

    response = client.analyze_sentiment(document)
    sentiment = response.document_sentiment
    return sentiment.score

In [48]:
content = 'Bitcoin value is going to raise! Great chance to become rich!!'
score = sample_analyze_sentiment (content)
print ('Example 1: \"{0}\"'.format(content))
print('Score: {}'.format(score))

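# Note: 'Don''t' is two adjacent string literals, which Python joins into "Dont" (see the output below)
content = 'Bitcoin value is gonna drop! Don''t buy!!'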
score = sample_analyze_sentiment (content)
print ('Example 2: \"{0}\"'.format(content))
print('Score: {}'.format(score))

Example 1: "Bitcoin value is going to raise! Great chance to become rich!!"
Score: 0.40000000596
Example 2: "Bitcoin value is gonna drop! Dont buy!!"
Score: -0.20000000298


In [49]:
scores = []

for tweet in tweets[1]:
    scores.append(sample_analyze_sentiment (tweet))
    
scores

[0.6000000238418579,
 0.0,
 -0.4000000059604645,
 0.8999999761581421,
 -0.20000000298023224,
 -0.20000000298023224,
 0.4000000059604645,
 0.5,
 0.0,
 0.5,
 0.0,
 0.699999988079071,
 -0.800000011920929,
 0.30000001192092896,
 -0.10000000149011612,
 0.0,
 0.10000000149011612,
 0.0,
 0.5,
 0.0,
 0.0,
 0.20000000298023224,
 0.10000000149011612,
 0.4000000059604645,
 0.0,
 0.20000000298023224,
 0.800000011920929,
 0.20000000298023224,
 0.10000000149011612,
 0.0,
 0.0,
 0.20000000298023224,
 0.10000000149011612,
 0.10000000149011612,
 0.20000000298023224,
 0.0,
 0.10000000149011612,
 -0.20000000298023224,
 0.0,
 0.20000000298023224,
 0.20000000298023224,
 0.5,
 0.10000000149011612,
 0.10000000149011612,
 0.0,
 0.4000000059604645,
 0.10000000149011612,
 0.5,
 -0.20000000298023224,
 0.30000001192092896,
 0.10000000149011612,
 0.8999999761581421,
 0.20000000298023224,
 -0.10000000149011612,
 0.20000000298023224,
 0.30000001192092896,
 0.0,
 0.5,
 -0.5,
 0.10000000149011612,
 0.10000000149011612

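Once we have the sentiment for every tweet, let's calculate the average sentiment for each day. The file holds 20 tweets for each of 7 days, so we reshape the scores into a 7 x 20 array and take the mean of each row.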

In [50]:
import numpy as np

npscore = np.array(scores)

npscore.reshape(7,20).mean(1)

array([1.39999999e-01, 1.35000002e-01, 1.85000001e-01, 1.10000002e-01,
       3.50000009e-02, 1.49011612e-09, 3.50000020e-02])
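The reshape above relies on there being exactly 20 tweets per day, stored in day order. A sketch that groups by calendar date instead, and so tolerates uneven counts, could look like this. The function name `daily_mean_sentiment` is ours, and the scores in the example are placeholders rather than API results.

```python
from collections import defaultdict
from datetime import datetime

# Timestamp format used in twitter.csv, e.g. "Fri Aug 31 23:59:57 +0000 2018"
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_mean_sentiment(timestamps, scores):
    """Average the scores of all tweets that share a calendar date."""
    buckets = defaultdict(list)
    for raw, score in zip(timestamps, scores):
        day = datetime.strptime(raw, TWITTER_DATE_FORMAT).date()
        buckets[day].append(score)
    # Return the per-day means ordered by date
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Placeholder scores for illustration; timestamps follow the twitter.csv format
timestamps = [
    "Fri Aug 31 23:59:57 +0000 2018",
    "Fri Aug 31 23:59:51 +0000 2018",
    "Thu Aug 30 23:59:40 +0000 2018",
]
example_scores = [0.6, 0.0, -0.4]

for day, mean in daily_mean_sentiment(timestamps, example_scores).items():
    print(day, round(mean, 3))
```

In the notebook, the call would be along the lines of `daily_mean_sentiment(tweets[0], scores)`.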