Pull data from API here

PS (problem statement):
- Given Bitcoin on-chain data, which factors most affect price movement?
- What categories can be identified among Bitcoin addresses/wallets? (use a clustering technique)


Dataset: APIs from Coinmetrics.io, Nansen.AI, Dune Analytics, Graph Protocol.

Analysis piece based on Bitcoin on-chain data, examining the important factors and their relationship with asset price. Special focus on wallets and addresses, classifying them into meaningful clusters (e.g. exchanges, miners, whales, institutional investors, retail investors) and on "old-coin" movement (coins that were purchased a while ago and have not moved since, which marks out another subset: early 'OG' investors in the space).
This could be replicated for other digital assets, such as Ethereum and its universe of tokens, depending on the structure of the data pulled from the APIs cited above.
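The address clustering idea can be sketched with a small k-means routine. This is only an illustration under stated assumptions: the two features (log balance, log transaction count), the synthetic addresses, and the helper names `assign` and `kmeans` are all hypothetical and do not come from any of the APIs listed above.

```python
import numpy as np

def assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means on an (n, d) array; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # start from k distinct rows picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # move each centroid to the mean of its points (keep it if it has none)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids

# Synthetic, made-up features per address: [log10(balance in BTC), log10(tx count)]
rng = np.random.default_rng(42)
retail = rng.normal(loc=[-1.0, 0.5], scale=0.2, size=(100, 2))
whales = rng.normal(loc=[3.0, 2.5], scale=0.2, size=(100, 2))
X = np.vstack([retail, whales])

# standardize so both features contribute on the same scale
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
labels, centroids = kmeans(X_std, k=2)
print(np.bincount(labels))
```

With real on-chain data the number of clusters is not known in advance, so `k` would need to be chosen during EDA rather than fixed as it is here.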



Lots of EDA, followed by predictive modeling with regression models and clustering/classification modeling for the address categories.

A linear regression, a decision tree ensemble, and a neural net should cover your bases in terms of prediction models (LightGBM is probably the best decision tree ensemble). Your PS is all about factors, so you'll want to pay attention to feature importances.
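One simple way to read feature importances from the linear regression is to standardize the features and rank them by the absolute size of their coefficients. The sketch below uses synthetic data; the feature names are hypothetical placeholders, not metrics pulled from the APIs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
feature_names = ["active_addresses", "exchange_inflow", "noise"]  # hypothetical names

# Synthetic data: the target depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Standardize so that coefficient magnitudes are comparable across features.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X_std])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Rank features by absolute standardized coefficient, largest first.
importance = np.abs(coef[1:])
ranking = [feature_names[i] for i in np.argsort(importance)[::-1]]
print(ranking)
```

Coefficient magnitude is only meaningful here because the features are standardized and uncorrelated; with correlated on-chain metrics, the importances from the tree ensemble are a useful cross-check.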



## Imports
---

In [1]:
# Standards
import pandas as pd
import numpy as np

# API
import requests

# Automating
import time
import datetime
import warnings
import sys

from time import sleep

In [2]:
# pip install coinmetrics-api-client
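As an alternative to the client package, the Coin Metrics HTTP API can be queried with `requests` directly. The sketch below only builds the request URL. The base URL, the endpoint path, and the metric names `PriceUSD` and `AdrActCnt` are assumptions about the community API that should be checked against the Coin Metrics documentation, and `build_asset_metrics_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL of the Coin Metrics community (free tier) HTTP API.
CM_BASE_URL = "https://community-api.coinmetrics.io/v4"

def build_asset_metrics_url(asset, metrics, start, end, frequency="1d"):
    """Build a URL for the (assumed) timeseries/asset-metrics endpoint."""
    params = {
        "assets": asset,
        "metrics": ",".join(metrics),  # several metrics as one comma-separated value
        "frequency": frequency,
        "start_time": start,
        "end_time": end,
    }
    return f"{CM_BASE_URL}/timeseries/asset-metrics?{urlencode(params)}"

url = build_asset_metrics_url("btc", ["PriceUSD", "AdrActCnt"], "2020-01-01", "2020-01-31")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()`, with the rows expected under a `data` key if the response follows the assumed layout.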

## API Scraping
---

### Functions to Get Data

Created two custom functions: `get_price_data` pulls daily price series from the Alpha Vantage API, and `get_posts` pulls Reddit posts via Pushshift's API.

In [35]:
def get_price_data(ticker):
    """ str -> dataframe
    Return a dataframe of daily time series trading data for a publicly traded company.
    ticker is a string representing a publicly traded company.
    """
    # Base URL of the Alpha Vantage query endpoint.
    base_url = "https://www.alphavantage.co/query"
    # Send a GET request with the function name, ticker symbol, and API key.
    res = requests.get(
        base_url,
        params={
            "function": "TIME_SERIES_DAILY",
            "symbol": ticker,
            "apikey": "PYMPLZKV3ZXBT6RT"}
    )
    # Parse the JSON response body into a dictionary.
    data = res.json()
    # Keep only the daily time series, a dictionary keyed by date.
    data = data['Time Series (Daily)']
    # Convert the dictionary to a dataframe and transpose it so each row is one date.
    df = pd.DataFrame(data).T
    # Return the dataframe.
    return df
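The dataframe returned by `get_price_data` holds strings, with column names such as `1. open` and dates as plain text. A possible cleanup step is sketched below on a hand-written sample in the shape of a `TIME_SERIES_DAILY` response. The numbers are made up, and `tidy_daily` is a hypothetical helper, not part of the Alpha Vantage API.

```python
import pandas as pd

# Hand-written sample shaped like a TIME_SERIES_DAILY response (values are made up).
sample = {
    "Time Series (Daily)": {
        "2020-10-02": {"1. open": "112.89", "2. high": "115.37", "3. low": "112.22",
                       "4. close": "113.02", "5. volume": "144712000"},
        "2020-10-01": {"1. open": "117.64", "2. high": "117.72", "3. low": "115.83",
                       "4. close": "116.79", "5. volume": "116120400"},
    }
}

def tidy_daily(payload):
    """Turn the raw daily payload into a float dataframe indexed by date."""
    df = pd.DataFrame(payload["Time Series (Daily)"]).T
    # "1. open" -> "open", "2. high" -> "high", ...
    df.columns = [col.split(". ", 1)[1] for col in df.columns]
    # parse the date strings and put the rows in chronological order
    df.index = pd.to_datetime(df.index)
    return df.astype(float).sort_index()

prices = tidy_daily(sample)
print(prices)
```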

In [2]:
def get_posts(subreddit, n_iter, epoch_right_now):
    """Pull Reddit submissions from Pushshift, paging backward in time.

    subreddit is the subreddit name, n_iter is the number of requests to make,
    and epoch_right_now is the Unix timestamp to start paging backward from.
    """
    # store base url variable (the subreddit is passed through params)
    base_url = 'https://api.pushshift.io/reddit/search/submission/'

    # instantiate empty list of dataframes, one per request
    df_list = []

    # save current epoch, used to iterate in reverse through time
    current_time = epoch_right_now

    # make n_iter requests
    for _ in range(n_iter):

        # send get request
        res = requests.get(
            base_url,
            params={
                # specify subreddit
                'subreddit': subreddit,
                # specify number of posts to pull
                'size': 500,
                # restrict based on default language of subreddit
                'lang': True,
                # pull everything from current time backward
                'before': current_time}
        )

        # take data from most recent request, store as df
        df = pd.DataFrame(res.json()['data'])

        # stop early once the API returns no more posts
        if df.empty:
            break

        # pull specific columns from dataframe for analysis
        df = df.loc[:, ['title',
                        'author',
                        'selftext',
                        'subreddit',
                        'media_only',
                        'score',
                        'created_utc',
                        'id']]

        # append to dataframe list
        df_list.append(df)

        # wait 1 or 2 seconds between requests to not overload the API
        sleep_time = np.random.randint(1, 3)
        time.sleep(sleep_time)

        # move the cursor back to the oldest post in this request
        current_time = df['created_utc'].min()

    # return one dataframe for all requests
    return pd.concat(df_list, axis=0)
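The backward pagination in `get_posts` can be checked without touching the network by swapping the HTTP call for a fake fetch function. In this sketch `paginate_backward`, `fake_fetch`, and the sample posts are hypothetical, and it assumes that `before` is exclusive (only posts strictly older than the cursor are returned).

```python
def paginate_backward(fetch, start_epoch, n_iter):
    """Call fetch(before) up to n_iter times, moving `before` to the oldest timestamp seen."""
    rows = []
    before = start_epoch
    for _ in range(n_iter):
        batch = fetch(before)
        if not batch:
            # nothing older is left, so stop early
            break
        rows.extend(batch)
        before = min(post["created_utc"] for post in batch)
    return rows

# Fake data source: eight posts, one every 10 seconds.
ALL_POSTS = [{"id": i, "created_utc": 1000 + 10 * i} for i in range(8)]

def fake_fetch(before, size=3):
    """Return up to `size` posts strictly older than `before`, newest first."""
    older = [p for p in ALL_POSTS if p["created_utc"] < before]
    return sorted(older, key=lambda p: p["created_utc"], reverse=True)[:size]

posts = paginate_backward(fake_fetch, start_epoch=2000, n_iter=10)
ids = [p["id"] for p in posts]
print(ids)
```

If several posts share the exact timestamp at a page boundary, an exclusive cursor can skip some of them, so dropping duplicates by `id` and spot-checking counts is a reasonable safeguard.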

In [3]:
# r/CryptoCurrency
crypto_posts = get_posts('CryptoCurrency', 800, 1601815751)

| Unnamed: 0 | title | author | selftext | subreddit | media_only | score | created_utc | id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CHAINLINK and BINANCE: DISRUPTIVE INNOVATION P... | Subayal_Khan | [removed] | CryptoCurrency | False | 1 | 1601815750 | j4yewp |
| 1 | Alternative to a hardware wallet for long term... | crockrox | What would you suggest for someone who wants t... | CryptoCurrency | False | 1 | 1601815458 | j4ycrn |
| 2 | Alternative to a hardware wallet for long term... | UndeservedLot | [removed] | CryptoCurrency | False | 1 | 1601815020 | j4y9fk |
| 3 | Uniswap - earning Uni tokens | OriginalGravity8 | I've got some liquidity in a pool on Uniswap (... | CryptoCurrency | False | 1 | 1601814845 | j4y850 |
| 4 | If I'm someone who cannot afford to buy a whol... | treemull93 | I am torn on the functions of BTC. If I put 10... | CryptoCurrency | False | 1 | 1601814822 | j4y7yv |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | Voyager (cryptocurrency trade platform) is off... | webnowcompany | [removed] | CryptoCurrency | False | 1 | 1570041462 | dcedh6 |
| 96 | eBook Details: Paperback: 590 pages Publisher:... | webnowcompany | [removed] | CryptoCurrency | False | 1 | 1570041461 | dcedh0 |
| 97 | UTRUST Integrates Dash Enabling Cryptocurrency... | NibiruHybrid |  | CryptoCurrency | False | 0 | 1570041169 | dceaxw |
| 98 | Bloomberg: Bitcoin Isn't the World's Most-Used... | suriyaa | What’s the world’s most widely used cryptocurr... | CryptoCurrency | False | 1 | 1570040870 | dce8hb |


In [4]:
# Export data to csv
crypto_posts.to_csv('data/crypto_posts.csv')
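When a CSV written with the default index is read back, pandas adds an `Unnamed: 0` column. A small round-trip sketch, using a made-up stand-in frame and a temporary file rather than `data/crypto_posts.csv`, shows one way to avoid that and to convert `created_utc` into timestamps.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in frame with made-up rows in the same shape as the scraped posts.
demo = pd.DataFrame({
    "title": ["post a", "post b"],
    "score": [1, 0],
    "created_utc": [1601815750, 1570041169],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crypto_posts.csv")
    # index=False keeps the row index out of the file, so no "Unnamed: 0" on reload
    demo.to_csv(path, index=False)
    loaded = pd.read_csv(path)

# created_utc is a Unix epoch in seconds; convert it to timezone-aware timestamps
loaded["created"] = pd.to_datetime(loaded["created_utc"], unit="s", utc=True)
print(loaded)
```

For a file that was already written with the index, `pd.read_csv(path, index_col=0)` reads that first column back as the index instead.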