## Coinbase Pro (GDAX) Account Report
Contains information about all orders (buys, sells, trades) made on the Coinbase Pro (formerly GDAX) platform, and all transfers into and out of it.
### How to get:
- Log in to Coinbase.com
- Go to: https://accounts.coinbase.com/profile
- Select "statements"
- Click on the "Coinbase Pro" tab
- "Generate custom report" with:
    - "account"
    - "all portfolios"
    - "all accounts"
    - Select the desired year



#### Record types

- deposit: Crypto or money has been deposited into the trading (GDAX/Pro) account, always (I think) from Coinbase. Corresponds to a "Pro/Exchange Deposit" in the Coinbase tx report.
- withdrawal: Crypto or money moved from Pro/GDAX to Coinbase. Shown as a "Pro/Exchange Withdrawal" in the Coinbase tx report.

An "order" is made up of one or more trades; each trade consists of 2 "match" records and, optionally, a "fee" record:
- match: Describes one of the 2 assets (including USD) involved in a sale/trade. Includes the asset name and amount. The sign of the amount tells which way the asset is moving (<0 means outgoing, >0 means incoming).
- fee: The amount and currency of any fee charged. Often 0.
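
As a rough illustration of the sign convention, the sketch below builds a few made-up records and separates the outgoing asset from the incoming one. The column names are the ones used by the code later in this notebook; the values are invented for the example.

```python
import pandas as pd

# Hypothetical records for a single trade: 0.5 BTC sold for 15000 USD, with a USD fee.
records = pd.DataFrame({
    "type": ["match", "match", "fee"],
    "amount": [-0.5, 15000.0, -37.5],
    "amount/balance unit": ["BTC", "USD", "USD"],
})

# Only the match records describe the assets that changed hands.
matches = records[records["type"] == "match"]
outgoing = matches[matches["amount"] < 0]  # what was given
incoming = matches[matches["amount"] > 0]  # what was received

given_unit = outgoing["amount/balance unit"].iloc[0]
received_unit = incoming["amount/balance unit"].iloc[0]
print(f"gave {given_unit}, received {received_unit}")
```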

### Notes

An "order" is any trade, purchase, or sale that has occurred. A single order ("sell 3 BTC for USD") may have been performed by executing multiple smaller "trades", but it's the order we are interested in.

This code will parse the report and assemble a list of `CbProOrderInfo` instances describing the orders.

All of the records for a given order have the same "order id" field entry.

There will be at least 1, and may be many, "trades" in an order. All records for a given trade will have the same "trade id" value.

Every trade consists of 2 "match" records: one with a positive "amount", representing the item received, and another with a negative "amount", representing the item or currency paid or given in exchange.

A trade may or may not have a "fee" record - describing any fee paid.
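
The relationship between orders, trades, and records can be checked with a small grouping sketch. The data below is invented (one order filled by two trades, only the first of which carries a fee); only the column names are taken from this notebook.

```python
import pandas as pd

# Hypothetical order "ord-1" executed as two trades, "t1" and "t2".
rows = pd.DataFrame({
    "type": ["match", "match", "fee", "match", "match"],
    "order id": ["ord-1"] * 5,
    "trade id": ["t1", "t1", "t1", "t2", "t2"],
    "amount": [-1.0, 30000.0, -75.0, -2.0, 60000.0],
    "amount/balance unit": ["BTC", "USD", "USD", "BTC", "USD"],
})

# Number of distinct trades in each order.
trades_per_order = rows.groupby("order id")["trade id"].nunique()

# Record counts per trade, one column per record type (0 where a trade has no fee).
record_counts = rows.groupby(["trade id", "type"]).size().unstack(fill_value=0)

print(trades_per_order)
print(record_counts)
```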

For each order we want:

- market:        'BTC-USD', for instance
- timestamp:     the timestamp of the latest executed trade (arbitrary choice, in reality)
- order_id:      the coinbase order ID
- unit_given:      what was sold/given ('BTC', 'ETH', ...)
- amount_given:    the amount of the currency being sold
- unit_received:   payment unit ('USD', 'USDC', 'ETH', ...)
- amount_received: the amount received in the trades (no fees)
- unit_price:      cash per unit of the tracked asset: (amount_received / amount_given) for a sale, (amount_given / amount_received) for a purchase
- fees_paid:       the total fees for the order
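
To make these fields concrete, here is a minimal sketch that derives them for one made-up sell order. It is not the parser defined below, just the same arithmetic on invented numbers; the `summary` dict and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sell order: 3 BTC sold for USD across two trades.
order = pd.DataFrame({
    "type": ["match", "match", "fee", "match", "match"],
    "time": pd.to_datetime([
        "2021-03-01T12:00:00Z", "2021-03-01T12:00:00Z", "2021-03-01T12:00:00Z",
        "2021-03-01T12:05:00Z", "2021-03-01T12:05:00Z",
    ]),
    "amount": [-1.0, 30000.0, -75.0, -2.0, 60000.0],
    "amount/balance unit": ["BTC", "USD", "USD", "BTC", "USD"],
})

# Sum the match amounts per unit; fees are kept out of these totals.
matches = order[order["type"] == "match"]
totals = matches.groupby("amount/balance unit")["amount"].sum()

unit_given = totals.idxmin()     # the negative total is what was given
unit_received = totals.idxmax()  # the positive total is what was received

summary = {
    "timestamp": order["time"].max(),  # latest executed trade
    "unit_given": unit_given,
    "amount_given": abs(totals[unit_given]),
    "unit_received": unit_received,
    "amount_received": totals[unit_received],
    # This is a sale, so price = cash received / asset given.
    "unit_price": totals[unit_received] / abs(totals[unit_given]),
    "fees_paid": abs(order.loc[order["type"] == "fee", "amount"].sum()),
}
print(summary)
```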


In [None]:
# allow import of local fifo-tool stuff
import os
import sys
sys.path.insert(0, os.path.abspath('../src'))

In [None]:
from typing import Dict,List
from datetime import datetime
import json
import numpy as np
import pandas as pd

from models.acquisition import Acquisition
from models.disposition import Disposition
from models.stash import Stash

In [None]:
def read_report_csv(file_path):
    """Read a Coinbase Pro 'account' report and return a pandas DataFrame.

    Conversions done:
        'time'     - parsed into a datetime
        'trade id' - read as a string
    """
    date_flds = ['time']
    forced_dtypes = {'trade id': str}
    return pd.read_csv(file_path, skiprows=0, parse_dates=date_flds, dtype=forced_dtypes)
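
One likely reason for forcing 'trade id' to a string: deposit and withdrawal rows leave that column empty, so a default read would treat it as a number. The sketch below shows the difference on an in-memory sample. The header row is an assumption about the report layout (only 'type', 'time', 'amount', 'amount/balance unit', 'trade id' and 'order id' are confirmed by this notebook), and the rows are made up.

```python
import io
import pandas as pd

# Hypothetical report excerpt; the deposit row has no trade id or order id.
SAMPLE = (
    "portfolio,type,time,amount,balance,amount/balance unit,transfer id,trade id,order id\n"
    "default,deposit,2021-03-01T11:00:00.000Z,20000.0,20000.0,USD,xfer-1,,\n"
    "default,match,2021-03-01T12:00:00.000Z,-15000.0,5000.0,USD,,123,ord-1\n"
    "default,match,2021-03-01T12:00:00.000Z,0.5,0.5,BTC,,123,ord-1\n"
)

# Default read: the empty cell makes pandas infer a numeric column.
default_df = pd.read_csv(io.StringIO(SAMPLE))

# Same options as read_report_csv: parse 'time', keep 'trade id' as text.
typed_df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["time"], dtype={"trade id": str})

print(default_df["trade id"].dtype)
print(typed_df["trade id"].tolist())
print(typed_df["time"].dtype)
```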


In [None]:
#help(pd.read_csv)

In [None]:
def get_orders(accounts_rpt):
    """Given a report dataframe, select the appropriate records and group them
        into per-order dataframes. Return a list of them.
    """
    mask = accounts_rpt['type'].isin(['match', 'fee'])
    matches = accounts_rpt[mask]  # keep only the match and fee records
    orders_df = matches.groupby('order id')  # group into orders
    orders = [group for _, group in orders_df]  # one DataFrame per order
    return orders


In [None]:

class CbProOrderInfo:

    def __init__(self, order_id: str, timestamp: float, unit_given: str,
                unit_received: str, amount_given: float, amount_received: float,
                unit_price: float, fees: float) -> None:
        self.order_id: str = order_id
        self.timestamp: float = timestamp
        self.unit_given: str = unit_given
        self.unit_received: str = unit_received
        self.amount_given = amount_given
        self.amount_received = amount_received
        self.unit_price = unit_price
        self.fees = fees

    def to_json_dict(self) -> Dict:
        return  {
            "order_id": self.order_id,
            "timestamp": self.timestamp,
            "unit_given": self.unit_given,
            "unit_received": self.unit_received,
            "amount_given": self.amount_given,
            "amount_received": self.amount_received,
            "unit_price": self.unit_price,
            "fees": self.fees
        }

    def to_disposition(self, asset) -> 'Disposition':
        assert self.unit_given == asset, f'Order is not a {asset} disposition'
        return Disposition(
            self.timestamp,
            self.unit_given, # asset_type sold
            self.amount_given, # asset_amount
            self.unit_price, # asset_price,
            self.fees, # fees
            f'CB Pro Order Id: {self.order_id}', # reference
            "" #comment
        )

    def to_acquisition(self, asset) -> 'Acquisition':
        assert self.unit_received == asset, f'Order is not a {asset} acquisition'
        return Acquisition(
            self.timestamp,
            self.unit_received, # asset_type bought
            self.amount_received, # asset_amount
            self.unit_price, # asset_price,
            self.fees, # fees
            f'CB Pro Order Id: {self.order_id}', # reference            
            "" #comment
        )

def parse_order(order, asset):
    """Parse the trades in an order DataFrame.

    Returns a CbProOrderInfo instance if the order involves the given asset,
    otherwise None.
    """
    ID_LBL = 'order id'
    UNIT_LBL = 'amount/balance unit'
    order_id = order[ID_LBL].values[0]  # is in every record
    timestamp = max(order['time']).timestamp()
    # matches are about the item, fees are about fees
    matches = order[order['type'] == 'match']
    fees =  order[order['type'] == 'fee']

    # The item being received (bought, usually) has a positive amount,
    # the one given has a negative one.
    # Use only the match records so that fee amounts are not folded into the totals.
    units = np.unique(matches[UNIT_LBL].values)
    # units is a 2-element array containing the 2 units
    mask0 = matches[UNIT_LBL] == units[0]
    mask1 = matches[UNIT_LBL] == units[1]
    amounts = (matches[mask0]['amount'].values.sum(), matches[mask1]['amount'].values.sum())
    # we are going to assume that 1 of the amounts is negative, the other positive
    (given_idx, rcvd_idx) = (0, 1) if amounts[1] > 0 else (1, 0)
    unit_given = units[given_idx]
    amount_given = abs(amounts[given_idx])
    unit_received = units[rcvd_idx]
    amount_received = abs(amounts[rcvd_idx])
    # "price" is always cash/asset, so we do need to know whether received or given is the asset
    if unit_given == asset:    
        unit_price = amount_received/amount_given # we sold crypto
    else:
        unit_price = amount_given/amount_received # we bought crypto
    fees = abs(fees['amount'].sum()) # fees are reported as < 0. We ALWAYS want to describe a fee as positive
    if unit_given == asset or unit_received == asset:  # ignore if it's  not the asset we're tracking 
        return CbProOrderInfo(order_id, timestamp, unit_given, unit_received,
                              amount_given, amount_received, unit_price, fees)
    else:
        return None



In [None]:
def process_file( year: str, assets: List[str]) -> None:
    filebase = f'local_data/cbpro-account-{year}'
    main_df = read_report_csv(filebase+'.csv')
    orders = get_orders(main_df)
    
    for asset in assets:
        infos = [i for i in [parse_order(o, asset) for o in orders] if i] # filter out Nones
        acqs = []
        disps = []
        for info in infos:
            if info.unit_given == asset:  # it's a sale/disposition
                disps.append( info.to_disposition(asset) )
                #print("Disp!")
            else: # purchase/acquisition
                acqs.append( info.to_acquisition(asset) )
                #print("Acq!")
        data = Stash(asset, f"Coinbase Pro {asset} orders - {year}", acqs, disps)
        
        #json.dumps(jd)
        with open(filebase+f'-{asset}.json', 'w') as f:
            jd = data.to_json_dict()
            json.dump(jd, f, indent=2)

In [None]:
for yr in ['2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']:
    process_file(yr, ['BTC', 'ETH'] ) 

In [None]:
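# Working/testing/messing about starts here
# `main_df`, `orders` and `infos` are local to process_file, so rebuild them for inspection
main_df = read_report_csv('local_data/cbpro-account-2024.csv')  # any year processed above will do
orders = get_orders(main_df)
infos = [i for i in (parse_order(o, 'BTC') for o in orders) if i]

In [None]:
o = orders[2]
infos[0].to_json_dict()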

In [None]:
#matches = o[o['type']=='match']
#matches

In [None]:
units = np.unique(o['amount/balance unit'].values)
units

In [None]:
o['order id'].values[0] # order id

In [None]:
mask0 = o['amount/balance unit'] == units[0]
mask1 = o['amount/balance unit'] == units[1]
o[mask0]['amount'].values.sum(),  o[mask1]['amount'].values.sum(),

In [None]:
o['amount/balance unit'].values[1] # currency used

In [None]:
max(o['time']) # timestamp

In [None]:
btc_mask = o['amount/balance unit']=='BTC'

In [None]:
o[btc_mask]['amount'].sum() # amount of BTC

In [None]:
o[~btc_mask]['amount'].sum() # amount of $

In [None]:
[ o[o['amount/balance unit']!='BTC']['amount'].sum() for o in orders]

In [None]:
mask1 = main_df['type']=='match'

In [None]:
mask2 = main_df['type']=='fee'

In [None]:
mask1.value_counts()

In [None]:
mask2.value_counts()

In [None]:
foo = mask1 | mask2
foo.values, foo.value_counts()

In [None]:
mask1.values

In [None]:
mask2.values

In [None]:
foo.values