# Venmo Transactional Data

Recently I came across an interesting dataset that was collected by scraping public transactions through the Venmo API, first from July to October 2018 and then again in January and February 2019. A more complete description of the dataset can be found [here, at sa7mon's GitHub](https://github.com/sa7mon/venmo-data).

### Reading in the data

The data is stored in binary JSON, or BSON. To get started, I will read in only a subset of the entire dataset. I'd like to store it in a pandas DataFrame and, ultimately, export some other aggregations as CSV files.
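
Before picking a subset size, it can help to know how many documents the file holds. Each BSON document begins with a 4-byte little-endian integer giving its total length in bytes, so documents can be counted by hopping from one length prefix to the next without decoding anything. The sketch below assumes the file is a plain concatenation of BSON documents; `count_bson_documents` is a hypothetical helper, not part of pymongo or this notebook, and the two sample documents are built by hand.

```python
import io
import struct


def count_bson_documents(stream):
    """Count documents in a binary stream of concatenated BSON documents.

    Hypothetical helper: it reads only each document's length prefix and
    skips the body. A truncated final document is not detected.
    """
    count = 0
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) < 4:
            raise ValueError('truncated length prefix')
        (length,) = struct.unpack('<i', header)
        if length < 5:  # the smallest valid document is 5 bytes
            raise ValueError('invalid BSON document length')
        # the length includes the 4 prefix bytes already read
        stream.seek(length - 4, io.SEEK_CUR)
        count += 1
    return count


# Two hand-built documents: {} and {"a": 1} (int32 value)
empty_doc = b'\x05\x00\x00\x00\x00'
small_doc = b'\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00'
sample_count = count_bson_documents(io.BytesIO(empty_doc + small_doc))
print(sample_count)  # prints 2
```

On the real data this would be called with the file opened in `'rb'` mode, in the same way the file is opened for decoding.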

In [23]:
import pandas as pd
import bson  # don't use "pip install bson"; use "pip install pymongo" instead, which provides the bson module
import sys

### Creating a one-time approach

First, I start by writing the code line by line and testing along the way. Once I have a functional bit of code, I will turn it into a function.

In [24]:
# Simple data processing

stop_at = 50000  # number of records to process

# create empty dict to store items of interest
conversion_dict = dict()

# open the source file; the with block closes it once the loop is done
with open('F:/Datasets/venmo/venmo.bson', 'rb') as bson_file:
    venmo_transactions = bson.decode_file_iter(bson_file)

    # loop through transactions and store info of interest
    for c, d in enumerate(venmo_transactions):
        if c == stop_at:  # exit once stop_at records have been stored
            break

        # default every field to None so a record without a payment
        # (or without a target user) cannot reuse stale values
        target_username = target_user_id = target_type = None
        actor_username = actor_user_id = None
        note = transaction_id = None
        date_created = overall_type = None

        if d['payment'] is not None:
            # read user details only when the target has a user attached
            if d['payment']['target']['user'] is not None:
                target_username = d['payment']['target']['user']['username']
                target_user_id = d['payment']['target']['user']['id']
            target_type = d['payment']['target']['type']
            actor_username = d['payment']['actor']['username']
            actor_user_id = d['payment']['actor']['id']
            note = d['payment']['note']
            transaction_id = d['payment']['id']
            date_created = d['date_created']
            overall_type = d['type']

        record = {
            'transaction_id': transaction_id,
            'actor_user_id': actor_user_id,
            'actor_username': actor_username,
            'target_user_id': target_user_id,
            'target_username': target_username,
            'target_type': target_type,
            'overall_type': overall_type,
            'transaction_note': note,
            'date_created': date_created
        }
        conversion_dict[c] = record

# create a dataframe from the dictionary
generated_df = pd.DataFrame.from_dict(conversion_dict, orient='index')

# export dataframe as csv
generated_df.to_csv(
    'C:/Users/Stuart/Documents/GitHub/venmo/data/output/smallerdf.csv')
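
The counter-and-`break` pattern above can also be expressed with `itertools.islice`, which takes the first n items from any iterator without reading the rest. The sketch below is an alternative, not what this notebook uses; `take_records` and `fake_transactions` are hypothetical names, and a plain generator stands in for `bson.decode_file_iter` so the example runs without the data file.

```python
from itertools import islice


def take_records(iterator, limit):
    """Return a dict of the first `limit` items, keyed by position."""
    return dict(enumerate(islice(iterator, limit)))


def fake_transactions():
    """Stand-in for bson.decode_file_iter: yields made-up dicts forever."""
    n = 0
    while True:
        yield {'type': 'payment', 'id': n}
        n += 1


subset = take_records(fake_transactions(), 3)
print(len(subset))      # prints 3
print(subset[2]['id'])  # prints 2
```

Because `islice` stops pulling from the iterator once the limit is reached, the subset size is exact and the loop body no longer needs its own exit check.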

### Defining the read/export as a function

Hooray! The single-use approach worked, but I don't want to have to change constants throughout the code whenever I want an export of a different size. So I'll define a function to do the same thing. It won't be highly generalized, since navigating the JSON through nested Python dictionaries is pretty specific to this dataset. I'm not sure how one would get around that in a flexible way.
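
One possible way to loosen that coupling, offered as a sketch rather than something this notebook uses, is to describe each output column as a path of keys and let a small helper walk the nested dictionaries. The names `dig`, `flatten`, and `FIELDS` are hypothetical, and the sample document is made up to mimic the shape the code here relies on.

```python
def dig(document, *keys, default=None):
    """Follow keys through nested dicts; return default if a level is missing or None."""
    current = document
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


# Output column name -> path of keys inside one transaction document
FIELDS = {
    'transaction_id': ('payment', 'id'),
    'actor_username': ('payment', 'actor', 'username'),
    'target_username': ('payment', 'target', 'user', 'username'),
    'target_type': ('payment', 'target', 'type'),
    'overall_type': ('type',),
}


def flatten(document, fields=FIELDS):
    """Build one flat record from a nested document."""
    return {name: dig(document, *path) for name, path in fields.items()}


# Made-up sample: a payment whose target has no user attached
sample = {
    'type': 'payment',
    'payment': {
        'id': '123',
        'actor': {'username': 'alice'},
        'target': {'type': 'phone', 'user': None},
    },
}
flat = flatten(sample)
print(flat['actor_username'])   # prints alice
print(flat['target_username'])  # prints None
```

With this layout, changing which fields are exported means editing `FIELDS` rather than the loop body, and missing or `None` levels fall back to `None` instead of raising an error.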

In [25]:
def read_export_venmo_bson(filepath='',
                           exportpath='',
                           filename='venmo_export',
                           records=1000):
    """Read Venmo BSON data from the local file at filepath, capture
    transaction details for up to `records` records, and export them
    as a CSV at exportpath + filename + '.csv'."""

    # create empty dict to store items of interest
    conversion_dict = dict()

    # the with block closes the file once the loop is done
    with open(filepath, 'rb') as bson_file:
        # loop through transactions and store info of interest
        for c, d in enumerate(bson.decode_file_iter(bson_file)):
            if c == records:  # exit once `records` records have been stored
                break

            # default every field to None so a record without a payment
            # (or without a target user) cannot reuse stale values
            target_username = target_user_id = target_type = None
            actor_username = actor_user_id = None
            note = transaction_id = None
            date_created = overall_type = None

            if d['payment'] is not None:
                # read user details only when the target has a user attached
                if d['payment']['target']['user'] is not None:
                    target_username = d['payment']['target']['user'][
                        'username']
                    target_user_id = d['payment']['target']['user']['id']
                target_type = d['payment']['target']['type']
                actor_username = d['payment']['actor']['username']
                actor_user_id = d['payment']['actor']['id']
                note = d['payment']['note']
                transaction_id = d['payment']['id']
                date_created = d['date_created']
                overall_type = d['type']

            record = {
                'transaction_id': transaction_id,
                'actor_user_id': actor_user_id,
                'actor_username': actor_username,
                'target_user_id': target_user_id,
                'target_username': target_username,
                'target_type': target_type,
                'overall_type': overall_type,
                'transaction_note': note,
                'date_created': date_created
            }
            conversion_dict[c] = record

    # generate dataframe from the dictionary; this runs after the loop so
    # the export still happens when the file holds fewer than `records` records
    generated_df = pd.DataFrame.from_dict(conversion_dict, orient='index')

    # export to exportpath as csv
    export_file = str(exportpath) + str(filename) + '.csv'
    generated_df.to_csv(export_file)
    print('Function ran successfully.', str(len(conversion_dict)),
          'records exported into table at:', export_file)

### Running the function

Time to see how it does!

In [26]:
bson_filepath = 'F:/Datasets/venmo/venmo.bson'
export_filepath = 'C:/Users/Stuart/Documents/GitHub/venmo/data/output/'
filename = 'transactions'
read_export_venmo_bson(bson_filepath, export_filepath, filename, 50000)

Function ran successfully. 50000 records exported into table at: C:/Users/Stuart/Documents/GitHub/venmo/data/output/transactions.csv
