### Import necessary libraries, packages, and dependencies 
- Use os to access the operating system
- Use pandas to create DataFrames and perform analysis 
- Use plotly to generate visuals
- Use dotenv to access .env variables 
- Use sqlalchemy to connect to postgres

In [1]:
import os
import pandas as pd
import plotly.express as px
from dotenv import load_dotenv
from sqlalchemy import create_engine

### Connect to database and retrieve necessary data
- Connect to the 'arbi_db' database 
- Retrieve all data from the 'luffy' table

In [2]:
# Load environment variables from .env file
load_dotenv()

# Access the variables 
psql_username = os.getenv('PSQL_USERNAME')
psql_password = os.getenv('PSQL_PASSWORD')
psql_host = os.getenv('PSQL_HOST')
psql_port = os.getenv('PSQL_PORT')
db_name = os.getenv('DB_NAME')

# Define the database url
db_url = f"postgresql://{psql_username}:{psql_password}@{psql_host}:{psql_port}/{db_name}" 

# Create the engine object
engine = create_engine(db_url)

# Write the SQL query
query = 'SELECT * FROM luffy'

# Read the SQL query into a DataFrame
luffy_df = pd.read_sql(query, engine)

# Show the DataFrame's head
luffy_df.head()


,trade_count,current_datetime,currency,volume,buy_exchange,buy_price,total_purchase_amount,sell_exchange,sell_price,total_sale_amount,profit,spread_percentage,wallet_balance
0,1,2024-02-28T13:20:01.816371,bitcoin,0.028168,Poloniex,59567.4,1677.892044,Binance,59971.29,1689.268801,11.376757,0.478039,5653.376757
1,2,2024-02-28T13:20:01.816371,bitcoin,0.028168,Poloniex,59567.4,1677.892044,Bitstamp,60206.0,1695.880102,17.988058,0.872063,5671.364815
2,3,2024-02-28T13:20:01.816371,bitcoin,0.028168,Poloniex,59567.4,1677.892044,Gemini,60166.89,1694.778454,16.886409,0.806406,5688.251224
3,4,2024-02-28T13:20:01.816371,bitcoin,0.028168,Poloniex,59567.4,1677.892044,Kraken,60078.0,1692.274604,14.38256,0.65718,5702.633784
4,5,2024-02-28T13:20:01.816371,ethereum,0.257774,Poloniex,3238.89,834.901748,Binance,3281.81,845.965409,11.063662,1.125145,5713.697445
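
The database URL above is assembled with a plain f-string, so a password containing characters such as `@`, `:` or `/` would break it, and an unset environment variable would be formatted as the text `None`. The sketch below shows one way to guard against both using only the standard library. The helper name `build_db_url` and the sample credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_db_url(username, password, host, port, db_name):
    """Build a PostgreSQL URL, percent-encoding the credentials."""
    parts = {"username": username, "password": password,
             "host": host, "port": port, "db_name": db_name}
    missing = [name for name, value in parts.items() if not value]
    if missing:
        # os.getenv returns None for unset variables; fail early instead of
        # silently formatting "None" into the URL
        raise ValueError(f"Missing connection settings: {', '.join(missing)}")
    return (
        f"postgresql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# Hypothetical credentials, for illustration only
example_url = build_db_url("arbi_user", "p@ss:word/1", "localhost", "5432", "arbi_db")
print(example_url)
```

In the notebook, the values returned by `os.getenv` would be passed to the helper and the result handed to `create_engine` as before.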


### Data Analysis
- Profit Distribution across Trades
- 30 Sec Profit Trend
- Spread Percentage Analysis
- Exchange Performance
- Average Spread Percentage by Coin
- Time Series Analysis

Note: Data collection was limited. The program ran every 30 seconds over a 4 hour window.

In [3]:
# Convert the current_datetime column to only show the time of day
luffy_df['current_datetime'] = pd.to_datetime(luffy_df['current_datetime'])
luffy_df['current_datetime'] = luffy_df['current_datetime'].dt.time

# Set the index to the current_datetime column
luffy_df.set_index('current_datetime', inplace=True)

luffy_df

current_datetime,trade_count,currency,volume,buy_exchange,buy_price,total_purchase_amount,sell_exchange,sell_price,total_sale_amount,profit,spread_percentage,wallet_balance
13:20:01.816371,1,bitcoin,0.028168,Poloniex,59567.4000,1677.892044,Binance,59971.29000,1689.268801,11.376757,0.478039,5653.376757
13:20:01.816371,2,bitcoin,0.028168,Poloniex,59567.4000,1677.892044,Bitstamp,60206.00000,1695.880102,17.988058,0.872063,5671.364815
13:20:01.816371,3,bitcoin,0.028168,Poloniex,59567.4000,1677.892044,Gemini,60166.89000,1694.778454,16.886409,0.806406,5688.251224
13:20:01.816371,4,bitcoin,0.028168,Poloniex,59567.4000,1677.892044,Kraken,60078.00000,1692.274604,14.382560,0.657180,5702.633784
13:20:01.816371,5,ethereum,0.257774,Poloniex,3238.8900,834.901748,Binance,3281.81000,845.965409,11.063662,1.125145,5713.697445
...,...,...,...,...,...,...,...,...,...,...,...,...
16:20:20.079548,1935,solana,66.168303,Poloniex,112.6040,7450.815576,Kraken,113.26000,7494.221983,43.406407,0.382573,153979.734274
16:20:20.079548,1936,xrp,39420.325342,Poloniex,0.5669,22347.382436,Binance,0.56980,22461.701380,114.318943,0.311554,154094.053218
16:20:20.079548,1937,xrp,39420.325342,Poloniex,0.5669,22347.382436,Bitstamp,0.57048,22488.507201,141.124765,0.431505,154235.177983
16:20:20.079548,1938,xrp,39420.325342,Poloniex,0.5669,22347.382436,Gemini,0.57031,22481.805746,134.423309,0.401517,154369.601292


##### Profit Distribution across Trades
- Calculate and visualize the distribution of profits across all trades. This provides insights into the consistency and variability of returns. 

In [4]:
profit_dist = px.histogram(
    luffy_df, 
    x='profit', 
    nbins=20, 
    title='Profit Distribution', 
    labels={'profit': 'Profit', 'count':'Frequency'})

profit_dist.show()
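
A histogram shows the shape of the distribution; a few summary numbers can make "consistency and variability" concrete. The sketch below is a minimal example that uses only the five profit values visible in the `head()` output above, not the full table. With the real data, `luffy_df['profit']` would take the place of `sample['profit']`.

```python
import pandas as pd

# Profits from the five rows shown in the head() output above
sample = pd.DataFrame(
    {"profit": [11.376757, 17.988058, 16.886409, 14.382560, 11.063662]}
)

# count, mean, std, min, quartiles and max in one call
summary = sample["profit"].describe()

# Fraction of trades that ended with a positive profit
share_profitable = (sample["profit"] > 0).mean()

print(summary)
print(f"Share of profitable trades: {share_profitable:.0%}")
```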

##### 30 Sec Profit Trend
- Examine the total profit of each 30-second run to identify patterns and understand how the arbitrage performed over time.

In [5]:
thirty_sec_profit = luffy_df.groupby('current_datetime')['profit'].sum().reset_index()
thirty_sec_profit_plot = px.line(
    thirty_sec_profit, 
    x='current_datetime',
    y='profit', 
    title='30 Second Profit Trend',
    labels={'current_datetime':'Time', 'profit':'30 Sec Profit'})

thirty_sec_profit_plot.show()
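
The per-run totals can also be coarsened into hourly totals. The sketch below assumes the full timestamps are kept as a `DatetimeIndex` (rather than the time-only values used in this notebook), because `resample` needs a datetime-like index. The sample rows are made up for illustration.

```python
import pandas as pd

# Small made-up sample spanning two clock hours; timestamps keep the date
sample = pd.DataFrame({
    "current_datetime": pd.to_datetime([
        "2024-02-28 13:20:01", "2024-02-28 13:20:31",
        "2024-02-28 13:59:59", "2024-02-28 14:00:02",
    ]),
    "profit": [10.0, 5.0, 2.5, 4.0],
})

# Sum profit into one-hour bins aligned to the clock hour
hourly_profit = (
    sample.set_index("current_datetime")["profit"]
    .resample(pd.Timedelta(hours=1))
    .sum()
)

print(hourly_profit)
```

Keeping the date also avoids merging runs from different days if the collection window is ever extended past midnight.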

##### Spread Percentage Analysis
- Explore the spread percentage to understand how much profit is generated relative to the spread. This helps to assess the efficiency of the arbitrage strategy.

In [6]:
spread_percentage_plot = px.scatter(
    luffy_df, 
    x='spread_percentage',
    y='profit',
    title='Spread Percentage vs Profit',
    labels={'spread_percentage':'Spread Percentage', 'profit':'Profit'})

spread_percentage_plot.show()
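
Before reading much into the scatter plot, it can help to check how the two plotted columns relate to the raw prices. The sketch below uses two rows copied from the `head()` output above. In those rows, `profit` matches `total_sale_amount - total_purchase_amount`, and `spread_percentage` sits about 0.2 percentage points below the raw relative price gap. Why the offset exists is not stated in the notebook; a fee allowance is one possibility, but that is an assumption.

```python
import pandas as pd

# Two rows copied from the head() output above (bitcoin and ethereum, sold on Binance)
sample = pd.DataFrame({
    "buy_price": [59567.4, 3238.89],
    "sell_price": [59971.29, 3281.81],
    "total_purchase_amount": [1677.892044, 834.901748],
    "total_sale_amount": [1689.268801, 845.965409],
    "profit": [11.376757, 11.063662],
    "spread_percentage": [0.478039, 1.125145],
})

# Profit recomputed from the sale and purchase totals
recomputed_profit = sample["total_sale_amount"] - sample["total_purchase_amount"]
profit_gap = (recomputed_profit - sample["profit"]).abs().max()

# Raw relative price gap in percent, and its distance from the stored spread
raw_spread = (sample["sell_price"] - sample["buy_price"]) / sample["buy_price"] * 100
spread_offset = raw_spread - sample["spread_percentage"]

print(f"Largest profit mismatch: {profit_gap:.6f}")
print(spread_offset.round(4))
```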

##### Exchange Performance
- Evaluate the performance of each exchange in terms of the number of trades and profit. This helps in optimizing the selection of exchanges and diversifying risks.

In [7]:
exchange_performance = luffy_df.groupby('buy_exchange')['profit'].agg(['count', 'sum']).reset_index()
exchange_performance_plot = px.bar(
    exchange_performance,
    x='buy_exchange',
    y=['count', 'sum'],
    barmode='group',  # side by side: trade count and total profit have different units
    title='Exchange Performance',
    labels={'buy_exchange':'Exchange', 'value':'Value'}
)

exchange_performance_plot.show()
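
In every row shown in the outputs above, `buy_exchange` is Poloniex, so grouping by it may yield a single bar. If that holds for the whole table (an assumption based only on the displayed rows), grouping by `sell_exchange` is likely to be more informative. The sketch below uses the five `head()` rows and keeps count and profit in separately named columns.

```python
import pandas as pd

# Rows copied from the head() output above
sample = pd.DataFrame({
    "buy_exchange": ["Poloniex"] * 5,
    "sell_exchange": ["Binance", "Bitstamp", "Gemini", "Kraken", "Binance"],
    "profit": [11.376757, 17.988058, 16.886409, 14.382560, 11.063662],
})

# One row per sell-side exchange with trade count, total and mean profit
sell_side = (
    sample.groupby("sell_exchange")["profit"]
    .agg(trade_count="count", total_profit="sum", mean_profit="mean")
    .reset_index()
)

print(sell_side)
```

The resulting frame can be passed to `px.bar` with `y='total_profit'` or `y='trade_count'` to plot each measure on its own axis.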

##### Average Spread Percentage by Coin
- Calculate the average spread percentage for each coin to identify coins with more favorable arbitrage opportunities. This helps in optimizing the selection of coins and diversifying risks.

In [8]:
avg_spread_by_coin = luffy_df.groupby('currency')['spread_percentage'].mean().reset_index()
avg_spread_by_coin_plot = px.bar(
    avg_spread_by_coin,
    x='currency',
    y='spread_percentage',
    title='Average Spread Percentage for Each Coin',
    labels={'spread_percentage':'Average Spread Percentage', 'currency':'Coin'}
)

avg_spread_by_coin_plot.show()

##### Time Series Analysis
- Analyze the time series data to identify any patterns or trends in the arbitrage opportunities.

In [9]:
time_series = luffy_df.reset_index()

time_series_plot = px.line(
    time_series,
    x='current_datetime', 
    y='profit', 
    title='Time Series Analysis',
    labels={'current_datetime': 'Time', 'profit':'Profit'}
)
time_series_plot.show()