## Crypto Arbitrage

In this Challenge, you'll take on the role of an analyst at a high-tech investment firm. The vice president (VP) of your department is considering arbitrage opportunities in Bitcoin and other cryptocurrencies. As Bitcoin trades on markets across the globe, can you capitalize on simultaneous price dislocations in those markets by using the powers of Pandas?

For this assignment, you’ll sort through historical trade data for Bitcoin on two exchanges: Bitstamp and Coinbase. Your task is to apply the three phases of financial analysis to determine if any arbitrage opportunities exist for Bitcoin.

This aspect of the Challenge will consist of 3 phases.

1. Collect the data.

2. Prepare the data.

3. Analyze the data. 



### Import the required libraries and dependencies.

In [1]:
import pandas as pd
import os
import io
from pathlib import Path
%matplotlib inline

## Collect the Data

To collect the data that you’ll need, complete the following steps:

Instructions. 

1. Using the Pandas `read_csv` function and the `Path` module, import the data from the `bitstamp.csv` file, and create a DataFrame called `bitstamp`. Set the DatetimeIndex as the Timestamp column, and be sure to parse and format the dates.

2. Use the `head` (and/or the `tail`) function to confirm that Pandas properly imported the data.

3. Repeat Steps 1 and 2 for the `coinbase.csv` file.

### Step 1: Using the Pandas `read_csv` function and the `Path` module, import the data from the `bitstamp.csv` file, and create a DataFrame called `bitstamp`. Set the DatetimeIndex as the Timestamp column, and be sure to parse and format the dates.

In [2]:
f_file= r"..\Resources\bitstamp.csv"


In [3]:
os.getcwd()

'C:\\Users\\mwj\\Desktop\\FinTech-Workspace\\_ChallengeIII_new'

In [4]:
f_file


'..\\Resources\\bitstamp.csv'

In [5]:
os.path.exists(f_file) 

True

In [6]:
# Read in the CSV file called "bitstamp.csv" using the Path module. 
# The CSV file is located in the Resources folder.
# Set the index to the column "Timestamp"
# Set the parse_dates and infer_datetime_format parameters
#csvpath = Path("Resources/bitstamp.csv")
bitstamp = pd.read_csv(f_file, parse_dates=True, infer_datetime_format=True)
bitstamp = bitstamp.set_index("Timestamp")
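
Note that `parse_dates=True` only asks Pandas to parse the index, so it has an effect only when the index column is named in the same call. The sketch below shows one way to get a `DatetimeIndex` directly from `read_csv`. It assumes a small in-memory sample in the same layout as `bitstamp.csv`; the names `sample_csv` and `sample` are illustrative and not part of the assignment.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mimics the layout of bitstamp.csv
sample_csv = io.StringIO(
    "Timestamp,Open,Close\n"
    "2018-01-01 00:00:00,13681.04,$13646.48\n"
    "2018-01-01 00:01:00,13646.48,$13658.75\n"
)

# index_col names the index column; parse_dates=True then parses that index
sample = pd.read_csv(sample_csv, index_col="Timestamp", parse_dates=True)

print(type(sample.index).__name__)  # DatetimeIndex
print(sample.loc["2018-01-01 00:01:00", "Open"])
```

With the index parsed at import time, a later `pd.to_datetime` conversion of the index becomes unnecessary, although it remains harmless.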

### Step 2: Use the `head` (and/or the `tail`) function to confirm that Pandas properly imported the data.

In [7]:
# Use the head (and/or tail) function to confirm that the data was imported properly.
bitstamp.head(10)

| Timestamp | Open | High | Low | Close | BTC Volume | USD Volume | Weighted Price |
|---|---|---|---|---|---|---|---|
| 2018-01-01 00:00:00 | 13681.04 | 13681.04 | 13637.93 | $13646.48 | 3.334553 | 45482.128785 | 13639.647479 |
| 2018-01-01 00:01:00 | 13646.48 | 13658.75 | 13610.18 | $13658.75 | 2.663188 | 36361.390888 | 13653.332816 |
| 2018-01-01 00:02:00 | 13616.93 | 13616.93 | 13610.06 | $13610.22 | 0.084653 | 1152.144036 | 13610.136247 |
| 2018-01-01 00:03:00 | 13610.27 | 13639.09 | 13610.27 | $13639.09 | 7.182986 | 97856.416478 | 13623.361128 |
| 2018-01-01 00:04:00 | 13635.35 | 13636.35 | 13620.0 | $13620.0 | 1.069665 | 14582.660932 | 13632.923329 |
| 2018-01-01 00:05:00 | 13620.0 | 13634.15 | 13610.0 | $13610.0 | 4.716162 | 64226.303028 | 13618.341726 |
| 2018-01-01 00:06:00 | 13610.0 | 13650.18 | 13590.42 | $13600.56 | 26.432759 | 360108.15563 | 13623.555198 |
| 2018-01-01 00:07:00 | 13593.99 | 13595.41 | 13566.93 | $13580.0 | 10.674241 | 144961.61118 | 13580.507983 |
| 2018-01-01 00:08:00 | 13580.0 | 13580.0 | 13547.59 | $13579.0 | 19.32237 | 261942.83355 | 13556.454543 |
| 2018-01-01 00:09:00 | 13571.28 | 13571.28 | 13550.0 | $13565.0 | 0.120942 | 1641.166577 | 13569.829917 |


### Step 3: Repeat Steps 1 and 2 for the `coinbase.csv` file.

In [8]:
# Read in the CSV file called "coinbase.csv" using the Path module. 
# The CSV file is located in the Resources folder, one level above the working directory.
# Set the index to the column "Timestamp"
# Set the parse_dates and infer_datetime_format parameters
f_file2 = Path("../Resources/coinbase.csv")
os.path.exists(f_file2)

True

In [9]:
coinbase = pd.read_csv(f_file2, parse_dates=True, infer_datetime_format=True)
coinbase = coinbase.set_index("Timestamp")

In [None]:
# Use the head (and/or tail) function to confirm that the data was imported properly.
coinbase.head(3)

In [None]:
coinbase.tail(3)

## Prepare the Data

To prepare and clean your data for analysis, complete the following steps:

1. For the bitstamp DataFrame, replace or drop all `NaN`, or missing, values in the DataFrame.

2. Use the `str.replace` function to remove the dollar signs ($) from the values in the Close column.

3. Convert the data type of the Close column to a `float`.

4. Review the data for duplicated values, and drop them if necessary.

5. Repeat Steps 1–4 for the coinbase DataFrame.

### Step 1: For the bitstamp DataFrame, replace or drop all `NaN`, or missing, values in the DataFrame.

In [None]:
# For the bitstamp DataFrame, replace or drop all NaNs or missing values in the DataFrame
bitstamp = bitstamp.dropna(how="any")
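
Before dropping rows, it can help to count how many values are missing in each column so that you know how much data the step removes. A minimal sketch on a hypothetical three-row frame (the `sample` data is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
sample = pd.DataFrame(
    {
        "Open": [13681.04, np.nan, 13616.93],
        "Close": ["$13646.48", "$13658.75", None],
    }
)

# Count the missing values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value
cleaned = sample.dropna(how="any")
print(len(sample), "rows before,", len(cleaned), "rows after")
```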

### Step 2: Use the `str.replace` function to remove the dollar signs ($) from the values in the Close column.

In [None]:
# Use the str.replace function to remove the dollar sign, $
bitstamp.loc[:, 'Close'] = bitstamp['Close'].str.replace("$", "", regex=False)

### Step 3: Convert the data type of the Close column to a `float`.

In [None]:
# Convert the Close data type to a float
bitstamp['Close'] = bitstamp['Close'].astype(float)
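
In a regular expression, `$` is the end-of-string anchor rather than a literal dollar sign, and the way Pandas treats a single-character pattern under `regex=True` appears to have changed across releases. Passing `regex=False` avoids the ambiguity. The sketch below uses a hypothetical two-value Series to show the cleanup followed by the conversion to `float`:

```python
import pandas as pd

# Hypothetical Close values stored as strings with a leading dollar sign
close = pd.Series(["$13646.48", "$13658.75"])

# regex=False treats "$" as a literal character, not as a regex anchor
without_symbol = close.str.replace("$", "", regex=False)

# The cleaned strings can now be converted to floats
prices = without_symbol.astype(float)
print(prices.dtype)
print(prices.tolist())
```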

In [None]:
bitstamp.dtypes

### Step 4: Review the data for duplicated values, and drop them if necessary.

In [None]:
# Review the data for duplicate values, and drop them if necessary
bitstamp = bitstamp.drop_duplicates()
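
To review the duplicates before removing them, `duplicated` can be summed to get a count. Keep in mind that `duplicated` and `drop_duplicates` compare column values only and ignore the index, so two different timestamps with identical values count as duplicates. A small sketch on invented data:

```python
import pandas as pd

# Hypothetical frame in which the second row repeats the first
sample = pd.DataFrame(
    {
        "Close": [13646.48, 13646.48, 13658.75],
        "BTC Volume": [3.33, 3.33, 2.66],
    }
)

# duplicated() flags each row that repeats an earlier row
duplicate_count = sample.duplicated().sum()
print(duplicate_count)

# drop_duplicates() keeps the first occurrence of each distinct row
deduplicated = sample.drop_duplicates()
print(len(deduplicated))
```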

### Step 5: Repeat Steps 1–4 for the coinbase DataFrame.

In [None]:
# Repeat Steps 1–4 for the coinbase DataFrame
coinbase = coinbase.dropna(how="any")
coinbase.loc[:, 'Close'] = coinbase['Close'].str.replace("$", "", regex=False)
coinbase['Close'] = coinbase['Close'].astype(float)
coinbase = coinbase.drop_duplicates()

## Analyze the Data

Your analysis consists of the following tasks: 

1. Choose the columns of data on which to focus your analysis.

2. Get the summary statistics and plot the data.

3. Focus your analysis on specific dates.

4. Calculate the arbitrage profits.

### Step 1: Choose columns of data on which to focus your analysis.

Select the data you want to analyze. Use `loc` or `iloc` to select the following columns of data for both the bitstamp and coinbase DataFrames:

* Timestamp (index)

* Close


In [None]:
# Use loc or iloc to select `Timestamp (the index)` and `Close` from bitstamp DataFrame
bitstamp_sliced = bitstamp.loc[:, ['Close']]
# Review the first five rows of the DataFrame
bitstamp_sliced.head(5)

In [None]:
# Use loc or iloc to select `Timestamp (the index)` and `Close` from coinbase DataFrame
coinbase_sliced = coinbase.loc[:, ['Close']]

# Review the first five rows of the DataFrame
coinbase_sliced.head(5)

### Step 2: Get summary statistics and plot the data.

Sort through the time series data associated with the bitstamp and coinbase DataFrames to identify potential arbitrage opportunities. To do so, complete the following steps:

1. Generate the summary statistics for each DataFrame by using the `describe` function.

2. For each DataFrame, create a line plot for the full period of time in the dataset. Be sure to tailor the figure size, title, and color to each visualization.

3. In one plot, overlay the visualizations that you created in Step 2 for bitstamp and coinbase. Be sure to adjust the legend and title for this new visualization.

4. Using the `loc` and `plot` functions, plot the price action of the assets on each exchange for different dates and times. Your goal is to evaluate how the spread between the two exchanges changed across the time period that the datasets define. Did the degree of spread change as time progressed?

In [None]:
# Generate the summary statistics for the bitstamp DataFrame
bitstamp_sliced.describe()

In [None]:
# Generate the summary statistics for the coinbase DataFrame
coinbase_sliced.describe()

In [None]:
# Create a line plot for the bitstamp DataFrame for the full length of time in the dataset 
# Be sure that the figure size, title, and color are tailored to each visualization
bitstamp_sliced.plot(figsize=(8, 6), title = "Bitstamp", color='tomato')


In [None]:
# Create a line plot for the coinbase DataFrame for the full length of time in the dataset 
# Be sure that the figure size, title, and color are tailored to each visualization
coinbase_sliced.plot(figsize=(6, 4), title = "Coinbase", color='mediumblue')

In [None]:
# Overlay the visualizations for the bitstamp and coinbase DataFrames in one plot
# The plot should visualize the prices over the full length of the dataset
# Be sure to include the parameters: legend, figure size, title, and color and label
combined_df = bitstamp_sliced.join(coinbase_sliced, lsuffix='_Bitstamp', rsuffix='_Coinbase')
new_index = combined_df.index
new_index = pd.to_datetime(new_index)
combined_df = combined_df.set_index(new_index)
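
Because both sliced DataFrames share the column name `Close`, `join` needs the suffixes to tell the two columns apart. The sketch below illustrates the resulting column names on two hypothetical one-column frames (the Coinbase values are invented). By default, `join` performs a left join on the index.

```python
import pandas as pd

idx = pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 00:01:00"])

# Two frames with the same column name and the same index
left = pd.DataFrame({"Close": [13646.48, 13658.75]}, index=idx)
right = pd.DataFrame({"Close": [13608.49, 13601.66]}, index=idx)  # hypothetical values

# The suffixes are appended to the overlapping column names
joined = left.join(right, lsuffix="_Bitstamp", rsuffix="_Coinbase")
print(list(joined.columns))  # ['Close_Bitstamp', 'Close_Coinbase']
```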

In [None]:
combined_df.plot(figsize=(14,8), title='Bitstamp & Coinbase', color=['tomato', 'mediumblue'])
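
To narrow a plot to a single month or day, a DataFrame with a `DatetimeIndex` supports partial-string indexing through `loc`. A minimal sketch, assuming a hypothetical four-row, minute-level frame that straddles the end of January:

```python
import pandas as pd

# Four one-minute rows: two at the end of January and two at the start of February
idx = pd.date_range("2018-01-31 23:58", periods=4, freq="min")
prices = pd.DataFrame(
    {
        "Close_Bitstamp": [10000.0, 10010.0, 10020.0, 10030.0],
        "Close_Coinbase": [10050.0, 10040.0, 10060.0, 10070.0],
    },
    index=idx,
)

# A year-month string selects every row in that month
january = prices.loc["2018-01"]

# A full date selects every row on that day
feb_first = prices.loc["2018-02-01"]

print(len(january), len(feb_first))
```

Either selection can be chained with `.plot(...)` in the same way as the full DataFrame.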

In [None]:
# Using the loc and plot functions, create an overlay plot that visualizes 
# the price action of both DataFrames for a one month period early in the dataset
# Be sure to include the parameters: legend, figure size, title, and color and label
combined_df.loc["2018-01"].plot(legend=True, figsize=(14, 8), title='Bitstamp & Coinbase: January 2018', color=['tomato', 'mediumblue'])

In [None]:
# Using the loc and plot functions, create an overlay plot that visualizes 
# the price action of both DataFrames for a one month period later in the dataset
# Be sure to include the parameters: legend, figure size, title, and color and label 
combined_df.loc["2018-03"].plot(legend=True, figsize=(14, 8), title='Bitstamp & Coinbase: March 2018', color=['tomato', 'mediumblue'])

**Question** Based on the visualizations of the different time periods, has the degree of spread changed as time progressed?

**Answer** Yes. The spread between the two exchanges is noticeably wider in January than in March.

### Step 3: Focus Your Analysis on Specific Dates

Focus your analysis on specific dates by completing the following steps:

1. Select three dates to evaluate for arbitrage profitability. Choose one date that’s early in the dataset, one from the middle of the dataset, and one from the later part of the time period.

2. For each of the three dates, generate the summary statistics and then create a box plot. This big-picture view is meant to help you gain a better understanding of the data before you perform your arbitrage calculations. As you compare the data, what conclusions can you draw?

In [None]:
# Create an overlay plot that visualizes the two dataframes over a period of one day early in the dataset. 
# Be sure that the plots include the parameters `legend`, `figsize`, `title`, `color` and `label` 
combined_df.loc["2018-01-10"].plot(legend=True, figsize=(14, 8), title='Bitstamp & Coinbase: January 10, 2018', color=['tomato', 'mediumblue'])

In [None]:
# Using the early date that you have selected, calculate the arbitrage spread 
# by subtracting the bitstamp lower closing prices from the coinbase higher closing prices
arbitrage_spread_early = combined_df.loc['2018-01-10', 'Close_Coinbase'] - combined_df.loc['2018-01-10', 'Close_Bitstamp']

# Generate summary statistics for the early DataFrame
arbitrage_spread_early = arbitrage_spread_early.to_frame()
arbitrage_spread_early.describe()

In [None]:
# Visualize the arbitrage spread from early in the dataset in a box plot
arbitrage_spread_early.plot(kind='box')
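
The spread calculation and its summary statistics can be checked by hand on a tiny example. The sketch below assumes a hypothetical three-minute frame with the same column names as `combined_df`. Naming the resulting Series gives the summary table and the box plot a readable label instead of the default `0`.

```python
import pandas as pd

idx = pd.date_range("2018-01-10 00:00", periods=3, freq="min")

# Hypothetical closing prices on the two exchanges
day = pd.DataFrame(
    {
        "Close_Bitstamp": [14000.0, 14010.0, 14020.0],
        "Close_Coinbase": [14050.0, 14005.0, 14100.0],
    },
    index=idx,
)

# Spread = higher-priced exchange minus lower-priced exchange
spread = (day["Close_Coinbase"] - day["Close_Bitstamp"]).rename("Spread")

# A negative value marks a minute in which the price relationship was reversed
summary = spread.describe()
print(spread.tolist())
print(summary[["min", "max"]])
```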

In [None]:
# Create an overlay plot that visualizes the two dataframes over a period of one day from the middle of the dataset. 
# Be sure that the plots include the parameters `legend`, `figsize`, `title`, `color` and `label` 
combined_df.loc["2018-02-10"].plot(legend=True, figsize=(14, 8), title='Bitstamp & Coinbase: February 10, 2018', color=['tomato', 'mediumblue'])

In [None]:
# Using the middle date that you have selected, calculate the arbitrage spread 
# by subtracting the bitstamp lower closing prices from the coinbase higher closing prices
arbitrage_spread_middle = combined_df.loc['2018-02-10', 'Close_Coinbase'] - combined_df.loc['2018-02-10', 'Close_Bitstamp']


# Generate summary statistics 
arbitrage_spread_middle = arbitrage_spread_middle.to_frame()
arbitrage_spread_middle.describe()

In [None]:
# Visualize the arbitrage spread from the middle of the dataset in a box plot
arbitrage_spread_middle.plot(kind='box')

In [None]:
# Create an overlay plot that visualizes the two dataframes over a period of one day from late in the dataset. 
# Be sure that the plots include the parameters `legend`, `figsize`, `title`, `color` and `label` 
combined_df.loc["2018-03-10"].plot(legend=True, figsize=(30, 20), title='Bitstamp & Coinbase: March 10, 2018', color=['tomato', 'mediumblue'])

In [None]:
# Using the late date that you have selected, calculate the arbitrage spread 
# by subtracting the bitstamp lower closing prices from the coinbase higher closing prices
arbitrage_spread_late = combined_df.loc['2018-03-10', 'Close_Coinbase'] - combined_df.loc['2018-03-10', 'Close_Bitstamp']


# Generate summary statistics 
arbitrage_spread_late = arbitrage_spread_late.to_frame()
arbitrage_spread_late.describe()

In [None]:
# Visualize the arbitrage spread from late in the dataset in a box plot
arbitrage_spread_late.plot(kind='box')

### Step 4: Calculate the Arbitrage Profits

Calculate the potential profits for each date that you selected in the previous section. Your goal is to determine whether arbitrage opportunities still exist in the Bitcoin market. Complete the following steps:

1. For each of the three dates, measure the arbitrage spread between the two exchanges by subtracting the lower-priced exchange from the higher-priced one. Then use a conditional statement to generate the summary statistics for each arbitrage_spread DataFrame, where the spread is greater than zero.

2. For each of the three dates, calculate the spread returns. To do so, divide the instances that have a positive arbitrage spread (that is, a spread greater than zero) by the price of Bitcoin from the exchange you’re buying on (that is, the lower-priced exchange). Review the resulting DataFrame.

3. For each of the three dates, narrow down your trading opportunities even further. To do so, determine the number of times your trades with positive returns exceed the 1% minimum threshold that you need to cover your costs.

4. Generate the summary statistics of your spread returns that are greater than 1%. How do the average returns compare among the three dates?

5. For each of the three dates, calculate the potential profit, in dollars, per trade. To do so, multiply the spread returns that were greater than 1% by the cost of what was purchased. Make sure to drop any missing values from the resulting DataFrame.

6. Generate the summary statistics, and plot the results for each of the three DataFrames.

7. Calculate the potential arbitrage profits that you can make on each day. To do so, sum the elements in the profit_per_trade DataFrame.

8. Using the `cumsum` function, plot the cumulative sum of each of the three DataFrames. Can you identify any patterns or trends in the profits across the three time periods?

(NOTE: The starter code displays only one date. You'll want to do this analysis for two additional dates).
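
The eight steps above can be sketched end to end on a handful of invented prices. This is a minimal illustration under stated assumptions, not the assignment's solution: the `buy` and `sell` Series are hypothetical stand-ins for the closing prices on the lower-priced and higher-priced exchanges, and every trade is assumed to be for one bitcoin.

```python
import pandas as pd

idx = pd.date_range("2018-01-10 00:00", periods=4, freq="min")

# Hypothetical closing prices: buy on the cheaper exchange, sell on the dearer one
buy = pd.Series([10000.0, 10000.0, 10000.0, 10000.0], index=idx)
sell = pd.Series([10050.0, 10200.0, 9990.0, 10150.0], index=idx)

# Step 1: arbitrage spread, keeping only the positive instances
spread = sell - buy
positive_spread = spread[spread > 0]

# Step 2: spread return relative to the purchase price
spread_return = positive_spread / buy.loc[positive_spread.index]

# Steps 3 and 4: keep the trades that clear the 1% cost threshold
profitable = spread_return[spread_return > 0.01]
print("Profitable trades:", len(profitable))

# Step 5: profit in dollars per one-bitcoin trade
profit_per_trade = (profitable * buy.loc[profitable.index]).dropna()

# Steps 7 and 8: total and cumulative profit for the day
total_profit = profit_per_trade.sum()
cumulative_profit = profit_per_trade.cumsum()
print(round(total_profit, 2))
```

In this toy data, the 0.5% return and the negative spread are filtered out, which leaves two trades that together earn roughly 350 dollars.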

#### 1. For each of the three dates, measure the arbitrage spread between the two exchanges by subtracting the lower-priced exchange from the higher-priced one. Then use a conditional statement to generate the summary statistics for each arbitrage_spread DataFrame, where the spread is greater than zero.

*NOTE*: For illustration, only one of the three dates is shown in the starter code below.

In [None]:
time_index = pd.to_datetime(bitstamp.index)
bitstamp = bitstamp.set_index(time_index)

In [None]:
# For each of the three dates, measure the arbitrage spread between the two exchanges
# by subtracting the lower-priced exchange (bitstamp) from the higher-priced one (coinbase)
arbitrage_spread_early = combined_df.loc['2018-01-10', 'Close_Coinbase'] - combined_df.loc['2018-01-10', 'Close_Bitstamp']
arbitrage_spread_middle = combined_df.loc['2018-02-10', 'Close_Coinbase'] - combined_df.loc['2018-02-10', 'Close_Bitstamp']
arbitrage_spread_late = combined_df.loc['2018-03-10', 'Close_Coinbase'] - combined_df.loc['2018-03-10', 'Close_Bitstamp']



# Use a conditional statement to generate the summary statistics for each arbitrage_spread DataFrame
arbitrage_spread_early[arbitrage_spread_early > 0].to_frame().describe()

In [None]:
arbitrage_spread_middle[arbitrage_spread_middle > 0].to_frame().describe()

In [None]:
arbitrage_spread_late[arbitrage_spread_late > 0].to_frame().describe()

#### 2. For each of the three dates, calculate the spread returns. To do so, divide the instances that have a positive arbitrage spread (that is, a spread greater than zero) by the price of Bitcoin from the exchange you’re buying on (that is, the lower-priced exchange). Review the resulting DataFrame.

In [None]:
# For the date early in the dataset, calculate the spread returns by dividing the instances when the arbitrage spread is positive (> 0) 
# by the price of Bitcoin from the exchange you are buying on (the lower-priced exchange).
spread_return_early = (arbitrage_spread_early[arbitrage_spread_early > 0] / combined_df.loc['2018-01-10', 'Close_Bitstamp']).dropna()

# Review the spread return DataFrame
spread_return_early


In [None]:
# Repeat the calculation for the date in the middle of the dataset
spread_return_middle = (arbitrage_spread_middle[arbitrage_spread_middle > 0] / combined_df.loc['2018-02-10', 'Close_Bitstamp']).dropna()
spread_return_middle

In [None]:
# Repeat the calculation for the date late in the dataset
spread_return_late = (arbitrage_spread_late[arbitrage_spread_late > 0] / combined_df.loc['2018-03-10', 'Close_Bitstamp']).dropna()
spread_return_late

#### 3. For each of the three dates, narrow down your trading opportunities even further. To do so, determine the number of times your trades with positive returns exceed the 1% minimum threshold that you need to cover your costs.

In [None]:
# For the date early in the dataset, determine the number of times your trades with positive returns 
# exceed the 1% minimum threshold (.01) that you need to cover your costs
profitable_trades_early = spread_return_early[spread_return_early > 0.01]
profitable_trades_middle = spread_return_middle[spread_return_middle > 0.01]
profitable_trades_late = spread_return_late[spread_return_late > 0.01]


# Review the first five profitable trades
profitable_trades_early.head(5)

#### 4. Generate the summary statistics of your spread returns that are greater than 1%. How do the average returns compare among the three dates?

In [None]:
# For the date early in the dataset, generate the summary statistics for the profitable trades
# (the trades where the spread returns are greater than 1%)
profitable_trades_early.to_frame().describe()

In [None]:
profitable_trades_middle.to_frame().describe()

In [None]:
profitable_trades_late.to_frame().describe()

#### 5. For each of the three dates, calculate the potential profit, in dollars, per trade. To do so, multiply the spread returns that were greater than 1% by the cost of what was purchased. Make sure to drop any missing values from the resulting DataFrame.

In [None]:
# For the date early in the dataset, calculate the potential profit per trade in dollars 
# Multiply the profitable trades by the cost of the Bitcoin that was purchased
profit_early = profitable_trades_early * combined_df.loc['2018-01-10', 'Close_Bitstamp']

# Drop any missing values from the profit DataFrame
profit_per_trade_early = profit_early.dropna()

# View the early profit DataFrame
profit_per_trade_early

In [None]:
profit_middle = profitable_trades_middle * combined_df.loc['2018-02-10', 'Close_Bitstamp']
profit_per_trade_middle = profit_middle.dropna()
profit_per_trade_middle

In [None]:
profit_late = profitable_trades_late * combined_df.loc['2018-03-10', 'Close_Bitstamp']
profit_per_trade_late = profit_late.dropna()
profit_per_trade_late

#### 6. Generate the summary statistics, and plot the results for each of the three DataFrames.

In [None]:
# Generate the summary statistics for the early profit per trade DataFrame
profit_per_trade_early.to_frame().describe()

In [None]:
# Plot the results for the early profit per trade DataFrame
profit_per_trade_early.plot(figsize=(10,8))

In [None]:
profit_per_trade_middle.to_frame().describe()

In [None]:
profit_per_trade_middle.plot(figsize=(10,8))

In [None]:
profit_per_trade_late.to_frame().describe()

In [None]:
profit_per_trade_late.plot(figsize=(10,8))

#### 7. Calculate the potential arbitrage profits that you can make on each day. To do so, sum the elements in the profit_per_trade DataFrame.

In [None]:
# Calculate the sum of the potential profits for the early profit per trade DataFrame
profit_per_trade_early.sum()

In [None]:
profit_per_trade_middle.sum()

In [None]:
profit_per_trade_late.sum()

#### 8. Using the `cumsum` function, plot the cumulative sum of each of the three DataFrames. Can you identify any patterns or trends in the profits across the three time periods?

In [None]:
# Use the cumsum function to calculate the cumulative profits over time for the early profit per trade DataFrame
cumulative_profit_early = profit_per_trade_early.cumsum()

In [None]:
# Plot the cumulative sum of profits for the early profit per trade DataFrame
cumulative_profit_early.plot()

In [None]:
cumulative_profit_middle = profit_per_trade_middle.cumsum()

In [None]:
cumulative_profit_middle.plot()

In [None]:
cumulative_profit_late = profit_per_trade_late.cumsum()

In [None]:
cumulative_profit_late.plot()

**Question:** After reviewing the profit information across each date from the different time periods, can you identify any patterns or trends?
    
**Answer:** From January to March, the potential profit from Bitcoin arbitrage fell by more than half. Of the three dates analyzed, the January date offered the best trading opportunities.