# 1.1 Create Labels

This notebook is the first step of the complete workflow: it creates classification labels for the Bitcoin historical prices dataset.

At the end of this step, the exported dataset contains the following columns:
- Date
- Open
- High
- Low
- Close
- Volume
- Market Cap
- is_profitable (PROFITABLE or UNPROFITABLE)
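Written as a formula, the labelling rule looks like this. Here $C_t$ is the close on day $t$, $\theta$ is the profitability threshold (`profitable_threshold`, 1.02 in this notebook) and $y_t$ is the value stored in `is_profitable`. These symbols are notation introduced for this summary:

$$
y_t =
\begin{cases}
\text{PROFITABLE} & \text{if } C_{t+1} \ge \theta \, C_t \\
\text{UNPROFITABLE} & \text{otherwise}
\end{cases}
$$

With $\theta = 1.02$, a day that closes at 100 is labelled PROFITABLE only if the next day closes at $1.02 \times 100 = 102$ or higher.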


In [1]:
# !pip install pandas

In [2]:
import pandas as pd

In [3]:
def is_profitable(row, threshold):
    # Label a day PROFITABLE if the next day's close is at least `threshold` times today's close
    if row['next_close'] >= row['Close'] * threshold:
        row['is_profitable'] = 'PROFITABLE'
    else:
        row['is_profitable'] = 'UNPROFITABLE'
    return row
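To see what the function does in isolation, here is a minimal sketch that applies it to a hypothetical three-day toy frame. The prices are made up for illustration and are not taken from `bitcoin.csv`; `toy` is a hypothetical name chosen so that the notebook's `df` is not overwritten.

```python
import pandas as pd

def is_profitable(row, threshold):
    # Same rule as the notebook cell: compare the next close with Close * threshold
    if row['next_close'] >= row['Close'] * threshold:
        row['is_profitable'] = 'PROFITABLE'
    else:
        row['is_profitable'] = 'UNPROFITABLE'
    return row

# Hypothetical closing prices for three consecutive days
toy = pd.DataFrame({'Close': [100.0, 103.0, 104.0]})
toy['next_close'] = toy['Close'].shift(-1)
toy = toy.apply(is_profitable, axis=1, threshold=1.02)
print(toy)
```

Day 1 is PROFITABLE (103 ≥ 102) and day 2 is UNPROFITABLE (104 < 105.06). The last row is worth noting: it has no next-day close, so the comparison against `NaN` evaluates to `False` and the row falls into UNPROFITABLE by default, even though its outcome is unknown.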

In [4]:
profitable_threshold = 1.02  # next close must be at least 102% of the current close (a gain of 2% or more)

## Import dataset

In [5]:
df = pd.read_csv('../original-datasets/bitcoin.csv')

## Process data for classification tasks

### Prepare historical prices dataset

In [6]:
df['Date'] = pd.to_datetime(df['Date'])
df.sort_values(by='Date', inplace=True)
df = df.reset_index(drop=True)
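Sorting by date matters because the labelling step uses `shift(-1)`, which pairs each row with whatever row physically follows it. The minimal sketch below uses hypothetical, deliberately newest-first toy data to show the difference. The names `raw` and `prepared` are illustrative and are used so that the notebook's `df` is not touched.

```python
import pandas as pd

# Hypothetical rows stored newest-first
raw = pd.DataFrame({
    'Date': ['2021-01-03', '2021-01-02', '2021-01-01'],
    'Close': [30.0, 20.0, 10.0],
})

# Without sorting, shift(-1) pairs each row with the *previous* day's close
unsorted_next = raw['Close'].shift(-1).tolist()

# Same preparation as the cell above: parse dates, sort, reset the index
prepared = raw.copy()
prepared['Date'] = pd.to_datetime(prepared['Date'])
prepared = prepared.sort_values(by='Date').reset_index(drop=True)
sorted_next = prepared['Close'].shift(-1).tolist()

print(unsorted_next)
print(sorted_next)
```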

### Generate labels for each sample

In [7]:
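# Pair each day with the following day's close (relies on the date sort above)
df['next_close'] = df['Close'].shift(-1)
# The most recent day has no next close; drop it instead of defaulting it to UNPROFITABLE
df = df.dropna(subset=['next_close']).reset_index(drop=True)
df = df.apply(is_profitable, axis=1, threshold=profitable_threshold)
df = df.drop(columns='next_close')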
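Row-wise `apply` calls a Python function once per row. As an alternative that this notebook does not use, the same rule can be expressed as one vectorized comparison with `numpy.where`. The sketch below defines a hypothetical helper, `label_profitable`. It assumes the frame is already sorted by date, and it drops the final row because that row has no next-day close. Both of these are choices made for this sketch.

```python
import numpy as np
import pandas as pd

def label_profitable(frame: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Hypothetical vectorized labeller; assumes `frame` is sorted by Date."""
    out = frame.copy()
    next_close = out['Close'].shift(-1)
    # Element-wise version of: next_close >= Close * threshold
    out['is_profitable'] = np.where(
        next_close >= out['Close'] * threshold, 'PROFITABLE', 'UNPROFITABLE'
    )
    # The last row has no next-day close and cannot be labelled, so drop it
    return out[next_close.notna()].reset_index(drop=True)

# Hypothetical toy prices, not taken from bitcoin.csv
toy = pd.DataFrame({'Close': [100.0, 103.0, 104.0]})
labelled = label_profitable(toy, threshold=1.02)
print(labelled)
```

Because the comparison runs on whole columns, this version avoids one Python call per row, which can matter for long price histories.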

## Export dataset to CSV

In [8]:
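# index=False keeps the default RangeIndex out of the file, so only the columns listed above are written
df.to_csv('../generated-datasets/classification-dataset.csv', index=False)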
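By default, `to_csv` also writes the DataFrame index as an unnamed first column. When the file is read back, that column appears as `Unnamed: 0`. The small in-memory round trip below illustrates what `index=False` changes. It writes to strings rather than to the real output path, and `frame` is a hypothetical two-row example.

```python
import io

import pandas as pd

# Hypothetical two-row labelled frame
frame = pd.DataFrame({
    'Close': [100.0, 103.0],
    'is_profitable': ['PROFITABLE', 'UNPROFITABLE'],
})

# to_csv() with no path returns the CSV text as a string
cols_with_index = pd.read_csv(io.StringIO(frame.to_csv())).columns.tolist()
cols_without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False))).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```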