# Import Required Data and Libraries

In [1]:
import pandas as pd
df = pd.read_csv(
    filepath_or_buffer = '../Data/DOGE-USD.csv',
    index_col = 0,
    parse_dates = [0]
)
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2014-09-17,0.000293,0.000299,0.00026,0.000268,0.000268,1463600.0
2014-09-18,0.000268,0.000325,0.000267,0.000298,0.000298,2215910.0
2014-09-19,0.000298,0.000307,0.000275,0.000277,0.000277,883563.0
2014-09-20,0.000276,0.00031,0.000267,0.000292,0.000292,993004.0
2014-09-21,0.000293,0.000299,0.000284,0.000288,0.000288,539140.0


In [2]:
df.shape

(2501, 6)

We can see that we have 2501 data points with 6 features each. The `Date` column serves as the index, so it is not counted among the features.
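
The second number in `df.shape` counts columns only, and a column read with `index_col = 0` becomes the index instead. The following is a minimal sketch of that behaviour on a hypothetical two-row sample laid out like the source file; the name `sample` and the in-memory buffer are assumptions made for illustration, not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same layout as DOGE-USD.csv.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2014-09-17,0.000293,0.000299,0.00026,0.000268,0.000268,1463600.0\n"
    "2014-09-18,0.000268,0.000325,0.000267,0.000298,0.000298,2215910.0\n"
)

# Same arguments as the notebook's read_csv call: column 0 becomes a parsed date index.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=[0])

n_rows, n_cols = sample.shape  # the index is not counted as a column
print(n_rows, n_cols)
print(sample.index.name, list(sample.columns))
```

The CSV header has seven fields, but only six of them end up as columns.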

# Null values

In [3]:
df.apply(lambda l: l.isna().sum(), axis = 1).value_counts()

0    2497
6       4
dtype: int64

In the cell above, we count the NaN values in each data point (row).  
Grouping these counts shows that 4 data points have NaNs in all 6 features. These rows carry no information, so we can drop them.  
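
The row-wise count can be tried on a small frame to see what each step returns. This is a sketch on hypothetical toy data (the frame `toy` and its values are made up for illustration); it uses the vectorised `isna().sum(axis=1)` form, which should give the same counts as the `apply` call above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 0 is complete, row 1 is fully empty, row 2 is partial.
toy = pd.DataFrame(
    {
        "Open": [1.0, np.nan, 3.0],
        "Close": [1.5, np.nan, np.nan],
    }
)

# Number of missing values in each row.
nan_per_row = toy.isna().sum(axis=1)

# How many rows have 0, 1, 2, ... missing values.
nan_distribution = nan_per_row.value_counts()

# Rows where every column is missing carry no information at all.
fully_empty = toy.isna().all(axis=1)

print(nan_per_row.tolist())
print(fully_empty.tolist())
```

A row counted with as many NaNs as there are columns is exactly a row flagged by `fully_empty`.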

In [4]:
df.dropna(inplace = True)
df.shape

(2497, 6)

That leaves us with 2497 data points.  

# Adj Close

In [5]:
(df.Close != df['Adj Close']).sum()

0

Since Adj Close and Close hold identical values in every row, we can drop Adj Close from our feature list.
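
One caveat about the comparison: element-wise `!=` treats NaN as different from NaN, so the count is only meaningful here because the NaN rows were dropped first. The sketch below shows the difference on hypothetical toy data (the frame `toy` is an assumption for illustration), alongside `Series.equals`, which treats NaNs in the same position as equal.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: both columns are identical, including one NaN row.
toy = pd.DataFrame(
    {
        "Close": [0.1, np.nan, 0.3],
        "Adj Close": [0.1, np.nan, 0.3],
    }
)

# Element-wise != reports the NaN row as a mismatch.
mismatches_elementwise = int((toy["Close"] != toy["Adj Close"]).sum())

# Series.equals considers NaNs in the same location equal.
identical = toy["Close"].equals(toy["Adj Close"])

# After dropping NaN rows, the element-wise count agrees with equals.
mismatches_after_dropna = int(
    (toy.dropna()["Close"] != toy.dropna()["Adj Close"]).sum()
)

print(mismatches_elementwise, identical, mismatches_after_dropna)
```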

In [6]:
df.drop(
    columns = 'Adj Close',
    inplace = True
)
df.head()

Date,Open,High,Low,Close,Volume
2014-09-17,0.000293,0.000299,0.00026,0.000268,1463600.0
2014-09-18,0.000268,0.000325,0.000267,0.000298,2215910.0
2014-09-19,0.000298,0.000307,0.000275,0.000277,883563.0
2014-09-20,0.000276,0.00031,0.000267,0.000292,993004.0
2014-09-21,0.000293,0.000299,0.000284,0.000288,539140.0


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2497 entries, 2014-09-17 to 2021-07-22
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Open    2497 non-null   float64
 1   High    2497 non-null   float64
 2   Low     2497 non-null   float64
 3   Close   2497 non-null   float64
 4   Volume  2497 non-null   float64
dtypes: float64(5)
memory usage: 117.0 KB


All remaining features have a suitable numeric datatype (`float64`) and the index is a `DatetimeIndex`, so no type conversion is needed before we export the cleaned dataset.

# Volume

In [8]:
df.drop(
    columns = 'Volume',
    inplace = True
)
df.head()

Date,Open,High,Low,Close
2014-09-17,0.000293,0.000299,0.00026,0.000268
2014-09-18,0.000268,0.000325,0.000267,0.000298
2014-09-19,0.000298,0.000307,0.000275,0.000277
2014-09-20,0.000276,0.00031,0.000267,0.000292
2014-09-21,0.000293,0.000299,0.000284,0.000288



OHLC analysis is a strategy for building a forecasting framework for open market data from the Open, High, Low and Close prices. The 'Volume' feature records the amount traded in a single day and is not applicable to OHLC analysis, so we drop it.

# Clean Dataset

In [9]:
df.to_csv(
    path_or_buf = '../Data/Clean_DogeCoin.csv', 
    index = True
)

We write the cleaned dataset back to the Data folder so that it can be imported and used for the ML model.
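
When the file is read back later, the `Date` index has to be restored with the same `index_col` and `parse_dates` arguments used at the start. The following is a minimal round-trip sketch, assuming a hypothetical two-row frame `clean` and an in-memory buffer in place of the real file path.

```python
import io
import pandas as pd

# Hypothetical stand-in for the cleaned frame: date index plus price columns.
clean = pd.DataFrame(
    {"Open": [0.000293, 0.000268], "Close": [0.000268, 0.000298]},
    index=pd.to_datetime(["2014-09-17", "2014-09-18"]).rename("Date"),
)

# Write to an in-memory buffer instead of '../Data/Clean_DogeCoin.csv'.
buffer = io.StringIO()
clean.to_csv(buffer, index=True)
buffer.seek(0)

# Reload with the same arguments as the original import to recover the date index.
reloaded = pd.read_csv(buffer, index_col=0, parse_dates=[0])

print(reloaded.shape)
print(type(reloaded.index).__name__)
```

Without `index_col = 0`, `Date` would come back as an ordinary column and the frame would gain one feature.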