<a href="https://colab.research.google.com/github/filipchudzynski/stock-market-non-gaussianity-analyzer_v2/blob/main/basic_analysis_of_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# get the data from www.histdata.com

In [1]:
! pip install selenium pandas openpyxl webdriver-manager

Collecting selenium
  Downloading selenium-4.40.0-py3-none-any.whl.metadata (7.7 kB)
Collecting webdriver-manager
  Downloading webdriver_manager-4.0.2-py2.py3-none-any.whl.metadata (12 kB)
Collecting trio<1.0,>=0.31.0 (from selenium)
  Downloading trio-0.32.0-py3-none-any.whl.metadata (8.5 kB)
Collecting trio-websocket<1.0,>=0.12.2 (from selenium)
  Downloading trio_websocket-0.12.2-py3-none-any.whl.metadata (5.1 kB)
Collecting trio-typing>=0.10.0 (from selenium)
  Downloading trio_typing-0.10.0-py3-none-any.whl.metadata (10 kB)
Collecting types-certifi>=2021.10.8.3 (from selenium)
  Downloading types_certifi-2021.10.8.3-py3-none-any.whl.metadata (1.4 kB)
Collecting types-urllib3>=1.26.25.14 (from selenium)
  Downloading types_urllib3-1.26.25.14-py3-none-any.whl.metadata (1.7 kB)
Collecting urllib3<3.0,>=2.6.3 (from urllib3[socks]<3.0,>=2.6.3->selenium)
  Downloading urllib3-2.6.3-py3-none-any.whl.metadata (6.9 kB)
Collecting sortedcontainers (from trio<1.0,>=0.31.0->selenium)
  Downl

In [2]:
!apt-get update
!apt-get install -y wget unzip
!wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
!apt-get install -y ./google-chrome-stable_current_amd64.deb


0% [Working]            Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
0% [Waiting for headers] [Connecting to security.ubuntu.com (91.189.92.23)] [Co                                                                               Get:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:3 https://cli.github.com/packages stable InRelease [3,917 B]
Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:8 https://cli.github.com/packages stable/main amd64 Packages [356 B]
Get:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]
Get:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease [24.6 kB]
Get:11 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packa

In [3]:
# run in terminal !CHROME_VERSION=$(google-chrome --version | grep -oP '\d+\.\d+\.\d+')


!wget -q "https://storage.googleapis.com/chrome-for-testing-public/144.0.7559.132/linux64/chromedriver-linux64.zip" -O chromedriver.zip
!unzip chromedriver.zip
!mv chromedriver-linux64/chromedriver /usr/local/bin/
!chmod +x /usr/local/bin/chromedriver


Archive:  chromedriver.zip
  inflating: chromedriver-linux64/LICENSE.chromedriver  
  inflating: chromedriver-linux64/THIRD_PARTY_NOTICES.chromedriver  
  inflating: chromedriver-linux64/chromedriver  


In [4]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from pathlib import Path

def get_driver(download_dir="/content/downloads"):
    Path(download_dir).mkdir(exist_ok=True)

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")

    prefs = {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }
    options.add_experimental_option("prefs", prefs)

    return webdriver.Chrome(options=options)



In [5]:
import time
import zipfile
from pathlib import Path
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

DOWNLOAD_DIR = "/content/downloads"
BASE_URL = "https://www.histdata.com/download-free-forex-historical-data/?/excel/1-minute-bar-quotes/spxusd/{year}"

driver = get_driver(DOWNLOAD_DIR)  # reuse the headless driver you already defined

def download_year(year):
    print(f"Processing {year}...")
    driver.get(BASE_URL.format(year=year))


    time.sleep(2)

    # Debug: show what page we actually loaded
    print("Current URL:", driver.current_url)
    print("Page title:", driver.title)


    # 1. Close cookie banner if present
    try:
        accept_btn = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.ID, "cookie_action_close_header"))
        )
        accept_btn.click()
        time.sleep(1)
        print("Cookie banner dismissed")
    except:
        print("No cookie banner found")

    # Extract the visible year from the page
    try:
        year_text = driver.find_element(By.XPATH, "//p[b[contains(text(),'Year/Month')]]").text
        print("Page shows:", year_text)
    except:
        print("Could not read Year/Month from page")

    # Extract the download filename text
    try:
        file_label = driver.find_element(By.ID, "a_file").text
        print("Download link text:", file_label)
    except:
        print("Could not read download link text")

    # Now proceed with clicking
    try:
        link = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, "a_file"))
        )
    except Exception as e:
        print(f"Could not find download link for {year}: {e}")
        return None

    # 3. Click the download link
    link.click()
    print("Clicked download link")

    # 4. Wait for ZIP to appear
    zip_path = Path(DOWNLOAD_DIR) / f"{year}.zip"
    for _ in range(40):
        for f in Path(DOWNLOAD_DIR).glob(f"*{year}.zip"):
            print(f"{f} found")
            f.rename(zip_path)
            print("Downloaded ZIP →", zip_path)
            return zip_path
        time.sleep(1)

    print("Failed to download", year)
    return None


def extract_and_convert(zip_path):
    year = zip_path.stem
    with zipfile.ZipFile(zip_path, "r") as z:
        xlsx_name = next(n for n in z.namelist() if n.lower().endswith(".xlsx"))
        xlsx_path = Path(DOWNLOAD_DIR) / f"{year}.xlsx"
        z.extract(xlsx_name, DOWNLOAD_DIR)
        (Path(DOWNLOAD_DIR) / xlsx_name).rename(xlsx_path)
        print("Extracted XLSX →", xlsx_path)

    df = pd.read_excel(xlsx_path)
    display(df.head())
    csv_path = Path(DOWNLOAD_DIR) / f"{year}.csv"
    df.to_csv(csv_path, index=False)
    print("Converted to CSV →", csv_path)

# run it
for year in range(2020, 2025):
    zip_file = download_year(year)
    if zip_file:
        extract_and_convert(zip_file)

driver.quit()


Processing 2020...
Current URL: https://www.histdata.com/download-free-forex-historical-data/?/excel/1-minute-bar-quotes/spxusd/2020
Page title: Download Free Forex Historical Data – HistData.com
Cookie banner dismissed
Page shows: Year/Month: 2020
 
Download link text: HISTDATA_COM_XLSX_SPXUSD_M1_2020.zip
Clicked download link
/content/downloads/HISTDATA_COM_XLSX_SPXUSD_M12020.zip found
Downloaded ZIP → /content/downloads/2020.zip
Extracted XLSX → /content/downloads/2020.xlsx


  warn("Workbook contains no default style, apply openpyxl's default")


Unnamed: 0,2020-01-01 18:00:00,3234.552,3235.537,3234.534,3234.837,0
0,2020-01-01 18:01:00,3234.837,3234.837,3233.337,3233.337,0
1,2020-01-01 18:02:00,3233.337,3233.852,3233.334,3233.74,0
2,2020-01-01 18:03:00,3233.74,3234.6,3233.552,3234.346,0
3,2020-01-01 18:04:00,3234.346,3234.352,3233.852,3233.855,0
4,2020-01-01 18:05:00,3233.855,3234.1,3233.837,3233.9,0


Converted to CSV → /content/downloads/2020.csv
Processing 2021...
Current URL: https://www.histdata.com/download-free-forex-historical-data/?/excel/1-minute-bar-quotes/spxusd/2021
Page title: Download Free Forex Historical Data – HistData.com
No cookie banner found
Page shows: Year/Month: 2021
 
Download link text: HISTDATA_COM_XLSX_SPXUSD_M1_2021.zip
Clicked download link
/content/downloads/HISTDATA_COM_XLSX_SPXUSD_M12021.zip found
Downloaded ZIP → /content/downloads/2021.zip
Extracted XLSX → /content/downloads/2021.xlsx


  warn("Workbook contains no default style, apply openpyxl's default")


Unnamed: 0,2021-01-03 18:00:00,3758.942,3762.397,3756.854,3757.954,0
0,2021-01-03 18:01:00,3758.137,3759.436,3756.231,3756.254,0
1,2021-01-03 18:02:00,3756.231,3756.231,3749.936,3752.242,0
2,2021-01-03 18:03:00,3752.446,3754.754,3751.139,3752.636,0
3,2021-01-03 18:04:00,3752.631,3754.297,3752.434,3753.433,0
4,2021-01-03 18:05:00,3753.449,3754.242,3752.942,3753.436,0


Converted to CSV → /content/downloads/2021.csv
Processing 2022...
Current URL: https://www.histdata.com/download-free-forex-historical-data/?/excel/1-minute-bar-quotes/spxusd/2022
Page title: Download Free Forex Historical Data – HistData.com
No cookie banner found
Page shows: Year/Month: 2022
 
Download link text: HISTDATA_COM_XLSX_SPXUSD_M1_2022.zip
Clicked download link
/content/downloads/HISTDATA_COM_XLSX_SPXUSD_M12022.zip found
Downloaded ZIP → /content/downloads/2022.zip
Extracted XLSX → /content/downloads/2022.xlsx


  warn("Workbook contains no default style, apply openpyxl's default")


Unnamed: 0,2022-01-02 18:00:00,4779.636,4785.539,4779.636.1,4781.251,0
0,2022-01-02 18:01:00,4781.136,4784.099,4780.133,4782.736,0
1,2022-01-02 18:02:00,4782.836,4784.242,4781.342,4781.348,0
2,2022-01-02 18:03:00,4781.648,4782.151,4780.142,4780.242,0
3,2022-01-02 18:04:00,4780.145,4781.699,4779.836,4781.633,0
4,2022-01-02 18:05:00,4781.233,4782.699,4781.133,4781.854,0


Converted to CSV → /content/downloads/2022.csv
Processing 2023...
Current URL: https://www.histdata.com/download-free-forex-historical-data/?/excel/1-minute-bar-quotes/spxusd/2023
Page title: Download Free Forex Historical Data – HistData.com
No cookie banner found
Page shows: Year/Month: 2023
 
Download link text: HISTDATA_COM_XLSX_SPXUSD_M1_2023.zip
Clicked download link
/content/downloads/HISTDATA_COM_XLSX_SPXUSD_M12023.zip found
Downloaded ZIP → /content/downloads/2023.zip
Extracted XLSX → /content/downloads/2023.xlsx


  warn("Workbook contains no default style, apply openpyxl's default")


Unnamed: 0,2023-01-02 18:00:00,3872.998,3877.176,3863.86,3865.983,0
0,2023-01-02 18:01:00,3866.128,3867.372,3865.378,3865.98,0
1,2023-01-02 18:02:00,3865.878,3866.878,3865.36,3865.881,0
2,2023-01-02 18:03:00,3865.742,3865.742,3862.86,3863.613,0
3,2023-01-02 18:04:00,3863.363,3863.363,3860.742,3860.878,0
4,2023-01-02 18:05:00,3861.113,3864.378,3860.875,3864.119,0


Converted to CSV → /content/downloads/2023.csv
Processing 2024...
Current URL: https://www.histdata.com/download-free-forex-historical-data/?/excel/1-minute-bar-quotes/spxusd/2024
Page title: Download Free Forex Historical Data – HistData.com
No cookie banner found
Page shows: Year/Month: 2024
 
Download link text: HISTDATA_COM_XLSX_SPXUSD_M1_2024.zip
Clicked download link
Failed to download 2024


# data quality

In [6]:
import pandas as pd
from pathlib import Path

DATA_DIR = Path("/content/downloads")

csv_files = sorted(DATA_DIR.glob("*.csv"))
dfs = []
print(csv_files)
for f in csv_files:
    df = pd.read_csv(f, header=None)
    df.columns = [ "timestamp", "open", "high", "low", "close","vol" ]
    dfs.append(df)

# data = pd.concat(dfs, ignore_index=True)
# print("Loaded rows:", len(data))
for df in dfs:
  display(df.head())


[PosixPath('/content/downloads/2020.csv'), PosixPath('/content/downloads/2021.csv'), PosixPath('/content/downloads/2022.csv'), PosixPath('/content/downloads/2023.csv')]


  df = pd.read_csv(f, header=None)


Unnamed: 0,timestamp,open,high,low,close,vol
0,2020-01-01 18:00:00,3234.552,3235.537,3234.534,3234.837,0
1,2020-01-01 18:01:00,3234.837,3234.837,3233.337,3233.337,0
2,2020-01-01 18:02:00,3233.337,3233.852,3233.334,3233.74,0
3,2020-01-01 18:03:00,3233.74,3234.6,3233.552,3234.346,0
4,2020-01-01 18:04:00,3234.346,3234.352,3233.852,3233.855,0


Unnamed: 0,timestamp,open,high,low,close,vol
0,2021-01-03 18:00:00,3758.942,3762.397,3756.854,3757.954,0
1,2021-01-03 18:01:00,3758.137,3759.436,3756.231,3756.254,0
2,2021-01-03 18:02:00,3756.231,3756.231,3749.936,3752.242,0
3,2021-01-03 18:03:00,3752.446,3754.754,3751.139,3752.636,0
4,2021-01-03 18:04:00,3752.631,3754.297,3752.434,3753.433,0


Unnamed: 0,timestamp,open,high,low,close,vol
0,2022-01-02 18:00:00,4779.636,4785.539,4779.636.1,4781.251,0
1,2022-01-02 18:01:00,4781.136,4784.099,4780.133,4782.736,0
2,2022-01-02 18:02:00,4782.836,4784.242,4781.342,4781.348,0
3,2022-01-02 18:03:00,4781.648,4782.151,4780.142,4780.242,0
4,2022-01-02 18:04:00,4780.145,4781.699,4779.836,4781.633,0


Unnamed: 0,timestamp,open,high,low,close,vol
0,2023-01-02 18:00:00,3872.998,3877.176,3863.86,3865.983,0
1,2023-01-02 18:01:00,3866.128,3867.372,3865.378,3865.98,0
2,2023-01-02 18:02:00,3865.878,3866.878,3865.36,3865.881,0
3,2023-01-02 18:03:00,3865.742,3865.742,3862.86,3863.613,0
4,2023-01-02 18:04:00,3863.363,3863.363,3860.742,3860.878,0


In [26]:
test_df = dfs[0]
test_row = { "timestamp": "2020-01-01 00:00:00", "open":np.nan , "high": 101.0, "low": 99.5, "close": 100.5, "vol": 123 }
test_df = pd.concat([df, pd.DataFrame([test_row])], ignore_index=True)
print("Display column, which has nan")
display(test_df.isna().any(axis=0))
print("Display row which has nan")
test_df[test_df.isna().any(axis=1)]

Display column, which has nan


Unnamed: 0,0
timestamp,False
open,True
high,False
low,False
close,False
vol,False


Display row which has nan


Unnamed: 0,timestamp,open,high,low,close,vol
291440,2020-01-01 00:00:00,,101.0,99.5,100.5,123


## nans

In [36]:
for df in dfs:
  for val_to_check in [0,np.nan]:
    print(f"checking {val_to_check}")
    faulty_rows = df.eq(val_to_check).any()
    if faulty_rows.any():
      display(faulty_rows)
      print(f"{len(df[df.eq(val_to_check).any(axis=1)])} out of {len(df)} are {val_to_check}")
    else:
      print(f"no {val_to_check} found")



checking 0


Unnamed: 0,0
timestamp,False
open,False
high,False
low,False
close,False
vol,True


334261 out of 334261 are 0
checking nan
no nan found
checking 0


Unnamed: 0,0
timestamp,False
open,False
high,False
low,False
close,False
vol,True


333588 out of 333588 are 0
checking nan
no nan found
checking 0


Unnamed: 0,0
timestamp,False
open,False
high,False
low,False
close,False
vol,True


341671 out of 341671 are 0
checking nan
no nan found
checking 0


Unnamed: 0,0
timestamp,False
open,False
high,False
low,False
close,False
vol,True


291440 out of 291440 are 0
checking nan
no nan found


## zeros

In [47]:

def show_duplicate_timestamp_groups(df, max_groups=5):
    if "timestamp" not in df.columns:
        print("No timestamp column found.")
        return

    # Ensure proper datetime parsing
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])

    # Count occurrences
    counts = df["timestamp"].value_counts()

    # Keep only timestamps that appear more than once
    dup_ts = counts[counts > 1].index

    if len(dup_ts) == 0:
        print("No duplicate timestamps found.")
        return

    print(f"Found {len(dup_ts)} duplicated timestamps.")
    print(f"Showing up to {max_groups} groups:\n")

    shown = 0
    for ts in dup_ts:
        group = df[df["timestamp"] == ts].sort_values("timestamp")

        print(f"--- Duplicate group for timestamp: {ts} ---")
        display(group)

        shown += 1
        if shown >= max_groups:
            break

def detect_gaps(df, expected_freq="1min"):
    """
    Detect missing timestamps in a 1‑minute OHLCV dataset.
    Reports:
      - number of gaps
      - average gap length
      - longest gap length
      - example gap (first)
      - longest gap (before/after rows)
      - list of all dates where gaps occur (collapsed into ranges)
    """

    # Ensure timestamp is datetime and sorted
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df = df.sort_values("timestamp")

    expected_delta = pd.Timedelta(expected_freq)
    actual_delta = df["timestamp"].diff()

    # Boolean mask of gaps
    gap_mask = actual_delta > expected_delta

    if not gap_mask.any():
        print("- No timestamp gaps detected.")
        return

    # Extract gap sizes
    gap_sizes = actual_delta[gap_mask]
    num_gaps = len(gap_sizes)
    avg_gap = gap_sizes.mean()
    longest_gap = gap_sizes.max()

    print(f"- Number of gaps: {num_gaps}")
    print(f"- Average gap length: {avg_gap}")
    print(f"- Longest gap: {longest_gap}")

    # -----------------------------
    # Example gap (first occurrence)
    # -----------------------------
    first_gap_idx = gap_sizes.index[0]
    before_first = df.loc[first_gap_idx - 1]
    after_first = df.loc[first_gap_idx]

    print("\n  Example gap (first occurrence):")
    print(f"  Gap from {before_first['timestamp']} → {after_first['timestamp']}")
    display(pd.DataFrame([before_first, after_first]))

    # -----------------------------
    # Longest gap
    # -----------------------------
    longest_gap_idx = gap_sizes.idxmax()
    before_longest = df.loc[longest_gap_idx - 1]
    after_longest = df.loc[longest_gap_idx]

    print("\n  Longest gap:")
    print(f"  Gap from {before_longest['timestamp']} → {after_longest['timestamp']}")
    display(pd.DataFrame([before_longest, after_longest]))

    # -----------------------------
    # List all dates where gaps occur
    # -----------------------------
    gap_dates = df.loc[gap_mask, "timestamp"].dt.date

    # Collapse into ranges
    ranges = []
    start = prev = None

    for d in gap_dates:
        if start is None:
            start = prev = d
        elif (d - prev).days == 1:
            prev = d
        else:
            ranges.append((start, prev))
            start = prev = d

    if start is not None:
        ranges.append((start, prev))

    print("\n  Dates with gaps:")
    to_print = []
    for s, e in ranges:
        if s == e:
            to_print.append(f"  • {s}")
        else:
            to_print.append(f"  • {s} – {e}")
    print(to_print)



def column_issue_stats(df, mask, issue_name, max_examples=5):
    """
    df: DataFrame
    mask: boolean DataFrame (same shape as df) where True marks the issue
    issue_name: string label ('NaN', 'zero', 'negative', etc.)
    """
    total_rows = len(df)
    cols_with_issue = {}

    for col in df.columns:
        col_mask = mask[col]
        count = col_mask.sum()
        if count > 0:
            cols_with_issue[col] = {
                "count": int(count),
                "percent_rows": float(count / total_rows * 100.0),
                "examples": df[col_mask].head(max_examples)
            }

    if not cols_with_issue:
        print(f"- No {issue_name} issues found.")
        return

    print(f"- Columns with {issue_name} issues:")
    for col, stats in cols_with_issue.items():
        print(
            f"    • {col}: {stats['count']} rows "
            f"({stats['percent_rows']:.4f}% of all rows)"
        )
        print("      Example rows:")
        display(stats["examples"])


def deep_quality_report(df, name="DataFrame"):
    print(f"\n========== Quality Report for {name} ==========")
    total_rows = len(df)
    print(f"Total rows: {total_rows}")

    # 1. NaNs
    print("\n[NaNs]")
    nan_mask = df.isna()
    column_issue_stats(df, nan_mask, "NaN")

    # 2. Zeros (numeric columns only)
    print("\n[Zeros]")
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    zero_mask = df[numeric_cols].eq(0)
    zero_mask = zero_mask.reindex(columns=df.columns, fill_value=False)
    column_issue_stats(df, zero_mask, "zero")

    # 3. Negative values (numeric columns only)
    print("\n[Negative values]")
    neg_mask = df[numeric_cols].lt(0)
    neg_mask = neg_mask.reindex(columns=df.columns, fill_value=False)
    column_issue_stats(df, neg_mask, "negative")

    # 4. Duplicate timestamps
    print("\n[Duplicate timestamps]")
    if "timestamp" in df.columns:
        dup_mask = df["timestamp"].duplicated(keep=False)
        dup_count = dup_mask.sum()

        if dup_count > 0:
            print(
                f"- {dup_count} duplicated timestamp rows "
                f"({dup_count/total_rows*100:.4f}% of all rows)"
            )
            print("  Example duplicates:")
            show_duplicate_timestamp_groups(df)
        else:
            print("- No duplicate timestamps found.")
    else:
        print("- 'timestamp' column not found.")

    # 5 gaps
    print("\n[Timestamp Gaps]")
    detect_gaps(df)


    print("============================================\n")


for i, df in enumerate(dfs):
  deep_quality_report(df, name=f"df[{i}]")


Total rows: 334261

[NaNs]
- No NaN issues found.

[Zeros]
- Columns with zero issues:
    • vol: 334261 rows (100.0000% of all rows)
      Example rows:


Unnamed: 0,timestamp,open,high,low,close,vol
0,2020-01-01 18:00:00,3234.552,3235.537,3234.534,3234.837,0
1,2020-01-01 18:01:00,3234.837,3234.837,3233.337,3233.337,0
2,2020-01-01 18:02:00,3233.337,3233.852,3233.334,3233.74,0
3,2020-01-01 18:03:00,3233.74,3234.6,3233.552,3234.346,0
4,2020-01-01 18:04:00,3234.346,3234.352,3233.852,3233.855,0



[Negative values]
- No negative issues found.

[Duplicate timestamps]
- 118 duplicated timestamp rows (0.0353% of all rows)
  Example duplicates:
Found 59 duplicated timestamps.
Showing up to 5 groups:

--- Duplicate group for timestamp: 2020-10-25 19:25:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
273153,2020-10-25 19:25:00,3446.442,3446.599,3445.533,3446.248,0
273212,2020-10-25 19:25:00,3446.442,3446.599,3445.533,3446.248,0


--- Duplicate group for timestamp: 2020-10-25 19:26:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
273154,2020-10-25 19:26:00,3446.236,3446.236,3445.248,3445.542,0
273213,2020-10-25 19:26:00,3446.236,3446.236,3445.248,3445.542,0


--- Duplicate group for timestamp: 2020-10-25 19:27:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
273155,2020-10-25 19:27:00,3445.539,3445.551,3444.539,3445.039,0
273214,2020-10-25 19:27:00,3445.539,3445.551,3444.539,3445.039,0


--- Duplicate group for timestamp: 2020-10-25 19:28:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
273156,2020-10-25 19:28:00,3445.039,3445.039,3444.242,3444.736,0
273215,2020-10-25 19:28:00,3445.039,3445.039,3444.242,3444.736,0


--- Duplicate group for timestamp: 2020-10-25 19:29:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
273157,2020-10-25 19:29:00,3444.754,3445.099,3444.533,3445.045,0
273216,2020-10-25 19:29:00,3444.754,3445.099,3444.533,3445.045,0



[Timestamp Gaps]
- Number of gaps: 5283
- Average gap length: 0 days 00:37:12.538330494
- Longest gap: 3 days 04:46:00

  Example gap (first occurrence):
  Gap from 2020-01-01 18:16:00 → 2020-01-01 18:18:00


Unnamed: 0,timestamp,open,high,low,close,vol
16,2020-01-01 18:16:00,3237.555,3237.555,3237.337,3237.337,0
17,2020-01-01 18:18:00,3237.337,3238.034,3237.337,3238.034,0



  Longest gap:
  Gap from 2020-12-24 13:14:00 → 2020-12-27 18:00:00


Unnamed: 0,timestamp,open,high,low,close,vol
329254,2020-12-24 13:14:00,3703.146,3703.354,3702.133,3702.145,0
329255,2020-12-27 18:00:00,3691.046,3696.439,3688.248,3695.451,0



  Dates with gaps:
['  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 2020-01-01', '  • 20

Unnamed: 0,timestamp,open,high,low,close,vol
0,2021-01-03 18:00:00,3758.942,3762.397,3756.854,3757.954,0
1,2021-01-03 18:01:00,3758.137,3759.436,3756.231,3756.254,0
2,2021-01-03 18:02:00,3756.231,3756.231,3749.936,3752.242,0
3,2021-01-03 18:03:00,3752.446,3754.754,3751.139,3752.636,0
4,2021-01-03 18:04:00,3752.631,3754.297,3752.434,3753.433,0



[Negative values]
- No negative issues found.

[Duplicate timestamps]
- 120 duplicated timestamp rows (0.0360% of all rows)
  Example duplicates:
Found 60 duplicated timestamps.
Showing up to 5 groups:

--- Duplicate group for timestamp: 2021-10-31 19:14:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
276520,2021-10-31 19:14:00,4618.945,4619.954,4617.942,4619.633,0
276580,2021-10-31 19:14:00,4618.945,4619.954,4617.942,4619.633,0


--- Duplicate group for timestamp: 2021-10-31 19:13:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
276519,2021-10-31 19:13:00,4619.242,4619.648,4618.933,4619.454,0
276579,2021-10-31 19:13:00,4619.242,4619.648,4618.933,4619.454,0


--- Duplicate group for timestamp: 2021-10-31 19:12:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
276518,2021-10-31 19:12:00,4619.136,4619.154,4618.936,4619.136,0
276578,2021-10-31 19:12:00,4619.136,4619.154,4618.936,4619.136,0


--- Duplicate group for timestamp: 2021-10-31 19:11:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
276517,2021-10-31 19:11:00,4618.642,4619.445,4618.642,4618.933,0
276577,2021-10-31 19:11:00,4618.642,4619.445,4618.642,4618.933,0


--- Duplicate group for timestamp: 2021-10-31 19:10:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
276516,2021-10-31 19:10:00,4618.499,4619.139,4618.454,4618.948,0
276576,2021-10-31 19:10:00,4618.499,4619.139,4618.454,4618.948,0



[Timestamp Gaps]
- Number of gaps: 7455
- Average gap length: 0 days 00:26:10.229376257
- Longest gap: 3 days 01:47:00

  Example gap (first occurrence):
  Gap from 2021-01-03 18:10:00 → 2021-01-03 18:12:00


Unnamed: 0,timestamp,open,high,low,close,vol
10,2021-01-03 18:10:00,3757.454,3759.736,3755.799,3756.236,0
11,2021-01-03 18:12:00,3756.731,3757.948,3756.731,3757.442,0



  Longest gap:
  Gap from 2021-12-23 16:13:00 → 2021-12-26 18:00:00


Unnamed: 0,timestamp,open,high,low,close,vol
327053,2021-12-23 16:13:00,4725.239,4725.254,4724.536,4725.099,0
327054,2021-12-26 18:00:00,4728.939,4734.299,4727.636,4730.054,0



  Dates with gaps:
['  • 2021-01-03', '  • 2021-01-03', '  • 2021-01-03', '  • 2021-01-03', '  • 2021-01-03', '  • 2021-01-03', '  • 2021-01-03 – 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04', '  • 2021-01-04 – 2021-01-05', '  • 2021-01-05 – 2021-01-06', '  • 2021-01-06 – 2021-01-07', '  • 2021-01-07', '  • 2021-01-10 – 2021-01-11', '  • 2021-01-11', '  • 2021-01-11', '  • 2021-01-11 – 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12', '  • 2021-01-12 – 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2021-01-13', '  • 2

Unnamed: 0,timestamp,open,high,low,close,vol
0,2022-01-02 18:00:00,4779.636,4785.539,4779.636.1,4781.251,0
1,2022-01-02 18:01:00,4781.136,4784.099,4780.133,4782.736,0
2,2022-01-02 18:02:00,4782.836,4784.242,4781.342,4781.348,0
3,2022-01-02 18:03:00,4781.648,4782.151,4780.142,4780.242,0
4,2022-01-02 18:04:00,4780.145,4781.699,4779.836,4781.633,0



[Negative values]
- No negative issues found.

[Duplicate timestamps]
- 120 duplicated timestamp rows (0.0351% of all rows)
  Example duplicates:
Found 60 duplicated timestamps.
Showing up to 5 groups:

--- Duplicate group for timestamp: 2022-10-30 19:25:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
283624,2022-10-30 19:25:00,3892.378,3893.118,3892.356,3893.118,0
283684,2022-10-30 19:25:00,3892.378,3893.118,3892.356,3893.118,0


--- Duplicate group for timestamp: 2022-10-30 19:24:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
283623,2022-10-30 19:24:00,3892.505,3893.115,3892.359,3892.628,0
283683,2022-10-30 19:24:00,3892.505,3893.115,3892.359,3892.628,0


--- Duplicate group for timestamp: 2022-10-30 19:23:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
283622,2022-10-30 19:23:00,3893.406,3893.406,3892.363,3892.622,0
283682,2022-10-30 19:23:00,3893.406,3893.406,3892.363,3892.622,0


--- Duplicate group for timestamp: 2022-10-30 19:22:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
283621,2022-10-30 19:22:00,3892.875,3894.368,3892.875,3893.362,0
283681,2022-10-30 19:22:00,3892.875,3894.368,3892.875,3893.362,0


--- Duplicate group for timestamp: 2022-10-30 19:21:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
283620,2022-10-30 19:21:00,3894.641,3894.641,3892.889,3893.128,0
283680,2022-10-30 19:21:00,3894.641,3894.641,3892.889,3893.128,0



[Timestamp Gaps]
- Number of gaps: 1234
- Average gap length: 0 days 02:26:30.729335494
- Longest gap: 3 days 01:47:00

  Example gap (first occurrence):
  Gap from 2022-01-02 23:25:00 → 2022-01-02 23:27:00


Unnamed: 0,timestamp,open,high,low,close,vol
325,2022-01-02 23:25:00,4786.199,4786.199,4785.833,4785.854,0
326,2022-01-02 23:27:00,4785.654,4785.654,4785.639,4785.639,0



  Longest gap:
  Gap from 2022-04-14 16:13:00 → 2022-04-17 18:00:00


Unnamed: 0,timestamp,open,high,low,close,vol
97962,2022-04-14 16:13:00,4390.236,4390.299,4390.036,4390.036,0
97963,2022-04-17 18:00:00,4386.454,4387.251,4386.233,4386.748,0



  Dates with gaps:
['  • 2022-01-02', '  • 2022-01-02', '  • 2022-01-02', '  • 2022-01-02 – 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03', '  • 2022-01-03 – 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04', '  • 2022-01-04 – 2022-01-05', '  • 2022-01-05', '  • 2022-01-05 – 2022-01-06', '  • 2022-01-06', '  • 2022-01-09', '  • 2022-01-09', '  • 2022-01-09 – 2022-01-10', '  • 2022-01-10 – 2022-01-11', '  • 2022-01-11', '  • 2022-01-11', '

Unnamed: 0,timestamp,open,high,low,close,vol
0,2023-01-02 18:00:00,3872.998,3877.176,3863.86,3865.983,0
1,2023-01-02 18:01:00,3866.128,3867.372,3865.378,3865.98,0
2,2023-01-02 18:02:00,3865.878,3866.878,3865.36,3865.881,0
3,2023-01-02 18:03:00,3865.742,3865.742,3862.86,3863.613,0
4,2023-01-02 18:04:00,3863.363,3863.363,3860.742,3860.878,0



[Negative values]
- No negative issues found.

[Duplicate timestamps]
- 120 duplicated timestamp rows (0.0412% of all rows)
  Example duplicates:
Found 60 duplicated timestamps.
Showing up to 5 groups:

--- Duplicate group for timestamp: 2023-10-29 19:33:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
235915,2023-10-29 19:33:00,4133.607,4133.616,4132.996,4133.231,0
235975,2023-10-29 19:33:00,4133.607,4133.616,4132.996,4133.231,0


--- Duplicate group for timestamp: 2023-10-29 19:32:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
235914,2023-10-29 19:32:00,4133.373,4133.752,4133.237,4133.746,0
235974,2023-10-29 19:32:00,4133.373,4133.752,4133.237,4133.746,0


--- Duplicate group for timestamp: 2023-10-29 19:31:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
235913,2023-10-29 19:31:00,4133.231,4133.746,4132.987,4133.499,0
235973,2023-10-29 19:31:00,4133.231,4133.746,4132.987,4133.499,0


--- Duplicate group for timestamp: 2023-10-29 19:30:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
235912,2023-10-29 19:30:00,4132.241,4132.993,4132.241,4132.987,0
235972,2023-10-29 19:30:00,4132.241,4132.993,4132.241,4132.987,0


--- Duplicate group for timestamp: 2023-10-29 19:29:00 ---


Unnamed: 0,timestamp,open,high,low,close,vol
235911,2023-10-29 19:29:00,4133.002,4133.002,4132.497,4132.557,0
235971,2023-10-29 19:29:00,4133.002,4133.002,4132.497,4132.557,0



[Timestamp Gaps]
- Number of gaps: 7003
- Average gap length: 0 days 00:33:36.472940168
- Longest gap: 3 days 01:47:00

  Example gap (first occurrence):
  Gap from 2023-01-03 16:14:00 → 2023-01-03 18:00:00


Unnamed: 0,timestamp,open,high,low,close,vol
1334,2023-01-03 16:14:00,3819.066,3819.271,3817.279,3817.781,0
1335,2023-01-03 18:00:00,3820.439,3820.439,3817.276,3818.2,0



  Longest gap:
  Gap from 2023-12-22 16:13:00 → 2023-12-25 18:00:00


Unnamed: 0,timestamp,open,high,low,close,vol
286645,2023-12-22 16:13:00,4751.023,4751.577,4751.023,4751.282,0
286646,2023-12-25 18:00:00,4751.949,4755.877,4751.814,4755.877,0



  Dates with gaps:
['  • 2023-01-03', '  • 2023-01-03', '  • 2023-01-03', '  • 2023-01-03', '  • 2023-01-03', '  • 2023-01-03 – 2023-01-04', '  • 2023-01-04', '  • 2023-01-04 – 2023-01-05', '  • 2023-01-05', '  • 2023-01-05', '  • 2023-01-05', '  • 2023-01-05 – 2023-01-06', '  • 2023-01-06', '  • 2023-01-08', '  • 2023-01-08', '  • 2023-01-08', '  • 2023-01-08', '  • 2023-01-08 – 2023-01-09', '  • 2023-01-09', '  • 2023-01-09', '  • 2023-01-09', '  • 2023-01-09 – 2023-01-10', '  • 2023-01-10', '  • 2023-01-10', '  • 2023-01-10', '  • 2023-01-10 – 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11', '  • 2023-01-11 – 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-12', '  • 2023-01-15', '  • 2023-01-15', '  • 2023-01-15', '  • 2